Linear regression: Fit a trend line that minimizes the sum of squared prediction errors. The prediction error (residual) of each data point is its vertical distance to the fitted line.
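As a minimal sketch of the idea, the closed-form least-squares solution for a single predictor can be written in a few lines of plain Python. The data points and the helper names (fit_line, sum_squared_errors) are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, slope, intercept)
# Any other line, for example one with a slightly different slope, has a larger error
nudged_sse = sum_squared_errors(xs, ys, slope + 0.1, intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"SSE of fitted line={best_sse:.4f}, SSE of nudged line={nudged_sse:.4f}")
```

Comparing the two error sums shows what "best fit" means here: moving the line away from the fitted slope can only increase the sum of squared errors.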
P-value: If the null hypothesis is true, what is the probability of obtaining an effect at least as extreme as the one in your sample?
The p-value lies between 0 and 1. It indicates how strongly the observed data contradict H0: the smaller the p-value, the stronger the evidence against H0.
P-values measure how compatible your data are with the null hypothesis: how likely is the effect observed in your sample data, or a more extreme one, if the null hypothesis is true? Conventionally, 0.05 is used as the threshold. If p > 0.05, the evidence against the null hypothesis is not strong enough, and we fail to reject it. If p ≤ 0.05, the evidence against the null hypothesis is strong enough, so we reject it in favor of the alternative hypothesis.
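To make the definition concrete, here is a small sketch of an exact two-sided p-value for a coin-flip experiment, where H0 is "the coin is fair". The scenario, the counts, and the function name two_sided_binomial_p_value are assumptions chosen for illustration.

```python
from math import comb


def two_sided_binomial_p_value(heads, flips):
    """Exact p-value under H0: the coin is fair (P(heads) = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected value as the observed one
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    # Under H0 each of the 2**flips head/tail sequences is equally likely
    return extreme_outcomes / 2 ** flips


ALPHA = 0.05  # conventional threshold

p_16 = two_sided_binomial_p_value(16, 20)  # 16 heads in 20 flips
p_12 = two_sided_binomial_p_value(12, 20)  # 12 heads in 20 flips

for heads, p in ((16, p_16), (12, p_12)):
    decision = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{heads}/20 heads: p = {p:.4f} -> {decision}")
```

Sixteen heads out of twenty is rare under a fair coin, so its p-value falls below the threshold, while twelve heads is entirely ordinary and gives no grounds to reject H0.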
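Confidence interval: A 95% confidence interval is a range of values computed from a sample by a procedure that, over repeated sampling, produces intervals containing the true population parameter (for example, the mean) 95% of the time. Informally, you can be 95% confident that the interval contains the true mean.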
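The repeated-sampling reading can be checked with a short simulation. This is only a sketch under simplifying assumptions: the population is normal, its standard deviation is treated as known (so a z-interval is used rather than a t-interval), and all constants are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

TRUE_MEAN = 50.0   # hypothetical population mean
SIGMA = 10.0       # hypothetical population standard deviation, assumed known
SAMPLE_SIZE = 30
TRIALS = 2000


def confidence_interval(sample, sigma, level=0.95):
    """z-interval for the mean when the population standard deviation is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    center = sum(sample) / len(sample)
    half_width = z * sigma / sqrt(len(sample))
    return center - half_width, center + half_width


rng = random.Random(0)  # fixed seed so the run is reproducible
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(SAMPLE_SIZE)]
    low, high = confidence_interval(sample, SIGMA)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.3f} of the intervals contain the true mean")
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across many samples.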
Power of the test: The probability of rejecting the null hypothesis when it is false, i.e. 1 − β, where β is the probability of a Type II error.
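As an illustration, the power of a two-sided z-test of H0: mean = 0 can be computed analytically and cross-checked by simulation. The test setup (known standard deviation, normal data) and the chosen effect size and sample size are assumptions made for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_power(true_mean, sigma, n, alpha=0.05):
    """Analytic power of a two-sided z-test of H0: mean = 0 with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_mean * sqrt(n) / sigma  # mean of the z statistic under the true mean
    # Probability that the z statistic lands in either rejection tail
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


def simulated_power(true_mean, sigma, n, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated samples in which H0: mean = 0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n) / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# Hypothetical scenario: true mean 0.5, standard deviation 1, 30 observations
analytic = z_test_power(0.5, 1.0, 30)
simulated = simulated_power(0.5, 1.0, 30)
print(f"analytic power={analytic:.3f}, simulated power={simulated:.3f}")
```

When the true mean is exactly 0, the same formula returns alpha, which is the Type I error rate rather than power; power only has meaning when H0 is actually false.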
Simpson’s paradox: A phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.
A visual aid, such as a scatter plot with a separate trend line for each group, can help explain it.
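A small numeric example can also show the reversal. The counts below are illustrative: two treatments, A and B, are compared on small and large cases, and the names (counts, overall_rate) are made up for this sketch.

```python
# (successes, total) per treatment and per group; illustrative counts
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, total):
    return successes / total


def overall_rate(groups):
    """Success rate after pooling all groups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total


for group in ("small", "large"):
    a = rate(*counts["A"][group])
    b = rate(*counts["B"][group])
    print(f"{group}: A={a:.3f}, B={b:.3f}")

overall_a = overall_rate(counts["A"])
overall_b = overall_rate(counts["B"])
print(f"combined: A={overall_a:.3f}, B={overall_b:.3f}")
```

Treatment A has the higher success rate within each group, yet B looks better once the groups are pooled, because A was applied mostly to the harder "large" cases and B mostly to the easier "small" ones.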
Regularization: A technique that adds a tuning parameter and slightly modifies the learning algorithm, typically by penalizing large coefficients, so that the model generalizes better. It keeps the coefficients from learning noise and fitting the training data perfectly, which would cause overfitting, and so improves the model’s performance on unseen data. In layman’s terms, regularization artificially discourages complex or extreme explanations of the world even if they fit what has been observed better. The idea is that such explanations are unlikely to generalize well to the future: they may happen to explain a few data points from the past well, but this may just be because of accidents of the sample.
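One common form of regularization is ridge (L2) regression, used here purely as an illustration since the notes do not name a specific method. For a single predictor with an unpenalized intercept, the ridge slope has a closed form, and the sketch below shows how a larger tuning parameter shrinks the coefficient toward zero. The data and the name ridge_slope are hypothetical.

```python
def ridge_slope(xs, ys, penalty):
    """Slope minimizing sum((y - a - b*x)**2) + penalty * b**2 (intercept a unpenalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # The penalty inflates the denominator, pulling the slope toward zero
    return sxy / (sxx + penalty)


# Hypothetical data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

penalties = [0.0, 1.0, 10.0, 100.0]
slopes = [ridge_slope(xs, ys, p) for p in penalties]

for p, s in zip(penalties, slopes):
    print(f"penalty={p:>5}: slope={s:.3f}")
```

With penalty 0 this is ordinary least squares; as the penalty grows, the model is pushed toward the simpler explanation "y barely depends on x", trading some fit on the observed data for stability. In practice the penalty is usually chosen by validating on held-out data.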