Correlation is a statistical measurement used to quantify the strength and direction of
a linear relationship between two quantitative variables.
\(\bullet\) This value is unitless, so it is not affected by the location or scale of the variables,
and it is bounded between -1 and 1.
\(\bullet\) It is typically denoted by \(r\) or by \(\rho\).
\(\bullet\) A correlation of -1 would mean that the data have a perfect negative linear relationship, which would appear in the scatterplot as
points falling exactly on a line with a negative slope.
\(\bullet\) Similarly, a correlation of 1 would mean that the data have a perfect positive linear relationship, which would appear in the scatterplot as
points falling exactly on a line with a positive slope.
\(\bullet\) If the data have no linear relationship, they would have a correlation near 0, and would typically appear as a random
scatter of points in the scatterplot. A correlation of 0 does not rule out a nonlinear relationship.
\(\bullet\) The correlation presented in this application is generated using the Pearson
Correlation Coefficient method.
\(\bullet\) The formula used to calculate this value is \(r={1\over n-1}\sum_{i=1}^n{(x_i-\bar{x})(y_i-\bar{y})\over s_xs_y}\), where \(s_x\) and \(s_y\) are the sample standard deviations of \(x\) and \(y\).
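As a rough illustration of the formula above, the following Python sketch computes the Pearson correlation directly from its definition. The function name `pearson_r` and the sample data are hypothetical and chosen only for this example; this is not the application's own code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed term by term from the formula above.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # Sum of cross-products of deviations, scaled by (n - 1) * s_x * s_y.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)

# Made-up sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Because \(r\) is unitless, rescaling or shifting either variable (for example, converting \(x\) from inches to centimeters) should leave the result unchanged.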
Often, we wish to predict the value of some variable, called the response variable, based on
the value of another linearly related variable, called the explanatory variable. This idea is called linear regression.
We will deal with simple linear regression, which uses a single response variable and a single explanatory variable (also called a predictor).
In order to estimate the response variable based on the explanatory variable, we will fit a line,
also called a model, to the data. This line is called the least squares regression line, and it
minimizes the sum of the squared residuals, where a residual is the vertical deviation of a point from the line.
\(\bullet\) The population regression line is: \(Y=\beta_0+\beta_1X\)
\(\bullet\) When given a random sample of data, we estimate this by: \(\hat{y}=b_0+b_1x\)
\(\bullet\) To assess whether or not the estimated line is the best line, we can look at two values.
\(\qquad\circ\) We can minimize a value called the sum of squared errors, denoted \(SSE=\sum_{i=1}^n(y_i-\hat{y}_i)^2\).
\(\qquad\circ\) Equivalently, we can maximize a value called the coefficient of determination. We denote this value as
\(R^2=1-{SSE \over SST}\), where \(SST=\sum_{i=1}^n(y_i-\bar{y})^2\).
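The quantities above can be sketched in a few lines of Python. This example assumes the standard closed-form least squares solution, \(b_1=\sum(x_i-\bar{x})(y_i-\bar{y})/\sum(x_i-\bar{x})^2\) and \(b_0=\bar{y}-b_1\bar{x}\), which is not stated in the text above. The function names `fit_line` and `sse_and_r2` and the sample data are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) via the standard closed form.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse_and_r2(x, y, b0, b1):
    """Return (SSE, R^2) for the line y_hat = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, 1 - sse / sst

# Made-up sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)                 # b0 ≈ 2.2, b1 ≈ 0.6
sse, r2 = sse_and_r2(x, y, b0, b1)      # SSE ≈ 2.4, R^2 ≈ 0.6

# Any other line, such as one with a shifted intercept, gives a larger SSE.
sse_shifted, r2_shifted = sse_and_r2(x, y, b0 + 0.1, b1)
print(sse < sse_shifted, r2 > r2_shifted)  # True True
```

Since \(SST\) depends only on the data and not on the fitted line, a smaller \(SSE\) always corresponds to a larger \(R^2\), which is why the two criteria are equivalent.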