Data games

Correlation is a statistical measurement used to quantify the strength and direction of a linear relationship.
\(\bullet\) This value is unitless and bounded between -1 and 1, and it is not affected by changes in the location or scale of the variables.
\(\bullet\) It is typically denoted by \(r\) or by \(\rho\).
\(\bullet\) A correlation of -1 would mean that the data have a perfectly negative relationship, which would appear in the scatterplot as a perfect line with a negative slope.
\(\bullet\) Similarly, a correlation of 1 would mean that the data are perfectly positively correlated, which would appear in the scatterplot as a perfect line with a positive slope.
\(\bullet\) If the data have no linear relationship, they would have a correlation of 0, and would typically appear as a random scatter of points in the scatterplot.
\(\bullet\) The correlation presented in this application is generated using the Pearson Correlation Coefficient method.
\(\bullet\) The formula used to calculate this value is \(r={1\over n-1}\sum_{i=1}^n{(x_i-\bar{x})(y_i-\bar{y})\over s_xs_y}\), where \(s_x\) and \(s_y\) are the sample standard deviations of \(x\) and \(y\).
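As a rough illustration of the formula above, the following Python sketch computes the Pearson correlation directly from its definition. It assumes \(s_x\) and \(s_y\) are sample standard deviations (divisor \(n-1\)). The function name `pearson_r` and the small data sets are hypothetical, chosen only for this example, and are not taken from the app's own source.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed term by term from the formula above.

    Assumes s_x and s_y are sample standard deviations (divisor n - 1).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    total = sum((xi - x_bar) * (yi - y_bar) / (s_x * s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2 * xi + 1 for xi in x]        # perfect line, positive slope
y_down = [-3 * xi + 10 for xi in x]    # perfect line, negative slope
y_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly linear, not perfect

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_noisy = pearson_r(x, y_noisy)

# Shifting and rescaling x (by a positive factor) leaves r unchanged.
r_shifted = pearson_r([100 + 10 * xi for xi in x], y_noisy)
```

Here `r_up` and `r_down` come out at 1 and -1 up to floating-point rounding, and `r_shifted` matches `r_noisy`, which is the location and scale invariance described in the list above.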

Number of Observations

Correlation, \(\rho\)

The intercept, \(\hat{\beta}_0\) (round to the nearest whole number)

The slope, \(\hat{\beta}_1\) (round to the nearest tenth)

Shiny app by Irvin Alcaraz

Base R code by Irvin Alcaraz

Shiny source files: GitHub Gist

Cal Poly Statistics Dept Shiny Series

Often, we wish to predict the value of some variable, called the response, based on the value of another linearly related variable, called the explanatory variable. This idea is called linear regression. We will deal with simple linear regression, which uses a single response and a single explanatory variable (also called a predictor). To estimate the response from the explanatory variable, we fit a line, also called a model, to the data. This line is called the least squares regression line; it minimizes the sum of the squared deviations of the points from the line, where each deviation is called a residual.
\(\bullet\) The population regression line is: \(Y=\beta_0+\beta_1X\)
\(\bullet\) When given a random sample of data, we estimate this by: \(\hat{y}=b_0+b_1x\)
\(\bullet\) To assess whether the estimated line is the best line, we can look at two values.
\(\qquad\circ\) We can minimize a value called the sum of squared errors, denoted \(SSE=\sum_{i=1}^n(y_i-\hat{y}_i)^2\).
\(\qquad\circ\) Equivalently, we can maximize a value called the coefficient of determination. We denote this value as \(R^2=1-{SSE \over SST}\), where \(SST=\sum_{i=1}^n(y_i-\bar{y})^2\).
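The quantities in the list above can be sketched in a few lines of Python. The closed-form expressions used for \(b_1\) and \(b_0\) are the standard least squares solution; they are not stated in the text above and are included here as an assumption about how the fit would be computed. The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line y-hat = b0 + b1 * x.

    Uses the standard closed form: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors for the line with intercept b0 and slope b1."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def r_squared(x, y, b0, b1):
    """Coefficient of determination, R^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse(x, y, b0, b1) / sst


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
sse_fit = sse(x, y, b0, b1)
r2_fit = r_squared(x, y, b0, b1)

# A guessed line (intercept 2.0, slope 0.7) for comparison with the fit.
sse_guess = sse(x, y, 2.0, 0.7)
r2_guess = r_squared(x, y, 2.0, 0.7)
```

For this data the fitted line is approximately \(\hat{y}=2.2+0.6x\), with \(SSE\approx 2.4\) and \(R^2\approx 0.6\). The guessed line has a larger \(SSE\) and a smaller \(R^2\), which illustrates why minimizing one is equivalent to maximizing the other: \(SST\) depends only on the data, not on the line.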