This is the r value that is always reported on charts, also known as the correlation coefficient. If we want to ask the question “how well does my linear model fit my data?”, r gives us that answer: it is a measure of the degree of linear relationship between two variables. We need an r value for a correlation because a correlation has no direction, whereas a regression does. In a regression, you have one variable that predicts another one. In a correlation, you know there might be a relationship between two variables, but you don’t know if a → b, or b → a, or something else! So r is a measure of how well a straight line describes the relationship between two variables, whatever the direction of that relationship may be. It is always a value between -1 and +1: the sign tells you whether the two variables move in the same or in opposite directions, and values closer to -1 or +1 mean a stronger linear relationship.
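To make the "no direction" point concrete, here is a minimal Python sketch of the usual Pearson formula for r. The function name `pearson_r` and the small dataset are made up for illustration; in practice a statistics package would do this for you.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for this example.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 68]

r_xy = pearson_r(hours, scores)
r_yx = pearson_r(scores, hours)
print(r_xy, r_yx)  # the two values match: swapping the variables does not change r
```

Because the formula treats the two variables identically, it cannot tell you which one (if either) is driving the other.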
If we want just a measure of the strength of the relationship, we can take the absolute value of r. When we convert our data to z scores and plot it, r is also the slope of the regression line. As a reminder, converting raw X and Y values to z scores means subtracting the mean from each value, and then dividing by the standard deviation:
$$z = \frac{x - \bar{x}}{s}$$
where $\bar{x}$ is the mean and $s$ is the standard deviation of the variable being converted.
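And even though it’s much nicer to use software or a calculator for this, as a reminder, we can calculate the (sample) variance by hand:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the mean and $n$ is the number of values.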
Then if we take the square root of this variance (s²), we get s, which is the standard deviation of the dataset.
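The steps above (variance, standard deviation, z scores) and the claim that r is the slope of the regression line on z-scored data can be checked with a short Python sketch. This is only an illustration: the function names and the data are invented, and it assumes the sample variance with an n - 1 denominator.

```python
import math

def sample_variance(values):
    """Sample variance s^2, assuming the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def z_scores(values):
    """Subtract the mean from each value, then divide by the standard deviation."""
    mean = sum(values) / len(values)
    s = math.sqrt(sample_variance(values))
    return [(v - mean) / s for v in values]

def ols_slope(xs, ys):
    """Least-squares slope of the line predicting ys from xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, invented for this example.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 70, 68]

zx, zy = z_scores(x), z_scores(y)

# r as the (n - 1)-averaged product of paired z scores.
r = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

slope_z = ols_slope(zx, zy)   # regression of z(y) on z(x)
slope_raw = ols_slope(x, y)   # regression on the raw data, for comparison

print(r, slope_z, slope_raw)  # slope_z matches r; slope_raw generally does not
```

On the z-scored data the slope is the same whichever variable you treat as the predictor, which is another way of seeing that r has no direction.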
Of course, hardly anyone ever talks about r; it is always r squared! This is the squared correlation coefficient, which represents the proportion of the variance of Y that can be explained by X. So this value is a proportion between 0 and 1, often quoted as a percentage between 0% and 100%. A higher r squared value means that more of the variance of Y can be explained by X, so the linear relationship between the two is stronger. An r squared of 1 would mean that 100% of the variance of Y is explained by X, and an r squared of 0 would mean that none of it is.
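As a rough check on the "proportion of variance explained" reading, the sketch below fits a least-squares line and compares 1 minus (residual variation / total variation) with r squared. The helper names and the data are hypothetical, and the equality shown holds for a straight-line fit with one predictor.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def proportion_explained(xs, ys):
    """1 - (variation left over after the fit) / (total variation in ys)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total

# Hypothetical data, invented for this example.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 70, 68]

r = pearson_r(x, y)
explained = proportion_explained(x, y)
print(r ** 2, explained)  # the two agree up to floating-point rounding
```

Data that fall exactly on a line leave no residual variation, so `proportion_explained` returns 1 in that case.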