How do we measure the strength of a linear relationship and fit a line for prediction?
Compute and interpret the product moment correlation coefficient, find the least squares regression line, and use it for prediction within the data range
A focused answer to the H2 Mathematics outcome on correlation and regression. The product moment correlation coefficient, the least squares regression line, choosing which line to use, and the limits of prediction.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
SEAB wants you to compute and interpret the product moment correlation coefficient, find the least squares regression line, decide which regression line to use, and use the line for prediction while recognising the dangers of extrapolation.
The answer
The product moment correlation coefficient
The coefficient $r$ measures the strength and direction of a linear relationship between two variables, taking values in $[-1, 1]$:
- $r$ near $1$: strong positive linear correlation.
- $r$ near $-1$: strong negative linear correlation.
- $r$ near $0$: little or no linear correlation.
A value near zero does not rule out a non-linear relationship; $r$ only measures linearity. The graphing calculator computes $r$ from the data.
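The graphing calculator does this for you in the exam, but it can help to see what is being computed. The sketch below is a minimal illustration, assuming the standard definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{xy}$, $S_{xx}$ and $S_{yy}$ are sums of products of deviations from the means. The function name `pmcc` and both data sets are invented for illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data lying exactly on a straight line with positive gradient.
r_line = pmcc([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])

# Hypothetical data on the curve y = x^2, symmetric about x = 0:
# a perfect relationship, but not a linear one.
r_curve = pmcc([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(round(r_line, 6), round(r_curve, 6))
```

The first data set gives $r = 1$ and the second gives $r = 0$, even though the second set follows an exact rule. This is the sense in which a value near zero does not rule out a non-linear relationship.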
The least squares regression line
The least squares regression line of $y$ on $x$ minimises the sum of squared vertical distances from the points to the line. It passes through the mean point $(\bar{x}, \bar{y})$ and has the form
$y = a + bx$
with $b$ the gradient and $a$ the intercept (found by the calculator). It is used to predict $y$ from $x$.
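As a rough sketch of what the calculator does, write the line of $y$ on $x$ as $y = a + bx$ (notation assumed here). The standard least squares results are $b = S_{xy} / S_{xx}$ and $a = \bar{y} - b\bar{x}$. The function name `regression_y_on_x` and the data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # intercept; this choice puts the mean point on the line
    return a, b


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_y_on_x(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

print("intercept a =", round(a, 4), "gradient b =", round(b, 4))
# Value of the line at the mean of x, next to the mean of y.
print(round(a + b * mean_x, 4), round(mean_y, 4))
```

For this data the mean point is $(3, 4)$, and substituting $x = 3$ into the fitted line returns $4$, which illustrates that the line passes through the mean point.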
Which line to use
- Use the regression line of $y$ on $x$ to predict $y$ from a given $x$.
- Use the regression line of $x$ on $y$ to predict $x$ from a given $y$.
The two lines differ (they minimise distances in different directions) and only coincide when $r = \pm 1$.
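One way to see why the two lines only coincide for perfect correlation is to compare their gradients. Writing the line of $y$ on $x$ as $y = a + bx$ and the line of $x$ on $y$ as $x = c + dy$ (notation assumed here), the standard formulas $b = S_{xy}/S_{xx}$ and $d = S_{xy}/S_{yy}$ give $bd = r^2$. The sketch below checks this on hypothetical data; the helper names are invented for illustration.

```python
def dev_products(us, vs):
    """Sum of products of deviations from the means (S_uv)."""
    mean_u = sum(us) / len(us)
    mean_v = sum(vs) / len(vs)
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))


def both_gradients(xs, ys):
    """Return (b, d): b for y = a + b*x, d for x = c + d*y."""
    sxy = dev_products(xs, ys)
    b = sxy / dev_products(xs, xs)  # y on x: minimises vertical distances
    d = sxy / dev_products(ys, ys)  # x on y: minimises horizontal distances
    return b, d


# Hypothetical scattered data: the two lines are different.
b, d = both_gradients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = b * d

# Hypothetical data lying exactly on a line: the two lines coincide.
b_exact, d_exact = both_gradients([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])

print(round(b, 4), round(d, 4), round(r_squared, 4))
print(round(b_exact * d_exact, 4))
```

Drawn on the same axes, the line of $x$ on $y$ has gradient $1/d$. That equals $b$ only when $bd = 1$, which means $r^2 = 1$. For the scattered data $bd$ is well below $1$, so the two lines are genuinely different and choosing the wrong one changes the estimate.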
Prediction and extrapolation
Predictions are reliable only within the range of the data (interpolation). Extrapolation beyond the data is unreliable because the linear pattern may not continue. Always check that the prediction value lies inside the observed range.
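The range check can be made mechanical. The sketch below is only an illustration: the function name `predict_y`, the coefficients and the observed $x$ values are all hypothetical. It returns the estimate together with a flag saying whether the input lies inside the observed range.

```python
def predict_y(x_new, xs, a, b):
    """Return (estimate, is_interpolation) for the line y = a + b*x."""
    # Interpolation means x_new lies between the smallest and largest observed x.
    inside = min(xs) <= x_new <= max(xs)
    return a + b * x_new, inside


# Hypothetical observed x values and hypothetical fitted coefficients.
xs = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6

est_in, ok_in = predict_y(3.5, xs, a, b)    # inside the range 1 to 5
est_out, ok_out = predict_y(12, xs, a, b)   # far outside the range

print(round(est_in, 4), ok_in)
print(round(est_out, 4), ok_out)
```

The line produces a number in both cases. The flag is what tells you whether that number is backed by data, which is the comment examiners look for.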
Examples in context
Example 1. Study hours and grades. A strong positive $r$ between hours studied and grade suggests a useful linear model for predicting a grade from study time within the observed range, while reminding us that other factors also matter (correlation is not causation).
Example 2. Temperature and ice-cream sales. Sales correlate with temperature, but extrapolating the regression line to extreme temperatures outside the data would mislead, since demand saturates or collapses beyond the observed range.
Try this
Q1. Interpret a correlation coefficient of . [2 marks]
- Cue. Strong negative linear correlation: as one variable rises the other tends to fall, with points close to a line.
Q2. A regression line of on is . Predict when . [1 mark]
- Cue. .
Q3. Explain why extrapolation can give unreliable predictions. [2 marks]
- Cue. Beyond the data range the linear pattern may not continue, so the line is not supported by evidence there.
Exam-style practice questions
Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Original, 4 marks. A set of bivariate data gives a product moment correlation coefficient of . Interpret this value and state what it suggests about a linear model.
The value of $r$ is close to $1$, indicating a strong positive linear correlation: as one variable increases the other tends to increase, and the points lie close to a straight line.
This suggests a linear model fits the data well, so a least squares regression line would give reliable predictions within the data range.
Markers reward interpreting the sign (positive) and magnitude (strong, near $1$), and the comment that a linear model is appropriate.
Original, 5 marks. The regression line of on is . The mean point is . Use the line to estimate when , and explain why estimating at would be unreliable.
At : .
The mean point check: , confirming the line passes through .
Estimating at is extrapolation, far outside the range of the data. The linear relationship may not hold there, so the prediction is unreliable.
Markers reward the substitution for the in-range estimate, and identifying extrapolation as the reason the far estimate is unreliable.