How do we measure the strength of a linear relationship and fit a line for prediction?

Compute and interpret the product moment correlation coefficient, find the least squares regression line, and use it for prediction within the data range

A focused answer to the H2 Mathematics outcome on correlation and regression, covering the product moment correlation coefficient, the least squares regression line, choosing which line to use, and the limits of prediction.

Generated by Claude Opus 4.8
9 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

SEAB wants you to compute and interpret the product moment correlation coefficient, find the least squares regression line, decide which regression line to use, and use the line for prediction while recognising the dangers of extrapolation.

The answer

The product moment correlation coefficient

The coefficient $r$ measures the strength and direction of a linear relationship between two variables, taking values in $-1 \leq r \leq 1$:

  • $r$ near $+1$: strong positive linear correlation.
  • $r$ near $-1$: strong negative linear correlation.
  • $r$ near $0$: little or no linear correlation.

A value near zero does not rule out a non-linear relationship; $r$ measures only the strength of linear association. The graphing calculator computes $r$ from the data.
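As a rough illustration of what the calculator is doing, the sketch below computes $r$ from the standard definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{xy}$, $S_{xx}$ and $S_{yy}$ are sums of products and squares about the means. The function name `pmcc` and both data sets are made up for this example; they are not part of the syllabus. The second data set shows a perfect quadratic relationship giving $r = 0$.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    # r is undefined (division by zero) if either variable is constant.
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data lying exactly on a straight line with positive gradient.
line_x = [1, 2, 3, 4]
line_y = [3, 5, 7, 9]
r_line = pmcc(line_x, line_y)

# Made-up data: y = x^2 on a symmetric range. y is completely determined
# by x, yet there is no linear trend for r to detect.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pmcc(curve_x, curve_y)

print(round(r_line, 6), round(r_curve, 6))
```

In an exam you would still read $r$ off the graphing calculator; the point of the sketch is only that a scatter diagram should be checked before trusting a small $r$ as evidence of "no relationship".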

The least squares regression line

The least squares regression line of $y$ on $x$ minimises the sum of squared vertical distances from the points to the line. It passes through the mean point $(\bar{x}, \bar{y})$ and has the form

$$y = a + bx,$$

with $b$ the gradient and $a$ the intercept (found by the calculator). It is used to predict $y$ from $x$.
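For readers who want to see where $a$ and $b$ come from, here is a minimal sketch using the standard least squares results $b = S_{xy} / S_{xx}$ and $a = \bar{y} - b\bar{x}$. The function name `regression_y_on_x` and the data are invented for illustration; the calculator remains the expected tool in the exam.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # gradient; undefined if all x values are equal
    a = y_bar - b * x_bar  # intercept, chosen so the line passes through the mean point
    return a, b


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(f"y = {a:.2f} + {b:.2f}x")
# Check that the fitted line passes through the mean point.
print(abs((a + b * x_bar) - y_bar) < 1e-9)
```

Because $a$ is defined as $\bar{y} - b\bar{x}$, substituting $x = \bar{x}$ always returns $\bar{y}$, which is why the mean point lies on the line.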

Which line to use

  • Use the regression line of $y$ on $x$ to predict $y$ from a given $x$.
  • Use the regression line of $x$ on $y$ to predict $x$ from a given $y$.

The two lines differ because they minimise squared distances in different directions (vertical for $y$ on $x$, horizontal for $x$ on $y$), and they coincide only when $r = \pm 1$.
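The difference between the two lines can be seen numerically. The sketch below fits both lines to one made-up data set, writing the line of $x$ on $y$ as $x = c + dy$ with $d = S_{xy} / S_{yy}$; the names `both_regression_lines`, `c` and `d` are assumptions for this example. It also uses the standard identity $bd = r^2$, which shows that the gradients $b$ and $1/d$ agree only when $r^2 = 1$.

```python
def both_regression_lines(xs, ys):
    """Return ((a, b), (c, d)) for y = a + b x (y on x) and x = c + d y (x on y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_xx        # y on x: minimises vertical squared distances
    a = y_bar - b * x_bar
    d = s_xy / s_yy        # x on y: minimises horizontal squared distances
    c = x_bar - d * y_bar
    return (a, b), (c, d)


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
(a, b), (c, d) = both_regression_lines(xs, ys)

# Gradients of the two lines when both are drawn as y against x.
# (1 / d assumes d is non-zero, i.e. the data are not uncorrelated.)
grad_y_on_x = b
grad_x_on_y = 1 / d
r_squared = b * d  # product of the two regression gradients equals r^2

print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y")
print(f"gradients on the same axes: {grad_y_on_x:.2f} vs {grad_x_on_y:.2f}")
```

Both lines pass through the mean point, so they cross there and fan apart more widely as $|r|$ decreases.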

Prediction and extrapolation

Predictions are reliable only within the range of the data (interpolation). Extrapolation beyond the data is unreliable because the linear pattern may not continue. Always check that the value you substitute into the line lies inside the observed range of that variable.
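One way to make the range check a habit is to build it into the prediction step. The sketch below is illustrative only: the function name `predict_y`, the coefficients and the observed $x$ values are all hypothetical. It returns a prediction for an in-range $x$ and refuses to extrapolate.

```python
def predict_y(a, b, x, observed_xs):
    """Predict y = a + b x, but only if x lies within the observed x range."""
    lo, hi = min(observed_xs), max(observed_xs)
    if not (lo <= x <= hi):
        raise ValueError(
            f"x = {x} lies outside the observed range [{lo}, {hi}]: "
            "this would be extrapolation"
        )
    return a + b * x


# Hypothetical regression line y = 1 + 2x fitted to hypothetical x values 1..7.
observed_xs = [1, 2, 3, 4, 5, 6, 7]
a, b = 1.0, 2.0

inside = predict_y(a, b, 3, observed_xs)  # interpolation: allowed

try:
    predict_y(a, b, 20, observed_xs)      # extrapolation: refused
    refused = False
except ValueError as err:
    refused = True
    print(err)

print(inside, refused)
```

Raising an error is a deliberately strict design choice for the sketch; in a written answer the equivalent step is to state the data range and comment on whether the given value falls inside it.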

Examples in context

Example 1. Study hours and grades. A strong positive $r$ between hours studied and grade suggests a useful linear model for predicting a grade from study time within the observed range. Correlation alone does not establish causation, however, since other factors also affect grades.

Example 2. Temperature and ice-cream sales. Sales correlate with temperature, but extrapolating the regression line to extreme temperatures outside the data would mislead, since demand may saturate or collapse beyond the observed range.

Try this

Q1. Interpret a correlation coefficient of $r = -0.91$. [2 marks]

  • Cue. Strong negative linear correlation: as one variable rises the other tends to fall, with points close to a line.

Q2. A regression line of $y$ on $x$ is $y = 1 + 2x$. Predict $y$ when $x = 3$. [1 mark]

  • Cue. $y = 1 + 2(3) = 7$.

Q3. Explain why extrapolation can give unreliable predictions. [2 marks]

  • Cue. Beyond the data range the linear pattern may not continue, so the line is not supported by evidence there.

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original. [4 marks] A set of bivariate data gives a product moment correlation coefficient of $r = 0.96$. Interpret this value and state what it suggests about a linear model.
Worked answer

$r = 0.96$ is close to $1$, indicating a strong positive linear correlation: as one variable increases the other tends to increase, and the points lie close to a straight line.

This suggests a linear model fits the data well, so a least squares regression line would give reliable predictions within the data range.

Markers reward interpreting the sign (positive) and magnitude (strong, near $1$), and the comment that a linear model is appropriate.

Original. [5 marks] The regression line of $y$ on $x$ is $y = 2.5 + 1.8x$. The mean point is $(\bar{x}, \bar{y}) = (4, 9.7)$. Use the line to estimate $y$ when $x = 5$, and explain why estimating $y$ at $x = 20$ would be unreliable.
Worked answer

At $x = 5$: $y = 2.5 + 1.8(5) = 2.5 + 9 = 11.5$.

The mean point check: $2.5 + 1.8(4) = 2.5 + 7.2 = 9.7 = \bar{y}$, confirming the line passes through $(\bar{x}, \bar{y})$.

Estimating at $x = 20$ is extrapolation, far outside the range of the data. The linear relationship may not hold there, so the prediction is unreliable.

Markers reward the substitution for the in-range estimate, and identifying extrapolation as the reason the far estimate is unreliable.
