
How do we measure the strength of a linear relationship and fit a line for prediction?

Compute and interpret the product moment correlation coefficient, find the least squares regression line, and use it for prediction within the data range

A focused answer to the H2 Mathematics outcome on correlation and regression. It covers the product moment correlation coefficient, the least squares regression line, choosing which line to use, and the limits of prediction.

Generated by Claude Opus 4.8 | 9 min answer | Updated 2026-06-06

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

SEAB wants you to compute and interpret the product moment correlation coefficient, find the least squares regression line, decide which regression line to use, and use the line for prediction while recognising the dangers of extrapolation.

The answer

The product moment correlation coefficient

The coefficient $r$ measures the strength and direction of a linear relationship between two variables, taking values in $-1 \leq r \leq 1$ :

$r$ near $+1$ : strong positive linear correlation.
$r$ near $-1$ : strong negative linear correlation.
$r$ near $0$ : little or no linear correlation.

A value near zero does not rule out a non-linear relationship; $r$ only measures linearity. The graphing calculator computes $r$ from the data.
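To make the point about linearity concrete, here is a minimal sketch that computes $r$ by hand from the sums of squares and products. The function name `pmcc` and both data sets are hypothetical illustrations, not syllabus data; in the exam the graphing calculator does this step.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data lying close to a rising straight line.
x_lin = [1, 2, 3, 4, 5]
y_lin = [2.1, 3.9, 6.2, 7.8, 10.1]
r_lin = pmcc(x_lin, y_lin)

# Hypothetical data with an exact but non-linear relationship, y = x^2,
# chosen symmetric about x = 0.
x_quad = [-2, -1, 0, 1, 2]
y_quad = [x ** 2 for x in x_quad]
r_quad = pmcc(x_quad, y_quad)

print(f"roughly linear data:  r = {r_lin:.3f}")
print(f"exact quadratic data: r = {r_quad:.3f}")
```

The second data set is perfectly predictable from $x$, yet its $r$ is zero, which is why a scatter diagram should be checked alongside the coefficient.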

The least squares regression line

The least squares regression line of $y$ on $x$ minimises the sum of squared vertical distances from the points to the line. It passes through the mean point $(\bar{x}, \bar{y})$ and has the form

$$y = a + bx,$$

with $b$ the gradient and $a$ the intercept (found by the calculator). It is used to predict $y$ from $x$ .
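As an illustration of what the calculator does, the sketch below finds $a$ and $b$ from the standard least squares formulas and checks that the line passes through the mean point. The function name `regression_y_on_x` and the data are hypothetical.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # gradient
    a = y_bar - b * x_bar  # intercept, so the line passes through the mean point
    return a, b

def sum_squared_vertical(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = regression_y_on_x(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(f"y = {a:.2f} + {b:.2f}x")
print(f"value of the line at x_bar: {a + b * x_bar:.2f}, y_bar: {y_bar:.2f}")

# Nudging the gradient or intercept should not reduce the sum of squares.
best = sum_squared_vertical(xs, ys, a, b)
nudged = sum_squared_vertical(xs, ys, a + 0.1, b - 0.05)
print(f"least squares sum: {best:.4f}, nudged line sum: {nudged:.4f}")
```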

Which line to use

Use the regression line of $y$ on $x$ to predict $y$ from a given $x$ .
Use the regression line of $x$ on $y$ to predict $x$ from a given $y$ .

The two lines differ (they minimise distances in different directions) and only coincide when $r = \pm 1$ .
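The difference between the two lines can be seen numerically. This is a minimal sketch with hypothetical data and a hypothetical helper `least_squares`. It fits both lines and compares the estimate of $x$ at a given $y$ from the correct line ($x$ on $y$) with the estimate obtained by rearranging the $y$ on $x$ line.

```python
def least_squares(us, vs):
    """Return (intercept, gradient) of the least squares line of v on u."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    gradient = s_uv / s_uu
    return v_bar - gradient * u_bar, gradient

# Hypothetical scattered data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 8, 6]

a, b = least_squares(xs, ys)  # y on x:  y = a + b*x
c, d = least_squares(ys, xs)  # x on y:  x = c + d*y

y_given = 7
x_from_x_on_y = c + d * y_given        # the appropriate line for predicting x
x_from_rearranged = (y_given - a) / b  # rearranging y on x gives a different value

print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y")
print(f"x at y = {y_given}: {x_from_x_on_y:.3f} (x on y) vs {x_from_rearranged:.3f} (rearranged y on x)")
print(f"product of gradients b*d = {b * d:.3f}")
```

The product $bd$ equals $r^2$, so the two lines have the same slope in the $xy$-plane only when $r^2 = 1$, which matches the statement above.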

Prediction and extrapolation

Predictions are reliable only within the range of the data (interpolation). Extrapolation beyond the data is unreliable because the linear pattern may not continue. Always check that the prediction value lies inside the observed range.

Worked example

For bivariate data, a calculator gives the regression line of $y$ on $x$ as $y = 3.2 + 0.75x$ with $r = 0.88$ , and the data range for $x$ is $2$ to $15$ . Interpret $r$ , predict $y$ when $x = 10$ , and comment on predicting $y$ when $x = 25$ .

Step 1: Interpret the correlation

$r = 0.88$ is a fairly strong positive linear correlation, so a linear model is reasonable.

Step 2: Predict within the range

At $x = 10$ (inside $2$ to $15$ ): $y = 3.2 + 0.75(10) = 3.2 + 7.5 = 10.7$ . This is interpolation, so it is reliable.

Step 3: Consider $x = 25$

$x = 25$ lies well outside the data range $2$ to $15$ , so predicting there is extrapolation.

Step 4: Comment

The estimate at $x = 25$ is unreliable because the linear relationship is not known to hold beyond the observed data.
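The four steps can be sketched in code using the line and data range from this worked example. The function name `predict` is a hypothetical helper; the range check mirrors the habit of confirming that $x$ lies inside the observed data before trusting an estimate.

```python
def predict(x, a, b, x_min, x_max):
    """Return (estimate, within_range) for the line y = a + b*x."""
    estimate = a + b * x
    within_range = x_min <= x <= x_max
    return estimate, within_range

# Values taken from the worked example: y = 3.2 + 0.75x, x observed from 2 to 15.
A, B = 3.2, 0.75
X_MIN, X_MAX = 2, 15

for x in (10, 25):
    y_hat, ok = predict(x, A, B, X_MIN, X_MAX)
    label = "interpolation" if ok else "extrapolation, treat as unreliable"
    print(f"x = {x}: y = {y_hat:.2f} ({label})")
```

The code still returns a number at $x = 25$; the flag is what signals that the number should not be relied on.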

Examples in context

Example 1. Study hours and grades. A strong positive $r$ between hours studied and grade suggests that a linear model is useful for predicting a grade from study time within the observed range. It does not show that studying alone causes the grade, since other factors also matter: correlation does not imply causation.

Example 2. Temperature and ice-cream sales. Sales correlate with temperature, but extrapolating the regression line to extreme temperatures outside the data could mislead, since demand may level off or collapse beyond the observed range.

Try this

Q1. Interpret a correlation coefficient of $r = -0.91$ . [2 marks]

Cue. Strong negative linear correlation: as one variable rises the other tends to fall, with points close to a line.

Q2. A regression line of $y$ on $x$ is $y = 1 + 2x$ . Predict $y$ when $x = 3$ . [1 mark]

Cue. $y = 1 + 2(3) = 7$ .

Q3. Explain why extrapolation can give unreliable predictions. [2 marks]

Cue. Beyond the data range the linear pattern may not continue, so the line is not supported by evidence there.

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original. A set of bivariate data gives a product moment correlation coefficient of $r = 0.96$ . Interpret this value and state what it suggests about a linear model. [4 marks]

Worked answer

$r = 0.96$ is close to $1$ , indicating a strong positive linear correlation: as one variable increases the other tends to increase, and the points lie close to a straight line.

This suggests a linear model fits the data well, so a least squares regression line would give reliable predictions within the data range.

Markers reward interpreting the sign (positive) and magnitude (strong, near $1$ ), and the comment that a linear model is appropriate.

Original. The regression line of $y$ on $x$ is $y = 2.5 + 1.8x$ . The mean point is $(\bar{x}, \bar{y}) = (4, 9.7)$ . Use the line to estimate $y$ when $x = 5$ , and explain why estimating $y$ when $x = 20$ would be unreliable. [5 marks]

Worked answer

At $x = 5$ : $y = 2.5 + 1.8(5) = 2.5 + 9 = 11.5$ .

The mean point check: $2.5 + 1.8(4) = 2.5 + 7.2 = 9.7 = \bar{y}$ , confirming the line passes through $(\bar{x}, \bar{y})$ .

The data range is not stated, but with $\bar{x} = 4$ the value $x = 20$ is very likely far outside it, so estimating there is extrapolation. The linear relationship may not hold there, so the prediction is unreliable.

Markers reward the substitution for the in-range estimate, and identifying extrapolation as the reason the far estimate is unreliable.