Singapore Geography syllabus dot point

How do you test, objectively, whether two geographical variables are related, and how strongly?

Calculate and interpret Spearman's rank correlation coefficient to test for a relationship between two variables, and assess its statistical significance

A focused answer to the H2 Geography skill of correlation testing. Ranking paired data, calculating Spearman's rank correlation coefficient, interpreting its sign and strength, testing significance against critical values, and avoiding the correlation-causation trap.

Generated by Claude Opus 4.8 | 11 min answer | Updated 2026-06-06

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

SEAB wants you to calculate Spearman's rank correlation coefficient, interpret what it says about the relationship between two variables, and test whether that relationship is statistically significant. The central insight is that Spearman's rank turns a scatter of paired field measurements into a single, objective number between minus one and plus one. A significance test then tells you whether the pattern is strong enough to be unlikely to have arisen by chance, which is exactly what a hypothesis-driven investigation needs.

The answer

What Spearman's rank tests

Spearman's rank correlation coefficient, written $r_s$, measures the strength and direction of the relationship between two variables by comparing their rank orders rather than their raw values. Because it works on ranks, it copes well with data that are not normally distributed and with ordinal data, both of which are common in geography, and it detects any consistently rising or falling relationship, not only a straight-line one. It is the natural partner to a scatter graph and a hypothesis about whether two things vary together.

The formula

The coefficient is calculated as:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where $d$ is the difference between the two ranks for each pair of observations, $\sum d^2$ is the sum of the squared rank differences, and $n$ is the number of pairs.
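The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `spearman_from_sum_d2` and the figures in the usage line (ten pairs with $\sum d^2 = 30$) are hypothetical, chosen only to show the arithmetic.

```python
def spearman_from_sum_d2(sum_d2: float, n: int) -> float:
    """Return r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for n ranked pairs."""
    if n < 2:
        # With fewer than two pairs the denominator n(n^2 - 1) is zero.
        raise ValueError("need at least two pairs of observations")
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical figures: 10 pairs whose squared rank differences sum to 30.
print(round(spearman_from_sum_d2(30, 10), 3))  # 0.818
```

With no ties, $\sum d^2 = 0$ (identical rank orders) gives $+1$, and the largest possible value, $n(n^2 - 1)/3$ (exactly reversed rank orders), gives $-1$.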

The method, step by step

State the hypothesis and the null hypothesis. For example, the hypothesis "pebble size decreases with distance downstream" (a negative relationship) and the null hypothesis that there is no relationship.
Collect paired data at a sample of sites, ideally at least about ten pairs for a meaningful test.
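Rank each variable separately from highest to lowest, using the same direction for both variables. Give tied values the average of the ranks they would otherwise occupy (two values tied for 3rd and 4th place both receive 3.5).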
Find $d$, the difference between the two ranks for each site, then square it to get $d^2$.
Sum the squares to get $\sum d^2$.
Apply the formula to obtain $r_s$.
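The six steps can be sketched end to end in Python, assuming raw paired measurements as input. The helper names (`average_ranks`, `spearman_rank`) and the five-site data are hypothetical. Ties receive the average rank, as in the steps above; when there are many ties, this shortcut formula is only an approximation.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank from highest (rank 1) to lowest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over any run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rank(x: Sequence[float], y: Sequence[float]) -> float:
    """Steps 3 to 6: rank each variable, square the rank differences, apply the formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y))]
    return 1 - (6 * sum(d_squared)) / (n * (n ** 2 - 1))


# Hypothetical data: distance downstream (km) and mean pebble size (mm) at five sites.
distance = [1, 2, 3, 4, 5]
pebble_size = [50, 40, 42, 30, 20]
print(round(spearman_rank(distance, pebble_size), 2))  # -0.9
print(average_ranks([5, 3, 3, 1]))  # [1.0, 2.5, 2.5, 4.0]
```

Five pairs are used only to keep the example short; a real investigation should use the larger sample recommended above.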

Interpreting the coefficient

The value of $r_s$ always lies between minus one and plus one:

$r_s = +1$: a perfect positive relationship (as one rises, so does the other).
$r_s = 0$: no relationship.
$r_s = -1$: a perfect negative relationship (as one rises, the other falls).

The sign gives the direction; the magnitude gives the strength. As a rough guide, values of about $\pm 0.7$ or beyond indicate a strong relationship, about $\pm 0.4$ to $\pm 0.7$ a moderate one, and values closer to $0$ than about $\pm 0.4$ a weak or absent one.
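The rough guide can be expressed as a small labelling function. This is a sketch that assumes cut-offs of 0.4 and 0.7 on the magnitude; these boundaries are a convention for describing results, not a fixed rule, and the function name `describe_rs` is hypothetical.

```python
def describe_rs(r_s: float) -> str:
    """Label direction and strength using rough cut-offs of 0.4 and 0.7 (a convention, not a rule)."""
    if not -1 <= r_s <= 1:
        raise ValueError("r_s must lie between -1 and +1")
    if r_s == 0:
        return "no relationship"
    direction = "positive" if r_s > 0 else "negative"  # the sign gives the direction
    magnitude = abs(r_s)                                # the magnitude gives the strength
    if magnitude == 1:
        strength = "perfect"
    elif magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


print(describe_rs(-0.82))  # strong negative
print(describe_rs(0.5))    # moderate positive
```

The label describes the strength of the association only; whether it is statistically significant is a separate question.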

Testing significance

A coefficient on its own could arise by chance, so we test it. Compare the magnitude of the calculated coefficient, $|r_s|$, against the critical value for your sample size $n$ at a chosen significance level (commonly the 0.05, or 5 percent, level). If your table distinguishes them, use the one-tailed values when the hypothesis predicts a direction and the two-tailed values when it does not:

If $|r_s|$ is greater than or equal to the critical value (and the sign matches any predicted direction), the result is statistically significant: reject the null hypothesis, concluding that the relationship is unlikely to be due to chance.
If $|r_s|$ is below the critical value, fail to reject the null hypothesis: the evidence is too weak to claim a real relationship.

The 0.05 level means there is a 5 percent probability of wrongly rejecting a true null hypothesis.
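The decision rule can be sketched as a function that takes the critical value as an input, because that value must be read from a published table for your $n$, tail and significance level. The function name `significance_decision`, the `expected_direction` parameter and the 0.6 used below are hypothetical; 0.6 is a placeholder, not a real table entry.

```python
from typing import Optional


def significance_decision(r_s: float, critical_value: float,
                          expected_direction: Optional[str] = None) -> str:
    """Compare |r_s| with a critical value looked up for n, the tail and the significance level."""
    wrong_way = (expected_direction == "positive" and r_s < 0) or (
        expected_direction == "negative" and r_s > 0)
    if wrong_way:
        # A coefficient with the opposite sign cannot support a directional hypothesis.
        return "opposite to the hypothesised direction: hypothesis not supported"
    # Use the magnitude so that negative coefficients are judged correctly.
    if abs(r_s) >= critical_value:
        return "significant: reject the null hypothesis"
    return "not significant: fail to reject the null hypothesis"


# 0.6 is a placeholder, NOT a real table entry; read the actual value for your n.
print(significance_decision(-0.82, 0.6, expected_direction="negative"))  # significant: reject the null hypothesis
print(significance_decision(0.45, 0.6))  # not significant: fail to reject the null hypothesis
```

Comparing the magnitude rather than the signed value avoids a common slip: $-0.82$ is numerically smaller than any positive critical value, yet it is a strong result.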

Correlation is not causation

A significant $r_s$ shows the variables are associated, not that one causes the other. The link could be coincidental, reversed, or driven by a confounding variable that affects both. A causal claim needs a credible process mechanism and, ideally, control of other variables, not just a high coefficient.
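A small constructed example makes the confounding point concrete. In the made-up data below, vegetation cover and infiltration rate are each generated from a hypothetical soil-depth value plus an arbitrary adjustment. Neither is calculated from the other, yet the two still produce a strong positive $r_s$. All numbers, variable names and helper functions are illustrative assumptions, and the simple ranking helper assumes there are no ties.

```python
def ranks_high_to_low(values):
    """Rank 1 = highest value. Assumes no ties (true for the made-up data below)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(x, y):
    """Apply r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)) to two tie-free lists."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_high_to_low(x), ranks_high_to_low(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical confounder: soil depth (cm) at eight quadrats.
soil_depth = [10, 20, 30, 40, 50, 60, 70, 80]
# Each variable is built from soil depth plus an arbitrary adjustment; neither uses the other.
vegetation_cover = [s + a for s, a in zip(soil_depth, [5, 2, 8, -5, 5, 1, 2, 0])]
infiltration_rate = [s / 2 + a for s, a in zip(soil_depth, [3, -1, 1, 4, -3, 2, 0, 1])]

print(round(spearman_no_ties(vegetation_cover, infiltration_rate), 2))  # 0.93
```

In this toy data, the strong coefficient comes entirely from the shared dependence on soil depth. This is why a process mechanism and control of other variables are needed before reading $r_s$ causally.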

Worked example

Question: test the hypothesis that pebble size decreases with distance downstream using Spearman's rank, given paired data at several sites. [10 marks]

Step 1: State the hypotheses

Hypothesis: pebble size decreases with distance downstream (a negative relationship). Null hypothesis: there is no relationship between pebble size and distance downstream.

Step 2: Rank each variable and find d

Rank distance and mean pebble size separately from highest to lowest, giving any tied values the average of their ranks. For each site, subtract one rank from the other to get $d$, then square it to get $d^2$.
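To make Step 2 concrete, here is a sketch with hypothetical data for ten sites (distances in km, mean pebble sizes in mm). The numbers are invented for illustration and contain no ties, so a simple ranking helper (the hypothetical `rank_high_to_low`) is enough. The script prints the rank table and $\sum d^2$.

```python
# Hypothetical field data: site 1 is furthest upstream.
distance_km = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
pebble_mm = [88, 95, 61, 66, 70, 74, 52, 38, 41, 45]


def rank_high_to_low(values):
    """Rank 1 = largest value. Assumes no ties, which holds for this data."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


distance_rank = rank_high_to_low(distance_km)
size_rank = rank_high_to_low(pebble_mm)

sum_d2 = 0
for site, (r_dist, r_size) in enumerate(zip(distance_rank, size_rank), start=1):
    d = r_dist - r_size  # difference between the two ranks for this site
    sum_d2 += d * d
    print(f"site {site:2d}: distance rank {r_dist:2d}, size rank {r_size:2d}, d = {d:+d}, d^2 = {d * d}")

print("sum of d^2 =", sum_d2)  # sum of d^2 = 300
```

With $n = 10$, this $\sum d^2$ gives $r_s = 1 - 1800/990 \approx -0.82$, the value supposed in Step 3.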

Step 3: Apply the formula

Sum the squared differences to get $\sum d^2$, then compute $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$ for the $n$ pairs. Suppose this gives $r_s = -0.82$, indicating a strong negative relationship in line with the hypothesis.
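For instance, one hypothetical combination that produces the supposed value is $n = 10$ pairs with $\sum d^2 = 300$. These figures are assumed for illustration and are not given in the question:

$$
\begin{aligned}
r_s &= 1 - \frac{6 \times 300}{10(10^2 - 1)} \\
    &= 1 - \frac{1800}{990} \\
    &\approx 1 - 1.818 \\
    &\approx -0.82
\end{aligned}
$$

The negative sign matches the direction predicted by the hypothesis; the magnitude, about $0.82$, is what gets compared with the critical value in Step 4.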

Step 4: Test significance and conclude

Compare the magnitude of $-0.82$, that is $0.82$, with the critical value for $n$ at the 0.05 level. If the sample has ten or more pairs, as recommended, $0.82$ exceeds the 0.05 critical value, so reject the null hypothesis: pebble size decreases significantly downstream, consistent with attrition and sorting. Note that the relationship is significant, not proven to be causal. Setting out every step in this way is what earns the marks.

Examples in context

Example 1. Distance from a city centre and land value in Singapore. A geographer hypothesising that land value falls with distance from the Central Business District collects paired data along a transect, ranks both variables, and computes Spearman's rank, expecting a strong negative $r_s$. Testing it against the critical value shows whether the classic bid-rent pattern holds significantly here. The example applies the test to a human-geography relationship, with a clear process (accessibility and competition for central land) underpinning any causal reading.

Example 2. Vegetation cover and slope angle on a hillside. Investigating whether vegetation cover decreases as slope steepens, a student ranks cover and slope at sampled quadrats and calculates $r_s$. A moderate negative coefficient, tested against the critical value, indicates whether steeper slopes significantly support less vegetation. The example highlights the correlation-causation caution: soil depth or aspect could be a confounding variable influencing both, so a mechanism must support the conclusion.

Try this

Q1. State the Spearman's rank formula and explain what $d$ represents. [2 marks]

Cue. $r_s = 1 - \dfrac{6\sum d^2}{n(n^2-1)}$, where $d$ is the difference between the two ranks given to each paired observation (one rank per variable) and $n$ is the number of pairs.

Q2. A geographer calculates $r_s = -0.76$. Interpret this value. [2 marks]

Cue. It is a strong negative relationship: as one variable increases the other tends to decrease, and the magnitude ($0.76$) shows the association is strong; whether it is significant depends on comparing that magnitude with the critical value for the sample size.

Q3. Explain why a result is declared "significant at the 0.05 level" and what rejecting the null hypothesis means. [3 marks]

Cue. Significant at the 0.05 level means that, if the null hypothesis were true, a coefficient at least this strong would arise by chance 5 percent of the time or less. If the magnitude of the calculated coefficient meets or exceeds the critical value, we reject the null hypothesis, concluding that the observed relationship is unlikely to be due to chance and that a real association probably exists.

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original (10 marks). A geographer hypothesises that infiltration rate increases with vegetation cover. Explain how Spearman's rank correlation coefficient would be used to test this, and how the result would be interpreted.

Argument: Spearman's rank correlation provides an objective test of whether the two variables are related, by comparing their rank orders and checking the result against critical values.

Set up the test: state the hypothesis (infiltration increases with vegetation cover, a positive relationship) and the null (no relationship). Collect paired data at a sample of sites (at least about ten pairs).

Explain the method: rank each variable separately from highest to lowest, find the difference $d$ between the two ranks for each site, square it, and sum the squares. Apply $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where $n$ is the number of pairs.

Interpret the coefficient: $r_s$ ranges from $+1$ (perfect positive) through $0$ (none) to $-1$ (perfect negative); a value near $+0.7$ to $+0.9$ would indicate a strong positive relationship consistent with the hypothesis.

Test significance: compare the magnitude of the calculated coefficient, $|r_s|$, with the critical value for $n$ at the chosen significance level (commonly $0.05$). If $|r_s|$ meets or exceeds the critical value, reject the null hypothesis, concluding that the relationship is statistically significant and unlikely to be due to chance.

Markers reward stating the hypotheses, the ranking method and formula, interpretation of sign and strength, and the significance comparison with a conclusion about the null.

Original (6 marks). Explain why a statistically significant correlation does not prove that one variable causes the other, using a geographical example.

Argument: correlation measures whether two variables vary together, but a relationship can arise without one causing the other, so significance alone cannot establish causation.

Explain the reasoning: a strong, significant $r_s$ shows the variables are associated more than chance would predict, but the link could be coincidental, reversed in direction, or driven by a third confounding variable that influences both.

Use an example: infiltration rate and vegetation cover may correlate strongly, yet both could be controlled by a third factor such as soil type or slope, so vegetation does not necessarily cause higher infiltration; testing a causal claim requires controlling for such confounders and identifying a plausible process mechanism.

Add nuance: causation needs a credible physical explanation and ideally control of other variables, not just a significant coefficient. Markers reward distinguishing association from causation, naming a confounding variable, and the point that a mechanism must support any causal claim.