How do you test, objectively, whether two geographical variables are related, and how strongly?

Calculate and interpret Spearman's rank correlation coefficient to test for a relationship between two variables, and assess its statistical significance

A focused answer to the H2 Geography skill of correlation testing: ranking paired data, calculating Spearman's rank correlation coefficient, interpreting its sign and strength, testing significance against critical values, and avoiding the correlation-causation trap.

Generated by Claude Opus 4.811 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

SEAB wants you to calculate Spearman's rank correlation coefficient, interpret what it says about the relationship between two variables, and test whether that relationship is statistically significant. The central insight is that Spearman's rank turns a scatter of paired field measurements into a single, objective number between minus one and plus one. A significance test then tells you whether the pattern is strong enough to be more than chance, which is exactly what a hypothesis-driven investigation needs.

The answer

What Spearman's rank tests

Spearman's rank correlation coefficient, written $r_s$, measures the strength and direction of the relationship between two variables by comparing their rank orders rather than their raw values. Because it works on ranks, it copes well with data that are not normally distributed and with ordinal data, which is common in geography. It is the natural partner to a scatter graph and a hypothesis about whether two things vary together.

The formula

The coefficient is calculated as:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where $d$ is the difference between the two ranks for each pair of observations, $\sum d^2$ is the sum of the squared rank differences, and $n$ is the number of pairs.
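
As a quick arithmetic check of the formula, here is a worked substitution using made-up figures (an assumption for illustration only, not data from any real investigation): $n = 10$ pairs and $\sum d^2 = 290$.

$$
r_s = 1 - \frac{6 \times 290}{10(10^2 - 1)} = 1 - \frac{1740}{990} \approx 1 - 1.758 = -0.758
$$

A $\sum d^2$ this large relative to $n(n^2 - 1)$ pushes the fraction above 1, which is what produces a negative coefficient.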

The method, step by step

  1. State the hypothesis and null. For example, "pebble size decreases with distance downstream" (a negative relationship) and a null of no relationship.
  2. Collect paired data at a sample of sites, ideally at least about ten pairs for a meaningful test.
  3. Rank each variable separately, from highest to lowest, using the same direction for both variables (handle ties by giving each tied value the average of the ranks they would have occupied).
  4. Find $d$, the difference between the two ranks for each site, then square it to get $d^2$.
  5. Sum the squares to get $\sum d^2$.
  6. Apply the formula to obtain $r_s$.
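
The steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function names `average_ranks` and `spearman_rs` are hypothetical, and the distance and pebble-size figures are invented purely to show the arithmetic. It uses the simplified formula given above, which is exact only when there are no tied ranks.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of the rank positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Apply r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) to paired observations."""
    if len(x) != len(y):
        raise ValueError("x and y must hold the same number of paired observations")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Invented example data: ten sites along a river (not real measurements).
distance_km = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pebble_cm = [22, 25, 18, 19, 14, 15, 11, 9, 10, 6]

print(round(spearman_rs(distance_km, pebble_cm), 3))  # -0.952
```

For these invented figures $\sum d^2 = 322$, so the coefficient is strongly negative, in line with the "pebble size decreases downstream" hypothesis.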

Interpreting the coefficient

The value of $r_s$ always lies between minus one and plus one:

  • $r_s = +1$: a perfect positive relationship (as one rises, so does the other).
  • $r_s = 0$: no relationship.
  • $r_s = -1$: a perfect negative relationship (as one rises, the other falls).

The sign gives the direction; the magnitude gives the strength. As a rough guide, values around $\pm 0.7$ to $\pm 0.9$ indicate a strong relationship, around $\pm 0.4$ to $\pm 0.6$ a moderate one, and near $0$ a weak or absent one.
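
The rough guide can be turned into a small helper, sketched below. The function name `describe_coefficient` is hypothetical, and the exact cut-offs are an assumption: the guide leaves gaps (for example between 0.6 and 0.7), so this sketch treats a magnitude of 0.7 or more as strong and 0.4 or more as moderate. Other textbooks draw the lines differently.

```python
def describe_coefficient(r_s):
    """Return (direction, strength) for a Spearman coefficient.

    Cut-offs of 0.7 and 0.4 are an assumed reading of the rough guide,
    chosen so that every value from -1 to +1 falls in exactly one band.
    """
    if not -1.0 <= r_s <= 1.0:
        raise ValueError("r_s must lie between -1 and +1")

    # The sign gives the direction.
    if r_s > 0:
        direction = "positive"
    elif r_s < 0:
        direction = "negative"
    else:
        direction = "none"

    # The magnitude gives the strength.
    magnitude = abs(r_s)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.4:
        strength = "moderate"
    else:
        strength = "weak or absent"

    return direction, strength


print(describe_coefficient(-0.76))  # ('negative', 'strong')
print(describe_coefficient(0.5))    # ('positive', 'moderate')
```

A verbal label like this describes strength only; it says nothing about whether the result is statistically significant.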

Testing significance

A coefficient on its own could arise by chance, so we test it. Compare the magnitude of the calculated $r_s$ (ignoring its sign) against the critical value for your sample size $n$ at a chosen significance level (commonly the 0.05, or 5 percent, level):

  • If the magnitude of $r_s$ is greater than or equal to the critical value, the result is statistically significant: reject the null hypothesis, concluding the relationship is unlikely to be due to chance.
  • If the magnitude of $r_s$ is below the critical value, fail to reject the null: the evidence is too weak to claim a real relationship.

The 0.05 level means there is a 5 percent probability of wrongly rejecting a true null hypothesis.
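
The decision rule can be written as a short sketch. The names `is_significant` and `decision` are hypothetical, and the critical value is passed in rather than computed, because it should be read from the table you are given. The figure 0.648 below is the value commonly tabulated for $n = 10$ at the two-tailed 0.05 level; treat it as a placeholder and confirm it against your own table.

```python
def is_significant(r_s, critical_value):
    """Compare the magnitude of r_s (sign ignored) with the critical value."""
    if not 0.0 < critical_value <= 1.0:
        raise ValueError("critical_value should be a table value between 0 and 1")
    return abs(r_s) >= critical_value


def decision(r_s, critical_value):
    """State the conclusion about the null hypothesis in words."""
    if is_significant(r_s, critical_value):
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Placeholder: check this figure against the critical-value table supplied.
CRITICAL_N10_005 = 0.648

print(decision(-0.76, CRITICAL_N10_005))  # reject the null hypothesis
print(decision(0.41, CRITICAL_N10_005))   # fail to reject the null hypothesis
```

Taking the absolute value matters for hypotheses of a negative relationship: a coefficient of $-0.76$ is just as far from zero as $+0.76$.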

Correlation is not causation

A significant $r_s$ shows the variables are associated, not that one causes the other. The link could be coincidental, reversed, or driven by a confounding variable that affects both. A causal claim needs a credible process mechanism and, ideally, control of other variables, not just a high coefficient.

Examples in context

Example 1. Distance from a city centre and land value in Singapore. A geographer hypothesising that land value falls with distance from the Central Business District collects paired data along a transect, ranks both variables, and computes Spearman's rank, expecting a strong negative $r_s$. Testing it against the critical value confirms whether the classic bid-rent pattern holds significantly here. It shows the test applied to a human-geography relationship, with a clear process (accessibility and competition for central land) underpinning any causal reading.

Example 2. Vegetation cover and slope angle on a hillside. Investigating whether vegetation cover decreases as slope steepens, a student ranks cover and slope at sampled quadrats and calculates $r_s$. A moderate negative coefficient, tested against the critical value, indicates whether steeper slopes significantly support less vegetation. The example highlights the correlation-causation caution: soil depth or aspect could be a confounding variable influencing both, so a mechanism must support the conclusion.

Try this

Q1. State the Spearman's rank formula and explain what $d$ represents. [2 marks]

  • Cue. $r_s = 1 - \dfrac{6\sum d^2}{n(n^2-1)}$, where $d$ is the difference between the two ranks given to each paired observation (one rank per variable) and $n$ is the number of pairs.

Q2. A geographer calculates $r_s = -0.76$. Interpret this value. [2 marks]

  • Cue. It is a strong negative relationship: as one variable increases the other tends to decrease, and the magnitude ($0.76$) shows the association is strong; whether it is significant depends on comparing it with the critical value for the sample size.

Q3. Explain why a result is declared "significant at the 0.05 level" and what rejecting the null hypothesis means. [3 marks]

  • Cue. Significant at 0.05 means that, if the null hypothesis were true, a coefficient at least this strong would arise by chance no more than 5 percent of the time; if the magnitude of the calculated coefficient meets or exceeds the critical value, we reject the null hypothesis, concluding the observed relationship is unlikely to be due to chance and a real association probably exists.

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original (10 marks). A geographer hypothesises that infiltration rate increases with vegetation cover. Explain how Spearman's rank correlation coefficient would be used to test this, and how the result would be interpreted.

Argument: Spearman's rank correlation provides an objective test of whether the two variables are related, by comparing their rank orders and checking the result against critical values.

Set up the test: state the hypothesis (infiltration increases with vegetation cover, a positive relationship) and the null (no relationship). Collect paired data at a sample of sites (at least about ten pairs).

Explain the method: rank each variable separately from highest to lowest, find the difference $d$ between the two ranks for each site, square it, and sum the squares. Apply $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where $n$ is the number of pairs.

Interpret the coefficient: $r_s$ ranges from $+1$ (perfect positive) through $0$ (none) to $-1$ (perfect negative); a value near $+0.7$ to $+0.9$ would indicate a strong positive relationship consistent with the hypothesis.

Test significance: compare the magnitude of the calculated $r_s$ with the critical value for $n$ at the chosen significance level (commonly $0.05$). If it equals or exceeds the critical value, reject the null hypothesis, concluding the relationship is statistically significant and unlikely to be due to chance; if it falls below, fail to reject the null.

Markers reward stating the hypotheses, the ranking method and formula, interpretation of sign and strength, and the significance comparison with a conclusion about the null.

Original (6 marks). Explain why a statistically significant correlation does not prove that one variable causes the other, using a geographical example.

Argument: correlation measures whether two variables vary together, but a relationship can arise without one causing the other, so significance alone cannot establish causation.

Explain the reasoning: a strong, significant $r_s$ shows the variables are associated more than chance would predict, but the link could be coincidental, reversed in direction, or driven by a third confounding variable that influences both.

Use an example: infiltration rate and vegetation cover may correlate strongly, yet both could be controlled by a third factor such as soil type or slope, so vegetation does not necessarily cause higher infiltration; testing requires controlling for such confounders and a plausible process mechanism.

Add nuance: causation needs a credible physical explanation and ideally control of other variables, not just a significant coefficient. Markers reward distinguishing association from causation, naming a confounding variable, and the point that a mechanism must support any causal claim.
