How do you test, objectively, whether two geographical variables are related, and how strongly?
Calculate and interpret Spearman's rank correlation coefficient to test for a relationship between two variables, and assess its statistical significance
A focused answer to the H2 Geography skill of correlation testing. Ranking paired data, calculating Spearman's rank correlation coefficient, interpreting its sign and strength, testing significance against critical values, and avoiding the correlation-causation trap.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
SEAB wants you to calculate Spearman's rank correlation coefficient, interpret what it says about the relationship between two variables, and test whether that relationship is statistically significant. The central insight is that Spearman's rank turns a scatter of paired field measurements into a single, objective number between minus one and plus one, and a significance test then tells you whether the pattern is strong enough to be more than chance, which is exactly what a hypothesis-driven investigation needs.
The answer
What Spearman's rank tests
Spearman's rank correlation coefficient, written $r_s$, measures the strength and direction of the relationship between two variables by comparing their rank orders rather than their raw values. Because it works on ranks, it copes well with data that are not normally distributed and with ordinal data, both of which are common in geography. It is the natural partner to a scatter graph and to a hypothesis about whether two things vary together.
The formula
The coefficient is calculated as:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where $d$ is the difference between the two ranks for each pair of observations, $\sum d^2$ is the sum of the squared rank differences, and $n$ is the number of pairs.
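The formula can be checked with a short sketch. The Python below is an illustration added here, not part of the syllabus material, and the function name `spearman_from_sum_d2` is hypothetical. It takes $\sum d^2$ and $n$ and applies the formula directly. The two calls test the extremes: identical rank orders should give $+1$, and exactly reversed rank orders should give $-1$.

```python
def spearman_from_sum_d2(sum_d2: float, n: int) -> float:
    """Return r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if n < 2:
        raise ValueError("need at least two pairs")
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Identical rank orders: every d is 0, so sum(d^2) = 0 and r_s = +1.
print(spearman_from_sum_d2(0, 5))   # 1.0
# Exactly reversed ranks for 5 pairs: d = 4, 2, 0, -2, -4, so sum(d^2) = 40.
print(spearman_from_sum_d2(40, 5))  # -1.0
```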
The method, step by step
- State the hypothesis and null. For example, "pebble size decreases with distance downstream" (a negative relationship) and a null of no relationship.
- Collect paired data at a sample of sites, ideally at least ten pairs for a meaningful test.
- Rank each variable separately from highest to lowest, using the same direction for both variables. Give tied values the average of the ranks they would occupy.
- Find $d$, the difference between the two ranks for each site, then square it to get $d^2$.
- Sum the squares to get $\sum d^2$.
- Apply the formula to obtain $r_s$.
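The six steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions:

- The helper names `rank_descending` and `spearman_rs` are hypothetical.
- Ranks run from highest (rank 1) to lowest, as in the method above.
- The distance and pebble-size figures are invented purely to show the arithmetic. They are not field data.

```python
def rank_descending(values):
    """Rank values from highest (rank 1) to lowest.

    Tied values share the average of the ranks they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over any run of values tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end hold ranks pos+1..end+1; give each the average.
        avg_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient for paired lists x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of paired values")
    n = len(x)
    rank_x = rank_descending(x)
    rank_y = rank_descending(y)
    # d is the rank difference for each pair; square and sum.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Invented example: distance downstream (km) and mean pebble size (mm) at five sites.
distance_km = [1, 2, 3, 4, 5]
pebble_mm = [50, 42, 45, 30, 21]
print(round(spearman_rs(distance_km, pebble_mm), 3))  # -0.9
```

With these five invented pairs the rank differences are 4, 1, 1, -2 and -4. That gives $\sum d^2 = 38$ and $r_s = 1 - 228/120 = -0.9$, a strong negative relationship consistent with the pebble-size hypothesis.

When there are tied ranks, the formula gives only an approximation. The exact value is the Pearson correlation computed on the ranks, although the difference is usually small when ties are few.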
Interpreting the coefficient
The value of $r_s$ always lies between $-1$ and $+1$:
- $r_s = +1$: a perfect positive relationship (as one rises, so does the other).
- $r_s = 0$: no relationship between the rank orders.
- $r_s = -1$: a perfect negative relationship (as one rises, the other falls).
The sign gives the direction; the magnitude gives the strength. As a rough guide, values close to $\pm 1$ indicate a strong relationship, intermediate values a moderate one, and values near $0$ a weak or absent one.
Testing significance
A coefficient on its own could arise by chance, so we test it. Compare the calculated $r_s$ with the critical value for your sample size $n$ at a chosen significance level (commonly the 0.05, or 5 percent, level):
- If $|r_s|$ (the value ignoring its sign) is greater than or equal to the critical value, the result is statistically significant. Reject the null hypothesis and conclude that the relationship is unlikely to be due to chance.
- If $|r_s|$ is below the critical value, fail to reject the null: the evidence is too weak to claim a real relationship.
The 0.05 level means there is a 5 percent probability of wrongly rejecting a true null hypothesis.
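The decision rule can also be written as a small function. This is a hedged sketch: `reject_null` and its `expected_sign` parameter are hypothetical.

- **Critical value.** It must come from a published table of Spearman critical values for your $n$ and significance level. The 0.8 used below is a placeholder, not a tabulated value.
- **Magnitude.** Comparing $|r_s|$ lets a strong negative coefficient count as significant.
- **Direction.** `expected_sign` reflects the idea that a directional hypothesis (such as "pebble size decreases downstream") is supported only by a coefficient with the predicted sign.

```python
def reject_null(rs: float, critical_value: float, expected_sign: int = 0) -> bool:
    """Decide whether to reject the null hypothesis of no relationship.

    rs             -- the calculated Spearman coefficient
    critical_value -- looked up for the sample size n and chosen significance level
    expected_sign  -- +1 or -1 if the hypothesis predicts a direction, 0 if it does not
    """
    if expected_sign != 0 and rs * expected_sign < 0:
        # A coefficient with the opposite sign cannot support a directional hypothesis.
        return False
    # Compare the magnitude, so strong negative values are judged like strong positive ones.
    return abs(rs) >= critical_value


PLACEHOLDER_CRITICAL = 0.8  # placeholder only: use the value from your critical-value table

print(reject_null(-0.9, PLACEHOLDER_CRITICAL))                    # True
print(reject_null(-0.9, PLACEHOLDER_CRITICAL, expected_sign=+1))  # False: wrong direction
print(reject_null(0.5, PLACEHOLDER_CRITICAL))                     # False: too weak
```

Critical-value tables are usually given separately for one-tailed (directional) and two-tailed tests. Check which column matches your hypothesis before comparing.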
Correlation is not causation
A significant $r_s$ shows the variables are associated, not that one causes the other. The link could be coincidental, reversed, or driven by a confounding variable that affects both. A causal claim needs a credible process mechanism and, ideally, control of other variables, not just a high coefficient.
Examples in context
Example 1. Distance from a city centre and land value in Singapore. A geographer hypothesising that land value falls with distance from the Central Business District collects paired data along a transect, ranks both variables, and computes Spearman's rank, expecting a strong negative $r_s$. Testing it against the critical value confirms whether the classic bid-rent pattern holds significantly here. The example shows the test applied to a human-geography relationship, with a clear process (accessibility and competition for central land) underpinning any causal reading.
Example 2. Vegetation cover and slope angle on a hillside. Investigating whether vegetation cover decreases as slope steepens, a student ranks cover and slope at sampled quadrats and calculates $r_s$. A moderate negative coefficient, tested against the critical value, indicates whether steeper slopes significantly support less vegetation. The example highlights the correlation-causation caution: soil depth or aspect could be a confounding variable influencing both, so a mechanism must support the conclusion.
Try this
Q1. State the Spearman's rank formula and explain what $d$ represents. [2 marks]
- Cue. $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, where $d$ is the difference between the two ranks given to each paired observation (one rank per variable) and $n$ is the number of pairs.
Q2. A geographer calculates a value of $r_s$ close to $-1$. Interpret this value. [2 marks]
- Cue. It is a strong negative relationship: as one variable increases, the other tends to decrease. The magnitude ($|r_s|$ close to 1) shows the association is strong. Whether it is significant depends on comparing it with the critical value for the sample size.
Q3. Explain why a result is declared "significant at the 0.05 level" and what rejecting the null hypothesis means. [3 marks]
- Cue. Significant at 0.05 means that, if the null hypothesis were true, there would be at most a 5 percent probability of obtaining a coefficient this strong by chance. If the calculated coefficient (ignoring its sign) meets or exceeds the critical value, we reject the null hypothesis, concluding that the observed relationship is unlikely to be due to chance and a real association probably exists.
Exam-style practice questions
Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Original · 10 marks
A geographer hypothesises that infiltration rate increases with vegetation cover. Explain how Spearman's rank correlation coefficient would be used to test this, and how the result would be interpreted.
Argument: Spearman's rank correlation provides an objective test of whether the two variables are related, by comparing their rank orders and checking the result against critical values.
Set up the test: state the hypothesis (infiltration increases with vegetation cover, a positive relationship) and the null (no relationship). Collect paired data at a sample of sites (ideally at least ten pairs).
Explain the method: rank each variable separately from highest to lowest, find the difference $d$ between the two ranks for each site, square it, and sum the squares. Apply $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, where $n$ is the number of pairs.
Interpret the coefficient: $r_s$ ranges from $+1$ (perfect positive) through $0$ (none) to $-1$ (perfect negative). A value close to $+1$ would indicate a strong positive relationship consistent with the hypothesis.
Test significance: compare the calculated $r_s$ with the critical value for $n$ at the chosen significance level (commonly $0.05$). If $r_s$ is positive and equals or exceeds the critical value, reject the null hypothesis, concluding the relationship is statistically significant and unlikely to be due to chance.
Markers reward stating the hypotheses, the ranking method and formula, interpretation of sign and strength, and the significance comparison with a conclusion about the null.
Original · 6 marks
Explain why a statistically significant correlation does not prove that one variable causes the other, using a geographical example.
Argument: correlation measures whether two variables vary together, but a relationship can arise without one causing the other, so significance alone cannot establish causation.
Explain the reasoning: a strong, significant $r_s$ shows the variables are associated more than chance would predict, but the link could be coincidental, reversed in direction, or driven by a third confounding variable that influences both.
Use an example: infiltration rate and vegetation cover may correlate strongly, yet both could be controlled by a third factor such as soil type or slope, so vegetation does not necessarily cause higher infiltration; testing requires controlling for such confounders and a plausible process mechanism.
Add nuance: causation needs a credible physical explanation and ideally control of other variables, not just a significant coefficient. Markers reward distinguishing association from causation, naming a confounding variable, and the point that a mechanism must support any causal claim.
Related dot points
- Explain the stages of a geographical investigation and how to formulate a focused geographical question, aim and testable hypothesis
A focused answer to the H2 Geography skill of designing an investigation. The route to enquiry, framing a sharp geographical question and aim, writing a testable hypothesis and null hypothesis, choosing variables, and the importance of location, scale and feasibility.
- Explain random, systematic and stratified sampling and how to select appropriate primary and secondary data-collection methods
A focused answer to the H2 Geography skill of sampling and data collection. Why we sample, random, systematic and stratified strategies (point, line and area), sample size and bias, and choosing primary versus secondary and quantitative versus qualitative methods.
- Select and justify appropriate techniques for presenting geographical data, including graphs, located proportional symbols, choropleth maps and specialised diagrams
A focused answer to the H2 Geography skill of data presentation. Matching the technique to the data type, line and bar graphs, scatter graphs, choropleth and isoline maps, located proportional symbols, kite and triangular graphs, and how to describe a presented pattern in a data-response answer.
- Calculate and interpret measures of central tendency (mean, median, mode) and dispersion (range, interquartile range, standard deviation) for geographical data
A focused answer to the H2 Geography skill of summarising data. The mean, median and mode and when each is appropriate, the range, interquartile range and standard deviation as measures of spread, the effect of anomalies and skew, and how to interpret dispersion geographically.
- Apply the chi-square test to compare observed and expected frequencies, use degrees of freedom and critical values, and interpret statistical significance
A focused answer to the H2 Geography skill of significance testing with chi-square. Observed versus expected frequencies, the chi-square formula, degrees of freedom, comparing the statistic with critical values, the role of the significance level, and the test's conditions and limits.