How do you test whether an observed pattern of frequencies differs from what you would expect by chance?

Apply the chi-square test to compare observed and expected frequencies, use degrees of freedom and critical values, and interpret statistical significance

A focused answer to the H2 Geography skill of significance testing with chi-square, covering observed versus expected frequencies, the chi-square formula, degrees of freedom, comparing the statistic with critical values, the role of the significance level, and the test's conditions and limits.

Generated by Claude Opus 4.8 | 11 min answer | Updated 2026-06-06

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

SEAB wants you to apply the chi-square test: compare observed frequencies with expected ones, work out the degrees of freedom, compare the statistic with a critical value, and interpret significance. The central insight is that chi-square answers a different question from correlation. Instead of asking whether two measured variables move together, it asks whether a pattern of counts across categories differs from what chance alone would produce, which makes it the appropriate test for questions about distribution across categories and association between categorical variables.

The answer

What chi-square tests

The chi-square test ( $\chi^2$ ) compares the observed frequencies (the counts you actually recorded) with the expected frequencies (the counts you would expect if the null hypothesis were true). It works on frequency data (counts in categories), not on percentages, rates or means. Typical geographical uses are testing whether a feature is evenly distributed across categories (goodness of fit) or whether two categorical variables are associated.

The formula

The test statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is each observed frequency and $E$ is the corresponding expected frequency. Each category contributes $\dfrac{(O-E)^2}{E}$ ; the larger the gaps between observed and expected, the larger $\chi^2$ becomes, signalling a bigger departure from the null.
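
As a minimal sketch of the formula, assuming the observed and expected counts are already available as two lists of equal length. The function name `chi_square` and the counts are illustrative assumptions, not part of the syllabus:

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories, chosen only to show the arithmetic.
observed = [30, 20, 10]
expected = [20, 20, 20]

# Contributions are 100/20, 0/20 and 100/20, so the total is 10.0.
print(chi_square(observed, expected))
```

The middle category contributes nothing because its observed count matches expectation exactly; only the gaps add to the statistic.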

The method, step by step

State the hypotheses. The null is usually "no difference from expectation" or "no association"; the alternative is that a real difference or association exists.
Find the expected frequencies. For an even distribution across $k$ categories, $E$ is the total divided by $k$ . For an association (contingency) table, $E$ for each cell is (row total times column total) divided by the grand total.
Compute $\dfrac{(O-E)^2}{E}$ for every category and sum them to get $\chi^2$ .
Find the degrees of freedom.
Compare with the critical value and decide on the null.
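
For the contingency-table case, steps 2 and 3 above can be sketched as follows. This is an illustration under stated assumptions: the helper names `expected_table` and `chi_square_table` are hypothetical, and the 2 x 2 counts are invented purely to show the arithmetic.

```python
def expected_table(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_table(observed):
    """Sum (O - E)^2 / E over every cell of the table."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of counts (rows and columns are two categorical variables).
observed = [[30, 20],
            [20, 30]]

print(expected_table(observed))    # every cell is 50 * 50 / 100 = 25.0
print(chi_square_table(observed))  # four cells each contribute 25 / 25, giving 4.0
```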

Degrees of freedom

The degrees of freedom set which critical value to use:

For a goodness-of-fit test across $n$ categories: degrees of freedom $= n - 1$ .
For a contingency table: degrees of freedom $= (\text{rows} - 1)(\text{columns} - 1)$ .

For example, four soil types in a goodness-of-fit test give $4 - 1 = 3$ degrees of freedom.
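
The contingency rule works the same way. As an illustration only, a hypothetical table with three rows and two columns would give:

$$(\text{rows} - 1)(\text{columns} - 1) = (3 - 1)(2 - 1) = 2$$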

Testing significance

Compare the calculated $\chi^2$ with the critical value for those degrees of freedom at a chosen significance level (commonly 0.05):

If $\chi^2$ is greater than or equal to the critical value, the result is significant: reject the null hypothesis and conclude that the observed pattern differs from the expected one by more than chance alone would plausibly produce.
If $\chi^2$ is below the critical value, fail to reject the null: the differences are within what chance could produce.
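
The decision rule can be sketched in a few lines. The dictionary holds the standard chi-square critical values at the 0.05 level for 1 to 5 degrees of freedom, as printed in most statistical tables; the names `CRITICAL_0_05` and `is_significant` and the two test statistics are assumptions made for illustration.

```python
# Standard chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_significant(chi_sq, df, table=CRITICAL_0_05):
    """Return True when the statistic reaches the critical value (reject the null)."""
    return chi_sq >= table[df]


# Hypothetical statistics, both at 3 degrees of freedom.
print(is_significant(9.2, 3))  # True: 9.2 is at or above 7.815
print(is_significant(5.0, 3))  # False: 5.0 is below 7.815
```

In an exam the critical value is read from the table provided rather than computed, so a lookup mirrors what the candidate actually does.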

Conditions and limits

Chi-square is only valid when its conditions hold: the data must be frequencies (not percentages or means), categories must be discrete and mutually exclusive, observations must be independent, and expected frequencies should generally be at least five in each category. Small expected values make the test unreliable, so categories are sometimes combined to meet this. The test shows that a difference exists, not how strong it is or why it occurs.
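
The expected-frequency condition is easy to check mechanically. The sketch below, with hypothetical helper names `small_expected` and `combine` and invented counts, flags categories whose expected count falls below five and merges two of them:

```python
def small_expected(expected, minimum=5):
    """Return the positions of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


def combine(counts, i, j):
    """Merge category j into category i, returning a new, shorter list."""
    merged = [c for k, c in enumerate(counts) if k != j]
    merged[i if i < j else i - 1] += counts[j]
    return merged


# Hypothetical expected counts for four categories.
expected = [12, 9, 3, 2]
print(small_expected(expected))   # [2, 3]: the last two categories are too small

expected = combine(expected, 2, 3)
print(expected)                   # [12, 9, 5]
print(small_expected(expected))   # []: the condition is now met
```

The observed counts would need to be merged in the same way, and with one category fewer the degrees of freedom fall by one.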

Worked example

Question: test whether a plant species is evenly distributed across four soil types using counts from equal quadrats. [10 marks]

Step 1: State the hypotheses

Null hypothesis: the species is evenly distributed across the four soil types (no association). Alternative: distribution is uneven.

Step 2: Find the expected frequencies

Under even distribution, the expected count for each soil type is the total count divided by four. So if 200 plants were counted in total, each $E = 50$ .

Step 3: Compute the statistic

For each soil type calculate $\dfrac{(O-E)^2}{E}$ using its observed count and $E = 50$ , then sum the four contributions to get $\chi^2$ . Larger observed-expected gaps raise the total.

Step 4: Degrees of freedom, compare and conclude

Degrees of freedom $= 4 - 1 = 3$ . Compare $\chi^2$ with the critical value at 3 degrees of freedom and the 0.05 level. If $\chi^2$ equals or exceeds it, reject the null: the species is significantly unevenly distributed, suggesting abundance is associated with soil type. This complete procedure earns the marks.
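
To make the arithmetic concrete, suppose, purely for illustration, that the observed counts on the four soil types were 70, 55, 45 and 30 (hypothetical figures that total 200, so each $E = 50$ ). Then:

$$\chi^2 = \frac{(70-50)^2}{50} + \frac{(55-50)^2}{50} + \frac{(45-50)^2}{50} + \frac{(30-50)^2}{50} = 8 + 0.5 + 0.5 + 8 = 17$$

The standard critical value at 3 degrees of freedom and the 0.05 level is 7.815. Since $17 > 7.815$ , the null would be rejected for these illustrative counts.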

Examples in context

Example 1. Shop types around a Singapore neighbourhood centre. A geographer testing whether the mix of shop types (food, services, retail, others) differs from an even spread counts the outlets in each category and applies a goodness-of-fit chi-square with three degrees of freedom. A statistic at or above the critical value would show the mix of shop types is significantly uneven, reflecting the centre's specialised function. It illustrates chi-square applied to categorical counts in human geography.

Example 2. Pebble orientation and a depositional process. After recording the orientation of pebbles on a glacial or beach deposit and sorting them into direction classes, a geographer uses chi-square to compare the observed count in each class with an even expected spread. A significant result indicates a preferred orientation, evidence of a directional process such as ice or current flow. The example shows the test distinguishing a real spatial pattern from random scatter, while noting that it reveals a difference, not its strength.

Try this

Q1. State the chi-square formula and explain what $O$ and $E$ represent. [2 marks]

Cue. $\chi^2 = \sum \dfrac{(O-E)^2}{E}$ , where $O$ is the observed frequency (the count actually recorded in a category) and $E$ is the expected frequency (the count expected if the null hypothesis, such as an even distribution, were true).

Q2. A goodness-of-fit chi-square test uses five categories. State the degrees of freedom and explain how they are used. [2 marks]

Cue. Degrees of freedom $= n - 1 = 5 - 1 = 4$ ; they select which critical value to read from the chi-square table at the chosen significance level, against which the calculated statistic is compared to decide whether to reject the null.

Q3. Explain why chi-square must be calculated from frequencies rather than percentages. [3 marks]

Cue. The statistic depends on the actual counts because the size of $\dfrac{(O-E)^2}{E}$ reflects real sample sizes; percentages discard that information and would, for example, treat a pattern from 10 observations the same as one from 1000, distorting the statistic and making the significance test invalid.
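
A short sketch makes the point visible. Both samples below are hypothetical and share the same percentages (40, 30, 20 and 10 per cent), yet the statistic differs by a factor of 100 because the counts differ; `chi_square_even` is an illustrative helper that assumes an even expected distribution.

```python
def chi_square_even(observed):
    """Goodness-of-fit statistic against an even spread across the categories."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


small = [8, 6, 4, 2]          # 20 observations
large = [800, 600, 400, 200]  # 2000 observations, identical percentages

print(round(chi_square_even(small), 3))  # 4.0
print(round(chi_square_even(large), 3))  # 400.0
```

At 3 degrees of freedom the standard 0.05 critical value is 7.815, so the small sample is not significant while the large one is, even though the percentages are identical.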

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original (10 marks). A geographer counts the number of a plant species in equal-sized quadrats on four different soil types and wants to test whether the species is evenly distributed. Explain how a chi-square test would be applied and interpreted.

Argument: the chi-square test compares the observed counts with the counts expected under a null of even distribution, and the result is judged against a critical value to decide whether the difference is significant.

Set up the hypotheses: null hypothesis is that the species is evenly distributed across the four soil types (no association between soil type and abundance); the alternative is that distribution is uneven.

Calculate expected values: under an even distribution, the expected count for each soil type is the total count divided by four. Then apply $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ , where $O$ is observed and $E$ expected, summing across the four categories.

Find degrees of freedom: for this goodness-of-fit case, degrees of freedom $= n - 1 = 4 - 1 = 3$ , where $n$ is the number of categories.

Test significance: compare the calculated $\chi^2$ with the critical value at 3 degrees of freedom and the 0.05 level. If $\chi^2$ equals or exceeds the critical value, reject the null, concluding the species is unevenly distributed and abundance is significantly associated with soil type. If it is below the critical value, fail to reject the null.

Markers reward the hypotheses, expected-value calculation, the formula, correct degrees of freedom, and the significance comparison with a conclusion about the null.

Original (6 marks). Explain the conditions that must be met for a chi-square test to be valid, and why they matter.

Argument: the chi-square test is only valid when the data meet certain conditions, and breaking them makes the result unreliable.

State the conditions: the data must be frequencies (counts), not percentages, rates or means; the categories must be discrete and mutually exclusive; observations must be independent; the sample should be reasonably large, with expected frequencies generally at least five in each category; and the total sample is fixed.

Explain why they matter: using percentages instead of raw counts distorts the statistic because it depends on actual frequencies; small expected values inflate the contribution of any single category, making the test over-sensitive and unreliable; non-independent observations or overlapping categories break the test's assumptions.

Add nuance: if expected values are too small, categories can sometimes be combined to meet the condition. Markers reward listing the key conditions (frequencies, independence, expected values at least five), and explaining that violating them undermines the validity of the result.