How do you test whether an observed pattern of frequencies differs from what you would expect by chance?

Apply the chi-square test to compare observed and expected frequencies, use degrees of freedom and critical values, and interpret statistical significance

A focused answer to the H2 Geography skill of significance testing with chi-square. Observed versus expected frequencies, the chi-square formula, degrees of freedom, comparing the statistic with critical values, the role of the significance level, and the test's conditions and limits.

Generated by Claude Opus 4.8
11 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

SEAB wants you to apply the chi-square test: compare observed frequencies with expected ones, work out the degrees of freedom, compare the statistic with a critical value, and interpret significance. The central insight is that chi-square answers a different question from correlation. Instead of asking whether two measured variables move together, it asks whether a pattern of counts across categories differs from what chance alone would produce, which makes it the appropriate test for questions about distribution across categories and association between categorical variables.

The answer

What chi-square tests

The chi-square test ($\chi^2$) compares the observed frequencies (the counts you actually recorded) with the expected frequencies (the counts you would expect if the null hypothesis were true). It works on frequency data, that is, counts in categories, not on percentages, rates or means. Typical geographical uses are testing whether a feature is evenly distributed across categories (goodness of fit) or whether two categorical variables are associated.

The formula

The test statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is each observed frequency and $E$ is the corresponding expected frequency. Each category contributes $\dfrac{(O-E)^2}{E}$; the larger the gaps between observed and expected, the larger $\chi^2$ becomes, signalling a bigger departure from the null.
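As a minimal sketch of the formula, the function below sums each category's contribution. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow by hand.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in four categories, 80 observations in total.
observed = [30, 20, 18, 12]
expected = [20, 20, 20, 20]  # even spread: 80 / 4

# Contributions by hand: 100/20 + 0/20 + 4/20 + 64/20 = 5 + 0 + 0.2 + 3.2
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))  # about 8.4
```

The first category, with the largest gap between observed and expected, supplies most of the total, which is the sense in which bigger gaps produce a bigger statistic.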

The method, step by step

  1. State the hypotheses. The null is usually "no difference from expectation" or "no association"; the alternative is that a real difference or association exists.
  2. Find the expected frequencies. For an even distribution across $k$ categories, $E$ is the total divided by $k$. For an association (contingency) table, $E$ for each cell is (row total times column total) divided by the grand total.
  3. Compute $\dfrac{(O-E)^2}{E}$ for every category and sum them to get $\chi^2$.
  4. Find the degrees of freedom.
  5. Compare with the critical value and decide on the null.
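The contingency-table version of step 2 is the part most often miscalculated, so here is a rough sketch of steps 2 to 4 for a small table. The 2 by 2 counts and the function name are invented for illustration and do not come from any real survey.

```python
def expected_from_table(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table: rows are two zones, columns are feature present / absent.
table = [[30, 20],
         [10, 40]]

expected = expected_from_table(table)  # [[20.0, 30.0], [20.0, 30.0]]

# Step 3: sum (O - E)^2 / E over every cell.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# Step 4: degrees of freedom for a contingency table.
df = (len(table) - 1) * (len(table[0]) - 1)

print(expected, round(chi2, 2), df)
```

Every expected value here is at least five, so the table meets the usual size condition before the statistic is compared with a critical value.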

Degrees of freedom

The degrees of freedom set which critical value to use:

  • For a goodness-of-fit test across $n$ categories: degrees of freedom $= n - 1$.
  • For a contingency table: degrees of freedom $= (\text{rows} - 1)(\text{columns} - 1)$.

For example, four soil types in a goodness-of-fit test give $4 - 1 = 3$ degrees of freedom.
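The two rules can be written as a pair of small helper functions. The names are hypothetical; the arithmetic is exactly the two formulas above.

```python
def df_goodness_of_fit(n_categories):
    """Degrees of freedom for a goodness-of-fit test: n - 1."""
    return n_categories - 1


def df_contingency(n_rows, n_cols):
    """Degrees of freedom for a contingency table: (rows - 1)(columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


print(df_goodness_of_fit(4))  # four soil types
print(df_contingency(2, 3))   # a table with 2 rows and 3 columns
```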

Testing significance

Compare the calculated $\chi^2$ with the critical value for those degrees of freedom at a chosen significance level (commonly 0.05):

  • If $\chi^2$ is greater than or equal to the critical value, the result is significant: reject the null hypothesis, concluding that the observed pattern differs from the expected one by more than chance alone would plausibly produce.
  • If $\chi^2$ is below the critical value, fail to reject the null: the differences are within what chance could produce.
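The decision rule can be sketched with a short lookup of standard chi-square critical values at the 0.05 level. Only the first five degrees of freedom are included, and the dictionary and function names are assumptions made for this example; in an exam the critical value is read from the table provided.

```python
# Standard chi-square critical values at the 0.05 significance level.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_significant(chi2, df, critical_values=CRITICAL_0_05):
    """True if the statistic reaches the critical value, so the null is rejected."""
    return chi2 >= critical_values[df]


# Hypothetical statistics, both with 3 degrees of freedom.
print(is_significant(8.4, 3))  # 8.4 >= 7.815, so reject the null
print(is_significant(6.0, 3))  # 6.0 < 7.815, so fail to reject the null
```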

Conditions and limits

Chi-square is only valid when its conditions hold: the data must be frequencies (not percentages or means), categories must be discrete and mutually exclusive, observations must be independent, and expected frequencies should generally be at least five in each category. Small expected values make the test unreliable, so categories are sometimes combined to meet this condition. The test shows that a difference exists, not how strong it is or why it occurs.
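One way to check the expected-frequency condition and combine categories is sketched below. The counts, the threshold default and the helper names are all hypothetical, and which categories it makes geographical sense to merge is a judgment the code cannot make.

```python
def small_expected(expected, minimum=5):
    """Return the indices of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


def merge(values, i, j):
    """Return a copy of values with categories i and j combined into one."""
    merged = [v for k, v in enumerate(values) if k not in (i, j)]
    merged.append(values[i] + values[j])
    return merged


# Hypothetical counts in five categories (both lists total 36).
observed = [14, 9, 8, 4, 1]
expected = [12, 10, 8, 3, 3]

too_small = small_expected(expected)  # the last two categories fall below five
i, j = too_small
observed_merged = merge(observed, i, j)
expected_merged = merge(expected, i, j)

print(too_small, observed_merged, expected_merged)
```

Merging leaves four categories, so a goodness-of-fit test on the merged data would use 3 degrees of freedom rather than 4.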

Examples in context

Example 1. Shop types around a Singapore neighbourhood centre. A geographer testing whether the mix of shop types (food, services, retail, others) differs from an even spread counts the outlets in each category and applies a goodness-of-fit chi-square with three degrees of freedom. A statistic above the critical value would show the retail mix is significantly uneven, which would be consistent with the centre having a specialised function, although the test itself does not explain the cause. The example illustrates chi-square applied to categorical counts in human geography.
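To make Example 1 concrete, the sketch below runs the test on invented outlet counts (they are not real survey data) and shows each category's contribution, which helps identify where the departure from an even spread comes from.

```python
# Hypothetical outlet counts; not real survey data.
observed = {"food": 28, "services": 16, "retail": 22, "others": 14}

total = sum(observed.values())    # 80 outlets
expected = total / len(observed)  # 20.0 per category under an even spread

# Contribution of each category: (O - E)^2 / E.
contributions = {
    name: (count - expected) ** 2 / expected for name, count in observed.items()
}
chi2 = sum(contributions.values())  # 3.2 + 0.8 + 0.2 + 1.8, about 6.0
df = len(observed) - 1              # 3

CRITICAL_DF3_0_05 = 7.815  # standard critical value for df = 3 at the 0.05 level
significant = chi2 >= CRITICAL_DF3_0_05

for name, value in contributions.items():
    print(name, round(value, 2))
print(round(chi2, 2), df, significant)
```

With these particular counts the statistic falls below 7.815, so the null of an even mix would not be rejected, even though food outlets look over-represented at first glance.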

Example 2. Pebble orientation and a depositional process. Recording whether pebbles on a glacial or beach deposit point in particular direction classes, a geographer compares the observed counts in each direction band with an even expected spread using chi-square. A significant result indicates a preferred orientation, evidence of a directional process such as ice or current flow. The example shows the test distinguishing a real spatial pattern from random scatter, while noting it reveals difference, not strength.

Try this

Q1. State the chi-square formula and explain what $O$ and $E$ represent. [2 marks]

  • Cue. $\chi^2 = \sum \dfrac{(O-E)^2}{E}$, where $O$ is the observed frequency (the count actually recorded in a category) and $E$ is the expected frequency (the count expected if the null hypothesis, such as an even distribution, were true).

Q2. A goodness-of-fit chi-square test uses five categories. State the degrees of freedom and explain how they are used. [2 marks]

  • Cue. Degrees of freedom $= n - 1 = 5 - 1 = 4$; they select which critical value to read from the chi-square table at the chosen significance level, against which the calculated statistic is compared to decide whether to reject the null.

Q3. Explain why chi-square must be calculated from frequencies rather than percentages. [3 marks]

  • Cue. The statistic depends on the actual counts because the size of $\dfrac{(O-E)^2}{E}$ reflects real sample sizes; percentages discard that information and would, for example, treat a pattern from 10 observations the same as one from 1000, distorting the statistic and making the significance test invalid.
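The point in Q3 can be checked numerically. The sketch below applies the same 40/30/20/10 per cent pattern to two hypothetical samples, of 40 and 400 observations (chosen so that expected counts stay at five or more), and then wrongly feeds the percentages into the formula as if they were counts.

```python
def chi_square_even(values):
    """Chi-square against an even spread: E is the total divided by the number of categories."""
    expected = sum(values) / len(values)
    return sum((v - expected) ** 2 / expected for v in values)


small_sample = [16, 12, 8, 4]      # 40 observations
large_sample = [160, 120, 80, 40]  # 400 observations, same proportions
percentages = [40, 30, 20, 10]     # the shared pattern expressed as percentages

print(round(chi_square_even(small_sample), 2))  # about 8.0
print(round(chi_square_even(large_sample), 2))  # about 80.0
print(round(chi_square_even(percentages), 2))   # about 20.0 whatever the real sample size
```

The statistic from true counts grows tenfold when the sample does, whereas the percentage version is fixed at the value a sample of exactly 100 would give, so it matches neither real sample.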

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original. A geographer counts the number of a plant species in equal-sized quadrats on four different soil types and wants to test whether the species is evenly distributed. Explain how a chi-square test would be applied and interpreted. [10 marks]

Argument: the chi-square test compares the observed counts with the counts expected under a null of even distribution, and the result is judged against a critical value to decide whether the difference is significant.

Set up the hypotheses: null hypothesis is that the species is evenly distributed across the four soil types (no association between soil type and abundance); the alternative is that distribution is uneven.

Calculate expected values: under an even distribution, the expected count for each soil type is the total count divided by four. Then apply $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where $O$ is observed and $E$ expected, summing across the four categories.

Find degrees of freedom: for this goodness-of-fit case, degrees of freedom $= n - 1 = 4 - 1 = 3$, where $n$ is the number of categories.

Test significance: compare the calculated $\chi^2$ with the critical value at 3 degrees of freedom and the 0.05 level. If $\chi^2$ exceeds the critical value, reject the null, concluding the species is unevenly distributed and abundance is significantly associated with soil type.

Markers reward the hypotheses, expected-value calculation, the formula, correct degrees of freedom, and the significance comparison with a conclusion about the null.

Original. Explain the conditions that must be met for a chi-square test to be valid, and why they matter. [6 marks]

Argument: the chi-square test is only valid when the data meet certain conditions, and breaking them makes the result unreliable.

State the conditions: the data must be frequencies (counts), not percentages, rates or means; the categories must be discrete and mutually exclusive; observations must be independent; the sample should be reasonably large, with expected frequencies generally at least five in each category; and the total sample is fixed.

Explain why they matter: using percentages instead of raw counts distorts the statistic because it depends on actual frequencies; small expected values inflate the contribution of any single category, making the test over-sensitive and unreliable; non-independent observations or overlapping categories break the test's assumptions.

Add nuance: if expected values are too small, categories can sometimes be combined to meet the condition. Markers reward listing the key conditions (frequencies, independence, expected values at least five), and explaining that violating them undermines the validity of the result.
