
How do non-parametric tests such as the sign test and Wilcoxon tests work when we cannot assume a normal distribution?

Apply non-parametric tests including the sign test and the Wilcoxon signed-rank test, and know when they are appropriate

A focused answer to the H2 Further Mathematics outcome on non-parametric tests, covering when to use distribution-free methods, the sign test for a median, the Wilcoxon signed-rank test, the test statistics, and how to reach a conclusion.

Generated by Claude Opus 4.8 · 11 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

SEAB wants you to apply two non-parametric (distribution-free) tests, the sign test and the Wilcoxon signed-rank test. You should know when they are appropriate (when the normality assumption of a $t$- or $z$-test cannot be made), set up hypotheses about a median or a median difference, compute the test statistic, and reach a conclusion in context.

The answer

When non-parametric tests are used

Parametric tests (the $z$-test and $t$-test) assume the data come from a normal distribution, or that the sample is large enough for the Central Limit Theorem to make the sample mean approximately normal. When neither can be assumed, because the distribution is clearly non-normal, the sample is small, or the data are only ordinal (ranks), a non-parametric test is preferred because it makes far weaker assumptions. Its hypotheses are stated in terms of the median rather than the mean.

The sign test

The sign test tests a hypothesis about the median. For a single sample, record the sign of each value minus the hypothesised median $m_0$; for paired data, record the sign of each difference and test whether the median difference is zero. Discard any zero differences. Under $H_0$ a positive and a negative sign are equally likely, so the number $X$ of positive signs (or, equivalently, of negative signs) follows

$$X \sim \mathrm{B}(n, 0.5),$$

where $n$ is the number of non-zero differences. The test is then a binomial tail probability, exactly as for a test of a proportion equal to $0.5$.
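A minimal Python sketch of the exact sign test, assuming the differences are already computed (for a one-sample test, pass each value minus the hypothesised median). The function name `sign_test_p_value` and its `alternative` parameter are hypothetical, chosen for illustration; the tail sums use only `math.comb` from the standard library.

```python
from math import comb


def sign_test_p_value(differences, alternative="greater"):
    """Exact sign-test p-value (hypothetical helper, for illustration).

    Zero differences are discarded. Under H0 the number X of positive
    signs follows B(n, 0.5), with n the number of non-zero differences.
    alternative: "greater" (H1: positive signs more likely),
                 "less" (H1: negative signs more likely) or "two-sided".
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    x = sum(1 for d in nonzero if d > 0)  # observed number of positive signs

    upper = sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n  # P(X >= x)
    lower = sum(comb(n, k) for k in range(0, x + 1)) / 2 ** n  # P(X <= x)

    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Double the smaller tail, capped at 1.
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

For the worked example later on this page (eight higher, two lower), `sign_test_p_value([1] * 8 + [-1] * 2)` gives $56/1024 \approx 0.0547$; adding zero differences to the list does not change the result, because they are discarded before counting.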

The Wilcoxon signed-rank test

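The Wilcoxon signed-rank test also uses the differences but keeps more information. Discard zero differences, rank the absolute differences from smallest (rank 1) to largest, giving tied values the average of the ranks they would occupy, then sum the ranks of the positive differences ($T^+$) and of the negative differences ($T^-$). For a two-tailed test the statistic $T$ is the smaller of the two; for a one-tailed test it is the sum for the sign that $H_1$ predicts will be rare. Because it uses the magnitudes as well as the signs, it is more powerful than the sign test, provided its extra assumption holds: the differences come from a distribution that is symmetric about its median. The statistic is compared with critical values from Wilcoxon tables (or a normal approximation for large $n$).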
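The ranking step can be sketched in Python as below, assuming the differences are given as a list of numbers. `wilcoxon_rank_sums` is a hypothetical helper, not a library function: it drops zeros, gives tied absolute values the average of the ranks they span, and returns the rank sum of the positive differences ($T^+$), the rank sum of the negative differences ($T^-$) and the number of non-zero differences $n$. Ties are detected with exact equality, which assumes the data are not affected by floating-point rounding.

```python
def wilcoxon_rank_sums(differences):
    """Return (T_plus, T_minus, n) for the Wilcoxon signed-rank test.

    Zero differences are dropped; absolute differences are ranked from 1,
    and tied absolute values receive the average of the ranks they span.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    # Indices sorted by absolute difference, so ties sit next to each other.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of equal absolute values.
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return t_plus, t_minus, n
```

As a check, $T^+ + T^-$ should always equal $n(n+1)/2$. For a two-tailed test take $T = \min(T^+, T^-)$.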

Reaching a conclusion

As with any test: state $H_0$ and $H_1$, compute the statistic, compare it with the critical value (or find the $p$-value), and conclude in context. For the sign test the comparison is a binomial tail probability. For the Wilcoxon test, $T$ is compared with the tabulated critical value, and $H_0$ is rejected when $T$ is less than or equal to that value: a small $T$ is evidence against $H_0$.
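A hedged sketch of the two decision routes for the Wilcoxon test, in Python. For the large-$n$ normal approximation it uses the standard null mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$ of $T$. It applies no continuity correction, which some textbooks and mark schemes do include, so check the convention your course uses. For the table route, the critical value must be read from a Wilcoxon table; it is a parameter here, not something the code computes. Both function names are hypothetical, and the numbers in the example are illustrative only.

```python
from math import sqrt


def wilcoxon_z(t, n):
    """Standardise T using its null mean n(n+1)/4 and variance n(n+1)(2n+1)/24.

    Large-n normal approximation, without a continuity correction.
    """
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (t - mean) / sqrt(variance)


def reject_by_table(t, critical_value):
    """Table route: reject H0 when T is at or below the tabulated critical value."""
    return t <= critical_value


# Illustrative numbers: n = 20 non-zero differences, observed T = 60.
z = wilcoxon_z(60, 20)
print(round(z, 2))   # -1.68
# One-tailed test at the 5% level: reject H0 if z < -1.645.
print(z < -1.645)    # True
```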

Examples in context

Example 1. Before-and-after studies. A small trial that measures each subject before and after an intervention, with no reason to assume the differences are normal, is the classic setting for the sign test or the Wilcoxon signed-rank test; this is why these tests are common in psychology and in medical pilot studies.

Example 2. Ordinal survey data. When respondents rate preferences on a scale that is not truly numerical, the gaps between categories are not meaningful as measured quantities, so a $t$-test is not valid and a non-parametric test is needed. This situation is common in market research.

Try this

Q1. When is a non-parametric test preferred over a $t$-test? [2 marks]

  • Cue. When normality cannot be assumed: non-normal data, a small sample, or ordinal (rank) data.

Q2. Under $H_0$, what distribution does the sign-test count follow? [1 mark]

  • Cue. $\mathrm{B}(n, 0.5)$, where $n$ is the number of non-zero differences.

Q3. What is the main disadvantage of the sign test compared with the Wilcoxon signed-rank test? [1 mark]

  • Cue. It ignores the magnitudes of the differences, so it has lower power.
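To see concretely why ignoring magnitudes costs power, here is a small Python illustration with two hypothetical sets of differences that have identical signs. The helper names are made up for this sketch, and the ranking shortcut assumes there are no zero or tied absolute differences (true for these data).

```python
def count_positive(diffs):
    """Sign-test statistic: number of positive differences."""
    return sum(1 for d in diffs if d > 0)


def t_minus(diffs):
    """Wilcoxon T-: sum of ranks of |d| over the negative differences.

    Assumes no zero differences and no tied absolute values.
    """
    ranked = sorted(diffs, key=abs)  # rank 1 = smallest |d|
    return sum(rank for rank, d in enumerate(ranked, start=1) if d < 0)


a = [1, 2, 3, -10]   # the single decrease is the largest change
b = [10, 2, 3, -1]   # the single decrease is the smallest change

print(count_positive(a), count_positive(b))  # 3 3
print(t_minus(a), t_minus(b))                # 4 1
```

The sign test sees the two samples as identical. A smaller $T^-$ is stronger evidence that the differences tend to be positive, so only the Wilcoxon statistic registers that the one decrease in `b` is tiny.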

Exam-style practice questions

Practice questions written in the style of SEAB exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Original · 6 marks. Ten people rate a product before and after a redesign. Eight rate it higher afterwards and two rate it lower. Use a sign test at the $5\%$ level to test whether the redesign improved ratings.

Let $p$ be the probability that an individual rates the product higher after the redesign. Under $H_0$ there is no systematic change, so $H_0: p = 0.5$; the alternative (improvement) is $H_1: p > 0.5$.

Under $H_0$, the number of "higher" responses $X \sim \mathrm{B}(10, 0.5)$. We observed $X = 8$ higher out of $10$. The one-tailed $p$-value is

$$\mathrm{P}(X \geq 8 \mid p = 0.5) = \mathrm{P}(8) + \mathrm{P}(9) + \mathrm{P}(10).$$

Using $\mathrm{B}(10, 0.5)$: $\mathrm{P}(8) = 45/1024$, $\mathrm{P}(9) = 10/1024$, $\mathrm{P}(10) = 1/1024$, total $56/1024 \approx 0.0547$.
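The arithmetic can be laid out in full using the binomial formula with success probability $0.5$, where each outcome of the ten ratings has probability $(1/2)^{10} = 1/1024$:

```latex
\begin{aligned}
\mathrm{P}(X = k) &= \binom{10}{k}\left(\tfrac{1}{2}\right)^{10} = \frac{1}{1024}\binom{10}{k}, \\
\mathrm{P}(X \geq 8) &= \frac{1}{1024}\left[\binom{10}{8} + \binom{10}{9} + \binom{10}{10}\right] \\
&= \frac{45 + 10 + 1}{1024} = \frac{56}{1024} = \frac{7}{128} \approx 0.0547.
\end{aligned}
```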

Since $0.0547 > 0.05$, we do not reject $H_0$ at the $5\%$ level: there is insufficient evidence of an improvement.

Markers reward the hypotheses on the median/sign with $p = 0.5$, the binomial tail $\mathrm{P}(X \geq 8) \approx 0.0547$, comparison with $0.05$, and the conclusion not to reject $H_0$.

Original · 6 marks. Explain when a non-parametric test should be preferred over a $t$-test or $z$-test, and state one advantage and one disadvantage of the sign test.

A non-parametric (distribution-free) test should be preferred when the assumptions of the parametric test are not met: in particular when the population cannot be assumed normal, the sample is small so the Central Limit Theorem does not rescue normality, or the data are ordinal (ranks) rather than measured on an interval scale.

The sign test only uses the sign of each difference (whether each value is above or below the hypothesised median), so it makes very weak assumptions: an advantage is robustness, since it works for any continuous distribution and resists outliers. A disadvantage is that, by discarding the magnitudes of the differences, it uses little of the information in the data and so has lower power than tests that use more (such as the Wilcoxon signed-rank test, or a $t$-test when its assumptions hold).

Markers reward the condition (non-normal, small sample or ordinal data), the robustness advantage, and the low-power disadvantage from ignoring magnitudes.
