Hypothesis Testing in CFA Level I: Start With the Decision, Then Compute

AcadiFi Editorial·2026-05-20·14 min read

Hypothesis testing becomes much easier when you stop treating it as a pile of formulas and start treating it as a decision process. The question is not "Which statistic do I memorize?" The question is: if the null hypothesis were true, would this sample result be unusually far away from what I should expect?

For CFA Level I, the exam focus is usually the logic chain: state the null and alternative, choose the significance level, calculate or interpret the test statistic, compare the evidence to the decision rule, and connect the conclusion to Type I and Type II errors.

The Decision Sequence

Every hypothesis test can be organized into five moves.

flowchart TD
    A["State H0 and Ha"] --> B["Choose significance level"]
    B --> C["Compute test statistic or p-value"]
    C --> D{"Evidence extreme enough?"}
    D -- "Yes" --> E["Reject H0"]
    D -- "No" --> F["Fail to reject H0"]
    E --> G["Risk: Type I error"]
    F --> H["Risk: Type II error"]

The most important wording is "fail to reject." You are not proving the null hypothesis true. You are saying the sample evidence is not strong enough, at the chosen significance level, to reject it.

Null and Alternative Hypotheses

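The null hypothesis is the baseline assumption. In investment research, it often represents no effect, no difference, or no abnormal performance. The alternative hypothesis is the competing claim that the analyst concludes in favor of only if the sample evidence is strong enough to reject the null.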

One-Tailed Versus Two-Tailed Tests

Use a two-tailed test when the analyst cares about a difference in either direction. Use a one-tailed test when the analyst has a directional claim before seeing the data.

Example:

  • Two-tailed: "Is the portfolio's mean active return different from zero?"
  • Right-tailed: "Is the portfolio's mean active return greater than zero?"
  • Left-tailed: "Is the mean tracking error below the mandate limit?"

The alternative hypothesis determines where the rejection region sits.
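As a rough illustration of how the alternative fixes the rejection region, the sketch below encodes the three cases as a single decision function. The function name `rejects_null` and the `tail` labels are hypothetical, chosen for this example only. The critical value is passed in as a positive magnitude, and the caller is assumed to have looked it up for the correct tail setup (a one-tailed cutoff is smaller than a two-tailed cutoff at the same significance level).

```python
def rejects_null(test_stat: float, critical_value: float, tail: str) -> bool:
    """Return True if the test statistic falls in the rejection region.

    critical_value is a positive magnitude taken from a table for the
    chosen significance level and tail setup (an assumption of this sketch).
    """
    if tail == "two":
        # Rejection region sits in both tails.
        return abs(test_stat) > critical_value
    if tail == "right":
        # Rejection region sits only in the upper tail.
        return test_stat > critical_value
    if tail == "left":
        # Rejection region sits only in the lower tail.
        return test_stat < -critical_value
    raise ValueError("tail must be 'two', 'right', or 'left'")


# Illustrative inputs: the same magnitude can land differently by tail.
print(rejects_null(2.50, 2.03, "two"))    # beyond the upper cutoff
print(rejects_null(-2.50, 2.03, "right")) # wrong tail for a right-tailed test
```

The second call shows why the direction must be set before looking at the data: a large negative statistic is never evidence for a "greater than" claim.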

Worked Example: Testing Active Return

Juniper Fund Research is evaluating a satellite equity manager. The analyst collects 36 monthly active returns and observes:

  • Sample mean active return: 0.42% per month
  • Sample standard deviation: 1.50% per month
  • Sample size: 36
  • Null hypothesis: mean active return equals 0%
  • Alternative hypothesis: mean active return is not equal to 0%
  • Significance level: 5%

Because the population variance is unknown, use a t-statistic:

t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n))
t = (0.42 - 0.00) / (1.50 / sqrt(36))
t = 0.42 / 0.25
t = 1.68

With 35 degrees of freedom (n - 1), the two-tailed 5% critical values are approximately +/-2.03. The observed test statistic of 1.68 falls between them, so it is not in the rejection region.

Conclusion: fail to reject the null hypothesis. The sample shows a positive average active return, but there is not enough statistical evidence at the 5% two-tailed significance level to conclude that the true mean active return differs from zero.
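The Juniper arithmetic can be reproduced in a few lines. This is a minimal sketch using only the figures given above; the variable names are illustrative, and the critical value of 2.03 is taken from the text rather than computed.

```python
import math

# Inputs from the Juniper example (percent per month).
sample_mean = 0.42
hypothesized_mean = 0.00
sample_sd = 1.50
n = 36
critical_value = 2.03  # approximate two-tailed 5% cutoff for 35 df, from the text

# Standard error of the sample mean, then the t-statistic.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - hypothesized_mean) / standard_error

# Two-tailed decision rule: reject only if |t| exceeds the cutoff.
decision = "reject H0" if abs(t_stat) > critical_value else "fail to reject H0"

print(f"standard error = {standard_error:.2f}")
print(f"t = {t_stat:.2f}")
print(decision)
```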

P-Value and Critical Value Are Two Paths to the Same Decision

A critical value approach asks: did the test statistic fall in the rejection region?

A p-value approach asks: assuming the null is true, how unusual is this sample result or a more extreme result?

For the Juniper example, a p-value above 5% leads to the same conclusion as a t-statistic inside the critical values: fail to reject. If the p-value were 3%, the result would be significant at the 5% level but not at the 1% level.
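To see the two paths agree, the sketch below estimates the two-tailed p-value for the Juniper statistic. It integrates the Student's t density numerically with Simpson's rule so that it needs only the standard library. The helper names `t_pdf` and `two_tailed_p_value` are hypothetical, and on the exam the p-value would be given or read from a table rather than computed this way.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t_stat: float, df: int, steps: int = 10_000) -> float:
    """Approximate P(|T| >= |t_stat|) using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice this integral.
    return 1 - 2 * area_zero_to_t


p_value = two_tailed_p_value(1.68, df=35)
alpha = 0.05
print(f"p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value comes out at roughly 10%, above the 5% significance level, which matches the critical value comparison.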

Type I and Type II Errors

The error labels are easier if you connect them to the decision.

Type I Error

A Type I error occurs when the analyst rejects a true null hypothesis. The significance level is the probability of a Type I error when the null is true.

Investment example: Juniper fires a manager because the analyst concludes skill is negative, but the manager's true expected active return is actually zero. The analyst acted on a false alarm.
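One way to make the false alarm rate concrete is to simulate many managers whose true mean active return really is zero and count how often the test rejects anyway. The sketch below assumes normally distributed monthly active returns and reuses the 36 observations, 1.50% volatility, and 2.03 cutoff from the Juniper example. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics


def false_rejection_rate(trials: int = 5000, n: int = 36, sd: float = 1.50,
                         critical: float = 2.03, seed: int = 7) -> float:
    """Share of simulated samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # True mean is 0, so every rejection here is a Type I error.
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        standard_error = statistics.stdev(sample) / math.sqrt(n)
        t_stat = statistics.mean(sample) / standard_error
        if abs(t_stat) > critical:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate()
print(f"Type I error rate in simulation: {rate:.3f}")
```

Under these assumptions the simulated rate should land close to the 5% significance level, which is the sense in which alpha is the probability of a Type I error.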

Type II Error

A Type II error occurs when the analyst fails to reject a false null hypothesis.

Investment example: Juniper keeps treating a manager as average even though the manager truly has positive skill. The analyst missed a real effect.

Power

Power is the probability of rejecting the null when the alternative is true, which equals one minus the probability of a Type II error. Higher power means the test is more likely to detect a real effect. Larger sample sizes, lower noise, and larger true effects generally increase power.
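The effect of sample size on power can be sketched with a normal approximation. This is not an exact t-based power calculation: it treats the standard deviation as known and uses z cutoffs, which is an assumption made to keep the example within the standard library. The function name `approx_power` and the assumed true mean of 0.42% are illustrative only.

```python
import math
from statistics import NormalDist


def approx_power(true_mean: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed test of H0: mean = 0 (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits away from the hypothesized 0.
    shift = true_mean / (sd / math.sqrt(n))
    # Probability the statistic lands in the upper or lower rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (36, 72, 144):
    print(n, round(approx_power(true_mean=0.42, sd=1.50, n=n), 3))
```

If the true mean were 0.42% with 1.50% volatility, 36 months would give well under a 50% chance of detecting it in this approximation, and the chance rises as the sample grows.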

Why Standard Error Matters

The standard error measures how much sample means vary around the population mean. It links the central limit theorem to the test statistic.

In the Juniper example, the monthly active return volatility is 1.50%, but the standard error of the sample mean is:

1.50% / sqrt(36) = 0.25%

The test statistic asks how many standard errors the sample mean is away from the hypothesized mean. A result 1.68 standard errors away may look interesting, but it is not extreme enough for a 5% two-tailed test in this setup.
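A short sketch of how the standard error, and therefore the test statistic, responds to sample size. It holds the Juniper sample mean and standard deviation fixed while varying n, which is a hypothetical exercise: a real sample of a different size would produce different estimates.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the sample mean."""
    return sample_sd / math.sqrt(n)


sample_mean = 0.42  # percent per month, from the Juniper example
sample_sd = 1.50    # percent per month, from the Juniper example

for n in (9, 36, 144):
    se = standard_error(sample_sd, n)
    # Distance from the hypothesized mean of 0, measured in standard errors.
    t_stat = sample_mean / se
    print(f"n={n:>3}  standard error={se:.3f}  t={t_stat:.2f}")
```

Quadrupling the sample size halves the standard error, so the same sample mean sits twice as many standard errors from zero.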

Exam Framing

When a CFA Level I item gives a hypothesis test, read it in this order:

  1. Identify the parameter being tested.
  2. Determine whether the alternative is one-tailed or two-tailed.
  3. Match the statistic to the setup.
  4. Compare the p-value to alpha or the test statistic to the critical value.
  5. State the conclusion without overstating it.

The exam trap is usually language. "Fail to reject" is not the same as "accept." "Statistically significant" is not the same as "economically important." A small p-value supports rejecting the null, but it does not tell you whether the effect is large enough to matter in a portfolio.
