P-Values Are Evidence Measures, Not Confidence Scores

The Thesis

A p-value is not a confidence score, not the probability that the null hypothesis is true, and not a direct measure of how important a result is. For CFA Level I hypothesis testing, treat the p-value as a calibrated evidence measure: if the null hypothesis were true, how unusual would a result at least this extreme be?

That wording matters because it fixes the most common reversal. A larger p-value is weaker evidence against the null, not stronger evidence. A smaller p-value says the observed sample result would be relatively unusual under the null hypothesis, so rejecting the null becomes easier to justify at the chosen significance level.

The exam decision rule is compact:

Reject H0 if p-value <= alpha.
Fail to reject H0 if p-value > alpha.

The intuition behind the rule is what keeps candidates from swapping the conclusion under pressure.
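The rule reduces to a single comparison. Here is a minimal Python sketch. The function name `hypothesis_decision` is hypothetical, and the range checks are an added safeguard, not part of the exam rule:

```python
def hypothesis_decision(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    # Safeguards (not part of the exam rule): both inputs are probabilities.
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


print(hypothesis_decision(0.03, 0.05))  # Reject H0
print(hypothesis_decision(0.05, 0.05))  # Reject H0 (equality rejects)
print(hypothesis_decision(0.28, 0.05))  # Fail to reject H0
```

Because the rule uses "<=", a p-value exactly equal to alpha leads to rejection.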

The Decision Sequence

Step 1: State the Null Before Looking at the Result

The null hypothesis is the baseline statement tested by the evidence. In investment examples, it is often a no-effect, no-difference, or equality claim:

A manager's average active return is zero.
A portfolio beta equals one.
A credit model's mean forecast error equals zero.

The alternative hypothesis says what kind of departure would matter: different from zero, greater than zero, less than zero, or not equal to a benchmark.

Step 2: Choose Alpha Before Interpreting the P-Value

Alpha is the maximum Type I error rate the analyst is willing to tolerate. If alpha = 0.05, the analyst accepts a 5% probability of rejecting the null when it is actually true. Across many repeated tests of a true null, about 5% would wrongly reject it. A stricter test, such as alpha = 0.01, demands stronger sample evidence before rejecting.

The p-value is compared with alpha after the test statistic is computed. It should not be used to choose alpha after the fact.
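A simulation shows what a "5% Type I error rate" means in practice. It tests a null that is true by construction and counts how often that null is rejected. The sketch below assumes normally distributed returns with a known standard deviation, a one-tailed z-test, and a hypothetical sample of 48 observations. Under those conditions, the rejection share should come out close to alpha:

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=48, sigma=1.0, trials=10_000, seed=1):
    """Share of one-tailed tests (H1: mean > 0) that reject a TRUE null.

    Hypothetical setup: every sample is drawn with mean 0, so H0 holds,
    and sigma is treated as known, so the statistic is a z-statistic.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = fmean(sample) / standard_error
        p_value = 1.0 - std_normal.cdf(z)  # upper-tail p-value
        if p_value <= alpha:
            rejections += 1
    return rejections / trials


print(type_i_error_rate(0.05))  # close to 0.05
print(type_i_error_rate(0.01))  # close to 0.01
```

Tightening alpha to 0.01 cuts false rejections to roughly 1%. The cost is that a true effect now needs stronger evidence before the test detects it.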

Step 3: Compare P-Value to Alpha

The decision flow runs in this order:

Start with H0 and H1.
Set alpha before testing.
Compute the test statistic.
Find the p-value from the relevant tail or tails.
Ask whether the p-value <= alpha. If yes, reject H0; if no, fail to reject H0.
In either case, check economic significance separately.

The decision is mechanical once the hypotheses, tails, and alpha are set. The interpretation is where candidates usually lose the thread.

Worked Example: Active Return Test

Hale Ridge Advisors claims its small-cap strategy has positive monthly active return after fees. An analyst tests the claim using 48 months of data.

Null hypothesis: average monthly active return is less than or equal to zero.
Alternative hypothesis: average monthly active return is greater than zero.
Chosen significance level: alpha = 0.05.
Sample mean active return: 0.31% per month.
Standard error of the sample mean: 0.14%.
Test statistic: 0.31 / 0.14 = 2.21.
One-tailed p-value: 0.014 (standard normal approximation).

Because 0.014 <= 0.05, the analyst rejects the null at the 5% significance level. The sample evidence is strong enough to support the claim of positive average active return at that alpha.

But notice what changes at a stricter level. If the investment committee required alpha = 0.01, the analyst would fail to reject the null because 0.014 > 0.01. The same data can reject at 5% and fail to reject at 1%. That is not a contradiction. The result is strong enough for a 5% Type I error tolerance but not for a 1% tolerance.
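The Hale Ridge calculation can be reproduced in a few lines. This sketch assumes a standard normal distribution for the test statistic. With 48 months of data, a t-distribution with 47 degrees of freedom would give a somewhat larger p-value, though still between 0.01 and 0.05, so both decisions are unchanged. The unrounded statistic also gives a p-value slightly below the stated 0.014, because the text rounds the statistic to 2.21 before looking up the tail area.

```python
from statistics import NormalDist

# Inputs from the Hale Ridge example (monthly figures, in percent).
sample_mean = 0.31     # sample mean active return, % per month
standard_error = 0.14  # standard error of the sample mean, %
# H0: mean <= 0, H1: mean > 0 (one-tailed, upper tail)

test_statistic = sample_mean / standard_error
# Assumption: standard normal approximation for the test statistic.
p_value = 1.0 - NormalDist().cdf(test_statistic)

print(f"test statistic = {test_statistic:.2f}")
print(f"one-tailed p-value = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The loop shows the point of the paragraph above: the same p-value rejects at 5% and fails to reject at 1%.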

Minimum Alpha Interpretation

Another useful way to read the p-value is this:

The p-value is the smallest alpha at which the test would reject the null.

If the p-value is 0.014, the test rejects at any alpha of 1.4% or higher. It fails to reject at any alpha below 1.4%.

This definition is especially helpful when answer choices ask for a "minimum significance level" or give several possible alpha levels. You do not need to recompute the test. Compare each alpha to the p-value.

P-value	Alpha	Decision
0.014	0.10	Reject H0
0.014	0.05	Reject H0
0.014	0.01	Fail to reject H0
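The table and the minimum-alpha reading can both be checked mechanically. This sketch rebuilds the three rows. It then scans a hypothetical grid of alphas from 0.001 to 0.100 and reports the smallest alpha that rejects. For a p-value of 0.014, that boundary is the p-value itself:

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject H0 when p-value <= alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


p_value = 0.014

# Rebuild the table rows: p-value, alpha, decision.
for alpha in (0.10, 0.05, 0.01):
    print(f"{p_value:.3f}\t{alpha:.2f}\t{decide(p_value, alpha)}")

# Hypothetical grid of candidate alphas: 0.001, 0.002, ..., 0.100.
candidate_alphas = [a / 1000 for a in range(1, 101)]
smallest_rejecting = min(a for a in candidate_alphas if decide(p_value, a) == "Reject H0")
print("smallest alpha that rejects:", smallest_rejecting)
```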

What A P-Value Does Not Say

It Is Not The Probability The Null Is True

A p-value of 0.04 does not mean there is a 4% probability that the null hypothesis is true. Classical hypothesis testing assumes the null is true in order to calculate the probability of a result at least as extreme as the one observed. It does not assign a probability to the null itself.
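A simulation makes the gap concrete. The sketch below imagines many strategies being tested. A hypothetical share of them have a true mean of zero, and the rest have a true mean of 2.5 standard errors; both numbers are illustrative assumptions. Among tests whose one-tailed p-value lands between 0.03 and 0.05, it counts how often the null was actually true. That share is far from 4%, and it moves with the assumed share of true nulls, which a p-value alone cannot know:

```python
import random
from statistics import NormalDist


def share_of_true_nulls(share_true_null, effect_in_se=2.5, trials=100_000,
                        seed=7, p_low=0.03, p_high=0.05):
    """Fraction of tests with p-value in [p_low, p_high] where H0 was true.

    Hypothetical world: a fraction `share_true_null` of strategies have a true
    mean of zero; the rest have a true mean of `effect_in_se` standard errors.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    null_hits = 0
    total_hits = 0
    for _ in range(trials):
        null_is_true = rng.random() < share_true_null
        true_mean = 0.0 if null_is_true else effect_in_se
        z = rng.gauss(true_mean, 1.0)      # test statistic for this strategy
        p_value = 1.0 - std_normal.cdf(z)  # one-tailed (upper-tail) p-value
        if p_low <= p_value <= p_high:
            total_hits += 1
            null_hits += null_is_true
    return null_hits / total_hits


for share in (0.5, 0.9):
    print(share, round(share_of_true_nulls(share), 3))
```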

It Is Not The Probability The Result Happened By Chance

Candidates often say a p-value is "the chance the result happened by chance." That phrase is too vague for exam use. The cleaner version is: the probability of observing a result at least as extreme as the sample result, assuming the null hypothesis is true and the model assumptions hold.
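The precise definition can be simulated directly. The sketch below generates many sample means in a world where the null holds. As usual for a one-tailed test, the null is evaluated at its boundary value of zero. The sketch then counts how often a simulated mean is at least as extreme as the observed 0.31%. It reuses the worked example's figures and assumes the sample mean is normally distributed under the null:

```python
import random
from statistics import NormalDist

observed_mean = 0.31   # % per month, from the worked example
standard_error = 0.14  # % per month
draws = 200_000
rng = random.Random(42)

# Assumption: under H0 (true mean = 0) the sample mean is normal with this SE.
at_least_as_extreme = sum(
    1 for _ in range(draws) if rng.gauss(0.0, standard_error) >= observed_mean
)
simulated_p = at_least_as_extreme / draws
analytic_p = 1.0 - NormalDist(0.0, standard_error).cdf(observed_mean)

print(f"simulated p-value: {simulated_p:.4f}")
print(f"analytic p-value:  {analytic_p:.4f}")
```

The two numbers should agree closely. Both answer the same conditional question: how often a result this extreme would occur if the null were true.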

It Is Not Economic Significance

Statistical significance answers whether the observed result is hard to reconcile with the null. Economic significance asks whether the size of the effect matters.

Suppose an index strategy shows a statistically significant mean active return of 0.02% per year because the dataset covers thousands of observations. The p-value might be tiny, but a two-basis-point annual edge may disappear after trading costs, taxes, or tracking constraints. The CFA exam often separates those ideas: reject the null if the p-value is small enough, then ask whether the magnitude changes an investment decision.
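The contrast can be sketched numerically. Every input below is hypothetical and chosen only to illustrate the point: about 5,000 daily observations, a very low assumed daily tracking error, and an assumed 5 basis points per year of costs. The mean active return is the text's 0.02% (2 basis points) per year:

```python
from statistics import NormalDist

# Hypothetical inputs, chosen only to illustrate the point in the text.
annual_mean_active_bps = 2.0  # 0.02% per year = 2 basis points
periods_per_year = 252        # assume daily observations
n_obs = 5_000                 # "thousands of observations" (about 20 years)
daily_sd_bps = 0.1            # assumed very low daily tracking error
annual_cost_bps = 5.0         # assumed trading costs, taxes, and frictions

daily_mean_bps = annual_mean_active_bps / periods_per_year
standard_error = daily_sd_bps / n_obs ** 0.5
z = daily_mean_bps / standard_error
p_value = 1.0 - NormalDist().cdf(z)  # one-tailed, H1: mean > 0

net_edge_bps = annual_mean_active_bps - annual_cost_bps
print(f"z = {z:.2f}, one-tailed p-value = {p_value:.2e}")
print("Statistically significant at 1%:", p_value <= 0.01)
print(f"Net annual edge after assumed costs: {net_edge_bps:.1f} bps")
```

The test rejects the null decisively, yet the edge is negative after the assumed costs.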

Exam Traps

Trap 1: Reversing The Direction

If p-value = 0.28, the result is not "28% significant." It is weak evidence against the null, and the test fails to reject at common alpha levels such as 10%, 5%, or 1%.

Trap 2: Comparing The Test Statistic To Alpha

Alpha is compared to the p-value. Critical values are compared to test statistics. Do not mix the routes.
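The two routes always reach the same decision; they simply compare on different scales. This sketch assumes a one-tailed (upper-tail) z-test and reuses the statistic of 2.21 from the worked example. The function name `decide_both_ways` is hypothetical:

```python
from statistics import NormalDist


def decide_both_ways(test_statistic, alpha):
    """Upper-tail z-test decided by both routes; the two should agree."""
    std_normal = NormalDist()
    critical_value = std_normal.inv_cdf(1.0 - alpha)  # compared to the statistic
    p_value = 1.0 - std_normal.cdf(test_statistic)    # compared to alpha
    reject_by_critical_value = test_statistic >= critical_value
    reject_by_p_value = p_value <= alpha
    return critical_value, p_value, reject_by_critical_value, reject_by_p_value


for alpha in (0.05, 0.01):
    cv, p, by_cv, by_p = decide_both_ways(2.21, alpha)
    print(f"alpha={alpha:.2f}: critical value {cv:.3f}, p-value {p:.4f}, "
          f"reject via critical value={by_cv}, reject via p-value={by_p}")
```

Mixing the routes, for example comparing 2.21 with 0.05, compares numbers on different scales and has no meaning.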

Trap 3: Using A Two-Tailed P-Value For A One-Tailed Question

The p-value must match the alternative hypothesis. A one-tailed test places all rejection probability in one tail. A two-tailed test splits alpha across both tails. If the question gives the p-value directly, use it as stated. If you compute it, match the tail structure.
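For a symmetric distribution such as the standard normal, the two-tailed p-value is twice the one-tailed p-value. This holds when the statistic lies in the direction of the one-tailed alternative. The sketch below reuses the statistic of 2.21 from the worked example, assumes a z-test, and uses a hypothetical alpha of 0.02 at which the choice of tails changes the decision:

```python
from statistics import NormalDist

test_statistic = 2.21  # z-statistic from the active-return example
std_normal = NormalDist()

# One-tailed test (H1: mean > 0): only the upper tail counts as "more extreme".
p_one_tailed = 1.0 - std_normal.cdf(test_statistic)
# Two-tailed test (H1: mean != 0): both tails count, so |Z| >= |statistic|.
p_two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(test_statistic)))

print(f"one-tailed p-value: {p_one_tailed:.4f}")
print(f"two-tailed p-value: {p_two_tailed:.4f}")

alpha = 0.02  # hypothetical alpha that separates the two answers
print("one-tailed decision:", "Reject H0" if p_one_tailed <= alpha else "Fail to reject H0")
print("two-tailed decision:", "Reject H0" if p_two_tailed <= alpha else "Fail to reject H0")
```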

Trap 4: Saying "Accept The Null"

If the p-value is greater than alpha, the usual conclusion is "fail to reject the null." The sample did not provide enough evidence against the null. That is weaker than proving the null is true.

Exam Framing

For CFA Level I, the p-value decision is usually a quick scoring opportunity when the setup is clean. The harder part is language. Keep these three phrases stable:

Smaller p-values mean stronger evidence against the null.
Reject the null when p-value <= alpha.
Statistical significance does not automatically imply economic significance.

When a question asks for the minimum significance level, read the p-value as the answer threshold. When a question asks for the decision at a stated alpha, compare directly. When a question asks for interpretation, avoid claiming the p-value is the probability that the null is true.
