Valid Benchmarks: How To Test Whether Performance Comparison Is Fair

A benchmark is not just a label next to a return number. It is the yardstick used to decide whether a manager added value, took unintended risk, or simply benefited from an easy comparison. On the CFA exam, a benchmark is valid only when it represents the manager's actual investment mandate and is specified before performance is judged.

The practical test is simple: can an informed client and manager agree, before the evaluation period, exactly what portfolio is being used as the comparison and why it fits the mandate? If not, alpha, tracking error, and manager ranking become much less meaningful.

flowchart TD
    A["Proposed performance benchmark"] --> B{"Was it set before evaluation?"}
    B -- "No" --> X["Reject: hindsight selection risk"]
    B -- "Yes" --> C{"Can its securities and weights be identified?"}
    C -- "No" --> Y["Reject: ambiguous or not measurable"]
    C -- "Yes" --> D{"Could the manager reasonably invest in it?"}
    D -- "No" --> Z["Reject: uninvestable comparison"]
    D -- "Yes" --> E{"Does it match mandate and style?"}
    E -- "No" --> W["Reject: inappropriate yardstick"]
    E -- "Yes" --> F["Use benchmark for return and risk evaluation"]

The Benchmark Must Exist Before The Scorecard

The cleanest benchmark is specified before the period being evaluated. That prevents benchmark shopping. If a manager trails one index but beats another, choosing the second index after the fact does not prove skill.

Suppose Harborstone Equity manages a small-cap value mandate. At the start of the year, the client statement names the North Harbor Small Value Index as the benchmark. The manager earns 9.4% while the benchmark earns 8.1%, so the comparison starts from an excess return of 1.3 percentage points (9.4% - 8.1%), before fees and risk adjustments.

Now change the facts. The manager earns 9.4%, reviews a list of possible indexes after year-end, and then chooses the easiest comparison. That benchmark may be useful as background information, but it is not a valid performance yardstick for manager accountability.

Exam Signal

Phrases such as "after results," "selected at year-end," or "best-fitting index after the period" usually point to an invalid benchmark. The defect is not the return calculation. The defect is hindsight.
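The timing test can be sketched as a single date comparison. This is a minimal illustration, not an exam formula: the function name `set_in_advance` and all dates are hypothetical, chosen only to mirror the two Harborstone scenarios above.

```python
from datetime import date


def set_in_advance(benchmark_named_on: date, period_start: date) -> bool:
    """True when the benchmark was named on or before the first day of the period."""
    return benchmark_named_on <= period_start


# Hypothetical evaluation year.
period_start = date(2024, 1, 1)

# Scenario 1: benchmark named in the client statement at the start of the year.
named_up_front = set_in_advance(date(2024, 1, 1), period_start)

# Scenario 2: benchmark picked from a list of indexes after year-end.
picked_after = set_in_advance(date(2025, 1, 15), period_start)

print(named_up_front, picked_after)
```

The return figure is identical in both scenarios. Only the date on which the benchmark was named changes the verdict.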

A Valid Benchmark Must Be Clear And Measurable

A benchmark should be unambiguous. A statement such as "compare to the market" is not enough. The evaluator needs to know the exact index, custom blend, rebalancing rule, and return basis.

For a custom benchmark, weights matter. If Marin Foundation assigns 60% to a domestic equity index and 40% to an intermediate bond index, the benchmark return can be measured. If the policy simply says "balanced portfolio benchmark," the comparison is too vague.

Worked Example

Assume the custom benchmark is:

60% Harbor Domestic Equity Index, return 10.0%
40% Crest Intermediate Bond Index, return 4.0%

Benchmark return:

(0.60 x 10.0%) + (0.40 x 4.0%) = 7.6%

If the balanced manager earns 8.4%, the simple excess return is:

8.4% - 7.6% = 0.8%

That comparison works only because the benchmark was stated precisely enough to measure.
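The same arithmetic can be sketched in Python. The helper name `blended_return` is an illustrative assumption; the weights and returns are the ones from the worked example. The weight check shows why a blend without stated weights cannot be measured at all.

```python
import math


def blended_return(components: dict[str, tuple[float, float]]) -> float:
    """Weighted-average return (%) for a custom benchmark.

    components maps an index name to (weight, return in percent).
    """
    total_weight = sum(weight for weight, _ in components.values())
    # A blend whose weights do not sum to 100% is not a measurable benchmark.
    if not math.isclose(total_weight, 1.0):
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    # Round to suppress floating-point noise in the last digits.
    return round(sum(weight * ret for weight, ret in components.values()), 4)


benchmark = blended_return({
    "Harbor Domestic Equity Index": (0.60, 10.0),
    "Crest Intermediate Bond Index": (0.40, 4.0),
})
excess = round(8.4 - benchmark, 4)

print(f"Benchmark return: {benchmark:.1f}%")
print(f"Excess return: {excess:.1f} percentage points")
```

Passing a blend with missing or incomplete weights raises an error instead of producing a number, which is the coded version of "too vague to measure."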

Investability And Appropriateness Are Different Tests

An investable benchmark is one the manager could reasonably replicate or hold as an alternative. A benchmark can be measurable yet still not investable. For example, an index made from stale appraisal values, unavailable private holdings, or impossible transaction assumptions may not support a clean manager evaluation.

Appropriateness asks a different question: does the benchmark match the manager's mandate, constraints, and style? A global growth equity manager should not be judged against a domestic value index just because both are equity indexes.

Style Match Example

Ashfield Partners is hired to run an investment-grade short-duration credit portfolio with a maximum maturity of five years. A broad aggregate bond index includes long-duration government bonds, mortgage securities, and issuers outside the mandate. Even if that broad index is published and measurable, it may be inappropriate for Ashfield's assignment.

The better benchmark should reflect:

permitted asset classes,
credit quality constraints,
duration range,
currency exposure,
liquidity expectations,
and the manager's stated style.
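One way to make the mandate-fit test concrete is to screen a benchmark's constituents against the mandate's limits. The sketch below is illustrative only: the `Holding` class, the sector labels, and the three sample constituents are made up, and it checks just three of the dimensions listed above (asset class, credit quality, and maturity as a stand-in for duration).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    name: str
    sector: str            # e.g. "credit", "government", "mortgage"
    rating: str            # letter rating such as "A" or "BB"
    maturity_years: float


# Hypothetical limits based on the Ashfield mandate description.
ALLOWED_SECTORS = {"credit"}
INVESTMENT_GRADE = {"AAA", "AA", "A", "BBB"}
MAX_MATURITY_YEARS = 5.0


def outside_mandate(holding: Holding) -> list[str]:
    """Return the reasons a benchmark constituent falls outside the mandate."""
    reasons = []
    if holding.sector not in ALLOWED_SECTORS:
        reasons.append("sector")
    if holding.rating not in INVESTMENT_GRADE:
        reasons.append("credit quality")
    if holding.maturity_years > MAX_MATURITY_YEARS:
        reasons.append("maturity")
    return reasons


# Made-up constituents of a broad aggregate index, for illustration only.
broad_index = [
    Holding("Corporate note", "credit", "A", 3.0),
    Holding("Long government bond", "government", "AAA", 25.0),
    Holding("Mortgage pool", "mortgage", "AAA", 7.0),
]

mismatches = {h.name: outside_mandate(h) for h in broad_index if outside_mandate(h)}
print(mismatches)
```

A benchmark with many flagged constituents is still published and measurable, but it describes a portfolio the manager was never hired to run.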

Accountability Requires Agreement

The benchmark also needs to be accepted by the parties responsible for evaluating the manager. This does not mean the manager can choose an easy target. It means the client, consultant, and manager understand the yardstick and can use it consistently.

If the manager disclaims the benchmark whenever performance is poor, the benchmark has weak accountability value. If the client changes the benchmark whenever the strategic allocation changes, the benchmark documentation should explain the effective date and reason for the change.
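A dated change log is one simple way to document benchmark changes so that each period is judged against the benchmark in force at the time. The sketch below is an assumption about how such a log might look; the dates, blends, and the function name `benchmark_on` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical change log: (effective date, benchmark, reason for the change).
# Entries must be kept in ascending date order.
history = [
    (date(2022, 1, 1), "60% equity / 40% bonds", "initial policy"),
    (date(2024, 7, 1), "50% equity / 50% bonds", "strategic allocation change"),
]


def benchmark_on(as_of: date) -> str:
    """Return the benchmark that was in effect on a given date."""
    effective_dates = [entry[0] for entry in history]
    # Index of the last entry whose effective date is on or before as_of.
    position = bisect_right(effective_dates, as_of)
    if position == 0:
        raise ValueError("no benchmark was in effect on that date")
    return history[position - 1][1]


print(benchmark_on(date(2023, 6, 30)))
print(benchmark_on(date(2024, 7, 1)))
```

Because every entry carries an effective date and a reason, neither party can quietly swap the yardstick for a period that has already been scored.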

How Benchmark Defects Distort Performance Metrics

Invalid benchmarks can make ordinary performance measures misleading:

Alpha can look positive because the benchmark is too easy.
Tracking error can look high because the benchmark has the wrong risk exposures.
Information ratio can look low or high for reasons unrelated to manager skill.
Style analysis can misdiagnose the manager if the benchmark does not match the mandate.

For example, a short-duration credit manager compared to a long-duration aggregate benchmark may appear to underperform when interest rates fall because the benchmark benefits from longer duration. That result may say more about benchmark mismatch than manager weakness.
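The size of that mismatch can be approximated with the standard first-order duration relationship, in which the percentage price change is roughly the negative of modified duration times the change in yield. The durations and the one-percentage-point yield decline below are hypothetical, and the sketch ignores convexity, carry, and spread changes.

```python
def price_change_pct(modified_duration: float, yield_change_pct: float) -> float:
    """First-order price change (%) for a parallel yield shift given in percentage points."""
    return round(-modified_duration * yield_change_pct, 4)


# Hypothetical durations; yields fall by one percentage point.
manager = price_change_pct(2.0, -1.0)     # short-duration credit portfolio
benchmark = price_change_pct(6.0, -1.0)   # long-duration aggregate index
gap = round(manager - benchmark, 4)

print(f"Manager: {manager:+.1f}%  Benchmark: {benchmark:+.1f}%  Gap: {gap:+.1f} points")
```

Under these assumptions the manager trails by about four percentage points from duration alone, before any security selection is considered.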

CFA Exam Framing

CFA questions often test benchmark validity through facts rather than definitions. Slow down when the vignette gives:

a benchmark chosen after the evaluation period,
a vague phrase like "market benchmark,"
a custom blend without weights,
a benchmark that violates the mandate's constraints,
an index the manager could not reasonably invest in,
or a manager who refuses accountability for the stated benchmark.

The safest exam workflow is:

1. Identify the mandate.
2. Identify the proposed benchmark.
3. Test timing, clarity, measurability, investability, mandate fit, and accountability.
4. Interpret performance metrics only after the benchmark passes those tests.
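The workflow above can be sketched as a checklist that gates metric interpretation. The class and field names here are illustrative assumptions, not official terminology; each field records the answer a vignette gives for one of the six tests.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkFacts:
    set_before_period: bool
    clearly_specified: bool
    measurable: bool
    investable: bool
    fits_mandate: bool
    accepted_by_parties: bool


# (field name, test label), in the order the workflow lists the tests.
TESTS = [
    ("set_before_period", "timing"),
    ("clearly_specified", "clarity"),
    ("measurable", "measurability"),
    ("investable", "investability"),
    ("fits_mandate", "mandate fit"),
    ("accepted_by_parties", "accountability"),
]


def failed_tests(facts: BenchmarkFacts) -> list[str]:
    """Return the labels of every test the benchmark fails."""
    return [label for field, label in TESTS if not getattr(facts, field)]


def can_interpret_metrics(facts: BenchmarkFacts) -> bool:
    """Alpha and tracking error are meaningful only if no test fails."""
    return not failed_tests(facts)


# Hypothetical vignette: a custom blend named up front but without weights.
blend_without_weights = BenchmarkFacts(
    set_before_period=True,
    clearly_specified=False,
    measurable=False,
    investable=True,
    fits_mandate=True,
    accepted_by_parties=True,
)

print(failed_tests(blend_without_weights))
print(can_interpret_metrics(blend_without_weights))
```

Listing every failed test, rather than stopping at the first, matches how vignettes are often written: a single benchmark can have more than one defect.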

When the benchmark fails, do not overinterpret alpha. The right answer is often that the performance comparison itself is unreliable.
