Why Two Measurement Periods Change Everything

A single measurement tells you where you are. Two measurements — separated by meaningful time and targeted intervention — tell you whether anything actually changed, by how much, and why. This distinction is not semantic. It is the difference between evidence and assumption, between proof and belief. It is the scientific foundation of PDA's two-phase design.

The Baseline Problem: You Cannot Measure Change Without a Starting Point

This statement sounds obvious. Yet the vast majority of learning and development programmes, coaching engagements, and organisational interventions are designed, delivered, and evaluated without one. A workshop is run. Participants fill out a satisfaction survey. The facilitator receives positive feedback. The organisation concludes the programme was successful. None of this constitutes measurement of change — because without a baseline, there is no reference point against which to compare the post-intervention state.

The problem is not that organisations do not care about outcomes. Most do. The problem is that establishing a valid baseline requires investment — in time, in methodology, and in the discipline to measure before the intervention rather than only after. Historically, the tools required to do this rigorously were expensive, slow, and specialist. The result: the L&D industry built an entire culture of programme evaluation that measures satisfaction (easy) rather than behavioural change (hard).

89% of L&D leaders cannot prove programme ROI to their board.

The primary reason is the absence of pre-measurement. Without a Phase 1 baseline, Level 3 and Level 4 evaluation (Kirkpatrick) are structurally impossible, regardless of how well the programme is designed or delivered.

HBR / Deloitte Human Capital Trends Report; see also Phillips, J.J. (1997)

60 Years of Evaluation Science: The Pre-Post Research Design

In 1963, Donald Campbell and Julian Stanley published Experimental and Quasi-Experimental Designs for Research, a monograph that became a foundational text of evaluation methodology. Campbell and Stanley established a hierarchy of research designs based on their ability to produce valid causal inferences: to determine not just whether something changed, but whether the intervention caused it to change.

At the top of their hierarchy sat the true experimental design: random assignment, control group, pre-measurement, post-measurement. Below that sat a series of quasi-experimental designs for real-world settings where true randomisation is impossible. At the bottom, explicitly identified as incapable of supporting causal inference, sat the post-test-only design without a control group, which Campbell and Stanley called the one-shot case study: measuring outcomes after an intervention with no pre-measurement baseline.

The post-test-only design is, structurally, what most L&D evaluation looks like today. Participants are assessed after the programme. Scores are interpreted as evidence of programme impact. Campbell and Stanley demonstrated, more than 60 years ago, why this inference is invalid: without a baseline, the post-programme score could reflect pre-existing capability, natural maturation, historical events, or any number of factors unrelated to the programme itself.

"The fundamental problem of programme evaluation is not measurement — it is the absence of a counterfactual. Without knowing where participants started, we cannot know where the programme took them."

Campbell, D.T. & Stanley, J.C. (1963), Experimental and Quasi-Experimental Designs for Research

The Kirkpatrick-Phillips Framework: Why Levels 3 and 4 Require Two Points in Time

Donald Kirkpatrick's four-level evaluation model, first published in 1959, remains the most widely used framework for L&D evaluation in the world. The four levels measure reaction (participant satisfaction), learning (knowledge acquisition), behaviour (transfer to the workplace), and results (organisational impact).

Levels 1 and 2 can be measured at a single point in time, immediately after a programme. Levels 3 and 4 cannot — by definition. Behaviour change requires time to manifest and requires a comparison point. Results require a reference against which to measure improvement. Jack Phillips' later ROI methodology, which adds a fifth level (return on investment), requires the same pre-post architecture as Kirkpatrick's Levels 3 and 4.
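As a rough illustration of the fifth level, Phillips' ROI calculation divides net programme benefits by programme costs. The sketch below assumes the monetary benefit has already been derived from a measured pre-post change; the function name and the figures are hypothetical and are not taken from any PDA calculation.

```python
def phillips_roi(monetary_benefits: float, programme_costs: float) -> dict:
    """Return the benefit-cost ratio and ROI percentage for a programme."""
    if programme_costs <= 0:
        raise ValueError("programme_costs must be positive")
    net_benefits = monetary_benefits - programme_costs
    return {
        # Benefit-cost ratio: total benefits per unit of cost
        "bcr": monetary_benefits / programme_costs,
        # ROI (%): net benefits per unit of cost, expressed as a percentage
        "roi_percent": net_benefits / programme_costs * 100,
    }


# Hypothetical figures for illustration only
result = phillips_roi(monetary_benefits=150_000, programme_costs=100_000)
print(result)  # {'bcr': 1.5, 'roi_percent': 50.0}
```

The arithmetic is trivial; the difficulty lies in the benefits figure, which can only be defended if it rests on a change measured against a baseline.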

Phase 1 (IMPACT): Baseline Measurement. Establishes the quantified starting point across all diagnostic dimensions. Equivalent to Kirkpatrick Level 2 pre-assessment.

↓ 4–6 months

Phase 2 (DELTA): Change Measurement. Quantifies the delta against the Phase 1 baseline. Enables Kirkpatrick Level 3–4 and Phillips ROI calculation.

The PDA two-phase design is the structural implementation of this 60-year-old evidence base. Phase 1 (IMPACT) establishes the baseline — the quantified starting point across all diagnostic dimensions. Phase 2 (DELTA) measures the same dimensions after targeted interventions and calculates the statistical difference. The result is what Kirkpatrick Levels 3 and 4 require: a valid, time-separated, comparable measurement of change.
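To make the idea of a delta concrete, here is a minimal sketch of a paired pre-post comparison for one dimension. It assumes each respondent has one score in each phase and that a higher score is better. The function name, the sample scores, and the choice of the baseline standard deviation as the scaling unit are illustrative assumptions, not a description of how PDA computes its results.

```python
from statistics import mean, stdev


def phase_delta(phase1: list[float], phase2: list[float]) -> dict:
    """Summarise change between paired Phase 1 and Phase 2 scores.

    Scores are paired by position: phase1[i] and phase2[i] must belong
    to the same respondent.
    """
    if len(phase1) != len(phase2) or len(phase1) < 2:
        raise ValueError("need at least two paired scores")
    baseline_mean = mean(phase1)
    baseline_sd = stdev(phase1)
    if baseline_mean == 0 or baseline_sd == 0:
        raise ValueError("baseline mean and spread must be non-zero")
    # Within-person differences: each respondent is their own reference point
    diffs = [after - before for before, after in zip(phase1, phase2)]
    mean_change = mean(diffs)
    return {
        "mean_change": mean_change,
        # Change relative to where the group started
        "percent_change": mean_change / baseline_mean * 100,
        # Change expressed in baseline standard deviations
        "standardised_change": mean_change / baseline_sd,
    }


# Hypothetical scores for five respondents on one dimension
baseline = [4.0, 5.0, 6.0, 5.0, 5.0]
followup = [5.0, 6.0, 6.0, 6.0, 7.0]
summary = phase_delta(baseline, followup)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

None of these three figures can be computed from the follow-up scores alone; each one depends on the Phase 1 values.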

The Regression to the Mean Problem

There is a second, less widely understood reason why single-point measurement misleads: regression to the mean. First identified by Francis Galton in 1886 and formalised in statistical theory, regression to the mean describes the tendency for extreme measurements to move toward the average on subsequent measurement, regardless of any intervention.

In practical terms: if a team is in a period of acute stress when measured, their scores on stress indicators will be high. If they are measured again six months later — with or without any intervention — their scores will typically be lower, simply because acute stress states are not sustained indefinitely. A coach who intervenes after a high-stress measurement and reports improvement six months later may be observing natural regression to the mean rather than the effect of their coaching.

The only way to distinguish real intervention effects from natural regression to the mean is to compare the rate of change against a validated norm, or to use a within-subject design that tracks the pattern of change over multiple time points. PDA's two-phase methodology addresses this by establishing a population-adjusted baseline in Phase 1 and measuring DELTA change against that adjusted reference in Phase 2 — separating the signal of genuine improvement from the noise of natural fluctuation.
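Regression to the mean is easy to demonstrate with a small simulation. The sketch below is a toy model with assumed parameters (2,000 hypothetical teams, a stable underlying stress level, and independent measurement noise on each occasion). No intervention is modelled at all, yet the teams that scored highest at the first measurement score lower, on average, at the second.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable


def simulate_team(true_level: float, noise_sd: float) -> tuple[float, float]:
    """Measure the same unchanged true level twice, with independent noise."""
    first = true_level + random.gauss(0, noise_sd)
    second = true_level + random.gauss(0, noise_sd)
    return first, second


# Hypothetical population: true stress levels centred on 50, no intervention
teams = [simulate_team(random.gauss(50, 10), noise_sd=10) for _ in range(2000)]

# Select the 10% of teams with the highest first measurement
teams.sort(key=lambda pair: pair[0], reverse=True)
extreme = teams[:200]

overall_mean = mean(first for first, _ in teams)
first_mean = mean(first for first, _ in extreme)
second_mean = mean(second for _, second in extreme)

print(f"Population mean at first measurement: {overall_mean:.1f}")
print(f"High-stress group, first measurement:  {first_mean:.1f}")
print(f"High-stress group, second measurement: {second_mean:.1f}")
```

The drop in the selected group comes entirely from measurement noise combined with selection on an extreme score. This is why an observed improvement needs to be compared against a norm or an expected rate of change before it is credited to an intervention.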

The 4–6 Month Window: Why Timing Matters

The 4–6 month interval between Phase 1 and Phase 2 is not arbitrary. It reflects the convergence of several independent lines of research on the timescales of meaningful behavioural and organisational change.

Bandura's self-efficacy research (1977, 1997) demonstrates that sustained behavioural change requires repeated performance experiences — a process that typically unfolds over 3–6 months in organisational settings.
Edmondson's team learning research (1999, 2018) shows that psychological safety — one of the most important dimensions PDA measures — changes through sustained leadership behaviour over time, with meaningful shifts typically observable after 3–4 months of consistent intervention.
Too short an interval (less than 3 months) means interventions have not had sufficient time to produce observable behavioural change. Phase 2 data then reflects immediate reactions to the programme rather than lasting change in the participants.
Too long an interval (more than 9 months) introduces confounding variables — organisational changes, team composition shifts, external events — that make it difficult to attribute observed changes to the intervention.

The 4–6 month window optimises for intervention maturation while minimising confounding. It is long enough for real change to manifest and short enough to maintain causal attribution to the programme.

What One-Phase Programmes Cannot Prove

The consequences of the absence of Phase 2 measurement are not theoretical. They are experienced daily by coaches and HR professionals who deliver excellent programmes but cannot defend their value when the renewal conversation comes. Without Phase 2 data, the following questions are unanswerable with evidence:

Did engagement improve — and by how much compared to where we started?
Did the leadership perception gap narrow as a result of the coaching programme?
Which of the priority dimensions identified in Phase 1 actually responded to the intervention?
What was the measurable return on the organisation's investment in this programme?
Should we run this programme again — and if so, which components drove the most change?

These are not supplementary questions. They are the questions that CFOs ask, that boards ask, and that procurement departments ask when a programme is up for renewal. Without Phase 2 data, the answer to all of them is: "We believe the programme was effective." With Phase 2 data, the answer is: "Stress indicators fell by 41%. Engagement rose by 28%. Leadership perception scores improved by 0.8 standard deviations. Here is the board-ready evidence."

The Competitive Advantage of Measurement

For organisations and for the coaches and consultants who serve them, the two-phase methodology is ultimately about competitive advantage. Organisations that can measure the ROI of their people investment make better decisions about where to invest next. Coaches and consultants who can demonstrate measurable impact win renewals, referrals, and the ability to command premium fees.

The irony is that most barriers to measurement are not technical — they are structural. The tools required to establish a rigorous pre-post measurement framework at the team level have historically been expensive, complex, and specialist. PDA Platform removes these barriers, making board-ready two-phase measurement accessible to any team, at any scale, delivered by any qualified coach or HR professional.

The science behind why this matters has been established for 60 years. The only thing that was missing was the means to act on it.

Scientific References

Campbell, D.T. & Stanley, J.C. (1963). Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin.
Kirkpatrick, D.L. (1959). Techniques for evaluating training programs. Journal of the American Society of Training Directors, 13, 3–9.
Phillips, J.J. (1997). Return on Investment in Training and Performance Improvement Programs. Butterworth-Heinemann.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215.
Bandura, A. (1997). Self-Efficacy: The Exercise of Control. Freeman.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246–263.
Edmondson, A. (1999). Psychological safety and learning behavior in work teams. Administrative Science Quarterly, 44(2), 350–383.
Edmondson, A. (2018). The Fearless Organization. Wiley.
Deloitte (2024). Global Human Capital Trends. Deloitte Insights.
