Learn how to use the chi-square test to measure training impact on employee performance. Step-by-step HR analytics guide with a worked example.
You invested six figures in a leadership development program last quarter. Completion rates look healthy. Managers are saying good things. But when the CFO asks whether the training actually moved the needle on performance, you need more than anecdotes. You need a statistically defensible answer.
The chi-square test of independence is one of the most accessible yet powerful tools for answering that question. It tells you whether two categorical variables --- like training completion status and performance rating --- are genuinely related or just appear that way by chance. No regression models, no coding bootcamp required. If you can build a pivot table, you can run a chi-square test.
This guide walks you through exactly how to do it, with a real-world HR example you can adapt to your own data today.
Most L&D teams report on activity metrics: enrollment numbers, completion rates, satisfaction scores. These metrics confirm that training happened. They do not confirm that training worked.
The gap between "completed" and "effective" is where organizations waste budget. A Brandon Hall Group study found that only 8% of organizations can demonstrate a clear business impact from their learning programs. The rest rely on Kirkpatrick Level 1 (reaction) data --- essentially, whether people enjoyed the workshop.
The chi-square test bridges that gap by testing whether employees who completed training are statistically more likely to achieve higher performance outcomes. It converts a hopeful correlation into a defensible conclusion.
The chi-square test of independence examines whether two categorical variables are associated. In HR terms, it answers questions like:

- Are employees who complete training more likely to receive higher performance ratings?
- Is participation in an onboarding program related to first-year retention?
- Does mentorship program enrollment relate to promotion outcomes?
The test compares what you actually observe in your data against what you would expect to see if the two variables were completely unrelated. If the gap between observed and expected values is large enough, you can conclude --- with statistical confidence --- that a real relationship exists.
Before running the test, confirm these conditions hold:

- Both variables are categorical (e.g., completed / not completed; exceeds / meets / below).
- Observations are independent, and each employee appears in exactly one cell of the table.
- The data are raw counts, not percentages or averages.
- Expected frequencies are at least 5 in most cells (see the sample size guidance below).
Suppose you rolled out a new data literacy training program across your organization. Six months later, you want to know: are employees who completed the training more likely to receive higher performance ratings?
You pull data from your HRIS and learning management system, cross-referencing training completion status with the most recent performance review cycle. Here is what you find across 300 employees:
| | Exceeds Expectations | Meets Expectations | Below Expectations | Row Total |
|---|---|---|---|---|
| Training Completed | 60 | 90 | 30 | 180 |
| Training Not Completed | 20 | 55 | 45 | 120 |
| Column Total | 80 | 145 | 75 | 300 |
At first glance, 33% of trained employees exceed expectations compared to only 17% of untrained employees. But is this difference statistically significant, or could it be random variation?
For each cell, the expected frequency equals (Row Total x Column Total) / Grand Total. This represents what you would see if training and performance were completely independent.
| | Exceeds Expectations | Meets Expectations | Below Expectations |
|---|---|---|---|
| Training Completed | (180 x 80) / 300 = 48.0 | (180 x 145) / 300 = 87.0 | (180 x 75) / 300 = 45.0 |
| Training Not Completed | (120 x 80) / 300 = 32.0 | (120 x 145) / 300 = 58.0 | (120 x 75) / 300 = 30.0 |
All expected frequencies exceed 5, so the chi-square assumptions are satisfied.
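If your contingency table comes from a spreadsheet export, the expected frequencies can also be computed programmatically. A minimal sketch in Python with NumPy, using the observed counts from the table above:

```python
import numpy as np

# Observed counts from the 2x3 contingency table above
# (rows: completed / not completed; columns: exceeds / meets / below)
observed = np.array([[60, 90, 30],
                     [20, 55, 45]])

# Expected frequency per cell: (row total x column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)   # [[180], [120]]
col_totals = observed.sum(axis=0, keepdims=True)   # [[80, 145, 75]]
expected = row_totals * col_totals / observed.sum()

print(expected)
# [[48. 87. 45.]
#  [32. 58. 30.]]

# Check the chi-square assumption: all expected frequencies >= 5
assert (expected >= 5).all()
```

The broadcasting trick (a column of row totals times a row of column totals) reproduces the hand calculation for every cell at once.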
The formula sums the squared differences between observed (O) and expected (E) values, divided by the expected value, across all cells:
χ² = Σ [(O - E)² / E]
Calculating each cell:

- Completed / Exceeds: (60 - 48)² / 48 = 3.000
- Completed / Meets: (90 - 87)² / 87 = 0.103
- Completed / Below: (30 - 45)² / 45 = 5.000
- Not Completed / Exceeds: (20 - 32)² / 32 = 4.500
- Not Completed / Meets: (55 - 58)² / 58 = 0.155
- Not Completed / Below: (45 - 30)² / 30 = 7.500

χ² = 3.000 + 0.103 + 5.000 + 4.500 + 0.155 + 7.500 = 20.258
The degrees of freedom equal (rows - 1) x (columns - 1) = (2 - 1) x (3 - 1) = 2.
At a significance level of 0.05, the critical chi-square value for 2 degrees of freedom is 5.991.
Since 20.258 is far greater than 5.991, you reject the null hypothesis. There is a statistically significant relationship between training completion and performance rating. The p-value here is less than 0.001, meaning there is less than a 0.1% probability of observing a difference this large if training and performance were truly unrelated.
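If you would rather not compute the statistic by hand, SciPy's `chi2_contingency` reproduces the entire calculation in a few lines. A sketch using the same observed counts (assumes `numpy` and `scipy` are installed):

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[60, 90, 30],    # training completed
                     [20, 55, 45]])   # training not completed

# Returns the statistic, p-value, degrees of freedom, and expected table
stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p_value:.6f}")
# chi-square = 20.259, dof = 2, p = 0.000040

# Critical value at the 0.05 significance level
critical = chi2.ppf(0.95, dof)   # 5.991
print("reject H0" if stat > critical else "fail to reject H0")
# reject H0
```

Note that `chi2_contingency` applies the Yates continuity correction only for 2x2 tables, so for this 2x3 table it matches the manual calculation exactly.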
The numbers tell a clear story. Employees who completed the data literacy training were significantly more likely to exceed performance expectations and significantly less likely to fall below expectations. Specifically:

- 33% of trained employees (60 of 180) exceeded expectations, versus 17% of untrained employees (20 of 120).
- Only 17% of trained employees (30 of 180) fell below expectations, versus 38% of untrained employees (45 of 120).
This does not prove causation on its own. Employees who voluntarily complete training may be more motivated to begin with. But combined with other evidence --- pre/post assessments, manager observations, controlled rollout designs --- the chi-square result provides a strong quantitative foundation for continued investment in the program.
Start with clean data. The most common failure point is not the statistics --- it is messy data. Ensure training completion records are accurate and performance ratings are standardized across departments. If your learning management system and performance data live in separate tools, invest the time to match records properly.
Choose meaningful categories. Collapsing a 5-point performance scale into 3 categories (Exceeds / Meets / Below) often produces cleaner results and satisfies the minimum expected frequency requirement. Avoid categories with very few observations.
Report effect size, not just significance. A statistically significant result with a tiny effect size may not justify the training investment. Cramer's V is the standard effect size measure for chi-square tests: V = sqrt(χ² / (n x min(r - 1, c - 1))), where n is the sample size and r and c are the number of rows and columns. For this example, V = sqrt(20.258 / (300 x 1)) = 0.26, indicating a moderate association.
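As a sketch, Cramer's V falls out directly from the chi-square statistic; for the 2x3 table above, min(r - 1, c - 1) = 1 (assumes `numpy` and `scipy` are installed):

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[60, 90, 30],
                     [20, 55, 45]])

chi2_stat = chi2_contingency(observed)[0]
n = observed.sum()                    # 300 observations
min_dim = min(observed.shape) - 1     # min(rows, cols) - 1 = 1

# Cramer's V: effect size on a 0-to-1 scale
cramers_v = np.sqrt(chi2_stat / (n * min_dim))
print(round(cramers_v, 2))   # 0.26
```

Recent SciPy versions (1.7+) also expose this as `scipy.stats.contingency.association(observed, method="cramer")`, which returns the same value.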
Automate recurring analyses. If you run this test every review cycle, build it into your analytics workflow so results refresh automatically. This transforms a one-time study into an ongoing measurement system.
The chi-square test is ideal for categorical data, but some training impact questions require different approaches. If your outcome variable is continuous (like a numeric assessment score rather than a rating category), consider an independent samples t-test or ANOVA. If you need to control for confounding variables like tenure or department, logistic regression may be more appropriate.
The chi-square test is your starting point --- the first credible answer you can bring to a stakeholder meeting. As your analytics maturity grows, you can layer on more sophisticated methods.
There is no single minimum sample size, but a practical guideline is that at least 80% of expected cell frequencies should be 5 or greater, and no expected frequency should be less than 1. For a 2x3 contingency table like the example above, this typically means you need at least 60 to 100 observations. If your sample is smaller, use Fisher's exact test instead.
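SciPy's `fisher_exact` covers the 2x2 case when samples are small. The counts below are a hypothetical small pilot, not data from the example above:

```python
from scipy.stats import fisher_exact

# Hypothetical pilot cohort: rows = completed / not completed,
# columns = exceeds expectations / does not exceed
table = [[8, 4],
         [2, 10]]

# Exact test: no minimum expected-frequency requirement
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.4f}")
```

Note that `fisher_exact` handles only 2x2 tables; for a sparse 2x3 table like the running example, collapse the rating scale into two categories first.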
No. The chi-square test identifies a statistically significant association between two variables, but it does not establish causation. Employees who complete training may differ from those who do not in motivation, tenure, or role type. To strengthen causal claims, combine chi-square results with controlled study designs, pre/post assessments, and multivariate analysis that accounts for confounding variables.
Comparing raw percentages (e.g., "33% of trained employees exceeded expectations vs. 17% of untrained employees") is descriptive, not inferential. It tells you what happened in your sample but not whether the difference is large enough to be meaningful beyond random variation. The chi-square test adds statistical rigor by calculating the probability that the observed difference could have occurred by chance alone. This is the difference between an observation and an evidence-based conclusion.
You can run a chi-square test in Excel (using the CHISQ.TEST function), Google Sheets, Python (scipy.stats.chi2_contingency), R (chisq.test), or any modern analytics platform. The key requirement is a clean contingency table with accurate training and performance data. Platforms like PeoplePilot Analytics can automate the data preparation by connecting your LMS and performance review data in one place, eliminating the manual data matching that often introduces errors.