Analyze first-90-day attrition with Kaplan-Meier curves and logistic regression. Identify onboarding risk factors and build early intervention programs.
A new hire resigns at day 47. Another ghosts at day 23. A third makes it to day 82, apologizes profusely, and explains the role was "not what was described." Each departure represents a total loss — recruiting costs unrecovered, onboarding hours wasted, team morale dented, and the position back at square one.
First-90-day attrition, often called infant attrition, is one of the most expensive and preventable talent problems in any organization. These employees never reached full productivity. The investment in them returned nothing. And the causes are almost always rooted in fixable onboarding and expectation-setting failures.
The challenge is that infant attrition does not follow the same patterns as general attrition. An employee who leaves at day 30 is driven by different factors than one who leaves after three years. Standard attrition models miss this nuance. Survival analysis and logistic regression, applied specifically to the first 90 days, give you the tools to understand when employees are most at risk of leaving early, what drives those departures, and where to intervene.
General attrition models are built on years of tenure data and capture long-term drivers like career stagnation, compensation drift, and manager burnout. Infant attrition is driven by immediate experience factors: role clarity, onboarding quality, manager accessibility, peer integration, and the gap between what was promised during recruitment and what was delivered on day one.
The Society for Human Resource Management estimates that replacing an employee costs 50-200% of their annual salary. For infant attrition the economics are even worse: you spend the full replacement cost but receive near-zero productive output in return. When infant attrition rates climb above 10-15%, the compounding cost of repeated recruiting and onboarding cycles for the same position becomes a material budget drain.
High infant attrition is a leading indicator of systemic problems in your talent acquisition and onboarding pipeline. It tells you that something is broken between the offer letter and the 90-day mark — and fixing it has upstream benefits for employer brand, recruiter credibility, and team stability.
Survival analysis is a family of statistical methods originally developed for medical research — measuring time until an event occurs (disease progression, treatment response). In HR, the "event" is voluntary termination, and the "time" is days from hire date.
Simple attrition rate calculations (number who left / total hired) throw away timing information. Knowing that 12% of new hires left in the first 90 days is useful. Knowing that 7% left in the first 30 days and another 5% left between days 60-90 is far more actionable because it reveals when the risk is highest and suggests different root causes for different windows.
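That window breakdown is a one-liner once tenure is recorded in days. A minimal sketch with hypothetical data (pandas assumed; the `tenure_days` and `event` column names match the code later in this article):

```python
import pandas as pd

# Hypothetical new-hire records: tenure in days and whether they left (1) or stayed (0)
df = pd.DataFrame({
    'tenure_days': [12, 25, 40, 45, 70, 85, 90, 90, 90, 90],
    'event':       [1,  1,  1,  0,  1,  1,  0,  0,  0,  0],
})

# Bucket departures into the windows discussed above
leavers = df[df['event'] == 1]
windows = pd.cut(leavers['tenure_days'], bins=[0, 30, 60, 90],
                 labels=['days 1-30', 'days 31-60', 'days 61-90'])

# Share of all hires lost in each window
print(windows.value_counts().sort_index() / len(df))
```

Different root causes tend to live in different buckets, so this simple table is often the first diagnostic worth running.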
Survival analysis handles a problem that standard models struggle with: censored observations. If an employee was hired 45 days ago and is still employed, they have not yet had the opportunity to leave at day 90. You cannot count them as "survived" because their clock is still running. Survival analysis accounts for these incomplete observations naturally, giving you unbiased estimates even with staggered hire dates.
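Constructing those censored observations from raw dates is straightforward. A sketch, assuming hire and termination dates from your HRIS (column names and the cutoff date are illustrative):

```python
import pandas as pd

cutoff = pd.Timestamp('2024-06-01')  # analysis date (example)

hires = pd.DataFrame({
    'hire_date': pd.to_datetime(['2024-01-15', '2024-03-01', '2024-05-10']),
    'term_date': pd.to_datetime(['2024-02-20', None, None]),  # NaT = still employed
})

# Clock runs from hire to termination, or to the cutoff if still employed
end = hires['term_date'].fillna(cutoff)
hires['tenure_days'] = (end - hires['hire_date']).dt.days
# event = 1 if they actually left; 0 marks a censored (still-running) observation
hires['event'] = hires['term_date'].notna().astype(int)
print(hires[['tenure_days', 'event']])
```

The hire from May 10 contributes only 22 days of observed tenure; censoring lets the estimator use that partial information instead of discarding or miscounting it.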
The Kaplan-Meier estimator calculates the probability of surviving (remaining employed) past each time point, accounting for censoring.
For each new hire in your analysis window (the past 12-24 months), record their tenure in days and an event flag: 1 if they left voluntarily, 0 if they are still employed (censored). With those two columns in a DataFrame, fitting the estimator takes a few lines:
```python
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

kmf = KaplanMeierFitter()
kmf.fit(
    durations=df['tenure_days'],    # days from hire to exit, or to today if still employed
    event_observed=df['event'],     # 1 = left, 0 = censored (still employed)
    label='All New Hires'
)

kmf.plot_survival_function()
plt.xlabel('Days Since Hire')
plt.ylabel('Survival Probability')
plt.title('New Hire Survival: First 90 Days')
plt.show()
```
A steep early drop indicates a critical vulnerability window. If the curve drops sharply between days 5-15, your first-week experience is failing. A gradual decline through day 90 suggests a slower disillusionment pattern. A flat section followed by a late drop (days 60-90) may indicate that employees are waiting to receive a paycheck cycle or complete a benefits vesting period before leaving.
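The estimator behind these curves is simple enough to compute by hand, which makes it easy to explain to stakeholders. A pure-Python sketch with hypothetical durations, showing how censored hires leave the risk set without counting as departures:

```python
# Kaplan-Meier by hand: S(t) = product over event times of (1 - d_t / n_t)
# durations: days until exit (or until the cutoff for censored hires)
# events:    1 = left, 0 = censored (still employed)
durations = [10, 20, 20, 35, 50, 80]
events    = [1,  1,  0,  1,  0,  1]

at_risk = len(durations)   # n_t: hires still employed and under observation
survival = 1.0
curve = {}
for t in sorted(set(durations)):
    d = sum(1 for dur, e in zip(durations, events) if dur == t and e == 1)
    if d:
        survival *= 1 - d / at_risk
        curve[t] = survival
    # everyone whose clock stops at t (event or censored) exits the risk set
    at_risk -= sum(1 for dur in durations if dur == t)

print(curve)
```

Note how the censored hire at day 20 shrinks the risk set for later time points but never reduces the survival probability itself.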
The real insight comes from comparing survival curves across groups:
```python
ax = plt.gca()
for department in df['department'].unique():
    mask = df['department'] == department
    kmf.fit(df.loc[mask, 'tenure_days'], df.loc[mask, 'event'], label=department)
    kmf.plot_survival_function(ax=ax)  # overlay each department on one axes
plt.title('Survival by Department')
plt.show()
```
If the Engineering survival curve stays flat at 95% while Sales drops to 70% by day 60, you have a department-specific onboarding problem, not an organization-wide one. This prevents you from implementing blanket solutions when targeted fixes are needed.
To determine whether differences between survival curves are statistically significant (not just visual noise), apply the log-rank test:
```python
from lifelines.statistics import logrank_test

group_a = df[df['department'] == 'Engineering']
group_b = df[df['department'] == 'Sales']

result = logrank_test(
    group_a['tenure_days'], group_b['tenure_days'],
    event_observed_A=group_a['event'], event_observed_B=group_b['event']
)
print(f'Test statistic: {result.test_statistic:.3f}')
print(f'P-value: {result.p_value:.4f}')
```
A p-value below 0.05 indicates the survival difference between groups is unlikely to be chance alone and warrants investigation.
Survival curves tell you when and where infant attrition happens. Logistic regression tells you why. By modeling the binary outcome (left within 90 days vs. stayed) against onboarding and hire-context variables, you identify which factors most strongly predict early departure.
The features that predict infant attrition differ from those in general attrition models. Focus on:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# class_weight='balanced' compensates for early leavers being the minority class
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)

# Exponentiated coefficients are odds ratios: >1 raises early-exit odds, <1 lowers them
odds_ratios = np.exp(model.coef_[0])
for feature, odds in zip(X.columns, odds_ratios):
    print(f'{feature}: {odds:.2f}')
```
Typical findings from infant attrition models include:
These odds ratios give you a prioritized list of where to invest in onboarding improvements.
Apply your logistic regression model to every incoming new hire to generate a risk score before they even start. A hire with no buddy assigned, a job-board source, and a manager with a history of early departures might score in the top risk quartile — flagging them for enhanced onboarding support.
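Scoring new hires is a call to `predict_proba` on the trained model. A self-contained sketch with a synthetic training set (the three binary features echo the examples above and are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training features: [no_buddy, job_board_source, mgr_prior_early_exits]
X_train = rng.integers(0, 2, size=(200, 3)).astype(float)
# Toy outcome: each risk flag raises the chance of leaving early (illustrative only)
logits = -2 + 1.2 * X_train[:, 0] + 0.8 * X_train[:, 1] + 1.0 * X_train[:, 2]
y_train = rng.random(200) < 1 / (1 + np.exp(-logits))

model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)

# Score incoming hires before day one; route high scorers to enhanced onboarding
X_new = np.array([[1, 1, 1],   # all three risk flags set
                  [0, 0, 0]], dtype=float)
risk = model.predict_proba(X_new)[:, 1]
print(risk)
```

In practice you would rank the incoming cohort by score and flag the top quartile, rather than applying a fixed probability threshold.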
Based on your Kaplan-Meier curves, design interventions timed to the highest-risk periods:
Days 1-7 (First impression window):
Days 8-30 (Integration window):
Days 31-90 (Confirmation window):
If your model identifies manager behavior as a top predictor of infant attrition, build manager-level metrics into your people dashboard. Track infant attrition rate by manager, flag managers with rates above the organizational average, and provide targeted coaching or management development programs for those who need it.
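Manager-level flagging is a groupby away. A sketch with hypothetical data; the minimum-hires filter keeps managers with one or two data points from being flagged on noise:

```python
import pandas as pd

# Hypothetical hires with their manager and 90-day outcome (1 = left within 90 days)
df = pd.DataFrame({
    'manager': ['Kim', 'Kim', 'Kim', 'Ortiz', 'Ortiz', 'Ortiz', 'Ortiz'],
    'left_90': [1, 1, 0, 0, 0, 1, 0],
})

rates = df.groupby('manager')['left_90'].agg(['mean', 'count'])
org_rate = df['left_90'].mean()

# Flag managers above the organizational average, with enough hires to judge
flagged = rates[(rates['mean'] > org_rate) & (rates['count'] >= 3)]
print(flagged)
```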
Do not wait for the 90-day resignation to measure success. Monitor leading indicators that predict whether interventions are working:
Retrain your logistic regression model quarterly as new hire and attrition data accumulates. As you improve onboarding, the risk factors will shift — buddy assignment may drop off as a predictor because every hire now gets one, and new factors (like onboarding content relevance) may emerge.
Infant attrition rates vary by industry: 20-30% in hospitality and retail, 10-15% in professional services, 5-10% in technology. Know your benchmark and set improvement targets relative to your industry, not an arbitrary number.
Conducting survival analysis and logistic regression on infant attrition requires clean data flowing from your ATS (hiring source, time-to-fill, recruiter) through onboarding (buddy assignment, training completion) to HRIS (termination dates, tenure).
PeoplePilot Analytics connects these data sources into a unified pipeline, so you can run Kaplan-Meier curves and risk models without spending weeks on data wrangling. PeoplePilot Surveys feeds real-time onboarding pulse data directly into your model, making risk scores dynamic rather than static. And PeoplePilot Learning ensures that the skill-development and role-clarity interventions your model recommends are delivered consistently to every new hire.
The goal is straightforward: every employee you hire should still be contributing at day 91 and beyond. The data tells you what stands in the way. The interventions remove the obstacles.
Aim for at least 200 new hires with a minimum of 30-40 who experienced infant attrition (the event). Survival analysis handles small event counts better than many other methods because of how it processes censored data, but below 30 events, your confidence intervals will be wide and group comparisons unreliable. If your organization hires fewer than 200 people annually, aggregate data over two to three years.
Yes. Cox proportional hazards regression is a survival analysis method that functions like logistic regression — it estimates the effect of covariates on the hazard (risk) of the event occurring at each time point. It gives you both timing and driver information in a single model. Use Kaplan-Meier for visualization and communication, Cox regression for multivariate driver analysis, and logistic regression for simple risk scoring.
Build both. An organization-wide model reveals general onboarding risk factors. A department-specific model captures unique dynamics — perhaps the Sales onboarding program is too long, or Engineering managers are too busy to meet new hires. The department model will have fewer data points, so use it for directional insights and validate with qualitative follow-up rather than relying solely on statistical significance.
Look at the timing and drivers together. If departures cluster in the first two weeks and correlate with "role mismatch" survey responses and job-board sourcing, the problem is upstream in your hiring process — better job descriptions, realistic previews, and structured interviews would help. If departures cluster at days 30-60 and correlate with manager unavailability and lack of training, the problem is onboarding execution. Often, it is a combination of both, and your model will reveal the relative contribution of each.