Analyze first-90-day attrition with Kaplan-Meier curves and logistic regression. Identify onboarding risk factors and build early intervention programs.
A new hire resigns at day 47. Another ghosts at day 23. A third makes it to day 82, apologizes profusely, and explains the role was "not what was described." Each departure represents a total loss — recruiting costs unrecovered, onboarding hours wasted, team morale dented, and the position back at square one.
First-90-day attrition, often called infant attrition, is one of the most expensive and preventable talent problems in any organization. These employees never reached full productivity. The investment in them returned nothing. And the causes are almost always rooted in fixable onboarding and expectation-setting failures.
The challenge is that infant attrition does not follow the same patterns as general attrition. An employee who leaves at day 30 is driven by different factors than one who leaves after three years. Standard attrition models miss this nuance. Survival analysis and logistic regression, applied specifically to the first 90 days, give you the tools to understand when employees are most at risk of leaving early, what drives those departures, and where to intervene.
General attrition models are built on years of tenure data and capture long-term drivers like career stagnation, compensation drift, and manager burnout. Infant attrition is driven by immediate experience factors: role clarity, onboarding quality, manager accessibility, peer integration, and the gap between what was promised during recruitment and what was delivered on day one.
The Society for Human Resource Management estimates that replacing an employee costs 50-200% of their annual salary. For infant attrition the economics are even worse: you spend the full replacement cost but receive near-zero productive output in return. When infant attrition rates climb above 10-15%, the compounding cost of repeated recruiting and onboarding cycles for the same position becomes a material budget drain.
High infant attrition is a leading indicator of systemic problems in your talent acquisition and onboarding pipeline. It tells you that something is broken between the offer letter and the 90-day mark — and fixing it has upstream benefits for employer brand, recruiter credibility, and team stability.
Survival analysis is a family of statistical methods originally developed for medical research — measuring time until an event occurs (disease progression, treatment response). In HR, the "event" is voluntary termination, and the "time" is days from hire date.
Simple attrition rate calculations (number who left / total hired) throw away timing information. Knowing that 12% of new hires left in the first 90 days is useful. Knowing that 7% left in the first 30 days and another 5% left between days 60-90 is far more actionable because it reveals when the risk is highest and suggests different root causes for different windows.
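That window breakdown is a one-liner once tenure is recorded in days. A minimal sketch with hypothetical data (pandas assumed; the `tenure_days` and `event` column names match the code later in this article):

```python
import pandas as pd

# Hypothetical new-hire records: tenure in days and whether they left (1) or stayed (0)
df = pd.DataFrame({
    'tenure_days': [12, 25, 40, 45, 70, 85, 90, 90, 90, 90],
    'event':       [1,  1,  1,  0,  1,  1,  0,  0,  0,  0],
})

# Bucket departures into the windows discussed above
leavers = df[df['event'] == 1]
windows = pd.cut(leavers['tenure_days'], bins=[0, 30, 60, 90],
                 labels=['days 1-30', 'days 31-60', 'days 61-90'])

# Share of all hires lost in each window
print(windows.value_counts().sort_index() / len(df))
```

Different root causes tend to live in different buckets, so this simple table is often the first diagnostic worth running.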
Survival analysis handles a problem that standard models struggle with: censored observations. If an employee was hired 45 days ago and is still employed, they have not yet had the opportunity to leave at day 90. You cannot count them as "survived" because their clock is still running. Survival analysis accounts for these incomplete observations naturally, giving you unbiased estimates even with staggered hire dates.
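Constructing those censored observations from raw dates is straightforward. A sketch, assuming hire and termination dates from your HRIS (column names and the cutoff date are illustrative):

```python
import pandas as pd

cutoff = pd.Timestamp('2024-06-01')  # analysis date (example)

hires = pd.DataFrame({
    'hire_date': pd.to_datetime(['2024-01-15', '2024-03-01', '2024-05-10']),
    'term_date': pd.to_datetime(['2024-02-20', None, None]),  # NaT = still employed
})

# Clock runs from hire to termination, or to the cutoff if still employed
end = hires['term_date'].fillna(cutoff)
hires['tenure_days'] = (end - hires['hire_date']).dt.days
# event = 1 if they actually left; 0 marks a censored (still-running) observation
hires['event'] = hires['term_date'].notna().astype(int)
print(hires[['tenure_days', 'event']])
```

The hire from May 10 contributes only 22 days of observed tenure; censoring lets the estimator use that partial information instead of discarding or miscounting it.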
The Kaplan-Meier estimator calculates the probability of surviving (remaining employed) past each time point, accounting for censoring.
For each new hire in your analysis window (the past 12-24 months), record their tenure in days and an event flag: 1 if they left voluntarily, 0 if they are still employed (censored). With those two columns in a DataFrame, fitting the estimator takes a few lines:
```python
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

kmf = KaplanMeierFitter()
kmf.fit(
    durations=df['tenure_days'],    # days from hire to exit, or to today if still employed
    event_observed=df['event'],     # 1 = left, 0 = censored (still employed)
    label='All New Hires'
)

kmf.plot_survival_function()
plt.xlabel('Days Since Hire')
plt.ylabel('Survival Probability')
plt.title('New Hire Survival: First 90 Days')
plt.show()
```
A steep early drop indicates a critical vulnerability window. If the curve drops sharply between days 5-15, your first-week experience is failing. A gradual decline through day 90 suggests a slower disillusionment pattern. A flat section followed by a late drop (days 60-90) may indicate that employees are waiting to receive a paycheck cycle or complete a benefits vesting period before leaving.
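The estimator behind these curves is simple enough to compute by hand, which makes it easy to explain to stakeholders. A pure-Python sketch with hypothetical durations, showing how censored hires leave the risk set without counting as departures:

```python
# Kaplan-Meier by hand: S(t) = product over event times of (1 - d_t / n_t)
# durations: days until exit (or until the cutoff for censored hires)
# events:    1 = left, 0 = censored (still employed)
durations = [10, 20, 20, 35, 50, 80]
events    = [1,  1,  0,  1,  0,  1]

at_risk = len(durations)   # n_t: hires still employed and under observation
survival = 1.0
curve = {}
for t in sorted(set(durations)):
    d = sum(1 for dur, e in zip(durations, events) if dur == t and e == 1)
    if d:
        survival *= 1 - d / at_risk
        curve[t] = survival
    # everyone whose clock stops at t (event or censored) exits the risk set
    at_risk -= sum(1 for dur in durations if dur == t)

print(curve)
```

Note how the censored hire at day 20 shrinks the risk set for later time points but never reduces the survival probability itself.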
The real insight comes from comparing survival curves across groups:
```python
ax = plt.gca()
for department in df['department'].unique():
    mask = df['department'] == department
    kmf.fit(df.loc[mask, 'tenure_days'], df.loc[mask, 'event'], label=department)
    kmf.plot_survival_function(ax=ax)  # overlay each department on one axes
plt.title('Survival by Department')
plt.show()
```
If the Engineering survival curve stays flat at 95% while Sales drops to 70% by day 60, you have a department-specific onboarding problem, not an organization-wide one. This prevents you from implementing blanket solutions when targeted fixes are needed.
To determine whether differences between survival curves are statistically significant (not just visual noise), apply the log-rank test:
```python
from lifelines.statistics import logrank_test

group_a = df[df['department'] == 'Engineering']
group_b = df[df['department'] == 'Sales']

result = logrank_test(
    group_a['tenure_days'], group_b['tenure_days'],
    event_observed_A=group_a['event'], event_observed_B=group_b['event']
)
print(f'Test statistic: {result.test_statistic:.3f}')
print(f'P-value: {result.p_value:.4f}')
```
A p-value below 0.05 indicates the survival difference between groups is unlikely to be chance alone and warrants investigation.
Survival curves tell you when and where infant attrition happens. Logistic regression tells you why. By modeling the binary outcome (left within 90 days vs. stayed) against onboarding and hire-context variables, you identify which factors most strongly predict early departure.
The features that predict infant attrition differ from those in general attrition models. Focus on:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# class_weight='balanced' compensates for early leavers being the minority class
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)

# Exponentiated coefficients are odds ratios: >1 raises early-exit odds, <1 lowers them
odds_ratios = np.exp(model.coef_[0])
for feature, odds in zip(X.columns, odds_ratios):
    print(f'{feature}: {odds:.2f}')
```
Typical findings from infant attrition models include:
These odds ratios give you a prioritized list of where to invest in onboarding improvements.
Apply your logistic regression model to every incoming new hire to generate a risk score before they even start. A hire with no buddy assigned, a job-board source, and a manager with a history of early departures might score in the top risk quartile — flagging them for enhanced onboarding support.
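Scoring new hires is a call to `predict_proba` on the trained model. A self-contained sketch with a synthetic training set (the three binary features echo the examples above and are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training features: [no_buddy, job_board_source, mgr_prior_early_exits]
X_train = rng.integers(0, 2, size=(200, 3)).astype(float)
# Toy outcome: each risk flag raises the chance of leaving early (illustrative only)
logits = -2 + 1.2 * X_train[:, 0] + 0.8 * X_train[:, 1] + 1.0 * X_train[:, 2]
y_train = rng.random(200) < 1 / (1 + np.exp(-logits))

model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)

# Score incoming hires before day one; route high scorers to enhanced onboarding
X_new = np.array([[1, 1, 1],   # all three risk flags set
                  [0, 0, 0]], dtype=float)
risk = model.predict_proba(X_new)[:, 1]
print(risk)
```

In practice you would rank the incoming cohort by score and flag the top quartile, rather than applying a fixed probability threshold.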
Based on your Kaplan-Meier curves, design interventions timed to the highest-risk periods:
Days 1-7 (First impression window):
Days 8-30 (Integration window):
Days 31-90 (Confirmation window):
If your model identifies manager behavior as a top predictor of infant attrition, build manager-level metrics into your people dashboard. Track infant attrition rate by manager, flag managers with rates above the organizational average, and provide targeted coaching or management development programs for those who need it.
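Manager-level flagging is a groupby away. A sketch with hypothetical data; the minimum-hires filter keeps managers with one or two data points from being flagged on noise:

```python
import pandas as pd

# Hypothetical hires with their manager and 90-day outcome (1 = left within 90 days)
df = pd.DataFrame({
    'manager': ['Kim', 'Kim', 'Kim', 'Ortiz', 'Ortiz', 'Ortiz', 'Ortiz'],
    'left_90': [1, 1, 0, 0, 0, 1, 0],
})

rates = df.groupby('manager')['left_90'].agg(['mean', 'count'])
org_rate = df['left_90'].mean()

# Flag managers above the organizational average, with enough hires to judge
flagged = rates[(rates['mean'] > org_rate) & (rates['count'] >= 3)]
print(flagged)
```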
Do not wait for the 90-day resignation to measure success. Monitor leading indicators that predict whether interventions are working:
Retrain your logistic regression model quarterly as new hire and attrition data accumulates. As you improve onboarding, the risk factors will shift — buddy assignment may drop off as a predictor because every hire now gets one, and new factors (like onboarding content relevance) may emerge.
Infant attrition rates vary by industry: 20-30% in hospitality and retail, 10-15% in professional services, 5-10% in technology. Know your benchmark and set improvement targets relative to your industry, not an arbitrary number.
Conducting survival analysis and logistic regression on infant attrition requires clean data flowing from your ATS (hiring source, time-to-fill, recruiter) through onboarding (buddy assignment, training completion) to HRIS (termination dates, tenure).
PeoplePilot Analytics connects these data sources into a unified pipeline, so you can run Kaplan-Meier curves and risk models without spending weeks on data wrangling. PeoplePilot Surveys feeds real-time onboarding pulse data directly into your model, making risk scores dynamic rather than static. And PeoplePilot Learning ensures that the skill-development and role-clarity interventions your model recommends are delivered consistently to every new hire.
The goal is straightforward: every employee you hire should still be contributing at day 91 and beyond. The data tells you what stands in the way. The interventions remove the obstacles.
Aim for at least 200 new hires with a minimum of 30-40 who experienced infant attrition (the event). Survival analysis handles small event counts better than many other methods because of how it processes censored data, but below 30 events, your confidence intervals will be wide and group comparisons unreliable. If your organization hires fewer than 200 people annually, aggregate data over two to three years.
Yes. Cox proportional hazards regression is a survival analysis method that functions like logistic regression — it estimates the effect of covariates on the hazard (risk) of the event occurring at each time point. It gives you both timing and driver information in a single model. Use Kaplan-Meier for visualization and communication, Cox regression for multivariate driver analysis, and logistic regression for simple risk scoring.
Build both. An organization-wide model reveals general onboarding risk factors. A department-specific model captures unique dynamics — perhaps the Sales onboarding program is too long, or Engineering managers are too busy to meet new hires. The department model will have fewer data points, so use it for directional insights and validate with qualitative follow-up rather than relying solely on statistical significance.
Look at the timing and drivers together. If departures cluster in the first two weeks and correlate with "role mismatch" survey responses and job-board sourcing, the problem is upstream in your hiring process — better job descriptions, realistic previews, and structured interviews would help. If departures cluster at days 30-60 and correlate with manager unavailability and lack of training, the problem is onboarding execution. Often, it is a combination of both, and your model will reveal the relative contribution of each.