the Kaplan-Meier estimator

In my last post, I wrote about censored data. This post continues the survival analysis theme by focusing on estimation of survival curves.

In survival and reliability analysis, it is useful to know the survival curve of the population under study. For each time t in the study, this curve gives the probability that the time T until the study event (e.g., patient death, product failure) exceeds t, that is, S(t) = P(T > t). However, we generally don’t know the true survival curve and have to estimate it from data. The Kaplan-Meier estimator provides such an estimate. To demonstrate, suppose we have the following data from a clinical trial:


From this data, we can construct the following Kaplan-Meier estimate of the survival curve (I’ll show how to do this in a later post):


From this graph, we can estimate that the median time until patient death is eight days.
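When there is no censoring, the Kaplan-Meier estimate reduces to the fraction of patients still alive after time t. The median survival time is conventionally taken as the earliest time at which that fraction falls to 0.5 or below. The Python sketch below computes both. The function names and the death times are hypothetical illustrations, not the trial data from this post.

```python
def empirical_survival(death_days):
    """Return sorted (day, S(day)) pairs for fully observed (uncensored) death times.

    Without censoring, the Kaplan-Meier estimate at day t is simply the
    fraction of patients whose death time is later than t.
    """
    n = len(death_days)
    steps = []
    for day in sorted(set(death_days)):
        alive_after = sum(1 for d in death_days if d > day)  # patients still alive after `day`
        steps.append((day, alive_after / n))
    return steps


def median_survival(steps):
    """Earliest day at which estimated survival is 0.5 or below (None if never reached)."""
    for day, surv in steps:
        if surv <= 0.5:
            return day
    return None


# Hypothetical death days for eight patients (illustration only)
deaths = [1, 3, 4, 6, 7, 10, 11, 14]
steps = empirical_survival(deaths)
print(steps)
print(median_survival(steps))
```

With these made-up times, the estimate drops by 1/8 at each death and reaches 0.5 on day 6, so the estimated median is six days.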

The Kaplan-Meier estimator also works in the presence of right-censored data. Suppose a different clinical study has right-censored data points at days five, seven, and twenty:


With the right censoring taken into account, the Kaplan-Meier estimate looks like this:


Here the censored data points are shown with tick marks.
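With right censoring, the estimate is built with the product-limit formula S(t) = product over event times t_i <= t of (1 - d_i / n_i). Here d_i is the number of deaths at t_i, and n_i is the number of patients still at risk just before t_i. A censored patient causes no drop in the curve but is removed from the at-risk count for later times. The Python sketch below implements that formula. The function name and the data are hypothetical. Censoring is placed at days five, seven, and twenty only to echo the example above.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Product-limit estimate of the survival curve.

    times    -- follow-up time for each patient
    observed -- True if the death was observed, False if right-censored
    Returns a list of (event_time, S(event_time)) pairs, one per distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, observed) if e)  # d_i per death time
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # n_i: patients with follow-up time >= t; by the usual convention,
        # patients censored at exactly t still count as at risk at t
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical study: censored at days 5, 7, and 20 (observed=False)
times = [2, 3, 5, 6, 7, 9, 12, 20]
observed = [True, True, False, True, False, True, True, False]

for t, s in kaplan_meier(times, observed):
    print(f"day {t}: S = {s:.3f}")
```

For these made-up data, the curve steps down at days 2, 3, 6, 9, and 12 (to 0.875, 0.750, 0.600, 0.400, and 0.200), while the censored days 5, 7, and 20 produce no steps.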

Comparing Kaplan-Meier curves between groups can reveal differences in survival. Consider the following clinical study comparing a placebo with drug A:


Plotting the Kaplan-Meier estimate for each treatment group yields:


Here we see that patients receiving the placebo generally live longer than patients receiving drug A. The placebo group’s median time until death is 17 days, compared with eight days for patients receiving drug A.
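Comparing groups comes down to reading each Kaplan-Meier curve at the same times. Because the estimate is a right-continuous step function, S(t) equals the value at the last event time at or before t, and 1.0 before the first event. The following is a small Python sketch of that lookup. The function name and both curves are made up for illustration and are not the placebo and drug A results.

```python
from bisect import bisect_right


def survival_at(curve, t):
    """Evaluate a Kaplan-Meier step curve at time t.

    curve -- list of (event_time, survival) pairs sorted by time;
             survival is 1.0 before the first event time.
    """
    event_times = [time for time, _ in curve]
    i = bisect_right(event_times, t)  # number of event times <= t
    return 1.0 if i == 0 else curve[i - 1][1]


# Hypothetical step curves for two treatment groups (illustration only)
placebo = [(4, 0.9), (10, 0.7), (16, 0.45), (25, 0.3)]
drug_a = [(2, 0.8), (5, 0.6), (7, 0.4), (12, 0.2)]

for day in (5, 10, 15):
    print(day, survival_at(placebo, day), survival_at(drug_a, day))
```

At each day checked, the hypothetical placebo curve lies above the drug A curve. This is the same kind of pattern as in the comparison above.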

Post Author: badassdatascience
