Medill’s Subscriber Data Analysis: How We Reached Conclusions

The Spiegel Research Center at Northwestern University analyzed 13 terabytes of data from the Chicago Tribune, San Francisco Chronicle and Indianapolis Star for its study, which was conducted in the fall of 2018 as part of the Medill Local News Initiative.

Data scientists at Spiegel integrated audience data and subscriber information from each of the three news organizations. This allowed researchers to trace the anonymized, individual behaviors of people who kept their digital subscriptions and of those who cancelled them. The cross-platform integration provided a unique, vivid portrait of the digital news consumption habits of local subscribers.

Spiegel, which is affiliated with Northwestern’s Medill School of Journalism, Media, Integrated Marketing Communications, used established statistical methods to measure the impact of readers’ behavior on subscription retention and express it numerically. Negative numbers indicated that a behavior was associated with subscription retention; positive numbers indicated that it was associated with cancellation, or churn. Numbers greater than 2 or less than -2 were considered statistically significant.

All three news organizations showed strong retention linked to a regular habit, with the Chronicle at -10.3, the Star at -8.5 and the Tribune at -5.5.

What Spiegel Research Director Ed Malthouse calls “extent” (page views per day visited) and “depth” (time spent per page view) showed a correlation with churn for the Chronicle and the Star (13.5 and 2.1 for the Chronicle and 2.6 and 3.6 for the Star). For the Tribune, the figures were -0.8 and 1.6, indicating no statistically significant correlation either for or against retention.

Malthouse explained it this way: “The numbers are Z statistics of the slopes from a logistic regression model. A value of 0 means that a predictor has no effect on churn. Positive values mean that more of the predictor is associated with a higher chance of churning. Negative values mean that as the value of the predictor increases, the chance of churning goes down (i.e., is more likely to be retained).

“When we evaluate slopes, we commonly test the null hypothesis that the slope is 0, meaning that the variable has no effect. If we use the 95 percent confidence level, which is pretty common, then a Z statistic greater than 2 or less than -2 is called ‘statistically significant from 0.’ Values greater than 2.576 or less than -2.576 are called ‘highly statistically significant.’ Values greater than 3.29 or less than -3.29 are called ‘very highly significant.’”
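As a rough illustration of the reading Malthouse describes, the short Python sketch below applies his thresholds to the Z statistics reported in this article. It is not Spiegel's code: the function names and labels are hypothetical, the "regular habit" label is shorthand for the habit measure described above, and the p-values assume the usual comparison of a Z statistic against a standard normal distribution.

```python
import math

# Cutoffs quoted by Malthouse, applied to the absolute value of the Z statistic.
THRESHOLDS = [
    (3.29, "very highly significant"),
    (2.576, "highly statistically significant"),
    (2.0, "statistically significant"),
]


def two_sided_p_value(z):
    """Two-sided p-value for a Z statistic, assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def classify(z):
    """Return (direction, significance label) for a Z statistic (hypothetical helper)."""
    label = "not statistically significant"
    for cutoff, name in THRESHOLDS:
        if abs(z) > cutoff:
            label = name
            break
    if label == "not statistically significant":
        direction = "no clear association"
    elif z < 0:
        direction = "associated with retention"
    else:
        direction = "associated with churn"
    return direction, label


# Z statistics as reported in this article (extent listed before depth).
REPORTED = {
    ("Chronicle", "regular habit"): -10.3,
    ("Star", "regular habit"): -8.5,
    ("Tribune", "regular habit"): -5.5,
    ("Chronicle", "extent"): 13.5,
    ("Chronicle", "depth"): 2.1,
    ("Star", "extent"): 2.6,
    ("Star", "depth"): 3.6,
    ("Tribune", "extent"): -0.8,
    ("Tribune", "depth"): 1.6,
}

if __name__ == "__main__":
    for (paper, predictor), z in REPORTED.items():
        direction, label = classify(z)
        p = two_sided_p_value(z)
        print(f"{paper:9s} {predictor:13s} z={z:6.1f} p={p:.3g}  {direction}; {label}")
```

Run as a script, this prints one line per newspaper and predictor. The Tribune's extent and depth figures fall inside the range of -2 to 2, which is why they are described as showing no statistically significant correlation.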

The analysis included data from the main Chronicle subscriber site from Sept. 17, 2017, to Oct. 16, 2018. For the Tribune, the data was from Oct. 1, 2015, to July 19, 2018. For the Star, it was from June 30, 2016, to Aug. 5, 2018.

As a follow-up to this study of the Tribune, Chronicle and Star data, Spiegel is now analyzing subscriber behavior data at small and mid-sized news outlets in the Midwest.
