A major North American telecom sought to identify factors driving customer churn. We applied social network analysis over several billion call records. We found that customers with a cancellation in their frequent calling network churned at twice the monthly rate.
1. The Social Effect: Predicting Telecom Customer Churn with Call Data Michael E. Driscoll, Ph.D. Principal, Founder February 16, 2010
2. Social Network Analysis with Telecom Data The following slides describe an initial project analyzing a North American telecom’s call data on a dedicated analytics platform. We describe the analysis of a slice of the telecom’s call history data from several million customers in several major North American markets. We demonstrate the performance gain achieved by a dedicated analytics platform: millions of relationships computed from tens of billions of events, spanning tens of TB of data, in less than one hour. We show that social network influence is a powerful predictor of customer churn: subscribers who experience a Telecom cancellation in their frequent calling network are 2x more likely to cancel themselves. We highlight one outbreak of cancellations in a metropolitan call network from May-June 2009.
4. Key Data: Call Detail Records A slice of several billion call detail records (CDRs) from several million subscribers drawn from three major North American markets, for May-August 2009.
7. Social Network Analysis Network is Generated from Call History Data Call history logs were pulled from the Greenplum warehouse. These were parsed and outgoing numbers were associated with subscription ids. The result is a row of data for every caller-callee connection meeting a low threshold (> 1 call and > 60 s talk-time per month). The majority are between Telecom customers and other carriers (or land-lines).
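The edge-building step described on this slide can be sketched in base R. This is only an illustration under stated assumptions: the column names (caller, callee, month, seconds) and the toy records are hypothetical, and the sketch assumes phone numbers have already been mapped to subscription ids. It is not the project's actual code.

```r
# Hypothetical call detail records: one row per call.
# Column names and values are assumptions made for illustration only.
cdr <- data.frame(
  caller  = c("s1", "s1", "s1", "s2", "s2", "s3"),
  callee  = c("s2", "s2", "s3", "s1", "s1", "s1"),
  month   = rep("2009-05", 6),
  seconds = c(45, 30, 300, 20, 25, 90),
  stringsAsFactors = FALSE
)

# Count each record as one call so calls and talk time can be summed together.
cdr$n_calls <- 1

# Collapse to one row per caller-callee pair per month.
edges <- aggregate(cbind(n_calls, seconds) ~ caller + callee + month,
                   data = cdr, FUN = sum)

# Keep connections above the low threshold from the slide:
# more than 1 call and more than 60 s of talk time per month.
edges <- edges[edges$n_calls > 1 & edges$seconds > 60, ]
print(edges)
```

In this toy data only the s1 to s2 connection passes both conditions; a single long call (s1 to s3) is dropped because the threshold requires more than one call.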
8. Our Analytics Workflow Three steps: 1. Pull from DB, 2. Analyze in R, 3. Visualize in R + Graphviz
9. Our Tool: The R Programming Language Download R at http://www.r-project.org/
10. Getting Call Data Into R for Analysis
From files:
> Calls <- read.csv("CallHistory.csv", header=TRUE)
From databases (driver, user, password, host and dbname are placeholders):
> library(DBI)
> con <- dbConnect(driver, user=user, password=password, host=host, dbname=dbname)
> Calls <- dbGetQuery(con, "SELECT * FROM call_history")
From the Web:
> con <- url("http://Telco.com/dump/CallHistory.csv")
> Calls <- read.csv(con, header=TRUE)
From previous R objects:
> load("CallHistory.RData")
11. Social Network Analysis Millions of edges analyzed in minutes Full analysis of a first-order outgoing call network for our slice (millions of customers, three months of call history) took less than one hour. This could be improved by further parallelizing the R code (currently SQL queries run in parallel on Greenplum, while R runs on the master node).
12. Results: People Have Small Call Networks (Three) The median size of a caller’s network is three, while the mean size is five.
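A minimal sketch of how per-caller network size might be summarized in base R. The edge list below is a made-up toy (it does not reproduce the study's figures), and the column names are assumptions. It illustrates why the mean sits above the median: a few callers with large networks pull the mean up.

```r
# Hypothetical edge list: five callers with 1, 2, 2, 3 and 12 distinct contacts.
caller <- rep(c("s1", "s2", "s3", "s4", "s5"), times = c(1, 2, 2, 3, 12))
edges <- data.frame(
  caller = caller,
  callee = paste0("n", seq_along(caller)),
  stringsAsFactors = FALSE
)

# Network size = number of distinct callees per caller.
sizes <- as.numeric(tapply(edges$callee, edges$caller,
                           function(x) length(unique(x))))

median_size <- median(sizes)
mean_size   <- mean(sizes)
cat("median:", median_size, " mean:", mean_size, "\n")
```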
13. Results: Canceling Customers are 7x More Likely to be Linked Types of callers (nodes): active (A), cancelled (C). Types of connections (edges): A-A, A-C or C-A, C-C. C-C edges are 7x more likely in call networks than expected by chance.
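One way an "observed versus expected by chance" comparison could be computed is sketched below. The slides do not state which null model was used, so this sketch assumes a simple one: if the statuses of the two ends of an edge were independent, the expected share of C-C edges would be the square of the share of edge endpoints that are cancelled. All names and data are hypothetical.

```r
# Hypothetical subscriber statuses: A = active, C = cancelled.
status <- c(s1 = "A", s2 = "A", s3 = "A", s4 = "A", s5 = "A",
            s6 = "A", s7 = "A", s8 = "A", s9 = "C", s10 = "C")

# Hypothetical call edges between subscribers.
edges <- data.frame(
  from = c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s9", "s1", "s10"),
  to   = c("s2", "s3", "s4", "s5", "s6", "s7", "s8", "s10", "s9", "s9"),
  stringsAsFactors = FALSE
)

from_status <- unname(status[edges$from])
to_status   <- unname(status[edges$to])

# Label each edge; A-C and C-A are treated as the same mixed type.
edge_type <- ifelse(from_status == to_status,
                    paste(from_status, to_status, sep = "-"),
                    "A-C")

# Observed share of C-C edges.
observed <- mean(edge_type == "C-C")

# Assumed null model: endpoint statuses are independent, so the expected
# C-C share is the squared share of cancelled endpoints.
p_cancelled <- mean(c(from_status, to_status) == "C")
expected <- p_cancelled^2

enrichment <- observed / expected
cat("C-C enrichment over chance:", enrichment, "\n")
```

A ratio above 1 indicates that cancellers are linked to each other more often than this null model predicts. A permutation test that shuffles status labels across nodes would be a more careful alternative.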
14. Results: A Customer With a Canceller in Their Network Churns at Twice the Rate Types of connections (edges): C-A in May, C-C in June. In essence, we are asking whether being connected to another canceller has any effect on one’s rate of cancellation. It turns out that it does. And if we only look at voluntary port-outs, we see that customers churn at 3x the rate.
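The May-to-June comparison can be sketched as a rate ratio: among subscribers still active at the end of May, compare the June cancellation rate of those connected to a May canceller with the rate of those who are not. The data, column names and month encoding below are hypothetical, and the toy numbers are not the study's results.

```r
# Hypothetical subscribers; cancel_month is NA for those who stayed active.
subs <- data.frame(
  id = paste0("s", 1:10),
  cancel_month = c("2009-05", "2009-06", "2009-06", NA, NA,
                   "2009-06", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical outgoing call edges observed in May.
edges <- data.frame(
  caller = c("s2", "s3", "s4", "s5", "s6", "s7"),
  callee = c("s1", "s1", "s1", "s1", "s7", "s8"),
  stringsAsFactors = FALSE
)

may_cancellers <- subs$id[!is.na(subs$cancel_month) &
                          subs$cancel_month == "2009-05"]

# Subscribers still active going into June.
at_risk <- subs[!(subs$id %in% may_cancellers), ]

# Exposed = calls at least one May canceller (a C-A edge in May).
exposed_ids <- unique(edges$caller[edges$callee %in% may_cancellers])
at_risk$exposed <- at_risk$id %in% exposed_ids

# Outcome = cancelled in June (the edge becomes C-C).
at_risk$churned_june <- !is.na(at_risk$cancel_month) &
                        at_risk$cancel_month == "2009-06"

rates <- tapply(at_risk$churned_june, at_risk$exposed, mean)
rate_ratio <- as.numeric(rates["TRUE"]) / as.numeric(rates["FALSE"])
cat("June churn rate ratio (exposed / unexposed):", rate_ratio, "\n")
```

Because exposure is measured in May and the outcome in June, subscribers who cancelled in the same month as their contact are excluded from the comparison.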
15. From Data to Insights to Actions If we had known two customers’ calling networks… Could we have prevented four more from leaving?
16. The Emerging Analytics Stack Data: Big Data (HDFS or Parallel RDBMS). Insights: Analytics (R, SPSS, SAS, SAP). Actions: Apps (Email, Ad Campaigns).
17. References Enhancing Customer Knowledge at Optus, Teradata Case-Study (September 2009). IBM’s Analytics Tapped to Predict, Prevent Churn, Telephony Online (April 2009). The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer (February 2009). Study Shows Obesity Can Be Contagious, Gina Kolata, The New York Times (July 25, 2007) [great example of homophily]. Contact: Michael E. Driscoll, Ph.D., med@dataspora.com. Follow @dataspora on Twitter.
Editor's Notes
Most telcos lose 1-2% of their customers every month. It’s 7x more expensive to acquire a customer than to retain one.
Birds of a feather flock together; cancellers clump together, and so do active users. Like vinegar and water, we see enrichment for “like-like” edges in our network, and dilution of “dissimilar” edges (the A-C or C-A). Upshot: people cancellation… Question: is this all an artifact of family plans, where a bunch of subscribers quit together? In part yes, but the trends hold up even when we do a temporal analysis.
The key take-home point here is that this analysis, looking at the May to June transition, removes
The stack is loosely coupled: right tool for the right job. The need for a dedicated analytics RDBMS