Survival Data Mining for Customer Insight


Prepared by:

Gordon S. Linoff

Data Miners


June 2004


© Data Miners, June 2004 Proprietary Material

1 Survival Data Mining for Customer Insight

When I am trying to understand a company’s customers using data collected in its databases, my first inclination is to apply survival data mining. Over the years, I have found that this approach provides rapid feedback about the customers and their behaviors, while at the same time providing a solid basis for quantifying customer value and measuring customer loyalty. This is customer insight in practice.

What is survival data mining? It is the application of survival analysis – a traditional statistical technique – to data mining problems concerning customers. The application to the business world changes the flavor of the statistical techniques, which were honed on the analysis of small numbers of patients in medical studies. No longer is the worry about extracting the last iota of information from a handful of customers. The issue is how to make sense of millions or tens of millions of database records describing current and past customers and their business interactions.

This article presents survival data mining in practice. It starts with a methodology for subscription-based businesses and introduces hazards and survival curves for understanding churn. It then explains how these curves can be used to quantify differences between groups of customers and, finally, shows how the same techniques can be applied to general time-to-event problems in business. A technical sidebar shows how to do some of the calculations in a relational database. Readers interested in more information are encouraged to read about survival analysis in the Second Edition of our book, “Data Mining Techniques for Marketing, Sales, and Customer Support”.

1.1 Hazard Probability

In the medical world, doctors often want to understand which treatments help patients survive longer – and which have no effect at all (or worse). In the business world, the equivalent concern is when customers stop. This is particularly true of businesses that have a well-defined beginning and end to the customer relationship – subscription-based relationships. These relationships are found in a wide range of industries, such as insurance, communication, cable television, newspaper/magazine subscription, banking, and electricity providers in competitive markets.

The basis of survival data mining is the hazard probability, the chance that someone who has survived for a certain length of time (called the customer tenure) is going to stop, cancel, or expire before the next unit of time. This definition assumes that time is discrete, and such discrete time intervals – whether days, weeks, or months – often fit business needs. By contrast, traditional survival analysis in statistics usually assumes that time is continuous.

Given the right data, calculating the hazard probability for a given tenure t is simple. The probability is the number who succumbed to the risk divided by the population at risk at that tenure. That is, the numerator is the number of customers who stopped with exactly tenure t and the denominator is everyone who had tenures greater than or equal to t. Customers with shorter tenures are not part of the risk group. The sidebar explains how to calculate hazards directly using a relational database.
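The ratio described above can be sketched in a few lines of Python. This is only an illustration: the input layout (one pair of tenure and stopped flag per customer) and the name `hazard_probabilities` are assumptions made for the example, not the layout used in the sidebar.

```python
from collections import Counter


def hazard_probabilities(customers):
    """Return {tenure: hazard} for a list of (tenure, stopped) pairs.

    tenure  -- whole number of time units the customer has been observed
    stopped -- True if the customer stopped at that tenure,
               False if the customer is still active (hypothetical flag)
    """
    if not customers:
        return {}

    stops = Counter(t for t, stopped in customers if stopped)
    at_tenure = Counter(t for t, _ in customers)

    hazards = {}
    at_risk = len(customers)  # every customer has tenure >= 0
    for t in range(max(at_tenure) + 1):
        # numerator: stopped with exactly tenure t
        # denominator: everyone with tenure >= t
        hazards[t] = stops[t] / at_risk
        # customers whose tenure is exactly t are not at risk at t + 1
        at_risk -= at_tenure[t]
    return hazards


# Five made-up customers: three stopped, two are still active.
example = [(0, True), (1, False), (1, True), (2, True), (2, False)]
print(hazard_probabilities(example))
```

Active customers count in the denominator for every tenure up to their current one, which is how the calculation uses customers who have not yet stopped.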

A picture paints a thousand words. Figure 1 charts hazard probabilities for customers in a typical subscription business. The horizontal axis is the tenure of customers measured in days; the vertical axis is the probability that customers stop at a particular tenure point.

[Figure 1: Hazard probability by customer tenure in days]

The hazard chart is an X-ray into the customer lifecycle, because it highlights different important events. The very first hazard probability at time zero is about 4%; this is due to customers not starting and is often caused by poor customer information being gathered at the point of sale or perhaps by buyer’s remorse. Around 60 days, there is a very strong peak in the hazard probability. This corresponds to those customers who start but never pay. The company moves customers through various dunning levels to inspire payment. However, at some point, the company must force churn because of non-payment. Changes in this policy, such as a reduction in the period of time for cutting off non-paying customers, would be apparent in the hazard probabilities.

Around 90 days, there is another significant spike in the hazards. This spike actually has nothing to do with non-payment. It is due to the end of the initial promotion. Customers who

–  –  –

After these two initial peaks, the hazard probability gradually declines but with a jagged characteristic. The jaggedness is actually due to the one-month billing cycle that most customers are on. Customers are more likely to stop at the end of a billing cycle. One reason is that when customers call in to stop, the stop date is set to the end of the billing cycle unless the customer requests a specific date.

The gradual decline in hazards is also interesting. In fact, it says something quite important about customer loyalty: The longer customers stay with the company, the less likely they are to leave. The long-term decline in hazards is as good a measure of loyalty as any I know.

1.2 From Hazards to Survival

If hazard curves provide an X-ray into the customer lifecycle, survival curves provide a more holistic picture. The survival at time t is simply the likelihood that a customer will survive to that point in time. This is calculated directly from the hazards, by taking the cumulative probability that someone does not stop before time t – that is, by multiplying one minus the hazards together for all values less than t.
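As a minimal sketch of that running product, assuming the hazards are held in a plain list indexed by tenure (the function name `survival_curve` is made up for this example):

```python
def survival_curve(hazards):
    """Turn hazards[0], hazards[1], ... into survival[0], survival[1], ...

    survival[t] is the probability of not stopping before tenure t,
    i.e. the product of (1 - hazard) over all tenures less than t.
    """
    survival = []
    alive = 1.0  # nobody has stopped before tenure 0
    for h in hazards:
        survival.append(alive)
        alive *= 1.0 - h
    return survival
```

For example, `survival_curve([0.5, 0.25, 0.5])` returns `[1.0, 0.5, 0.375]`: the curve starts at 100% and can only stay level or fall.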

Figure 2 shows three examples of survival curves. Notice that all three curves start at 100% and gradually decline, with the survival value always between 0 and 100%.

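[Figure 2: Survival curves for credit card, pay-by-mail, and all customers; vertical axis is survival from 0% to 100%]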

The middle curve corresponds to the hazards in the earlier chart. Remember that the hazards had a spike around two-and-a-half months. On the survival curve, this spike is instead a steep decline, indicating that customers do not survive beyond this point. So, this curve is saying that about 55% of customers survive beyond the non-payment period. Once this period has passed, though, the survival curve flattens out, corresponding to the decline in hazards. The smaller the hazards, the flatter the survival curve; the larger the hazards, the steeper the survival curve.

The other two curves in the chart help explain why. The top curve is for customers who started as credit card paying customers. These customers provide a credit card, which is charged automatically every month. As the survival curve shows, there is no dip at two and a half months. These are paying customers. Almost 90% of them are still active after the initial non-payment period, and their survival remains relatively high. These are good customers who do not disappear quickly.

The lower curve is for customers who are billed and pay by check rather than automatically paying by credit card. The survival curve for these pay-by-mail customers shows a much sharper drop. By looking at the stop reasons for these customers, it is apparent that this particular drop is due to non-payment.

The middle curve is the “average” of the credit card and pay-by-mail customers. What is interesting is that the non-credit card customers are driving the entire drop for initial non-payment. The survival curve graphically shows this common business wisdom.

1.3 Quantifying Survival

Survival does more than show the difference between groups of customers. It makes it possible to quantify the difference between groups. The chart in Figure 3 illustrates one common measure, the customer half-life (or median customer lifetime). This is the tenure where exactly half the original customers would still be expected to be active. The calculation is quite easy. The vertical axis has the survival values. Follow the 50% line over until it hits the survival curve. This is the tenure where half the customers survive.
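Reading the half-life off the chart amounts to finding the first tenure where survival has fallen to 50% or below. A small sketch, assuming the survival curve is a list indexed by tenure; the name `customer_half_life` and the choice to return `None` when the curve never reaches 50% are illustrative assumptions:

```python
def customer_half_life(survival):
    """Return the first tenure at which survival is at or below 50%.

    Returns None if the curve stays above 50% for the whole
    observed window, in which case the half-life is unknown.
    """
    for tenure, value in enumerate(survival):
        if value <= 0.5:
            return tenure
    return None
```

With the made-up curve `[1.0, 0.9, 0.6, 0.45, 0.4]` the function returns `3`; with `[1.0, 0.9, 0.8]` it returns `None`.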

[Figure 3: Median customer tenure (customer half-life) by payment type]

Figure 3 shows the median tenure by payment type for the groups shown earlier. For credit card payers, the median is over 240 days; for others, barely a quarter that. The customer half-life provides a good way to compare different groups of customers.

One drawback to the customer half-life is that the survival curve may not cross 50%. This means that the customer half-life is not known, because the time window is not large enough. Extrapolating survival beyond the time window is dangerous because what happens to customers is not known. Customers may stay around for another hundred years or they might all stop the next day.

The customer half-life is a good comparison for different groups of customers. However, this value only tells us about one customer – the customer whose tenure is exactly in the middle. A more useful number is the average customer lifetime, which can be dropped directly into customer value calculations. If a subscription is worth $500 per year in revenue and the average customer lifetime is 2.5 years, then the customer is worth $1,250 (assuming no discounting of future revenue).

Calculating the average tenure is conceptually quite easy. It turns out that the average tenure for a given period of time is the area under the survival curve. For instance, the average tenure in the first year after acquisition for customers who stop halfway through the year is half a year. On the other hand, customers who survive longer than one year only get one year, because the calculation is only looking at the first year tenure. The average for all customers is the area under the survival curve up to 365 days.
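With discrete tenures, the area under the curve can be approximated by a sum with one time unit per step. The sketch below assumes that survival[t] is the share of customers whose tenure is at least t; the summation convention and both function names are assumptions for illustration. The second function just restates the revenue arithmetic from the example above.

```python
def average_tenure(survival, window):
    """Average tenure within the first `window` time units.

    Assumes survival[t] is the share of customers with tenure >= t,
    so each term survival[t] adds one time unit for everyone who
    reached tenure t. Customers who outlast the window are credited
    with exactly `window` units.
    """
    if window >= len(survival):
        raise ValueError("survival curve does not cover the whole window")
    return sum(survival[1:window + 1])


def customer_value(revenue_per_year, average_lifetime_years):
    """Undiscounted revenue over the average customer lifetime."""
    return revenue_per_year * average_lifetime_years
```

For the made-up curve `[1.0, 0.5, 0.5, 0.25]` and a window of 3, `average_tenure` returns `1.25`. `customer_value(500, 2.5)` reproduces the $1,250 figure.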

1.4 Competing Risks

Another critical idea in survival analysis is that of competing risks. When studying the survival rates for cancer victims, what happens when someone enrolled in the study dies in a car accident? Or moves to a foreign country? In medical terminology, these patients are “lost to follow-up”. The same thing can happen with customers.

A clear example of competing risks is the distinction between voluntary and involuntary churn. Some customers are forced to leave (typically due to non-payment) whereas others leave voluntarily. When looking at churn, sometimes models are built leaving out one or the other group of customers. However, this results in a biased model – one of the issues when developing payment risk models separate from voluntary churn models.

With competing risks, the approach to the problem is a bit different. Customers who voluntarily stopped at a particular tenure – say one year – did not stop either voluntarily or involuntarily before then. This is useful information for understanding both types of churn.

Calculating competing risks follows the same pattern described earlier with one difference. For each tenure, there is a separate probability for each risk; once a customer has succumbed to one risk (say voluntary churn), the customer is no longer included in the population at risk for any of the risk groups. Technically, the customer is censored for other risks.
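A sketch of that calculation, again in Python and again under assumed inputs: each customer is a pair of tenure and outcome, where the outcome is `None` for an active customer or a risk label for a stopped one. The labels and the name `competing_risk_hazards` are hypothetical.

```python
from collections import Counter, defaultdict


def competing_risk_hazards(customers):
    """Return {risk: {tenure: hazard}} for (tenure, outcome) pairs.

    outcome is None for customers who are still active, otherwise a
    label such as "voluntary" or "involuntary" (hypothetical labels).
    """
    if not customers:
        return {}

    at_tenure = Counter(t for t, _ in customers)
    stops = defaultdict(Counter)
    for t, outcome in customers:
        if outcome is not None:
            stops[outcome][t] += 1

    hazards = {risk: {} for risk in stops}
    at_risk = len(customers)
    for t in range(max(at_tenure) + 1):
        # Every risk shares the same population at risk.
        for risk in stops:
            hazards[risk][t] = stops[risk][t] / at_risk
        # Anyone whose tenure ends at t, for any reason, leaves the
        # risk group for all risks.
        at_risk -= at_tenure[t]
    return hazards


example = [(0, "voluntary"), (1, "involuntary"), (1, None),
           (2, "voluntary"), (2, None)]
print(competing_risk_hazards(example))
```

The only change from the single-risk calculation is the numerator, which counts stops for one risk at a time; the denominator is identical for every risk.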

Figure 4 shows competing risks for voluntary and involuntary churn for the credit card paying and non-credit card customers shown earlier. The top line shows clearly that credit card paying customers are at minimal risk for involuntary churn.

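[Figure 4: Competing risks of voluntary and involuntary churn for credit card and non-credit card customers]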

Although the canonical example is voluntary versus involuntary churn, competing risks are useful in other situations. For instance, some customers may “churn” because they migrate to a higher value product. A wireless customer upgrading to more advanced technology may count as “churn” on the old technology. A cable subscriber who switches to digital cable may count as “churn” on her previous account. This suggests including migration as a competing risk for understanding these customers.

1.5 Other Time to Event Problems

Survival data mining is not only applicable to churn. It is also applicable to almost any time-to-event problem. Survival data mining answers the question of when the next event will occur, rather than whether the event will occur in a certain period. There are many opportunities to apply this technique.

When will a lapsed customer return? This is quite similar to the churn problem with one difference. Now the “start” is when a customer stops – because this is the start of the lapsed period. The “end” is when (if ever) a customer restarts – because this is the end of the lapsed period. The basic ideas of survival data mining can then be applied to this situation. One challenge here is matching new customers to lapsed customers. Sometimes this information is readily available (such as when the customer retains an identifier such as a customer number or a telephone number). Sometimes householding algorithms infer this information using names and addresses.

When will a customer next make a purchase? Understanding customers over time is a typical

–  –  –

How long will an upgrade last? When customers upgrade to a new service, sometimes they eventually downgrade again. This is a competing risks problem, because customers might downgrade or stop during the period when they have upgraded. Upgrade survival curves are very useful for quantifying the value of the upgrade effort.

As these examples show, survival data mining is a very valuable tool for understanding customers and for quantifying customer relationships. The basic techniques, borrowed from the statistics of medical studies, have proven their worth in the business world, far beyond the small medical studies where they first appeared.

SIDEBAR: Calculating Hazards in a Database

Assume that a database contains one row for each customer with the following information:

–  –  –

• Other interesting variables such as stop reason, channel, and so on

How are these used to calculate hazards? Fortunately, SQL does most of the calculations and Oracle extensions make the full calculation possible.
