SlideShare uma empresa Scribd logo
1 de 6
Baixar para ler offline
486                                                                                                                               Section: Service



Data Mining in the Telecommunications
Industry
Gary M. Weiss
Fordham University, USA



INTRODUCTION                                                                       Telecommunication companies maintain data about
                                                                               the phone calls that traverse their networks in the form of
The telecommunications industry was one of the first                           call detail records, which contain descriptive informa-
to adopt data mining technology. This is most likely                           tion for each phone call. In 2001, AT&T long distance
because telecommunication companies routinely gener-                           customers generated over 300 million call detail records
ate and store enormous amounts of high-quality data,                           per day (Cortes & Pregibon, 2001) and, because call
have a very large customer base, and operate in a                              detail records are kept online for several months, this
rapidly changing and highly competitive environment.                           meant that billions of call detail records were readily
Telecommunication companies utilize data mining to                             available for data mining. Call detail data is useful for
improve their marketing efforts, identify fraud, and                           marketing and fraud detection applications.
better manage their telecommunication networks.                                    Telecommunication companies also maintain exten-
However, these companies also face a number of data                            sive customer information, such as billing information,
mining challenges due to the enormous size of their                            as well as information obtained from outside parties,
data sets, the sequential and temporal aspects of their                        such as credit score information. This information
data, and the need to predict very rare events—such as                         can be quite useful and often is combined with tele-
customer fraud and network failures—in real-time.                              communication-specific data to improve the results
    The popularity of data mining in the telecommunica-                        of data mining. For example, while call detail data
tions industry can be viewed as an extension of the use                        can be used to identify suspicious calling patterns, a
of expert systems in the telecommunications industry                           customer’s credit score is often incorporated into the
(Liebowitz, 1988). These systems were developed to                             analysis before determining the likelihood that fraud
address the complexity associated with maintaining a                           is actually taking place.
huge network infrastructure and the need to maximize                               Telecommunications companies also generate and
network reliability while minimizing labor costs. The                          store an extensive amount of data related to the opera-
problem with these expert systems is that they are ex-                         tion of their networks. This is because the network
pensive to develop because it is both difficult and time-                      elements in these large telecommunication networks
consuming to elicit the requisite domain knowledge                             have some self-diagnostic capabilities that permit them
from experts. Data mining can be viewed as a means                             to generate both status and alarm messages. These
of automatically generating some of this knowledge                             streams of messages can be mined in order to support
directly from the data.                                                        network management functions, namely fault isolation
                                                                               and prediction.
                                                                                   The telecommunication industry faces a number of
BACKGROUND                                                                     data mining challenges. According to a Winter Cor-
                                                                               poration survey (2003), the three largest databases all
The data mining applications for any industry depend                           belong to telecommunication companies, with France
on two factors: the data that are available and the busi-                      Telecom, AT&T, and SBC having databases with 29, 26,
ness problems facing the industry. This section provides                       and 25 Terabytes, respectively. Thus, the scalability of
background information about the data maintained by                            data mining methods is a key concern. A second issue
telecommunications companies. The challenges as-                               is that telecommunication data is often in the form of
sociated with mining telecommunication data are also                           transactions/events and is not at the proper semantic
described in this section.                                                     level for data mining. For example, one typically wants
                                                                               to mine call detail data at the customer (i.e., phone-

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Data Mining in the Telecommunications Industry



line) level but the raw data represents individual phone    was MCI’s Friends and Family program. This program,
calls. Thus it is often necessary to aggregate data to      long since retired, began after marketing researchers         D
the appropriate semantic level (Sasisekharan, Seshadri      identified many small but well connected subgraphs
& Weiss, 1996) before mining the data. An alternative       in the graphs of calling activity (Han, Altman, Kumar,
is to utilize a data mining method that can operate on      Mannila & Pregibon, 2002). By offering reduced rates
the transactional data directly and extract sequential or   to customers in one’s calling circle, this marketing strat-
temporal patterns (Klemettinen, Mannila & Toivonen,         egy enabled the company to use their own customers
1999; Weiss & Hirsh, 1998).                                 as salesmen. This work can be considered an early use
    Another issue arises because much of the telecom-       of social-network analysis and link mining (Getoor &
munications data is generated in real-time and many         Diehl, 2005). A more recent example uses the interac-
telecommunication applications, such as fraud identi-       tions between consumers to identify those customers
fication and network fault detection, need to operate in    likely to adopt new telecommunication services (Hill,
real-time. Because of its efforts to address this issue,    Provost & Volinsky, 2006). A more traditional approach
the telecommunications industry has been a leader in        involves generating customer profiles (i.e., signatures)
the research area of mining data streams (Aggarwal,         from call detail records and then mining these profiles
2007). One way to handle data streams is to maintain a      for marketing purposes. This approach has been used
signature of the data, which is a summary description of    to identify whether a phone line is being used for voice
the data that can be updated quickly and incrementally.     or fax (Kaplan, Strauss & Szegedy, 1999) and to clas-
Cortes and Pregibon (2001) developed signature-based        sify a phone line as belonging to a either business or
methods and applied them to data streams of call detail     residential customer (Cortes & Pregibon, 1998).
records. A final issue with telecommunication data              Over the past few years, the emphasis of marketing
and the associated applications involves rarity. For        applications in the telecommunications industry has
example, both telecommunication fraud and network           shifted from identifying new customers to measuring
equipment failures are relatively rare. Predicting and      customer value and then taking steps to retain the most
identifying rare events has been shown to be quite dif-     profitable customers. This shift has occurred because it
ficult for many data mining algorithms (Weiss, 2004)        is much more expensive to acquire new telecommunica-
and therefore this issue must be handled carefully in       tion customers than retain existing ones. Thus it is useful
order to ensure reasonably good results.                    to know the total lifetime value of a customer, which
                                                            is the total net income a company can expect from that
                                                            customer over time. A variety of data mining methods
MAIN FOCUS                                                  are being used to model customer lifetime value for
                                                            telecommunication customers (Rosset, Neumann, Eick
Numerous data mining applications have been de-             & Vatnik, 2003; Freeman & Melli, 2006).
ployed in the telecommunications industry. However,             A key component of modeling a telecommunica-
most applications fall into one of the following three      tion customer’s value is estimating how long they
categories: marketing, fraud detection, and network         will remain with their current carrier. This problem
fault isolation and prediction.                             is of interest in its own right since if a company can
                                                            predict when a customer is likely to leave, it can take
Telecommunications Marketing                                proactive steps to retain the customer. The process of
                                                            a customer leaving a company is referred to as churn,
Telecommunication companies maintain an enormous            and churn analysis involves building a model of
amount of information about their customers and, due        customer attrition. Customer churn is a huge issue in
to an extremely competitive environment, have great         the telecommunication industry where, until recently,
motivation for exploiting this information. For these       telecommunication companies routinely offered large
reasons the telecommunications industry has been a          cash incentives for customers to switch carriers. Nu-
leader in the use of data mining to identify customers,     merous systems and methods have been developed to
retain customers, and maximize the profit obtained from     predict customer churn (Wei & Chin, 2002; Mozer,
each customer. Perhaps the most famous use of data          Wolniewicz, Grimes, Johnson, & Kaushansky, 2000;
mining to acquire new telecommunications customers          Mani, Drew, Betz & Datta, 1999; Masand, Datta, Mani,

                                                                                                                   487
Data Mining in the Telecommunications Industry



& Li, 1999). These systems almost always utilize call      was augmented by comparing the new calling be-
detail and contract data, but also often use other data    havior to profiles of generic fraud—and fraud is only
about the customer (credit score, complaint history,       signaled if the behavior matches one of these profiles.
etc.) in order to improve performance. Churn predic-       Customer level data can also aid in identifying fraud.
tion is fundamentally a very difficult problem and,        For example, price plan and credit rating information
consequently, systems for predicting churn have been       can be incorporated into the fraud analysis (Rosset,
only moderately effective—only demonstrating the           Murad, Neumann, Idan, & Pinkas, 1999). More recent
ability to identify some of the customers most likely      work using signatures has employed dynamic clustering
to churn (Masand et al., 1999).                            as well as deviation detection to detect fraud (Alves
                                                           et al., 2006). In this work, each signature was placed
Telecommunications Fraud Detection                         within a cluster and a change in cluster membership
                                                           was viewed as a potential indicator of fraud.
Fraud is very serious problem for telecommunica-               There are some methods for identifying fraud that
tion companies, resulting in billions of dollars of lost   do not involve comparing new behavior against a
revenue each year. Fraud can be divided into two           profile of old behavior. Perpetrators of fraud rarely
categories: subscription fraud and superimposition         work alone. For example, perpetrators of fraud often
fraud (Fawcett and Provost, 2002). Subscription fraud      act as brokers and sell illicit service to others—and the
occurs when a customer opens an account with the           illegal buyers will often use different accounts to call
intention of never paying the account and superim-         the same phone number again and again. Cortes and
position fraud occurs when a perpetrator gains illicit     Pregibon (2001) exploited this behavior by recognizing
access to the account of a legitimate customer. In this    that certain phone numbers are repeatedly called from
latter case, the fraudulent behavior will often occur      compromised accounts and that calls to these numbers
in parallel with legitimate customer behavior (i.e., is    are a strong indicator that the current account may
superimposed on it). Superimposition fraud has been        be compromised. A final method for detecting fraud
a much more significant problem for telecommunica-         exploits human pattern recognition skills. Cox, Eick
tion companies than subscription fraud. Ideally, both      & Wills (1997) built a suite of tools for visualizing
subscription fraud and superimposition fraud should        data that was tailored to show calling activity in such
be detected immediately and the associated customer        a way that unusual patterns are easily detected by us-
account deactivated or suspended. However, because it      ers. These tools were then used to identify international
is often difficult to distinguish between legitimate and   calling fraud.
illicit use with limited data, it is not always feasible
to detect fraud as soon as it begins. This problem is      Telecommunication Network Fault
compounded by the fact that there are substantial costs    Isolation and Prediction
associated with investigating fraud, as well as costs if
usage is mistakenly classified as fraudulent (e.g., an     Monitoring and maintaining telecommunication net-
annoyed customer).                                         works is an important task. As these networks became
    The most common technique for identifying super-       increasingly complex, expert systems were developed
imposition fraud is to compare the customer’s current      to handle the alarms generated by the network elements
calling behavior with a profile of his past usage, using   (Weiss, Ros & Singhal, 1998). However, because these
deviation detection and anomaly detection techniques.      systems are expensive to develop and keep current, data
The profile must be able to be quickly updated because     mining applications have been developed to identify
of the volume of call detail records and the need to       and predict network faults. Fault identification can be
identify fraud in a timely manner. Cortes and Pregi-       quite difficult because a single fault may result in a
bon (2001) generated a signature from a data stream        cascade of alarms—many of which are not associated
of call-detail records to concisely describe the calling   with the root cause of the problem. Thus an important
behavior of customers and then they used anomaly           part of fault identification is alarm correlation, which
detection to “measure the unusualness of a new call        enables multiple alarms to be recognized as being
relative to a particular account.” Because new behavior    related to a single fault.
does not necessarily imply fraud, this basic approach

488
Data Mining in the Telecommunications Industry



    The Telecommunication Alarm Sequence Analyzer           the telecommunications industry, but as telecommuni-
(TASA) is a data mining tool that aids with fault iden-     cation companies start providing television service to       D
tification by looking for frequently occurring temporal     the home and more sophisticated cell phone services
patterns of alarms (Klemettinen, Mannila & Toivonen,        become available (e.g., music, video, etc.), it is clear
1999). Patterns detected by this tool were then used to     that new data mining applications, such as recommender
help construct a rule-based alarm correlation system.       systems, will be developed and deployed.
Another effort, used to predict telecommunication               Unfortunately, there is also one troubling trend that
switch failures, employed a genetic algorithm to mine       has developed in recent years. This concerns the in-
historical alarm logs looking for predictive sequential     creasing belief that U.S. telecommunication companies
and temporal patterns (Weiss & Hirsh, 1998). One            are too readily sharing customer records with govern-
limitation with the approaches just described is that       mental agencies. This concern arose in 2006 due to
they ignore the structural information about the under-     revelations—made public in numerous newspaper and
lying network. The quality of the mined sequences can       magazine articles—that telecommunications companies
be improved if topological proximity constraints are        were turning over information on calling patterns to
considered in the data mining process (Devitt, Duffin       the National Security Agency (NSA) for purposes of
and Moloney, 2005) or if substructures in the telecom-      data mining (Krikke, 2006). If this concern continues to
munication data can be identified and exploited to allow    grow unchecked, it could lead to restrictions that limit
simpler, more useful, patterns to be learned (Baritchi,     the use of data mining for legitimate purposes.
Cook, & Lawrence, 2000). Another approach is to use
Bayesian Belief Networks to identify faults, since they
can reason about causes and effects (Sterritt, Adamson,     CONCLUSION
Shapcott & Curran, 2000).
                                                            The telecommunications industry has been one of the
                                                            early adopters of data mining and has deployed numer-
FUTURE TRENDS                                               ous data mining applications. The primary applications
                                                            relate to marketing, fraud detection, and network
Data mining should play an important and increasing         monitoring. Data mining in the telecommunications
role in the telecommunications industry due to the          industry faces several challenges, due to the size of
large amounts of high quality data available, the com-      the data sets, the sequential and temporal nature of the
petitive nature of the industry and the advances being      data, and the real-time requirements of many of the
made in data mining. In particular, advances in mining      applications. New methods have been developed and
data streams, mining sequential and temporal data,          existing methods have been enhanced to respond to
and predicting/classifying rare events should benefit       these challenges. The competitive and changing nature
the telecommunications industry. As these and other         of the industry, combined with the fact that the industry
advances are made, more reliance will be placed on the      generates enormous amounts of data, ensures that data
knowledge acquired through data mining and less on the      mining will play an important role in the future of the
knowledge acquired through the time-intensive process       telecommunications industry.
of eliciting domain knowledge from experts—although
we expect human experts will continue to play an
important role for some time to come.                       REFERENCES
    Changes in the nature of the telecommunications
industry will also lead to the development of new ap-       Aggarwal, C. (Ed.). (2007). Data Streams: Models and
plications and the demise of some current applications.     Algorithms. New York: Springer.
As an example, the main application of fraud detection
in the telecommunications industry used to be in cellular   Alves, R., Ferreira, P., Belo, O., Lopes, J., Ribeiro, J.,
cloning fraud, but this is no longer the case because the   Cortesao, L., & Martins, F. (2006). Discovering telecom
problem has been largely eliminated due to technologi-      fraud situations through mining anomalous behavior
cal advances in the cell phone authentication process.      patterns. Proceedings of the ACM SIGKDD Workshop
It is difficult to predict what future changes will face    on Data Mining for Business Applications (pp. 1-7).
                                                            New York: ACM Press.

                                                                                                                  489
Data Mining in the Telecommunications Industry



Baritchi, A., Cook, D., & Holder, L. (2000). Discov-       Liebowitz, J. (1988). Expert System Applications to
ering structural patterns in telecommunications data.      Telecommunications. New York, NY: John Wiley &
Proceedings of the Thirteenth Annual Florida AI Re-        Sons.
search Symposium (pp. 82-85).
                                                           Mani, D., Drew, J., Betz, A., & Datta, P (1999). Statistics
Cortes, C., & Pregibon, D (1998). Giga-mining. Pro-        and data mining techniques for lifetime value modeling.
ceedings of the Fourth International Conference on         Proceedings of the Fifth ACM SIGKDD International
Knowledge Discovery and Data Mining (pp. 174-178).         Conference on Knowledge Discovery and Data Mining
New York, NY: AAAI Press.                                  (pp. 94-103). New York, NY: ACM Press.
Cortes, C., & Pregibon, D. (2001). Signature-based         Masand, B., Datta, P., Mani, D., & Li, B. (1999).
methods for data streams. Data Mining and Knowledge        CHAMP: A prototype for automated cellular churn
Discovery, 5(3), 167-182.                                  prediction. Data Mining and Knowledge Discovery,
                                                           3(2), 219-225.
Cox, K., Eick, S., & Wills, G. (1997). Visual data min-
ing: Recognizing telephone calling fraud. Data Mining      Mozer, M., Wolniewicz, R., Grimes, D., Johnson, E.,
and Knowledge Discovery, 1(2), 225-231.                    & Kaushansky, H. (2000). Predicting subscriber dis-
                                                           satisfaction and improving retention in the wireless
Devitt, A., Duffin, J., & Moloney, R. (2005). Topo-
                                                           telecommunication industry. IEEE Transactions on
graphical proximity for mining network alarm data.
                                                           Neural Networks, 11, 690-696.
Proceedings of the 2005 ACM SIGCOMM Workshop
on Mining Network Data (pp. 179-184). New York:            Rosset, S., Murad, U., Neumann, E., Idan, Y., & Gadi,
ACM Press.                                                 P. (1999). Discovery of fraud rules for telecommu-
                                                           nications—challenges and solutions. Proceedings of
Fawcett, T., & Provost, F. (2002). Fraud Detection. In
                                                           the Fifth ACM SIGKDD International Conference on
W. Klosgen & J. Zytkow (Eds.), Handbook of Data
                                                           Knowledge Discovery and Data Mining (pp. 409-413).
Mining and Knowledge Discovery (pp. 726-731). New
                                                           New York: ACM Press.
York: Oxford University Press.
                                                           Rosset, S., Neumann, E., Eick, U., & Vatnik (2003).
Freeman, E., & Melli, G. (2006). Championing of
                                                           Customer lifetime value models for decision support.
an LTV model at LTC. SIGKDD Explorations, 8(1),
                                                           Data Mining and Knowledge Discovery, 7(3), 321-
27-32.
                                                           339.
Getoor, L., & Diehl, C.P. (2005). Link mining: A survey.
                                                           Sasisekharan, R., Seshadri, V., & Weiss, S (1996). Data
SIGKDD Explorations, 7(2), 3-12.
                                                           mining and forecasting in large-scale telecommunica-
Hill, S., Provost, F., & Volinsky, C. (2006). Network-     tion networks. IEEE Expert, 11(1), 37-43.
based marketing: Identifying likely adopters via con-
                                                           Sterritt, R., Adamson, K., Shapcott, C., & Curran, E.
sumer networks. Statistical Science, 21(2), 256-276.
                                                           (2000). Parallel data mining of Bayesian networks from
Kaplan, H., Strauss, M., & Szegedy, M. (1999). Just        telecommunication network data. Proceedings of the
the fax—differentiating voice and fax phone lines us-      14th International Parallel and Distributed Processing
ing call billing data. Proceedings of the Tenth Annual     Symposium, IEEE Computer Society.
ACM-SIAM Symposium on Discrete Algorithms (pp.
                                                           Wei, C., & Chiu, I (2002). Turning telecommunications
935-936). Philadelphia, PA: Society for Industrial and
                                                           call details to churn prediction: A data mining approach.
Applied Mathematics.
                                                           Expert Systems with Applications, 23(2), 103-112.
Klemettinen, M., Mannila, H., & Toivonen, H. (1999).
                                                           Weiss, G., & Hirsh, H (1998). Learning to predict rare
Rule discovery in telecommunication alarm data.
                                                           events in event sequences. In R. Agrawal & P. Stolorz
Journal of Network and Systems Management, 7(4),
                                                           (Eds.), Proceedings of the Fourth International Confer-
395-423.
                                                           ence on Knowledge Discovery and Data Mining (pp.
Krikke, J. (2006). Intelligent surveillance empowers       359-363). Menlo Park, CA: AAAI Press.
security analysts. IEEE Intelligent Systems, 21(3),
102-104.
490
Data Mining in the Telecommunications Industry



Weiss, G., Ros, J., & Singhal, A. (1998). ANSWER:            Call Detail Record: Contains the descriptive in-
Network monitoring using object-oriented rule. Pro-       formation associated with a single phone call.           D
ceedings of the Tenth Conference on Innovative Ap-
                                                              Churn: Customer attrition. Churn prediction refers
plications of Artificial Intelligence (pp. 1087-1093).
                                                          to the ability to predict that a customer will leave a
Menlo Park: AAAI Press.
                                                          company before the change actually occurs.
Weiss, G. (2004). Mining with rarity: A unifying frame-
                                                              Signature: A summary description of a subset of
work. SIGKDD Explorations, 6(1), 7-19.
                                                          data from a data stream that can be updated incremen-
Winter Corporation (2003). 2003 Top 10 Award Win-         tally and quickly.
ners. Retrieved October 8, 2005, from http://www.
                                                             Subscription Fraud: Occurs when a perpetrator
wintercorp.com/VLDB/2003_TopTen_Survey/Top-
                                                          opens an account with no intention of ever paying for
Tenwinners.asp
                                                          the services incurred.
                                                              Superimposition Fraud: Occurs when a perpetra-
                                                          tor gains illicit access to an account being used by a
KEY TERMS                                                 legitimate customer and where fraudulent use is “su-
                                                          perimposed” on top of legitimate use.
    Bayesian Belief Network: A model that predicts
the probability of events or conditions based on causal      Total Lifetime Value: The total net income a com-
relations.                                                pany can expect from a customer over time.




                                                                                                            491

Mais conteúdo relacionado

Mais procurados

Overview of telecommunications and network
Overview of telecommunications and networkOverview of telecommunications and network
Overview of telecommunications and networkAnkush Mehrotra
 
Mobile Network Capacity Issues
Mobile Network Capacity IssuesMobile Network Capacity Issues
Mobile Network Capacity IssuesPhilip Corsano
 
A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...
A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...
A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...ijwmn
 
Semantic Technology in BI
Semantic Technology in BISemantic Technology in BI
Semantic Technology in BIJohn Davies
 
Paper review on 5 g mobile technology
Paper review on 5 g mobile technology Paper review on 5 g mobile technology
Paper review on 5 g mobile technology Madhunath Yadav
 
Energy efficient information and communication infrastructures in the smart g...
Energy efficient information and communication infrastructures in the smart g...Energy efficient information and communication infrastructures in the smart g...
Energy efficient information and communication infrastructures in the smart g...redpel dot com
 
MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...
MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...
MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...Cisco Service Provider Mobility
 
J rendon its2007 4 g europe
J rendon its2007 4 g europeJ rendon its2007 4 g europe
J rendon its2007 4 g europefwe fwef
 
A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...
A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...
A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...IDES Editor
 
Performance Evaluation of VEnodeb Using Virtualized Radio Resource Management
Performance Evaluation of VEnodeb Using Virtualized Radio Resource ManagementPerformance Evaluation of VEnodeb Using Virtualized Radio Resource Management
Performance Evaluation of VEnodeb Using Virtualized Radio Resource ManagementJIEMS Akkalkuwa
 
Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)
Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)
Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)Desong Bian
 
LTE_Inbuilding_SWP
LTE_Inbuilding_SWPLTE_Inbuilding_SWP
LTE_Inbuilding_SWPBen O'Nolan
 
Wi Max Business Plan Sergio Cruzes V4
Wi Max Business Plan Sergio Cruzes V4Wi Max Business Plan Sergio Cruzes V4
Wi Max Business Plan Sergio Cruzes V4Sergio Cruzes
 
IRJET- Advantages of Mobile Cloud Computing
IRJET- Advantages of Mobile Cloud ComputingIRJET- Advantages of Mobile Cloud Computing
IRJET- Advantages of Mobile Cloud ComputingIRJET Journal
 
Brief introduction-about-5 g-mobile-technologies
Brief introduction-about-5 g-mobile-technologiesBrief introduction-about-5 g-mobile-technologies
Brief introduction-about-5 g-mobile-technologiesritusara
 

Mais procurados (20)

White paper openmtc
White paper openmtcWhite paper openmtc
White paper openmtc
 
Overview of telecommunications and network
Overview of telecommunications and networkOverview of telecommunications and network
Overview of telecommunications and network
 
Mobile Network Capacity Issues
Mobile Network Capacity IssuesMobile Network Capacity Issues
Mobile Network Capacity Issues
 
A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...
A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...
A SURVEY ON OPPORTUNISTIC ROUTING PROTOCOLS IN CELLULAR NETWORK FOR MOBILE DA...
 
Semantic Technology in BI
Semantic Technology in BISemantic Technology in BI
Semantic Technology in BI
 
Paper review on 5 g mobile technology
Paper review on 5 g mobile technology Paper review on 5 g mobile technology
Paper review on 5 g mobile technology
 
Energy efficient information and communication infrastructures in the smart g...
Energy efficient information and communication infrastructures in the smart g...Energy efficient information and communication infrastructures in the smart g...
Energy efficient information and communication infrastructures in the smart g...
 
Cam term paper
Cam term paperCam term paper
Cam term paper
 
MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...
MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...
MTC: When Machines Communicate (A New Hot Topic Taking Over the Industry) - a...
 
J rendon its2007 4 g europe
J rendon its2007 4 g europeJ rendon its2007 4 g europe
J rendon its2007 4 g europe
 
A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...
A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...
A Social Welfare Approach in Increasing the Benefits from the Internet in Dev...
 
Performance Evaluation of VEnodeb Using Virtualized Radio Resource Management
Performance Evaluation of VEnodeb Using Virtualized Radio Resource ManagementPerformance Evaluation of VEnodeb Using Virtualized Radio Resource Management
Performance Evaluation of VEnodeb Using Virtualized Radio Resource Management
 
Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)
Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)
Analysis of Communication Schemes for Advanced Metering Infrastructure (AMI)
 
LTE_Inbuilding_SWP
LTE_Inbuilding_SWPLTE_Inbuilding_SWP
LTE_Inbuilding_SWP
 
5G design concepts
5G design concepts5G design concepts
5G design concepts
 
Wi Max Business Plan Sergio Cruzes V4
Wi Max Business Plan Sergio Cruzes V4Wi Max Business Plan Sergio Cruzes V4
Wi Max Business Plan Sergio Cruzes V4
 
IRJET- Advantages of Mobile Cloud Computing
IRJET- Advantages of Mobile Cloud ComputingIRJET- Advantages of Mobile Cloud Computing
IRJET- Advantages of Mobile Cloud Computing
 
Fb34942946
Fb34942946Fb34942946
Fb34942946
 
SOA for Telecom | Torry Harris Whitepaper
SOA for Telecom | Torry Harris WhitepaperSOA for Telecom | Torry Harris Whitepaper
SOA for Telecom | Torry Harris Whitepaper
 
Brief introduction-about-5 g-mobile-technologies
Brief introduction-about-5 g-mobile-technologiesBrief introduction-about-5 g-mobile-technologies
Brief introduction-about-5 g-mobile-technologies
 

Destaque

JavaScriptライフを10倍楽しくする方法-HTML5fun-
 JavaScriptライフを10倍楽しくする方法-HTML5fun- JavaScriptライフを10倍楽しくする方法-HTML5fun-
JavaScriptライフを10倍楽しくする方法-HTML5fun-Masayuki Abe
 
ちゃんとWeb会議
ちゃんとWeb会議ちゃんとWeb会議
ちゃんとWeb会議Masayuki Abe
 
Past continuous 3reso
Past continuous 3resoPast continuous 3reso
Past continuous 3resoteacherhector
 
Unit13 organisational structure 02
Unit13 organisational structure 02Unit13 organisational structure 02
Unit13 organisational structure 02connor-sherwin
 
L’aparell digestiu
L’aparell  digestiuL’aparell  digestiu
L’aparell digestiujvila2345
 
Goal 3: Agency-wide Priority Measures
Goal 3: Agency-wide Priority MeasuresGoal 3: Agency-wide Priority Measures
Goal 3: Agency-wide Priority Measuresserviceresources
 
презентация элективного курса по истории
презентация элективного курса по историипрезентация элективного курса по истории
презентация элективного курса по историиloksal
 
Bahan presentase evaluasi ouap
Bahan presentase evaluasi ouapBahan presentase evaluasi ouap
Bahan presentase evaluasi ouapLitel Mejik
 
Jaarverslag 2011/2012
Jaarverslag 2011/2012Jaarverslag 2011/2012
Jaarverslag 2011/2012dewittenberg
 
Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?Hiroaki Kubota
 
Best Gift Presentation Fmcg 2012
Best Gift Presentation Fmcg 2012Best Gift Presentation Fmcg 2012
Best Gift Presentation Fmcg 2012Igor Kovanov
 
Broan nutone canada green initiatives 2006 2010
Broan nutone canada green initiatives 2006 2010Broan nutone canada green initiatives 2006 2010
Broan nutone canada green initiatives 2006 2010OssoElectric
 
Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)Hiroaki Kubota
 
No sql for sql professionals
No sql for sql professionalsNo sql for sql professionals
No sql for sql professionalsRic Centre
 

Destaque (20)

JavaScriptライフを10倍楽しくする方法-HTML5fun-
 JavaScriptライフを10倍楽しくする方法-HTML5fun- JavaScriptライフを10倍楽しくする方法-HTML5fun-
JavaScriptライフを10倍楽しくする方法-HTML5fun-
 
ちゃんとWeb会議
ちゃんとWeb会議ちゃんとWeb会議
ちゃんとWeb会議
 
Abstract
AbstractAbstract
Abstract
 
Past continuous 3reso
Past continuous 3resoPast continuous 3reso
Past continuous 3reso
 
Unit13 organisational structure 02
Unit13 organisational structure 02Unit13 organisational structure 02
Unit13 organisational structure 02
 
Google inc
Google incGoogle inc
Google inc
 
L’aparell digestiu
L’aparell  digestiuL’aparell  digestiu
L’aparell digestiu
 
Sales performance 2011
Sales performance 2011Sales performance 2011
Sales performance 2011
 
Goal 3: Agency-wide Priority Measures
Goal 3: Agency-wide Priority MeasuresGoal 3: Agency-wide Priority Measures
Goal 3: Agency-wide Priority Measures
 
презентация элективного курса по истории
презентация элективного курса по историипрезентация элективного курса по истории
презентация элективного курса по истории
 
Introduction by Dr K
Introduction by Dr KIntroduction by Dr K
Introduction by Dr K
 
Kitakyushu Smart City Master Plan
Kitakyushu Smart City Master PlanKitakyushu Smart City Master Plan
Kitakyushu Smart City Master Plan
 
Syro malankara rite
Syro malankara rite Syro malankara rite
Syro malankara rite
 
Bahan presentase evaluasi ouap
Bahan presentase evaluasi ouapBahan presentase evaluasi ouap
Bahan presentase evaluasi ouap
 
Jaarverslag 2011/2012
Jaarverslag 2011/2012Jaarverslag 2011/2012
Jaarverslag 2011/2012
 
Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?
 
Best Gift Presentation Fmcg 2012
Best Gift Presentation Fmcg 2012Best Gift Presentation Fmcg 2012
Best Gift Presentation Fmcg 2012
 
Broan nutone canada green initiatives 2006 2010
Broan nutone canada green initiatives 2006 2010Broan nutone canada green initiatives 2006 2010
Broan nutone canada green initiatives 2006 2010
 
Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)
 
No sql for sql professionals
No sql for sql professionalsNo sql for sql professionals
No sql for sql professionals
 

Semelhante a Telcom data mining

Securing Digital_Adams
Securing Digital_AdamsSecuring Digital_Adams
Securing Digital_AdamsJulius Adams
 
Data Mining in Telecommunication Industry
Data Mining in Telecommunication IndustryData Mining in Telecommunication Industry
Data Mining in Telecommunication Industryijsrd.com
 
Data mining in telecommunication industry
Data mining in telecommunication industryData mining in telecommunication industry
Data mining in telecommunication industryharshu966
 
Mobility report wireless technology supported by sm bs
Mobility report   wireless technology supported by sm bsMobility report   wireless technology supported by sm bs
Mobility report wireless technology supported by sm bsEnterprise Mobility Solutions
 
Curso: Redes y comunicaciones II: 02 CaaS, NaaS
Curso: Redes y comunicaciones II: 02 CaaS, NaaSCurso: Redes y comunicaciones II: 02 CaaS, NaaS
Curso: Redes y comunicaciones II: 02 CaaS, NaaSJack Daniel Cáceres Meza
 
Data Mining in telecommunication industry
Data Mining in telecommunication industryData Mining in telecommunication industry
Data Mining in telecommunication industrypragya ratan
 
Trends in telco
Trends in telcoTrends in telco
Trends in telcoEMC
 
Unleashing the Power of Telecom Network Security.pdf
Unleashing the Power of Telecom Network Security.pdfUnleashing the Power of Telecom Network Security.pdf
Unleashing the Power of Telecom Network Security.pdfSecurityGen1
 
Strengthening Your Network Against Future Incidents with SecurityGen
Strengthening Your Network Against Future Incidents with SecurityGenStrengthening Your Network Against Future Incidents with SecurityGen
Strengthening Your Network Against Future Incidents with SecurityGenSecurityGen1
 
Telecom Resilience: Strengthening Networks through Cybersecurity Vigilance
Telecom Resilience: Strengthening Networks through Cybersecurity VigilanceTelecom Resilience: Strengthening Networks through Cybersecurity Vigilance
Telecom Resilience: Strengthening Networks through Cybersecurity VigilanceSecurityGen1
 
A survey study of title security and privacy in mobile systems
A survey study of title security and privacy in mobile systemsA survey study of title security and privacy in mobile systems
A survey study of title security and privacy in mobile systemsKavita Rastogi
 
Cellular Core Enterprise White Paper by Rethink Technology Research
Cellular Core Enterprise White Paper by Rethink Technology ResearchCellular Core Enterprise White Paper by Rethink Technology Research
Cellular Core Enterprise White Paper by Rethink Technology ResearchAndy Odgers
 
The Case For Compaq Mediation Technology Hp
The Case For Compaq Mediation Technology HpThe Case For Compaq Mediation Technology Hp
The Case For Compaq Mediation Technology HpJan Lohmann, PhD
 
Real-time telecom revenue assurence
Real-time telecom revenue assurenceReal-time telecom revenue assurence
Real-time telecom revenue assurencemustafa sarac
 
Software Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdf
Software Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdfSoftware Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdf
Software Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdfMatthew Allen
 

Semelhante a Telcom data mining (20)

Securing Digital_Adams
Securing Digital_AdamsSecuring Digital_Adams
Securing Digital_Adams
 
Data Mining in Telecommunication Industry
Data Mining in Telecommunication IndustryData Mining in Telecommunication Industry
Data Mining in Telecommunication Industry
 
Etiya White Paper_ABDR
Etiya White Paper_ABDREtiya White Paper_ABDR
Etiya White Paper_ABDR
 
Data mining in telecommunication industry
Data mining in telecommunication industryData mining in telecommunication industry
Data mining in telecommunication industry
 
Mobility report wireless technology supported by sm bs
Mobility report   wireless technology supported by sm bsMobility report   wireless technology supported by sm bs
Mobility report wireless technology supported by sm bs
 
Curso: Redes y comunicaciones II: 02 CaaS, NaaS
Curso: Redes y comunicaciones II: 02 CaaS, NaaSCurso: Redes y comunicaciones II: 02 CaaS, NaaS
Curso: Redes y comunicaciones II: 02 CaaS, NaaS
 
Data Mining in telecommunication industry
Data Mining in telecommunication industryData Mining in telecommunication industry
Data Mining in telecommunication industry
 
Trends in telco
Trends in telcoTrends in telco
Trends in telco
 
Unleashing the Power of Telecom Network Security.pdf
Unleashing the Power of Telecom Network Security.pdfUnleashing the Power of Telecom Network Security.pdf
Unleashing the Power of Telecom Network Security.pdf
 
Strengthening Your Network Against Future Incidents with SecurityGen
Strengthening Your Network Against Future Incidents with SecurityGenStrengthening Your Network Against Future Incidents with SecurityGen
Strengthening Your Network Against Future Incidents with SecurityGen
 
Telecom Resilience: Strengthening Networks through Cybersecurity Vigilance
Telecom Resilience: Strengthening Networks through Cybersecurity VigilanceTelecom Resilience: Strengthening Networks through Cybersecurity Vigilance
Telecom Resilience: Strengthening Networks through Cybersecurity Vigilance
 
Sigfox whitepaper
Sigfox whitepaperSigfox whitepaper
Sigfox whitepaper
 
A survey study of title security and privacy in mobile systems
A survey study of title security and privacy in mobile systemsA survey study of title security and privacy in mobile systems
A survey study of title security and privacy in mobile systems
 
Cellular Core Enterprise White Paper by Rethink Technology Research
Cellular Core Enterprise White Paper by Rethink Technology ResearchCellular Core Enterprise White Paper by Rethink Technology Research
Cellular Core Enterprise White Paper by Rethink Technology Research
 
Big Data and the Telecom Industry
Big Data and the Telecom IndustryBig Data and the Telecom Industry
Big Data and the Telecom Industry
 
The Case For Compaq Mediation Technology Hp
The Case For Compaq Mediation Technology HpThe Case For Compaq Mediation Technology Hp
The Case For Compaq Mediation Technology Hp
 
The Stupid Network
The Stupid NetworkThe Stupid Network
The Stupid Network
 
Essay On ENCE
Essay On ENCEEssay On ENCE
Essay On ENCE
 
Real-time telecom revenue assurence
Real-time telecom revenue assurenceReal-time telecom revenue assurence
Real-time telecom revenue assurence
 
Software Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdf
Software Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdfSoftware Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdf
Software Quality Assurance in the Telecom Industry - Whitepaper - HeadSpin.pdf
 

Telcom data mining

  • 1. 486 Section: Service Data Mining in the Telecommunications Industry Gary M. Weiss Fordham University, USA INTRODUCTION Telecommunication companies maintain data about the phone calls that traverse their networks in the form of The telecommunications industry was one of the first call detail records, which contain descriptive informa- to adopt data mining technology. This is most likely tion for each phone call. In 2001, AT&T long distance because telecommunication companies routinely gener- customers generated over 300 million call detail records ate and store enormous amounts of high-quality data, per day (Cortes & Pregibon, 2001) and, because call have a very large customer base, and operate in a detail records are kept online for several months, this rapidly changing and highly competitive environment. meant that billions of call detail records were readily Telecommunication companies utilize data mining to available for data mining. Call detail data is useful for improve their marketing efforts, identify fraud, and marketing and fraud detection applications. better manage their telecommunication networks. Telecommunication companies also maintain exten- However, these companies also face a number of data sive customer information, such as billing information, mining challenges due to the enormous size of their as well as information obtained from outside parties, data sets, the sequential and temporal aspects of their such as credit score information. This information data, and the need to predict very rare events—such as can be quite useful and often is combined with tele- customer fraud and network failures—in real-time. communication-specific data to improve the results The popularity of data mining in the telecommunica- of data mining. For example, while call detail data tions industry can be viewed as an extension of the use can be used to identify suspicious calling patterns, a of expert systems in the telecommunications industry customer’s credit score is often incorporated into the (Liebowitz, 1988). These systems were developed to analysis before determining the likelihood that fraud address the complexity associated with maintaining a is actually taking place. huge network infrastructure and the need to maximize Telecommunications companies also generate and network reliability while minimizing labor costs. The store an extensive amount of data related to the opera- problem with these expert systems is that they are ex- tion of their networks. This is because the network pensive to develop because it is both difficult and time- elements in these large telecommunication networks consuming to elicit the requisite domain knowledge have some self-diagnostic capabilities that permit them from experts. Data mining can be viewed as a means to generate both status and alarm messages. These of automatically generating some of this knowledge streams of messages can be mined in order to support directly from the data. network management functions, namely fault isolation and prediction. The telecommunication industry faces a number of BACKGROUND data mining challenges. According to a Winter Cor- poration survey (2003), the three largest databases all The data mining applications for any industry depend belong to telecommunication companies, with France on two factors: the data that are available and the busi- Telecom, AT&T, and SBC having databases with 29, 26, ness problems facing the industry. This section provides and 25 Terabytes, respectively. Thus, the scalability of background information about the data maintained by data mining methods is a key concern. A second issue telecommunications companies. The challenges as- is that telecommunication data is often in the form of sociated with mining telecommunication data are also transactions/events and is not at the proper semantic described in this section. level for data mining. For example, one typically wants to mine call detail data at the customer (i.e., phone- Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
  • 2. Data Mining in the Telecommunications Industry line) level but the raw data represents individual phone was MCI’s Friends and Family program. This program, calls. Thus it is often necessary to aggregate data to long since retired, began after marketing researchers D the appropriate semantic level (Sasisekharan, Seshadri identified many small but well connected subgraphs & Weiss, 1996) before mining the data. An alternative in the graphs of calling activity (Han, Altman, Kumar, is to utilize a data mining method that can operate on Mannila & Pregibon, 2002). By offering reduced rates the transactional data directly and extract sequential or to customers in one’s calling circle, this marketing strat- temporal patterns (Klemettinen, Mannila & Toivonen, egy enabled the company to use their own customers 1999; Weiss & Hirsh, 1998). as salesmen. This work can be considered an early use Another issue arises because much of the telecom- of social-network analysis and link mining (Getoor & munications data is generated in real-time and many Diehl, 2005). A more recent example uses the interac- telecommunication applications, such as fraud identi- tions between consumers to identify those customers fication and network fault detection, need to operate in likely to adopt new telecommunication services (Hill, real-time. Because of its efforts to address this issue, Provost & Volinsky, 2006). A more traditional approach the telecommunications industry has been a leader in involves generating customer profiles (i.e., signatures) the research area of mining data streams (Aggarwal, from call detail records and then mining these profiles 2007). One way to handle data streams is to maintain a for marketing purposes. This approach has been used signature of the data, which is a summary description of to identify whether a phone line is being used for voice the data that can be updated quickly and incrementally. or fax (Kaplan, Strauss & Szegedy, 1999) and to clas- Cortes and Pregibon (2001) developed signature-based sify a phone line as belonging to a either business or methods and applied them to data streams of call detail residential customer (Cortes & Pregibon, 1998). records. A final issue with telecommunication data Over the past few years, the emphasis of marketing and the associated applications involves rarity. For applications in the telecommunications industry has example, both telecommunication fraud and network shifted from identifying new customers to measuring equipment failures are relatively rare. Predicting and customer value and then taking steps to retain the most identifying rare events has been shown to be quite dif- profitable customers. This shift has occurred because it ficult for many data mining algorithms (Weiss, 2004) is much more expensive to acquire new telecommunica- and therefore this issue must be handled carefully in tion customers than retain existing ones. Thus it is useful order to ensure reasonably good results. to know the total lifetime value of a customer, which is the total net income a company can expect from that customer over time. A variety of data mining methods MAIN FOCUS are being used to model customer lifetime value for telecommunication customers (Rosset, Neumann, Eick Numerous data mining applications have been de- & Vatnik, 2003; Freeman & Melli, 2006). ployed in the telecommunications industry. However, A key component of modeling a telecommunica- most applications fall into one of the following three tion customer’s value is estimating how long they categories: marketing, fraud detection, and network will remain with their current carrier. This problem fault isolation and prediction. is of interest in its own right since if a company can predict when a customer is likely to leave, it can take Telecommunications Marketing proactive steps to retain the customer. The process of a customer leaving a company is referred to as churn, Telecommunication companies maintain an enormous and churn analysis involves building a model of amount of information about their customers and, due customer attrition. Customer churn is a huge issue in to an extremely competitive environment, have great the telecommunication industry where, until recently, motivation for exploiting this information. For these telecommunication companies routinely offered large reasons the telecommunications industry has been a cash incentives for customers to switch carriers. Nu- leader in the use of data mining to identify customers, merous systems and methods have been developed to retain customers, and maximize the profit obtained from predict customer churn (Wei & Chin, 2002; Mozer, each customer. Perhaps the most famous use of data Wolniewicz, Grimes, Johnson, & Kaushansky, 2000; mining to acquire new telecommunications customers Mani, Drew, Betz & Datta, 1999; Masand, Datta, Mani, 487
  • 3. Data Mining in the Telecommunications Industry & Li, 1999). These systems almost always utilize call was augmented by comparing the new calling be- detail and contract data, but also often use other data havior to profiles of generic fraud—and fraud is only about the customer (credit score, complaint history, signaled if the behavior matches one of these profiles. etc.) in order to improve performance. Churn predic- Customer level data can also aid in identifying fraud. tion is fundamentally a very difficult problem and, For example, price plan and credit rating information consequently, systems for predicting churn have been can be incorporated into the fraud analysis (Rosset, only moderately effective—only demonstrating the Murad, Neumann, Idan, & Pinkas, 1999). More recent ability to identify some of the customers most likely work using signatures has employed dynamic clustering to churn (Masand et al., 1999). as well as deviation detection to detect fraud (Alves et al., 2006). In this work, each signature was placed Telecommunications Fraud Detection within a cluster and a change in cluster membership was viewed as a potential indicator of fraud. Fraud is very serious problem for telecommunica- There are some methods for identifying fraud that tion companies, resulting in billions of dollars of lost do not involve comparing new behavior against a revenue each year. Fraud can be divided into two profile of old behavior. Perpetrators of fraud rarely categories: subscription fraud and superimposition work alone. For example, perpetrators of fraud often fraud (Fawcett and Provost, 2002). Subscription fraud act as brokers and sell illicit service to others—and the occurs when a customer opens an account with the illegal buyers will often use different accounts to call intention of never paying the account and superim- the same phone number again and again. Cortes and position fraud occurs when a perpetrator gains illicit Pregibon (2001) exploited this behavior by recognizing access to the account of a legitimate customer. In this that certain phone numbers are repeatedly called from latter case, the fraudulent behavior will often occur compromised accounts and that calls to these numbers in parallel with legitimate customer behavior (i.e., is are a strong indicator that the current account may superimposed on it). Superimposition fraud has been be compromised. A final method for detecting fraud a much more significant problem for telecommunica- exploits human pattern recognition skills. Cox, Eick tion companies than subscription fraud. Ideally, both & Wills (1997) built a suite of tools for visualizing subscription fraud and superimposition fraud should data that was tailored to show calling activity in such be detected immediately and the associated customer a way that unusual patterns are easily detected by us- account deactivated or suspended. However, because it ers. These tools were then used to identify international is often difficult to distinguish between legitimate and calling fraud. illicit use with limited data, it is not always feasible to detect fraud as soon as it begins. This problem is Telecommunication Network Fault compounded by the fact that there are substantial costs Isolation and Prediction associated with investigating fraud, as well as costs if usage is mistakenly classified as fraudulent (e.g., an Monitoring and maintaining telecommunication net- annoyed customer). works is an important task. As these networks became The most common technique for identifying super- increasingly complex, expert systems were developed imposition fraud is to compare the customer’s current to handle the alarms generated by the network elements calling behavior with a profile of his past usage, using (Weiss, Ros & Singhal, 1998). However, because these deviation detection and anomaly detection techniques. systems are expensive to develop and keep current, data The profile must be able to be quickly updated because mining applications have been developed to identify of the volume of call detail records and the need to and predict network faults. Fault identification can be identify fraud in a timely manner. Cortes and Pregi- quite difficult because a single fault may result in a bon (2001) generated a signature from a data stream cascade of alarms—many of which are not associated of call-detail records to concisely describe the calling with the root cause of the problem. Thus an important behavior of customers and then they used anomaly part of fault identification is alarm correlation, which detection to “measure the unusualness of a new call enables multiple alarms to be recognized as being relative to a particular account.” Because new behavior related to a single fault. does not necessarily imply fraud, this basic approach 488
  • 4. Data Mining in the Telecommunications Industry The Telecommunication Alarm Sequence Analyzer the telecommunications industry, but as telecommuni- (TASA) is a data mining tool that aids with fault iden- cation companies start providing television service to D tification by looking for frequently occurring temporal the home and more sophisticated cell phone services patterns of alarms (Klemettinen, Mannila & Toivonen, become available (e.g., music, video, etc.), it is clear 1999). Patterns detected by this tool were then used to that new data mining applications, such as recommender help construct a rule-based alarm correlation system. systems, will be developed and deployed. Another effort, used to predict telecommunication Unfortunately, there is also one troubling trend that switch failures, employed a genetic algorithm to mine has developed in recent years. This concerns the in- historical alarm logs looking for predictive sequential creasing belief that U.S. telecommunication companies and temporal patterns (Weiss & Hirsh, 1998). One are too readily sharing customer records with govern- limitation with the approaches just described is that mental agencies. This concern arose in 2006 due to they ignore the structural information about the under- revelations—made public in numerous newspaper and lying network. The quality of the mined sequences can magazine articles—that telecommunications companies be improved if topological proximity constraints are were turning over information on calling patterns to considered in the data mining process (Devitt, Duffin the National Security Agency (NSA) for purposes of and Moloney, 2005) or if substructures in the telecom- data mining (Krikke, 2006). If this concern continues to munication data can be identified and exploited to allow grow unchecked, it could lead to restrictions that limit simpler, more useful, patterns to be learned (Baritchi, the use of data mining for legitimate purposes. Cook, & Lawrence, 2000). Another approach is to use Bayesian Belief Networks to identify faults, since they can reason about causes and effects (Sterritt, Adamson, CONCLUSION Shapcott & Curran, 2000). The telecommunications industry has been one of the early adopters of data mining and has deployed numer- FUTURE TRENDS ous data mining applications. The primary applications relate to marketing, fraud detection, and network Data mining should play an important and increasing monitoring. Data mining in the telecommunications role in the telecommunications industry due to the industry faces several challenges, due to the size of large amounts of high quality data available, the com- the data sets, the sequential and temporal nature of the petitive nature of the industry and the advances being data, and the real-time requirements of many of the made in data mining. In particular, advances in mining applications. New methods have been developed and data streams, mining sequential and temporal data, existing methods have been enhanced to respond to and predicting/classifying rare events should benefit these challenges. The competitive and changing nature the telecommunications industry. As these and other of the industry, combined with the fact that the industry advances are made, more reliance will be placed on the generates enormous amounts of data, ensures that data knowledge acquired through data mining and less on the mining will play an important role in the future of the knowledge acquired through the time-intensive process telecommunications industry. of eliciting domain knowledge from experts—although we expect human experts will continue to play an important role for some time to come. REFERENCES Changes in the nature of the telecommunications industry will also lead to the development of new ap- Aggarwal, C. (Ed.). (2007). Data Streams: Models and plications and the demise of some current applications. Algorithms. New York: Springer. As an example, the main application of fraud detection in the telecommunications industry used to be in cellular Alves, R., Ferreira, P., Belo, O., Lopes, J., Ribeiro, J., cloning fraud, but this is no longer the case because the Cortesao, L., & Martins, F. (2006). Discovering telecom problem has been largely eliminated due to technologi- fraud situations through mining anomalous behavior cal advances in the cell phone authentication process. patterns. Proceedings of the ACM SIGKDD Workshop It is difficult to predict what future changes will face on Data Mining for Business Applications (pp. 1-7). New York: ACM Press. 489
  • 5. Data Mining in the Telecommunications Industry Baritchi, A., Cook, D., & Holder, L. (2000). Discov- Liebowitz, J. (1988). Expert System Applications to ering structural patterns in telecommunications data. Telecommunications. New York, NY: John Wiley & Proceedings of the Thirteenth Annual Florida AI Re- Sons. search Symposium (pp. 82-85). Mani, D., Drew, J., Betz, A., & Datta, P (1999). Statistics Cortes, C., & Pregibon, D (1998). Giga-mining. Pro- and data mining techniques for lifetime value modeling. ceedings of the Fourth International Conference on Proceedings of the Fifth ACM SIGKDD International Knowledge Discovery and Data Mining (pp. 174-178). Conference on Knowledge Discovery and Data Mining New York, NY: AAAI Press. (pp. 94-103). New York, NY: ACM Press. Cortes, C., & Pregibon, D. (2001). Signature-based Masand, B., Datta, P., Mani, D., & Li, B. (1999). methods for data streams. Data Mining and Knowledge CHAMP: A prototype for automated cellular churn Discovery, 5(3), 167-182. prediction. Data Mining and Knowledge Discovery, 3(2), 219-225. Cox, K., Eick, S., & Wills, G. (1997). Visual data min- ing: Recognizing telephone calling fraud. Data Mining Mozer, M., Wolniewicz, R., Grimes, D., Johnson, E., and Knowledge Discovery, 1(2), 225-231. & Kaushansky, H. (2000). Predicting subscriber dis- satisfaction and improving retention in the wireless Devitt, A., Duffin, J., & Moloney, R. (2005). Topo- telecommunication industry. IEEE Transactions on graphical proximity for mining network alarm data. Neural Networks, 11, 690-696. Proceedings of the 2005 ACM SIGCOMM Workshop on Mining Network Data (pp. 179-184). New York: Rosset, S., Murad, U., Neumann, E., Idan, Y., & Gadi, ACM Press. P. (1999). Discovery of fraud rules for telecommu- nications—challenges and solutions. Proceedings of Fawcett, T., & Provost, F. (2002). Fraud Detection. In the Fifth ACM SIGKDD International Conference on W. Klosgen & J. Zytkow (Eds.), Handbook of Data Knowledge Discovery and Data Mining (pp. 409-413). Mining and Knowledge Discovery (pp. 726-731). New New York: ACM Press. York: Oxford University Press. Rosset, S., Neumann, E., Eick, U., & Vatnik (2003). Freeman, E., & Melli, G. (2006). Championing of Customer lifetime value models for decision support. an LTV model at LTC. SIGKDD Explorations, 8(1), Data Mining and Knowledge Discovery, 7(3), 321- 27-32. 339. Getoor, L., & Diehl, C.P. (2005). Link mining: A survey. Sasisekharan, R., Seshadri, V., & Weiss, S (1996). Data SIGKDD Explorations, 7(2), 3-12. mining and forecasting in large-scale telecommunica- Hill, S., Provost, F., & Volinsky, C. (2006). Network- tion networks. IEEE Expert, 11(1), 37-43. based marketing: Identifying likely adopters via con- Sterritt, R., Adamson, K., Shapcott, C., & Curran, E. sumer networks. Statistical Science, 21(2), 256-276. (2000). Parallel data mining of Bayesian networks from Kaplan, H., Strauss, M., & Szegedy, M. (1999). Just telecommunication network data. Proceedings of the the fax—differentiating voice and fax phone lines us- 14th International Parallel and Distributed Processing ing call billing data. Proceedings of the Tenth Annual Symposium, IEEE Computer Society. ACM-SIAM Symposium on Discrete Algorithms (pp. Wei, C., & Chiu, I (2002). Turning telecommunications 935-936). Philadelphia, PA: Society for Industrial and call details to churn prediction: A data mining approach. Applied Mathematics. Expert Systems with Applications, 23(2), 103-112. Klemettinen, M., Mannila, H., & Toivonen, H. (1999). Weiss, G., & Hirsh, H (1998). Learning to predict rare Rule discovery in telecommunication alarm data. events in event sequences. In R. Agrawal & P. Stolorz Journal of Network and Systems Management, 7(4), (Eds.), Proceedings of the Fourth International Confer- 395-423. ence on Knowledge Discovery and Data Mining (pp. Krikke, J. (2006). Intelligent surveillance empowers 359-363). Menlo Park, CA: AAAI Press. security analysts. IEEE Intelligent Systems, 21(3), 102-104. 490
  • 6. Data Mining in the Telecommunications Industry Weiss, G., Ros, J., & Singhal, A. (1998). ANSWER: Call Detail Record: Contains the descriptive in- Network monitoring using object-oriented rule. Pro- formation associated with a single phone call. D ceedings of the Tenth Conference on Innovative Ap- Churn: Customer attrition. Churn prediction refers plications of Artificial Intelligence (pp. 1087-1093). to the ability to predict that a customer will leave a Menlo Park: AAAI Press. company before the change actually occurs. Weiss, G. (2004). Mining with rarity: A unifying frame- Signature: A summary description of a subset of work. SIGKDD Explorations, 6(1), 7-19. data from a data stream that can be updated incremen- Winter Corporation (2003). 2003 Top 10 Award Win- tally and quickly. ners. Retrieved October 8, 2005, from http://www. Subscription Fraud: Occurs when a perpetrator wintercorp.com/VLDB/2003_TopTen_Survey/Top- opens an account with no intention of ever paying for Tenwinners.asp the services incurred. Superimposition Fraud: Occurs when a perpetra- tor gains illicit access to an account being used by a KEY TERMS legitimate customer and where fraudulent use is “su- perimposed” on top of legitimate use. Bayesian Belief Network: A model that predicts the probability of events or conditions based on causal Total Lifetime Value: The total net income a com- relations. pany can expect from a customer over time. 491