SlideShare uma empresa Scribd logo
1 de 7
Baixar para ler offline
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

FRAUD DETECTION IN ELECTRIC POWER
DISTRIBUTION NETWORKS USING AN ANN-BASED
KNOWLEDGE-DISCOVERY PROCESS
Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras
O. Eler
PDITec, Belo Horizonte, Brazil – www.pditec.com.br

ABSTRACT
Nowadays the electric utilities have to handle problems with the non-technical losses caused by frauds and
thefts committed by some of their consumers. In order to minimize this, some methodologies have been
created to perform the detection of consumers that might be fraudsters. In this context, the use of
classification techniques can improve the hit rate of the fraud detection and increase the financial income.
This paper proposes the use of the knowledge-discovery in databases process based on artificial neural
networks applied to the classifying process of consumers to be inspected. An experiment performed in a
Brazilian electric power distribution company indicated an improvement of over 50% of the proposed
approach if compared to the previous methods used by that company.

KEYWORDS
Fraud Detection, Electric Power Distribution, Low-Voltage, KDD, Artificial Neural Networks

1.

INTRODUCTION

The losses in electric power distribution networks are a common reality for the electric utilities
and can be classified as technical or non-technical losses, where the technical loss corresponds to
the electrical energy dissipated between the energy supply and the delivery points of the
consumer. On the other hand, a non-technical loss is the difference between the total losses and
the technical losses, i.e. all other losses associated to the electric power distribution, such as
energy theft, metering errors, errors in the billing process, etc. This loss type is directly related to
the distributor’s commercial management [1].
One of the usual ways to decrease non-technical losses rates is to perform local inspections to
check if there are any thefts or hoaxes being committed by the consumers. For that purpose, the
field staff is responsible for creating a methodology to generate an inspection schedule with
suspected consumers. However, this task can be a too complex on due to the large amount of
existing consumers in an electric power distribution network. Whenever a field staff performs an
inspection, it returns only two possible outcomes: consumer is fraudster (there is irregularity) or
non-fraudster (there is no irregularity).
Therefore, whenever a fraud is found, appropriate measures are applied and the situation returns
to normal. Nevertheless, if a fraud is not found some costs such the inspector’s hours and vehicles
costs are unnecessarily spent, instead of being applied to inspect others consumers that are
committing energy frauds. Increasing the hit rate of identifying irregularities is needed to reduce
DOI : 10.5121/ijaia.2013.4602

17
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

costs in the fraud detection, saving operational costs, even more importantly, identifying the
fraudsters of the distribution network.
Some studies address this kind of problem applying well known knowledge-discovery techniques.
Cabral et al. [2] [4] proposed a fraud detector for low and high voltage consumers. At that study,
a knowledge-discovery methodology with artificial intelligence for data preprocessing and
mining was successfully applied in different scenarios. Monedero et al. [3] developed an
approach to identify non-technical losses in electric power distribution networks using artificial
neural networks applied to the city of Seville (Spain), outperforming by over 50% precision
compared to the previous methodologies.
Muniz et al. [5] presented an approach combining artificial neural networks and neuron-fuzzy
systems to identify irregularities of the low voltage consumers. J. Nagi et al. [1] proposed a
framework to detect non-technical losses in power distribution companies using pattern
recognition by means of Support Vector Machines (SVM). E. W. S. dos Angelos [8] presented a
cluster-based classification strategy with an unsupervised algorithm of two steps to identify
suspected profiles of power consumption, providing a good assertiveness in real life systems.
Over the past years, other studies in this field have been addressed applying different
computational techniques to improve the detection of non-technical losses [9-13].
Although some studies have been conducted in the recent years in this field, the fraud rate in
Brazilian low-voltage consumers is still very high, especially in metropolitan areas. There are still
many electric power companies using only human expert knowledge to treat this problem and the
results has been unsatisfactory. This paper addresses the fraud detection problem in electric
power distribution networks (low-voltage consumers). Our methodology is based on the
knowledge-discovery process using artificial neural networks to classify consumers as fraudsters
or non-fraudsters. Thereafter a list of consumers classified as fraudsters is generated to help
perform the inspections with the main objective being the improvement of the hit rate of
inspections to reduce unnecessary operational costs.
This paper is structured as follows: the Section 2 presents the methodology used in the study. In
the Section 3, the experiment and results are shown, and Section 4 presents the final discussions
and conclusions.
2.

METHODOLOGY

Our methodology is based on knowledge-discovery in databases (KDD) process using data
mining, commonly used to extract previously unknown, non-trivial, and useful patterns from
different data sources. The steps are presented in the Figure 1.

18
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

Figure 1: KDD process [6]

This process contains three main steps: cleaning and integration, selection and transformation,
and data mining. In this case, they were applied to select consumers for inspection against
consumption irregularities in electric power distribution networks. The development is presented
in the next subsections.

2.1. Cleaning and Integration
In this step, different data sources obtained from databases and text files were processed to
generate a single and consistent database. Initially, seven data sources were considered: (1)
database of all consumers and their socio-economic characteristics; (2) history of inspections; (3)
historical consumption; (4) history of services requested by clients; (5) history of ownership
exchanges; (6) history of queries debits; and (7) history of meter reading.
All data sources were integrated using a common key: consumer installation code. Thus, it was
possible to remove incorrect or duplicate records, generating a single database from the seven
sources, used as input for the data mining algorithm (training and classification). In this context,
only the records with valid inspection results (fraudsters or non-fraudsters) were considered.

2.2. Selection and Transformation
In this step, statistical techniques were applied for data selection and transformation. The attribute
selection was performed using a multivariate correlation analysis and Information Gain method,
generating some key attributes of the model as shown in the Table 1.

19
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

Table 1: Key attributes of the model

#
1

Name
Location

Type
nominal

2

Business class

nominal

3

Activity type

nominal

4
5

Voltage
Number of
phases
Situation
Direct debit
Metering type
Mean
consumption
Service notes

nominal
nominal

nominal

12

Ownership
exchange
Query debits

nominal

13

Meter reading

nominal

14

Inspection
(output)

nominal

6
7
8
9
10

11

nominal
nominal
nominal
numeric
nominal

Description
Consumer location (city +
neighborhood)
Business class (e.g. residential,
industrial, commercial, among others)
Activity type (e.g. residence, drugstore,
bakery, public administration, among
others)
Consumer voltage (110v, 220v)
Number of phases (1, 2, 3)
Is consumer connected? (yes, no)
Type of direct debit
Type of electricity metering
Mean consumption during the previous
12 months
Are there service notes requested by
consumers during the previous 12
months? (Yes or no)
Are there ownership exchanges during
the previous 12 months? (Yes or no)
Are there query debits during the
previous 12 months? (Yes or no)
Are there meter reading notes during the
previous 12 months? (Yes or no)
The inspection result (Fraudster or Nonfraudster)

The data transformation was performed by normalization methods. The Min-Max method was
used to transform the single numerical attribute Mean Consumption, wherein each value is
converted to a range between 0 and 1. For the nominal attributes, two distinct methods were
applied: if the number of distinct values was less than five, the Binary method was applied;
otherwise the One-of-N method was used.

2.3. Data Mining
The problem addressed here is a supervised classification one because there are real inspections’
records to be used for the training step. An artificial neural network/multilayer perceptron (ANNMLP) was used for the dataset training and classification. According to Haykin [7], the MLPs
using the Backpropagation algorithm have been successfully applied to solve many similar
complex problems, such as pattern recognition, classification, data preprocessing, etc. Therefore,
we applied it to solve our problem with the configurations that are presented here.
The ANN-MLP’s architecture used three layers: input, hidden and output layers. The input layer
contains n neurons, wherein n is equal to number of bits of the normalized values. In the hidden
layer, the number of neurons is equal to the geometrical mean between the numbers of neurons in
the input and output layers [14]. The output layer contains a single neuron to classify the
consumers as fraudsters or non-fraudsters.
20
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

After defining the architecture, the data mining algorithm was coded using the
training/test/validation schema and the stratified k-Fold Cross Validation method. In this case, k is
equal to 10, i.e., at the each iteration the dataset is randomly partitioned into 10 equal size subsets.
Of the 10 subsets, 9 are used as training data and a single is used for testing the model. This
process is then repeated k times, and the average error is calculated. This implementation requires
parameter settings to perform the experiments successfully.
3.

RESULTS AND DISCUSSION

The evaluation of the results was performed using a confusion matrix that compares real
inspections to classified inspections by the ANN-MLP, indicating four result types: (1) TP is a
fraudster consumer correctly classified as fraudster; (2) FN is a fraudster consumer incorrectly
classified as non-fraudster; (3) FP is a consumer non-fraudster incorrectly classified as fraudster;
(4) TN is a consumer non-fraudster correctly classified as non-fraudster.
Some measures are used to check the classifier efficiency: hit rate is the percentage of records
correctly classified; recall: TP / (TP+FN); precision = TP / (TP+FP); True Positive rate = TP /
(TP+FN); and False Positive rate = FP / (FP+TN).
This approach was evaluated using real data from a Brazilian electric power distribution company
with more than seven million consumers. A lot of inspections conducted in the metropolitan
region of Minas Gerais (Brazil) during the year 2011 were used to supply the KDD process.
The first step of our methodology was applied to clean and integrate databases and text files,
generating a single data source with 22,848 records. After that, the second step was applied to
select key attributes of the model and transform data to feed the ANN-MLP. Each record
consisted of thirteen input attributes and one output attribute indicating fraud or not, as seen in the
Section 2.2.
After that, the ANN-MLP was trained with Backpropagation learning algorithm using the
Logistic activation function for all neurons. The training step was performed for 500 epochs, with
learning rate equals to 0.01, momentum factor equals to 0.95, activation threshold equals to 0.5,
and maximum error equals to 0.001. The parameter settings were empirically defined.
The algorithm was performed in 903 seconds on a personal computer and the Table 2 shows the
classification results.
Table 2: Confusion matrix with the classification results

Fraudster (a)

Non-fraudster (b)

Classified as (a)

945

508

Classified as (b)

2261

17869

The classification results were obtained with an error rate of 0.3562 after all epochs, and the
performance measures were calculated from the confusion matrix. The accuracy is equal to
87.17%, precision is equal to 65.03% and recall is equal to 29.47%.
21
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

After obtaining the classification results, a list of inspections is generated to check if consumers
are fraudsters. The precision indicates the percentage of consumers correctly inspected by the
staff team. In this context, increasing of hit rate is essential to improve the number of fraudsters
found, and reducing the number of non-fraudsters inspected. This directly implies two
consequences: recovery of financial income and reducing of unnecessary operational costs.
According to the field measurement, currently the same electric power distribution company has
obtained a precision around 40% using different methods. Our methodology improved this
number to 65.03%, outperforming in 50% the current scenario. In other words, as previously
seen, we obtained 22,848 records of inspections from the year 2011, i.e. only 9,139 consumers
were correctly inspected by the company. At the same time, our methodology could obtain a
precision of 14,858 consumers – a difference of 5719 per year.
That difference of fraudsters consumers found represents the recovery of financial income in
three senses: (1) the recovery of value defrauded applying fines according to the fraud period, (2)
normalization of the consumers' status and the regularization of their consumption, and (3)
unnecessary expenses in incorrect inspections were avoided by reducing expenses with
employees, transportation costs and fuel consumption.
Another important point is the use of the k-Fold Cross Validation method, whose goal is to
prevent the ANN-MLP of over-fitting the training data. This allows the learning to be used for the
classification of consumers outside the training set (the real world case).
Although the algorithm used the stratified validation method, the unbalanced class problem is a
weakness that must be better handled, since the learning systems can find difficulties in the
classification of records belonging to minority class when the class distribution is unequal. This is
a need for the next steps of development of our methodology.
4.

CONCLUSION

This paper proposed the use of the KDD process for fraud detection in the power distribution
network (low-voltage consumers). For this purpose, a methodology with data preprocessing and
mining using artificial neural networks was applied. The results of the experiment indicated that
is possible to improve the hit rate obtained by the methods used currently for the classification of
fraudsters.
The experiment was performed using only historical data and a deeper evaluation in field is
needed to ensure the efficiency of our approach. Although this has not been done, we used the kFold Cross Validation method for assessing how the results will generalize to an independent data
set. Furthermore, we intend on testing different supervised classification techniques and compare
theirs results.

22
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013

REFERENCES
[1]
[2]

[3]

[4]
[5]
[6]
[7]
[8]

[9]

[10]

[11]
[12]
[13]
[14]

J. Nagi et al., (2010) “Nontechnical Loss Detection for Metered Customers in Power Utility Using
Support Vector Machines”, IEEE Transactions on Power Delivery, Vol. 25, N. 2, Apr 2010.
Cabral et al., (2004) “Fraud Detection in Electrical Energy Consumers Using Rough Sets”, 2004
IEEE International Conference on Systems, Man and Cybernetics: The Hague, Netherlands, Oct 1013.
Monedero et al., (2006) “MIDAS Detection of Non-technical Losses in Electrical Consumption Using
Neural Networks and Statistical Techniques”, In proceeding of: Computational Science and Its
Applications - ICCSA 2006, International Conference, Glasgow, UK, May 8-11.
Cabral et al., (2009) “Fraud Detection System for High and Low Voltage Electricity Consumers
Based on Data Mining”, Power and Energy Society General Meeting, Calgary, Jul 26-30.
Muniz et al., (2009) “A Neuron-fuzzy System for Fraud Detection in Electricity Distribution”,
Proceedings of the IFSA-EUSFLAT 2009, Lisbon, Portugal, Jul 20-24.
J. Han, M. Kamber, J. Pei, (2011) “Data mining: concepts and techniques”, Morgan Kaufmann, 3rd
ed.
S. Haykin, (1999) “Neural Networks: A Comprehensive Foundation”, Prentice Hall International, 2nd
ed.
E. W. S. dos Angelos et al., (2011) “Detection and Identification of Abnormalities in Customer
Consumptions in Power Distribution Systems”, IEEE Transactions on Power Delivery, Vol. 26, N. 4,
Oct 2011.
C. C. O. Ramos et al., (2012) “New Insights on Nontechnical Losses Characterization Through
Evolutionary-Based Feature Selection”, IEEE Transactions on Power Delivery, Vol. 27, N. 1, Jan
2012.
A. H. Nizar, Z. Y. Dong, Y. Wang, (2008) "Power Utility Nontechnical Loss Analysis With Extreme
Learning Machine Method", IEEE Transactions on Power Systems, Vol. 23, N. 3, pp.946-955, Aug
2008.
J. Nagi et al., (2008) "Detection of abnormalities and electricity theft using genetic Support Vector
Machines" TENCON 2008 - 2008 IEEE Region 10 Conference, pp.1-6, Nov 2008.
J. Nagi et al., (2008) "Non-Technical Loss analysis for detection of electricity theft using support
vector machines", IEEE 2nd International Power and Energy Conference, pp.907-912, 1-3 Dec 2008.
A. C. Ramos et al., (2011) "A New Approach for Nontechnical Losses Detection Based on OptimumPath Forest", IEEE Transactions on Power Systems, Vol. 26, N. 1, pp.181-189, Feb 2011.
M. Negnevitsky, “Artificial Intelligence: A Guide to Intelligent Systems”, Pearson Education Canada,
3rd ed.

23

Mais conteúdo relacionado

Mais procurados

PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGIJDKP
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGIJDKP
 
A Review of anomaly detection techniques in advanced metering infrastructure
A Review of anomaly detection techniques in advanced metering infrastructureA Review of anomaly detection techniques in advanced metering infrastructure
A Review of anomaly detection techniques in advanced metering infrastructurejournalBEEI
 
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...IRJET Journal
 
Optimization of network traffic anomaly detection using machine learning
Optimization of network traffic anomaly detection using machine learning Optimization of network traffic anomaly detection using machine learning
Optimization of network traffic anomaly detection using machine learning IJECEIAES
 
IRJET - Disease Detection in Plant using Machine Learning
IRJET -  	  Disease Detection in Plant using Machine LearningIRJET -  	  Disease Detection in Plant using Machine Learning
IRJET - Disease Detection in Plant using Machine LearningIRJET Journal
 
IRJET- Texture based Features Approach for Crop Diseases Classification and D...
IRJET- Texture based Features Approach for Crop Diseases Classification and D...IRJET- Texture based Features Approach for Crop Diseases Classification and D...
IRJET- Texture based Features Approach for Crop Diseases Classification and D...IRJET Journal
 
Study of Data Mining Methods and its Applications
Study of  Data Mining Methods and its ApplicationsStudy of  Data Mining Methods and its Applications
Study of Data Mining Methods and its ApplicationsIRJET Journal
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET Journal
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...IRJET Journal
 
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERS
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERSN ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERS
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERScsandit
 
A Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithmA Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithmvivatechijri
 
Design and implementation for
Design and implementation forDesign and implementation for
Design and implementation forIJDKP
 
Feature selection for multiple water quality status: integrated bootstrapping...
Feature selection for multiple water quality status: integrated bootstrapping...Feature selection for multiple water quality status: integrated bootstrapping...
Feature selection for multiple water quality status: integrated bootstrapping...IJECEIAES
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersMohitMhapuskar
 
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...theijes
 
IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...
IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...
IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...IRJET Journal
 
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...Mining Frequent Patterns and Associations from the Smart meters using Bayesia...
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...Eswar Publications
 
Classification with No Direct Discrimination
Classification with No Direct DiscriminationClassification with No Direct Discrimination
Classification with No Direct DiscriminationEditor IJCATR
 

Mais procurados (20)

PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
 
A Review of anomaly detection techniques in advanced metering infrastructure
A Review of anomaly detection techniques in advanced metering infrastructureA Review of anomaly detection techniques in advanced metering infrastructure
A Review of anomaly detection techniques in advanced metering infrastructure
 
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
 
Optimization of network traffic anomaly detection using machine learning
Optimization of network traffic anomaly detection using machine learning Optimization of network traffic anomaly detection using machine learning
Optimization of network traffic anomaly detection using machine learning
 
IRJET - Disease Detection in Plant using Machine Learning
IRJET -  	  Disease Detection in Plant using Machine LearningIRJET -  	  Disease Detection in Plant using Machine Learning
IRJET - Disease Detection in Plant using Machine Learning
 
IRJET- Texture based Features Approach for Crop Diseases Classification and D...
IRJET- Texture based Features Approach for Crop Diseases Classification and D...IRJET- Texture based Features Approach for Crop Diseases Classification and D...
IRJET- Texture based Features Approach for Crop Diseases Classification and D...
 
Study of Data Mining Methods and its Applications
Study of  Data Mining Methods and its ApplicationsStudy of  Data Mining Methods and its Applications
Study of Data Mining Methods and its Applications
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
 
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERS
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERSN ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERS
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERS
 
A Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithmA Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithm
 
Design and implementation for
Design and implementation forDesign and implementation for
Design and implementation for
 
Feature selection for multiple water quality status: integrated bootstrapping...
Feature selection for multiple water quality status: integrated bootstrapping...Feature selection for multiple water quality status: integrated bootstrapping...
Feature selection for multiple water quality status: integrated bootstrapping...
 
Ijcet 06 07_004
Ijcet 06 07_004Ijcet 06 07_004
Ijcet 06 07_004
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco Churners
 
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
 
IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...
IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...
IRJET - Prediction of Risk Factor of the Patient with Hepatocellular Carcinom...
 
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...Mining Frequent Patterns and Associations from the Smart meters using Bayesia...
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...
 
Classification with No Direct Discrimination
Classification with No Direct DiscriminationClassification with No Direct Discrimination
Classification with No Direct Discrimination
 

Semelhante a Fraud detection in electric power distribution networks using an ann based knowledge-discovery process

A Comparative Study on Credit Card Fraud Detection
A Comparative Study on Credit Card Fraud DetectionA Comparative Study on Credit Card Fraud Detection
A Comparative Study on Credit Card Fraud DetectionIRJET Journal
 
Anomalies detection for smart-home energy forecasting using moving average
Anomalies detection for smart-home energy forecasting using  moving averageAnomalies detection for smart-home energy forecasting using  moving average
Anomalies detection for smart-home energy forecasting using moving averageIJECEIAES
 
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...IRJET Journal
 
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdfTanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdfShrutiGarg649495
 
Case Studies IN Machine Learning
Case Studies IN Machine Learning Case Studies IN Machine Learning
Case Studies IN Machine Learning HIMADRI BANERJI
 
Life and science journal.pdf
Life and science journal.pdfLife and science journal.pdf
Life and science journal.pdfSarita30844
 
Empirical analysis of ensemble methods for the classification of robocalls in...
Empirical analysis of ensemble methods for the classification of robocalls in...Empirical analysis of ensemble methods for the classification of robocalls in...
Empirical analysis of ensemble methods for the classification of robocalls in...IJECEIAES
 
Tax Prediction Using Machine Learning
Tax Prediction Using Machine LearningTax Prediction Using Machine Learning
Tax Prediction Using Machine LearningIRJET Journal
 
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...IRJET Journal
 
A rule-based machine learning model for financial fraud detection
A rule-based machine learning model for financial fraud detectionA rule-based machine learning model for financial fraud detection
A rule-based machine learning model for financial fraud detectionIJECEIAES
 
IRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute ResolutionIRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute ResolutionIRJET Journal
 
IRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET Journal
 
MapReduce-iterative support vector machine classifier: novel fraud detection...
MapReduce-iterative support vector machine classifier: novel  fraud detection...MapReduce-iterative support vector machine classifier: novel  fraud detection...
MapReduce-iterative support vector machine classifier: novel fraud detection...IJECEIAES
 
IRJET- GDPS - General Disease Prediction System
IRJET- GDPS - General Disease Prediction SystemIRJET- GDPS - General Disease Prediction System
IRJET- GDPS - General Disease Prediction SystemIRJET Journal
 
Predicting reaction based on customer's transaction using machine learning a...
Predicting reaction based on customer's transaction using  machine learning a...Predicting reaction based on customer's transaction using  machine learning a...
Predicting reaction based on customer's transaction using machine learning a...IJECEIAES
 
IRJET - Automated Water Meter: Prediction of Bill for Water Conservation
IRJET - Automated Water Meter: Prediction of Bill for Water ConservationIRJET - Automated Water Meter: Prediction of Bill for Water Conservation
IRJET - Automated Water Meter: Prediction of Bill for Water ConservationIRJET Journal
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...IRJET Journal
 
Detecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency DataDetecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency DataITIIIndustries
 
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERNEVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERNIRJET Journal
 

Semelhante a Fraud detection in electric power distribution networks using an ann based knowledge-discovery process (20)

A Comparative Study on Credit Card Fraud Detection
A Comparative Study on Credit Card Fraud DetectionA Comparative Study on Credit Card Fraud Detection
A Comparative Study on Credit Card Fraud Detection
 
Anomalies detection for smart-home energy forecasting using moving average
Anomalies detection for smart-home energy forecasting using  moving averageAnomalies detection for smart-home energy forecasting using  moving average
Anomalies detection for smart-home energy forecasting using moving average
 
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
 
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdfTanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
 
Case Studies IN Machine Learning
Case Studies IN Machine Learning Case Studies IN Machine Learning
Case Studies IN Machine Learning
 
Life and science journal.pdf
Life and science journal.pdfLife and science journal.pdf
Life and science journal.pdf
 
Empirical analysis of ensemble methods for the classification of robocalls in...
Empirical analysis of ensemble methods for the classification of robocalls in...Empirical analysis of ensemble methods for the classification of robocalls in...
Empirical analysis of ensemble methods for the classification of robocalls in...
 
Tax Prediction Using Machine Learning
Tax Prediction Using Machine LearningTax Prediction Using Machine Learning
Tax Prediction Using Machine Learning
 
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
 
A rule-based machine learning model for financial fraud detection
A rule-based machine learning model for financial fraud detectionA rule-based machine learning model for financial fraud detection
A rule-based machine learning model for financial fraud detection
 
IRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute ResolutionIRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute Resolution
 
IRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation Forest
 
MapReduce-iterative support vector machine classifier: novel fraud detection...
MapReduce-iterative support vector machine classifier: novel  fraud detection...MapReduce-iterative support vector machine classifier: novel  fraud detection...
MapReduce-iterative support vector machine classifier: novel fraud detection...
 
IRJET- GDPS - General Disease Prediction System
IRJET- GDPS - General Disease Prediction SystemIRJET- GDPS - General Disease Prediction System
IRJET- GDPS - General Disease Prediction System
 
Predicting reaction based on customer's transaction using machine learning a...
Predicting reaction based on customer's transaction using  machine learning a...Predicting reaction based on customer's transaction using  machine learning a...
Predicting reaction based on customer's transaction using machine learning a...
 
IRJET - Automated Water Meter: Prediction of Bill for Water Conservation
IRJET - Automated Water Meter: Prediction of Bill for Water ConservationIRJET - Automated Water Meter: Prediction of Bill for Water Conservation
IRJET - Automated Water Meter: Prediction of Bill for Water Conservation
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
 
CREDIT_CARD.ppt
CREDIT_CARD.pptCREDIT_CARD.ppt
CREDIT_CARD.ppt
 
Detecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency DataDetecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency Data
 
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERNEVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
EVALUTION OF CHURN PREDICTING PROCESS USING CUSTOMER BEHAVIOUR PATTERN
 

Último

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Fraud detection in electric power distribution networks using an ann based knowledge-discovery process

  • 1. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte, Brazil – www.pditec.com.br ABSTRACT Nowadays the electric utilities have to handle problems with the non-technical losses caused by frauds and thefts committed by some of their consumers. In order to minimize this, some methodologies have been created to perform the detection of consumers that might be fraudsters. In this context, the use of classification techniques can improve the hit rate of the fraud detection and increase the financial income. This paper proposes the use of the knowledge-discovery in databases process based on artificial neural networks applied to the classifying process of consumers to be inspected. An experiment performed in a Brazilian electric power distribution company indicated an improvement of over 50% of the proposed approach if compared to the previous methods used by that company. KEYWORDS Fraud Detection, Electric Power Distribution, Low-Voltage, KDD, Artificial Neural Networks 1. INTRODUCTION The losses in electric power distribution networks are a common reality for the electric utilities and can be classified as technical or non-technical losses, where the technical loss corresponds to the electrical energy dissipated between the energy supply and the delivery points of the consumer. On the other hand, a non-technical loss is the difference between the total losses and the technical losses, i.e. all other losses associated to the electric power distribution, such as energy theft, metering errors, errors in the billing process, etc. This loss type is directly related to the distributor’s commercial management [1]. One of the usual ways to decrease non-technical losses rates is to perform local inspections to check if there are any thefts or hoaxes being committed by the consumers. For that purpose, the field staff is responsible for creating a methodology to generate an inspection schedule with suspected consumers. However, this task can be a too complex on due to the large amount of existing consumers in an electric power distribution network. Whenever a field staff performs an inspection, it returns only two possible outcomes: consumer is fraudster (there is irregularity) or non-fraudster (there is no irregularity). Therefore, whenever a fraud is found, appropriate measures are applied and the situation returns to normal. Nevertheless, if a fraud is not found some costs such the inspector’s hours and vehicles costs are unnecessarily spent, instead of being applied to inspect others consumers that are committing energy frauds. Increasing the hit rate of identifying irregularities is needed to reduce DOI : 10.5121/ijaia.2013.4602 17
  • 2. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 costs in the fraud detection, saving operational costs, even more importantly, identifying the fraudsters of the distribution network. Some studies address this kind of problem applying well known knowledge-discovery techniques. Cabral et al. [2] [4] proposed a fraud detector for low and high voltage consumers. At that study, a knowledge-discovery methodology with artificial intelligence for data preprocessing and mining was successfully applied in different scenarios. Monedero et al. [3] developed an approach to identify non-technical losses in electric power distribution networks using artificial neural networks applied to the city of Seville (Spain), outperforming by over 50% precision compared to the previous methodologies. Muniz et al. [5] presented an approach combining artificial neural networks and neuron-fuzzy systems to identify irregularities of the low voltage consumers. J. Nagi et al. [1] proposed a framework to detect non-technical losses in power distribution companies using pattern recognition by means of Support Vector Machines (SVM). E. W. S. dos Angelos [8] presented a cluster-based classification strategy with an unsupervised algorithm of two steps to identify suspected profiles of power consumption, providing a good assertiveness in real life systems. Over the past years, other studies in this field have been addressed applying different computational techniques to improve the detection of non-technical losses [9-13]. Although some studies have been conducted in the recent years in this field, the fraud rate in Brazilian low-voltage consumers is still very high, especially in metropolitan areas. There are still many electric power companies using only human expert knowledge to treat this problem and the results has been unsatisfactory. This paper addresses the fraud detection problem in electric power distribution networks (low-voltage consumers). Our methodology is based on the knowledge-discovery process using artificial neural networks to classify consumers as fraudsters or non-fraudsters. Thereafter a list of consumers classified as fraudsters is generated to help perform the inspections with the main objective being the improvement of the hit rate of inspections to reduce unnecessary operational costs. This paper is structured as follows: the Section 2 presents the methodology used in the study. In the Section 3, the experiment and results are shown, and Section 4 presents the final discussions and conclusions. 2. METHODOLOGY Our methodology is based on knowledge-discovery in databases (KDD) process using data mining, commonly used to extract previously unknown, non-trivial, and useful patterns from different data sources. The steps are presented in the Figure 1. 18
  • 3. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 Figure 1: KDD process [6] This process contains three main steps: cleaning and integration, selection and transformation, and data mining. In this case, they were applied to select consumers for inspection against consumption irregularities in electric power distribution networks. The development is presented in the next subsections. 2.1. Cleaning and Integration In this step, different data sources obtained from databases and text files were processed to generate a single and consistent database. Initially, seven data sources were considered: (1) database of all consumers and their socio-economic characteristics; (2) history of inspections; (3) historical consumption; (4) history of services requested by clients; (5) history of ownership exchanges; (6) history of queries debits; and (7) history of meter reading. All data sources were integrated using a common key: consumer installation code. Thus, it was possible to remove incorrect or duplicate records, generating a single database from the seven sources, used as input for the data mining algorithm (training and classification). In this context, only the records with valid inspection results (fraudsters or non-fraudsters) were considered. 2.2. Selection and Transformation In this step, statistical techniques were applied for data selection and transformation. The attribute selection was performed using a multivariate correlation analysis and Information Gain method, generating some key attributes of the model as shown in the Table 1. 19
  • 4. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 Table 1: Key attributes of the model # 1 Name Location Type nominal 2 Business class nominal 3 Activity type nominal 4 5 Voltage Number of phases Situation Direct debit Metering type Mean consumption Service notes nominal nominal nominal 12 Ownership exchange Query debits nominal 13 Meter reading nominal 14 Inspection (output) nominal 6 7 8 9 10 11 nominal nominal nominal numeric nominal Description Consumer location (city + neighborhood) Business class (e.g. residential, industrial, commercial, among others) Activity type (e.g. residence, drugstore, bakery, public administration, among others) Consumer voltage (110v, 220v) Number of phases (1, 2, 3) Is consumer connected? (yes, no) Type of direct debit Type of electricity metering Mean consumption during the previous 12 months Are there service notes requested by consumers during the previous 12 months? (Yes or no) Are there ownership exchanges during the previous 12 months? (Yes or no) Are there query debits during the previous 12 months? (Yes or no) Are there meter reading notes during the previous 12 months? (Yes or no) The inspection result (Fraudster or Nonfraudster) The data transformation was performed by normalization methods. The Min-Max method was used to transform the single numerical attribute Mean Consumption, wherein each value is converted to a range between 0 and 1. For the nominal attributes, two distinct methods were applied: if the number of distinct values was less than five, the Binary method was applied; otherwise the One-of-N method was used. 2.3. Data Mining The problem addressed here is a supervised classification one because there are real inspections’ records to be used for the training step. An artificial neural network/multilayer perceptron (ANNMLP) was used for the dataset training and classification. According to Haykin [7], the MLPs using the Backpropagation algorithm have been successfully applied to solve many similar complex problems, such as pattern recognition, classification, data preprocessing, etc. Therefore, we applied it to solve our problem with the configurations that are presented here. The ANN-MLP’s architecture used three layers: input, hidden and output layers. The input layer contains n neurons, wherein n is equal to number of bits of the normalized values. In the hidden layer, the number of neurons is equal to the geometrical mean between the numbers of neurons in the input and output layers [14]. The output layer contains a single neuron to classify the consumers as fraudsters or non-fraudsters. 20
  • 5. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 After defining the architecture, the data mining algorithm was coded using the training/test/validation schema and the stratified k-Fold Cross Validation method. In this case, k is equal to 10, i.e., at the each iteration the dataset is randomly partitioned into 10 equal size subsets. Of the 10 subsets, 9 are used as training data and a single is used for testing the model. This process is then repeated k times, and the average error is calculated. This implementation requires parameter settings to perform the experiments successfully. 3. RESULTS AND DISCUSSION The evaluation of the results was performed using a confusion matrix that compares real inspections to classified inspections by the ANN-MLP, indicating four result types: (1) TP is a fraudster consumer correctly classified as fraudster; (2) FN is a fraudster consumer incorrectly classified as non-fraudster; (3) FP is a consumer non-fraudster incorrectly classified as fraudster; (4) TN is a consumer non-fraudster correctly classified as non-fraudster. Some measures are used to check the classifier efficiency: hit rate is the percentage of records correctly classified; recall: TP / (TP+FN); precision = TP / (TP+FP); True Positive rate = TP / (TP+FN); and False Positive rate = FP / (FP+TN). This approach was evaluated using real data from a Brazilian electric power distribution company with more than seven million consumers. A lot of inspections conducted in the metropolitan region of Minas Gerais (Brazil) during the year 2011 were used to supply the KDD process. The first step of our methodology was applied to clean and integrate databases and text files, generating a single data source with 22,848 records. After that, the second step was applied to select key attributes of the model and transform data to feed the ANN-MLP. Each record consisted of thirteen input attributes and one output attribute indicating fraud or not, as seen in the Section 2.2. After that, the ANN-MLP was trained with Backpropagation learning algorithm using the Logistic activation function for all neurons. The training step was performed for 500 epochs, with learning rate equals to 0.01, momentum factor equals to 0.95, activation threshold equals to 0.5, and maximum error equals to 0.001. The parameter settings were empirically defined. The algorithm was performed in 903 seconds on a personal computer and the Table 2 shows the classification results. Table 2: Confusion matrix with the classification results Fraudster (a) Non-fraudster (b) Classified as (a) 945 508 Classified as (b) 2261 17869 The classification results were obtained with an error rate of 0.3562 after all epochs, and the performance measures were calculated from the confusion matrix. The accuracy is equal to 87.17%, precision is equal to 65.03% and recall is equal to 29.47%. 21
  • 6. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 After obtaining the classification results, a list of inspections is generated to check if consumers are fraudsters. The precision indicates the percentage of consumers correctly inspected by the staff team. In this context, increasing of hit rate is essential to improve the number of fraudsters found, and reducing the number of non-fraudsters inspected. This directly implies two consequences: recovery of financial income and reducing of unnecessary operational costs. According to the field measurement, currently the same electric power distribution company has obtained a precision around 40% using different methods. Our methodology improved this number to 65.03%, outperforming in 50% the current scenario. In other words, as previously seen, we obtained 22,848 records of inspections from the year 2011, i.e. only 9,139 consumers were correctly inspected by the company. At the same time, our methodology could obtain a precision of 14,858 consumers – a difference of 5719 per year. That difference of fraudsters consumers found represents the recovery of financial income in three senses: (1) the recovery of value defrauded applying fines according to the fraud period, (2) normalization of the consumers' status and the regularization of their consumption, and (3) unnecessary expenses in incorrect inspections were avoided by reducing expenses with employees, transportation costs and fuel consumption. Another important point is the use of the k-Fold Cross Validation method, whose goal is to prevent the ANN-MLP of over-fitting the training data. This allows the learning to be used for the classification of consumers outside the training set (the real world case). Although the algorithm used the stratified validation method, the unbalanced class problem is a weakness that must be better handled, since the learning systems can find difficulties in the classification of records belonging to minority class when the class distribution is unequal. This is a need for the next steps of development of our methodology. 4. CONCLUSION This paper proposed the use of the KDD process for fraud detection in the power distribution network (low-voltage consumers). For this purpose, a methodology with data preprocessing and mining using artificial neural networks was applied. The results of the experiment indicated that is possible to improve the hit rate obtained by the methods used currently for the classification of fraudsters. The experiment was performed using only historical data and a deeper evaluation in field is needed to ensure the efficiency of our approach. Although this has not been done, we used the kFold Cross Validation method for assessing how the results will generalize to an independent data set. Furthermore, we intend on testing different supervised classification techniques and compare theirs results. 22
  • 7. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 6, November 2013 REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] J. Nagi et al., (2010) “Nontechnical Loss Detection for Metered Customers in Power Utility Using Support Vector Machines”, IEEE Transactions on Power Delivery, Vol. 25, N. 2, Apr 2010. Cabral et al., (2004) “Fraud Detection in Electrical Energy Consumers Using Rough Sets”, 2004 IEEE International Conference on Systems, Man and Cybernetics: The Hague, Netherlands, Oct 1013. Monedero et al., (2006) “MIDAS Detection of Non-technical Losses in Electrical Consumption Using Neural Networks and Statistical Techniques”, In proceeding of: Computational Science and Its Applications - ICCSA 2006, International Conference, Glasgow, UK, May 8-11. Cabral et al., (2009) “Fraud Detection System for High and Low Voltage Electricity Consumers Based on Data Mining”, Power and Energy Society General Meeting, Calgary, Jul 26-30. Muniz et al., (2009) “A Neuron-fuzzy System for Fraud Detection in Electricity Distribution”, Proceedings of the IFSA-EUSFLAT 2009, Lisbon, Portugal, Jul 20-24. J. Han, M. Kamber, J. Pei, (2011) “Data mining: concepts and techniques”, Morgan Kaufmann, 3rd ed. S. Haykin, (1999) “Neural Networks: A Comprehensive Foundation”, Prentice Hall International, 2nd ed. E. W. S. dos Angelos et al., (2011) “Detection and Identification of Abnormalities in Customer Consumptions in Power Distribution Systems”, IEEE Transactions on Power Delivery, Vol. 26, N. 4, Oct 2011. C. C. O. Ramos et al., (2012) “New Insights on Nontechnical Losses Characterization Through Evolutionary-Based Feature Selection”, IEEE Transactions on Power Delivery, Vol. 27, N. 1, Jan 2012. A. H. Nizar, Z. Y. Dong, Y. Wang, (2008) "Power Utility Nontechnical Loss Analysis With Extreme Learning Machine Method", IEEE Transactions on Power Systems, Vol. 23, N. 3, pp.946-955, Aug 2008. J. Nagi et al., (2008) "Detection of abnormalities and electricity theft using genetic Support Vector Machines" TENCON 2008 - 2008 IEEE Region 10 Conference, pp.1-6, Nov 2008. J. Nagi et al., (2008) "Non-Technical Loss analysis for detection of electricity theft using support vector machines", IEEE 2nd International Power and Energy Conference, pp.907-912, 1-3 Dec 2008. A. C. Ramos et al., (2011) "A New Approach for Nontechnical Losses Detection Based on OptimumPath Forest", IEEE Transactions on Power Systems, Vol. 26, N. 1, pp.181-189, Feb 2011. M. Negnevitsky, “Artificial Intelligence: A Guide to Intelligent Systems”, Pearson Education Canada, 3rd ed. 23