1. CRIME ANALYSIS AND PREDICTION USING DATA MINING
CHETAN HIREHOLI,
M.TECH, SOFTWARE ENGINEERING
2. Data Mining, what is it?
Data mining is about finding new information in a large amount of data.
• Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.
• Data mining software is one of a number of analytical tools for analyzing data.
3. Timeline
John W. Tukey, Exploratory Data Analysis, 1962
Gregory Piatetsky-Shapiro organizes and chairs the first Knowledge Discovery in Databases (KDD) workshop, 1989
BusinessWeek publishes a cover story on “Database Marketing”, 1994
For the first time, the term “data science” is included in the title of the conference (“Data science, classification, and related methods”), 1996 by IFCS
“The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades…” - Hal Varian, Google’s Chief Economist, 2009
4. Application and Trends…
Financial Data Analysis
Retail Industry
Telecommunication Industry
Biological Data Analysis
Other Scientific Applications
Intrusion Detection
5. Feel Good, Do Good!
“Crime Analysis and Prediction Using Data Mining”
Shiju Sathyadevan, Devan M.S and Surya Gangadharan. S, 2014 IEEE
6. Abstract
What is crime analysis? Crime analysis is a law enforcement function that involves systematically
identifying and analyzing patterns and trends in crime and disorder.
The proposed system combines computer science and criminal justice to
develop a data mining procedure that can help solve crimes faster.
7. Introduction
Only within the last few decades has technology made spatial data
mining a practical, affordable and available solution for a wide audience of law
enforcement officials.
Huge volumes of data have to be collected from web sites, news sites, blogs, social media,
RSS feeds etc.
So the main challenge in front of us is developing a better, more efficient tool
that identifies crime patterns effectively.
8. Doing analysis is a hard job!
The reasons for choosing clustering:
We only have data on known crimes
So a classification technique will not predict well
Also, the nature of crimes changes over time
So in order to detect newer and
unknown patterns in future, clustering
techniques work better.
9. Steps in doing Crime Analysis
Data Collection
Classification
Pattern
Prediction
Visualization
10. Related Work
“Using Series Finder will get me more films!”
Series Finder is used for finding patterns in burglary.
To achieve this, the authors used the modus operandi of the offender and extracted
the crime patterns that the offender followed.
The algorithm constructs the modus operandi of the offender.
“In your dreams… You can’t catch me! I’m KRISHH!”
11. Methodology
Data Collection
Data is collected from various sources like news sites, blogs, social media,
RSS feeds etc.
But the data we get is ‘VERY UNSTRUCTURED’! How do we store it?
The advantage of a NoSQL database over an SQL database is that it allows insertion
of data without a predefined schema.
It fits object-oriented programming, hence it is easy to use and flexible.
Unlike an SQL database, it does not need to know in advance what we are storing,
its size etc.
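The schema-free idea can be illustrated with a minimal sketch. It uses only the Python standard library, with a list of JSON strings standing in for a document collection; it does not use any real database client. The records, field names and the `find` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical scraped records: each source yields a different set of fields.
documents = [
    {"source": "news", "title": "Chain snatching reported", "body": "A pillion rider snatched a chain."},
    {"source": "rss", "title": "Burglary reported", "published": "2014-03-07", "tags": ["burglary"]},
    {"source": "blog", "text": "ATM robbery near the highway", "author": "anonymous"},
]

# A document store keeps each record as it arrives; no table schema is declared first.
# Here a plain list of JSON strings stands in for the collection.
collection = [json.dumps(doc) for doc in documents]


def find(collection, **criteria):
    """Return stored documents whose fields equal every given criterion."""
    results = []
    for raw in collection:
        doc = json.loads(raw)
        if all(doc.get(key) == value for key, value in criteria.items()):
            results.append(doc)
    return results


rss_docs = find(collection, source="rss")
# Every stored document can carry its own field set.
field_sets = [tuple(sorted(json.loads(raw).keys())) for raw in collection]
```

Three records with three different field sets are stored side by side, which is what a fixed SQL table definition would not allow without altering the schema.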
Okay! Enough humor.
Come, let's get serious and
look into how it
actually works!
12. Methodology
Classification
Naïve Bayes: a supervised learning method as well as a statistical method.
The algorithm classifies a news article into the crime type it fits
best, e.g. "What is the probability that a crime document D belongs to a given
class C?"
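As a rough illustration of that question, the sketch below scores a document against each class with a multinomial Naïve Bayes model and add-one (Laplace) smoothing. The training sentences, the whitespace tokenizer and the function names are assumptions made for this example; they are not the authors' data or implementation.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class C that maximizes log P(C) + sum of log P(word | C)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps a zero count from zeroing the whole product.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
training = [
    ("thief broke into house and stole jewelry", "burglary"),
    ("house burgled while family was away", "burglary"),
    ("man stabbed to death in street", "murder"),
    ("woman found dead police suspect murder", "murder"),
    ("shop window smashed and walls spray painted", "vandalism"),
]
model = train(training)
prediction = classify("jewelry stolen from locked house", model)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.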
Thomas Bayes
13. Methodology
Classification
Naïve Bayes has its advantages:
Simple, and converges quicker than logistic regression.
Compared to SVM (Support Vector Machine), it is easy to implement and comes with
high performance. Also, with SVM the speed of execution decreases as the size of the
training set increases.
Needs only a small amount of training data to estimate the classification parameters.
With smoothing, it also handles the zero-frequency problem!
14. Methodology
Classification
Using the Naive Bayes algorithm, we create a model by training on crime data related to
vandalism, murder, robbery, burglary, sex abuse, gang rape, arson, armed robbery,
highway robbery, snatching etc.
Test results show that Naive Bayes achieves more than 90% accuracy!
16. Methodology
Classification
Named Entity Recognition (NER), also known as Entity Extraction,
finds and classifies elements in text into predefined categories such
as person names, organizations, locations, date, time etc.
Sample NER
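To make the idea concrete, here is a deliberately simple rule-based sketch: a tiny gazetteer plus a regular expression tags locations, dates and times. Real NER systems rely on trained models; the gazetteer contents, the pattern, the label names and the sample sentence are all assumptions for this illustration.

```python
import re

# Hypothetical gazetteers; a trained NER model would replace these lookups.
LOCATIONS = {"Kharghar"}
DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"}
# Matches clock times such as 9.40am or 10:15 pm.
TIME_PATTERN = re.compile(r"\b\d{1,2}[.:]\d{2}\s?(?:am|pm)\b", re.IGNORECASE)


def extract_entities(text):
    """Return (surface text, category) pairs found by lookup and pattern matching."""
    entities = [(match.group(), "TIME") for match in TIME_PATTERN.finditer(text)]
    for token in re.findall(r"[A-Za-z]+", text):
        if token in LOCATIONS:
            entities.append((token, "LOCATION"))
        elif token in DAYS:
            entities.append((token, "DATE"))
    return entities


sample = "A woman was robbed in sector 19, Kharghar on Friday around 9.40am."
entities = extract_entities(sample)
```

The output is a list of tagged spans, which is the form the later pattern-identification stage would need: where, when and at what time an incident happened.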
17. Methodology
Classification
Coreference Resolution: finds the expressions in a text that refer to the same entity.
Example input: A pillion bike rider snatched away a gold mangalsutra
worth Rs 85,000 of a 60-year-old woman
pedestrian in sector 19, Kharghar on Friday. The victim,
Shakuntala Mande, was walking towards a vegetable outlet
around 9.40am, when a bike came close to her and the pillion
rider snatched her mangalsutra. A robbery case has been
registered at Kharghar police station.
18. Methodology
Pattern Identification
Apriori algorithm: used to determine association rules which highlight general trends.
The result of this phase is the crime pattern for a particular place.
Once we have a general crime pattern for a place, if a new case arrives and follows
the same pattern, we can say that the area has a chance of crime occurrence.
Information about patterns helps police officials to allocate resources in an effective
manner.
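A minimal sketch of Apriori, assuming each crime report has already been reduced to a set of attribute values. The incident data, the `min_support` threshold (an absolute count) and the helper names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen at least min_support times."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from stored support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical incidents described by time, place and crime type.
incidents = [
    {"night", "atm", "robbery"},
    {"night", "atm", "robbery"},
    {"night", "highway", "robbery"},
    {"day", "market", "snatching"},
    {"day", "market", "snatching"},
    {"night", "market", "snatching"},
]
frequent = apriori(incidents, min_support=2)
rule_conf = confidence(frequent, {"night", "atm"}, {"robbery"})
```

In this toy data the rule {night, atm} -> {robbery} holds in every matching incident, which is the kind of place-specific pattern the slide describes.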
19. Methodology
Prediction
Decision tree: it is simple to understand and interpret!
It is robust and also works well with large data
sets.
[Figure: decision tree with labels for the root node, leaf nodes, and splitting]
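The root node, splitting and leaf nodes can be sketched with a small tree builder that picks, at each node, the feature with the highest information gain. This is an ID3-style illustration under assumed data; the features ("place", "time"), the risk labels and the function names are hypothetical, and the source does not say which tree algorithm the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the feature with the largest positive information gain, or None."""
    base = entropy(labels)
    best_feature, best_gain = None, 0.0
    for feature in features:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best_feature, best_gain = feature, base - remainder
    return best_feature


def build_tree(rows, labels, features):
    """Split recursively; a node with no useful split becomes a leaf (a label)."""
    majority = Counter(labels).most_common(1)[0][0]
    feature = best_split(rows, labels, features) if features else None
    if feature is None:
        return majority
    node = {"feature": feature, "default": majority, "children": {}}
    remaining = [f for f in features if f != feature]
    for value in {row[feature] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        node["children"][value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining
        )
    return node


def predict(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical historical records with a risk label, for illustration only.
rows = [
    {"place": "atm", "time": "night"}, {"place": "atm", "time": "day"},
    {"place": "temple", "time": "night"}, {"place": "temple", "time": "day"},
    {"place": "highway", "time": "night"}, {"place": "highway", "time": "day"},
]
labels = ["high", "low", "low", "low", "high", "low"]
tree = build_tree(rows, labels, ["place", "time"])
risk = predict(tree, {"place": "atm", "time": "night"})
```

Here the root splits on "time" because it separates the labels better than "place"; the night branch then splits again on "place", and each pure group ends in a leaf.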
20. Methodology
Visualization
A heat map indicates the level of activity, usually with
darker colors for low activity and brighter colors
for high activity.
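The data behind such a heat map can be sketched as a grid of incident counts scaled to a 0..1 intensity, which a plotting library could then map to a color scale. The coordinates, bounds, grid size and function names below are hypothetical, and the rendering step itself is left out.

```python
def bin_incidents(points, bounds, rows, cols):
    """Count incident points per grid cell; bounds is (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bounds
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # skip points outside the mapped area
        # Points on the upper edge are clamped into the last cell.
        col = min(int((x - min_x) / (max_x - min_x) * cols), cols - 1)
        row = min(int((y - min_y) / (max_y - min_y) * rows), rows - 1)
        grid[row][col] += 1
    return grid


def to_intensity(grid):
    """Scale counts to 0..1 so that 1.0 is the brightest (most active) cell."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0] * len(row) for row in grid]
    return [[count / peak for count in row] for row in grid]


# Hypothetical incident coordinates inside a unit square.
points = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.3), (0.9, 0.9), (0.6, 0.2)]
grid = bin_incidents(points, bounds=(0.0, 0.0, 1.0, 1.0), rows=2, cols=2)
intensity = to_intensity(grid)
```

With these points the lower-left cell holds three incidents and becomes the brightest cell, while the empty cell stays at zero intensity.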
21. Methodology
Visualization
On the x-axis all main locations in India are
plotted, whereas on the y-axis the crime rate is
plotted.
The graph shows the regions which have the
maximum crime rate.
The data plotted here is based on historical
records.
22. Methodology
Visualization
Shows the rate/percentage of crime occurrence
in places like airports, temples, bus stations,
railway stations, banks, casinos, jewelry shops, bars,
ATMs, highways etc.
On the x-axis the main spots like temple, bank,
bus station, railway station, ATM etc. are plotted,
while on the y-axis the rate of crime is plotted.
23. Future Work
Criminal Profiling
Helps crime investigators to record the characteristics of criminals.
The main goals of criminal profiling are:
To provide crime investigators with a social and psychological assessment of the offender
To evaluate belongings found in the possession of the offender.
To do this, as many details as possible about each criminal are collected from criminal records
and the modus operandi is determined.
24. Future Work
Criminal Profiling
Sifting through each crime record after a particular crime occurrence is a tedious task.
So instead we can use visualization mechanisms to represent the criminal details in
a human-understandable form.
26. Conclusion
Data Collection
• Web sites, news channels, blogs, etc.
Classification
• Using Naïve Bayes, a predictor is created
Pattern Identification
• Apriori algorithm
Prediction
• Decision tree
Visualization
• Neo4j
• GraphDB