2020
Crime Data Analysis and
Prediction for City of
Los Angeles
HETA D PAREKH AND KALIKA SAXENA
Abstract
Crime is one of the most common social problems in the country, affecting public welfare,
child development, and adult socio-economic status. Crime analysis and prevention is a
systematic approach to identifying and analyzing trends. With modern technology, crime data
analysts can help law enforcement agencies prevent crime incidents.
This research paper presents an extensive analysis of the varying crimes in the city of
Los Angeles. The purpose of this data analysis is to understand patterns in criminal behavior
in order to anticipate criminal activity and determine how it can be prevented. This paper uses
data mining and predictive analytics techniques to investigate crime trends and patterns. Its
main objective is to provide year-based, location-based, and crime-type-based rates and
counts, which may help society minimize crime incidents. This research can identify regions
with a high likelihood of crime and map the crime-prone areas of the city. With the help of
crime data analysis, officers can speed up the process of solving crimes and spread awareness
in crime-prone areas.
Key words: Crime rate analysis, Data analytics, Data Visualizations, Crime Prediction, Time
Series Forecasting, Trend analysis
Introduction
Crime is among the most serious threats to human society and is increasing in most cities
around the world. Data mining algorithms play an important role in predicting the number and
type of crime events likely to take place in the future. Other technologies and methods may
also help law enforcement agencies track crime patterns and estimate the probability of crime
incidents. Although predictions cannot be 100% accurate, the probability of an occurrence
can still be estimated.
Crime incidents in a community often depend on many characteristics of that community and
society. These characteristics include race, income group, age group, sex, family structure
(single, divorced, married, number of children), level of education, the number of police
officers allocated to a locality, and the number of employed and unemployed people. Using data
mining, one can classify localities based on these parameters and identify their relationship
with criminal activity.
In this paper, we analyze the crime data for the City of Los Angeles. The primary objective
of this research study is to illustrate the visualization, analysis, and prediction of crimes
in relation to place and time. Using prediction in crime analysis can help move a region
toward being crime-free. Our research can predict which regions are highly sensitive to
criminal activity based on various parameters. We have used predictive analysis to infer which
areas of Los Angeles will be more or less sensitive. Based on the study, we can recommend that
law enforcement officers adopt measures accordingly.
Since this is a descriptive analysis, the major focus of this paper is on composing data,
slicing, dicing, visualization, and deriving insights. This paper covers the literature review
and related work (Section A), data preparation and methodology (Section B), research
hypothesis (Section C), data analysis and interpretation (Section D), and conclusion
(Section E).
Literature Review
Data mining is the process of analyzing large amounts of data to discover patterns,
relationships, and trends that cannot easily be discovered through slicing and dicing
techniques. These discoveries can then be applied to various datasets and domains in order to
adopt and implement better strategies. Law enforcement often uses these techniques to identify
patterns of crime and allocate resources to the areas most affected by criminal activity,
which helps reduce both crime and the cost of enforcement in designated zones. Agencies can
also use these techniques to predict probable new areas of crime based on real-time data.
Crime depends on various parameters such as average household income, literacy, and poverty,
so it is important to understand them to gain clear insights. It is also important for people
to be informed about criminal activity around them, which is why crime reports are published
to inform citizens on a timely basis.
Our paper presents additional insights: a forecast of the crime rate and the effect of
COVID-19 on the crime rate in Los Angeles in 2020 (Figure 5), geospatial data (Figure 1), the
types of weapons used in crimes (Figure 2), the types of crime most common over the past 10
years (Figure 4), and the race and sex worst impacted by crime (Figure 6).
From the literature review, it can be deduced that the volume of crime data is growing
enormously and that it is important to keep track of it. This creates a need for techniques
that are more efficient and give more accurate predictions.
Related Work
A related project on crime rate data analysis in Los Angeles was presented at the 24th annual
student symposium. Using HiveQL and Hadoop, its authors derived insights such as the total
occurrences of each crime and the number of crimes in 2014 and per area (refer link 1).
Another related research project, “2014 Chicago Crime data analysis” by Yawen Li (refer link
2), lists the 10 communities with the most crimes and the 10 with the fewest. That research
includes analysis of the primary crime types, the most dangerous hour of the day, and the
locations where the most crimes are committed. It also explains the relationship between crime
and arrest data for the city of Chicago in 2014.
Methodology
To support informed decisions by law enforcement agencies, the data must be handled
strategically. To proceed with the analytics, we have taken the following steps:
Data collection
For this research paper, we have used crime data from 2010 to October 2020 for the City of
Los Angeles. The data has been extracted from data.lacity.org. The dataset reflects incidents
of crime and is provided by the Los Angeles Police Department.
It has 2.12M rows and 28 columns, including the victim’s age, the weapon used, and the location
of the crime. The data is numeric and string-typed and is normalized. Each instance belongs to
one of the areas of the City of Los Angeles.
Some of the information used in our analysis is as follows:
Column Name        Description                                           Data Type
Date OCC           The timestamp at which the crime occurred             Floating Timestamp
Area               Name of the geographic area                           Text
Crime Cd           Indicates the crime committed                         Text
Vict Age           Indicates the age of the victim                       Numeric
Vict Sex           Indicates the sex of the victim                       Text
Vict Descent       Indicates the race of the victim                      Text
Weapon Desc        Weapon used for the crime                             Text
Location           Street address of the crime incident                  Text
Lat                Latitude coordinate of the crime                      Numeric
Long               Longitude coordinate of the crime                     Numeric
Household Income   Median household income of City of Los Angeles        Numeric
Data Cleaning
The dataset used for this analysis has a few instances that contain missing or null values.
Before data processing, the data quality must be improved. Of the various techniques available
for improving data quality, we have used data cleaning. We cleaned the data manually, based on
our understanding of the dataset. We kept the chosen attributes that do not contain missing
values. All the values were carefully copied to a new dataset and cross-checked multiple times
to minimize the chance of errors.
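The cleaning step described above was done by hand. As a rough illustration only, the same idea (keep the rows whose chosen attributes are all present) could be sketched programmatically. The sample rows below are invented, the column names follow the column list given under Data collection, and the helper `drop_incomplete` is a hypothetical name, not part of the original workflow.

```python
import csv
import io

# Invented sample rows for illustration; not taken from the real dataset.
SAMPLE_CSV = """Date OCC,Area,Vict Age,Vict Sex,Weapon Desc
01/05/2020 12:00:00 AM,Southwest,34,F,HAND GUN
02/11/2020 12:00:00 AM,Mission,,M,
03/02/2020 12:00:00 AM,Foothill,51,,VERBAL THREAT
03/19/2020 12:00:00 AM,77th Street,27,M,KNIFE
"""

# Attributes that must be present for a row to be kept (an assumption).
REQUIRED = ["Date OCC", "Area", "Vict Age", "Vict Sex"]


def drop_incomplete(rows, required):
    """Keep only rows where every required column has a non-blank value."""
    return [
        row
        for row in rows
        if all((row.get(col) or "").strip() for col in required)
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean_rows = drop_incomplete(rows, REQUIRED)
print(len(rows), "rows read,", len(clean_rows), "rows kept")
```

Dropping rows is only one option; whether to drop, impute, or flag missing values depends on how the cleaned data will be used.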
Classification Analysis
Classification is a data mining function that assigns items in a collection to target
categories or classes. Its major benefit is that it helps us identify patterns and make a
prediction for each case in the data. In this paper, we apply predictive analysis to
month-wise crime counts for 2020. Because of the pandemic, many types of crime declined as
residents stayed at home. Classifiers fall into two major types: discriminative and
generative. In this research, we have used a discriminative classifier, as it determines one
class for each row of data. It builds its model directly from the observed data and therefore
depends heavily on the quality of the data in every case.
Pattern Identification and Association Analysis
To identify the pattern and association between crime rate and household income, we have used
association analysis. We have examined how the average household income affects the crime rate
of an area. The result of this phase is the crime pattern for a particular location. If the
pattern shows that low household income groups face a greater chance of crime, then crime is
more probable in low-income areas. Information about such patterns can help police officials
allocate resources effectively. It can help prevent crime by providing more security in
suspected areas, for example by installing alarms and CCTV.
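One simple way to quantify such an association is a correlation coefficient between per-area median household income and per-area crime count. The sketch below computes Pearson's r in plain Python. It is an illustration under stated assumptions, not the association analysis actually run for this paper: the income and crime figures are invented, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented per-area figures, for illustration only.
median_income = [35000, 42000, 58000, 75000, 96000]
crime_count = [1900, 1750, 1200, 900, 600]

r = pearson(median_income, crime_count)
print(f"Pearson r = {r:.3f}")
```

A value of r close to -1 would indicate that crime counts fall as income rises. Correlation alone does not establish that low income causes crime; other factors such as population density may drive both.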
Time Series Analysis
Management is concerned with the future, so the more insight we obtain about the future, the
better crime management will be, provided our insights are accurate. In ordinary prediction,
forecasts are made from a dataset in which all prior observations are treated equally. In time
series analysis, the time dimension adds an explicit order dependence between observations.
The approach involves developing models that best represent an observed time series in order
to understand its underlying causes. A time series can be univariate, where a single variable
is recorded sequentially at equal time intervals, or multivariate, where multiple variables
are recorded in the same way. In this analysis, we use past data points as the basis for
projecting the future. We have plotted graphs for consecutive years before and after the onset
of COVID-19 to identify the trend in the crime rate. We have used SAP Lumira to produce time
series forecasts of the crime rate in the coming years.
Visualization
Decision making relies on data, which arrives with such overwhelming velocity and volume that
we cannot comprehend it without some layer of abstraction, such as a visual one. To understand
the data better, it helps to visualize it before drawing inferences. In this analysis,
crime-prone areas are represented using geomaps with crime density data. We have also used
line charts, tag clouds, and scatter plots to present a variety of data. Using these visuals,
we can examine a large amount of information quickly.
Visualizing data is advantageous because one can analyze only the data that is significant to
the research. Out-of-range data is automatically discarded and hence does not confuse the
audience. By knowing the regions where crime incidents are most likely to occur, preventive
action can be taken.
Research Hypothesis
This section states the hypotheses for our research paper. A hypothesis is a prediction, but
it involves more than an educated guess. Here we explore the effect of one variable on
another. A variable is a measurable factor that can be changed and manipulated in observable
ways. For our research we have formulated two hypotheses, which we test using crime data from
2010 to 2020 for the City of Los Angeles.
H1:
Median household income of an area is associated with the number of crimes in the City of
Los Angeles.
Figure 1 Geospatial map showing association between household income and number of crimes
The association analysis supports our hypothesis: it depicts the relationship between the
median household income and the number of crimes recorded in the respective areas. We find
that the lower the median household income, the higher the crime rate. In this hypothesis,
median household income is the independent variable and number of crimes is the dependent
variable. The two variables are therefore inversely related. In lower-income areas, people
are more often drawn into theft, burglary, and other criminal activities.
Based on this association, the Los Angeles Police Department can estimate the likely crime
zones, increase patrolling there, and take stringent action.
H2:
The victim’s age is related to the weapon used to commit the crime.
Figure 2: Use of weapons as per the victim age
The visualization supports this hypothesis: it shows a relationship between the weapon used
and the victim’s age, with weapons and objects varying from one age group to another. Against
younger victims, criminals tend to use weapons that are quick to deploy, such as guns,
revolvers, and handguns. Against older victims, less forceful means such as threats are used
to carry out the crime. To control crime, stricter rules for purchasing guns and revolvers
could be introduced, which would reduce their use and thus lower the crime rate.
For this hypothesis, we have used a scatter plot to examine the relationship/correlation
between the two variables. In this case, age is the independent variable and weapon used is
the dependent variable.
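Alongside a scatter plot, a compact numeric summary of this relationship is the median victim age for each weapon type. The sketch below is illustrative only: the records are invented, `median_age_by_weapon` is a hypothetical helper, and treating an age of 0 or a missing age as "unknown" is an assumption about how the data is coded.

```python
from collections import defaultdict
from statistics import median


def median_age_by_weapon(records):
    """Return {weapon: median victim age}, skipping unknown ages and weapons."""
    ages = defaultdict(list)
    for age, weapon in records:
        # Assumption: a missing or non-positive age means "unknown".
        if age is not None and age > 0 and weapon:
            ages[weapon].append(age)
    return {weapon: median(values) for weapon, values in ages.items()}


# Invented (victim age, weapon description) pairs, for illustration only.
records = [
    (22, "HAND GUN"),
    (25, "HAND GUN"),
    (31, "HAND GUN"),
    (64, "VERBAL THREAT"),
    (70, "VERBAL THREAT"),
    (0, "KNIFE"),
    (None, "HAND GUN"),
]

summary = median_age_by_weapon(records)
for weapon, age in sorted(summary.items()):
    print(weapon, age)
```

The median is used rather than the mean so that a few very young or very old victims do not dominate the summary for a weapon type.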
Research Questions
I. What is the trend in crime rate over the decade from 2010 to 2020?
Figure 3 Trend analysis of number of crimes by year
In this visual, we have used a line chart to depict the number of crime incidents over a
period of ten years, from 2010 to 2020. Based on our analysis, the number of crimes has been
declining with time. The number of crimes also declined considerably from 2014 to 2015.
Seasonality exhibits a degree of randomness; that is, it is not identical for every year.
Using more parameters or multivariate time series analysis, one can investigate the possible
reasons for the sudden decline in crime incidents. We can infer from the visual that, because
of the outbreak of the pandemic, the crime rate dropped significantly in 2020.
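The yearly counts behind a trend line like this can be derived by grouping incident timestamps by year. The following is a minimal sketch under assumptions: the timestamp format in `DATE_FORMAT` is assumed rather than confirmed by this paper, the sample dates are invented, and both helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout for the Date OCC column.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def crimes_per_year(date_strings):
    """Count incidents per calendar year, returned in ascending year order."""
    years = (datetime.strptime(s, DATE_FORMAT).year for s in date_strings)
    return dict(sorted(Counter(years).items()))


def year_over_year_change(counts):
    """Difference in incident count between each year and the one before it."""
    years = sorted(counts)
    return {cur: counts[cur] - counts[prev] for prev, cur in zip(years, years[1:])}


# Invented sample timestamps, for illustration only.
sample_dates = [
    "03/14/2018 12:00:00 AM",
    "07/02/2018 12:00:00 AM",
    "11/23/2018 12:00:00 AM",
    "01/09/2019 12:00:00 AM",
    "08/30/2019 12:00:00 AM",
    "04/17/2020 12:00:00 AM",
]

yearly = crimes_per_year(sample_dates)
changes = year_over_year_change(yearly)
print(yearly)
print(changes)
```

A negative value in `changes` marks a year with fewer incidents than the previous one. Note that a partial final year (the data here ends in October 2020) will look lower than a full year for that reason alone.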
II. Which type of Crimes occur more commonly in all the years from 2015-2020?
Figure 4 Type of crimes reported
In this visual, we have used a tag cloud to show the crime descriptions that are most common
in the collected dataset. Robbery is the most prevalent crime incident, followed by burglary
and vandalism.
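A tag cloud sizes each term by its frequency, so the underlying computation is a frequency count of crime descriptions. A minimal sketch, assuming free-text descriptions that may differ in case and spacing; the sample values are invented and `top_crime_types` is a hypothetical helper.

```python
from collections import Counter


def top_crime_types(descriptions, n=3):
    """Return the n most frequent crime descriptions with their counts."""
    counts = Counter(
        d.strip().upper()  # normalize case and surrounding whitespace
        for d in descriptions
        if d and d.strip()  # skip blank or missing descriptions
    )
    return counts.most_common(n)


# Invented sample descriptions, for illustration only.
sample = ["Robbery", "BURGLARY", "robbery ", "Vandalism", "ROBBERY", "Burglary", ""]

top = top_crime_types(sample)
for description, count in top:
    print(description, count)
```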
This is vital information for law enforcement agencies, as they can use it to prevent such
cases by installing more CCTV cameras, increasing patrolling, and so on. Officers can also
inform people about the types of crime most likely to occur so that they can take suitable
measures to prevent them.
Data Analytics
I. Analysis of monthly 2020 crime data to observe the impact of COVID-19 and
prediction of crime rate post lockdown till 2021.
Figure 5 Analysis and Prediction of crime data during COVID-19 in 2020 and 2021
In this visual, we have used a line graph to analyze the crime data during the
pandemic/lockdown and have forecast the crime rate after the lockdown.
We have used the R-single exponential smoothing algorithm to forecast the crime rate after the
lockdown. In this time series analysis, we have used historical data points to forecast the
future.
From the research, it can be inferred that crime rates were very low during March-May 2020
because of the government’s “shelter-in-place” order. From June onwards, when the city started
to reopen, the number of crimes also increased, but remained lower than in the past few years.
With a smoothing parameter (alpha) of 0.9, the forecast shows the same crime count continuing
through 2021. A flat forecast of this kind is expected, since single exponential smoothing
models neither trend nor seasonality.
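To make the behavior of single exponential smoothing concrete, the sketch below implements the textbook recurrence level_t = alpha * x_t + (1 - alpha) * level_(t-1) and uses the final level as the forecast for every future period. It is not the SAP implementation used for Figure 5: the monthly counts are invented, initializing the level with the first observation is one common convention chosen here as an assumption, and the function names are hypothetical.

```python
def smoothed_level(series, alpha):
    """Final smoothed level after one pass of single exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: start from the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def forecast(series, alpha, horizon):
    """Forecast `horizon` future periods; every period gets the same value."""
    return [smoothed_level(series, alpha)] * horizon


# Invented monthly crime counts, for illustration only.
monthly_counts = [100, 80, 90]

predictions = forecast(monthly_counts, alpha=0.9, horizon=3)
print(predictions)
```

With alpha = 0.9 the level tracks the most recent observation closely, so the flat forecast sits near the last recorded month. Methods with trend or seasonal components (for example double or triple exponential smoothing) would be needed to forecast a changing crime rate.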
II. Gender worst impacted by crime incidents over a decade
Figure 6 Visual depicting worst impacted gender due to crimes from 2010 to 2020
In this visual, we have used a stacked bar graph to show the gender worst affected by crime in
the City of Los Angeles from 2010 to 2020. We have used the victim’s sex, the number of
crimes, and the year of occurrence for this visualization.
From the visual, it is evident that females are consistently the worst affected by crime.
Using this data, we can suggest that law enforcement agencies closely monitor the areas where
these crimes against females occur. The police department should increase patrolling and keep
a check on homelessness and shelter homes.
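The data behind a stacked bar chart of this kind is a cross-tabulation of incident counts by year and victim sex. A minimal sketch, assuming each incident has been reduced to a (year, sex) pair; the records are invented and both helper names are hypothetical.

```python
from collections import defaultdict


def count_by_year_and_sex(records):
    """Cross-tabulate incidents as {year: {sex: count}}, ordered by year."""
    table = defaultdict(lambda: defaultdict(int))
    for year, sex in records:
        table[year][sex] += 1
    return {year: dict(counts) for year, counts in sorted(table.items())}


def most_affected(table):
    """For each year, the sex code with the highest incident count."""
    return {year: max(counts, key=counts.get) for year, counts in table.items()}


# Invented (year, victim sex) pairs, for illustration only.
records = [
    (2019, "F"), (2019, "M"), (2019, "F"),
    (2020, "M"), (2020, "F"), (2020, "F"),
]

table = count_by_year_and_sex(records)
print(table)
print(most_affected(table))
```

Raw counts do not account for the size of each group in the population; comparing rates per resident would give a fairer picture of which group is most affected.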
III. Analysis on Crime rate by area
Figure 7: Heat Map of areas with high to low crimes reported
The above heat map shows the number of crimes by area name. The highest number of crimes is
reported from Southwest, followed by 77th Street, Mission, N Hollywood, Foothill and so on.
Measures such as night patrolling and greater awareness among residents can lower the crime
rate and should therefore be adopted by officers.
Conclusion
Crime is one of the most sensitive aspects of our lives, and structured visualizations backed
by well-planned expertise can play an important role in controlling it. Based on our analytics
reports, we can advise law enforcement agencies on measures to reduce the crime rate.
Applying data mining is a lengthy and tedious process when large volumes of data must be
handled. However, the precision gained in the process is considerable. One can infer and
create knowledge on how to slow down crime for the safety and well-being of people.
We have determined the geographic locations where the intensity of crime is highest, along
with the crime patterns. There are other applications of data mining in law enforcement, such
as determining criminal "hot spots", creating criminal profiles, and learning crime trends.
The city data collected supports our hypotheses, and hence our research can prove useful to
law enforcement agencies. We found a close association between the household income of an
area and the number of crime incidents. Our analysis also supported the hypothesis that the
weapon chosen for committing a crime is related to the age of the victim.
In closing, data mining techniques can be used to help reduce crime significantly. Law
enforcement agencies can further take advantage of other parameters to predict and associate
criminal activity.
References
(n.d.). Retrieved from https://www.slideshare.net/RamdharanDonda/crime-rate-data-analysis-in-los-angeles (link 1)
(n.d.). Retrieved from https://www.slideshare.net/Yawenli/2014-chicago (link 2)
Crime Data from 2010 to 2019. (n.d.). Retrieved from data.lacity.org:
https://data.lacity.org/A-Safe-City/Crime-Data-from-2010-to-2019/63jg-8b9z
Dahiya, M. (2017). Crime Data Investigating using Machine Learning Algorithms.
International Journal of Computational Intelligence Research.
Jones, N. K. (2020). Practical Analytics. Epistemy Press.
MacGregor, J. (n.d.). Predictive Analysis with SAP. Galileo Press.
Rizwan Iqbal, M. A. (n.d.). An Experimental Study of Classification Algorithms for Crime
Prediction. Indian Journal of Science and Technology.
Shiju Sathyadevan, D. M. (2014). Crime Analysis and Prediction Using Data Mining. IEEE.
Shivam Maurya, S. M. (2018). Crime Prediction Using Data Analytics Tools; A Review.
Proceedings of IC4T. Lucknow.

Spatium Project Simulation student brief
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 

Crime Data Analysis and Prediction for city of Los Angeles

  • 1. 2020 Crime Data Analysis and Prediction for City of Los Angeles HETA D PAREKH AND KALIKA SAXENA
  • 2. Crime Data Analysis and Prediction for City of Los Angeles. By Heta Parekh and Kalika Saxena. Abstract: Crime is one of the most common social problems in the country, affecting public welfare, child development, and adult socio-economic status. Crime analysis and prevention is a systematic approach to identifying and analyzing trends. With modern technology, crime data analysts can help law enforcement agencies prevent crime incidents. This research paper presents an extensive proposal concerning the varying crimes in the city of Los Angeles. The purpose of this data analysis is to understand patterns in criminal behavior in order to anticipate crime activity and determine how it can be prevented. This paper uses data mining and predictive analytics techniques to investigate crime trends and patterns. The main objective is to provide year-based, location-based, and crime-type-based rates and counts, which may help society minimize crime incidents. This research can identify regions with a high likelihood of crime occurrence and shows the crime-prone areas in the city. With the help of crime data analysis, officers can speed up the process of solving crimes and spread awareness in crime-prone areas. Key words: Crime rate analysis, Data analytics, Data Visualizations, Crime Prediction, Time Series Forecasting, Trend analysis
  • 3. Introduction: Crime is among the most serious threats to human society and is increasing in most cities in various parts of the world. Data mining algorithms play an important role in predicting the number and type of crime events likely to take place in the future. Other technologies and methods may also help law enforcement agencies track crime patterns and predict the probability of crime incidents. Although predictions cannot be 100% accurate, the probability of occurrence can be estimated. Crime incidents in a community often depend on many characteristics of that community and society: race, income group, age group, sex, family structure (single, divorced, married, number of children), level of education, number of police officers allocated to a locality, number of employed and unemployed people, and so on. Using data mining, one can classify localities based on these parameters and identify their relationship with criminal activity. In this paper, we analyze the crime data for the City of Los Angeles. The primary objective of this research study is to illustrate visualization, analysis, and prediction of crimes in relation to place and time. Using prediction in crime analysis and exploration can help make a region safer. Our research can predict which regions are highly sensitive to criminal activity based on various parameters. We have used predictive analysis to infer which areas of Los Angeles will be more or less sensitive. Based on the study, we can recommend that law enforcement officers adopt measures accordingly. Since this is a descriptive analysis, the paper focuses on composing the data, slicing and dicing it, visualizing it, and deriving insights. This paper covers the literature review and related work (Section A), data preparation and methodology (Section B), Research
  • 4. hypothesis (Section C), data analysis and interpretation (Section D), and conclusion (Section E). Literature Review: Data mining is the process of analyzing large amounts of data to discover patterns, relationships, and trends that cannot easily be found through slicing and dicing techniques. These discoveries can then be applied to various datasets and domains in order to adopt and implement better strategies. Law enforcement often uses these techniques to identify patterns of crime and allocate resources to the areas most affected by criminal activity, which helps reduce not only crime but also the cost of enforcement in designated zones. Agencies can also use these techniques to predict probable new areas of crime based on real-time data. Crime depends on various parameters such as average household income, literacy, and poverty, so it is important to understand these parameters to get clear insights. It is also important for people to be informed about the criminal activity around them, and crime reports are therefore published to inform citizens on a timely basis. Our paper adds further insights: a forecast of the crime rate and how COVID-19 affected the crime rate in Los Angeles in 2020 (Figure 5), geospatial data (Figure 1), the types of weapons used in crimes (Figure 2), the types of crime common over the past 10 years (Figure 4), and the race and sex worst affected by crime (Figure 6). From the literature review, it can be deduced that the volume of crime data is growing enormously and that it is important to keep track of it. For this reason, it is important to find techniques that are more efficient for accurate prediction. Related Work
  • 5. A related project on crime rate data analysis in Los Angeles was presented at the 24th annual student symposium. Using HiveQL and Hadoop, it drew insights such as the total occurrences of each crime and the number of crimes in 2014 overall and per area (Refer link 1). Another related research project, “2014 Chicago Crime data analysis” by Yawen Li (Refer link 2), lists the ten communities with the most crimes and the ten with the fewest. It analyzes the primary crime types, which hour of the day is the most dangerous, and the locations where the most crimes are committed. It also explains the relationship between crime and arrest data for the city of Chicago in 2014. Methodology: To support informed decisions by law enforcement agencies, data must be handled strategically. To proceed with the analytics, we took the following steps. Data collection: For this research paper, we used crime data from 2010 to October 2020 for the City of Los Angeles. The data was extracted from data.lacity.org. The dataset reflects incidents of crime and is provided by the Los Angeles Police Department.
  • 6. It has 2.12M rows and 28 columns, including the victim’s age, the weapon used, and the location of the crime. The dataset contains numeric and string fields and is normalized. Each record belongs to one of the areas of the City of Los Angeles. Some of the columns used in our analysis are as follows (column name, data type, description):
      Date OCC (Floating Timestamp): the timestamp at which the crime occurred
      Area (Text): name of the geographic area
      Crime Cd (Text): indicates the crime committed
      Vict Age (Numeric): age of the victim
      Vict Sex (Text): sex of the victim
      Vict Descent (Text): race of the victim
      Weapon Desc (Text): weapon used for the crime
      Location (Text): street address of the crime incident
      Lat (Numeric): latitude coordinate of the crime
      Long (Numeric): longitude coordinate of the crime
      Household Income (Numeric): median household income of the City of Los Angeles
  • 7. Data Cleaning: The dataset used for this analysis has a few records that contain missing or null values. Data quality must be improved before the data can be processed. Of the various techniques available for improving data quality, we used data cleaning. We cleaned the data manually, based on our understanding of it. We kept the chosen attributes for the records that do not contain missing values. All values were carefully copied to a new dataset and cross-checked multiple times to reduce the chance of errors. Classification Analysis: Classification is a data mining function that assigns items in a collection to target categories or classes. The major benefit of classification analysis is that it helps identify patterns and make a prediction for each case in the data. In this paper, we applied predictive analysis to month-wise crime cases in 2020. Because of the pandemic, many crimes declined as residents stayed at home. Classifiers fall into two major types: discriminative and generative. In this research, we used a discriminative classifier, which determines one class for each row of data. It builds its model from the observed data and depends heavily on the quality of the data in every case.
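The data cleaning step described above was done by hand. As a minimal sketch of a programmatic equivalent, the snippet below keeps only the records whose chosen attributes are all present and copies them into a new list. The field subset, the helper names `is_complete` and `clean`, and the sample rows are assumptions made for illustration; they are not the authors' procedure or real LAPD records.

```python
# Hypothetical subset of the columns listed in the paper's column table.
REQUIRED_FIELDS = ["Date OCC", "Area", "Crime Cd", "Vict Age"]


def is_complete(record, fields=REQUIRED_FIELDS):
    """Return True when every required field is present and non-empty."""
    for field in fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def clean(records, fields=REQUIRED_FIELDS):
    """Copy complete records into a new list, keeping only the chosen fields."""
    return [{f: r[f] for f in fields} for r in records if is_complete(r, fields)]


# Made-up rows for illustration only; not taken from the LAPD dataset.
sample = [
    {"Date OCC": "2020-03-01", "Area": "Southwest", "Crime Cd": "510", "Vict Age": 34},
    {"Date OCC": "2020-03-02", "Area": "", "Crime Cd": "330", "Vict Age": 27},
    {"Date OCC": "2020-03-03", "Area": "Mission", "Crime Cd": None, "Vict Age": 41},
]

cleaned = clean(sample)
print(len(cleaned), "of", len(sample), "records kept")
```

Building a new list rather than editing records in place mirrors the paper's description of copying values into a new dataset, and it leaves the raw data available for cross-checking.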
  • 8. Pattern Identification and Association Analysis: To identify the pattern and association between crime rate and household income, we used association analysis. We examined how the average household income of an area relates to its crime rate. The result of this phase is the crime pattern for a particular location. If the pattern shows that low household income groups face a higher chance of crime, it indicates a higher probability of crime occurring in low-income areas. Information about such patterns can help police officials allocate resources effectively. It can help prevent crime by providing more security in suspected areas, for example by installing alarms and CCTV. Time Series Analysis: Management is concerned with the future, so the more insight we obtain about the future, the better crime management will be, provided our insights are accurate. In ordinary prediction from a dataset, all prior observations are treated equally. In a time series, the time dimension adds an explicit order dependence between observations. Time series analysis involves developing models that best represent an observed series in order to understand its underlying causes. A series can be univariate, where a single variable is recorded sequentially at equal time intervals, or multivariate, where multiple variables are recorded in the same way. In this analysis, we use past data points as the basis for projecting the future. We plotted graphs for consecutive years pre- and post-COVID-19 to identify the trend in crime rate. We used SAP Lumira for time series forecasting of the crime rate in the coming years.
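The paper's analysis section names single exponential smoothing with an alpha of 0.9 as the forecasting algorithm. The sketch below shows how that technique works in general: the smoothed level is updated as level = alpha * x + (1 - alpha) * level, and every future period is forecast at the final level. The function names and the monthly counts are assumptions for illustration; this is not the SAP Lumira implementation and the numbers are not from the dataset.

```python
def single_exponential_smoothing(series, alpha):
    """Return the final smoothed level of a series.

    The level starts at the first observation and is updated as
    level = alpha * x + (1 - alpha) * level for each later observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def forecast(series, alpha, horizon):
    """Forecast `horizon` future periods; the forecast is flat at the final level."""
    level = single_exponential_smoothing(series, alpha)
    return [level] * horizon


# Made-up monthly crime counts for illustration only.
monthly_counts = [100, 80, 60, 90]
print(forecast(monthly_counts, alpha=0.9, horizon=3))
```

Because single exponential smoothing carries no trend or seasonal term, its forecast is the same value for every future period, and a high alpha such as 0.9 keeps that value close to the most recent observation.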
  • 9. Visualization: Decision making relies on data, which arrives with such velocity and volume that we cannot comprehend it without some layer of abstraction, such as a visual one. To understand the data better, it helps to visualize it before drawing inferences. In this analysis, crime-prone areas are represented using geomaps with crime density data. We also used line charts, tag clouds, and scatter plots to present a variety of data. These visuals let us examine large amounts of information quickly. Visualization is advantageous because one can analyze only the data that is significant to the research. Out-of-range data is automatically discarded and so does not confuse the audience. Knowing the regions where crime incidents occur most often makes it possible to take preventive action. Research Hypothesis: This section presents the hypotheses for our research paper. A hypothesis is a prediction, but it involves more than an educated guess. Here, we explore the effect of one variable on another. A variable is a measurable factor that can be changed and manipulated in ways that are observable. For our research, we formulated two research hypotheses, which we test using crime data from 2010 to 2020 for the City of Los Angeles. H1: Median household income of an area is associated with the number of crimes in the city.
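One simple way to quantify an association like the one stated in H1 is a Pearson correlation coefficient between per-area median household income and per-area crime counts. This is a swapped-in technique for illustration: the paper relies on a geospatial map rather than a correlation coefficient, and the function name `pearson` and all numbers below are made-up assumptions, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-area values for illustration only.
income = [40000, 55000, 70000, 90000]   # median household income per area
crimes = [900, 700, 650, 400]           # number of recorded crimes per area

r = pearson(income, crimes)
print(f"r = {r:.2f}")
```

A coefficient near -1 would indicate that crime counts fall as income rises, while a value near 0 would indicate no linear association.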
  • 11. Figure 1: Geospatial map showing the association between household income and number of crimes. Our hypothesis holds: the association analysis shows a relationship between the median household income and the number of crimes recorded in the respective areas. The lower the median household income, the higher the crime rate. In this hypothesis, median household income is the independent variable and number of crimes is the dependent variable. We infer that the two variables are related and that the relationship is inverse. People in lower-income groups are more often drawn into theft, burglary, and other criminal activity. Based on this association, the Los Angeles Police Department can estimate which zones are likely to see crime, increase patrolling there, and take stringent action. H2: The victim’s age is related to the weapon used to commit the crime.
  • 12. Figure 2: Use of weapons by victim age. The hypothesis that weapon use varies with the victim’s age holds. The visualization depicts the relationship between the weapons used and the victim’s age, and it shows that the weapons and objects used vary from one age group to another. For younger victims, criminals tend to use weapons that act quickly, such as guns, revolvers, and handguns.
  • 13. For older victims, on the other hand, less forceful means such as threats are used to commit the crime. To control crime, law enforcement officers can enforce stricter rules for purchasing guns and revolvers, which would reduce their use and in turn lower the crime rate. For this hypothesis, we used a scatter plot to examine the relationship between the two variables. In this case, age is the independent variable and weapon used is the dependent variable. Research Questions: I. What is the trend in crime rate over the decade from 2010 to 2020? Figure 3: Trend analysis of number of crimes by year. In this visual, we used a line chart to depict the number of crime incidents over a period of ten years, from 2010 to 2020. Based on our analysis, the number of crimes has been declining over time. It should also be noted that the number of crimes considerably declined
  • 14. from the year 2014 to 2015. Seasonality exhibits a degree of randomness; that is, it is not identical for every year. Using more parameters or multivariate time series analysis, one could investigate the possible reasons for the sudden decline in crime incidents. We can infer from the visual that the crime rate dropped significantly in 2020 because of the outbreak of the pandemic. II. Which types of crime occur most commonly across the years 2015-2020? Figure 4: Type of crimes reported. In this visual, we used a tag cloud to show the crime descriptions that are most common in the collected dataset. Robbery is the most prevalent crime, followed by burglary and vandalism. This is vital information for law enforcement agencies, which can use it to prevent such cases by installing more CCTV cameras, increasing patrolling, and so on. Officers can also
  • 15. inform people about the likely types of crime so that they can take suitable measures to prevent them. Data Analytics: I. Analysis of monthly 2020 crime data to observe the impact of COVID-19 and prediction of the crime rate post lockdown through 2021. Figure 5: Analysis and prediction of crime data during COVID-19 in 2020 and 2021. In this visual, we used a line graph to analyze the crime data during the pandemic lockdown and forecast the crime rate after the lockdown. We used the R-single exponential smoothing algorithm to forecast the crime rate post lockdown. In this time series analysis, we used historical data points to forecast the future. From the research, it can be inferred that crime rates were very low during March-May 2020 because of the government’s “shelter-in-place” order. From June onwards, when the city started to open, the number of crimes is also seen increasing but
  • 16. still lower than in the past few years. With an alpha value of 0.9, the model predicts the same crime rate for the next year, 2021. II. Gender worst impacted by crime incidents over a decade. Figure 6: Visual depicting the gender worst impacted by crimes from 2010 to 2020
  • 17. In this visual, we used a stacked bar graph to show the gender worst affected by crime in the city of Los Angeles from 2010 to 2020. We used the victim’s sex, the number of crimes, and the year of occurrence for this visualization. From the visual, it is evident that females are consistently the worst affected. Based on this data, we suggest that law enforcement agencies keep a strict check on the areas where these crimes against females occur. The police department should increase patrolling and keep a check on homelessness and shelter homes. III. Analysis of crime rate by area. Figure 7: Heat map of areas with high to low numbers of reported crimes. The heat map shows the number of crimes by area name. The highest number of crimes is reported from Southwest, followed by 77th Street, Mission, N Hollywood, Foothill, and so on. Measures such as night patrolling and greater awareness among residents can lower the crime rate and should therefore be adopted by officers.
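The per-area ranking behind a heat map like Figure 7 comes down to counting records by area name. The sketch below shows one way to do that with the standard library; the helper name `crimes_per_area`, the `"Area"` field name (taken from the paper's column table), and the sample records are assumptions for illustration, not the authors' tooling or real counts.

```python
from collections import Counter


def crimes_per_area(records, area_field="Area"):
    """Count records per area, skipping records with a missing area name."""
    return Counter(r[area_field] for r in records if r.get(area_field))


# Made-up records for illustration only.
sample = [
    {"Area": "Southwest"},
    {"Area": "77th Street"},
    {"Area": "Southwest"},
    {"Area": "Mission"},
    {"Area": ""},
]

counts = crimes_per_area(sample)
# most_common() lists areas from the highest count to the lowest.
for area, n in counts.most_common():
    print(area, n)
```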
  • 18. Conclusion: Crime touches one of the most sensitive aspects of our lives, and structured, well-planned visualizations can play an important role in controlling it. Based on our analytics reports, we can advise law enforcement agencies on measures to reduce the crime rate. Applying data mining is a lengthy and tedious process when large volumes of data must be handled. However, the precision gained in the process is considerable. One can derive knowledge about how to reduce crime for the safety and well-being of people. We determined the geographic locations where the intensity of crime is highest, along with the crime patterns. Data mining has other applications in law enforcement, such as determining criminal "hot spots", creating criminal profiles, and learning crime trends. The city data we collected supports our hypotheses, so our research can prove useful to law enforcement agencies. We found a close association between the household income of an area and its number of crime incidents. Our analysis also supported the hypothesis that the weapon chosen for a crime is related to the age of the victim. In closing, data mining techniques can be used to help reduce crime significantly. Law enforcement agencies can further take advantage of other parameters to predict and relate crime activity.
  • 19. References
      (n.d.). Retrieved from https://www.slideshare.net/RamdharanDonda/crime-rate-data-analysis-in-los-angeles (link 1)
      (n.d.). Retrieved from https://www.slideshare.net/Yawenli/2014-chicago (link 2)
      Crime Data from 2010 to 2019. (n.d.). Retrieved from data.lacity.org: https://data.lacity.org/A-Safe-City/Crime-Data-from-2010-to-2019/63jg-8b9z
      Dahiya, M. (2017). Crime Data Investigating using Machine Learning Algorithms. International Journal of Computational Intelligence Research.
      Jones, N. K. (2020). Practical Analytics. Epistemy Press.
      MacGregor, J. (n.d.). Predictive Analysis with SAP. Galileo Press.
      Rizwan Iqbal, M. A. (n.d.). An Experimental Study of Classification Algorithms for Crime Prediction. Indian Journal of Science and Technology.
      Shiju Sathyadevan, D. M. (2014). Crime Analysis and Prediction Using Data Mining. IEEE.
      Shivam Maurya, S. M. (2018). Crime Prediction Using Data Analytics Tools; A Review. Proceedings of IC4T. Lucknow.