In this talk, Vincent Emanuele tells the story of how Anidata has applied data science techniques in collaboration with law enforcement to fight human trafficking.
Using Data Science for Social Good: Fighting Human Trafficking
1. Using Data Science for Social Good: Fighting Human Trafficking
Vincent A. Emanuele II, Ph.D
President/Co-founder, Anidata
info@anidata.org
August 31, 2016
2. Why I’m here
Here to tell you a story
● A group of aspiring and practicing data scientists
● Found an important problem to work on together and in collaboration with
professionals in the community (human trafficking)
● Had some successes and failures along the way, with a ton of personal and career
growth
We are a group of people who are JUST LIKE YOU
3. Professional Bio
● PhD in Electrical and Computer Engineering from Georgia Tech (Signal
Processing and Machine Learning) (2010)
● CDC Visiting Scientist (2006 - 2013)
● First data scientist at Wellcentive. Founded Data Quality, Data Governance, and
Data Science Teams (2013 - 2016)
● Co-founder of Anidata (2016)
● Founder of Zylinium Research (2016)
5. Guiding Principle #1: Mentorship
Make sure the professionals (data scientists) who come behind you end up better data
scientists than you were, faster than you were able to get there
6. Guiding Principle #2: Community Improvement
Leave the community you live in better than how you found it
7. And this is how Anidata came to be
There are many people with both the Aptitude and Motivation to become data
scientists
They are missing
1) good data science mentors
2) exposure and experience working on real projects
This is how Anidata plans to serve the community
8. The Human Trafficking Problem
Commercial Sexual Exploitation of Children
Information provided by:
Irina Khasin, JD
Chief Senior Assistant District Attorney
Human Trafficking Unit, Fulton County District Attorney’s Office
9. Commercial Sexual Exploitation of Children (CSEC)
● How many children run away from home annually in the US?
○ 1.6 million
● In 2014, the National Center for Missing & Exploited Children estimated that 1 in 6
endangered runaways reported to them were victims of sex trafficking
● For teen runaways, what percent will become victims of CSEC?
○ 1 in 3 teens will become a CSEC victim within 48 hours of leaving home and becoming homeless
(“survival sex”)
● What is the average age of entry into the commercial sex industry in the US?
○ 12-14 years old
● In 2014, the National Human Trafficking Resource Center hotline, operated by
Polaris, received reports of 3,598 sex trafficking cases inside the United States
11. The Truth about CSEC
● Victims don’t choose prostitution
● Victims are young and vulnerable children
● Average age of entry is 12-14 years old
● Overwhelming number are runaways
● 70-90% have been sexually abused
● Most come from troubled homes
○ Abuse, neglect, parental crimes, and addiction
● Truancy and school problems are common
12. How does CSEC happen?
● Traditional
○ The track: Fulton Industrial/Old National
○ Truckstops
○ Massage parlors/strip clubs
○ Residential Brothels
● Modern Trend
○ Backpage
○ Craigslist
○ Escort services
● A digital data trail is a logical target for a data science-centric approach to this
problem!
14. Getting and organizing the data
This is often said to be 80% of a data scientist’s work and it can be messy
Approach
● Target adult web endpoints that are known to be channels for human trafficking
● Write Python code to scrape advertisements from these sites
○ We choose NOT to store or process any image data
● Store in a database
● Anonymize our data collection by using Tor and make requests look “human-like”
● Focus on the southeast region, with Atlanta as a central anchor but extending as
far as Nashville, TN
After running for 2.5 months, we collected a database of approximately 160,000 ads
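The "scrape, drop images, store in a database" steps above can be sketched roughly as follows. This is a minimal illustration, not Anidata's actual scraper: the `ads` table layout, the function names, and the use of SQLite are all assumptions, and the fetching step (Tor routing, human-like request pacing) is deliberately left out. The sketch only shows how text can be kept while image data is never collected.

```python
import sqlite3
from html.parser import HTMLParser


class AdTextExtractor(HTMLParser):
    """Collect visible text from an ad page; scripts and styles are skipped."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        # <img> tags contribute no text, so image data is never collected.

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the text content of an HTML page as one space-joined string."""
    parser = AdTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def store_ad(conn, url, html):
    """Store the ad text keyed by URL; re-scraped URLs are ignored.

    The table name and columns are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ads (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO ads (url, body) VALUES (?, ?)",
        (url, extract_text(html)),
    )
    conn.commit()
```

Keying on the URL is one simple way to avoid storing the same ad twice when a scraper revisits a listing page; a real pipeline would likely also record the scrape time and source site.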
17. Anidata 1.0 Human Trafficking Hackathon
Purpose: The purpose of this hackathon is to develop one or more prototypes and/or data-focused product
concepts that could lead to a measurable impact on preventing, prosecuting, investigating, or otherwise aiding
the fight against human trafficking within the next 12 months.
One-day hackathon on Saturday, March 5 from 7:30AM to 7:30PM
Free food and caffeine all day
Invite-only to Anidata members
Judging: This is a non-competitive hackathon. We will not be judging “winners”. However, all teams are
required to give a 7-minute pitch presentation to the “pitch panel”, which will consist of recognized experts
from the human trafficking/sex crimes legal, investigative, and research community. The pitch panel will give
invaluable feedback on the value of the product concepts demonstrated. We will strictly enforce the time limit
for the pitch presentations.
18. Collaborations
Data scientists
● We had 14 data scientists and aspiring data scientists
Law enforcement
● We had 12 volunteers from the District Attorney’s office
○ Prosecutors, investigators, law-enforcement, victim interview specialists
19. Hackathon Schedule
Time | Task | Primary Contact
7:30 AM | Check-in: Breakfast & Meet and Greet | Vince/Irina/Bob/Dan
8:00 AM | Introductions | Vince
8:15 AM | Walk through of the data and existing infrastructure (informal w/o slides) | Dan
8:45 AM | Team and task formation | Vince/Bob
9:00 AM | Start Hacking | Vince
10:00 AM | Mentor Office Hours I | Vince
11:45 AM | Lunch & Team Status Updates (informal) | Vince
12:15 PM | More Hacking | Vince
4:00 PM | Mentor Office Hours II | Vince
4:30 PM | Final push and polish hacking | Vince
6:00 PM | Product Pitches (7m per team) | Vince
6:30 PM | Pitch Panel Recess (share notes, discuss, prepare feedback) | Vince
7:00 PM | Pitch Panel Review of Product Pitches | Pitch Panel/Irina
7:30 PM | Concluding Remarks | Vince/Irina
20. Results of Hackathon 1
We had four teams pitch product concepts, with some ideas partially implemented and
some demos
Pitch panel of Human Trafficking experts gave us INVALUABLE feedback to shape
the product directions
Recurring theme across teams: Connecting the dots to identify the underlying
trafficker as fast as possible is important.
We identified a product-market fit, and we abstracted the real data science/business
problem the DA faced as follows….
24. Our star characters in this story
Meet Mary, our human
trafficking victim
Mary ran away from an abusive
home at the age of 16
Within 48 hours, she was
“recruited” by Zed
Mary is not alone. One out of
three juvenile runaways is
propositioned by a trafficker or
purchaser within 48 hours
25. Our star characters in this story
Meet Alice, our Assistant
District Attorney in the Fulton
County District Attorney’s
Office
She was called by APD to
interview Mary at a placement
after she had been recovered
26. Our star characters in this story
Meet Bob, the investigator
helping Alice collect evidence
during her interview with Mary
If Alice and Bob can work fast
enough, they can identify Zed
and get a warrant for his arrest
Most victims like Mary run away
from placement within a week
This initial investigation is
CRITICAL
27. We call this window of opportunity “The Golden Week”
Timeline shown on the slide:
● Mary arrested by APD
● Mary put in a placement
● APD calls Alice to ask for help
● Alice and Bob interview window
● Mary runs away from placement
35. Projected Measurable Impact on Law Enforcement
● Reduced time between the first victim interview and the arrest of the perpetrator
● Increased number of convictions per year
● Increased number of indictments per year
37. Stages of Innovation
MVD: “Minimum Viable Demonstration”
● Not quite a prototype, more than a mockup, and something that can be shown in a simple demo
38. Anidata 1.1 Hackweek MVD Plan
● Goals
○ Create polished demo
○ Gain full stack skills
○ Have fun collaboratively building a “product”
○ Tighten up the cool work we did in Anidata 1.0
● What we hoped to create
○ A mobile-responsive web UI that allows one to query the dataset and browse returned results
○ A backend service that allows the web frontend to query the database for results
○ Clean algorithm code that can be run on scraped data
■ Phone number cleaning
■ Email type combing
■ Phone-email-UID entity resolution (people graphs)
○ A mechanism to run everything and turn on the web service
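The "phone number cleaning" step listed above could look something like the sketch below. It is an illustration only, under the assumption that numbers in ads are US numbers written with inconsistent punctuation or spelled-out digits; the function name `clean_phone` and the normalization rules are hypothetical and are not taken from the Anidata code.

```python
import re

# Spelled-out digits sometimes used to obscure a number in ad text (assumed list).
WORD_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
WORD_RE = re.compile(r"\b(" + "|".join(WORD_DIGITS) + r")\b")


def clean_phone(raw):
    """Normalize a raw phone field to a 10-digit US number, or None if it does not fit."""
    text = raw.lower()
    # Replace whole spelled-out words with digits ("four oh four" -> "4 0 4").
    text = WORD_RE.sub(lambda m: WORD_DIGITS[m.group(0)], text)
    # Drop everything that is not a digit (spaces, dashes, parentheses, dots).
    digits = re.sub(r"\D", "", text)
    # Strip a leading US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None
```

Normalizing every number to one canonical form matters for the entity resolution step: two ads can only be linked through a shared phone number if both spellings reduce to the same string.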
40. “Connect-the-dots” Framework: Graph Theory
The information attached to distinct trafficking entities can be modeled using graph
theory
Connected subgraphs represent all information we know about a single underlying
trafficker
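A minimal sketch of this idea follows, assuming each ad has already been reduced to a set of cleaned identifiers (phone numbers, emails). Ads and identifiers are nodes, an edge joins an ad to each identifier it contains, and each connected component groups the ads that plausibly belong to one underlying entity. The sketch uses a small union-find in plain Python so it runs without dependencies; the function name and input shape are assumptions, and a graph library such as NetworkX offers `connected_components` for the same purpose.

```python
from collections import defaultdict


def connected_entities(ads):
    """Group ad IDs that are linked through shared identifiers.

    ads: iterable of (ad_id, identifiers) pairs, where identifiers is an
    iterable of cleaned strings such as phone numbers or emails.
    Returns a sorted list of sorted ad-ID lists, one per connected component.
    """
    parent = {}

    def find(x):
        # Find the component root, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for ad_id, identifiers in ads:
        ad_node = ("ad", ad_id)
        find(ad_node)  # register ads that carry no identifiers at all
        for ident in identifiers:
            union(ad_node, ("id", ident))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "ad":
            groups[find(node)].add(node[1])
    return sorted(sorted(group) for group in groups.values())


example_ads = [
    ("a1", ["4045550123", "x@example.com"]),
    ("a2", ["x@example.com"]),
    ("a3", ["4045550199"]),
    ("a4", ["4045550199", "4045550123"]),
    ("a5", ["y@example.com"]),
]
print(connected_entities(example_ads))
# [['a1', 'a2', 'a3', 'a4'], ['a5']]
```

In the example, ads a1 and a3 share no identifier directly, but a4 carries one phone number from each, so all four end up in the same component. That transitive linking is the "connect-the-dots" effect the slide describes.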
42. Web App Prototype: The Stack
Node.js
AngularJS
Ruby on Rails
Python
NetworkX
PostgreSQL
43. Conclusions
● Hackathons are a great way to build the data science community
● Hackathons with partners in the community are a great way to raise awareness of
how data science can solve business problems
○ Caveat: If you focus on the business problem
○ This helps create jobs and a local economy for data science
● Event planning is time-consuming
● We are not that different from YOU. Just a group of motivated individuals with a
common purpose.