Harnessing the power of
data science in the
service of humanity.
Our Purpose: We amplify the impact of social organizations.
Our Customer: Social organizations that have a clear theory of
change for reducing human suffering
Our Competitive Advantage: Our network of pro bono data scientists
Our Product: Data science services, i.e. predictive analytics, machine
learning, AI
Our Style: Human-centered design, jargon-free, accessible
What We Do
DataKind and ICAAD
Classifying UPR Records
ICAAD: International Center for Advocates
Against Discrimination
Non-profit organization that combats structural discrimination through
monitoring global trends, fostering research and designing interventions
Promote religious freedom in France
Combat gender-based violence in the Pacific Islands
Better documentation of hate crimes
Mapping discrimination
What was the DataCorps problem?
We have a database of text records from the United
Nations: the “Universal Periodic Review”
Data: Universal Periodic Review
What was the DataCorps problem?
How do we leverage these UPR records to better
understand human rights conditions across the world?
Labeling with Sustainable Development Goals
Adopted in 2015, the SDGs are a set of seventeen aspirational goals that all UN
member states are committed to achieve, covering a broad range of human
rights and development issues
Successor to the Millennium Development Goals
Task: How do we map a UPR record to one or more SDGs?
Deliverables
1) Build an MVP algorithm that systematically classifies Universal Periodic
Review (UPR) records using Sustainable Development Goals
2) Using the results from the algorithm, create a dashboard that visualizes
global patterns of discrimination
These two tools will enable ICAAD to better allocate their resources towards
the most important human rights interventions, as well as better disseminate
their findings to other related organizations.
What We Did...
UPR Data
Source              Number of Records   Number of Labels
ICAAD labeled       1247                2351
DataKind labeled    349                 628
All-organic, self-harvested and hand-labeled...
Data Prep
1) Each UPR = (very short) document
2) Clean, tokenize and create (1,2)-grams
3) Create term-document matrix
4) Feed bag-of-words matrix into ML model
5) ML model = two step “ensemble”
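The first three data prep steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the cleaning rule (lowercase, alphabetic tokens only), the function names and the two example records are hypothetical and are not taken from the project repo.

```python
import re
from collections import Counter

def ngrams(text, n_values=(1, 2)):
    """Lowercase, keep alphabetic tokens, and emit unigrams and bigrams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = []
    for n in n_values:
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def term_document_matrix(docs):
    """Return (vocabulary, rows), where rows[d][j] counts term j in document d."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows

# Hypothetical example records, not taken from the UPR database.
docs = ["Combat corruption in public office", "Combat gender-based violence"]
vocab, matrix = term_document_matrix(docs)
```

Each row of `matrix` is the bag-of-words vector for one record, which is the input the ML layer would consume.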
Machine Learning Layer: Multi-Label SVM
Support Vector Machine
Linear (no kernel)
Loss function: Squared hinge
Penalty type: L1
Regularization constant: 2.0
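Read together, these settings describe an L1-regularized linear SVM with squared hinge loss. As a sketch, assuming the regularization constant 2.0 multiplies the loss term (the usual C convention, which the slide does not spell out) and that one binary classifier is trained per SDG (one-vs-rest, also an assumption), each classifier would minimize:

```latex
\min_{w} \; \lVert w \rVert_1
  + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i \, w^{\top} x_i \bigr)^{2},
\qquad C = 2.0
```

Here x_i is the bag-of-words row for record i (with any bias term absorbed into w) and y_i is +1 if the record carries the SDG label and -1 otherwise. The L1 penalty drives most n-gram weights to exactly zero, which suits a small labeled set with a large vocabulary.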
Keyword Lookup Layer
If UPR text contains the word “corruption” → SDG #16
If UPR text contains the word “HIV” or “AIDS” → SDG #3
If UPR text contains the word “ICRMW” → SDG #10
And so on...
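A minimal sketch of such a lookup layer, using only the three rules shown above (the full rule list is not given on the slide). The function name and the choice to match acronyms case-sensitively are assumptions of this sketch.

```python
import re

# Only the three rules shown on the slide; the full rule list is not given.
# Acronyms are matched case-sensitively so that the verb "aids" is not
# mistaken for "AIDS" (a design choice of this sketch).
KEYWORD_RULES = [
    (re.compile(r"\bcorruption\b", re.IGNORECASE), 16),
    (re.compile(r"\b(?:HIV|AIDS)\b"), 3),
    (re.compile(r"\bICRMW\b"), 10),
]

def keyword_sdgs(text):
    """Return the set of SDG numbers whose keywords appear in the text."""
    return {sdg for pattern, sdg in KEYWORD_RULES if pattern.search(text)}
```

For example, `keyword_sdgs("Ratify the ICRMW")` yields the single label 10, while a record with no keyword yields an empty set and is left to the ML layer.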
Final Ensemble Model: CV Metrics
             ML Layer   ML + Keyword Lookup
Precision    0.827      0.772
Recall       0.758      0.848
F1-Score     0.787      0.802
ML Layer by itself does very well, but by adding the Keyword layer,
we can sacrifice a little bit of precision for a large gain in recall,
and get overall better performance.
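The trade-off can be reproduced on toy data. The sketch below assumes the two layers are combined by taking the union of their predicted labels, which is consistent with the recall gain but is not stated on the slides, and it uses micro-averaged scores, which is also an assumption. The labels are hypothetical and are not the project's data.

```python
def micro_scores(true_sets, pred_sets):
    """Micro-averaged precision, recall and F1 over sets of labels."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_true = sum(len(t) for t in true_sets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical toy labels for three records (not the project's data).
truth    = [{16}, {3, 5}, {10}]
ml_only  = [{16}, {5},    set()]
keywords = [{16}, {3},    {10, 16}]
ensemble = [m | k for m, k in zip(ml_only, keywords)]

ml_p, ml_r, ml_f1 = micro_scores(truth, ml_only)
en_p, en_r, en_f1 = micro_scores(truth, ensemble)
```

In this toy case the keyword layer adds one wrong label (lower precision) but recovers two labels the ML layer missed (higher recall), so F1 rises, the same pattern as in the table.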
Dashboard Visualizations: http://52.3.119.223/
The Aftermath Part 1
● Proof of concept algorithm delivered last October
● Demonstrated to and implemented among various project
partners
The Aftermath Part 2
● Team from Xerox brought in to build v2 of algorithm
○ The 17 main SDG categories contain 169 additional sub-goals
○ ICAAD wants to classify UPR records using these sub-goals
○ Army of volunteer lawyers doing a lot of manual labeling
● Something concrete by next summer!
What We Learned (Parting Shots)
1. Easier = better
2. Small data is hard
3. Simple Boolean logic works surprisingly well
4. Data scientists are paid (and sometimes not) to do the
dirty work
DataCorps Team
Ben Cohen: Software Engineer @ Warby Parker
Rebecca Wei: PhD Student @ Northwestern
Karry Lu: Senior Data Scientist @ Plated
Project Repo
https://github.com/karry-lu/datakind-icaad-model
How do we find evidence?
How do we communicate evidence?
How do we use evidence?
Information overload
Respondents from the UK conservation community indicated a desire
to use evidence but lacked a support framework to quickly sort and
evaluate it
Experience-based
Evidence-based
modified from Pullin et al. 2004
Evidence gap
Need for knowledge on effectiveness
Evidence-based decision making: using findings to inform actions
Desired outcomes achieved
Research project
Communicate findings
Monitor and evaluate progress and outcomes
Identify knowledge gaps
Synthesize knowledge gluts
Determine indicators
Adjust actions
RESEARCHERS
PRACTITIONERS
Theory of change
The need
Practitioners need standardized storage and access to research
insights from academic and grey literature for evidence-based
decision making
Researchers need a framework to follow to create these resources
Best science
Expert opinion
Society’s needs and preferences
Evidence-based decision-making
Systematic mapping process
Systematic Map
Problem #1: interactivity
Thorn, Jessica PR, et al. "What evidence exists for the
effectiveness of on-farm conservation land
management strategies for preserving ecosystem
services in developing countries? A systematic map."
Environmental Evidence 5.1 (2016): 13.
The Ask
Problem #2: Manual screening
Mapping example
Problem #3: Tools exist
Less
More
colandrapp.com
Tool framework
System 1: relevance ranking
• Citations are ranked by expected relevance
depending on the availability and number of
user-labeled examples
– First, it uses the search terms from review planning:
it computes the amount of overlap between those
terms and each citation's title + abstract + keywords
– Second, after enough examples have been labeled,
it uses distributional word vectors (word2vec) as
features for a support vector classifier that predicts
inclusion or exclusion; the confidence of that
classification is used as the expected relevance
• Citations are randomly sampled each time, to avoid
hasty generalization
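The first ranking stage can be sketched as a simple term-overlap score. This is an illustration only: the citation field names, the scoring formula (share of search terms found) and the example citations are assumptions, and the second stage (word2vec features with a support vector classifier) is not shown.

```python
import re

def tokenize(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_by_overlap(search_terms, citations):
    """Sort citations by the share of search terms found in their text."""
    terms = {t.lower() for t in search_terms}

    def score(citation):
        text = " ".join([citation["title"], citation["abstract"],
                         " ".join(citation["keywords"])])
        return len(terms & tokenize(text)) / len(terms) if terms else 0.0

    return sorted(citations, key=score, reverse=True)

# Hypothetical citations; field names are assumptions of this sketch.
citations = [
    {"title": "Urban traffic models", "abstract": "Road congestion.",
     "keywords": ["transport"]},
    {"title": "On-farm conservation", "abstract": "Ecosystem services on farms.",
     "keywords": ["agriculture", "conservation"]},
]
ranked = rank_by_overlap(["conservation", "ecosystem", "farm"], citations)
```

A score like this needs no labeled examples, which is why it suits the start of a review, before the classifier has anything to learn from.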
Unscreened citations: relevance is re-learned after every 10
labeled citations, and the remaining documents are re-sorted
System 2: extraction and tagging
• A better methodology might be to use the training
data to find sentences in the document that might
indicate a label. (provide provenance)
• We can train the system to over-predict (predict
sentences from a large number of the labels), so
that the system can focus on recall, while human
annotators can focus on precision
• For locations we can use a "Named Entity
Recognition" system to find mentioned locations
in the document, and suggest these as labels
• For other metadata, we can train a model which
predicts the relevance of sentences to a label
• We show the sentences that best predict labels to
the user, who can then use that information to
pick the correct labels
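The idea of surfacing supporting sentences with a recall-oriented threshold can be sketched as follows. A simple cue-term count stands in for the trained sentence-relevance model described above; the labels, cue terms, threshold and example document are all hypothetical.

```python
import re

# Hypothetical cue terms per label; in the described system these scores
# would come from a model trained on annotated documents.
LABEL_CUES = {
    "agroforestry": {"agroforestry", "trees", "shade"},
    "soil erosion": {"erosion", "soil", "runoff"},
}

def suggest_labels(document, threshold=1):
    """Map each label to its best supporting sentence, if the score passes a low threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    suggestions = {}
    for label, cues in LABEL_CUES.items():
        scored = [(len(cues & set(re.findall(r"[a-z]+", s.lower()))), s)
                  for s in sentences]
        best_score, best_sentence = max(scored, key=lambda pair: pair[0],
                                        default=(0, ""))
        if best_score >= threshold:  # a low threshold favours recall
            suggestions[label] = best_sentence
    return suggestions

doc = ("Shade trees were planted on coffee farms. "
       "Soil runoff fell after two years. Yields were stable.")
suggestions = suggest_labels(doc)
```

Returning the sentence alongside the label gives the annotator the provenance for each suggestion, so the human can reject over-predicted labels quickly.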
Data extraction
This process also learns relevance; it is set to wait for
50 reviews before it presents confidence scores
www.natureandpeopleevidence.org
Interaction with the data portal
• Output CSV file with individual citations and factor tags
• CSV ingested by receiving system
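A minimal sketch of such an export, one row per citation with its factor tags. The column names, the ";" separator for multiple tags and the example record are assumptions; the actual schema expected by the data portal is not given here.

```python
import csv
import io

def citations_to_csv(citations):
    """Serialize citations and their factor tags, one row per citation."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["citation_id", "title", "factor_tags"],
        lineterminator="\n",
    )
    writer.writeheader()
    for c in citations:
        writer.writerow({
            "citation_id": c["id"],
            "title": c["title"],
            "factor_tags": ";".join(c["tags"]),  # multiple tags share one cell
        })
    return buffer.getvalue()

# Hypothetical record; column names and the ";" separator are assumptions.
csv_text = citations_to_csv([
    {"id": 1, "title": "On-farm conservation, a review",
     "tags": ["agriculture", "ecosystem services"]},
])
```

Using the `csv` module rather than joining strings by hand means titles that contain commas are quoted correctly, so the receiving system can parse them back unchanged.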
http://natureandpeopleevidence.org/
Measuring evidence synthesis and
dissemination on the “T” impact model
Sector (diffuse) impact as
measured by
• Access and operability
• Common vs. uncommon solution
• Dissemination framework
Organization (deep) impact as
measured by
• Operational efficiency
• Increased productivity
• Expanded service
16 reviews
13 review leads
28 users
Two weeks of soft launch
colandr
Two virtual trainings conducted
Data Portal
~1,400 sessions
8 months
47 registered users
Multiple in-person trainings
“Evidence Based Conservation”
Articles on it: 139
Articles citing them: 2100
How many people use evidence in decision making?
ODSC East 2017: Data Science Models For Good

  • 1.
  • 2. Harnessing the power of data science in the service of humanity.
  • 3. Our Purpose: We amplify the impact of social organizations. Our Customer: Social organizations that have a clear theory of change for reducing human suffering Our Competitive Advantage: Our network of pro bono data scientists Our Product: Data science services, i.e. predictive analytics, machine learning, AI Our Style: Human-centered design, jargon-free, accessible What We Do
  • 17. ICAAD: International Center for Advocates Against Discrimination. Non-profit organization that combats structural discrimination through monitoring global trends, fostering research and designing interventions: promote religious freedom in France; combat gender-based violence in the Pacific Islands; better documentation of hate crimes; mapping discrimination.
  • 20. What was the DataCorps problem? We have a database of text records from the United Nations: the “Universal Periodic Review”
  • 23. What was the DataCorps problem? How do we leverage these UPR records to better understand human rights conditions across the world?
  • 24. Labeling with Sustainable Development Goals. Adopted in 2015, the SDGs are a set of seventeen aspirational goals that all UN member states are committed to achieving, covering a broad range of human rights and development issues. Successor to the Millennium Development Goals.
  • 25. Task: How do we map a UPR record to one or more SDGs?
  • 26. Deliverables: 1) Build an MVP algorithm that systematically classifies Universal Periodic Review (UPR) records using Sustainable Development Goals. 2) Using the results from the algorithm, create a dashboard that visualizes global patterns of discrimination. These two tools will enable ICAAD to better allocate their resources towards the most important human rights interventions, as well as better disseminate their findings to other related organizations.
  • 28. UPR Data (source: number of records, number of labels). ICAAD labeled: 1247 records, 2351 labels. DataKind labeled: 349 records, 628 labels. All-organic, self-harvested and hand-labeled...
  • 29. Data Prep: 1) Each UPR = (very short) document. 2) Clean, tokenize and create (1,2)-grams. 3) Create term-document matrix. 4) Feed bag-of-words matrix into ML model. 5) ML model = two-step “ensemble”.
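The first three data-prep steps can be sketched in plain Python. This is only an illustration under assumptions: the cleaning rule (lowercase, alphabetic tokens only), the function names and the two sample texts are hypothetical and are not taken from the slides.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens (an assumed, deliberately simple cleaning rule)."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams_1_2(tokens):
    """Return all unigrams followed by all bigrams (bigrams joined with a space)."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def term_document_matrix(docs):
    """Build a sorted vocabulary and a dense count matrix with one row per document."""
    counts = [Counter(ngrams_1_2(tokenize(d))) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Hypothetical, made-up recommendation texts standing in for UPR records.
docs = [
    "Combat corruption in public institutions",
    "Improve access to HIV treatment",
]
vocab, matrix = term_document_matrix(docs)
```

Each row of `matrix` is the bag-of-words vector for one record; with very short documents most entries are zero, so a real pipeline would normally use a sparse representation.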
  • 30. Machine Learning Layer: Multi-Label SVM. Support Vector Machine: linear (no kernel); loss function: squared hinge; penalty type: L1; regularization constant: 2.0.
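A configuration like the one listed on this slide could look roughly as follows in scikit-learn. The slides do not name a library, so the choice of scikit-learn, the one-vs-rest wrapper for the multi-label setup, and the mapping of "regularization constant: 2.0" onto the `C` parameter are all assumptions; the training texts and labels are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy records with one or more SDG numbers each.
texts = [
    "Strengthen measures to combat corruption in the judiciary",
    "Expand access to HIV treatment and prevention",
    "Fight corruption and protect migrant workers",
    "Improve maternal health services in rural areas",
    "Ratify the convention on the rights of migrant workers",
    "Ensure independent oversight of public institutions",
]
labels = [[16], [3], [16, 10], [3], [10], [16]]

# Turn the label lists into a binary indicator matrix (one column per SDG).
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(labels)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),          # (1,2)-gram bag of words
    OneVsRestClassifier(                          # one linear SVM per label
        LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=2.0)
    ),
)
model.fit(texts, Y)

# Predict label sets for a new, made-up record.
predicted = binarizer.inverse_transform(
    model.predict(["Investigate corruption allegations"])
)
```

`dual=False` is required by `LinearSVC` when combining the L1 penalty with the squared hinge loss.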
  • 31. Keyword Lookup Layer: If UPR text contains the word “corruption” → SDG #16. If UPR text contains the word “HIV” or “AIDS” → SDG #3. If UPR text contains the word “ICRMW” → SDG #10. And so on...
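A minimal sketch of such a lookup layer, using only the three rules shown on the slide. Combining the two layers by taking the union of their labels is an assumption (the slides do not spell out the combination rule), and the function names are hypothetical.

```python
import re

# Only the rules quoted on the slide; the real table is longer ("and so on").
KEYWORD_TO_SDG = {"corruption": 16, "hiv": 3, "aids": 3, "icrmw": 10}

def keyword_labels(text):
    """Return the SDG numbers whose keyword appears as a whole word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {sdg for keyword, sdg in KEYWORD_TO_SDG.items() if keyword in words}

def ensemble_labels(ml_labels, text):
    """Assumed combination rule: union of ML-layer labels and keyword labels."""
    return set(ml_labels) | keyword_labels(text)
```

Because matching is case-insensitive here, the verb "aids" would also trigger SDG #3; a case-sensitive match for acronyms would avoid that.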
  • 32. Final Ensemble Model: CV Metrics (ML Layer vs. ML + Keyword Lookup). Precision: 0.827 vs. 0.772. Recall: 0.758 vs. 0.848. F1-Score: 0.787 vs. 0.802. The ML Layer by itself does very well, but by adding the Keyword layer, we can sacrifice a little bit of precision for a large gain in recall, and get overall better performance.
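The precision/recall trade-off can be illustrated with micro-averaged multi-label metrics on toy data. The label sets below are invented for illustration and do not reproduce the slide's numbers; the averaging scheme used for the reported CV metrics is not stated, so micro-averaging is an assumption.

```python
def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall and F1 over per-record label sets."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_true = sum(len(t) for t in true_sets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: the keyword layer recovers missed labels but adds one false positive.
truth = [{16}, {3, 10}, {5}]
ml_only = [{16}, {3}, set()]
with_keywords = [{16}, {3, 10}, {5, 16}]

ml_scores = micro_prf(truth, ml_only)
ensemble_scores = micro_prf(truth, with_keywords)
```

In this toy case the extra keyword labels lower precision and raise recall, which is the same direction of change the slide reports.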
  • 34. The Aftermath Part 1 ● Proof of concept algorithm delivered last October ● Demonstrated and implicated among various project partners
  • 35. The Aftermath Part 2 ● Team from Xerox brought in to build v2 of algorithm ○ Main SDG category contains 169 additional sub-goals ○ ICAAD wants to classify UPR records using these sub-goals ○ Army of volunteer lawyers doing a lot of manual labeling ● Something concrete by next summer!
  • 36. What We Learned (Parting Shots) 1. Easier = better 2. Small data is hard 3. Simple Boolean logic works surprisingly well 4. Data scientists are paid (and sometimes not) to do the dirty work
  • 37. DataCorps Team: Ben Cohen, Software Engineer @ Warby Parker; Rebecca Wei, PhD Student @ Northwestern; Karry Lu, Senior Data Scientist @ Plated.
  • 42. How do we find evidence? How do we communicate evidence? How do we use evidence?
  • 46. Respondents from the UK conservation community indicate a desire to use evidence but lacked a support framework to quickly sort and evaluate evidence. [Figure: the evidence gap between experience-based and evidence-based practice; modified from Pullin et al. 2004]
  • 48. Evidence-based decision making, linking researchers and practitioners through a theory of change. [Figure labels: need for knowledge on effectiveness; research project; communicate findings; using findings to inform actions; desired outcomes achieved; monitor and evaluate progress and outcomes; identify knowledge gaps; synthesize knowledge gluts; determine indicators; adjust actions]
  • 49. The need: Practitioners need standardized storage of, and access to, research insights from academic and grey literature for evidence-based decision making. Researchers need a framework to follow to create these resources. [Figure: evidence-based decision-making draws on best science, expert opinion, and society’s needs and preferences]
  • 52. Problem #1: Interactivity. Example: Thorn, Jessica PR, et al. "What evidence exists for the effectiveness of on-farm conservation land management strategies for preserving ecosystem services in developing countries? A systematic map." Environmental Evidence 5.1 (2016): 13.
  • 53. The Ask. Problem #2: Manual screening
  • 56. Problem #3: Tools exist
  • 63. System 1: relevance ranking. Citations are ranked by expected relevance, depending on the availability and number of user-labeled examples. First, the system uses search terms from review planning and computes the amount of overlap between those terms and each citation's title, abstract and keywords. Second, after enough examples have been labeled, it uses distributional word vectors (word2vec) as features for a support vector classifier that predicts inclusion or exclusion, and uses the confidence of that classification as the expected relevance. Citations are randomly sampled each time, to avoid hasty generalization.
  • 64. Unscreened citations: relevance is re-learned every 10 citations and documents are re-sorted.
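The first ranking stage (overlap between planning search terms and citation text) can be sketched as below. The scoring formula (fraction of search-term tokens found in the citation), the dictionary field names and the sample citations are assumptions for illustration only; the slides do not specify how overlap is computed.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(search_terms, citation):
    """Fraction of search-term tokens that occur in title + abstract + keywords."""
    fields = " ".join(
        [citation["title"], citation["abstract"], " ".join(citation["keywords"])]
    )
    terms = tokens(" ".join(search_terms))
    return len(terms & tokens(fields)) / len(terms) if terms else 0.0

def rank(search_terms, citations):
    """Sort citations by descending overlap score."""
    return sorted(
        citations, key=lambda c: overlap_score(search_terms, c), reverse=True
    )

# Hypothetical citations and search terms.
citations = [
    {"id": "b", "title": "Urban traffic modelling",
     "abstract": "A study of congestion.", "keywords": ["transport"]},
    {"id": "a", "title": "Agroforestry and ecosystem services on farms",
     "abstract": "Effects of land management.", "keywords": ["conservation"]},
]
ranked = rank(["agroforestry", "ecosystem services"], citations)
```

Once enough citations have been labeled, the slide describes replacing this score with the confidence of a classifier trained on word-vector features; that second stage is not shown here.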
  • 65. System 2: extraction and tagging. A better methodology might be to use the training data to find sentences in the document that might indicate a label (and so provide provenance). We can train the system to over-predict (predict sentences for a large number of the labels), so that the system can focus on recall while human annotators focus on precision. For locations, we can use a "Named Entity Recognition" system to find locations mentioned in the document and suggest these as labels. For other metadata, we can train a model that predicts the relevance of sentences to a label. We show the user the sentences that best predict labels, and the user can then use that information to pick the correct labels.
  • 66. Data extraction: this process also learns relevance; it is set to require 50 reviews before it presents confidence.
  • 68. Interaction with the data portal: output a CSV file with individual citations and factor tags; the CSV is ingested by the receiving system, http://natureandpeopleevidence.org/
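An export of this kind might look like the sketch below. The column names, the semicolon-joined tag format and the sample record are hypothetical; the actual CSV schema expected by the data portal is not given in the slides.

```python
import csv
import io

def citations_to_csv(records):
    """Serialize citations and their factor tags to CSV text (assumed column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["citation_id", "title", "tags"], lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow({
            "citation_id": record["citation_id"],
            "title": record["title"],
            # Several tags share one cell, separated by semicolons.
            "tags": ";".join(record["tags"]),
        })
    return buffer.getvalue()

# Hypothetical citation with two factor tags.
csv_text = citations_to_csv([
    {"citation_id": 1, "title": "Agroforestry, revisited",
     "tags": ["location:Kenya", "outcome:yield"]},
])
```

Using the `csv` module rather than manual string joins keeps titles that contain commas or quotes intact for the receiving system.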
  • 69. Measuring evidence synthesis and dissemination on the “T” impact model. Sector (diffuse) impact as measured by: access and operability; common vs. uncommon solution; dissemination framework. Organization (deep) impact as measured by: operational efficiency; increased productivity; expanded service.
  • 70. colandr, two weeks of soft launch: 16 reviews, 13 review leads, 28 users. Two virtual trainings conducted.
  • 71. Data Portal: ~1,400 sessions in 8 months; 47 registered users; multiple in-person trainings.
  • 72. “Evidence Based Conservation”: 139 articles on it; 2,100 articles citing them. How many people use evidence in decision making?