CS410 Course Project Presentation
Petition Predictor
CS410 Spring 2013
Lucky Adike
Martin McEnroe
Dann Ormond
Problem Statement
Congress shall make no law respecting an establishment of religion, or prohibiting the free
exercise thereof; or abridging the freedom of speech, or of the press; or the right of the
people peaceably to assemble, and to petition the Government for a redress of grievances.
- The First Amendment of the United States Constitution
• January 2012: Congress proposed legislation on behalf of content distributors
• The internet community grew increasingly alarmed about the change and side effects
• Several well-publicized events took place on January 18, 2012 as part of the SOPA
blackout day: Google, Reddit, Wired, Wikipedia and 115,000 other websites modified
their web presence to protest the pending legislation.
• January 20th: the legislation was shelved indefinitely
What useful information retrieval tool could be built?
• Could this citizenry-government action have been anticipated and predicted?
• Could information retrieval and analysis of the online conversation anticipate and
predict the end result?
• Must register with email and zip code
• 100,000 signatures in 30 days
• Reach the threshold and the White House responds
• Which new petitions will hit the threshold?
Related work: On 2/21/13 the White House hosted a hackathon and released project results on 5/1.
Pulse predicts when a petition will pass the 100k threshold:
http://youtu.be/5-2P4GFZf8Y
https://github.com/DruRly/pulse
Solution Approach
• 1st Idea: Classify the petition:
– “1” : Petition will receive 100,000 discrete, validated signatures within 30 days
– “0” : Petition will not pass the 100,000 threshold in time
• How to make a classification decision?
1. statistical analysis of past performance.
• Wrote a Python program to scrape the White House website every 8 hours.
Results are stored in a JSON object for use in subsequent analysis and retrieval
Data captured for each petition:
• Title of petition
• Text of petition
• Signature count every 8 hours, starting 4/28
• Petition create date (but only viewable on the website after 150 signatures)
• A unique identifier, also useful as a search term
During the course of the project we changed to ranking petitions
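A minimal sketch of how the 8-hourly snapshots could be accumulated in a JSON file. The function name, the field names, and the file layout are assumptions made for illustration; the deck does not show the actual program, and the fetching step is deliberately left out.

```python
import json
import time
from pathlib import Path


def record_snapshot(store_path, petitions, timestamp=None):
    """Append one polling snapshot to a JSON store keyed by petition id.

    `petitions` is a list of dicts with the (hypothetical) keys
    id, title, body, created, signatures.
    """
    path = Path(store_path)
    # Load the existing store, or start a new one on the first run.
    store = json.loads(path.read_text()) if path.exists() else {}
    ts = int(time.time()) if timestamp is None else int(timestamp)
    for p in petitions:
        # Static fields are written once; counts grow with every poll.
        entry = store.setdefault(p["id"], {
            "title": p["title"],
            "body": p["body"],
            "created": p["created"],
            "counts": [],
        })
        entry["counts"].append({"time": ts, "signatures": p["signatures"]})
    path.write_text(json.dumps(store, indent=2))
    return store
```

A scheduler (or a loop that sleeps for 8 * 3600 seconds) would call `record_snapshot` once per poll with whatever the scraper returned.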
Logarithmic Curve Fit of 10 Most Likely Petitions
[Figure: number of signatures (log scale, 150 to 150,000) versus time in 8-hour increments (0 to 100), with petitions time-shifted to a common origin at their creation date. Series, each with a logarithmic trend line: archbishops, marijuana3, airgun, postal, Malaysian, assault, habeas, aggag, thallium, transnational. A horizontal line marks the threshold at 100,000 signatures.]
• Fit a curve, then predict the 30th-day value
(x = 90, since we sample every 8 hours)
• Petition ‘fatigue’ suggests a logarithmic
model is the better predictor
– Ln used (base w1 = e); the base can be tuned
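The curve fit and 30th-day prediction can be sketched as below, assuming the trend line has the form y = a·ln(x) + b and is fitted by ordinary least squares. The function names are hypothetical, and x is the sample index in 8-hour steps, starting at 1 so that ln(x) is defined.

```python
import math


def fit_log_curve(xs, ys):
    """Least-squares fit of y = a * ln(x) + b; returns (a, b)."""
    us = [math.log(x) for x in xs]  # linearise: regress y on u = ln(x)
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = sxy / sxx
    b = mean_y - a * mean_u
    return a, b


def predict_day_30(xs, ys, x_target=90):
    """Extrapolate the fitted curve to x = 90 (30 days of 8-hour samples)."""
    a, b = fit_log_curve(xs, ys)
    return a * math.log(x_target) + b
```

A petition would be classified as likely to succeed when `predict_day_30` returns at least 100,000; ranking petitions by this value gives the baseline order before any boosts are applied.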
Twitter: Tweets and Followers
After signing, wh.gov site encourages you to promote the petition
• Used public Twitter REST API
• Search on the petition title
• Tweet Rate = count / # of days (twitter limits age of tweets in API)
• Use transformation of rate to reward place in rank, not absolute value
difference
– sublinear
– linear
– exponential
• Guess: linear
Tweet Weight Adjusted for ∑ƒ(followers)
• Are some tweeters more important than others?
• Can we develop something like authorities/hubs?
• Weighted Rate incorporates the number of followers to
increase/decrease the score of each tweet
Adj. Score = ∑ log5(followers) / days of tweets
Base 5 -> pivot point is w2 = 5 followers (can be tuned)
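The two Twitter scores can be sketched as follows. The function names are hypothetical, and clamping follower counts below 1 up to 1 (so the logarithm is defined for accounts with no followers) is an assumption that the deck does not address.

```python
import math


def tweet_rate(tweet_count, days):
    """Plain tweet rate: number of tweets divided by days of tweets."""
    return tweet_count / days


def follower_adjusted_rate(follower_counts, days, base=5):
    """Sum of log_base(followers) over all tweets, divided by days of tweets.

    With base 5, a tweeter with exactly 5 followers contributes 1.0;
    fewer followers contribute less, more followers contribute more.
    """
    # Assumption: treat 0 followers as 1 so that log() is defined.
    total = sum(math.log(max(f, 1), base) for f in follower_counts)
    return total / days
```

For example, 94 tweets over 8 days give a tweet rate of 11.75, which matches the 11.8 shown for the first petition in the table on the next slide after rounding.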
Transforming Rank to Boost Factors
• Petition rank is mapped to a boost factor via a linear function; the function type can be tuned
• A scaling parameter is applied to each IR category based on our judgment of its importance
– tweet rate: w3 = .02, so 1st -> 1.10; 10th -> .90
– follower-adjusted tweet rate: w4 = .04, so 1st -> 1.20; 10th -> .80
Petition ID Ln Curve Fit  Tweets  Tweet Rate  Rank  Boost   Boost    Follower       Weighted/   Rank  Boost   Boost
            (w1 = e)              per Day           Factor           Weighted Rate  Rate Ratio        Factor
xNskxL1q    16,545        94      11.8        10    0.90    -1,655   39.7           3.379       7     0.92    -1,324
xqNMVRB4    9,115         97      12.1        9     0.92    -729     44.1           3.636       1     1.20    1,823
khpw6LCt    50,898        1022    127.8       2     1.08    4,072    459.8          3.600       3     1.12    6,108
drCmyCHZ    21,280        231     28.9        5     1.02    426      103.1          3.570       4     1.08    1,702
nBqKR7bm    446,841       1676    838.0       1     1.10    44,684   2675.7         3.193       9     0.84    -71,494
kVhNfHQ1    14,720        168     21.0        6     0.98    -294     71.7           3.412       6     0.96    -589
bMJpDrNq    6,769         114     14.3        8     0.94    -406     49.6           3.479       5     1.04    271
KQWSvsKr    5,380         127     15.9        7     0.96    -215     57.7           3.635       2     1.16    861
Rd8C54p1    83,231        93      31.0        4     1.04    3,329    63.5           2.047       10    0.80    -16,646
V3hNt2fB    17,376        508     63.5        3     1.06    1,043    208.5          3.283       8     0.88    -2,085
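The rank-to-boost mapping can be sketched as below. The piecewise-linear form is an assumption inferred from the stated endpoints (w3 = .02 gives 1st -> 1.10 and 10th -> .90, w4 = .04 gives 1st -> 1.20 and 10th -> .80), which implies that the step skips 1.00 between 5th and 6th place. The function names are hypothetical.

```python
def rank_to_boost_factor(rank, weight, n=10):
    """Map a rank in 1..n to a multiplicative boost factor.

    The top half of the ranking is rewarded and the bottom half is
    penalised, in steps of `weight`; no rank maps to exactly 1.00.
    """
    half = n // 2
    if rank <= half:
        return 1.0 + weight * (half + 1 - rank)
    return 1.0 - weight * (rank - half)


def boost(base_prediction, rank, weight, n=10):
    """Signatures added to (or removed from) the curve-fit prediction."""
    return base_prediction * (rank_to_boost_factor(rank, weight, n) - 1.0)
```

With the table values for petition khpw6LCt (curve fit 50,898, tweet-rate rank 2, w3 = .02), `boost` gives about 4,072, the figure shown in its Boost column.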
Can Google Trends help us?
Chunks,value (0 – 100)
Revoke US Visa,7
on,83
National Security Grounds,7
to,83
Venezuelan Government Officials,0
involved,65
in,93
Transnational Organized Crime,65
Converted the petition title into search
phrases using OpenNLP
• sentence detector
• tokenizer & POS tagger => Chunker
Some observations
• Chunking produced common terms with high scores
• Would be more useful to build a custom Query
background language model – need more data
• Not clear how Google Trends computes values from 0
to 100; values for different petitions are not relative to each
other
• Doesn’t appear to be a “bag of words” model. What
about semantically equivalent terms? We were
hoping for a tf-idf weighting from the web
• Is there another tool out there? Are there functions of
the API we didn’t exploit? Will the API evolve?
• Most unreliable IR source, therefore w5 = .01
[Figures: results from the web interface; results from the API interface]
Authority Sites via Bing API
• Created list of 30 authoritative web sites (e.g., cnn.com). Each weighted equally.
• Sent full title of petition as query to Bing API exactly as listed on wh.gov:
“Invest and deport Jasmine Sun who was the main suspect of a famous Thallium
poison murder case (victim:Zhu Lin) in China”
• Measured the number of results in the top 50 that came from an authoritative
domain, excluding self-posting parts of a domain such as http://ireport.cnn.com/docs/DOC-965382
• Observation: Most petitions do not receive mainstream attention
• Second most reliable source: w6 = .03
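The authority-site count can be sketched as below, given a list of result URLs already returned by the search API. The domain set is a placeholder (the deck names only cnn.com out of its 30 sites), the exclusion set holds the one self-posting host the deck mentions, and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Placeholder list: the deck used 30 equally weighted sites but names only one.
AUTHORITY_DOMAINS = {"cnn.com"}
# Self-posting hosts that should not count as mainstream coverage.
EXCLUDED_HOSTS = {"ireport.cnn.com"}


def count_authority_hits(result_urls, domains=AUTHORITY_DOMAINS,
                         excluded=EXCLUDED_HOSTS, top_n=50):
    """Count how many of the top_n results come from an authoritative domain."""
    hits = 0
    for url in result_urls[:top_n]:
        host = (urlparse(url).hostname or "").lower()
        if host in excluded:
            continue
        # Match the domain itself or any of its subdomains (e.g. www.cnn.com).
        if any(host == d or host.endswith("." + d) for d in domains):
            hits += 1
    return hits
```

The resulting count per petition is what would then be ranked and turned into a boost factor.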
Petition ID keyword        Close Date  Ln Curve Fit  Authority Sites  Rank  Boost Factor  Boost
xNskxL1q    archbishops    5/27        16,545        5                3     1.09          1,489
xqNMVRB4    marijuana3     5/17        9,115         8                1     1.15          1,367
khpw6LCt    airgun         5/15        50,898        4                4     1.06          3,054
drCmyCHZ    postal         5/24        21,280        6                2     1.12          2,554
nBqKR7bm    Malaysian      6/4         446,841       2                6     0.97          -13,405
kVhNfHQ1    assault        5/21        14,720        0                10    0.85          -2,208
bMJpDrNq    habeas         5/27        6,769         3                5     1.03          203
KQWSvsKr    aggag          5/10        5,380         2                8     0.91          -484
Rd8C54p1    thallium       6/4         83,231        0                9     0.88          -9,988
V3hNt2fB    transnational  6/3         17,376        2                7     0.85          -2,606
Putting it together
• Our focus was on acquiring data and constructing a model, automating where
necessary and using open tools, APIs, and information sources
• The transfer of data between modules and the final ranking computation need
more automation if the system is to run unattended
• Much data analysis, both manual and automated, went into guessing at the important
sources and parameters. Many initial ideas didn’t pan out:
– Sentiment analysis (no such thing as bad publicity)
– Google Trends surprisingly useless; forced to do manual manipulation, so very low
confidence in this as a predictor
– Facebook button on wh.gov, but it didn’t appear to be used as much as Twitter
– No training data to choose parameters. Chose a simple “boost” model to start and
used intuition from the project to guess at the relative size of the boost from different sources.
Chunks,value (0 – 100)
Stop,85
seismic airgun testing,0
for,86
oil and gas,77
off,80
the U.S. East Coast .,0
Putting Our Money Where Our Mouth Is…
Ranked predictions for the 10 most likely¹ of the 842 petitions started between April 5 and May 4.
How will we do?
1. Only petitions that have at least 150 signatures are visible to us
2. One petition (0MNp0Bys) started on 4/15 and hit 100k before we started collecting statistics, so we excluded it from our data set
Column groups on the original slide: Baseline; IR Model Prediction
Petition ID keyword        Close  Linear     Naïve  Ln Curve Fit  Twitter     Twitter+              Google Trends  Authority Sites  Combined  Predicted
                           Date   Curve Fit  Order  (w1 = e)      (w3 = .02)  (w2 = 5, w4 = .04)    (w5 = .01)     (w6 = .03)       model     Order
xNskxL1q    archbishops    5/27   31,309     5      16,545        -1,655      -1,324                165            1,489            15,222    5
xqNMVRB4    marijuana3     5/17   10,048     9      9,115         -729        1,823                 456            1,367            12,032    7
khpw6LCt    airgun         5/15   57,185     4      50,898        4,072       6,108                 -2,036         3,054            62,096    2
drCmyCHZ    postal         5/24   27,929     6      21,280        426         1,702                 -426           2,554            25,536    4
nBqKR7bm    Malaysian      6/4    2,895,387  1      446,841       44,684      -71,494               17,874         -13,405          424,499   1
kVhNfHQ1    assault        5/21   16,568     7      14,720        -294        -589                  -442           -2,208           11,187    8
bMJpDrNq    habeas         5/27   12,036     8      6,769         -406        271                   -68            203              6,769     9
KQWSvsKr    aggag          5/10   5,448      10     5,380         -215        861                   -269           -484             5,273     10
Rd8C54p1    thallium       6/4    734,304    2      83,231        3,329       -16,646               1,665          -9,988           61,591    3
V3hNt2fB    transnational  6/3    88,236     3      17,376        1,043       -2,085                521            -2,606           14,248    6
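In the table, the Combined model value appears to be the Ln curve fit plus the four boost columns (the rows agree to within rounding), and the Predicted Order is the descending rank of that value. A minimal sketch under that reading, with hypothetical function names:

```python
def combined_prediction(base, boosts):
    """Curve-fit prediction plus the signed boosts from each IR source."""
    return base + sum(boosts)


def rank_predictions(predictions):
    """Rank petitions by predicted signatures, 1 = most likely to succeed.

    `predictions` maps a petition key to its combined prediction.
    """
    ordered = sorted(predictions, key=predictions.get, reverse=True)
    return {key: position + 1 for position, key in enumerate(ordered)}


# Values copied from the airgun row of the table:
# Ln fit 50,898; boosts 4,072 (Twitter), 6,108 (Twitter+),
# -2,036 (Google Trends), 3,054 (Authority Sites).
airgun = combined_prediction(50898, [4072, 6108, -2036, 3054])
```

Because the boosts are additive and each is proportional to the curve-fit value, a petition with a large baseline keeps its place unless the sources disagree strongly, which is what lets airgun overtake thallium in the predicted order.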
Quo Vadis?
Do Research
• Collect more data, train parameters, learn different ways to make predictions
• Publish
• Awesome? idea for a team competition homework 5 in a future class
Sharpen CS skills
• Whitehouse.gov released API on 5/1 and a historical corpus on 5/2
• Next White House hackathon on 6/1
Make money
• Turn this into an actual app and host it on web site
– Business model: tweet dashboard link to anyone who tweets a petition, dashboard
site is advertising supported
• Apply methods to other petition sites:
change.org, gopetition.com, ipetitions.com, signon.org, thepetitionsite.com, care2.com
(or get a job at one of these companies)
Give back
• Fraudulent petition signature detection
• Mine the web for new petition topics with high success potential
12

Mais conteúdo relacionado

Destaque

Technical Data Sheet 2
Technical Data Sheet 2Technical Data Sheet 2
Technical Data Sheet 2Brian Nam
 
Export Booster Brochure 2012
Export Booster Brochure 2012Export Booster Brochure 2012
Export Booster Brochure 2012Brian Nam
 
Chic Plants for Hip Gardeners
Chic Plants for Hip GardenersChic Plants for Hip Gardeners
Chic Plants for Hip GardenersKelly Norris
 
Internet networking tools for teachers
Internet networking tools for teachersInternet networking tools for teachers
Internet networking tools for teacherstcone
 
Zeus catalogue 2014
Zeus catalogue 2014Zeus catalogue 2014
Zeus catalogue 2014Brian Nam
 
Daeheung pump systems catalogue 2012
Daeheung pump systems catalogue 2012Daeheung pump systems catalogue 2012
Daeheung pump systems catalogue 2012Brian Nam
 
Daejin manual 2013
Daejin manual 2013Daejin manual 2013
Daejin manual 2013Brian Nam
 
Daejin japanese catalogue 2011 for web
Daejin japanese catalogue 2011 for webDaejin japanese catalogue 2011 for web
Daejin japanese catalogue 2011 for webBrian Nam
 
A 가변용량형피스톤펌프
A 가변용량형피스톤펌프A 가변용량형피스톤펌프
A 가변용량형피스톤펌프Brian Nam
 
Technical Data Sheet
Technical Data SheetTechnical Data Sheet
Technical Data SheetBrian Nam
 
Rachel\'s Final
Rachel\'s FinalRachel\'s Final
Rachel\'s Finalmcwizard
 
Business Success Through VA/VE
Business Success Through VA/VEBusiness Success Through VA/VE
Business Success Through VA/VEgojo67
 

Destaque (13)

Technical Data Sheet 2
Technical Data Sheet 2Technical Data Sheet 2
Technical Data Sheet 2
 
Export Booster Brochure 2012
Export Booster Brochure 2012Export Booster Brochure 2012
Export Booster Brochure 2012
 
Chic Plants for Hip Gardeners
Chic Plants for Hip GardenersChic Plants for Hip Gardeners
Chic Plants for Hip Gardeners
 
Internet networking tools for teachers
Internet networking tools for teachersInternet networking tools for teachers
Internet networking tools for teachers
 
Intermediation
IntermediationIntermediation
Intermediation
 
Zeus catalogue 2014
Zeus catalogue 2014Zeus catalogue 2014
Zeus catalogue 2014
 
Daeheung pump systems catalogue 2012
Daeheung pump systems catalogue 2012Daeheung pump systems catalogue 2012
Daeheung pump systems catalogue 2012
 
Daejin manual 2013
Daejin manual 2013Daejin manual 2013
Daejin manual 2013
 
Daejin japanese catalogue 2011 for web
Daejin japanese catalogue 2011 for webDaejin japanese catalogue 2011 for web
Daejin japanese catalogue 2011 for web
 
A 가변용량형피스톤펌프
A 가변용량형피스톤펌프A 가변용량형피스톤펌프
A 가변용량형피스톤펌프
 
Technical Data Sheet
Technical Data SheetTechnical Data Sheet
Technical Data Sheet
 
Rachel\'s Final
Rachel\'s FinalRachel\'s Final
Rachel\'s Final
 
Business Success Through VA/VE
Business Success Through VA/VEBusiness Success Through VA/VE
Business Success Through VA/VE
 

Semelhante a Petition predictor final

ADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the EdgeADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the EdgeDATAVERSITY
 
Value Stream Mapping – Stories From the Trenches
Value Stream Mapping – Stories From the TrenchesValue Stream Mapping – Stories From the Trenches
Value Stream Mapping – Stories From the TrenchesDevOps.com
 
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...Connected Data World
 
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...Quantopian
 
Us Ignite Global City Teams Challenge Funding Opportunities
Us Ignite Global City Teams Challenge Funding OpportunitiesUs Ignite Global City Teams Challenge Funding Opportunities
Us Ignite Global City Teams Challenge Funding OpportunitiesUS-Ignite
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...DevClub_lv
 
Rakesh-Nune-Incident-Management-for-DDOT
Rakesh-Nune-Incident-Management-for-DDOTRakesh-Nune-Incident-Management-for-DDOT
Rakesh-Nune-Incident-Management-for-DDOTRakesh Nune
 
Webinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment PerformanceWebinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment PerformanceLucidworks
 
Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016
Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016
Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016Glenn Gutmacher
 
Discover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and DiscoveryDiscover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and DiscoveryNew Delhi Salesforce Developer Group
 
Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...
Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...
Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...Agile Greece
 
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...Steve Kramer
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkJim Kaplan CIA CFE
 
Odata V4 : The New way to REST for Your Applications
Odata V4 : The New way to REST for Your Applications Odata V4 : The New way to REST for Your Applications
Odata V4 : The New way to REST for Your Applications Alok Chhabria
 
Colman Hackathon Webhose.io API Reference
Colman Hackathon Webhose.io API ReferenceColman Hackathon Webhose.io API Reference
Colman Hackathon Webhose.io API ReferenceOhad Flinker
 
Using Chaos to Disentangle an ISIS-Related Twitter Network
Using Chaos to Disentangle an ISIS-Related Twitter NetworkUsing Chaos to Disentangle an ISIS-Related Twitter Network
Using Chaos to Disentangle an ISIS-Related Twitter NetworkSteve Kramer
 
Deconstructing Lambda
Deconstructing LambdaDeconstructing Lambda
Deconstructing Lambdadarach
 
Strategic agency of a Pakistani offshoring service provider
Strategic agency of a Pakistani offshoring service providerStrategic agency of a Pakistani offshoring service provider
Strategic agency of a Pakistani offshoring service providerUmair Shafi Choksy
 

Semelhante a Petition predictor final (20)

ADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the EdgeADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the Edge
 
Value Stream Mapping – Stories From the Trenches
Value Stream Mapping – Stories From the TrenchesValue Stream Mapping – Stories From the Trenches
Value Stream Mapping – Stories From the Trenches
 
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
 
Investigating Impact Metrics for Performance for the US-EPA National Center f...
Investigating Impact Metrics for Performance for the US-EPA National Center f...Investigating Impact Metrics for Performance for the US-EPA National Center f...
Investigating Impact Metrics for Performance for the US-EPA National Center f...
 
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
 
Us Ignite Global City Teams Challenge Funding Opportunities
Us Ignite Global City Teams Challenge Funding OpportunitiesUs Ignite Global City Teams Challenge Funding Opportunities
Us Ignite Global City Teams Challenge Funding Opportunities
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
 
Rakesh-Nune-Incident-Management-for-DDOT
Rakesh-Nune-Incident-Management-for-DDOTRakesh-Nune-Incident-Management-for-DDOT
Rakesh-Nune-Incident-Management-for-DDOT
 
Webinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment PerformanceWebinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment Performance
 
Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016
Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016
Gutmacher In-House Sourcing Model Offshore and Onshore Nov. 2016
 
Discover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and DiscoveryDiscover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and Discovery
 
Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...
Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...
Improving Agility (Learning from Maersk Line's Journey) | Özlem Yüce | Agile ...
 
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS ...
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t Work
 
Odata V4 : The New way to REST for Your Applications
Odata V4 : The New way to REST for Your Applications Odata V4 : The New way to REST for Your Applications
Odata V4 : The New way to REST for Your Applications
 
Colman Hackathon Webhose.io API Reference
Colman Hackathon Webhose.io API ReferenceColman Hackathon Webhose.io API Reference
Colman Hackathon Webhose.io API Reference
 
Using Chaos to Disentangle an ISIS-Related Twitter Network
Using Chaos to Disentangle an ISIS-Related Twitter NetworkUsing Chaos to Disentangle an ISIS-Related Twitter Network
Using Chaos to Disentangle an ISIS-Related Twitter Network
 
Deconstructing Lambda
Deconstructing LambdaDeconstructing Lambda
Deconstructing Lambda
 
Strategic agency of a Pakistani offshoring service provider
Strategic agency of a Pakistani offshoring service providerStrategic agency of a Pakistani offshoring service provider
Strategic agency of a Pakistani offshoring service provider
 

Último

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Petition predictor final

  • 1. CS410 Course Project Presentation: Petition Predictor. CS410 Spring 2013. Lucky Adike, Martin McEnroe, Dann Ormond.
  • 2. Problem Statement
      "Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances." - The First Amendment of the United States Constitution
      • January 2012: Congress proposed legislation on behalf of content distributors
      • The internet community grew increasingly alarmed about the change and side effects
      • Several well publicized events took place on January 18, 2012 as part of the SOPA blackout day: Google, Reddit, Wired, Wikipedia and 115,000 other websites modified their web presence to protest the pending legislation.
      • January 20th the legislation was shelved indefinitely
      What useful information retrieval tool could be built?
      • Could this citizenry-government action have been anticipated and predicted?
      • Could information retrieval and analysis of the online conversation anticipate and predict the end result?
  • 3. [Diagram labels]
      • Must register with email and zip code
      • 100,000 signatures in 30 days
      • Reach threshold and Whitehouse responds
      • Which new petitions will hit threshold?
      Related work: On 2/21/13 Whitehouse hosts hackathon and releases project results on 5/1. Pulse predicts when the threshold will pass 100k:
      http://youtu.be/5-2P4GFZf8Y
      https://github.com/DruRly/pulse
  • 4. Solution Approach
      • 1st Idea: Classify the petition:
        – “1” : Petition will receive 100,000 discrete, validated signatures within 30 days
        – “0” : Petition will not pass the 100,000 threshold in time
      • How to make a classification decision?
        1. Statistical analysis of past performance.
      • Wrote a Python program to scrape the whitehouse website every 8 hours. Stored in a JSON object for use in subsequent analysis and retrieval
      Fields collected for each petition:
        – Title of petition
        – Text of petition
        – Signature count every 8 hours starting 4/28
        – Petition create date (but only viewable on website after 150 signatures)
        – A unique identifier, also useful as a search term
      During the course of the project we changed to ranking petitions
  • 5. Logarithmic Curve Fit of 10 Most Likely Petitions
      [Chart: Number of Signatures (log scale, 150 to 150,000) versus Time in 8 hour increments (0 to 100; petitions time shifted to common origin = creation date). One series and one logarithmic trendline for each of: archbishops, marijuana3, airgun, postal, Malaysian, assault, habeas, aggag, thallium, transnational. Threshold @ 100,000 signatures.]
      • Curve fit then predict the 30th day value (x = 90 since we sample every 8 hours)
      • Petition ‘fatigue’ suggests logarithmic model is better predictor - Ln used (base w1 = e) can be tuned
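      The curve fit on slide 5 can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes an ordinary least-squares fit of y = a·ln(x) + b, assumes samples are indexed from x = 1 (ln(0) is undefined), and uses synthetic signature counts. The function names `fit_log_curve` and `predict` are hypothetical.

```python
import math

def fit_log_curve(xs, ys):
    """Least-squares fit of y = a * ln(x) + b. Returns (a, b)."""
    us = [math.log(x) for x in xs]          # linearize: y is linear in ln(x)
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = s_uy / s_uu
    b = mean_y - a * mean_u
    return a, b

def predict(a, b, x):
    """Evaluate the fitted curve at sample index x."""
    return a * math.log(x) + b

# Synthetic samples (NOT project data): one reading per 8-hour interval.
samples_x = list(range(1, 13))
samples_y = [150 + 1000 * math.log(x) for x in samples_x]

a, b = fit_log_curve(samples_x, samples_y)
day_30 = predict(a, b, 90)        # x = 90 is the 30th day at 3 samples per day
will_pass = day_30 >= 100_000     # compare against the signature threshold
```

      With real data the samples would be the scraped signature counts, shifted so that x counts 8-hour intervals since the petition's creation date.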
  • 6. Twitter: Tweets and Followers
      After signing, wh.gov site encourages you to promote the petition
      • Used public Twitter REST API
      • Search on the petition title
      • Tweet Rate = count / # of days (twitter limits age of tweets in API)
      • Use transformation of rate to reward place in rank, not absolute value difference
        – sublinear
        – linear
        – exponential
      • Guess: linear
      Tweet Weight Adjusted for ∑ƒ(followers)
      • Are some tweeters more important than others?
      • Can we develop something like authorities/hubs?
      • Weighted Rate incorporates number of followers to increase/decrease score of each tweet
      Adj. Score = ∑ log_5(followers) / days of tweets
      Base 5 -> Pivot point is w2 = 5 followers – can be tuned
      [Figure: rank plotted against a scale running from 0.95 to 1.05]
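      The two Twitter scores on slide 6 can be written out as a short sketch. The function names and sample numbers are hypothetical, and skipping accounts with zero followers is an assumption (the slide does not say how they were handled; the logarithm is undefined there).

```python
import math

def tweet_rate(tweet_count, days):
    """Tweet Rate = count / number of days covered by the search results."""
    return tweet_count / days

def follower_adjusted_score(follower_counts, days, base=5):
    """Adj. Score = sum(log_base(followers)) / days of tweets.

    follower_counts holds one entry per tweet: the follower count of its author.
    With base 5, a tweeter with exactly 5 followers contributes 1.0 (the pivot);
    more followers contribute more, fewer contribute less.
    Accounts with 0 followers are skipped (assumption).
    """
    total = sum(math.log(f, base) for f in follower_counts if f > 0)
    return total / days

# Hypothetical numbers for illustration only.
rate = tweet_rate(12, 4)                                   # 3.0 tweets per day
score = follower_adjusted_score([5, 25, 125], days=3)      # (1 + 2 + 3) / 3
```

      Changing `base` moves the pivot point, which is the tuning knob the slide calls w2.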
  • 7. Transforming Rank to Boost Factors
      • Petition rank is mapped via a linear function – function type can be tuned
      • Tuning scaling parameter applied based on judgment of importance of each IR category
        – tweet rate: w3 = .02; 1st -> 1.10; 10th -> .90
        – follower adjusted tweet rate: w4 = .04; 1st -> 1.20; 10th -> .80

      | Petition ID | Ln Curve Fit (w1 = e) | Tweets | Tweet Rate per Day | Rank (w3) | Boost Factor (w3) | Boost (w3) | Follower Weighted Rate | Weighted/Rate Ratio | Rank (w4) | Boost Factor (w4) | Boost (w4) |
      |---|---|---|---|---|---|---|---|---|---|---|---|
      | xNskxL1q | 16,545 | 94 | 11.8 | 10 | 0.90 | -1,655 | 39.7 | 3.379 | 7 | 0.92 | -1,324 |
      | xqNMVRB4 | 9,115 | 97 | 12.1 | 9 | 0.92 | -729 | 44.1 | 3.636 | 1 | 1.20 | 1,823 |
      | khpw6LCt | 50,898 | 1022 | 127.8 | 2 | 1.08 | 4,072 | 459.8 | 3.600 | 3 | 1.12 | 6,108 |
      | drCmyCHZ | 21,280 | 231 | 28.9 | 5 | 1.02 | 426 | 103.1 | 3.570 | 4 | 1.08 | 1,702 |
      | nBqKR7bm | 446,841 | 1676 | 838.0 | 1 | 1.10 | 44,684 | 2675.7 | 3.193 | 9 | 0.84 | -71,494 |
      | kVhNfHQ1 | 14,720 | 168 | 21.0 | 6 | 0.98 | -294 | 71.7 | 3.412 | 6 | 0.96 | -589 |
      | bMJpDrNq | 6,769 | 114 | 14.3 | 8 | 0.94 | -406 | 49.6 | 3.479 | 5 | 1.04 | 271 |
      | KQWSvsKr | 5,380 | 127 | 15.9 | 7 | 0.96 | -215 | 57.7 | 3.635 | 2 | 1.16 | 861 |
      | Rd8C54p1 | 83,231 | 93 | 31.0 | 4 | 1.04 | 3,329 | 63.5 | 2.047 | 10 | 0.80 | -16,646 |
      | V3hNt2fB | 17,376 | 508 | 63.5 | 3 | 1.06 | 1,043 | 208.5 | 3.283 | 8 | 0.88 | -2,085 |
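      The rank-to-boost mapping on slide 7 is not given as a formula, so the sketch below is inferred from the table values: ranks 1 to 5 map to +5w … +1w and ranks 6 to 10 map to -1w … -5w, with no petition receiving a factor of exactly 1.00. The function names are hypothetical and the code assumes an even number of ranked petitions.

```python
def rank_steps(rank, n=10):
    """Signed number of weight steps for a rank in 1..n (n assumed even).

    Top half: rank 1 -> +n/2 ... rank n/2 -> +1.
    Bottom half: rank n/2+1 -> -1 ... rank n -> -n/2.
    """
    half = n // 2
    return half - rank + 1 if rank <= half else half - rank

def boost_factor(rank, weight, n=10):
    """Multiplicative factor, e.g. 1.10 for rank 1 with weight 0.02."""
    return 1.0 + weight * rank_steps(rank, n)

def boost(base_prediction, rank, weight, n=10):
    """Signatures added to (or removed from) the curve-fit prediction."""
    return base_prediction * weight * rank_steps(rank, n)

# Values taken from the slide 7 table.
tweet_rate_boost = boost(446_841, rank=1, weight=0.02)    # about 44,684
follower_boost = boost(83_231, rank=10, weight=0.04)      # about -16,646
```

      A sublinear or exponential mapping, which the slides mention as alternatives, would only change `rank_steps`.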
  • 8. Can Google Trends help us?
      Converted the petition title into search phrases using OpenNLP
      • sentence detector
      • tokenizer & POS tagger => Chunker

      | Chunk | Value (0 – 100) |
      |---|---|
      | Revoke US Visa | 7 |
      | on | 83 |
      | National Security Grounds | 7 |
      | to | 83 |
      | Venezuelan Government Officials | 0 |
      | involved | 65 |
      | in | 93 |
      | Transnational Organized Crime | 65 |

      Some observations
      • Chunking produced common terms with high scores
      • Would be more useful to build a custom query background language model – need more data
      • Not clear how Google Trends computes values from 0 to 100 – values for different petitions are not relative to each other
      • Doesn’t appear to be a “bag of words” model. What about semantically equivalent terms? We were hoping for a tf-idf weighting from the web
      • Is there another tool out there? Are there functions of the API we didn’t exploit? Will the API evolve?
      • Most unreliable IR source, therefore w5 = .01
      [Figures: results from web interface; results from API interface]
  • 9. Authority Sites via Bing API
      • Created list of 30 authoritative web sites (e.g., cnn.com). Each weighted equally.
      • Sent full title of petition as query to Bing API exactly as listed on wh.gov: “Invest and deport Jasmine Sun who was the main suspect of a famous Thallium poison murder case (victim:Zhu Lin) in China”
      • Measured number of responses in the top 50 results that came from an authoritative domain - eliminated self-posting parts of domain: http://ireport.cnn.com/docs/DOC-965382
      • Observation: Most petitions do not receive mainstream attention
      • Second most reliable w6 = .03

      | Petition ID | keyword | Close Date | Ln Curve Fit | Authority Sites | Rank | Boost Factor | Boost |
      |---|---|---|---|---|---|---|---|
      | xNskxL1q | archbishops | 5/27 | 16,545 | 5 | 3 | 1.09 | 1489 |
      | xqNMVRB4 | marijuana3 | 5/17 | 9,115 | 8 | 1 | 1.15 | 1367 |
      | khpw6LCt | airgun | 5/15 | 50,898 | 4 | 4 | 1.06 | 3054 |
      | drCmyCHZ | postal | 5/24 | 21,280 | 6 | 2 | 1.12 | 2554 |
      | nBqKR7bm | Malaysian | 6/4 | 446,841 | 2 | 6 | 0.97 | -13405 |
      | kVhNfHQ1 | assault | 5/21 | 14,720 | 0 | 10 | 0.85 | -2208 |
      | bMJpDrNq | habeas | 5/27 | 6,769 | 3 | 5 | 1.03 | 203 |
      | KQWSvsKr | aggag | 5/10 | 5,380 | 2 | 8 | 0.91 | -484 |
      | Rd8C54p1 | thallium | 6/4 | 83,231 | 0 | 9 | 0.88 | -9988 |
      | V3hNt2fB | transnational | 6/3 | 17,376 | 2 | 7 | 0.85 | -2606 |
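      The authority-site count on slide 9 can be illustrated with a small sketch. It deliberately leaves out the Bing API call and works on a list of result URLs that is assumed to have been fetched already. The function name, the example URLs other than the iReport link, and the exact matching rule (a host counts if it equals an authority domain or is a subdomain of one, unless it is on the exclusion list) are assumptions.

```python
from urllib.parse import urlparse

def count_authority_hits(result_urls, authority_domains, excluded_hosts=(), top_n=50):
    """Count results in the top_n whose host belongs to an authoritative domain.

    excluded_hosts lists self-posting hosts (e.g. ireport.cnn.com) that should
    not count even though they sit under an authoritative domain.
    """
    hits = 0
    for url in result_urls[:top_n]:
        host = (urlparse(url).hostname or "").lower()
        if host in excluded_hosts:
            continue
        if any(host == d or host.endswith("." + d) for d in authority_domains):
            hits += 1
    return hits

results = [
    "http://www.cnn.com/2013/05/01/example-story",   # hypothetical URL
    "http://ireport.cnn.com/docs/DOC-965382",        # self-posted, from the slide
    "http://example.org/blog/petition",              # hypothetical URL
]
hits = count_authority_hits(results, {"cnn.com"}, excluded_hosts={"ireport.cnn.com"})
```

      In this example only the first URL counts, so `hits` is 1. The per-petition counts would then be ranked and turned into boost factors with w6.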
  • 10. Putting it together
      • Our focus was on acquiring data and constructing a model, automating where necessary and using open tools, APIs, and information sources
      • The transfer of data between modules and the final ranking computation need more automation if the system is to run unattended
      • Much data analysis, both manual and automated, to guess at important sources and parameters. Many initial ideas didn’t pan out:
        – Sentiment analysis (no such thing as bad publicity)
        – Google Trends surprisingly useless: forced to do manual manipulation; very low confidence in this as a prediction
        – Facebook button on wh.gov, but it didn’t appear to be used as much as Twitter
        – No training data to choose parameters. Chose a simple “boost” model to start and used intuition from the project to guess at the relative size of the boost from different sources.

      | Chunk | Value |
      |---|---|
      | Stop | 85 |
      | seismic airgun testing | 0 |
      | for | 86 |
      | oil and gas | 77 |
      | off | 80 |
      | the U.S. East Coast . | 0 |
  • 11. Putting Our Money Where Our Mouth Is…
      Ranked predictions of the 10 most likely [1] of the 84 [2] petitions started between April 5 and May 4. How will we do?
      [1] Only petitions that have at least 150 signatures are visible to us
      [2] One petition (0MNp0Bys) started on 4/15 and hit 100k before we started collecting statistics, so we excluded it from our data set

      | Petition ID | keyword | Close Date | Linear Curve Fit | Naïve Order | Ln Curve Fit (w1 = e) | Twitter (w3 = .02) | Twitter+ (w2 = 5, w4 = .04) | Google Trends (w5 = .01) | Authority Sites (w6 = .03) | Combined model | Predicted Order |
      |---|---|---|---|---|---|---|---|---|---|---|---|
      | xNskxL1q | archbishops | 5/27 | 31,309 | 5 | 16,545 | -1,655 | -1,324 | 165 | 1,489 | 15,222 | 5 |
      | xqNMVRB4 | marijuana3 | 5/17 | 10,048 | 9 | 9,115 | -729 | 1,823 | 456 | 1,367 | 12,032 | 7 |
      | khpw6LCt | airgun | 5/15 | 57,185 | 4 | 50,898 | 4,072 | 6,108 | -2,036 | 3,054 | 62,096 | 2 |
      | drCmyCHZ | postal | 5/24 | 27,929 | 6 | 21,280 | 426 | 1,702 | -426 | 2,554 | 25,536 | 4 |
      | nBqKR7bm | Malaysian | 6/4 | 2,895,387 | 1 | 446,841 | 44,684 | -71,494 | 17,874 | -13,405 | 424,499 | 1 |
      | kVhNfHQ1 | assault | 5/21 | 16,568 | 7 | 14,720 | -294 | -589 | -442 | -2,208 | 11,187 | 8 |
      | bMJpDrNq | habeas | 5/27 | 12,036 | 8 | 6,769 | -406 | 271 | -68 | 203 | 6,769 | 9 |
      | KQWSvsKr | aggag | 5/10 | 5,448 | 10 | 5,380 | -215 | 861 | -269 | -484 | 5,273 | 10 |
      | Rd8C54p1 | thallium | 6/4 | 734,304 | 2 | 83,231 | 3,329 | -16,646 | 1,665 | -9,988 | 61,591 | 3 |
      | V3hNt2fB | transnational | 6/3 | 88,236 | 3 | 17,376 | 1,043 | -2,085 | 521 | -2,606 | 14,248 | 6 |

      Column group labels on the slide: Baseline, IR Model, Prediction
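      The slide 11 table is consistent with a simple additive model: the combined score is the Ln curve-fit prediction plus the four boosts, and the predicted order is the descending sort of that score. The slides do not state this formula, so it is an inference from the numbers (it reproduces the Combined model column to within rounding). The sketch below uses three rows copied from the table; the function name is hypothetical.

```python
def combined_score(ln_fit, boosts):
    """Curve-fit prediction plus the signed boosts from each IR source."""
    return ln_fit + sum(boosts)

# keyword -> (Ln curve fit, [Twitter, Twitter+, Google Trends, Authority Sites])
# Values copied from the slide 11 table.
rows = {
    "marijuana3": (9_115, [-729, 1_823, 456, 1_367]),
    "airgun": (50_898, [4_072, 6_108, -2_036, 3_054]),
    "postal": (21_280, [426, 1_702, -426, 2_554]),
}

scores = {keyword: combined_score(fit, boosts) for keyword, (fit, boosts) in rows.items()}

# Highest combined score first, matching the relative order in the table.
predicted_order = sorted(scores, key=scores.get, reverse=True)
```

      For these three rows the scores come out to 12,032, 62,096 and 25,536, the same values as the Combined model column.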
  • 12. Quo Vadis?
      Do Research
      • Collect more data, train parameters, learn different ways to make predictions
      • Publish
      • Awesome? idea for a team competition homework 5 in a future class
      Sharpen CS skills
      • Whitehouse.gov released API on 5/1 and a historical corpus on 5/2
      • Next Whitehouse hackathon on 6/1
      Make money
      • Turn this into an actual app and host it on web site
        – Business model: tweet dashboard link to anyone who tweets a petition, dashboard site is advertising supported
      • Apply methods to other petition sites: change.org, gopetition.com, ipetitions.com, signon.org, thepetitionsite.com, care2.com (or get a job at one of these companies)
      Give back
      • Fraudulent petition signature detection
      • Mine the web for new petition topics with high success potential

Editor's Notes

  1. 0 sec
  2. 25 Sec
  3. 20 Sec
  4. 15 Sec. We noticed that of approximately 90 current predictions, three of them have to do with marijuana. This gave us an idea.
  5. work to do: Marty – compute numbers, sort by start date