voting advice slides
1. Voting Advice via Direct Access to the Relevant Data 1
Voting Advice via Direct Access to
the Relevant Data
Maarten Marx
Universiteit van Amsterdam
Politicologen etmaal, Amsterdam, 2011-06-09
2. Outline
• Two types of voting advice systems
• Lipschits on the web
• Technical details
• Conclusions and how further?
3. Two types of voting advice systems
• Lipschits method (1977–1998)
• Stemwijzer method (on the web, from 1998)
Same users: voters
Same motivation: help voters in making a choice, based on
manifesto data
4. Primary goals
Lipschits
• quickly find the standpoint of a party in its manifesto on topic X
• easily compare the standpoints of parties on topic X
Stemwijzer
• quickly find which parties best fit the user on a fixed set of topics (the same for all voters)
5. Secondary goals
Reuse obtained data for scientific research.
Lipschits
• standardize manifestos
• rich list of salient topics for each election
• rich controlled vocabulary
• (Now) excellent training data for creating classifiers
KiesKompas
• positions of parties on several topics
• positions of “the electorate” on these topics
6. Differences between Lipschits and Stemwijzer
• One size fits all (Stemwijzer) vs user decides on topics and parties (Lipschits)
• Direct (Lipschits) vs indirect (Stemwijzer) access to primary sources
• Different input-output behaviour
7. Input-output
System             In                                          Out
Stemwijzer         answers to questions                        ranked list of parties
Kieskompas         answers to questions                        model of the user as a party
Lipschits          controlled vocabulary terms                 relevant paragraphs for each party
VerkiezingsKijker  controlled vocabulary or free search terms  relevant paragraphs for each party
8. Demo: ‘Lipschits on the web’
verkiezingskijker.nl
9. History of VerkiezingsKijker
TK 2006: UvA-Stemwijzer.
• Eddy Habben Jansen: take Lipschits as inspiration
• Stemwijzer's motivation: add “proof” for party positions from their manifestos
PS 2007: UvA-Kieskompas. VerkiezingsKijker was used to facilitate a large number of party placements (12 provinces × 10 parties × 36 positions = 4320 placements).
DNPP corpus: UvA BSc thesis, a search engine for the DNPP manifesto corpus.
TK 2010: Google
10. Technical details: outline
1. Idea
2. How to do it
3. Main problem
4. Solutions
11. The idea behind VerkiezingsKijker
• Replicate Lipschits, “Google style”
• Add free keyword search
• Make it scalable, faster to make (and without a Lipschits . . . )
12. How to do that?
• Collect manifestos (in time . . . )
• Standardize them into one data format
• Partition each manifesto into meaningful units (paragraphs)
Outcome: a basic Google-style search engine that returns a ranked list of paragraphs for each search term
Advanced search: restrict the results to selected parties
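The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VerkiezingsKijker implementation: paragraphs are assumed to be separated by blank lines, ranking uses a simple TF-IDF style score, and the names split_paragraphs, build_index and search are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def split_paragraphs(manifesto_text):
    """Partition a manifesto into paragraphs (assumption: blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", manifesto_text) if p.strip()]


def build_index(manifestos):
    """manifestos maps party name -> manifesto text.

    Returns one retrieval unit per paragraph: (party, paragraph, term counts).
    """
    units = []
    for party, text in manifestos.items():
        for para in split_paragraphs(text):
            units.append((party, para, Counter(tokenize(para))))
    return units


def search(units, query, parties=None):
    """Return (score, party, paragraph) tuples, best match first.

    parties optionally restricts the results to a set of party names.
    """
    n = len(units)
    terms = tokenize(query)
    # Document frequency: in how many paragraphs does each query term occur?
    df = {t: sum(1 for _, _, tf in units if t in tf) for t in terms}
    results = []
    for party, para, tf in units:
        if parties is not None and party not in parties:
            continue
        # Term frequency weighted by a smoothed inverse document frequency.
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in terms if df[t] > 0)
        if score > 0:
            results.append((score, party, para))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Calling search(build_index(manifestos), "schiphol", parties={"A"}) would then return only the paragraphs of a hypothetical party A that mention the term, ranked by score.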
13. Main problem: semantic gap
Voter and manifesto use different terms to talk about the same topic
• different parties use different terms to talk about the same topic
• small amounts of text per retrieval unit make this problem worse
• Recall Problem: system does not retrieve all relevant paragraphs.
14. Two solutions to this
Hierarchical controlled vocabulary
• Basically back to Lipschits.
• Burden on the user.
Document expansion
• find related terms (schiphol, vliegveld, luchthaven, vliegtuig, . . . ; Dutch for Schiphol, airfield, airport, airplane)
• expand paragraphs: if a paragraph contains one of the terms, add all the others
• Aim: Improve recall.
• Danger: topic drift (thus more false positives)
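Document expansion as described above can be sketched as follows. The term group is the example from the slide; the names RELATED_TERMS and expand_paragraph are hypothetical, and a real system would need a group of related terms per topic.

```python
import re

# One group of related terms per topic (this single group is the slide's example).
RELATED_TERMS = [
    {"schiphol", "vliegveld", "luchthaven", "vliegtuig"},
]


def expand_paragraph(paragraph, term_groups=RELATED_TERMS):
    """Return (text to index, sorted list of added terms).

    If the paragraph contains at least one term of a group, all other
    terms of that group are appended to the text that gets indexed.
    """
    tokens = set(re.findall(r"\w+", paragraph.lower()))
    extra = set()
    for group in term_groups:
        if tokens & group:
            extra |= group - tokens
    added = sorted(extra)
    indexed_text = " ".join([paragraph] + added)
    return indexed_text, added
```

A query for "schiphol" now also retrieves a paragraph that only mentions "luchthaven", which improves recall. The same mechanism causes topic drift when a term group is too broad, because every paragraph that touches one term is indexed under all of them.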
15. Predicament
With both solutions we seem to be back at Lipschits and need to do
all the work he did . . .
16. Our solution: learning from examples
• Stemwijzer created a list of 100 important election topics.
• For each topic, Stemwijzer found 5 highly relevant paragraphs
• From these paragraphs we harvested all overused terms (using
corpus comparison techniques [Rayson, Garside 2000])
• For each topic we took the top k terms
• Quick manual check to remove outliers
• Output: classifier for each topic, and set of expansion terms.
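The term-harvesting step can be sketched with the log-likelihood statistic from Rayson and Garside (2000). The slides do not say which corpus comparison measure was actually used, so treat the choice of statistic, the function names and the parameter k as assumptions. Here the topic corpus would be the five example paragraphs for one topic, and the reference corpus the remaining manifesto text.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one term.

    a: frequency in the topic corpus, b: frequency in the reference corpus,
    c: size of the topic corpus, d: size of the reference corpus (in tokens).
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the topic corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def overused_terms(topic_tokens, reference_tokens, k=10):
    """Return the top k terms that are overused in the topic paragraphs."""
    target = Counter(topic_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(target.values()), sum(ref.values())
    scored = []
    for term, a in target.items():
        b = ref[term]
        # Keep only terms that are relatively more frequent in the topic corpus.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]
```

The resulting list is what the quick manual check would then prune. It can serve as the set of expansion terms for the topic; turning it into a per-topic classifier is not shown here.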
17. Conclusion and what next?
• Both systems are complementary.
• Modern Lipschits system is useful for both makers and users of
stemwijzer-like systems.
• Fine grained classification of manifestos (and alternatives . . . ) is
useful for comparative research (e.g., Breeman-Timmermans,
Louwerse)
18. What next/Discussion
• Standardization of controlled vocabularies and development of
high quality gold standard data is desirable
• Soon: Lipschits 1998 available in Excel and as a fully searchable
hyperlinked web document.
• Wish: the same for the “Verkiezingsprogramma’s met cd-rom”
(Holsteyn et al.) series?
Or is the system by Google sufficient?