ALESSIO CIMARELLI 
Data scientist at Dataninja 
jenkin@dataninja.it | @jenkin27 
dtnj.it/erice14 
International School of Science Journalism 
The Digital World (Erice, June 10th, 2014)
aka jenkin 
PAST 
Master's degree in Physics at the University of Rome "La Sapienza" 
Master's in Science Communication at the International School for 
Advanced Studies (SISSA-ISAS) in Trieste 
Press officer at the European Laboratory for Non-Linear Spectroscopy 
(LENS) in Florence 
PRESENT 
Freelance data journalist, web developer, open data activist, citizen 
scientist, ...
Data journalism & data visualization made in Italy
You know very well how it works... :)
As topic 
Stories about the edge of scientific research and human knowledge. 
Key role in the relationship between science and society. 
A science journalist can be a watchdog against false science and scientific 
fraud.
As method 
It would be evident in , because the workflow is 
similar to police inquiries or scientific research. 
Lots of information from different sources, accountability problems, 
hypotheses and proofs, trial-and-error cycles, and so on. 
Not only a story, but also a discovery itself...
A word in a buzzwords era 
when his investigation 
is ultimately based on (or driven by) digital data, he acquires that prefix. 
If a journalist wants to describe the world, and the world is now made of digital 
and quantitative information, he has to acquire skills in managing 
and interpreting data, or he will miss an opportunity.
Teamwork and multidisciplinarity 
Nose for news, public interest, intuition based on knowledge of the context 
Analytical mind, mathematical and statistical skills, intuition based on 
the science of numbers
Problem solving, hi-tech knowledge of hardware and software, a nerd (or 
geek, if you prefer) mindset 
Artistic sensibility and intuition, knowledge of User Experience theory and 
techniques
Miners, dustmen, researchers, and story tellers 
Public search engines or deep web? Official 5-stars open data or web 
spiders and screen scrapers? Monitor and keyboard, smartphone and 
touch, or boots and mud? 
Data should be read by machines and not by humans! Datasets could 
hide errors, inconsistencies, lies... or show only a part of a story.
Normalizations and comparisons, filtering, grouping, aggregation, 
correlations, ... 
How to represent numbers and relations among numbers? Yes, with 
Arabic numerals, but a picture is worth a thousand words... as long as 
you keep in mind that there are facts behind the numbers, and facts are 
sacred (copyright of The Guardian).
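The analysis steps named on this slide (normalization, filtering, grouping, aggregation) can be sketched in a few lines of Python. This is only an illustration: the records, field names, and figures below are invented and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical records: raw counts per city (made-up numbers, for illustration only).
records = [
    {"region": "North", "city": "A", "cases": 120, "population": 60_000},
    {"region": "North", "city": "B", "cases": 30, "population": 10_000},
    {"region": "South", "city": "C", "cases": 200, "population": 400_000},
]

def rate_per_100k(cases, population):
    """Normalize a raw count by population so places of different size are comparable."""
    return cases / population * 100_000

def aggregate_by_region(rows):
    """Group rows by region and sum cases and population (grouping + aggregation)."""
    totals = defaultdict(lambda: {"cases": 0, "population": 0})
    for row in rows:
        totals[row["region"]]["cases"] += row["cases"]
        totals[row["region"]]["population"] += row["population"]
    return dict(totals)

# Filtering: keep only cities above a population threshold.
large_cities = [r for r in records if r["population"] >= 50_000]

regional = aggregate_by_region(records)
regional_rates = {
    region: rate_per_100k(t["cases"], t["population"])
    for region, t in regional.items()
}
```

In this toy data, city C has the largest raw count, yet the South ends up with the lowest rate once counts are divided by population: the comparison depends on the denominator.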
In method 
You run into a dataset and sense a possible news story... 
OR 
... you have an interest, an idea, a thesis, so you are looking for data. 
Having quantitative data about a phenomenon means that somewhere 
there is a you have to understand, test, 
verify... and interpret! 
Data themselves can suggest new directions for your investigation or even 
falsify some hypotheses or assumptions. 
Common sense, intellectual honesty, professional ethics
Some random examples 
New Scientist Apps 
tornadoes 
warmingworld 
exoplanets 
planck 
sealevel 
The Telegraph map of wind farm 
Sorting algorithms 
Meteorites 
Earth Journalism Network
by Global Editors Network 
Health 
American Way of Birth, Costliest in the World (NYT) 
Inside the Government's Drug Data (ProPublica) 
Which Emergency Room Will See You the Fastest? (ProPublica) 
Environment 
New York floods (ProPublica) 
Breathless and Burdened (Center for Public Integrity) 
When Italy is shaking (La Stampa) 
Italy, a delicate land (La Stampa) 
Astronomy 
Kepler’s Tally of Planets (NYT) 
Energy 
Biomassa (Planbureau voor de Leefomgeving)
Research data, science world, citizen science
Hard sciences and social sciences 
OK, LHC petabytes are not for journalists, and neither are statistical data from 
epidemiological surveys. 
But , or (open) 
, why not? 
If you do not specialize in a topic or lack knowledge of its 
framework, you can ask an expert you trust. 
You can also use numbers not for an investigation, but to tell a complex 
story using infographics and interactive visualizations.
Bibliographies, social networks of scientists, infrastructures 
Science is a human activity and an industry (almost) like any other. 
How are European funds invested in scientific research? Where are 
the centers specialized in the treatment of specific diseases? Why are some 
well-known monitoring technologies not used in some countries?
Sensor-based journalism 
Cheap electronics and sensors 
+ 
open hardware 
+ 
free information sharing 
= 
data from stakeholders other than scientists 
It's early, but promising: 
Swiss Make Open Data Camps 
Japan Geigermap at-a-glance 
Citizen Science & Sensors
If you have data, it's better if you know how to deal with them. 
If you think you may find some data, it's better if you use them. 
If someone uses data, it's better if you can check their claims. 
Playing with data is fun!
Welcome to the jungle!
Some examples 
Public administration 
International organizations 
NGOs 
Civic activists 
Press offices 
Leaks 
Social networks 
Journalistic sources 
Single journalists 
Ourselves...
Data made public and reusable 
Data.gov (USA) 
Data.gov.uk (UK) 
Open Data Hub (Italy) 
OpenIR (Indonesia) 
...
Remember the buzzword era? 
Data from big science experiments (Atlas, Human Brain Project, ...) 
Social networks (Facebook, Twitter, but also eBay, Amazon, ...) 
Maybe it's not for journalists, but it's a hot topic... 
Google Earth Engine
For machines, not for humans 
The keyword is ! 
A well-formed table represents a structured data set. A list of Facebook 
comments, the articles of a newspaper, a recorded speech are not structured 
data (and so are not machine-readable).
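A minimal sketch of what "machine-readable" means in practice, using Python's standard csv module. The table and the free-text sentence are invented for illustration.

```python
import csv
import io

# A small, made-up table in CSV form: every row has the same named columns.
csv_text = """id,name,year
1,Alpha,2012
2,Beta,2013
"""

# Structured data: a parser turns each row into a record with named fields.
rows = list(csv.DictReader(io.StringIO(csv_text)))
names_2013 = [row["name"] for row in rows if row["year"] == "2013"]

# Unstructured data: the same facts as free text offer no columns to address.
comment = "Beta came out in 2013, a year after Alpha."
```

Note that the parser returns every cell as a string, so "2013" is still text here; converting cells to proper types is a separate step.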
It all depends on the format 
If we have Gladstone Gander as best friend: 
spreadsheets (xls, xlsx, ods, csv, tsv); 
not-so-common good formats (xml, sql, json, shp, kml, ...). 
If we are not so lucky: 
tables or lists in web pages (html); 
simple tables in well-done pdfs (pdf). 
If we have Murphy as worst enemy: 
scanned images, even if in a pdf wrapper (png, jpg, pdf); 
digital data behind complex search engines. 
And what if we have the best data ever, but under a closed license?
Well-formed data sets 
Numbers are numbers, strings are strings and not numbers, datetimes 
must always have a single format (e.g. yyyy/mm/dd), localization is 
important, no gender values in a names column or similar mix-ups, every 
element should be identified by a Unique Identifier (ID). 
Data types a computer understands: 
integers (with sign, zero included), 
floating-point numbers (with sign), 
datetime, 
characters and strings (case sensitive), 
null value (the strange case of a value that states "I'm not a value"). 
And simple comparisons are strict equalities, for strings too!
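The rules on this slide can be illustrated with a short Python sketch. The helper names `parse_int` and `parse_date` are hypothetical, chosen only for this example, and the yyyy/mm/dd format is the one suggested above.

```python
from datetime import datetime

def parse_int(text):
    """Return an integer, or None when the cell is empty (the null value)."""
    text = text.strip()
    return int(text) if text else None

def parse_date(text, fmt="%Y/%m/%d"):
    """Accept a single date format only; anything else raises ValueError."""
    return datetime.strptime(text.strip(), fmt).date()

# Strings are compared exactly: case and stray spaces make values different.
same_case = "Rome" == "rome"        # False
same_spacing = "Rome" == "Rome "    # False

# A number stored as text is not a number until it is converted.
text_order = "10" < "9"                            # True: compared character by character
number_order = parse_int("10") < parse_int("9")    # False: compared as integers
```

A date written as 10/06/2014 would be rejected by `parse_date`, which is the point: one column, one format.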
Aggregation, average, normalization, relative difference, distribution, ... 
A single rule: correlation does not imply causation! 
Spurious correlations: http://www.tylervigen.com/ 
Correlated: http://www.correlated.org/
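To see how easily a strong correlation appears between unrelated quantities, here is a small sketch that computes the Pearson coefficient by hand. Both series are invented; they share nothing except that they grow over time.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Two made-up yearly series that both simply increase year after year.
ice_cream_sales = [10, 12, 15, 19, 24, 30]
phone_subscriptions = [100, 130, 170, 220, 280, 350]

r = pearson(ice_cream_sales, phone_subscriptions)
```

The coefficient comes out very close to 1, which says only that both series rise over the same years. It says nothing about one causing the other.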
At a glance
With great power comes great responsibility 
The basic idea is quite simple: you have quantities expressed in numbers 
and geometric objects defined by dimensions (e.g. the radius of a circle), so you 
just have to decide how to connect your quantities to visual dimensions. 
There are several (un)common charts and endless combinations: scatter 
plots, lines, bars, areas, pies, donuts, bubble charts, treemaps, word 
clouds, alluvial diagrams, dendrograms, networks, streamgraphs, 
gauges, chord diagrams, motion charts, parallel coordinates, Sankey 
diagrams, maps, choropleths, ... 
On d3js.org there is an endless gallery of examples!
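Connecting a quantity to a visual dimension is, in the simplest case, a scaling function. The sketch below shows two such mappings; the function names, the pixel sizes, and the values are assumptions made for illustration, and sizing circles by area is one common convention rather than something the slides prescribe.

```python
from math import sqrt

def linear_scale(value, domain_max, range_max):
    """Map a value in [0, domain_max] to a length in [0, range_max] (e.g. a bar height)."""
    return value / domain_max * range_max

def radius_for_area(value, domain_max, max_radius):
    """Radius of a circle whose AREA (not radius) is proportional to value."""
    return sqrt(value / domain_max) * max_radius

values = [25, 50, 100]  # made-up quantities
bar_heights = [linear_scale(v, 100, 200) for v in values]
radii = [radius_for_area(v, 100, 40) for v in values]
```

The square root matters for bubbles: if the radius itself were proportional to the value, doubling the value would quadruple the area and overstate the difference.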
Building a simple dataset or a large and complex database focused on a 
topic of public interest leads to a valuable product: the database itself, 
intended as a collection of (linked) data plus metadata. 
Can a public frontend to such a database, designed for citizens, journalists, 
and stakeholders, be considered a journalistic outcome? If journalism is a 
public good, it can be a service, not only a product...
Scraping 
"Copy & Paste" combo 
Data Miner for Chrome browser 
IMPORTXML() Google Spreadsheet function 
Tabula for simple PDFs 
Python (or other languages) scripts and libraries 
Cleaning 
Filters and "Find & Replace" tools in spreadsheets 
Open Refine 
Analysis 
Pivot tables and simple charts in spreadsheets 
Dedicated software (e.g. open-source QtiPlot or QGIS) 
Viz 
Datawrapper, RAW, Google Fusion Tables, Tableau, CartoDB, 
infogr.am, easel.ly, Timelinejs, Timemapper, StoryMap, d3js, ...
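As an illustration of the "Python scripts" option listed under Scraping, here is a minimal sketch that pulls the rows out of an HTML table using only the standard library. The page fragment is invented, and the class name `TableParser` is hypothetical; a real script would first download the page and would probably need to handle messier markup.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# A made-up page fragment standing in for a downloaded web page.
page = """
<table>
  <tr><th>Country</th><th>Alerts</th></tr>
  <tr><td>Italy</td><td>12</td></tr>
  <tr><td>France</td><td>7</td></tr>
</table>
"""

parser = TableParser()
parser.feed(page)
header, *body = parser.rows
```

The extracted cells are still plain strings, so the cleaning step that follows would convert "12" and "7" to numbers before any analysis.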
Tina Casagrand, "Data journalism for science journalists", The Open Notebook (2014) 
Paul Bradshaw, "Scraping for Journalists", Leanpub (2014) 
John Mair, Richard Lance Keeble, "Data Journalism", abramis (2014) 
Paul Bradshaw, "Data Journalism Heist" 
Claire Miller, "Getting Started with Data Journalism", Leanpub (2013) 
Nathan Yau, "Data Points", Wiley (2013) 
Simon Rogers, "Facts are Sacred", Faber & Faber (2013) 
Jonathan Gray, "The Data Journalism Handbook", O'Reilly (2012) 
Nathan Yau, "Visualize This", Wiley (2011)
Alessio "jenkin" Cimarelli 
jenkin@dataninja.it | @jenkin27 
Dataninja 
www.dataninja.it 
school.dataninja.it 
dataninja.it/newsletter 
Q&A 
school.dataninja.it/qa 
SWIM 
sciencewritersinitaly.wordpress.com
Hacking + Marathon = Hackathon 
ESPAD (European students and drugs): http://www.espad.org/en/ 
RASFF (EU food safety): http://ec.europa.eu/food/food/rapidalert/
http://ec.europa.eu/food/food/rapidalert/ 
The Rapid Alert System for Food and Feed (RASFF) was put in place to 
provide food and feed control authorities with an effective tool to 
exchange information about measures taken responding to serious risks 
detected in relation to food or feed. This exchange of information helps 
Member States to act more rapidly and in a coordinated manner in 
response to a health threat caused by food or feed. 
dtnj.it/rasff2013
http://www.espad.org/en/ 
This is the report from the fifth data-collection wave of the European 
School Survey Project on Alcohol and Other Drugs (ESPAD). It is based on 
data from more than 100,000 European students. Over the years about 
500,000 European students have answered the ESPAD questionnaire. A 
total of 36 countries and regions have contributed data to the 
2011 ESPAD Database. The list of substances includes cigarettes, alcohol, cannabis, 
other illicit drugs, and tranquillizers and sedatives without prescription. 
dtnj.it/espad2011

APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

When data journalism meets science | Erice, June 10th, 2014

  • 1. ALESSIO CIMARELLI Data scientist at Dataninja jenkin@dataninja.it | @jenkin27 dtnj.it/erice14 International School of Science Journalism The Digital World (Erice, June 10th, 2014)
  • 2. aka jenkin PAST Master Degree in Physics at the University of Rome "La Sapienza" Master in Science Communication at the International School for Advanced Studies (SISSA-ISAS) in Trieste Press officer at the European Laboratory for Non-Linear Spectroscopy (LENS) in Florence PRESENT Freelance data journalist, web developer, open data activist, citizen scientist, ...
  • 3. Data journalism & data visualization made in Italy
  • 4.
  • 5. You know very well how it works... :)
  • 6. As topic Stories about the edge of scientific research and human knowledge. A key role in the relationship between science and society. A science journalist can be a watchdog against false science and scientific fraud.
  • 7. As method It would be evident in , because the workflow is similar to police inquiries or scientific research. A lot of information from different sources, accountability problems, hypotheses and proofs, trial-and-error cycles, and so on. Not only a story, but also a discovery itself...
  • 8. A word in a buzzword era when his investigation is ultimately based on (or driven by) digital data, he acquires such a prefix. If a journalist wants to tell the world, and the world is now made of digital and quantitative information, he has to acquire skills in the management and interpretation of data, or he will miss an opportunity.
  • 9. Teamwork and multidisciplinarity Nose for news, public interest, intuition based on knowledge of the context Analytical mind, mathematical and statistical skills, intuition based on the science of numbers
  • 10. Teamwork and multidisciplinarity Problem solving, hi-tech knowledge in hardware and software, nerd (or geek, if you prefer) attitude Artistic sensibility and intuition, knowledge of User Experience theory and techniques
  • 11. Miners, dustmen, researchers, and story tellers Public search engines or deep web? Official 5-star open data or web spiders and screen scrapers? Monitor and keyboard, smartphone and touch, or boots and mud? Data should be read by machines and not by humans! Datasets could hide errors, inconsistencies, lies... or show only a part of a story.
  • 12. Miners, dustmen, researchers, and story tellers Normalizations and comparisons, filtering, grouping, aggregation, correlations, ... How to represent numbers and relations among numbers? Yes, with Arabic numerals, but pictures are worth a thousand words... as long as you keep in mind that there are facts behind the numbers, and (copyright of The Guardian).
  • 13.
  • 14. In method You run into a dataset and sense a possible news story... OR ... you have an interest, an idea, a thesis, so you go looking for data. Having quantitative data about a phenomenon means that somewhere there is a you have to understand, test, verify... and interpret! The data themselves can suggest new directions for your investigation or even falsify some hypotheses or assumptions. Common sense, intellectual honesty, professional ethics
  • 15. Some random examples New Scientist Apps tornadoes warmingworld exoplanets planck sealevel The Telegraph map of wind farm Sorting algorithms Meteorites Earth Journalism Network
  • 16. by Global Editors Network Health: American Way of Birth, Costliest in the World (NYT); Inside the Government's Drug Data (ProPublica); Which Emergency Room Will See You the Fastest? (ProPublica). Environment: New York floods (ProPublica); Breathless and Burdened (Center for Public Integrity); When Italy is shaking (La Stampa); Italy, a delicate land (La Stampa). Astronomy: Kepler’s Tally of Planets (NYT). Energy: Biomassa (Planbureau voor de Leefomgeving).
  • 17. Research data, science world, citizen science
  • 18. Hard sciences and social sciences Ok, LHC petabytes are not for journalists, and neither are statistical data from epidemiological surveys. But , or (open) , why not? If you are not specialized in a specific topic or if you lack knowledge of the framework, you can ask an expert you trust. You can also use numbers not in an investigation, but to tell a complex story using infographics and interactive visualizations.
  • 19. Bibliographies, social networks of scientists, infrastructures Science is a human activity and an industry (almost) like any other. How are European funds invested in scientific research? Where are the centers specialized in the treatment of specific diseases? Why are some well-known monitoring technologies not used in some countries?
  • 20. Sensor-based journalism Cheap electronics and sensors + open hardware + free information sharing = data from stakeholders other than scientists It's early, but promising: Swiss Make Open Data Camps Japan Geigermap at-a-glance Citizen Science & Sensors
  • 21. If you have data, it's better if you know how to deal with them. If you think you may find some data, it's better if you use them. If someone uses data, it's better if you can check his claims. Playing with data is fun!
  • 22. Welcome to the jungle!
  • 23. Some examples Public administration International organizations NGOs Civic activists Press offices Leaks Social networks Journalistic sources Single journalists Ourselves...
  • 24. Data made public and reusable Data.gov Data.gov.uk Open Data Hub OpenIR (USA) (UK) (Italy) (Indonesia) ...
  • 25. Remember the buzzword era? Data from big science experiments (Atlas, Human Brain Project, ...) Social networks (Facebook, Twitter, but also eBay, Amazon, ...) Maybe it's not for journalists, but it's a hot topic... Google Earth Engine
  • 26. For machines, not for humans The keyword is ! A well-formed table represents a structured data set. A list of Facebook comments, the articles of a newspaper, a recorded speech are not structured data (and so are not machine-readable).
  • 27. It all depends on the format If we have Gladstone Gander as our best friend: spreadsheets (xls, xlsx, ods, csv, tsv); not-so-common good formats (xml, sql, json, shp, kml, ...). If we are not so lucky: tables or lists in web pages (html); simple tables in well-done pdfs (pdf). If we have Murphy as our worst enemy: scanned images, even if in a pdf wrapper (png, jpg, pdf); digital data behind complex search engines. And if we have the best data ever, but under a closed license?
  • 28. Well-formed data sets Numbers are numbers, strings are strings and not numbers, datetimes must always have a single format (e.g. yyyy/mm/dd), localization is important, no gender values in a names column or similar mixing, every element should be identified by a Unique Identifier (ID). Data types a computer understands: integers (with sign, zero included), floating-point numbers (with sign), datetimes, characters and strings (case sensitive), the null value (the strange case of a value that states "I'm not a value"). And simple comparisons are strict equalities, also for strings!
  • 29. Aggregation, average, normalization, relative difference, distribution, ... A single rule: correlation does not imply causation! Spurious correlations: http://www.tylervigen.com/ Correlated: http://www.correlated.org/
  • 31. With great power comes great responsibility The basic idea is quite simple: you have quantities expressed in numbers and geometric objects defined by dimensions (e.g. the radius of a circle), so you just have to decide how to connect your quantities to visual dimensions. There are several (un)common charts and endless combinations: scatter plots, lines, bars, areas, pies, donuts, bubble charts, treemaps, word clouds, alluvial diagrams, dendrograms, networks, streamgraphs, gauges, chord diagrams, motion charts, parallel coordinates, sankey diagrams, maps, choropleths, ... On d3js.org there is a gallery with an endless list of examples!
  • 32. Building a simple dataset or a large and complex database focused on a topic of public interest leads to a valuable product: the database itself, intended as a collection of (linked) data plus metadata. Can a public frontend to such database, designed for citizens, journalists, stakeholders, be considered a journalistic outcome? If journalism is a public good, it can be a service, not only a product...
  • 33. Scraping: "Copy & Paste" combo; Data Miner for Chrome browser; IMPORTXML() Google Spreadsheet function; Tabula for simple pdfs; Python (or other languages) scripts and libraries. Cleaning: filters and "Find & Replace" tools in spreadsheets; Open Refine. Analysis: pivot tables and simple charts in spreadsheets; dedicated software (e.g. open-source QtiPlot or QGIS). Viz: Datawrapper, RAW, Google Fusion Tables, Tableau, CartoDB, infogr.am, easel.ly, Timelinejs, Timemapper, StoryMap, d3js, ...
  • 34. Tina Casagrand, "Data journalism for science journalists", The Open Notebook (2014); Paul Bradshaw, "Scraping for Journalists", Leanpub (2014); John Mair, Richard Lance Keeble, "Data Journalism", abramis (2014); Paul Bradshaw, "Data Journalism Heist"; Claire Miller, "Getting Started with Data Journalism", Leanpub (2013); Nathan Yau, "Data Points", Wiley (2013); Simon Rogers, "Facts are Sacred", Faber & Faber (2013); Jonathan Gray, "The Data Journalism Handbook", O'Reilly (2012); Nathan Yau, "Visualize This", Wiley (2011)
  • 35. Alessio "jenkin" Cimarelli jenkin@dataninja.it @jenkin27 Dataninja www.dataninja.it school.dataninja.it dataninja.it/newsletter Q&A school.dataninja.it/qa SWIM sciencewritersinitaly.wordpress.com
  • 36. Hacking + Marathon = Hackathon ESPAD (European students and drugs): http://www.espad.org/en/ RASFF (EU food safety): http://ec.europa.eu/food/food/rapidalert/
  • 37. http://ec.europa.eu/food/food/rapidalert/ The Rapid Alert System for Food and Feed (RASFF) was put in place to provide food and feed control authorities with an effective tool to exchange information about measures taken responding to serious risks detected in relation to food or feed. This exchange of information helps Member States to act more rapidly and in a coordinated manner in response to a health threat caused by food or feed. dtnj.it/rasff2013
  • 38. http://www.espad.org/en/ This is the report from the fifth data-collection wave of the European School Survey Project on Alcohol and Other Drugs (ESPAD). It is based on data from more than 100,000 European students. Over the years about 500,000 European students have answered the ESPAD questionnaire. A total of 36 countries and regions have contributed data to the 2011 ESPAD Database. The list of drugs includes cigarettes, alcohol, cannabis, other illicit drugs, and tranquillizers and sedatives without prescription. dtnj.it/espad2011
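
The rules on slide 28 (numbers are numbers, one date format, a null value, strict equality for strings) can be made concrete with a few lines of Python. This is only a minimal sketch, not part of the original talk: the helper names parse_date, parse_number and same_label are hypothetical, and the choice of yyyy/mm/dd as the single date format simply follows the example given on the slide.

```python
from datetime import datetime


def parse_date(text):
    """Parse a date written in the single agreed format yyyy/mm/dd.

    Raises ValueError if the cell uses any other format.
    """
    return datetime.strptime(text.strip(), "%Y/%m/%d").date()


def parse_number(text):
    """Turn a cell into a float, or None when the cell is empty (null value)."""
    cleaned = text.strip()
    if cleaned == "":
        return None
    return float(cleaned)


def same_label(a, b):
    """Compare two labels after trimming spaces and ignoring case.

    Hypothetical helper: a plain == would treat 'Rome' and 'rome' as different.
    """
    return a.strip().casefold() == b.strip().casefold()


# Strict equality: strings are case sensitive, and a string is not a number.
print("Rome" == "rome")
print("42" == 42)

# After cleaning, values become comparable and usable in calculations.
print(same_label(" Rome", "rome"))
print(parse_number("3.5") + 1)
print(parse_number("   "))
print(parse_date("2014/06/10").isoformat())
```

In a real data set the same cleaning would usually be done column by column, for example in Open Refine or a spreadsheet, before any comparison or aggregation.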