Data Journalism

Online Journalism - Magazines MA
          City University
        February 16 2012
What is data journalism?
The key thing here is to learn how to solve
your own problems. Asking a tutor should be
your last resort - they will not be there for the rest
of your life!
1. Coming up with a question
You need to find a data source. But where? Spend 15 minutes mapping out potential
data sources related to your field. They might be commercial or governmental; they
might need collecting or already be compiled somewhere. For example, if your field
were cycling, there would be:
   ● transport data
   ● crime data
   ● health data (encouraging people to cycle as part of a healthy lifestyle, for
     example)
   ● environmental data (pollution)
   ● community data (things being shared online by cyclists)

Also take a look at the examples at http://delicious.com/paulb/foieg
2. Use advanced search techniques to find data for a journalistic
question




There are lots of different ways to search, not just typing things
into Google.

You can limit by file type, domain, site and use Boolean limits.
● Limit by filetype:
    ○ filetype:xls will restrict results to Excel spreadsheets;
    ○ filetype:csv to 'comma separated values' spreadsheets;
    ○ filetype:doc to Word documents - often used for internal documents
    ○ filetype:pdf to PDFs - often used for official reports
● Limit by domain:
    ○ site:gov.uk will restrict results to UK government websites
    ○ site:ac.uk to UK educational establishments (not all of them
      reputable) - the US equivalent is site:edu
    ○ site:org.uk to (mostly) nonprofit organisations - again, this is not
      guaranteed. You can also try site:org, although this will include
      results from other countries.
    ○ site:mod.uk - the Ministry of Defence
    ○ site:nhs.uk - NHS sites
    ○ site:dh.gov.uk - Department of Health
    ○ site:police.uk - police websites, including British Transport Police
      and the Met
● Limit by website:
    ○ site:bolton.gov.uk will further limit results to just one website,
      rather than all local authority websites.
    ○ Likewise site:city.ac.uk would only return results from City
      University's website
● You can limit your search further by using quotation marks so that
  only pages containing the exact phrase are returned, e.g. "annual
  report"
● You can also expand it by using 'Boolean' operators like OR, e.g.
Then put it all together:

e.g. "deaths in police custody" filetype:xls site:gov.uk
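
If you build the same kind of search again and again, it can help to treat the query as a set of parts. The following is a minimal sketch in Python, not part of the original exercise; the function name build_query and its parameters are made up for illustration, and it assumes you want the phrase matched exactly (hence the quotation marks around it).

```python
def build_query(phrase=None, terms=(), filetype=None, site=None):
    """Assemble a search query string from optional parts.

    phrase   -- matched exactly, so it is wrapped in quotation marks
    terms    -- extra loose keywords
    filetype -- e.g. "xls", "csv", "pdf"
    site     -- a domain or website, e.g. "gov.uk" or "bolton.gov.uk"
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query(phrase="deaths in police custody", filetype="xls", site="gov.uk")
print(query)
```

Paste the printed string into the search box; swapping the filetype or site argument gives you the variations described above without retyping the whole query.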




Try other 'operators' such as

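  ● + before a search term to ensure it is in the pages
    themselves, e.g. +custody (Google retired this operator in late
    2011; putting the single word in quotation marks, e.g. "custody",
    now does the same job)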
  ● phrases in quotes, e.g. "deaths in custody"
  ● The * wildcard, e.g. "deaths in * custody"
  ● The ~ operator for synonyms, e.g. ~deaths
3. Making sense of the data
Chances are that the data you've found will raise further questions.
There may be:
  ● jargon that you need to understand,
  ● codes that need translating,
  ● holes in the data,
  ● contextual data needed: the populations of different regions; data
    for previous years; etc.
  ● questions about how it was gathered - the methodology

Use your journalistic skills to answer those questions.
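
One common piece of contextual data is population: raw counts for regions of different sizes are not comparable until you turn them into a rate. A minimal sketch, assuming you have a count and a population for each region; the region names and figures below are made up for illustration.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    return count / population * 100_000


# Made-up figures: (number of incidents, population)
regions = {
    "Region A": (120, 2_400_000),
    "Region B": (45, 300_000),
}

for name, (count, population) in regions.items():
    print(name, round(rate_per_100k(count, population), 1))
```

In this invented example Region A has more incidents in total, but Region B has the higher rate once population is taken into account.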
Spreadsheet skills
You can also use some spreadsheet techniques to put the data into a
form that is going to be easier to interrogate - for example try the
following:

 ● Split addresses so that the postcode is in a separate column
   (Data > Text to Columns in Excel, or =SPLIT in Google Docs) -
   or separate forename and surname.
 ● Count how many times a value appears (=COUNTIF), or how many
   values are above a certain number.
 ● Work out the total using =SUM(D:D) if your numbers are in
   column D, for example.
 ● Work out the amount per day by using =SUM(D:D)/30 for a
   30-day month, etc.
 ● Work out a median average by using a formula like
   =MEDIAN(D:D). Compare that with other types of average like
   =AVERAGE(D:D) or =MODE(D:D).
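
If you prefer to check your spreadsheet results another way, the same calculations can be sketched in Python using only the standard library. The numbers below are made up and stand in for column D; the helper split_postcode is hypothetical and assumes the postcode is whatever follows the last comma in the address.

```python
import statistics

# Made-up figures standing in for the values in column D
values = [12, 7, 7, 30, 4, 7, 12]

total = sum(values)                                 # like =SUM(D:D)
per_day = total / 30                                # like =SUM(D:D)/30
median = statistics.median(values)                  # like =MEDIAN(D:D)
mean = statistics.mean(values)                      # like =AVERAGE(D:D)
mode = statistics.mode(values)                      # like =MODE(D:D)
count_sevens = values.count(7)                      # like =COUNTIF(D:D, 7)
count_above_ten = sum(1 for v in values if v > 10)  # like =COUNTIF(D:D, ">10")


def split_postcode(address):
    """Split an address at its last comma into (rest, postcode).

    Assumes the postcode is the final comma-separated part.
    """
    rest, _, postcode = address.rpartition(",")
    return rest.strip(), postcode.strip()


print(total, median, mode, count_sevens, count_above_ten)
print(split_postcode("10 Example Road, London, EC1V 0HB"))
```

Comparing the median, mean and mode side by side shows how one large value (30 here) pulls the mean up while leaving the median untouched.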
4. Basic visualisations
Find a transcript of a politician's - or two politicians' - speeches and
visualise them using Wordle.com, Tagxedo or ManyEyes. (The
advanced search techniques mentioned above may help)

You can either compare one politician's speeches on a particular issue before
and after taking office - or one politician's speech with his or her replacement.

Spend some time tweaking the visualisation:

  ● Are similar words treated differently, e.g. "patient" and "patients" or
    "choice" and "options"? Should you combine the counts to clarify the
    emphases? What are the ethical issues of doing so?
  ● Should you reduce your sample to the top 10 or 20 words or phrases to
    make it clearer?
  ● Can you customise the words included (try copying into a text editor first),
    colour scheme, arrangement, fonts, etc. to greater effect?
  ● Is a word cloud best - or should you use a bar chart based on word
    counts?
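
The questions above about merging similar words and cutting down to the top 10 or 20 can be tried out on raw word counts before you touch a visualisation tool. This is a rough sketch, not how Wordle or any other tool works internally; the stopword list, the merge table and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer
STOPWORDS = {"the", "and", "a", "of", "to", "in", "for", "is", "we", "will"}

# Hypothetical merge table: count the key as if it were the value
MERGE = {"patients": "patient"}


def top_words(text, n=10, merge=MERGE, stopwords=STOPWORDS):
    """Return the n most frequent words after dropping stopwords and merging variants."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(merge.get(w, w) for w in words if w not in stopwords)
    return counts.most_common(n)


speech = "We will put the patient first. Patients deserve choice, and choice is for patients."
print(top_words(speech, n=3))
```

Run it once with the merge table and once with merge={} to see how much your editorial decision to combine words changes the ranking. The resulting counts can also feed a bar chart instead of a word cloud.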
Advanced tutorial 1 - GDoc webscraper

Follow the tutorials tagged 'importHTML' on Excel Notes:
http://excelnotes.posterous.com/tag/importhtml
...and 'importXML' on the Online Journalism Blog:
http://onlinejournalismblog.com/tag/importxml (start from the bottom)

For a really 'live' scraper, see instructions on how to grab XML from Backtweets or
RSS from a Twitter search in this tutorial:
http://www.brelson.com/2009/11/using-google-spreadsheets-to-extract-twitter-data/
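
The importHTML function in a Google spreadsheet pulls a table (or list) out of a web page by its position on the page. To show the idea behind it, here is a rough sketch of the same thing in Python using only the standard library. It is an illustration, not how Google implements the function: the names TableParser and import_html_table are made up, the sample HTML and its figures are invented, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every table on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # row currently being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the index-th table (counting from 1) found in the HTML text."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables[index - 1]


# Invented sample page standing in for a real web page
SAMPLE = """<table>
<tr><th>Force</th><th>Deaths</th></tr>
<tr><td>Force A</td><td>3</td></tr>
<tr><td>Force B</td><td>1</td></tr>
</table>"""

rows = import_html_table(SAMPLE)
for row in rows:
    print(row)
```

The index argument plays the same role as the number you give the spreadsheet function: if the page has several tables, change it until you get the one you want.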
Advanced tutorial 2 - interrogating data

Follow the tutorial at http://excelnotes.posterous.com/tag/filters
And the one at http://excelnotes.posterous.com/tag/sumifs

Or if you want to play with Google Refine, search for 'Getting Started
With Local Council Spending Data' or go to
http://blog.ouseful.info/2011/01/28/getting-started-with-local-council-spending-data/
Advanced tutorial 3 - Scraper tools

Data can come in all sorts of forms. Based on the data you found already, try
one or more of the following:

  ● Using a PDF conversion service to get to the data within - a list here:
    http://helpmeinvestigate.posterous.com/tag/pdfs - also:
    http://www.pdftoexcelonline.com/


  ● Grabbing tables from a database search: try the Firefox plugin Outwit Hub
    (free version stores 100 results; buy a licence for more)
