data, big data, open data

Technological innovation,
web and statistics

Vincenzo Patruno

Rome, 29 January 2013

Creating a “single source of truth”

Combining disparate data sources of potential donors, volunteers and voters
(email, postal, telephone, mobile and social contacts with historical voting
records, polling and fundraising data)

They built a single view of individuals that informed
their strategies for raising funds, mobilizing
volunteers and securing votes.

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423

Profiling and predicting

Demographics and data collected by fieldwork on the campaign trail were
added to the mix, allowing predictive modelling to score people on their
likelihood to donate or vote for the Democrats.

Channels of communication were optimized, and the
type of messaging was tailored to maximize the
likelihood of response.


Turning data into the human touch

The power of localised networks and
neighbourhoods
Using centralized data to provide geo-targeted insight, campaign
volunteers could base themselves in the areas that mattered most, talking
to the voters they had got to know since the start of the 2008 campaign.

Deliver their message from within communities

As a result, they received double the votes they had achieved in
2008 in the marginal states.



More than two million small donors contributed
over 427 million dollars to his
campaign.

About 55% of the funds raised came from donations
under 200 dollars.


Focus on the swing states

Regular polling of states like Ohio throughout the
campaign provided valuable data for the team to process
and analyze trends.
For example, the analysts could track the impact of the three TV debates on
the Democratic vote in real time and were able to identify specific segments to
target with campaign material, split by region, demographics and the profile
scoring that had been modeled in the new database. One Democratic official
commented that they simulated the election 66,000 times every night in
order to calculate predicted outcomes for swing states.
Campaign resources were then allocated accordingly, to persuade the undecided
voters most likely to pledge their allegiance to Obama.
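Testing election scenarios tens of thousands of times a night amounts to a Monte Carlo simulation: draw many random outcomes from per-state win probabilities and count how often the candidate reaches a majority. The sketch below is a minimal illustration under loud assumptions: the states, probabilities and electoral-vote counts are hypothetical, and each state is drawn independently. The campaign's actual model is not described in the source.

```python
import random

# Hypothetical swing states: (probability the candidate wins, electoral votes).
# These numbers are illustrative only, not campaign data.
SWING_STATES = {
    "State A": (0.55, 18),
    "State B": (0.48, 29),
    "State C": (0.60, 13),
}
SAFE_VOTES = 237   # hypothetical electoral votes assumed already secured
TO_WIN = 270       # majority of the 538 electoral votes


def simulate(n_runs: int, seed: int = 0) -> float:
    """Return the share of simulated elections the candidate wins.

    Every swing state is drawn independently in each run; a real model
    would also capture correlation between states.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        votes = SAFE_VOTES
        for p_win, electoral_votes in SWING_STATES.values():
            if rng.random() < p_win:
                votes += electoral_votes
        if votes >= TO_WIN:
            wins += 1
    return wins / n_runs


print(f"Estimated win probability: {simulate(66_000):.3f}")
```

With these made-up inputs the candidate wins only when State B and at least one other state fall their way, so the estimate should land close to 0.48 × (1 − 0.45 × 0.40) ≈ 0.39.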

By the time election day came around, the Democrats had
a clear idea of how voting in the swing states was looking.

Data science involvement in the election wasn’t
just restricted to the candidates’ teams.

Nate Silver used sabermetrics to accurately predict the outcome of
all 50 state votes


Big Data – What Is It?

Volume. Variety. Velocity.
Variability. Complexity.

The first three of these, the “three Vs” of Big Data, were
originally posited by Gartner’s Doug Laney in a 2001 research report.

“It’s difficult to imagine the
power that you’re going to have
when so many different sorts of
data are available”

Tim Berners-Lee

Facebook World
Source: http://ipcarrier.blogspot.it/2010/12/facebook-world.html

Mass Opinion Business Intelligence (MOBI) analyzes and
classifies comments made online and distills the information into a
pre-defined, structured database.

The MOBI methodology combines online measurement, cloud
computing and market research to provide live consumer
sentiment data about brands, products and purchase-influencing
factors, drawing decision-support information from millions of
unsolicited opinions.
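The idea of distilling free-text comments into a pre-defined, structured database can be sketched with a simple lexicon-based classifier. This is not WiseWindow's method, which the source does not describe; the word lists, brands and comments are hypothetical, and real sentiment systems use much richer models.

```python
import re
from dataclasses import dataclass

# Tiny hypothetical sentiment lexicon; real systems use far richer models.
POSITIVE = {"love", "great", "reliable", "fast"}
NEGATIVE = {"hate", "broken", "slow", "awful"}


@dataclass
class Opinion:
    brand: str
    text: str
    score: int   # positive word hits minus negative word hits
    label: str   # "positive", "negative" or "neutral"


def classify(brand: str, text: str) -> Opinion:
    """Turn one free-text comment into a structured record."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return Opinion(brand, text, score, label)


# Hypothetical comments standing in for scraped online opinions.
comments = [
    ("BrandX", "I love this phone, great battery"),
    ("BrandX", "Screen arrived broken and support is slow"),
    ("BrandY", "It is a phone"),
]
table = [classify(b, t) for b, t in comments]
for row in table:
    print(row.brand, row.label, row.score)
```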

http://en.wikipedia.org/wiki/WiseWindow

Financial Services Industry: Bloomberg and
WiseWindow use social media and big data to improve
investment returns.
http://en.wikipedia.org/wiki/WiseWindow

Natural disasters: Twitter was a richer and more up-to-
date source of information about the 5.8 magnitude
quake in Virginia.

http://youtu.be/PThAriHjk10

Twitter traffic after the Japan earthquake

Automotive Industry: Big data analysis of social media
comments can predict trends in automotive equipment
failures.

Telecommunications: T-Mobile used big data integrated
with its transaction systems and social media to
dramatically cut customer defections in one quarter.

Energy/Utility Industry: GE is going to use social media
reports to track outages faster and better.

Advertising Industry: Dachis Group used big data
analysis of social media to create a more up-to-date and
accurate ranking of large companies' competitive position
in social media engagement.

Marketing: Nestlé is using social media listening and
analytics to engage with its market at scale from its
big-data-powered central command center.

Education Industry: DoSomething.org engaged 200,000
people worldwide on Facebook to combat bullying in
schools and analyzed their sentiments.

Criminal Justice: Police departments across the United
States now use social media analysis extensively to
fight crime.

Health Care Industry: Using social media and big data to
track cholera outbreaks in Haiti faster and more
accurately.

API

Application Programming Interface

API
http://apistat.istat.it/?q=gettable&dataset=DCIS_POPORESBIL&dim=82,0,0,0&lang=0&tr=&te=

query string
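Everything after the “?” in the Istat API address is the query string: key=value parameters separated by “&”. A minimal sketch using Python's standard urllib.parse to split it into named parameters; reading `dataset` as a table code and `dim` as a dimension filter is inferred from the parameter names, not from API documentation, and the service may no longer be online.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://apistat.istat.it/?q=gettable&dataset=DCIS_POPORESBIL"
       "&dim=82,0,0,0&lang=0&tr=&te=")

# The query string is the part of the URL after "?".
query = urlsplit(url).query

# keep_blank_values=True preserves empty parameters such as tr= and te=;
# parse_qs returns lists, so keep the first value of each key.
params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
print(params)
# {'q': 'gettable', 'dataset': 'DCIS_POPORESBIL', 'dim': '82,0,0,0', 'lang': '0', 'tr': '', 'te': ''}
```

Changing one parameter (for example `dataset`) and rebuilding the URL is how a client asks the same API for a different table.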

http://developers.facebook.com/ https://dev.twitter.com/

E.g.:
https://stream.twitter.com/1.1/statuses/sample.json
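The sample endpoint delivers a continuous stream of tweets encoded as one JSON object per line. A minimal sketch of consuming such a stream, assuming the lines have already been received (the OAuth authentication the endpoint requires is omitted, and this v1.1 endpoint may since have been retired). The sample lines are made up; the `text` and `lang` fields are used as they appeared in v1.1 tweet objects.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def statuses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield tweet objects from newline-delimited JSON, one object per line."""
    for line in lines:
        line = line.strip()
        if not line:          # blank keep-alive lines carry no data
            continue
        obj = json.loads(line)
        if "text" in obj:     # skip control messages such as deletion notices
            yield obj


# Hypothetical captured lines standing in for the live stream.
sample = [
    '{"text": "Buongiorno!", "lang": "it"}',
    '',
    '{"delete": {"status": {"id": 1}}}',
    '{"text": "Good morning", "lang": "en"}',
    '{"text": "Ciao", "lang": "it"}',
]
by_lang = Counter(s.get("lang", "und") for s in statuses(sample))
print(by_lang.most_common())
```

Because `statuses` is a generator, the same code works on an unbounded stream: each line is processed as it arrives instead of being collected first.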

[Chart: what people tweet about. 50% pointless babble, 10% spare-time activities, 7% work; TV and radio, politics (3% and 5%)]
Thanks, Piet!

Top 5 Myths about Big Data
1. Big Data is Only About Massive Data Volume
Generally speaking, experts consider petabyte-scale data volumes the starting point for
Big Data, although this volume indicator is a moving target. Therefore, while volume is
important, the next two “Vs” are better individual indicators.
Variety refers to the many different data and file types that are important to manage and
analyze more thoroughly, but for which traditional relational databases are poorly suited.
Some examples of this variety include sound and movie files, images, documents,
geolocation data, web logs, and text strings.
Velocity is about the rate of change in the data and how quickly it must be used to create
real value. Traditional technologies are especially poorly suited to storing and using
high-velocity data, so new approaches are needed. The faster the data in question is
created and aggregated, and the more swiftly it must be used to uncover patterns and
problems, the greater the velocity and the more likely it is that you have a Big Data opportunity.
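High-velocity data has to be summarized as it arrives rather than stored first and queried later. A common pattern is an incremental aggregate over a sliding time window with bounded memory; the sketch below counts events seen in the last 60 seconds. It is a generic illustration, not tied to any product named here, and assumes timestamps arrive in order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen in the last `window` seconds of a stream."""

    def __init__(self, window: float):
        self.window = window
        self.times = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return how many fall inside the window."""
        self.times.append(timestamp)
        # Evict events at or before (timestamp - window); assumes in-order input.
        while self.times and self.times[0] <= timestamp - self.window:
            self.times.popleft()
        return len(self.times)


counter = SlidingWindowCounter(window=60.0)
for t in [0, 10, 30, 65, 70, 200]:
    print(t, counter.add(t))
```

For the timestamps above the counts are 1, 2, 3, 3, 3, 1: by t = 200 every earlier event has left the window, so memory stays proportional to the window, not to the whole stream.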

2. Big Data Means Hadoop
Hadoop is the Apache open-source software framework for working with Big Data. It was derived
from Google's published MapReduce and Google File System designs and put into practice by Yahoo
and others. But Big Data is too varied and complex for a one-size-fits-all solution. While Hadoop
has surely captured the greatest name recognition, it is just one of three classes of technologies
well suited to storing and managing Big Data. The other two classes are NoSQL and Massively
Parallel Processing (MPP) data stores. (See myth number five below for more about NoSQL.)
Examples of MPP data stores include EMC’s Greenplum, IBM’s Netezza, and HP’s Vertica.
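Hadoop's core processing model, MapReduce, splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The single-process sketch below mimics those three steps for a word count; it is plain Python, not Hadoop's Java API, and on a real cluster mappers and reducers run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc: str):
    """Emit (word, 1) for every word, like a mapper."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key, like a reducer."""
    return key, sum(values)


docs = ["big data open data", "open government data"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 1, 'data': 3, 'open': 2, 'government': 1}
```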


3. Big Data Means Unstructured Data
Big Data is probably better termed “multi-structured” as it could include text strings,
documents of all types, audio and video files, metadata, web pages, email
messages, social media feeds, form data, and so on. The consistent trait of these
varied data types is that the data schema isn’t known or defined when the data is
captured and stored. Rather, a data model is often applied at the time the data is
used.
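Applying a data model at the time the data is used is often called schema-on-read. A minimal sketch: raw JSON records are kept exactly as captured, with differing fields, and a schema (field names, types, defaults) is imposed only when they are read. The records and the schema are hypothetical.

```python
import json

# Raw records stored as-is: no schema is enforced at write time.
RAW = [
    '{"user": "anna", "city": "Roma", "visits": 3}',
    '{"user": "marco", "visits": "5"}',
    '{"user": "lucia", "city": "Torino", "tags": ["pm10"]}',
]


def read_with_schema(lines):
    """Apply a schema only when the data is used (schema-on-read).

    Schema: user (str), city (str, default "unknown"),
    visits (int, default 0). Fields not in the schema are ignored.
    """
    for line in lines:
        rec = json.loads(line)
        yield {
            "user": str(rec["user"]),
            "city": rec.get("city", "unknown"),
            "visits": int(rec.get("visits", 0)),
        }


for row in read_with_schema(RAW):
    print(row)
```

A different analysis could read the same raw records with a different schema (for example one that keeps `tags`) without rewriting the stored data.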

4. Big Data is for Social Media Feeds and
Sentiment Analysis
Simply put, if your organization needs to broadly analyze web traffic, IT system logs,
customer sentiment, or any other type of digital shadow being created in record
volumes each day, Big Data offers a way to do this. Even though the early pioneers of
Big Data have been the largest web-based social media companies (Google, Yahoo,
Facebook), it was the volume, variety, and velocity of data generated by their services
that required a radically new solution, rather than the need to analyze social feeds or
gauge audience sentiment.

5. NoSQL means No SQL
NoSQL means “not only” SQL because these types of data stores offer domain-specific access and
query techniques in addition to SQL or SQL-like interfaces. Technologies in this NoSQL category
include key-value stores, document-oriented databases, graph databases, Bigtable-style structures, and
caching data stores. The native access methods to stored data provide a rich, low-latency
approach, typically through a proprietary interface. SQL access has the advantage of familiarity and
compatibility with many existing tools, although usually at some cost in latency, caused by the
translation of the query into the native “language” of the underlying system.
For example, Cassandra, the popular open-source key-value store offered in commercial form by
DataStax, includes not only native APIs for direct access to Cassandra data but also CQL (its SQL-like
interface), its emerging preferred access mechanism. It’s important to choose the NoSQL
technology that fits both the business problem and the data type; the many categories of NoSQL
technologies offer plenty of choice.
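The trade-off between native access and SQL access can be illustrated with Python's standard library, using a plain dict as a stand-in for a key-value store and the built-in sqlite3 module for SQL. This is an analogy, not Cassandra or CQL: a key lookup is direct, while a question about a non-key attribute needs either a full scan of the store or a declarative query. The records are hypothetical.

```python
import sqlite3

# Key-value style: a plain dict standing in for a key-value store.
kv_store = {
    "user:1": {"name": "Anna", "city": "Roma"},
    "user:2": {"name": "Marco", "city": "Bari"},
}
print(kv_store["user:2"]["name"])  # direct lookup by key

# Asking about a non-key attribute means scanning every value.
by_scan = [v["name"] for v in kv_store.values() if v["city"] == "Roma"]
print(by_scan)

# SQL style: the same data in a table, queried declaratively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(key, v["name"], v["city"]) for key, v in kv_store.items()],
)
rows = conn.execute("SELECT name FROM users WHERE city = ?", ("Roma",)).fetchall()
print(rows)
conn.close()
```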

http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf

http://open.nasa.gov/blog/2012/10/04/what-is-nasa-doing-with-big-data-today/

Can Big Data be used to measure
economic, social and environmental phenomena?

Sample surveys

Administrative records

Significance magazine, August 2012
Big Data and City Living – what can it do for us?

Big Data Sources

Sensors
Transactional
Administrative
Behavioural
Tracking Devices

Web Scraping

Examples

http://www.comune.torino.it/ambiente/aria/qualita_aria/dati_aria/valori_annuali_pm10.shtml

https://scraperwiki.com/scrapers/valori_pm10_in_comune_di_torino/
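Web scraping means fetching a page meant for human readers and extracting structured values from its markup, as in the Turin PM10 example. A minimal sketch using Python's standard html.parser to pull the cells of an HTML table into rows; the HTML fragment and its values are hypothetical (the real page layout will differ), and fetching the page, for instance with urllib.request.urlopen, is omitted so the sketch runs offline.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment; the real page layout and values will differ.
html = """<table>
<tr><th>Station</th><th>PM10 mean</th></tr>
<tr><td>Station A</td><td>41</td></tr>
<tr><td>Station B</td><td>35</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)
header, *data = parser.rows
values = {name: int(value) for name, value in data}
print(header)
print(values)
```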

http://thebiobucket.blogspot.it/2011/10/little-webscraping-exercise.html#more

Milan, 13 December 2012

http://www.metoffice.gov.uk/climate/uk/stationdata/armaghdata.txt

http://elezionistorico.interno.it/
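The Met Office Armagh station file is plain text with one row per month. A sketch of parsing such a file, assuming the whitespace-separated layout yyyy, mm, tmax, tmin, af, rain, sun, with “---” for missing values and a trailing “*” for estimated values; these conventions should be checked against the header of the live file, which may carry other markers too. The sample rows are made up, not real observations.

```python
def parse_station_data(text: str):
    """Parse monthly rows laid out as: yyyy mm tmax tmin af rain sun.

    Assumes "---" marks a missing value and a trailing "*" an estimate.
    Lines that do not start with a numeric year (headers, units) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        values = []
        for raw in fields[2:7]:
            raw = raw.rstrip("*")  # drop the assumed "estimated" marker
            values.append(None if raw == "---" else float(raw))
        rows.append((int(fields[0]), int(fields[1]), *values))
    return rows


# Made-up sample in the assumed layout, not real Armagh observations.
sample = """   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1900   1    7.1     1.2       8    80.5     ---
   1900   2    8.0*    2.0       5    60.0    40.2
"""
for row in parse_station_data(sample):
    print(row)
```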

Open Data
Open Data rests on the
observation that public data
was produced with public money,
and therefore belongs to the community.
And it is to the community that the data
must be returned.

Open Data

Data freely accessible to everyone
in an open format, without restrictions
from copyright, patents or other forms
of control that limit
their use.

Open Government

A model of governance at
central and local level based on openness
(participation and collaboration) and on
transparency towards citizens.

[Diagram: Open Data, Government Data, Corporate Data, Community Data]

Open Data formats

E.g. http://www.istat.it/it/files/2012/12/Tavole_XLS.zip

Data catalogues

Metadata:
territory
category
title
source
licence
date
description
url
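The catalogue metadata listed above maps naturally onto a record type, which makes datasets searchable by any field. A minimal sketch; the field names are English renderings of the slide's list, while the entries and the equality-based `search` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One dataset record, using the metadata fields listed on the slide."""
    territory: str
    category: str
    title: str
    source: str
    licence: str
    date: str          # ISO 8601 string, e.g. "2012-12-20"
    description: str
    url: str


def search(catalog, **filters):
    """Return entries whose fields equal every given filter value."""
    return [e for e in catalog
            if all(getattr(e, k) == v for k, v in filters.items())]


# Hypothetical entries for illustration only.
catalog = [
    CatalogEntry("Town A", "Environment", "PM10 annual values", "Town A council",
                 "CC-BY", "2012-12-01", "Annual PM10 averages by station",
                 "http://example.org/pm10"),
    CatalogEntry("Region B", "Population", "Resident population",
                 "Region B statistics office", "CC-BY", "2012-12-20",
                 "Population balance", "http://example.org/pop"),
]
print([e.title for e in search(catalog, category="Environment")])
```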

[Diagram: Volume, Sources, Relationships, Context]

Data Integration

[Diagram, built up over several slides: hospital admissions, building permits, causes of death, criminal records register, municipal resolutions, industries by ATECO code, environmental data, health expenditure, regional measures, maps, geographic data, politicians' declarations]
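Integrating sources such as these requires a key they share, for example a municipality code. A minimal sketch of an inner join between two hypothetical sources; the codes and values are made up, and real integration also has to harmonize codes, reference periods and definitions across sources.

```python
# Hypothetical records from two sources, keyed by a shared municipality code.
hospital_admissions = [
    {"municipality": "M001", "admissions": 120},
    {"municipality": "M002", "admissions": 95},
]
environmental = {
    "M001": {"pm10_mean": 38.0},
    "M003": {"pm10_mean": 29.5},
}


def integrate(left, right, key):
    """Inner-join a list of records with a lookup table on `key`."""
    joined = []
    for rec in left:
        match = right.get(rec[key])
        if match is not None:      # keep only codes present in both sources
            joined.append({**rec, **match})
    return joined


print(integrate(hospital_admissions, environmental, "municipality"))
```

An inner join keeps only municipalities present in both sources (M001 here); a left join would keep every admissions record and mark the missing environmental values as absent.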

Linked Open Data

Semantic Web
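Linked Open Data publishes data as RDF triples (subject, predicate, object) in which things are named by IRIs, so datasets from different publishers can link to one another. A minimal sketch of writing triples in the N-Triples serialization; the example.org IRIs are hypothetical, rdfs:label is the standard RDF Schema label property, and treating any object that starts with “http” as an IRI is a simplification (a library such as rdflib handles the full rules).

```python
def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one per line.

    Subjects and predicates are IRIs; objects starting with "http" are
    treated as IRIs and everything else as a plain literal (a simplification).
    """
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else literal(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical IRIs under example.org; the label predicate is RDF Schema's.
triples = [
    ("http://example.org/dataset/pm10",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "PM10 annual values"),
    ("http://example.org/dataset/pm10",
     "http://example.org/vocab/publisher",
     "http://example.org/org/city-council"),
]
print(to_ntriples(triples))
```

The second triple links the dataset to another resource by IRI rather than by a text string; that link is what makes the data “linked”.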

Thank you for your attention!

@vincpatruno

vincenzo.patruno@istat.it

http://www.vincenzopatruno.org
