HLG Big Data project and Sandbox
Carlo Vaccari (Istat) – IAOS October 2014
This material is distributed under the Creative Commons 
"Attribution - NonCommercial - Share Alike - 3.0", available at 
http://creativecommons.org/licenses/by-nc-sa/3.0/ 
International
High Level Group (HLG) coordinating the groups that work on statistical standards: UNECE, OECD, Eurostat, national statistical organisations
May 2013: a task team was set up to define a project to be presented to the international statistical community
Three main objectives:
To identify the main possibilities, and the main strategic and methodological issues, that Big Data poses for official statistics
To analyze the feasibility of efficiently producing official statistics from Big Data sources, and the possibility of replicating these approaches across different national contexts
To facilitate the sharing of knowledge, expertise, tools and methods for producing statistics from Big Data sources across organisations
Big Data Project
Project presented to the HLG and the CES (Conference of European Statisticians)
Task teams composed of people from 13 organisations
The project is composed of four task teams:
Partnership Task Team 
Privacy Task Team 
Quality Task Team 
Sandbox Task Team 
Partnership Task Team
Providers and sources of data - challenges: access to data, managing privacy and confidentiality
Government (Administrative records) 
Private (Commercial records) 
Social Media and other Internet sites 
Design - research design and development 
Academia 
Private and/or public research institutes 
NGOs 
International organizations
Partnership Task Team
Technology - tools, data and infrastructure for data processing, data mining, real-time analytics, storage, computing and data visualization
Private sector (technology providers, IT companies) 
Data providers themselves 
Analysis - NSOs can provide standards and methodology, whereas others provide analytical capacity and modeling
Academia 
Private and/or public research institutes 
NGOs 
International organizations
Privacy Task Team
Overview of existing tools for risk management in view of privacy issues
Risks to privacy - Privacy software 
Data access strategies (onsite, remote access, microdata) 
Overview of database privacy technologies 
Evaluation of different privacy approaches 
Big Data characteristics and their implications for data privacy 
Data access strategies for Big Data 
Computer Science and Statistical Disclosure approaches 
Disclosure Risk assessment for Big Data
Information Integration and Governance (DB monitoring, security, transport security)
Statistical Disclosure Limitation
Privacy Task Team
Preserving confidentiality
Balance between “data utility” and “disclosure risk”
SDL methods:
Data masking
Traditional approaches: aggregation, obfuscation, perturbation, data swapping
Modern approaches: sampling and simulation
Managing potential risk to reputation: ethical practices, controls, communication, dialog with the public
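To make two of the "traditional" SDL methods above concrete, here is a minimal Python sketch, not a method used by the Privacy Task Team: aggregation with a minimum cell-size threshold, and random pairwise data swapping. The record layout (dicts), the threshold of 3 and the swap rate are illustrative assumptions.

```python
import random
from collections import Counter


def aggregate_with_suppression(records, key, min_count=3):
    """Aggregation: count records per category of `key`.

    Cells with fewer than `min_count` records are suppressed (returned as None).
    The threshold value is an illustrative assumption, not an official rule.
    """
    counts = Counter(r[key] for r in records)
    return {k: (c if c >= min_count else None) for k, c in counts.items()}


def swap_attribute(records, attr, rate=0.1, seed=0):
    """Data swapping: exchange the value of `attr` between random pairs of records.

    Roughly `rate` of the records take part in a swap. The marginal distribution
    of `attr` is preserved, while the link between a record and its value is broken.
    """
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # work on copies, leave the input untouched
    order = list(range(len(out)))
    rng.shuffle(order)
    n_pairs = int(len(out) * rate) // 2
    for i in range(n_pairs):
        a, b = order[2 * i], order[2 * i + 1]
        out[a][attr], out[b][attr] = out[b][attr], out[a][attr]
    return out
```

Production SDL tools usually swap within strata (for example, within the same region) to keep multivariate relationships plausible; this sketch swaps across the whole file for brevity.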
Quality Task Team
Input quality framework with indicators:
Source: data source, reliability, privacy, availability, costs, procedures, ...
Metadata: representativeness, usability, completeness, id, ...
Data: collection, coverage, complexity, efficiency, integrability
Output quality framework with indicators:
Metadata: clarity, accessibility, completeness, comprehensiveness
Data: relevance, accuracy, timeliness, accessibility, coherence, predictivity, selectivity
Process quality with indicators:
Cleaning: unambiguous, objectivity, granularity, reliability
Transformations: compliance, categorization, precision
Linking: completeness, selectivity, accuracy, id, time_related
Aggregation: quantity, confidentiality, integration, validity, accuracy
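The slides name the indicators without defining them. As one illustrative, assumed reading of "completeness", the sketch below measures, for each field, the share of records that carry a non-missing value; the dict-based record layout and treating `None` and empty strings as missing are assumptions.

```python
def completeness(records, fields):
    """Assumed completeness indicator: share of records with a non-missing
    value for each field. None and "" are treated as missing (an assumption)."""
    n = len(records)
    if n == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / n
        for f in fields
    }
```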
Sandbox 
Sandbox: a web-accessible environment where researchers from different institutions explore the tools and methods needed for statistical production, and the feasibility of producing Big Data-derived statistics
Tools chosen: Hadoop, Hortonworks, Pentaho, RHadoop
The list remains open ...
Sandbox 
The Sandbox is hosted at the Irish Centre for High-End Computing (ICHEC), which will assist the task team in testing and evaluating Hadoop workflows and the associated data analysis application software
The mission of ICHEC is to provide High-Performance Computing (HPC) resources, support, education and training for researchers
Sandbox configuration
The hardware on which the sandbox system is based is a High Performance Computing Linux cluster hosted at the National University of Ireland (Galway), composed of 30 nodes, each with two quad-core processors, 48GB of RAM and a 1TB local disk
Each node is connected to two networks: one for accessing the shared Lustre filesystem and one Gigabit Ethernet network for management
A 20TB shared filesystem is available to all nodes
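Taking the per-node figures above at face value (30 identical nodes), the aggregate capacity of the cluster can be tallied as below. This is plain arithmetic on the stated specification, not a measured or published total.

```python
# Per-node specification as stated on the slide; all nodes assumed identical.
NODES = 30
CPUS_PER_NODE = 2
CORES_PER_CPU = 4          # quad-core processors
RAM_GB_PER_NODE = 48
LOCAL_DISK_TB_PER_NODE = 1
SHARED_FS_TB = 20          # shared filesystem, counted once (not per node)


def cluster_totals():
    """Aggregate capacity derived from the per-node figures."""
    return {
        "cores": NODES * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "local_disk_tb": NODES * LOCAL_DISK_TB_PER_NODE,
        "shared_fs_tb": SHARED_FS_TB,
    }


print(cluster_totals())
```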
Sandbox in 2014
Virtual Sprint (March 2014) → first document
Workshop in Rome (April 2014)
Training in Rome (May 2014)
Sandbox installation and verification
Workshop in Heerlen (September 2014)
Testing scenarios for Big Data usage in official statistics:
using Big Data as auxiliary information to improve an existing survey
replacing all or part of an existing survey with Big Data
producing a predefined statistical output, with or without supplementation from survey data
producing a statistical output guided by findings from the data
Sandbox partners
Software:
Hortonworks: granted a free enterprise support subscription for the duration of the project
Pentaho: free trial of the enterprise platform
Data:
Mobile data from Orange
Smart meter data from the Irish power agency
Smart meter data from the Canadian power agency
Sandbox experiments
Organized in Task teams, one for each source: 
Consumer Price Index 
Mobile phone data 
Smart meters 
Traffic loops 
Social Data 
Web scraping 
Job vacancies
Experiment: Consumer Price Index
Sources:
Web scraping from ONS (UK supermarkets)
Synthetic scanner data from Istat
Test the performance of Big Data technologies applied to the computation of a simplified consumer price index, based on synthetic data sets modeling scanner data
A first version of the price generator was tested successfully: it generated a sample CSV file with 11 billion rows, which was uploaded to the sandbox
Comparison of Hadoop, NoSQL and RDBMS
Visual analysis of the data with the Pentaho suite
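The slides do not say which formula the simplified CPI uses. The sketch below assumes a Jevons index (the unweighted geometric mean of price relatives for items priced in both periods), a common choice for elementary aggregates, and replaces Istat's price generator with a hypothetical toy generator that keeps rows in memory instead of writing a CSV file.

```python
import math
import random


def generate_scanner_data(n_items, n_periods, seed=0):
    """Hypothetical toy generator of synthetic scanner data (not Istat's).

    Returns (item_id, period, price) rows; each item's price drifts by a
    small random percentage from one period to the next.
    """
    rng = random.Random(seed)
    rows = []
    for item in range(n_items):
        price = rng.uniform(1.0, 10.0)
        for period in range(n_periods):
            rows.append((item, period, round(price, 2)))
            price *= 1 + rng.gauss(0.002, 0.01)
    return rows


def jevons_index(rows, base, current):
    """Jevons index (assumed formula): geometric mean of p_current / p_base
    over the items observed in both periods."""
    base_prices = {item: p for item, t, p in rows if t == base}
    cur_prices = {item: p for item, t, p in rows if t == current}
    matched = base_prices.keys() & cur_prices.keys()
    if not matched:
        raise ValueError("no items observed in both periods")
    log_sum = sum(math.log(cur_prices[i] / base_prices[i]) for i in matched)
    return math.exp(log_sum / len(matched))


if __name__ == "__main__":
    data = generate_scanner_data(n_items=1000, n_periods=12, seed=1)
    print(f"Index, period 11 vs period 0: {jevons_index(data, 0, 11):.4f}")
```

At the scale of billions of rows, the two per-period dictionaries would not fit on one machine; in Hadoop the same computation would become a join on item id followed by a sum of log relatives.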
Experiment: Mobile phone data
Four datasets from the Orange provider for Ivory Coast:
number and duration of calls between each pair of cells, per hour
calls from 500k phones, with time and cell
calls from 500k randomly sampled individuals
communication sub-graphs for 5k users
Experiments:
Classification of callers: workers, students, business, not in the labour force (not LF), ...
Classification of zones (cells): industrial, residential, school/university, farmers, high/low traffic
Temporal distribution of calls (day/week/season)
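As a sketch of the temporal-distribution experiment, and of the kind of per-cell features a zone classification might start from, the code below assumes call records shaped as hypothetical `(phone_id, timestamp, cell_id)` tuples (the actual Orange schema is not given here). It computes each cell's hourly call profile and the number of calls per weekday.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_profiles(records):
    """For each cell, the share of its calls falling in each hour of the day.

    `records` are hypothetical (phone_id, timestamp, cell_id) tuples.
    Returns {cell_id: [share_hour_0, ..., share_hour_23]}.
    """
    counts = defaultdict(lambda: [0] * 24)
    for _phone, ts, cell in records:
        counts[cell][ts.hour] += 1
    profiles = {}
    for cell, hours in counts.items():
        total = sum(hours)
        profiles[cell] = [c / total for c in hours]
    return profiles


def weekday_distribution(records):
    """Number of calls per weekday (0 = Monday ... 6 = Sunday)."""
    return Counter(ts.weekday() for _phone, ts, _cell in records)


if __name__ == "__main__":
    calls = [
        ("p1", datetime(2014, 10, 6, 8, 15), "c1"),
        ("p2", datetime(2014, 10, 6, 8, 45), "c1"),
        ("p1", datetime(2014, 10, 11, 20, 0), "c2"),
    ]
    print(hourly_profiles(calls)["c1"][8], weekday_distribution(calls))
```

Profiles like these could feed a clustering step for the zone classification; the slides do not say which classification method the team used.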
Experiment: Mobile phone data
Parallel experiments on Slovenian and Orange data, with exchange of methods, tools and findings
Searching for further datasets from other providers
Experiment: Smart meters
Datasets:
Smart meter data from Ireland (household level, linked with 2 surveys)
Synthetic smart meter data from Canada (household level, covering several years: time-stamped hourly electricity consumption linked with hourly weather data and hourly price data, matched with quarterly survey data)
Experiment: RHadoop code for visualizing the synthetic Canadian smart meter data, reporting the elapsed time for the following:
Hourly consumption (kWh) vs hourly temperature (°C), for all data
Hourly consumption (kWh) vs hourly price (c), for all data
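The experiment used RHadoop to plot consumption against temperature. As a complementary, non-authoritative sketch, the Python code below joins hourly consumption with hourly temperature on a shared hour key (a hypothetical layout: dicts keyed by hour index) and fits an ordinary least-squares line, summarising the same relationship numerically.

```python
import math


def join_hourly(consumption, temperature):
    """Inner-join two {hour_key: value} dicts; returns (temperature, consumption) pairs."""
    hours = sorted(consumption.keys() & temperature.keys())
    return [(temperature[h], consumption[h]) for h in hours]


def fit_line(pairs):
    """Ordinary least squares y = intercept + slope * x, plus Pearson's r."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0:
        raise ValueError("x values are all equal; slope undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r
```

The same pairing applies to the consumption vs price comparison by passing the hourly price dict instead of the temperature dict.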
Experiment: Traffic loops
In the Netherlands, 20,000 traffic loops, counting the number of vehicles each minute, are located on approximately 3,000 km of motorway. All this data is collected by a central agency, the NDW (National data warehouse for traffic). One year of data was loaded for the area of South Limburg, which has about 800 of these traffic loops
Experiment:
Find out how to deal with multiple files in Hadoop
See how the traffic develops during a year
Deliverables:
Code for aggregating the data in Hive and RHadoop
A graphical representation of how the traffic develops on these roads and in this region
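The aggregation deliverable was written in Hive and RHadoop; the Python sketch below only mirrors the map/reduce pattern behind it. It assumes a hypothetical line format `loop_id,YYYY-MM-DDTHH:MM,vehicle_count` (NDW's actual format is not described here) and treats several input files as one stream, as Hadoop does with an input directory. Assuming one record per loop per minute, a year for 800 loops is about 800 × 1,440 × 365 ≈ 420 million records, which is why a distributed group-by pays off.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map step: parse a hypothetical 'loop_id,YYYY-MM-DDTHH:MM,count' record
    into ((loop_id, day), count)."""
    loop_id, minute, count = line.strip().split(",")
    return (loop_id, minute[:10]), int(count)


def reduce_counts(pairs):
    """Reduce step: sum vehicle counts per (loop_id, day) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def daily_totals(files):
    """`files` is an iterable of iterables of lines, processed as one stream;
    blank lines are skipped."""
    lines = chain.from_iterable(files)
    return reduce_counts(map_line(line) for line in lines if line.strip())
```

Because open text files iterate line by line, a list of file objects can be passed directly; in Hive the same result corresponds to a `GROUP BY` on loop id and day.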
Experiment: Social data
Set of tweets generated in Mexico from January to July 2014:
Sentiment analysis techniques to obtain indicators of subjective wellbeing (compared with existing statistics)
Use of geo-tagged tweets to analyse people's movements
State of origin of tourists visiting "Magic Towns" in Mexico
Experiment: Social data
Next steps:
Geo-located tweet experiments on:
Working patterns / commuting from morning to night
Weekends / holidays / seasonal movements
South-North mobility / commerce at the northern border
Work on the analysis of emoticons and media acronyms:
Develop a small emoticon dictionary / review research papers
Count the emoticons in the available tweets, and how many tweets contain emoticons, to get an idea of their representativeness
Review of algorithms: work with some MapReduce adaptations, Spark, Scala
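The emoticon count described above can be sketched in a few lines. The dictionary below is a hypothetical starter set (the team's actual dictionary is not given in the slides); the code counts each emoticon, the share of tweets containing at least one, and a rough positive/negative tally.

```python
import re
from collections import Counter

# Hypothetical starter dictionary: emoticon -> polarity.
EMOTICONS = {
    ":)": "positive", ":-)": "positive", ":D": "positive", ";)": "positive",
    ":(": "negative", ":-(": "negative",
}

# Longest patterns first so multi-character variants are matched whole.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def emoticon_stats(tweets):
    """Return (emoticon counts, share of tweets with >= 1 emoticon, polarity counts)."""
    counts = Counter()
    polarity = Counter()
    with_emoticon = 0
    for text in tweets:
        found = _PATTERN.findall(text)
        counts.update(found)
        polarity.update(EMOTICONS[e] for e in found)
        if found:
            with_emoticon += 1
    share = with_emoticon / len(tweets) if tweets else 0.0
    return counts, share, polarity
```

The share of tweets carrying an emoticon is the quantity the team wanted for judging representativeness; a low share would limit how far emoticon-based sentiment can be generalised.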
Experiment: Job vacancies
The Job-vacancies team works on (historical) job vacancy data scraped from various websites. Goals:
to identify possible data sources, both free and commercial, and their APIs, and to illustrate potential use cases
to scrape job vacancy data from the biggest national websites (and possibly international ones)
to test scraping tools (Irobotsoft and Kimonolabs)
to test the statistical processing of the data
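The team tested visual scraping tools (Irobotsoft and Kimonolabs). As a rough illustration of what such a tool does under the hood, the standard-library sketch below extracts job titles from a page, assuming hypothetical markup in which each title sits in an element whose class list includes `job-title`; real job sites use their own, frequently changing markup.

```python
from html.parser import HTMLParser


class JobTitleParser(HTMLParser):
    """Collect the text of elements whose class list contains 'job-title'
    (hypothetical markup; adapt the class name to the target site)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._tag = None   # tag name of the job-title element being read
        self._nest = 0     # nesting depth of that tag name
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if "job-title" in classes:
                self._tag, self._nest, self._buf = tag, 1, []
        elif tag == self._tag:
            self._nest += 1

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._nest -= 1
            if self._nest == 0:
                # Normalise whitespace inside the collected title text.
                self.titles.append(" ".join("".join(self._buf).split()))
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)
```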
Experiment: Web scraping
8,600 Italian websites, indicated by the 19,000 enterprises responding to the 2013 ICT survey, have been scraped and the acquired texts processed
The scraping and processing work took about 33 hours on a virtual server in Italy. The goal of this activity is to reproduce the software configuration used and rerun the process in a more powerful environment, in order to measure the time consumption
Experiment:
Configure a Nutch job runnable in the Sandbox environment
Execute the scraping job to produce the scraped data in HDFS
Compare the performance of the sandbox with that of a single server
State of the Project
All teams are running experiments and have defined objectives for the final deliverables (preliminary results due by the end of November, final results by the end of the year)
Outline of the final deliverables defined at the September meetings
Training material developed, available to all participants and to be made public in the future
Effective cooperation and exchange of ideas: all participants requested more time to develop other experiments and look forward to extending the project
Lessons Learned
International cooperation can multiply ideas
Data acquisition can be a long process (e.g. five months to get the Orange mobile data)
the group suggested other possible approaches for the future
“political”/legal sponsorship is needed
Setting up the environment took time; a "stable" configuration was difficult to achieve
Training should cover different skills: IT, statistics and algorithms. People open to learning new tools, techniques and methods are needed
Thank you for your attention!

Big Data Europe at eHealth Week 2017: Linking Big Data in Health
 
Data management plans – EUDAT Best practices and case study | www.eudat.eu
Data management plans – EUDAT Best practices and case study | www.eudat.euData management plans – EUDAT Best practices and case study | www.eudat.eu
Data management plans – EUDAT Best practices and case study | www.eudat.eu
 
BDE SC6-ws-05/12/2016 technology part - SWC
BDE SC6-ws-05/12/2016 technology part - SWCBDE SC6-ws-05/12/2016 technology part - SWC
BDE SC6-ws-05/12/2016 technology part - SWC
 
Drowning in information – the need of macroscopes for research funding
Drowning in information – the need of macroscopes for research fundingDrowning in information – the need of macroscopes for research funding
Drowning in information – the need of macroscopes for research funding
 
Participatory Web
Participatory WebParticipatory Web
Participatory Web
 
Sharing Advisory Board newsletter #8
Sharing Advisory Board newsletter #8Sharing Advisory Board newsletter #8
Sharing Advisory Board newsletter #8
 
Foresight Analytics
Foresight AnalyticsForesight Analytics
Foresight Analytics
 
SC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project OverviewSC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project Overview
 
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
 

Mais de Carlo Vaccari

Andrea Talamonti: CKAN a tool for Open Data
Andrea Talamonti: CKAN a tool for Open DataAndrea Talamonti: CKAN a tool for Open Data
Andrea Talamonti: CKAN a tool for Open DataCarlo Vaccari
 
Fabrizio Allegretto: Open Data & University
Fabrizio Allegretto: Open Data & UniversityFabrizio Allegretto: Open Data & University
Fabrizio Allegretto: Open Data & UniversityCarlo Vaccari
 
Yapo Juares Tanguy: RSS environment
Yapo Juares Tanguy: RSS environmentYapo Juares Tanguy: RSS environment
Yapo Juares Tanguy: RSS environmentCarlo Vaccari
 
Matteo Marchionne: Foaf e feed reader
Matteo Marchionne: Foaf e feed readerMatteo Marchionne: Foaf e feed reader
Matteo Marchionne: Foaf e feed readerCarlo Vaccari
 
Alex Haechler: China vs USA social networks
Alex Haechler: China vs USA social networksAlex Haechler: China vs USA social networks
Alex Haechler: China vs USA social networksCarlo Vaccari
 
Carlo Colicchio: Big Data for business
Carlo Colicchio: Big Data for businessCarlo Colicchio: Big Data for business
Carlo Colicchio: Big Data for businessCarlo Vaccari
 
Yves Studer: Big Data in practice
Yves Studer: Big Data in practiceYves Studer: Big Data in practice
Yves Studer: Big Data in practiceCarlo Vaccari
 
Klevis Mino: MongoDB
Klevis Mino: MongoDBKlevis Mino: MongoDB
Klevis Mino: MongoDBCarlo Vaccari
 
Rando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suiteRando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suiteCarlo Vaccari
 
Unkan Erol: Xing vs Linkedin
Unkan Erol: Xing vs LinkedinUnkan Erol: Xing vs Linkedin
Unkan Erol: Xing vs LinkedinCarlo Vaccari
 
Big Data Conference Ottobre 2013
Big Data Conference Ottobre 2013Big Data Conference Ottobre 2013
Big Data Conference Ottobre 2013Carlo Vaccari
 
Big data analytics vaccari oct2013
Big data analytics vaccari oct2013Big data analytics vaccari oct2013
Big data analytics vaccari oct2013Carlo Vaccari
 
Serena Carota: Open Data nella Regione Marche
Serena Carota: Open Data nella Regione MarcheSerena Carota: Open Data nella Regione Marche
Serena Carota: Open Data nella Regione MarcheCarlo Vaccari
 
Introduzione ai Social network
Introduzione ai Social network  Introduzione ai Social network
Introduzione ai Social network Carlo Vaccari
 
Seminario su Open data - UniCam 18.4.2013
Seminario su Open data - UniCam 18.4.2013Seminario su Open data - UniCam 18.4.2013
Seminario su Open data - UniCam 18.4.2013Carlo Vaccari
 
Turismo e social network
Turismo e social networkTurismo e social network
Turismo e social networkCarlo Vaccari
 
Concetta De Vivo: Open Data Day Marche 2013
Concetta De Vivo: Open Data Day Marche 2013Concetta De Vivo: Open Data Day Marche 2013
Concetta De Vivo: Open Data Day Marche 2013Carlo Vaccari
 
Web2.0 e nuovi media
Web2.0 e nuovi mediaWeb2.0 e nuovi media
Web2.0 e nuovi mediaCarlo Vaccari
 

Mais de Carlo Vaccari (20)

Andrea Talamonti: CKAN a tool for Open Data
Andrea Talamonti: CKAN a tool for Open DataAndrea Talamonti: CKAN a tool for Open Data
Andrea Talamonti: CKAN a tool for Open Data
 
Fabrizio Allegretto: Open Data & University
Fabrizio Allegretto: Open Data & UniversityFabrizio Allegretto: Open Data & University
Fabrizio Allegretto: Open Data & University
 
Yapo Juares Tanguy: RSS environment
Yapo Juares Tanguy: RSS environmentYapo Juares Tanguy: RSS environment
Yapo Juares Tanguy: RSS environment
 
Matteo Marchionne: Foaf e feed reader
Matteo Marchionne: Foaf e feed readerMatteo Marchionne: Foaf e feed reader
Matteo Marchionne: Foaf e feed reader
 
Alex Haechler: China vs USA social networks
Alex Haechler: China vs USA social networksAlex Haechler: China vs USA social networks
Alex Haechler: China vs USA social networks
 
Carlo Colicchio: Big Data for business
Carlo Colicchio: Big Data for businessCarlo Colicchio: Big Data for business
Carlo Colicchio: Big Data for business
 
Yves Studer: Big Data in practice
Yves Studer: Big Data in practiceYves Studer: Big Data in practice
Yves Studer: Big Data in practice
 
Klevis Mino: MongoDB
Klevis Mino: MongoDBKlevis Mino: MongoDB
Klevis Mino: MongoDB
 
Rando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suiteRando Veizi: Data warehouse and Pentaho suite
Rando Veizi: Data warehouse and Pentaho suite
 
Unkan Erol: Xing vs Linkedin
Unkan Erol: Xing vs LinkedinUnkan Erol: Xing vs Linkedin
Unkan Erol: Xing vs Linkedin
 
Big Data Conference Ottobre 2013
Big Data Conference Ottobre 2013Big Data Conference Ottobre 2013
Big Data Conference Ottobre 2013
 
Big data analytics vaccari oct2013
Big data analytics vaccari oct2013Big data analytics vaccari oct2013
Big data analytics vaccari oct2013
 
Serena Carota: Open Data nella Regione Marche
Serena Carota: Open Data nella Regione MarcheSerena Carota: Open Data nella Regione Marche
Serena Carota: Open Data nella Regione Marche
 
Introduzione ai Social network
Introduzione ai Social network  Introduzione ai Social network
Introduzione ai Social network
 
Start up innovative
Start up innovativeStart up innovative
Start up innovative
 
Seminario su Open data - UniCam 18.4.2013
Seminario su Open data - UniCam 18.4.2013Seminario su Open data - UniCam 18.4.2013
Seminario su Open data - UniCam 18.4.2013
 
Turismo e social network
Turismo e social networkTurismo e social network
Turismo e social network
 
Turismo: i siti web
Turismo: i siti webTurismo: i siti web
Turismo: i siti web
 
Concetta De Vivo: Open Data Day Marche 2013
Concetta De Vivo: Open Data Day Marche 2013Concetta De Vivo: Open Data Day Marche 2013
Concetta De Vivo: Open Data Day Marche 2013
 
Web2.0 e nuovi media
Web2.0 e nuovi mediaWeb2.0 e nuovi media
Web2.0 e nuovi media
 

Último

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

HLG Big Data project and Sandbox

  • 1. HLG Big Data project and Sandbox. Carlo Vaccari (Istat) – IAOS October 2014
  • 2. This material is distributed under the Creative Commons "Attribution - NonCommercial - Share Alike - 3.0" license, available at http://creativecommons.org/licenses/by-nc-sa/3.0/
  • 3. International context. A High Level Group (HLG) coordinates the groups working on statistical standards: UNECE, OECD, Eurostat and national statistical organisations.
  • 4. Big Data project. In May 2013 a task team was set up to define a project to be presented to the international statistical community, with three main objectives:
    - to identify the main possibilities, and the main strategic and methodological issues, that Big Data poses for official statistics;
    - to analyse the feasibility of efficiently producing official statistics from Big Data sources, and the possibility of replicating these approaches across different national contexts;
    - to facilitate the sharing across organisations of knowledge, expertise, tools and methods for producing statistics from Big Data sources.
  • 5. Big Data project. The project was presented to the HLG and to the CES (Conference of European Statisticians). Its task teams are composed of people from 13 organisations, and it comprises four task teams:
    - Partnership Task Team
    - Privacy Task Team
    - Quality Task Team
    - Sandbox Task Team
  • 6. Partnership Task Team.
    - Providers and sources of data (challenges: access to data, managing privacy and confidentiality): government (administrative records); private sector (commercial records); social media and other Internet sites.
    - Design (research design and development): academia; private and/or public research institutes; NGOs; international organizations.
  • 7. Partnership Task Team.
    - Technology (tools, data and infrastructure for data processing, data mining, real-time analytics, storage, computing and data visualization): private sector (technology providers, IT companies); data providers themselves.
    - Analysis (NSOs can provide standards and methodology, whereas others provide analytical capacity and modeling): academia; private and/or public research institutes; NGOs; international organizations.
  • 8. Privacy Task Team. Overview of existing tools for risk management in view of privacy issues:
    - risks to privacy and privacy software;
    - data access strategies (onsite, remote access, microdata);
    - overview of database privacy technologies;
    - evaluation of different privacy approaches;
    - Big Data characteristics and their implications for data privacy;
    - data access strategies for Big Data;
    - computer science and statistical disclosure approaches;
    - disclosure risk assessment for Big Data.
  • 9. Privacy Task Team.
    - Information integration and governance (database monitoring, security, transport security).
    - Statistical Disclosure Limitation (SDL): preserving confidentiality by balancing "data utility" against "disclosure risk".
    - SDL methods (data masking). Traditional approaches: aggregation, obfuscation, perturbation, data swapping. Modern approaches: sampling and simulation.
    - Managing potential risk to reputation: ethical practices, controls, communication, dialogue with the public.
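The traditional SDL approaches named above can be illustrated with a minimal sketch. It is not the task team's software: the microdata, the suppression threshold of 3, the noise scale and the seeds are all hypothetical, and the swapping function is a simplified, unconditioned variant (real data swapping usually pairs records with similar characteristics).

```python
import random
from collections import Counter

# Hypothetical microdata records (region, age band); illustrative values only.
RECORDS = [
    ("North", "18-34"), ("North", "18-34"), ("North", "18-34"),
    ("North", "35-64"),
    ("South", "18-34"), ("South", "18-34"), ("South", "18-34"), ("South", "18-34"),
    ("South", "65+"),
]


def aggregate_with_suppression(records, threshold=3):
    """Aggregation: count records per cell and suppress cells below `threshold`.

    Suppressed cells are returned as None, so small (potentially identifying)
    groups are not published.
    """
    counts = Counter(records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


def perturb(values, scale=1.0, seed=42):
    """Perturbation: add zero-mean Gaussian noise with standard deviation `scale`."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]


def swap_attribute(records, index, seed=42):
    """Simplified data swapping: randomly permute one attribute across records.

    The attribute's overall distribution is preserved, but its link to the
    other attributes of each record is broken.
    """
    rng = random.Random(seed)
    column = [r[index] for r in records]
    rng.shuffle(column)
    return [r[:index] + (v,) + r[index + 1:] for r, v in zip(records, column)]
```

With the default threshold, `aggregate_with_suppression(RECORDS)` publishes 3 for ("North", "18-34") and 4 for ("South", "18-34"), and suppresses the two single-record cells. The trade-off between "data utility" and "disclosure risk" shows up directly in the `threshold` and `scale` parameters.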
  • 10. Quality Task Team.
    - Input quality framework with indicators:
      - Source: data source, reliability, privacy, availability, costs, procedures, ...
      - Metadata: representativeness, usability, completeness, id, ...
      - Data: collection, coverage, complexity, efficiency, integrability.
    - Output quality framework with indicators:
      - Metadata: clarity, accessibility, completeness, comprehensiveness.
      - Data: relevance, accuracy, timeliness, accessibility, coherence, predictivity, selectivity.
    - Process quality with indicators:
      - Cleaning: unambiguity, objectivity, granularity, reliability.
      - Transformations: compliance, categorization, precision.
      - Linking: completeness, selectivity, accuracy, id, time-related.
      - Aggregation: quantity, confidentiality, integration, validity, accuracy.
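The slide names "completeness" as an indicator in several places but does not define it. A minimal sketch, assuming one common operationalisation (the share of records with a non-missing value, per field) and treating only `None` and the empty string as missing, could look like this; the field names are hypothetical.

```python
def completeness(records, fields):
    """Per-field completeness: share of records whose value is neither None nor ''.

    `records` is a list of dicts; `fields` lists the field names to assess.
    Returns 0.0 for every field when there are no records.
    """
    if not records:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }
```

For four records where `price` is missing in two of them, `completeness(rows, ["id", "price"])` gives 1.0 for `id` and 0.5 for `price`. Other missing-value markers (NaN, sentinel codes) would need to be added to the check for a real source.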
  • 11. Sandbox. The sandbox is a web-accessible environment where researchers from different institutions explore the tools and methods needed for statistical production, and the feasibility of producing Big Data-derived statistics. Tools chosen so far (open list): Hadoop, Hortonworks, Pentaho, RHadoop.
  • 12. Sandbox. The sandbox is hosted at the Irish Centre for High-End Computing (ICHEC), which assists the task team in testing and evaluating Hadoop workflows and the associated data analysis application software. ICHEC's mission is to provide High-Performance Computing (HPC) resources, support, education and training for researchers.
  • 13. Sandbox configuration. The sandbox runs on a High Performance Computing Linux cluster hosted at the National University of Ireland, Galway, composed of 30 nodes, each with two quad-core processors, 48 GB of RAM and a 1 TB local disk. Each node is connected to two networks: one for accessing the shared Lustre filesystem and a Gigabit Ethernet network for management. A 20 TB shared filesystem is available to all nodes.
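The per-node figures above add up to the cluster's aggregate capacity. The sketch below does that arithmetic. The HDFS bound rests on two assumptions that the slide does not state: that HDFS was laid out on the nodes' local disks, and that it used the default replication factor of 3.

```python
# Figures from the slide: 30 nodes, two quad-core CPUs, 48 GB RAM and 1 TB local disk each.
NODES = 30
CPUS_PER_NODE = 2
CORES_PER_CPU = 4
RAM_GB_PER_NODE = 48
LOCAL_DISK_TB_PER_NODE = 1


def cluster_totals(nodes=NODES):
    """Aggregate cores, RAM and local disk across all nodes."""
    return {
        "cores": nodes * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "local_disk_tb": nodes * LOCAL_DISK_TB_PER_NODE,
    }


def hdfs_upper_bound_tb(raw_tb, replication=3):
    """Upper bound on distinct data HDFS could hold on `raw_tb` of disk.

    Assumes every block is stored `replication` times (3 is the HDFS default)
    and ignores OS, temporary and non-DFS space, so real capacity is lower.
    """
    return raw_tb / replication
```

`cluster_totals()` returns 240 cores, 1,440 GB of RAM and 30 TB of local disk. Under the stated assumptions, `hdfs_upper_bound_tb(30)` caps the distinct data stored in HDFS at about 10 TB.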
  • 14. Sandbox in 2014.
    - Virtual sprint (March 2014): first document
    - Workshop in Rome (April 2014)
    - Training in Rome (May 2014)
    - Sandbox installation and verification
    - Workshop in Heerlen (September 2014)
    Testing scenarios for Big Data usage in official statistics:
    - use as auxiliary information to improve an existing survey;
    - replacing all or part of an existing survey with Big Data;
    - producing a predefined statistical output, with or without supplementation by survey data;
    - producing a statistical output guided by findings from the data.
  • 15. Sandbox partners.
    - Software: Hortonworks granted a free enterprise support subscription for the duration of the project; Pentaho provided a free trial of its enterprise platform.
    - Data: mobile phone data from Orange; smart meter data from an Irish power agency; smart meter data from a Canadian power agency.
  • 16. Sandbox experiments. They are organised in task teams, one for each source: Consumer Price Index, mobile phone data, smart meters, traffic loops, social data, web scraping, job vacancies.
  • 17. Experiment: Consumer Price Index.
    - Sources: web-scraped UK supermarket prices from ONS; synthetic scanner data from Istat.
    - Goal: test the performance of Big Data technologies applied to computing a simplified consumer price index, based on synthetic data sets modelling scanner data.
    - A first version of the price generator was tested successfully: it generated a sample CSV file of 11 billion rows, which was uploaded to the sandbox.
    - Comparison between Hadoop, NoSQL and RDBMS.
    - Visual analysis of the data with the Pentaho suite.
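The slide does not say which index formula the "simplified consumer price index" used, or how the Istat generator models prices. The sketch below therefore swaps in two assumptions. The generator is a toy: each product starts at a random base price and drifts by a random percentage each period. The index is the Jevons elementary index, the geometric mean of price relatives over matched products, which is a standard formula but is not confirmed to be the one used here.

```python
import csv
import io
import math
import random


def generate_scanner_csv(n_products, n_periods, seed=0):
    """Return a CSV string with columns period, product_id, price.

    Hypothetical generator: each product starts at a random base price and
    changes by a random -2%..+3% from one period to the next.
    """
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["period", "product_id", "price"])
    for product in range(n_products):
        price = rng.uniform(1.0, 10.0)
        for period in range(n_periods):
            writer.writerow([period, product, f"{price:.2f}"])
            price *= 1 + rng.uniform(-0.02, 0.03)
    return buf.getvalue()


def jevons_index(rows, base, current):
    """Jevons elementary index (base = 100): geometric mean of price relatives.

    Only products observed in both `base` and `current` periods are used.
    """
    prices = {(int(r["period"]), r["product_id"]): float(r["price"]) for r in rows}
    in_base = {pid for (t, pid) in prices if t == base}
    in_current = {pid for (t, pid) in prices if t == current}
    matched = in_base & in_current
    if not matched:
        raise ValueError("no products observed in both periods")
    mean_log = sum(
        math.log(prices[(current, pid)] / prices[(base, pid)]) for pid in matched
    ) / len(matched)
    return 100 * math.exp(mean_log)
```

Usage: `rows = list(csv.DictReader(io.StringIO(generate_scanner_csv(100, 12))))`, then `jevons_index(rows, base=0, current=11)`. At the scale of billions of rows, the same per-product log relatives would be computed in a distributed job rather than in an in-memory dict.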
  • 18. Experiment: Mobile phone data. Four datasets from the Orange provider for Ivory Coast:
    - number and duration of calls between each pair of cells, per hour;
    - calls made by 500k phones, with time and cell;
    - calls made by 500k randomly sampled individuals;
    - communication sub-graphs for 5k users.
    Experiments:
    - classification of callers: workers, students, business, not in the labour force, ...;
    - classification of zones (cells): industrial, residential, school/university, farming, high/low traffic;
    - temporal distribution of calls (day/week/season).
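The temporal-distribution experiment, and one crude input to zone classification, can be sketched as simple counts over call records. This is not the team's code. The record layout (cell identifier plus a "YYYY-MM-DD HH:MM:SS" timestamp) and the 08:00-18:59 "daytime" window are hypothetical choices.

```python
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # hypothetical timestamp layout


def temporal_distribution(call_times):
    """Count calls by hour of day and by day of week (0 = Monday)."""
    by_hour = Counter()
    by_weekday = Counter()
    for ts in call_times:
        dt = datetime.strptime(ts, TS_FORMAT)
        by_hour[dt.hour] += 1
        by_weekday[dt.weekday()] += 1
    return by_hour, by_weekday


def daytime_share_by_cell(records, day_hours=range(8, 19)):
    """Fraction of each cell's calls falling within `day_hours`.

    A crude feature for zone classification: cells with mostly daytime
    traffic may be workplaces, cells with mostly evening traffic residential.
    """
    total = Counter()
    daytime = Counter()
    for cell, ts in records:
        hour = datetime.strptime(ts, TS_FORMAT).hour
        total[cell] += 1
        if hour in day_hours:
            daytime[cell] += 1
    return {cell: daytime[cell] / total[cell] for cell in total}
```

Seasonal patterns follow the same approach, keyed on `dt.month`. A real classification would combine several such features rather than rely on a single ratio.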
  • 19. Experiment: Mobile phone data. A parallel experiment on Slovenian and Orange data, leading to an exchange of methods, tools and findings. The team is also searching for other datasets from other providers.
  • 20. Experiment: Smart meters. Datasets:
    - smart meter data from Ireland (household level, linked with 2 surveys);
    - synthetic smart meter data from Canada (household level, covering several years: time-stamped hourly electricity consumption linked with hourly weather data and hourly price data, matched with quarterly survey data).
    Experiment: RHadoop code for visualizing the synthetic Canadian smart meter data, reporting the time elapsed for:
    - hourly consumption (kWh) vs hourly temperature (°C), for all data;
    - hourly consumption (kWh) vs hourly price (cents), for all data.
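The experiment used RHadoop. The following Python sketch only illustrates the underlying computation: a binned summary of consumption against temperature or price, with the elapsed time measured alongside. The (kWh, x) input pairs and the 5-unit bin width are assumptions made for illustration.

```python
import time
from collections import defaultdict


def mean_consumption_by_bin(readings, bin_width=5.0):
    """Mean hourly consumption (kWh) per bin of an explanatory variable.

    `readings` yields (kwh, x) pairs, one per household-hour, where x is the
    hourly temperature (deg C) or the hourly price. Bins are [edge, edge + bin_width).
    Returns ({bin_lower_edge: mean_kwh}, elapsed_seconds).
    """
    start = time.perf_counter()
    sums = defaultdict(float)
    counts = defaultdict(int)
    for kwh, x in readings:
        edge = (x // bin_width) * bin_width
        sums[edge] += kwh
        counts[edge] += 1
    means = {edge: sums[edge] / counts[edge] for edge in sorted(sums)}
    return means, time.perf_counter() - start
```

The same function serves both plots on the slide: pass (kWh, temperature) pairs for the first and (kWh, price) pairs for the second. Binning reduces billions of points to a plottable curve. On a cluster, the per-bin sums and counts are the natural map-side partial aggregates.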
  • 21. Experiment: Traffic loops. In the Netherlands, 20,000 traffic loops, each counting the number of vehicles every minute, are located on approximately 3,000 km of motorway. All this data is collected by a central agency, the NDW (National Data Warehouse for traffic). One year of data was loaded for the South Limburg area, which has about 800 of these traffic loops.
    Experiment: find out how to deal with multiple files in Hadoop; see how the traffic develops over a year.
    Deliverables: code for aggregating the data in Hive and RHadoop; a graphical representation of how the traffic developed on these roads and in this region.
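Assuming one record per loop per minute, 800 loops over a year produce 800 × 1,440 × 365 = 420,480,000 records, spread over many input files. The deliverables name Hive and RHadoop. The sketch below is instead a Python analogue in the Hadoop Streaming style, which aggregates per-minute counts into daily totals per loop. The input line format (`loop_id,YYYY-MM-DD HH:MM,count`) and the script name are hypothetical.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map: 'loop_id,YYYY-MM-DD HH:MM,count' -> ('loop_id|YYYY-MM-DD', count)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. at the end of a file
        loop_id, timestamp, count = line.split(",")
        day = timestamp.split(" ")[0]
        yield f"{loop_id}|{day}", int(count)


def reducer(pairs):
    """Reduce: sum counts per key. `pairs` must arrive sorted by key,
    which Hadoop's shuffle/sort phase guarantees."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def run_local(files):
    """Simulate a job over several input files: map each file, sort, reduce."""
    mapped = [pair for lines in files for pair in mapper(lines)]
    return dict(reducer(sorted(mapped)))


if __name__ == "__main__":
    # Hadoop Streaming wiring (hypothetical script name traffic.py):
    # "python traffic.py map" or "python traffic.py reduce", reading records
    # from stdin and writing tab-separated key/value lines to stdout.
    if sys.argv[1:] == ["map"]:
        for key, value in mapper(sys.stdin):
            print(f"{key}\t{value}")
    elif sys.argv[1:] == ["reduce"]:
        parsed = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for key, total in reducer((k, int(v)) for k, v in parsed):
            print(f"{key}\t{total}")
```

Multiple input files need no special handling in the mapper. Hadoop splits every file in the input directory across map tasks, and `run_local` imitates this by mapping each file in turn before the sort.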
  • 22. Experiment: Traffic loops [figure]
  • 23. Experiment: Social data. A set of tweets generated in Mexico from January to July 2014, used for:
    - sentiment analysis techniques to obtain indicators of subjective wellbeing (to be compared with official statistics);
    - geo-tagged tweets to analyse people's movements;
    - the state of origin of tourists visiting the "Magic Towns" in Mexico.
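The slide does not specify the sentiment technique. A minimal lexicon-based sketch shows how tweet-level scores could be turned into a wellbeing-style indicator. The six-word Spanish lexicon and the definition of the indicator (share of positive tweets among tweets with a non-zero score) are hypothetical; a real study would use a validated lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon (Spanish, since the tweets come from Mexico).
LEXICON = {"feliz": 1, "bueno": 1, "excelente": 2, "triste": -1, "malo": -1, "terrible": -2}
TOKEN = re.compile(r"\w+")


def tweet_score(text):
    """Sum of lexicon weights over the tweet's lowercase word tokens."""
    return sum(LEXICON.get(tok, 0) for tok in TOKEN.findall(text.lower()))


def wellbeing_indicator(tweets):
    """Share of positive tweets among tweets with a non-zero score (None if none)."""
    scores = [tweet_score(t) for t in tweets]
    polar = [s for s in scores if s != 0]
    if not polar:
        return None
    return sum(1 for s in polar if s > 0) / len(polar)
```

Computed per month or per state, such an indicator could then be set against official subjective-wellbeing statistics, as the experiment proposes.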
  • 24. Experiment: Social data. Next steps:
    - geo-located tweet experiments on: working patterns and commuting from morning to night; weekends, holidays and seasonal movements; south-north mobility and commerce at the northern border;
    - emoticon and media-acronym analysis: develop a small emoticon dictionary and review research papers; count the emoticons in the collected tweets, and how many tweets contain emoticons, to gauge how representative they are;
    - review of algorithms: work with some MapReduce adaptations, Spark, Scala.
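The planned emoticon count can be sketched with a small dictionary and a single regular expression. The code reports how often each emoticon occurs and the share of tweets that contain at least one. The dictionary below is a hypothetical stand-in for the one the team intends to build.

```python
import re

# Hypothetical small emoticon dictionary.
EMOTICONS = {":)": "positive", ":-)": "positive", ":D": "positive", ";)": "positive",
             ":(": "negative", ":-(": "negative", ":'(": "negative"}
# Longest patterns first, so a longer emoticon wins if a shorter one is its prefix.
PATTERN = re.compile("|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)))


def emoticon_stats(tweets):
    """Return ({emoticon: occurrences}, share of tweets containing any emoticon)."""
    counts = {}
    with_emoticon = 0
    for text in tweets:
        found = PATTERN.findall(text)
        if found:
            with_emoticon += 1
        for e in found:
            counts[e] = counts.get(e, 0) + 1
    share = with_emoticon / len(tweets) if tweets else 0.0
    return counts, share
```

The share answers the representativeness question directly: if only a small fraction of tweets carry emoticons, an emoticon-based indicator covers only that fraction of the sample.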
  • 25. Experiment: Job vacancies. The job vacancies team works on (historical) job vacancy data scraped from various websites. Goals:
    - to identify possible data sources, both free and commercial, and their APIs, and to illustrate potential use cases;
    - to scrape job vacancy data from the biggest national websites (possibly also international ones);
    - to test scraping tools (Irobotsoft and Kimonolabs);
    - to test the statistical process of data manipulation.
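The team tested dedicated scraping tools. As a language-neutral illustration of the extraction step only, the sketch below uses Python's standard `html.parser` to pull vacancy titles out of a page that has already been downloaded. The `job-title` class name and the page structure are hypothetical, and each real site would need its own selectors.

```python
from html.parser import HTMLParser


class VacancyTitleParser(HTMLParser):
    """Collect the text of elements whose class list contains 'job-title'.

    Hypothetical markup: each real site needs its own selectors. Assumes the
    title element does not contain a nested element with the same tag name.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._open_tag = None  # tag name of the title element being read
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and "job-title" in classes:
            self._open_tag = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            # Collapse runs of whitespace in the collected text.
            self.titles.append(" ".join("".join(self._buf).split()))
            self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buf.append(data)


def extract_titles(html):
    """Return the vacancy titles found in an HTML string."""
    parser = VacancyTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

Fetching pages, following pagination and respecting each site's terms of use are separate steps that are not shown here.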
  • 26. Experiment: Web scraping. 8,600 Italian websites, indicated by the 19,000 enterprises responding to the 2013 ICT survey, were scraped and the acquired texts were processed. Scraping and processing took about 33 hours on a virtual server in Italy. The goal of this activity is to reproduce the software configuration used and rerun the process in a more powerful environment, in order to measure the time consumed.
    Experiment:
    - configure a Nutch job runnable in the sandbox environment;
    - execute the scraping job to produce the scraped data in HDFS;
    - compare the performance of the sandbox with that of a single server.
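The planned comparison reduces to a few ratios. The sketch below computes the single-server baseline from the figures above: about 260.6 sites per hour, or roughly 13.8 seconds per site. It also defines a speedup helper. The sandbox run time is not reported here, so no sandbox figure is assumed.

```python
BASELINE_SITES = 8_600   # websites scraped and processed (slide 26)
BASELINE_HOURS = 33      # on a single virtual server in Italy (slide 26)


def sites_per_hour(sites, hours):
    """Throughput of a scraping-and-processing run."""
    return sites / hours


def seconds_per_site(sites, hours):
    """Average wall-clock time per website."""
    return hours * 3600 / sites


def speedup(baseline_hours, new_hours):
    """How many times faster a new run is than the baseline (same workload)."""
    return baseline_hours / new_hours
```

Once the sandbox run has been timed, `speedup(BASELINE_HOURS, sandbox_hours)` gives the comparison the experiment asks for. Dividing that speedup by the number of nodes used shows how efficiently the extra hardware was exploited.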
  • 27. State of the project.
    - All teams are running experiments and have defined objectives for the final deliverables (preliminary results due at the end of November, final results at the end of the year).
    - An outline of the final deliverables was defined at the September meetings.
    - Training material has been developed; it is available to all participants and will be made public in future.
    - Effective cooperation and exchange of ideas: all participants requested more time to develop further experiments and look forward to extending the project.
  • 28. Lessons learned.
    - International cooperation can multiply ideas.
    - Data acquisition can be a long process (e.g. five months to obtain the Orange mobile data); the group suggested other possible approaches for the future, and "political"/legal sponsorship is needed.
    - Setting up the environment took time, and a "stable" configuration was difficult to achieve.
    - Training should address different skills: IT, statistics and algorithms. People open to learning new tools, techniques and methods are needed.
  • 29. Thank you for your attention!