Educating Data Scientists: the SoBigData master experience
1. Social Mining & Big Data Ecosystem
www.sobigdata.eu
Fosca Giannotti, Valerio Grossi
ISTI-CNR Pisa
H2020-INFRAIA-2014-2015
Grant Agreement N. 654024
2. Modern science is data-intensive,
multidisciplinary, collaborative and global
– efficiency of data management (NoSQL paradigms and
cloud computing play an important role here) and
curation, search, sharing, transfer.
– managing the complexity of the analytical process is a
key issue (scalable distributed analytical methods and
Visual Analytics are crucial here).
Firenze, 14 Nov 2016
4. Interdisciplinary and collaborative
• for sharing data/models/processes and results of
experiments (different levels of interoperability and semantic
enrichment)
• to realize experiments by combining resources (data, methods
and results) belonging to different communities.
– This calls for tools that facilitate the governance of complex
analytical processes in a workflow style or through mega-modeling.
– This also calls for sophisticated search that supports resource
discovery.
5. Data scientist
A new kind of professional
has emerged, the data
scientist, who combines the
skills of a software
programmer, statistician and
storyteller/artist to extract
the nuggets of gold hidden
under mountains of data.
6. Four core points of a data scientist
• Data Procurement and Curation
• Making sense of Data
• Story-telling
• Answering, step by step, for technical correctness and
for legal and ethical issues
7. SoBigData is…
A Multidisciplinary European Infrastructure for Big Data and Social
Data Mining providing an integrated ecosystem for ethically
sensitive scientific discoveries and advanced applications of social
data mining on the various dimensions of social life, as recorded by
“big data”.
8. Social Mining - Answer to:
• Who will win the US elections? What is the electorate’s current
voting intention? How reliable is it?
• What are the indicators of social well-being (beyond GDP)
and how can they be computed and monitored?
• How effectively is the aging population helped by social
participation in digital community services?
• What is the link between media ownership and media
content? Is there bias in news reporting? And in content
reviews?
• Is an infectious disease emerging? What is its diffusion model?
11. Predicting Success
“Football is a simple game: 22 men chase a ball for 90
minutes and at the end, the Germans always win”
-- Gary Lineker (after Italy 1990 Final)
12. Managing Data: what does this mean?
Managing data does not only mean:
• Support discovery
• Provide access, verify the quality of data, clean errors, outliers, anomalies
• Transform data into a format suitable for specific data analytical tools
It must include support for
• legal interoperability
– copyright management,
– licensing of single and derivative products
– terms of use
• fine-grained policies
– attribution,
– citation policy,
– provenance management
• Ethics issues
13. Metadata in the SoBigData RI experience
• Huge datasets often describe human activities, which implies
privacy and ethical issues
• As a Research Infrastructure, FAIRness is one of our main targets
– The success of the RI is directly connected to the fact that
datasets are Findable, Accessible, Interoperable and
Reusable
– The intellectual property has to be considered
– The design of a highly structured metadata schema allows
the RI to automatically grant or deny access to a dataset, and to
require the acceptance of terms of use or the signing of NDAs…
14. SoBigData metadata structure
• A highly structured and detailed metadata schema
has been designed to provide information
about:
– Description of the dataset (to make it Findable)
– How the dataset has been produced
– Intellectual Property
– Privacy issues
– Who can access the data and how (terms of use,
NDA…)
• Mainly based on the DataCite standard
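The idea of metadata that lets the RI grant or deny access automatically can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names (`provenance`, `requires_nda`, and so on), the `AccessRequest` type and the `decide_access` function are hypothetical and are not the actual SoBigData schema or DataCite property names.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    # Hypothetical fields mirroring the five information groups on the slide;
    # NOT the real SoBigData or DataCite schema.
    title: str
    description: str              # description of the dataset (Findable)
    provenance: str               # how the dataset has been produced
    license: str                  # intellectual property
    contains_personal_data: bool  # privacy issues
    requires_terms_of_use: bool = True   # access condition
    requires_nda: bool = False           # access condition


@dataclass
class AccessRequest:
    # Hypothetical description of what a user has already agreed to.
    user: str
    accepted_terms_of_use: bool = False
    signed_nda: bool = False


def decide_access(meta, request):
    """Return (granted, missing_steps) based only on the metadata record."""
    missing = []
    if meta.requires_terms_of_use and not request.accepted_terms_of_use:
        missing.append("accept terms of use")
    if meta.requires_nda and not request.signed_nda:
        missing.append("sign NDA")
    # Access is granted only when no required step is outstanding.
    return (not missing, missing)


if __name__ == "__main__":
    meta = DatasetMetadata(
        title="Illustrative dataset",
        description="Placeholder record used only for this sketch",
        provenance="Synthetic example",
        license="CC-BY-4.0",
        contains_personal_data=True,
        requires_nda=True,
    )
    print(decide_access(meta, AccessRequest(user="alice")))
    print(decide_access(meta, AccessRequest("bob", True, True)))
```

Because the decision depends only on structured fields, the same check could run for every dataset without manual review; a real infrastructure would presumably also consult the privacy and licensing fields, which this sketch only stores.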
15. The ethics of SoBigData
• Gathering large quantities of data has serious consequences
that SoBigData is trying to address. These consequences range
from personal harm, to issues of autonomy, injustice and
inequality.
• In order to deal with these problems, SoBigData adheres to a
value-sensitive design approach. This approach consists of using
design solutions to overcome ethical dilemmas, in this case
those between the utility of the data gathered and the
protection of the individuals subject to the research.
• To make the ideals of SoBigData successful, scientific
methods also need to be developed in order to embed moral
principles in practice.
16. Ethics: the challenge for SoBigData
• How do we create an infrastructure in which such data
and methods can be disseminated and improved
upon?
1. A Massive Open Online Course (MOOC) which instructs all
prospective researchers about the legal and ethical
dangers of big data research and the steps they can take to
minimise these;
2. A set of workflows that outline the steps researchers can
take when designing their approach;
3. Information pop-ups which redirect researchers to
state-of-the-art ethical methods.
21. Education
• Big Data Sensing
• Big Data Mining
• Big Data Story Telling
• Big Data Technology
• Big Data for Social Good
• Big Data Ethics