The Web and social media are the environments where people post their content, opinions, activities, and resources. As a result, a considerable amount of user-generated content is produced every day for a wide variety of purposes. At the same time, people live their everyday lives immersed in the physical world, where society, economy, politics, and personal relations continuously evolve. These two opposite and complementary environments are today fully integrated: they reflect each other and interact with each other ever more strongly.
Exploring and studying content and data coming from both environments offers a great opportunity to understand the ever-evolving modern society in terms of topics of interest, events, relations, and behaviour.
In this speech I will discuss, through business cases and socio-political scenarios, how we can extract insights and understand reality by combining and analyzing data from the digital and physical worlds, so as to reach a better overall picture of reality itself. Along this path, we need to take into account that reality is complex and varies in time, space, and along many other dimensions, including societal and economic variables. The speech highlights the main challenges that need to be addressed and outlines some data science strategies that can be applied to tackle them.
This slide deck has been presented as a keynote speech at WISE 2022 in Biarritz, France.
4. There are more things
In heaven and earth, Horatio,
Than are dreamt of in your philosophy.
Shakespeare (Hamlet Act 1, scene 5)
5. The Answer to the Great Question...
Of Life, the Universe and Everything
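Data → Information → Knowledge → Wisdom
Axes: context independence, understanding
• From Data to Information: understanding relations
• From Information to Knowledge: understanding patterns
• From Knowledge to Wisdom: understanding principles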
8. Twitter sentiment vs. Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010
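The comparison on this slide can be illustrated with a minimal sketch: compute a daily ratio of positive to negative messages, smooth it with a trailing moving average, and correlate the smoothed series with the poll values. The function names, the window size k, and all the numbers below are illustrative assumptions, not data or code from the cited paper.

```python
from math import sqrt

def sentiment_ratio(pos_counts, neg_counts):
    """Daily ratio of positive to negative message counts (assumes no zero negatives)."""
    return [p / n for p, n in zip(pos_counts, neg_counts)]

def moving_average(series, k):
    """Trailing k-day mean; the first k-1 days lack a full window and are dropped."""
    return [sum(series[i - k + 1:i + 1]) / k for i in range(k - 1, len(series))]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy, made-up daily counts and poll values (hypothetical, for illustration only).
pos = [120, 130, 90, 100, 150, 160, 170]
neg = [100, 100, 100, 100, 100, 100, 100]
poll = [50, 51, 49, 48, 52, 55, 57]

k = 3  # smoothing window in days (an assumption)
smoothed = moving_average(sentiment_ratio(pos, neg), k)
r = pearson(smoothed, poll[k - 1:])  # align the poll with the smoothed days
print(f"correlation between smoothed sentiment and poll: {r:.2f}")
```

Smoothing matters because daily sentiment ratios are noisy; the window size trades responsiveness for stability and would need tuning on real data.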
9. Stance in time on the Brexit process
Brambilla, Calisir. Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit Case.
10. Stance drift on Twitter after the Brexit referendum
Brambilla, Calisir. Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit Case.
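One simple way to quantify stance drift, sketched here under stated assumptions, is to compare each user's stance label in two time periods and count the transitions. This is not the method of the cited work: the per-user labels, the label names, and the helper functions are hypothetical and only illustrate the idea.

```python
from collections import Counter

def stance_transitions(before, after):
    """Count (stance_before, stance_after) pairs for users seen in both periods."""
    shared = before.keys() & after.keys()
    return Counter((before[u], after[u]) for u in shared)

def drift_rate(transitions):
    """Share of users whose stance label differs between the two periods."""
    total = sum(transitions.values())
    changed = sum(n for (a, b), n in transitions.items() if a != b)
    return changed / total if total else 0.0

# Hypothetical per-user stance labels in two periods (made-up data).
before = {"u1": "leave", "u2": "remain", "u3": "remain", "u4": "leave"}
after = {"u1": "leave", "u2": "leave", "u3": "remain", "u5": "remain"}

transitions = stance_transitions(before, after)
print(transitions)
print(f"drift rate: {drift_rate(transitions):.2f}")
```

Only users active in both periods are compared, so the rate says nothing about users who joined or left the debate.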
27. Data Quality Issue
Gartner Report
33% of the largest global companies experienced an information crisis due to their inability to adequately value, govern and trust their enterprise information.
If you torture the data long enough,
it will confess to anything
– Darrell Huff
39. Space Granularity: the Grid
• Regular square grid
• Irregular grid with official business-driven meaning
• Irregular grid with data-driven definition
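The first option, a regular square grid, can be sketched as a mapping from coordinates to integer cell indices, followed by a count of points per cell. The origin, the cell size, and the sample points are assumptions chosen for illustration; a cell size expressed in degrees is only approximately square on the ground, so a real deployment would likely use a projected coordinate system.

```python
from math import floor

def grid_cell(lat, lon, origin_lat, origin_lon, cell_deg):
    """Return the (row, col) index of the grid cell containing a point."""
    row = floor((lat - origin_lat) / cell_deg)
    col = floor((lon - origin_lon) / cell_deg)
    return row, col

def count_by_cell(points, origin_lat, origin_lon, cell_deg):
    """Aggregate (lat, lon) points into per-cell counts."""
    counts = {}
    for lat, lon in points:
        cell = grid_cell(lat, lon, origin_lat, origin_lon, cell_deg)
        counts[cell] = counts.get(cell, 0) + 1
    return counts

# Hypothetical geo-located observations and grid parameters (made-up values).
points = [(45.2, 9.1), (45.3, 9.4), (45.7, 9.6)]
counts = count_by_cell(points, origin_lat=45.0, origin_lon=9.0, cell_deg=0.5)
print(counts)
```

The irregular grids listed above would replace the arithmetic in grid_cell with a point-in-polygon lookup against the chosen zones.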
41. Multi-granularity and Multi-source
• City scale: mobile telephone data and coarse-grained geo-located social media data
• Street/square: people-counting and profiling IoT sensors
• Point of interest: people-counting sensors, WiFi log analysis, beacons, and fine-grained geo-located social media data
55. S.M.A.R.T. Goals!
• Specific: target a specific area for improvement
• Measurable: quantify or at least suggest an indicator of progress
• Assignable: specify who will do it
• Realistic: state what results can realistically be achieved, given available resources
• Time-related: specify when the result(s) can be achieved
57. A new role: the Business Translator
• Understands the business needs
• Is able to discuss with the stakeholders
• Is able to negotiate with the technical roles
TRANSFORMS BUSINESS REQUIREMENTS INTO ACTIONABLE DATA STRATEGIES
58. The Goal Poster
GOAL: SMART Objective!
• USER: Who is the user?
• USE CASE: How is the data product used?
• QUESTIONS: Which questions does the user ask to reach the Goal? Which questions does the user ask during that Use Case?
• BUSINESS METRICS: Which dimensions cover this question? Are there any targets for that metric? What time frame should be considered for that metric? Are there any relationships between the metrics?
• TECHNICAL & QUALITY METRICS: Are there quality metrics that the models must achieve?
• REACTION: What kind of reasoning is stimulated by reading the metric?
• ACTION: What actions derive from the reasoning?
• TIME: When? How often?
60. References
• Marco Di Giovanni, Francesco Pierri, Christopher Torres-Lugo, Marco Brambilla: VaccinEU: COVID-19 Vaccine Conversations on Twitter in French, German and Italian. ICWSM 2022: 1236-1244
• Alireza Javadian Sabet, Marco Brambilla, Marjan Hosseini: A multi-perspective approach for analyzing long-running live events on social media. A case study on the "Big Four" international fashion weeks. Online Soc. Networks Media 24: 100140 (2021)
• Marco Di Giovanni, Marco Brambilla: Content-based Stance Classification of Tweets about the 2020 Italian Constitutional Referendum. SocialNLP@NAACL 2021: 14-23
• Marco Brambilla, Alireza Javadian Sabet, Amin Endah Sulistiawati: Conversation Graphs in Online Social Media. ICWE 2021: 97-112
• Giorgia Ramponi, Marco Brambilla, Stefano Ceri, Florian Daniel, Marco Di Giovanni: Content-based characterization of online social communities. Inf. Process. Manag. 57(6): 102133 (2020)
• Emre Calisir, Marco Brambilla: The Long-Running Debate about Brexit on Social Media. ICWSM 2020: 848-852
• Marco Balduini, Marco Brambilla, Emanuele Della Valle, Christian Marazzi, Tahereh Arabghalizi, Behnam Rahdari, Michele Vescovi: Models and Practices in Urban Data Science at Scale. Big Data Res. 17: 66-84 (2019)
• …
61. Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it
http://datascience.deib.polimi.it
http://www.marco-brambilla.com