SlideShare uma empresa Scribd logo
1 de 48
From data and information to
knowledge: the Web of tomorrow
Serge Abiteboul
INRIA & ENS Cachan
Académie des sciences
1
Organization
• The Web today
• The Web tomorrow
– The scale
– The new frontier: knowledge
– How is knowledge acquired?
– Reasoning with distributed knowledge
– Other issues
• Conclusion
Abiteboul, Cordeliers, 2014 2
The Web today
Abiteboul, Cordeliers, 2014 3
hypertext universal library of text
and multimedia
Network of machines (Internet)
Network of content (Web)
Network of people (social networks)
personal/private data social dataAbiteboul, Cordeliers, 2014 4
Data & information
Originally, computers were seen as systems to compute
– Solving differential equations
– Cryptography
– Etc.
They are primarily used to store/manage data/information
– Accounting, banking (transactions)
– Inventory, catalogue
– Agenda, contacts
– Library, etc.
– Sciences: measurements, simulation
Abiteboul, Cordeliers, 2014 5
What has changed recently?
• The technology
– Larger/faster disks, memories
– Faster networks/processors
– Algorithmic progress, e.g., parallel computing
• The scale
– Size of data
– Number of machines
– Number of users
• The Web
– Information was living in islands with different formats,
application programming languages​​, operating systems
– The web brought universal standards for sharing information
Abiteboul, Cordeliers, 2014 6
The digital world
Hundreds of millions of web
sites
Billions of communicating
objects
Thousands of billions of pages
7Abiteboul, Cordeliers, 2014
Two main achievements of computer
science of the 20th century
• Data: database systems
• Information: Web search engines
Abiteboul, Cordeliers, 2014 8
Data: database systems
A database system plays the role of a mediator between an
intelligent user and objects storing information
Huge impact on our lives
Abiteboul, Cordeliers, 2014 9
 t,d ( Film(t, d,
« Bogart ») 
Showing(t, s, h) )
Where and
when may I
watch a movie
featuring
Bogart?
intget(intkey){
inthash=(key%TS
);while(table[has
h]=NULL&&table
[hash]-
>getKey()=key)
hash=(hash…
Logic is the beginning of wisdom, not the end. Mr. Spock, Star Trek
Where and
when may I
watch a movie
featuring
Humphrey
Bogart?
Information: Web search engine
A Web search engine plays the role of a mediator between an
intelligent user and objects storing information
Huge impact on our lives
Abiteboul, Cordeliers, 2014 10
movie
Humphrey
Bogart
Where and
when may I
watch a movie
featuring
Humphrey
Bogart?
intget(intkey){
inthash=(key%TS
);while(table[has
h]=NULL&&table
[hash]-
>getKey()=key)
hash=(hash…
Don’t be evil Google corporate motto
The secret of success
11
The improvement of a functionality
or a new functionality
Better interfaces
Better ranking of pages
Mathematical tools
Logic & algebra
Probabilities & matrix multiplications
Smart algorithms
Query optimization
Parallel algorithms
Solid engineering
Failure recovery
Clusters of thousands of machines
Taking advantage of hardware
Disk capacity
Huge and cheap memories
Informatics
Mathematics
Abiteboul, Cordeliers, 2014
Engineering
Hardware
The Web tomorrow
How can the Web evolve
to best serve you?
Garbage in Garbage outAbiteboul, Cordeliers, 2014 12
Assumption
1. The size will continue to grow
2. The information will continue to be managed by many
systems
– Vs. a company will conquer all the information of the world
3. These systems will be intelligent
– In the sense that they produce/consume knowledge and not simply
raw data
4. These systems will be willing to exchange knowledge and
collaborate to serve you
– Vs. capture you and keep you within islands of proprietary knowledge
Abiteboul, Cordeliers, 2014 13
Prediction is very difficult, especially about the future. Niels Bohr
The difficulties: The 4+1 Vs of Big data
• Volume
– Mass of data – more than can be stored/retrieved
• Velocity
– Data streams & frequent changes
• Variety
– Heterogeneity of structure, schema, ontologies…
– Distribution of sources
• Veracity
– Uncertainty, errors, imprecision, contradictions…
– Lack of quality
– Opinions, sentiments, and lies.
• Value
Abiteboul, Cordeliers, 2014 14
The scale
Abiteboul, Cordeliers, 2014 15
d
a
t
a
time
Perhaps the main problem : surviving the deluge
• More and more information available
• An issue for computer systems is to select the information
the user wants to see
– Based on importance, quality, personal interest…
Abiteboul, Cordeliers, 2014 16
A technical challenge
very large scale data/information analysis
• An old problem since the early days of computer science
• Data mining, business intelligence, big data…
• Statistics
• Complex algorithms
17
Informatics
Engineering
Hardware
Mathematics
Abiteboul, Cordeliers, 2014
The new frontier: knowledge
But of the tree of the knowledge of good and evil, thou shalt not eat of
it: for in the day that thou eatest thereof thou shalt surely die.
Genesis 2:17
Abiteboul, Cordeliers, 2014 18
Data Elementary description of
some reality
Temperature
measurements in a weather
station
Information Data with a meaning
(to construct a
representation of some
reality)
A curb giving the evolution
of the average temperature
in a place throughout the
year
Knowledge Information equipped
with some notion of truth
and more generally some
general laws that have
been inferred
The fact that temperature
on the earth is augmenting
because of human activity
Data, information, knowledge
Abiteboul, Cordeliers, 2014 19
Ontologies
Knowledge representation: logical sentences such as
The royal society is_a An academy
The royal society is_a Scientific organization
The royal society is_a National institution
The royal society is_located London, United Kingdom
The royal society founded 1660
L’académie des sciences founded 1666
…
A collection of such statements is called an ontology
Abiteboul, Cordeliers, 2014 20
Machines prefer formatted knowledge
Machines have difficulties with
human languages
Machines are better at handling
more formatted knowledge
Abiteboul, Cordeliers, 2014 21
Text (from Wikipedia) Knowledge
The Royal Society of London
for Improving Natural
Knowledge, commonly
known as the Royal Society,
is a learned society for
science, and is possibly the
oldest such society still in
existence…
is_a(The royal society,
Scientific organization)
founded(The royal society, 1660)
∀x,y ( founded(x,y) ⋀
is_a(x, Scientific organization)
⇒ y≥1660 )
What are ontologies useful for?
To answer queries more precisely
When was the Royal Society Where is the Royal Society
founded? located?
Abiteboul, Cordeliers, 2014 22
Science
Academy
Royal
Society
London
Royal Society
of London
is_located
is_a
1660
founded
To integrate data from several sources
When was the science academy located
in London founded?
How is knowledge obtained?
Abiteboul, Cordeliers, 2014 23
How is knowledge obtained?
• Humans specify explicitly the knowledge
– Standard in science, much less in “real-world”
• But
– People like to talk/write in their natural languages
– People like to publish on the web: web pages, blogs, tweets, votes,
recommendations, opinions…
– They do not appreciate the constraints of a knowledge editor
– They want to keep their visibility
• Machines will have to obtain knowledge
Abiteboul, Cordeliers, 2014 24
Automatic and collective
acquisition of knowledge
(*) knowledge = formal/digital knowledge EARLIER
• Extraction from text
• Web graph analysis
• Collaboration
• User’s opinion
• Recommendation
• Reasoning and inference
Main issue: quality
Abiteboul, Cordeliers, 2014 25
Knowledge extraction from text
Abiteboul, Cordeliers, 2014 26
Construction of large knowledge bases
– Complex linguistic problem with inaccuracies and errors
– E.g.: Yago (from Wikipedia) at Max Plank Institute
Search for syntactic patterns such as
– <person> is married to <person>
– Woody Allen got hooked with Soon-Yi Previn
You may think that the problem is to find such sentences
Wrong: The problem is to keep a balance between
precision: fraction of results that are correct
recall: fraction of correct results that are retrieved
Woody Allen Soon-Yi Previn
Aligning ontologies
• Manually – see Linked data further
• Automatically
– E.g.: Paris system at Inria & Telecom ParisTech
Abiteboul, Cordeliers, 2014 27
Rodin
Camille
Claudel
Poet
Claudel
profession
Rodin
id445
lover
Sculptress
Claudel
Camille Camille French
professionnationalityFNfn
Claudel
name
✕
Knowledge acquisition: collaboration
Internauts perform collectively tasks they cannot solve individually
– No payment & little management
Wikipedia: encyclopedia
– 281 editions; 3 millions articles for the English version
– Extremely used
– Covers a much larger spectrum than a traditional encyclopedia
– Controversial quality
You probably heard that this is the work of amateurs and thus that it
cannot be correct
Wrong: the main issue is the stronger presence of professionals with
personal agendas
Abiteboul, Cordeliers, 2014 28
Collaboration (end)
Linked data project
– Publish ontologies : Publish facts/relations and links between them
• (2011) 31 billion facts/relations;
– Align them with other ontologies: Publish links
• (2011) 504 million links
Open source software E.g., Linux: operating system
– You may think that the Internet and the Web are major successes of the
industry notably US
– Wrong: They are mostly running on open-source code
Crowdsourcing Publish questions ☛ Internauts answer
– Mechanical Turk of Amazon
– Reference to "The Turk," a chess-playing automaton of the 18th
Foldit: decoding the structure of an enzyme close to the AIDS virus
Abiteboul, Cordeliers, 2014 29
Opinions & recommendations
Learn the opinions of the
Internauts
– Quantitative (***)
– Qualitative (“dating spot”)
Increasingly standard
– Movies in eBay
– Lodging in Airbnb
– Food in TripAdvisor…
Users are taking over the role
of specialists - spamming
From users opinions/purchases,
derive recommendations
– Meetic organizes dates
– Netflix suggests movies
– Amazon suggests books
Statistical analysis to discover
proximities
– Between customers in Meetic
– Between customers and
products in Netflix or Amazon
Abiteboul, Cordeliers, 2014 30
Opinions: voting with Web graph analysis
You were perhaps told that a search engine is great because of the
amount of information it indexes
Wrong: indexing billions of Web pages is simple compared to
selecting the links that appear on the first page of answers, which
has become a business and even political issue
• Based on the opinions of Web users
Abiteboul, Cordeliers, 2014 31
Reasoning with
distributed knowledge
Abiteboul, Cordeliers, 2014 32
Inferring knowledge
Using facts such as
Psycho is a Hitchcock movie
And rules such as
WantsToSee( Alice, t )  Film( t, Hitchcock, a ), not Seen( Alice, t )
We can infer “intentional” facts such as
Alice would like to see the movie Psycho
Inference is complicated
– Facts on the Web are uncertain: probabilities
– Facts on the Web may contain contradictions
– Scale
• Avoid inferring all that may be inferred – too costly
Abiteboul, Cordeliers, 2014 33
Where is the truth?
• Is everything true?
– People rarely publish that something is false, e.g., “Elvis was not French”
• There are too many false statements to state
– One can use inference: “Elvis was American”; so it is likely that “Elvis was
not French”
• We have to deal with contradictions
– Elvis Presley died on August 16, 1977 & The King is alive
– Using voting, a machine may derive that “it is very likely that Elvis died”
• Difficulties
– Reason with inconsistencies ⇒ problems with logic
– Truth becomes uncertain ⇒ measure uncertainty with probabilities
– There may be different truths (viewpoints)
Abiteboul, Cordeliers, 2014 34
The more I see, the less I know for sure. John Lennon
Illustration: Recent work on corroboration
To determine the true/false facts:
– Use voting, get an estimate of the “truth value” of facts
– Based on that, determine the “quality” of the data sources
– Using this quality, get a better estimate of the “truth value” of facts
– Then, get a better estimate of the “quality” of data sources…
A human learns the difference between a newspaper and a tabloid
On the Web, too many sources of information – machines will
have to separate the wheat from the chaff
Abiteboul, Cordeliers, 2014 35
Where is the truth? (end)
Abiteboul, Cordeliers, 2014 36
In a classical database: everything that is not in the database is
assumed to be false – closed world assumption
On the Web, if you don’t know something, it may be out there…
Or not – open world assumption
Difficulties
– We cannot bring all the Web locally
– We cannot visit all the Web
– How do I know where to find something?
– How do I know it is not out there?
Explaining
• Users want to understand the information
they see, the answers they are given
In their professional life and
in their social life as well
– E.g., Alice has a new boyfriend – How was this determined?
• Analogy: in sciences, knowing the result of an experiment
without knowing its conditions is often useless
• Difficulties
– Reasoning with large number of facts
– Information is often probabilistic and not public
– Requires knowing how the information was obtained (its provenance)
Abiteboul, Cordeliers, 2014 37
Other issues
Abiteboul, Cordeliers, 2014 38
Privacy
• Users want to control their data
• They have personal data
– They want to share with their friends
– They are willing to give to systems to get more personalized services
• The systems want to get their personal data
– For personalized ads
– To sell to other business
• How to do we that?
– Better systems
– Better laws
– Better users (digital literacy)
Abiteboul, Cordeliers, 2014 39
Serendipity
• You may hear by chance a
song that is going to totally
obsess you
• A librarian may suggest your
reading an article that will
transform your research
This is serendipity
• A perfect search engine
• A perfect recommendation
system
• A perfect computer assistant
Such systems are boring
They lack serendipity
Abiteboul, Cordeliers, 2014 40
Design programs that would introduce serendipity in our lives
• Random algorithms & sentiment analysis
Hypermnesia
Exceptionally exact or vivid memory,
especially as associated with
certain mental illnesses
For a user: We cannot live knowing
that any word, any move will leave
a trace?
For the ecosystem: We cannot store all
the data we produce – lack of
storage resources
Abiteboul, Cordeliers, 2014 41
Forgetting is Key to a Healthy Mind
Scientific American
Image: Aaron Goodman
• A main issue is to select the information we choose to keep
The Babel of human-machine-interaction
• Each time a user interacts with a data source, does he have to
use the ontology of that source ?
– How the source organizes information
– The particular terminology it uses
– Its sequencing of tasks, its codes…
• No!
• Instead of a user adapting to the ontologies of the N systems
he uses each day
• We want the N systems to adapt to the user’s ontology
Abiteboul, Cordeliers, 2014 42
Religion…science…machines
• Knowledge used to be determined by religion
• Knowledge determined scientifically
• Knowledge determined by machines
– Match making
– Medical diagnosis…
• Decisions are increasingly made by machines
– Stock market (automatic trading)
– Fully automated factory
– Fully automated metros
– Death penalty (killer drones)…
Abiteboul, Cordeliers, 2014 43
Conclusion
Abiteboul, Cordeliers, 2014 44
to the digital world!
• The massive use of digital information has modified in depth
all facets of our life : work, science, education, health, politics,
etc.
• We will soon be living in a world surrounded by machines that
– acquire knowledge for us
– remember knowledge for us
– reason for us
– communicate with other machines at a level unthinkable before
• What will we do with that technology?
• Will we become smarter ?
• Will we become master or slave of the new technology?
Abiteboul, Cordeliers, 2014 45
Les soins personnalisés
• Toutes les données
médicales de la personne
– Son génome
• Toutes ses données sociales
• Soins personnalisés
• Mesures prédictives
Les polices personnalisées
• Plus chères pour les
personnes à risque
• Personnes « trop » à risque
non assurées
• Mutualisation des risques
de plus en plus limitée
C’est la même science qui rend ça possible
Quel monde souhaitons-nous?
Abiteboul, Cordeliers, 2014 46
Exemple :
La santé
How can we get prepared to these changes?
Informatics and digital
humanities are at the cross
road of these questions
Learn informatics
• To understand the digital
world you are living in
• To decide your life in the
digital world
• To participate in/contribute
to the digital world
Abiteboul, Cordeliers, 2014 47
Merci
!
Abiteboul, Cordeliers, 2014 48

Mais conteúdo relacionado

Destaque

Optimal discretization of hedging strategies rosenbaum
Optimal discretization of hedging strategies   rosenbaumOptimal discretization of hedging strategies   rosenbaum
Optimal discretization of hedging strategies rosenbaumKezhan SHI
 
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des ActuairesNorme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des ActuairesKezhan SHI
 
Confidentialité des données michel béra
Confidentialité des données   michel béraConfidentialité des données   michel béra
Confidentialité des données michel béraKezhan SHI
 
Arbres de régression et modèles de durée
Arbres de régression et modèles de duréeArbres de régression et modèles de durée
Arbres de régression et modèles de duréeKezhan SHI
 
L'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data scienceL'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data scienceKezhan SHI
 
Eurocroissance arnaud cohen
Eurocroissance arnaud cohenEurocroissance arnaud cohen
Eurocroissance arnaud cohenKezhan SHI
 
Loi hamon sébastien bachellier
Loi hamon sébastien bachellierLoi hamon sébastien bachellier
Loi hamon sébastien bachellierKezhan SHI
 
Présentation Françoise Soulié Fogelman
Présentation Françoise Soulié FogelmanPrésentation Françoise Soulié Fogelman
Présentation Françoise Soulié FogelmanKezhan SHI
 
Big data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuairesBig data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuairesKezhan SHI
 
Big data en (ré)assurance régis delayet
Big data en (ré)assurance   régis delayetBig data en (ré)assurance   régis delayet
Big data en (ré)assurance régis delayetKezhan SHI
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentationkreaume
 
Knowledge Management Lecture 4: Models
Knowledge Management Lecture 4: ModelsKnowledge Management Lecture 4: Models
Knowledge Management Lecture 4: ModelsStefan Urbanek
 
Knowledge management
Knowledge managementKnowledge management
Knowledge managementSehar Abbas
 

Destaque (13)

Optimal discretization of hedging strategies rosenbaum
Optimal discretization of hedging strategies   rosenbaumOptimal discretization of hedging strategies   rosenbaum
Optimal discretization of hedging strategies rosenbaum
 
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des ActuairesNorme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
 
Confidentialité des données michel béra
Confidentialité des données   michel béraConfidentialité des données   michel béra
Confidentialité des données michel béra
 
Arbres de régression et modèles de durée
Arbres de régression et modèles de duréeArbres de régression et modèles de durée
Arbres de régression et modèles de durée
 
L'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data scienceL'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data science
 
Eurocroissance arnaud cohen
Eurocroissance arnaud cohenEurocroissance arnaud cohen
Eurocroissance arnaud cohen
 
Loi hamon sébastien bachellier
Loi hamon sébastien bachellierLoi hamon sébastien bachellier
Loi hamon sébastien bachellier
 
Présentation Françoise Soulié Fogelman
Présentation Françoise Soulié FogelmanPrésentation Françoise Soulié Fogelman
Présentation Françoise Soulié Fogelman
 
Big data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuairesBig data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuaires
 
Big data en (ré)assurance régis delayet
Big data en (ré)assurance   régis delayetBig data en (ré)assurance   régis delayet
Big data en (ré)assurance régis delayet
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentation
 
Knowledge Management Lecture 4: Models
Knowledge Management Lecture 4: ModelsKnowledge Management Lecture 4: Models
Knowledge Management Lecture 4: Models
 
Knowledge management
Knowledge managementKnowledge management
Knowledge management
 

Semelhante a From data and information to knowledge : the web of tomorrow - Serge abitboul - Université d'été des Actuaires

Into the next dimension
Into the next dimensionInto the next dimension
Into the next dimensionEd Charbeneau
 
Introduction to Artificial Intelligences
Introduction to Artificial IntelligencesIntroduction to Artificial Intelligences
Introduction to Artificial IntelligencesMeenakshi Paul
 
computer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial Intelligencecomputer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial IntelligenceKhanKhaja1
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015Jonathan Woodward
 
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Jisc
 
e-Research and the Demise of the Scholarly Article
e-Research and the Demise of the Scholarly Articlee-Research and the Demise of the Scholarly Article
e-Research and the Demise of the Scholarly ArticleDavid De Roure
 
Social Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationSocial Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationDavid De Roure
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDan Brickley
 
The_Information_Age.pptx
The_Information_Age.pptxThe_Information_Age.pptx
The_Information_Age.pptxAllisaAlcober1
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Anita de Waard
 
Joy Mountford at BayCHI: Visualizations of Our Collective Lives
Joy Mountford at BayCHI: Visualizations of Our Collective LivesJoy Mountford at BayCHI: Visualizations of Our Collective Lives
Joy Mountford at BayCHI: Visualizations of Our Collective LivesBayCHI
 
Forethoughts (or Four Provocations) on Linked Data and Digital Scholarship
Forethoughts (or Four Provocations) on Linked Data and Digital ScholarshipForethoughts (or Four Provocations) on Linked Data and Digital Scholarship
Forethoughts (or Four Provocations) on Linked Data and Digital ScholarshipDavid De Roure
 
Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online caniceconsulting
 
The Ai & I at Work
The Ai & I at WorkThe Ai & I at Work
The Ai & I at WorkTarek Hoteit
 

Semelhante a From data and information to knowledge : the web of tomorrow - Serge abitboul - Université d'été des Actuaires (20)

Into the next dimension
Into the next dimensionInto the next dimension
Into the next dimension
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
Introduction to Artificial Intelligences
Introduction to Artificial IntelligencesIntroduction to Artificial Intelligences
Introduction to Artificial Intelligences
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
computer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial Intelligencecomputer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial Intelligence
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
 
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
 
e-Research and the Demise of the Scholarly Article
e-Research and the Demise of the Scholarly Articlee-Research and the Demise of the Scholarly Article
e-Research and the Demise of the Scholarly Article
 
Social Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationSocial Machines of Scholarly Collaboration
Social Machines of Scholarly Collaboration
 
AI history (Epita International Masters)
AI history (Epita International Masters)AI history (Epita International Masters)
AI history (Epita International Masters)
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
Computing for Human Experience [v3, Aug-Oct 2010]
Computing for Human Experience [v3, Aug-Oct 2010]Computing for Human Experience [v3, Aug-Oct 2010]
Computing for Human Experience [v3, Aug-Oct 2010]
 
The_Information_Age.pptx
The_Information_Age.pptxThe_Information_Age.pptx
The_Information_Age.pptx
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
 
Joy Mountford at BayCHI: Visualizations of Our Collective Lives
Joy Mountford at BayCHI: Visualizations of Our Collective LivesJoy Mountford at BayCHI: Visualizations of Our Collective Lives
Joy Mountford at BayCHI: Visualizations of Our Collective Lives
 
Forethoughts (or Four Provocations) on Linked Data and Digital Scholarship
Forethoughts (or Four Provocations) on Linked Data and Digital ScholarshipForethoughts (or Four Provocations) on Linked Data and Digital Scholarship
Forethoughts (or Four Provocations) on Linked Data and Digital Scholarship
 
Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online
 
The Ai & I at Work
The Ai & I at WorkThe Ai & I at Work
The Ai & I at Work
 

Mais de Kezhan SHI

Big data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-septBig data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-septKezhan SHI
 
Big data fiche data science 15 09 14
Big data fiche data science 15 09 14Big data fiche data science 15 09 14
Big data fiche data science 15 09 14Kezhan SHI
 
Big data ads gouvernance ads v2[
Big data ads   gouvernance ads v2[Big data ads   gouvernance ads v2[
Big data ads gouvernance ads v2[Kezhan SHI
 
Big data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-septBig data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-septKezhan SHI
 
B -technical_specification_for_the_preparatory_phase__part_ii_
B  -technical_specification_for_the_preparatory_phase__part_ii_B  -technical_specification_for_the_preparatory_phase__part_ii_
B -technical_specification_for_the_preparatory_phase__part_ii_Kezhan SHI
 
A -technical_specification_for_the_preparatory_phase__part_i_
A  -technical_specification_for_the_preparatory_phase__part_i_A  -technical_specification_for_the_preparatory_phase__part_i_
A -technical_specification_for_the_preparatory_phase__part_i_Kezhan SHI
 
20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standard20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standardKezhan SHI
 
20140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-201420140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-2014Kezhan SHI
 
20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilan20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilanKezhan SHI
 
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_Kezhan SHI
 
Qis5 technical specifications-20100706
Qis5 technical specifications-20100706Qis5 technical specifications-20100706
Qis5 technical specifications-20100706Kezhan SHI
 
Directive solvabilité 2
Directive solvabilité 2Directive solvabilité 2
Directive solvabilité 2Kezhan SHI
 
Directive omnibus 2
Directive omnibus 2Directive omnibus 2
Directive omnibus 2Kezhan SHI
 
Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2Kezhan SHI
 
Rapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNILRapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNILKezhan SHI
 
Xavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régressionXavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régressionKezhan SHI
 

Mais de Kezhan SHI (16)

Big data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-septBig data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-sept
 
Big data fiche data science 15 09 14
Big data fiche data science 15 09 14Big data fiche data science 15 09 14
Big data fiche data science 15 09 14
 
Big data ads gouvernance ads v2[
Big data ads   gouvernance ads v2[Big data ads   gouvernance ads v2[
Big data ads gouvernance ads v2[
 
Big data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-septBig data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-sept
 
B -technical_specification_for_the_preparatory_phase__part_ii_
B  -technical_specification_for_the_preparatory_phase__part_ii_B  -technical_specification_for_the_preparatory_phase__part_ii_
B -technical_specification_for_the_preparatory_phase__part_ii_
 
A -technical_specification_for_the_preparatory_phase__part_i_
A  -technical_specification_for_the_preparatory_phase__part_i_A  -technical_specification_for_the_preparatory_phase__part_i_
A -technical_specification_for_the_preparatory_phase__part_i_
 
20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standard20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standard
 
20140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-201420140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-2014
 
20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilan20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilan
 
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
 
Qis5 technical specifications-20100706
Qis5 technical specifications-20100706Qis5 technical specifications-20100706
Qis5 technical specifications-20100706
 
Directive solvabilité 2
Directive solvabilité 2Directive solvabilité 2
Directive solvabilité 2
 
Directive omnibus 2
Directive omnibus 2Directive omnibus 2
Directive omnibus 2
 
Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2
 
Rapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNILRapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNIL
 
Xavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régressionXavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régression
 

Último

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 

Último (20)

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 

From data and information to knowledge : the web of tomorrow - Serge abitboul - Université d'été des Actuaires

  • 1. From data and information to knowledge: the Web of tomorrow Serge Abiteboul INRIA & ENS Cachan Académie des sciences 1
  • 2. Organization • The Web today • The Web tomorrow – The scale – The new frontier: knowledge – How is knowledge acquired? – Reasoning with distributed knowledge – Other issues • Conclusion Abiteboul, Cordeliers, 2014 2
  • 3. The Web today Abiteboul, Cordeliers, 2014 3
  • 4. hypertext universal library of text and multimedia Network of machines (Internet) Network of content (Web) Network of people (social networks) personal/private data social dataAbiteboul, Cordeliers, 2014 4
  • 5. Data & information Originally, computers were seen as systems to compute – Solving differential equations – Cryptography – Etc. They are primarily used to store/manage data/information – Accounting, banking (transactions) – Inventory, catalogue – Agenda, contacts – Library, etc. – Sciences: measurements, simulation Abiteboul, Cordeliers, 2014 5
  • 6. What has changed recently? • The technology – Larger/faster disks, memories – Faster networks/processors – Algorithmic progress, e.g., parallel computing • The scale – Size of data – Number of machines – Number of users • The Web – Information was living in islands with different formats, application programming languages​​, operating systems – The web brought universal standards for sharing information Abiteboul, Cordeliers, 2014 6
  • 7. The digital world Hundreds of millions of web sites Billions of communicating objects Thousands of billions of pages 7Abiteboul, Cordeliers, 2014
  • 8. Two main achievements of computer science of the 20th century • Data: database systems • Information: Web search engines Abiteboul, Cordeliers, 2014 8
  • 9. Data: database systems A database system plays the role of a mediator between an intelligent user and objects storing information Huge impact on our lives Abiteboul, Cordeliers, 2014 9  t,d ( Film(t, d, « Bogart »)  Showing(t, s, h) ) Where and when may I watch a movie featuring Bogart? intget(intkey){ inthash=(key%TS );while(table[has h]=NULL&&table [hash]- >getKey()=key) hash=(hash… Logic is the beginning of wisdom, not the end. Mr. Spock, Star Trek Where and when may I watch a movie featuring Humphrey Bogart?
  • 10. Information: Web search engine A Web search engine plays the role of a mediator between an intelligent user and objects storing information Huge impact on our lives Abiteboul, Cordeliers, 2014 10 movie Humphrey Bogart Where and when may I watch a movie featuring Humphrey Bogart? intget(intkey){ inthash=(key%TS );while(table[has h]=NULL&&table [hash]- >getKey()=key) hash=(hash… Don’t be evil Google corporate motto
  • 11. The secret of success 11 The improvement of a functionality or a new functionality Better interfaces Better ranking of pages Mathematical tools Logic & algebra Probabilities & matrix multiplications Smart algorithms Query optimization Parallel algorithms Solid engineering Failure recovery Clusters of thousands of machines Taking advantage of hardware Disk capacity Huge and cheap memories Informatics Mathematics Abiteboul, Cordeliers, 2014 Engineering Hardware
  • 12. The Web tomorrow How can the Web evolve to best serve you? Garbage in Garbage outAbiteboul, Cordeliers, 2014 12
  • 13. Assumption 1. The size will continue to grow 2. The information will continue to be managed by many systems – Vs. a company will conquer all the information of the world 3. These systems will be intelligent – In the sense that they produce/consume knowledge and not simply raw data 4. These systems will be willing to exchange knowledge and collaborate to serve you – Vs. capture you and keep you within islands of proprietary knowledge Abiteboul, Cordeliers, 2014 13 Prediction is very difficult, especially about the future. Niels Bohr
  • 14. The difficulties: The 4+1 Vs of Big data • Volume – Mass of data – more than can be stored/retrieved • Velocity – Data streams & frequent changes • Variety – Heterogeneity of structure, schema, ontologies… – Distribution of sources • Veracity – Uncertainty, errors, imprecision, contradictions… – Lack of quality – Opinions, sentiments, and lies. • Value Abiteboul, Cordeliers, 2014 14
  • 16. d a t a time Perhaps the main problem : surviving the deluge • More and more information available • An issue for computer systems is to select the information the user wants to see – Based on importance, quality, personal interest… Abiteboul, Cordeliers, 2014 16
  • 17. A technical challenge very large scale data/information analysis • An old problem since the early days of computer science • Data mining, business intelligence, big data… • Statistics • Complex algorithms 17 Informatics Engineering Hardware Mathematics Abiteboul, Cordeliers, 2014
  • 18. The new frontier: knowledge But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die. Genesis 2:17 Abiteboul, Cordeliers, 2014 18
  • 19. Data Elementary description of some reality Temperature measurements in a weather station Information Data with a meaning (to construct a representation of some reality) A curb giving the evolution of the average temperature in a place throughout the year Knowledge Information equipped with some notion of truth and more generally some general laws that have been inferred The fact that temperature on the earth is augmenting because of human activity Data, information, knowledge Abiteboul, Cordeliers, 2014 19
  • 20. Ontologies Knowledge representation: logical sentences such as The royal society is_a An academy The royal society is_a Scientific organization The royal society is_a National institution The royal society is_located London, United Kingdom The royal society founded 1660 L’académie des sciences founded 1666 … A collection of such statements is called an ontology Abiteboul, Cordeliers, 2014 20
  • 21. Machines prefer formatted knowledge Machines have difficulties with human languages Machines are better at handling more formatted knowledge Abiteboul, Cordeliers, 2014 21 Text (from Wikipedia) Knowledge The Royal Society of London for Improving Natural Knowledge, commonly known as the Royal Society, is a learned society for science, and is possibly the oldest such society still in existence… is_a(The royal society, Scientific organization) founded(The royal society, 1660) ∀x,y ( founded(x,y) ⋀ is_a(x, Scientific organization) ⇒ y≥1660 )
  • 22. What are ontologies useful for? To answer queries more precisely When was the Royal Society Where is the Royal Society founded? located? Abiteboul, Cordeliers, 2014 22 Science Academy Royal Society London Royal Society of London is_located is_a 1660 founded To integrate data from several sources When was the science academy located in London founded?
  • 23. How is knowledge obtained? Abiteboul, Cordeliers, 2014 23
  • 24. How is knowledge obtained? • Humans specify explicitly the knowledge – Standard in science, much less in “real-world” • But – People like to talk/write in their natural languages – People like to publish on the web: web pages, blogs, tweets, votes, recommendations, opinions… – They do not appreciate the constraints of a knowledge editor – They want to keep their visibility • Machines will have to obtain knowledge Abiteboul, Cordeliers, 2014 24
  • 25. Automatic and collective acquisition of knowledge (*) knowledge = formal/digital knowledge EARLIER • Extraction from text • Web graph analysis • Collaboration • User’s opinion • Recommendation • Reasoning and inference Main issue: quality Abiteboul, Cordeliers, 2014 25
  • 26. Knowledge extraction from text Abiteboul, Cordeliers, 2014 26 Construction of large knowledge bases – Complex linguistic problem with inaccuracies and errors – E.g.: Yago (from Wikipedia) at Max Plank Institute Search for syntactic patterns such as – <person> is married to <person> – Woody Allen got hooked with Soon-Yi Previn You may think that the problem is to find such sentences Wrong: The problem is to keep a balance between precision: fraction of results that are correct recall: fraction of correct results that are retrieved Woody Allen Soon-Yi Previn
  • 27. Aligning ontologies • Manually – see Linked data further • Automatically – E.g.: Paris system at Inria & Telecom ParisTech Abiteboul, Cordeliers, 2014 27 Rodin Camille Claudel Poet Claudel profession Rodin id445 lover Sculptress Claudel Camille Camille French professionnationalityFNfn Claudel name ✕
  • 28. Knowledge acquisition: collaboration Internauts perform collectively tasks they cannot solve individually – No payment & little management Wikipedia: encyclopedia – 281 editions; 3 millions articles for the English version – Extremely used – Covers a much larger spectrum than a traditional encyclopedia – Controversial quality You probably heard that this is the work of amateurs and thus that it cannot be correct Wrong: the main issue is the stronger presence of professionals with personal agendas Abiteboul, Cordeliers, 2014 28
  • 29. Collaboration (end) Linked data project – Publish ontologies : Publish facts/relations and links between them • (2011) 31 billion facts/relations; – Align them with other ontologies: Publish links • (2011) 504 million links Open source software E.g., Linux: operating system – You may think that the Internet and the Web are major successes of the industry notably US – Wrong: They are mostly running on open-source code Crowdsourcing Publish questions ☛ Internauts answer – Mechanical Turk of Amazon – Reference to "The Turk," a chess-playing automaton of the 18th Foldit: decoding the structure of an enzyme close to the AIDS virus Abiteboul, Cordeliers, 2014 29
  • 30. Opinions & recommendations Learn the opinions of the Internauts – Quantitative (***) – Qualitative (“dating spot”) Increasingly standard – Movies in eBay – Lodging in Airbnb – Food in TripAdvisor… Users are taking over the role of specialists - spamming From users opinions/purchases, derive recommendations – Meetic organizes dates – Netflix suggests movies – Amazon suggests books Statistical analysis to discover proximities – Between customers in Meetic – Between customers and products in Netflix or Amazon Abiteboul, Cordeliers, 2014 30
  • 31. Opinions: voting with Web graph analysis You were perhaps told that a search engine is great because of the amount of information it indexes Wrong: indexing billions of Web pages is simple compared to selecting the links that appear on the first page of answers, which has become a business and even political issue • Based on the opinions of Web users Abiteboul, Cordeliers, 2014 31
  • 33. Inferring knowledge Using facts such as Psycho is a Hitchcock movie And rules such as WantsToSee( Alice, t )  Film( t, Hitchcock, a ), not Seen( Alice, t ) We can infer “intentional” facts such as Alice would like to see the movie Psycho Inference is complicated – Facts on the Web are uncertain: probabilities – Facts on the Web may contain contradictions – Scale • Avoid inferring all that may be inferred – too costly Abiteboul, Cordeliers, 2014 33
  • 34. Where is the truth? • Is everything true? – People rarely publish that something is false, e.g., “Elvis was not French” • There are too many false statements to state – One can use inference: “Elvis was American”; so it is likely that “Elvis was not French” • We have to deal with contradictions – Elvis Presley died on August 16, 1977 & The King is alive – Using voting, a machine may derive that “it is very likely that Elvis died” • Difficulties – Reason with inconsistencies ⇒ problems with logic – Truth becomes uncertain ⇒ measure uncertainty with probabilities – There may be different truths (viewpoints) Abiteboul, Cordeliers, 2014 34 The more I see, the less I know for sure. John Lennon
  • 35. Illustration: Recent work on corroboration To determine the true/false facts: – Use voting, get an estimate of the “truth value” of facts – Based on that, determine the “quality” of the data sources – Using this quality, get a better estimate of the “truth value” of facts – Then, get a better estimate of the “quality” of data sources… A human learns the difference between a newspaper and a tabloid On the Web, too many sources of information – machines will have to separate the wheat from the chaff Abiteboul, Cordeliers, 2014 35 Where is the truth? (end)
  • 36. Abiteboul, Cordeliers, 2014 36 In a classical database: everything that is not in the database is assumed to be false – closed world assumption On the Web, if you don’t know something, it may be out there… Or not – open world assumption Difficulties – We cannot bring all the Web locally – We cannot visit all the Web – How do I know where to find something? – How do I know it is not out there?
  • 37. Explaining • Users want to understand the information they see, the answers they are given In their professional life and in their social life as well – E.g., Alice has a new boyfriend – How was this determined? • Analogy: in sciences, knowing the result of an experiment without knowing its conditions is often useless • Difficulties – Reasoning with large number of facts – Information is often probabilistic and not public – Requires knowing how the information was obtained (its provenance) Abiteboul, Cordeliers, 2014 37
  • 39. Privacy • Users want to control their data • They have personal data – They want to share with their friends – They are willing to give to systems to get more personalized services • The systems want to get their personal data – For personalized ads – To sell to other business • How to do we that? – Better systems – Better laws – Better users (digital literacy) Abiteboul, Cordeliers, 2014 39
  • 40. Serendipity • You may hear by chance a song that is going to totally obsess you • A librarian may suggest your reading an article that will transform your research This is serendipity • A perfect search engine • A perfect recommendation system • A perfect computer assistant Such systems are boring They lack serendipity Abiteboul, Cordeliers, 2014 40 Design programs that would introduce serendipity in our lives • Random algorithms & sentiment analysis
  • 41. Hypermnesia Exceptionally exact or vivid memory, especially as associated with certain mental illnesses For a user: We cannot live knowing that any word, any move will leave a trace? For the ecosystem: We cannot store all the data we produce – lack of storage resources Abiteboul, Cordeliers, 2014 41 Forgetting is Key to a Healthy Mind Scientific American Image: Aaron Goodman • A main issue is to select the information we choose to keep
  • 42. The Babel of human-machine-interaction • Each time a user interacts with a data source, does he have to use the ontology of that source ? – How the source organizes information – The particular terminology it uses – Its sequencing of tasks, its codes… • No! • Instead of a user adapting to the ontologies of the N systems he uses each day • We want the N systems to adapt to the user’s ontology Abiteboul, Cordeliers, 2014 42
  • 43. Religion…science…machines • Knowledge used to be determined by religion • Knowledge determined scientifically • Knowledge determined by machines – Match making – Medical diagnosis… • Decisions are increasingly made by machines – Stock market (automatic trading) – Fully automated factory – Fully automated metros – Death penalty (killer drones)… Abiteboul, Cordeliers, 2014 43
  • 45. to the digital world! • The massive use of digital information has modified in depth all facets of our life : work, science, education, health, politics, etc. • We will soon be living in a world surrounded by machines that – acquire knowledge for us – remember knowledge for us – reason for us – communicate with other machines at a level unthinkable before • What will we do with that technology? • Will we become smarter ? • Will we become master or slave of the new technology? Abiteboul, Cordeliers, 2014 45
  • 46. Les soins personnalisés • Toutes les données médicales de la personne – Son génome • Toutes ses données sociales • Soins personnalisés • Mesures prédictives Les polices personnalisées • Plus chères pour les personnes à risque • Personnes « trop » à risque non assurées • Mutualisation des risques de plus en plus limitée C’est la même science qui rend ça possible Quel monde souhaitons-nous? Abiteboul, Cordeliers, 2014 46 Exemple : La santé
  • 47. How can we get prepared to these changes? Informatics and digital humanities are at the cross road of these questions Learn informatics • To understand the digital world you are living in • To decide your life in the digital world • To participate in/contribute to the digital world Abiteboul, Cordeliers, 2014 47