MACHINE-IN-THE-LOOP FOR KNOWLEDGE DISCOVERY
Max Kleiman-Weiner
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
Max Kleiman-Weiner
Co-founder of Diffeo
Co-organizer of TREC-KBA
Collection & Processing
Analysis & Production
Dissemination to Customers
People close this gap.
Search, Recommender Engines, Text Analytics
Reporting Tools
helps.
From: Google Alerts <googlealerts-noreply@google.com>
Subject: Google Alert - "John R. Frank"
=== Web - 2 new results for ["John R. Frank"] ===
John R. Frank
SPOKANE, Wash. - John R. Frank, 55, died March 4, 2012, in Coeur
d' Alene, Idaho. Survivors include: his wife, Miki; daughter, Patricia
Frank; ...
<http://www.hutchnews.com/obituaries/Frank--John-CP>
In Memory of John R Frank
Biography. John R. Frank, age 55, passed away at Sacred Heart
Medical Center in Spokane, WA, on March 4, 2012. John was born in
Hutchison, KS, ...
<http://www.englishfuneralchapel.com/sitemaker/sites/Englis1/obit.cgi?user=583335Frank>
KBA: a TREC evaluation
How many days must a news article wait before being cited in Wikipedia?
Knowledge Base Acceleration?
rate of assimilation << stream size
# editors << # entities << # mentions
(definition of a "large" KB)
TREC KBA 2014
Stream Filtering to Recommend Citations and Fill Slots
1) Initialize with target entities
• Start with citations before dateX (20%)
2) Iterate over stream of text items
• Predict citations after dateX (80%)
• Citation = "vital", i.e. changes profile
3) Identify infobox slot values
Content Stream
• 1.2bn texts, 50% English
• >500M tagged by Serif
• 13,663 hours of data (19 months)
• 10^5 docs-per-hour
• News, blogs, forums, preprints, and more
Your KBA System
Entities in Wikipedia or another Knowledge Base
Automatically recommend new edits
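The stream-filtering setup above can be sketched roughly as follows. This is a minimal illustration, not the TREC KBA reference implementation: the `StreamItem` structure, the `candidate_citations` function, and the toy documents are all hypothetical, and plain substring matching stands in for whatever entity matching a real system would use. It only generates candidates after dateX; deciding whether a candidate is "vital" would need a trained model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class StreamItem:
    """One timestamped text from the stream (hypothetical structure)."""
    doc_id: str
    timestamp: datetime
    text: str


def candidate_citations(
    stream: Iterable[StreamItem],
    targets: Dict[str, List[str]],
    date_x: datetime,
) -> Iterator[Tuple[str, str]]:
    """Yield (entity, doc_id) for post-cutoff items that mention a target.

    Substring matching on surface forms only produces candidates; it does
    not decide whether a document is "vital".
    """
    for item in stream:
        if item.timestamp <= date_x:
            continue  # items up to dateX belong to the initialization portion
        lowered = item.text.lower()
        for entity, surface_forms in targets.items():
            if any(form.lower() in lowered for form in surface_forms):
                yield entity, item.doc_id


# Toy data, invented for illustration only.
stream = [
    StreamItem("d1", datetime(2012, 1, 1), "James McCartney plays a show."),
    StreamItem("d2", datetime(2012, 6, 1), "James McCartney on forming a band."),
    StreamItem("d3", datetime(2012, 6, 2), "Unrelated weather report."),
]
targets = {"James McCartney": ["James McCartney"]}
found = list(candidate_citations(stream, targets, date_x=datetime(2012, 3, 1)))
print(found)
```

The generator processes one item at a time, which matches the streaming constraint: a system never looks ahead in the stream when judging the current item.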
SSF (Streaming Slot Filling) example: "FounderOf" slot on James McCartney
[Figure: citation-worthy doc counts (axis ticks 1 and 10) vs. hour number in stream corpus (hours 4250 to 4350), with an annotation labeled "echo"]
BBC: "What would you say to forming The Beatles - The Next Generation" …
James: "… I'd be up for it."
Entity profiles evolve over time!
Exploring these issues in new track: TREC Dynamic Domain
[Figure: corpus counts per hour (axis ticks 1, 10, 100), hourly from 2011-10-05-00 to 2013-02-09-04, split into a training period and an evaluation period]
Pre-hoc assess all surface mentions
"vital" docs change entity profile, usually filling a slot
http://trec-kba.org/
1,187,974,321 documents total
579,838,246 English docs tagged by BBN Serif
• NER, within-doc coref
• Sentence Parse Trees
• Relations
Available at: http://trec-kba.org/
What entity resolvers do easily is boring.
Wang Qishan, China's top graft buster and a
member of the Politburo Standing Committee,
making him one of the seven most powerful people
in China
Wang Qishan, male, Han nationality, is a native of
Tianzhen County, Shanxi province. Director, Rural
Development Research Center of the State
Council, Communication Office. He served as a
member of the CPC Beijing Municipal Committee in
2003, and he was appointed mayor of Beijing in the
same year.
Wang, 64, is member of the Political Bureau of the
17th CPC Central Committee and vice premier. He
was also elected into the new Central Commission
for Discipline Inspection.
What entity resolvers should do is exciting.
Wang also announced that Trinidad and Tobago
would receive a 40M yuan grant (approximately
TT$40M) from the Chinese government to fund
projects mutually agreed upon by both
governments.
Do you realize that Harry Reid and his son Rory
have a deal for that land with Chinese Vice Premier
Wang Qishan for a 5 billion solar plant. Its allover
social media…
The U.S. Chamber of Commerce co-hosted a
dinner last night at the JW Marriot in Washington to
honor Chinese Vice Premier Wang Qishan, who is
in the United States to co-chair the 23rd China -
U.S. Joint Commission on Commerce and Trade
Traditional Machine Learning
[Diagram label: Human]
Human-in-the-loop (active learning)
[Diagram label: Human]
Machine-in-the-Loop (active ranking)
Collaborative MediaWiki KB Authoring Tools
Novel and Relevant Results
Information gathering as training
Profile as query
Diffeo: Built for Clouds
[Architecture diagram; component labels follow]
Large Collections of Unstructured Documents
StreamCorpus Pipeline (FOSS)
rejester
3rd Party NER, e.g. Serif, Basis
TreeLab Feature Collection
Dossier Stack (FOSS): dossier.store, dossier.label, dossier.web
kvlayer
PostgreSQL
TreeLab Active Learning
Diffeo TreeLab Clustering & Ranking on Bigforest
FOSS development:
github.com/streamcorpus
github.com/dossier
Diffeo ships a stack of Docker containers, including a fully integrated MediaWiki, for turn-key deployment on a single server or customization on a large cluster.
Scales with Your Database
Hierarchical Clustering and Active Ranking
Natural Language Processing Pipelines and Feature Vectors
Data Center Scaling
User Experience for Machine-Assisted Entity Research
Pipeline: Large Collections of Unstructured Documents; StreamCorpus Pipeline (FOSS); 3rd Party NER, e.g. Serif, Basis; TreeLab Feature Collection
Natural Language
The Syrian Electronic Army (SEA) is a group of computer hackers who support the government of Syrian President Bashar al-Assad. Using spamming,[2] defacement, malware (including the Blackworm tool),[3] phishing, and denial of service attacks, it mainly targets political opposition groups and western websites including news organizations and human rights groups. The Syrian Electronic Army is the first public, virtual army in the Arab world to openly launch cyber attacks on its opponents.[4]
Feature Vectors
"#both_bol_JJ_0": {
    "24 minute": 1,
    "american": 5,
    "linux": 2,
    "pro government": 1,
    ...
},
"#both_co_LOC_3": {
    "arab": 2,
    "east": 4,
    "iran": 5,
    "kingdom": 2,
    "lebanon": 2,
    "london": 2
},
"#both_co_MAGIC_VALUE_0": {
    "pagewanted all r 0accessdate22": 1,
    "q cache 9qiv27 ymm8j": 1
},
"#both_co_NATIONALITY_2": {
    "american": 6,
    "indians": 2,
    "iranian": 5,
    "saudi": 2,
    "syrian": 93
},
"#entity_type": {
    "ORG": 6
},
"#post_bow_RB_2": {
    "allegedly": 4,
    "falsely": 1,
    "openly": 2,
    ...
},
Feature Vectors
"#post_bow_VB_3": {
    "quote": 6,
    "infiltrate": 4,
    "disclose": 2,
    ...
},
"#post_co_DATE_2": {
    "2011": 2,
    "2011 3 15": 1,
    "2012": 4,
    "2013": 26,
    "2014": 12,
    "2015": 4,
    "april": 11,
    "august": 6,
    "february": 17,
    "january": 10,
    "july": 29,
    "june": 8,
    "november": 21,
    "october": 8,
    "september": 10
},
"post_co_PER_1": {
    "37246009": -1,
    "131492056": 2,
    "437879322": 1,
    "664353967": -2,
    "680387639": -1,
    "681138746": 1,
    "740105633": -1,
    "792534747": 2,
    "973559166": 3,
    "1146544765": 6,
    "1273592341": 2,
    ...
},
"#post_co_PER_1": {
    "al assad": 2,
    "barack": 2,
    "bashar": 3,
    "buscemi": 1,
    "crook": 1,
    "foster": 1,
    "fowler": 1,
    "jordan": 1,
    "monde": 1,
    "obama": 6,
    "sabari": 2,
    ...
},
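One way to use feature collections like the ones shown above is to compare two of them feature by feature. The sketch below assumes each collection is a dict of named string counters, as in the excerpts, and uses cosine similarity as the comparison. Both the `cosine` and `per_feature_similarity` helpers and the choice of cosine are assumptions made for illustration, not a description of how TreeLab compares collections.

```python
import math
from typing import Dict, Mapping

Counter = Mapping[str, int]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def per_feature_similarity(
    fc1: Mapping[str, Counter], fc2: Mapping[str, Counter]
) -> Dict[str, float]:
    """Compare only the feature names that both collections share."""
    shared = sorted(fc1.keys() & fc2.keys())
    return {name: cosine(fc1[name], fc2[name]) for name in shared}


# Counts below reuse a few values from the slide; doc_b is invented.
doc_a = {
    "#post_co_PER_1": {"obama": 6, "bashar": 3, "al assad": 2},
    "#both_co_LOC_3": {"iran": 5, "east": 4},
}
doc_b = {
    "#post_co_PER_1": {"obama": 6, "bashar": 3, "al assad": 2},
    "#both_co_LOC_3": {"london": 2},
}
scores = per_feature_similarity(doc_a, doc_b)
print(scores)
```

Keeping one score per feature name, rather than a single number, leaves it to a downstream model to weight person co-occurrence differently from, say, location co-occurrence.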
Distantly supervised models → not perfect
Can flatten into a set of sets for Cross Doc Coref Resolution (CDCR) with set-based metrics, e.g.,
• B-cubed F
• CEAF
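As a concrete illustration of a set-based metric, here is a minimal sketch of B-cubed precision, recall, and F1 over two clusterings, each given as a list of sets. The function names and the toy mention IDs are made up for the example, and the sketch assumes both clusterings cover exactly the same items, each item in one cluster.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def to_assignment(clusters: Iterable[Set[Hashable]]) -> Dict[Hashable, int]:
    """Map each item to the index of the set that contains it."""
    return {item: idx for idx, cluster in enumerate(clusters) for item in cluster}


def bcubed(
    predicted: List[Set[Hashable]], gold: List[Set[Hashable]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) averaged over items."""
    pred = to_assignment(predicted)
    truth = to_assignment(gold)
    if set(pred) != set(truth):
        raise ValueError("both clusterings must cover the same items")
    items = list(truth)

    def averaged(a: Dict[Hashable, int], b: Dict[Hashable, int]) -> float:
        # For each item: of the items sharing its cluster in `a`,
        # what fraction also share its cluster in `b`?
        total = 0.0
        for i in items:
            same_in_a = [j for j in items if a[j] == a[i]]
            also_same_in_b = sum(1 for j in same_in_a if b[j] == b[i])
            total += also_same_in_b / len(same_in_a)
        return total / len(items)

    precision = averaged(pred, truth)
    recall = averaged(truth, pred)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [{"m1", "m2", "m3"}, {"m4", "m5"}]
predicted = [{"m1", "m2"}, {"m3", "m4", "m5"}]
p, r, f = bcubed(predicted, gold)
print(p, r, f)
```

This pairwise formulation is quadratic in the number of items, which is fine for a worked example but would need per-cluster counting on a large corpus.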
Distantly supervised models → not perfect
Ranking models → just different
Traversing from a leaf starts with higher precision, and progresses to higher recall.
Crossing the set boundary can find the most surprising and useful information.
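The precision-to-recall progression can be shown on a toy tree. The sketch below is an assumption-laden illustration rather than Diffeo's algorithm: the `Node` class, the traversal order (the leaf first, then each successively larger ancestor subtree), and the three-mention tree are all hypothetical.

```python
from typing import Iterator, List, Optional, Sequence, Set, Tuple


class Node:
    """Node in a hierarchical clustering tree; leaves carry mention names."""

    def __init__(self, name: str, children: Sequence["Node"] = ()) -> None:
        self.name = name
        self.children: List["Node"] = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self


def leaves(node: Node) -> Iterator[str]:
    """Yield leaf names under a node, left to right."""
    if not node.children:
        yield node.name
    else:
        for child in node.children:
            yield from leaves(child)


def traverse_from_leaf(leaf: Node) -> Iterator[str]:
    """Yield the leaf, then new leaves from each larger ancestor subtree."""
    seen: Set[str] = set()
    node: Optional[Node] = leaf
    while node is not None:
        for name in leaves(node):
            if name not in seen:
                seen.add(name)
                yield name
        node = node.parent


def precision_recall(ranked: List[str], relevant: Set[str]) -> List[Tuple[float, float]]:
    """Precision and recall after reading each prefix of the ranking."""
    hits = 0
    curve = []
    for k, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
        curve.append((hits / k, hits / len(relevant)))
    return curve


a1, a2, b1 = Node("a1"), Node("a2"), Node("b1")
root = Node("root", [Node("A", [a1, a2]), Node("B", [b1])])
ranked = list(traverse_from_leaf(a1))
curve = precision_recall(ranked, relevant={"a1", "a2"})
print(ranked, curve)
```

In this toy case the last step crosses out of the subtree that holds the true coreferent mentions, which is where precision drops while recall is already complete.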
Ranking models → beyond coref
Ranking can recover from coref errors and expose knowledge beyond coref.
Measurement Methodologies
Unexpected Expected Reciprocal Rank (u-ERR)
1. Submit profile as query to search engine
2. Receive items
3. Start reading through the list:
4.     User examines position n
5.     If user finds new knowledge:
6.         Update profile
7.         Go to 1 with updated profile as query
8.     else
9.         n += 1
10.        Go to 4
u-ERR = 1 / (expected list position of surprise)
Figure of merit: depth in the list where user discovers new knowledge
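The loop above can be simulated in a few lines. The sketch makes two simplifying assumptions that are not in the slide: an item counts as "new knowledge" whenever it is not yet in the profile, and the expected position is estimated by the sample mean of observed depths. The `search` callable and all function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple


def first_surprise_position(ranked: Sequence[str], known: Set[str]) -> Optional[int]:
    """1-based position of the first item not already known, else None."""
    for n, item in enumerate(ranked, start=1):
        if item not in known:
            return n
    return None


def run_session(
    search: Callable[[Set[str]], Sequence[str]],
    profile: Iterable[str],
    max_rounds: int = 10,
) -> Tuple[List[int], Set[str]]:
    """Query with the profile, read until a surprise, update, and repeat."""
    current = set(profile)
    depths: List[int] = []
    for _ in range(max_rounds):
        results = search(current)                            # steps 1-2
        pos = first_surprise_position(results, current)      # steps 3-5, 8-10
        if pos is None:
            break  # nothing new left in the list
        depths.append(pos)
        current.add(results[pos - 1])                        # step 6, then back to 1
    return depths, current


def u_err(depths: Sequence[int]) -> float:
    """u-ERR = 1 / (expected list position of surprise), via the sample mean."""
    if not depths:
        raise ValueError("no surprises observed")
    return 1.0 / mean(depths)


def toy_search(profile: Set[str]) -> List[str]:
    # A fixed ranking that ignores the profile; a real engine would re-rank.
    return ["a", "b", "c", "d"]


depths, final_profile = run_session(toy_search, {"a"})
score = u_err(depths)
print(depths, score)
```

Because the toy engine never re-ranks, each surprise sits one position deeper than the last; an engine that used the updated profile well would keep surprises near the top and push u-ERR toward 1.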
Entity profiles act as queries for retrieval algorithms
Retrieval results suggest updates to the profile
User interactions steer the system beyond coref to find surprises
Truth data is expensive
Observe user actions
Learn actively
Thank you!
diffeo.com trec-kba.org

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Machine-In-The-Loop for Knowledge Discovery

Editor's Notes

  1. Diffeo connects documents to the entities mentioned in the text, so you can discover everything written about your entities of interest.
  2. How many of you have set up a Google Alert? How many of you have set up a Google Alert on yourself?
  3. This shows the lag time between publication and eventual citation of news articles linked from pages in Category:Living_people. The median lag time for articles published after 2001 is more than 1.5 years. You can see the lag time leveling out to a long, heavy tail for earlier articles, and then only gradually accelerating for a small handful of news articles that get incorporated rapidly: out of ~50,000 articles, only ~300 had a lag time of less than a day.
  4. Automatically recommends content. Collaborate with the machine and other users on building a profile. As the user builds a profile, the system becomes aware of both the information the user already knows about, which is contained in the wiki, and the information *not* contained in the wiki, which can be used for learning. Selectors are either hard or soft. Hard selectors are phrases that are synonymous with the entity, such as a name, phone number, or IP address. Soft selectors are signatures of the entity, such as commonly used long phrases or other co-occurring entities.
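
The hard/soft selector distinction in note 4 can be illustrated with a small Python sketch. This is not Diffeo's implementation. The `EntityProfile` class, the `classify` function, the `min_soft` threshold, and the example entity are all hypothetical, and plain case-insensitive substring matching stands in for whatever matching a real system would use. The idea shown is only that a hard selector identifies the entity on its own, while soft selectors are weaker evidence that has to accumulate before a document is worth recommending.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class EntityProfile:
    """Hypothetical profile of one entity (illustrative names only)."""
    # Phrases synonymous with the entity: name, phone number, IP address.
    hard_selectors: Set[str] = field(default_factory=set)
    # Signatures of the entity: long phrases, co-occurring entities.
    soft_selectors: Set[str] = field(default_factory=set)


def find_selectors(text: str, selectors: Set[str]) -> List[str]:
    """Return the selectors found in text, using case-insensitive substring matching."""
    lowered = text.lower()
    return sorted(s for s in selectors if s.lower() in lowered)


def classify(text: str, profile: EntityProfile, min_soft: int = 2) -> str:
    """Label a document as 'mention', 'candidate', or 'unrelated'.

    Assumption: one hard selector is enough to call it a mention, while
    at least `min_soft` soft selectors are needed to suggest a candidate.
    """
    if find_selectors(text, profile.hard_selectors):
        return "mention"
    if len(find_selectors(text, profile.soft_selectors)) >= min_soft:
        return "candidate"
    return "unrelated"


# Invented example entity; the phone number and IP use reserved example ranges.
profile = EntityProfile(
    hard_selectors={"Jane Q. Example", "555-0100", "192.0.2.10"},
    soft_selectors={"Example Labs", "knowledge discovery"},
)

labels = [
    classify("Jane Q. Example spoke at the conference.", profile),
    classify("Example Labs presented work on knowledge discovery.", profile),
    classify("Local weather report for Tuesday.", profile),
]
```

In this sketch, a document with only soft-selector hits is the interesting case: it does not name the entity directly, so it is the kind of content a recommender could surface for the user to confirm or reject.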