Mark A Greenwood, Jonathon Hare,
David R Newman, Wim Peters
SemanticMedia@TheBritishLibrary
Monday 23rd September 2013
The Project Vision
• Semantic News is a 6-month project:
• June to November 2013
• Two 50% FTEs (1 Southampton, 1 Sheffield)
• An interactive ‘second screen’ to provide
contextual information on Question Time
questions
• Use multiple data sources
• Perform named entity recognition
• Exploit Linked Open Datasets
• Towards an almost real-time system
Where is the Data? (1)
• Question Time in 2010
• 34 episodes, 163 questions
• BBC Subtitles
• XML encoded
• Broadcast as the subtitles stream
Where is the Data? (2)
• BBC Programmes Data
• XML encoded
• Information about the programme (panellists, topics, broadcast dates, etc.)
• Tweets
• Taken from the Twitter ‘Garden Hose’ (10% stream)
Pre-parsing Subtitles Data
• Raw XML subtitles
• Remove duplicate words
• Parse into CSV
• time offset
• sentence
• Break into questions
• BBC Programmes data provides question time
offsets
• Compare with subtitles time offsets and split
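
The pre-parsing steps above can be sketched in Python. This is a minimal illustration, not the project's actual code: it assumes that "duplicate words" means immediately repeated words, that the CSV holds one (time offset, sentence) pair per row with offsets in seconds, and that the question start offsets from the BBC Programmes data are already available as a list. The function names and sample data are hypothetical.

```python
import bisect
import csv
import io


def drop_repeated_words(sentence):
    """Collapse immediately repeated words (assumed meaning of 'duplicate words')."""
    kept = []
    for word in sentence.split():
        if not kept or kept[-1].lower() != word.lower():
            kept.append(word)
    return " ".join(kept)


def load_rows(csv_text):
    """Read (time_offset, sentence) rows from CSV text; offsets assumed to be seconds."""
    reader = csv.reader(io.StringIO(csv_text))
    return [(float(offset), drop_repeated_words(sentence)) for offset, sentence in reader]


def split_by_question(rows, question_offsets):
    """Assign each sentence to the latest question starting at or before its offset."""
    starts = sorted(question_offsets)
    groups = {start: [] for start in starts}
    for offset, sentence in rows:
        idx = bisect.bisect_right(starts, offset) - 1
        if idx < 0:
            continue  # spoken before the first question, e.g. the introduction
        groups[starts[idx]].append(sentence)
    return groups


# Invented sample data, for illustration only.
csv_text = (
    "5.0,Welcome to to Question Time\n"
    "62.5,Should the the budget be cut?\n"
    "70.0,I think so\n"
    "400.0,Next question please\n"
)
rows = load_rows(csv_text)
questions = split_by_question(rows, [60.0, 395.0])
```

Using a binary search over the sorted question offsets keeps the split cheap even for long episodes; sentences before the first offset are simply dropped here, which is a design choice of this sketch.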
Pre-parsing Twitter Data
• Twitter ‘Garden Hose’ for 2010 Dataset
• Used Apache Hadoop and filtered on:
• @bbcqt, @bbcquestiontime
• #bbcqt, #bbcquestiontime, #questiontime
• “Question Time”, “David Dimbleby”
• Collated JSON results and imported into
OpenRefine
• Removed irrelevant fields
• Filtered out tweets that did not contain “bbc”
• Exported as CSV
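
The filter criteria above can be expressed as a small predicate. The slides say the filtering ran on Apache Hadoop; the sketch below is plain Python that only illustrates the matching logic a map step might apply. It assumes one JSON object per line with the tweet body in a "text" field, and case-insensitive matching; both are assumptions, as are the function names.

```python
import json
import re

HANDLES = {"@bbcqt", "@bbcquestiontime"}
HASHTAGS = {"#bbcqt", "#bbcquestiontime", "#questiontime"}
PHRASES = ("question time", "david dimbleby")

# Matches whole @mentions and #hashtags so "#bbcqtx" does not count as "#bbcqt".
TAG_RE = re.compile(r"[@#]\w+")


def is_relevant(text):
    """True if the text mentions a tracked handle, hashtag or phrase."""
    lowered = text.lower()
    tags = set(TAG_RE.findall(lowered))
    if tags & (HANDLES | HASHTAGS):
        return True
    return any(phrase in lowered for phrase in PHRASES)


def filter_tweets(json_lines):
    """Yield parsed tweets whose text passes is_relevant; skip malformed lines."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(tweet, dict) and is_relevant(tweet.get("text", "")):
            yield tweet


def mentions_bbc(tweet):
    """Second-pass filter from the slides: keep only tweets containing 'bbc'."""
    return "bbc" in tweet.get("text", "").lower()


# Invented sample lines, for illustration only.
sample = [
    '{"text": "Watching #bbcqt tonight"}',
    '{"text": "Nothing to see here"}',
    "not json",
    '{"text": "David Dimbleby is on form"}',
]
matched = list(filter_tweets(sample))
kept = [t for t in matched if mentions_bbc(t)]
```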
Information Extraction with GATE
● General Architecture for Text Engineering (GATE)
● Developed by the University of Sheffield since 2000
● Used by many researchers, scientists and
organisations all over the world
● Includes various components for language processing
● Parsers, machine learning tools, stemmers, IR tools, IE
components for various languages...
● Also provides tools for visualising and manipulating text,
annotations, ontologies and parse trees, and tools
for evaluation
Linguistic pre-processing
● Techniques
● Tokenization
● Sentence Splitting
● Language Identification
● POS tagging
● Morphological analysis
● Adapted for use with social media like Twitter
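
To show why tokenization needs adapting for Twitter, here is a minimal regex tokenizer in Python that keeps URLs, @mentions and #hashtags as single tokens. It is an illustrative sketch only and is not GATE's tokenizer; the pattern and the function name are assumptions.

```python
import re

# Order matters: URLs first, then @mentions/#hashtags, then words, then punctuation.
TWEET_TOKEN = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split tweet text into tokens, keeping URLs, mentions and hashtags whole."""
    return TWEET_TOKEN.findall(text)


tokens = tokenize("Watching #bbcqt with @bbcquestiontime! http://t.co/abc")
```

A tokenizer written for newswire would typically split "#bbcqt" into "#" and "bbcqt", losing the signal that it is a hashtag.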
Named Entity Recognition
● Approaches
● Gazetteer lookup
● JAPE grammars
● Co-reference
● Types
● Location: countries, regions, cities etc.
● Organisation: names of companies, government
organisations, committees, agencies, universities, etc.
● Person: names of people
● Date: absolute dates like ‘October 2012’ or ‘2007’, as well
as relative dates, such as ‘last year’.
● Measurements: e.g. “8,596 km”, “one fifth”, percentages
and probabilities
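
Gazetteer lookup, the first approach listed above, amounts to matching token sequences against lists of known names. The following Python sketch shows a longest-match lookup over pre-tokenised text. It does not use GATE or JAPE; the entries, type labels and function names are invented for illustration.

```python
def build_gazetteer(entries):
    """Index phrases by lowercased token tuple; also return the longest phrase length."""
    gazetteer = {tuple(phrase.lower().split()): etype for phrase, etype in entries.items()}
    max_len = max((len(key) for key in gazetteer), default=0)
    return gazetteer, max_len


def lookup(tokens, gazetteer, max_len):
    """Return (start, end, type) spans, preferring the longest match at each position."""
    annotations = []
    i = 0
    while i < len(tokens):
        match = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(token.lower() for token in tokens[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            annotations.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return annotations


# Invented gazetteer entries, for illustration only.
gazetteer, max_len = build_gazetteer({
    "Ken Clarke": "Person",
    "Labour": "Organisation",
})
spans = lookup("Ken Clarke says Labour plotters".split(), gazetteer, max_len)
```

Preferring the longest match stops a multi-word name from being annotated as several shorter entities.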
Enrichment: LODIE
● Under constant development in various projects
● Associates the most probable LOD URI with
named entities
● Disambiguation against DBpedia
● Various techniques to enhance recall
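
The idea of associating the most probable URI with a mention can be illustrated with a toy scorer. The Python sketch below ranks candidate URIs by how many of their keywords appear in the surrounding text. It is a crude stand-in and not LODIE's actual method; the candidate URIs are shown for illustration, and the keyword sets are invented.

```python
def disambiguate(context, candidates):
    """Return the candidate URI whose keyword set overlaps most with the context.

    Ties go to the candidate listed first.
    """
    words = {word.strip(".,!?:;\"'").lower() for word in context.split()}
    best_uri, best_score = None, -1
    for uri, keywords in candidates.items():
        score = len(words & keywords)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


# Hypothetical candidates for the mention "Labour"; keyword sets are invented.
candidates = {
    "http://dbpedia.org/resource/Labour_Party_(UK)": {"votes", "party", "election", "mp"},
    "http://dbpedia.org/resource/Labour_economics": {"wages", "employment", "market"},
}
uri = disambiguate("Hain just lost Labour votes by supporting the benefits", candidates)
```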
Enrichment: LODIE
“Ken Clarke: The Labour plotters hide behind the
knife and stab with the cloak! Brilliant!!”
“Hain just lost Labour votes by supporting the
£25k benefits of an extremist.”
Representing Extracted Information
Conceptualising a Question
http://www.youtube.com/watch?v=O3l9Mi-KylI
Show Me The Data!
• Use (Linked) Open Datasets
• Crime Data
• Election Data (constituencies, majorities, etc.)
• MP voting records
• School league tables
• NHS performance league tables
• Economic Figures (GDP, Inflation, Unemployment)
• Compare and contrast
Let’s have some
questions from
our audience.
