“Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
EXTRA and EXTRA+
Stuart Myles * Associated Press * 6th November 2017
© 2017 IPTC (www.iptc.org) All rights reserved
https://flic.kr/p/kAXGfC
Rules-Based Classification
• Rules better for breaking news than statistical methods
– You don’t need 50 examples before you can start tagging
– A rule for a new topic doesn’t require other rules to change
• More consistent and scalable than hand tagging
• Easier to explain why rules classify content
– Machine learning methods are still “black boxes”
– Easier to precisely explain - and correct - mistakes
• You can use your own taxonomy, rules and formats
– Example rules help us drive development of the EXTRA system
– You can use the example rules to see how to develop your own
– Rules could apply IPTC Media Topics or any other taxonomy
EXTRA
EXTraction Rules Apparatus
Rules-based classification of text
Open source software
EXTRA was developed by the IPTC
€50,000 Grant from the Digital News Initiative
https://www.digitalnewsinitiative.com/fund/
https://iptc.github.io/extra/
Development Process
The EXTRA software is being developed by Infalia
- All software is open source
Two linguists creating rules in English and German
- Sample rules to apply IPTC Media Topics
Example news corpora licensed for EXTRA
- English from Thomson Reuters
- German from APA
EXTRA Components
• Elasticsearch Percolator + Custom Code
• Classification
• Rule authoring
• Corpus Testing
• Schema Management
Classification using Percolator
• Elasticsearch
– A sophisticated, open source full-text search engine
– Lets you query documents stored in an index
• Elasticsearch Percolator
– Store queries in an index and match documents to queries
– Classification uses the percolator to match documents to rules
• EXTRA Rule Language
– Rule-writer-friendly language (easier than ES DSL)
– Access to all ES features, plus custom operators
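
The percolator idea on this slide can be sketched in Python without a running cluster. The request bodies follow the general shape of Elasticsearch's `percolator` field type and `percolate` query, but the exact nesting of the mapping varies by Elasticsearch version. The field names `query`, `body` and `topic`, and the toy in-memory matcher, are assumptions made for illustration; this is not EXTRA's actual code.

```python
# Sketch only: request bodies for Elasticsearch's percolator, plus a toy
# in-memory stand-in so the idea can be run without a cluster.
# Field names ("query", "body", "topic") are illustrative assumptions.

# Mapping fragment: the "query" field holds stored queries (the rules).
rules_mapping = {
    "properties": {
        "query": {"type": "percolator"},
        "body": {"type": "text"},
        "topic": {"type": "keyword"},
    }
}

# A stored rule: an ordinary query, indexed as a document.
rule_doc = {
    "topic": "politics",
    "query": {"match_phrase": {"body": "angela merkel"}},
}


def percolate_request(document):
    """Build a search body asking which stored queries match `document`."""
    return {"query": {"percolate": {"field": "query", "document": document}}}


def toy_percolate(rules, document):
    """Toy stand-in: supports only match_phrase, as a lowercase substring test."""
    matched = []
    for rule in rules:
        for field, phrase in rule["query"]["match_phrase"].items():
            if phrase.lower() in document.get(field, "").lower():
                matched.append(rule["topic"])
    return matched


story = {"body": "Angela Merkel spoke in Berlin on Monday."}
request_body = percolate_request(story)
topics = toy_percolate([rule_doc], story)
```

Note the inversion compared with ordinary search: the rules are what gets stored, and each incoming story is the "query" run against them.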
Schema and Rules
• EXTRA Schema
– Documents must be in (or converted to) a JSON format
– But it can be any JSON format you choose
– Allows validating that your rules reference valid fields
• Granular, field-by-field control of analyzers
– Such as whether and how to stem, e.g. by language
– Different ways to tokenize fields, e.g. for slug
– Allow a field to be queried as a whole or tokenized by sentence or paragraph
– Allows validating that operators are valid by field type
• E.g. to flag that your rule references paragraphs in a field that has none
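
A minimal sketch of the two validation checks described on this slide: that a rule references a known field, and that the unit it asks for is supported by that field. The schema layout, the rule representation and the function name `validate_rule` are hypothetical; EXTRA's real schema format may look quite different.

```python
# Sketch only: a hypothetical schema format and rule representation,
# not EXTRA's real ones.

# Each field declares which query units it supports.
SCHEMA = {
    "headline": {"units": {"whole"}},
    "body": {"units": {"whole", "paragraph"}},
}


def validate_rule(rule, schema=SCHEMA):
    """Return a list of problems found in a rule; an empty list means valid."""
    problems = []
    field = rule["field"]
    unit = rule.get("unit", "whole")
    if field not in schema:
        problems.append(f"unknown field: {field}")
    elif unit not in schema[field]["units"]:
        problems.append(f"field {field} cannot be queried by {unit}")
    return problems


ok = validate_rule({"field": "body", "unit": "paragraph"})
bad_unit = validate_rule({"field": "headline", "unit": "paragraph"})
bad_field = validate_rule({"field": "byline"})
```

Checking rules against the schema at authoring time means a mistake such as a paragraph query on `headline` is reported to the rule writer rather than silently never matching.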
Schema and Rules Example
• Two fields - headline and body - with body allowed to be queried by paragraph
headline
body
body_paragraph
• A rule to require that “angela merkel” and “us elections” appear in the same paragraph
(prox/unit=paragraph/distance=1
(body adj "angela merkel")
(body adj "us elections")
)
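
What the rule above asks for can be sketched on plain text in Python. This is an illustration of the "same paragraph" requirement only, not of how EXTRA or Elasticsearch evaluates the `prox` operator. It assumes paragraphs are separated by blank lines and uses a case-insensitive substring test in place of a real analyzer; the function name `same_paragraph` is hypothetical.

```python
# Sketch only: evaluates "both phrases in the same paragraph" on plain text.
# Paragraphs are assumed to be separated by blank lines; matching is a
# case-insensitive substring test, far simpler than a real analyzer.
import re


def same_paragraph(body, phrases):
    """True if at least one paragraph contains every phrase."""
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    return any(
        all(phrase.lower() in para.lower() for phrase in phrases)
        for para in paragraphs
    )


phrases = ["angela merkel", "us elections"]
together = "Angela Merkel commented on the US elections.\n\nMarkets rose."
apart = "Angela Merkel spoke in Berlin.\n\nThe US elections dominated news."

hit = same_paragraph(together, phrases)
miss = same_paragraph(apart, phrases)
```

The second story mentions both phrases but in different paragraphs, so a paragraph-level rule rejects it where a whole-field rule would accept it.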
EXTRA Source Code
• The core classification engine
– CQL parsers, CQL-to-ES mapper, rule schema dict classes, DAO classes, etc.
https://github.com/iptc/extra-core
• EXTRA “extra” code
– API, UI, docker files for deployment
https://github.com/iptc/extra-ext
• Open source
– MIT license for EXTRA-specific code
– Apache license for Elasticsearch
EXTRA Timetable
• EXTRA was completed in Summer 2017
• You can access the source code now
– Feedback welcome
• We have applied for a second round of funding: EXTRA+
• Join the (low frequency) email list to stay up-to-date
https://groups.yahoo.com/neo/groups/iptc-extra/info
EXTRA+
Enriching Rule-based Classification of News
with Powerful Semantics
• “aboutness” evaluation
– Given that a story is about a topic, how much is it about it?
• Rule suggestion
– Suggest rules based on a pre-tagged corpus
• Enriched rule operators
– For example, nested “count” operators
Date and Place of Next Meeting
Athens 23rd – 25th April 2018
https://flic.kr/p/atFSAr
Thank you and goodbye!!

IPTC EXTRA and EXTRA+ November 2017

  • 1. “Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
  • 2. EXTRA and EXTRA+ Stuart Myles * Associated Press * 6th November 2017 © 2017 IPTC (www.iptc.org) All rights reserved https://flic.kr/p/kAXGfC
  • 3. Rules-Based Classification
      • Rules are better for breaking news than statistical methods
        – You don’t need 50 examples before you can start tagging
        – A rule for a new topic doesn’t require other rules to change
      • More consistent and scalable than hand tagging
      • Easier to explain why rules classify content
        – Machine learning methods are still “black boxes”
        – Easier to precisely explain - and correct - mistakes
      • You can use your own taxonomy, rules and formats
        – Example rules help us drive development of the EXTRA system
        – You can use the example rules to see how to develop your own
        – Rules could apply IPTC Media Topics or any other taxonomy
  • 4. EXTRA
      • EXTraction Rules Apparatus
      • Rules-based classification of text
      • Open source software
      • EXTRA was being developed by the IPTC
      • €50,000 Grant from the Digital News Initiative
        https://www.digitalnewsinitiative.com/fund/
      • https://iptc.github.io/extra/
  • 5. Development Process
      • The EXTRA software is being developed by Infalia
        – All software is open source
      • Two linguists creating rules in English and German
        – Sample rules to apply IPTC Media Topics
      • Example news corpora licensed for EXTRA
        – English from Thomson Reuters
        – German from APA
  • 7. Classification using Percolator
      • Elasticsearch
        – A sophisticated, open source full-text search engine
        – Lets you query documents stored in an index
      • Elasticsearch Percolator
        – Store queries in an index and match documents to queries
        – Classification uses the percolator to match documents to rules
      • EXTRA Rule Language
        – Rule-writer-friendly language (easier than ES DSL)
        – Access to all ES features, plus custom operators
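
The percolator idea on slide 7 (store queries, then match incoming documents against them) can be sketched as the JSON bodies one would send to Elasticsearch. This is a minimal illustration, not EXTRA's actual code: the index name `rules`, the fields `topic` and `body`, and the helper function names are assumptions made for the example, and the mapping uses the typeless shape of Elasticsearch 7 and later. Only the `percolator` field type and the `percolate` query are taken from Elasticsearch itself.

```python
import json

# Hypothetical index name, used only for illustration.
RULES_INDEX = "rules"

# Mapping for the rules index: the "query" field stores a query
# (type "percolator"); "body" must be mapped too, because stored
# queries refer to it. Typeless mapping shape (Elasticsearch 7+).
RULES_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    }
}


def rule_document(topic, phrase):
    """Build a stored rule: a topic label plus the query that detects it.

    "topic" is a hypothetical field; "match_phrase" is a standard
    Elasticsearch query type.
    """
    return {"topic": topic, "query": {"match_phrase": {"body": phrase}}}


def percolate_request(document):
    """Build a search body asking which stored queries match `document`."""
    return {"query": {"percolate": {"field": "query", "document": document}}}


if __name__ == "__main__":
    rule = rule_document("politics/elections", "us elections")
    story = {"body": "Angela Merkel commented on the US elections."}
    # These bodies would be sent to the rules index, e.g. with an HTTP
    # client; here they are only printed.
    print(json.dumps(RULES_MAPPING, indent=2))
    print(json.dumps(rule, indent=2))
    print(json.dumps(percolate_request(story), indent=2))
```

Each hit returned for the percolate request is a stored rule whose query matched the story, so the `topic` values of the hits act as the classification result.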
  • 8. Schema and Rules
      • EXTRA Schema
        – Documents must be in (or converted to) a JSON format
        – But it can be any JSON format you choose
        – Allows validating that your rules reference valid fields
      • Granular, field-by-field control of analyzers
        – Such as whether and how to stem, e.g. by language
        – Different ways to tokenize fields, e.g. for slug
        – Allow a field to be queried as a whole or tokenized by sentence or paragraph
        – Allows validating that operators are valid by field type
            • E.g. to flag that your rule references paragraphs in a field that has none
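
The two validation checks on slide 8 (a rule must reference a known field, and a unit such as paragraph must be available on that field) can be sketched in a few lines. This is a hypothetical model for illustration only: `FieldSpec`, `SCHEMA` and `validate_reference` are invented names, not part of the EXTRA code base, and the real schema format is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one schema field."""
    name: str
    units: frozenset = frozenset()  # text units the field can be queried by


# Example schema: only "body" is tokenized by paragraph.
SCHEMA = {
    "headline": FieldSpec("headline"),
    "body": FieldSpec("body", frozenset({"paragraph"})),
}


def validate_reference(schema, field_name, unit=None):
    """Return a list of problems with a rule's reference to a field.

    An empty list means the reference is valid for this schema.
    """
    spec = schema.get(field_name)
    if spec is None:
        return [f"unknown field: {field_name}"]
    if unit is not None and unit not in spec.units:
        return [f"field {field_name} cannot be queried by {unit}"]
    return []


if __name__ == "__main__":
    print(validate_reference(SCHEMA, "body", "paragraph"))
    print(validate_reference(SCHEMA, "headline", "paragraph"))
    print(validate_reference(SCHEMA, "slug"))
```

Running checks like these when a rule is saved, rather than when it is executed, is what lets a rule author see the mistake immediately.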
  • 9. Schema and Rules Example
      • Two fields, headline and body, with body allowed to be queried by paragraph
        – headline
        – body
        – body_paragraph
      • A rule to require that “angela merkel” and “us elections” appear in the same paragraph
          (prox/unit=paragraph/distance=1
            (body adj "angela merkel")
            (body adj "us elections")
          )
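
What the rule on slide 9 is described as checking (both phrases in the same paragraph of the body) can be illustrated with a small stand-alone function. This is a simplified sketch, not how EXTRA evaluates rules: it assumes paragraphs are separated by blank lines and uses case-insensitive substring matching instead of analyzed tokens. The function names are invented for the example.

```python
import re


def paragraphs(text):
    """Split text into paragraphs, assuming blank lines separate them."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def phrases_in_same_paragraph(body, phrases):
    """True if at least one paragraph contains every phrase.

    Matching is a case-insensitive substring test, a simplification of
    the tokenized phrase matching a search engine would perform.
    """
    wanted = [phrase.lower() for phrase in phrases]
    return any(
        all(phrase in paragraph.lower() for phrase in wanted)
        for paragraph in paragraphs(body)
    )


if __name__ == "__main__":
    together = "Angela Merkel commented on the US elections today.\n\nMarkets were calm."
    apart = "Angela Merkel spoke in Berlin.\n\nThe US elections dominated headlines."
    rule_phrases = ["angela merkel", "us elections"]
    print(phrases_in_same_paragraph(together, rule_phrases))
    print(phrases_in_same_paragraph(apart, rule_phrases))
```

The second story mentions both phrases but in different paragraphs, which is the case a plain whole-field query would not distinguish from the first.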
  • 10. EXTRA Source Code
      • The core classification engine
        – CQL parsers, CQL-to-ES mapper, rule schema dict classes, DAO classes, etc.
          https://github.com/iptc/extra-core
      • EXTRA “extra” code
        – API, UI, Docker files for deployment
          https://github.com/iptc/extra-ext
      • Open source
        – MIT license for EXTRA-specific code
        – Apache license for Elasticsearch
  • 11. EXTRA Timetable
      • EXTRA was completed in Summer 2017
      • You can access the source code now
        – Feedback welcome
      • We have applied for a second round of funding: EXTRA+
      • Join the (low frequency) email list to stay up-to-date
        https://groups.yahoo.com/neo/groups/iptc-extra/info
  • 12. EXTRA+: Enriching Rule-based Classification of News with Powerful Semantics
      • “aboutness” evaluation
        – Given that a story is about a topic, how much is it about it?
      • Rule suggestion
        – Suggest rules based on a pre-tagged corpus
      • Enriched rule operators
        – For example, nested “count” operators
  • 13. Date and Place of Next Meeting
      • Athens, 23rd – 25th April 2018
      • https://flic.kr/p/atFSAr
      • Thank you and goodbye!!