Legislative document content extraction based
on Semantic Web technologies
A use case about processing the History of the Law in Chile
Francisco Cifuentes Silva
Library of Congress, Chile
PhD Student
WESO research group
Jose Emilio Labra Gayo
WESO research group
University of Oviedo, Spain
Chilean Library of Congress
In Spanish: BCN (Biblioteca del Congreso Nacional de Chile)
Political powers
Executive, Judiciary, Legislative
Independent body inside the Legislative power
Advises the parliament and provides services to citizens
http://www.bcn.cl
Two projects at the Library of Congress (BCN)
History of the Law
Parliamentary work
History of the Law (LeyChile)
Collect all documents generated during the legislative process of a law
Phases:
An initiative starts life as a draft bill
Subject to debates
Period in force (once it is published)
Modifications, additions,...
Repeal
Goal:
Capture the spirit of the law
Traceability
https://www.bcn.cl/historiadelaley
Parliamentary work
Collect all legislative activity by each Member of Parliament
Retrieve all interventions made
Parliamentary motion
Session journal
Commission report
Ordered and categorised
https://www.bcn.cl/laborparlamentaria/
Both projects adopted semantic technologies
Some initial reasons:
Semantic technologies considered one pillar of strategic plan (in 2014)
Innovative action to generate new products
Improve interoperability mechanisms
Sem. Web aligned well with open & public data
Which semantic technologies?
Text mining and content enrichment
Entity extraction
Topic identification
Automatic markup
Classification
Machine readable info
XML & URIs
RDF
Ontologies
Linked Open Data
Workflow pipelines
3 main steps
Automatic XML Marker
RDF & Linked data generation
Content delivery
Linked Open Data
Query DB
Workflow overview
National library
Legislative documents
• Paper (requires OCR)
• Text documents
Automatic XML marker
SVN repository
Akoma-Ntoso
XML editor & tools
Publishing (RDF extraction from Akoma-Ntoso)
Services layer
Content portals
Automatic XML marker
Source: Text
Target: XML following Akoma-Ntoso
Automatic XML marker
Pipeline diagram (4 phases): Text -> Named Entity Recognizer -> (Entity, Type) -> Mediator (with Legal Knowledge Base) -> (Entity, Type, URI) -> Structural marker -> Internal XML representation -> Converter -> AKN XML
1. Named Entity Recognizer
Detection of entities & types of entities
Web service implementing the Stanford NER with a CRF classifier
Evaluation in production: detects 97% of entities
Type          Some examples                                          # of entities
Person        Salvador Allende, Sebastián Piñera                     5,139
Organization  Ministerio de Salud, SERNATUR                          2,848
Location      Valparaíso, Santiago de Chile                          1,251
Document      Ley 20.000, Diario de sesión nº 12                     732,497
Role          Senador, Diputado, Alcalde                             428
Events        Nacimiento de Eduardo Frei, Sesión Nº 23               14,389
Law           Boletín 11536-04, Prohíbe fumar en espacios cerrados   12,737
Dates         27 de febrero de 2010, el próximo año, ...             20,632
Diagram: Text -> Named Entity Recognizer -> (Entity, Type)
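To make the recognizer's input and output concrete, here is a minimal sketch that returns (surface form, type) pairs for two of the entity types listed above. It is a pattern-based stand-in, not the Stanford NER / CRF classifier used in production; the patterns, the `recognize` function and the sample sentence are assumptions made for illustration only.

```python
import re

# Hypothetical patterns for two entity types; the production service
# uses a trained CRF classifier instead of hand-written rules.
MONTHS = ("enero|febrero|marzo|abril|mayo|junio|julio|agosto|"
          "septiembre|octubre|noviembre|diciembre")
PATTERNS = [
    ("Document", re.compile(r"\bLey\s+\d{1,3}(?:\.\d{3})*\b")),
    ("Date", re.compile(r"\b\d{1,2} de (?:" + MONTHS + r") de \d{4}\b")),
]


def recognize(text):
    """Return (surface form, entity type) pairs in order of appearance."""
    found = []
    for entity_type, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(0), entity_type))
    found.sort()  # order by position in the text
    return [(surface, entity_type) for _, surface, entity_type in found]


if __name__ == "__main__":
    print(recognize("La Ley 20.000 fue publicada el 27 de febrero de 2010."))
```

The pairs produced here are the shape of data that the next phase (the Mediator) would receive.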
2. Mediator
Entity linking and disambiguation
Text similarity algorithms
Based on Apache Lucene
In-house development
- Use of context information to narrow list of candidates
- Custom filters and association heuristics
- Specialized web services
Diagram: (Entity, Type) + Text -> Mediator (with Legal Knowledge Base) -> (Entity, Type, URI)
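The linking step can be sketched roughly as follows: narrow the candidate list using context (here, only the entity type), score the remaining candidates by text similarity, and accept the best one above a threshold. This sketch uses Python's `difflib` rather than Apache Lucene, and the knowledge base entries, the `example.org` URIs, the `link_entity` function and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base; real entries and URIs live in the BCN
# Legal Knowledge Base.
KNOWLEDGE_BASE = [
    {"uri": "http://example.org/persona/salvador-allende",
     "label": "Salvador Allende", "type": "Person"},
    {"uri": "http://example.org/persona/sebastian-pinera",
     "label": "Sebastián Piñera", "type": "Person"},
    {"uri": "http://example.org/organizacion/fundacion-salvador-allende",
     "label": "Fundación Salvador Allende", "type": "Organization"},
]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(surface, entity_type, kb=KNOWLEDGE_BASE, threshold=0.8):
    """Return the URI of the best-matching entry, or None if no match."""
    # Context narrowing: only consider candidates of the detected type.
    candidates = [entry for entry in kb if entry["type"] == entity_type]
    best_uri, best_score = None, 0.0
    for entry in candidates:
        score = similarity(surface, entry["label"])
        if score > best_score:
            best_uri, best_score = entry["uri"], score
    return best_uri if best_score >= threshold else None


if __name__ == "__main__":
    print(link_entity("Sebastian Piñera", "Person"))
```

Filtering by type before scoring keeps a person mention from being linked to an organization that happens to contain the same name.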
3. Structural marker
Detect structures in the text
Titles, subtitles, paragraphs, sections,...
Special structure for debates: participation
Regular expressions + custom rules
Diagram: (Entity, Type, URI) + Text -> Structural marker -> Internal XML representation
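A minimal sketch of the regular-expression approach, assuming one structural unit per line: each line is tested against an ordered list of rules and falls back to a plain paragraph. The rule set, the labels and the sample lines are illustrative assumptions, not the rules used at BCN.

```python
import re

# Hypothetical rules, tried in order; the first match wins.
RULES = [
    ("title", re.compile(r"^T[IÍ]TULO\s+[IVXLC]+\b", re.IGNORECASE)),
    ("article", re.compile(r"^Art[ií]culo\s+\d+", re.IGNORECASE)),
    # A debate participation: speaker introduction ending in ".-"
    ("participation",
     re.compile(r"^(?:El señor|La señora)\s+[A-ZÁÉÍÓÚÑ]+\b.*?\.-")),
]


def mark(lines):
    """Label each non-empty line with the structure it opens."""
    marked = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        for label, pattern in RULES:
            if pattern.match(stripped):
                marked.append((label, stripped))
                break
        else:
            # No rule matched: treat the line as an ordinary paragraph.
            marked.append(("paragraph", stripped))
    return marked


if __name__ == "__main__":
    sample = [
        "TÍTULO I",
        "Artículo 1.- Se prohíbe fumar en espacios cerrados.",
        "El señor PÉREZ (Presidente).- Tiene la palabra.",
        "Texto libre.",
    ]
    for label, text in mark(sample):
        print(label, "|", text)
```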
4. XML converter to Akoma-Ntoso
Programmatic approach
Internal XML representation similar to DOM
Each node converted to text in AKN-XML
Diagram: Internal XML representation -> Converter -> AKN XML
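The programmatic conversion can be pictured as a recursive walk over a DOM-like tree in which every node is serialized to an XML element. The sketch below assumes a hypothetical `Node` class and a much simplified tag mapping; it omits the Akoma-Ntoso namespace and required wrapper elements, so its output is not schema-valid AKN.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Node:
    """Hypothetical node of the internal, DOM-like representation."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Simplified, assumed mapping from internal node kinds to AKN tag names.
TAG_FOR_KIND = {
    "document": "akomaNtoso",
    "article": "article",
    "heading": "heading",
    "paragraph": "p",
}


def to_element(node):
    """Convert one node (and, recursively, its children) to an element."""
    elem = ET.Element(TAG_FOR_KIND[node.kind])
    if node.text:
        elem.text = node.text
    for child in node.children:
        elem.append(to_element(child))
    return elem


def to_akn_xml(root):
    """Serialize the whole tree to an XML string."""
    return ET.tostring(to_element(root), encoding="unicode")


if __name__ == "__main__":
    doc = Node("document", children=[
        Node("article", children=[
            Node("heading", "Artículo 1"),
            Node("paragraph", "Se prohíbe fumar."),
        ]),
    ])
    print(to_akn_xml(doc))
```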
Human editing of AKN documents
Quality assurance by human analysts
They review the generated XML documents
2 editors:
Ad-hoc XML editor
Commercial editor: LegisPro (Xcential)
Linked data generation
The pilot project (2011) carefully defined a stable URI model
URIs have been maintained since then
URIs = IDs in the whole system
URIs are dereferenceable
Content negotiation
Custom linked data browser
Documentation (in Spanish)
http://datos.bcn.cl/es/documentacion
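Content negotiation for a dereferenceable URI can be sketched as a server-side choice among the representations available for a resource, driven by the client's `Accept` header. The function below is a simplified illustration: the `negotiate` name and the list of media types are assumptions, and partial wildcards such as `text/*` are ignored.

```python
def negotiate(accept_header, available):
    """Pick the media type to serve, or None if nothing is acceptable.

    `available` lists the media types the server can produce, with the
    default representation first.
    """
    preferences = []
    for part in accept_header.split(","):
        piece = part.strip()
        if not piece:
            continue
        media, _, params = piece.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences.append((quality, media.strip()))
    # Highest weight first; the sort is stable, so ties keep header order.
    preferences.sort(key=lambda pref: -pref[0])
    for quality, media in preferences:
        if quality <= 0:
            continue
        if media in available:
            return media
        if media == "*/*":
            return available[0]
    return None


if __name__ == "__main__":
    formats = ["text/html", "application/rdf+xml", "text/turtle"]
    print(negotiate("text/turtle;q=0.9, text/html;q=0.5", formats))
```

With this kind of scheme a browser asking for HTML would be sent to a human-readable page, while an RDF client asking for Turtle would receive data for the same URI.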
AKN2RDF
RDF extraction from Akoma-Ntoso XML
● Custom-made converter (XSL discarded for perceived complexity)
● Each XML tag implemented in one Class
● Extracted data saved into multiple databases (Relational and RDF)
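One way to read "each XML tag implemented in one class" is a registry of small handler classes, each of which knows how to turn one tag into RDF statements. The sketch below follows that reading with plain tuples as triples; the handler classes, the `example.org` vocabulary and the sample XML are hypothetical and do not reproduce the actual AKN2RDF converter.

```python
import xml.etree.ElementTree as ET

VOCAB = "http://example.org/vocab/"  # hypothetical vocabulary namespace
DC_TITLE = "http://purl.org/dc/terms/title"


class HeadingHandler:
    """Turns a <heading> element into a title statement."""
    tag = "heading"

    def triples(self, elem, subject):
        return [(subject, DC_TITLE, elem.text or "")]


class SpeechHandler:
    """Turns a <speech> element into a speaker statement."""
    tag = "speech"

    def triples(self, elem, subject):
        return [(subject, VOCAB + "hasSpeaker", elem.get("by", ""))]


# Registry: one handler instance per supported tag name.
HANDLERS = {h.tag: h for h in (HeadingHandler(), SpeechHandler())}


def extract(xml_text, document_uri):
    """Walk the document and collect triples from every handled tag."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root.iter():
        handler = HANDLERS.get(elem.tag)
        if handler is not None:
            triples.extend(handler.triples(elem, document_uri))
    return triples


if __name__ == "__main__":
    sample = ('<debate><heading>Sesión 23</heading>'
              '<speech by="#perez"><p>Tiene la palabra.</p></speech></debate>')
    for triple in extract(sample, "http://example.org/recurso/doc/1"):
        print(triple)
```

Supporting a new tag then means adding one class to the registry, without touching the traversal code.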
Linked data generation
Source: AKN XML documents
Linked data browser (WESO-DESH)
Target: RDF data
http://datos.bcn.cl/recurso/cl/documento/579095/
http://datos.bcn.cl/recurso/cl/documento/579095.xml
SPARQL endpoint
RDF triples are published as a public SPARQL endpoint
Number of norms by municipality
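A query such as "number of norms by municipality" could look roughly like the sketch below, which builds a SPARQL protocol GET URL without sending it. The endpoint address, the `ex:` vocabulary and the class and property names are placeholders; the real BCN ontology terms are described in the documentation and are not reproduced here.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder, not the BCN endpoint

# Hypothetical vocabulary: ex:Norm, ex:Municipality and ex:appliesTo are
# illustrative names only.
QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?municipality (COUNT(?norm) AS ?norms)
WHERE {
  ?norm a ex:Norm ;
        ex:appliesTo ?municipality .
  ?municipality a ex:Municipality .
}
GROUP BY ?municipality
ORDER BY DESC(?norms)
"""


def build_request_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
```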
Content delivery
Web portals using Open Source Technologies
CMS (Typo3)
Python/Java
Varnish
Apache Lucene
REST Web service layers which connect to RDF triplestore and DB
Data exports to PDF, Doc and XML formats
URIs of parliamentary profiles = URIs in triplestore
History of the Law portal
https://www.bcn.cl/historiadelaley
Links to Members of Parliament
Each article has a link
Different versions of a law
History of the Law portal
https://www.bcn.cl/historiadelaley
Compare different versions
Parliamentary Work
https://www.bcn.cl/laborparlamentaria
Show participation of each Member of Parliament
Some experimental visualizations
Relationships between laws
Historical Parliament
Parliamentary genealogy (family relationships)
Regions mentioned in laws (legislative hackathon)
Links between laws
Historical parliament
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/
Parliamentary genealogy
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/consulta.jsp
Regions mentioned by law
Result of a legislative hackathon
http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html
In 2010 there was an earthquake in the Biobío region
Some statistics
24,368 documents (Nov. 2018)
Number of RDF triples: 28 million
According to Google analytics
Average browsing time: 2min 26s
Visits received 331,481 (nov. 2016-2017)  476,241 (nov. 2016-2017)
And some findings...
Question: why are there some valleys?
Dictatorship time
Session attendance by year
RDF triples generated by year
Some lessons learnt
RDF granularity & inference trade-off
RDF statements + inference (high running times...queries that didn't terminate)
A priori inferred triples added to triple store (high response times for large docs)
Small subset of RDF triples (structural parts of docs and metadata)
Performance problems in XML editor browsing long docs (>1000pages)
Low SPARQL endpoint usage by external apps
If we could start again, I would recommend ShEx
Personal note: This kind of data portal led to my interest in ShEx
Conclusions & future projects
Well designed URIs can act as a perfect glue for interoperability
Automatic workflow pipelines help long-term survival of LD-based projects
SPARQL endpoint since 2011
Future projects on top of existing ones
National Budget as Linked data
Diana Project: Members of Parliament linked to social network analysis
New portal: User customization & recommender systems
End of presentation
Acknowledgements:
David Vilches, Eridan Otto, Christian Sifaqui

Legislative Document Processing Using Semantic Web Technologies

  • 1. Legislative document content extraction based on Semantic Web technologies. A use case about processing the History of the Law at Chile. Francisco Cifuentes Silva (Library of Congress, Chile; PhD student, WESO research group) and Jose Emilio Labra Gayo (WESO research group, University of Oviedo, Spain).
  • 2. Chilean Library of Congress. In Spanish: BCN (Biblioteca del Congreso Nacional de Chile). Political powers: Executive, Judiciary, Legislative. Independent body inside the Legislative power. Advises the parliament and provides services to citizens. http://www.bcn.cl
  • 3. 2 projects at the Library of Congress (BCN): History of the Law; Parliamentary work.
  • 4. History of the Law (LeyChile). Collects all documents generated during the legislative process of a law. Phases: an initiative starts life as a draft bill; it is subject to debates; validity time (it is published); modifications, additions, ...; derogation. Goal: capture the spirit of the law; traceability. https://www.bcn.cl/historiadelaley
  • 5. Parliamentary work. Collects all legislative activity by each Member of Parliament. Retrieves all interventions made; parliamentary motion; session journal; commission report. Ordered and categorised. https://www.bcn.cl/laborparlamentaria/
  • 6. Both projects adopted semantic technologies. Some initial reasons: semantic technologies were considered one pillar of the strategic plan (in 2014); innovative action to generate new products; improve interoperability mechanisms; the Semantic Web aligned well with open & public data.
  • 7. Which semantic technologies? Text mining and content enrichment: entity extraction, topic identification, automatic markup, classification. Machine-readable info: XML & URIs, RDF, ontologies, Linked Open Data.
  • 8. Workflow pipelines. 3 main steps: automatic XML marker; RDF & Linked Data generation; content delivery.
  • 9. Workflow overview (diagram). National library; legislative documents (paper, which requires OCR, and text documents); automatic XML marker; SVN repository; Akoma-Ntoso; XML editor & tools; publishing (RDF extraction from Akoma-Ntoso); Linked Open Data; query DB; services layer; content portals.
  • 10. Automatic XML marker. Source: text. Target: XML following Akoma-Ntoso.
  • 11. Automatic XML marker: 4 phases (diagram). Text → Named Entity Recognizer → text, entity, type → Mediator (with Legal Knowledge Base) → text, entity, type, URI → Structural marker → internal XML representation → Converter → XML AKN.
  • 12. 1. Named Entity Recognizer. Detection of entities & types of entities. Web service implementing the Stanford NER with a CRF classifier. Evaluation in production: detects 97% of entities. Diagram: text → Named Entity Recognizer → text, entity, type.
      Type         | Some examples                                        | # of entities
      Person       | Salvador Allende, Sebastián Piñera                   | 5,139
      Organization | Ministerio de Salud, SERNATUR                        | 2,848
      Location     | Valparaíso, Santiago de Chile                        | 1,251
      Document     | Ley 20.000, Diario de sesión nº 12                   | 732,497
      Role         | Senador, Diputado, Alcalde                           | 428
      Events       | Nacimiento de Eduardo Frei, Sesión Nº 23             | 14,389
      Law          | Boletín 11536-04, Prohíbe fumar en espacios cerrados | 12,737
      Dates        | 27 de febrero de 2010, el próximo año, ...           | 20,632
  • 13. 2. Mediator. Entity linking and disambiguation. Text similarity algorithms, based on Apache Lucene. In-house development: use of context information to narrow the list of candidates; custom filters and association heuristics; specialized web services. Diagram: text, entity, type → Mediator (with Legal Knowledge Base) → text, entity, type, URI.
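The linking step in slide 13 can be illustrated with a minimal sketch. It assumes a tiny in-memory knowledge base with made-up `example.org` URIs, and it swaps Apache Lucene for the standard library's `difflib` similarity, so it shows only the general idea (filter candidates by entity type, then rank by text similarity), not the BCN implementation. The names `KNOWLEDGE_BASE`, `link_entity` and the `threshold` value are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: labels come from the slide examples,
# the URIs are invented for illustration.
KNOWLEDGE_BASE = [
    {"label": "Salvador Allende", "type": "Person", "uri": "http://example.org/persona/1"},
    {"label": "Ministerio de Salud", "type": "Organization", "uri": "http://example.org/organismo/1"},
    {"label": "Valparaíso", "type": "Location", "uri": "http://example.org/lugar/1"},
]


def link_entity(mention, entity_type, threshold=0.8):
    """Return the URI of the best-matching entry of the same type, or None."""
    # Use the type reported by the recognizer to narrow the candidate list.
    candidates = [e for e in KNOWLEDGE_BASE if e["type"] == entity_type]
    best_uri, best_score = None, 0.0
    for entry in candidates:
        score = SequenceMatcher(None, mention.lower(), entry["label"].lower()).ratio()
        if score > best_score:
            best_uri, best_score = entry["uri"], score
    # Reject weak matches instead of linking to the wrong entity.
    return best_uri if best_score >= threshold else None
```

A production mediator would also use the surrounding context, as the slide notes; here the entity type is the only context signal.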
  • 14. 3. Structural marker. Detects structures in the text: titles, subtitles, paragraphs, sections, ... Special structure for debates: participation. Regular expressions + custom rules. Diagram: text, entity, type, URI → Structural marker → internal XML representation.
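A rule-based structural marker of the kind slide 14 describes might look like the sketch below. The patterns are illustrative guesses at how titles, articles and debate participations could be written; the rules used in production are not given in the slides, and `RULES` and `mark_structure` are hypothetical names.

```python
import re

# Illustrative rules only: (label, pattern) pairs tried in order.
RULES = [
    ("title", re.compile(r"^T[ÍI]TULO\s+\w+", re.IGNORECASE)),
    ("article", re.compile(r"^Art[íi]culo\s+\d+", re.IGNORECASE)),
    # Assumed shape of a debate participation: "El señor NAME (role).- speech"
    ("participation", re.compile(r"^(El señor|La señora)\s+[A-ZÁÉÍÓÚÑ]+.*\.-")),
]


def mark_structure(lines):
    """Label each non-empty line with the first matching rule, else 'paragraph'."""
    marked = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        label = next((name for name, rx in RULES if rx.match(text)), "paragraph")
        marked.append((label, text))
    return marked
```

Ordering the rules from most to least specific keeps the fallback to `paragraph` as the last resort.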
  • 15. 4. XML converter to Akoma-Ntoso. Programmatic approach. Internal XML representation similar to DOM. Each node converted to text in AKN-XML. Diagram: internal XML representation → Converter → XML AKN.
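The node-by-node conversion in slide 15 can be sketched as a small tree whose nodes each serialize themselves to XML text. The `Node` class is a hypothetical stand-in for the internal DOM-like representation, and the element and attribute names in the example are illustrative rather than taken from BCN documents.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    """Hypothetical node of the internal, DOM-like representation."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)

    def to_akn(self):
        """Serialize this node and its children to XML text, depth first."""
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self.attrs.items())
        inner = escape(self.text) + "".join(c.to_akn() for c in self.children)
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


article = Node("article", {"eId": "art_1"}, children=[
    Node("num", text="Artículo 1"),
    Node("content", children=[Node("p", text="Se prohíbe fumar en espacios cerrados.")]),
])
print(article.to_akn())
```

Escaping text and attribute values at the node level keeps the output well-formed even when the source text contains characters such as `&` or `<`.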
  • 16. Human editing of AKN documents. Quality assurance by human analysts, who review the generated XML documents. 2 editors: an ad-hoc XML editor and a commercial editor, LegisPro (Xcential).
  • 17. Linked data generation. The pilot project (2011) carefully defined a stable URI model. URIs have been maintained since then. URIs = IDs in the whole system. URIs are dereferenceable. Content negotiation. Custom linked data browser. Documentation (in Spanish): http://datos.bcn.cl/es/documentacion
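Content negotiation, mentioned in slide 17, means that one URI can return different representations depending on the `Accept` header of the request. The sketch below only builds the requests and does not send them. The resource URI is adapted from the slide 19 example, and the media types actually offered by the server are an assumption; `negotiate` is a hypothetical helper.

```python
from urllib.request import Request

# Resource URI adapted from the slide 19 example.
RESOURCE = "http://datos.bcn.cl/recurso/cl/documento/579095"


def negotiate(uri, media_type):
    """Build a GET request asking for a given representation of the same URI."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, two representations (assumed media types).
html_request = negotiate(RESOURCE, "text/html")
rdf_request = negotiate(RESOURCE, "application/rdf+xml")
# urllib.request.urlopen(rdf_request) would send it; omitted to stay offline.
```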
  • 18. AKN2RDF. RDF extraction from Akoma-Ntoso XML. Custom-made converter (XSL discarded for perceived complexity). Each XML tag implemented in one class. Extracted data saved into multiple databases (relational and RDF).
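The "one class per XML tag" design in slide 18 could be organized roughly as follows. This is a sketch under stated assumptions: the namespace `EX`, the property names and the handler classes are hypothetical, XML namespaces are ignored for brevity, and triples are plain tuples rather than entries in a triplestore.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology namespace; the real BCN vocabulary is not shown here.
EX = "http://example.org/ontology#"


class TagHandler:
    """Base class: one subclass per XML tag."""
    tag = None

    def triples(self, subject, element):
        raise NotImplementedError


class DocTitleHandler(TagHandler):
    tag = "docTitle"

    def triples(self, subject, element):
        yield (subject, EX + "title", element.text.strip())


class PersonHandler(TagHandler):
    tag = "person"

    def triples(self, subject, element):
        # The entity URI is assumed to be stored in the refersTo attribute.
        yield (subject, EX + "mentions", element.get("refersTo"))


HANDLERS = {h.tag: h() for h in (DocTitleHandler, PersonHandler)}


def extract(xml_text, subject):
    """Walk the tree and let the handler registered for each tag emit triples."""
    triples = []
    for element in ET.fromstring(xml_text).iter():
        handler = HANDLERS.get(element.tag)
        if handler:
            triples.extend(handler.triples(subject, element))
    return triples
```

Supporting a new tag then means adding one class and registering it, without touching the tree walk.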
  • 19. Linked data generation. Source: AKN XML documents. Target: RDF data. Linked data browser (WESO-DESH). http://datos.bcn.cl/recurso/cl/documento/579095/ http://datos.bcn.cl/recurso/cl/documento/579095.xml
  • 20. SPARQL endpoint. RDF triples are published through a public SPARQL endpoint. Example query: number of norms by municipality.
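A query such as "number of norms by municipality" (slide 20) is an aggregate over the published triples. The sketch below shows the general shape of such a query and how it is encoded as a GET request under the SPARQL 1.1 Protocol. The endpoint URL, the `ex:` prefix, and the class and property names are all placeholders, since the slide does not show the actual vocabulary.

```python
from urllib.parse import urlencode

# Placeholder endpoint and vocabulary; replace with the documented ones.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/ontology#>
SELECT ?municipality (COUNT(?norm) AS ?norms)
WHERE {
  ?norm a ex:Norm ;
        ex:municipality ?municipality .
}
GROUP BY ?municipality
ORDER BY DESC(?norms)
"""


def build_query_url(endpoint, query):
    """Encode a query as a GET request using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
```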
  • 21. Content delivery. Web portals using open source technologies: CMS (Typo3), Python/Java, Varnish, Apache Lucene. REST web service layers connect to the RDF triplestore and DB. Data exports to PDF, Doc and XML formats. URIs of parliamentary profiles = URIs in triplestore.
  • 22. History of the Law portal. https://www.bcn.cl/historiadelaley Links to Members of Parliament. Each article has a link. Different versions of a law.
  • 23. History of the Law portal. https://www.bcn.cl/historiadelaley Compare different versions.
  • 25. Some experimental visualizations: relationships between laws; historical Parliament; parliamentary genealogy (family relationships); regions mentioned in laws (legislative hackathon).
  • 29. Regions mentioned by law. Result of a legislative hackathon: http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html In 2010 there was an earthquake in the BioBio region.
  • 30. Some statistics. 24,368 documents (Nov. 2018). Number of RDF triples: 28 million. According to Google Analytics: average browsing time 2 min 26 s; visits received: 331,481 (Nov. 2016-2017), 476,241 (Nov. 2016-2017).
  • 31. And some findings... Charts: session attendance by year; RDF triples generated by year. Question: why are there some valleys? Dictatorship time.
  • 32. Some lessons learnt. RDF granularity & inference trade-off: RDF statements + inference (high running times... queries that didn't terminate); a priori inferred triples added to the triple store (high response times for large docs); small subset of RDF triples (structural parts of docs and metadata). Performance problems in the XML editor when browsing long docs (>1000 pages). Low SPARQL endpoint usage by external apps. If we could start again, I would recommend ShEx. Personal note: these kinds of data portals led to my interest in ShEx.
  • 33. Conclusions & future projects. Well-designed URIs can act as a perfect glue for interoperability. Automatic workflow pipelines help the long-term survival of LD-based projects. SPARQL endpoint since 2011. Future projects on top of existing ones: National Budget as Linked Data; Diana Project: Members of Parliament linked to social network analysis; new portal: user customization & recommender systems.
  • 34. End of presentation. Acknowledgements: David Vilches, Eridan Otto, Christian Sifaqui.