Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci & Achille Felicetti
(PIN, University of Florence, Italy)
EOSC-hub Week 2018
Malaga, 16/4/2018
EOSCpilot is a project funded by the EC H2020 programme
 Domain: Archaeology
 Goal: semantic enrichment of texts
 Archaeological documentation largely based on texts
◦ Excavation diaries, reports, surveys, grey literature
◦ Literary/historical sources, research articles, monographs …
◦ Huge number of small (<100 KB) files in different languages
 Registry of 2,000,000 archaeological datasets (70% texts) in ARIADNE
 ARIADNE’s data infrastructure popular among archaeologists
◦ ARIADNE users in 2016: 25-30% of the European research community
◦ Strong support by
 Professional associations (EAA, EAC) & national archaeological/cultural heritage authorities
 National research institutions (CNR, CNRS, CAS, ÖAW, KNAW, BAS, ATHENA RC, FORTH)
 International recognition (USA, Mexico, Japan, Argentina)
 Semantic enrichment needed for the cloud-based data infrastructure to be developed in ARIADNEplus
◦ Deeper integration between texts, databases, GIS etc.
◦ Advanced services & VREs for data-centric archaeological research
 Open-source NLP & NER engine
 Syntactic rules (tailored to specific writing style)
 Texts stating facts, not stories
◦ Data fuzziness, provenance, reliability, reasoning
 Domain ontology: CIDOC CRM (ISO 21127:2006)
◦ ... and not TEI
 Terminology
◦ Specialized vocabularies
 Terra sigillata is not just “sealed earth”
◦ Gazetteers for modern (Geonames) and ancient (Pleiades) place names
 Málaga (modern) vs Màlaka (Phoenician) vs Màlaca (Roman)
◦ Named time period management
 Bronze Age (∼ 3200-600 BC), Recent Orientalizing Period (∼ 630-570 BC)
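The gazetteer and named-period ideas above can be sketched in a few lines of Python. This is only an illustrative toy, not how TextCrowd, Geonames, Pleiades or PeriodO are implemented: the dictionaries, function names and the choice of negative years for BC dates are assumptions, and the year bounds are the approximate ones quoted on the slide.

```python
import unicodedata


def _key(name):
    # Normalize accents and case so that visually identical spellings match.
    return unicodedata.normalize("NFC", name).casefold()


# Hypothetical mini-gazetteer: spelling variant -> (modern name, tradition).
PLACE_VARIANTS = {
    _key(variant): value
    for variant, value in {
        "Málaga": ("Málaga", "modern"),
        "Màlaka": ("Málaga", "Phoenician"),
        "Màlaca": ("Málaga", "Roman"),
    }.items()
}

# Hypothetical named-period table: name -> (start, end), negative years = BC.
NAMED_PERIODS = {
    "Bronze Age": (-3200, -600),
    "Recent Orientalizing Period": (-630, -570),
}


def resolve_place(name):
    """Return (modern name, tradition) for a known variant, else None."""
    return PLACE_VARIANTS.get(_key(name))


def periods_containing(year):
    """Return the named periods whose approximate span includes the year."""
    return sorted(
        name for name, (start, end) in NAMED_PERIODS.items()
        if start <= year <= end
    )
```

For example, `resolve_place("Màlaka")` maps the Phoenician spelling to the modern place, and `periods_containing(-600)` returns both periods because their approximate spans overlap.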
 Modular framework based on GATE toolchain: https://gate.ac.uk
◦ Advanced stemming/lemmatization components
 OpenNLP (https://opennlp.apache.org): sentence segmentation and part-of-speech (POS) tagging
 OpeNER (http://www.opener-project.eu): neural network for advanced named entity recognition (NER), developed in the OpeNER FP7 project
◦ Machine learning framework for automatic training
 Annotated corpus required
 Ontology: CRMarcheo (CRM extension for archaeology)
 Vocabularies, gazetteers and terminological tools
◦ ICCD vocabularies for Italian archaeology, augmented with purpose-built term lists
◦ Geonames (modern places), Pleiades (historical places)
◦ Timespan and named period component based on PeriodO
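To make the modular design more concrete, here is a minimal sketch of a two-stage pipeline: sentence segmentation followed by vocabulary lookup. It is a pure-Python stand-in written for illustration and does not use GATE, OpenNLP or OpeNER. The splitter is a naive regular expression, and the vocabulary entries and their entity labels are hypothetical.

```python
import re

# Hypothetical vocabulary: term -> entity label (labels are illustrative only).
VOCABULARY = {
    "terra sigillata": "Artefact",
    "bronze": "Material",
}


def split_sentences(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tag_terms(sentence, vocabulary):
    """Return (start, end, surface, label) for each vocabulary term found."""
    found = []
    for term, label in vocabulary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append((match.start(), match.end(), match.group(0), label))
    return sorted(found)


def annotate(text, vocabulary=VOCABULARY):
    """Run both stages and return (sentence index, surface form, label)."""
    results = []
    for index, sentence in enumerate(split_sentences(text)):
        for _start, _end, surface, label in tag_terms(sentence, vocabulary):
            results.append((index, surface, label))
    return results
```

Matching the multi-word term as a whole is what keeps "terra sigillata" from being treated as two unrelated words; a trained NER model would replace the lookup stage in a real system.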
 TextCrowd detects:
◦ Artefacts
◦ Colours
◦ Materials
◦ Time periods
◦ Persons
◦ Places
◦ Sites
◦ Time spans
◦ Techniques
 Target output formats:
◦ Textual documents automatically annotated and enriched
◦ CIDOC CRM semantic triples (RDF)
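As a rough illustration of the RDF output format, the sketch below serializes detected entities as N-Triples typed with CIDOC CRM classes. The slides do not say how TextCrowd maps its entity types to CRM classes, so the mapping here is an assumption; the class identifiers follow older CRM releases and should be checked against the version in use. The `http://example.org/entity/` base URI and the function names are hypothetical.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumed mapping from detected entity types to CIDOC CRM classes.
CRM_CLASS = {
    "Artefact": "E22_Man-Made_Object",
    "Material": "E57_Material",
    "Person": "E21_Person",
    "Place": "E53_Place",
    "Site": "E27_Site",
    "Time span": "E52_Time-Span",
    "Time period": "E4_Period",
}


def escape_literal(value):
    """Escape the characters that N-Triples string literals require."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def to_ntriples(entities, base="http://example.org/entity/"):
    """Serialize (surface form, entity type) pairs as N-Triples lines."""
    lines = []
    for number, (surface, kind) in enumerate(entities, start=1):
        subject = f"<{base}{number}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{CRM}{CRM_CLASS[kind]}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(surface)}" .')
    return "\n".join(lines)
```

Each entity yields one type triple and one label triple; relations between entities (for instance an object and its material) would need CRM properties and are left out of this sketch.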
 No annotated text corpora in Italian available as training data for machine learning algorithms
◦ Manual annotation of 400 pages of Italian archaeology reports (< 1 person-month)
 Preparation and adaptation of vocabularies
 Availability of user-friendly cloud-based environments and of the tools needed to migrate the standalone prototype to the cloud
◦ Several cloud solutions tested in early development; limited support provided except in D4Science
◦ Implementation in the D4Science infrastructure, but portable to other cloud services if support and the required modules are available
 Authentication and Authorization
◦ No access control to metadata/data implemented so far
◦ Demonstrator focused on freely accessible textual documents
◦ Fasti Online (http://www.fastionline.org), an Open Access collection of archaeological reports, used as the text source
 Operated and maintained by CNR-ISTI on the D4Science platform (https://www.d4science.org)
 Modular engine based on GATE toolchain + OpenNLP-OpeNER modules, natively provided by D4Science
 Web-based user interface for
◦ User and access management
◦ Cloud storage (private and shared files)
◦ Results available for other Virtual Research Environments (VRE) within D4Science
 Released for open use, for tests & comments
 Interface kept deliberately plain, so that it can adapt to any look and feel
 Machine-readable results: RDF encoding produced
 Human-readable results: color-encoded text (for testing)
 Interoperability of extracted knowledge
◦ Semantic information in CRM format: full integration and interoperability with other archaeological semantic data (to be fully implemented in ARIADNEplus)
 Supporting FAIR Principles implementation
◦ Metadata to be stored in various registries for easy findability and accessibility
◦ Results ready to be reused within the same environment or consumed by other services and/or in different scenarios
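The human-readable, color-encoded output can be pictured with a small sketch that wraps annotated spans in colored HTML. This is not TextCrowd's actual rendering: the color palette, the HTML markup and the function name are assumptions, and annotations are taken to be non-overlapping (start, end, label) character offsets.

```python
import html

# Hypothetical palette: entity label -> background color.
COLOURS = {
    "Artefact": "#d9f2d0",
    "Material": "#ffe3b3",
    "Place": "#cfe8ff",
}


def colour_encode(text, annotations):
    """Wrap each annotated span in a colored <span>; escape everything else.

    annotations: non-overlapping (start, end, label) character offsets.
    """
    parts = []
    cursor = 0
    for start, end, label in sorted(annotations):
        parts.append(html.escape(text[cursor:start]))
        colour = COLOURS.get(label, "#eeeeee")  # fallback for unknown labels
        parts.append(
            f'<span title="{html.escape(label)}" style="background:{colour}">'
            f"{html.escape(text[start:end])}</span>"
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)
```

Keeping the rendering separate from the extraction step means the same annotations can feed both this test view and the RDF export.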
 TEXTCROWD has proven useful for its main purpose: demonstrating the importance and usefulness of EOSC for scientific research in the cultural heritage domain
 Adoption by other research teams in the EOSCpilot framework
◦ Integration of TEXTCROWD with the new VisualMedia Demonstrator, a service for sharing and visualizing visual media files on the web: automatic metadata extraction from controlled lists or textual documents for 2D and 3D models
 Testing on real use cases in progress
◦ Open Access papers of the Italian journal Archeologia e Calcolatori (ongoing)
 Clean visualization
 Language extension
◦ English, Dutch: from standalone to cloud-based (annotated corpora available)
◦ French, Spanish, German: new from scratch (annotated corpora to be prepared)
◦ Other EU languages: OpeNER extension required
 Additional work required to suit it to everyday use – but not too much
 TEXTCROWD Official Pages:
https://eoscpilot.eu/science-demos/textcrowd
https://textcrowd.d4science.org
 TEXTCROWD Pilot:
https://services.d4science.org/group/textcrowd/data-miner
(registration required)
1. Upload the file(s) to analyze
2. Launch TextCrowd
3. Select the file(s) to process
4. Collect the results
Franco Niccolucci: franco.niccolucci@gmail.com – Achille Felicetti: achille.felicetti@pin.unifi.it

Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd

  • 1. Franco Niccolucci & Achille Felicetti (PIN, University of Florence, Italy) EOSC-hub Week 2018 Malaga, 16/4/2018
  • 2. EOSCpilot is a project funded by the EC H2020 programme  Domain: Archaeology  Goal: semantic enrichment of texts  Archaeological documentation largely based on texts ◦ Excavation diaries, reports, surveys, grey literature ◦ Literary/historical sources. research articles, monographs … ◦ Huge number of small (<100Kb) files in different languages  Registry of 2,000,000 archaeological datasets (70% texts) in ARIADNE  ARIADNE’s data infrastructure popular among archaeologists ◦ ARIADNE users in 2016: 25-30% of the European research community ◦ Strong support by  Professional associations (EAA, EAC) & national archaeological/cultural heritage authorities  National research institutions (CNR, CNRS, CAS, ÖAW, KNAW, BAS, ATHENA RC, FORTH)  International recognition (USA, Mexico, Japan, Argentina)  Needed for cloud-based data infrastructure to be developed in ARIADNEplus ◦ Deeper integration between texts, databases, GIS etc. ◦ Advanced services & VREs for data-centric archaeological research 2
  • 3.
      ▪ Open-source (OS) NLP & NER engine
      ▪ Syntactic rules (tailored to a specific writing style)
      ▪ Texts stating facts, not stories
        ◦ Data fuzziness, provenance, reliability, reasoning
      ▪ Domain ontology: CIDOC CRM (ISO 21127:2006)
        ◦ ... and not TEI
      ▪ Terminology
        ◦ Specialized vocabularies
          ▪ Terra sigillata is not just “sealed earth”
        ◦ Gazetteers for modern (Geonames) and ancient (Pleiades) place names
          ▪ Málaga (modern) vs Màlaka (Phoenician) vs Màlaca (Roman)
        ◦ Named time period management
          ▪ Bronze Age (∼ 3200-600 BC), Recent Orientalizing Period (∼ 630-570 BC)
  • 4.
      ▪ Modular framework based on the GATE toolchain: https://gate.ac.uk
        ◦ Advanced stemming/lemmatization components
          ▪ OpenNLP (https://opennlp.apache.org): sentence segmentation and part-of-speech (POS) tagging
          ▪ OpeNER (http://www.opener-project.eu): neural network for advanced named entity recognition (NER), developed in the OpeNER FP7 project
        ◦ Machine learning framework for self-learning
          ▪ Annotated corpus required
      ▪ Ontology: CRMarcheo (CRM extension for archaeology)
      ▪ Vocabularies, gazetteers and terminological tools
        ◦ ICCD vocabularies for Italian archaeology, augmented with purpose-built term lists
        ◦ Geonames (modern places), Pleiades (historical places)
        ◦ Timespan and named period component based on PeriodO
  • 5.
      ▪ TextCrowd detects:
        ◦ Artefacts
        ◦ Colours
        ◦ Materials
        ◦ Time periods
        ◦ Persons
        ◦ Places
        ◦ Sites
        ◦ Time spans
        ◦ Techniques
      ▪ Target output formats:
        ◦ Textual documents automatically annotated and enriched
        ◦ CIDOC CRM semantic triples (RDF)
  • 6.
      ▪ No annotated text corpora available in Italian to be used as training data for machine learning algorithms
        ◦ Manual annotation of 400 pages of Italian archaeology reports (< 1 Person-Month)
      ▪ Preparation and adaptation of vocabularies
      ▪ Availability of user-friendly cloud-based environments and of the necessary tools to migrate the standalone prototype to the cloud
        ◦ Several cloud solutions tested in early development; limited support provided except in D4Science
        ◦ Implementation in the D4Science infrastructure, but portable to other cloud services if support and the required modules are available
      ▪ Authentication and Authorization
        ◦ No access control to metadata/data implemented so far
        ◦ Demonstrator focused on freely accessible textual documents
        ◦ Fasti Online (http://www.fastionline.org), an Open Access collection of archaeological reports, used as the source
  • 7.
      ▪ Operated and maintained by CNR-ISTI on the D4Science platform: https://www.d4science.org
      ▪ Modular engine based on the GATE toolchain + OpenNLP-OpeNER modules, natively provided by D4Science
      ▪ Web-based user interface for
        ◦ User and access management
        ◦ Cloud storage (private and shared files)
        ◦ Results available for other Virtual Research Environments (VRE) within D4Science
      ▪ Released for open use, for tests & comments
      ▪ No fancy interface produced, also to adapt to any Look-and-Feel
  • 8.
      ▪ Machine-readable results: RDF encoding produced
      ▪ Human-readable results: color-encoded text (for testing)
      ▪ Interoperability of extracted knowledge
        ◦ Semantic information in CRM format: full integration and interoperability with other archaeological semantic data (to be fully implemented in ARIADNEplus)
      ▪ Supporting FAIR Principles implementation
        ◦ Metadata to be stored in various registries for easy findability and accessibility
        ◦ Results ready to be reused within the same environment or consumed by other services and/or in different scenarios
  • 9.
      ▪ TEXTCROWD has proved useful for its main purpose: to demonstrate the importance and usefulness of EOSC for scientific research in the cultural heritage domain
      ▪ Adoption by other research teams in the EOSCpilot framework
        ◦ Integration of TEXTCROWD with the new VisualMedia Demonstrator, a service for sharing and visualizing visual media files on the web: automatic metadata extraction from controlled lists or textual documents for 2D and 3D models
      ▪ Testing on real use cases in progress
        ◦ Open Access papers of the Italian journal Archeologia e Calcolatori, ongoing
      ▪ Clean visualization
      ▪ Language extension
        ◦ English, Dutch: from standalone to cloud-based (annotated corpora available)
        ◦ French, Spanish, German: new from scratch (annotated corpora to be prepared)
        ◦ Other EU languages: OpeNER extension required
      ▪ Additional work required to suit it to everyday use, but not too much
  • 10.
      ▪ TEXTCROWD Official Pages:
        https://eoscpilot.eu/science-demos/textcrowd
        https://textcrowd.d4science.org
      ▪ TEXTCROWD Pilot: https://services.d4science.org/group/textcrowd/data-miner (registration required)
  • 11.
      1. Upload the file(s) to analyze
      2. Launch TextCrowd
      3. Select the file(s) to process
      4. Collect the results
  • 18. Franco Niccolucci: franco.niccolucci@gmail.com; Achille Felicetti: achille.felicetti@pin.unifi.it
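
The slides name CIDOC CRM semantic triples (RDF) as one of TextCrowd's target output formats but do not show what such output looks like. The following is a minimal sketch of the idea, not TextCrowd's actual implementation: it assumes a hypothetical mapping from some of the detected entity categories to CIDOC CRM classes (class names as in the ISO 21127:2006-era releases of the CRM), a placeholder `http://example.org/textcrowd/` namespace for the generated resources, and N-Triples as the RDF serialization. The function names and the sample entities are illustrative only.

```python
# Hypothetical sketch: serialize detected entities as CIDOC CRM N-Triples.
# The mapping, namespace and sample data are assumptions, not TextCrowd code.

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/textcrowd/"  # placeholder namespace (assumption)

# Assumed mapping from a few detected categories to CIDOC CRM classes.
CRM_CLASS = {
    "artefact": "E22_Man-Made_Object",
    "material": "E57_Material",
    "period": "E4_Period",
    "person": "E21_Person",
    "place": "E53_Place",
    "timespan": "E52_Time-Span",
}


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def to_ntriples(entities):
    """Return two N-Triples lines (type and label) per (category, label) pair."""
    lines = []
    for index, (kind, label) in enumerate(entities, start=1):
        subject = f"<{BASE}{kind}/{index}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{CRM}{CRM_CLASS[kind]}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(label)}" .')
    return lines


# Sample entities, as an annotation step might have produced them.
detected = [
    ("artefact", "terra sigillata bowl"),
    ("place", "Málaga"),
    ("period", "Bronze Age"),
]
triples = to_ntriples(detected)
print("\n".join(triples))
```

A real pipeline would presumably link places and periods to gazetteer identifiers (Geonames, Pleiades, PeriodO) rather than minting local URIs, and would relate the entities to one another with CRM properties; the sketch only types and labels each entity.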