SlideShare uma empresa Scribd logo
1 de 28
Baixar para ler offline
///////////
OSI
TypeDB Loader
2022-11-22
Henning Kuich
Semantic and Knowledge Graph
Technologies @ Bayer Pharma R&D
what we are trying to do… disease understanding
what we are trying to do… disease understanding
what we are trying to do…
why TypeDB?
Most of biology can be modeled as a networks/graphs
data are highly context-dependent
inference can do A LOT for us
backend for AI/ML to “complete” our graph (attribute
and link prediction)
Ecosystem efforts
Importing ?  TypeDB Loader
Graph Exploration  TypeDB/ElasticSearch KG Framework
AI/ML  typeDB-to-pytorch-geometric / typedb-ml
what is TypeDB Loader?
why grami?
Repeated Logic:
• Optional vs Required
• Columns of lists of attributes
• Manual = error-prone
• Dirty Data = Crash @ Insert
Template
Query
Insert
Wanted:
• Basic data cleaning
• Validate data types
• Validate required attributes
NOT trivial @ Scale:
• Batches per Transaction
• Parallelization
Big Data  Failure Tolerance!
how does grami work?
Schema Processor Config Data Config Data
how does TypeDB Loader work?
Schema Data
Config
files, utf-8, gzipped
TypeDB Loader: Attributes
TypeDB Loader: Entities
TypeDB Loader: Relations, attribute players
TypeDB Loader: Relations, players by attribute
TypeDB Loader: Relations players by matching on players
TypeDB Loader: Appending Attributes
TypeDB Loader: Append-or-insert Attributes
TypeDB Loader: global configuration
migration status / stop & restart
Things just got a lot more tricky…
And it doesn’t yet work!
BUT:
- you cannot change block size
- you cannot re-order data
- you can change threads
TypeDB Loader reporting – config validation
Error reporting – failed inserts / invalid data
how to use typeDB Loader: executable
how to use typeDB Loader: as a Java Dependency
significant changes summary
Significant Updates:
• Data config + processor config now just one config
• Config syntax
• Config validation and reporting
• Failure reports on row-level
• UTF8 encoding enforcement
• RegEx-based preprocessing of data
Next Steps
• Config validation continued…
• Migration status re-implemention in progress
• Improved error reporting
• Schema-based config template generation
• JSON schema/data handling
Use me!
resources
TypeDB Loader
Github
https://github.com/typedb-osi/typedb-loader
Wiki https://github.com/typedb-osi/typedb-loader/wiki
Medium Tutorial in progress…
Example Project in progress…
Licensing
Above repositories include software developed at Bayer AG. They are released under the Apache License 2.0.
Credits
Icon in banner by Smashicons from Flaticon
Special thanks to…
///////////
Thank you!
Questions?
Twitter & LinkedIn:
@hkuich

Mais conteúdo relacionado

Semelhante a Loading Huge Amounts of Data

DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
David Mann
 

Semelhante a Loading Huge Amounts of Data (20)

Gloc gangler 2018._v4
Gloc gangler 2018._v4Gloc gangler 2018._v4
Gloc gangler 2018._v4
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
The journey of Moving from AWS ELK to GCP Data Pipeline
The journey of Moving from AWS ELK to GCP Data PipelineThe journey of Moving from AWS ELK to GCP Data Pipeline
The journey of Moving from AWS ELK to GCP Data Pipeline
 
Scaling a SaaS backend with PostgreSQL - A case study
Scaling a SaaS backend with PostgreSQL - A case studyScaling a SaaS backend with PostgreSQL - A case study
Scaling a SaaS backend with PostgreSQL - A case study
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
 
Meet Magento Belarus 2015: Uladzimir Kalashnikau
Meet Magento Belarus 2015: Uladzimir KalashnikauMeet Magento Belarus 2015: Uladzimir Kalashnikau
Meet Magento Belarus 2015: Uladzimir Kalashnikau
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...
 
Analytics Metrics Delivery & ML Feature Visualization
Analytics Metrics Delivery & ML Feature VisualizationAnalytics Metrics Delivery & ML Feature Visualization
Analytics Metrics Delivery & ML Feature Visualization
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At CommercetoolsGraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglot
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Checklist for Upgrades and Migrations
Checklist for Upgrades and MigrationsChecklist for Upgrades and Migrations
Checklist for Upgrades and Migrations
 

Mais de Vaticle

Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug DiscoveryBuilding Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
Vaticle
 
A Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security KnowledgeA Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security Knowledge
Vaticle
 
Unifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge GraphUnifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge Graph
Vaticle
 
Knowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfKnowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdf
Vaticle
 
TypeDB Academy | Modelling Principles
TypeDB Academy | Modelling PrinciplesTypeDB Academy | Modelling Principles
TypeDB Academy | Modelling Principles
Vaticle
 
Intro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed databaseIntro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed database
Vaticle
 
Graph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphsGraph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphs
Vaticle
 

Mais de Vaticle (20)

Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug DiscoveryBuilding Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
 
Natural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge GraphNatural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge Graph
 
A Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security KnowledgeA Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security Knowledge
 
Unifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge GraphUnifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge Graph
 
The Next Big Thing in AI - Causality
The Next Big Thing in AI - CausalityThe Next Big Thing in AI - Causality
The Next Big Thing in AI - Causality
 
Building a Cyber Threat Intelligence Knowledge Graph
Building a Cyber Threat Intelligence Knowledge GraphBuilding a Cyber Threat Intelligence Knowledge Graph
Building a Cyber Threat Intelligence Knowledge Graph
 
Knowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfKnowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdf
 
Building a Distributed Database with Raft.pdf
Building a Distributed Database with Raft.pdfBuilding a Distributed Database with Raft.pdf
Building a Distributed Database with Raft.pdf
 
Enabling the Computational Future of Biology.pdf
Enabling the Computational Future of Biology.pdfEnabling the Computational Future of Biology.pdf
Enabling the Computational Future of Biology.pdf
 
TypeDB Academy | Inference with Rules
TypeDB Academy | Inference with RulesTypeDB Academy | Inference with Rules
TypeDB Academy | Inference with Rules
 
TypeDB Academy | Modelling Principles
TypeDB Academy | Modelling PrinciplesTypeDB Academy | Modelling Principles
TypeDB Academy | Modelling Principles
 
Beyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQLBeyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQL
 
TypeDB Academy- Getting Started with Schema Design
TypeDB Academy- Getting Started with Schema DesignTypeDB Academy- Getting Started with Schema Design
TypeDB Academy- Getting Started with Schema Design
 
Comparing Semantic Web Technologies to TypeDB
Comparing Semantic Web Technologies to TypeDBComparing Semantic Web Technologies to TypeDB
Comparing Semantic Web Technologies to TypeDB
 
Reasoner, Meet Actors | TypeDB's Native Reasoning Engine
Reasoner, Meet Actors | TypeDB's Native Reasoning EngineReasoner, Meet Actors | TypeDB's Native Reasoning Engine
Reasoner, Meet Actors | TypeDB's Native Reasoning Engine
 
Intro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed databaseIntro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed database
 
Graph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphsGraph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphs
 
Pandora Paper Leaks With TypeDB
 Pandora Paper Leaks With TypeDB Pandora Paper Leaks With TypeDB
Pandora Paper Leaks With TypeDB
 
Strongly Typed Data for Machine Learning
Strongly Typed Data for Machine LearningStrongly Typed Data for Machine Learning
Strongly Typed Data for Machine Learning
 
Open World Robotics
Open World RoboticsOpen World Robotics
Open World Robotics
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Loading Huge Amounts of Data