Loading Huge Amounts of Data

Loading a lot of data into a graph database is not a trivial exercise. TypeDB Loader (formerly known as GraMi) was developed to allow large-scale data import into TypeDB, a strongly-typed database. Recent improvements have greatly simplified the configuration interface, making data import easier while keeping existing features and the promise of loading huge amounts of data into TypeDB as fast as possible.

  1. OSI TypeDB Loader, 2022-11-22. Henning Kuich, Semantic and Knowledge Graph Technologies @ Bayer Pharma R&D
  2. what we are trying to do… disease understanding
  3. what we are trying to do… disease understanding
  4. what we are trying to do…
  5. why TypeDB?
     - Most of biology can be modeled as networks/graphs
     - Data are highly context-dependent
     - Inference can do A LOT for us
     - Backend for AI/ML to “complete” our graph (attribute and link prediction)
  6. Ecosystem efforts
     - Importing → TypeDB Loader
     - Graph exploration → TypeDB/ElasticSearch KG Framework
     - AI/ML → typeDB-to-pytorch-geometric / typedb-ml
  7. what is TypeDB Loader?
  8. why grami?
     - Repeated logic: optional vs. required; columns of lists of attributes; manual = error-prone; dirty data = crash @ insert
     - Template → Query → Insert
     - Wanted: basic data cleaning; validate data types; validate required attributes
     - NOT trivial @ scale: batches per transaction; parallelization; big data → failure tolerance!
  9. how does grami work? Schema, Processor Config, Data Config, Data
  10. how does TypeDB Loader work? Schema, Data, Config; files: UTF-8, gzipped
  11. TypeDB Loader: Attributes
  12. TypeDB Loader: Entities
  13. TypeDB Loader: Relations, attribute players
  14. TypeDB Loader: Relations, players by attribute
  15. TypeDB Loader: Relations, players by matching on players
  16. TypeDB Loader: Appending Attributes
  17. TypeDB Loader: Append-or-insert Attributes
  18. TypeDB Loader: global configuration
  19. migration status / stop & restart: things just got a lot trickier… and it doesn’t work yet! But:
      - you cannot change the block size
      - you cannot reorder the data
      - you can change the number of threads
  20. TypeDB Loader reporting – config validation
  21. Error reporting – failed inserts / invalid data
  22. how to use TypeDB Loader: executable
  23. how to use TypeDB Loader: as a Java dependency
  24. significant changes summary. Significant updates:
      - data config + processor config now just one config
      - config syntax
      - config validation and reporting
      - failure reports on row level
      - UTF-8 encoding enforcement
      - RegEx-based preprocessing of data
  25. Next steps
      - config validation continued…
      - migration status re-implementation in progress
      - improved error reporting
      - schema-based config template generation
      - JSON schema/data handling
  26. Use me!
      - Resources: TypeDB Loader GitHub https://github.com/typedb-osi/typedb-loader; Wiki https://github.com/typedb-osi/typedb-loader/wiki; Medium tutorial in progress…; example project in progress…
      - Licensing: the above repositories include software developed at Bayer AG and are released under the Apache License 2.0.
      - Credits: icon in banner by Smashicons from Flaticon
  27. Special thanks to…
  28. Thank you! Questions? Twitter & LinkedIn: @hkuich
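The "why grami?" and "TypeDB Loader: Entities" slides describe logic that otherwise gets rewritten for every file: optional vs. required attributes, columns holding lists of values, basic cleaning, and data-type validation before anything reaches the database. A minimal Python sketch of that idea, assuming TypeQL 2.x insert syntax; `AttributeSpec`, `parse_value` and `entity_insert` are hypothetical names for illustration, not TypeDB Loader's API or configuration format (TypeDB Loader itself is a Java tool).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributeSpec:
    """Hypothetical column-to-attribute mapping (not TypeDB Loader's config format)."""
    column: str                           # column name in the data file
    attribute: str                        # attribute type in the TypeDB schema
    value_type: type = str                # str, int, float or bool
    required: bool = False                # reject the row if no value is present
    list_separator: Optional[str] = None  # e.g. ";" turns "a;b" into two values


def typeql_literal(value) -> str:
    """Render a Python value as a TypeQL literal (strings are quoted and escaped)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def parse_value(raw: str, value_type: type):
    """Basic cleaning (trim whitespace) plus type validation; raises ValueError."""
    cleaned = raw.strip()
    if value_type is bool:
        if cleaned.lower() not in ("true", "false"):
            raise ValueError(f"not a boolean: {raw!r}")
        return cleaned.lower() == "true"
    return value_type(cleaned)  # int("4.2") or float("abc") raise ValueError


def entity_insert(entity: str, row: dict, specs: List[AttributeSpec]) -> str:
    """Build one TypeQL 2.x-style insert query for a data row, or raise ValueError."""
    clauses = []
    for spec in specs:
        raw = (row.get(spec.column) or "").strip()
        parts = raw.split(spec.list_separator) if spec.list_separator else [raw]
        values = [parse_value(p, spec.value_type) for p in parts if p.strip()]
        if spec.required and not values:
            raise ValueError(f"missing required column {spec.column!r}")
        clauses.extend(f"has {spec.attribute} {typeql_literal(v)}" for v in values)
    return "insert $e isa " + ", ".join([entity] + clauses) + ";"


if __name__ == "__main__":
    specs = [
        AttributeSpec("name", "name", required=True),
        AttributeSpec("age", "age", int),
        AttributeSpec("emails", "email", list_separator=";"),
    ]
    row = {"name": " Alice ", "age": "42", "emails": "a@x.org; alice@y.org"}
    print(entity_insert("person", row, specs))
    # insert $e isa person, has name "Alice", has age 42, has email "a@x.org", has email "alice@y.org";
```

Raising `ValueError` for a dirty row, rather than sending a malformed query, is what lets a loader report the row and carry on instead of crashing at insert time.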
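The "NOT trivial @ scale" points on the "why grami?" slide (batches per transaction, parallelization, failure tolerance), together with the row-level failure reports mentioned later, can be illustrated with a small thread-pool sketch. It assumes two hypothetical callbacks, `build_query` (raises `ValueError` on dirty data) and `commit_block` (stands in for one write transaction per block); it is not how TypeDB Loader is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def blocks(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive blocks of `size` rows (the last block may be shorter)."""
    it = iter(rows)
    while block := list(islice(it, size)):
        yield block


def load(
    rows: Iterable[dict],
    build_query: Callable[[dict], str],         # hypothetical: raises ValueError on dirty data
    commit_block: Callable[[List[str]], None],  # hypothetical: one write transaction per block
    block_size: int = 50,
    threads: int = 4,
) -> List[Tuple[dict, str]]:
    """Load rows in parallel blocks; return (row, error) pairs instead of crashing."""

    def process(block: List[dict]) -> List[Tuple[dict, str]]:
        queries, failures = [], []
        for row in block:
            try:
                queries.append(build_query(row))
            except ValueError as err:  # row-level failure report; the load continues
                failures.append((row, str(err)))
        if queries:
            commit_block(queries)  # only validated queries reach the transaction
        return failures

    failures: List[Tuple[dict, str]] = []
    # Executor.map collects every block up front; a production loader would
    # bound the number of in-flight blocks to limit memory use.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for block_failures in pool.map(process, blocks(rows, block_size)):
            failures.extend(block_failures)
    return failures
```

Validating before committing keeps a single bad row from aborting its whole transaction; the returned pairs are the raw material for a failure report.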
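The "how does TypeDB Loader work?" slide notes that input files are UTF-8 and may be gzipped, and UTF-8 encoding enforcement is listed among the significant updates. A minimal Python sketch of streaming such a file with strict decoding; choosing the decompressor by the `.gz` extension is an assumption of this example, not necessarily TypeDB Loader's rule.

```python
import csv
import gzip
from typing import Iterator


def read_rows(path: str, separator: str = ",") -> Iterator[dict]:
    """Stream rows from a plain or gzip-compressed CSV file, decoding strictly as UTF-8.

    errors="strict" raises UnicodeDecodeError on invalid bytes instead of
    silently inserting replacement characters into the database.
    """
    # Assumption: the ".gz" suffix marks compressed input.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="strict", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=separator)
```

Because rows are yielded lazily, the file never has to fit in memory, which matters when the whole point is loading huge amounts of data.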
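For the "TypeDB Loader: Relations, players by attribute" slide, the underlying query pattern is a match-insert: find each role player through one of its attributes, then insert the relation between them. A minimal Python sketch assuming TypeQL 2.x syntax; the `Player` tuple shape and `relation_insert` are hypothetical, and attribute players or players matched through other players (the neighbouring slides) are not covered.

```python
from typing import List, Tuple


def typeql_string(value: str) -> str:
    """Quote and escape a string value for TypeQL."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Hypothetical shape: (role, player type, attribute used to find the player, value)
Player = Tuple[str, str, str, str]


def relation_insert(relation: str, players: List[Player]) -> str:
    """Build a TypeQL 2.x-style match-insert that finds each player by one attribute."""
    matches, role_players = [], []
    for i, (role, player_type, attribute, value) in enumerate(players):
        var = f"$p{i}"
        matches.append(f"{var} isa {player_type}, has {attribute} {typeql_string(value)};")
        role_players.append(f"{role}: {var}")
    return ("match " + " ".join(matches)
            + f" insert $r ({', '.join(role_players)}) isa {relation};")
```

A match-insert inserts one relation per combination of matched players and nothing if a player is missing, so the matching attributes should identify players uniquely.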
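The "Appending Attributes" and "Append-or-insert Attributes" slides name two related operations. One plausible reading, sketched in Python with TypeQL 2.x syntax: appending attaches new attributes to things found through a key attribute, while append-or-insert first checks whether such a thing exists and inserts a new one otherwise. The `exists` callback and all function names are hypothetical; the slides do not show TypeDB Loader's actual logic.

```python
from typing import Callable, List, Tuple

Attr = Tuple[str, str]  # (attribute type, string value)


def typeql_string(value: str) -> str:
    """Quote and escape a string value for TypeQL."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def has_clauses(attributes: List[Attr]) -> str:
    if not attributes:
        raise ValueError("nothing to append")
    return ", ".join(f"has {name} {typeql_string(value)}" for name, value in attributes)


def append_query(thing: str, key: Attr, new: List[Attr]) -> str:
    """Attach `new` attributes to existing things found via `key` (TypeQL 2.x style)."""
    key_name, key_value = key
    return (f"match $t isa {thing}, has {key_name} {typeql_string(key_value)}; "
            f"insert $t {has_clauses(new)};")


def append_or_insert_query(thing: str, key: Attr, new: List[Attr],
                           exists: Callable[[str, Attr], bool]) -> str:
    """Append if a matching thing exists, otherwise insert one carrying key + new attributes.

    `exists` is a hypothetical callback that would run a read query against the database.
    """
    if exists(thing, key):
        return append_query(thing, key, new)
    return f"insert $t isa {thing}, {has_clauses([key] + new)};"
```

The check-then-write step is not atomic: if the same key appears in blocks processed by different threads, both may decide to insert, so duplicates remain possible in a parallel load.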
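The "migration status / stop & restart" slide lists the constraints for restarting: block size and data order must stay fixed, while the number of threads may change. A minimal Python sketch of one status-file scheme shows why; it is sequential for clarity, and `resume_load` and the JSON status format are hypothetical, not TypeDB Loader's (still unfinished) implementation.

```python
import json
from pathlib import Path
from typing import Callable, List, Sequence


def resume_load(rows: Sequence, block_size: int, status_file: Path,
                commit_block: Callable[[List], None]) -> None:
    """Commit rows block by block, persisting finished block indices for restarts.

    Block i always covers rows[i * block_size:(i + 1) * block_size], so the saved
    indices stay valid only if block size and row order are unchanged between runs.
    How many threads process the remaining blocks does not affect that mapping.
    """
    done = set(json.loads(status_file.read_text())) if status_file.exists() else set()
    n_blocks = (len(rows) + block_size - 1) // block_size
    for index in range(n_blocks):
        if index in done:
            continue  # committed in an earlier run
        commit_block(list(rows[index * block_size:(index + 1) * block_size]))
        done.add(index)
        status_file.write_text(json.dumps(sorted(done)))  # persist after each commit
```

If either the block size or the row order changed, a saved index would point at different rows, silently skipping some data and re-inserting other data.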
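"RegEx-based preprocessing of data" appears in the significant-changes summary. A minimal Python sketch of per-column regex cleanup applied to a raw row before validation; the rule format (column name mapped to a pattern and replacement) is a hypothetical choice for this example, not TypeDB Loader's config syntax.

```python
import re
from typing import Dict, Tuple

# Hypothetical rule format: column name -> (regex pattern, replacement)
Rules = Dict[str, Tuple[str, str]]


def preprocess_row(row: dict, rules: Rules) -> dict:
    """Return a copy of `row` with each configured column rewritten via re.sub."""
    cleaned = dict(row)
    for column, (pattern, replacement) in rules.items():
        if cleaned.get(column) is not None:
            cleaned[column] = re.sub(pattern, replacement, cleaned[column])
    return cleaned


if __name__ == "__main__":
    rules = {
        "phone": (r"[^0-9+]", ""),        # keep digits and "+" only
        "symbol": (r"\s*\(.*\)$", ""),    # drop a trailing "(...)" annotation
    }
    print(preprocess_row({"phone": "+49 214 30-1", "symbol": "TP53 (human)"}, rules))
    # {'phone': '+49214301', 'symbol': 'TP53'}
```

Running the regex step before type validation means values such as formatted phone numbers can pass validation that would otherwise reject them.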
