Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Martijn van Groningen
mvg@apache.org
@mvgroningen
Document relations
Wednesday, June 5, 13
Topics
• Background
• Parent / child support
• Nested support
• Future developments
Background
[Figure: a client (C) sends a query, which is resolved by a local join.]
• We need more capacity.
• But how to divide the relational data?
[Figure: a client (C) sends a query, which fans out into sub-queries.]
[Figure: a client (C) sends a query; a sub-query runs against a de-normalized document.]
[Figure: a client (C) sends a query; each sub-query is resolved by a local join.]
• Dealing with relations means paying the price at either write time or read time.
• Alternatively, document relations can balance the cost between read time and write time.
For example: one join to reduce duplicated data.
• Supporting “many-to-many” joins in a distributed system is difficult.
The result is either unbalanced partitions or a very expensive join.
The query time join
Parent child
• Parent / child is a query-time join between different document types in the same index.
• Parent and child documents are stored as separate documents in the same index.
• A child document can point to only one parent.
• A parent document can be referred to by multiple child documents.
• A parent document can also be a child document of a different parent.
• A parent document and its child documents are routed to the same shard.
• The parent id is used as the routing value.
• Combined with an in-memory data structure of parent ids, this makes the parent-child join fast.
• Use the warmer API to preload it!
• The size of the parent ids data structure was reduced significantly in version 0.90.1.
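The routing rule above can be sketched in a few lines. This is only an illustration, assuming a CRC32 hash and five shards; Elasticsearch uses its own hash function, and `shard_for`, `shard_for_doc` and `NUM_SHARDS` are hypothetical names that do not come from the slides.

```python
import zlib
from typing import Optional

NUM_SHARDS = 5  # assumption: shard count chosen for illustration


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard (CRC32 is a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def shard_for_doc(doc_id: str, parent_id: Optional[str] = None) -> int:
    """A child is routed by its parent's id; any other document by its own id."""
    return shard_for(parent_id if parent_id is not None else doc_id)


product_shard = shard_for_doc("p2345")
offer_shard = shard_for_doc("12", parent_id="p2345")
print(product_shard == offer_shard)
```

Because the child reuses the parent's id as its routing value, both documents always land on the same shard, which is what allows the join to stay local.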
Parent child - Indexing
• The parent document doesn’t need to exist at the time of indexing.
In the mapping, an offer document is declared as a child of a product document:
curl -XPUT 'localhost:9200/products' -d '{
  "mappings" : {
    "offer" : {
      "_parent" : { "type" : "product" }
    }
  }
}'
Then, when indexing an offer, specify which product it points to:
curl -XPUT 'localhost:9200/products/offer/12?parent=p2345' -d '{
  "valid_from" : "2013-05-01",
  "valid_to" : "2013-10-01",
  "price" : 26.87
}'
Parent child - Querying
• The has_child query returns parent documents based on matches in their child documents.
• The optional “score_mode” defines how child hits are mapped onto their parent document.
curl -XGET 'localhost:9200/products/_search' -d '{
  "query" : {
    "has_child" : {
      "type" : "offer",
      "query" : {
        "range" : {
          "price" : {
            "lte" : 50
          }
        }
      }
    }
  }
}'
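The matching behaviour of the has_child query can be mimicked on plain in-memory data. The sketch below is an approximation that ignores scoring and “score_mode”; the sample offers and the `has_child` helper are hypothetical and only mirror the range query on “price” shown above.

```python
# Hypothetical sample data: each offer points at its parent product id.
offers = [
    {"id": "12", "parent": "p2345", "price": 26.87},
    {"id": "13", "parent": "p2345", "price": 75.00},
    {"id": "14", "parent": "p9999", "price": 120.00},
]


def has_child(children, predicate):
    """Return the ids of parents with at least one child matching predicate."""
    return {child["parent"] for child in children if predicate(child)}


# Equivalent of: has_child(type=offer, query=range(price lte 50))
matching_parents = has_child(offers, lambda offer: offer["price"] <= 50)
print(sorted(matching_parents))
```

A parent is returned once, no matter how many of its children match.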
The index time join
Nested objects
• In many cases, related objects in a domain model share the same write / update life cycle.
• Books & Chapters.
• Movies & Actors.
• De-normalizing results in the fastest queries.
• Compared to using parent/child queries.
• Nested objects allow smart de-normalization.
{
  "title" : "Elasticsearch",
  "authors" : "Clinton Gormley",
  "categories" : ["programming", "information retrieval"],
  "published_year" : 2013,
  "summary" : "The definitive guide for Elasticsearch ...",
  "chapter_1_title" : "Introduction",
  "chapter_1_summary" : "Short introduction about Elasticsearch’s features ...",
  "chapter_1_number_of_pages" : 12,
  "chapter_2_title" : "Data in, Data out",
  "chapter_2_summary" : "How to manage your data with Elasticsearch ...",
  "chapter_2_number_of_pages" : 39,
  ...
}
Too verbose!
{
  "title" : "Elasticsearch",
  "author" : "Clinton Gormley",
  "categories" : ["programming", "information retrieval"],
  "published_year" : 2013,
  "summary" : "The definitive guide for Elasticsearch ...",
  "chapters" : [
    {
      "title" : "Introduction",
      "summary" : "Short introduction about Elasticsearch’s features ...",
      "number_of_pages" : 12
    },
    {
      "title" : "Data in, Data out",
      "summary" : "How to manage your data with Elasticsearch ...",
      "number_of_pages" : 39
    },
    ...
  ]
}
• JSON allows complex nesting of objects.
• But how does this get indexed?
Original JSON document:
{
  "title" : "Elasticsearch",
  ...
  "chapters" : [
    {"title" : "Introduction", "summary" : "Short ...", "number_of_pages" : 12},
    {"title" : "Data in, ...", "summary" : "How to ...", "number_of_pages" : 39},
    ...
  ]
}
Lucene document structure:
{
  "title" : "Elasticsearch",
  ...
  "chapters.title" : ["Data in, Data out", "Introduction"],
  "chapters.summary" : ["How to ...", "Short ..."],
  "chapters.number_of_pages" : [12, 39]
}
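The flattening shown in the Lucene document structure above can be reproduced with a small helper. This is a sketch under the assumption that inner objects are simply merged into dotted, multi-valued fields; `flatten` is a hypothetical function, not part of Elasticsearch.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into dotted, multi-valued fields."""
    fields = {}
    for key, value in doc.items():
        name = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Merge the inner object's fields under the dotted prefix.
                for sub_name, sub_values in flatten(item, name + ".").items():
                    fields.setdefault(sub_name, []).extend(sub_values)
            else:
                fields.setdefault(name, []).append(item)
    return fields


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}
flat = flatten(book)

# The pairing between a chapter's title and its page count is gone, so a
# query for "Introduction" AND 39 pages matches although no such chapter exists.
false_match = ("Introduction" in flat["chapters.title"]
               and 39 in flat["chapters.number_of_pages"])
print(flat)
print(false_match)
```

Once the fields are flattened, nothing ties a value back to the chapter it came from, which is the problem the nested type addresses.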
Nested objects - Mapping
• The nested type triggers Lucene’s block indexing.
• Multiple levels of inner objects are possible.
curl -XPUT 'localhost:9200/books' -d '{
  "mappings" : {
    "book" : {
      "properties" : {
        "chapters" : {
          "type" : "nested"
        }
      }
    }
  }
}'
In this mapping, “book” is the document type and “chapters” is a field of type ‘nested’.
Nested objects - Block indexing
Lucene documents structure:
{"chapters.title" : "Intro...", "chapters.summary" : "...", "chapters.number_of_pages" : 12},
{"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39},
...
{
  "title" : "Elasticsearch",
  ...
}
• The inner objects are indexed as separate Lucene documents, placed right before the root document.
• The root document and its nested documents always remain in the same block.
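The block layout described above can be sketched as a function that turns one JSON document into an ordered list of Lucene-style documents. `to_block` is a hypothetical helper that handles a single nested level only; it illustrates the ordering and is not how Elasticsearch is implemented.

```python
def to_block(root, nested_path):
    """Return the block for one document: nested docs first, root doc last."""
    root = dict(root)  # shallow copy so the caller's document is untouched
    inner_objects = root.pop(nested_path, [])
    block = [
        {f"{nested_path}.{key}": value for key, value in inner.items()}
        for inner in inner_objects
    ]
    block.append(root)  # the root document closes the block
    return block


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}
block = to_block(book, "chapters")
for position, doc in enumerate(block):
    print(position, doc)
```

Each chapter keeps its own fields together in one document, so a title and a page count from different chapters can no longer be combined by accident.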
Nested objects - Nested query
• The nested query returns the complete “book” (the root document) as a hit.
curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query" : {
    "nested" : {
      "path" : "chapters",
      "score_mode" : "avg",
      "query" : {
        "match" : {
          "chapters.summary" : {
            "query" : "indexing data"
          }
        }
      }
    }
  }
}'
• “path” specifies the nested level.
• “score_mode” sets the score mode.
• The inner “query” is the chapter-level query.
[Figure: the root documents bitset; each set bit (X) represents a root document.]
Nested Lucene documents that match the inner query have their scores aggregated and pushed to their root document.
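The aggregation step can be sketched with a list of booleans standing in for the root documents bitset. This is a simplified linear scan, assuming nested documents always precede their root document; `block_join` and the sample doc ids are hypothetical, and the score modes are limited to “avg”, “max” and “total” for illustration.

```python
def block_join(is_root, nested_scores, score_mode="avg"):
    """Map scores of matching nested docs onto their root docs.

    is_root:       list of booleans, True where the doc id is a root document
    nested_scores: doc id -> score, for nested docs matching the inner query
    """
    combine = {
        "avg": lambda scores: sum(scores) / len(scores),
        "max": max,
        "total": sum,
    }[score_mode]
    hits = {}
    pending = []  # scores of matching nested docs in the current block
    for doc_id, root in enumerate(is_root):
        if not root:
            if doc_id in nested_scores:
                pending.append(nested_scores[doc_id])
        else:
            if pending:
                hits[doc_id] = combine(pending)
            pending = []  # the next block starts after this root
    return hits


# Doc ids 0-1 are chapters of the book at id 2; ids 3-5 belong to the book at id 6.
is_root = [False, False, True, False, False, False, True]
nested_scores = {0: 1.0, 1: 3.0, 4: 0.5}
print(block_join(is_root, nested_scores))
```

Only root documents appear in the result, and a root with no matching nested document is not a hit.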
But first questions!
Extra slides
Nested objects - Nested sorting
curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query" : {
    "match" : {
      "summary" : {
        "query" : "guide"
      }
    }
  },
  "sort" : [
    {
      "chapters.number_of_pages" : {
        "sort_mode" : "avg",
        "nested_filter" : {
          "range" : {
            "chapters.number_of_pages" : {"lte" : 15}
          }
        }
      }
    }
  ]
}'
• “sort_mode” sets the sort mode.
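What the sort above computes per book can be sketched in memory: average “number_of_pages” over the chapters that pass the filter, then order the books by that value. The sample books and `nested_sort_value` are hypothetical, and placing books without a matching chapter last is an assumption made for this sketch.

```python
def nested_sort_value(book, max_pages=15):
    """Average number_of_pages over the chapters that pass the filter."""
    pages = [chapter["number_of_pages"] for chapter in book["chapters"]
             if chapter["number_of_pages"] <= max_pages]
    # Assumption: books without any matching chapter sort last.
    return sum(pages) / len(pages) if pages else float("inf")


books = [
    {"title": "A", "chapters": [{"number_of_pages": 12}, {"number_of_pages": 39}]},
    {"title": "B", "chapters": [{"number_of_pages": 8}, {"number_of_pages": 10}]},
    {"title": "C", "chapters": [{"number_of_pages": 40}]},
]
ranked = sorted(books, key=nested_sort_value)
print([book["title"] for book in ranked])
```

The filter only decides which chapters contribute to the sort value; it does not remove books from the result.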
Parent child - sorting
• Parent/child sorting isn’t possible at the moment.
• But there is a “custom_score” query workaround.
• Downsides:
• It forces a script to be executed for each matching document.
• The child sort value is converted into a float value.
"has_child" : {
  "type" : "offer",
  "score_mode" : "avg",
  "query" : {
    "custom_score" : {
      "query" : { ... },
      "script" : "doc['price'].value"
    }
  }
}
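The effect of the workaround can be approximated in memory: each child's price becomes its score, the scores are averaged per parent, and parents are ordered by score. The sketch also shows the float downside by rounding each price to 32 bits; `as_float32`, `parent_scores` and the sample offers are hypothetical.

```python
import struct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def parent_scores(offers):
    """Average the per-child script score (the price) for each parent."""
    by_parent = {}
    for offer in offers:
        by_parent.setdefault(offer["parent"], []).append(as_float32(offer["price"]))
    return {parent: sum(prices) / len(prices) for parent, prices in by_parent.items()}


offers = [
    {"parent": "p1", "price": 26.87},
    {"parent": "p1", "price": 30.00},
    {"parent": "p2", "price": 19.99},
]
scores = parent_scores(offers)
# Hits are ordered by score, highest first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
print(as_float32(26.87) == 26.87)
```

The last line compares the rounded price with the original; values that differ only beyond float precision become indistinguishable when sorting by score.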
Document relations - Berlin Buzzwords 2013

  • 1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Martijn van Groningen mvg@apache.org @mvgroningen Document relations Wednesday, June 5, 13
  • 2. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Topics • Background • Parent / child support • Nested support • Future developments Wednesday, June 5, 13
  • 3. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background Wednesday, June 5, 13
  • 4. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background C Query Local join Wednesday, June 5, 13
  • 5. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background • We need more capacity. • But how to divide the relational data? Wednesday, June 5, 13
  • 6. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background C Q uery sub-queries Wednesday, June 5, 13
  • 7. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background C Query sub-query De-normalized document Wednesday, June 5, 13
  • 8. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background Wednesday, June 5, 13
  • 9. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background Query sub-query C local joinlocal join Wednesday, June 5, 13
  • 10. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background • Dealing with relations either pay the price on write time or read time. • Alternatively documents relations can balance the costs between read and write time. For example: one join to reduce duplicated data. • Supporting “many-to-many” joins in a distributed system is difficult. Either unbalanced partitions or very expensive join. Wednesday, June 5, 13
  • 11. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited The query time join Parent child Wednesday, June 5, 13
  • 12. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child • Parent / child is a query time join between different document types in the same index. • Parent and children documents are stored as separate documents in the same index. • Child documents can point to only one parent. • Parent documents can be referred by multiple child documents. • Also a parent document can be a child document of a different parent. Wednesday, June 5, 13
  • 13. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child • A parent document and its children documents are routed into the same shard. • Parent id is used as routing value. • In combination with a parent ids in memory data structure the parent-child join is fast. • Use warmer api to preload it! • Parent ids data structure size has significantly been reduced in version 0.90.1 Wednesday, June 5, 13
  • 14. Parent child - Indexing
      • The parent document doesn’t need to exist at the time of indexing.
      In the mapping, an offer document is declared as a child of a product document:
      curl -XPUT 'localhost:9200/products' -d '{
        "mappings" : {
          "offer" : {
            "_parent" : { "type" : "product" }
          }
        }
      }'
      Then, when indexing an offer, specify which product it points to:
      curl -XPUT 'localhost:9200/products/offer/12?parent=p2345' -d '{
        "valid_from" : "2013-05-01",
        "valid_to" : "2013-10-01",
        "price" : 26.87
      }'
  • 15. Parent child - Querying
      • The has_child query returns parent documents based on matches in their child documents.
      • The optional “score_mode” defines how child hits are mapped to their parent document.
      curl -XGET 'localhost:9200/products/_search' -d '{
        "query" : {
          "has_child" : {
            "type" : "offer",
            "query" : {
              "range" : {
                "price" : { "lte" : 50 }
              }
            }
          }
        }
      }'
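To make the has_child semantics concrete, here is a small in-memory model in Python. It is not how Elasticsearch implements the join. The offer data, the `has_child` function and the `AGGREGATORS` table are hypothetical, and the mode names are illustrative. It shows child hits being grouped by parent id and their scores being collapsed into one score per parent.

```python
from collections import defaultdict

# Hypothetical offers: (offer id, parent product id, price).
OFFERS = [
    ("12", "p2345", 26.87),
    ("13", "p2345", 49.00),
    ("14", "p9999", 75.50),
]

# Ways to collapse several child scores into one parent score.
AGGREGATORS = {
    "max": max,
    "sum": sum,
    "avg": lambda scores: sum(scores) / len(scores),
}


def has_child(offers, child_matches, child_score, score_mode="max"):
    """Return {parent id: aggregated score} for parents with matching children."""
    scores_by_parent = defaultdict(list)
    for _offer_id, parent_id, price in offers:
        if child_matches(price):
            scores_by_parent[parent_id].append(child_score(price))
    aggregate = AGGREGATORS[score_mode]
    return {parent: aggregate(scores) for parent, scores in scores_by_parent.items()}


# Parents that have at least one offer priced at 50 or less.
hits = has_child(OFFERS, lambda price: price <= 50, lambda price: 1.0, score_mode="sum")
print(hits)  # {'p2345': 2.0}
```

Product `p9999` is absent from the result because its only offer fails the range condition. The hits are parents, never the offers themselves.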
  • 16. The index time join - Nested objects
  • 17. Nested objects
      • In many cases the objects in a domain model share the same write / update life cycle.
          • Books & Chapters.
          • Movies & Actors.
      • De-normalizing results in the fastest queries.
          • Compared to using parent/child queries.
      • Nested objects allow smart de-normalization.
  • 18. Nested objects
      {
        "title" : "Elasticsearch",
        "authors" : "Clinton Gormley",
        "categories" : ["programming", "information retrieval"],
        "published_year" : 2013,
        "summary" : "The definitive guide for Elasticsearch ...",
        "chapter_1_title" : "Introduction",
        "chapter_1_summary" : "Short introduction about Elasticsearch’s features ...",
        "chapter_1_number_of_pages" : 12,
        "chapter_2_title" : "Data in, Data out",
        "chapter_2_summary" : "How to manage your data with Elasticsearch ...",
        "chapter_2_number_of_pages" : 39,
        ...
      }
  • 19. Nested objects: the same document as on slide 18, annotated “Too verbose!”
  • 20. Nested objects
      {
        "title" : "Elasticsearch",
        "author" : "Clinton Gormley",
        "categories" : ["programming", "information retrieval"],
        "published_year" : 2013,
        "summary" : "The definitive guide for Elasticsearch ...",
        "chapters" : [
          {
            "title" : "Introduction",
            "summary" : "Short introduction about Elasticsearch’s features ...",
            "number_of_pages" : 12
          },
          {
            "title" : "Data in, Data out",
            "summary" : "How to manage your data with Elasticsearch ...",
            "number_of_pages" : 39
          },
          ...
        ]
      }
      • JSON allows complex nesting of objects.
      • But how does this get indexed?
  • 21. Nested objects
      Original JSON document:
      {
        "title" : "Elasticsearch",
        ...
        "chapters" : [
          {"title" : "Introduction", "summary" : "Short ...", "number_of_pages" : 12},
          {"title" : "Data in, ...", "summary" : "How to ...", "number_of_pages" : 39},
          ...
        ]
      }
      Lucene document structure:
      {
        "title" : "Elasticsearch",
        ...
        "chapters.title" : ["Data in, Data out", "Introduction"],
        "chapters.summary" : ["How to ...", "Short ..."],
        "chapters.number_of_pages" : [12, 39]
      }
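The consequence of this flattened structure is easier to see with a small Python model. The `flatten` helper and the two match functions are hypothetical and only mimic the field layout shown on the slide, not Lucene itself. Once inner objects are merged into multi-valued fields, the link between one chapter's title and its page count is lost, so a condition spanning two fields can match across different chapters.

```python
def flatten(doc, prefix=""):
    """Flatten arrays of inner objects into multi-valued dotted fields."""
    fields = {}
    for key, value in doc.items():
        name = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                for sub_name, sub_values in flatten(item, name + ".").items():
                    fields.setdefault(sub_name, []).extend(sub_values)
            else:
                fields.setdefault(name, []).append(item)
    return fields


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}


def matches_flat(fields, title, pages):
    """Both conditions are checked against the merged field values."""
    return title in fields["chapters.title"] and pages in fields["chapters.number_of_pages"]


def matches_per_chapter(doc, title, pages):
    """Both conditions must hold inside one and the same chapter."""
    return any(
        chapter["title"] == title and chapter["number_of_pages"] == pages
        for chapter in doc["chapters"]
    )


flat = flatten(book)
# No chapter is titled "Introduction" AND has 39 pages, yet the flat form matches.
print(matches_flat(flat, "Introduction", 39))         # True
print(matches_per_chapter(book, "Introduction", 39))  # False
```

The second function expresses the per-object semantics that the nested type is meant to provide.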
  • 22. Nested objects - Mapping
      • The nested type triggers Lucene’s block indexing.
      • Multiple levels of inner objects are possible.
      curl -XPUT 'localhost:9200/books' -d '{
        "mappings" : {
          "book" : {
            "properties" : {
              "chapters" : { "type" : "nested" }
            }
          }
        }
      }'
      Annotations: “book” is the document type; “nested” is the field type.
  • 23. Nested objects - Block indexing
      Lucene documents structure:
      {"chapters.title" : "Intro...", "chapters.summary" : "...", "chapters.number_of_pages" : 12},
      {"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39},
      ...
      { "title" : "Elasticsearch", ... }
      • The inner objects are inlined as separate Lucene documents right before the root document.
      • The root document and its nested documents always remain in the same block.
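As a rough illustration of the block layout described on this slide, the Python sketch below turns one JSON document into an ordered list of "Lucene documents", with the nested ones first and the root last. The `to_block` function is a hypothetical helper that handles a single nesting level only. It models the ordering, not the actual indexing code.

```python
def to_block(root_doc, nested_path):
    """Return the nested documents followed by the root document, as one block."""
    nested_docs = [
        {f"{nested_path}.{key}": value for key, value in inner.items()}
        for inner in root_doc.get(nested_path, [])
    ]
    # The root document keeps every field except the nested array itself.
    root = {key: value for key, value in root_doc.items() if key != nested_path}
    return nested_docs + [root]


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}

for lucene_doc in to_block(book, "chapters"):
    print(lucene_doc)
```

Each chapter stays a separate document, so its title and page count remain associated, and the root document closes the block.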
  • 24. Nested objects - Nested query
      • The nested query returns the complete “book” (the root document) as a hit.
      curl -XGET 'localhost:9200/books/book/_search' -d '{
        "query" : {
          "nested" : {
            "path" : "chapters",
            "score_mode" : "avg",
            "query" : {
              "match" : {
                "chapters.summary" : {
                  "query" : "indexing data"
                }
              }
            }
          }
        }
      }'
      Annotations: “path” specifies the nested level; “score_mode” is the score mode; the inner “query” is the chapter level query.
  • 25. Nested objects (diagram: root documents bitset)
      • X: a nested Lucene document that matches the inner query.
      • X: a set bit that represents a root document.
      • Nested scores are aggregated and pushed to the root document.
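The bitset-based join in this diagram can be approximated in Python as follows. This is a simplified sketch under stated assumptions: `root_bits` is a plain list of booleans indexed by Lucene document id, `next_set_bit` and `join_to_roots` are hypothetical names, and averaging is just one possible way to aggregate. Because nested documents sit right before their root, the next set bit at or after a matching nested document identifies its root.

```python
def next_set_bit(bits, start):
    """Return the first index >= start whose bit is set."""
    for index in range(start, len(bits)):
        if bits[index]:
            return index
    raise ValueError("no root document after position %d" % start)


def join_to_roots(root_bits, nested_hits):
    """Map matching nested doc ids to their root doc and average their scores."""
    scores_by_root = {}
    for doc_id, score in nested_hits.items():
        root_id = next_set_bit(root_bits, doc_id)
        scores_by_root.setdefault(root_id, []).append(score)
    return {root: sum(scores) / len(scores) for root, scores in scores_by_root.items()}


# Doc ids 0-1 are chapters of the book at id 2; ids 3-5 belong to the book at id 6.
root_bits = [False, False, True, False, False, False, True]
nested_hits = {0: 1.0, 1: 3.0, 4: 0.5}  # nested doc id -> inner query score

print(join_to_roots(root_bits, nested_hits))  # {2: 2.0, 6: 0.5}
```

No lookup table from child to parent is needed. The document order inside the block carries the relation.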
  • 26. Extra slides (but first, questions!)
  • 27. Nested objects - Nested sorting
      curl -XGET 'localhost:9200/books/book/_search' -d '{
        "query" : {
          "match" : {
            "summary" : { "query" : "guide" }
          }
        },
        "sort" : [
          {
            "chapters.number_of_pages" : {
              "sort_mode" : "avg",
              "nested_filter" : {
                "range" : {
                  "chapters.number_of_pages" : {"lte" : 15}
                }
              }
            }
          }
        ]
      }'
      Annotation: “sort_mode” is the sort mode.
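What this sort computes per book can be modelled in Python. The book data, `nested_sort_value` and `sort_books` are hypothetical, and placing books without any qualifying chapter last is an assumption of this sketch, not something stated on the slide. Only chapters passing the nested filter contribute, and the sort mode collapses their values into one sort key per root document.

```python
def nested_sort_value(book, max_pages=15):
    """Average page count over the chapters that pass the nested filter."""
    pages = [
        chapter["number_of_pages"]
        for chapter in book["chapters"]
        if chapter["number_of_pages"] <= max_pages
    ]
    return sum(pages) / len(pages) if pages else None


def sort_books(books):
    """Sort ascending by the nested value; books without a value go last."""
    def key(book):
        value = nested_sort_value(book)
        return (value is None, value if value is not None else 0.0)
    return sorted(books, key=key)


books = [
    {"title": "A", "chapters": [{"number_of_pages": 12}, {"number_of_pages": 39}]},
    {"title": "B", "chapters": [{"number_of_pages": 8}, {"number_of_pages": 10}, {"number_of_pages": 40}]},
    {"title": "C", "chapters": [{"number_of_pages": 20}, {"number_of_pages": 30}]},
]

print([book["title"] for book in sort_books(books)])  # ['B', 'A', 'C']
```

Book B sorts first with an average of 9.0 over its two short chapters, book A follows with 12.0, and book C has no chapter of 15 pages or fewer.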
  • 28. Parent child - sorting
      • Parent/child sorting isn’t possible at the moment.
      • But there is a “custom_score” query workaround.
      • Downsides:
          • It forces a script to be executed for each matching document.
          • The child sort value is converted into a float value.
      "has_child" : {
        "type" : "offer",
        "score_mode" : "avg",
        "query" : {
          "custom_score" : {
            "query" : { ... },
            "script" : "doc[\"price\"].value"
          }
        }
      }
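The second downside, conversion to a float, can be illustrated in Python by round-tripping values through a 32-bit float. This assumes the score is held as a single-precision float, as Lucene scores are. `to_float32` is a hypothetical helper built on the standard `struct` module. Large integer sort values, such as timestamps or ids, are where the precision loss becomes visible.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a value through a 32-bit float, as a score would be stored."""
    return struct.unpack("f", struct.pack("f", value))[0]


# 2**24 + 1 is the first integer a 32-bit float cannot represent exactly.
print(to_float32(16777217.0))  # 16777216.0

# A price is stored only approximately as well.
print(to_float32(26.87))
```

Two children whose sort values differ only beyond single precision would therefore receive the same score and could not be ordered reliably with this workaround.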