Large scale search
Patterns for dealing with large-scale search systems
Overview
•How to provide a scalable platform for both users and data
•Issues introduced by a scalable platform
•Patterns for dealing with the issues
Big data
As data volumes increase, they can prove too large for any one server to manage.
Big data: Partitioning
Data can be partitioned into suitably sized “shards” and placed on different servers.
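One common way to split data into shards is to hash each document identifier. The sketch below assumes string document ids and a fixed shard count; the names shard_for and partition are illustrative, and the slides do not say which partitioning scheme is intended.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document id to a shard number using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document ids into num_shards lists, one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

A stable hash is used rather than Python's built-in hash() so that the same document lands on the same shard across processes.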
Partitioning: divide and conquer
Each user’s search queries all shards in parallel and combines the results to provide fast responses.
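The query-all-shards-and-combine step can be sketched as a scatter-gather over in-memory shards. Everything here is an assumption for illustration: each shard is a dict mapping a document id to (text, score), matching is a plain substring test, and search_shard and search_all are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_shard(shard, term, limit):
    """Return one shard's best hits as (score, doc_id) pairs, best first."""
    hits = [(score, doc_id) for doc_id, (text, score) in shard.items() if term in text]
    return heapq.nlargest(limit, hits)


def search_all(shards, term, limit=10):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, limit), shards))
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(limit, merged)
```

Each shard only returns its own top hits, so the merge step handles at most limit results per shard regardless of how much data each shard holds.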
Big Search Loads
However, as user volumes increase, the load on each shard server can become too great.
Replication
To spread the load of many simultaneous users, indexes need to be replicated.
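Once a shard has several replicas, something has to decide which replica serves each query. A minimal sketch of round-robin selection follows; the slides do not name a balancing policy, so round-robin and the RoundRobinBalancer class are assumptions.

```python
import itertools


class RoundRobinBalancer:
    """Hand out the replicas of one shard in turn (assumed policy)."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(list(replicas))

    def next_replica(self):
        """Return the replica that should serve the next query."""
        return next(self._cycle)
```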
Scaling Summary
Partitioning: coping with data volumes
Replication: coping with user volumes (and providing redundancy in the event of failure)
Issues
So far so good - but a scalable system with many servers raises the concern of balancing Consistency and Availability...
Consistency vs availability
[Diagram: Servers against Content Freshness]
As the number of required servers increases, there is an increased probability that a server will fail or lag when adding new content.
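A rough way to see the effect of server count: if each server independently fails or lags on a given update with probability p (an assumption made here, not a figure from the slides), the chance that at least one of n servers does so is 1 - (1 - p)^n.

```python
def p_any_failure(p, n):
    """Probability that at least one of n servers fails, assuming independent failures."""
    return 1 - (1 - p) ** n


# With an assumed 1% per-server failure rate, compare small and large clusters.
for n in (1, 10, 100):
    print(n, round(p_any_failure(0.01, n), 3))
```

Real failures are rarely fully independent, so this is only an indication of the trend.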
Consistency vs availability
[Diagram: Servers against content freshness, marking the “earliest cross-server consistent content” and the “latest available content”]
These potential inconsistencies across servers introduce a dilemma - search the latest available content or older, consistent content?
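The two points in the dilemma can be computed if each server reports how far it has got. The sketch assumes content freshness is modelled as an increasing batch number per server; consistency_window is a hypothetical helper name.

```python
def consistency_window(server_versions):
    """Return (consistent_point, latest_point) from per-server batch numbers.

    consistent_point: the newest batch that every server already holds.
    latest_point: the newest batch held by at least one server.
    """
    versions = list(server_versions.values())
    return min(versions), max(versions)
```

For example, consistency_window({"s1": 42, "s2": 40, "s3": 42}) gives (40, 42): searching at batch 40 is consistent everywhere, while batch 42 is the freshest content but is missing from one server.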
Consistency vs availability
[Spectrum from Consistency to Availability: Full Consistency, Shard Consistency, Managed Inconsistency, Sticky User Sessions, Every Man for Himself]
What follows is a number of architectural patterns, each of which makes a trade-off between the consistency and availability of the content being searched.
Patterns
Full Consistency
Description: All servers are designed to coordinate together when applying batches of new content. If any one server fails to apply updates, all servers abandon this batch of updates.
Pros:
•All users of the system see the same version of content, i.e. the same point in time.
Cons:
•Complex distributed transaction software is required to coordinate updates.
•Any failure on a server delays the visibility of new content on all servers.
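The all-or-nothing behaviour described above can be sketched as a prepare step followed by a commit or abort. This is an in-memory illustration only: the Server class, its healthy flag and apply_batch are hypothetical, and commit is assumed never to fail, which a real distributed transaction protocol cannot assume.

```python
class Server:
    """Toy index server with a staging area for not-yet-visible content."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.docs = []      # committed, searchable content
        self.staged = None  # batch prepared but not yet visible

    def prepare(self, batch):
        """Stage a batch; an unhealthy server reports failure."""
        if not self.healthy:
            return False
        self.staged = list(batch)
        return True

    def commit(self):
        """Make the staged batch searchable."""
        self.docs.extend(self.staged)
        self.staged = None

    def abort(self):
        """Discard any staged batch."""
        self.staged = None


def apply_batch(servers, batch):
    """Apply a batch on every server, or on none of them."""
    prepared = [s.prepare(batch) for s in servers]
    if all(prepared):
        for s in servers:
            s.commit()
        return True
    for s in servers:
        s.abort()
    return False
```

A single unhealthy server makes apply_batch return False for everyone, which is exactly the availability cost listed under Cons.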
Patterns
Shard Consistency
Description: Within each “shard”, replica servers strive to maintain identical copies of the same content. New content additions are coordinated within each shard, with any failure of a replica server aborting additions to that shard.
Pros:
•Update failures are isolated, affecting the availability of new content on a single shard only.
•All users see the same (potentially uneven) content.
Cons:
•Complex distributed transaction software may be required to coordinate updates.
•Any failure delays the visibility of new content in that shard.
•Shards may be “uneven” in the points-in-time they represent.
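Scoping the coordination to one shard at a time might look like the sketch below. Replicas are modelled as plain dicts with assumed "healthy" and "version" keys, and update_shard and update_all_shards are hypothetical names.

```python
def update_shard(replicas, new_version):
    """Advance every replica in one shard, or none of them."""
    if not all(r["healthy"] for r in replicas):
        return False
    for r in replicas:
        r["version"] = new_version
    return True


def update_all_shards(shards, new_version):
    """Try the update on each shard independently; report which shards advanced."""
    return {name: update_shard(replicas, new_version)
            for name, replicas in shards.items()}
```

A failing replica in one shard holds back only that shard, so the shards can end up at different versions, which is the "uneven" effect noted under Cons.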
Patterns
Managed Inconsistency
Description: All servers apply updates independently, with an agreed tolerance for “drift” between the freshness of content held on servers. When this threshold is reached, the servers with the newest content halt updates until the drift gap is closed (this may require removing a failing replica server from active service).
Pros:
•New content is continually made available to users until pre-defined tolerances for failures are exceeded.
Cons:
•Different users may see different results depending on which (almost) replica the load balancer chooses to service their queries.
•Individual users hitting the refresh button may also see different results as a result of non-exact replica servers.
•Shards may be “uneven” in the points-in-time they represent.
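The drift rule can be sketched as a check each server makes before applying its next update. Freshness is assumed to be an integer batch number per server, and max_drift and may_apply_next_update are illustrative names rather than anything defined in the slides.

```python
def may_apply_next_update(server, versions, max_drift):
    """True if this server is still within max_drift of the slowest active server."""
    slowest = min(versions.values())
    return versions[server] - slowest < max_drift


def remove_from_service(versions, server):
    """Return the active set without a failing server, closing the drift gap."""
    return {name: v for name, v in versions.items() if name != server}
```

With versions {"a": 10, "b": 8, "c": 5} and max_drift=5, server "a" has reached the threshold and halts while "b" and "c" continue. Taking the lagging "c" out of service lets "a" resume.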
Patterns
Sticky User Sessions
Description: All servers are allowed to update independently. The load balancer is configured to route a user’s searches to the same choice of replica server in each shard whenever possible, to hide any temporary drift between replicas.
Pros:
•New content is continually made available to users.
•Individual users should not experience a “step back in time” when repeating the same query due to inconsistent replicas.
Cons:
•Different users may see different results depending on which (almost) replica the load balancer chooses to service the query.
•Shards may be “uneven” in the points-in-time they represent.
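One way a load balancer could keep a user on the same replica in each shard is to hash the user id together with the shard name. This is a sketch under that assumption; pick_replica and route are hypothetical, and real load balancers often use cookies or session tables instead.

```python
import hashlib


def pick_replica(user_id, shard_name, replicas):
    """Choose a replica deterministically from the user id and shard name."""
    key = f"{user_id}:{shard_name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


def route(user_id, shards):
    """Return the replica this user should query in every shard."""
    return {shard: pick_replica(user_id, shard, replicas)
            for shard, replicas in shards.items()}
```

The mapping only stays stable while the replica list is unchanged, which matches the "whenever possible" caveat in the description.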
Patterns
Every Man for Himself
Description: All servers are allowed to update independently.
Pros:
•New content is continually made available to users.
Cons:
•Different users may see different results depending on which (almost) replica the load balancer chooses to service the query.
•Individual users hitting the refresh button may also see different results as a result of non-exact replica servers.
Considerations when selecting a pattern
•Pick an acceptable user experience as a starting point
  •“I always expect to be acting on the latest available information”
  •“I need all results to represent the same point in time”
  •“I expect hitting the refresh button to take me forward in time, never back”
  •“I expect to always see the same content as my colleagues”
•Recognise that not all user requirements are realisable, so rank them by importance.
•Pick a pattern that works best for the selected requirements
  •Consider mixing some patterns, e.g. “Managed Inconsistency” with “Sticky User Sessions” seems a good compromise between maintaining (perceived) consistency and content availability
  •Consider different strategies for different user groups, e.g. VIP users will always see the guaranteed-latest content.
