OCTOBER 13-16, 2016 • AUSTIN, TX
Nitin Sharma
Bloomreach
Li Ding
Bloomreach
Large Scale Cluster Management &
Recovery for
Multi-DC SolrCloud
Abstract
Cluster Management & Recovery for an enterprise grade global search Infrastructure is non-trivial.
•  Serving Hundreds of Millions of Documents & Queries
•  Multi-Tenant
•  Geographically distributed Data Centers
•  Custom SolrCloud Components - Analysis, Ranking and Faceting
•  Dynamic Ranking Elements – Collection/Cluster Level
At BloomReach, we have built an innovative search architecture aimed at reliable Cluster Management and
Recovery.
•  The Infrastructure is data center based.
•  Discovery service: DC Metadata, roles, tenants.
•  Real-time Active Monitoring: Robust failure detection.
•  Recovery Service: One step Recovery, Rollback and Backup.
The presentation describes the infrastructure in detail and shows how it achieves availability and
performance while keeping platform management simple.
About us
BloomReach is a Cloud Marketing Platform. We have developed a personalized discovery platform that
features applications which analyze big data to make our customers’ digital content more discoverable,
relevant and profitable.
Nitin: I work on Scaling the Search Platform for Bloomreach’s big data. My relevant experience and
background includes scaling real-time services for latency sensitive applications and building performance
and search-quality metrics infrastructure for personalization platforms.
Li: I am a member of technical staff on BloomReach's platform team. My background includes working on
virtualization management platforms, building search performance infrastructure, and scaling distributed
services.
BloomReach’s Applications
Organic
Search
Content understanding
What it does
Content optimization,
management and measurement
Benefit
Enhanced discoverability and
customer acquisition in organic search
What it does
Personalized onsite search and
navigation across devices
Benefit
Relevant and consistent onsite
experiences for new and known users
What it does
Merchandising tool that understands
products and identifies opportunities
Benefit
Prioritize and optimize
online merchandising
SNAP
Compass
Agenda
•  History - Elastic Search Infrastructure
•  Real Time Serving - Scaling & Availability Challenges.
•  Highly Available Multi DC/Multi-Tenant Architecture.
•  Cluster Management Suite
•  Replication & Ranking Config Mgmnt Service
•  Deployment & Recovery Service
•  Active Monitor Service
•  Auto Recovery Service
History - Elastic Solr Cloud Infrastructure
[Diagram: Solr Compute Cloud (SC2) Infrastructure – Large Scale Pipelines. Components: Zookeeper, SC2, backend Solr, four elastic clusters, indexing pipeline, analysis pipeline, HAFT, serving Solr. Edge labels: provision, replicate, read/write.]
SC2 HAFT – Open Source
github.com/bloomreach/solrcloud-haft
ZK Replicator – Open Source
github.com/bloomreach/zk-replicator
Real Time Serving - Scaling & Availability Challenges
Real Time Serving – Dynamic Elements
Zookeeper
Custom SolrCloud Components & Analyzers
Global Ranking Configurations
Global Entities
Collection Level Entities
Ranking Files
Ranking configs
Serving SolrCloud
Load Balancer
Real Time Serving - Scaling & Availability Challenges
Zookeeper
Custom SolrCloud Components & Analyzers
Global Ranking Configurations
Global Entities
•  Query Parsers, Analyzers
•  Ranking & Scoring Components
•  Non Optimal Performance (Latency, Mem usage)
•  Memory & File Handle Leaks
•  Stale Searchers Left open
•  Non-conforming Configurations
•  Size Limit
•  Solr Startup Issues
Serving SolrCloud
Real Time Serving - Scaling & Availability Challenges
Zookeeper
Collection Level Entities
•  Ranking Elements Loaded once per core
•  Collection Reloads
•  Non Optimal Performance (Latency, Mem usage)
•  Files not versioned to support roll back
•  External files not sharded.
Serving SolrCloud
Ranking Files
Ranking configs
•  Loads when core initializes. Misconfiguration crashes cores.
•  Hot swap of configs requires dynamic loading.
Real Time Serving – Recovery Challenges
Zookeeper
Global Entities
Collection Level Entities
Serving SolrCloud
Load Balancer
•  The cluster goes down
•  Restoring older release takes time. Restarting 1000s of
collections is unstable and could take hours.
•  Serving is affected
Bad jar deployed
Large ranking
file
•  Large ranking file – Unsharded. Increases per core mem requirement.
•  Auto Rollback to previous version is non trivial (if new ranking file is produced
by pipelines).
•  Serving is affected. No longer highly Available.
Real Time Serving – Multi Tenancy Challenges
Zookeeper
Serving SolrCloud
Load Balancer
Tenant 1
Tenant 2
Tenant n
•  A tenant is a unique <app, collection> pair in SolrCloud.
•  Unique collection type per app. [<Zkconfig, Ranking, Query
Patterns>]
•  Index, Config, Cluster Management strategies
vary drastically
Recovery:
Tenant 1: No dynamic config, static large index
Tenant 2: External Ranking files, aggressive index
refresh & customer generated data.
Tenant N: …
What is a tenant?
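The tenant definition above can be sketched as a small data model. This is only an illustration: the `TenantProfile` fields and the `TenantRegistry` class are hypothetical names chosen to mirror the slide's examples (dynamic config, external ranking files, index refresh behaviour), not an actual BloomReach interface.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tenant:
    """A tenant is a unique <app, collection> pair."""
    app: str
    collection: str


@dataclass
class TenantProfile:
    """Hypothetical per-tenant traits that influence recovery."""
    zk_config: str
    has_external_ranking_files: bool
    index_refresh: str  # assumed values: "static" or "aggressive"


class TenantRegistry:
    """Keeps one profile per tenant and rejects duplicate pairs."""

    def __init__(self) -> None:
        self._profiles: Dict[Tenant, TenantProfile] = {}

    def register(self, tenant: Tenant, profile: TenantProfile) -> None:
        if tenant in self._profiles:
            raise ValueError(f"duplicate tenant: {tenant}")
        self._profiles[tenant] = profile

    def needs_ranking_restore(self, tenant: Tenant) -> bool:
        # Tenants with external ranking files need them restored on recovery.
        return self._profiles[tenant].has_external_ranking_files


registry = TenantRegistry()
registry.register(Tenant("app1", "collection1"),
                  TenantProfile("conf_static", False, "static"))
registry.register(Tenant("app2", "collection2"),
                  TenantProfile("conf_ranking", True, "aggressive"))
```

Freezing `Tenant` makes the pair hashable, so it can serve directly as the registry key.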
Real Time Serving – Multi DC Challenges
Zookeeper (Common)
•  Every data center hosts only part of the tenants
based on geo.
•  Adding a new geo-based DC requires
a shared ZK ensemble
•  Selective Collection Placement is not possible
•  HA and Latency Guarantees are non trivial
Multi Dc
EU
East
West
Multi DC / Multi-Tenant Architecture
Terminology | Definition
Solr Data Center | A logical group of Solr nodes with metadata. The metadata contains placement, role, replication factor, apps, etc.
Solr Cluster | A logical grouping of Solr Data Centers.
Replication API | Replicates the index from Elastic Clusters onto all data centers.
Ranking File Management API | Uploads ranking files to serving data centers.
Cluster Topology/ Data Center Definition
•  Where does the dc live?
•  How many nodes ?
•  Name, Type of DC.
•  Role of DC
•  Serve – Behind LB for api requests
•  Replicate – Gets indexing updates
•  LB End point
•  Tenants/apps
•  …
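A data center definition along these lines could be captured as a small typed record that serializes to JSON. The sketch below is illustrative only: every field name (`name`, `dc_type`, `region`, `node_count`, `roles`, `lb_endpoint`, `tenants`) and all sample values are assumptions, not the schema used in the talk.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# The two roles named on the slide; the lowercase spelling is an assumption.
VALID_ROLES = {"serve", "replicate"}


@dataclass
class DataCenterDefinition:
    """Illustrative data center metadata; every field name is hypothetical."""
    name: str
    dc_type: str
    region: str
    node_count: int
    roles: List[str]
    lb_endpoint: str
    tenants: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject definitions with no nodes or with roles outside VALID_ROLES."""
        if self.node_count <= 0:
            raise ValueError("node_count must be positive")
        unknown = set(self.roles) - VALID_ROLES
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


serving3 = DataCenterDefinition(
    name="serving3", dc_type="serving", region="east",
    node_count=4, roles=["serve", "replicate"],
    lb_endpoint="lb.example.internal", tenants=["app1"],
)
serving3.validate()
```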
Automated Cluster Management Suite
[Diagram. Components: load balancer, Solr Serving 1, Solr Serving 2, Solr Backup, Cluster Metadata (R/W). Cluster Ops APIs, deployed in an HA mode setup: Replication API, Ranking Mgmnt API, Cluster Management API, Deployment/Recovery API.]
Replication Management API
Solr Serving 1
LoadBalancer
Solr Serving 2
Solr
Backup
Replication API
Cluster
Metadata
Cluster Management
API
Query = operation: Replicate?
App : app1
Elastic Indexers
Index
Ranking File Management API
Solr Serving 1
LoadBalancer
Solr Serving 2
Solr
Backup
Ranking Mgmnt API
Cluster
Metadata
Cluster Management
API
Query = operation: Serve?
App : app2
Ranking Pipelines
Ranking files
S3: version files and store them
Deployment/Recovery Service (Launch New Datacenter)
•  Adding new DC to the cluster (Geo based)
•  Adding temporary capacity for increased traffic.
•  Expanding cluster capacity permanently.
Deployment/Recovery Service (Launch New Datacenter)
Serving3
app1
Datacenter Definition Data (JSON)
ZK ZK ZK SOLR SOLR SOLR SOLR
Multi-threaded installation
Smoke Test:
1)  Every collection is queryable
2)  Every collection has the
config it is supposed to have
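The two smoke-test checks can be sketched as a single pass over the expected collections. This is a minimal sketch: `run_query` and `get_config_name` are hypothetical hooks supplied by the caller. In a real deployment they might wrap Solr's select handler and the Collections API, but that wiring is an assumption and is not shown in the slides.

```python
from typing import Callable, Dict, List


def smoke_test(
    expected_configs: Dict[str, str],
    run_query: Callable[[str], bool],
    get_config_name: Callable[[str], str],
) -> List[str]:
    """Return human-readable failures; an empty list means the test passed.

    expected_configs maps collection name -> config name it should use.
    """
    failures = []
    for collection, expected in sorted(expected_configs.items()):
        # Check 1: the collection answers a query.
        try:
            queryable = bool(run_query(collection))
        except Exception:
            queryable = False  # any client error counts as "not queryable"
        if not queryable:
            failures.append(f"{collection}: not queryable")
        # Check 2: the collection is bound to the config it should have.
        actual = get_config_name(collection)
        if actual != expected:
            failures.append(
                f"{collection}: config {actual!r}, expected {expected!r}"
            )
    return failures
```

Returning a list of failures instead of a boolean lets the caller log every broken collection from one run.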
SolrCloud Production Cluster
Load Balancer
Serving1
app1
Backup
app1
app2
Serving2
app1
Others…
Deployment/Recovery Service
Cluster
Metadata
Index
Store the config
Example of additional DC
•  Where does the dc live?
•  How many nodes ?
•  Name, Type of DC.
•  Role of DC
•  Serve – Behind LB for api requests
•  Replicate – Gets indexing updates
•  LB End point
•  Tenants/apps
•  …
Deployment/Recovery Service (Hard Recovery)
•  One or more hosts in a datacenter are down
•  There are network issues with one datacenter
•  AWS decides to retire some instances in a datacenter
Smoke Test
New Serving2
Provision hosts using serving2’s config, the same way as creating a new DC
New Serving2
Update config of new serving2
Add back to LB
Deployment/Recovery Service (Hard Recovery)
Load Balancer
Serving1
Dc
Serving2
app1
Deployment/Recovery Service
Cluster
Metadata
SolrCloud Production Cluster
Retrieve config of serving2
Deployment/Recovery Service (Soft Recovery)
•  One or more datacenters have high memory or CPU usage and do not respond to requests
•  Several collections are down in a DC due to Zookeeper state or other issues
•  Deploy code. Our customized component needs a restart of Solr to take effect
Snapshots
•  A snapshot service takes a snapshot of each serving DC every 24 hours
•  The snapshot contains global files (customized jar, Zookeeper configs)
and per-tenant files such as ranking files and synonyms. This is done
through the HAFT API
•  The index is never snapshotted
•  Each snapshot is timestamped and stored in S3
SolrCloud Production Cluster
Load Balancer
Serving1
app1
Serving2
app1
Deployment/Recovery Service
HAFT
S3
Use HAFT to take snapshots
Store snapshot in S3 with timestamp
Base: s3://cluster/production/20151008155637
s3://cluster/production/20151008/jar
s3://cluster/production/20151008/zkconfig
s3://cluster/production/20151008/tenant1/ranking
…
Global Files:
Jar, ZK config
Per Tenant Files
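The timestamped layout can be sketched as two small helpers. Note the assumptions: the slide's example paths show only the date portion under the base, while this sketch places every artifact under the full `YYYYMMDDHHMMSS` prefix. The function names and dictionary keys are hypothetical.

```python
from datetime import datetime


def snapshot_base(bucket: str, environment: str, taken_at: datetime) -> str:
    """Build a timestamped snapshot prefix such as s3://cluster/production/20151008155637."""
    return f"s3://{bucket}/{environment}/{taken_at.strftime('%Y%m%d%H%M%S')}"


def snapshot_layout(base: str, tenants):
    """Map each snapshot artifact to its location under the base prefix."""
    # Global files: the customized jar and the Zookeeper config.
    layout = {"jar": f"{base}/jar", "zkconfig": f"{base}/zkconfig"}
    # Per-tenant files: one ranking location per tenant.
    for tenant in tenants:
        layout[f"{tenant}/ranking"] = f"{base}/{tenant}/ranking"
    return layout


base = snapshot_base("cluster", "production", datetime(2015, 10, 8, 15, 56, 37))
layout = snapshot_layout(base, ["tenant1"])
```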
Revert to Snapshots
When reverting a DC to a snapshot point, we use HAFT to replicate the index from the backup datacenter
and to restore all external files, global config files and per-tenant files from the snapshot's S3 location
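Choosing which snapshot to revert to can be sketched as a lookup over the timestamped snapshot ids. The sketch assumes ids are fixed-width `YYYYMMDDHHMMSS` strings like the base path in the example, so string order equals time order. The function name `pick_snapshot` is hypothetical.

```python
from bisect import bisect_right


def pick_snapshot(snapshot_ids, revert_to):
    """Return the newest snapshot id that is <= revert_to, or None if none exists.

    Ids are assumed to be fixed-width YYYYMMDDHHMMSS strings, so plain
    string comparison matches chronological order.
    """
    ordered = sorted(snapshot_ids)
    idx = bisect_right(ordered, revert_to)
    return ordered[idx - 1] if idx else None


available = ["20151008155637", "20151007155630", "20151009155641"]
chosen = pick_snapshot(available, "20151009000000")
```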
Code Deploy Mode – Soft Recovery
Take a global lock of that DC
SolrCloud Production Cluster
Load Balancer
Serving1
Dc
Serving2
Dc
Deployment/Recovery Service
Cluster
Metadata
Get DC’s config, ZK, Solr hosts, etc.
Release Lock
Take out of LB
Deploy code
Release configs
Post Deployment Tests
Disaster Recovery Mode – Soft Recovery
SolrCloud Production Cluster
Load Balancer
Serving1
app1
Serving2
app1
Get the global lock
Take DC out of LB
Rolling restart of Solr
Check host and collection health
Pass?
Add to LB
Delete all files of Zookeeper and Solr, wipe out everything
Install Zookeeper and Solr
Set up global files using current files or snapshot files
Use HAFT to replicate all collections with current or snapshot per-tenant files and indexes from backup
Smoke Test
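One possible reading of this flow as control flow is sketched below. The branch structure (rebuild only when the health check fails) is inferred from the slide labels, and releasing the lock at the end is an assumption borrowed from the code deploy flow. The `ops` object and all of its method names are hypothetical.

```python
def disaster_recovery(dc, ops, use_snapshot=False):
    """Run soft recovery on one data center and return the steps taken.

    `ops` is a hypothetical object exposing one method per step.
    """
    steps = []

    def run(name, *args):
        steps.append(name)
        return getattr(ops, name)(dc, *args)

    run("acquire_global_lock")
    try:
        run("remove_from_lb")
        run("rolling_restart_solr")
        if not run("check_health"):
            # The restart was not enough: rebuild the data center from scratch.
            run("wipe_zookeeper_and_solr")
            run("install_zookeeper_and_solr")
            run("setup_global_files", use_snapshot)
            run("replicate_collections_with_haft", use_snapshot)
            if not run("smoke_test"):
                raise RuntimeError(f"{dc}: smoke test failed, DC stays out of LB")
        run("add_to_lb")
    finally:
        run("release_global_lock")
    return steps


class RecordingOps:
    """Stand-in for real tooling: every step succeeds except an optional failed health check."""

    def __init__(self, healthy):
        self.healthy = healthy

    def __getattr__(self, name):
        # Invoked only for attributes not found normally, i.e. the step names.
        def step(dc, *args):
            return self.healthy if name == "check_health" else True
        return step


happy_path = disaster_recovery("serving1", RecordingOps(healthy=True))
rebuild_path = disaster_recovery("serving1", RecordingOps(healthy=False), use_snapshot=True)
```

The `try`/`finally` ensures the lock is released even when the smoke test fails, so a later recovery attempt is not blocked.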
Active Monitor Service
•  We also use SPM (Sematext) to monitor JVM usage and CPU load on each Solr host
•  Runs every five minutes
•  Check if each Zookeeper node is accessible through the HAFT API.
If more than half of the Zookeeper nodes are down, page
•  Check if every Solr node is accessible and every collection on
that node can be queried. If a node is unhealthy, page
•  Checks all data centers
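The paging rules of the node-level monitor can be sketched as two pure functions over probe results. The input shapes and function names are assumptions. How the probes are actually collected (the slides mention the HAFT API) is outside this sketch.

```python
def zookeeper_needs_page(zk_up):
    """Page when more than half of the Zookeeper nodes are down.

    zk_up maps node name -> True if the node answered the probe.
    """
    down = sum(1 for ok in zk_up.values() if not ok)
    return down * 2 > len(zk_up)


def unhealthy_solr_nodes(solr_status):
    """Return nodes that are unreachable or host a collection that cannot be queried.

    solr_status maps node -> {"reachable": bool, "collections": {name: queryable}}.
    """
    bad = []
    for node, status in sorted(solr_status.items()):
        if not status["reachable"] or not all(status["collections"].values()):
            bad.append(node)
    return bad
```

With a three-node ensemble, one node down does not page while two nodes down does, which matches the "more than half" rule.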
Node level monitor
SolrCloud Production Cluster
Load Balancer
Serving1
dc
Serving2
dc
Others…
Auto Recovery Service
Active Monitor Service
Active Monitor Service
•  Runs every five minutes
•  Check if serving data centers with the
same app have same config files
•  Check if all datacenters have the same index
•  A failure of either check pages us
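The cluster-level consistency checks can be sketched as a comparison of per-app fingerprints across data centers. Two assumptions are made here: config files are compared by content hash, and "same index" is approximated by document count. The slides do not say how either comparison is implemented, and the input shape is hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(files):
    """Hash a {path: bytes} mapping into one digest, independent of dict order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def inconsistent_apps(dc_state):
    """Return apps whose config fingerprint or document count differs across DCs.

    dc_state maps dc name -> app name -> {"config": {path: bytes}, "num_docs": int}.
    """
    seen = defaultdict(set)
    for apps in dc_state.values():
        for app, state in apps.items():
            seen[app].add((fingerprint(state["config"]), state["num_docs"]))
    # More than one distinct (config, doc count) pair means the DCs disagree.
    return sorted(app for app, variants in seen.items() if len(variants) > 1)
```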
Cluster level monitor
SolrCloud Production Cluster
Active Monitor Service
Auto Recovery Service
HAFT
Serving1
app1
ZooKeeper
Serving1
app1
ZooKeeper
"test-tenant":{
"shards":{"shard1":{
"range":"80000000-7fffffff",
"state":"active",
"replicas":{"core_node1":{
"state":"active",
"base_url":"http://10.99.99.99:8983/solr",
"core":"test-tenant_shard1_replica1",
"node_name":"10.99.99.99:8983_solr",
"leader":"true"}}}},
"maxShardsPerNode":"500",
"router":{"name":"compositeId"},
"replicationFactor":"1"},
"test-tenant":{
"shards":{"shard1":{
"range":"80000000-7fffffff",
"state":"active",
"replicas":{"core_node1":{
"state":"active",
"base_url":"http://10.99.99.99:8983/solr",
"core":"test-tenant_shard1_replica1",
"node_name":"10.99.99.99:8983_solr",
"leader":"true"}}}},
"maxShardsPerNode":"500",
"router":{"name":"compositeId"},
"replicationFactor":"1"},
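The cluster state fragments shown for the `test-tenant` collection on the cluster-level monitor slide lend themselves to a mechanical comparison. The sketch below wraps one fragment in braces so it parses as JSON, then flattens and diffs replica states. The helper names `replica_states` and `state_diff` are hypothetical, and the slides do not say that the monitor works this way.

```python
import json

# The slide's fragment, wrapped in outer braces so it parses as a JSON object.
CLUSTER_STATE = json.loads("""
{"test-tenant":{
  "shards":{"shard1":{
    "range":"80000000-7fffffff",
    "state":"active",
    "replicas":{"core_node1":{
      "state":"active",
      "base_url":"http://10.99.99.99:8983/solr",
      "core":"test-tenant_shard1_replica1",
      "node_name":"10.99.99.99:8983_solr",
      "leader":"true"}}}},
  "maxShardsPerNode":"500",
  "router":{"name":"compositeId"},
  "replicationFactor":"1"}}
""")


def replica_states(cluster_state):
    """Flatten cluster state into {(collection, shard, replica): state}."""
    flat = {}
    for collection, info in cluster_state.items():
        for shard, shard_info in info["shards"].items():
            for replica, replica_info in shard_info["replicas"].items():
                flat[(collection, shard, replica)] = replica_info["state"]
    return flat


def state_diff(left, right):
    """Return keys whose replica state differs or that exist on one side only."""
    a, b = replica_states(left), replica_states(right)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```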
Auto Recovery Service (In Progress)
•  One ZK node is down
•  One Solr node is down
•  Serving1 and Serving2 have different numbers
of documents
•  Serving1 and Serving2 have different
versions of config files
•  Serving1 JVM usage is high
•  Serving1’s machines are not accessible
Active Monitor Service
•  Restart ZK, check if ZK is accessible
•  Restart Solr
•  Replicate from backup datacenter
•  Use versioned config files and the HAFT API
to recreate the files
•  Soft recovery or rollback
•  Hard recovery
Auto Recovery Service
Page Us Only When Automation Fails
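The detection-to-action pairing can be sketched as a lookup table with a paging fallback. The pairing below is inferred from the order of the two bullet lists on the slide. The enum, the `handle` function and its `run_action` and `page` hooks are hypothetical, and the service itself is described as still in progress.

```python
from enum import Enum


class Failure(Enum):
    ZK_NODE_DOWN = "one ZK node is down"
    SOLR_NODE_DOWN = "one Solr node is down"
    DOC_COUNT_MISMATCH = "serving DCs have different numbers of documents"
    CONFIG_MISMATCH = "serving DCs have different config file versions"
    HIGH_JVM_USAGE = "JVM usage is high"
    HOSTS_UNREACHABLE = "machines are not accessible"


# Pairing assumed from the order of the two lists on the slide.
RECOVERY_ACTION = {
    Failure.ZK_NODE_DOWN: "restart ZK and check that ZK is accessible",
    Failure.SOLR_NODE_DOWN: "restart Solr",
    Failure.DOC_COUNT_MISMATCH: "replicate from backup datacenter",
    Failure.CONFIG_MISMATCH: "recreate files from versioned configs via HAFT",
    Failure.HIGH_JVM_USAGE: "soft recovery or rollback",
    Failure.HOSTS_UNREACHABLE: "hard recovery",
}


def handle(failure, run_action, page):
    """Try the automated action; page a human only if it fails."""
    action = RECOVERY_ACTION[failure]
    try:
        recovered = bool(run_action(action))
    except Exception:
        recovered = False  # a crashing action counts as a failed recovery
    if not recovered:
        page(f"auto recovery failed for: {failure.value} (tried: {action})")
    return recovered
```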
Questions?

Mais conteúdo relacionado

Mais procurados

Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingOh Chan Kwon
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...StreamNative
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
Exadata MAA Best Practices
Exadata MAA Best PracticesExadata MAA Best Practices
Exadata MAA Best PracticesRui Sousa
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 
Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer
Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer  Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer
Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer Mary Kypreos
 
VM HA and Cross-Region Scaling
VM HA and Cross-Region ScalingVM HA and Cross-Region Scaling
VM HA and Cross-Region ScalingQiming Teng
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersDataWorks Summit
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...buildacloud
 
Revitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive StreamsRevitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive StreamsLightbend
 
Running a container cloud on YARN
Running a container cloud on YARNRunning a container cloud on YARN
Running a container cloud on YARNDataWorks Summit
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Tsuyoshi OZAWA
 
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...Zhijie Shen
 
Handling Kernel Upgrades at Scale - The Dirty Cow Story
Handling Kernel Upgrades at Scale - The Dirty Cow StoryHandling Kernel Upgrades at Scale - The Dirty Cow Story
Handling Kernel Upgrades at Scale - The Dirty Cow StoryDataWorks Summit
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 

Mais procurados (20)

Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Streaming ETL for All
Streaming ETL for AllStreaming ETL for All
Streaming ETL for All
 
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
Exadata MAA Best Practices
Exadata MAA Best PracticesExadata MAA Best Practices
Exadata MAA Best Practices
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer
Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer  Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer
Spark in YARN-managed Multi-tenant Clusters by Pravin Mittal and Rajesh Iyer
 
VM HA and Cross-Region Scaling
VM HA and Cross-Region ScalingVM HA and Cross-Region Scaling
VM HA and Cross-Region Scaling
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
 
Revitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive StreamsRevitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive Streams
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
Running a container cloud on YARN
Running a container cloud on YARNRunning a container cloud on YARN
Running a container cloud on YARN
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
 
Handling Kernel Upgrades at Scale - The Dirty Cow Story
Handling Kernel Upgrades at Scale - The Dirty Cow StoryHandling Kernel Upgrades at Scale - The Dirty Cow Story
Handling Kernel Upgrades at Scale - The Dirty Cow Story
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 

Semelhante a Automated Cluster Management and Recovery for Large Scale Multi-Tenant Search Infrastructure: Presented by Nitin Sharma & Li Ding, BloomReach

Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scaleOvais Tariq
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaLucidworks
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
Netflix0SS Services on Docker
Netflix0SS Services on DockerNetflix0SS Services on Docker
Netflix0SS Services on DockerDocker, Inc.
 
Ibm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalIbm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalaspyker
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...RightScale
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101Huy Vo
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement VMware Tanzu
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 
Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Cloudian
 

Semelhante a Automated Cluster Management and Recovery for Large Scale Multi-Tenant Search Infrastructure: Presented by Nitin Sharma & Li Ding, BloomReach (20)

Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Search Infrastructure: Presented by Nitin Sharma & Li Ding, BloomReach

  • 1. October 13-16, 2016 • Austin, TX
  • 3. Large Scale Cluster Management & Recovery for Multi-DC SolrCloud
  • 4. Abstract. Cluster management and recovery for an enterprise-grade global search infrastructure is non-trivial. •  Serving hundreds of millions of documents and queries •  Multi-tenant •  Geographically distributed data centers •  Custom SolrCloud components: analysis, ranking and faceting •  Dynamic ranking elements at the collection and cluster level. At BloomReach, we have built an innovative search architecture aimed at reliable cluster management and recovery. •  The infrastructure is data center based. •  Discovery service: DC metadata, roles, tenants. •  Real-time active monitoring: robust failure detection. •  Recovery service: one-step recovery, rollback and backup. The presentation will describe the infrastructure in detail and how it achieves availability and performance while keeping things simple from a platform management standpoint.
  • 5. About us. BloomReach is a Cloud Marketing Platform. We have developed a personalized discovery platform that features applications which analyze big data to make our customers’ digital content more discoverable, relevant and profitable. Nitin: I work on scaling the search platform for BloomReach’s big data. My relevant experience and background include scaling real-time services for latency-sensitive applications and building performance and search-quality metrics infrastructure for personalization platforms. Li: I am a member of technical staff on BloomReach's platform team. My background includes working on virtualization management platforms, building search performance infrastructure, and scaling distributed services.
  • 6. BloomReach’s Applications (content understanding). Organic Search: what it does: content optimization, management and measurement; benefit: enhanced discoverability and customer acquisition in organic search. SNAP: what it does: personalized onsite search and navigation across devices; benefit: relevant and consistent onsite experiences for new and known users. Compass: what it does: merchandising tool that understands products and identifies opportunities; benefit: prioritize and optimize online merchandising.
  • 7. Agenda •  History - Elastic Search Infrastructure •  Real Time Serving - Scaling & Availability Challenges •  Highly Available Multi-DC/Multi-Tenant Architecture •  Cluster Management Suite •  Replication & Ranking Config Management Service •  Deployment & Recovery Service •  Active Monitor Service •  Auto Recovery Service
  • 8. History - Elastic Solr Cloud Infrastructure
  • 9. SC2 Solr Compute Cloud Infrastructure – Large Scale Pipelines [diagram: ZooKeeper; Backend Solr; Elastic Clusters; Indexing Pipeline; Analysis Pipeline; HAFT; Serving Solr; arrows labeled provision, replicate, read/write]
  • 10. SC2 HAFT – Open Source: github.com/bloomreach/solrcloud-haft
  • 11. ZK Replicator – Open Source: github.com/bloomreach/zk-replicator
  • 13. Real Time Serving - Scaling & Availability Challenges
  • 14. Real Time Serving – Dynamic Elements [diagram: Load Balancer; Serving SolrCloud; ZooKeeper. Global entities: custom SolrCloud components & analyzers, global ranking configurations. Collection-level entities: ranking files, ranking configs]
  • 15. Real Time Serving - Scaling & Availability Challenges (global entities in ZooKeeper / Serving SolrCloud). Custom SolrCloud components & analyzers: •  Query parsers, analyzers •  Ranking & scoring components •  Non-optimal performance (latency, memory usage) •  Memory & file handle leaks •  Stale searchers left open. Global ranking configurations: •  Non-conforming configurations •  Size limit •  Solr startup issues
  • 16. Real Time Serving - Scaling & Availability Challenges (collection-level entities in ZooKeeper / Serving SolrCloud). Ranking files: •  Ranking elements are loaded once per core •  Collection reloads •  Non-optimal performance (latency, memory usage) •  Files are not versioned, so rollback is not supported •  External files are not sharded. Ranking configs: •  Loaded when the core initializes; a misconfiguration crashes cores. •  Hot swap of configs requires dynamic loading.
  • 17. Real Time Serving – Recovery Challenges [diagram: Load Balancer; Serving SolrCloud; ZooKeeper; global entities; collection-level entities]. Bad jar deployed: •  The cluster goes down. •  Restoring an older release takes time. Restarting 1000s of collections is unstable and could take hours. •  Serving is affected. Large ranking file: •  The file is unsharded, which increases the per-core memory requirement. •  Auto rollback to the previous version is non-trivial (if a new ranking file is produced by pipelines). •  Serving is affected; the cluster is no longer highly available.
  • 18. Real Time Serving – Multi Tenancy Challenges. What is a tenant? •  A tenant is a unique <app, collection> pair in SolrCloud. •  Unique collection type per app: [<ZK config, ranking, query patterns>] •  Index, config and cluster management strategies vary drastically. Recovery: Tenant 1: no dynamic config, static large index. Tenant 2: external ranking files, aggressive index refresh and customer-generated data. Tenant N: … [diagram: Load Balancer; Serving SolrCloud; ZooKeeper; Tenant 1, Tenant 2, Tenant n]
  • 19. Real Time Serving – Multi DC Challenges •  Every data center hosts only part of the tenants, based on geo. •  Adding a new geo-based DC requires a shared ZK ensemble. •  Selective collection placement is not possible. •  HA and latency guarantees are non-trivial. [diagram: multi-DC setup with EU, East and West; ZooKeeper (common)]
  • 22. Multi-DC/Multi-Tenant Architecture: Terminology. Solr Data Center: a logical group of Solr nodes with metadata; the metadata contains placement, role, replication factor, apps, etc. Solr Cluster: a logical grouping of Solr Data Centers. Replication API: replicates the index from elastic clusters onto all data centers. Ranking File Management API: uploads ranking files to serving data centers.
  • 23. Cluster Topology / Data Center Definition •  Where does the DC live? •  How many nodes? •  Name and type of DC •  Role of DC: Serve (behind the LB for API requests); Replicate (gets indexing updates) •  LB endpoint •  Tenants/apps •  …
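As a rough illustration, a data center definition carrying the fields listed on this slide might be modeled as follows. The talk does not show the actual schema, so every field name and example value here (including `lb.example.internal` and the region `east`) is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical schema: the field names are illustrative, not BloomReach's format.
@dataclass
class DataCenterDefinition:
    name: str          # name of the DC
    dc_type: str       # type of DC, e.g. "serving" or "backup" (assumed values)
    region: str        # where the DC lives
    num_nodes: int     # how many Solr nodes
    roles: List[str] = field(default_factory=list)  # "serve" and/or "replicate"
    lb_endpoint: str = ""                           # LB endpoint for serving DCs
    apps: List[str] = field(default_factory=list)   # tenants/apps hosted here

    def validate(self) -> None:
        """Reject definitions that are internally inconsistent."""
        unknown = set(self.roles) - {"serve", "replicate"}
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        if self.num_nodes <= 0:
            raise ValueError("a data center needs at least one node")
        if "serve" in self.roles and not self.lb_endpoint:
            raise ValueError("a serving DC needs an LB endpoint")


serving1 = DataCenterDefinition(
    name="serving1", dc_type="serving", region="east", num_nodes=4,
    roles=["serve", "replicate"], lb_endpoint="lb.example.internal",
    apps=["app1"],
)
serving1.validate()  # raises ValueError if the definition is inconsistent
```

Validating the definition before provisioning is a design choice of this sketch: it keeps a malformed definition from reaching the deployment step.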
  • 24. Automated Cluster Management Suite [diagram: Load Balancer; Solr Serving 1; Solr Serving 2; Solr Backup; Cluster Metadata; Cluster Management API; Replication API; Ranking Management API; Deployment/Recovery API; labels: R/W, Cluster Ops, HA Mode Setup]
  • 26. Replication Management API [diagram: Elastic Indexers; Index; Replication API; Cluster Management API; Cluster Metadata; Load Balancer; Solr Serving 1; Solr Serving 2; Solr Backup] Query = operation: Replicate? App: app1
  • 27. Ranking File Management API [diagram: Ranking Pipelines; ranking files; S3 (version files and store them); Ranking Management API; Cluster Management API; Cluster Metadata; Load Balancer; Solr Serving 1; Solr Serving 2; Solr Backup] Query = operation: Serve? App: app2
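Both APIs above ask the cluster metadata which data centers match an operation and an app. A minimal sketch of that lookup, assuming the metadata can be viewed as a mapping from DC name to roles and apps, could look like this. The `CLUSTER_METADATA` contents and the `select_datacenters` helper are hypothetical and not part of the talk.

```python
from typing import Dict, List

# Hypothetical cluster metadata: DC name -> roles and hosted apps.
CLUSTER_METADATA: Dict[str, dict] = {
    "serving1": {"roles": ["serve", "replicate"], "apps": ["app1", "app2"]},
    "serving2": {"roles": ["serve", "replicate"], "apps": ["app1"]},
    "backup": {"roles": ["replicate"], "apps": ["app1", "app2"]},
}


def select_datacenters(metadata: Dict[str, dict], operation: str, app: str) -> List[str]:
    """Return the DCs whose role includes `operation` and that host `app`."""
    return sorted(
        name
        for name, dc in metadata.items()
        if operation in dc["roles"] and app in dc["apps"]
    )


# "operation: Replicate? App: app1" -> every DC that receives index updates for app1
replicate_targets = select_datacenters(CLUSTER_METADATA, "replicate", "app1")
# "operation: Serve? App: app2" -> every DC that serves app2 behind the LB
serve_targets = select_datacenters(CLUSTER_METADATA, "serve", "app2")
```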
  • 29. Deployment/Recovery Service (Launch New Datacenter) •  Adding new DC to the cluster (Geo based) •  Adding temporary capacity for increased traffic. •  Expanding cluster capacity permanently.
  • 30. Deployment/Recovery Service (Launch New Datacenter) [diagram: Datacenter Definition Data (JSON); Deployment/Recovery Service; multi-threaded installation of ZK and Solr nodes for Serving3 (app1); Index; Cluster Metadata (store the config); SolrCloud Production Cluster behind the Load Balancer: Serving1 (app1), Serving2 (app1), Backup (app1, app2), others…] Smoke test: 1) every collection is queryable; 2) every collection has the config it is supposed to have.
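The two smoke-test conditions can be sketched as a small function. How a collection is queried and how its config name is read are injected as callables, because the talk does not show those calls; `smoke_test`, the probe functions and the collection and config names are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def smoke_test(
    expected_configs: Dict[str, str],
    is_queryable: Callable[[str], bool],
    get_config_name: Callable[[str], str],
) -> List[str]:
    """Return a list of failure messages; an empty list means the DC passed."""
    failures: List[str] = []
    for collection, expected in sorted(expected_configs.items()):
        # 1) Every collection is queryable.
        if not is_queryable(collection):
            failures.append(f"{collection}: not queryable")
            continue
        # 2) Every collection has the config it is supposed to have.
        actual = get_config_name(collection)
        if actual != expected:
            failures.append(f"{collection}: expected config {expected}, found {actual}")
    return failures


# Fake probes standing in for real Solr/ZooKeeper lookups.
expected = {"app1_collection": "app1_config", "app2_collection": "app2_config"}
live = {"app1_collection": "app1_config", "app2_collection": "stale_config"}
failures = smoke_test(
    expected,
    is_queryable=lambda c: c in live,
    get_config_name=lambda c: live[c],
)
```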
  • 31. Example of additional DC •  Where does the DC live? •  How many nodes? •  Name and type of DC •  Role of DC: Serve (behind the LB for API requests); Replicate (gets indexing updates) •  LB endpoint •  Tenants/apps •  …
  • 32. Deployment/Recovery Service (Hard Recovery) •  One or more hosts in a datacenter are down •  There are network issues with one datacenter •  AWS decides to retire some instances in a datacenter
  • 33. Deployment/Recovery Service (Hard Recovery) [diagram: SolrCloud Production Cluster; Load Balancer; Serving1 DC; Serving2 (app1); Deployment/Recovery Service; Cluster Metadata. Steps shown: retrieve config of serving2; provision hosts for the new serving2 using serving2’s config, the same way as creating a new DC; smoke test the new serving2; update config of the new serving2; add it back to the LB]
  • 34. Deployment/Recovery Service (Soft Recovery) •  One or more datacenters have high memory or CPU usage and do not respond to requests •  Several collections are down in a DC due to ZooKeeper state or other issues •  Code deployment: our customized components need a Solr restart to take effect
  • 35. Snapshots •  A snapshot service takes a snapshot of a serving DC every 24 hours. •  The snapshot contains global files (customized jar, ZooKeeper configs) and per-tenant files such as ranking files and synonyms. This is done through the HAFT API. •  The index is never snapshotted. •  The snapshot is timestamped and stored in S3. [diagram: SolrCloud Production Cluster; Load Balancer; Serving1 (app1); Serving2 (app1); Deployment/Recovery Service; HAFT; S3; using HAFT to take snapshots; store snapshot in S3 with timestamp] Base: s3://cluster/production/20151008155637. Global files (jar, ZK config): s3://cluster/production/20151008/jar, s3://cluster/production/20151008/zkconfig. Per-tenant files: s3://cluster/production/20151008/tenant1/ranking …
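A timestamped snapshot layout like the one on this slide can be derived mechanically. The sketch below builds the key prefixes only and performs no upload. It assumes all files sit under the full timestamp prefix, whereas the slide's sub-paths show a shorter date prefix, and the `snapshot_layout` helper and its dictionary keys are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def snapshot_layout(
    cluster: str, environment: str, taken_at: datetime, tenants: List[str]
) -> Dict[str, str]:
    """Build the S3 prefixes for one snapshot (the index is deliberately excluded)."""
    stamp = taken_at.strftime("%Y%m%d%H%M%S")
    base = f"s3://{cluster}/{environment}/{stamp}"
    layout = {
        "base": base,
        "jar": f"{base}/jar",            # global file: customized jar
        "zkconfig": f"{base}/zkconfig",  # global file: ZooKeeper configs
    }
    for tenant in tenants:
        # Per-tenant files such as ranking files.
        layout[f"{tenant}/ranking"] = f"{base}/{tenant}/ranking"
    return layout


layout = snapshot_layout(
    "cluster", "production", datetime(2015, 10, 8, 15, 56, 37), ["tenant1"]
)
```

Because the timestamp format sorts lexicographically, the most recent snapshot is simply the largest prefix, which makes picking a revert point straightforward.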
  • 36. Revert to Snapshots: When reverting a DC to a snapshot point, we use HAFT to replicate the index from the backup datacenter, and restore all external files, global config files and per-tenant files from the snapshot's S3 location.
  • 37. Code Deploy Mode – Soft Recovery [diagram: SolrCloud Production Cluster; Load Balancer; Serving1 DC; Serving2 DC; Deployment/Recovery Service; Cluster Metadata. Steps shown: take a global lock on the DC; get the DC’s config, ZK and Solr hosts, etc.; take the DC out of the LB; deploy code; release configs; run post-deployment tests; release the lock]
  • 38. Disaster Recovery Mode – Soft Recovery [diagram: SolrCloud Production Cluster; Load Balancer; Serving1 (app1); Serving2 (app1). Steps shown: get the global lock; take the DC out of the LB; rolling restart of Solr; check host and collection health; if the check passes, add the DC to the LB; otherwise delete all ZooKeeper and Solr files (wipe everything), install ZooKeeper and Solr, set up global files using current or snapshot files, use HAFT to replicate all collections with current or snapshot per-tenant files and indexes from backup, then smoke test]
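The disaster-recovery flow can be read as a small control loop: try a rolling restart first, and fall back to a full wipe and rebuild only if health checks fail. The sketch below uses a fake backend that only records step names. The step names, the `RecordingOps` class, releasing the lock at the end and re-adding the DC to the LB after a passing smoke test are assumptions, not details confirmed by the slide.

```python
from typing import List


class RecordingOps:
    """Fake operations backend that only records the steps taken."""

    def __init__(self, healthy_after_restart: bool, smoke_ok: bool = True):
        self.healthy_after_restart = healthy_after_restart
        self.smoke_ok = smoke_ok
        self.steps: List[str] = []

    def do(self, step: str) -> None:
        self.steps.append(step)

    def is_healthy(self) -> bool:
        self.steps.append("check_health")
        return self.healthy_after_restart

    def smoke_test(self) -> bool:
        self.steps.append("smoke_test")
        return self.smoke_ok


def soft_recover(ops: RecordingOps) -> bool:
    """Return True if the DC ends up back behind the load balancer."""
    ops.do("acquire_global_lock")
    try:
        ops.do("remove_from_lb")
        ops.do("rolling_restart_solr")
        if not ops.is_healthy():
            # Restart did not help: wipe and rebuild the DC from backup.
            ops.do("wipe_zookeeper_and_solr")
            ops.do("install_zookeeper_and_solr")
            ops.do("setup_global_files")
            ops.do("haft_replicate_collections")
            if not ops.smoke_test():
                return False  # keep the DC out of the LB
        ops.do("add_to_lb")
        return True
    finally:
        ops.do("release_global_lock")  # assumed: the lock is always released


quick = RecordingOps(healthy_after_restart=True)
rebuilt = RecordingOps(healthy_after_restart=False)
quick_ok = soft_recover(quick)
rebuilt_ok = soft_recover(rebuilt)
```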
  • 39. Agenda •  History - Elastic Search Infrastructure •  Real Time Serving - Scaling & Availability Challenges •  Highly Available Multi DC/Multi-Tenant Architecture •  Cluster Management Suite •  Replication & Ranking Config Mgmnt Service •  Deployment & Recovery Service •  Active Monitor Service •  Auto Recovery Service
  • 40. Active Monitor Service •  We also use SPM (Sematext) to monitor JVM usage and CPU load on each Solr host •  Runs every five minutes •  Checks whether each ZooKeeper node is accessible through the HAFT API. If more than half of the ZooKeeper nodes are down, page •  Checks whether every Solr node is accessible and every collection on that node can be queried. If a node is unhealthy, page •  Checks all data centers [Diagram: node-level monitor. SolrCloud Production Cluster (Load Balancer, Serving1 DC, Serving2 DC, others…), Active Monitor Service, Auto Recovery Service]
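The two node-level paging rules above can be sketched as plain functions over probe results. The function names and the shape of the input dictionaries are assumptions; the actual probing through the HAFT API is not shown on the slide and is left out here.

```python
def zk_should_page(reachable):
    """reachable maps ZooKeeper host -> bool. Page if more than half are down."""
    down = sum(1 for ok in reachable.values() if not ok)
    return down * 2 > len(reachable)


def unhealthy_solr_nodes(status):
    """status maps Solr host -> {collection: queryable} or None if unreachable.

    A node is unhealthy if it is unreachable or any of its collections
    cannot be queried.
    """
    bad = []
    for host, collections in status.items():
        if collections is None or not all(collections.values()):
            bad.append(host)
    return sorted(bad)


if __name__ == "__main__":
    print(zk_should_page({"zk1": True, "zk2": False, "zk3": False}))
    print(unhealthy_solr_nodes({"solr1": {"test-tenant": True}, "solr2": None}))
```

The "more than half" threshold mirrors ZooKeeper's majority quorum: an ensemble that has lost a majority of its nodes can no longer serve writes.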
  • 41. Active Monitor Service •  Runs every five minutes •  Checks whether serving data centers with the same app have the same config files •  Checks whether all data centers have the same index •  A failure of either check pages us [Diagram: cluster-level monitor. SolrCloud Production Cluster, Active Monitor Service, Auto Recovery Service, HAFT; the same cluster state is shown for both "Serving1 app1 ZooKeeper" boxes] "test-tenant":{ "shards":{"shard1":{ "range":"80000000-7fffffff", "state":"active", "replicas":{"core_node1":{ "state":"active", "base_url":"http://10.99.99.99:8983/solr", "core":"test-tenant_shard1_replica1", "node_name":"10.99.99.99:8983_solr", "leader":"true"}}}}, "maxShardsPerNode":"500", "router":{"name":"compositeId"}, "replicationFactor":"1"},
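A minimal sketch of the cluster-level comparison, assuming each data center can be summarized by its config files and a document count. The names `fingerprint` and `cluster_level_failures`, the input shape, and the use of document counts as the index comparison are assumptions; the slide does not say how index equality is actually checked.

```python
import hashlib


def fingerprint(files):
    """Order-independent digest of a {file name: bytes} mapping."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


def cluster_level_failures(dcs):
    """dcs maps DC name -> {"configs": {name: bytes}, "num_docs": int}.

    Returns the list of failed checks; an empty list means the DCs agree.
    """
    failures = []
    if len({fingerprint(dc["configs"]) for dc in dcs.values()}) > 1:
        failures.append("config_mismatch")
    if len({dc["num_docs"] for dc in dcs.values()}) > 1:
        failures.append("index_mismatch")
    return failures


if __name__ == "__main__":
    dcs = {
        "serving1": {"configs": {"synonyms.txt": b"tv,television"}, "num_docs": 100},
        "serving2": {"configs": {"synonyms.txt": b"tv,television"}, "num_docs": 99},
    }
    print(cluster_level_failures(dcs))
```

Hashing the configs keeps the comparison cheap across many tenants; any non-empty result would page, since either failed check does.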
  • 42. Agenda •  History - Elastic Search Infrastructure •  Real Time Serving - Scaling & Availability Challenges •  Highly Available Multi DC/Multi-Tenant Architecture •  Cluster Management Suite •  Replication & Ranking Config Mgmnt Service •  Deployment & Recovery Service •  Active Monitor Service •  Auto Recovery Service
  • 43. Auto Recovery Service (In Progress) Failure detected by the Active Monitor Service, followed by the Auto Recovery Service action: •  One ZK node is down: restart ZK, check that ZK is accessible •  One Solr node is down: restart Solr •  Serving1 and Serving2 have different numbers of documents: replicate from the backup data center •  Serving1 and Serving2 have different versions of config files: use versioned config files and the HAFT API to recreate the files •  Serving1 JVM usage is high: soft recovery or rollback •  Serving1's machines are not accessible: hard recovery •  Page us only when automation fails
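The failure-to-action pairing above lends itself to a lookup table with paging as the fallback. This is a sketch only: the failure and action names, `RECOVERY_ACTIONS`, and the `run_action` and `page` callables are all hypothetical and not taken from the service itself, which the slide marks as in progress.

```python
# Hypothetical failure names mapped to hypothetical recovery action names.
RECOVERY_ACTIONS = {
    "zk_node_down": "restart_zk",
    "solr_node_down": "restart_solr",
    "doc_count_mismatch": "replicate_from_backup",
    "config_version_mismatch": "recreate_configs_via_haft",
    "high_jvm_usage": "soft_recovery_or_rollback",
    "machines_unreachable": "hard_recovery",
}


def auto_recover(failure, run_action, page):
    """Run the mapped action; page only if there is none or it fails.

    run_action(action) returns True on success; page(failure) alerts a human.
    Returns the action that succeeded, or None if a page was sent.
    """
    action = RECOVERY_ACTIONS.get(failure)
    if action is not None and run_action(action):
        return action
    page(failure)
    return None


if __name__ == "__main__":
    pages = []
    print(auto_recover("solr_node_down", lambda action: True, pages.append))
    print(auto_recover("unknown_failure", lambda action: True, pages.append), pages)
```

Unknown failures fall through to paging, so a new failure mode degrades to the manual process rather than being silently ignored.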