Thoth
Real-time Solr Monitor
Search Analysis Engine
dbraga@trulia.com
pmhatre@trulia.com
Damiano Braga
Sr. Software Engineer
Praneet Mhatre
Data Mining Engineer
Overview
- What is Thoth?
- Data Collection and Thoth Core Indexing
- Thoth API & Thoth Dashboard
- Thoth Monitor
- Thoth ML: Prediction and Topic Modeling
- Special Thanks & Q/A
Demo
What is Thoth?
- Innovation project at Trulia
- Understand our search infrastructure without touching logs
- Troubleshoot search performance issues
- Designed as a modular system
- Set of tools to gather information about, monitor, and understand a search infrastructure
- Open source project:
thoth
thoth-ml
thoth-api
thoth-dashboard
thoth-monitor
thoth-demo
Problem: Know Your Search Infrastructure
- Solr logs are a good source, but the information is sometimes partial
- Decentralized data (at least 1 log per search server)
- Log rotation
- Not searchable
If we could index all the information... Let’s use Solr!
- We can search on it
- We have some handy features for free: facets, stats etc
- It’s scalable
Thoth Document
1 Solr Request = 1 Thoth (Solr) Document
Server Info
hostname, port number, core name, pool name
Query Info
timestamp, actual query, qtime, hits, exception?
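As an illustration of the one-request-one-document idea, here is a minimal Python sketch of such a record. The field names and sample values are assumptions made for this example; the slide lists the kinds of information stored but not the actual schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ThothDocument:
    """One Solr request captured as one document (hypothetical field names)."""
    # Server info
    hostname: str
    port: int
    core_name: str
    pool_name: str
    # Query info
    timestamp: str                    # ISO-8601, e.g. "2014-09-16T18:00:02Z"
    query: str                        # the actual query string
    qtime_ms: int                     # query time reported by Solr
    hits: int                         # number of matching documents
    exception: Optional[str] = None   # None when the request succeeded


# Made-up sample request, for illustration only.
doc = ThothDocument("search01", 8983, "collection", "default",
                    "2014-09-16T18:00:02Z", "q=*:*&rows=10", 12, 95)
print(asdict(doc))
```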
Data Collection (1/2)
- Should be smooth: no slowing down of search traffic
- We care about near real-time data
- We care about historical data
- Dataset is growing fast
- Interceptor on each search server
- We use a SolrComponent attached to a Request Handler
- Queue system (e.g. ActiveMQ) to facilitate and temporarily store messages
- Each search server has a manifest in its solrconfig.xml
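One way to keep the interceptor from slowing down traffic is a bounded in-memory buffer between the request path and the broker. The sketch below shows that general pattern in Python; it is not the actual solr2activemq code, and the class name, the drop-when-full policy and the `send` callback (a stand-in for publishing to ActiveMQ) are assumptions.

```python
import queue


class RequestBuffer:
    """Bounded buffer between the request path and the message broker."""

    def __init__(self, max_size=1000):
        self._q = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def offer(self, message):
        """Called on the request path: never blocks, drops the message when full."""
        try:
            self._q.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, send):
        """Called by a background poller: forward everything currently buffered."""
        sent = 0
        while True:
            try:
                message = self._q.get_nowait()
            except queue.Empty:
                return sent
            send(message)   # stand-in for publishing to the broker
            sent += 1


buf = RequestBuffer(max_size=2)
accepted = [buf.offer({"qtime": t}) for t in (5, 7, 9)]
delivered = []
count = buf.drain(delivered.append)
```

With a buffer of size 2, the third message is dropped rather than delaying the request; a real deployment would size the buffer and the polling interval to its own traffic.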
Data Collection (2/2)
<requestHandler name="select" class="com.solr2activemq.SolrToActiveMQHandler">
  <arr name="last-components">
    <str>solr2activemq</str>
  </arr>
</requestHandler>
<searchComponent name="solr2activemq" class="com.solr2activemq.SolrToActiveMQComponent">
  <str name="activemq-broker-uri">localhost</str>
  <int name="activemq-broker-port">61616</int>
  <str name="activemq-broker-destination-type">queue</str>
  <str name="activemq-broker-destination-name">test-queue</str>
  <str name="solr-hostname">localhost</str>
  <int name="solr-port">8983</int>
  <str name="solr-poolname">default</str>
  <str name="solr-corename">collection</str>
  <int name="solr2activemq-buffer-size">1000</int>
  <int name="solr2activemq-dequeuing-buffer-polling">500</int>
  <int name="solr2activemq-check-activemq-polling">5000</int>
</searchComponent>
Sizing of Data
- Need for granular information for near real-time data
- Less granularity for historical data
Too much data = slow searches and storage problems
- Shrinking feature:
- Create Shrank Document
- Real-time Core cleanup
- Shrinking time is configurable
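The shrinking idea can be sketched as an aggregation of per-request documents into one summary per server and time bucket. The Python below is only an illustration: the summary fields, the one-hour default bucket and the sample documents are assumptions, not the actual Thoth implementation.

```python
from collections import defaultdict
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def shrink(docs, bucket_minutes=60):
    """Collapse per-request documents into one summary per server and time bucket."""
    groups = defaultdict(list)
    for d in docs:
        ts = datetime.strptime(d["timestamp"], TS_FORMAT)
        # Round the time of day down to the start of its bucket.
        minute = (ts.hour * 60 + ts.minute) // bucket_minutes * bucket_minutes
        bucket = ts.replace(hour=minute // 60, minute=minute % 60, second=0)
        groups[(d["hostname"], bucket)].append(d)

    summaries = []
    for (hostname, bucket), items in sorted(groups.items()):
        qtimes = [i["qtime"] for i in items]
        summaries.append({
            "hostname": hostname,
            "bucket_start": bucket.strftime(TS_FORMAT),
            "nqueries": len(items),
            "avg_qtime": sum(qtimes) / len(qtimes),
            "max_qtime": max(qtimes),
            "zero_hits": sum(1 for i in items if i["hits"] == 0),
        })
    return summaries


# Made-up granular documents, for illustration only.
docs = [
    {"hostname": "search01", "timestamp": "2014-09-16T18:00:02Z", "qtime": 10, "hits": 3},
    {"hostname": "search01", "timestamp": "2014-09-16T18:45:02Z", "qtime": 30, "hits": 0},
    {"hostname": "search01", "timestamp": "2014-09-16T19:00:02Z", "qtime": 50, "hits": 7},
]
shrunk = shrink(docs)
```

Once the summaries are indexed, the granular documents they cover can be deleted from the real-time core, which is the cleanup step the slide mentions.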
Thoth Index
- Solr 4.7
- Soft commit for near real-time search
- Soft commit maxTime set to 1s
- Auto commit set to 15s
- Update chain set to enforce UUID as PkID
- Use of SolrJ to index and query data
Thoth API
- Abstraction for Thoth index and Thoth data
- Read-only REST-like API
- JSON response
- Written in Node.js to accommodate socket.io
Example:
{"numFound":95,"values":[
{"timestamp":"2014-09-16T18:00:02Z","value":45337},
{"timestamp":"2014-09-16T18:15:02Z","value":77325},
{"timestamp":"2014-09-16T18:30:02Z","value":109523},
{"timestamp":"2014-09-16T18:45:02Z","value":112279},
{"timestamp":"2014-09-16T19:00:02Z","value":115334}
thoth:3001/api/server/foo/core/bar/port/portbar/start/NOW-1DAY/end/NOW/count/nqueries
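A client of this API might build the URL and unpack the response roughly as follows. This Python sketch only mirrors the path layout and JSON shape of the example above; the function names are hypothetical, and the sample body is a hand-closed excerpt containing the first two data points (the `numFound` of 2 is adjusted to match the excerpt).

```python
import json


def count_url(base, server, core, port, start, end, attribute):
    """Build a count-style URL following the path layout of the example."""
    return (f"{base}/api/server/{server}/core/{core}/port/{port}"
            f"/start/{start}/end/{end}/count/{attribute}")


def parse_series(body):
    """Turn a response body into a list of (timestamp, value) pairs."""
    payload = json.loads(body)
    return [(point["timestamp"], point["value"]) for point in payload["values"]]


url = count_url("thoth:3001", "foo", "bar", "portbar", "NOW-1DAY", "NOW", "nqueries")

# Hand-closed excerpt of the example response, for illustration only.
sample_body = ('{"numFound":2,"values":['
               '{"timestamp":"2014-09-16T18:00:02Z","value":45337},'
               '{"timestamp":"2014-09-16T18:15:02Z","value":77325}]}')
series = parse_series(sample_body)
```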
Thoth Dashboard (1/5)
- Visual insight on Thoth data
- Useful graphs divided by server or pool
- Handy list of slow queries and exceptions
- Real-time view for server
- Selecting data based on time
- Shareable URLs (for the Ops team, QA team, Release Engineering)
Thoth Dashboard (2/5 to 5/5) [image-only slides]
Thoth Monitor
- Continuous monitoring of metrics
- Stateless
- Alerting through email or Nagios
- Examples: QTime, Number of Zero hits, Predictor Model Health
- Possibility to implement custom monitors
- Reuse StatsComponent [http://wiki.apache.org/solr/StatsComponent] if possible
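A stateless monitor can be sketched as a pure function from current statistics to an optional alert message. In the Python below, the `count` and `mean` keys follow the fields that Solr's StatsComponent reports; the thresholds, function names and sample numbers are assumptions for illustration.

```python
def check_qtime(stats, mean_threshold_ms=100.0):
    """Stateless check: all input arrives as arguments, nothing is remembered."""
    if stats["count"] == 0:
        return None
    if stats["mean"] > mean_threshold_ms:
        return f"QTime mean {stats['mean']:.1f} ms exceeds {mean_threshold_ms:.1f} ms"
    return None


def check_zero_hits(zero_hit_count, total_count, max_ratio=0.05):
    """Alert when the share of zero-hit queries is above max_ratio."""
    if total_count == 0:
        return None
    ratio = zero_hit_count / total_count
    if ratio > max_ratio:
        return f"Zero-hit ratio {ratio:.2%} exceeds {max_ratio:.2%}"
    return None


# Made-up numbers: mean QTime is too high, the zero-hit ratio is fine.
results = (check_qtime({"count": 200, "mean": 142.5}), check_zero_hits(4, 200))
alerts = [message for message in results if message]
```

Each non-empty message would then be handed to whatever alerting channel is configured, such as email or Nagios.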
Thoth ML
What can we do with all this data?
• Rich source of information
• Can we turn it into knowledge?
• How about machine learning?
1. Query time prediction
2. Query pattern recognition
3. Server sizing and resource allocation
1. Query Time Prediction (1/4)
• Goal: route each query to the appropriate pool (slow or fast)
• Look at query attributes
• Query text
• Start parameter
• Facets, range queries, geospatial searches, etc.
• Train a supervised learning model
• Use the learned model to predict whether a query will be slow or fast
• H2O Machine Learning Library
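The query attributes listed above could be turned into model features along these lines. This Python sketch assumes standard Solr request parameters (`q`, `fq`, `start`, `facet`) and the `{!geofilt}` / `{!bbox}` query parsers; the feature names and the routing helper are hypothetical, and the features actually used by Thoth ML are not listed on the slide.

```python
from urllib.parse import parse_qs


def extract_features(query_string):
    """Derive simple boolean and numeric features from a Solr query string."""
    params = parse_qs(query_string)
    q = params.get("q", [""])[0]
    fq = params.get("fq", [])
    clauses = [q] + fq
    return {
        "query_length": len(q),
        "start": int(params.get("start", ["0"])[0]),
        "has_facet": params.get("facet", ["false"])[0] == "true",
        "has_range_query": any(" TO " in c for c in clauses),
        "has_geo_filter": any("{!geofilt" in c or "{!bbox" in c for c in fq),
    }


def choose_pool(predicted_slow):
    """Route to the slow or fast pool based on the model's prediction."""
    return "slow" if predicted_slow else "fast"


features = extract_features(
    "q=price:[100000 TO 200000]&start=40&facet=true&fq={!geofilt}")
```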
1. Query Time Prediction (2/4)
Challenges
• Imbalanced dataset
• Frequency of model training
• Type of model
• Minimal delay requirement
1. Query Time Prediction (3/4)
Challenges Addressed
• Imbalanced dataset
• Stratified sampling
• Frequency of model training
• Auto identify relearning frequency
• Type of model
• Boolean, categorical features -> Tree based
• High accuracy
• Gradient Boosted Machine
• Minimal delay requirement
• User pool queries: 45-50 ms
• Prediction: 1-3 ms
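Stratified sampling for an imbalanced dataset can be sketched as drawing a sample from each class separately. The Python below uses equal allocation per class, which is one common variant; the slide does not say which allocation the authors used, and the 95/5 split in the sample data is made up.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, per_class, seed=0):
    """Draw up to `per_class` rows from each label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up imbalanced dataset: 95 fast queries, 5 slow ones.
rows = [{"slow": False, "qtime": 20}] * 95 + [{"slow": True, "qtime": 300}] * 5
balanced = stratified_sample(rows, "slow", per_class=5)
n_slow = sum(1 for r in balanced if r["slow"])
```

Sampling within each class guarantees that the rare slow queries are represented in the training set instead of being swamped by the fast majority.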
1. Query Time Prediction (4/4)
• 1000 Gradient Boosted Trees
• Slow query = QTime > 100 ms (configurable)
• Experimental Results
• Training on ~3.1 million
• Test on ~1.4 million
• AUC: 0.94542
• Accuracy: 0.9202223
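For readers unfamiliar with the two metrics, here is a small Python sketch of how accuracy and AUC are computed from labels and model scores. The toy labels and scores are made up and unrelated to the results above; in practice the ML library reports these values directly.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic and only
    meant for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = slow query, score = predicted probability of being slow.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]
acc = accuracy(labels, predicted)
area = auc(labels, scores)
```

Accuracy depends on the chosen cut-off (0.5 here), while AUC summarizes ranking quality across all cut-offs, which is why both are reported.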
Query Time Prediction in Action (1/2, 2/2)
Performance on real-time traffic at Trulia
2. Query Pattern Recognition
• Exceptions, zero hit queries
• Analyze and find out why
• Probabilistic Topic Modeling
• Using MALLET open source toolkit
Topic Modeling Flow
Topics With Keywords
Future Direction
- Thoth ML improvements:
• Predicting query time buckets
• Regression vs. classification
• Exceptions and zero hit query analysis
• Sizing and resource allocation
- SolrCloud integration
- Dashboard integration with SolrCloud
- More standard metrics on Thoth Monitor
- More data collection (load, GC)
Contributors and Special Thanks
Damiano: dbraga@trulia.com
Praneet: pmhatre@trulia.com
Fork us on GitHub!
github.com/trulia/thoth
JD Cantrell (API, Dashboard)
Giulio Grillanda (API, Dashboard)
Rajendra Shioramwar (Core)
Ying Wang (Design)
Girish Gudla (Monitor)
Alexander Kanarsky
Alex Burmester