SlideShare uma empresa Scribd logo
1 de 16
Baixar para ler offline
O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
SearchHub: How to Spend Your Summer Keeping it Real
Grant Ingersoll
CTO, Lucidworks
3
01
SearchHub Demo
github.com/lucidworks/searchhub
http://searchhub.lucidworks.com
4
02
SearchHub Details
• Basics:
• 37 Apache Projects registered so far plus LW properties, opensource.com, Stack Overflow
• 130 datasources* including email, Github, JIRA*, Website and Wiki
• Fusion 2.4.2
• Signals everywhere
• UI based on View (work not complete)
• ASF Mail archives mirrored at: http://asfmail.lucidworks.io
5
03
Goals
• Company:
• “LucidFind” aka SearchHub on Fusion
• Provide backend for LW.com search, including
docs and support
• Real, production, living, breathing instance of
Fusion that we control
• Fusion best practices demo of major use cases
• CTO Office
• Real data, including clicks
• Platform for machine learning and experimentation
• Demos and talks
6
01
Agenda
• Quick Intro to Fusion and SearchHub
• Fusion Configuration, UI, Middle Tier
• Data Acquisition
• Deployment
• Signals and Machine Learning
• Next Steps
7
Drive next generation relevance
via Content, Collaboration and
Context
Built on best in class Open Source:
Apache Solr + Spark
Simplify application development
and reduce ongoing maintenance
Access data from
anywhere to build
intelligent, data-
driven applications.
Fusion in a Nutshell
8
01
Fusion
SECURITY BUILT-IN
Shards Shards
Apache Solr
Apache Zookeeper
ZK 1
Leader
Election
Load
Balancing
ZK N
Shared Config
Management
Worker Worker
Apache Spark
Cluster
Manager
Core Services
• • •
NLP
Recommenders / Signals
Blob Storage
Pipelines
Scheduling
Alerting / Messaging
Connectors
RESTAPI
Admin UI
Lucidworks
View
HDFS(Optional)
LOGS FILE WEB DATABASE CLOUD HADOOP
9
01
Fusion Configuration, UI and Middle Tier
• UI
• Derivative of Lucidworks View (https://lucidworks.com/products/view/)
• Deep integration of Snowplow Javascript Tracker (https://github.com/
snowplow/snowplow/wiki/javascript-tracker)
• Python Flask middle tier ($SEARCHHUB_HOME/python)
• Data sources (project_config)
• Pipelines (fusion_config)
• Schedules (fusion_config)
10
01
Data Acquisition
• Sources:
• ASF Mail archives mirrored at: http://asfmail.lucidworks.io
• Stack Overflow (SO)
• Github
• Processing
• Pipelines, including custom stage for parsing mail
• Main Challenges:
• “fail2ban” by the ASF
• Focused crawling of SO — JSoup FTW! (try.jsoup.org)
• Mail Threads
11
01
Deployment
• Client and Middle Tier run in a Docker container using Apache HTTPd and mod_wsgi
• Hosted on AWS (m4.2xls)
• Fusion backend is OOTB 2.4.2 with extra memory for Connectors and Solr
• README has the gory details: https://github.com/lucidworks/searchhub/blob/master/README.md
12
01
Signals
• UI is fully instrumented, using Snowplow Javascript Tracker, for most
user interactions. See SnowplowService.js
• Captures, amongst other things:
• User Id, Session Id, Unique Query Id, IP address, Location, Timing
data
• Actions tracked:
• Page View
• Page Ping (heartbeat) every 30 seconds
• Search with query, displayed doc list and displayed facet list
• Clicks with query, doc id, position, score and query UUID
• Typeahead Clicks with characters typed and suggestions offered
13
01
Machine Learning
• Fusion makes it easy to “round-trip” ML data/models between Spark and Solr
• Examples of:
• Recommenders
• Spark Lucene tokenization
• k-Means
• Word2Vec
• Topic Detection (LDA)
• Random Forests Classifier
• Many examples SparkShellHelpers.scala
14
Experiment Management and Bandits
Get Started
• Goal: Experimentation, not hard coded rules*
• Goal: Drive down the cost of experimentation
• “A/B testing on steroids”
• Exploration vs. Exploitation
• Fusion 3.0 (beta):
• Record and calculate relevance metrics from w/in Fusion (gold
standard, TREC, other)
• Easily calculate MRR, NDCG, Precision, Recall and report over time
• Support for Bandits: Greedy Epsilon, SoftMax, UCB1
15
Demo
16
01
Still Hungry?
• “Combining Content and Collaboration in Recommenders” by Jake Mannix:
Friday at 1:10 pm http://sched.co/7amt
• https://github.com/lucidworks/searchhub
• http://searchhub.lucidworks.com
•Email: grant@lucidworks.com
•Twitter: @gsingers
•Web: http://lucidworks.com

Mais conteúdo relacionado

Mais procurados

Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaWhy Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaLucidworks
 
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksLucidworks
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksLucidworks
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLucidworks
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Lucidworks
 
The ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsThe ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsItamar
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Shalin Shekhar Mangar
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchMark Miller
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 
Open Source Logging and Monitoring Tools
Open Source Logging and Monitoring ToolsOpen Source Logging and Monitoring Tools
Open Source Logging and Monitoring ToolsPhase2
 
From Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityFrom Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityStéphane Gamard
 
The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!Michele Leroux Bustamante
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Lucidworks
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionLucidworks
 

Mais procurados (20)

Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaWhy Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
The ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsThe ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch plugins
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 
Open Source Logging and Monitoring Tools
Open Source Logging and Monitoring ToolsOpen Source Logging and Monitoring Tools
Open Source Logging and Monitoring Tools
 
From Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityFrom Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalability
 
The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks Fusion
 

Semelhante a How to Spend Your Summer Keeping it Real with SearchHub

Resumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUSResumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUSkhangtoh
 
Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...
Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...
Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...Lucidworks
 
Luigi presentation OA Summit
Luigi presentation OA SummitLuigi presentation OA Summit
Luigi presentation OA SummitOpen Analytics
 
The Elastic Stack as a SIEM
The Elastic Stack as a SIEMThe Elastic Stack as a SIEM
The Elastic Stack as a SIEMJohn Hubbard
 
Unicon June 2014 IAM Briefing
Unicon June 2014 IAM BriefingUnicon June 2014 IAM Briefing
Unicon June 2014 IAM BriefingJohn Gasper
 
JavaScript Libraries (Ajax Exp 2006)
JavaScript Libraries (Ajax Exp 2006)JavaScript Libraries (Ajax Exp 2006)
JavaScript Libraries (Ajax Exp 2006)jeresig
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Spring Roo Add-On Development & Distribution
Spring Roo Add-On Development & DistributionSpring Roo Add-On Development & Distribution
Spring Roo Add-On Development & DistributionStefan Schmidt
 
CNIT 121: 14 Investigating Applications
CNIT 121: 14 Investigating ApplicationsCNIT 121: 14 Investigating Applications
CNIT 121: 14 Investigating ApplicationsSam Bowne
 
Middleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeMiddleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeCale Hoopes
 
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilitiesVorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilitiesDefconRussia
 
JavaScript Library Overview (Ajax Exp West 2007)
JavaScript Library Overview (Ajax Exp West 2007)JavaScript Library Overview (Ajax Exp West 2007)
JavaScript Library Overview (Ajax Exp West 2007)jeresig
 
Search onhadoopsfhug081413
Search onhadoopsfhug081413Search onhadoopsfhug081413
Search onhadoopsfhug081413gregchanan
 
DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...
DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...
DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...DevOpsDays Houston
 
Utilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIUtilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIOliver Busse
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalogmarkgrover
 

Semelhante a How to Spend Your Summer Keeping it Real with SearchHub (20)

Resumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUSResumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUS
 
Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...
Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...
Challenges of Simple Documents: When Basic isn't so Basic - Cassandra Targett...
 
Burp suite
Burp suiteBurp suite
Burp suite
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Luigi presentation OA Summit
Luigi presentation OA SummitLuigi presentation OA Summit
Luigi presentation OA Summit
 
The Elastic Stack as a SIEM
The Elastic Stack as a SIEMThe Elastic Stack as a SIEM
The Elastic Stack as a SIEM
 
Unicon June 2014 IAM Briefing
Unicon June 2014 IAM BriefingUnicon June 2014 IAM Briefing
Unicon June 2014 IAM Briefing
 
JavaScript Libraries (Ajax Exp 2006)
JavaScript Libraries (Ajax Exp 2006)JavaScript Libraries (Ajax Exp 2006)
JavaScript Libraries (Ajax Exp 2006)
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Spring Roo Add-On Development & Distribution
Spring Roo Add-On Development & DistributionSpring Roo Add-On Development & Distribution
Spring Roo Add-On Development & Distribution
 
CNIT 121: 14 Investigating Applications
CNIT 121: 14 Investigating ApplicationsCNIT 121: 14 Investigating Applications
CNIT 121: 14 Investigating Applications
 
Middleware in Golang: InVision's Rye
Middleware in Golang: InVision's RyeMiddleware in Golang: InVision's Rye
Middleware in Golang: InVision's Rye
 
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilitiesVorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilities
 
JavaScript Library Overview (Ajax Exp West 2007)
JavaScript Library Overview (Ajax Exp West 2007)JavaScript Library Overview (Ajax Exp West 2007)
JavaScript Library Overview (Ajax Exp West 2007)
 
Search onhadoopsfhug081413
Search onhadoopsfhug081413Search onhadoopsfhug081413
Search onhadoopsfhug081413
 
Apereo OAE - Bootcamp
Apereo OAE - BootcampApereo OAE - Bootcamp
Apereo OAE - Bootcamp
 
DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...
DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...
DevOpsDays Houston 2019 - Shaun Ladewig, Robert Stone - From OverTheWallOps t...
 
Utilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIUtilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino API
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalog
 
Into The Box 2015 Keynote
Into The Box 2015 KeynoteInto The Box 2015 Keynote
Into The Box 2015 Keynote
 

Mais de Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Mais de Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Último (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

How to Spend Your Summer Keeping it Real with SearchHub

  • 1. O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
  • 2. SearchHub: How to Spend Your Summer Keeping it Real Grant Ingersoll CTO, Lucidworks
  • 4. 4 02 SearchHub Details • Basics: • 37 Apache Projects registered so far plus LW properties, opensource.com, Stack Overflow • 130 datasources* including email, Github, JIRA*, Website and Wiki • Fusion 2.4.2 • Signals everywhere • UI based on View (work not complete) • ASF Mail archives mirrored at: http://asfmail.lucidworks.io
  • 5. 5 03 Goals • Company: • “LucidFind” aka SearchHub on Fusion • Provide backend for LW.com search, including docs and support • Real, production, living, breathing instance of Fusion that we control • Fusion best practices demo of major use cases • CTO Office • Real data, including clicks • Platform for machine learning and experimentation • Demos and talks
  • 6. 6 01 Agenda • Quick Intro to Fusion and SearchHub • Fusion Configuration, UI, Middle Tier • Data Acquisition • Deployment • Signals and Machine Learning • Next Steps
  • 7. 7 Drive next generation relevance via Content, Collaboration and Context Built on best in class Open Source: Apache Solr + Spark Simplify application development and reduce ongoing maintenance Access data from anywhere to build intelligent, data- driven applications. Fusion in a Nutshell
  • 8. 8 01 Fusion SECURITY BUILT-IN Shards Shards Apache Solr Apache Zookeeper ZK 1 Leader Election Load Balancing ZK N Shared Config Management Worker Worker Apache Spark Cluster Manager Core Services • • • NLP Recommenders / Signals Blob Storage Pipelines Scheduling Alerting / Messaging Connectors RESTAPI Admin UI Lucidworks View HDFS(Optional) LOGS FILE WEB DATABASE CLOUD HADOOP
  • 9. 9 01 Fusion Configuration, UI and Middle Tier • UI • Derivative of Lucidworks View (https://lucidworks.com/products/view/) • Deep integration of Snowplow Javascript Tracker (https://github.com/ snowplow/snowplow/wiki/javascript-tracker) • Python Flask middle tier ($SEARCHHUB_HOME/python) • Data sources (project_config) • Pipelines (fusion_config) • Schedules (fusion_config)
  • 10. 10 01 Data Acquisition • Sources: • ASF Mail archives mirrored at: http://asfmail.lucidworks.io • Stack Overflow (SO) • Github • Processing • Pipelines, including custom stage for parsing mail • Main Challenges: • “fail2ban” by the ASF • Focused crawling of SO — JSoup FTW! (try.jsoup.org) • Mail Threads
  • 11. 11 01 Deployment • Client and Middle Tier run in a Docker container using Apache HTTPd and mod_wsgi • Hosted on AWS (m4.2xls) • Fusion backend is OOTB 2.4.2 with extra memory for Connectors and Solr • README has the gory details: https://github.com/lucidworks/searchhub/blob/master/README.md
  • 12. 12 01 Signals • UI is fully instrumented, using Snowplow Javascript Tracker, for most user interactions. See SnowplowService.js • Captures, amongst other things: • User Id, Session Id, Unique Query Id, IP address, Location, Timing data • Actions tracked: • Page View • Page Ping (heartbeat) every 30 seconds • Search with query, displayed doc list and displayed facet list • Clicks with query, doc id, position, score and query UUID • Typeahead Clicks with characters typed and suggestions offered
  • 13. 13 01 Machine Learning • Fusion makes it easy to “round-trip” ML data/models between Spark and Solr • Examples of: • Recommenders • Spark Lucene tokenization • k-Means • Word2Vec • Topic Detection (LDA) • Random Forests Classifier • Many examples SparkShellHelpers.scala
  • 14. 14 Experiment Management and Bandits Get Started • Goal: Experimentation, not hard coded rules* • Goal: Drive down the cost of experimentation • “A/B testing on steroids” • Exploration vs. Exploitation • Fusion 3.0 (beta): • Record and calculate relevance metrics from w/in Fusion (gold standard, TREC, other) • Easily calculate MRR, NDCG, Precision, Recall and report over time • Support for Bandits: Greedy Epsilon, SoftMax, UCB1
  • 16. 16 01 Still Hungry? • “Combining Content and Collaboration in Recommenders” by Jake Mannix: Friday at 1:10 pm http://sched.co/7amt • https://github.com/lucidworks/searchhub • http://searchhub.lucidworks.com •Email: grant@lucidworks.com •Twitter: @gsingers •Web: http://lucidworks.com