SlideShare uma empresa Scribd logo
1 de 30
Lucene Roadmap
Steve Rowe
LucidWorks
• 1997: Doug Cutting creates Lucene
• 2000-2001: SourceForge hosts Lucene
• 2001-present: Lucene @ Apache Software Foundation
• 2006: Flexible indexing planning starts
• 2007: Solr graduates from the Apache Incubator to join the Lucene PMC as a sub-project
• 2008: Flexible indexing implementation begins
• 2010: Lucene and Solr development merge
• 2011: Lucene and Solr 3.1 and all further releases coordinated (13 joint releases so far)
• 2012: Lucene/Solr 4.0 released
Some Lucene (& Solr) History & Stats
Lucene 4.0 Highlights
• Flexible indexing: pluggable codecs: index format suites
• Flexible scoring: more index stats & similarities that use them
• Faster multithreaded indexing via concurrent flushing: DWPT
• Doc Values: typed single-valued fields: flexible sorting, scoring
• Norms are now doc values: you can have more than one byte!
• More RAM efficient data structures, e.g. terms dict/idx & fieldcache
• Faster search filtering
• Merge I/O can be rate-limited, to reduce I/O contention
• IndexReader is now per-segment
• Completely reworked spatial search
Lucene 4.1 & 4.2 Highlights
• Seeks on writing out index files eliminated
• Compressed stored fields and term vectors
• AnalyzingSuggester and FuzzySuggester
• Lucene facet module improvements: speedups, NRT
support, DrillSideways
• PostingsHighlighter: uses postings offsets
• CommonTermsQuery: speed up queries with very highly
frequent terms.
• Doc Values API and performance improvements
• The FST package supports FSTs over 2GB in size
• LiveFieldValues: real-time get for Lucene
• New classification module
Lucene 4.3 Highlights
• minShouldMatch BooleanQuery major performance
improvement
• SortingAtomicReader and SortingMergePolicy
• DocIdSetIterator and Scorer now has a cost API
• Analyzing/FuzzySuggester now enable recording an
arbitrary byte[] as a payload
• Spatial module: support for query relations Within,
Contains, and Disjoint
• Facet module: new method computes facet counts
using SortedSetDocValuesField, without a separate
taxonomy index.
On the horizon
• More efficient positional queries
• Incremental field updates
• Korean Analyzer
Solr Dev/User Survey Results
Solr Developer/User survey, April 2013
• Survey invitation emailed to 4,136 people:
– LucidWorks training class attendees
– Revolution attendees
– LucidWorks webinar registrants
• 177 have responded so far
Please rank the following features by priority
Answered: 165 Skipped: 12
More questions
1. How many attendees are Eclipse developers?
2. How many attendees are running Solr Cloud
in production?
Solr: Past, Present & Future
Yonik Seeley
LucidWorks
Origins of Solr
• CNET driven to find alternatives to discontinued
commercial enterprise search product
• Plan A: ATOMICS (Apache TO MySQL In CNET
Search)
– Standalone server speaking XML over HTTP
– Meet majority of “search” needs
– http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
• Plan B: “Something based on Lucene”
– Started Summer 2004
– First prototype called “Fusion”, later renamed SOLAR
(Search On Lucene And Resin)
Origins of the first Solr admin UI
New admin UI
Timeline
(up to 1.4)
Initial
prototype
CNET
production
CNET
contributes
Solr to ASF
Solr
graduates
from
Incubator
Simple
faceting
replication
highlighting,
dismax
Spellchecking
, CSV, Luke
MLT, Update
Request
Processors
QParsers Search
Components
Multi-core
Distributed
Search
Data Import
Handler
JMX
1.3
1.4
Statistics
Component
Java
Replication
Terms and
TermVector
Components
Multi-select
faceting
Dynamic
Clustering
1.1
1.0
1.2
4.0
3.1
Solr 4
• Solr Cloud
– Distributed Indexing
– No single points of failure
– Near Real Time friendly (push replication)
• NoSQL feature set
– Update Durability
– Real-time get
– Atomic Updates
– Optimistic Concurrency
• Pseudo-join, Pivot Faceting, Pseudo-fields, etc
What search solution/version are you
currently using?
Recent Enhancements
Document Routing
80000000-bfffffff
00000000-3fffffff
40000000-
7fffffff
c0000000-ffffffff
shard1shard4
shard3 shard2
id = BigCo!doc5
1f27 3c71
(MurmurHash3)
q=my_query
shard.keys=BigCo!
1f27 0000 1f27 ffffto
(hash)
shard1
numShards=4
router=compositeId
Seamless Online Shard Splitting
Shard2_0
Shard1
replic
a
leader
Shard2
replic
a
leader
Shard3
replic
a
leader
Shard2_1
1. New sub-shards created in “construction” state
2. Leader starts forwarding applicable updates, which
are buffered by the sub-shards
3. Leader index is split and installed on the sub-shards
4. Sub-shards apply buffered updates then become
“active” leaders and old shard becomes “inactive”
update
Cloud Enhancements
• Request forwarding
– In a multi-collection cluster, any node can
handle/forward requests for any collection
• Collection Aliases
http://localhost:8983/solr/admin/collections
?action=CREATEALIAS
&name=northeast
&collections=NY,NJ,PA,CT,ME,MA,NH,RI,VT
• Coming Soon: Shard Aliases
Schema REST API
• Restlet is now integrated with Solr
• Get a specific field
curl http://localhost:8983/solr/schema/fields/price
{"field":{
"name":"price",
"type":"float",
"indexed":true,
"stored":true }}
• Get all fields
curl http://localhost:8983/solr/schema/fields
• Get Entire Schema!
curl http://localhost:8983/solr/schema
Dynamic Schema
• Add a new field (Solr 4.4)
curl -XPUT http://localhost:8983/solr/schema/fields/strength -d ‘
{"type":”float", "indexed":"true”}
‘
• Works in distributed (cloud) mode too!
• Future: More schemaless
– Reality: there is no such thing for Lucene based systems
– Type guessing for fields we haven’t seen before
Future
• Greater scalability
• More “NoSQL”
– More ways to update & manipulate documents
• Analytics
– More powerful faceting, functions, statistics
• Improved Relational queries
• More dynamic (settings & configuration)
• Continued focus on ease of use
Thank You!

Mais conteúdo relacionado

Mais procurados

Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Lucidworks
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark Summit
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMLucidworks
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaLucidworks
 
Search UI and Lucidworks View: Presented by Josh Ellinger, Lucidworks
Search UI and Lucidworks View: Presented by Josh Ellinger, LucidworksSearch UI and Lucidworks View: Presented by Josh Ellinger, Lucidworks
Search UI and Lucidworks View: Presented by Josh Ellinger, LucidworksLucidworks
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
Adding Search to the Hadoop Ecosystem
Adding Search to the Hadoop EcosystemAdding Search to the Hadoop Ecosystem
Adding Search to the Hadoop EcosystemCloudera, Inc.
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1Stefan Schmidt
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Apache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingApache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingShivji Kumar Jha
 
2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei ZahariaDatabricks
 
Unite 2017 - Elastic Eearch - Bart Wullems
Unite 2017 - Elastic Eearch - Bart WullemsUnite 2017 - Elastic Eearch - Bart Wullems
Unite 2017 - Elastic Eearch - Bart WullemsN Core
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...Databricks
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupShalin Shekhar Mangar
 
Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseCecile Le Pape
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache SparkOren Raboy
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan BlueDatabricks
 

Mais procurados (20)

Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
 
Search UI and Lucidworks View: Presented by Josh Ellinger, Lucidworks
Search UI and Lucidworks View: Presented by Josh Ellinger, LucidworksSearch UI and Lucidworks View: Presented by Josh Ellinger, Lucidworks
Search UI and Lucidworks View: Presented by Josh Ellinger, Lucidworks
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
Adding Search to the Hadoop Ecosystem
Adding Search to the Hadoop EcosystemAdding Search to the Hadoop Ecosystem
Adding Search to the Hadoop Ecosystem
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Apache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingApache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data Streaming
 
2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Unite 2017 - Elastic Eearch - Bart Wullems
Unite 2017 - Elastic Eearch - Bart WullemsUnite 2017 - Elastic Eearch - Bart Wullems
Unite 2017 - Elastic Eearch - Bart Wullems
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene Meetup
 
Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and Couchbase
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache Spark
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan Blue
 

Semelhante a KEYNOTE: Lucene / Solr road map

Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetuprcmuir
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Improved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert MuirImproved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert Muirlucenerevolution
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersLucidworks
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Fusion on Kubernetes - Alan Eugenio & Joe Streeky, Lucidworks
Fusion on Kubernetes - Alan Eugenio & Joe Streeky, LucidworksFusion on Kubernetes - Alan Eugenio & Joe Streeky, Lucidworks
Fusion on Kubernetes - Alan Eugenio & Joe Streeky, LucidworksLucidworks
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scaleRan Levy
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCampGokulD
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2GokulD
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Lucidworks
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Grant Ingersoll
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_dbhyeongchae lee
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Josh Patterson
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
CCI2019 - Monitorare SQL Server Senza Andare in Bancarotta
CCI2019 - Monitorare SQL Server Senza Andare in BancarottaCCI2019 - Monitorare SQL Server Senza Andare in Bancarotta
CCI2019 - Monitorare SQL Server Senza Andare in Bancarottawalk2talk srl
 
DSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
DSD-INT 2020 Scripting a Delft-FEWS configuration - VerkadeDSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
DSD-INT 2020 Scripting a Delft-FEWS configuration - VerkadeDeltares
 

Semelhante a KEYNOTE: Lucene / Solr road map (20)

Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
 
Solr 4
Solr 4Solr 4
Solr 4
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
Improved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert MuirImproved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert Muir
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Fusion on Kubernetes - Alan Eugenio & Joe Streeky, Lucidworks
Fusion on Kubernetes - Alan Eugenio & Joe Streeky, LucidworksFusion on Kubernetes - Alan Eugenio & Joe Streeky, Lucidworks
Fusion on Kubernetes - Alan Eugenio & Joe Streeky, Lucidworks
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scale
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCamp
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
CCI2019 - Monitorare SQL Server Senza Andare in Bancarotta
CCI2019 - Monitorare SQL Server Senza Andare in BancarottaCCI2019 - Monitorare SQL Server Senza Andare in Bancarotta
CCI2019 - Monitorare SQL Server Senza Andare in Bancarotta
 
DSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
DSD-INT 2020 Scripting a Delft-FEWS configuration - VerkadeDSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
DSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
 

Mais de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Mais de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Último

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

KEYNOTE: Lucene / Solr road map

  • 2. • 1997: Doug Cutting creates Lucene • 2000-2001: SourceForge hosts Lucene • 2001-present: Lucene @ Apache Software Foundation • 2006: Flexible indexing planning starts • 2007: Solr graduates from the Apache Incubator to join the Lucene PMC as a sub-project • 2008: Flexible indexing implementation begins • 2010: Lucene and Solr development merge • 2011: Lucene and Solr 3.1 and all further releases coordinated (13 joint releases so far) • 2012: Lucene/Solr 4.0 released Some Lucene (& Solr) History & Stats
  • 3. Lucene 4.0 Highlights • Flexible indexing: pluggable codecs: index format suites • Flexible scoring: more index stats & similarities that use them • Faster multithreaded indexing via concurrent flushing: DWPT • Doc Values: typed single-valued fields: flexible sorting, scoring • Norms are now doc values: you can have more than one byte! • More RAM efficient data structures, e.g. terms dict/idx & fieldcache • Faster search filtering • Merge I/O can be rate-limited, to reduce I/O contention • IndexReader is now per-segment • Completely reworked spatial search
  • 4. Lucene 4.1 & 4.2 Highlights • Seeks on writing out index files eliminated • Compressed stored fields and term vectors • AnalyzingSuggester and FuzzySuggester • Lucene facet module improvements: speedups, NRT support, DrillSideways • PostingsHighlighter: uses postings offsets • CommonTermsQuery: speed up queries with very highly frequent terms. • Doc Values API and performance improvements • The FST package supports FSTs over 2GB in size • LiveFieldValues: real-time get for Lucene • New classification module
  • 5. Lucene 4.3 Highlights • minShouldMatch BooleanQuery major performance improvement • SortingAtomicReader and SortingMergePolicy • DocIdSetIterator and Scorer now has a cost API • Analyzing/FuzzySuggester now enable recording an arbitrary byte[] as a payload • Spatial module: support for query relations Within, Contains, and Disjoint • Facet module: new method computes facet counts using SortedSetDocValuesField, without a separate taxonomy index.
  • 6. On the horizon • More efficient positional queries • Incremental field updates • Korean Analyzer
  • 8. Solr Developer/User survey, April 2013 • Survey invitation emailed to 4,136 people: – LucidWorks training class attendees – Revolution attendees – LucidWorks webinar registrants • 177 have responded so far
  • 9. Please rank the following features by priority Answered: 165 Skipped: 12
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15. More questions 1. How many attendees are Eclipse developers? 2. How many attendees are running Solr Cloud in production?
  • 16. Solr: Past, Present & Future Yonik Seeley LucidWorks
  • 17. Origins of Solr • CNET driven to find alternatives to discontinued commercial enterprise search product • Plan A: ATOMICS (Apache TO MySQL In CNET Search) – Standalone server speaking XML over HTTP – Meet majority of “search” needs – http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066 • Plan B: “Something based on Lucene” – Started Summer 2004 – First prototype called “Fusion”, later renamed SOLAR (Search On Lucene And Resin)
  • 18. Origins of the first Solr admin UI
  • 20. Timeline (up to 1.4) Initial prototype CNET production CNET contributes Solr to ASF Solr graduates from Incubator Simple faceting replication highlighting, dismax Spellchecking , CSV, Luke MLT, Update Request Processors QParsers Search Components Multi-core Distributed Search Data Import Handler JMX 1.3 1.4 Statistics Component Java Replication Terms and TermVector Components Multi-select faceting Dynamic Clustering 1.1 1.0 1.2 4.0 3.1
  • 21. Solr 4 • Solr Cloud – Distributed Indexing – No single points of failure – Near Real Time friendly (push replication) • NoSQL feature set – Update Durability – Real-time get – Atomic Updates – Optimistic Concurrency • Pseudo-join, Pivot Faceting, Pseudo-fields, etc
  • 22. What search solution/version are you currently using?
  • 24. Document Routing 80000000-bfffffff 00000000-3fffffff 40000000- 7fffffff c0000000-ffffffff shard1shard4 shard3 shard2 id = BigCo!doc5 1f27 3c71 (MurmurHash3) q=my_query shard.keys=BigCo! 1f27 0000 1f27 ffffto (hash) shard1 numShards=4 router=compositeId
  • 25. Seamless Online Shard Splitting Shard2_0 Shard1 replic a leader Shard2 replic a leader Shard3 replic a leader Shard2_1 1. New sub-shards created in “construction” state 2. Leader starts forwarding applicable updates, which are buffered by the sub-shards 3. Leader index is split and installed on the sub-shards 4. Sub-shards apply buffered updates then become “active” leaders and old shard becomes “inactive” update
  • 26. Cloud Enhancements • Request forwarding – In a multi-collection cluster, any node can handle/forward requests for any collection • Collection Aliases http://localhost:8983/solr/admin/collections ?action=CREATEALIAS &name=northeast &collections=NY,NJ,PA,CT,ME,MA,NH,RI,VT • Coming Soon: Shard Aliases
  • 27. Schema REST API • Restlet is now integrated with Solr • Get a specific field curl http://localhost:8983/solr/schema/fields/price {"field":{ "name":"price", "type":"float", "indexed":true, "stored":true }} • Get all fields curl http://localhost:8983/solr/schema/fields • Get Entire Schema! curl http://localhost:8983/solr/schema
  • 28. Dynamic Schema • Add a new field (Solr 4.4) curl -XPUT http://localhost:8983/solr/schema/fields/strength -d ‘ {"type":”float", "indexed":"true”} ‘ • Works in distributed (cloud) mode too! • Future: More schemaless – Reality: there is no such thing for Lucene based systems – Type guessing for fields we haven’t seen before
  • 29. Future • Greater scalability • More “NoSQL” – More ways to update & manipulate documents • Analytics – More powerful faceting, functions, statistics • Improved Relational queries • More dynamic (settings & configuration) • Continued focus on ease of use