SlideShare uma empresa Scribd logo
1 de 28
Baixar para ler offline
Open Source Search

Andreas Pesenhofer

max.recall information systems GmbH
Künstlergasse 11/1 • A-1150 Wien • Austria
ICIC, October 2013

ICIC, October 2013
max.recall information systems
• max.recall is a software and consulting company enabling
enterprises to capitalize on the hidden value in the rapidly growing
amount of textual data
• Customized Solutions for
– Intelligent data analytics
– Vertical search
• Products and Services
– quantalyze: quantity analytics technology
– smart.coder: open-ended question coding tool for market researchers

• Founded 2010 and located in Vienna, Austria
• Operates worldwide with int’l customers from sectors such as IP,
market research, news and media, IT services

2
ICIC, October 2013
Recall and precision
•

Recall
–
–

•

Precision
–
–

•
•

•

Percent of relevant documents (items) returned
50 good answers in system, 25 returned = 50% recall
Percent of documents returned that are relevant
100 returned, 25 are relevant = 25% precision

Ideal is 100% recall and 100% precision: return all relevant documents and
only those
100% recall is easy – return all documents, but precision is low, relevant
documents can’t be found
Need adequate recall & enough precision for the task - that will vary by
application (data & users)

3
ICIC, October 2013
How to get good recall
• Collect, index and search all the data
– Check for missing or corrupt data

• Index everything
– Search everything … limit results by category AFTER the
search (clustering/faceting)

• Normalize the data
– Convert to lower case, strip/handle special characters,
stemming, ...

• Use spell-checking, synonyms to match users’
vocabulary with content
– Adaptive spell-checking, application-specific synonyms

• Light (or real) natural language processing for abstract
concepts

4
ICIC, October 2013
How to get good precision
• Term frequency (TF) – more occurrences of query terms is better
• Inverse document frequency (IDF) – rarer query terms are more
important
• Phrase boost – query terms near each other is better
• Field boost – where the query term is in doc matters (e.g., in 'title'
better)
• Length normalization – avoid penalizing short docs
• Recency – all things being equal, recent is better
• Authority – items linked to, clicked on or bought by others may be
better
• Implicit and explicit relevance feedback, more-like-this – expand
query
• Clustering/faceting – intent is not specific
• Lots of data

5
ICIC, October 2013
Every minute …

http://www.domo.com/

6
ICIC, October 2013
Groth of patent applications

7
ICIC, October 2013
Big Data Open Source Tools

8
ICIC, October 2013
Apache Lucene
Apache LuceneTM is a high-performance, full-featured text
search engine library written entirely in Java. It is a technology
suitable for nearly any application that requires full-text search,
especially cross-platform.
• Scalable, High-Performance Indexing
–
–
–
–

over 150GB/hour on modern hardware
small RAM requirements - only 1MB heap
incremental indexing as fast as batch indexing
index size roughly 20-30% the size of text indexed

9
ICIC, October 2013
Apache Lucene (2)
• Powerful, Accurate and Efficient Search Algorithms
– ranked searching - best results returned first
– many powerful query types: phrase queries, wildcard queries, proximity
queries, range queries and more
– fielded searching (e.g. title, author, contents)
– sorting by any field
– multiple-index searching with merged results
– allows simultaneous update and searching
– flexible faceting, highlighting, joins and result grouping
– fast, memory-efficient and typo-tolerant suggesters
– pluggable ranking models, including the Vector Space Model and Okapi
BM25
– configurable storage engine (codecs)

• Cross-Platform Solution
10
ICIC, October 2013
Apache Lucene (3)
• Cross-Platform Solution
– Available as Open Source software under the Apache License - Lucene
in both commercial and Open Source programs
– 100%-pure Java
– Implementations in other programming languages available, the index is
compatible

• Apache Lucene 4.5.0 was released on October 5th, 2013.

11
ICIC, October 2013
Apache SOLR
• Apache SOLR is an open source enterprise search platform
from the Apache Lucene project.
• major features:
–
–
–
–
–
–
–

full-text search
hit highlighting
faceted search
dynamic clustering
database integration
handling of rich documents (e.g., Word, PDF)
providing distributed search and index replication, Solr is highly
scalable.

• Apache SOLR 4.5.0 was released on October 5th, 2013.

12
ICIC, October 2013
elasticsearch
• elasticsearch is a distributed, RESTful, open source search
server based on Apache Lucene. It is developed by Shay
Banon and is released under the terms of the Apache
License.
• major features:
–
–
–
–

fully supports the near real-time search of Apache Lucene
cluster setup needs no additional software
features of Lucene are made available through the JSON and Java API
JSON in / JSON out (and YAML)

• elasticsearch 0.90.5 was released on September 17th,
2013, based on Lucene 4.4.

13
ICIC, October 2013
All Time Top Committers

ICIC, October 2013
Active Contributors

ICIC, October 2013
Lines of Code

ICIC, October 2013
The Mailing Lists

ICIC, October 2013
Interest over time

ICIC, October 2013
Case study - quantalyze
patents.quantalyze.com






Runs in your browser
Filter and keyword search
Physical quantity search
Interval search
Print view

Visualization:
 Physical quantity distributions
 Cross-tabulations (e.g.
concepts vs. quantity type)
 Different chart types

19
ICIC, October 2013
Case study - StumbleUpon
Create a world-class
customer experience
• A “Stumble” provides real-time
recommendations to 30 million
customers per day
• Intelligent search is key to
providing fast and more
informed recommendations
• Update your searches
immediately with newly posted
content

Develop and scale easily
•

•

•

Build in intelligent search to
scale with millions of users and
interactions
Take advantage of powerful
and flexible APIs for easy data
integration
Use easy to use but powerful
solutions for your big data
search and analytics needs

20
ICIC, October 2013
Strengths of open source search
• Best practice segmented index (like Google, Fast)
• Scalability
• Best practice, flexible ranking (term/field/doc boosts, function
queries, custom scoring…)
• Best overall query performance and complete query
capabilities (unlimited Boolean operations, wildcards,
findsimilar, synonyms, spell-check…)
• Multilingual, query filters, geo search, memory mapped
indexes, near real-time search, advanced proximity
operators…
• Rapid innovation
• Extensible architecture, complete control (open source)
• No license fees (open source)

21
ICIC, October 2013
Weaknesses of open source search
• Those typical of open source
–
–
–
–

No formal support
Limited access to training, consulting
Lack of stringent integrated QA
Speed of development and open source environment
too complex for some (e.g., what version should I
download? What patches? GUI?)

• Others
– Lucene/Solr/Elasicsearch development has tended to
focus on core capabilities, so missing certain features
for enterprise search (e.g., connectors, security,
alerts, advanced query operations)

22
ICIC, October 2013
Addressing open source weaknesses
• Community
– Community has a wealth of information on web sites, wikis
and mailing lists
– Community members usually respond quickly to questions

• Consultants
– May be especially helpful for systems integration or
addressing gaps

• Commercialization
– Companies commercializing open source provide
commercial support, certified versions, training and
consulting

• Internal resources

23
ICIC, October 2013
Product strengths of commercial
competitors
• Well established players tend to be full-featured
• Some organizations have focused on a
particular application or domain (e.g.,
ecommerce, publishing, legal, help desk)
• Some competitors have focused on appliance

24
ICIC, October 2013
Weaknesses of top commercial
competitors
•
•
•
•
•
•
•
•

Usually expensive, especially at scale
Platform or portability limitations
Limited transparency
Limited flexibility, especially for other than
intended application or domain
Limited customization, especially for appliance-like
products
Sometimes limited scalability
Technical debt and/or lack of rapid innovation
Customers are dependent on the company’s
continued business success

25
ICIC, October 2013
Competitive landscape
• Last years commercial companies have felt
increasing competition from
Lucene/Solr/Elasticsearch because of the
combination of its capability and price
• Some competitors have responded with
diversification
• Some have been acquired
• Need for good, affordable, flexible search
remains

26
ICIC, October 2013
Questions

ICIC, October 2013
Credits

28
ICIC, October 2013

Mais conteúdo relacionado

Mais procurados

II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...
II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...
II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...Dr. Haxel Consult
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
New Product Introductions - Minesoft
New Product Introductions - MinesoftNew Product Introductions - Minesoft
New Product Introductions - MinesoftDr. Haxel Consult
 
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...Dr. Haxel Consult
 
ICIC 2017: Product presentations FIZ Karlsruhe
ICIC 2017: Product presentations FIZ KarlsruheICIC 2017: Product presentations FIZ Karlsruhe
ICIC 2017: Product presentations FIZ KarlsruheDr. Haxel Consult
 
ICIC 2017: New product presentation minesoft
ICIC 2017: New product presentation minesoftICIC 2017: New product presentation minesoft
ICIC 2017: New product presentation minesoftDr. Haxel Consult
 
ICIC 2017: New product presentations CAS
ICIC 2017: New product presentations CASICIC 2017: New product presentations CAS
ICIC 2017: New product presentations CASDr. Haxel Consult
 
II-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent LandscapingII-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent LandscapingDr. Haxel Consult
 
II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...
II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...
II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...Dr. Haxel Consult
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
II-SDV 2016 Questel Intellixir
II-SDV 2016 Questel IntellixirII-SDV 2016 Questel Intellixir
II-SDV 2016 Questel IntellixirDr. Haxel Consult
 
New Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ KarlsruheNew Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ KarlsruheDr. Haxel Consult
 
II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...
II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...
II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...Dr. Haxel Consult
 
II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...
II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...
II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...Dr. Haxel Consult
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...Peter Löwe
 
ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities
ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities  ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities
ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities Dr. Haxel Consult
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 

Mais procurados (20)

II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...
II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...
II-SDV 2016 Irene Kitsara - Patent Landscape Reports and Other WIPO Activitie...
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
New Product Introductions - Minesoft
New Product Introductions - MinesoftNew Product Introductions - Minesoft
New Product Introductions - Minesoft
 
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
 
ICIC 2017: Product presentations FIZ Karlsruhe
ICIC 2017: Product presentations FIZ KarlsruheICIC 2017: Product presentations FIZ Karlsruhe
ICIC 2017: Product presentations FIZ Karlsruhe
 
ICIC 2017: New product presentation minesoft
ICIC 2017: New product presentation minesoftICIC 2017: New product presentation minesoft
ICIC 2017: New product presentation minesoft
 
ICIC 2017: New product presentations CAS
ICIC 2017: New product presentations CASICIC 2017: New product presentations CAS
ICIC 2017: New product presentations CAS
 
II-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent LandscapingII-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
 
II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...
II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...
II-SDV 2016 Raphael Ilmer, Quentin Ladetto - Optimization of Patent Landscape...
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
II-SDV 2016 Questel Intellixir
II-SDV 2016 Questel IntellixirII-SDV 2016 Questel Intellixir
II-SDV 2016 Questel Intellixir
 
New Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ KarlsruheNew Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ Karlsruhe
 
II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...
II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...
II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to ...
 
II-SDV 2016 Linguamatics
II-SDV 2016 LinguamaticsII-SDV 2016 Linguamatics
II-SDV 2016 Linguamatics
 
II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...
II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...
II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution fr...
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...
 
ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities
ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities  ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities
ICIC 2014 Chemical Patent Curation and Management – New Tools and Capabilities
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 

Destaque

ICIC 2013 Conference Proceedings Krishna Molecular Connections
ICIC 2013 Conference Proceedings Krishna Molecular ConnectionsICIC 2013 Conference Proceedings Krishna Molecular Connections
ICIC 2013 Conference Proceedings Krishna Molecular ConnectionsDr. Haxel Consult
 
ICIC 2013 New Product Introductions Dolcera
ICIC 2013 New Product Introductions DolceraICIC 2013 New Product Introductions Dolcera
ICIC 2013 New Product Introductions DolceraDr. Haxel Consult
 
ICIC 2013 New Product Introductions BizInt
ICIC 2013 New Product Introductions BizIntICIC 2013 New Product Introductions BizInt
ICIC 2013 New Product Introductions BizIntDr. Haxel Consult
 
ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...
ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...
ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...Dr. Haxel Consult
 
New Product Introductions - GenomeQuest Life Sciences
New Product Introductions - GenomeQuest Life SciencesNew Product Introductions - GenomeQuest Life Sciences
New Product Introductions - GenomeQuest Life SciencesDr. Haxel Consult
 
ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...
ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...
ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...Dr. Haxel Consult
 
ICIC 2014 Panel: Mobile Apps for Patent Searchers
ICIC 2014 Panel: Mobile Apps for Patent SearchersICIC 2014 Panel: Mobile Apps for Patent Searchers
ICIC 2014 Panel: Mobile Apps for Patent SearchersDr. Haxel Consult
 
ICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockDr. Haxel Consult
 
ICIC 2013 New Product Introductions Linguamatics
ICIC 2013 New Product Introductions LinguamaticsICIC 2013 New Product Introductions Linguamatics
ICIC 2013 New Product Introductions LinguamaticsDr. Haxel Consult
 
ICIC 2014 New Product Introduction Questel
ICIC 2014 New Product Introduction QuestelICIC 2014 New Product Introduction Questel
ICIC 2014 New Product Introduction QuestelDr. Haxel Consult
 
ICIC 2014 New Product Introduction CAS
ICIC 2014 New Product Introduction CASICIC 2014 New Product Introduction CAS
ICIC 2014 New Product Introduction CASDr. Haxel Consult
 
ICIC 2013 New Product Introductions GenomeQuest
ICIC 2013 New Product Introductions GenomeQuestICIC 2013 New Product Introductions GenomeQuest
ICIC 2013 New Product Introductions GenomeQuestDr. Haxel Consult
 
ICIC 2014 The Intermediates are becoming extict - radical Change for Info Pr...
ICIC 2014 The Intermediates are becoming extict  - radical Change for Info Pr...ICIC 2014 The Intermediates are becoming extict  - radical Change for Info Pr...
ICIC 2014 The Intermediates are becoming extict - radical Change for Info Pr...Dr. Haxel Consult
 
ICIC 2014 New Product Introduction Averbis
ICIC 2014 New Product Introduction AverbisICIC 2014 New Product Introduction Averbis
ICIC 2014 New Product Introduction AverbisDr. Haxel Consult
 
ICIC 2014 How the European Patent Office Uses Asian Documentation
ICIC 2014 How the European Patent Office Uses Asian Documentation ICIC 2014 How the European Patent Office Uses Asian Documentation
ICIC 2014 How the European Patent Office Uses Asian Documentation Dr. Haxel Consult
 
Knowledge Manager - Fit for the Future
Knowledge Manager - Fit for the Future Knowledge Manager - Fit for the Future
Knowledge Manager - Fit for the Future Dr. Haxel Consult
 
ICIC 2014 New Product Introduction Infotrieve
ICIC 2014 New Product Introduction InfotrieveICIC 2014 New Product Introduction Infotrieve
ICIC 2014 New Product Introduction InfotrieveDr. Haxel Consult
 
ICIC 2013 New Product Introductions InfoChem
ICIC 2013 New Product Introductions InfoChemICIC 2013 New Product Introductions InfoChem
ICIC 2013 New Product Introductions InfoChemDr. Haxel Consult
 
ICIC 2014 New Product Introduction Gridlogisc
ICIC 2014 New Product Introduction GridlogiscICIC 2014 New Product Introduction Gridlogisc
ICIC 2014 New Product Introduction GridlogiscDr. Haxel Consult
 
ICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuestICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuestDr. Haxel Consult
 

Destaque (20)

ICIC 2013 Conference Proceedings Krishna Molecular Connections
ICIC 2013 Conference Proceedings Krishna Molecular ConnectionsICIC 2013 Conference Proceedings Krishna Molecular Connections
ICIC 2013 Conference Proceedings Krishna Molecular Connections
 
ICIC 2013 New Product Introductions Dolcera
ICIC 2013 New Product Introductions DolceraICIC 2013 New Product Introductions Dolcera
ICIC 2013 New Product Introductions Dolcera
 
ICIC 2013 New Product Introductions BizInt
ICIC 2013 New Product Introductions BizIntICIC 2013 New Product Introductions BizInt
ICIC 2013 New Product Introductions BizInt
 
ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...
ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...
ICIC 2014 Tracking of the Mode of Action Landscape in Breast Cancer using Rep...
 
New Product Introductions - GenomeQuest Life Sciences
New Product Introductions - GenomeQuest Life SciencesNew Product Introductions - GenomeQuest Life Sciences
New Product Introductions - GenomeQuest Life Sciences
 
ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...
ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...
ICIC 2014 Patent Landscape Analysis as a Tool for Public Policies Adjustment:...
 
ICIC 2014 Panel: Mobile Apps for Patent Searchers
ICIC 2014 Panel: Mobile Apps for Patent SearchersICIC 2014 Panel: Mobile Apps for Patent Searchers
ICIC 2014 Panel: Mobile Apps for Patent Searchers
 
ICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian Radestock
 
ICIC 2013 New Product Introductions Linguamatics
ICIC 2013 New Product Introductions LinguamaticsICIC 2013 New Product Introductions Linguamatics
ICIC 2013 New Product Introductions Linguamatics
 
ICIC 2014 New Product Introduction Questel
ICIC 2014 New Product Introduction QuestelICIC 2014 New Product Introduction Questel
ICIC 2014 New Product Introduction Questel
 
ICIC 2014 New Product Introduction CAS
ICIC 2014 New Product Introduction CASICIC 2014 New Product Introduction CAS
ICIC 2014 New Product Introduction CAS
 
ICIC 2013 New Product Introductions GenomeQuest
ICIC 2013 New Product Introductions GenomeQuestICIC 2013 New Product Introductions GenomeQuest
ICIC 2013 New Product Introductions GenomeQuest
 
ICIC 2014 The Intermediates are becoming extict - radical Change for Info Pr...
ICIC 2014 The Intermediates are becoming extict  - radical Change for Info Pr...ICIC 2014 The Intermediates are becoming extict  - radical Change for Info Pr...
ICIC 2014 The Intermediates are becoming extict - radical Change for Info Pr...
 
ICIC 2014 New Product Introduction Averbis
ICIC 2014 New Product Introduction AverbisICIC 2014 New Product Introduction Averbis
ICIC 2014 New Product Introduction Averbis
 
ICIC 2014 How the European Patent Office Uses Asian Documentation
ICIC 2014 How the European Patent Office Uses Asian Documentation ICIC 2014 How the European Patent Office Uses Asian Documentation
ICIC 2014 How the European Patent Office Uses Asian Documentation
 
Knowledge Manager - Fit for the Future
Knowledge Manager - Fit for the Future Knowledge Manager - Fit for the Future
Knowledge Manager - Fit for the Future
 
ICIC 2014 New Product Introduction Infotrieve
ICIC 2014 New Product Introduction InfotrieveICIC 2014 New Product Introduction Infotrieve
ICIC 2014 New Product Introduction Infotrieve
 
ICIC 2013 New Product Introductions InfoChem
ICIC 2013 New Product Introductions InfoChemICIC 2013 New Product Introductions InfoChem
ICIC 2013 New Product Introductions InfoChem
 
ICIC 2014 New Product Introduction Gridlogisc
ICIC 2014 New Product Introduction GridlogiscICIC 2014 New Product Introduction Gridlogisc
ICIC 2014 New Product Introduction Gridlogisc
 
ICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuestICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuest
 

Semelhante a ICIC 2013 Conference Proceedings Andreas Pesenhofer max.recall

The Enterprise Search Market in a Nutshell
The Enterprise Search Market in a NutshellThe Enterprise Search Market in a Nutshell
The Enterprise Search Market in a NutshellDr. Haxel Consult
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarLucidworks (Archived)
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Petter Skodvin-Hvammen
 
Nairobi OpenStack Meetup - July 2013
Nairobi OpenStack Meetup - July 2013Nairobi OpenStack Meetup - July 2013
Nairobi OpenStack Meetup - July 2013adamnelson
 
OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...
OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...
OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...FINOS
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibDavid Nzoputa Ofili
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdfRAHULRAHU8
 
OpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene EcosystemOpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene EcosystemGrant Ingersoll
 
Rootconf 2017 - State of the Open Source monitoring landscape
Rootconf 2017 - State of the Open Source monitoring landscape Rootconf 2017 - State of the Open Source monitoring landscape
Rootconf 2017 - State of the Open Source monitoring landscape NETWAYS
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Vinay Kumar
 
Webinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with FusionWebinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with FusionLucidworks
 
An Introduction to ROS-Industrial
An Introduction to ROS-IndustrialAn Introduction to ROS-Industrial
An Introduction to ROS-IndustrialClay Flannigan
 
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016Grid Protection Alliance
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library servicesNikesh Narayanan
 

Semelhante a ICIC 2013 Conference Proceedings Andreas Pesenhofer max.recall (20)

Apache Solr vs Oracle Endeca
Apache Solr vs Oracle EndecaApache Solr vs Oracle Endeca
Apache Solr vs Oracle Endeca
 
EnterpriseSearch
EnterpriseSearchEnterpriseSearch
EnterpriseSearch
 
The Enterprise Search Market in a Nutshell
The Enterprise Search Market in a NutshellThe Enterprise Search Market in a Nutshell
The Enterprise Search Market in a Nutshell
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)
 
Scalable Search Analytics
Scalable Search AnalyticsScalable Search Analytics
Scalable Search Analytics
 
Nairobi OpenStack Meetup - July 2013
Nairobi OpenStack Meetup - July 2013Nairobi OpenStack Meetup - July 2013
Nairobi OpenStack Meetup - July 2013
 
Elastic pivorak
Elastic pivorakElastic pivorak
Elastic pivorak
 
OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...
OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...
OSSF 2018 - Stefan Just of Codescoop - OSCAR - a new approach to Software Com...
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLib
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
 
OpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene EcosystemOpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene Ecosystem
 
Rootconf 2017 - State of the Open Source monitoring landscape
Rootconf 2017 - State of the Open Source monitoring landscape Rootconf 2017 - State of the Open Source monitoring landscape
Rootconf 2017 - State of the Open Source monitoring landscape
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Webinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with FusionWebinar: Site Search in an Hour with Fusion
Webinar: Site Search in an Hour with Fusion
 
An Introduction to ROS-Industrial
An Introduction to ROS-IndustrialAn Introduction to ROS-Industrial
An Introduction to ROS-Industrial
 
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library services
 
II-SDV 2016 GRIDLOGICS
II-SDV 2016 GRIDLOGICSII-SDV 2016 GRIDLOGICS
II-SDV 2016 GRIDLOGICS
 

Mais de Dr. Haxel Consult

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterDr. Haxel Consult
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCDr. Haxel Consult
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
 

Mais de Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

ICIC 2013 Conference Proceedings Andreas Pesenhofer max.recall

  • 1. Open Source Search Andreas Pesenhofer max.recall information systems GmbH Künstlergasse 11/1 • A-1150 Wien • Austria ICIC, October 2013 ICIC, October 2013
  • 2. max.recall information systems • max.recall is a software and consulting company enabling enterprises to capitalize on the hidden value in the rapidly growing amount of textual data • Customized Solutions for – Intelligent data analytics – Vertical search • Products and Services – quantalyze: quantity analytics technology – smart.coder: open-ended question coding tool for market researchers • Founded 2010 and located in Vienna, Austria • Operates worldwide with int’l customers from sectors such as IP, market research, news and media, IT services 2 ICIC, October 2013
  • 3. Recall and precision • Recall – – • Precision – – • • • Percent of relevant documents (items) returned 50 good answers in system, 25 returned = 50% recall Percent of documents returned that are relevant 100 returned, 25 are relevant = 25% precision Ideal is 100% recall and 100% precision: return all relevant documents and only those 100% recall is easy – return all documents, but precision is low, relevant documents can’t be found Need adequate recall & enough precision for the task - that will vary by application (data & users) 3 ICIC, October 2013
  • 4. How to get good recall • Collect, index and search all the data – Check for missing or corrupt data • Index everything – Search everything … limit results by category AFTER the search (clustering/faceting) • Normalize the data – Convert to lower case, strip/handle special characters, stemming, ... • Use spell-checking, synonyms to match users’ vocabulary with content – Adaptive spell-checking, application-specific synonyms • Light (or real) natural language processing for abstract concepts 4 ICIC, October 2013
  • 5. How to get good precision • Term frequency (TF) – more occurrences of query terms is better • Inverse document frequency (IDF) – rarer query terms are more important • Phrase boost – query terms near each other is better • Field boost – where the query term is in doc matters (e.g., in 'title' better) • Length normalization – avoid penalizing short docs • Recency – all things being equal, recent is better • Authority – items linked to, clicked on or bought by others may be better • Implicit and explicit relevance feedback, more-like-this – expand query • Clustering/faceting – intent is not specific • Lots of data 5 ICIC, October 2013
  • 7. Groth of patent applications 7 ICIC, October 2013
  • 8. Big Data Open Source Tools 8 ICIC, October 2013
  • 9. Apache Lucene Apache LuceneTM is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. • Scalable, High-Performance Indexing – – – – over 150GB/hour on modern hardware small RAM requirements - only 1MB heap incremental indexing as fast as batch indexing index size roughly 20-30% the size of text indexed 9 ICIC, October 2013
  • 10. Apache Lucene (2) • Powerful, Accurate and Efficient Search Algorithms – ranked searching - best results returned first – many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more – fielded searching (e.g. title, author, contents) – sorting by any field – multiple-index searching with merged results – allows simultaneous update and searching – flexible faceting, highlighting, joins and result grouping – fast, memory-efficient and typo-tolerant suggesters – pluggable ranking models, including the Vector Space Model and Okapi BM25 – configurable storage engine (codecs) • Cross-Platform Solution 10 ICIC, October 2013
  • 11. Apache Lucene (3) • Cross-Platform Solution – Available as Open Source software under the Apache License - Lucene in both commercial and Open Source programs – 100%-pure Java – Implementations in other programming languages available, the index is compatible • Apache Lucene 4.5.0 was released on October 5th, 2013. 11 ICIC, October 2013
  • 12. Apache SOLR • Apache SOLR is an open source enterprise search platform from the Apache Lucene project. • major features: – – – – – – – full-text search hit highlighting faceted search dynamic clustering database integration handling of rich documents (e.g., Word, PDF) providing distributed search and index replication, Solr is highly scalable. • Apache SOLR 4.5.0 was released on October 5th, 2013. 12 ICIC, October 2013
  • 13. elasticsearch • elasticsearch is a distributed, RESTful, open source search server based on Apache Lucene. It is developed by Shay Banon and is released under the terms of the Apache License. • major features: – – – – fully supports the near real-time search of Apache Lucene cluster setup needs no additional software features of Lucene are made available through the JSON and Java API JSON in / JSON out (and YAML) • elasticsearch 0.90.5 was released on September 17th, 2013, based on Lucene 4.4. 13 ICIC, October 2013
  • 14. All Time Top Committers ICIC, October 2013
  • 16. Lines of Code ICIC, October 2013
  • 17. The Mailing Lists ICIC, October 2013
  • 18. Interest over time ICIC, October 2013
  • 19. Case study - quantalyze patents.quantalyze.com      Runs in your browser Filter and keyword search Physical quantity search Interval search Print view Visualization:  Physical quantity distributions  Cross-tabulations (e.g. concepts vs. quantity type)  Different chart types 19 ICIC, October 2013
  • 20. Case study - StumbleUpon Create a world-class customer experience • A “Stumble” provides real-time recommendations to 30 million customers per day • Intelligent search is key to providing fast and more informed recommendations • Update your searches immediately with newly posted content Develop and scale easily • • • Build in intelligent search to scale with millions of users and interactions Take advantage of powerful and flexible APIs for easy data integration Use easy to use but powerful solutions for your big data search and analytics needs 20 ICIC, October 2013
  • 21. Strengths of open source search • Best practice segmented index (like Google, Fast) • Scalability • Best practice, flexible ranking (term/field/doc boosts, function queries, custom scoring…) • Best overall query performance and complete query capabilities (unlimited Boolean operations, wildcards, findsimilar, synonyms, spell-check…) • Multilingual, query filters, geo search, memory mapped indexes, near real-time search, advanced proximity operators… • Rapid innovation • Extensible architecture, complete control (open source) • No license fees (open source) 21 ICIC, October 2013
  • 22. Weaknesses of open source search • Those typical of open source – – – – No formal support Limited access to training, consulting Lack of stringent integrated QA Speed of development and open source environment too complex for some (e.g., what version should I download? What patches? GUI?) • Others – Lucene/Solr/Elasicsearch development has tended to focus on core capabilities, so missing certain features for enterprise search (e.g., connectors, security, alerts, advanced query operations) 22 ICIC, October 2013
  • 23. Addressing open source weaknesses • Community – Community has a wealth of information on web sites, wikis and mailing lists – Community members usually respond quickly to questions • Consultants – May be especially helpful for systems integration or addressing gaps • Commercialization – Companies commercializing open source provide commercial support, certified versions, training and consulting • Internal resources 23 ICIC, October 2013
  • 24. Product strengths of commercial competitors • Well established players tend to be full-featured • Some organizations have focused on a particular application or domain (e.g., ecommerce, publishing, legal, help desk) • Some competitors have focused on appliance 24 ICIC, October 2013
  • 25. Weaknesses of top commercial competitors • • • • • • • • Usually expensive, especially at scale Platform or portability limitations Limited transparency Limited flexibility, especially for other than intended application or domain Limited customization, especially for appliance-like products Sometimes limited scalability Technical debt and/or lack of rapid innovation Customers are dependent on the company’s continued business success 25 ICIC, October 2013
  • 26. Competitive landscape • Last years commercial companies have felt increasing competition from Lucene/Solr/Elasticsearch because of the combination of its capability and price • Some competitors have responded with diversification • Some have been acquired • Need for good, affordable, flexible search remains 26 ICIC, October 2013