In search of: A meetup about Liferay and Search 2016-04-20

Presentation for meetup http://www.meetup.com/Liferay-Budapest-Tech-Meetup/events/229996198/

Liferay Hungary Kft., Budapest, 2016-04-20


  1. In Search of... A topic for a meetup :-) Tibor Lipusz, Lead Software Engineer, SME of Search & Security @ Subscription Services. 2016-04-20, Budapest
  2. Agenda: Understanding Search; Technology Stack; Brief Overview of Search in Liferay
  3. Understanding Search
  4. Why do we search?
  5. Understanding Search: Why do we search? Because we don’t know everything :-)
  6. Understanding Search: Why do we search?
     - Basic human instinct → We want answers
  7. Understanding Search: Why do we search?
     - Basic human instinct → We want answers
     - Humans are information-driven → Knowledge is key to survival
  8. Understanding Search: Why do we search?
     - Basic human instinct → We want answers
     - Humans are information-driven → Knowledge is key to survival
     - Brainless → Modeling the world
  9. Understanding Search: Why do we search?
     - Basic human instinct → We want answers
     - Humans are information-driven → Knowledge is key to survival
     - Brainless → Modeling the world
     - The River and the Fish → Information is flooding us, but most of the time we’re interested in only a specific slice of the “big cake”
  10. What do we search for?
  11. Understanding Search: What do we search for? Everything! :-)
  12. Understanding Search: What do we search for? Places
  13. Understanding Search: What do we search for? Images
  14. Understanding Search: What do we search for? Books
  15. Understanding Search: What do we search for? Science (Answers)
  16. Understanding Search: What do we search for? Articles
  17. Understanding Search: What do we search for? Documents
  18. Understanding Search: What do we search for? Text
  19. Understanding Search: What do we search for? And many more… :-)
  20. How do we search?
  21. Understanding Search: How do we search? The good old (search) input box
  22. Understanding Search: How do we search? Search box with suggestions
  23. Understanding Search: How do we search? Faceted Search (guided navigation)
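
Faceted search (guided navigation) can be sketched in a few lines: count how many results fall under each value of a field, then narrow the results when the user picks one value. This is only an illustration; the sample hits and the field names `type` and `author` are assumptions, not something taken from the slides.

```python
from collections import Counter

# Hypothetical search hits; the field names "type" and "author" are assumptions.
hits = [
    {"title": "Search basics", "type": "article", "author": "anna"},
    {"title": "Lucene notes", "type": "wiki", "author": "bela"},
    {"title": "Indexing guide", "type": "article", "author": "anna"},
]


def facet_counts(results, field):
    """Count how many results fall under each value of the given field."""
    return Counter(doc[field] for doc in results if field in doc)


def apply_facet(results, field, value):
    """Narrow the result list to documents matching the selected facet value."""
    return [doc for doc in results if doc.get(field) == value]


type_facet = facet_counts(hits, "type")           # counts shown next to each facet value
articles = apply_facet(hits, "type", "article")   # results after the user clicks "article"
```

The counts are what a search UI typically shows next to each facet value; selecting a value simply re-filters the same result set.
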
  24. Understanding Search: “The Search”. Input: Search Keywords → MAGIC! → Output: Search Results
  25. Understanding Search: “The Search”. It’s so simple, huh? :-)
  26. Understanding Search: “The Search”. Unfortunately, it’s not… :-(
  27. Understanding Search: “The Search”. Input: Search Keywords → MAGIC! → Output: Search Results
  28. Understanding Search: “The Search”: User Input. What makes it complex?
     - Creating an intuitive yet powerful frontend (UI)
     - Exposing the syntax without overwhelming the end user
     - Features to provide:
     - Faceted Search
     - Search suggestions, Live Search
     - Related searches
     - Default View: Basic or Advanced
     - Type of user input: keywords, image, geolocation (coordinates)
  29. Understanding Search: “The Search”. Input: Search Keywords → MAGIC! → Output: Search Results
  30. Understanding Search: “The Search”: Magic. Magic = Actual Search (sorry, no real magic)
  31. Understanding Search: “The Search”: Magic. What makes it complex?
     - Different types of content → Abstraction, Transformation
     - Different repositories → Integration
     - Language characteristics ← Site & Content Localization
     - Performance, Permissions, restrictions → Operation, Implementation, Business needs
     - Mapping user-thinking → Queries
     - Scalability → Operation
     - Finding the right platform → Architecture
  32. Understanding Search: “The Search”. Input: Search Keywords → MAGIC! → Output: Search Results
  33. Understanding Search: “The Search”: Search Results. What makes it complex?
     - Creating an intuitive yet powerful frontend (UI)
     - Displaying results
     - Filtering
     - Pagination
     - Ordering
     - Highlighting
     - Snippets (summaries)
     - Navigation (to the actual content)
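
Three of the result-side features listed above (pagination, highlighting, snippets) are easy to sketch. The helper names, the `<em>` markup and the `radius` parameter below are illustrative assumptions; a real search engine computes highlights and snippets from the analyzed terms rather than from raw substrings.

```python
import re


def paginate(results, page, page_size):
    """Return the slice of results for a 1-based page number."""
    start = (page - 1) * page_size
    return results[start:start + page_size]


def highlight(text, keyword):
    """Wrap each case-insensitive occurrence of keyword in <em> tags."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: "<em>" + m.group(0) + "</em>", text)


def snippet(text, keyword, radius=20):
    """Cut a short summary around the first occurrence of keyword."""
    pos = text.lower().find(keyword.lower())
    if pos < 0:
        # Keyword not present: fall back to the beginning of the text.
        return text[:2 * radius]
    start = max(0, pos - radius)
    end = pos + len(keyword) + radius
    return text[start:end]
```

For example, `paginate(hits, 2, 10)` would return results 11 to 20 of a hypothetical `hits` list.
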
  34. Understanding Search: “The Search”: ? Didn’t we miss an important step?
  35. Understanding Search: “The Search”: Content. The world is not ideal →
  36. Understanding Search: “The Search”: Content. The world is not ideal → Data comes in different forms →
  37. Understanding Search: “The Search”: Content. The world is not ideal → Data comes in different forms → Search should operate on a common form
  38. Understanding Search: “The Search”: Content. The world is not ideal → Data comes in different forms → Search should operate on a common form ← Transparency for end-user
  39. Understanding Search: Summary. What do we need?
     - Data Transformation → Pre-Search
     - Search Backend → Search → Magic :-)
     - Search Frontend → Couldn’t find a good word for this, but writing something just to keep the format :-)
  40. Technology Stack
  41. Technology Stack
  42. Technology Stack: Lucene™
     - Full-featured text search engine library
     - Java-based
     - Document model
     - Provides indexing, scoring, spellchecking, hit highlighting and advanced analysis/tokenization capabilities
     - Provides a wide range of queries
     - Open Source (Apache License 2.0)
  43. Technology Stack: Lucene: The Life Inside the Search Engine. The (inverted) Index
  44. Technology Stack: Lucene: The Life Inside the Search Engine. The Index → Documents
  45. Technology Stack: Lucene: The Life Inside the Search Engine. The Index → Documents → Fields
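
The index → documents → fields layering can be illustrated with a toy inverted index: instead of scanning every document for a term, the index maps each (field, term) pair to the ids of the documents that contain it. This is a minimal sketch in plain Python, not Lucene's actual data structures; the sample documents and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical documents: each has an id and named fields.
documents = {
    1: {"title": "Liferay search", "content": "search in the portal"},
    2: {"title": "Lucene index", "content": "the inverted index maps terms"},
}


def build_inverted_index(docs):
    """Map (field, term) -> set of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index


def lookup(index, field, term):
    """Return sorted ids of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


index = build_inverted_index(documents)
```

A lookup such as `lookup(index, "content", "the")` is then a single dictionary access, regardless of how many documents were indexed.
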
  46. Technology Stack: Lucene. Document model → Abstraction over data. Main Phases:
     - Indexing: creating documents
     - Analysis and Analyzers: tokenizing, stemming, punctuation etc.
     - Searching: getting hits
     - Analysis!
     - Queries, Scoring (relevance), Boosting, Facets etc.
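
The reason analysis shows up in both the indexing and the searching phase is that a query only matches if it is reduced to the same terms as the indexed text. The sketch below is a deliberately naive analyzer chain (tokenize, lowercase, drop stop words, strip suffixes); the stop-word list and the suffix rules are assumptions and far cruder than a real Lucene analyzer.

```python
import re

STOP_WORDS = {"a", "an", "the", "in", "of", "and"}  # assumed, deliberately tiny


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


def stem(token):
    """Very naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, lowercase, remove stop words, then stem."""
    lowered = (token.lower() for token in tokenize(text))
    return [stem(token) for token in lowered if token not in STOP_WORDS]
```

Running the same `analyze` function over the indexed text and over the user's keywords is what lets "Searching indexes" match a document that only says "search index".
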
  47. Technology Stack: Lucene. Why would we prefer this over databases?
  48. Technology Stack: Lucene: Search Index vs. DB
     - Different purposes
     - DB: permanent storage, data-driven, application-centric
     - Search Engine, Index: optimized for search, user-centric
     - Search in a search engine is about relevance
     - Database searches do not provide us with fuzzy searching or any type of relevancy
     - Can apply algorithms like “More Like This” to obtain similar content
     - Advanced Features: geolocation, faceting of results, multilingual searching
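
To make the fuzzy-searching point concrete: an exact lookup misses a misspelled keyword, while a fuzzy lookup accepts terms within a small edit distance. The sketch below uses plain Levenshtein distance over a tiny vocabulary; the function names and the `max_edits` parameter are assumptions, and real engines use much more efficient structures than comparing against every term.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(term, vocabulary, max_edits=1):
    """Return vocabulary terms within max_edits edits of the given term."""
    return sorted(word for word in vocabulary if edit_distance(term, word) <= max_edits)


vocabulary = ["liferay", "lucene", "solr"]
exact_hit = "liferai" in vocabulary               # exact matching finds nothing
fuzzy_hits = fuzzy_lookup("liferai", vocabulary)  # fuzzy matching tolerates the typo
```
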
  49. Technology Stack
  50. Technology Stack: Solr
     - Highly reliable, scalable and fault tolerant
     - Provides distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more
     - Standalone enterprise search server (web application) with a REST-like API
     - Wide range of clients
     - Apache Foundation
  51. Technology Stack: Solr: Features
  52. Technology Stack: Solr: Features
  53. Technology Stack: Solr: Features
  54. Technology Stack
  55. Technology Stack: Elasticsearch (https://www.elastic.co/products/elasticsearch)
     - Highly scalable, distributed, open-source full-text search and analytics engine (server)
     - Clustered by Design
     - Extensible with plugins
  56. Technology Stack: Elasticsearch: Features. (Near) Real-Time Data*: all data is immediately made available for search and analytics.
  57. Technology Stack: Elasticsearch: Features. Massively Distributed: ES allows you to start small and scale horizontally as you grow. Simply add more nodes, and let the cluster automatically take advantage of the extra hardware.
  58. Technology Stack: Elasticsearch: Features. High Availability: ES clusters are resilient: they will detect new or failed nodes, and reorganize and rebalance data automatically.
  59. Technology Stack: Elasticsearch: Features. Multitenancy: A cluster may contain multiple indices that can be queried independently or as a group. Index aliases.
  60. Technology Stack: Elasticsearch: Features. Full-Text Search: ES builds distributed capabilities on top of Apache Lucene to provide the most powerful full-text search capabilities.
  61. Technology Stack: Elasticsearch: Features. RESTful API: ES is API driven. Almost any action can be performed using a simple RESTful API using JSON over HTTP.
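
"JSON over HTTP" in practice means a search is just an HTTP request with a JSON body. The sketch below builds (but does not send) a request to Elasticsearch's `_search` endpoint with a `match` query. The host `http://localhost:9200`, the index name `articles` and the field `title` are assumptions for illustration only.

```python
import json
import urllib.request


def build_search_request(base_url, index_name, field, keywords):
    """Build a POST request for the _search endpoint with a match query."""
    body = {"query": {"match": {field: keywords}}}
    url = "{}/{}/_search".format(base_url.rstrip("/"), index_name)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node and hypothetical index/field names.
request = build_search_request("http://localhost:9200", "articles", "title", "liferay search")
```

Against a running node, passing `request` to `urllib.request.urlopen` would send the query; the response is again a JSON document.
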
  62. Technology Stack: Elasticsearch: Features. Open Source: Apache 2 License
  63. Technology Stack: Elasticsearch: Features. Built on top of Lucene: Apache Lucene is a high performance, full-featured Information Retrieval library, written in Java.
  64. Technology Stack: Elasticsearch: Architecture (diagram labels: Elasticsearch Cluster, Application, DB, Search Index)
  65. Technology Stack: Elasticsearch. Elasticsearch Cluster
  66. Technology Stack: Elasticsearch: Cluster
  67. Technology Stack: Elasticsearch: Nodes
  68. Technology Stack: Elasticsearch: Index
  69. Technology Stack: Elasticsearch: Shards
  70. Technology Stack: Elasticsearch: Lucene Index
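
The cluster → nodes → index → shards → Lucene index chain comes down to this: an Elasticsearch index is split into shards, each shard is a Lucene index of its own, and a document is routed to one shard by hashing its routing value (by default its id) modulo the number of primary shards. The sketch below shows only that routing idea; CRC32 is an assumed stand-in for the hash function the engine really uses, and the function names are hypothetical.

```python
import zlib


def route_to_shard(doc_id, number_of_shards):
    """Pick a shard for a document id: hash(id) modulo the shard count."""
    # CRC32 is only a stand-in here; it is deterministic across runs.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def distribute(doc_ids, number_of_shards):
    """Group document ids by the shard they are routed to."""
    shards = {shard: [] for shard in range(number_of_shards)}
    for doc_id in doc_ids:
        shards[route_to_shard(doc_id, number_of_shards)].append(doc_id)
    return shards


layout = distribute(["doc-{}".format(n) for n in range(20)], 5)
```

Because the shard is derived from the id and the shard count, changing the number of primary shards would send existing ids to different shards, which is why that count is normally fixed when an index is created.
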
  71. Brief Overview of Search in Liferay
  72. Search in Liferay: The Liferay Search Infrastructure
     - Liferay Platform
     - Assets: web content, message boards, wiki pages...
     - Search infrastructure (Magic happens here)
     - Search engine(s)
     - Indices, documents, analysis...
  73. Search in Liferay: The Evolution of Search
     - 2004: Lucene, from the very beginning. Liferay Search APIs: built around/on top of Lucene, default engine
     - 2008 (5.1): abstracting out the search mechanism → Solr integration (plugin)
     - 2011 (6.0): Faceted Search Support
  74. Search in Liferay: The Evolution of Search
     - 2011-2014 (6.0-6.2): enhancing search functionalities: suggestions, “Did You Mean”, filtering, improved permission awareness
     - 2016 (7.0): Elasticsearch integration: default search engine, generic queries, OSGi module, partnership with Elastic
  75. Search in Liferay: Why Elasticsearch?
     - Best of breed
     - Built for modern web applications
     - Distributed and clusterable by design
     - Lucene based
     - Great vendor support
     - Great monitoring tools: Marvel
  76. Search in Liferay: Great for Developers
     - Open Source
     - Amazing documentation
     - High "just works" factor, e.g. zero-config indexing and clustering
     - REST for queries, health, admin - everything
     - Great Java Client API
     - Pretty JSON for talks ;-)
  77. Search in Liferay: Clustering with Liferay and Elasticsearch. Production mode; Dev mode
  78. Search in Liferay: Liferay 7 EE Search Features. Security; Monitoring
  79. Search in Liferay: Liferay 7 EE Search Features: Security. Shield | Security for Elasticsearch
     - Protect your Liferay index with a username and password
     - SSL/TLS encryption for traffic within the Liferay Elasticsearch cluster
     - Elasticsearch plugin - no need for an external security solution
     - Restrict access to Liferay Portal instances with IP filtering
     - Liferay Integration
  80. Search in Liferay: Liferay 7 EE Search Features: Monitoring. Marvel | Monitor for Elasticsearch
     - Real-Time and Historical Analysis
     - Real-Time Cluster Health at a Glance
     - Multicluster Support
     - Liferay Integration
  81. Search in Liferay: Liferay 7 EE Search Features: Licensing. Details are coming soon...
  82. Resources
     - André de Oliveira: Harnessing the power of search, Liferay DEVCON, 2015
     - Michael Han: Search Reference documentation (Liferay Internal Documentation)
     - Tibor Lipusz: Introduction to Indexing & Searching, Faceted Search in Liferay (Liferay Internal Training)
     - Tibor Lipusz: Introduction to Search: Elasticsearch and Solr (Liferay Internal Training)
     - Source of Images, Figures: www.liferay.com, https://dev.liferay.com, www.amazon.com, www.google.com, http://lucene.apache.org, www.elastic.co, http://lucene.apache.org/solr/
  83. Thanks! tibor.lipusz@liferay.com https://github.com/lipusz
