Real time analytics
of big data with Elasticsearch




Karel Minařík
Facets   Analytics   JSON

 http://www.youtube.com/watch?v=-GftBySG99Q
http://karmi.cz
http://elasticsearch.com


                  Realtime Analytics With ElasticSearch
Using a search engine for analytics?


wat?

HOW DOES SEARCH WORK?

A collection of documents




      file_1.txt
      The ruby is a pink to blood-red colored gemstone ...

      file_2.txt
      Ruby is a dynamic, reflective, general-purpose object-oriented
      programming language ...

      file_3.txt
      "Ruby" is a song by English rock band Kaiser Chiefs ...
HOW DOES SEARCH WORK?

How do you search documents?




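# Naive approach: read every file and scan its full text on each search.
# downcase makes the match case-insensitive ("Ruby" in file_2.txt and file_3.txt).
File.read('file_1.txt').downcase.include?('ruby')
File.read('file_2.txt').downcase.include?('ruby')
...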
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS



 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
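
A minimal sketch of how such an index could be built in Ruby. The document texts are the snippets from the slide; the `tokenize` and `build_index` helpers and the very naive analyzer are assumptions made for illustration, not how Lucene or Elasticsearch do it. Unlike the slide, this sketch does not drop common words such as "the" or "is".

```ruby
require 'set'

# In-memory stand-in for the three files shown on the slide.
DOCUMENTS = {
  'file_1.txt' => 'The ruby is a pink to blood-red colored gemstone',
  'file_2.txt' => 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language',
  'file_3.txt' => '"Ruby" is a song by English rock band Kaiser Chiefs'
}.freeze

# Naive analyzer (an assumption): lowercase, then keep runs of letters and digits.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build a hash of token => sorted list of document names (the postings).
def build_index(documents)
  index = {}
  documents.each do |name, text|
    tokenize(text).each { |token| (index[token] ||= Set.new) << name }
  end
  index.transform_values { |names| names.to_a.sort }
end

INDEX = build_index(DOCUMENTS)
puts INDEX['ruby'].inspect
puts INDEX['song'].inspect
```

Searching then becomes a single hash lookup instead of a scan over every file.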
HOW DOES SEARCH WORK?

The inverted index: searching

search "ruby"             matches   file_1.txt   file_2.txt   file_3.txt
search "song"             matches   file_3.txt
search "ruby AND song"    matches   file_3.txt   (the only document in both postings lists)
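
The AND search can be read as an intersection of postings lists. The sketch below copies the postings from the slide into a plain Ruby hash; the names `SLIDE_POSTINGS`, `search` and `search_all` are hypothetical and only illustrate the idea.

```ruby
# Postings copied from the slide: token => documents containing it.
SLIDE_POSTINGS = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}.freeze

# Single-token search: a hash lookup; unknown tokens match nothing.
def search(token)
  SLIDE_POSTINGS.fetch(token.downcase, [])
end

# AND search: keep only the documents present in every token's postings.
def search_all(*tokens)
  tokens.map { |token| search(token) }.reduce { |common, postings| common & postings } || []
end

puts search('ruby').inspect
puts search_all('ruby', 'song').inspect
```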
HOW DOES SEARCH WORK?

The inverted index

TOKENS        COUNT             POSTINGS
              (Statistics!)

 ruby          3                file_1.txt        file_2.txt          file_3.txt
 pink          1                file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
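
The counts shown next to "ruby" and "pink" are simply the sizes of their postings lists. A small sketch, assuming the same postings as on the slide; `STATS_POSTINGS`, `document_counts` and `top_tokens` are hypothetical names used only here.

```ruby
# Postings copied from the slide: token => documents containing it.
STATS_POSTINGS = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}.freeze

# Number of documents per token: the length of each postings list.
def document_counts(postings)
  postings.transform_values(&:size)
end

# Most frequent tokens first; ties are broken alphabetically (a choice of this sketch).
def top_tokens(postings, limit)
  document_counts(postings).sort_by { |token, count| [-count, token] }.first(limit)
end

top_tokens(STATS_POSTINGS, 3).each { |token, count| puts "#{token}: #{count}" }
```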
http://elasticsearch.org
ElasticSearch is an open source, scalable, distributed, cloud-ready,
highly available full-text search engine and database with powerful
aggregation features, communicating by JSON over RESTful HTTP,
based on Apache Lucene.


FACETS

    Faceted Navigation

[Figure: faceted search example with the "Query" and "Facets" areas labeled]

                         http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
FACETS

Faceted Navigation with Elasticsearch

curl "http://localhost:9200/people/_search?pretty=true" -d '
{
    "query" : {
        "match" : { "name" : "John" }
    },
    "filter" : {
        "terms" : { "employer" : ["IBM"] }
    },
    "facets" : {
        "employer" : {
            "terms" : {
                "field" : "employer",
                "size"  : 3
            }
        }
    }
}'

"query"   = User query
"filter"  = "Checkboxes"
"facets"  = Facets

Response

"facets" : {
    "employer" : {
        "missing" : 0,
        "total" : 10,
        "other" : 3,
        "terms" : [ {
            "term" : "ibm",
            "count" : 3
        }, {
            "term" : "twitter",
            "count" : 2
        }, {
            "term" : "apple",
            "count" : 2
        } ]
    }
}

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
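
To make the response fields concrete, here is a rough in-memory imitation of a terms facet in Ruby. It is only an illustration of the response shape, not Elasticsearch's implementation: `terms_facet` and `SAMPLE_HITS` are hypothetical, and the employers "google", "oracle" and "sun" are invented so that the numbers add up to the slide's "total" of 10 and "other" of 3.

```ruby
# Ten hypothetical hits for the query; employer values are already lowercased.
SAMPLE_HITS = (%w[ibm] * 3 + %w[twitter] * 2 + %w[apple] * 2 + %w[google oracle sun])
              .map { |employer| { 'name' => 'John', 'employer' => employer } }
              .freeze

# Count the values of one field and report the top `size` terms.
def terms_facet(hits, field, size)
  values = hits.map { |hit| hit[field] }
  counts = values.compact.each_with_object(Hash.new(0)) { |value, acc| acc[value] += 1 }
  # Highest count first; ties are broken alphabetically (a choice of this sketch).
  top   = counts.sort_by { |term, count| [-count, term] }.first(size)
  total = counts.values.sum
  {
    'missing' => values.count(&:nil?),                     # hits without the field
    'total'   => total,                                    # all counted values
    'other'   => total - top.sum { |_term, count| count }, # values outside the top terms
    'terms'   => top.map { |term, count| { 'term' => term, 'count' => count } }
  }
end

facet = terms_facet(SAMPLE_HITS, 'employer', 3)
puts facet.inspect
```

Because of the alphabetical tie-break, this sketch lists "apple" before "twitter", whereas the slide's response shows them the other way round.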
FACETS

Visualizing the Facets



  "facets" : {
      "employer" : {
          "missing" : 0,
          "total" : 10,
          "other" : 3,
          "terms" : [ {
              "term" : "ibm",
              "count" : 3
          }, {
              "term" : "twitter",
              "count" : 2
          }, {
              "term" : "apple",
              "count" : 2
          } ]
      }
  }

  DEMO: http://bl.ocks.org/4571766

  d3.js ~ A Bar Chart, Part 1
  http://mbostock.github.com/d3/tutorial/bar-1.html
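
The demo uses d3.js in the browser. As a stand-in, the sketch below renders the same facet response as a plain-text bar chart in Ruby; the `bar_chart` helper and the `width` parameter are assumptions of this example and have nothing to do with the d3.js API.

```ruby
require 'json'

# The facet part of the response shown on the slide.
RESPONSE_BODY = <<~BODY
  { "facets": { "employer": { "missing": 0, "total": 10, "other": 3,
      "terms": [ { "term": "ibm",     "count": 3 },
                 { "term": "twitter", "count": 2 },
                 { "term": "apple",   "count": 2 } ] } } }
BODY

# Turn a terms facet into text lines; the largest count gets a bar of `width` characters.
def bar_chart(facet, width: 30)
  terms = facet.fetch('terms')
  return [] if terms.empty?

  max   = terms.map { |term| term['count'] }.max
  label = terms.map { |term| term['term'].length }.max
  terms.map do |term|
    bar = '#' * (term['count'] * width / max)
    format("%-#{label}s %s %d", term['term'], bar, term['count'])
  end
end

puts bar_chart(JSON.parse(RESPONSE_BODY).dig('facets', 'employer'))
```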
FACETS

Visualizing the Facets




http://demo.kibana.org
Important Concepts
‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas

‣ Combination of free text search, structured
  search, and facets
‣ Scripting for performing ad hoc analytics
‣ Extensible: write your own facet types


FACETS

Scripting
Extract and aggregate the most popular domains from article URLs

curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {"type": "string", "index": "not_analyzed"}} } } }'

curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl -X POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field"  : "url",
        "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
        "lang"   : "javascript"
      }
    }
  }
}'

Response

"facets" : {
    "popular-domains" : {
        // ...
        "terms" : [ {
            "term" : "some.blogger.com", "count" : 3
        }, {
            "term" : "github.com", "count" : 1
        } ]
    }
}
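
For readers who want to follow the script's logic without a running cluster, here is a rough Ruby counterpart. It is a sketch under stated assumptions: `ARTICLES`, `domain` and `popular_domains` are hypothetical names, and the hash keyed by document ID only mimics the fact that the two PUT requests to ID 5 end up as a single document, which is consistent with the count of 1 for github.com in the response.

```ruby
# Documents keyed by ID, mirroring the PUT requests on the slide.
# ID 5 is written twice, so the second write replaces the first.
ARTICLES = {}
[
  [1, 'http://some.blogger.com/2012/09/01/index.html'],
  [2, 'http://some.blogger.com/2012/09/11/index.html'],
  [3, 'http://some.blogger.com/about.html'],
  [5, 'https://github.com/user/A'],
  [5, 'http://github.com/user/B']
].each { |id, url| ARTICLES[id] = { 'url' => url } }

# Ruby counterpart of the facet script: drop the scheme, keep the text before the first "/".
def domain(url)
  url.sub(%r{https?://}, '').split('/').first
end

# Count documents per extracted domain, most frequent first.
def popular_domains(articles)
  counts = Hash.new(0)
  articles.each_value { |article| counts[domain(article['url'])] += 1 }
  counts.sort_by { |term, count| [-count, term] }
        .map { |term, count| { 'term' => term, 'count' => count } }
end

puts popular_domains(ARTICLES).inspect
```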
FACETS

Demonstrations
Demo: extract and aggregate the most popular domains from article URLs (same commands and response as on the Scripting slide).
Thanks!
Realtime Analytics With Elasticsearch [New Media Inspiration 2013]
