Montreal Elasticsearch Meetup
1.
2. Loïc Bertron
Director of Research & Development @Cedrom-SNI
Working on Big Data for Cedrom-SNI: social media, TV & radio aggregation
Introduced Elasticsearch at Cedrom-SNI
Cedrom-SNI
10k+ different sources, 750k+ new docs/day
Our job: ingesting, enriching, extracting analytics and intelligence from docs
loic.bertron@cedrom-sni.com
linkedin.com/in/loicbertron
@loicbertron
Who am I ?
3. "Elasticsearch makes it easy to add advanced search features to any application or
website, and scales to large amounts of data."
Elasticsearch
4. Simple: Plug & Play - Schema free - RESTful API
Elastic: Automatically discovers all other instances
Strong: Replication & Load balancing - Scales massively - Lucene based
Fast: Requests executed in parallel - Real time
Full featured: Search, Analytics, Facets, Percolator, Geo search, Suggest, Plugins…
What is Elasticsearch?
5. Document as JSON
• Object representing your data
• Grouped in an index
• One index can have multiple types of documents
{
  "message": "Introducing #ElasticSearch",
  "post_date": "2014-03-12T18:30:00",
  "author": {
    "first_name": "Loïc",
    "email": "loic.bertron@cedrom-sni.com"
  },
  "employee_at_Cedrom": true,
  "Tags": ["Meetup", "Montreal"]
}
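As a rough illustration of the document model above, the following Python sketch builds the same document as a plain dictionary and serializes it to JSON. The variable names `document` and `body` are made up for this example; only the field names and values come from the slide.

```python
import json

# The document from the slide, expressed as a plain Python dictionary.
document = {
    "message": "Introducing #ElasticSearch",
    "post_date": "2014-03-12T18:30:00",
    "author": {
        "first_name": "Loïc",
        "email": "loic.bertron@cedrom-sni.com",
    },
    "employee_at_Cedrom": True,
    "Tags": ["Meetup", "Montreal"],
}

# Serialize to the JSON text that would be sent as a request body.
# ensure_ascii=False keeps "Loïc" readable instead of escaping it.
body = json.dumps(document, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    print(body)
```

Nested objects (`author`) and arrays (`Tags`) need no special handling: they are ordinary JSON values inside the document.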
7. Index a document
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
  "user": "loicbertron",
  "post_date": "2014-03-12T18:30:00",
  "message": "Introducing #ElasticSearch"
}'
9. Update a document
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
  "user": "loicbertron",
  "post_date": "2014-03-12T18:40:00",
  "message": "Introducing #ElasticSearch to the #Community"
}'
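Sending a PUT to an ID that already exists replaces the whole stored document rather than patching individual fields, and the document's version number goes up. The in-memory sketch below mimics that behaviour. `TinyIndex` is a hypothetical class written for this example only; it is not part of Elasticsearch or of any client library.

```python
class TinyIndex:
    """Toy stand-in for one index: maps a document ID to its source and version."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        """Store `source` under `doc_id`, replacing any previous document."""
        previous = self._docs.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        # The old source is discarded entirely: PUT is a full replacement.
        self._docs[doc_id] = {"_source": dict(source), "_version": version}
        return {"_id": doc_id, "_version": version, "created": previous is None}

    def get(self, doc_id):
        return self._docs[doc_id]


if __name__ == "__main__":
    index = TinyIndex()
    print(index.put("1", {"user": "loicbertron",
                          "message": "Introducing #ElasticSearch"}))
    print(index.put("1", {"user": "loicbertron",
                          "message": "Introducing #ElasticSearch to the #Community"}))
    print(index.get("1"))
```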
45. Architecture
[Diagram: a cluster of four nodes (Node 1 to Node 4) holding two copies of Shard 0 and two copies of Shard 1. Doc 1 is stored in both copies of Shard 0.]
$ curl -X PUT http://node1:9200/twitter/tweet/2 -d '{
  "user": "loicbertron",
  "post_date": "2014-03-12T18:45:00",
  "message": "The crowd is on fire #ElasticSearch"
}'
46-47. Architecture
[Diagram: same cluster and same request. Doc 2 has been written to one copy of Shard 1.]
48. Architecture
[Diagram: same cluster and same request. Doc 2 is now stored in both copies of Shard 1.]
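The architecture slides show each document landing in exactly one shard before being copied to that shard's second copy. A minimal sketch of the routing idea follows. It assumes a plain `hash(id) % number_of_shards` rule with CRC32 as a stand-in hash; Elasticsearch uses its own hash function, so the shard numbers printed here will not necessarily match the slides. `pick_shard` and `NUM_SHARDS` are names invented for this illustration.

```python
import zlib

NUM_SHARDS = 2  # assumption: the two shards (0 and 1) drawn on the slides


def pick_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document ID to a shard number using a stable hash.

    CRC32 is only a stand-in; the point is that the same ID always
    maps to the same shard, whichever node receives the request.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        print(f"doc {doc_id} -> shard {pick_shard(doc_id)}")
```

Because the result depends only on the ID and the number of shards, any node can compute where a document lives without asking the others.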
59. Architecture
[Diagram: Node 1 is no longer shown. Nodes 2 to 4 hold two copies of Shard 1 (each with Doc 2) and two copies of Shard 0 (each with Doc 1).]
$ curl -X PUT http://node1:9200/twitter/tweet/3 -d '{
  "user": "loicbertron",
  "post_date": "2014-03-12T19:00:00",
  "message": "A third message about #ElasticSearch"
}'
60. Architecture
[Diagram: same cluster and same request. Doc 3 has been written to one copy of Shard 0.]
61. Architecture
[Diagram: same cluster and same request. Doc 3 is now stored in both copies of Shard 0.]
74. 1. Documents get indexed
2. I come back often to the search page to run my query
3. I hope that my document ranks well enough to be at the top of the results page
4. If not, I will never see my document
Regular search engine usage
75. 1. Register my query
2. When a document gets indexed, the percolator looks for a match against the registered queries
Percolator
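The percolator turns search around: queries are stored first, and each incoming document is tested against them. The pure-Python sketch below illustrates only that idea. It assumes a query is just a set of terms that must all appear in the `message` field; `register_query`, `tokenize` and `percolate` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

# Registered queries: query ID -> set of lowercase terms that must all match.
registered_queries = {}


def register_query(query_id, terms):
    """Store a query ahead of time, before any document arrives."""
    registered_queries[query_id] = {term.lower() for term in terms}


def tokenize(text):
    """Split text into lowercase word tokens (punctuation such as '#' is dropped)."""
    return set(re.findall(r"\w+", text.lower()))


def percolate(doc, field="message"):
    """Return the IDs of all registered queries that the document matches."""
    tokens = tokenize(doc.get(field, ""))
    return sorted(
        query_id
        for query_id, terms in registered_queries.items()
        if terms <= tokens  # every query term is present in the document
    )


if __name__ == "__main__":
    register_query("elasticsearch", ["elasticsearch"])
    register_query("montreal-meetup", ["montreal", "meetup"])
    doc = {"user": "loicbertron",
           "message": "A third message about #ElasticSearch"}
    print(percolate(doc))
```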
78. Percolator
$ curl -X GET http://node1:9200/twitter/tweet/_percolate -d '{
  "doc" : {
    "user": "loicbertron",
    "post_date": "2014-03-12T19:00:00",
    "message": "A third message about #ElasticSearch"
  }
}'
79. Percolator
{
  "took" : 19,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "total" : 1,
  "matches" : [
    {
       "_index" : "twitter",
       "_id" : "elasticsearch"
    }
  ]
}
80. {
  "name": "Jules Verne",
  "biography": "One of the greatest authors",
  "books": [
    {
      "title": "Vingt mille lieues sous les mers",
      "genre": "Novel",
      "publisher": "Hetzel"
    },
    {
      "title": "Les Châteaux en Californie",
      "genre": "Drama",
      "publisher": "Marc Soriano"
    }
  ]
}
Inner objects
81. curl -XPUT 'node1:9200/authors/bare_author/1' -d '{
  "name": "Jules Verne",
  "biography": "One of the greatest authors"
}'
curl -XPOST 'node1:9200/authors/book/1?parent=1' -d '{
  "title": "Les Châteaux en Californie",
  "genre": "Drama",
  "publisher": "Marc Soriano"
}'
curl -XPOST 'node1:9200/authors/book/2?parent=1' -d '{
  "title": "Vingt mille lieues sous les mers",
  "genre": "Novel",
  "publisher": "Hetzel"
}'
Parents / Children
82. Other features
• Suggest API: Did you mean?, Autocomplete, …
• Results Highlight
• More like this
• Backup Data: Snapshot / Restore
• File System
• Amazon S3
• HDFS
• Google Compute Engine
• Microsoft Azure
• Hadoop connector
86. Thank you
Thank you David Pilato for his presentation : https://speakerdeck.com/dadoonet/tours-jug-elasticsearch
Thank you Kevin Kluge for his presentation : https://speakerdeck.com/elasticsearch/elasticsearch-in-20-minutes
88. Suggest
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
 "suggest" : {
  "my-title-suggestions-1" : {
   "text" : "devloping",
   "term" : {
    "size" : 3,
    "field" : "title"
   }
  }
 }
}'
89. Suggest
"suggest": {
  "my-title-suggestions-1": [
   {
    "text": "devloping",
    "offset": 0,
    "length": 9,
    "options": [
     {
      "text": "developing",
      "freq": 77,
      "score": 0.8888889
     },
     {
      "text": "deloping",
      "freq": 1,
      "score": 0.875
     },
     {
      "text": "deploying",
      "freq": 2,
      "score": 0.7777778
     }
    ]
   }
  ]
}
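The response above ranks candidate corrections for a misspelled term. The sketch below imitates that ranking with Python's standard `difflib` module. This is an assumption-laden stand-in: `difflib` uses its own similarity ratio, not the scoring Elasticsearch applies, and the `vocabulary` list and `suggest` function are invented for the example.

```python
import difflib

# Hypothetical vocabulary: terms that already exist in the "title" field.
vocabulary = ["developing", "deploying", "deloping", "meetup", "montreal"]


def suggest(text, terms, size=3, cutoff=0.6):
    """Return up to `size` terms similar to `text`, most similar first.

    `cutoff` is difflib's minimum similarity ratio (0.0 to 1.0).
    """
    return difflib.get_close_matches(text, terms, n=size, cutoff=cutoff)


if __name__ == "__main__":
    print(suggest("devloping", vocabulary))
```

A real suggester also weighs how often each candidate occurs in the index (the `freq` values in the response), which this sketch ignores.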
90. More Like This
curl -XGET 'http://node1:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1'
{
  "more_like_this" : {
    "fields" : ["name.first", "name.last"],
    "like_text" : "text like this one",
    "min_term_freq" : 1,
    "max_query_terms" : 12,
    "percent_terms_to_match" : 0.95
  }
}
92. {
  "query" : {...},
  "highlight" : {
    "number_of_fragments" : 3,
    "fragment_size" : 150,
    "tags_schema" : "styled",
    "fields" : {
      "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
      "bio.title" : { "number_of_fragments" : 0 },
      "bio.author" : { "number_of_fragments" : 0 },
      "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
    }
  }
}
Highlight
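Highlighting wraps the matched terms of a result in tags such as `<em>` and `</em>`, as configured by `pre_tags` and `post_tags` in the request above. The sketch below shows only that wrapping step with a regular expression. `highlight` is a hypothetical helper; fragmenting, scoring and the analysis chain of the real highlighter are left out.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of `terms` in tags."""
    if not terms:
        return text
    # re.escape keeps special characters in the terms from acting as regex syntax.
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: pre_tag + match.group(0) + post_tag, text)


if __name__ == "__main__":
    print(highlight("Introducing #ElasticSearch to the #Community",
                    ["elasticsearch"]))
```

The original casing of the matched text is preserved, since the replacement reuses the matched substring rather than the search term.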
94. Hadoop
• Java library for integrating Elasticsearch and Hadoop
• Pig, Hive, Cascading, MapReduce
• Search and Real Time Analytics with Elasticsearch, Hadoop as Data Lake
• Scales with Hadoop