Machine Learning in a Twitter ETL using ELK
(ELASTICSEARCH, LOGSTASH, KIBANA)
MELVYN PEIGNON
melvynpeignon@gmail.com
What is an ETL?
• ETL: Extract, Transform, Load
[Diagram: Source -> (raw data) -> Transformation -> (processed data) -> Data warehouse / data store; the three stages are labeled Extract, Transform, Load]
Our use case: An ETL for Twitter
https://github.com/melvynator/ELK_twitter
Goals:
•Simplify a recurring task shared by several members of the lab
•Normalize data collection (have a single “universal” format)
•Have tweets analyzed the way we want (emoji, punctuation)
•Include a machine learning model in our ETL
Our tools: ELK
•E : Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine
•L : Logstash
Logstash is an open source, server-side data processing pipeline that
simultaneously ingests data from multiple sources, transforms it, and then
sends it to your favorite “stash”.
•K: Kibana
Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack
Source: Elastic website
Our model
[Diagram: Twitter -> Logstash -> Elasticsearch -> Kibana, with Logstash calling the Sentiment API; the stages are labeled Extract, Transform, Load]
Logstash
Logstash allows you to ingest data from multiple sources, transform
the data and then store your processed data.
Notions you need to know:
•Event
•Input
•Filter
•Output
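Petrol Pipeline
[Diagram: a petrol pipeline used as an analogy, with the labels Petrol, Truck, Pipeline, Refinery, Chemical, Storage]
Logstash Pipeline
[Diagram: Data from the Twitter API -> Twitter input -> Filters -> Elasticsearch output -> Elasticsearch; the input, filters and output sit inside Logstash]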
Logstash: Event
An event is a single unit of raw data traveling through the
pipeline, for example:
•A log entry
•A line of text
•A JSON document
•…
Logstash: Input
The input plugins consume data from a source.
•File
•Elasticsearch
•Twitter API
•GitHub API
•…
Logstash: Filter
A filter plugin performs intermediary processing on an event. Filters are
often applied conditionally depending on the characteristics of the
event.
•Clone
•Mutate
•Ruby
•…
Logstash: Output
An output plugin sends event data to a particular destination.
•Elasticsearch
•MongoDB
•File
•…
Logstash: Flow
Events -> Input -> Filters -> Output -> Processed events
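The flow above can be sketched in a few lines of Python. This is only an analogy, not how Logstash is implemented: the names `run_pipeline` and `uppercase_school` are hypothetical, and each filter is modeled as a function that takes one event and returns a list of events.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]
Filter = Callable[[Event], List[Event]]


def run_pipeline(events: Iterable[Event], filters: List[Filter]) -> List[Event]:
    """Pass every input event through each filter in order and collect the output."""
    current = list(events)
    for apply_filter in filters:
        next_events: List[Event] = []
        for event in current:
            # A filter may return zero, one, or several events.
            next_events.extend(apply_filter(event))
        current = next_events
    return current


def uppercase_school(event: Event) -> List[Event]:
    """Hypothetical filter: upper-case the "school" field when it is present."""
    updated = dict(event)
    if "school" in updated:
        updated["school"] = str(updated["school"]).upper()
    return [updated]


processed = run_pipeline([{"school": "Stanford", "age": 25}], [uppercase_school])
print(processed)
```

Returning a list from every filter keeps the sketch general enough to cover filters that drop an event or, like clone, emit more than one.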
Example: tutorial_input.txt
What we have:
{
  "last_name": "Doe",
  "first_name": "John",
  "age": 25,
  "degree": "Master",
  "school": "Stanford",
  "comment": "A guy from somewhere"
}
What we want:
{
  "name": "John Doe",
  "age": 25,
  "comment": "A guy from somewhere",
  "created_at": "2017-09-04T08:39:54.847Z"
}
{
  "school": "Stanford",
  "degree": "Master",
  "created_at": "2017-09-04T08:39:54.877Z"
}
Example: Building the configuration file
input {
  # The input is a file containing JSON (the path must be absolute)
  file {
    path => "/usr/local/Cellar/logstash/5.5.2/tutorial_input.txt"
    start_position => "beginning" # read the file from its first line
    sincedb_path => "/dev/null" # do not remember the read position between runs
    codec => "json"
  }
}
filter {} # Empty for the moment (this is the only block we will fill in)
output {
  stdout { codec => rubydebug } # Print the result to the console
}
Example: Output
{
"path" => "/usr/local/Cellar/logstash/5.5.2/tutorial_input.txt",
"@timestamp" => 2017-09-04T07:03:26.388Z,
"degree" => "Master",
"@version" => "1",
"host" => "Melvyns-MacBook-Pro.local",
"last_name" => "Doe",
"school" => "Stanford",
"comment" => "A guy from somewhere",
"first_name" => "John",
"age" => 25
}
Example: Building “name” key
ruby { # Lets you run Ruby code on each event
  code => '
    event.set("name", event.get("[first_name]") + " " + event.get("[last_name]"))
  '
} # event.get() retrieves the value stored under a key; event.set() creates or overwrites one
mutate {
  remove_field => ["first_name", "last_name", "host", "path"]
} # mutate applies basic transformations to events; here we remove four keys
Filters are applied in the order in which they appear in the configuration,
so the order matters! Here “ruby” is applied first, then “mutate”:
1. Merge “first_name” and “last_name” into “name”
2. Remove “first_name” and “last_name”
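To see why the order matters, here is a small Python analogy of the two steps. The helpers `build_name` and `remove_fields` are hypothetical stand-ins for the ruby and mutate filters, and the sample `host` and `path` values are made up. In this analogy the reversed order fails with a `KeyError`; how Logstash itself reports the failure is not shown here.

```python
def build_name(event):
    """Mimic the ruby filter: join first_name and last_name into name."""
    updated = dict(event)
    updated["name"] = updated["first_name"] + " " + updated["last_name"]
    return updated


def remove_fields(event, fields):
    """Mimic mutate's remove_field: drop the listed keys if they exist."""
    return {key: value for key, value in event.items() if key not in fields}


event = {"first_name": "John", "last_name": "Doe", "age": 25,
         "host": "laptop", "path": "/tmp/in.txt"}
to_remove = ["first_name", "last_name", "host", "path"]

# Correct order: build the name first, then remove the source fields.
result = remove_fields(build_name(event), to_remove)
print(result)

# Reversed order: the source fields are gone before the name can be built.
try:
    build_name(remove_fields(event, to_remove))
    reversed_order_failed = False
except KeyError:
    reversed_order_failed = True
print(reversed_order_failed)
```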
Example: Output
{
"@timestamp" => 2017-09-04T08:33:26.599Z,
"school" => "Stanford",
"degree" => "Master",
"@version" => "1",
"name" => "John Doe",
"comment" => "A guy from somewhere",
"age" => 25
}
Example: Building two events out of one
clone {
  clones => ["education"]
} # duplicates the event; the replica gets a "type" key whose value is "education"
if [type] == "education" {
  # The replica keeps only the education fields
  mutate {
    remove_field => ["name", "age", "comment", "type"]
  }
} else {
  # The original keeps only the personal fields
  mutate {
    remove_field => ["school", "degree"]
  }
}
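The same split can be traced in plain Python. This is a sketch of the behavior described on this slide, not of the clone plugin's internals; `clone_event` and `split_fields` are hypothetical names.

```python
import copy


def clone_event(event, clone_type):
    """Return the original event plus a copy tagged with a "type" key."""
    replica = copy.deepcopy(event)
    replica["type"] = clone_type
    return [event, replica]


def split_fields(event):
    """Keep education fields on the replica and personal fields on the original."""
    if event.get("type") == "education":
        unwanted = {"name", "age", "comment", "type"}
    else:
        unwanted = {"school", "degree"}
    return {key: value for key, value in event.items() if key not in unwanted}


source = {"name": "John Doe", "age": 25, "comment": "A guy from somewhere",
          "school": "Stanford", "degree": "Master"}
person, education = [split_fields(e) for e in clone_event(source, "education")]
print(person)
print(education)
```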
Example: Output
{
"@timestamp" => 2017-09-04T08:39:54.847Z,
"name" => "John Doe",
"comment" => "A guy from somewhere",
"@version" => "1",
"age" => 25
}
{
"@timestamp" => 2017-09-04T08:39:54.897Z,
"school" => "Stanford",
"@version" => "1",
"degree" => "Master"
}
Example: Changing the name of a field
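mutate {
  remove_field => ["@version"] # @version and @timestamp are created for every event
  rename => { "@timestamp" => "created_at" } # change the name of a field
}
The order in which you write options inside one mutate block is not
the order in which they run: mutate applies its own operations (such as
“rename”) in a fixed internal order, and common options such as
“remove_field” are applied afterwards, once the filter has succeeded.
If one step depends on another, put them in separate mutate blocks.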
Example: Output
{
"created_at" => 2017-09-04T08:39:54.847Z,
"name" => "John Doe",
"comment" => "A guy from somewhere",
"age" => 25
}
{
"created_at" => 2017-09-04T08:39:54.897Z,
"school" => "Stanford",
"degree" => "Master"
}
Logstash: Twitter input
Only one input:
input {
  twitter {
    consumer_key => "<YOUR-KEY>"
    consumer_secret => "<YOUR-KEY>"
    oauth_token => "<YOUR-KEY>"
    oauth_token_secret => "<YOUR-KEY>"
    keywords => ["random", "word"]
    full_tweet => true
    type => "tweet"
  }
}
Logstash: Our filters
There are many filters, but overall they aim to:
•Remove deprecated fields
•Divide the tweet into two or three events (users and tweet)
•Flatten the nested JSON
•Remove unused fields
Our model
[Diagram: Twitter -> Logstash -> Elasticsearch -> Kibana, with Logstash calling the Sentiment API]
Building the API
•Python
•Flask
•Build your model from labeled data
•Design an endpoint that receives the data you want a prediction for
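As a rough illustration of these steps, the sketch below trains a toy word-count classifier on a handful of made-up labeled tweets and wraps it in the logic an endpoint would run. Everything here is an assumption made for illustration: the training data, the function names, and the `"sentiment"` response key are hypothetical, and the Flask route itself is left out so the example runs with the standard library only. A real model would need far more data and a proper learning algorithm.

```python
from collections import Counter, defaultdict
import json

# Hypothetical labeled data; a real model would be trained on far more examples.
LABELED_TWEETS = [
    ("i love this great phone", "positive"),
    ("what a wonderful day", "positive"),
    ("i hate this awful service", "negative"),
    ("terrible and sad news", "negative"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(word_counts[w] for w in words)
              for label, word_counts in counts.items()}
    return max(scores, key=scores.get)


MODEL = train(LABELED_TWEETS)


def handle_predict(request_body: str) -> str:
    """Endpoint logic: read {"submit": "<tweet text>"} and answer {"sentiment": "<label>"}."""
    payload = json.loads(request_body)
    return json.dumps({"sentiment": predict(MODEL, payload["submit"])})


print(handle_predict('{"submit": "I love this wonderful day"}'))
```

In a Flask application, a route handling POST requests on `/predict` would pass the request body to `handle_predict` and return its result.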
Logstash REST filter
•https://github.com/lucashenning/logstash-filter-rest
•Lets Logstash consume RESTful resources
•Calls your machine learning API
•Adds the returned information to your events
Logstash REST filter: Example
rest {
  request => {
    url => "http://localhost:5000/predict"
    method => "post"
    params => {
      "submit" => "%{tweet_content}" # the tweet text taken from the event
    }
    headers => {
      "Content-Type" => "application/json"
    }
  }
  target => "rest_result" # field that will hold the API response
}
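In Python terms, the configuration above amounts to roughly the following. This is a sketch of the intended behavior, not of the plugin's code: `enrich_with_prediction` is a hypothetical name, the HTTP call is injected as a `post` function so the example runs without a server, and the `"sentiment"` key in the reply is an assumption about what the API returns.

```python
import json
from typing import Callable, Dict


def enrich_with_prediction(
    event: Dict,
    post: Callable[[str, bytes, Dict[str, str]], bytes],
) -> Dict:
    """Send the tweet text to the prediction endpoint and store the reply under "rest_result"."""
    body = json.dumps({"submit": event["tweet_content"]}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    reply = post("http://localhost:5000/predict", body, headers)
    enriched = dict(event)
    enriched["rest_result"] = json.loads(reply)
    return enriched


def fake_post(url: str, body: bytes, headers: Dict[str, str]) -> bytes:
    """Stand-in for a real HTTP call so the sketch runs without a server."""
    text = json.loads(body)["submit"]
    label = "positive" if "love" in text else "negative"
    return json.dumps({"sentiment": label}).encode("utf-8")


enriched_event = enrich_with_prediction({"tweet_content": "I love Logstash"}, fake_post)
print(enriched_event)
```

Against a running API, `fake_post` would be replaced by a function that performs the actual POST request, for instance with `urllib.request`.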
Time to do it yourself
•https://github.com/melvynator/Logstash_tutorial
•https://docs.google.com/forms/d/e/1FAIpQLSdDdFxmT5ZCXInaSohJBBo6fKKmCg3KLegeOOrxl1l4_sc-7g/viewform
Thank you!
Questions?
