Dr. Stefan Schadwinkel and Mike Lohmann
Who we are.
Log everything
Mike Lohmann
Architecture
Author (PHPMagazin, IX, heise.de)
Dr. Stefan Schadwinkel
Analytics
Author (heise.de, Cereb.Cortex, EJN, J.Neurophysiol.)
Agenda.
• What we did. What we do.
• Log everything! Our way from requirement to solution
• Infrastructure and technologies: simple, scalable, open source
• Happy business users.
What we did.
• Creating & operating education communities
• Web applications
• Multi-language
• Different market rules in different countries
• Consolidating the technological basis for multiple (new) products
DECK36 GmbH & Co. KG
• DECK36 is a young spin-off from ICANS
• 7 core engineers with longstanding expertise (operate, scale, automate, analyze)
• Consulting and engineering services for the etruvian group and external customers
Key figures for PokerStrategy.com
• Education since 2005
• 19 languages
• 6,000,000 registered users
• 2,800,000 page impressions (PI) per day
• 700,000 posts per day
Moving on…
• Build more education communities like PokerStrategy…
• Assume PokerStrategy KPIs(?)
• Other business models
• Add mobile and the social web…
• Our requirement: Log everything!
Logging Tools / Technologies
9/22/2013
Producer
• Web/Mobile Apps
• JS Frontend
• Servers
• Databases
Transport
• Now: RabbitMQ + Erlang consumer, OR Kafka + any other consumer
• Was: Flume
Storage
• Now: S3 storage + Hadoop with EMR, OR any other storage
• Was: virtualized in-house Hadoop
Analytics
• MapReduce with Hive/Pig
• Results in any format: Excel, QlikView, RDBMS, ...
Realtime Datastream Analytics
• Storm / Trident
Logging Infrastructure
[Diagram: pipeline stages Producer, Transport, Storage, Analytics. Labeled components: Apps 1-x, Databases and Server, RabbitMQ, Consumer, S3, Hadoop cluster, RDBMS, Graylog, Zabbix, and output tools (Excel, QlikView, Tableau, SASS, ...). Realtime Datastream Analytics (Storm): Nimbus (master), three Zookeeper nodes, three Supervisors, three Workers, NodeJS.]
Producer
[Diagram: a request to /Home reaches the Page Controller and raises a PageHit-Event; the PageHit Event Listener calls Logger::log() on the Monolog-Logger (Processor, Handler, Formatter); the LogMessage is sent as JSON to the local RabbitMQ and forwarded by a Shovel.]
Producer JS (in progress)
[Diagram labels: /Home, JS Client, Tracks Event, Trigger, WebSocket, DataCollector (NodeJS), Validator, Local Storage, Local RabbitMQ, Shovel.]
Producer
• LoggingComponent: provides interfaces, filters and handlers
  https://github.com/ICANS/IcansLoggingComponent
• LoggingBundle: glues everything together with Symfony2
  https://github.com/ICANS/IcansLoggingBundle
• Drupal Logging Module: uses the LoggingComponent
  https://github.com/ICANS/drupal-logging-module
• JS Frontend Client: LogClient for browsers (in progress)
  https://github.com/DECK36/starlog-js-frontend-client
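The producer chain of filters, formatters and handlers can be illustrated with a minimal sketch. It is written in Python for brevity and is not the API of the LoggingComponent or of Monolog. All function names, the `QueueHandler` class and the message fields are assumptions made for illustration, and the handler only collects messages in memory instead of publishing to RabbitMQ.

```python
import json
from datetime import datetime, timezone


def add_context(record):
    """Processor: add fields that every message should carry (hypothetical field name)."""
    enriched = dict(record)
    enriched.setdefault("created_at", datetime.now(timezone.utc).isoformat())
    return enriched


def drop_empty_values(record):
    """Filter: remove keys whose value is None so the payload stays compact."""
    return {key: value for key, value in record.items() if value is not None}


def to_json(record):
    """Formatter: serialize the record as a single-line JSON message."""
    return json.dumps(record, sort_keys=True)


class QueueHandler:
    """Handler: collects formatted messages; a stand-in for a RabbitMQ publisher."""

    def __init__(self):
        self.messages = []

    def handle(self, message):
        self.messages.append(message)


def log_event(handler, event_type, **fields):
    """Run one event through processor, filter and formatter, then hand it over."""
    record = {"type": event_type, **fields}
    record = drop_empty_values(add_context(record))
    handler.handle(to_json(record))


handler = QueueHandler()
log_event(handler, "pagehit", url="/Home", user_id=None,
          created_at="2013-09-22T10:00:00+00:00")
print(handler.messages[0])
```

Keeping the handler behind a small interface is what would let the same event code write to a local queue, a file, or a test double.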
Transport
• 1st solution: Flume
  + Part of the Hadoop ecosystem
  + Flexible central config, extensible via plugins
  - Not mature software (flume, flume-ng, plugin interfaces, ...)
  - Central config has problems with Puppet
• 2nd solution: RabbitMQ
  + Local RabbitMQ → cluster
  + Decentralized config (producers & consumers simply connect)
  - HDFS sink not pre-packaged
Storage
• 1st solution: Self-hosted Hadoop
  - Virtualized infrastructure makes HDFS redundant
  - High costs (cluster always running, admin work)
• 2nd solution: Cloud storage
  + Amazon S3
  + Elastic MapReduce: Hadoop on demand
  + Cost-effective (pay only for what you use)
Compaction
• RabbitMQ consumer (Erlang) stores data to the cloud
• Yet: we have a mixed message stream, but want:
  s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo
  Hive Partitioning!
• MapReduce:
  • Streaming (stdin/stdout to any tool)
  • Computation (Hive, Pig, Cascalog, etc.)
• Amazon Redshift
  • PostgreSQL-compatible data warehouse
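To make the target layout concrete, here is a minimal Python sketch of how a mixed message stream could be grouped into Hive-style partition directories. It is not the Cascalog job from the talk. The message fields (`website`, `type`, `created_at`), the timestamp format, the website name and the second message type `icans.search` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def partition_path(message, prefix="icanslog"):
    """Build the Hive-style partition directory for one log message."""
    day = datetime.strptime(message["created_at"], "%Y-%m-%d %H:%M:%S")
    return (
        f"{prefix}/{message['website']}/{message['type']}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    )


def compact(messages):
    """Group a mixed message stream by its target partition directory."""
    partitions = defaultdict(list)
    for message in messages:
        partitions[partition_path(message)].append(message)
    return dict(partitions)


# Invented sample stream: two message types spread over two days.
stream = [
    {"website": "example-site", "type": "icans.content", "created_at": "2012-10-01 08:15:00"},
    {"website": "example-site", "type": "icans.search", "created_at": "2012-10-01 08:16:00"},
    {"website": "example-site", "type": "icans.content", "created_at": "2012-10-02 09:00:00"},
]

for path, group in sorted(compact(stream).items()):
    print(path, len(group))
```

With `key=value` directory names, Hive can treat year, month and day as partition columns and skip directories that a query does not need.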
Analytics
• Cascalog is Clojure, Clojure is Lisp
  (?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))
  [Annotations: ?<- is the query operator; (stdout) is the Cascading output tap; [?person] names the columns of the dataset generated by the query; (age ?person ?age) is a "generator"; (< ?age 30) is a "predicate".]
• Generators and predicates:
  • as many as you want
  • both can be any Clojure function
  • Clojure can call anything that is available within a JVM
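For readers unfamiliar with Lisp syntax, the query above can be paraphrased in plain Python. This is only an analogy, not Cascalog or Cascading. The `AGE` data and the function names are invented for illustration.

```python
# Generator: a relation of (person, age) tuples; the data is invented for illustration.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]


def query(generator, predicate):
    """Return the ?person column for every tuple that satisfies the predicate."""
    return [person for person, age in generator if predicate(age)]


def younger_than_30(age):
    # Predicate: plays the role of (< ?age 30) in the Cascalog query.
    return age < 30


# Printing the result corresponds to using (stdout) as the output tap.
for person in query(AGE, younger_than_30):
    print(person)
```

The difference in practice is that Cascalog compiles such a declarative query into MapReduce jobs, whereas this loop runs in a single process.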
Analytics
• We use Cascalog to preprocess and organize the incoming flow of log messages:
Analytics
• Let's run the Cascalog processing on Amazon EMR:
  ./elastic-mapreduce --create --name "Log Message Compaction" \
    --bootstrap-action s3://[BUCKET]/mapreduce/configure-daemons \
    --num-instances $NUM \
    --slave-instance-type m1.large \
    --master-instance-type m1.large \
    --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
    --step-action TERMINATE_JOB_FLOW \
    --step-name "Cascalog" \
    --main-class icans.cascalogjobs.processing.compaction \
    --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error"
Analytics
• Now we can access the log data within Hive and store the results back to S3:
Analytics
• Now, get the stats by executing a query:
• We can now simply copy the data from S3 and import it into any local analytical tool
  • Excel, Redshift, QlikView, R, etc.
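The query itself is not part of this transcript. As a stand-in, the following Python sketch shows the kind of result such a step might produce: a per-day event count written as CSV, which tools like Excel or R can import directly. The record fields and values are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample records; in the pipeline these would come from the partitioned logs.
records = [
    {"day": "2012-10-01", "type": "pagehit"},
    {"day": "2012-10-01", "type": "pagehit"},
    {"day": "2012-10-02", "type": "pagehit"},
]


def daily_counts(rows):
    """Count events per day, comparable to a GROUP BY day aggregation."""
    return Counter(row["day"] for row in rows)


def to_csv(counts):
    """Render the counts as CSV text with a header row, sorted by day."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["day", "events"])
    for day in sorted(counts):
        writer.writerow([day, counts[day]])
    return buffer.getvalue()


print(to_csv(daily_counts(records)))
```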
Realtime Datastream Analytics
• Storm: Hadoop for realtime analytics
• Rock-solid HA concept
• Highly scalable
• Can:
  • Process streams (and trigger events)
  • Provide DRPC functionality
  • Work on enormous data loads
• Fancy names for modules (spouts/bolts/tuple/topology)
• Easy to use
  • Small and easy-to-understand API
  • DevMode
• Add new topologies at run time
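The vocabulary (spout, bolt, tuple, topology) can be sketched in a few lines of plain Python. This is a single-process analogy, not the Storm API: there is no cluster, no acking and no parallelism. The class names, URLs and threshold are assumptions for illustration.

```python
from collections import Counter


def page_hit_spout():
    """Spout: the source of the stream; emits one tuple per log message."""
    for url in ["/Home", "/Forum", "/Home", "/Home", "/Forum"]:
        yield ("pagehit", url)


class CountBolt:
    """Bolt: keeps a running count per URL and emits (url, count) tuples."""

    def __init__(self):
        self.counts = Counter()

    def process(self, tup):
        _, url = tup
        self.counts[url] += 1
        return (url, self.counts[url])


class AlertBolt:
    """Bolt: triggers an event when a URL reaches the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.alerts = []

    def process(self, tup):
        url, count = tup
        if count == self.threshold:
            self.alerts.append(url)


def run_topology(spout, count_bolt, alert_bolt):
    """Topology: wires spout -> count bolt -> alert bolt."""
    for tup in spout:
        alert_bolt.process(count_bolt.process(tup))


count_bolt = CountBolt()
alert_bolt = AlertBolt(threshold=3)
run_topology(page_hit_spout(), count_bolt, alert_bolt)
print(dict(count_bolt.counts), alert_bolt.alerts)
```

In a real Storm deployment each bolt would run as many parallel tasks across workers, and the stream would be unbounded rather than a five-element list.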
Happy business users!
• Questions they ask often can be automated (ETL, reports)
• New questions can be explored (ad hoc, search)
• Insights can be fed back into the system (decisions, WebSockets)
• Data-driven applications can be shared across multiple websites or tailored to individual needs.
Thank you.
Questions?
Contacts.
Dr. Stefan Schadwinkel
stefan.schadwinkel@deck36.de
ICANS_StScha
Mike Lohmann
mike.lohmann@deck36.de
mikelohmann
Tools/Technologies
DECK36 GmbH & Co. KG
Valentinskamp 18
20354 Hamburg
Germany
Phone: +49 40 22 63 82 9-0
Fax: +49 40 38 67 15 92
Web: www.deck36.de

DECK36 - Log everything! and Realtime Datastream Analytics with Storm
