Machine Learning and Logging for Monitoring Microservices

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Próximos SlideShares
Monitoring Docker with ELK
Monitoring Docker with ELK
Carregando em…3
×

Confira estes a seguir

1 de 39 Anúncio
Anúncio

Mais Conteúdo rRelacionado

Diapositivos para si (20)

Quem viu também gostou (20)

Anúncio

Semelhante a Machine Learning and Logging for Monitoring Microservices (20)

Mais recentes (20)

Anúncio

Machine Learning and Logging for Monitoring Microservices

  1. ANALYZE THIS: ML AND LOGGING FOR MONITORING MICROSERVICES
  2. skb rides the rocket
  3. kernel: xen_netfront: xennet: skb rides the rocket: 19 slots
  4. Daniel Berman • Product Evangelist @logzio • LAMPer, Docker, ELK • Speaker/Blogger (SitePoint, DZone) • Meetup organizer: TLV-PHP, TLV-ELK • Contact me: @proudboffin | daniel@logz.io
  5. 1 min on Logz.io • Log analysis company • ELK-as-a-Service • Enterprise grade: auto-everything, security, multi-tenant • Additional features: ELK Apps, S3 archiving, AI
  6. Agenda • Logs + logging background • The challenges • Centralized logging with ELK • Using machine learning • Demo • Q & A
  7. WHAT ARE LOGS?
  8. LOG ANALYTICS IS FUNDAMENTAL FOR UNDERSTANDING MACHINES • Sources: security devices, app servers, network • Uses: online user behavior, IoT analytics, dev, monitoring & system troubleshooting, security and compliance
  9. LOG ANALYTICS FOR MICROSERVICES
      • Service logs
        10/01/17 00:53:51 INFO apollo i.l.c.b.c.b.MappedPageFactory: Page file /tmp/logzio-logback-buffer/listener-metrics/logzio-logback-appender/data/page-48.dat was just deleted.
      • Service metrics
        10/01/17 02:53:51 INFO apollo a.b.c.metrics: Account-Incoming, key: 126, value: 54321
  10. LOG ANALYTICS FOR MICROSERVICES • Host logs/metrics • Execution runtime logs
  11. THE CHALLENGES WITH LOGGING MICROSERVICES • Transient • Distributed • Independent • Multilayered
  12. LOGGING IN A DOCKERIZED WORLD
      $ docker logs
      2016-06-02T13:05:22.614090Z 0 [Note] InnoDB: 5.7.12 started; log sequence number 2522067
  13. LOGGING IN A DOCKERIZED WORLD
      $ docker stats
      CONTAINER      CPU %   MEM USAGE / LIMIT   MEM %   NET I/O               BLOCK I/O
      3747bd397456   0.01%   3.641 MB / 2.1 GB   0.17%   3.366 kB / 648 B      0 B / 0 B
      396e42ba0d15   0.11%   1.638 MB / 2.1 GB   0.08%   9.79 kB / 648 B       348.2 kB / 0 B
      468bf755240a   3.19%   45.67 MB / 2.1 GB   2.17%   25.19 MB / 17.95 MB   774.1 kB / 0 B
      5f16814a3c0e   0.01%   495.6 kB / 2.1 GB   0.02%   8.564 kB / 648 B      0 B / 0 B
      74cdfa7b8a0c   0.04%   3.908 MB / 2.1 GB   0.19%   2.028 kB / 648 B      0 B / 0 B
      99bafb7600fc   0.00%   32.95 MB / 2.1 GB   1.57%   0 B / 0 B             2.093 MB / 20.48 kB
  14. LOGGING IN A DOCKERIZED WORLD
      $ docker daemon
      time="2016-06-05T12:03:49.716900785Z" level=debug msg="received containerd event: &types.Event{Type:"exit", Id:"3747bd397456cd28058bb40799cd0642f431849b5c43ce56536ab7f55a98114f", Status:0x0, Pid:"4120a7625a592f7c95eab4b1b442a45370f6dd95b63d284714dbb58f00d0a20d", Timestamp:0x57541525}"
  15. OH, AND THERE’S THIS… • Large & complex application & operational logs • Multiple different formats • Multiple log files per component/instance • Slow & labor-intensive • Error-prone processing • Relies on an individual’s skills • Expensive • Hard to find what is relevant and important in log data • Scaling and securing an open-source implementation is expensive and very difficult
  16. CENTRALIZED LOGGING TO THE RESCUE • Centralized data collection and management • Provides inferable context to logs • Analysis, event correlation and visualization
  17. OLD SCHOOL LOGGING
      $ grep ' 30[1234] ' /var/log/apache2/access.log | grep -v baidu | grep -v Googlebot
      173.230.156.8 - - [04/Sep/2015:06:10:10 +0000] "GET /morpht HTTP/1.0" 301 26 "-" "Mozilla/5.0 (pc-x86_64-linux-gnu)"
      192.3.83.5 - - [04/Sep/2015:06:10:22 +0000] "GET /?q=node/add HTTP/1.0" 301 26 "http://morpht.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5"
  18. NEW SCHOOL LOGGING
  19. A BIT ABOUT ELK • World’s most popular open source log analysis platform • 4.5M downloads a month! • Centralized logging AND: search, BI, SEO, IoT, and more
  20. THE MARKET IS DOMINATED BY OPEN SOURCE SOLUTIONS • Over the past 3 years, the market shifted attention from proprietary to open source • Simple and beautiful: it’s simple to get started and play with ELK, and the UI is just beautiful • Open source/flexible: fast-growing community, no vendor lock-in and no license cost • Fast. Very fast: blazing quick responses even when searching through millions of documents • ELK Stack: 500,000+ companies (vs. 15K companies)
  21. TYPICAL ELK PIPELINE • Log shipper (e.g., Beats) • Collecting and parsing (Logstash) • Full-text search and analysis engine: scalable, fast, highly available, REST API (Elasticsearch) • Visualizations and dashboards (Kibana)
  22. STEP 1 – INSTALLING ELK
      https://hub.docker.com/r/sebp/elk/
      elk:
        image: sebp/elk
        ports:
          - "5601:5601"   # Kibana
          - "9200:9200"   # Elasticsearch REST API
          - "5044:5044"   # Logstash Beats input
      $ sudo docker-compose up elk
      https://github.com/deviantony/docker-elk
  23. STEP 2 – FORWARDING LOGS • Logging drivers (json-file, syslog, fluentd…)
      $ docker run -d --name nginx --log-driver=syslog --log-opt syslog-address=tcp://SYSLOG_IP:PORT -p 80:80 nginx:alpine
      webserver:
        image: nginx:alpine
        container_name: nginx
        ports:
          - "80:80"
        logging:
          driver: syslog
          options:
            syslog-address: "tcp://SYSLOG_IP:PORT"
            tag: "nginx"
  24. STEP 2 – FORWARDING LOGS • Logspout
      $ docker run --name="logspout" --volume=/var/run/docker.sock:/var/run/docker.sock gliderlabs/logspout syslog+tls://167.23.145.12:55555
  25. STEP 2 – FORWARDING LOGS • Filebeat
      yourapp:
        image: your/image
        ports:
          - "80:80"
        links:
          - elk
      elk:
        image: sebp/elk
        ports:
          - "5601:5601"
          - "9200:9200"
          - "5044:5044"
  26. STEP 3 – PARSING • Configure Logstash (input, filter, output)
      filter {
        if [type] == "dockerlogs" {
          # Drop Java stack-trace continuation lines (a tab followed by "at ")
          if ([message] =~ "^\tat ") {
            drop {}
          }
          # Extract the HTTP status code into an integer field
          grok {
            break_on_match => false
            match => [ "message", " responded with %{NUMBER:status_code:int}" ]
            tag_on_failure => []
          }
        }
      }
  27. STEP 4 – SECURITY • DO NOT expose Elasticsearch (‘network.host’) • Use proxies • Isolate Elasticsearch • Change default ports
  28. OTHER SOLUTIONS • Hosted ELK (Logz.io, Elastic Cloud, Sematext) • Other logging/monitoring SaaS (Datadog, Papertrail, Loggly)
  29. THE BIG ELEPHANT (ELK) IN THE ROOM • Not knowing what question to ask • Needle in the haystack syndrome • Logs cannot be analyzed by a human alone • Anomaly detection does not work
  30. ANOMALY DETECTION DOESN’T WORK • Not every anomaly is an error • Not every error represents itself in an anomaly • Apps run as step functions
  31. ENTER MACHINE LEARNING?
  32. DEMO TIME!
  33. WHAT IS MACHINE LEARNING? “Machine learning is a type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed.” (TechTarget)
  34. SUPERVISED MACHINE LEARNING (BY EXAMPLE)
      1. Labeling – gathering and labeling logs • User behavior • Inter-user similarities • Public resources
      2. Training a classifier – defining which logs are important
      3. Integration within the system
  35. ‘skb rides the rocket’ kernel: xen_netfront: xennet: skb rides the rocket: 19 slots (http://serverfault.com/questions/647489/what-is-causing-skb-rides-the-rocket-errors)
  36. EXTRAS • Logz.io blog: http://logz.io/blog • Elastic docs: http://elastic.co/documentation • Slack team: https://elk-stack-professionals-pfuiokfxqy.now.sh • ELK meetup: https://www.meetup.com/Tel-Aviv-Yafo-ELK-ElasticSearch-Meetup/
  37. THANKS! @proudboffin | daniel@logz.io
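
Slide 26 parses Docker logs with a Logstash filter. To make its two rules easier to follow, here is a rough Python equivalent. It is a sketch, not how Logstash runs internally. It assumes that every input line is one event already of type `dockerlogs` (the Logstash conditional), and that the drop rule targets Java stack-trace continuation lines, which begin with a tab followed by `at `. The helper name `parse_docker_logs` is hypothetical.

```python
import re

# Assumption: every line passed in already belongs to the "dockerlogs" type,
# i.e. the Logstash `if [type] == "dockerlogs"` check happened upstream.

# Java stack-trace continuation lines start with a tab followed by "at ".
STACK_TRACE_LINE = re.compile(r"^\tat ")

# Rough equivalent of the grok pattern " responded with %{NUMBER:status_code:int}".
# Simplification: only integers are matched, while grok's NUMBER also accepts decimals.
STATUS_CODE = re.compile(r" responded with (\d+)")


def parse_docker_logs(lines):
    """Yield one event dict per kept line (hypothetical helper)."""
    for line in lines:
        if STACK_TRACE_LINE.match(line):
            continue  # same effect as drop {}
        event = {"type": "dockerlogs", "message": line}
        match = STATUS_CODE.search(line)  # grok patterns are unanchored, like search()
        if match:
            event["status_code"] = int(match.group(1))
        # On a miss, no failure tag is added (tag_on_failure => []).
        yield event


if __name__ == "__main__":
    sample = [
        "GET /api/users responded with 200 in 12ms",
        "java.lang.IllegalStateException: boom",
        "\tat com.example.Service.handle(Service.java:42)",
    ]
    for event in parse_docker_logs(sample):
        print(event)
```

The stack-trace line is discarded. The first message gains a `status_code` field, and the exception message passes through unchanged.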

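Slide 11 lists "Distributed" as a core challenge. The speaker notes name `request_id` as the service-log field used for tracing across the architecture. Once logs are centralized, correlating a request comes down to grouping by that ID and ordering by time. The Python sketch below assumes that every event already carries `service_id`, `request_id`, and a numeric `timestamp`. The `LogEvent` schema and the `correlate` helper are hypothetical illustrations, not part of any ELK API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One centralized log event (hypothetical schema based on the notes' fields)."""
    timestamp: float   # seconds since the epoch
    service_id: str
    request_id: str
    log_type: str      # the notes' "type" field, renamed to avoid the builtin
    message: str


def correlate(events):
    """Group events by request_id; each trace is sorted by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event.request_id].append(event)
    return {
        request_id: sorted(trace, key=lambda e: e.timestamp)
        for request_id, trace in traces.items()
    }


if __name__ == "__main__":
    events = [
        LogEvent(3.0, "billing", "req-1", "error", "card declined"),
        LogEvent(1.0, "gateway", "req-1", "info", "POST /checkout"),
        LogEvent(2.0, "gateway", "req-2", "info", "GET /health"),
        LogEvent(2.5, "orders", "req-1", "info", "order created"),
    ]
    for request_id, trace in correlate(events).items():
        print(request_id, [f"{e.service_id}:{e.message}" for e in trace])
```

This only works if every service propagates the same `request_id` on each hop. That requirement sits in the services themselves, not in the logging pipeline.
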
Editor's Notes

  • Syslog message resulting from packet loss caused by a kernel bug in Linux.
  • Logs are a stream of aggregated, time-ordered events collected from the output streams of running processes and backing services
  • Does anyone not use logs?
    When running builds to identify compile errors
    When you’re running a system – for troubleshooting your system
    For learning about the behavior of your system
    So anyone creating, deploying or running software needs logs!
  • Service logs – service_id, request_id (for tracing across the architecture), type, timestamp

    Metric collection: to measure improvements and the impact of new code

    Resource utilization (CPU, memory, network, filesystem)

    Runtime metrics (Jenkins build times)
  • Microservices are stateless. That means that an instance of a service can be created, stopped, restarted, and destroyed at any time without impacting other services. Any logging functionality we implement can’t rely on the service persisting for any period of time.
    Microservices are independent. With microservices, only the execution environment is aware of the context; a service running in a Kubernetes pod, for example, is not aware of the host machine it runs on.
    Microservices are distributed. You’ll likely find yourself logging related data from two completely independent platforms. To log effectively, we need a way to correlate events across the infrastructure.

  • Let’s take the Docker execution environment for example. You have three different types of logs and metrics that can be extracted.
  • Multiply all of this – at Logz.io, for example, we’re running about 60 Docker hosts, each with 4-5 containers…
  • In modern environments, log analysis remains an extremely complicated and resource-consuming task, even for the most experienced developer, DevOps, or IT operations teams, and despite the most sophisticated analytics and monitoring tools.

    That’s because, at the end of the day, behind these tools stands a human being who needs to connect the dots and make informed, timely decisions; they need to know how to extract signals and actionable meaning from millions of log messages.
  • In essence, centralized logging detaches logging from the containers running your microservices
    Using parsing and filtering you can give your logs context
    By structuring logs, and providing a comfortable UI, it enables easier analysis
  • All three services are started automatically
    The image persists /var/lib/elasticsearch, the directory in which Elasticsearch stores its data, as a volume.
  • Install a log forwarder to send to Logstash – this depends on the Docker driver used.
  • Logspout is a log router for Docker containers that runs inside Docker. It attaches to all containers on a host, then routes their logs wherever you want. It also has an extensible module system.
    Logspout is a very small Docker container (15.2MB virtual)
  • Install a log forwarder to send to Logstash – this depends on the Docker driver used.
    docker inspect afaac897ab50 | grep LogPath
  • Each Docker image has its own logging format, so these filters will be very specific
  • Bind the nodes to localhost or a private IP
    Use proxies to communicate with clients: put one in front of Kibana to add user control and request filtering
  • False alarms and a low signal-to-noise ratio
  • Not every anomaly is an error, for example:
      Developer introducing a new log line
      Access usage
      Seasonality changes
    Not every error represents itself in an anomaly, for example:
      Resource utilization
      Memory leak
    Applications run as a step function
      Anomaly detection works on continuous functions
  • Enables you to train a self-improving system that asks the questions for you
    Can sift through vast amounts of data and flag relevant events
  • Supervised machine learning is based on the idea of learning by example.

    Labeling: gathering and labeling logs, i.e., coloring the data in different colors, for example:

      Opened/unopened
      Error logs
      Exception logs

    Training a classifier: defining which logs are important. Simply put, a classifier is a formula that you build in order to answer a question. Using the labels, we build a mathematical representation of a log message, which is then fed into the formula; if the result passes a specific threshold, the log is relevant.

    Integration within the system, using Hadoop and Spark
  • As IT operations become agile and dynamic, they are also getting immensely complex.
    Two main challenges in logging microservices:
      Logging in a distributed architecture
      Finding the needle in the haystack
    Proposed solutions:
      Centralized logging
      A machine learning approach, which:
        Turns manual Dev, DevOps and IT operations work into an automated process
        Poses the questions for you, revealing events that would otherwise go undetected
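
The notes describe a classifier as a formula applied to a mathematical representation of a log message, with a threshold that decides relevance. One way to realize that idea is a bag-of-words vector fed into logistic regression trained by stochastic gradient descent. This is an assumption about the approach, not the method described in the talk. The training messages, the `featurize`/`train`/`is_relevant` names, the hyperparameters, and the 0.5 threshold below are all illustrative. They are not Logz.io's actual model.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lower-cased alphabetic words (digits are ignored).
TOKEN = re.compile(r"[a-z_]+")


def featurize(message):
    """Bag-of-words representation of a log message: token -> count."""
    return Counter(TOKEN.findall(message.lower()))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, features):
    """The 'formula': estimated probability that a log message is relevant."""
    z = bias + sum(weights.get(tok, 0.0) * n for tok, n in features.items())
    return sigmoid(z)


def train(labeled, epochs=200, lr=0.5):
    """Fit logistic-regression weights by stochastic gradient descent.

    labeled: list of (message, label) pairs, label 1 = relevant, 0 = not.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for message, label in labeled:
            features = featurize(message)
            error = label - score(weights, bias, features)
            for tok, n in features.items():
                weights[tok] = weights.get(tok, 0.0) + lr * error * n
            bias += lr * error
    return weights, bias


def is_relevant(weights, bias, message, threshold=0.5):
    """Flag the message when its score passes the threshold."""
    return score(weights, bias, featurize(message)) >= threshold


if __name__ == "__main__":
    # Toy labels for illustration only.
    labeled = [
        ("kernel: xen_netfront: xennet: skb rides the rocket: 19 slots", 1),
        ("OutOfMemoryError: Java heap space", 1),
        ("connection refused while contacting billing service", 1),
        ("GET /health responded with 200", 0),
        ("user logged in", 0),
        ("page file was just deleted", 0),
    ]
    weights, bias = train(labeled)
    print(is_relevant(weights, bias, "skb rides the rocket: 21 slots"))
    print(is_relevant(weights, bias, "GET /health responded with 200"))
```

In a real system, the labels would come from the sources listed on slide 34 (user behavior, inter-user similarities, public resources), and training would run at scale, for example on Spark as the notes mention. This toy version only shows the shape of the computation.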

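Slide 30 argues that anomaly detection misleads when applications behave like step functions. The sketch below makes both failure modes from the notes concrete. It uses a rolling z-score detector, chosen only as a simple stand-in for "anomaly detection" and not as any specific product's algorithm. The two series are made up. The first is log volume per minute that jumps when a developer adds a new log line (benign, but flagged). The second is memory usage that climbs steadily because of a leak (harmful, but never flagged).

```python
from statistics import mean, pstdev


def zscore_anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` points."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                flagged.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical log lines per minute; a new log line ships at minute 8.
    deploy = [100, 102, 99, 101, 100, 98, 101, 100, 130, 131, 129, 130, 132, 130]
    # Hypothetical memory usage (MB) growing by 1 MB per minute: a slow leak.
    leak = list(range(100, 120))
    print(zscore_anomalies(deploy))
    print(zscore_anomalies(leak))
```

With the arbitrary choices `window=5` and `threshold=3.0`, the first call prints `[8]`: only the benign step is flagged. The second call prints `[]`. Each point of the linear leak sits about 2.1 standard deviations above its trailing mean, which stays below the threshold.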