O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Scaling graphite for application metrics

1.704 visualizações

Publicada em

How to architect graphite for high scale metrics collection.

Slides from the Orange County BigData Meetup talk 5/20/2015.

Publicada em: Tecnologia
  • Seja o primeiro a comentar

Scaling graphite for application metrics

  1. 1. Application Telemetry at scale How to Scale Graphite ! BigData Meetup May 20, 2015
  2. 2. > whoami • Jim Plush • Sr Director of Engineering @ CrowdStrike • twitter: @jimplush
  3. 3. About CrowdStrike Big Data Security Company Focus on targeted, state sponsored attacks and attribution Single enterprise can generate 2+TB of machine data per day MicroService architecture w/1000’s of VMs running. We use goodies like AWS, Kafka, Cassandra, Elastic Search, Hadoop, Scala, GoLang
  4. 4. What is Graphite? • Captures Numeric, Time-Series Data • Metric: test.myapp.host1.logins • Value: 64 • Timestamp: 1432077015
  5. 5. echo "test.myapp.host1.logins 64 `date +%s`" | nc 2003
  6. 6. Graphite • Composed of 3 projects • Carbon - collects and records metrics • Whisper - Backend storage mechanism • Graphite-Web - HTTP frontend for graphing API • Written in Python
  7. 7. What metrics to track? • counters, latencies, error rates • business metrics: sales, order latency, abandoned carts • refactoring, how do you know you’ve succeeded • hadoop metrics via MetricFactory • logins, login failures
  8. 8. Custom Event Annotations log when you deploy, change a library, make an improvement
  9. 9. Libraries • JAVA/Scala: dropwizard.github.io • Go: github.com/rcrowley/go-metrics
  10. 10. Multi Data Center Replication
  11. 11. Note: Relays may need to be scaled behind a local HAProxy
  12. 12. Specs: AWS biased NO MAGNETIC: SSD ONLY ! Load Balancer: HAProxy or ELBS Cache/Whisper: i2.2XL (8 core) 1.6TB RAID0 SSD XFS filesystem 1 cache per core Relays: c3.large (2 core) in autoscale group give it CPU#1 taskset -apc 1 PID Web Tier: M3 memory instances Use MySQL or Postgres Memcache ON
  13. 13. Data Retention [garbage_collection] pattern = garbageCollections$ retentions = 10s:7d,60s:90d ! Data size is fixed, inserted with NULLS, real data overwrites so files don’t grow larger. Makes estimation easier.
  14. 14. How does my data get to the right server?
  15. 15. Old School Sharding hash(key) % numServers
  16. 16. Modern Way Consistent Hashing • Less shuffling of data when adding or removing nodes • Graphite utilizes this approach much like Cassandra’s mechanism for distributing data
  17. 17. Graphite Downside: ! unlike Cassandra, re-balancing data isn’t automatic. You’ll need Carbonate or BuckyTools ! https://github.com/jssjr/carbonate or https://github.com/jjneely/buckytools
  18. 18. Seyren Alerting on actionable metrics
  19. 19. Key Resources https://gist.github.com/obfuscurity/63399584ea4d95f921e4 ! http://bitprophet.org/blog/2013/03/07/graphite/ ! ClusterServers vs Destinations https://answers.launchpad.net/graphite/+question/228472 ! Tuning for 3m writes a minute https://answers.launchpad.net/graphite/+question/178969 ! http://www.aosabook.org/en/graphite.html ! https://grey-boundary.io/the-architecture-of-clustering-graphite/ !
  20. 20. WE’RE HIRING :) •Offices in Irvine / Seattle / DC •Massive Scale •Fast Growing Company •Distributed Systems An Environment Made For Engineers •Open Source Friendly •Stock / Bonus plans •GeekDesks jim@crowdstrike.com twitter: @jimplush crowdstrike.com/careers The Tech For All-The-Things!