Ayan Sen, Vinay Sen
@ All Things Open 2018, Raleigh
Observability at Expedia
Observability
Observability Events
Logs: stateless events generated by the application
Metrics: time-series events containing measurements
Traces: correlated events that track causal ordering
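The boundaries between these event kinds are loose: the speaker notes for this slide point out that an audit log recording response times can be used to compute an average response time metric, without collecting a separate metric event. A minimal Python sketch of that idea follows. The record layout, field names, and sample values are hypothetical and only serve the illustration.

```python
from statistics import mean

# Hypothetical audit-log records: one log event per handled request.
audit_log = [
    {"timestamp": 1540000000, "endpoint": "/search", "response_time_ms": 120},
    {"timestamp": 1540000001, "endpoint": "/search", "response_time_ms": 80},
    {"timestamp": 1540000002, "endpoint": "/book", "response_time_ms": 300},
]


def average_response_time(events, endpoint):
    """Derive an average-response-time metric for one endpoint from log events.

    Returns None when no log event matches the endpoint.
    """
    durations = [e["response_time_ms"] for e in events if e["endpoint"] == endpoint]
    return mean(durations) if durations else None


search_avg = average_response_time(audit_log, "/search")
print(search_avg)
```

The metric here is a by-product of the log stream, which is why the three categories overlap in practice.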
Logs
Metrics
Traces
A span typically represents a service call or a block of code
A trace represents a collection of spans correlated by an identifier
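To make the span/trace relationship concrete, here is a small Python sketch that groups spans into traces by a shared identifier and indexes each trace by parent span. The `Span` class and its snake_case field names are assumptions made for this example (they mirror the traceId, spanId, and parentSpanId mentioned in the speaker notes) and are not Haystack's actual span schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Illustrative fields only; not the real Haystack span definition.
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    service: str
    operation: str


def group_by_trace(spans):
    """Collect spans into traces: every span sharing a trace_id belongs together."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def children_index(trace_spans):
    """Map each parent span ID to its child spans; root spans are keyed by None."""
    children = defaultdict(list)
    for span in trace_spans:
        children[span.parent_span_id].append(span)
    return dict(children)


spans = [
    Span("t1", "a", None, "web", "GET /checkout"),
    Span("t1", "b", "a", "user-service", "getUser"),
    Span("t1", "c", "a", "loyalty-service", "getPoints"),
    Span("t2", "x", None, "web", "GET /search"),
]

traces = group_by_trace(spans)
tree = children_index(traces["t1"])
```

Walking `tree` from the `None` key downward yields the call hierarchy that a timeline or waterfall view would render.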
Traces – Context Propagation
Distributed tracing tracks production requests as they traverse different parts of the architecture
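The speaker notes for the traces demo describe three IDs (traceId, spanId, parentSpanId) and say that the caller's spanId is passed to the next service, for example in an HTTP header, where it becomes that service's parentSpanId. The Python sketch below illustrates that hand-off under stated assumptions: the header names `X-Trace-Id` and `X-Span-Id` and the helper functions are hypothetical, not the headers or API used by Haystack clients.

```python
import uuid

# Hypothetical header names chosen for this sketch.
TRACE_HEADER = "X-Trace-Id"
SPAN_HEADER = "X-Span-Id"


def new_id():
    """Generate a random identifier for a trace or span."""
    return uuid.uuid4().hex


def start_span(incoming_headers):
    """Create the IDs for the current service's span from the caller's headers.

    If no trace ID arrives, this service starts a new trace. The caller's
    span ID, when present, becomes this span's parentSpanId.
    """
    return {
        "traceId": incoming_headers.get(TRACE_HEADER) or new_id(),
        "spanId": new_id(),
        "parentSpanId": incoming_headers.get(SPAN_HEADER),
    }


def outgoing_headers(span):
    """Headers to attach to a downstream call so the context propagates."""
    return {TRACE_HEADER: span["traceId"], SPAN_HEADER: span["spanId"]}


# The edge service receives a request with no tracing headers.
root = start_span({})
# A downstream service receives the headers produced by the edge service.
child = start_span(outgoing_headers(root))
```

Both spans share one traceId, and the child points back at the root through parentSpanId, which is what lets a backend stitch them into a single trace.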
Traces – Why do I care?
Traces - Event
Distributed Tracing Landscape
Log
Metric
Trace
Observability
A resilient, scalable tracing and analysis system
Haystack Architecture
Haystack Subsystems
• Traces
• Trends
• Service Graph
• Anomaly Detection
• Pipes
Traces
Traces
Traces Subsystem Architecture
Trends
Trends Subsystem Architecture
Service Graph Subsystem
Service Graph Subsystem
Anomaly Detection Subsystem
Pipes Subsystem
Haystack @ Expedia
• Multiple brands
• More than a few hundred services
• > 400k spans/sec ingestion
• 40-node Kafka cluster
• 65+ node c5.xlarge k8s cluster
• 50-node c5.xlarge Cassandra cluster
• Tens of ES nodes in the cluster
• Support for OpenTracing clients in Java, NodeJS, Go & Python (coming soon)
• Support for integration with Istio
• Zipkin to Haystack span converter
• Deployment done through Terraform scripts
References
• https://expediadotcom.github.io/haystack/
• https://github.com/ExpediaDotCom/haystack
• https://github.com/ExpediaDotCom/haystack-idl
• https://github.com/ExpediaDotCom/haystack-commons
• https://github.com/ExpediaDotCom/haystack-traces
• https://github.com/ExpediaDotCom/haystack-client-java
• https://github.com/ExpediaDotCom/haystack-agent
• https://github.com/ExpediaDotCom/haystack-trends
• https://github.com/ExpediaDotCom/haystack-collector
• https://github.com/ExpediaDotCom/haystack-pipes
• https://github.com/ExpediaDotCom/haystack-metrics
• https://github.com/ExpediaDotCom/haystack-service-graph
Questions?

Editor's Notes

  1. In today's microservice architectures, a lot goes on in the backend while serving a request: multiple service interactions, levels of resiliency, multiple layers of caching, and so on. So when something goes wrong, it is not always evident why it happened. Observability is the ability to understand and troubleshoot our systems in production by collecting a series of timestamped events. These events can be either request-scoped or system-scoped. A garbage collection event would most likely not be associated with a request, whereas a response time event is. For the sake of this presentation, we are going to talk about events that are request-scoped.
  2. So what kinds of events are we talking about here? I think they can be broadly classified into three types: 1. Logs 2. Metrics 3. Traces. Each kind of event has its own use cases, but the boundaries between them are not very clear. For instance, an audit log that records the response time of each incoming request can be used to compute the average response time metric. In that case, as you can see, you don't explicitly collect the metric event.
  3. 1 minute
  4. 1 minute
  5. 2 minutes. Distributed tracing tracks production requests by correlating different service interactions in the architecture.
  6. 2 minutes. Context propagation.
  7. 2 minutes. Reduce time to triage by contextualizing errors and delays. Visualize latencies over the network.
  8. 1 minute
  9. 2 minutes
  10. 3 minutes
  11. 3 minutes
  12. 3 minutes. This is the architecture of the Haystack system. Kafka is the central nervous system backing Haystack.
      1. Componentized: Haystack includes all of the subsystems needed to make the system ready to use, but the overall system is designed so that you can replace any given subsystem to better meet your own needs.
      2. Resilient: There is no single point of failure.
      3. Scalable: The system is completely decentralized, which lets us scale every component individually.
      The architecture can be broken down into three parts:
      Subsystems: Haystack includes various subsystems for tracing, trending, service graph, and so on. We will go over these subsystems in a bit.
      Data stores: There are three data stores. Cassandra stores the raw stitched spans, i.e. traces. Elasticsearch is used as an indexer to query the data faster. MetricTank, backed by Cassandra, stores trends in Metrics 2.0 format.
      Visualization: Haystack UI is the central place to visualize processed data such as traces, trends, and alerts from the various Haystack subsystems.
      Let's go through the subsystems one by one.
  13. I will be doing deep dives into the use cases and architecture of each of Haystack's current subsystems. The Traces subsystem is mandatory; the others are optional. When you deploy, you can configure Haystack to run only a subset of them, as long as Traces is included. Some of them depend on others: specifically, Anomaly Detection requires Trends, because you need trends to detect anomalies. The output of Trends goes into Kafka, and Anomaly Detection picks it up from there. Feel free to add any new subsystem on top of the Kafka backbone. It doesn't need to be part of Haystack's repositories: if you need something specific to your company, you can build it and run it on top of Haystack's Kafka. You don't need to come and talk to us about adding anything new.
  14. Demo. If you know the traceId, you can jump straight to the timeline/waterfall view showing how a single end-user request was served inside your system. In this example, the user request was to the stark service at /stark/endpoint. You might have used Zipkin or Jaeger before.
      Use cases: identifying the root cause of errors, finding performance bottlenecks, and understanding the flow of requests.
      Haystack is OpenTracing compliant. It uses three IDs: traceId, spanId, and parentSpanId. The spanId needs to be passed from one service to the next; how you pass it, in an HTTP header or in the payload, is up to your logic. When the next service logs its span, it uses the caller's spanId as its parentSpanId. We are also looking into supporting Zipkin-style IDs, which have a slight but crucial difference.
  15. Use case: you might not have traceIds handy. For example, let's say your site has started showing intermittent errors for the US siteId. You might want to find traces where error = true and siteid = us, and check the traces for that scenario. You can set up a number of whitelisted fields, and they become searchable in haystack-ui. Click on any of these traces and you get the timeline/waterfall view.
  16. About the architecture: the Traces subsystem has two apps, Indexer and Reader.
  17. The Trends subsystem is responsible for reading spans and generating vital service health trends. Introduce a new term: operation. What is a service operation? [user service -> loyalty service example]
  18. The Trends subsystem is responsible for reading spans and generating vital service health trends. This system is loosely coupled and can be run on demand. It has two components:
      haystack-span-timeseries-transformer: reads spans and converts them to Metrics 2.0 compatible MetricPoints. These metric points are then pushed back to Kafka.
      haystack-timeseries-aggregator: reads metric points, aggregates them based on rules, and pushes the aggregated metric points to Kafka. The metric points are MetricTank compliant and can be consumed directly by MetricTank, which is a time-series database.
      Currently we compute four trends for each combination of service and operation: total count, success_count [count], failure_count [count], and duration [mean, median, std-dev, 99th percentile, 95th percentile]. Each trend is computed for four intervals [1min, 5min, 15min, 1hour].
  20. The alerts view shows alerts for any anomalous behavior in service health trends. Currently Haystack alerts on total count, failure count, and duration (TP99). These alerts would be powered by the Adaptive Alerting system, which is another OSS project by Expedia.
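To make the Trends notes more concrete, the Python sketch below computes the four trends named there (count, success_count, failure_count, and duration statistics) for each combination of service and operation, bucketed into fixed time windows. It is only an in-memory illustration: the span dictionaries, the `compute_trends` and `percentile` helpers, the nearest-rank percentile method, and the use of population standard deviation are all assumptions of this sketch, not the behavior of haystack-span-timeseries-transformer or haystack-timeseries-aggregator.

```python
import math
from collections import defaultdict
from statistics import mean, median, pstdev

# The four intervals named in the notes, in seconds.
INTERVALS_SEC = {"1min": 60, "5min": 300, "15min": 900, "1hour": 3600}


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list (assumed method)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def compute_trends(spans, interval_sec):
    """Aggregate spans per (service, operation, window start) for one interval."""
    buckets = defaultdict(list)
    for s in spans:
        window_start = s["start_sec"] - s["start_sec"] % interval_sec
        buckets[(s["service"], s["operation"], window_start)].append(s)

    trends = {}
    for key, group in buckets.items():
        durations = sorted(s["duration_ms"] for s in group)
        failures = sum(1 for s in group if s["error"])
        trends[key] = {
            "count": len(group),
            "success_count": len(group) - failures,
            "failure_count": failures,
            "duration_mean": mean(durations),
            "duration_median": median(durations),
            "duration_std_dev": pstdev(durations),
            "duration_p95": percentile(durations, 95),
            "duration_p99": percentile(durations, 99),
        }
    return trends


# Hypothetical spans for one service operation.
spans = [
    {"service": "user-service", "operation": "getUser", "start_sec": 0, "duration_ms": 10, "error": False},
    {"service": "user-service", "operation": "getUser", "start_sec": 30, "duration_ms": 20, "error": False},
    {"service": "user-service", "operation": "getUser", "start_sec": 59, "duration_ms": 30, "error": True},
    {"service": "user-service", "operation": "getUser", "start_sec": 61, "duration_ms": 40, "error": False},
]

one_min = compute_trends(spans, INTERVALS_SEC["1min"])
five_min = compute_trends(spans, INTERVALS_SEC["5min"])
```

With the sample data, the first three spans fall into the one-minute window starting at 0 and the fourth into the window starting at 60, while the five-minute interval places all four in a single window. An alerting component could then watch series such as count, failure_count, or the 99th percentile duration for anomalies.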