Swift Distributed Tracing Method and Tools
by Zhang Hua (Edward)
Standards Team/ETI/CDL/IBM
Agenda
• Background
• Tracing Proposal
• Tracing Architecture
• Tracing Data Model
• Tracing Analysis Tools
• Reference
Background
• Swift is a large-scale distributed object store that spans thousands of nodes
across multiple zones and regions.
– End-to-end performance is critical to the success of Swift.
– Tools that help in understanding system behavior and reasoning about performance issues are
invaluable.
• Motivation
– For a particular client request X, what is the actual route when it is served by
different services? Does the actual route differ from the expected route, even when we
know the access patterns?
– What is the performance behavior of the server components and third-party services?
Which part is slower than expected?
– How can we quickly diagnose the problem when a request breaks at some point?
e.g. PUT request X:
Client (1) → X → Proxy-Server (1) → X' → Container-Server (1) → X1'' → Account-Server (1)
Proxy-Server (1) → X' → Container-Server (2) → X2'' → Account-Server (2)
Proxy-Server (1) → X' → Container-Server (3) → X3'' → Account-Server (3)
Which part is slow? Looking at your logs?
When a request is made to Swift, it is given a unique transaction id. This id should appear
in every log line that has to do with that request. This can be useful when looking at all
the services that are hit by a single request. But is it efficient or handy to do?
Correlate the logs
Proxy server log @ node-P
Container server log @ node-C
Account server log @ node-A
Object server log @ node-O
Correlate the information pieces by transaction id and client IP from all logs of related hashed nodes!
StatsD Metrics
• Counters + Counter_rate (sampling)
– Proxy-Server.{ACO}.{METHOD}.{CODE}
– {ACO}-server.{METHOD}.{CODE}
• Timers + Timer_data
– {ACO}-{DAEMON}.timing
– {ACO}-{DAEMON}.error.timing
– {ACO}-server.{METHOD}.timing
StatsD logging options:
# access_log_statsd_host = localhost
# access_log_statsd_port = 8125
# access_log_statsd_default_sample_rate = 1.0
# access_log_statsd_sample_rate_factor = 1.0
# access_log_statsd_metric_prefix =
# access_log_headers = false
# log_statsd_valid_http_methods = GET,HEAD,POST,PUT,DELETE,COPY,OPTIONS
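As a rough illustration of what these counters and timers look like on the wire, the sketch below formats metrics in the plain-text StatsD datagram format and sends them over UDP. The function names and the example metric names are assumptions made for this sketch; they follow the naming patterns above but are not taken from Swift's code.

```python
import socket


def statsd_counter(name, value=1, sample_rate=1.0):
    """Format a StatsD counter line; the @rate suffix appears only when sampling."""
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def statsd_timer(name, millis):
    """Format a StatsD timer line with the duration in milliseconds."""
    return f"{name}:{millis}|ms"


def send_metric(line, host="localhost", port=8125):
    """Fire-and-forget UDP send; host and port mirror the defaults listed above."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("utf-8"), (host, port))
    finally:
        sock.close()


# Hypothetical metric names that follow the patterns on this slide.
counter_line = statsd_counter("proxy-server.object.GET.200", sample_rate=0.5)
timer_line = statsd_timer("object-server.PUT.timing", 12.5)
```

Each datagram carries one aggregate data point and no request identity, which is why such metrics cannot be tied back to a single transaction.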
Pros and cons of the current implementation
StatsD
Pros
• Real-time performance metrics to monitor the health of the Swift cluster
• Low performance impact: metrics data is sent via UDP, with no hit on local disk I/O
• Supported by different backends for reporting and visualization
Cons
• Designed for cluster-level health, not for end-to-end performance
• Cannot provide metrics data for a specific set of requests
• No relationship between different sets of metrics for specific transactions or requests
Logging
Pros
• Lightweight
• Simple to use
• Rich logging tools
Cons
• Not designed for real time
• Requires more effort to collect and analyze
• No representation for individual spans
• Message size limitation
• Rethink it
Can we provide a real-time, end-to-end performance tracing/tracking tool in the Swift
infrastructure for developers and users to facilitate their analysis in development and
operation environments?
Our Proposal
• Goal
– Targeting researchers, developers, and admins, provide a method of traceability to
understand end-to-end performance issues and identify bottlenecks.
• Scope
– Add WSGI middleware and hooks into Swift components to collect trace data
– The middleware controls the activation and generation of traces
– Generate trace and span ids, collect the data, and tie them together
– Send trace data to an aggregator and save it into a repository
– Minor fixes to the current Swift implementation to allow the path to include complete hops
– Similar to trans-id, the trace-id and span-id need to be propagated correctly through HTTP headers between
services and components
– Analysis tools for reporting and visualization
– Query the trace data by tiered trace ids
– Reconstruct the span tree for each trace
Swift Messaging Route
[Figure: Swift Client, Proxy Server, Auth, three Container Servers, and three Account Servers exchanging Request-X/Response-X (PUT), Request-X'/Response-X' (GET), Request-X''/Response-X'' (PUT), and Request-X'''/Response-X''' (PUT) messages. Caption: Create a new container: PUT /account/container]
• Swift components talk via HTTP request and response messages.
• It is easy to use HTTP headers as the clue to trace the route.
Span Tree of Trace
[Figure: Swift Client, Proxy Server, Auth, three Container Servers, and three Account Servers, with requests annotated by trace headers. Request-X PUT carries X-Trace-Id: 1234; Request-X'' PUT carries X-Trace-Id: 1234 and X-Span-Id: 1; Request-X''' PUT carries X-Trace-Id: 1234 and X-Span-Id: 2. Caption: Create a new container: PUT /account/container]
• X-Trace-Id: identification of each trace
– Use X-Trans-Id to support different clusters?
– Or generate a new id for this purpose?
• X-Span-Id: identification of each span, representing an individual HTTP RESTful call or WSGI call
– Generate a new span id for this purpose
(Note: a UUID can be used for the implementation)
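A minimal sketch of the id scheme described above, assuming UUIDs are used as the slide suggests. The helper names (`new_id`, `start_span`) and the dictionary layout are hypothetical; only the header names X-Trace-Id and X-Span-Id come from the slide.

```python
import uuid


def new_id():
    """Return a 32-character hex id; using uuid4 is an assumption of this sketch."""
    return uuid.uuid4().hex


def start_span(incoming_headers):
    """Start a span for one hop.

    The trace id is reused when the caller sent one, otherwise a new trace
    begins. The caller's span id becomes this span's parent. Real HTTP header
    lookup is case-insensitive; a plain dict is used here for brevity.
    """
    trace_id = incoming_headers.get("X-Trace-Id") or new_id()
    parent_id = incoming_headers.get("X-Span-Id")
    return {"trace_id": trace_id, "span_id": new_id(), "parent": parent_id}


# First hop: no trace headers yet, so a new trace starts.
root = start_span({})
# Next hop: receives the ids of the caller and becomes its child.
child = start_span({"X-Trace-Id": root["trace_id"], "X-Span-Id": root["span_id"]})
```

Every span in one request shares the trace id, while each hop gets a fresh span id, which is what later allows the spans to be tied into a tree.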
X-trace Middleware Architecture
1. Generate trace ids based on configuration.
2. Create spans and collect trace data
3. Propagate trace ids to next hop
4. Send trace data into a repository via
separate transport protocol/channel
[Figure: Swift Client, Proxy Server, Auth, three Container Servers, and three Account Servers, with x-trace middleware attached to the Swift servers and sending data to a trace data repository.]
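The four steps above can be sketched as a small WSGI middleware. This is an illustration under stated assumptions, not the talk's implementation: the class name `TraceMiddleware`, the span fields, and the use of a plain list in place of a real repository and transport channel are all hypothetical.

```python
import time
import uuid


class TraceMiddleware:
    """Hypothetical x-trace style WSGI middleware (illustrative names only)."""

    def __init__(self, app, repository, service_name="proxy.server"):
        self.app = app
        self.repository = repository  # stand-in for the trace data repository
        self.service_name = service_name

    def __call__(self, environ, start_response):
        # Step 1: reuse the incoming trace id or generate a new one.
        # WSGI exposes the header X-Trace-Id as HTTP_X_TRACE_ID.
        trace_id = environ.get("HTTP_X_TRACE_ID") or uuid.uuid4().hex
        parent = environ.get("HTTP_X_SPAN_ID")
        span_id = uuid.uuid4().hex
        # Step 3: make the current ids visible to whatever runs next.
        environ["HTTP_X_TRACE_ID"] = trace_id
        environ["HTTP_X_SPAN_ID"] = span_id
        # Step 2: create the span and collect data about this call.
        span = {"_id": span_id, "trace_id": trace_id, "parent": parent,
                "name": environ.get("REQUEST_METHOD"),
                "endpoint": {"name": self.service_name},
                "start_time": time.time()}

        def traced_start_response(status, headers, exc_info=None):
            span["return_code"] = status
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, traced_start_response)
        finally:
            # Step 4: hand the span to the repository. The end time is taken
            # when the app returns, before the body is fully streamed.
            span["end_time"] = time.time()
            self.repository.append(span)


def demo_app(environ, start_response):
    start_response("201 Created", [])
    return [b""]


repo = []
traced = TraceMiddleware(demo_app, repo)
body = traced({"REQUEST_METHOD": "PUT"}, lambda status, headers, exc_info=None: None)
```

In a real deployment the repository write would go over a separate channel so that tracing does not slow down the request path.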
Patches to fix the request path
• The trace id is passed along by the proxy
server in HTTP headers, but it is lost
at some points because a new request
is created for the next hop.
• Patches are needed to fix this problem
and form a complete tracing path through
the container server, object server, etc.
[Figure: Swift Client, Proxy Server, Auth, three Container Servers, and three Account Servers with x-trace middleware and a trace data repository; annotation: propagate the trace id in the next new request.]
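The kind of fix described above can be sketched as a helper that copies the trace headers whenever a component builds a fresh request for the next hop. The function name `build_backend_headers` and the example headers are hypothetical; this does not reproduce the actual Swift patches.

```python
TRACE_HEADERS = ("X-Trace-Id", "X-Span-Id")


def build_backend_headers(incoming_headers, extra_headers=None):
    """Build headers for a newly created request to the next hop.

    Only the trace headers are carried over from the incoming request, so the
    trace is not lost when the original request object is discarded.
    """
    headers = dict(extra_headers or {})
    for name in TRACE_HEADERS:
        if name in incoming_headers:
            headers[name] = incoming_headers[name]
    return headers


# Example: a container server preparing an account update request.
outgoing = build_backend_headers(
    {"X-Trace-Id": "1234", "X-Span-Id": "1", "Host": "example"},
    {"X-Timestamp": "1"},
)
```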
Tie together tracing data
Reconstruct causal and temporal relationship view for PUT container call
[Figure: spans plotted on a timeline from 0 ms to 200 ms (ticks at 0, 50, 100, 150, 200 ms); each call returns 201.]
Swift-Client.PUT parent-span-id=none, span-id=0
  Proxy-Server.PUT parent-span-id=0, span-id=1
    Container-Server.PUT parent-span-id=1, span-id=2
      Account-Server.PUT parent-span-id=2, span-id=5
    Container-Server.PUT parent-span-id=1, span-id=3
      Account-Server.PUT parent-span-id=3, span-id=6
    Container-Server.PUT parent-span-id=1, span-id=4
      Account-Server.PUT parent-span-id=4, span-id=7
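Rebuilding the causal view from flat span records needs only the parent and span ids. The sketch below uses the span names and ids from the container PUT example; the record layout and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Flat span records for the container PUT example (ids from the slide).
SPANS = [
    {"name": "Swift-Client.PUT", "id": 0, "parent": None},
    {"name": "Proxy-Server.PUT", "id": 1, "parent": 0},
    {"name": "Container-Server.PUT", "id": 2, "parent": 1},
    {"name": "Container-Server.PUT", "id": 3, "parent": 1},
    {"name": "Container-Server.PUT", "id": 4, "parent": 1},
    {"name": "Account-Server.PUT", "id": 5, "parent": 2},
    {"name": "Account-Server.PUT", "id": 6, "parent": 3},
    {"name": "Account-Server.PUT", "id": 7, "parent": 4},
]


def build_span_tree(spans):
    """Index spans by parent id and return (roots, children-by-parent)."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span["parent"] is None:
            roots.append(span)
        else:
            children[span["parent"]].append(span)
    return roots, children


def render(span, children, depth=0):
    """Return the subtree under span as indented text lines, depth first."""
    lines = ["  " * depth + f'{span["name"]} span-id={span["id"]}']
    for child in children[span["id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


roots, children = build_span_tree(SPANS)
tree_lines = render(roots[0], children)
```

Adding start and end times to each record would be enough to place the same tree on a timeline.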
Another example: upload an object
[Figure: spans plotted on a timeline from 0 ms to 200 ms (ticks at 0, 50, 100, 150, 200 ms); each call returns 201.]
Swift-Client.PUT parent-span-id=none, span-id=0
  Proxy-Server.PUT parent-span-id=0, span-id=1
    Object-Server.PUT parent-span-id=1, span-id=2
      Container-Server.PUT parent-span-id=2, span-id=5
    Object-Server.PUT parent-span-id=1, span-id=3
      Container-Server.PUT parent-span-id=3, span-id=6
    Object-Server.PUT parent-span-id=1, span-id=4
      Container-Server.PUT parent-span-id=4, span-id=7
Trace into middleware of the pipeline
• Expand the trace path into
WSGI calls between middleware to
get more complete trace data.
• Possible choices
– Decorators for __call__
@trace_here()
def __call__(self, environ, start_response):
    ...
– Hack the paste deployment package
– Profile with filters
[Figure: Swift Client calling the Proxy Server, whose pipeline:main chains middleware such as tempauth, cache, tempurl, dlo, slo, …; x-trace sends data to the trace data repository.]
Pipeline = catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk slo dlo ratelimit crossdomain tempauth tempurl formpost
staticweb container-quotas account-quotas proxy-logging proxy-server
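The decorator option could look roughly like the sketch below. Only the name `trace_here` and the `__call__` signature come from the slide; the body, the `TRACE_LOG` list standing in for the trace data repository, and the `HealthCheck` class are assumptions for illustration.

```python
import functools
import time

TRACE_LOG = []  # stand-in for the trace data repository


def trace_here(name=None):
    """Decorator factory that records one span per WSGI __call__ invocation."""
    def decorator(func):
        label = name or func.__qualname__

        @functools.wraps(func)
        def wrapper(self, environ, start_response):
            start = time.time()
            try:
                return func(self, environ, start_response)
            finally:
                # Recorded even when the wrapped call raises.
                TRACE_LOG.append({"name": label,
                                  "start_time": start,
                                  "end_time": time.time()})
        return wrapper
    return decorator


class HealthCheck:
    """Toy middleware used only to exercise the decorator."""

    @trace_here()
    def __call__(self, environ, start_response):
        start_response("200 OK", [])
        return [b"OK"]


response = HealthCheck()({}, lambda status, headers: None)
```

A decorator keeps the tracing code out of each middleware body, at the cost of touching every `__call__` definition in the pipeline.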
Backend trace data model
{
  "_id" : "14a467a402904aee87de4028a8595493",
  "endpoint" : {
    "port" : "6031",
    "type" : "server",
    "name" : "container.server",
    "ipv4" : "127.0.0.1"
  },
  "name" : "GET",
  "parent" : "57fbd3ec12fe4912ba89e7a8eb97f2e7",
  "start_time" : 1400146616.554865,
  "trace_id" : "d7ff028674c5471e94b964ec37d35546",
  "end_time" : 1400146616.559608,
  "annotations" : [
    {
      "type" : "string",
      "value" : "/sdb1/347/TEMPAUTH_test/summit",
      "key" : "request_path",
      "event" : "sr"
    },
    {
      "type" : "string",
      "value" : "200 OK",
      "key" : "return_code",
      "event" : "ss"
    }
  ]
}

{
  "_id" : "57fbd3ec12fe4912ba89e7a8eb97f2e7",
  "endpoint" : {
    "port" : "8080",
    "type" : "server",
    "name" : "proxy.server",
    "ipv4" : "127.0.0.1"
  },
  "name" : "GET",
  "parent" : "5602ca4010fe420c9fa56528faf711ab",
  "start_time" : 1400146616.490691,
  "trace_id" : "d7ff028674c5471e94b964ec37d35546",
  "end_time" : 1400146616.58012,
  "annotations" : [
    {
      "type" : "string",
      "value" : "/v1/TEMPAUTH_test/summit",
      "key" : "request_path",
      "event" : "sr"
    },
    {
      "type" : "string",
      "value" : "200 OK",
      "key" : "return_code",
      "event" : "ss"
    }
  ]
}
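To show how these records can be read, the sketch below takes the ids and timestamps from the two spans above and derives each span's duration and the parent/child link. The helper names are hypothetical, and only the fields needed for the calculation are copied.

```python
container_span = {
    "_id": "14a467a402904aee87de4028a8595493",
    "parent": "57fbd3ec12fe4912ba89e7a8eb97f2e7",
    "start_time": 1400146616.554865,
    "end_time": 1400146616.559608,
}
proxy_span = {
    "_id": "57fbd3ec12fe4912ba89e7a8eb97f2e7",
    "parent": "5602ca4010fe420c9fa56528faf711ab",
    "start_time": 1400146616.490691,
    "end_time": 1400146616.58012,
}


def duration_ms(span):
    """Span duration in milliseconds, from the epoch-second timestamps."""
    return (span["end_time"] - span["start_time"]) * 1000.0


def is_child_of(child, parent):
    """A span is a child when its parent field holds the other span's _id."""
    return child["parent"] == parent["_id"]


container_ms = duration_ms(container_span)  # roughly 4.7 ms
proxy_ms = duration_ms(proxy_span)          # roughly 89.4 ms
```

Both spans share the same trace_id, and the container span's parent field points at the proxy span, so the container GET accounts for only a small part of the proxy's total time in this trace.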
Query and analysis tools
• Query
– Query trace data by trace_id or span_id, order or filter by time range, and group by node or
annotation key
• Trace timeline
– Plot the spans on the timeline with causal relationships
• Diagnose
– Analyze the critical path for a successful response
– Identify the failure point in the path
• Simulation
– Replay the recorded processing of the requests
• Data Mining
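As one possible shape for the query tool, the sketch below filters spans by trace id, orders them by start time, and groups them by node. It runs over an in-memory list whose records follow the field names of the backend data model; the sample ids, ports, and function names are made up for illustration, and a real tool would query the trace repository instead.

```python
from collections import defaultdict

# Hypothetical sample records using the field names of the data model.
SAMPLE_SPANS = [
    {"_id": "b", "trace_id": "t1", "start_time": 2.0,
     "endpoint": {"ipv4": "127.0.0.1", "port": "6031"}},
    {"_id": "a", "trace_id": "t1", "start_time": 1.0,
     "endpoint": {"ipv4": "127.0.0.1", "port": "8080"}},
    {"_id": "c", "trace_id": "t2", "start_time": 1.5,
     "endpoint": {"ipv4": "127.0.0.1", "port": "8080"}},
]


def query_trace(spans, trace_id):
    """Return the spans of one trace, ordered by start time."""
    matching = (s for s in spans if s["trace_id"] == trace_id)
    return sorted(matching, key=lambda s: s["start_time"])


def group_by_node(spans):
    """Group span ids by the (ipv4, port) of the endpoint that served them."""
    groups = defaultdict(list)
    for span in spans:
        endpoint = span["endpoint"]
        groups[(endpoint["ipv4"], endpoint["port"])].append(span["_id"])
    return dict(groups)


trace_t1 = [s["_id"] for s in query_trace(SAMPLE_SPANS, "t1")]
by_node = group_by_node(SAMPLE_SPANS)
```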
Reference
• Google Dapper: a large-scale distributed systems tracing infrastructure
• Twitter Zipkin: a distributed tracing system that helps gather timing
data for all the disparate services at Twitter
• Berkeley X-Trace: a pervasive network tracing framework
Demo
Q&A

More Related Content

What's hot

Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...Databricks
 
Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)Timothy Spann
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil DubeySwapnil Dubey
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Ultimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per secUltimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per secb0ris_1
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Guido Schmutz
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Databricks
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17spark-project
 
Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaGuido Schmutz
 
Cowboy dating with big data
Cowboy dating with big data Cowboy dating with big data
Cowboy dating with big data b0ris_1
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedMichael Spector
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...DataWorks Summit
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - FlumeAnandMHadoop
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentDataWorks Summit/Hadoop Summit
 

What's hot (20)

Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...
 
Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Ultimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per secUltimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per sec
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache Kafka
 
Cowboy dating with big data
Cowboy dating with big data Cowboy dating with big data
Cowboy dating with big data
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - Flume
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 

Viewers also liked

Control review for iOS
Control review for iOSControl review for iOS
Control review for iOSWilliam Price
 
Action Controller Overview, Season 1
Action Controller Overview, Season 1Action Controller Overview, Season 1
Action Controller Overview, Season 1RORLAB
 
Let's Learn Ruby - Basic
Let's Learn Ruby - BasicLet's Learn Ruby - Basic
Let's Learn Ruby - BasicEddie Kao
 
Ruby on Rails testing with Rspec
Ruby on Rails testing with RspecRuby on Rails testing with Rspec
Ruby on Rails testing with RspecBunlong Van
 
jQuery For Beginners - jQuery Conference 2009
jQuery For Beginners - jQuery Conference 2009jQuery For Beginners - jQuery Conference 2009
jQuery For Beginners - jQuery Conference 2009Ralph Whitbeck
 
Learning jQuery in 30 minutes
Learning jQuery in 30 minutesLearning jQuery in 30 minutes
Learning jQuery in 30 minutesSimon Willison
 
A swift introduction to Swift
A swift introduction to SwiftA swift introduction to Swift
A swift introduction to SwiftGiordano Scalzo
 
Introduction to html
Introduction to htmlIntroduction to html
Introduction to htmlvikasgaur31
 
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho PolutaInfinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho PolutaInfinum
 
Introduction to Web Architecture
Introduction to Web ArchitectureIntroduction to Web Architecture
Introduction to Web ArchitectureChamnap Chhorn
 
jQuery and Rails: Best Friends Forever
jQuery and Rails: Best Friends ForeverjQuery and Rails: Best Friends Forever
jQuery and Rails: Best Friends Foreverstephskardal
 
Swift Programming Language
Swift Programming LanguageSwift Programming Language
Swift Programming LanguageGiuseppe Arici
 

Viewers also liked (14)

Control review for iOS
Control review for iOSControl review for iOS
Control review for iOS
 
Action Controller Overview, Season 1
Action Controller Overview, Season 1Action Controller Overview, Season 1
Action Controller Overview, Season 1
 
Let's Learn Ruby - Basic
Let's Learn Ruby - BasicLet's Learn Ruby - Basic
Let's Learn Ruby - Basic
 
September2011aftma
September2011aftmaSeptember2011aftma
September2011aftma
 
Ruby on Rails testing with Rspec
Ruby on Rails testing with RspecRuby on Rails testing with Rspec
Ruby on Rails testing with Rspec
 
jQuery For Beginners - jQuery Conference 2009
jQuery For Beginners - jQuery Conference 2009jQuery For Beginners - jQuery Conference 2009
jQuery For Beginners - jQuery Conference 2009
 
Learning jQuery in 30 minutes
Learning jQuery in 30 minutesLearning jQuery in 30 minutes
Learning jQuery in 30 minutes
 
A swift introduction to Swift
A swift introduction to SwiftA swift introduction to Swift
A swift introduction to Swift
 
Web application architecture
Web application architectureWeb application architecture
Web application architecture
 
Introduction to html
Introduction to htmlIntroduction to html
Introduction to html
 
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho PolutaInfinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
 
Introduction to Web Architecture
Introduction to Web ArchitectureIntroduction to Web Architecture
Introduction to Web Architecture
 
jQuery and Rails: Best Friends Forever
jQuery and Rails: Best Friends ForeverjQuery and Rails: Best Friends Forever
jQuery and Rails: Best Friends Forever
 
Swift Programming Language
Swift Programming LanguageSwift Programming Language
Swift Programming Language
 

Similar to Swift distributed tracing method and tools v2

A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...HostedbyConfluent
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics PlatformSrinath Perera
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...confluent
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appNeil Avery
 
Api Statistics- The Scalable Way
Api Statistics- The Scalable WayApi Statistics- The Scalable Way
Api Statistics- The Scalable WayWSO2
 
Applied Detection and Analysis Using Flow Data - MIRCon 2014
Applied Detection and Analysis Using Flow Data - MIRCon 2014Applied Detection and Analysis Using Flow Data - MIRCon 2014
Applied Detection and Analysis Using Flow Data - MIRCon 2014chrissanders88
 
Micro-service architectures with Gilmour
Micro-service architectures with GilmourMicro-service architectures with Gilmour
Micro-service architectures with GilmourAditya Godbole
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingHemant Kumar
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleGraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleNeo4j
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsDigitalOcean
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemAccumulo Summit
 
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital EnterpriseWSO2
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRyan Bosshart
 
Data science for infrastructure dev week 2022
Data science for infrastructure   dev week 2022Data science for infrastructure   dev week 2022
Data science for infrastructure dev week 2022ZainAsgar1
 
The missing signalling layer for WebRTC
The missing signalling layer for WebRTCThe missing signalling layer for WebRTC
The missing signalling layer for WebRTCWebRTCConferenceJapan
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Sid Anand
 
Spark Kafka summit 2017
Spark Kafka summit 2017Spark Kafka summit 2017
Spark Kafka summit 2017ajay_ei
 

Similar to Swift distributed tracing method and tools v2 (20)

A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming app
 
Api Statistics- The Scalable Way
Api Statistics- The Scalable WayApi Statistics- The Scalable Way
Api Statistics- The Scalable Way
 
Applied Detection and Analysis Using Flow Data - MIRCon 2014
Applied Detection and Analysis Using Flow Data - MIRCon 2014Applied Detection and Analysis Using Flow Data - MIRCon 2014
Applied Detection and Analysis Using Flow Data - MIRCon 2014
 
Micro-service architectures with Gilmour
Micro-service architectures with GilmourMicro-service architectures with Gilmour
Micro-service architectures with Gilmour
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleGraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
 
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLib
 
Data science for infrastructure dev week 2022
Data science for infrastructure   dev week 2022Data science for infrastructure   dev week 2022
Data science for infrastructure dev week 2022
 
The missing signalling layer for WebRTC
The missing signalling layer for WebRTCThe missing signalling layer for WebRTC
The missing signalling layer for WebRTC
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
 
Spark Kafka summit 2017
Spark Kafka summit 2017Spark Kafka summit 2017
Spark Kafka summit 2017
 

Swift distributed tracing method and tools v2

  • 1. Swift Distributed Tracing Method and Tools by Zhang Hua (Edward) Standards Team/ETI/CDL/IBM
  • 2. Agenda
      • Background
      • Tracing Proposal
      • Tracing Architecture
      • Tracing Data Model
      • Tracing Analysis Tools
      • Reference
  • 3. Background
      • Swift is a large-scale distributed object store spanning thousands of nodes across multiple zones and regions.
        – End-to-end performance is critical to the success of Swift.
        – Tools that aid in understanding behavior and reasoning about performance issues are invaluable.
      • Motivation
        – For a particular client request X, what is the actual route as it is served by different services? Does the actual route differ from the expected route, even when we know the access patterns?
        – What is the performance behavior of the server components and third-party services? Which part is slower than expected?
        – How can we quickly diagnose the problem when a request breaks at some point?
      e.g. PUT request X: Client (1) → Proxy-Server (1) → Container-Server (1–3) → Account-Server (1–3), with sub-requests X', X1'', X2'', X3''
  • 4. Which part is slow? Looking at your logs?
      When a request is made to Swift, it is given a unique transaction id. This id should appear in every log line that has to do with that request, which is useful when looking at all the services hit by a single request. But is it efficient or handy to do?
  • 5. Correlate the logs
      • Proxy server log @ node-P
      • Container server log @ node-C
      • Account server log @ node-A
      • Object server log @ node-O
      Correlate the pieces of information by transaction id and client IP across the logs of all related hashed nodes!
  • 6. StatsD Metrics
      • Counters + Counter_rate (sampling)
        – Proxy-Server.{ACO}.{METHOD}.{CODE}
        – {ACO}-server.{METHOD}.{CODE}
      • Timers + Timer_data
        – {ACO}-{DAEMON}.timing
        – {ACO}-{DAEMON}.error.timing
        – {ACO}-server.{METHOD}.timing
      StatsD logging options:
        # access_log_statsd_host = localhost
        # access_log_statsd_port = 8125
        # access_log_statsd_default_sample_rate = 1.0
        # access_log_statsd_sample_rate_factor = 1.0
        # access_log_statsd_metric_prefix =
        # access_log_headers = false
        # log_statsd_valid_http_methods = GET,HEAD,POST,PUT,DELETE,COPY,OPTIONS
  • 7. Pros and cons of the current implementation (statsD, logging)
      • Rethink it: can we provide a real-time, end-to-end performance tracing/tracking tool in the Swift infrastructure for developers and users, to facilitate their analysis in development and operation environments?
      • Pros
        – Real-time performance metrics to monitor the health of the Swift cluster
        – Low performance impact: metrics data is sent via UDP, with no hit on local disk I/O
        – Supported by different backends for reporting and visualization
        – Lightweight
        – Simple to use
        – Rich logging tools
      • Cons
        – Designed for cluster-level health, not for end-to-end performance
        – Cannot provide metrics data for a specific set of requests
        – No relationship between different sets of metrics for specific transactions or requests
        – Not designed for real time
        – Requires more effort to collect and analyze
        – No representation for an individual span
        – Message size limitation
  • 8. Our Proposal
      • Goal
        – For researchers, developers and admins, provide a method of traceability to understand end-to-end performance issues and identify the bottlenecks.
      • Scope
        – Add WSGI middleware and hooks into Swift components to collect trace data
            • The middleware controls the activation and generation of traces
            • Generate trace and span ids, collect the data and tie them together
            • Send trace data to an aggregator and save it into a repository
        – Minor fixes to the current Swift implementation so that the path includes complete hops
            • Similar to trans-id, the trace-id and span-id need to be propagated correctly through HTTP headers between services and components
        – Analysis tools for reporting and visualization
            • Query the trace data by tiered trace ids
            • Reconstruct the span tree for each trace
  • 9. Swift Messaging Route
      Create a new container: PUT /account/container
      [Diagram: Swift Client, Auth, Proxy Server, three Container Servers and three Account Servers exchanging Request-X PUT / Response-X PUT, Request-X' GET / Response-X' GET, Request-X'' PUT / Response-X'' PUT and Request-X''' PUT / Response-X''' PUT]
      • Swift components talk via HTTP request and response messages.
      • It is easy to use HTTP headers as the clue to trace the route.
  • 10. Span Tree of Trace
      Create a new container: PUT /account/container
      [Diagram: Swift Client, Auth, Proxy Server, three Container Servers and three Account Servers; Request-X PUT carries X-Trace-Id: 1234; Request-X'' PUT carries X-Trace-Id: 1234 and X-Span-Id: 1; Request-X''' PUT carries X-Trace-Id: 1234 and X-Span-Id: 2]
      • X-Trace-Id: identification of each trace
        – Use X-Trans-Id to support different clusters?
        – Or generate a new id for this purpose?
      • X-Span-Id: identification of each span, representing an individual HTTP RESTful call or WSGI call
        – Generate a new span id for this purpose (note: a UUID can be used for the implementation)
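The header scheme on this slide can be illustrated with a small sketch. It assumes UUIDs for both ids, as the slide's note suggests; the `X-Parent-Span-Id` header and the `child_headers` helper are hypothetical names introduced here and do not appear in the deck.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"
SPAN_HEADER = "X-Span-Id"
# Hypothetical header: the slides record a parent span id in the data model,
# but do not say how (or whether) it travels on the wire.
PARENT_HEADER = "X-Parent-Span-Id"


def child_headers(incoming):
    """Build tracing headers for the next hop from the incoming request headers."""
    # Reuse the caller's trace id so every hop belongs to the same trace;
    # start a new trace only when the request arrives without one.
    trace_id = incoming.get(TRACE_HEADER) or uuid.uuid4().hex
    outgoing = {TRACE_HEADER: trace_id, SPAN_HEADER: uuid.uuid4().hex}
    # The caller's span becomes the parent of the span created for this hop.
    if incoming.get(SPAN_HEADER) is not None:
        outgoing[PARENT_HEADER] = incoming[SPAN_HEADER]
    return outgoing


first_hop = child_headers({})          # e.g. client to proxy: new trace
second_hop = child_headers(first_hop)  # e.g. proxy to container: same trace
```

Here `second_hop` keeps the trace id of `first_hop` and names the span of `first_hop` as its parent, which is the information needed later to rebuild the span tree.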
  • 11. X-trace Middleware Architecture
      1. Generate trace ids based on configuration.
      2. Create spans and collect trace data.
      3. Propagate trace ids to the next hop.
      4. Send trace data into a repository via a separate transport protocol/channel.
      [Diagram: Swift Client, Auth, Proxy Server, three Container Servers and three Account Servers, with x-trace middleware on the Swift services feeding a trace data repository]
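The four steps above can be sketched as a plain WSGI middleware. This is a minimal illustration under stated assumptions, not the deck's implementation: the class name `TraceMiddleware`, the `sink` callable standing in for the repository transport, and the span field names are all chosen here for illustration.

```python
import time
import uuid


class TraceMiddleware:
    """Illustrative WSGI middleware; `sink` stands in for the repository channel."""

    def __init__(self, app, sink, endpoint_name="proxy.server"):
        self.app = app
        self.sink = sink
        self.endpoint_name = endpoint_name

    def __call__(self, environ, start_response):
        # Step 1: reuse the incoming trace id or generate a new one.
        trace_id = environ.get("HTTP_X_TRACE_ID") or uuid.uuid4().hex
        parent = environ.get("HTTP_X_SPAN_ID")
        span_id = uuid.uuid4().hex
        # Step 3: expose the ids so downstream code can copy them to the next hop.
        environ["HTTP_X_TRACE_ID"] = trace_id
        environ["HTTP_X_SPAN_ID"] = span_id
        # Step 2: create the span and collect data about this call.
        span = {
            "_id": span_id,
            "trace_id": trace_id,
            "parent": parent,
            "name": environ.get("REQUEST_METHOD", ""),
            "endpoint": self.endpoint_name,
            "start_time": time.time(),
            "annotations": [{"key": "request_path",
                             "value": environ.get("PATH_INFO", "")}],
        }

        def traced_start_response(status, headers, exc_info=None):
            span["annotations"].append({"key": "return_code", "value": status})
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, traced_start_response)
        finally:
            # end_time marks when the wrapped app returned its iterable,
            # not when the response body was fully sent.
            span["end_time"] = time.time()
            # Step 4: hand the finished span to the separate channel.
            self.sink(span)
```

A usage sketch: `TraceMiddleware(app, collected.append)` with `collected = []` gathers one span dictionary per request, which makes the middleware easy to exercise without a real repository.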
  • 12. Patches to fix the request path
      • The trace id is passed along by the proxy server in HTTP headers, but it is lost at some points because a new request is created for the next hops.
      • Patches are needed to fix this problem and form a complete tracing path for the container server, object server, etc.
      [Diagram: same components as the architecture slide, annotated "propagate trace id in next new request"]
  • 13. Tie together tracing data
      Reconstruct the causal and temporal relationship view for a PUT container call (timeline axis from 0 ms to 200 ms; 201 status codes marked on the spans):
      Swift-Client.PUT        parent-span-id=none, span-id=0
      Proxy-Server.PUT        parent-span-id=0, span-id=1
      Container-Server.PUT    parent-span-id=1, span-id=2
      Container-Server.PUT    parent-span-id=1, span-id=3
      Container-Server.PUT    parent-span-id=1, span-id=4
      Account-Server.PUT      parent-span-id=2, span-id=5
      Account-Server.PUT      parent-span-id=3, span-id=6
      Account-Server.PUT      parent-span-id=4, span-id=7
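Rebuilding the causal view from these records needs only the span id and parent span id of each span. The sketch below uses the eight spans listed on this slide; the helper names `build_tree` and `render` are illustrative, and timing is left out because the slide gives no per-span timestamps.

```python
from collections import defaultdict

# (name, parent-span-id, span-id) exactly as listed on the slide.
SPANS = [
    ("Swift-Client.PUT", None, 0),
    ("Proxy-Server.PUT", 0, 1),
    ("Container-Server.PUT", 1, 2),
    ("Container-Server.PUT", 1, 3),
    ("Container-Server.PUT", 1, 4),
    ("Account-Server.PUT", 2, 5),
    ("Account-Server.PUT", 3, 6),
    ("Account-Server.PUT", 4, 7),
]


def build_tree(spans):
    """Index spans by id and group child span ids under their parent."""
    names, children, roots = {}, defaultdict(list), []
    for name, parent, span_id in spans:
        names[span_id] = name
        if parent is None:
            roots.append(span_id)
        else:
            children[parent].append(span_id)
    return names, children, roots


def render(names, children, span_id, depth=0):
    """Return the subtree under span_id as indented lines, depth first."""
    lines = ["  " * depth + f"{names[span_id]} (span-id={span_id})"]
    for child in children.get(span_id, []):
        lines.extend(render(names, children, child, depth + 1))
    return lines


names, children, roots = build_tree(SPANS)
tree_lines = render(names, children, roots[0])
print("\n".join(tree_lines))
```

The output nests each Account-Server span under the Container-Server span that issued it, and all three Container-Server spans under the single Proxy-Server span.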
  • 14. Another example: upload an object
      Timeline axis from 0 ms to 200 ms; 201 status codes marked on the spans:
      Swift-Client.PUT        parent-span-id=none, span-id=0
      Proxy-Server.PUT        parent-span-id=0, span-id=1
      Object-Server.PUT       parent-span-id=1, span-id=2
      Object-Server.PUT       parent-span-id=1, span-id=3
      Object-Server.PUT       parent-span-id=1, span-id=4
      Container-Server.PUT    parent-span-id=2, span-id=5
      Container-Server.PUT    parent-span-id=3, span-id=6
      Container-Server.PUT    parent-span-id=4, span-id=7
  • 15. Trace into middleware of the pipeline
      • Expand the trace path into the WSGI calls between middleware to get more complete trace data.
      • Possible choices
        – Decorators for __call__:
            @trace_here()
            def __call__(self, environ, start_response)
        – Hack the Paste Deployment package
        – Profile with filters
      pipeline:main
      Pipeline = catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk slo dlo ratelimit crossdomain tempauth tempurl formpost staticweb container-quotas account-quotas proxy-logging proxy-serve slo …
      [Diagram: Swift Client, Proxy Server with x-trace around the tempauth, cache, tempurl and dlo middleware, and the trace data repository]
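The decorator option can be sketched as follows. The slide only shows the `@trace_here()` usage line, so the body below is an assumption about how such a decorator might work; `TRACE_LOG` is a hypothetical in-memory stand-in for the trace data repository, and `HealthCheck` is a toy middleware, not Swift's own.

```python
import functools
import time

# Hypothetical in-memory stand-in for the trace data repository.
TRACE_LOG = []


def trace_here(name=None):
    """Decorator factory that records one timing entry per WSGI __call__."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, environ, start_response):
            label = name or type(self).__name__
            start = time.time()
            try:
                return func(self, environ, start_response)
            finally:
                # Recorded even when the wrapped call raises.
                TRACE_LOG.append({"name": label,
                                  "start_time": start,
                                  "end_time": time.time()})
        return wrapper
    return decorator


class HealthCheck:
    """Toy middleware used only to show the decorator in place."""

    def __init__(self, app):
        self.app = app

    @trace_here()
    def __call__(self, environ, start_response):
        return self.app(environ, start_response)
```

Compared with patching Paste Deployment, a decorator has to be applied to each middleware class, but it keeps the tracing hook visible next to the code it measures.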
  • 16. Backend trace data model
      {
        "_id" : "14a467a402904aee87de4028a8595493",
        "endpoint" : { "port" : "6031", "type" : "server", "name" : "container.server", "ipv4" : "127.0.0.1" },
        "name" : "GET",
        "parent" : "57fbd3ec12fe4912ba89e7a8eb97f2e7",
        "start_time" : 1400146616.554865,
        "trace_id" : "d7ff028674c5471e94b964ec37d35546",
        "end_time" : 1400146616.559608,
        "annotations" : [
          { "type" : "string", "value" : "/sdb1/347/TEMPAUTH_test/summit", "key" : "request_path", "event" : "sr" },
          { "type" : "string", "value" : "200 OK", "key" : "return_code", "event" : "ss" }
        ]
      }
      {
        "_id" : "57fbd3ec12fe4912ba89e7a8eb97f2e7",
        "endpoint" : { "port" : "8080", "type" : "server", "name" : "proxy.server", "ipv4" : "127.0.0.1" },
        "name" : "GET",
        "parent" : "5602ca4010fe420c9fa56528faf711ab",
        "start_time" : 1400146616.490691,
        "trace_id" : "d7ff028674c5471e94b964ec37d35546",
        "end_time" : 1400146616.58012,
        "annotations" : [
          { "type" : "string", "value" : "/v1/TEMPAUTH_test/summit", "key" : "request_path", "event" : "sr" },
          { "type" : "string", "value" : "200 OK", "key" : "return_code", "event" : "ss" }
        ]
      }
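To show how these documents support analysis, the sketch below loads the two sample spans, selects them by trace id in time order, and derives per-span durations. The ids and timestamps are copied from the slide; the helper names are illustrative, the `endpoint` field is flattened to its name for brevity, and the self-time calculation assumes child spans do not overlap each other.

```python
SAMPLE_SPANS = [
    {"_id": "14a467a402904aee87de4028a8595493",
     "endpoint": "container.server",
     "name": "GET",
     "parent": "57fbd3ec12fe4912ba89e7a8eb97f2e7",
     "trace_id": "d7ff028674c5471e94b964ec37d35546",
     "start_time": 1400146616.554865,
     "end_time": 1400146616.559608},
    {"_id": "57fbd3ec12fe4912ba89e7a8eb97f2e7",
     "endpoint": "proxy.server",
     "name": "GET",
     "parent": "5602ca4010fe420c9fa56528faf711ab",
     "trace_id": "d7ff028674c5471e94b964ec37d35546",
     "start_time": 1400146616.490691,
     "end_time": 1400146616.58012},
]


def spans_for_trace(spans, trace_id):
    """Return the spans of one trace, ordered by start time."""
    matching = (s for s in spans if s["trace_id"] == trace_id)
    return sorted(matching, key=lambda s: s["start_time"])


def duration_ms(span):
    """Wall-clock duration of a span in milliseconds."""
    return (span["end_time"] - span["start_time"]) * 1000.0


def self_time_ms(span, spans):
    """Duration minus time spent in direct children (assumed non-overlapping)."""
    child_total = sum(duration_ms(s) for s in spans if s["parent"] == span["_id"])
    return duration_ms(span) - child_total


ordered = spans_for_trace(SAMPLE_SPANS, "d7ff028674c5471e94b964ec37d35546")
for span in ordered:
    print(span["endpoint"], round(duration_ms(span), 3), "ms")
```

For the sample data the proxy span lasts roughly 89.4 ms and the container span roughly 4.7 ms, so almost all of the request time is spent outside the container server call.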
  • 17. Query and analysis tools
      • Query
        – Query trace data by trace_id or span_id, order or range by time, group by nodes or annotation keys
      • Trace timeline
        – Plot the spans on the timeline with their causal relationships
      • Diagnose
        – Analyze the critical path for a successful response
        – Identify the failure point in the path
      • Simulation
        – Replay the recorded processing of the requests
      • Data mining
  • 18. Reference
      • Google Dapper: a large-scale distributed systems tracing infrastructure
      • Twitter Zipkin: a distributed tracing system that helps gather timing data for all the disparate services at Twitter
      • Berkeley X-Trace: a pervasive network tracing framework
  • 19. Demo
  • 20. Q&A