Visibility into your applications and systems is critical in guarding against errors, maintaining uptime, and protecting performance. In this session, learn how DevOps enables us to build better systems by leveraging the perspectives of different teams in order to gain that visibility.
1. GAIN MAXIMUM VISIBILITY
HOLISTICALLY VIEWING SYSTEMS
2. AMBIGUOUS CYLINDERS
PERSPECTIVE MATTERS
3. ABOUT ME
Evangelist at Datadog
@technovangelist
mattw@datadoghq.com
youtube.com/technovangelist
Organizer of DevOps Days Boston 2017 & 2018
4. DATADOG
SAAS-BASED MONITORING
TRILLIONS OF POINTS/DAY
WE’RE HIRING:
jobs.datadoghq.com
TW: @datadoghq
5. WHERE ARE WE GETTING VISIBILITY?
28. LOGS
• Event-based
• Easy to read for humans
• Well structured & easy to parse/grep for computers
• Ideally verbose & contain a lot of information
• Useful for finding details of an event
• Help catch unknown unknowns
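The "well structured & easy to parse" point can be illustrated with a minimal sketch that emits one JSON object per log event using Python's standard `logging` module. The logger name, the `context` key, and the fields `photo_id` and `duration_ms` are hypothetical and chosen only for illustration; this is not a Datadog-specific format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} become record.context.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("demo_app")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("photo served", extra={"context": {"photo_id": 42, "duration_ms": 118}})

# A machine can parse the line back without any regular expressions.
event = json.loads(buffer.getvalue())
print(event["message"], event["duration_ms"])
```

Because every event is a flat JSON object, the same line stays readable for a human and trivially filterable for a log pipeline.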
29. BACKEND VISIBILITY
The Data
• Metrics
• Logs
• Traces
The Tools
• Application Monitoring
• Log Management
• APM
32. TRACES
• Request-based
• Follow activity from request across function and service calls.
• Useful for following code to answer “Where?” and “How long?”
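To make "Where?" and "How long?" concrete, here is a toy tracer, assuming a single-threaded process. The `Tracer` class and the functions `handle_request` and `resize_image` are hypothetical; a real APM library also propagates trace IDs across services, which this sketch does not attempt.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Record a name, parent, and duration for each span in one request."""

    def __init__(self):
        self.spans = []    # finished spans, in completion order
        self._stack = []   # names of the spans that are currently open

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            duration = time.perf_counter() - start
            self._stack.pop()
            self.spans.append(
                {"name": name, "parent": parent, "duration_s": duration}
            )


tracer = Tracer()


def resize_image():
    with tracer.span("resize_image"):
        time.sleep(0.01)  # stand-in for a slow downstream call


def handle_request():
    with tracer.span("handle_request"):
        resize_image()


handle_request()
for s in tracer.spans:
    print(f'{s["name"]} (parent={s["parent"]}): {s["duration_s"] * 1000:.1f} ms')
```

The parent link answers "Where?" (which call sat inside which request) and the duration answers "How long?".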
33. FRONTEND VISIBILITY
The Data
• Metrics
The Tools
• Real-User Monitoring (RUM)
• Synthetics
36. PEOPLE & ROBOTS
• RUM & Synthetics work best together
• RUM provides insight into how users actually use a product
• Synthetics operate independently of users
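A synthetic check is essentially a scripted robot user that runs whether or not real users are online. The sketch below assumes the caller supplies a `fetch` function returning `(status, body)`; the fetchers, the URL, and the 2-second budget are hypothetical stand-ins so the example runs without network access.

```python
import time


def run_synthetic_check(fetch, url, max_seconds=2.0):
    """Call fetch(url) and report whether it succeeded within the time budget."""
    status = None
    error = None
    start = time.perf_counter()
    try:
        status, _body = fetch(url)
    except Exception as exc:  # any failure counts as a failed check
        error = str(exc)
    elapsed = time.perf_counter() - start
    ok = error is None and status == 200 and elapsed <= max_seconds
    return {"url": url, "ok": ok, "status": status,
            "error": error, "elapsed_s": elapsed}


def healthy_fetch(url):
    """Hypothetical fetcher simulating a working endpoint."""
    return 200, b"<image bytes>"


def broken_fetch(url):
    """Hypothetical fetcher simulating an unreachable endpoint."""
    raise ConnectionError("CDN unreachable")


healthy = run_synthetic_check(healthy_fetch, "https://cdn.example.com/puppy.jpg")
broken = run_synthetic_check(broken_fetch, "https://cdn.example.com/puppy.jpg")
print(healthy["ok"], broken["ok"], broken["error"])
```

In practice `fetch` could wrap a real HTTP client, and the check would run on a schedule from several locations.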
37. DATE-A-DOG
WHAT’S IT ALL MEAN?
TINDER FOR PUPS
38. THIS APP IS GREAT!
WHO’S A GOOD BOY?!?
39. I GOTTA TELL MY FRIENDS ABOUT THIS APP!
THEY’RE SO CUTE!!!
40. AND MY FRIENDS ARE GONNA TELL THEIR FRIENDS…
AAAWWWWWWW!!!
41. WHAT JUST HAPPENED?!?
WHERE’D THE PUPPIES GO?
42. HOW DO WE KNOW SOMETHING WENT WRONG?
USERS ARE HAVING A HORRIBLE EXPERIENCE
44. REAL-USER MONITORING
HOW DO WE KNOW?
45. REAL-USER MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
47. SYNTHETICS
HOW DO WE KNOW SOMETHING WENT WRONG?
50. SCENARIO: THIRD-PARTY CDN OUTAGE
We host puppy photos on Fastly, and the app pulls them directly from the Fastly CDN. Fastly suffers a massive DDoS attack.
• RUM & Synthetics: Will alert and can show which assets are slow or are not being served.
• APM, Application and Infrastructure Monitoring: No alerts. Everything is fine!
51. TRACING (APM)
HOW DO WE KNOW?
52. TRACING (APM)
HOW DO WE KNOW SOMETHING WENT WRONG?
54. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
60. SCENARIO: SERVICE OUTAGE
We use an image resizing/optimizing service that resizes images asynchronously. It has issues, and images are returned slowly.
• RUM & Synthetics: Might alert, but cannot show where the problem is.
• Application & Infrastructure Monitoring: Everything is fine!
• APM: Can alert on latency and show where in the code you are making the API calls.
61. APPLICATION MONITORING
HOW DO WE KNOW?
65. SCENARIO: DEV DEPLOYS BAD CODE
A developer accidentally deploys code that improperly checks password hashes, so all user logins fail.
• RUM & Synthetics, APM: No alerts. Everything is fine!
• Infrastructure Monitoring: No alerts. Everything is fine!
• Application Monitoring: Will alert on the impact to custom metrics and can help identify why.
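The custom-metric alert in this scenario could look something like the sketch below: the application records each login outcome, and a monitor alerts when the failure rate over a recent window crosses a threshold. The class name, window size, threshold, and minimum sample count are all assumptions chosen for illustration.

```python
from collections import deque


class LoginFailureMonitor:
    """Track recent login outcomes and flag an abnormal failure rate."""

    def __init__(self, window=100, threshold=0.5, min_samples=20):
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, success):
        self.results.append(bool(success))

    def failure_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self):
        # Require a minimum number of samples so one failed login cannot page anyone.
        enough_data = len(self.results) >= self.min_samples
        return enough_data and self.failure_rate() >= self.threshold


monitor = LoginFailureMonitor()

# Normal traffic: most logins succeed.
for i in range(100):
    monitor.record(success=(i % 20 != 0))
alert_before_deploy = monitor.should_alert()

# After the bad deploy: every login fails.
for _ in range(100):
    monitor.record(success=False)
alert_after_deploy = monitor.should_alert()

print(alert_before_deploy, alert_after_deploy)
```

Pages still load and the hosts look healthy, so only a metric the application itself emits reveals that logins are failing.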
66. APPLICATION MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
67. INFRASTRUCTURE MONITORING
HOW DO WE KNOW?
68. INFRASTRUCTURE MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
71. SCENARIO: WE’RE TOO POPULAR
Everyone loves puppies and we’re completely out of resources.
• RUM & Synthetics, APM, Application Monitoring: Alerts that latency is high. Will not be able to help identify why.
• Infrastructure Monitoring: Alerts on high resource use and may be able to trigger automatic remediation.
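As a rough sketch of "alert on high resource use and trigger automatic remediation", the code below checks host samples against thresholds and proposes adding one instance when anything is alerting. The host names, the 90% limits, and the scale-by-one policy are assumptions; a real setup would hand the decision to an autoscaler or orchestration API.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    host: str
    cpu_percent: float
    memory_percent: float


def find_alerts(samples, cpu_limit=90.0, memory_limit=90.0):
    """Return a human-readable alert for every resource at or over its limit."""
    alerts = []
    for s in samples:
        if s.cpu_percent >= cpu_limit:
            alerts.append(f"{s.host}: CPU at {s.cpu_percent:.0f}%")
        if s.memory_percent >= memory_limit:
            alerts.append(f"{s.host}: memory at {s.memory_percent:.0f}%")
    return alerts


def plan_remediation(alerts, current_instances, max_instances=10):
    """Propose one extra instance while alerts are firing, up to a hard cap."""
    if alerts and current_instances < max_instances:
        return current_instances + 1
    return current_instances


samples = [
    HostSample("web-1", cpu_percent=97.0, memory_percent=62.0),
    HostSample("web-2", cpu_percent=41.0, memory_percent=55.0),
]
alerts = find_alerts(samples)
target_instances = plan_remediation(alerts, current_instances=2)
print(alerts, target_instances)
```

The hard cap matters: automatic remediation without a ceiling can turn a traffic spike into a billing incident.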
72. ANOMALY DETECTION
HOW DO WE KNOW SOMETHING WENT WRONG?
73. HOW DO WE KNOW WHAT WENT WRONG?
74. RECURSE RECURSE RECURSE
UNTIL YOU FIND THE CAUSES
76. LOGS
EXPLORING WHAT WENT WRONG
79. HOW TO GET 100% VISIBILITY?
• Think about your system as a whole
• Get multiple perspectives
• Consider all 5 observability tools:
  • RUM
  • Synthetics
  • Tracing
  • Application+Infrastructure Monitoring
  • Logs
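The four scenario slides can be summarized as a small coverage table. The sketch below encodes what each tool was said to do in each scenario and checks that no single tool explains all of them. The three status labels are my own shorthand, and reading "Will alert and can show" as "explains" and "Might see alerts" as "alerts only" is an interpretation of the slides.

```python
EXPLAINS, ALERTS_ONLY, SILENT = "explains", "alerts only", "silent"

TOOLS = ["RUM", "Synthetics", "APM",
         "Application Monitoring", "Infrastructure Monitoring"]

# One row per scenario slide, one entry per tool.
COVERAGE = {
    "third-party CDN outage": {
        "RUM": EXPLAINS, "Synthetics": EXPLAINS, "APM": SILENT,
        "Application Monitoring": SILENT, "Infrastructure Monitoring": SILENT,
    },
    "service outage": {
        "RUM": ALERTS_ONLY, "Synthetics": ALERTS_ONLY, "APM": EXPLAINS,
        "Application Monitoring": SILENT, "Infrastructure Monitoring": SILENT,
    },
    "dev deploys bad code": {
        "RUM": SILENT, "Synthetics": SILENT, "APM": SILENT,
        "Application Monitoring": EXPLAINS, "Infrastructure Monitoring": SILENT,
    },
    "we're too popular": {
        "RUM": ALERTS_ONLY, "Synthetics": ALERTS_ONLY, "APM": ALERTS_ONLY,
        "Application Monitoring": ALERTS_ONLY, "Infrastructure Monitoring": EXPLAINS,
    },
}


def tools_that_explain(scenario):
    """Tools that both alert and point at the cause for a scenario."""
    return [t for t in TOOLS if COVERAGE[scenario][t] == EXPLAINS]


def scenarios_explained_by(tool):
    """Scenarios a single tool can explain on its own."""
    return [s for s in COVERAGE if COVERAGE[s][tool] == EXPLAINS]


for tool in TOOLS:
    explained = scenarios_explained_by(tool)
    print(f"{tool}: explains {len(explained)} of {len(COVERAGE)} scenarios")
```

Every tool explains exactly one of the four scenarios, which is the argument for combining perspectives rather than relying on any one of them.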
80. QUESTIONS?
@technovangelist
mattw@datadoghq.com
youtube.com/technovangelist