When Node.js goes wrong:
Debugging Node in production
    Surge 2012
    David Pacheco (@dapsays)
    Joyent
The Rise of Node.js

    • We see Node.js as the confluence of three ideas:
      • JavaScript’s friendliness and rich support for asynchrony (i.e., closures)
      • High-performance JavaScript VMs (e.g., V8)
      • Time-tested system abstractions (i.e. Unix, in the form of streams)
    • Event-oriented model delivers consistent performance in the
      presence of long latency events (i.e. no artificial latency bubbles)
    • Node.js is displacing C for a lot of highly reliable, high performance
      core infrastructure software (at Joyent alone: DNS, DHCP,
      SNMP, LDAP, key value stores, public-facing web services, ...).
    • This has been great for rapid development, but historically has
      come with a cost in debuggability.
    • Debugging is decidedly not a new problem...

2
The Genesis of debugging



    “As soon as we started programming,
    we found to our surprise that it wasn't
    as easy to get programs right as we
    had thought. Debugging had to be
    discovered. I can remember the exact
    instant when I realized that a large
    part of my life from then on was going
    to be spent in finding mistakes in my
    own programs.”
            —Sir Maurice Wilkes, 1913 - 2010




3
Debugging Node: a run-away service

    • February, 2011: Joyent is preparing to launch no.de, a Node PaaS.
    • During testing, Cloud Analytics becomes intermittently
      unresponsive.
    • Quickly traced the problem to a rogue data aggregator using 100%
      of 1 CPU core and not responding over HTTP or AMQP.
    • How do you debug this?




4
Debugging a run-away service

    • Check the logs?




5
Debugging a run-away service

    • Check syscall activity (truss/strace)?




6
Debugging a run-away service

    • Check thread stacks:

            v8::internal::Runtime::SetObjectProperty+0x36d()
            v8::internal::Runtime_SetProperty+0x73()
            0xfe7601f6()
            0xfbff31d8()
            0xfc468f59()
            0xfe8e51cf()
            0xfe760841()
            0xfe8e3dc8()
            0xfe8e24a4()
            ...
            ev_run+0x406()
            uv_run+0x1c()
            node::Start+0xa9()
            main+0x1b()
            _start+0x83()




7
Give up?

    • Could add more logging for next time, but we don’t know how to
      reproduce it. (Plus, what would we log?)
    • ... but it’s still exhibiting these symptoms! Can’t we figure out why?!





8
The more general problem

    • The software: a moderately complex concurrent service
      (that is, where concurrent requests can affect one another).
    • The deployment: in production, 10s to 1000s of instances.
    • The problem: ~once/day, one of the instances crashes, leaving
      behind a stacktrace where an assertion was blown. (Or worse: one
      of the instances simply misbehaves, leaving nothing to help figure
      out what’s going wrong.)
    • How do you debug this?




9
A first approach

     • Add instrumentation (console.log) and redeploy.
     • How easy is it to deploy a new version?
     • How risky is it to deploy a new version? What’s the impact?
       What if it’s a very common code path that you need to instrument?
     • What if this isn’t your code, but a customer’s, whose deployment
       you cannot control?
     • Are you sure you’ll only need to do this once?
       (You lose credibility with each lap with ops and customers.)
     • If you’re lucky or if the problem is relatively simple, this can work
       okay.



10
“The postmortem technique”


     “Experience with the EDSAC has
     shown that although a high proportion
     of mistakes can be removed by
     preliminary checking, there frequently
     remain mistakes which could only
     have been detected in the early
     stages by prolonged and laborious
     study. Some attention, therefore, has
     been given to the problem of dealing
     with mistakes after the programme
     has been tried and found to fail.”

          —Stanley Gill, 1926 - 1975
          “The diagnosis of mistakes in programmes on the EDSAC”, 1951

11
A better approach

     • For C programs, we have rich tools for postmortem analysis of a
       system based on a snapshot of its state.
     • This technique is so old, the term for this state snapshot dates
       from the dawn of computing: it’s a core dump.
     • Once a core dump has been generated, either automatically after
       a crash or on-demand using gcore(1), the program can be
       immediately restarted to restore service quickly so that
       engineers can debug the problem asynchronously.
     • Using the debugger on the core dump, you can inspect all internal
       program state: global variables, threads, and objects.
     • Can also use the same tools with a live process.
     • Can’t we do this with Node.js?
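
A minimal sketch of the "dump core, restart, debug later" flow applied to a Node service. It assumes the process should abort on an unexpected exception; `installAbortOnFatal` is a hypothetical helper name, and `proc` is injected so the handler can be exercised without killing the current process. By the time this handler runs, the throwing stack has already unwound; newer Node releases offer an `--abort-on-uncaught-exception` flag that aborts at the point of the throw instead.

```javascript
'use strict';

// Sketch: turn an unexpected exception into a core dump rather than
// leaving only a stack trace in the log.
function installAbortOnFatal(proc) {
  proc.on('uncaughtException', function (err) {
    // Record the immediate cause first; the core file holds the rest.
    console.error('fatal: ' + (err && err.stack ? err.stack : err));
    // process.abort() raises SIGABRT; whether a core file is actually
    // written depends on the operating system's core dump settings.
    proc.abort();
  });
}

// In a real service: installAbortOnFatal(process);
```

A supervisor can then restart the service immediately while the core file waits for analysis.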
12
Debugging dynamic environments

     • Historically, native postmortem tools have been unable to
       meaningfully observe dynamic environments like Node.
     • Such tools would need to translate native abstractions from the
       dump (symbols, functions, structs) into their higher-level
       counterparts in the dynamic environment (variables, Functions,
       Objects).
     • Some abstractions don’t even exist explicitly in the language itself.
       (e.g., JavaScript’s event queue)
     • Node is not alone! The state of the art is no better in Python, Ruby,
       or PHP, and not nearly solved for Java or Erlang either.




13
Aside: MDB

     • illumos-based systems like SmartOS and OmniOS have MDB, the
       modular debugger built specifically for postmortem analysis
     • MDB was originally built for postmortem analysis of the operating
       system kernel and later extended to applications
     • Plug-ins (“dmods”) can easily build on one another to deliver
       powerful postmortem analysis tools, e.g.:
       • ::stacks coalesces threads based on stack trace, with optional filtering
         by module, caller, etc
       • ::findleaks performs postmortem garbage collection on a core dump to find
         memory leaks in native code

     • Could we build a dmod for Node?


14
mdb_v8: postmortem debugging for Node

     • With some excruciating pain and some ugly layer violations, we
       were able to build mdb_v8
     • ::jsstack prints call stacks, including native C++ and
       JavaScript functions and arguments.
     • ::jsprint, given a pointer, prints the C++ object and
       its JavaScript counterpart.
     • ::v8function, given a JSFunction pointer, shows the assembly
       for that function.
     • ::findjsobjects scans the heap to identify how many
       instances of each object type exist (incredible visibility into memory
       usage).
     • Demo
15
Remember that run-away Node program?

     • In February, 2011, we had essentially no way to see what this
       program was doing.
     • We saved a core dump in case we might one day have a way to
       read it. We also added instrumentation in case we saw it again.
       (We expected to see it again very soon after going to production.)
     • We didn’t see it again until October, while the mdb_v8 work was
       underway. So we applied what we had to the original core file...




16
And the winner is:

       > ::jsstack
       8046a9c <anonymous> (as exports.bucketize) at lib/heatmap.js position 7838
       8046af8 caAggrValueHeatmapImage at lib/ca/ca-agg.js position 48960
       ...
       > 8046a9c::jsframe -v
       8046a9c <anonymous> (as exports.bucketize)
             func: fc435fcd
             file: lib/heatmap.js
             posn: position 7838
             arg1: fc070719 (JSObject)
             arg2: fc070709 (JSArray)
       > fc070719::jsprint
       {
             base: 1320886447,
             height: 281,
             width: 624,
             max: 11538462,
             min: 11538462,
             ...
       }

     • Invalid input resulted in an infinite loop in JavaScript.
     • Time to root cause: 10 minutes.
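
The object printed above has min equal to max. The following is a hypothetical sketch of how such a degenerate range can keep a bucketing loop from ever terminating. It is not the code from lib/heatmap.js; the function names, parameters, and the step cap are all assumptions made for illustration.

```javascript
'use strict';

// HYPOTHETICAL: not the code from lib/heatmap.js. Shows how a range
// with min === max can make a bucketing loop spin forever.
function bucketBoundsNaive(min, max, nbuckets, maxSteps) {
  const step = (max - min) / nbuckets;   // 0 when min === max
  const bounds = [];
  let steps = 0;
  for (let v = min; v <= max; v += step) {
    // Safety valve for the demonstration only; without it, step === 0
    // means v never advances and the loop never exits.
    if (++steps > maxSteps) {
      throw new Error('no progress after ' + maxSteps + ' steps');
    }
    bounds.push(v);
  }
  return bounds;
}

// Validating the input up front turns the hang into an ordinary error.
function bucketBoundsChecked(min, max, nbuckets) {
  if (!(max > min) || !(nbuckets > 0)) {
    throw new RangeError('need max > min and nbuckets > 0');
  }
  return bucketBoundsNaive(min, max, nbuckets, nbuckets + 1);
}
```

With the values from the dump, `bucketBoundsNaive(11538462, 11538462, 10, 1000)` trips the safety valve, while `bucketBoundsChecked` rejects the input before looping.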
17
mdb_v8: How the sausage is made

     • V8 (libv8.a) includes a small amount (a few KB) of metadata that
       describes the heap’s classes, type information, and class layouts.
       (Small enough to include in production builds.)
     • mdb_v8 knows how to identify stack frames, iterate function
       arguments, iterate object properties, and walk basic V8 structures
       (arrays, functions, strings).
     • mdb_v8 uses the debug metadata encoded in the binary to avoid
       hardcoding the way heap structures are laid out in memory. (Still
       has intimate knowledge of things like property iteration.)




18
What did you say was in this sausage?

     • Goal: debugger module shouldn’t hardcode structure offsets and
       other constants, but rather rely on metadata included in the “node”
       binary.
     • Generally speaking, these offsets are computed at compile-time
       and used in inline functions defined by macros. So they get
       compiled out and are not available at runtime.
     • The build process was modified to:
       • Generate a new C++ source file with references to the constants that we need,
         using extern "C" constants that our debugger module can look for and read.
       • Build this with the rest of libv8_base.a.

     • Result: this “debug metadata” is embedded in $PREFIX/bin/node,
       and the debugger can read it directly from the core file.
     • (Should) generally work for 32-bit/64-bit, different architectures,
       and no matter how complex the expressions for the constants are.
19
Problems with this approach

     • We strongly believe in the general approach of having the
       debugger grok program state from a snapshot, because it’s
       comprehensive and has zero runtime overhead, meaning it
       works in production. (This is a constraint.)
     • With the current implementation, the debugger module is built and
       delivered separately from the VM, which means that changes in
       the VM can (and do) break the debugger module.
     • Additionally, each debugger feature requires reverse engineering
       and reimplementing some piece of the VM.
     • Ideally, the VM would embed programmatic logic for decoding the
       in-memory state (e.g., iterating objects, iterating object properties,
       walking the stack, and so on) -- without relying on the VM itself to
       be running.
20
Debugging live programs

     • Postmortem tools can be applied to live processes, and
       core files can be generated for running processes.
     • Examining processes and core dumps is useful for many kinds of
       failure, but sometimes you want to trace runtime activity.




21
DTrace

     • Provides comprehensive tracing of kernel and application-level
       events in real-time (from “thread on-CPU” to “Node GC done”)
     • Scales arbitrarily with the number of traced events.
       (first class in situ data aggregation)
     • Suitable for production systems because it’s safe, has minimal
       overhead (usually no disabled probe effect), and can be enabled/
       disabled dynamically (no application restart required).
     • Open-sourced in 2005. Available on illumos-derived systems like
       SmartOS and OmniOS, Solaris-derived systems, BSD, and
       MacOS (Linux ports in progress).




22
DTrace in dynamic environments

     • DTrace instruments the system holistically, which is to say, from
       the kernel, which poses a challenge for interpreted environments
     • User-level statically defined tracing (USDT) providers describe
       semantically relevant points of instrumentation
     • Some interpreted environments (e.g., Ruby, Python, PHP, Erlang)
       have added USDT providers that instrument the interpreter itself
     • This approach is very fine-grained (e.g., every function call) and
       doesn’t work in JIT’d environments
     • We decided to take a different tack for Node.js...



23
DTrace for Node.js

     • Given the nature of the paths that we wanted to instrument, we
       introduced a function into JavaScript that Node can call to get into
       USDT-instrumented C++
     • Introduces disabled probe effect: calling from JavaScript into C++
       costs even when probes are not enabled
     • Use USDT is-enabled probes to minimize disabled probe effect
       once in C++
     • If (and only if) the probe is enabled, prepare a structure for the
       kernel that allows for translation into a structure that is familiar to
       node programmers
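
The is-enabled idea can be sketched in plain JavaScript: pay for building probe arguments only when the probe is enabled. `makeProbe` is a hypothetical stand-in, not Node's implementation; the real check happens in USDT-instrumented C++.

```javascript
'use strict';

// HYPOTHETICAL stand-in for a USDT probe; a real one lives in C++.
function makeProbe(name) {
  return {
    name: name,
    enabled: false,      // flipped by the tracer, not by the application
    fired: [],
    // Build the (possibly expensive) arguments only when someone is
    // listening, mirroring the is-enabled check described above.
    fire: function (buildArgs) {
      if (!this.enabled) {
        return false;
      }
      this.fired.push(buildArgs());
      return true;
    }
  };
}
```

Passing a callback rather than a ready-made argument object is what keeps the disabled path cheap: the callback is never invoked unless the probe is on.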




24
DTrace example: Node GC time, per GC
       # dtrace -n '
       node*:::gc-start { self->start = timestamp; }
       node*:::gc-done /self->start/ {
         @["microseconds"] = quantize((timestamp - self->start) / 1000);
         self->start = 0;
       }'

       microseconds
               value     ------------- Distribution ------------- count
                  32   |                                          0
                  64   |@@@@@                                     19
                 128   |@@                                        6
                 256   |@@                                        6
                 512   |@@@@                                      13
                1024   |@@@@@                                     17
                2048   |@@@@@@@                                   24
                4096   |@@@@@@@@                                  29
                8192   |@@@@@                                     16
               16384   |@                                         5
               32768   |@                                         3
               65536   |                                          1
              131072   |@                                         3
              262144   |                                          0
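
To read the histogram: quantize() counts each value in a power-of-two bucket, so the row labeled 4096 covers GC pauses from 4096 to 8191 microseconds. A minimal JavaScript sketch of that bucketing for non-negative values follows (the function names are illustrative; DTrace does this in the kernel):

```javascript
'use strict';

// Largest power of two that is <= value (0 for values below 1).
function quantizeBucket(value) {
  if (value < 1) {
    return 0;
  }
  let bucket = 1;
  while (bucket * 2 <= value) {
    bucket *= 2;
  }
  return bucket;
}

// Count how many values fall into each power-of-two bucket.
function quantize(values) {
  const counts = new Map();
  for (const v of values) {
    const b = quantizeBucket(v);
    counts.set(b, (counts.get(b) || 0) + 1);
  }
  return counts;
}
```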


25
DTrace probes in Node.js modules

     • Our technique is adequate for DTrace probes in the Node.js core,
       but it’s very cumbersome for pure Node.js modules
     • Fortunately, Chris Andrews has generalized this technique in his
       node-dtrace-provider module:
         https://github.com/chrisa/node-dtrace-provider
     • This module allows one to declare and fire one’s own probes
       entirely in JavaScript
     • Used extensively by Joyent’s Mark Cavage in his node-restify and
       ldap.js modules, especially to allow for measurement of latency
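
A minimal sketch of declaring and firing a probe with node-dtrace-provider to measure latency. The provider name, probe name, and `handle` function are hypothetical, and the no-op fallback exists only so the sketch runs where the module is not installed.

```javascript
'use strict';

// Fall back to a no-op stand-in when the module is not installed, so the
// sketch runs anywhere. The stand-in is an assumption of this example.
let dtrace;
try {
  dtrace = require('dtrace-provider');
} catch (e) {
  dtrace = {
    createDTraceProvider: function () {
      return {
        addProbe: function () {},
        enable: function () {},
        fire: function () {}
      };
    }
  };
}

const provider = dtrace.createDTraceProvider('myapp');   // hypothetical name
provider.addProbe('request-done', 'char *', 'int');      // url, latency in ms
provider.enable();

function handle(url, work) {
  const start = Date.now();
  const result = work();
  // The callback supplies the probe arguments; the module is designed to
  // invoke it only when the probe is enabled.
  provider.fire('request-done', function () {
    return [url, Date.now() - start];
  });
  return result;
}
```

With this in place, a D script along the lines of `myapp*:::request-done { @ = quantize(arg1); }` would produce a latency distribution like the GC example earlier.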




26
DTrace stack traces

     • ustack(): DTrace looks at (%ebp, %eip) and follows frame
       pointers to the top of the stack (standard approach).
       Asynchronously, looks for symbols in the process’s address space
       to map instruction offsets to function names:
          0x80ed9ab becomes malloc+0x16
     • Great for C, C++. Doesn’t work for JIT’d environments.
       • Functions are compiled at runtime => they have no corresponding symbols
         => the VM must be called upon at runtime to map frames to function names
       • Garbage collection => functions themselves move around at arbitrary points
         => mapping of frames to function names must be done “synchronously”

     • jstack(): Like ustack(), but invokes VM-specific ustack helper,
       expressed in D and attached to the VM binary, to resolve names.
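
For native code, the mapping from an instruction address to a function name plus offset amounts to a search over a sorted symbol table. A minimal sketch, assuming a hypothetical table whose malloc address was chosen so that the slide's example resolves:

```javascript
'use strict';

// HYPOTHETICAL symbol table: start addresses, sorted ascending.
// malloc's base is chosen so the slide's example address resolves.
const symbols = [
  { name: '_start', addr: 0x80e0000 },
  { name: 'malloc', addr: 0x80ed995 },
  { name: 'free',   addr: 0x80eda40 }
];

// Binary search for the last symbol starting at or before pc.
function resolve(table, pc) {
  let lo = 0;
  let hi = table.length - 1;
  let best = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (table[mid].addr <= pc) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (best < 0) {
    return '0x' + pc.toString(16);   // no symbol covers this address
  }
  const sym = table[best];
  return sym.name + '+0x' + (pc - sym.addr).toString(16);
}
```

JIT'd code defeats this scheme because no such static table exists and the code itself moves, which is the gap the ustack helper fills.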



27
DTrace ustack helpers

     • For JIT’d code, DTrace supports a ustack helper mechanism, by
       which the VM itself includes logic to translate from
           (frame pointer, instruction pointer) -> human-readable function name

     • When a jstack() action is processed in probe context (in the kernel),
       DTrace invokes the helper to translate frames:

       Before                    After
       0xfe772a8c                toJSON at native date.js position 39314

       0xfe84d962                BasicJSONSerialize at native json.js position 8444

       0xfea6b6ed                BasicSerializeObject at native json.js position 7622

       0xfe84db11                BasicJSONSerialize at native json.js position 8444

       0xfeaba5ee                stringify at native json.js position 10128




28
V8 ustack helper

     • The ustack helper has to do much of the same work that mdb_v8
       does to identify stack frames and pick apart heap objects.
     • The same debug metadata that’s used for mdb_v8 is used for the
       helper, but unlike mdb_v8, the helper is embedded directly into the
       node binary (good!).
     • The implementation is written in D, and subject to all the same
       constraints as other DTrace scripts (and then some): no functions,
       no iteration, no if/else.
     • Particularly nasty pieces include expanding ConsStrings and
       binary searching to compute line numbers.
     • The helper only depends on V8, not Node.js. (With MacOS support
       for ustack helpers from profile probes, we could use the same
       helper to profile webapps running under Chrome!)
29
Profiling Node with DTrace

     • “profile” provider: probe that fires N times per second per CPU
     • ustack()/jstack() actions: collect user-level stacktrace when
       a probe fires.
     • Low-overhead runtime profiling (via stack sampling) that can be
       turned on and off without restarting your program.
     • Demo.




30
Node.js Flame Graph

     • Visualizing profiling output:




     • Full, interactive version: http://bit.ly/NMQT1B
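
A flame graph is drawn from sampled stacks that have been collapsed into one line per unique stack with a sample count. A minimal sketch of that collapsing step, assuming each sample arrives as an array of frame names listed leaf-first (`foldStacks` is an illustrative name, not part of any tool):

```javascript
'use strict';

// Each sample is one stack, listed leaf-first (innermost frame first).
function foldStacks(samples) {
  const counts = new Map();
  for (const frames of samples) {
    // Flame graphs read root-first, so reverse before joining.
    const key = frames.slice().reverse().join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // One "root;...;leaf count" line per unique stack.
  return Array.from(counts, ([stack, n]) => stack + ' ' + n).sort();
}
```

The width of each box in the graph is proportional to these counts, so the widest towers are where the process spent the most sampled time.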
31
More real-world examples

     • The infinite loop problem we saw earlier was debugged with
       mdb_v8, and could have also been debugged with DTrace.
     • @izs used mdb_v8’s heap scanning to zero in on a memory leak
       in Node 0.7 that was seriously impacting several users, including
       Voxer.
     • @mranney (Voxer) has used Node profiling + flame graphs to
       identify several performance issues (unoptimized OpenSSL
       implementation, poor memory allocation behavior).
     • Debugging RangeError (stack overflow, with no stack trace).




32
Final thoughts

     • Node is great for rapidly building complex or distributed system
       software. But in order to achieve the reliability we expect from such
       systems, we must be able to understand both fatal and non-
       fatal failure in production from the first occurrence.
     • One year ago: we had no way to solve the “infinite loop” problem
       without adding more logging and hoping to see it again.
     • Now we have tools to inspect both running and crashed Node
       programs (mdb_v8 and the DTrace ustack helper), and we’ve used
       them to debug problems in minutes that we either couldn’t solve at
       all before or which took days or weeks to solve.
     • But the postmortem tools are still primitive (like a flashlight in a
       dark room). Need better support from the VM.

33
     • Thanks:
       •   @mraleph for help with V8 and landing patches
       •   @izs and the Node core team for help integrating DTrace and MDB support
       •   @brendangregg for flame graphs
       •   @chrisandrews for node-dtrace-provider
       •   @mcavage for putting it to such great use in node-restify and ldap.js
       •   @mranney and Voxer for pushing Node hard, running into lots of issues, and
           helping us refine the tools to debug them. (God bless the early adopters!)

     • For more info:
       •   http://dtrace.org/blogs/dap/2012/04/25/profiling-node-js/
       •   http://dtrace.org/blogs/dap/2012/01/13/playing-with-nodev8-postmortem-debugging/
       •   https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/mdb/common/modules/v8/mdb_v8.c
       •   https://github.com/joyent/node/blob/master/src/v8ustack.d




34

 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
SSJS, NoSQL, GAE and AppengineJS
SSJS, NoSQL, GAE and AppengineJSSSJS, NoSQL, GAE and AppengineJS
SSJS, NoSQL, GAE and AppengineJS
 
JavaScript Event Loop
JavaScript Event LoopJavaScript Event Loop
JavaScript Event Loop
 
Rakuten openstack
Rakuten openstackRakuten openstack
Rakuten openstack
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
PAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLERPAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLER
 
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
 
Why kernelspace sucks?
Why kernelspace sucks?Why kernelspace sucks?
Why kernelspace sucks?
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.js
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploit
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
 
EclipseCon 2016 - OCCIware : one Cloud API to rule them all
EclipseCon 2016 - OCCIware : one Cloud API to rule them allEclipseCon 2016 - OCCIware : one Cloud API to rule them all
EclipseCon 2016 - OCCIware : one Cloud API to rule them all
 
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open Wide
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open WideOCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open Wide
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open Wide
 
Production Debugging at Code Camp Philly
Production Debugging at Code Camp PhillyProduction Debugging at Code Camp Philly
Production Debugging at Code Camp Philly
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
Advanced Production Debugging
Advanced Production DebuggingAdvanced Production Debugging
Advanced Production Debugging
 
Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
 
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
 

Último

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Último (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Surge2012

  • 1. When Node.js goes wrong: Debugging Node in production Surge 2012 David Pacheco (@dapsays) Joyent
  • 2. The Rise of Node.js
      • We see Node.js as the confluence of three ideas:
          • JavaScript’s friendliness and rich support for asynchrony (i.e., closures)
          • High-performance JavaScript VMs (e.g., V8)
          • Time-tested system abstractions (i.e. Unix, in the form of streams)
      • Event-oriented model delivers consistent performance in the presence of long latency events (i.e. no artificial latency bubbles)
      • Node.js is displacing C for a lot of highly reliable, high performance core infrastructure software (at Joyent alone: DNS, DHCP, SNMP, LDAP, key value stores, public-facing web services, ...).
      • This has been great for rapid development, but historically has come with a cost in debuggability.
      • Debugging is decidedly not a new problem...
  • 3. The Genesis of debugging
      “As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.”
      —Sir Maurice Wilkes, 1913 - 2010
  • 4. Debugging Node: a run-away service
      • February, 2011: Joyent is preparing to launch no.de, a Node PaaS.
      • During testing, Cloud Analytics becomes intermittently unresponsive.
      • Quickly traced the problem to a rogue data aggregator using 100% of 1 CPU core and not responding over HTTP or AMQP.
      • How do you debug this?
  • 5. Debugging a run-away service
      • Check the logs?
  • 6. Debugging a run-away service
      • Check syscall activity (truss/strace)?
  • 7. Debugging a run-away service
      • Check thread stacks:

            v8::internal::Runtime::SetObjectProperty+0x36d()
            v8::internal::Runtime_SetProperty+0x73()
            0xfe7601f6()
            0xfbff31d8()
            0xfc468f59()
            0xfe8e51cf()
            ...
            0xfe760841()
            ev_run+0x406()
            0xfe8e3dc8()
            uv_run+0x1c()
            0xfe8e24a4()
            node::Start+0xa9()
            ...
            main+0x1b()
            _start+0x83()
  • 8. Give up?
      • Could add more logging for next time, but we don’t know how to reproduce it. (Plus, what would we log?)
      • ... but it’s still exhibiting these symptoms! Can’t we figure out why?!
  • 9. The more general problem
      • The software: a moderately complex concurrent service (that is, where concurrent requests can affect one another).
      • The deployment: in production, 10s to 1000s of instances.
      • The problem: ~once/day, one of the instances crashes, leaving behind a stacktrace where an assertion was blown. (Or worse: one of the instances simply misbehaves, leaving nothing to help figure out what’s going wrong.)
      • How do you debug this?
  • 10. A first approach
      • Add instrumentation (console.log) and redeploy.
      • How easy is it to deploy a new version?
      • How risky is it to deploy a new version? What’s the impact? What if it’s a very common code path that you need to instrument?
      • What if this isn’t your code, but a customer’s, whose deployment you cannot control?
      • Are you sure you’ll only need to do this once? (You lose credibility with each lap with ops and customers.)
      • If you’re lucky or if the problem is relatively simple, this can work okay.
  • 11. “The postmortem technique”
      “Experience with the EDSAC has shown that although a high proportion of mistakes can be removed by preliminary checking, there frequently remain mistakes which could only have been detected in the early stages by prolonged and laborious study. Some attention, therefore, has been given to the problem of dealing with mistakes after the programme has been tried and found to fail.”
      —Stanley Gill, 1926 - 1975, “The diagnosis of mistakes in programmes on the EDSAC”, 1951
  • 12. A better approach
      • For C programs, we have rich tools for postmortem analysis of a system based on a snapshot of its state.
      • This technique is so old, the term for this state snapshot dates from the dawn of computing: it’s a core dump.
      • Once a core dump has been generated, either automatically after a crash or on-demand using gcore(1), the program can be immediately restarted to restore service quickly so that engineers can debug the problem asynchronously.
      • Using the debugger on the core dump, you can inspect all internal program state: global variables, threads, and objects.
      • Can also use the same tools with a live process.
      • Can’t we do this with Node.js?
  • 13. Debugging dynamic environments
      • Historically, native postmortem tools have been unable to meaningfully observe dynamic environments like Node.
      • Such tools would need to translate native abstractions from the dump (symbols, functions, structs) into their higher-level counterparts in the dynamic environment (variables, Functions, Objects).
      • Some abstractions don’t even exist explicitly in the language itself (e.g., JavaScript’s event queue).
      • Node is not alone! The state of the art is no better in Python, Ruby, or PHP, and not nearly solved for Java or Erlang either.
  • 14. Aside: MDB
      • illumos-based systems like SmartOS and OmniOS have MDB, the modular debugger built specifically for postmortem analysis
      • MDB was originally built for postmortem analysis of the operating system kernel and later extended to applications
      • Plug-ins (“dmods”) can easily build on one another to deliver powerful postmortem analysis tools, e.g.:
          • ::stacks coalesces threads based on stack trace, with optional filtering by module, caller, etc
          • ::findleaks performs postmortem garbage collection on a core dump to find memory leaks in native code
      • Could we build a dmod for Node?
  • 15. mdb_v8: postmortem debugging for Node
      • With some excruciating pain and some ugly layer violations, we were able to build mdb_v8
      • ::jsstack prints call stacks, including native C++ and JavaScript functions and arguments.
      • ::jsprint, given a pointer, prints it as a C++ object and as its JavaScript counterpart.
      • ::v8function, given a JSFunction pointer, shows the assembly for that function.
      • ::findjsobjects scans the heap to identify how many instances of each object type exist (incredible visibility into memory usage).
      • Demo
  • 16. Remember that run-away Node program?
      • In February, 2011, we had essentially no way to see what this program was doing.
      • We saved a core dump in case we might one day have a way to read it. We also added instrumentation in case we saw it again. (We expected to see it again very soon after going to production.)
      • We didn’t see it again until October, while the mdb_v8 work was underway. So we applied what we had to the original core file...
  • 17. And the winner is:

            > ::jsstack
            8046a9c <anonymous> (as exports.bucketize) at lib/heatmap.js position 7838
            8046af8 caAggrValueHeatmapImage at lib/ca/ca-agg.js position 48960
            ...
            > 8046a9c::jsframe -v
            8046a9c <anonymous> (as exports.bucketize)
                func: fc435fcd
                file: lib/heatmap.js
                posn: position 7838
                arg1: fc070719 (JSObject)
                arg2: fc070709 (JSArray)
            > fc070719::jsprint
            {
                base: 1320886447,
                height: 281,
                width: 624,
                max: 11538462,
                min: 11538462,
                ...
            }

      • Invalid input resulted in infinite loop in JavaScript
      • Time to root cause: 10 minutes
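
The slides do not show the code in lib/heatmap.js, so the following is only a hypothetical sketch of how a bucketizing routine can fail when `min` equals `max` (as in the object printed above), and one way to reject such input early. The function name `bucketize`, its parameters, and the loop shape described in the comment are assumptions made for illustration, not Joyent's code.

```javascript
// Hypothetical sketch: NOT the code from lib/heatmap.js.
// Splits the range [min, max] into `nbuckets` equal-width buckets.
function bucketize(min, max, nbuckets) {
  const step = (max - min) / nbuckets;

  // Guard: when min === max the step is 0, and a loop of the form
  // `for (v = min; v <= max; v += step)` would never advance.
  // NaN, negative, and infinite steps are rejected for the same reason.
  if (!(step > 0) || !Number.isFinite(step)) {
    throw new RangeError('bucketize: max must be greater than min');
  }

  // Index-based iteration always terminates after nbuckets rounds.
  const buckets = [];
  for (let i = 0; i < nbuckets; i++) {
    buckets.push({ lo: min + i * step, hi: min + (i + 1) * step });
  }
  return buckets;
}
```

With this guard, a call such as `bucketize(11538462, 11538462, 10)` throws a `RangeError` instead of spinning on one CPU core.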
  • 18. mdb_v8: How the sausage is made
      • V8 (libv8.a) includes a small amount (a few KB) of metadata that describes the heap’s classes, type information, and class layouts. (Small enough to include in production builds.)
      • mdb_v8 knows how to identify stack frames, iterate function arguments, iterate object properties, and walk basic V8 structures (arrays, functions, strings).
      • mdb_v8 uses the debug metadata encoded in the binary to avoid hardcoding the way heap structures are laid out in memory. (Still has intimate knowledge of things like property iteration.)
  • 19. What did you say was in this sausage?
      • Goal: debugger module shouldn’t hardcode structure offsets and other constants, but rather rely on metadata included in the “node” binary.
      • Generally speaking, these offsets are computed at compile-time and used in inline functions defined by macros. So they get compiled out and are not available at runtime.
      • The build process was modified to:
          • Generate a new C++ source file with references to the constants that we need, using extern "C" constants that our debugger module can look for and read.
          • Build this with the rest of libv8_base.a.
      • Result: this “debug metadata” is embedded in $PREFIX/bin/node, and the debugger can read it directly from the core file.
      • (Should) generally work for 32-bit/64-bit, different architectures, and no matter how complex the expressions for the constants are.
  • 20. Problems with this approach
      • We strongly believe in the general approach of having the debugger grok program state from a snapshot, because it’s comprehensive and has zero runtime overhead, meaning it works in production. (This is a constraint.)
      • With the current implementation, the debugger module is built and delivered separately from the VM, which means that changes in the VM can (and do) break the debugger module.
      • Additionally, each debugger feature requires reverse engineering and reimplementing some piece of the VM.
      • Ideally, the VM would embed programmatic logic for decoding the in-memory state (e.g., iterating objects, iterating object properties, walking the stack, and so on), without relying on the VM itself to be running.
  • 21. Debugging live programs
      • Postmortem tools can be applied to live processes, and core files can be generated for running processes.
      • Examining processes and core dumps is useful for many kinds of failure, but sometimes you want to trace runtime activity.
  • 22. DTrace
      • Provides comprehensive tracing of kernel and application-level events in real-time (from “thread on-CPU” to “Node GC done”)
      • Scales arbitrarily with the number of traced events. (first class in situ data aggregation)
      • Suitable for production systems because it’s safe, has minimal overhead (usually no disabled probe effect), and can be enabled/disabled dynamically (no application restart required).
      • Open-sourced in 2005. Available on illumos-derived systems like SmartOS and OmniOS, Solaris-derived systems, BSD, and MacOS (Linux ports in progress).
  • 23. DTrace in dynamic environments
      • DTrace instruments the system holistically, which is to say, from the kernel, which poses a challenge for interpreted environments
      • User-level statically defined tracing (USDT) providers describe semantically relevant points of instrumentation
      • Some interpreted environments (e.g., Ruby, Python, PHP, Erlang) have added USDT providers that instrument the interpreter itself
      • This approach is very fine-grained (e.g., every function call) and doesn’t work in JIT’d environments
      • We decided to take a different tack for Node.js...
  • 24. DTrace for Node.js
      • Given the nature of the paths that we wanted to instrument, we introduced a function into JavaScript that Node can call to get into USDT-instrumented C++
      • Introduces disabled probe effect: calling from JavaScript into C++ costs even when probes are not enabled
      • Use USDT is-enabled probes to minimize disabled probe effect once in C++
      • If (and only if) the probe is enabled, prepare a structure for the kernel that allows for translation into a structure that is familiar to Node programmers
  • 25. DTrace example: Node GC time, per GC

            # dtrace -n '
            node*:::gc-start { self->start = timestamp; }
            node*:::gc-done /self->start/ {
                @["microseconds"] = quantize((timestamp - self->start) / 1000);
                self->start = 0;
            }'

              microseconds
                       value  ------------- Distribution ------------- count
                          32 |                                         0
                          64 |@@@@@                                    19
                         128 |@@                                       6
                         256 |@@                                       6
                         512 |@@@@                                     13
                        1024 |@@@@@                                    17
                        2048 |@@@@@@@                                  24
                        4096 |@@@@@@@@                                 29
                        8192 |@@@@@                                    16
                       16384 |@                                        5
                       32768 |@                                        3
                       65536 |                                         1
                      131072 |@                                        3
                      262144 |                                         0
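
To make the histogram above easier to read, here is a minimal JavaScript sketch of power-of-two bucketing in the style of the `quantize()` output: each row counts the values that are at least the row's label and less than the next label. This is an illustration only, not DTrace's implementation; the function name and the simplified handling of values below 1 are assumptions of the sketch.

```javascript
// Illustrative re-creation of power-of-two bucketing; not DTrace itself.
// Returns [bucket, count] pairs sorted by bucket label.
function quantize(values) {
  const counts = new Map();
  for (const v of values) {
    if (!Number.isFinite(v)) continue; // skip NaN and Infinity

    // The bucket label is the largest power of two <= v.
    // Values below 1 are grouped under 0 in this simplified sketch.
    let bucket = 0;
    if (v >= 1) {
      bucket = 1;
      while (bucket * 2 <= v) bucket *= 2;
    }
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => a[0] - b[0]);
}
```

For example, hypothetical GC pauses of 70, 100, and 130 microseconds would land in the 64, 64, and 128 rows respectively.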
  • 26. DTrace probes in Node.js modules
      • Our technique is adequate for DTrace probes in the Node.js core, but it’s very cumbersome for pure Node.js modules
      • Fortunately, Chris Andrews has generalized this technique in his node-dtrace-provider module: https://github.com/chrisa/node-dtrace-provider
      • This module allows one to declare and fire one’s own probes entirely in JavaScript
      • Used extensively by Joyent’s Mark Cavage in his node-restify and ldap.js modules, especially to allow for measurement of latency
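
As a rough sketch of what declaring and firing a probe from JavaScript can look like, the example below uses the `createDTraceProvider`, `addProbe`, `enable`, and `fire` calls documented in the node-dtrace-provider README. The provider name `myapp`, the probe name `request-done`, the `recordRequest` helper, and the no-op fallback used when the module is not installed are all assumptions made for this illustration.

```javascript
// Try to load the third-party dtrace-provider module; fall back to a
// no-op stand-in (an assumption of this sketch) when it is unavailable.
let dtraceProvider;
try {
  dtraceProvider = require('dtrace-provider');
} catch (err) {
  dtraceProvider = {
    createDTraceProvider: () => ({
      addProbe: () => ({ fire: () => {} }),
      enable: () => {},
    }),
  };
}

// 'myapp' and 'request-done' are hypothetical provider and probe names.
const provider = dtraceProvider.createDTraceProvider('myapp');
const requestDone = provider.addProbe('request-done', 'char *', 'int');
provider.enable();

// Report a request's latency in milliseconds. The callback builds the
// probe arguments, so that work can be skipped when nobody is tracing.
function recordRequest(url, startMs, nowMs) {
  const latencyMs = nowMs - startMs;
  requestDone.fire(() => [url, latencyMs]);
  return latencyMs;
}
```

Passing a callback to `fire` rather than the arguments themselves keeps the cost of building them off the common path where the probe is not enabled.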
  • 27. DTrace stack traces
      • ustack(): DTrace looks at (%ebp, %eip) and follows frame pointers to the top of the stack (standard approach). Asynchronously, looks for symbols in the process’s address space to map instruction offsets to function names: 0x80ed9ab becomes malloc+0x16
      • Great for C, C++. Doesn’t work for JIT’d environments.
          • Functions are compiled at runtime => they have no corresponding symbols => the VM must be called upon at runtime to map frames to function names
          • Garbage collection => functions themselves move around at arbitrary points => mapping of frames to function names must be done “synchronously”
      • jstack(): Like ustack(), but invokes VM-specific ustack helper, expressed in D and attached to the VM binary, to resolve names.
  • 28. DTrace ustack helpers
      • For JIT’d code, DTrace supports ustack helper mechanism, by which the VM itself includes logic to translate from (frame pointer, instruction pointer) -> human-readable function name
      • When jstack() action is processed in probe context (in the kernel), DTrace invokes the helper to translate frames:

            Before        After
            0xfe772a8c    toJSON at native date.js position 39314
            0xfe84d962    BasicJSONSerialize at native json.js position 8444
            0xfea6b6ed    BasicSerializeObject at native json.js position 7622
            0xfe84db11    BasicJSONSerialize at native json.js position 8444
            0xfeaba5ee    stringify at native json.js position 10128
  • 29. V8 ustack helper
      • The ustack helper has to do much of the same work that mdb_v8 does to identify stack frames and pick apart heap objects.
      • The same debug metadata that’s used for mdb_v8 is used for the helper, but unlike mdb_v8, the helper is embedded directly into the node binary (good!).
      • The implementation is written in D, and subject to all the same constraints as other DTrace scripts (and then some): no functions, no iteration, no if/else.
      • Particularly nasty pieces include expanding ConsStrings and binary searching to compute line numbers.
      • The helper only depends on V8, not Node.js. (With MacOS support for ustack helpers from profile probes, we could use the same helper to profile webapps running under Chrome!)
  • 30. Profiling Node with DTrace
      • “profile” provider: probe that fires N times per second per CPU
      • ustack()/jstack() actions: collect user-level stacktrace when a probe fires.
      • Low-overhead runtime profiling (via stack sampling) that can be turned on and off without restarting your program.
      • Demo.
  • 31. Node.js Flame Graph
      • Visualizing profiling output:
      • Full, interactive version: http://bit.ly/NMQT1B
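
Sampled stacks are typically aggregated before they are drawn. The sketch below shows one way to collapse a list of samples into "frame;frame;frame count" lines, a folded text form commonly used as flame graph input. The function name `foldStacks` and the sample data in the usage note are hypothetical, and real profiler output would first have to be parsed into arrays of frame names.

```javascript
// Collapse sampled stacks into "frame;frame;frame count" lines.
// Each sample is an array of frame names, outermost caller first.
function foldStacks(samples) {
  const counts = new Map();
  for (const frames of samples) {
    const key = frames.join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Most frequently sampled stacks first.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([stack, n]) => `${stack} ${n}`);
}
```

Given two samples of `main > a > b` and one of `main > c`, the result lists the `main;a;b` stack with a count of 2 ahead of `main;c` with a count of 1, which is the sense in which wider boxes in a flame graph correspond to more samples.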
  • 32. More real-world examples
      • The infinite loop problem we saw earlier was debugged with mdb_v8, and could have also been debugged with DTrace.
      • @izs used mdb_v8’s heap scanning to zero in on a memory leak in Node 0.7 that was seriously impacting several users, including Voxer.
      • @mranney (Voxer) has used Node profiling + flame graphs to identify several performance issues (unoptimized OpenSSL implementation, poor memory allocation behavior).
      • Debugging RangeError (stack overflow, with no stack trace).
  • 33. Final thoughts
      • Node is great for rapidly building complex or distributed system software. But in order to achieve the reliability we expect from such systems, we must be able to understand both fatal and non-fatal failure in production from the first occurrence.
      • One year ago: we had no way to solve the “infinite loop” problem without adding more logging and hoping to see it again.
      • Now we have tools to inspect both running and crashed Node programs (mdb_v8 and the DTrace ustack helper), and we’ve used them to debug problems in minutes that we either couldn’t solve at all before or which took days or weeks to solve.
      • But the postmortem tools are still primitive (like a flashlight in a dark room). Need better support from the VM.
  • 34.
      • Thanks:
          • @mraleph for help with V8 and landing patches
          • @izs and the Node core team for help integrating DTrace and MDB support
          • @brendangregg for flame graphs
          • @chrisandrews for node-dtrace-provider
          • @mcavage for putting it to such great use in node-restify and ldap.js
          • @mranney and Voxer for pushing Node hard, running into lots of issues, and helping us refine the tools to debug them. (God bless the early adopters!)
      • For more info:
          • http://dtrace.org/blogs/dap/2012/04/25/profiling-node-js/
          • http://dtrace.org/blogs/dap/2012/01/13/playing-with-nodev8-postmortem-debugging/
          • https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/mdb/common/modules/v8/mdb_v8.c
          • https://github.com/joyent/node/blob/master/src/v8ustack.d