
Understanding Performance with DTrace


  1. Understanding Performance with DTrace (While the customer yells at you)
      Adam Leventhal, @ahl
  2. DTrace = [image] + [image]
  3. Background
      • At the time the biggest deal in our history
      • Performance problems almost immediately
      • Experimentation / CEO mea culpa / frustration
      • I visit the customer…
  4. “ZFS is a piece of shit; you could not have made a worse choice.” – Valued Customer
  5. The Plan
      1. Use DTrace
      2. Figure out all their problems
      3. Fix all their problems
      4. Be annoyingly magnanimous with the customer
  6. DTrace
      • To diagnose problems you need data
      • To collect data you must modify the system
      • DTrace
        – Dynamic instrumentation
        – Configurable data collection
        – Safe, efficient, concise
        – Sun Solaris in 2003; open source 2004
        – Mac OS X, FreeBSD, Oracle Linux (and others)
  7. DTrace Summary
      • Applications (most languages) and OS
      • Broad coverage / stable known probes
      • Trace between processes / languages
      • Trace kernel interactions
      • Powerful data aggregation
      • Easy one-liners / scripts for tough stuff (see the one-liner below)
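Not from the deck, but as a concrete instance of the “easy one-liners” bullet: the canonical DTrace one-liner counts system calls by process name and runs unmodified wherever DTrace is available.

      dtrace -n 'syscall:::entry { @[execname] = count(); }'

Press Ctrl-C to stop tracing; DTrace then prints the aggregation, one line per process name.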
  8. Customer System
      • Basically an NFS server (illumos / OpenZFS)
        – SAN backend
        – Ethernet-connected AIX client
        – Typically moving about 300MB/s
        – 4 sockets, 6 cores/socket
      • Symptoms
        – Terrible latency reported from Oracle (AWR)
        – Sad / angry users
  9. MEASURING NFS LATENCY
  10. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
          /* ... */
      }

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
          /* ... */
      }
  11. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
          self->ts = timestamp;
      }

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
          @ = quantize(timestamp - self->ts);
          self->ts = 0;
      }
  12. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
          self->ts = timestamp;
      }

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
          @[probename == "op-write-done" ? "write" : "read"] =
              quantize(timestamp - self->ts);
          self->ts = 0;
      }
  13. nfsv3:::op-write-start
      {
          self->sync = args[2]->stable != 0;
      }

      sdt:::arc-miss,
      sdt:::blocked-read
      {
          self->io = 1;
      }
  14. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
          self->ts = timestamp;
      }

      /* ... clauses from the previous slide set self->io and self->sync ... */

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
          @[probename == "op-write-done" ?
                self->sync ? "sync write" : "async write" :
                self->io ? "uncached read" : "cached read"] =
              quantize(timestamp - self->ts);
          self->ts = 0;
          self->io = 0;
          self->sync = 0;
      }
  15. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
          self->ts = timestamp;
      }

      /* ... clauses from slide 13 set self->io and self->sync ... */

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
          @[probename == "op-write-done" ?
                self->sync ? "sync write" : "async write" :
                self->io ? "uncached read" : "cached read",
            "microseconds"] =
              quantize((timestamp - self->ts) / 1000);
          self->ts = 0;
          self->io = 0;
          self->sync = 0;
      }
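A usage note, not from the slides: if the clauses from slides 13–15 are saved into a file, say nfs-latency.d (a name chosen here purely for illustration), the script is run on the NFS server and interrupted once the workload has been observed for a while.

      # run on the NFS server; Ctrl-C prints one distribution per category
      dtrace -s nfs-latency.d

The output is a quantize histogram for each of the four categories, as shown on the next slides.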
  16. cached read (microseconds)

              value  ------------- Distribution ------------- count
                  4 |                                          0
                  8 |@@                                        7
                 16 |@@@@@@@@@@@                               43
                 32 |@@@@@@@@@@@@@@@@@@@                       79
                 64 |@@@@@@                                    23
                128 |@@                                        8
                256 |                                          2
                512 |                                          0
               1024 |                                          1
               2048 |                                          0
  17. uncached read (microseconds)

              value  ------------- Distribution ------------- count
                128 |                                          0
                256 |@@@@@@@@                                  1612
                512 |@@                                        508
               1024 |@                                         200
               2048 |@                                         192
               4096 |@@@@@                                     1021
               8192 |@@@@@@@@@@@@@@@@@                         3411
              16384 |@@@@@@                                    1191
              32768 |@                                         128
              65536 |                                          2
             131072 |                                          1
             262144 |                                          0
  18. async write (microseconds)

              value  ------------- Distribution ------------- count
                 16 |                                          0
                 32 |@@@@@@@@@@                                442
                 64 |@@@@@@@@@@@@@@@@@                         767
                128 |@@@@@                                     235
                256 |@@                                        109
                512 |@                                         59
               1024 |                                          16
               2048 |                                          15
               4096 |@                                         29
               8192 |@                                         28
              16384 |                                          12
              32768 |                                          0
              65536 |                                          0
             131072 |                                          0
             262144 |                                          0
             524288 |                                          11
            1048576 |@                                         51
            2097152 |@                                         41
            4194304 |                                          0
  19. sync write (microseconds)

              value  ------------- Distribution ------------- count
                  8 |                                          0
                 16 |                                          149
                 32 |@@@@@@@@@@@@@@@@@@@@@                     8682
                 64 |@@@@@                                     2226
                128 |@@@@                                      1743
                256 |@@                                        658
                512 |                                          95
               1024 |                                          20
               2048 |                                          19
               4096 |                                          122
               8192 |@@                                        744
              16384 |@@                                        865
              32768 |@@                                        625
              65536 |@                                         316
             131072 |                                          113
             262144 |                                          22
             524288 |                                          70
            1048576 |                                          94
            2097152 |                                          16
            4194304 |                                          0

      Callouts: 13k < 1ms, 3k 1ms-100ms, 200 > ¼ second
  20. sync write time contribution
      [chart: total time contributed by each latency bucket of the sync-write distribution]
  21. I/O: read (microseconds)

              value  ------------- Distribution ------------- count
                 16 |                                          0
                 32 |                                          14
                 64 |                                          33
                128 |@@@@                                      1249
                256 |@@@                                       998
                512 |@                                         268
               1024 |@                                         224
               2048 |@                                         257
               4096 |@@@@@@                                    1837
               8192 |@@@@@@@@@@@@@@@@@@@                       5725
              16384 |@@@@                                      1313
              32768 |                                          77
              65536 |                                          0
             131072 |                                          1
             262144 |                                          0
  22. I/O: write (microseconds)

              value  ------------- Distribution ------------- count
                 16 |                                          0
                 32 |                                          338
                 64 |                                          490
                128 |                                          720
                256 |@@@@                                      15079
                512 |@@@@@                                     20342
               1024 |@@@@@@@                                   27807
               2048 |@@@@@@@@                                  28897
               4096 |@@@@@@@@                                  29910
               8192 |@@@@@                                     20605
              16384 |@                                         5081
              32768 |                                          1079
              65536 |                                          69
             131072 |                                          5
             262144 |                                          1
             524288 |                                          0
  23. Basic Performance Goals
      • Get idle out of the system
        – Figure out why work isn’t getting done
      • Get idle into the system
        – Don’t waste cycles
        – Be more efficient
  24. Where are we going off cpu?

      nfsv3:::op-write-start
      {
          self->ts = timestamp;
      }

      sched:::off-cpu
      /self->ts/
      {
          self->off = timestamp;
      }

      sched:::on-cpu
      /self->off/
      {
          @s[stack()] = quantize((timestamp - self->off) / 1000);
          self->off = 0;
      }

      nfsv3:::op-write-done
      /self->ts/
      {
          self->ts = 0;
      }
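The quantize-by-stack aggregation above can produce a lot of output on a busy server. A variant worth considering (a sketch, not part of the deck) also sums total off-CPU time per kernel stack, so the stacks responsible for the most blocked time sort to the end of the report; it relies on the same nfsv3 clauses to set and clear self->ts.

      sched:::off-cpu
      /self->ts/
      {
          self->off = timestamp;
      }

      sched:::on-cpu
      /self->off/
      {
          /* total nanoseconds blocked, keyed by kernel stack */
          @total[stack()] = sum(timestamp - self->off);
          self->off = 0;
      }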
  25. genunix`cv_wait+0x61
      zfs`txg_wait_open+0x7a
      zfs`dmu_tx_wait+0xb3
      zfs`zfs_write+0x686
      genunix`fop_write+0x6b
      nfssrv`rfs3_write+0x50e
      nfssrv`common_dispatch+0x48b
      nfssrv`rfs_dispatch+0x2d
      rpcmod`svc_getreq+0x19c
      rpcmod`svc_run+0x171
      rpcmod`svc_do_run+0x81
      nfs`nfssys+0x765
      unix`_sys_sysenter_post_swapgs+0x149

              value  ------------- Distribution ------------- count
                  4 |                                          0
                  8 |                                          1
                 16 |                                          1
                 32 |                                          2
                 64 |                                          1
                128 |                                          1
                256 |                                          0
                512 |                                          0
               1024 |                                          0
               2048 |                                          0
               4096 |                                          0
               8192 |                                          0
              16384 |                                          0
              32768 |                                          0
              65536 |                                          0
             131072 |                                          0
             262144 |                                          2
             524288 |                                          21
            1048576 |@@@@@@@@                                  373
            2097152 |@@@@@@@@@@@@@@                            675
            4194304 |@@@@@@@@@@                                491
            8388608 |@@@@@@@                                   336
           16777216 |                                          0
  26. How ZFS Processes Data
      • Three batches of data
        – “Open” accepting new data
        – “Quiesced” intermediate state
        – “Syncing” writing data to disk
      • Originally no size limit
      • Throttle to avoid overwhelming backend
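One way to watch the syncing batch directly (an editorial sketch, not shown in the deck) is to time spa_sync(), the ZFS function that writes a transaction group to disk; it appears in the profiling output a few slides later. This assumes the fbt provider can instrument spa_sync on the running kernel.

      fbt::spa_sync:entry
      {
          self->ts = timestamp;
      }

      fbt::spa_sync:return
      /self->ts/
      {
          /* milliseconds spent syncing each transaction group */
          @["spa_sync (ms)"] = quantize((timestamp - self->ts) / 1000000);
          self->ts = 0;
      }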
  27. sync write (microseconds)

              value  ------------- Distribution ------------- count
                  8 |                                          0
                 16 |                                          149
                 32 |@@@@@@@@@@@@@@@@@@@@@                     8682
                 64 |@@@@@                                     2226
                128 |@@@@                                      1743
                256 |@@                                        658
                512 |                                          95
               1024 |                                          20
               2048 |                                          19
               4096 |                                          122
               8192 |@@                                        744
              16384 |@@                                        865
              32768 |@@                                        625
              65536 |@                                         316
             131072 |                                          113
             262144 |                                          22
             524288 |                                          70
            1048576 |                                          94
            2097152 |                                          16
            4194304 |                                          0

      Callouts: 13k < 1ms, 3k 1ms-100ms, 200 > ¼ second
  28. Increasing Queue Depth
      [chart: write I/O latency distributions at queue depths of 10, 32, 64, and 128]
  29. Profiling
      • Profile provider

          profile-199 { @[usym(arg1)] = count(); }

      • Pick individual functions to measure

          ...
          zio_wait                     53us  ( 0%)
          dmu_objset_is_dirty          66us  ( 0%)
          spa_sync_config_object       75us  ( 0%)
          spa_sync_aux_dev             79us  ( 0%)
          list_is_empty                86us  ( 0%)
          dsl_scan_sync               124us  ( 0%)
          ddt_sync                    201us  ( 0%)
          txg_list_remove             519us  ( 0%)
          vdev_config_sync           1830us  ( 0%)
          bpobj_iterate              9939us  ( 0%)
          vdev_sync                 27907us  ( 1%)
          bplist_iterate            35301us  ( 1%)
          vdev_sync_done           346336us  (16%)
          dsl_pool_sync           1652050us  (79%)
          spa_sync                2077646us  (100%)
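The deck does not show the script behind the per-function times above, but a minimal sketch in the same spirit uses fbt entry/return probes to sum elapsed time per function; the function names here are taken from the slide's output, and more can be added to the probe lists.

      fbt::dsl_pool_sync:entry,
      fbt::vdev_sync_done:entry
      {
          self->ts[probefunc] = timestamp;
      }

      fbt::dsl_pool_sync:return,
      fbt::vdev_sync_done:return
      /self->ts[probefunc]/
      {
          /* total nanoseconds spent in each function */
          @[probefunc] = sum(timestamp - self->ts[probefunc]);
          self->ts[probefunc] = 0;
      }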
  30. Lockstat

      Count   indv  cuml  rcnt   nsec   Lock                Caller
      166416    8%   17%  0.00  88424   0xffffff0d4aaa4818  cv_wait+0x69

          nsec  ------ Time Distribution ------  count   Stack
           512  |@                                 7775   taskq_thread_wait+0x84
          1024  |@@                               14577   taskq_thread+0x308
          2048  |@@@@@                            31499   thread_start+0x8
          4096  |@@@@@@                           36522
          8192  |@@@                              19818
         16384  |@                                11065
         32768  |@                                 7302
         65536  |@                                 7932
        131072  |                                  5537
        262144  |@                                 7992
        524288  |@                                 8003
       1048576  |@                                 6017
       2097152  |                                  2086
       4194304  |                                   198
       8388608  |                                    48
      16777216  |                                    37
      33554432  |                                     7
      67108864  |                                     1
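The same contention data is also available from DTrace's lockstat provider; a small sketch (editorial, not from the deck) totals time spent blocked on adaptive mutexes, keyed by the blocking kernel stack.

      lockstat:::adaptive-block
      {
          /* arg1 is the time spent blocked, in nanoseconds */
          @[stack(5)] = sum(arg1);
      }

      END
      {
          trunc(@, 10);            /* keep the ten most expensive stacks */
          normalize(@, 1000000);   /* report milliseconds */
      }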
  31. Happy Ending
      • Re-wrote the OpenZFS write throttle
      • Removed tons of inefficiencies in the code
      • Broke up the scorching hot lock

      [chart: write latency in seconds (log scale), ZFS vs. OpenZFS]
  32. Lessons Learned
      • Remedies before diagnosis will cause anger
      • Look at the real problem, not a reproduction
      • Don’t let your software hide pain
      • Right tools, right questions
      • Iterate, iterate, iterate
      • Be magnanimous with the customer
  33. BACKUP SLIDES
  34. “Blah blah CDDL blah blah blah… 💩 😠”
