Benchmarking Best Practices 102
Presented by Maxim Kuvyrkov
BKK16-300, March 9, 2016
Linaro Connect BKK16
Overview
● Revision (Benchmarking Best Practices 101)
● Reproducibility
● Reporting
Revision
Previously, in Benchmarking-101...
● Approach benchmarking as an experiment.
Be scientific.
● Design the experiment in light of your goal.
● Repeatability:
○ Understand and control noise.
○ Use statistical methods to find truth in noise.
And we briefly mentioned
● Reproducibility
● Reporting
So let’s talk some more about those.
Reproducibility
An experiment is reproducible if external
teams can run the same experiment over long
periods of time and get commensurate
(comparable) results.
Achieved if others can repeat what we did
and get the same results as us, within the
given confidence interval.
From Repeatability to Reproducibility
We must log enough information that anyone
else can repeat our experiments.
We have achieved reproducibility if they can
get the same results, within the given
confidence interval.
Logging: Target
● CPU/SoC/Board
○ Revision, patch level, firmware version…
● Instance of the board
○ Is board 1 really identical to board 2?
● Kernel version and configuration
● Distribution
Example: Target
Board: Juno r0
CPU: 2 * Cortex-A57r0p0, 4 * Cortex-A53r0p0
Firmware version: 0.11.3
Hostname: juno-01
Kernel: 3.16.0-4-generic #1 SMP
Distribution: Debian Jessie
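As an illustration (not from the original deck), a minimal Python sketch that gathers most of this target information automatically on a Linux board; board revision and firmware version usually still have to be recorded by hand:

```python
# A minimal sketch: capture target details next to each benchmark run.
# Assumes a Linux target with Python available.
import json
import platform
import socket
from pathlib import Path

def log_target_info(out="target-info.json"):
    info = {
        "hostname": socket.gethostname(),           # e.g. juno-01
        "kernel": platform.release(),               # e.g. 3.16.0-4-generic
        "kernel_build": platform.version(),         # e.g. #1 SMP ...
        "machine": platform.machine(),              # e.g. aarch64
        "cpuinfo": Path("/proc/cpuinfo").read_text(),
        "os_release": Path("/etc/os-release").read_text(),
        # Board revision and firmware version are not exposed uniformly;
        # record them by hand or from vendor tools.
    }
    Path(out).write_text(json.dumps(info, indent=2))
    return info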
Logging: Build
● Exact toolchain version
● Exact libraries used
● Exact benchmark source
● Build system (scripts, makefiles etc)
● Full build log
Others should be able to acquire and rebuild all
of these components.
Example: Build
Toolchain: Linaro GCC 2015.04
CLI: -O2 -fno-tree-vectorize -DFOO
Libraries: libBar.so.1.3.2, git.linaro.org/foo/bar
#8d30a2c508468bb534bb937bd488b18b8636d3b1
Benchmark: MyBenchmark, git.linaro.org/foo/mb
#d00fb95a1b5dbe3a84fa158df872e1d2c4c49d06
Build System: abe, git.linaro.org/toolchain/abe
#d758ec431131655032bc7de12c0e6f266d9723c2
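One way to collect these identifiers automatically is to query each checkout and the compiler directly. A minimal sketch; the repository paths and compiler name are hypothetical:

```python
# Sketch: record the toolchain version and exact commit of each component.
# Repository paths below are hypothetical examples.
import subprocess

def git_sha(repo_dir):
    """Return the HEAD commit hash of a git checkout."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir, text=True).strip()

def toolchain_version(cc):
    """Return the first line of `<cc> --version`."""
    return subprocess.check_output([cc, "--version"], text=True).splitlines()[0]

build_info = {
    "toolchain": toolchain_version("aarch64-linux-gnu-gcc"),
    "cflags": "-O2 -fno-tree-vectorize -DFOO",
    "benchmark_sha": git_sha("src/mybenchmark"),   # hypothetical path
    "build_system_sha": git_sha("src/abe"),        # hypothetical path
}
print(build_info)
```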
Logging: Run-time Environment
● Environment variables
● Command-line options passed to benchmark
● Mitigation measures taken
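A minimal sketch of capturing these at the start of every run; the mitigation entries are illustrative assumptions, since mitigations must be recorded by hand:

```python
# Sketch: log the run-time environment next to each result file.
import json
import os
import sys

run_env = {
    "argv": sys.argv,            # command-line options passed to benchmark
    "env": dict(os.environ),     # full environment variables
    # Mitigation measures are not discoverable automatically; record them
    # explicitly. These entries are illustrative examples only.
    "mitigations": ["cpufreq governor: performance", "taskset -c 2"],
}
with open("run-env.json", "w") as f:
    json.dump(run_env, f, indent=2)
```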
Logging: Other
All of the above may need modification
depending on what is being measured.
● Network-sensitive benchmarks may need
details of network configuration
● IO-sensitive benchmarks may need details
of storage devices
● And so on...
Long Term Storage
All results should be stored with information
required for reproducibility
Results should be kept for the long term
● Someone may ask you for some information
● You may want to do some new analysis in
the future
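One simple scheme (an assumption; any equivalent layout works) is to bundle each run's results with its logged metadata into a single timestamped archive:

```python
# Sketch: archive results together with the reproducibility metadata
# (target-info.json, build logs, run-env.json) in one timestamped bundle.
import tarfile
import time
from pathlib import Path

def archive_run(run_dir="run-output"):
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out = Path(f"results-{stamp}.tar.gz")
    with tarfile.open(out, "w:gz") as tar:
        tar.add(run_dir, arcname=f"results-{stamp}")
    return out
```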
Reporting
● Clear, concise reporting allows others to
utilise benchmark results.
● Does not have to include all data required for
reproducibility.
● But that data should be available.
● Do not assume too much reader knowledge.
○ Err on the side of over-explanation
Reporting: Goal
Explain the goal of the experiment
● What decision will it help you to make?
● What improvement will it allow you to
deliver?
Explain the question that the experiment asks
Explain how the answer to that question helps
you to achieve the goal
Reporting
● Method: Sufficient high-level detail
○ Target, toolchain, build options, source, mitigation
● Limitations: Acknowledge and justify
○ What are the consequences for this experiment?
● Results: Discuss in context of goal
○ Co-locate data, graphs, discussion
○ Include units: numbers without units are useless
○ Include statistical data
○ Use the benchmark’s metrics
Presentation of Results
Graphs are always useful.
Tables of raw data are also useful.
Statistical context is essential:
● Number of runs
● Mean (state which: arithmetic, geometric, ...)
● Standard deviation
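A minimal sketch of computing that context; the 1.96 factor assumes roughly normal noise and enough runs (use a t-value for small n):

```python
# Sketch: compute the statistical context to report with each result.
import statistics

def summarize(samples):
    n = len(samples)
    mean = statistics.mean(samples)     # arithmetic mean; state which mean!
    stdev = statistics.stdev(samples)   # sample standard deviation
    # ~95% confidence interval half-width for the mean.
    ci95 = 1.96 * stdev / n ** 0.5
    return {"runs": n, "mean": mean, "stdev": stdev, "ci95": ci95}

# e.g. ten runs of one benchmark, times in seconds (invented numbers)
print(summarize([12.1, 12.3, 11.9, 12.0, 12.4, 12.2, 12.1, 12.3, 12.0, 12.2]))
```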
Experimental Conditions
Precisely what to report depends on what is
relevant to the results
The following are guidelines
Of course, all the environmental data should be
logged and therefore available on request
Include
Highlight key information, even if it could be
derived. For example:
● All toolchain options
● Noise mitigation measures
● Testing domain
● For, e.g., a memory-sensitive benchmark:
bus speed and cache hierarchy
Leave Out
Everything not essential to the main point
● Environment variables
● Build logs
● Firmware
● ...
All of this information should be available to be
provided on request.
Graphs: Strong Suggestions
Speedup Over Baseline (1/3)
Misleading scale
● A is about 3.5%
faster than it was
before, not 103.5%
Obfuscated regression
● B is a regression
Speedup Over Baseline (2/3)
Baseline becomes 0
Title now correct
Regression clear
But no confidence interval.
Speedup Over Baseline (3/3)
Error bars tell us more
● Effect on D can be
disregarded
● Effect on A is real,
but noisy.
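Putting the three points together, a minimal matplotlib sketch of the recommended form, with all numbers invented for illustration: baseline at 0%, error bars from the confidence interval, and units and direction of 'good' stated on the graph itself:

```python
# Sketch: speedup over baseline as % change from 0, with error bars.
# All data values are invented for illustration.
import matplotlib.pyplot as plt

benchmarks = ["A", "B", "C", "D"]
speedup_pct = [3.5, -2.0, 1.2, 0.4]   # % change vs. baseline (0 = no change)
ci95 = [1.5, 0.5, 0.3, 0.6]           # confidence interval half-widths

fig, ax = plt.subplots()
ax.bar(benchmarks, speedup_pct, yerr=ci95, capsize=4)
ax.axhline(0, color="black", linewidth=0.8)              # baseline at 0%
ax.set_ylabel("Speedup over baseline (%)")               # include units
ax.set_title("Speedup over baseline (higher is better)") # flag 'good'
fig.savefig("speedup.png")
```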
Labelling (1/2)
What is the unit?
What are we comparing?
Labelling (2/2)
Graphs: Weak Suggestions
Show the mean
Direction of ‘Good’ (1/2)
When “speedup” changes to “time to execute”,
the direction of “good” flips.
If possible, maintain a constant direction of
‘good’.
Direction of ‘Good’ (2/2)
If you have to change the direction of ‘good’,
flag the direction (everywhere).
It can be helpful to flag it anyway.
Consistent Order
Sorting bars by size presents improvements
neatly.
But it makes different graphs in the same
report hard to compare.
Scale (1/2)
A few high scores make other results hard to
see.
A couple of alternatives may be clearer...
Scale (2/2)
Summary
● Log everything, in detail
● Be clear about:
○ What the goal of your experiment is
○ What your method is, and how it achieves your
purpose
● Present results
○ Unambiguously
○ With statistical context
● Relate results to your goal
