Testing in Production (TiP) has moved from taboo to accepted practice because it measures reality and provides actionable feedback. Its risks can be mitigated by using proven methodologies, methodologies born of experience, and tools built specifically to handle TiP's unique requirements.
Big Data enables TiP: analyzing real user behavior yields realistic tests. During testing, cloud-based resources handle the huge data volumes, which are processed with in-memory technology designed for this purpose.
Microsoft’s Seth Eliot is a TiP pioneer and SOASTA’s Rob Holcomb has helped evolve the practice with hundreds of SOASTA customers. Catch this webinar, now on-demand, as they dig into:
How to leverage both active and passive monitoring for TiP
Testing and measuring system stress in production
Experimentation and iterative improvement
How SOASTA CloudTest facilitates TiP for organizations of all sizes
5. About Seth
o Currently with Microsoft Engineering Excellence, focused on helping teams transition to The Cloud
o Previously with Bing, and before that Amazon.com
The author is an employee of Microsoft Corporation. The views expressed in this presentation are those of the author and do not necessarily reflect any views or positions of Microsoft, nor imply any relationship between Microsoft and SOASTA.
o Seth wishes to thank Brad Johnson, Rob Holcomb and SOASTA for this opportunity
10. The Three (or more) V's of Big Data [Strata Jan 2012]
11. TestOps
o Monitoring: What Ops does
o Testing: What Test does
o TestOps: Change (augment) the “signal” used for quality
From Test Results…
…to Big Data
12. The Big Data Signal
o Is often found in Production
o May not always be “Big”
o The Quality Insights however should be Big
o TestOps: use this Big Data for quality assessment
o Big Data is in production
o Therefore we Test in Production
14. The Big Data Pipeline
o Facebook: Developers Instrument Everything
o Amazon: Central Monitoring
o Add some config Trending and Alerts
o Netflix: Custom libraries + AWS CloudWatch
[Diagram: servers reporting CPU metrics into the pipeline]
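The "add some config Trending and Alerts" idea can be sketched in a few lines: instrument everything into a central store, then layer a trailing-window trend check on top. This is a toy illustration, not Facebook's, Amazon's, or Netflix's actual pipeline; all names and thresholds are invented.

```python
from collections import defaultdict, deque
from statistics import mean

class MetricPipeline:
    """Toy central-monitoring store with trending and alerting.
    Class and field names are hypothetical."""

    def __init__(self, window=10, threshold=2.0):
        self.window = window        # samples kept per metric for trending
        self.threshold = threshold  # alert if latest > threshold * trailing mean
        self.series = defaultdict(lambda: deque(maxlen=window))

    def emit(self, metric, value):
        """Instrument everything: every component reports here."""
        self.series[metric].append(value)

    def alerts(self):
        """Fire an alert when the newest sample spikes above trend."""
        fired = []
        for metric, values in self.series.items():
            if len(values) >= 3:
                *history, latest = values
                if latest > self.threshold * mean(history):
                    fired.append(metric)
        return fired
```

A CPU series of 50, 52, 51 followed by 150 would trip the alert, while a steady series would not.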
15. How does TiP fit into Test strategy?
Does TiP Replace Up-Front Testing (UFT)? The Death of BUFT (Big UFT)?
[Diagram: a test strategy that is all BUFT, versus one that blends UFT with TiP]
16. Four Categories of TiP
o Passive Monitoring
o with Real Data
o Active Monitoring
o with Synthetic Transactions
o Experimentation
o on Real Users
o System Stress
o of the Service and Environment
18. Facebook Mines Big Data for Quality
o Ganglia: “5 million metrics”
o CPU, network usage
[Cook, June 2010]
19. User Performance Testing
o Collect specific telemetry about how long operations take from the user point of view
o Real User Data – Real User Experience
o End to End = complete request and response cycle
o From user to back-end round-trip
o Include traffic to partners, dependency response time
o Measured from the user point of view
o From around the world
o From a diversity of browsers, OSes, devices
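The aggregation side of this can be sketched simply: each real-user beacon carries an end-to-end time plus user dimensions, and the backend buckets and summarizes per dimension. The field names here are illustrative, not any product's actual schema.

```python
from collections import defaultdict
from statistics import median

def plt_by_dimension(beacons, dimension):
    """Summarize real-user page-load times (PLT) by a user dimension
    such as browser, OS, or country. Beacon fields are hypothetical:
    each beacon is a dict with 'plt_ms' plus dimension keys."""
    buckets = defaultdict(list)
    for beacon in beacons:
        buckets[beacon[dimension]].append(beacon["plt_ms"])
    # Median is robust to the long tail typical of real-user timings.
    return {key: median(times) for key, times in buckets.items()}
```

For example, grouping by `"browser"` yields one median PLT per browser, as experienced by real users.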
20. Hotmail JSI User Performance Testing: Big Data?
o Hotmail's JavaScript Instrumentation (JSI)
o Budget for 500 Million measurements / month
o Scale for backend collection and analysis
o PLT by browser, OS, country, cluster, etc.
o As experienced by Millions of Real Users
21. Hotmail JSI User Performance Testing
o PLT by browser, OS, country, cluster, etc.
22. User Performance Testing Examples
o Hotmail
o Re-architected from the ground up around performance
o Read messages are 50% faster
o Windows Azure™
o Every API: Tracks how many calls were made, how many succeeded, and how long each call took to process
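The per-API accounting the Windows Azure bullet describes can be sketched with a decorator that counts calls, tallies successes, and times each request. The decorator and stats store below are assumptions for illustration, not an Azure API.

```python
import time
from collections import defaultdict

# Per-API counters: calls made, calls succeeded, total processing time.
stats = defaultdict(lambda: {"calls": 0, "ok": 0, "total_ms": 0.0})

def tracked(func):
    """Wrap an API entry point so every call is counted and timed,
    even when it raises."""
    def wrapper(*args, **kwargs):
        entry = stats[func.__name__]
        entry["calls"] += 1
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            entry["ok"] += 1      # only reached on success
            return result
        finally:
            entry["total_ms"] += (time.perf_counter() - start) * 1000
    return wrapper
```

Applying `@tracked` to each API handler yields the calls/successes/duration signal without touching the handler bodies.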
24. TiP Test Execution
o From the Inside
o Against internal APIs
o Automated
o From the Outside
o From User Entry Point
o E2E Scenario in Production
o Automated
o or Manual
25. This looks like this, but in Production, which is OK, but… can we leverage Big Data?
26. Active Monitoring
o Microsoft Exchange
o Instead of a pass/fail signal, look at thousands of continuous runs
o Did we meet the “five nines” (99.999%) availability for the scenario?
o Is the scenario slower this release than last? (performance)
[Deschamps, Johnston, Jan 2012]
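The shift from a single pass/fail to a continuous availability signal is easy to illustrate. A minimal sketch, assuming each synthetic scenario run reduces to a boolean:

```python
def availability(results):
    """Fraction of successful runs; results is one boolean per
    synthetic-transaction run."""
    return sum(results) / len(results)

def meets_five_nines(results):
    """Did the scenario meet 'five nines' (99.999%) availability?"""
    return availability(results) >= 0.99999
```

With thousands of runs, the same data also supports the release-over-release performance comparison the slide mentions, by trending run durations instead of booleans.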
27. Test Data Handling
o Synthetic Tests + Real Data = Potential Trouble
o Avoid it
o Tag it
o Clean it up
o Example: Facebook Test Users
o Cannot interact with real users
o Can only friend other Test Users
o Create 100s
o Programmatic Control
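The "tag it" option above can be sketched as follows: synthetic transactions carry a marker so production analytics can exclude them. The flag and field names are invented for illustration.

```python
SYNTHETIC_FLAG = "is_synthetic"   # field name is an assumption

def make_order(user, amount, synthetic=False):
    """Create an order record; synthetic test traffic is tagged
    so it can be filtered out (or cleaned up) later."""
    return {"user": user, "amount": amount, SYNTHETIC_FLAG: synthetic}

def real_revenue(orders):
    """Business metrics must exclude tagged synthetic transactions,
    or test traffic pollutes the real-user signal."""
    return sum(o["amount"] for o in orders if not o[SYNTHETIC_FLAG])
```

The same tag also supports the "clean it up" option: a periodic job can delete every record where the flag is set.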
29. Experimentation
“To have a great idea,
have a lot of them”
-- Thomas Edison
o Try new things… in production
o Build on successes
o Cut your losses… before they get expensive
30. Mitigate Risk with Exposure Control
o Launch a new Service – Everyone sees it
o Exposure Control – only some see it
By Browser / By Location / By Percent (scale)
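Exposure control is typically implemented as a deterministic hash bucket, so a given user consistently sees (or does not see) the new feature. A minimal sketch with illustrative parameter names:

```python
import hashlib

def exposed(user_id, percent, browser=None, allowed_browsers=None):
    """Decide whether this user sees the new feature.
    Gates by browser (if a filter is given), then by a stable
    percentage bucket derived from the user id."""
    if allowed_browsers is not None and browser not in allowed_browsers:
        return False
    # Hash, not random: the same user always lands in the same bucket.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```

Location-based exposure would add an analogous filter on a geo field; ramping to scale is just raising `percent`.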
31. Example: Controlled Test Flight: Netflix
1B API requests per day
“Canary” Deployment
[Cockcroft, March 2012]
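A canary judgment can be sketched as comparing the canary fleet's error rate against the baseline fleet before promoting the new build. The slack factor and names below are invented for illustration, not Netflix's implementation.

```python
def judge_canary(baseline_errors, baseline_total,
                 canary_errors, canary_total, slack=1.5):
    """Promote the canary build only if its error rate is within
    'slack' times the baseline fleet's error rate."""
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    return "promote" if canary_rate <= baseline_rate * slack else "roll back"
```

At 1B requests per day, even a small slice of traffic gives the canary a statistically useful sample quickly.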
34. Load Testing in Production
o Injects load on top of real user traffic
o Monitors for performance
o To assess system capabilities and scalability
o Big Data
o Traffic mix: real user queries, simulated scenarios
o Real-time telemetry: Monitor and Back Off
o After-the-fact analysis
o Tune SLAs/Targets
o Tune real-time monitors and alerts
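The "Monitor and Back Off" loop can be sketched as ramping injected load until a real-time latency signal crosses the SLA target, then retreating to the last safe rate. The `latency_probe` callable and all numbers are assumptions.

```python
def ramp_load(latency_probe, start_rps=100, step=100,
              max_rps=2000, slo_ms=500):
    """Step up injected load (on top of real user traffic) until the
    observed latency exceeds the SLO, then back off one step.
    latency_probe(rps) -> observed latency in ms at that injection rate."""
    rps = start_rps
    while rps <= max_rps:
        if latency_probe(rps) > slo_ms:
            return rps - step   # back off: last rate that met the SLO
        rps += step
    return max_rps              # SLO held all the way to the ceiling
```

The returned rate feeds the after-the-fact analysis: it is a measured capacity figure for tuning SLAs and real-time alert thresholds.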
35. Load Testing in Production
o Rob will discuss some SOASTA case studies
o Identified issues that could only be found in production
o Agile approach to implementation
36. Destructive Testing in Production
o Google, first year of a new data center [Google DC, 2008]
o 20 rack failures, 1,000 server failures, and thousands of hard drive failures
o High Availability means you must embrace failure
o How do you test this?
37. Netflix Tests its “Rambo Architecture”
o “…system has to be able to succeed, no matter what, even all on its own”
o Test with Fault Injection [Netflix Army, July 2011]
o Netflix Simian Army
o Chaos Monkey randomly kills a production instance in AWS
o Chaos Gorilla simulates an outage of an entire Amazon Availability Zone (AZ)
o Janitor Monkey, Security Monkey, Latency Monkey…
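Chaos Monkey in miniature: randomly terminate one instance and verify the fleet still serves. This toy is not the Netflix implementation; `fleet` here is just a list standing in for real cloud instances.

```python
import random

def chaos_monkey(instances, rng=random):
    """Pick a random production instance and 'terminate' it.
    Injecting the rng makes the chaos reproducible in tests."""
    victim = rng.choice(list(instances))
    instances.remove(victim)    # simulate the termination
    return victim

def service_up(instances, min_healthy=1):
    """The availability check the fault injection is meant to exercise."""
    return len(instances) >= min_healthy
```

Running this continuously in production forces engineers to build services that survive instance loss, which is the point of the Simian Army.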
46. References
[Google Talk, June 2007] Google: Seattle Conference on Scalability: Lessons In Building Scalable Systems, Reza Behforooz
http://video.google.com/videoplay?docid=6202268628085731280
[Unpingco, Feb 2011] Edward Unpingco; Bug Miner; Internal Microsoft Presentation, Bing Quality Day
[Barranco, Dec 2011] René Barranco; Heuristics-Based Testing; Internal Microsoft Presentation
[Dell, 2012] http://whichtestwon.com/dell%e2%80%99s-site-wide-search-box-test
[Microsoft.com, TechNet] http://technet.microsoft.com/en-us/library/cc627315.aspx
[Cockcroft, March 2012] http://perfcap.blogspot.com/2012/03/ops-devops-and-noops-at-netflix.html
[Deschamps, Johnston, Jan 2012] Experiences of Test Automation; Dorothy Graham; Jan 2012; ISBN 0321754069; Chapter: “Moving to the Cloud: The Evolution of TiP, Continuous Regression Testing in Production”; Ken Johnston, Felix Deschamps
[Google DC, 2008] http://content.dell.com/us/en/gen/d/large-business/google-data-center.aspx?dgc=SM&cid=57468&lid=1491495
http://perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
[Kohavi, Oct 2010] Tracking Users’ Clicks and Submits: Tradeoffs between User Experience and Data Loss
http://www.exp-platform.com/Pages/TrackingClicksSubmits.aspx
[Strata Jan 2012] What is big data? - An introduction to the big data landscape
http://radar.oreilly.com/2012/01/what-is-big-data.html
47. References, continued
[Netflix Army, July 2011] The Netflix Simian Army; July 2011
http://techblog.netflix.com/2011/07/netflix-simian-army.html
[Google-Wide Profiling, 2010] Ren, Gang, et al. Google-wide Profiling: A Continuous Profiling Infrastructure for Data Centers. [Online] July 30, 2010.
research.google.com/pubs/archive/36575.pdf
[Facebook ships, 2011] http://framethink.blogspot.com/2011/01/how-facebook-ships-code.html
[Google BusinessWeek, April 2008] How Google Fuels Its Idea Factory, BusinessWeek, April 29, 2008; http://www.businessweek.com/magazine/content/08_19/b4083054277984.htm
[IBM 2011] http://www.ibm.com/developerworks/websphere/techjournal/1102_supauth/1102_supauth.html
[Kokogiak, 2006] http://www.kokogiak.com/gedankengang/2006/08/amazons-digital-video-sneak-peek.html
[Google GTAC 2010] Whittaker, James. GTAC 2010: Turning Quality on its Head. [Online] October 29, 2010.
http://www.youtube.com/watch?v=cqwXUTjcabs&feature=BF&list=PL1242F05D3EA83AB1&index=16.
[Google, JW 2009] http://googletesting.blogspot.com/2009/07/plague-of-homelessness.html
[STPCon, 2012] STPCon Spring 2012 - Testing Wanted: Dead or Alive – March 26, 2012
[Cook, June 2010] Ganglia, OSD: Cook, Tom. A Day in the Life of Facebook Operations. Velocity 2010. [Online] June 2010.
http://www.youtube.com/watch?v=T-Xr_PJdNmQ
Editor's Notes
SOASTA
o Early and often: Target
o Don't wait until the last minute: American Girl
o TiP for real results: Intuit TurboTax (hundreds of thousands of users), in a real production environment
o Test mix: spike (Nike shoe release)
o Integrated monitoring data: Dillard's with DynaTrace, pushing app servers to the limit; stop!