Tuesday, July 10, 12
Inside the Atlassian OnDemand private cloud


               George Barnett
               SAAS Platform Architect



In 2010 a team of engineers moved into our secret lair (above a pub) to re-imagine our hosted platform.

Launch - October 2011: 1,000 VMs
6 months later: 13,500 VMs

We have a cloud. So what?


We also had a cloud... and:
                • VM sprawl
                • Poor performance
                • Over-provisioning
                • Slow deployments
                • Low visibility into the full stack


Virtualisation often creates new challenges but does nothing about existing ones.

Focus



Be less flexible about what infrastructure you provide.

“You can use any database you like, as long as it’s PostgreSQL 8.4.”



                         #summit12




• Stop trying to be everything to everyone
                       • (we have other clouds within Atlassian)

                • Lower operational complexity
                • Easier to provide a deeply integrated, well-supported toolchain
                • Smaller test matrix




Fail fast. Learn quickly.


Do as little as possible,
                       deploy and use it.



Block-1
                A small scale model of the initial proposed platform
                architecture. 4 desktop machines and a switch.


                Purpose: Validate design, evaluate failure modes.

                http://history.nasa.gov/Apollo204/blocks.html



Block-1
                       Applications do not fall over.

                       Network boot assumptions validated.

Creation of VMs over NFS was too resource- and time-
                       intensive. (more on this later)



Block-2
                A large scale model of the platform architecture.


                Purpose: Validate hardware resource assumptions and
                compare CPU vendors.

                http://history.nasa.gov/Apollo204/blocks.html



Block-2
Customers-per-GB-of-RAM metric validated.

                       VM distribution and failover tools work.

                       Initial specs of compute hardware were too conservative.
                       Decided to add 50% more RAM.



Hardware



Challenge
                Existing platform hardware was a poor fit for our workload.


                Memory and IO were heavily constrained, but CPU was not.




Monitoring
                We took 6 months' worth of monitoring data from our
                existing platform.
                We used this data to determine the right mix of
                hardware.
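One way to turn months of monitoring samples into a sizing decision is to look at high-percentile utilisation per resource and flag the ones that run hot. The sketch below is a minimal illustration under assumptions: each sample is taken to record CPU, memory and IO utilisation as fractions of capacity. The sample values, the nearest-rank 95th percentile and the 0.8 threshold are all hypothetical, not figures from the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def constrained_resources(samples, pct=95, threshold=0.8):
    """Return {resource: percentile} for resources above the threshold.

    samples: list of dicts such as {"cpu": 0.3, "mem": 0.9, "io": 0.85},
    each value a utilisation fraction (a hypothetical sample format).
    """
    result = {}
    for resource in samples[0]:
        p = percentile([s[resource] for s in samples], pct)
        if p > threshold:
            result[resource] = p
    return result


# Hypothetical samples: memory and IO run hot, CPU does not.
samples = [
    {"cpu": 0.25, "mem": 0.92, "io": 0.85},
    {"cpu": 0.30, "mem": 0.95, "io": 0.90},
    {"cpu": 0.20, "mem": 0.88, "io": 0.82},
    {"cpu": 0.35, "mem": 0.97, "io": 0.93},
]
print(constrained_resources(samples))
```

A high percentile, rather than the mean, is used so that sustained peaks drive the hardware mix. With real data you would feed in far more samples per node.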




• 10 x Compute nodes (144 GB RAM, 12 cores, no disks)
                • 3 x Storage nodes (24 disks)
                • Each rack delivered fully assembled
                       • Unwrap, provide power and networking
                       • Connected to customers in ~2 hours




Advantage #1
                Reliable.

Each machine goes through a 2-day
                burn-in before it goes into the
                rack.



Advantage #2
                Neat.




Advantage #3
                Consistent.




Advantage #4
                Easy to deploy.




No disks.



Wait. What?


Challenge
Existing compute infrastructure used local disk for swap
                and hypervisor boot.
                Once we got the memory density right, swap was no longer needed, leaving only boot.




• No disks in compute infrastructure
                       • Avoid spinning 20 more disks per rack just for a hypervisor OS

                • Evaluated booting from:
                       • USB drives (unreliable and slow!)
                       • NFS (what if the network goes away?)
                       • Custom binary initrd image + kernel




• Image is a ~170 MB gzipped filesystem
                       • Downloaded on boot and extracted into RAM (~400 MB)

                • No external dependencies after boot
                • All compute nodes boot from the same image
                       • Reboot to a known state




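Netboot sequence (Compute Node <-> Netboot Server)
                1. PXE (NIC firmware) sends a DHCP request; the server's DHCP replies.
                2. PXE fetches gPXE from the server over TFTP.
                3. gPXE (Etherboot) sends a second DHCP request; the server's DHCP replies.
                4. gPXE fetches a boot script over HTTP.
                5. The boot script fetches the kernel & boot image over HTTP.
                6. The compute node boots.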
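The boot chain above relies on the netboot server answering the same node twice with different boot targets: first a gPXE binary over TFTP for the NIC's PXE ROM, then an HTTP boot script once gPXE is running. A common way to tell the two requests apart is the DHCP user class, which gPXE sets to "gPXE". The sketch below models only that server-side decision plus a minimal boot script; the file name `undionly.kpxe`, the host `netboot.example` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names: substitute your own netboot server layout.
GPXE_BINARY = "undionly.kpxe"                        # served over TFTP to the PXE ROM
BOOTSCRIPT_URL = "http://netboot.example/boot.gpxe"  # served over HTTP to gPXE


@dataclass
class BootReply:
    protocol: str   # "tftp" or "http"
    filename: str


def choose_boot_reply(user_class: Optional[str]) -> BootReply:
    """Pick the DHCP boot filename for a compute node.

    First pass: the PXE ROM does not identify as gPXE, so it is handed
    the gPXE binary over TFTP. Second pass: gPXE sends the DHCP user
    class "gPXE", so it is handed the HTTP boot script instead.
    """
    if user_class == "gPXE":
        return BootReply("http", BOOTSCRIPT_URL)
    return BootReply("tftp", GPXE_BINARY)


def bootscript(kernel_url: str, image_url: str) -> str:
    """Render a minimal gPXE script that loads a kernel and a boot image."""
    return "\n".join([
        "#!gpxe",
        f"kernel {kernel_url}",
        f"initrd {image_url}",
        "boot",
    ])


print(choose_boot_reply(None))
print(choose_boot_reply("gPXE"))
```

Because the second reply is decided by the user class rather than by MAC address, every compute node gets the same image. This matches the slides' "all compute nodes boot from the same image".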


Sharp Edges.
                • No swap == provision carefully
                       • Not a problem if you automate provisioning

                • Treat the running hypervisor image like an appliance
                       • Don’t change code in place: rebuild the image and reboot
                       • Doing this often? You have too many services in the hypervisor
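With no swap, every container's memory limit must fit in physical RAM, so placement needs an up-front check rather than relying on the kernel to page things out. Below is a minimal sketch of an automated admission check with first-fit placement. It assumes the RAM-resident hypervisor image (~400 MB per the slides) plus a hypothetical safety margin is reserved on each node. The function names, the margin and the example allocations are illustrative only.

```python
def can_place(node_ram_mb, allocated_mb, request_mb,
              hypervisor_mb=400, margin_mb=1024):
    """Return True if a container of request_mb fits on a swapless node.

    node_ram_mb:   physical RAM of the compute node
    allocated_mb:  memory limits of containers already on the node
    hypervisor_mb: RAM used by the netbooted hypervisor image
    margin_mb:     hypothetical headroom; no swap exists to absorb overcommit
    """
    free_mb = node_ram_mb - hypervisor_mb - margin_mb - sum(allocated_mb)
    return request_mb <= free_mb


def place(nodes, request_mb):
    """First-fit placement: return the first node name with room, else None.

    nodes: dict of node name -> (node_ram_mb, list of allocated_mb)
    """
    for name, (ram_mb, allocated) in nodes.items():
        if can_place(ram_mb, allocated, request_mb):
            return name
    return None


# Hypothetical example: two 144 GB nodes, the first one nearly full.
nodes = {
    "compute-01": (144 * 1024, [2048] * 71),
    "compute-02": (144 * 1024, [2048] * 10),
}
print(place(nodes, 2048))
```

Refusing a placement outright is the safe failure mode here. On a node without swap, overcommitting memory ends with the OOM killer rather than with slowdown.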




Software



Challenge
                Virtualisation is often inefficient.
                There’s a memory and CPU penalty which is hard to
                avoid.




OpenVZ
                • Linux containers
                       • Basis for Parallels Virtuozzo Containers
                       • LXC isn’t there yet

                • No guest OS kernels
                       • No performance hit
                       • Better resource sharing


Performance



http://wiki.openvz.org/Performance/vConsolidate-SMP


http://wiki.openvz.org/Performance/LAMP


Resource de-duping



“Don’t load the same thing twice”

Challenge
                JVMs aren’t lightweight.




• Full virtualisation does a poor job of this
                       • 50 VMs = 50 kernels + 50 caches + 50 copies of shared libraries!
                       • Memory de-dupe combats this, but burns CPU.

                • Memory de-dupe works across all OSes, but we don’t need that:
                       • We don’t use Windows.
                       • By being less flexible, we can exploit Linux-specific features.
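The "50 VMs = 50 kernels + 50 caches + 50 shared libs" point is simple multiplication. The rough sketch below makes it concrete using hypothetical per-guest sizes; they are illustrative, not measurements. It compares giving every guest its own copy (full virtualisation) against loading one kernel and one OS image for all guests (containers on a shared kernel and shared read-only image).

```python
def base_os_memory_mb(guests, kernel_mb, os_cache_mb, shared_libs_mb, shared_os):
    """Memory spent on kernels, cached OS files and shared libraries.

    shared_os=False models full virtualisation: every guest loads its own copy.
    shared_os=True models containers on one kernel and one OS image, where
    these are loaded once for all guests.
    """
    per_copy = kernel_mb + os_cache_mb + shared_libs_mb
    copies = 1 if shared_os else guests
    return copies * per_copy


# Hypothetical sizes in MB, for illustration only.
sizes = dict(kernel_mb=50, os_cache_mb=200, shared_libs_mb=100)
full = base_os_memory_mb(50, shared_os=False, **sizes)
containers = base_os_memory_mb(50, shared_os=True, **sizes)
print(full, containers)  # 17500 350
```

Memory de-dupe can claw some of the `full` figure back after the fact, at a CPU cost. Sharing the kernel and OS image avoids creating the duplicates in the first place.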




OpenVZ containers all share the same kernel.

• Provide a single OS image to all containers and get these benefits for free:
                       • Shared libraries only load once.
                       • The OS is cached only once.
                       • The OS image is the same on every instance.




Challenge
                If all containers share the same OS image, then
                managing state is a nightmare!
                One bad change in one container would break them all!




• But managing state on multiple machines is a solved problem!
                       • What if you have >10,000 machines?


                • Why are you modifying the OS anyway?




Does your iPhone upgrade iOS when you install an app?

“Fix problems by removing them, not by adding systems to manage them.”




                        #summit12




Read-only OS images



Data classes in a system
                • OS and system daemon code
                • Application code
                • Application and user data




Container
                /data (R/W)       Application and user data (/data/service/)
                /sw (read-only)   Applications, JVMs, configs
                / (read-only)     OS tools, system-supplied code
OpenVZ Kernel

How?
                • Storage nodes export /e/ro/ & /e/rw
                • Build an OS distro inside a chroot.
                       • Use whatever tools you are comfortable with.

                • Put this chroot tree in the RO location on storage nodes
                • Make a “data” dir in the RW location for each container


How?
                • On container start, bind mount:
                       /net/storage-n/e/ro/os/linux-image-v1/
                       -> /vz/<ctid>/root
                • Replace etc, var & tmp with a memfs
                       • Linux expects to be able to write to these

                • Mount the container’s data dir (RW) to /data
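The container start-up steps above can be expressed as a list of `mount(8)` invocations. The sketch below only builds the commands; it does not run them. The RO image path and `/vz/<ctid>/root` come from the slides. The RW data path (`/net/storage-n/e/rw/<ctid>/data`) and the `OS_IMAGE_VERSION` variable are hypothetical. The sketch uses tmpfs as the memory filesystem. It also leaves out `/sw`, whose source the slides do not spell out.

```python
from pathlib import PurePosixPath

# RO image location follows the slides; the RW data path is an assumption.
STORAGE = "/net/storage-n/e"
OS_IMAGE_VERSION = "linux-image-v1"  # flip OS versions by changing this


def mount_plan(ctid: int, os_version: str = OS_IMAGE_VERSION) -> list[list[str]]:
    """Return the mount(8) commands that assemble a container root at start-up."""
    root = PurePosixPath(f"/vz/{ctid}/root")
    image = f"{STORAGE}/ro/os/{os_version}/"
    data = f"{STORAGE}/rw/{ctid}/data"  # hypothetical per-container RW dir
    cmds = [
        # Shared OS image: bind-mount it, then remount the bind read-only.
        ["mount", "--bind", image, str(root)],
        ["mount", "-o", "remount,ro,bind", str(root)],
    ]
    # Linux expects to write to these, so back them with memory.
    # An empty tmpfs hides the image's own files here; a real setup would
    # need to seed /etc (that step is not described in the slides).
    for d in ("etc", "var", "tmp"):
        cmds.append(["mount", "-t", "tmpfs", "tmpfs", str(root / d)])
    # Per-container read-write data directory.
    cmds.append(["mount", "--bind", data, str(root / "data")])
    return cmds


for cmd in mount_plan(101):
    print(" ".join(cmd))
```

Because the OS version is just a parameter, moving a container to a new image only means restarting it with a different `os_version`. Nothing inside the container has to change.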

More benefits
                • Distribute OS images as a simple directory.
                • Prove that environments (Dev, Stg, Prd) are identical
                  using md5sum.
                • Flip between OS versions by changing a variable.
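Since an OS image is just a directory, "identical" can be checked by hashing the tree. The sketch below builds an md5sum-style manifest (one line per file, sorted by path) and hashes the manifest into a single digest. Comparing that digest across the Dev, Stg and Prd copies mirrors the md5sum check on the slide. How symlinks and device nodes are handled is an assumption of this sketch. Permissions and ownership are not covered.

```python
import hashlib
import os


def _file_md5(path: str) -> str:
    """MD5 of one regular file's contents, read in chunks (like md5sum)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_md5(top: str) -> str:
    """Single MD5 for a directory tree, built from a sorted manifest.

    Regular files contribute "<md5>  <relative path>". Symlinks contribute
    the MD5 of their target string and are not followed. Device nodes and
    other special files are recorded by name only, never opened.
    Permissions, ownership and empty directories are not covered.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(top):
        # os.walk lists symlinks to directories in dirnames without descending.
        linked_dirs = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + linked_dirs:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, top)
            if os.path.islink(path):
                digest = "link:" + hashlib.md5(os.readlink(path).encode()).hexdigest()
            elif not os.path.isfile(path):
                digest = "special"
            else:
                digest = _file_md5(path)
            entries.append((rel, digest))
    manifest = "".join(f"{digest}  {rel}\n" for rel, digest in sorted(entries))
    return hashlib.md5(manifest.encode()).hexdigest()
```

Sorting by relative path makes the digest independent of directory listing order, so two storage nodes holding the same image produce the same value.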




The Swear Wall



The swear wall helps prevent death by a thousand cuts.


                       Your team has a gut feeling about what’s hurting them;
                       this helps you quantify that feeling and act on the pain.




1. !@&*^# Solaris!
                       2. Solaris gets a mark
                       3. Repeat
                       4. Periodically throw out offensive technology
                       5. ...
                       6. PROFIT!! (swear less)
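The swear wall is a physical tally, but the quantifying step is just counting. Below is a minimal sketch of the same loop as a `collections.Counter`. The `SwearWall` class and its method names are hypothetical, and so is the idea of keeping the tally in code. The sketch only shows how the marks turn into a ranked list of candidates to throw out.

```python
from collections import Counter


class SwearWall:
    """Tally of marks against technologies (a hypothetical digital swear wall)."""

    def __init__(self):
        self.marks = Counter()

    def swear(self, technology: str) -> None:
        # Steps 1-2: the technology being sworn at gets a mark.
        self.marks[technology] += 1

    def worst(self, n: int = 1):
        # The top offenders are the candidates to throw out.
        return self.marks.most_common(n)

    def throw_out(self, technology: str) -> None:
        # Step 4: once replaced, drop the technology and its marks.
        del self.marks[technology]


wall = SwearWall()
for tech in ["Solaris", "Solaris", "NFS", "Solaris", "USB boot"]:
    wall.swear(tech)
print(wall.worst())  # [('Solaris', 3)]
```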




Optimise for the task at hand.


                       Don’t layer solutions onto problems. Get rid of them.




Thank you!


Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Inside the Atlassian OnDemand Private Cloud

  • 2. Inside the Atlassian OnDemand private cloud George Barnett SAAS Platform Architect Tuesday, July 10, 12
  • 3. In 2010 a team of engineers moved into our secret lair (above a pub) to re-imagine our hosted platform.
  • 4. Launch (October 2011): 1,000 VMs. 6 months later: 13,500 VMs.
  • 5. We have a cloud. So what?
  • 6. We also had a cloud... and: VM sprawl, poor performance, over-provisioning, slow deployments, and low visibility into the full stack.
  • 7. Virtualisation often creates new challenges but does nothing about existing ones.
  • 13. Be less flexible about what infrastructure you provide.
  • 14. “You can use any database you like, as long as it’s PostgreSQL 8.4.” #summit12
  • 15.
      • Stop trying to be everything to everyone
          • (we have other clouds within Atlassian)
      • Lower operational complexity
      • Easier to provide a deeply integrated, well-supported toolchain
      • Smaller test matrix
  • 16. Fail fast. Learn quickly.
  • 17. Do as little as possible; deploy it and use it.
  • 18. Block-1: A small-scale model of the initial proposed platform architecture: 4 desktop machines and a switch. Purpose: validate the design and evaluate failure modes. http://history.nasa.gov/Apollo204/blocks.html
  • 19. Block-1 findings: Applications do not fall over. Network boot assumptions validated. Creating VMs over NFS was too resource- and time-intensive (more on this later).
  • 20. Block-2: A large-scale model of the platform architecture. Purpose: validate hardware resource assumptions and compare CPU vendors. http://history.nasa.gov/Apollo204/blocks.html
  • 21. Block-2 findings: Customers-per-GB-of-RAM metric validated. VM distribution and failover tools work. Initial compute hardware specs were too conservative, so we decided to add 50% more RAM.
  • 23. Challenge: Existing platform hardware was a poor fit for our workload. Memory and IO were heavily constrained, but CPU was not.
  • 24. Monitoring: We took 6 months’ worth of monitoring data from our existing platform and used this data to determine the right mix of hardware.
  • 25.
      • 10 x compute nodes (144 GB RAM, 12 cores, NO disks)
      • 3 x storage nodes (24 disks)
      • Each rack delivered fully assembled
      • Unwrap, provide power and networking
      • Connected to customers in ~2 hours
  • 26. Advantage #1: Reliable. Each machine goes through a 2-day burn-in before it goes into the rack.
  • 27. Advantage #2: Neat.
  • 28. Advantage #3: Consistent.
  • 29. Advantage #4: Easy to deploy.
  • 32. Challenge: Existing compute infrastructure used local disk for swap and hypervisor boot. Once we got the memory density right, the disks were only needed for boot.
  • 33-34.
      • No disks in compute infrastructure
      • Avoid spinning 20 more disks per rack for a hypervisor OS
      • Evaluated booting from:
          • USB drives (unreliable and slow!)
          • NFS (what if the network goes away?)
          • Custom binary initrd image + kernel
  • 35.
      • Image is a ~170 MB gzipped filesystem
      • Downloaded on boot and extracted into RAM (~400 MB)
      • No external dependencies after boot
      • All compute nodes boot from the same image
      • Reboot to a known state
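  • 36. Netboot sequence (compute node <-> netboot server):
      1. Compute node (PXE) sends a DHCP request; the netboot server sends a DHCP response.
      2. Compute node fetches gPXE over TFTP.
      3. gPXE sends a DHCP request; the server sends a DHCP (Etherboot) response.
      4. gPXE fetches the boot script, kernel and boot image over HTTP.
      5. Boot.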
  • 37. Sharp edges:
      • No swap == provision carefully
          • Not a problem if you automate provisioning
      • Treat the running hypervisor image like an appliance
          • Don’t change code; rebuild the image and reboot
          • Doing this often? Too many services in the hypervisor
  • 39. Challenge: Virtualisation is often inefficient. There’s a memory and CPU penalty which is hard to avoid.
  • 40. OpenVZ
      • Linux containers
      • Basis for Parallels Virtuozzo Containers
      • LXC isn’t there yet
      • No guest OS kernels
      • No performance hit
      • Better resource sharing
  • 45. “Don’t load the same thing twice”
  • 46. Challenge: Java VMs aren’t lightweight.
  • 47.
      • Full virtualisation does a poor job of this
      • 50 VMs = 50 kernels + 50 caches + 50 shared libs!
      • Memory de-dupe combats this, but burns CPU
      • Memory de-dupe works across all OSes
      • We don’t use Windows
      • By being less flexible, we can exploit Linux-specific features
  • 48. OpenVZ containers all share the same kernel.
  • 49. Provide a single OS image to all containers. Free benefits:
      • Shared libraries only load once.
      • OS is cached only once.
      • OS image is the same on every instance.
  • 50. Challenge: If all containers share the same OS image, then managing state is a nightmare! One bad change in one container would break them all!
  • 51.
      • But managing state on multiple machines is a solved problem!
      • What if you have >10,000 machines?
      • Why are you modifying the OS anyway?
  • 52. Does your iPhone upgrade iOS when you install an app?
  • 53. “Fix problems by removing them, not by adding systems to manage them.” #summit12
  • 55. Data classes in a system:
      • OS and system daemon code
      • Application code
      • Application and user data
  • 60-71. Container layout (diagram built up over several slides), from the kernel up:
      • OpenVZ Kernel (shared by all containers)
      • / - Read Only: system supplied code (OS tools)
      • /sw - Read Only: applications, JVMs, configs
      • /data (R/W): application and user data, e.g. /data/service/
  • 72. How?
      • Storage nodes export /e/ro/ & /e/rw
      • Build an OS distro inside a chroot.
      • Use whatever tools you are comfortable with.
      • Put this chroot tree in the RO location on storage nodes
      • Make a “data” dir in the RW location for each container
  • 73. How?
      • On container start, bind mount: /net/storage-n/e/ro/os/linux-image-v1/ -> /vz/<ctid>/root
      • Replace etc, var & tmp with a memfs
      • Linux expects to be able to write to these
      • Mount the container’s data dir (RW) to /data
  • 74. More benefits:
      • Distribute OS images as a simple directory.
      • Prove that environments (Dev, Stg, Prd) are identical using md5sum.
      • Flip between OS versions by changing a variable.
  • 75. The Swear Wall
  • 76. The swear wall helps prevent death by a thousand cuts. Your team has a gut feeling about what’s hurting them; this helps you quantify that feeling and act on the pain.
  • 78.
      1. !@&*^# Solaris!
      2. Solaris gets a mark
      3. Repeat
      4. Periodically throw out offensive technology
      5. ...
      6. PROFIT!! (swear less)
  • 79. Optimise for the task at hand. Don’t layer solutions onto problems. Get rid of them.
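Slides 21 and 25 size the platform in two ways. One is a customers-per-GB-of-RAM metric. The other is a rack of 10 compute nodes (144 GB RAM, 12 cores each) plus 3 storage nodes (24 disks each). Below is a minimal Python sketch of that per-rack arithmetic, assuming the counts are per rack as slide 25 implies. `RackSpec` and `customers_per_rack` are hypothetical names. The `customers_per_gb` and `reserve_gb_per_node` inputs are placeholders, because the talk gives no value for the metric.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RackSpec:
    # Defaults are the per-rack figures listed on slide 25.
    compute_nodes: int = 10
    ram_gb_per_node: int = 144
    cores_per_node: int = 12
    storage_nodes: int = 3
    disks_per_storage_node: int = 24


def rack_totals(spec: RackSpec) -> dict:
    """Aggregate RAM, CPU cores and storage disks for one rack."""
    return {
        "ram_gb": spec.compute_nodes * spec.ram_gb_per_node,
        "cores": spec.compute_nodes * spec.cores_per_node,
        "disks": spec.storage_nodes * spec.disks_per_storage_node,
    }


def customers_per_rack(spec: RackSpec, customers_per_gb: float,
                       reserve_gb_per_node: int = 0) -> int:
    """Capacity estimate from a customers-per-GB-of-RAM metric.

    Hypothetical inputs: customers_per_gb (no value is given in the talk)
    and reserve_gb_per_node (RAM held back for the hypervisor itself).
    """
    usable_gb = max(spec.ram_gb_per_node - reserve_gb_per_node, 0)
    return math.floor(spec.compute_nodes * usable_gb * customers_per_gb)


print(rack_totals(RackSpec()))  # {'ram_gb': 1440, 'cores': 120, 'disks': 72}
```

Keeping the spec as a small immutable record lets you compare a candidate configuration against the shipped one by changing one field. Block-2's "add 50% more RAM" finding would be an edit to `ram_gb_per_node`.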
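Slides 23 and 24 report that six months of monitoring data showed memory and IO were heavily constrained while CPU was not, and that this data drove the hardware mix. The talk doesn't say how the data was analysed. One simple approach, sketched here, is to compare a high percentile of utilisation for each resource. The nearest-rank 95th percentile, the 0.8 threshold and the sample values are all assumptions.

```python
from __future__ import annotations


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


def constrained_resources(utilisation: dict[str, list[float]],
                          pct: int = 95, threshold: float = 0.8) -> list[str]:
    """Names of resources whose pct-th percentile utilisation reaches threshold.

    utilisation maps a resource name to utilisation fractions (0.0 to 1.0).
    """
    return sorted(name for name, samples in utilisation.items()
                  if percentile(samples, pct) >= threshold)


# Made-up samples shaped like slide 23's finding.
samples = {"cpu": [0.20, 0.35, 0.30],
           "memory": [0.85, 0.92, 0.95],
           "io": [0.70, 0.88, 0.91]}
print(constrained_resources(samples))  # ['io', 'memory']
```

A percentile rather than a mean keeps short, regular peaks from being averaged away. Integer arithmetic for the rank avoids floating-point surprises at exact boundaries.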
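In slide 36, gPXE fetches a boot script over HTTP and uses it to pull down the kernel and boot image. A netboot server could generate that script. This sketch renders one with gPXE's `kernel`, `initrd` and `boot` commands. The server URL, file names and kernel argument are hypothetical. Because all compute nodes boot the same image (slide 35), the script does not vary per node.

```python
from urllib.parse import urljoin


def render_bootscript(base_url: str, kernel: str = "vmlinuz",
                      image: str = "hypervisor-image.gz",
                      kernel_args: str = "") -> str:
    """Return a gPXE script that loads one kernel and boot image over HTTP.

    File names and kernel_args are illustrative placeholders.
    """
    # Ensure a trailing slash so urljoin appends instead of replacing the last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    kernel_line = "kernel " + urljoin(base, kernel)
    if kernel_args:
        kernel_line += " " + kernel_args
    return "\n".join([
        "#!gpxe",                            # script header gPXE looks for
        kernel_line,                         # fetch and load the kernel
        "initrd " + urljoin(base, image),    # fetch the compressed boot image
        "boot",                              # hand control to the kernel
    ]) + "\n"


print(render_bootscript("http://netboot.example/images",
                        kernel_args="console=ttyS0"), end="")
```

Every node gets the same script, so moving the whole fleet to a new hypervisor image only means changing the `image` name the server renders.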
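Slide 37 warns that diskless nodes have no swap, so provisioning has to be careful and is best automated. A minimal placement check could look like this. It assumes every container has a fixed memory limit and each node holds back a fixed reserve for the in-RAM hypervisor image (slide 35 puts the extracted image at ~400 MB). `Node`, `choose_node` and `CapacityError` are hypothetical names, and most-free-RAM-first is just one possible placement policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class CapacityError(Exception):
    """Raised when a placement would overcommit a swapless node's RAM."""


@dataclass
class Node:
    name: str
    ram_mb: int
    reserve_mb: int  # hypothetical: RAM kept for the hypervisor image, which lives in RAM
    containers: dict[str, int] = field(default_factory=dict)  # ctid -> memory limit (MB)

    def free_mb(self) -> int:
        return self.ram_mb - self.reserve_mb - sum(self.containers.values())

    def can_place(self, limit_mb: int) -> bool:
        # No swap means no safety valve: never commit more than physical RAM.
        return limit_mb <= self.free_mb()

    def place(self, ctid: str, limit_mb: int) -> None:
        if ctid in self.containers:
            raise ValueError(f"{ctid} already placed on {self.name}")
        if not self.can_place(limit_mb):
            raise CapacityError(
                f"{self.name}: {limit_mb} MB requested, {self.free_mb()} MB free")
        self.containers[ctid] = limit_mb


def choose_node(nodes: list[Node], limit_mb: int) -> Node | None:
    """Pick the node with the most free RAM that can still fit the container."""
    fits = [n for n in nodes if n.can_place(limit_mb)]
    return max(fits, key=Node.free_mb, default=None)
```

The rule is strict: the sum of limits must stay within physical RAM minus the reserve. That makes overcommit impossible by construction rather than something monitoring has to catch.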
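Slides 47 to 49 contrast two models:

* Full virtualisation: every guest carries its own kernel, caches and shared libraries.
* OpenVZ containers: all containers share one kernel and one OS image.

The arithmetic can be sketched as follows. The megabyte figures in the usage lines are illustrative assumptions, not measurements from the talk. The model also ignores memory de-duplication, which the slides say recovers some of this overhead at a CPU cost.

```python
def os_overhead_mb(guests: int, kernel_mb: int, os_cache_mb: int,
                   shared_libs_mb: int, *, shared_os: bool) -> int:
    """Memory spent on copies of the kernel, OS page cache and shared libraries.

    shared_os=False: full virtualisation without de-dupe, one copy per guest.
    shared_os=True: containers on one kernel reading one OS image, one copy total.
    """
    per_copy = kernel_mb + os_cache_mb + shared_libs_mb
    copies = 1 if shared_os else guests
    return copies * per_copy


# Illustrative numbers only: 50 guests, 30 MB kernel, 200 MB cache, 100 MB libs.
full = os_overhead_mb(50, 30, 200, 100, shared_os=False)
containers = os_overhead_mb(50, 30, 200, 100, shared_os=True)
print(full, containers, full - containers)  # 16500 330 16170
```

The sketch only restates slide 47's "50 VMs = 50 kernels + 50 caches + 50 shared libs" as a formula. Real savings depend on how much of each guest's memory is actually identical.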
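Slides 72 and 73 describe how a container's filesystem is assembled at start-up. This sketch only builds the `mount` command lines; it runs nothing. It makes these assumptions:

* The storage export is reachable under /net/storage-n, as on the slide.
* `mount --bind` followed by a read-only remount makes the shared root read-only.
* tmpfs serves as the memfs.
* Each container's data dir lives at a hypothetical /net/storage-n/e/rw/<ctid>/data.

The slides don't say how the tmpfs-backed etc and var get their initial contents, so that step is left out. The `os_image` parameter plays the role of "flip between OS versions by changing a variable" from slide 74.

```python
from __future__ import annotations

import shlex
from pathlib import PurePosixPath


def container_mount_plan(ctid: int, storage: str = "storage-n",
                         os_image: str = "linux-image-v1",
                         data_dir: str | None = None) -> list[list[str]]:
    """Return mount commands (argv lists) that assemble one container's filesystem."""
    root = PurePosixPath("/vz") / str(ctid) / "root"
    image = PurePosixPath("/net") / storage / "e" / "ro" / "os" / os_image
    if data_dir is None:
        # Hypothetical layout: one data dir per container under the RW export.
        data_dir = str(PurePosixPath("/net") / storage / "e" / "rw" / str(ctid) / "data")
    plan = [
        # The shared OS image becomes the container root...
        ["mount", "--bind", str(image), str(root)],
        # ...and is made read-only (a bind mount needs a remount to become ro).
        ["mount", "-o", "remount,bind,ro", str(root)],
    ]
    # Linux expects etc, var and tmp to be writable, so overlay them with tmpfs.
    for d in ("etc", "var", "tmp"):
        plan.append(["mount", "-t", "tmpfs", "tmpfs", str(root / d)])
    # Per-container read-write application and user data.
    plan.append(["mount", "--bind", data_dir, str(root / "data")])
    return plan


for argv in container_mount_plan(101):
    print(shlex.join(argv))
```

Returning argv lists instead of running them keeps the plan easy to review and test. A real start hook would pass each list to a process runner.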
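Slide 74 proposes proving that the Dev, Stg and Prd OS images are identical with md5sum. Since an image is just a directory, one way is a manifest:

1. Record each regular file's relative path and MD5.
2. Sort the list.
3. Hash the sorted list.

This is much like comparing sorted `md5sum` output for every file. The sketch skips symlinks, permissions, ownership and empty directories, which a thorough comparison would probably also need to cover.

```python
from __future__ import annotations

import hashlib
import os


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of one file, read in chunks so large files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_manifest(root: str) -> list[tuple[str, str]]:
    """Sorted (relative path, md5) pairs for every regular file under root."""
    manifest = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # this sketch ignores symlinks and special files
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            manifest.append((rel, file_md5(path)))
    return sorted(manifest)


def tree_digest(root: str) -> str:
    """Single MD5 over the sorted manifest, in md5sum's 'digest  path' line format."""
    h = hashlib.md5()
    for rel, digest in tree_manifest(root):
        h.update(f"{digest}  {rel}\n".encode())
    return h.hexdigest()
```

Run `tree_digest` on the image directory in each environment. Matching values mean matching file contents. When they differ, diffing the two `tree_manifest` lists shows which file changed.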