2. Ticketmaster - CoreOS Tectonic Summit 2016
JUSTIN DEAN
● SVP, Platform & Technical Operations
● ~1.75 Years at Ticketmaster
● Passionate about building high
performance organizations
● Nerdy about automating my beer & BBQ
pipeline (see PitmasterPi on github)
3. Ticketmaster - CoreOS Tectonic Summit 2016
OUR STORY
● About Ticketmaster
● Our Journey
● Large Enterprise Challenges &
Lessons Learned
● Why Kubernetes
● CoreOS Partnership
● Up Next
4. Ticketmaster - CoreOS Tectonic Summit 2016
ABOUT US
● Publicly Traded Company (LYV)
● $7.6B Revenue
● $25B in GTV (Gross Transaction Value)
● Top 5 eCommerce site
HISTORY
● 1976 - Founded at Arizona State University
● 1996 - Ticketmaster.com launched
● 2010 - Live Nation and Ticketmaster join forces to power live
experiences
9. Ticketmaster - CoreOS Tectonic Summit 2016
PRE-MODERN TECHNOLOGY (Tech Museum)
● Every era of software, many not ready for containers and cloud
● 1970s: Custom VMS OS on Emulated VAX (The Host)
● 2000s: Xen Cloud, Big-Iron Filers, NFS, custom built infrastructure
10. Ticketmaster - CoreOS Tectonic Summit 2016
TECH SCALE
● 21 Ticketing Systems and over 250 unique
products
● 1,400+ people in Product & Tech org
● Custom Private Cloud with over 22,000 VMs
across 7 global data centers
● 15,000+ network endpoints across the
world (Venues, Arenas, Kiosks, etc)
● Over 60% VM growth in last year
1 BILLION MACHINES!!*
*Not really :)
11. Ticketmaster - CoreOS Tectonic Summit 2016
BIG SCALE, BIG CHALLENGES
Onsales = Black Friday every day!
● Huge spikes / demand for tickets
● Global company = across time zones
● Limited inventory (Beyonce Tickets!)
● Multiple sales channels
0 to 150M transactions in minutes!
That’s a spike of >8 GBps!!!!!
Self-Inflicted DDoS-as-a-Business
13. Ticketmaster - CoreOS Tectonic Summit 2016
COMPETITIVE LANDSCAPE
● Market leader with huge surface area
● Competitors of every size and shape
● Speed and agility are absolutely key
● Scale and complexity of a 40-year-old business make rapid changes very hard
14. Ticketmaster - CoreOS Tectonic Summit 2016
TO RECAP...
Public company / market pressure / highly competitive landscape
Legacy tech, not ready for containers
Tech debt with high interest rates
Huge scale and complexity
Black Friday every day
20. Ticketmaster - CoreOS Tectonic Summit 2016
LEAN TRANSFORMATION
● Laser focused on highest priorities
● Created 65+ cross-functional delivery teams
● Eventually all roads led to “blocked by ops”
● Got faster at developing; did not get faster at delivering
2013 2016 2017
21. Ticketmaster - CoreOS Tectonic Summit 2016
AUTONOMOUS DELIVERY TEAMS
● Moved application support teams out of TechOps and into the product
teams directly
● Embedded Systems Engineers into product delivery teams (closer to truly
“cross-functional”)
● Self-Service Tools: Surge towards getting teams out of the ops business
● Self-Sufficient businesses (build it, run it, own it, optimize it, monetize it)
Microbusiness
22. Ticketmaster - CoreOS Tectonic Summit 2016
TRANSFORMATION INSIGHTS
Realized our ability to innovate is dampened by our overly complex software factory:
● 30-50% of development time spent moving code around ($60M-$90M problem)
● 150 custom-built ways to release products (often manually)
● ~50% of incidents were preventable; mostly self-inflicted
23. Ticketmaster - CoreOS Tectonic Summit 2016
PUBLIC CLOUD
● Vehicle for deep introspection of every product
● Immediate access to infrastructure as APIs
● Forcing function to modernize all products to cloud native standard (all the *-ilities)
Public Cloud = Huge carbon filter
24. Ticketmaster - CoreOS Tectonic Summit 2016
CLOUD ENABLEMENT TEAM
● Small team of experts dedicated to developing:
▪ Future state architecture
▪ Path to Public Cloud
▪ Cloud Native Solution Patterns
▪ Cure us of our on-prem addiction (NFS, Always scaled, HW reliance, SW trees, etc)
● Provide Self-Service tooling and documentation for those solutions
● Enable teams to:
▪ Raise their tech maturity
▪ Containerize and retool their app
▪ Migrate themselves to the cloud
25. Ticketmaster - CoreOS Tectonic Summit 2016
CLOUD ENABLEMENT METHOD
7 “Simple” Steps:
1. Containerize your app; use CoreOS
2. Terraform your infrastructure
3. Instrument everything, rich telemetry - no SSH or RDP!
4. Use synthetic monitoring to understand the health of your product
5. Security, security, security
6. Design shared-nothing architecture (no NFS)
7. Build for availability - no single points of failure
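Step 4 (synthetic monitoring) can be sketched roughly as follows. This is a minimal illustration, not Ticketmaster's tooling: `ProbeResult`, `run_probe`, `is_healthy`, `fake_checkout`, and the 95% / 2-second thresholds are all hypothetical names and values chosen for the example. The idea is to run a scripted user journey on a schedule and judge product health from the outcomes, rather than logging in to a box over SSH or RDP.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    ok: bool          # did the scripted user journey succeed?
    latency_s: float  # how long the journey took, in seconds


def run_probe(journey: Callable[[], bool]) -> ProbeResult:
    """Run one synthetic user journey and time it."""
    start = time.perf_counter()
    try:
        ok = bool(journey())
    except Exception:
        ok = False  # any exception counts as a failed journey
    return ProbeResult(ok=ok, latency_s=time.perf_counter() - start)


def is_healthy(results: List[ProbeResult],
               min_success_rate: float = 0.95,   # assumed threshold
               max_latency_s: float = 2.0) -> bool:  # assumed latency budget
    """Healthy if enough recent probes succeeded within the latency budget."""
    if not results:
        return False  # no data is treated as unhealthy
    good = sum(1 for r in results if r.ok and r.latency_s <= max_latency_s)
    return good / len(results) >= min_success_rate


def fake_checkout() -> bool:
    """Hypothetical stand-in for a real scripted journey (search, select, pay)."""
    return True


if __name__ == "__main__":
    window = [run_probe(fake_checkout) for _ in range(20)]
    print("healthy:", is_healthy(window))
```

In practice the journey would exercise the real product from outside the cluster, and the results would feed the same telemetry pipeline as step 3.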
26. Ticketmaster - CoreOS Tectonic Summit 2016
READY TO ROLL
● Highly skilled team
● Modern new stack architecture
● Comprehensive DIY toolkit/software
● 1,000+ pages of detailed documentation and solution patterns
27. Ticketmaster - CoreOS Tectonic Summit 2016
Everybody has a plan until they
get punched in the face.
- Mike Tyson
28. Ticketmaster - CoreOS Tectonic Summit 2016
LEARNINGS
[Diagram: 100s of devs on different tech stacks, each working directly against the Public Cloud's rich set of primitives and APIs; $ symbols scattered throughout]
● Learn the APIs/Primitives, learn to build infrastructure, learn to code it in Terraform
● Programmatic Checkout Page
● 65,000 permutations on how to use AWS service offerings = 64,999 ways to get it wrong
29. Ticketmaster - CoreOS Tectonic Summit 2016
LEARNINGS SUMMARY
● Huge learning curve
● Hard to manage distributed systems at scale
● Wrong people to build & optimize infrastructure (across 100+ teams)
● Baking purchasing decisions into distributed terraform code is BAD
...Spending too much time
writing software to deploy software
instead of writing software to make money
30. Ticketmaster - CoreOS Tectonic Summit 2016
SOLUTION: CONTAINER ORCHESTRATION
● Abstract complexities of infrastructure from development teams,
including how to:
▪ Design
▪ Deploy
▪ Purchase
▪ Optimize
● Allows us to easily manage distributed systems at scale
31. Ticketmaster - CoreOS Tectonic Summit 2016
WE CHOSE KUBERNETES
● Kubernetes started organically appearing all over our company
● Ahead of other container management platforms and rapidly improving
● Amazing community with hockey-stick velocity
● Kubernetes APIs and primitives are sweet!
▪ Iteration time is seconds VS minutes
▪ Automated rollbacks
▪ Scaling and self-healing are much faster than ASGs
● Kubernetes gets us much better utilization of our EC2 instances
● Successfully used it to solve a major stability issue
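The utilization point can be illustrated with a small packing sketch. This is only an illustration of why sharing instances raises utilization: the Kubernetes scheduler uses its own filtering and scoring rather than the first-fit-decreasing heuristic shown here, and the function names, CPU requests, and 4-CPU instance size are hypothetical.

```python
from typing import List


def instances_one_app_each(cpu_requests: List[float]) -> int:
    """Baseline: every app gets its own instance, regardless of size."""
    return len(cpu_requests)


def instances_first_fit_decreasing(cpu_requests: List[float],
                                   instance_cpus: float) -> int:
    """Pack apps onto shared instances, largest request first."""
    free: List[float] = []  # remaining CPU on each instance opened so far
    for req in sorted(cpu_requests, reverse=True):
        if req > instance_cpus:
            raise ValueError(f"request {req} exceeds instance size {instance_cpus}")
        for i, remaining in enumerate(free):
            if req <= remaining:
                free[i] = remaining - req  # place on the first instance with room
                break
        else:
            free.append(instance_cpus - req)  # nothing fits: open a new instance
    return len(free)


def utilization(cpu_requests: List[float], instance_count: int,
                instance_cpus: float) -> float:
    """Fraction of provisioned CPU that is actually requested."""
    return sum(cpu_requests) / (instance_count * instance_cpus)


if __name__ == "__main__":
    requests = [1.0, 1.0, 2.0, 1.0, 3.0]  # hypothetical per-app CPU requests
    size = 4.0                             # hypothetical CPUs per instance
    dedicated = instances_one_app_each(requests)
    packed = instances_first_fit_decreasing(requests, size)
    print("dedicated:", dedicated, utilization(requests, dedicated, size))
    print("packed:   ", packed, utilization(requests, packed, size))
```

With these made-up numbers, five dedicated instances become two shared ones, which is the kind of gain the bullet above refers to.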
32. Ticketmaster - CoreOS Tectonic Summit 2016
OPENTSDB ON KUBERNETES
● Critical system for application monitoring
▪ 500k metrics per second
● Large queries during ticketing sales were DDOS’ing OpenTSDB services
● Kubernetes pod health checks detect this and restart the failed containers
● Kubernetes primitives took a service that required hand holding to something
that manages itself
● Learning Moment! A reboot from an automated OS upgrade required manual
intervention
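The health-check-and-restart behaviour described above can be modelled in a few lines. This is a simplified simulation, not Kubernetes code: `Container` and `liveness_tick` are hypothetical names, and `failure_threshold` only mirrors the idea behind the `failureThreshold` field of a Kubernetes liveness probe (restart after that many consecutive failed probes).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Container:
    name: str
    restarts: int = 0
    consecutive_failures: int = 0


def liveness_tick(container: Container,
                  probe: Callable[[], bool],
                  failure_threshold: int = 3) -> bool:
    """Run one probe; restart the container after enough consecutive failures.

    Returns True if this tick triggered a restart.
    """
    if probe():
        container.consecutive_failures = 0  # any success resets the streak
        return False
    container.consecutive_failures += 1
    if container.consecutive_failures >= failure_threshold:
        container.restarts += 1             # stand-in for kill + recreate
        container.consecutive_failures = 0
        return True
    return False


if __name__ == "__main__":
    # Hypothetical probe outcomes: the service wedges under a large query,
    # fails three probes in a row, then answers again after the restart.
    outcomes = iter([True, False, False, False, True])
    tsdb = Container("opentsdb")
    for _ in range(5):
        liveness_tick(tsdb, lambda: next(outcomes))
    print(tsdb.name, "restarts:", tsdb.restarts)
```

The point of the sketch is that no human is in the loop: the restart decision is made from probe results alone, which is what turned the service from hand-held to self-managing.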
33. Ticketmaster - CoreOS Tectonic Summit 2016
SIMPLIFICATION WITH KUBERNETES
[Diagram: before, Public Cloud with $ symbols scattered throughout; after, far fewer $ symbols]
● Homogenized deployments via Kubernetes
● Kubernetes APIs / abstraction
● Kubernetes cluster optimized by Cluster Ops team
● Public Cloud
34. Ticketmaster - CoreOS Tectonic Summit 2016
KUBERNETES PROJECT
GOAL: Deploy a Ticketmaster product into a production-grade Kubernetes cluster
and equip team with the skills required to support its operation.
● Fully-remote team of 6
● Tons of work!
▪ How many clusters to build?
▪ Which architecture is right for us?
▪ How should we deploy and test the cluster?
▪ Which networking option to use inside of AWS?
35. Ticketmaster - CoreOS Tectonic Summit 2016
QUESTIONS
● Kubernetes @scale best practices and pitfalls
▪ Kubernetes @Ticketmaster Roadmap:
− Documented Reference Architecture specific to
Ticketmaster based on all the below that includes answers
to any below questions. We need a documented roadmap
for the team to start building based on Apprenda
Experience/Reference architecture.
▪ Guidance on what goes in K8S and what should not (if anything)
▪ What have we missed? What didn’t we ask?
▪ Best practices around secrets; how do companies manage this at
scale? Risks, alternatives, etc.?
▪ Kubernetes upgrades, possible w/o downtime?
▪ Insight on cloud primitives that are not K8S managed (Lambda, S3,
SQS, KMS, RDS, etc….). What are other companies doing here? Are
some of these on the K8S roadmap to orchestrate? Are these
resources managed by “clusterops”, or do delivery teams self-build
outside the k8s workflow? This is called the K8s service catalog
▪ What do they recommend for configuring containers within kubernetes
▪ How do they recommend granting iam roles to containers
● Kubernetes cross-domain (AWS/onprem/other cloud) insight
▪ Good idea? Possible pitfalls?
▪ How to front end AWS and Onprem so we can dynamically run HOT
aws expensive stuff on onprem behind the scenes
▪ Cross AWS region?
▪ If we run Kubernetes in Equinix, how do they recommend logging into
ECR with Kubernetes
● Cluster Networking
▪ What do they recommend for loadbalancer services in aws
▪ Overlay networking
▪ Software defined firewall
▪ Best ecosystem components (calico vs x, etc)
● Team / Operations
▪ How do engineering teams interact with the cluster, kubectl on their
laptops? Probably not
▪ How long do they see it taking to build enough knowledge for
production support of k8s
▪ Insight on other companies’ K8S support models (what does ops do,
what does devops do, what are the governance models)
▪ Understanding of Implications on chargeback in AWS. How much
effort goes into tagging and reporting on ephemeral resources
(containers) that move around on AWS primitives (EC2 instances)
● CET (cloud enablement team)
▪ How to marry it into our CET strategy, specifically Terraform
▪ Help on rollout strategy. Start working in context with early adopter
enthusiastic teams asap OR wait until we have it more ‘operationally
mature’. Both tactics have merit, help us think through the strategy
here.
● Persistent storage, period.
▪ Torus, Ceph, EFS, NFS, Gluster, portworx ; pros / cons
▪ Databases (large/shared) on k8s?
▪ Other persistent workloads: elastic, cache, message bus, etc..
● Ongoing Apprenda Engagement
▪ Information regarding their consulting offerings/ prices/ models of
engagement. On prem team? Support team? Customized kubernetes
solutions and maintenance.
▪ Connect us to peer group in Kubernetes space
● Should we just leverage Tectonic?
● Archtics (massive legacy windows/powerbuilder/sybase/rdp over internet to sports
teams) Help
● Prometheus help
[Word cloud: overlay networking? Calico? Flannel? VPC networking? Canal? cluster ops team? Linkerd? auth? how many etcd nodes? Terraform vs Kube API? Prometheus? 24/7 support?]
36. Ticketmaster - CoreOS Tectonic Summit 2016
COMMUNITY ENGAGEMENT
● Spent time with CoreOS, Kelsey Hightower,
Apprenda
● Attended conferences
● Hosted Meetups
● Joined SIGs
● Joined
37. Ticketmaster - CoreOS Tectonic Summit 2016
MILESTONES
1. Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
* Expand!
38. Ticketmaster - CoreOS Tectonic Summit 2016
WORK BEGINS...BUT
● Continued to identify new questions
● Had not figured out operational support
● Needed enterprise-level features (auth)
● Needed answers based on experience; not theory
● Needed to accelerate implementation
40. Ticketmaster - CoreOS Tectonic Summit 2016
MILESTONES
✔ 1. Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
* Expand!
41. Ticketmaster - CoreOS Tectonic Summit 2016
WHY TECTONIC
● Vanilla upstream Kubernetes - No lock in
● Immediate enterprise level confidence
● Supported reference architecture (instead of DIY)
● Recommendations on operational practices, service provider integration, third
party add-ons, etc.
● Production Go-Live Support
● Automatic OS Updates! *Bummer, no more fun upgrade projects!
42. Ticketmaster - CoreOS Tectonic Summit 2016
COREOS PARTNERSHIP
● Providing input on Tectonic roadmap
● Influence the roadmap for things that REALLY matter to Enterprises
● Jointly solve Enterprise + Web Scale challenges
● Help foster the Enterprise Kubernetes community
43. Ticketmaster - CoreOS Tectonic Summit 2016
NEW TICKETMASTER WEB PLATFORM ON K8S
Before:
● Semi-manual stack creation,
bespoke cloudformation +
python boto scripts = 20+
mins to deploy
● Low Confidence
Now:
● K8S + Tectonic, fully
automated = 60 second app
updates
● High Confidence
● Unlocked Daily Delivery
Culture
44. Ticketmaster - CoreOS Tectonic Summit 2016
LET THE
MAKERS MAKE
● We have an amazing company of
Makers, Creators, Visionaries
● We must create the space for them to innovate
and deliver great solutions to the market
45. Ticketmaster - CoreOS Tectonic Summit 2016
RECAP
● Use Kubernetes to abstract infrastructure
complexities
● Have a cluster ops team do the optimization
voodoo; not everyone else
● Stop wasting effort writing software to deploy
software
● Let the Makers Make! Give time and mindshare
back to your most valuable asset (your people) to
do what they do best: Make Things!
46. Ticketmaster - CoreOS Tectonic Summit 2016
TICKETMASTER KUBERNAUTS
Stop by and say hi during the break!
&
Join us at the Sysdig/CoreOS/Ticketmaster
party tonight!
Food, drinks, LIVE BAND!!
Justin Dean, Kraig Amador, Abe Ingersoll, Bindi Belanger, Jean-François Nadeau