Building, deploying and running production code at Dropbox
Leonid Vasilyev, SRE at Dropbox. United Dev Conf ’17
•Intro & Background
•Building Code
•Deploying Packages
•Running Services
•Recap & Conclusion
Intro & Background
Dropbox Backend Infrastructure:
Something one might call a “Hybrid Cloud”.
Few datacenters + AWS VPCs + Edge Network (POPs).
Running Ubuntu Server, Puppet/Chef and Nagios.
Rest of the stack is pretty custom.
Dropbox today is not just “file storage”,
but dozens of services,
running on tens of thousands of machines.
I. Building Code
Early days: few code repos, mostly Python.
No build system.
Period.
Why Bother?
Ruby, Python, Java, C/C++, Node.js, Rust, Go, PHP
Exhibit 1: Runtimes & Packages (ಠ_ಠ)
Problems:
Repo is growing, new languages are in use: Golang, Node.js, Rust.
No way to track dependencies; dependencies are installed at runtime via Puppet.
Global Encap repo is deployed via rsync onto the whole fleet.
In search of a better build system
What are the requirements?
• Fast
• Reproducible
• Hermetic
• Flexible
• Explicit dependencies
A Historical Perspective*
•(2006) Google got annoyed with Make and began “Blaze”
•(2012) Looks like ex-googlers at Twitter were missing “Blaze”, hence began “Pants”
•(2013) Looks like ex-googlers at Facebook were missing “Blaze”, hence began “Buck”
•(2014) Google realised what’s going on and released “Blaze” as “Bazel”
•(2016) Ex-googlers at Thought Machine are still missing “Blaze”, hence began “Please”, in Go this time :)
Bazel Concepts
•WORKSPACE: one per repo, defines external dependencies
•BUILD files: Python-like DSL for describing build targets (a test is also a build target)
•`*.bzl` files: macros and extensions
•Labels such as `//dropbox/aws:ec2allocate` specify build targets
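To make the label syntax concrete, here is a minimal sketch of how an absolute label splits into a repository, a package and a target name. The `parse_label` helper and the `Label` tuple are hypothetical names for illustration, not a Bazel API, and relative labels such as `:client` are deliberately not handled. It also applies Bazel's shorthand in which `//foo/bar` means `//foo/bar:bar`.

```python
from typing import NamedTuple


class Label(NamedTuple):
    repo: str      # external repository name, "" for the main workspace
    package: str   # directory that contains the BUILD file
    target: str    # target name inside that package


def parse_label(label: str) -> Label:
    """Split an absolute Bazel label into (repo, package, target)."""
    repo = ""
    if label.startswith("@"):
        repo, sep, rest = label[1:].partition("//")
        if not sep:
            raise ValueError(f"missing '//' in label: {label!r}")
    elif label.startswith("//"):
        rest = label[2:]
    else:
        raise ValueError(f"only absolute labels are handled: {label!r}")
    package, sep, target = rest.partition(":")
    if not sep:
        # Shorthand: "//foo/bar" means "//foo/bar:bar".
        target = package.rsplit("/", 1)[-1]
    if not target:
        raise ValueError(f"empty target name in label: {label!r}")
    return Label(repo, package, target)
```

For example, `parse_label("//dropbox/aws:ec2allocate")` yields the package `dropbox/aws` and the target `ec2allocate`, while `parse_label("@six_archive//:six")` yields the repository `six_archive` with an empty (root) package.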
native.new_http_archive(
    name = "six_archive",
    urls = [
        "http://pypi.python.org/.../six-1.10.0.tar.gz",
    ],
    sha256 = "…",
    strip_prefix = "six-1.10.0",
    build_file = str(Label("//third_party:six.BUILD")),
)
External Dependencies(1)
py_library(
    name = "six",
    srcs = ["six.py"],
    visibility = ["//visibility:public"],
)
External Dependencies(2)
py_library(
    name = "platform_benchmark",
    srcs = ["platform/benchmark.py"],
    deps = [
        ":client",
        ":platform",
        "@six_archive//:six",
    ],
)
External Dependencies(3)
Bazel adoption at Dropbox
•Migration started in July, 2015
•~6,400 Bazel BUILD files (~314,094 lines)
•~9,000 lines of custom *.bzl code
•Custom rules for: Python, Go, Rust, Node.js
•BUILD file generator for CMake, Python
•Mostly done, still work in progress …
Migration Status
Key Insights
•A robust remote build cache is essential.
•Keep explicit dependencies between components.
•It is possible to retrofit a new build system into an old codebase.
•Bazel, Pants, Buck, Please: pick one, or write your own :)
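One way to see why explicit dependencies and a remote cache go together: when every input of a build step is declared, a cache key can be derived purely from the command line and the content of those inputs. The sketch below illustrates that idea only; it is not Bazel's actual key format, and `action_cache_key` and its parameters are hypothetical names.

```python
import hashlib
from typing import Mapping, Sequence


def action_cache_key(command: Sequence[str], inputs: Mapping[str, bytes]) -> str:
    """Derive a cache key from a command line and the content of its declared inputs."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode("utf-8") + b"\0")
    h.update(b"\x01")  # separator between the command and the input list
    # Sort by path so the key does not depend on mapping order.
    for path in sorted(inputs):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()
```

Two machines that run the same command on byte-identical inputs compute the same key and can share the cached output; an input that is read but never declared would silently fall outside the key, which is the failure mode that hermetic builds are meant to rule out.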
II. Deploying Packages
Deployment System: YAPS
Service Configuration: Gestalt
Pystachio is used to specify the following schema:
class Project(Struct):
    name = Required(String)
    owner = Required(String)
    deployments = Required(List(Deployment))

class Deployment(Struct):
    name = Required(String)
    build = Required(Build)
    kick = Required(Kick)
    dependencies = List(Dependency)
Service Configuration: Gestalt
class Build(Struct):
    name = Required(String)
    bazel_targets = Required(List(String))
    timeout_s = Default(Integer, 3600)

class Kick(Struct):
    name = Required(String)
    package_dir = Required(String)
    dbxinit = Required(Program)
    host_kick_ration = Default(Float, 0.25)
Service Configuration: Gestalt
class Program(Struct):
    name = Required(String)
    num_procs = Default(Integer, 1)
    env = Map(String, String)
    cmd = Required(List(String))
    limits = Limits  # rss, max_fd, nice
    logfile = Default(String, DEFAULT_LOG_DIR)
    root_fs = String  # docker os image
    health_check = HealthCheck
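To illustrate what the Required and Default markers mean in practice, the sketch below restates a small part of the schema with standard-library dataclasses instead of Pystachio, so that it runs without extra packages. The field names come from the slides; the `validate` helper, its rules and the sample values are assumptions made for illustration and are not part of Gestalt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Program:
    name: str                      # Required(String)
    cmd: List[str]                 # Required(List(String))
    num_procs: int = 1             # Default(Integer, 1)
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class Build:
    name: str                      # Required(String)
    bazel_targets: List[str]       # Required(List(String))
    timeout_s: int = 3600          # Default(Integer, 3600)


def validate(build: Build, program: Program) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not build.bazel_targets:
        problems.append("build.bazel_targets must not be empty")
    for target in build.bazel_targets:
        if not target.startswith("//"):
            problems.append(f"not an absolute Bazel label: {target}")
    if build.timeout_s <= 0:
        problems.append("build.timeout_s must be positive")
    if not program.cmd:
        problems.append("program.cmd must not be empty")
    if program.num_procs < 1:
        problems.append("program.num_procs must be at least 1")
    return problems


# Hypothetical service definition; omitted fields fall back to their defaults.
build = Build(name="aws-tools", bazel_targets=["//dropbox/aws:ec2allocate"])
program = Program(name="ec2allocate", cmd=["./ec2allocate"])
```

Calling `validate(build, program)` on the sample above returns an empty list; a linter of this kind is one cheap way to catch mistakes before a configuration reaches the deployment system.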
Service Configuration: Gestalt
•About 500 files and 60,000 SLOC
•Complex evaluation rules
•Configuration tends to become a Turing-complete language
•Advanced linters and validation needed
•Specifying resource limits is tricky
Gestalt: Challenges
YAPS Packages
YAPS Packages: Historical approach
•Install Debian packages via Puppet/Chef
•Use Python’s Virtualenv & PyPI
•Encap: “bag of rats” dependencies :)
•Blast the whole repo via rsync every few minutes from cron
YAPS Packages: Current approach
•SquashFS images, with native in-kernel support on Linux
•Transparent compression and de-duplication
•Read-only mounts, +1 for security
•Loopback device mounts are fast
•Each SquashFS image contains one or more Bazel targets and the transitive dependency closure of each target
$ cd /srv/aws-tools
$ tree -L 3
.
|-- ec2terminate              # <- executable file
`-- ec2terminate.runfiles     # <- transitive closure
    |-- MANIFEST              # <- list of all files
    `-- __main__              # <- dependencies
        |-- _solib_k8
        |-- configs
        |-- dbops
        |-- devsecrets
        |-- dpkg
        `-- dropbox
...
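The "transitive dependency closure" packed into each image can be pictured as a graph walk that starts at the deployed target. The sketch below assumes a hypothetical in-memory `DEPS` map, loosely modeled on the `platform_benchmark` example earlier in the talk; the `//app` package name is invented. In a real workspace Bazel computes this itself, for instance through `bazel query 'deps(...)'`.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: label -> direct dependencies.
DEPS: Dict[str, List[str]] = {
    "//app:platform_benchmark": ["//app:client", "//app:platform", "@six_archive//:six"],
    "//app:client": ["@six_archive//:six"],
    "//app:platform": [],
    "@six_archive//:six": [],
}


def transitive_closure(target: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the target plus everything reachable through its dependencies."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        label = stack.pop()
        if label in seen:
            continue  # already visited, also guards against cycles
        seen.add(label)
        stack.extend(deps.get(label, []))
    return seen
```

Everything in the returned set has to be present in the package for the executable to run on a machine that has nothing else installed.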
YAPS Packages: Challenges
•*.pyc files have to be in the package
•Packages cannot be unmounted while file descriptors are still open
•If code has to be modified on the prod server (YOLO), a special procedure called “Hijacking” is required
•The full package has to be pushed even for a one-line change (Xdelta compression might help)
III. Running Services
Process Manager: Historical approach
•Using Supervisord with configuration generated by Puppet
•Updating Supervisord requires tasks to be restarted
•Tasks are lost if Supervisord is killed by the OOM killer
•Supervisord is really old, from 2004 (has XMLRPC?!)
Process Manager: Current approach
•Using Dbxinit: in-house project written in Go
•Keeps local state, so it can be updated without task downtime and can survive OOM
•Supports health checks for tasks
•Has resource limits: RSS, max fds, OOM score
•Speaks JSON over HTTP
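As a rough illustration of per-task resource limits, the sketch below starts a child process with a capped number of file descriptors using the standard `resource` and `subprocess` modules on Linux. This is not how Dbxinit is implemented (it is written in Go and its code is not shown in the talk); `run_with_fd_limit` is a hypothetical helper.

```python
import resource
import subprocess
import sys
from typing import List


def run_with_fd_limit(cmd: List[str], max_fds: int) -> subprocess.CompletedProcess:
    """Run cmd with its soft RLIMIT_NOFILE lowered to at most max_fds."""

    def apply_limits() -> None:
        # Runs in the child between fork() and exec().
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        soft = max_fds if hard == resource.RLIM_INFINITY else min(max_fds, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limits, capture_output=True, text=True)


# Probe command: a child interpreter that prints its own soft fd limit.
PROBE = [
    sys.executable,
    "-c",
    "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])",
]
```

Running `run_with_fd_limit(PROBE, 64)` shows the limit as seen from inside the child. A process manager would apply RSS and OOM-score settings in a similar per-task fashion, though the exact mechanism is an assumption here.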
Configuration Management: Historical approach
•Puppet 2.x in server mode
•Performance problems with the server as the fleet grew
•No linters or unit tests, which caused a lot of errors
•“Blast to the fleet” deployment model
•Single global run via cron that runs all modules: slow
Configuration Management: Current approach
•Chef 12.x in Zero mode
•Invested heavily into linters and unit-testing
•Easy to test on a single production machine
•Has 3 runs: “platform”, “global” and “service”
•Cookbooks deployed via YAPS
•Generally trying to move service owners out of CM
It wouldn’t be 2017 if not … containers!
Containers: Runc for stateless services
•Runc is integrated with Dbxinit; each task runs inside its own container
•Runc uses a minimal Ubuntu Docker image
•Main use case is dependency isolation via mount namespaces
•Doesn’t use network namespaces yet
Containers: Challenges
•Log rotation. Logs should be moved off the box ASAP, since a machine running a stateless service can be shut down without notice
•Looking into the ELK stack to solve that problem
•Resource accounting. Currently doesn’t enforce any resource limits
Human vs. Automation
BRAINS!
Attitude
Ops Automation: Nagios & Naoru
•Nagios runs on all production machines & AWS EC2 instances
•Common problems are automatically fixed by an auto-remediation system called “Naoru”
•Its input is a stream of Nagios alerts and its output is a set of remediations that can be executed automatically
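The alert-to-remediation flow could look roughly like the sketch below. Naoru's real interface is not shown in the talk, so the `Alert` fields, the `REMEDIATIONS` table, the check names and the command strings are all invented for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class Alert(NamedTuple):
    host: str
    check: str     # name of the Nagios check that fired
    state: str     # e.g. "WARNING" or "CRITICAL"


# Hypothetical mapping from check name to a remediation command template.
REMEDIATIONS: Dict[str, str] = {
    "disk_full": "clean-tmp --host {host}",
    "service_down": "restart-service --host {host}",
}


def plan_remediations(alerts: Iterable[Alert]) -> List[str]:
    """Turn a stream of alerts into de-duplicated remediation commands."""
    planned: List[str] = []
    seen = set()
    for alert in alerts:
        template = REMEDIATIONS.get(alert.check)
        if alert.state != "CRITICAL" or template is None:
            continue  # unknown or non-critical alerts are left for a human
        command = template.format(host=alert.host)
        if command not in seen:
            seen.add(command)
            planned.append(command)
    return planned
```

Keeping the mapping as plain data makes it easy to review which problems are considered safe to fix without a human in the loop.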
Talk Summary
Building:
•Unified build system with clean dependencies
Deploying:
•One deployment system and sound packaging
Running:
•Robust process management and automation of simple tasks
Thank You!
{twttr: @leo_vsl}

What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Último (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Building, Deploying and Running Production Code at Dropbox

  • 2. Building, deploying and running production code at Dropbox Leonid Vasilyev, SRE at Dropbox. United Dev Conf ’17
  • 3. •Intro & Background •Building Code •Deploying Packages •Running Services •Recap & Conclusion
  • 6. Dropbox Backend Infrastructure: Something one might call a “Hybrid Cloud”: a few datacenters + AWS VPCs + an Edge Network (POPs). Running Ubuntu Server, Puppet/Chef and Nagios. The rest of the stack is pretty custom. Dropbox today is not just “file storage”, but dozens of services running on tens of thousands of machines.
  • 9. Early days: few code repos, mostly Python. No build system. Period.
  • 12. Problems: The repo is growing and new languages are in use: Golang, Node.js, Rust. No way to track dependencies; dependencies are installed at runtime via Puppet. Global Encap repo deployed via rsync onto the whole fleet.
  • 13. In search of a better build system What are the requirements? • Fast • Reproducible • Hermetic • Flexible • Explicit dependencies
  • 15. A Historical Perspective* •(2006) Google got annoyed with Make and began “Blaze” •(2012) Looks like ex-googlers at Twitter were missing “Blaze”, hence began “Pants” •(2013) Looks like ex-googlers at Facebook were missing “Blaze”, hence began “Buck” •(2014) Google realised what’s going on and released “Blaze” as “Bazel” •(2016) Ex-googlers at Thought Machine are still missing “Blaze”, hence began “Please”, in Go this time :)
  • 16. Bazel Concepts •WORKSPACE: one per repo, defines external dependencies •BUILD files: Python-like DSL for describing build targets (a test is also a build target) •`*.bzl` files: macros and extensions •`//dropbox/aws:ec2allocate`: labels specify build targets
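
The label syntax on slide 16 splits into a package path and a target name. The following is a minimal sketch of that split, assuming only the absolute `//package:target` form plus the common shorthand in which an omitted target defaults to the last package component. `parse_label` and the `Label` tuple are hypothetical helpers written for illustration, not part of Bazel.

```python
from typing import NamedTuple


class Label(NamedTuple):
    package: str  # path from the workspace root, e.g. "dropbox/aws"
    target: str   # name of the rule inside that package's BUILD file


def parse_label(label: str) -> Label:
    """Split an absolute label of the form //package:target (hypothetical helper)."""
    if not label.startswith("//"):
        raise ValueError(f"not an absolute label: {label!r}")
    body = label[2:]
    package, sep, target = body.partition(":")
    if not sep:
        # Shorthand: //foo/bar is read as //foo/bar:bar
        target = package.rsplit("/", 1)[-1]
    if not target:
        raise ValueError(f"empty target name: {label!r}")
    return Label(package, target)


print(parse_label("//dropbox/aws:ec2allocate"))
```

Labels that name an external repository, such as `@six_archive//:six`, are deliberately rejected by this sketch to keep it short.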
  • 17. External Dependencies (1)
        native.new_http_archive(
            name = "six_archive",
            urls = [
                "http://pypi.python.org/.../six-1.10.0.tar.gz",
            ],
            sha256 = "…",
            strip_prefix = "six-1.10.0",
            build_file = str(Label("//third_party:six.BUILD")),
        )
  • 18. External Dependencies (2)
        py_library(
            name = "six",
            srcs = ["six.py"],
            visibility = ["//visibility:public"],
        )
  • 19. External Dependencies (3)
        py_library(
            name = "platform_benchmark",
            srcs = ["platform/benchmark.py"],
            deps = [
                ":client",
                ":platform",
                "@six_archive//:six",
            ],
        )
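
The `sha256` attribute on slide 17 pins the downloaded archive to a known digest, which is one of the things that makes an external dependency reproducible. A rough sketch of that kind of check is shown below. `verify_archive` is a hypothetical helper and the payload is made up; the slides elide the real digest and URL, and this is not how Bazel implements the check internally.

```python
import hashlib


def verify_archive(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError unless data hashes to the pinned digest (hypothetical helper)."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )


# Illustrative payload; a real check would read the downloaded archive's bytes.
payload = b"example archive contents"
pinned = hashlib.sha256(payload).hexdigest()
verify_archive(payload, pinned)  # passes silently when the digest matches
```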
  • 20. Bazel adoption at Dropbox
  • 21. Migration Status •Migration started in July 2015 •~6,400 Bazel BUILD files (~314,094 lines) •~9,000 lines of custom *.bzl code •Custom rules for: Python, Golang, Rust, Node.js •BUILD file generator for CMake, Python •Mostly done, still a work in progress …
  • 22. Key Insights •A robust remote build cache is essential. •Keep explicit dependencies between components. •It is possible to retrofit a new build system onto an old codebase. •Bazel, Pants, Buck, Please: pick one, or write your own :)
  • 25. Service Configuration: Gestalt. Pystachio is used to specify the following schema:
  • 26. Service Configuration: Gestalt
        class Project(Struct):
            name = Required(String)
            owner = Required(String)
            deployments = Required(List(Deployment))

        class Deployment(Struct):
            name = Required(String)
            build = Required(Build)
            kick = Required(Kick)
            dependencies = List(Dependency)
  • 27. Service Configuration: Gestalt
        class Build(Struct):
            name = Required(String)
            bazel_targets = Required(List(String))
            timeout_s = Default(Integer, 3600)

        class Kick(Struct):
            name = Required(String)
            package_dir = Required(String)
            dbxinit = Required(Program)
            host_kick_ration = Default(Float, 0.25)
  • 28. Service Configuration: Gestalt
        class Program(Struct):
            name = Required(String)
            num_procs = Default(Integer, 1)
            env = Map(String, String)
            cmd = Required(List(String))
            limits = Limits  # rss, max_fd, nice
            logfile = Default(String, DEFAULT_LOG_DIR)
            root_fs = String  # docker os image
            health_check = HealthCheck
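
The slides give the schema but no concrete configuration. To show how the structs nest, here is a sketch that mirrors a subset of the fields using stdlib dataclasses in place of Pystachio, so `Required` becomes a mandatory constructor argument and `Default` becomes a default value. The service name, owner and command are invented for illustration; only the field names, the defaults, the label and the `/srv/aws-tools` path appear in the slides, and pairing them this way is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Program:
    name: str
    cmd: List[str]
    num_procs: int = 1
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class Build:
    name: str
    bazel_targets: List[str]
    timeout_s: int = 3600


@dataclass
class Kick:
    name: str
    package_dir: str
    dbxinit: Program
    host_kick_ration: float = 0.25  # field name spelled as on the slide


@dataclass
class Deployment:
    name: str
    build: Build
    kick: Kick


@dataclass
class Project:
    name: str
    owner: str
    deployments: List[Deployment]


# Hypothetical project definition; values are illustrative only.
example = Project(
    name="aws-tools",
    owner="sre-team",
    deployments=[
        Deployment(
            name="prod",
            build=Build(name="build", bazel_targets=["//dropbox/aws:ec2allocate"]),
            kick=Kick(
                name="kick",
                package_dir="/srv/aws-tools",
                dbxinit=Program(name="ec2allocate", cmd=["./ec2allocate"]),
            ),
        )
    ],
)
print(example.deployments[0].build.timeout_s)
```

Omitting a mandatory field, for example `Project(name="aws-tools")`, fails at construction time with a `TypeError`, which is a rough stand-in for the validation that a typed schema provides.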
  • 29. Gestalt: Challenges •About 500 files and 60,000 SLOC •Complex evaluation rules •Configuration tends to become a Turing-complete language •Advanced linters and validation needed •Specifying resource limits is tricky
  • 31. YAPS Packages: Historical approach •Install Debian packages via Puppet/Chef •Use Python’s Virtualenv & PyPI •Encap: “Bag of rats” dependencies :) •Blast the whole repo via rsync every few minutes from cron
  • 32. YAPS Packages: Current approach •SquashFS images, with native in-kernel Linux support •Transparent compression and de-duplication •Read-only mounts, +1 from security •Loopback device mounts are fast •Each SquashFS image contains one or more Bazel targets plus the transitive dependency closure of each target
  • 33.
        $ cd /srv/aws-tools
        $ tree -L 3
        .
        |-- ec2terminate            # <- executable file
        `-- ec2terminate.runfiles   # <- transitive closure
            |-- MANIFEST            # <- list of all files
            `-- __main__            # <- dependencies
                |-- _solib_k8
                |-- configs
                |-- dbops
                |-- devsecrets
                |-- dpkg
                `-- dropbox
                ...
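
A package carries the transitive dependency closure of each target, meaning everything reachable by following `deps` edges. A minimal sketch of computing such a closure over a dependency graph follows. The graph reuses the labels from slide 19, but the dependencies of `:client` and `:platform` are assumptions made up for the example, and `transitive_closure` is a hypothetical helper rather than anything from Bazel or YAPS.

```python
from typing import Dict, List, Set


def transitive_closure(target: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the target plus everything reachable through its deps edges."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(deps.get(current, []))
    return seen


# Direct deps of :platform_benchmark are from slide 19; the rest is assumed.
graph = {
    ":platform_benchmark": [":client", ":platform", "@six_archive//:six"],
    ":client": ["@six_archive//:six"],
    ":platform": [],
}
print(sorted(transitive_closure(":platform_benchmark", graph)))
```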
  • 34. YAPS Packages: Challenges •*.pyc files have to be in the package •Packages cannot be unmounted while file descriptors are still open •If code has to be modified on the prod server (YOLO), a special procedure called “Hijacking” is required •The full package has to be pushed even for a one-line change (Xdelta compression might help)
  • 36. Process Manager: Historical approach •Using Supervisord with configuration generated by Puppet •Updating Supervisord requires tasks to be restarted •Tasks are lost if Supervisord is killed by the OOM killer •Supervisord is really old, from 2004 (has XMLRPC?!)
  • 37. Process Manager: Current approach •Using Dbxinit: an in-house project written in Go •Keeps local state, thus can be updated without task downtime and can survive OOM •Supports health checks for tasks •Has resource limits: RSS, max fds, OOM score •Speaks JSON over HTTP
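
The talk does not describe how Dbxinit evaluates health checks. One common pattern for a process manager is to restart a task only after several consecutive failed probes, so a single slow response does not cause a restart. The sketch below illustrates that pattern only; `HealthTracker` and its threshold are assumptions, not Dbxinit's behavior.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    max_failures: int = 3  # assumed threshold, not taken from the talk
    failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True when the task should be restarted."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting afresh after a restart
            return True
        return False


tracker = HealthTracker(max_failures=2)
for probe in [True, False, True, False, False]:
    print(probe, tracker.record(probe))
```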
  • 38. Configuration Management: Historical approach •Puppet 2.x in server mode •Perf problems with server as fleet grew in size •No linters or unit tests, caused a lot of errors •“Blast to the fleet” deployment model •Single global run via CRON, runs all modules - slow
  • 39. Configuration Management: Current approach •Chef 12.x in Zero mode •Invested heavily into linters and unit-testing •Easy to test on a single production machine •Has 3 runs: “platform”, “global” and “service” •Cookbooks deployed via YAPS •Generally trying to move service owners out of CM
  • 40. It wouldn’t be 2017 if not … containers!
  • 41. Containers: Runc for stateless services •Runc is integrated with Dbxinit, each task runs inside its own container •Runc uses minimal Ubuntu Docker image •Main use case is dependency isolation via mount namespaces •Doesn’t use network namespaces yet
  • 42. Containers: Challenges •Log rotation. Logs should be moved off the box ASAP, since a machine running a stateless service can be shut down without notice •Looking into the ELK stack to solve that problem •Resource accounting. Currently doesn’t enforce any resource limits
  • 45. Ops Automation: Nagios & Naoru •Nagios runs on all production machines & AWS EC2 instances •Common problems are automatically fixed by an auto-remediation system called “Naoru” •Its input is a stream of Nagios alerts and its output is a set of remediations that can be executed automatically
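
Naoru is described only by its interface: alerts in, remediations out. A minimal sketch of that shape is given below, assuming a static lookup from check name to action and de-duplication per host. The check names, action names and the `remediate` function are all hypothetical; the talk does not say how Naoru chooses remediations.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Set, Tuple


class Alert(NamedTuple):
    host: str
    check: str


class Remediation(NamedTuple):
    host: str
    action: str


# Hypothetical mapping from alert type to an automatic fix.
PLAYBOOK: Dict[str, str] = {
    "disk_full": "rotate_logs",
    "process_down": "restart_service",
}


def remediate(
    alerts: Iterable[Alert], playbook: Dict[str, str] = PLAYBOOK
) -> Iterator[Remediation]:
    """Turn a stream of alerts into remediations, one per (host, action)."""
    seen: Set[Tuple[str, str]] = set()
    for alert in alerts:
        action = playbook.get(alert.check)
        if action is None:
            continue  # unknown problem: leave it for a human
        key = (alert.host, action)
        if key in seen:
            continue  # already scheduled for this host
        seen.add(key)
        yield Remediation(alert.host, action)


stream = [
    Alert("web1", "disk_full"),
    Alert("web1", "disk_full"),
    Alert("db1", "replication_lag"),
    Alert("web2", "process_down"),
]
print(list(remediate(stream)))
```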
  • 47. Talk Summary Building: •Unified build system with clean dependencies Deploying: •One deployment system and sound packaging Running: •Robust process management and automation of simple tasks