Scaling Marketplaces at Thumbtack

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2HOCiRw.

Nate Kupp shares some of Thumbtack’s key learnings on their journey to scale: from a PHP/PostgreSQL monolith with a self-managed Hadoop cluster, to Dockerized microservices paired with managed/serverless data infrastructure, and their future with fully-managed systems. Filmed at qconsf.com.

Nate Kupp leads the Technical Infrastructure team at Thumbtack where he has worked on scaling both engineering teams and infrastructure to support Thumbtack's rapidly growing marketplaces. Before joining Thumbtack, he spent a few years at Apple, where he built out data infrastructure for iOS and watchOS battery life analytics.

Published in: Technology

Scaling Marketplaces at Thumbtack

  1. Scaling Marketplaces at Thumbtack. QCon SF 2017. Nate Kupp, Technical Infrastructure (Data Eng, Experimentation, Platform Infrastructure, Security, Dev Tools).
  2. InfoQ.com: News & Community Site
     • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month
     • 250,000 senior developers subscribe to our weekly newsletter
     • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
     • Post content from our QCon conferences
     • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture, and The Engineering Culture Podcast, with a focus on building
     • 96 deep dives on innovative topics packed as downloadable emags and minibooks
     • Over 40 new content items per week
     Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/thumbtack-scalability
  3. Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation. Strategy: a practitioner-driven conference designed for YOU, influencers of change and innovation in your teams; speakers and topics driving the evolution and innovation; connecting and catalyzing the influencers and innovators. Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide. Presented at QCon San Francisco, www.qconsf.com.
  4. Infrastructure from early beginnings
  5. “You see that?” my dad would say. “That won’t last more than a few years. They’re going to have to rip the whole thing out within a decade.”
  6. You have choices. A lot of them.
  7. There’s no right answer. Sorry.
  8. Me, hardware to software: semiconductor manufacturing research; Apple, iOS battery life; Data Infrastructure @ Thumbtack; Technical Infrastructure @ Thumbtack (data, core platform, A/B testing & dev tools). Image caption: “Yup, those are working transistors.”
  9. Growth Hacking with Team Philippines. Diagram: a customer (“I need a plumber!”) and Pro A, Pro B, Pro C. Spam detection: is this a spam request? “Quoting service”: can we automate quoting? Matching: which pros do we invite to quote?
  10. Hacky (especially in the early days) is fine, even necessary
  11. Hacky is going to happen. With or without you.
  12. “It has long been an axiom of mine that the little things are infinitely the most important.” - Sherlock Holmes. Diagram: “Small data” (CSV files, upload web UI, analysts) and “Big data” (Hive/Impala).
  13. Change doesn’t happen in a vacuum. Give people a way to get things done while you build or migrate.
  14. Growth @ Thumbtack. 2009: Thumbtack founded. 2014: raise $130M. Dec. 2014: I join Thumbtack. Late 2015: starting to hit “real” scale. Now: continued growth.
  15. The Evolution of Thumbtack’s Infrastructure. 2009 - 2014: the monolith. 2015: finally on AWS! Started using DynamoDB, Golang microservices. Mid-2015: early data infrastructure. Diagram labels: Website, Go services, Sqoop ETL, HDFS, Event Logging.
  16. The Evolution of Thumbtack’s Infrastructure. Today: mixed clouds, fully-containerized infrastructure, Terraform.
     • Production Serving Infrastructure: Application Layer (Docker on ECS): PHP, Go, Scala. Storage Layer: PostgreSQL, DynamoDB, Elasticsearch.
     • Data Infrastructure: Processing: Scala/Spark on Dataproc. Storage: GCS. SQL: BigQuery.
  17. Production Serving Infrastructure.
     • Application Layer: Website, Go services (ECS)
     • Storage Layer: pgbouncer; master + hot standbys; latency cluster, throughput cluster, ..., Sqoop cluster; WAL-E backups; EBS + ZFS; DynamoDB
     • Offline Data Processing: Sqoop ETL, DynamoDB ETL, indexing pipelines, event logging, PG logging
  18. Data Infrastructure. Diagram labels: Application Layer; Thrift/JSON events; Ingest (Cloud Pub/Sub); Storage (Cloud Storage); Analytics (BigQuery); Spark ETL (Cloud Dataproc); Sqoop (Cloud Dataproc); full table Sqoop ETL; Pipeline Orchestration (Airflow); pipelines; search indexing pipelines; ML pipelines.
     • Batch ETL (Cloud Dataproc): job-scoped clusters on Dataproc 1.1 / Spark 2.0.2, Scala
     • Data Retention: data retained forever on GCS / BigQuery
     • Custom DynamoDB ETL: internal DynamoDB ETL pipelines, scales up/down read capacity
  19. And it was great!
     • Costs comparable to monolithic cluster
     • Creating and destroying > 600 clusters per day
     • > 12,000 BigQuery queries per workday
  20. But, getting there took some real work. Diagram labels: HDFS; Storage (Cloud Storage); Analytics (BigQuery). Code changes:
     • Add retries everywhere (many 500/503 error codes)
     • Rewrite HDFS utilities
     • Change all Parquet to Avro
     • Educate team (100s of users) on new SQL dialect and web interface
  21. And, still needed to build interfaces to managed services. Diagram labels: Website, Go services; data ingest (Cloud Pub/Sub); streaming pipeline (Spark Streaming on Cloud Dataproc); Analytics (BigQuery).
     • Spark Streaming: wrote custom Spark receiver for Cloud Pub/Sub, checkpointing WAL to local HDFS, no backpressure (yet)
     • End-to-end Durability: built internal solutions; achieving end-to-end durability guarantees here is hard
     • BigQuery Streaming APIs: streaming writes of O(100s of millions) of events / day into BigQuery tables
  22. “Here be dragons”
     • (+) Uptime: GCP uptime has actually been great.
     • (-) Unexpected BigQuery API changes: Google occasionally makes production changes which break us. When these happen, we can’t really do anything until fixed by Google or a workaround is identified.
     • April 2017, BigQuery Avro ingest API changes: previously, a field marked as required by the Avro schema could be loaded into a table with the field marked nullable; this started failing.
     • October 2017, BigQuery sharded export changes: noticed many hung Dataproc clusters. Team identified a workaround: disable BQ sharded export by setting mapred.bq.input.sharded.export.enable to false.
  23. Managed services make some things easier… But only some things. Go in with eyes wide open.
  24. Our first managed service: DynamoDB. Diagram labels: Website, Go services.
  25. DynamoDB table proliferation
  26. GCP Migration & BigQuery. Why? A few reasons: (1) fully-managed, serverless, petabyte-scale data warehouse; (2) nested/complex data types; (3) security & access controls; (4) self-serve interface to Google Sheets.
  27. Surprise dependencies. Exception: BigQuery job failed. Final error was: { 'reason': 'invalid', 'message': "Could not parse '.' as int for field category_id (position 0) starting at location 340 ", 'location': '/gdrive/home/Category taxonomy - master file' }
  28. When you make it really easy to deploy infrastructure... you get A LOT of random infrastructure
  29. What does that mean for Thumbtack & Serverless?
  30. Hacky is going to happen. With or without you.
  31. Empower the team while limiting the wild west.
  32. QUESTIONS?
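
The slide on migration work lists "Add retries everywhere (many 500/503 error codes)" among the code changes. The slides do not show how Thumbtack implemented this, so the following is only a minimal sketch of one common approach: retrying on HTTP 500/503 with capped exponential backoff and jitter. The names `call_with_retries`, `TransientError`, and `RETRYABLE_STATUS`, as well as the attempt count and delays, are hypothetical and chosen for illustration.

```python
import random
import time

# Assumption: only the two status codes named on the slide are treated as retryable.
RETRYABLE_STATUS = {500, 503}


class TransientError(Exception):
    """Raised when every attempt ended in a retryable status (hypothetical helper)."""

    def __init__(self, status):
        super().__init__(f"gave up after repeated HTTP {status} responses")
        self.status = status


def call_with_retries(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call request() until it returns a non-retryable status.

    request is a zero-argument callable returning a (status, body) tuple.
    The sleep function is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt == max_attempts:
            raise TransientError(status)
        # Exponential backoff capped at max_delay, with full jitter so that
        # many clients retrying at once do not synchronize.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
```

A caller would wrap each request to the managed service in a small function and pass it in, for example `call_with_retries(lambda: client_call())`, where `client_call` stands in for whatever function issues the request. Retrying blindly is only safe for idempotent operations; writes that are not idempotent need deduplication on top of this.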