Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously

(Henri Cai, Pinterest) Kafka Summit SF 2018

With the rise of large-scale real-time computation, there is a growing need to link legacy MySQL systems with real-time platforms. Pinterest has a hundred billion pins stored in MySQL at the scale of 100TB, and most of this data is needed for building data-driven products for machine learning and data analytics.

This talk discusses how Pinterest designed and built a continuous database (DB) ingestion system for moving MySQL data into near-real-time computation pipelines with only 15 minutes of latency, to support our dynamic personalized recommendations and search indices. Pinterest helps people discover and do things that they love. We have billions of core objects (pins/boards/users) stored in MySQL at the scale of 100TB. All this data needs to be ingested onto S3/Hadoop for machine learning and data analytics. As Pinterest moves toward real-time computation, we face stringent service-level agreement requirements, such as making the MySQL data available on S3/Hadoop within 15 minutes and serving the DB data incrementally in stream processing. We designed Watermill, a continuous DB ingestion system that listens for MySQL binlog changes, publishes the MySQL changelogs as an Apache Kafka® change stream, and ingests and compacts the stream into Parquet columnar tables in S3/Hadoop within 15 minutes.

We would like to share how we solved the problems of:
- Scalable data partitioning and an efficient compaction algorithm
- Stories on schema migration, rewind and recovery
- PII (personally identifiable information) processing
- Columnar storage for efficient incremental queries
- How the DB change stream powers other use cases, such as cache invalidation across multiple datacenters
- How we deal with S3 eventual consistency and rate limiting

Related technologies: Apache Kafka, stream processing, MySQL binlog processing, Amazon S3, Hadoop and Parquet columnar storage.
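A minimal sketch of what one message on such a change stream might look like, in plain Python. The field names (`gtid`, `table`, `op`, `row`) are illustrative assumptions, not Watermill's actual message schema, which the talk does not specify; the point is that keying the Kafka record by the row's primary key keeps all changes to one row ordered on one partition.

```python
import json

def change_event(gtid: str, table: str, op: str, row: dict) -> dict:
    """One MySQL binlog change rendered as a changelog message (illustrative schema)."""
    return {"gtid": gtid, "table": table, "op": op, "row": row}

# An UPDATE to pin 123 becomes one Kafka record. Using the primary key as the
# record key routes every change for that pin to the same partition, in order.
event = change_event("server-uuid:42", "pins", "UPDATE", {"id": 123, "title": "cats"})
record_key = str(event["row"]["id"])
record_value = json.dumps(event)
```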

Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously

1. Moving the needle of the Pin: streaming 100TB of pins from MySQL to S3/Hadoop continuously @ Pinterest. Henry Cai, Oct 2018. www.linkedin.com/in/hecai
2. Pinterest is the visual discovery engine. Mission: help people discover and do what they love.
3. >250M monthly active users. 80% of Pinners use Pinterest from mobile. 75% of signups are from outside the U.S. 100B Pins and 2B Boards.
4. Data-driven products: personalized recommendation, spam control, search quality, A/B experiments, Related Pins, …
5. Data pipeline stats: >1PB data/day, >10M messages/second, >800B messages/day, >2,000 Kafka brokers, >50,000 client hosts.
6. Data ingestion types: online logging; database snapshots.
7. The 2016 pipeline.
8-13. Data ingestion @ Pinterest, 2016 (diagram built up over six slides): Pinterest services emit events through Singer into Kafka; real-time consumers and Merced read from Kafka; databases are exported through a logical backup path into Tracker.
14. DB ingestion @ Pinterest, Version 1: each MySQL shard (master, slave, DR slave) is dumped with mysqldump and processed by its own Hadoop Streaming mapper.
15. DB ingestion @ Pinterest, Version 2: databases additionally produce logical CSV backups consumed by Tracker, alongside the Version 1 mysqldump / Hadoop Streaming path.
16. Pain points and constraints: reliability problems caused by MySQL host hiccups; pulling over 100TB of data daily even though only a few TB change every day; long latency (>24 hours). Future: DB change streams, which truly capture DB transactions and enable cross-region cache invalidation, real-time search index building and a real-time recommendation engine.
17. The new pipeline.
18-22. Data ingestion @ Pinterest now (diagram built up over five slides): Pinterest services emit events through Singer into Kafka; databases feed Kafka through a DB/Kafka bridge; Merced, Watermill and real-time consumers read from Kafka.
23. DB/Kafka bridge (Maxwell): the bridge component highlighted in the diagram.
24. DB/Kafka bridge: on each replica-set node, the MySQL binlog file covers the user tables (shard1, shard2, shard3) and Maxwell's own tables (maxwell_position, maxwell_schema).
25. DB/Kafka bridge internals (co-located with the MySQL process): a binlog tailer thread feeds an in-memory queue, which is drained by an async Kafka producer thread publishing to per-table topics (e.g. a Kafka user topic and pin topic). Based on Maxwell / binlog-connector; added GTID support; added handling for retry and out-of-order messages; co-located with MySQL; listens on master or slave.
26. Watermill compaction: the compaction component highlighted in the diagram.
27. Compaction for one shard (delta + old snapshot → compactor → new snapshot): a hash join between snapshot and delta. The delta is loaded into memory first as a side lookup; the base snapshot is piped through the mapper node and compared against the lookup table. If the lookup fails, the snapshot record is emitted to the output; if the lookup succeeds but the snapshot record is older, the snapshot record is skipped; if the lookup succeeds and the snapshot record is newer, the lookup record is removed. At the end, the remaining lookup records are appended to the output.
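The compaction rule above can be sketched in a few lines of Python. This is an illustrative stand-in, not Watermill's implementation: rows are dicts with a hypothetical `id` primary key and a `seq` field standing in for the binlog position, and delete tombstones are omitted for brevity.

```python
def compact(snapshot, delta):
    """Hash join: merge a base snapshot with a delta of newer rows, keyed by primary key."""
    lookup = {row["id"]: row for row in delta}  # delta held in memory as the side lookup
    out = []
    for row in snapshot:                        # base snapshot streams through
        d = lookup.get(row["id"])
        if d is None:
            out.append(row)                     # lookup fails: emit the snapshot record
        elif row["seq"] >= d["seq"]:
            out.append(row)                     # snapshot record newer: remove the lookup record
            del lookup[row["id"]]
        # else: snapshot record older, skip it; the delta row survives in `lookup`
    out.extend(lookup.values())                 # append the remaining lookup records
    return out
```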
28-33. Incremental DB ingestion sequence (diagram built up over six slides): MySQL binlogs are tailed by Maxwell into Kafka; Merced lands the stream as delta files; periodic compaction merges Snapshot1 with the delta into Snapshot2; Tracker's batch backup produces a backup snapshot that a bootstrapper turns into an initial snapshot, and a differ compares compaction output against it; a periodic file GC cleans up old files; queries such as SELECT FROM rt_users read the data through a custom input format.
34-44. Data lifecycle and timeline management (example timeline built up over eleven slides): a daily dump at 11:30 is bootstrapped into a snapshot at 11:55; Merced lands a delta at 12:01; compaction produces a snapshot at 12:10; a SELECT at 12:15 reads the current snapshot, with a "processed up to" marker on the timeline. Meanwhile, a second daily dump at 11:45 is bootstrapped into a snapshot at 12:20, which a SELECT at 12:25 picks up as the new current snapshot, and the next compaction continues from there. Periodic GC removes artifacts that are no longer the current snapshot, while retained snapshots make rewind possible.
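A toy model of the snapshot-selection and GC decisions in the timeline above, assuming each snapshot records when it became ready. The field names and the keep-two-snapshots retention policy are illustrative assumptions, not the talk's actual policy; times are minutes past midnight.

```python
def current_snapshot(snapshots, now):
    """Pick the newest snapshot that is already complete at `now`."""
    done = [s for s in snapshots if s["ready_at"] <= now]
    return max(done, key=lambda s: s["ready_at"]) if done else None

def gc_candidates(snapshots, now, keep=2):
    """Keep the `keep` newest completed snapshots so rewind stays possible; the rest are GC-able."""
    done = sorted((s for s in snapshots if s["ready_at"] <= now),
                  key=lambda s: s["ready_at"], reverse=True)
    return done[keep:]

# The timeline from the slides: bootstrap at 11:55, compaction at 12:10, bootstrap at 12:20.
timeline = [{"name": "bootstrap-1155", "ready_at": 715},
            {"name": "compacted-1210", "ready_at": 730},
            {"name": "bootstrap-1220", "ready_at": 740}]
```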
45. Consistency: MySQL master/slave failover and shard migration; MySQL transactions split between tables and between Kafka messages; ordering between INSERT and UPDATE, and between UPDATE and DELETE; soft DELETE vs. hard DELETE; consistency between multiple bootstrap and incremental streams; duplicate records.
46. Scalability (10X): partitioning. For sharded MySQL: shard-based DB snapshot and delta files, with two-level sharding when the original shards are not balanced. For an unsharded dataset: use hash + mod to partition the data in both the snapshot and the delta files. File filtering using predicate pushdown: at the shard/partition level, and at the S3 directory, file and record level.
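For the unsharded case, hash + mod partitioning and the partition-level predicate pushdown might look like the following sketch. The CRC32 hash, the partition count, and the partition-to-file mapping are illustrative assumptions; the key property is that snapshot and delta files are partitioned identically, so compaction and key lookups only touch matching partitions.

```python
import zlib

def partition_of(pk: int, num_partitions: int = 64) -> int:
    """hash + mod: the same partitioning applies to snapshot and delta files."""
    return zlib.crc32(str(pk).encode()) % num_partitions

def files_to_read(query_keys, part_to_file, num_partitions=64):
    """Predicate pushdown at the partition level: only open files whose
    partition could contain one of the queried primary keys."""
    wanted = {partition_of(k, num_partitions) for k in query_keys}
    return [path for part, path in part_to_file.items() if part in wanted]
```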
47. Kafka nuances. Message ordering: an async producer is used, but message order still needs to be maintained, both between S3 files and within an S3 file. At-least-once delivery means duplicate messages, and MySQL GTIDs are not always increasing. Dealing with Kafka cluster hiccups: producer acks=2 and clean leader election.
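Consumer-side duplicate handling under at-least-once delivery could look like this sketch. Because the slide notes that MySQL GTIDs are not always increasing, it tracks the exact identities already applied rather than a single monotonically advancing high-water mark; the identity tuple and field names are assumptions, not the talk's actual scheme.

```python
def dedupe(messages):
    """Filter re-delivered duplicates out of an at-least-once change stream."""
    seen = set()
    for m in messages:
        ident = (m["gtid"], m["table"], m["pk"])  # assumed unique per change
        if ident in seen:
            continue  # duplicate delivery: skip
        seen.add(ident)
        yield m
```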
48. S3 nuances. Eventual consistency: read-after-write is OK, but not a PUT followed by a LIST. Directory listing is slow: a shorter SLA means more, smaller files, and in early iterations directory listing far outweighed file content reading. Rate limits: launching thousands of mappers would quickly hit the S3 rate limit.
49. PII processing: usernames and email addresses need to be filtered out (e.g. john.doe@abc.com, Justin Bieber), as do IP addresses (e.g. 192.168.0.1).
50. Columnar layout and incremental processing: use the Parquet format to support fast queries on a subset of columns; add ingest_time as a new column to get the incremental results since the last processing.
51. Operation.
52. Bootstrap, synchronize & rewind: the bootstrap and snapshot components highlighted in the ingestion sequence diagram.
53. Bootstrap, synchronize & rewind (cont.): we have the ability to synchronize and rewind in case of software bugs or network glitches; snapshots plus the bootstrap are used to synchronize, and rewind works through the same snapshots/bootstrap mechanism.
54. Schema management and schema change. The schema is used to identify the primary key of the row and to drive Parquet file generation. Dealing with schema change: a new bootstrap is issued on an offline table schema change (e.g. a new_column added to dbname.table_name), while compaction still uses the snapshot schema (which might be old).
55. Validation: compaction is created based on a from/to GTID range, and compaction output is compared against batch backup output (via the differ). Monitoring: errors, failures, stalls, and latency on compaction.
56. Summary.
57-59. Comparison to other technologies (built up over three slides): Uber Hudi (Hoodie) does not support S3 and only supports Java 8+ and Avro; Kafka Connect / Debezium do ingestion only, with no compaction or synchronization between bootstrap and incremental streams; Apache Sqoop is based on batch mode.
60. Takeaway. Scalability: supports 100TB of database data with an end-to-end latency of 15 minutes. Reliability: strong database consistency on global transactions, message ordering and duplicate message handling, plus validation and monitoring. Operability: bootstrap and re-synchronize, and schema management.
61. Future work: adopting the Kafka exactly-once processing model; Kafka as the database change stream (cache invalidation across datacenters, building materialized views for MySQL, generating incremental recommendation signals); open source.
62. Acknowledgement: joint work from many engineers, including Yu Yang, Chunyan Wang, Indy Prentice, Shawn Nguyen, Yinian Qi, and many others.
63. Thanks!
64. © Copyright, All Rights Reserved, Pinterest Inc. 2018
