MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive

MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long-term archival data in cost-effective object storage such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. However, many of them lack robust systems or tools for using large amounts of data effectively to inform decision-making. MongoDB Atlas Data Lake lets organizations analyze their long-term data to discover a wealth of information about their business.

This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.

  1. #MDBLocal “To free the genius within everyone by making data stunningly easy to work with.”
  2. Welcome to the World of Atlas Data Lake
  3. Isabel Peters, Senior Software Engineer, MongoDB Atlas Backup
  4. Why are we building this? “IDC predicts that by 2025 worldwide data will reach 175 zettabytes and 49% of it will reside in the public cloud.”
  5. Atlas Data Lake Technical Deep Dive
     1. Design Goals and Requirements
     2. Creating an Atlas Data Lake
     3. Atlas Data Lake Architecture
     4. Future improvements
  6. Design Goals and Requirements
  7. Implementation Requirements
  8. MongoDB Wire Protocol Support
     Requirement 1: Look and act like MongoDB.
     Solution:
     • Implement a TCP server in Go.
     • Use mongo-go-driver's wireprotocol package.
     • Use mongo-go-driver's bson package.
     • Read-only.
  9. MongoDB Security Model
     Requirement 2: Access customer's data securely.
     Solution:
     • Users configured in MongoDB Atlas
     • Same authentication and authorization
     • Configure buckets
  10. Scalable Processing
      Requirement 3: Handle long-running queries over vast amounts of data while using resources efficiently.
      Solution:
      • Read-only commands
      • Use the server's aggregation engine
      • Distributed MQL processing
      • Intelligent file targeting
  11. Data Formats
      Requirement 4: Support a variety of data formats.
      Solution:
      • Avro (gzipped)
      • Parquet
      • BSON/JSON (gzipped)
      • CSV/TSV (gzipped)
  12. Atlas Data Lake Features
      • Multiple data formats
      • Scalable
      • MongoDB Query Language
      • Serverless
      • On demand
      • Integrated with Atlas
  13. Creating your Atlas Data Lake
  14. Files in S3 bucket ent-archive:
      /archive/customers/
        a-m.json
        n-z.json
      /archive/invoices/
        2019/
          1.parquet
          2.parquet
        2018/
          1.parquet
        2017.json.gz
        2016.json.gz
  15. You control your data layout: stores, databases, collections, and data sources. (Diagram: one database with two collections, backed by three data sources drawn from two stores.)
  16. Data Lake Configuration
      1. Configure a new Data Lake in Atlas
      2. Connect to your Data Lake
      3. Configure your databases and collections
      4. Query your Data Lake
  17. Configuration: S3 Store
      s3: {
        name: "ent-archive",
        bucket: "ent-archive",
        region: "us-east-1",
        prefix: "/archive/"
      }
  18. Configuration: Databases & Collections
      history: {
        customers: [
          { store: "ent-archive", definition: "/customers/*" }
        ],
        invoices: [
          { store: "ent-archive", definition: "/invoices/{year int}/*" },
          { store: "ent-archive", definition: "/invoices/{year int}.json.gz" }
        ]
      }
  19. Querying via MongoDB Atlas
      • Atlas users require the readWriteAnyDatabase or readAnyDatabase role.
      • Use MongoDB drivers and clients, including the mongo shell and MongoDB Compass.
      • Write queries in the MongoDB Query Language (MQL).
  20. Atlas Data Lake Architecture
  21. MQL → Distributed MQL: parse query → parallelize processing → distribute workload
  22. Atlas Data Lake Architecture (Diagram: Atlas Control; control plane, compute plane, and data plane; three groups, each with a DataLake Frontend, a DataLake Agent, and two load balancers.)
  23. Architecture (Diagram: Atlas Control; control plane, compute plane, and data plane; one DataLake Frontend alongside nine DataLake Agents.)
  24. Query Example: $limit
      Query:  { $match: { year: { $gt: 2000 } } }, { $limit: 10 }
      Map:    { $match: { year: { $gt: 2000 } } }, { $limit: 10 }
      Reduce: { $limit: 10 }
  25. Query Example: $group
      Query:    { $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
      Map:      { $group: { _id: "$year", totalAvg_sum: { $sum: "$amount" }, totalAvg_count: { $sum: 1 } } }
      Reduce:   { $group: { _id: "$_id", totalAvg_sum: { $sum: "$totalAvg_sum" }, totalAvg_count: { $sum: "$totalAvg_count" } } }
      Finalize: { $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
  26. Future improvements
  27. On the roadmap:
      • MongoDB operators: $out, $merge, $graphLookup
      • Performance: aggregation, indexes, statistics over data
      • File formats
      • Integrations
  28. Summary
  29. Atlas Data Lake is the best way to:
      • Access long-term data in multiple formats
      • Query long-term data using MQL
      • Analyse long-term data on demand
  30. Give it a try: create your own Atlas Data Lake!
  31. THANK YOU
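Slide 8 says the Data Lake server speaks the MongoDB wire protocol, reusing mongo-go-driver's wireprotocol and bson packages. Every wire protocol message begins with a 16-byte header of four little-endian int32 fields: messageLength (which includes the header), requestID, responseTo, and opCode. The Python sketch below only illustrates reading that framing from a byte stream. read_header and read_message are hypothetical helpers, not part of any driver, and 2013 is the OP_MSG opcode.

```python
import io
import struct

# MongoDB message header: messageLength, requestID, responseTo, opCode,
# each a little-endian signed 32-bit integer.
HEADER = struct.Struct("<iiii")
OP_MSG = 2013  # opcode of the OP_MSG message format


def read_header(stream):
    """Read one 16-byte header from a binary stream and return its four fields."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise EOFError("stream ended before a full header was read")
    return HEADER.unpack(raw)


def read_message(stream):
    """Read one whole message; messageLength counts the header itself."""
    length, request_id, _response_to, op_code = read_header(stream)
    body = stream.read(length - HEADER.size)
    if len(body) != length - HEADER.size:
        raise EOFError("stream ended before the message body was complete")
    return request_id, op_code, body


# A fake 21-byte OP_MSG: 16-byte header plus 5 body bytes.
frame = HEADER.pack(21, 7, 0, OP_MSG) + b"\x00" * 5
print(read_message(io.BytesIO(frame)))  # (7, 2013, b'\x00\x00\x00\x00\x00')
```

On a real connection, `sock.makefile("rb")` yields a buffered stream whose `read(n)` waits for n bytes or end of stream, so the same helpers apply.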
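Slide 10 lists intelligent file targeting but does not describe the mechanism. One plausible approach, assumed here rather than taken from the talk, is to attach partition attributes to each file (for example, the year parsed from an object key, as in slide 18's {year int} definitions) and skip any file whose attributes cannot satisfy the query's $match. target_files, may_match, and FILES are hypothetical.

```python
import operator

# Comparison operators this sketch understands in a $match on a partition field.
OPS = {"$eq": operator.eq, "$gt": operator.gt, "$gte": operator.ge,
       "$lt": operator.lt, "$lte": operator.le}

# Hypothetical files with partition attributes parsed from their keys.
FILES = {
    "invoices/2019/1.parquet": {"year": 2019},
    "invoices/2019/2.parquet": {"year": 2019},
    "invoices/2018/1.parquet": {"year": 2018},
    "invoices/2017.json.gz": {"year": 2017},
    "invoices/2016.json.gz": {"year": 2016},
}


def may_match(value, condition):
    """Return False only if a file with this partition value cannot match."""
    if not isinstance(condition, dict):  # shorthand form {field: literal}
        return value == condition
    # Unknown operators are skipped, so the file is conservatively kept.
    return all(OPS[op](value, arg) for op, arg in condition.items() if op in OPS)


def target_files(files, match):
    """Keep files whose partition attributes do not rule out the $match filter.

    Fields that are not partition attributes cannot prune anything, so they are ignored.
    """
    return [
        path for path, attrs in files.items()
        if all(may_match(attrs[field], cond)
               for field, cond in match.items() if field in attrs)
    ]


print(target_files(FILES, {"year": {"$gte": 2018}}))
# ['invoices/2019/1.parquet', 'invoices/2019/2.parquet', 'invoices/2018/1.parquet']
```

Pruning is deliberately conservative: a file is skipped only when its attributes prove it holds no matching documents.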
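Slide 11 lists the supported formats, several of them gzipped. As a rough illustration of format dispatch, the sketch below decodes the text-based formats with the Python standard library and picks a decoder from the file extension. It assumes JSON files hold one document per line. Avro, Parquet, and BSON would need dedicated readers and are left out. iter_documents is a hypothetical name.

```python
import csv
import gzip
import io
import json


def iter_documents(name, data):
    """Yield one dict per record from raw file bytes, picking a decoder by extension.

    Assumes JSON files contain one document per line. CSV/TSV values stay strings.
    """
    if name.endswith(".gz"):
        data = gzip.decompress(data)
        name = name[: -len(".gz")]
    text = io.StringIO(data.decode("utf-8"), newline="")
    if name.endswith(".json"):
        for line in text:
            if line.strip():  # skip blank lines
                yield json.loads(line)
    elif name.endswith((".csv", ".tsv")):
        delimiter = "\t" if name.endswith(".tsv") else ","
        yield from csv.DictReader(text, delimiter=delimiter)
    else:
        raise ValueError(f"unsupported format in this sketch: {name}")


archive = gzip.compress(b'{"year": 2017, "amount": 5}\n{"year": 2017, "amount": 7}\n')
print(list(iter_documents("2017.json.gz", archive)))
# [{'year': 2017, 'amount': 5}, {'year': 2017, 'amount': 7}]
```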
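Slide 18 maps collections to files with definition templates such as /invoices/{year int}/*. The talk does not spell out the template grammar, so the sketch below assumes that {name int} captures an integer partition attribute, that * matches the rest of the key, and that keys are given relative to the store's /archive/ prefix. compile_definition and match_path are hypothetical, and the real parser may behave differently.

```python
import re

# Attribute types understood by this sketch: only `int`, as used on slide 18.
TYPES = {"int": (r"\d+", int)}
TOKEN = re.compile(r"\{(\w+) (\w+)\}|\*")


def compile_definition(definition):
    """Turn '/invoices/{year int}/*' into a compiled regex plus per-attribute converters."""
    pattern, converters, pos = "", {}, 0
    for m in TOKEN.finditer(definition):
        pattern += re.escape(definition[pos:m.start()])  # literal text
        if m.group(0) == "*":
            pattern += ".*"  # assumption: * matches the rest of the key
        else:
            name, typ = m.group(1), m.group(2)
            regex, convert = TYPES[typ]
            pattern += f"(?P<{name}>{regex})"
            converters[name] = convert
        pos = m.end()
    pattern += re.escape(definition[pos:])
    return re.compile(pattern + r"\Z"), converters


def match_path(definition, key):
    """Return the attributes parsed from `key`, or None if it does not match."""
    regex, converters = compile_definition(definition)
    m = regex.match(key)
    if m is None:
        return None
    return {name: converters[name](value) for name, value in m.groupdict().items()}


print(match_path("/invoices/{year int}/*", "/invoices/2019/1.parquet"))      # {'year': 2019}
print(match_path("/invoices/{year int}/*", "/invoices/2017.json.gz"))        # None
print(match_path("/invoices/{year int}.json.gz", "/invoices/2017.json.gz"))  # {'year': 2017}
```

Under these assumptions, the two invoices definitions together cover every file under /archive/invoices/ in slide 14, and each file gets a year attribute usable for file targeting.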
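Slide 21 summarizes how a query becomes distributed MQL: parse the query, parallelize the processing, and distribute the workload. Together with the Map and Reduce phases on slides 24 and 25, this resembles the scatter-gather pattern. A minimal single-process sketch, with threads standing in for DataLake agents and in-memory lists standing in for files. scatter_gather and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(partitions, map_fn, reduce_fn, workers=4):
    """Run map_fn on every partition in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_fn, partitions))  # results keep partition order
    return reduce_fn(partials)


# Example: count documents with amount > 100 across three "files".
partitions = [
    [{"amount": 50}, {"amount": 150}],
    [{"amount": 300}],
    [{"amount": 20}, {"amount": 120}, {"amount": 80}],
]
count = scatter_gather(
    partitions,
    map_fn=lambda docs: sum(1 for d in docs if d["amount"] > 100),
    reduce_fn=sum,
)
print(count)  # 3
```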
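The $limit split on slide 24 is safe because, without a $sort, any 10 matching documents are a valid answer: each agent needs to return at most 10, and the reduce step trims the merged output back to 10. A minimal in-memory sketch of that split. run and matches are hypothetical and support only the stages used on the slide.

```python
def matches(doc, query):
    """Support only {field: {"$gt": value}} conditions, enough for this example."""
    return all(field in doc and doc[field] > cond["$gt"] for field, cond in query.items())


def run(pipeline, docs):
    """Apply $match and $limit stages, in order, to a list of documents."""
    for stage in pipeline:
        if "$match" in stage:
            docs = [d for d in docs if matches(d, stage["$match"])]
        elif "$limit" in stage:
            docs = docs[: stage["$limit"]]
        else:
            raise ValueError(f"stage not supported in this sketch: {stage}")
    return docs


MAP = [{"$match": {"year": {"$gt": 2000}}}, {"$limit": 10}]  # runs on every agent
REDUCE = [{"$limit": 10}]                                    # runs once on merged output

# Hypothetical data: 3 files, each with one old document and 8 matching ones.
files = [[{"year": 1999}] + [{"year": 2001 + i} for i in range(8)] for _ in range(3)]
partials = [run(MAP, f) for f in files]
result = run(REDUCE, [d for p in partials for d in p])
print([len(p) for p in partials], len(result))  # [8, 8, 8] 10
```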
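Slide 25 rewrites $avg as a sum and a count because averages cannot be merged directly. With the hypothetical data below, averaging the per-file 2018 averages (10 and 40) would give 25 instead of the correct 30. A minimal in-memory sketch of the three phases, with dictionaries standing in for the $group output. map_group, reduce_group, and finalize are hypothetical names.

```python
from collections import defaultdict


def map_group(docs):
    """Map: per-year partial sum and count, mirroring totalAvg_sum / totalAvg_count."""
    partial = defaultdict(lambda: [0, 0])
    for d in docs:
        partial[d["year"]][0] += d["amount"]
        partial[d["year"]][1] += 1
    return partial


def reduce_group(partials):
    """Reduce: add the partial sums and counts from every file."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for year, (s, c) in partial.items():
            total[year][0] += s
            total[year][1] += c
    return total


def finalize(total):
    """Finalize: divide sum by count, like the $project with $divide."""
    return {year: s / c for year, (s, c) in total.items()}


# Hypothetical invoice files.
files = [
    [{"year": 2018, "amount": 10}, {"year": 2019, "amount": 40}],
    [{"year": 2018, "amount": 20}, {"year": 2018, "amount": 60}],
]
print(finalize(reduce_group(map_group(f) for f in files)))  # {2018: 30.0, 2019: 40.0}
```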
