Learn how organizations are deriving unique customer insights, improving product and service efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you will see how fast and easy it is to deploy a modern data management platform in your cloud, on your terms.
24. Cosmos
Microsoft’s internal data lake
• A data lake for all teams @Microsoft
• Tools approachable by any developer
• Batch, Interactive, Streaming, ML
• Used across Office, Xbox, Azure, Windows, Ads, Bing, Skype, …
By the numbers
• Exabytes of data
• 100Ks of Physical Servers
• Millions of Interactive Queries
• Huge Streaming Pipelines
• 100Ks of Batch Jobs
• 10K+ Developers
Microsoft’s Big Data Service
Azure Data Lake
A data lake for everyone
• The next version of Cosmos
• Fully aligned with the Hadoop ecosystem and standards, with full support for Hadoop tools and engines as well as unique Microsoft capabilities
• Migration from Cosmos to ADL is already underway
• External customers on the same service as internal customers
25. Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices
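The "store all data in native format without schema definition" idea on this slide is commonly called schema-on-read: raw records are kept as they arrive, and each analysis applies its own schema when it reads them. The following is a minimal sketch of that idea, not how Azure Data Lake implements it. The sample events, field names, and the `read_with_schema` helper are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_EVENTS = [
    '{"device": "xbox-01", "temp_c": 41.5, "ts": "2018-06-01T12:00:00Z"}',
    '{"device": "sensor-7", "clicks": 3}',
    'not even json',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field name -> cast function. A record is kept only if it
    parses as a JSON object and contains every field named in the schema.
    """
    rows = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw store may hold anything; the reader decides what to skip
        if not isinstance(record, dict):
            continue
        if all(field in record for field in schema):
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


# Two analyses read the same raw data through different (hypothetical) schemas.
temperature_schema = {"device": str, "temp_c": float}
click_schema = {"device": str, "clicks": int}

temps = read_with_schema(RAW_EVENTS, temperature_schema)
clicks = read_with_schema(RAW_EVENTS, click_schema)
```

Because nothing is rejected at ingest time, a new analysis can later define a third schema over the same raw lines without re-ingesting anything.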
26. Azure Data Lake Overview
[Architecture diagram. Labels: Windows Azure Blob Storage; Spark; MapReduce; Impala; Cloudera; Azure Key Vault; Azure Active Directory; Azure Data Lake Store – in-cluster services; U-SQL; ADL Analytics; …; Ingestion Service; ADLS Gateway Service; Cosmos API; HDFS++ API; Scope; YARN; ADLS Micro Services; ADL local tier; Azure VMs; Azure remote storage tier]
27. ADLS Gen 2
• Preview announced June 2018
• Allows all storage regions to offer the HDFS API
• Soon available for Cloudera implementations
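As a rough illustration of what an HDFS-compatible API means in practice, Hadoop-based tools typically address ADLS Gen2 data through the ABFS driver using URIs of the form `abfs[s]://<file_system>@<account>.dfs.core.windows.net/<path>`. The slide does not show this, so treat the sketch below as background rather than part of the talk. The account name `examplelake`, the file system `raw`, and the `abfs_uri` helper are hypothetical.

```python
from urllib.parse import urlsplit


def abfs_uri(file_system, account, path, secure=True):
    """Build an ABFS-style URI for a path in an ADLS Gen2 account.

    `secure=True` selects the TLS variant of the scheme (abfss).
    """
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account and file system names, for illustration only.
uri = abfs_uri("raw", "examplelake", "/events/2018/06/01.json")

# The parts can be recovered with the standard URL parser.
parts = urlsplit(uri)
file_system = parts.username  # the file system (container) name
endpoint = parts.hostname     # the account's dfs endpoint
```

A URI built this way is the kind of string that would be passed as an input or output path to an engine such as Spark, provided the cluster has the ABFS driver and credentials configured.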