This document summarizes digital transformation with Microsoft Azure, including cloud computing, big data, and data lakes. It discusses data lake characteristics such as structured, semi-structured, and unstructured data. Data lakes are used for reporting, visualization, analytics, and machine learning. They provide a single store for raw and processed data ranging from raw copies of source systems to structured data for analytics. The document also briefly mentions Azure Data Lake Analytics and Databricks, and concludes by thanking the reader.
10. Data Lake
Method of Storing Data within a System or Repository
Data Lake Characteristics
- Structured = Relational Databases [Rows & Columns]
- Semi-Structured = CSV, Logs, XML, JSON
- Unstructured = Emails, Documents, Binaries, Audio, Video, PDFs
- Open-Source = Apache Hadoop [HDFS]
- Microsoft Azure = Azure Data Lake Store [ADLS]
- Amazon AWS = Amazon S3
Used for
- Reporting
- Visualization
- Analytics
- Machine Learning
Single Store of Data in the Enterprise, Ranging from Raw Data [Copy of Source System] to Structured Data for Analytics
Centralized Data Store for Enterprises
12. Data Warehouse vs. Data Lake
Data Warehouse
- Data: Structured, Processed
- Schema-On-Write
- Expensive for Large Data Volumes
- Less Agile, Fixed Configuration
- Users: Business Professionals
Data Lake
- Data: Structured / Semi-Structured / Unstructured / Raw
- Schema-On-Read
- Designed for Low-Cost Storage
- Highly Agile, Configure & Reconfigure
- Users: Data Scientists
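The difference between Schema-On-Write and Schema-On-Read can be sketched in a few lines of Python. This is a minimal illustration, not how any particular warehouse or lake product works: the `SCHEMA` mapping, the sample records, and all function names are hypothetical. The warehouse-style path rejects records that do not fit the schema at load time; the lake-style path stores every raw line and applies a schema only when the data is read, so a different schema can be applied later without reloading.

```python
import json

# Hypothetical schema: field name -> type the value must convert to.
SCHEMA = {"user_id": int, "amount": float}


def write_with_schema(table, record):
    """Schema-on-write: convert and validate before storing; raises on bad records."""
    row = {field: cast(record[field]) for field, cast in SCHEMA.items()}
    table.append(row)


def write_raw(lake, line):
    """Schema-on-read: store the raw line untouched."""
    lake.append(line)


def read_with_schema(lake, schema):
    """Apply a schema only at read time; skip lines that do not fit it."""
    for line in lake:
        try:
            record = json.loads(line)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue


# Hypothetical raw input: one clean record, one incomplete record, one non-JSON line.
raw_lines = [
    '{"user_id": "1", "amount": "9.50", "note": "first order"}',
    '{"user_id": "2"}',
    "not json at all",
]

# Warehouse-style load: anything that fails the schema is rejected up front.
warehouse, rejected = [], []
for line in raw_lines:
    try:
        write_with_schema(warehouse, json.loads(line))
    except (ValueError, KeyError, TypeError):
        rejected.append(line)

# Lake-style load: everything is kept as-is.
lake = []
for line in raw_lines:
    write_raw(lake, line)

lake_rows = list(read_with_schema(lake, SCHEMA))
# A different schema can be applied later to the same raw data.
user_ids = list(read_with_schema(lake, {"user_id": int}))
```

With this input the warehouse keeps one row and rejects two lines, while the lake keeps all three lines. Reading the lake with the narrower `user_id` schema recovers the second record, which the schema-on-write path had already discarded.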
14. Apache Drill, Inspired by Google's Dremel [BigQuery]
https://research.google.com/pubs/pub36632.html
https://cloud.google.com/bigquery/
Schema-Free SQL Query Engine for Hadoop, NoSQL & Cloud Storage
Query Engine [ANSI-SQL] for Big Data [Raw] Exploration
For Analysts, Business Users, Data Scientists & Data Developers
1. Self-Service Exploration
2. Data Agility
3. Interactive Query Response Time and Scale
Use Cases for Apache Drill
1. Raw Data Exploration
2. Data Discovery
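Raw data exploration with Drill can be sketched as a SQL query submitted over its REST API. The example below only builds the HTTP request, so it runs without a Drill installation. The server address (`localhost:8047`, Drill's default web port), the file path `/data/raw/orders.json`, and the column names are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: a Drillbit with its REST API reachable on the default web port.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build a POST request that submits one SQL statement to Drill."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request):
    """Send the request and return the result rows (needs a running Drill)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["rows"]


# Hypothetical raw JSON file queried in place through the `dfs` storage plugin;
# no table definition or schema is declared beforehand.
sql = "SELECT t.user_id, t.amount FROM dfs.`/data/raw/orders.json` AS t LIMIT 10"
request = build_drill_request(sql)
```

Calling `run_query(request)` against a live Drill instance would return the matching rows. The query reads the file directly, which is what makes self-service exploration of raw data possible.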
15. Azure Data Lake Analytics [ADLA]
On-Demand Analytics Job Service
Start in Seconds, Scale Instantly and Pay per Job
Develop Massively Parallel Programs [MPP] with Simplicity
100 Hrs – R$ 332 | 500 Hrs – R$ 1.494 – by Monthly Commitment Package
[HaaS] - Hadoop-as-a-Service
Store Destinations
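The two monthly commitment prices quoted above can be compared per hour with a short calculation. This sketch assumes that "R$ 1.494" uses the Brazilian thousands separator (one thousand four hundred ninety-four reais) and that each package price covers exactly the stated number of hours.

```python
from decimal import Decimal

# Package prices from the slide: hours -> price in Brazilian reais (R$).
# Assumption: "1.494" means 1494, written with a thousands separator.
packages = {100: Decimal("332"), 500: Decimal("1494")}

# Effective price per hour for each package.
hourly = {hours: price / hours for hours, price in packages.items()}
# hourly[100] -> 3.32, hourly[500] -> 2.988

# Relative saving per hour of the 500-hour package versus the 100-hour one.
discount = 1 - hourly[500] / hourly[100]
# discount -> 0.1, i.e. 10% cheaper per hour
```

Under these assumptions the larger package costs R$ 2.988 per hour instead of R$ 3.32, a 10% reduction.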
16. Databricks
The Unified Analytics Platform
Unifying Data Science, Engineering and Business
- Accelerate performance with an optimized Spark platform
- Increase productivity through interactive data science
- Streamline processes from ETL to production
- Reduce cost and complexity with a fully managed, cloud-native platform