An intro to
Azure Data Lake
Rick van den Bosch
M +31 (0)6 52 34 89 30
r.van.den.bosch@betabit.nl
Calendar
Data Lakes
About Azure Data Lake
Azure Data Lake Store
- DEMO
Azure Data Lake HDInsight
- DEMO
Azure Data Lake Analytics
- DEMO
Power BI
- DEMO
Resources
Rick van den Bosch
Cloud Solutions Architect
@rickvdbosch
rickvandenbosch.net
r.van.den.bosch@betabit.nl
Data Lakes
The Traditional Data Warehouse
The Data Lake Approach
Data sources: devices, non-relational data
• Ingest all data regardless of requirements
• Store all data in native format without schema definition
• Do analysis using Hadoop, Spark, R, Azure Data Lake Analytics (ADLA)
Workloads: interactive queries, batch queries, machine learning, data warehouse, real-time analytics
Designed for the questions you don’t yet know!
About Azure Data Lake
Azure Data Lake
• Store and analyze petabyte-size files and trillions of objects
• Develop massively parallel programs with simplicity
• Debug and optimize your big data programs with ease
• Enterprise-grade security, auditing, and support
• Start in seconds, scale instantly, pay per job
• Built on YARN, designed for the cloud
Why Azure Data Lake?
An on-demand, real-time stream processing service with a no-limits data lake, built to support massively parallel analytics
• Performance at scale
• Optimized for analytics
• Multiple analytics engines
• Single repository sharing
[Diagram: ADL Store (HDFS Compatible REST API); ADL Analytics (.NET, SQL, Python, R scaled out by U-SQL); HDInsight (Hive) and Azure Databricks through the Open Source Apache Hadoop ADL Client]
Azure Data Lake Store
Store
• Enterprise-wide hyper-scale repository
• Data of any size, type and ingestion speed
• Operational and exploratory analytics
• WebHDFS-compatible API
• Specifically designed to enable analytics
• Tuned for (data analytics scenario) performance
• Out of the box: security, manageability, scalability, reliability, and availability
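To make the "WebHDFS-compatible API" bullet more concrete, here is a minimal Python sketch that only builds request URLs. It assumes the Data Lake Store (Gen1) endpoint pattern `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>`; the helper name `webhdfs_url` and the account name `mydatalake` are made up for illustration. A real call would also need an OAuth 2.0 bearer token from Azure Active Directory in the `Authorization` header, which is left out here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style REST URL for a Data Lake Store account.

    Assumption: the Gen1 endpoint pattern
    https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OP>
    """
    # Percent-encode the path but keep "/" as the folder separator.
    clean_path = quote(path.strip("/"))
    # "op" always comes first, followed by any operation-specific parameters.
    query = urlencode({"op": op, **params})
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{clean_path}?{query}"


if __name__ == "__main__":
    # "mydatalake" is a hypothetical account name.
    print(webhdfs_url("mydatalake", "/Samples/Data", "LISTSTATUS"))
    print(webhdfs_url("mydatalake", "/Samples/Data/SearchLog.tsv", "OPEN", read="true"))
```

Because the operations follow the WebHDFS conventions, existing Hadoop tooling that speaks WebHDFS can, in principle, address the store in the same way.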
Store
Architected and built for very high throughput at scale for Big Data workloads
- No limits to file size, account size or number of files
Single repository for sharing
- Cloud-scale distributed filesystem with file/folder ACLs and RBAC
- Encryption-at-rest by default with Azure Key Vault
- Authenticated access with Azure Active Directory integration
The Big Data platform for Microsoft
Key capabilities
Built for Hadoop
Unlimited storage, petabyte files
Performance-tuned for big data analytics
Enterprise-ready: Highly-available and secure
All data
Security
Authentication
• Azure Active Directory integration
• OAuth 2.0 support for REST interface
Access control
• Supports POSIX-style permissions (exposed by WebHDFS)
• ACLs on root, subfolders and individual files
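As a rough illustration of what "POSIX-style permissions" means, the Python sketch below evaluates the classic owner/group/other permission bits. It is a deliberate simplification and not the service's actual evaluation logic: named-user and named-group ACL entries, masks and super-user handling are ignored, and all names (`is_allowed`, `alice`, `analysts`) are hypothetical.

```python
from __future__ import annotations

# Permission bits as used in POSIX mode strings (r = 4, w = 2, x = 1).
READ, WRITE, EXECUTE = 4, 2, 1


def is_allowed(mode: int, owner: str, group: str,
               user: str, user_groups: set[str], wanted: int) -> bool:
    """Check a request against owner/group/other bits, e.g. mode 0o750."""
    if user == owner:
        bits = (mode >> 6) & 7   # owner triplet
    elif group in user_groups:
        bits = (mode >> 3) & 7   # owning-group triplet
    else:
        bits = mode & 7          # everyone else
    # Every requested bit must be present in the selected triplet.
    return (bits & wanted) == wanted


if __name__ == "__main__":
    # A folder with rwxr-x--- owned by alice and the group "analysts".
    print(is_allowed(0o750, "alice", "analysts", "alice", set(), WRITE))
    print(is_allowed(0o750, "alice", "analysts", "bob", {"analysts"}, READ))
    print(is_allowed(0o750, "alice", "analysts", "carol", set(), READ))
```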
Encryption
Compatibility
Store
HDFS Compatible REST API
ADL Store
DEMO - Store
Ingest data – Ad hoc
Local computer
• Azure Portal
• Azure PowerShell
• Azure CLI
• Using Data Lake Tools for Visual Studio
Azure Storage Blob
• Azure Data Factory
• AdlCopy tool
• DistCp running on HDInsight cluster
Ingest data
Streamed
• Azure Stream Analytics
• Azure HDInsight Storm
• EventProcessorHost
Relational
• Apache Sqoop
• Azure Data Factory
Ingest data
Web server
Upload using custom applications
• Azure CLI
• Azure PowerShell
• Azure Data Lake Storage Gen1 .NET SDK
• Azure Data Factory
Process data
Download data
Visualize data
ADLS Gen 2
Takes core capabilities from Azure Data Lake Storage Gen1 such as
- a Hadoop compatible file system
- Azure Active Directory
- POSIX based ACLs
and integrates them into Azure Blob Storage
Additional benefits
Unlimited scale and performance
Performance improvements when reading/writing individual objects (higher throughput and concurrency)
Removes the need to decide up front, at data ingestion time, whether or not to run analytics
Data protection capabilities: encryption at rest
Integrated network Firewall capabilities
Durability options (Zone and Geo-Redundant Storage: high-availability and disaster recovery)
Linux integration – BlobFUSE
- mount Blob Storage from Linux VMs
- interact using standard Linux shell commands.
Data Lake Storage Gen2
“In Data Lake Storage Gen2, all the qualities of object storage remain while adding the advantages of a file system interface optimized for analytics workloads.”
Known issues
Blob Storage APIs and Azure Data Lake Gen2 APIs aren't interoperable
Blob storage APIs not available
Azure Storage Explorer >= 1.6.0
AzCopy >= v10
Event Grid doesn't receive events
Soft Delete and Snapshots not available
Object level storage tiers not available
Diagnostic logs not available
Azure Data Lake HDInsight
HDInsight
Cloud distribution of the (Hortonworks) Hadoop components
Supports multiple Hadoop cluster versions (can be deployed any time)
Hadoop
• YARN for job scheduling & resource management
• MapReduce for parallel processing
• HDFS
HDInsight
Open Source Apache Hadoop ADL Client
Azure Databricks
HDInsight
Hive
DEMO - HDInsight
Azure Data Lake Analytics
Analytics
Dynamic scaling
Develop faster, debug and optimize smarter using familiar tools
U-SQL: simple and familiar, powerful, and extensible
Integrates seamlessly with your IT investments
Affordable and cost effective
Works with all your Azure data
Analytics
On-demand analytics job service to simplify big data analytics
Can handle jobs of any scale instantly
Azure Active Directory integration
U-SQL
Azure Data Lake Analytics
Analytics
ADL Analytics: .NET, SQL, Python, R scaled out by U-SQL
• Serverless. Pay per job. Starts in seconds. Scales instantly.
• Develop massively parallel programs with simplicity
• Federated query from multiple data sources
Storage
ADL Store: HDFS Compatible REST API
U-SQL
Language that combines declarative SQL with imperative C#
U-SQL – Key concepts
Rowset variables
• Each query expression that produces a rowset can be assigned to a variable.
EXTRACT
• Reads data from a file & defines the schema on read *
OUTPUT
• Writes data from a rowset to a file *
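The three concepts above can be mimicked in a few lines of Python. This is an analogy, not U-SQL: the hypothetical `extract` function applies a schema only while reading (schema on read), its result plays the role of a rowset variable, and `output` writes a rowset back out as delimited text. The sample data and column names are invented for the sketch.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

# A schema maps column names to converter functions, in file column order.
Schema = Dict[str, Callable[[str], object]]


def extract(text: str, schema: Schema, delimiter: str = "\t") -> List[dict]:
    """Read delimited text and apply the schema while reading."""
    names = list(schema)
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} columns, got {len(fields)}")
        # Convert each raw string with the converter declared for its column.
        rows.append({name: schema[name](value) for name, value in zip(names, fields)})
    return rows


def output(rows: Iterable[dict], delimiter: str = ",") -> str:
    """Write a rowset as delimited text and return it as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(row.values())
    return buf.getvalue()


if __name__ == "__main__":
    raw = "1\ten-gb\n2\ten-us\n"                                   # the stored file has no schema
    searchlog = extract(raw, {"UserId": int, "Region": str})       # "rowset variable"
    print(output(r for r in searchlog if r["Region"] == "en-gb"))  # filter, then write
```

The stored text never changes; a different schema could be applied to the same file on the next read, which is the point of schema on read.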
U-SQL – Scalar variables
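DECLARE @in string = "/Samples/Data/SearchLog.tsv";
DECLARE @out string = "/output/SearchLog-scalar-variables.csv";

// The built-in extractors expect every column in the file to be declared;
// the columns below assume the standard seven-column SearchLog.tsv sample.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string
    FROM @in
    USING Extractors.Tsv();

OUTPUT @searchlog
    TO @out
    USING Outputters.Csv();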
U-SQL – Transform rowsets
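// The columns below assume the standard seven-column SearchLog.tsv sample.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Expressions use C# syntax, hence == for the comparison.
@rs1 =
    SELECT UserId, Region
    FROM @searchlog
    WHERE Region == "en-gb";

OUTPUT @rs1
    TO "/output/SearchLog-transform-rowsets.csv"
    USING Outputters.Csv();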
U-SQL – Extractor parameters
delimiter
encoding
escapeCharacter
nullEscape
quoting
rowDelimiter
silent
skipFirstNRows
charFormat
U-SQL – Outputter parameters
delimiter
dateTimeFormat
encoding
escapeCharacter
nullEscape
quoting
rowDelimiter
charFormat
outputHeader
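To give a feel for what a few of these parameters do, here is a small Python analogy (not U-SQL, and not the extractors' real implementation). The hypothetical `read_rows` mirrors `delimiter`, `skipFirstNRows` and `silent`, and `write_rows` mirrors `outputHeader`. The `silent` behaviour is modelled here as "skip rows with the wrong number of columns instead of failing", which is an assumption based on how that parameter is commonly described.

```python
import csv
import io
from typing import Iterable, List, Sequence


def read_rows(text: str, columns: int, delimiter: str = ",",
              skip_first_n_rows: int = 0, silent: bool = False) -> List[List[str]]:
    """Parse delimited text, optionally skipping leading rows and bad rows."""
    rows = []
    for index, fields in enumerate(csv.reader(io.StringIO(text), delimiter=delimiter)):
        if index < skip_first_n_rows:
            continue  # e.g. a header line at the top of the file
        if len(fields) != columns:
            if silent:
                continue  # drop the malformed row instead of failing the job
            raise ValueError(f"row {index}: expected {columns} columns, got {len(fields)}")
        rows.append(fields)
    return rows


def write_rows(rows: Iterable[Sequence[str]], header: Sequence[str],
               output_header: bool = False, delimiter: str = ",") -> str:
    """Write rows as delimited text, optionally preceded by a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    if output_header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    text = "id;region\n1;en-gb\nbroken\n2;en-us\n"
    rows = read_rows(text, columns=2, delimiter=";", skip_first_n_rows=1, silent=True)
    print(write_rows(rows, header=["id", "region"], output_header=True))
```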
U-SQL
Built-in extractors and outputters:
Text
Csv
Tsv
A (for instance) CSV Extractor or Outputter is
EXACTLY THAT
Data sources
Options in the Azure Portal:
• Data Lake Storage Gen1
• Azure Storage
DEMO - Analytics
DEMO - Power BI
Resources
Basic example
Advanced example
Create Database (U-SQL) & Create Data Source (U-SQL)
This example
HDInsight quickstart
Azure blog
Azure roadmap
Thank you for your attention
Track 1
15:35 – 16:20
Skynet Is Talking - Microsoft Bot Framework
Kris van der Mast
Track 2
15:35 – 16:20
Enter The Matrix: Securing Azure's Assets
Mike Martin

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Azure Lowlands: An intro to Azure Data Lake

  • 1. Thank you to our sponsors! Gold Sponsors Silver Sponsors Community Sponsors
  • 2. An intro to Azure Data Lake Rick van den Bosch M +31 (0)6 52 34 89 30 r.van.den.bosch@betabit.nl
  • 3. Calendar Data Lakes About Azure Data Lake Azure Data Lake Store - DEMO Azure Data Lake HDInsights - DEMO Azure Data Lake Analytics - DEMO Power BI - DEMO Resources
  • 4. Rick van den Bosch Cloud Solutions Architect @rickvdbosch rickvandenbosch.net r.van.den.bosch@betabit.nl
  • 6. The Traditional Data Warehouse: Data sources, Non-relational data
  • 7. Ingest all data regardless of requirements Store all data in native format without schema definition Do analysis Hadoop, Spark, R, Azure Data Lake Analytics (ADLA) Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics Devices Designed for the questions you don’t yet know! The Data Lake Approach
  • 9. Azure Data Lake • Store and analyze petabyte-size files and trillions of objects • Develop massively parallel programs with simplicity • Debug and optimize your big data programs with ease • Enterprise-grade security, auditing, and support • Start in seconds, scale instantly, pay per job • Built on YARN, designed for the cloud
  • 10.
  • 11. HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics Open Source Apache Hadoop ADL Client Azure Databricks HDInsight Hive • Performance at scale • Optimized for analytics • Multiple analytics engines • Single repository sharing Why Azure Data Lake? an on-demand, real-time stream processing service with no-limits data lake built to support massively parallel analytics
  • 13. Store • Enterprise-wide hyper-scale repository • Data of any size, type and ingestion speed • Operational and exploratory analytics • WebHDFS-compatible API • Specifically designed to enable analytics • Tuned for (data analytics scenario) performance • Out of the box: security, manageability, scalability, reliability, and availability
  • 14. Store Architected and built for very high throughput at scale for Big Data workloads - No limits to file size, account size or number of files Single-repository for sharing - Cloud-scale distributed filesystem with file/folder ACLs and RBAC - Encryption-at-rest by default with Azure Key Vault - Authenticated access with Azure Active Directory integration The Big Data platform for Microsoft
  • 15. Key capabilities Built for Hadoop Unlimited storage, petabyte files Performance-tuned for big data analytics Enterprise-ready: Highly-available and secure All data
  • 16. Security Authentication • Azure Active Directory integration • OAuth 2.0 support for REST interface Access control • Supports POSIX-style permissions (exposed by WebHDFS) • ACLs on root, subfolders and individual files Encryption
  • 20. Ingest data – Ad hoc Local computer • Azure Portal • Azure PowerShell • Azure CLI • Using Data Lake Tools for Visual Studio Azure Storage Blob • Azure Data Factory • AdlCopy tool • DistCp running on HDInsight cluster
  • 21. Ingest data Streamed • Azure Stream Analytics • Azure HDInsight Storm • EventProcessorHost Relational • Apache Sqoop • Azure Data Factory Web server Upload using custom applications • Azure CLI • Azure PowerShell • Azure Data Lake Storage Gen1 .NET SDK • Azure Data Factory
  • 26. ADLS Gen 2 Takes core capabilities from Azure Data Lake Storage Gen1 such as - a Hadoop compatible file system - Azure Active Directory - POSIX based ACLs and integrates them into Azure Blob Storage
  • 27. Additional benefits Unlimited scale and performance Performance improvements reading/writing individual objects (higher throughput and concurrency) Removes the need to decide a priori, at data ingestion time, whether or not to run analytics Data protection capabilities: encryption at rest Integrated network firewall capabilities Durability options (Zone and Geo-Redundant Storage: high availability and disaster recovery) Linux integration – BlobFUSE - mount Blob Storage from Linux VMs - interact using standard Linux shell commands.
  • 28. Data Lake Storage Gen2 “In Data Lake Storage Gen2, all the qualities of object storage remain while adding the advantages of a file system interface optimized for analytics workloads.”
  • 29. Known issues Blob Storage APIs and Azure Data Lake Gen2 APIs aren't interoperable Blob storage APIs not available Azure Storage Explorer >= 1.6.0 AzCopy >= v10 Event Grid doesn't receive events Soft Delete and Snapshots not available Object level storage tiers not available Diagnostic logs not available
  • 30. Azure Data Lake HDInsight
  • 31. HDInsight Cloud distribution of the (Hortonworks) Hadoop components Supports multiple Hadoop cluster versions (can be deployed any time) Hadoop • YARN for job scheduling & resource management • MapReduce for parallel processing • HDFS
  • 32.
  • 33. HDInsight Open Source Apache Hadoop ADL Client Azure Databricks HDInsight Hive
  • 35. Azure Data Lake Analytics
  • 36. Analytics Dynamic scaling Develop faster, debug and optimize smarter using familiar tools U-SQL: simple and familiar, powerful, and extensible Integrates seamlessly with your IT investments Affordable and cost effective Works with all your Azure data
  • 37. Analytics On-demand analytics job service to simplify big data analytics Can handle jobs of any scale instantly Azure Active Directory integration U-SQL
  • 38. Azure Data Lake Analytics Analytics Storage HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics • Serverless. Pay per job. Starts in seconds. Scales instantly. • Develop massively parallel programs with simplicity • Federated query from multiple data sources
  • 39. U-SQL Language that combines declarative SQL with imperative C#
  • 40. U-SQL – Key concepts Rowset variables • Each query expression that produces a rowset can be assigned to a variable. EXTRACT • Reads data from a file & defines the schema on read * OUTPUT • Writes data from a rowset to a file *
  • 41. U-SQL – Scalar variables DECLARE @in string = "/Samples/Data/SearchLog.tsv"; DECLARE @out string = "/output/SearchLog-scalar-variables.csv"; @searchlog = EXTRACT UserId int, ClickedUrls string FROM @in USING Extractors.Tsv(); OUTPUT @searchlog TO @out USING Outputters.Csv();
  • 42. U-SQL – Transform rowsets @searchlog = EXTRACT UserId int, Region string FROM "/Samples/Data/SearchLog.tsv" USING Extractors.Tsv(); @rs1 = SELECT UserId, Region FROM @searchlog WHERE Region == "en-gb"; OUTPUT @rs1 TO "/output/SearchLog-transform-rowsets.csv" USING Outputters.Csv();
  • 43. U-SQL – Extractor parameters delimiter encoding escapeCharacter nullEscape quoting rowDelimiter silent skipFirstNRows charFormat
  • 44. U-SQL – Outputter parameters delimiter dateTimeFormat encoding escapeCharacter nullEscape quoting rowDelimiter charFormat outputHeader
  • 45. U-SQL Built-in extractors and outputters: Text Csv Tsv A (for instance) CSV Extractor or Outputter is EXACTLY THAT
  • 46. Data sources Options in the Azure Portal: • Data Lake Storage Gen1 • Azure Storage
  • 50. Resources Basic example Advanced example Create Database (U-SQL) & Create Data Source (U-SQL) This example HDInsight quickstart Azure blog Azure roadmap
  • 51. Thank you for your attention
  • 52. Track 1 15:35 – 16:20 Skynet Is Talking - Microsoft Bot Framework Kris van der Mast Track 2 15:35 – 16:20 Enter The Matrix: Securing Azure's Assets Mike Martin
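
The U-SQL "Transform rowsets" script on slide 42 follows an EXTRACT, SELECT, OUTPUT pattern. As a rough illustration of that flow outside U-SQL, the sketch below mirrors it in plain Python. The sample rows and the helper names `extract_tsv` and `output_csv` are assumptions made for this example; they are not part of U-SQL or any Azure SDK, and the CSV output does not try to reproduce the quoting behavior of `Outputters.Csv()`.

```python
import csv
import io

# Hypothetical sample rows standing in for /Samples/Data/SearchLog.tsv.
SAMPLE_TSV = "1\ten-gb\n2\ten-us\n3\ten-gb\n"


def extract_tsv(text, schema):
    """Schema on read: apply (name, type) pairs to each tab-separated row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        {name: cast(value) for (name, cast), value in zip(schema, row)}
        for row in reader
    ]


def output_csv(rows, columns):
    """Serialize a rowset to CSV text, one line per row, without a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[column] for column in columns])
    return buffer.getvalue()


# EXTRACT UserId int, Region string ... USING Extractors.Tsv()
searchlog = extract_tsv(SAMPLE_TSV, [("UserId", int), ("Region", str)])

# SELECT UserId, Region FROM @searchlog WHERE Region == "en-gb"
rs1 = [row for row in searchlog if row["Region"] == "en-gb"]

# OUTPUT @rs1 ... USING Outputters.Csv()
result = output_csv(rs1, ["UserId", "Region"])
print(result)
```

The schema is supplied at read time rather than stored with the file, which is the "schema on read" idea from the key concepts slide; each intermediate list plays the role of a rowset variable.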

Editor's Notes

  1. WebHDFS means Hadoop & HDInsight compatibility
  2. Apache Hadoop file system compatible with Hadoop Distributed File System (HDFS); works with the Hadoop ecosystem. It does not impose any limits on account sizes, file sizes, or the amount of data that can be stored in a data lake. Data is stored durably by making multiple copies, and there is no limit on how long it can be stored. The data lake spreads parts of a file over a number of individual storage servers, which improves read throughput when the file is read in parallel for data analytics. Redundant copies and enterprise-grade security for the stored data. Data is kept in its native format; loading data doesn’t require a schema. Handles structured, semi-structured, and unstructured data.
  3. Azure Active Directory integration includes multi-factor authentication, conditional access, role-based access control, application usage monitoring, security monitoring and alerting, etc. Encryption options: do not enable encryption; use keys managed by Data Lake Storage Gen1; use keys from your own Key Vault.
  4. You can use Azure HDInsight and Azure Data Lake Analytics to run data analysis jobs on the data stored in Data Lake Storage Gen1.
  5. Relational data: Apache Sqoop, Azure Data Factory, Apache DistCp. Custom script / application: Azure CLI, Azure PowerShell, Azure Data Lake Storage Gen1 .NET SDK.
  6. Because Azure Data Lake Storage Gen2 is integrated into the Azure Storage platform, applications can use either the Blob APIs or the Azure Data Lake Storage Gen2 file system APIs for accessing data. The Blob APIs allow you to leverage your existing investments in Blob Storage and continue to take advantage of the large ecosystem of first- and third-party applications already available, while the Azure Data Lake Storage Gen2 file system APIs are optimized for analytics engines like Hadoop and Spark.
  7. Storage Explorer in the portal WILL NOT WORK; the Blob Viewing Tool has partial support.
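
Note 1 and the Store slides mention a WebHDFS-compatible REST API secured with OAuth 2.0. A minimal sketch of what a request to such an endpoint might look like follows. It only builds the URL and headers and makes no network call. The account name `myaccount`, the token placeholder, and the helper `webhdfs_url` are assumptions for illustration; the host pattern and the `LISTSTATUS` operation follow the WebHDFS convention as used by Data Lake Storage Gen1, so check the service documentation before relying on them.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style URL for a Data Lake Storage Gen1 account.

    `account` is a hypothetical account name; `op` is a WebHDFS operation
    such as LISTSTATUS. Extra keyword arguments become query parameters.
    """
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
    query = urlencode({"op": op, **params})
    return base + quote(path.lstrip("/")) + "?" + query


# List the contents of the sample folder used in the U-SQL slides.
url = webhdfs_url("myaccount", "/Samples/Data", "LISTSTATUS")

# OAuth 2.0 bearer token obtained from Azure Active Directory (placeholder).
headers = {"Authorization": "Bearer <access-token>"}

print(url)
```

Because the API follows WebHDFS conventions, tools from the Hadoop ecosystem can address the store with the same operations they use against HDFS.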