Data Solution Architect, Microsoft
AZURE DATA LAKE
Store and Analytics
Big Data for Microsoft Developers
Kenneth M. Nielsen
@doktorkermit
Kenneth M. Nielsen
• Worked with SQL Server since 1999
• Co-organizer of SQL Saturday DK
• Co-organizer of SQLNexus Nordic
• Community is Everything
• Data Solution Architect at Microsoft
• kmn@funkylab.com
• @doktorkermit
• www.funkylab.com
Agenda
• Azure Data Lake overview
• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Azure Data Lake Analytics – Cognitive Analysis
• Q & A
AZURE DATA LAKE
Overview
History
Bing needed to…
– Understand user behavior
And do it…
– At massive scale
– With agility and speed
– At low cost
So they built …
– Cosmos
Cosmos
• Batch Jobs
• Interactive
• Machine Learning
• Streaming
Thousands of Developers
AZURE DATA LAKE
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business priorities
Enterprise-grade security
Built on YARN, designed for the cloud
DATA LAKE STORE
Azure Data Lake Store
A hyper scale repository
for big data analytics
workloads
No limits to SCALE
Store ANY DATA in its native format
HADOOP FILE SYSTEM (HDFS) for the cloud
ENTERPRISE READY access control,
Encryption at rest
Optimized for analytic workload
PERFORMANCE
Azure Data Lake Store
Any Data
• Unstructured
• Semi-structured
• Structured
Azure Data Lake Store
Azure Data Lake Store
HDFS for the cloud
New filesystem built from the
ground up, based on
HADOOP file system
• Integrates with
HDInsight, Hortonworks
and Cloudera
• Supports Files and
Folder objects and
operations
Azure Data Lake Store
Unlimited storage
• File sizes can be
from Gigabytes to
Petabytes
• No limits to scale
Azure Data Lake Store
Security
• Always encrypted; in motion
using SSL, and at rest using
keys in Azure Key Vault
• Single sign-on, multi-factor
authentication and seamless
integration of on-premises
identities with Active Directory
• Fine-grained POSIX-based
ACLs for role-based access
controls
• Auditing every access /
configuration change
DATA LAKE ANALYTICS
Azure Data Lake Analytics
An elastic analytics service
built on Apache YARN that processes all
data, at any size
• No limits to SCALE
• Includes U-SQL, a language that unifies the
benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY Role based access control
& Auditing
• Pay PER JOB & Scale PER JOB
U-SQL
A new language for
Big Data
• Familiar syntax to millions of SQL & .NET
developers
• Unifies declarative nature of SQL with the
imperative power of C#
• Unifies structured, semi-structured and
unstructured data
• Distributed query support over all data
Language Overview
U-SQL Fundamentals
• All the familiar SQL clauses
SELECT | FROM | WHERE
GROUP BY | JOIN | OVER
• Operate on unstructured and
structured data
• Relational metadata objects
.NET integration and extensibility
• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own:
Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
U-SQL Capabilities
Interactive
Batch
Streaming
Machine Learning
IN PROGRESS
AVAILABLE NOW
FUTURE
FUTURE
U-SQL Distributed Query
Azure Storage Blobs: READ, WRITE
Azure Data Lake Store: READ, WRITE
Azure SQL Database: READ, WRITE
Azure SQL Data Warehouse: READ, WRITE
Azure SQL DB in Azure VM: READ, WRITE
Develop massively parallel
programs with simplicity
• U-SQL: a simple
and powerful language that’s
familiar and easily extensible
• Unifies the declarative
nature of SQL with expressive
power of C#
• Leverage existing libraries in .NET
languages, R and Python
• Massively parallelize code on
diverse workloads (ETL, ML, image
tagging, facial detection)
@orders =
EXTRACT
OrderId int,
Customer string,
Date DateTime,
Amount float
FROM "/input/orders.txt"
USING Extractors.Tsv();
OUTPUT @orders
TO "/output/orders_copy.txt"
USING Outputters.Tsv();
Apply Schema on read
From a file in a Data Lake
Easy delimited text handling
Write out
Read the input, write it directly to output (just a simple copy)
Rowset
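The schema-on-read idea in this script can also be sketched outside U-SQL. The Python below is only an illustration and is not part of Azure Data Lake: it applies a column schema matching the EXTRACT statement to tab-separated text while reading, then writes the rows back out unchanged. The function names and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime

# Schema applied at read time; mirrors the columns in the EXTRACT statement.
SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),
    ("Amount", float),
]

def extract_tsv(text):
    """Parse tab-separated text into typed rows (a 'rowset')."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        {name: convert(value) for (name, convert), value in zip(SCHEMA, fields)}
        for fields in reader
        if fields  # skip blank lines
    ]

def output_tsv(rows):
    """Serialize typed rows back to tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([
            row["OrderId"],
            row["Customer"],
            row["Date"].isoformat(),
            row["Amount"],
        ])
    return buffer.getvalue()

# Hypothetical sample input standing in for /input/orders.txt.
SAMPLE = (
    "1\tContoso\t2017-01-23T00:00:00\t19.5\n"
    "2\tFabrikam\t2017-01-24T00:00:00\t7.25\n"
)

orders = extract_tsv(SAMPLE)   # typed rowset
copy = output_tsv(orders)      # simple copy of the input
```

The file itself carries no types; the schema lives in the query, which is why the same file can be read with a different schema by another script.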
U-SQL Compilation Process
C#
C++
Algebra
Other files
(system files, deployed resources)
managed dll
Unmanaged dll
Compilation output (in job folder)
Compiler & Optimizer
U-SQL Metadata Service
Deployed to Vertices
Logical -> Physical Plan
Each square = “a vertex” represents
a fraction of the total
Vertices in each SuperVertex (aka
“Stage”) are doing the same operation
on different parts of the same data.
Vertices in a later stage may
depend on a vertex in an earlier stage
Execution with Requested Parallelism
Requested Parallelism = 1
(reserve enough to do 1 vertex at
a time)
Requested Parallelism = 4
(reserve enough to do 4 vertices
at a time)
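A small simulation can make the effect of requested parallelism concrete. This sketch assumes that every vertex in a stage takes the same time and that a stage starts only after the previous stage has finished; both are simplifications for illustration, not a description of the real scheduler. The stage sizes and timings are made-up numbers.

```python
import math

def job_duration(stages, parallelism):
    """Estimate run time when at most `parallelism` vertices run at once.

    `stages` is a list of (vertex_count, seconds_per_vertex) tuples.
    Assumption: stages run one after another, and the vertices within a
    stage run in waves of `parallelism`.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    total = 0.0
    for vertex_count, seconds_per_vertex in stages:
        waves = math.ceil(vertex_count / parallelism)
        total += waves * seconds_per_vertex
    return total

# Hypothetical job: three stages with 8, 4 and 1 vertices.
stages = [(8, 10.0), (4, 5.0), (1, 2.0)]

for requested in (1, 4, 8):
    print(requested, job_duration(stages, requested))
```

In this toy job, going from 1 to 4 cuts the estimate from 102 to 27 seconds, while going from 4 to 8 only brings it to 17, because the later stages have too few vertices to use the extra capacity.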
Job Scheduler
& Queue
Front-End Service
Query Life
Optimizer
Vertex Scheduling
Compiler
Runtime
Visual Studio
Portal / API
Stage Details
252 Pieces of work
AVG Vertex execution time
4.3 Billion rows
Data Read & Written
ADLAUs
Azure
Data
Lake
Analytics
Unit
Parallelism N = N ADLAUs
1 ADLAU ~=
A VM with 2 cores and 6 GB of
memory
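The arithmetic behind these figures is simple enough to write down. The sketch below uses only the two numbers from the slide (2 cores and 6 GB per ADLAU). The `adlau_hours` helper is an assumption added for illustration: it treats usage as ADLAUs multiplied by the time they are held, which the slide does not state.

```python
CORES_PER_ADLAU = 2       # from the slide: 1 ADLAU ~= 2 cores
MEMORY_GB_PER_ADLAU = 6   # from the slide: 1 ADLAU ~= 6 GB of memory

def reserved_resources(parallelism):
    """Approximate resources reserved for a job at the given parallelism."""
    return {
        "adlaus": parallelism,                     # Parallelism N = N ADLAUs
        "cores": parallelism * CORES_PER_ADLAU,
        "memory_gb": parallelism * MEMORY_GB_PER_ADLAU,
    }

def adlau_hours(parallelism, runtime_minutes):
    """Hypothetical usage measure: ADLAUs held times hours held."""
    return parallelism * runtime_minutes / 60

print(reserved_resources(4))
print(adlau_hours(4, 30))
```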
Preparing
Queued
Running
Finalizing
Ended
(Succeeded, Failed, Cancelled)
New
Compiling
Queued
Scheduling
Starting
Running
Ended
UX Job State
The script is being compiled by the Compiler Service
All jobs enter the queue.
Are there enough ADLAUs to start the job?
If yes, then allocate those ADLAUs for the job
The U-SQL runtime is now executing the code on 1 or
more ADLAUs or finalizing the outputs
The job has concluded.
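The job states listed above can be read as a simple linear state machine. The sketch below models only the order given on the slide; treating the lifecycle as strictly linear is an assumption for illustration, since a real job can presumably fail or be cancelled before it reaches Running. The class and function names are hypothetical.

```python
from enum import Enum

class JobState(Enum):
    NEW = "New"
    COMPILING = "Compiling"
    QUEUED = "Queued"
    SCHEDULING = "Scheduling"
    STARTING = "Starting"
    RUNNING = "Running"
    ENDED = "Ended"

# Enum members iterate in definition order, which matches the slide.
LIFECYCLE = list(JobState)

# Possible results once a job has ended, as listed on the slide.
RESULTS = ("Succeeded", "Failed", "Cancelled")

def next_state(state):
    """Return the state that follows `state`, or None once the job has ended."""
    index = LIFECYCLE.index(state)
    if index + 1 < len(LIFECYCLE):
        return LIFECYCLE[index + 1]
    return None

def walk(start=JobState.NEW):
    """List every state name from `start` through Ended."""
    path = []
    state = start
    while state is not None:
        path.append(state.value)
        state = next_state(state)
    return path

print(" -> ".join(walk()))
```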
Why does a Job get Queued?
Local Cause
Conditions:
• Queue already at Max
Concurrency
Global Cause
Conditions:
• System-wide shortage of ADLAUs
• System-wide shortage of
Bandwidth
* If these conditions are met, a job will be queued even if the
queue is not at its Max Concurrency
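The local and global conditions can be expressed as one decision function. This is a minimal sketch of the rules as stated on the slide, not the service's actual logic; all four parameters are hypothetical inputs chosen for illustration.

```python
def queue_reason(active_jobs, max_concurrency, adlaus_available, bandwidth_available):
    """Return why a new job would wait in the queue, or None if it can start."""
    # Global causes apply even when the queue is below its own limit.
    if not adlaus_available:
        return "global: system-wide shortage of ADLAUs"
    if not bandwidth_available:
        return "global: system-wide shortage of bandwidth"
    # Local cause: the queue is already running its maximum number of jobs.
    if active_jobs >= max_concurrency:
        return "local: queue already at max concurrency"
    return None

# A job can be queued for a global reason even with free concurrency slots.
print(queue_reason(active_jobs=1, max_concurrency=3,
                   adlaus_available=False, bandwidth_available=True))
```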
DATA LAKE ANALYTICS
Visual Studio
Azure Data Lake – Visual Studio
Available
project types
Azure Data Lake – Visual Studio
Fully integrates with
Solution Explorer
Azure Data Lake – Visual Studio
• Monitor and
manage jobs
• Browse and
manage storage
• Browse U-SQL
catalog
CREATING U-SQL
Creating U-SQL
IntelliSense Supported
Creating U-SQL
Code behind to
enhance your
code
Debug and Optimize your
Big Data programs with ease
• Deep integration with
Visual Studio, Visual Studio Code,
Eclipse, & IntelliJ
• Easy for novices to write
simple queries
• Integrated with U-SQL,
Hive, Storm, and Spark
• Actively offers recommendations
to improve performance and
reduce cost
• Playback visually displays job run
USING VISUAL STUDIO
Demo
Installing Azure PowerShell
• PowerShell Gallery
• Recommended approach
• PowerShell 5.0 supports PowerShell Gallery
• Windows 10 ships with PowerShell 5.0
• Web Platform Installation (WebPI)
Installing from the PowerShell
Gallery
• Launch Windows PowerShell ISE as
Administrator
• Install-Module AzureRM
• Install-AzureRM
Finding the ADL cmdlets
• Option 1
• Get-Command -Module AzureRM.DataLakeStore
• Get-Command -Module AzureRM.DataLakeAnalytics
• Option 2
• Get-Command *DataLake*
Logging in to Azure
• Launch Windows PowerShell ISE
• $subname = "Your Subscription Name"
• Login-AzureRmAccount -SubscriptionName $subname
ADLS: Listing files in a store
• $adls = "mscloudsummitstore"
• Get-AzureRmDataLakeStoreChildItem
-Account $adls
-Path /
ADLS: Upload and download
• $adls = "mscloudsummitstore"
• Import-AzureRmDataLakeStoreItem
-Account $adls
-Path d:\somefile.txt
-Destination /somefile.txt
• Export-AzureRmDataLakeStoreItem
-Account $adls
-Path /somefile.txt
-Destination d:\somefile_copy.txt
ADLA: List and submit jobs
• $adla = "mscloudsummitanalytics"
• Get-AzureRmDataLakeAnalyticsJob
-Account $adla
• Submit-AzureRmDataLakeAnalyticsJob
-Account $adla
-Script "…" # U-SQL text
-Name myjob
• Submit-AzureRmDataLakeAnalyticsJob
-Account $adla
-ScriptPath D:\test.script
-Name myjob
ADL Store (ADLS) feature set
Account Management
Create new account
List accounts
Update account properties
Delete account
Transferring Data
Upload into store from local disk
Download from store to local disk
Files and Folders
List contents of folder
Create
Move
Delete
Does file exist
Security
Get ACLs
Update ACLs
Get Owner
Set Owner
File Content
Set file content
Append file content
Get file content
Merge files
ADL Analytics (ADLA) feature set
Account Management
Create new account
List accounts
Update account properties
Delete account
Data Sources
Add a data source
List data sources
Update data source
Delete data source
Compute
List jobs
Submit job
Cancel job
Catalog Items
List items in U-SQL catalog
Update item
Catalog Secrets
Create catalog secret
List catalog secrets
Delete catalog secrets
USING ADL POWERSHELL
DEMO
COGNITIVE ANALYSIS OF IMAGES
Install samples and assemblies
Running sample
Running sample
COGNITIVE ANALYSIS OF IMAGES
Demo
Additional capabilities and resources
Tools:
• http://aka.ms/adltoolsVS
Blogs and community page:
• http://funkylab.com/
• http://blogs.msdn.com/b/visualstudio/
• http://azure.microsoft.com/en-us/blog/topics/big-data/
• https://channel9.msdn.com/Search?term=U-SQL#ch9Search
• https://blogs.msdn.microsoft.com/azuredatalake/2016/11/22/u-sql-advanced-analytics-introducing-cognitive-scenarios-for-text-and-imaging/
Documentation and articles and slides:
• http://aka.ms/usql_reference
• https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
• https://msdn.microsoft.com/en-us/magazine/mt614251
GitHub: Get started
• https://Github.com
ADL forums and feedback
• http://aka.ms/adlfeedback
• https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
• http://stackoverflow.com/questions/tagged/u-sql
QUESTIONS
Register for SQL Nexus 2017
Thank you to all our sponsors!
Join the conversation
#MSCloudSummit
@MSCloudSummit
Thank you very much!
http://bit.ly/MSCSevalJ1
Rate the sessions…
…and try to win a
Surface Pro 4

J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
