First Language: APL
• Example (arbitrary-dimension regression): Const⌹Coeff∘.*⌽0,Dim
• High-Dimensional Arrays/Nested Arrays as Data Model
• Expressions: Mathematics (declarative), parallel
• Control flow: Recursion, GoTo
• Syntax: Greek and Special Characters (easier to write than read)
Next Languages: Pascal, Modula-2, Oberon, C/C++
• Procedural, imperative (structured control flow)
• One Item: Structured types and Object Models
• Single-node, Parallelism/Distribution via Libraries
Other experiences: Lisp, Prolog
• Functional and Logic
• List as Data Model
• Recursion instead of control flow
Data Processing Languages:
• SQL: Declarative Expressions, Procedural Control Flow
• Datalog: Recursion
• XQuery: Tree Data Model, Declarative/Functional
My Language History
Imperative vs Declarative
Procedural vs Functional vs Logical
One item vs Sets
Single-node vs Parallel vs Distributed
Programming vs Data Processing
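The APL one-liner in the "First Language" list fits a whole least-squares regression in a single expression. The sketch below spells out what it computes, and it rests on our own reading of the slide: Const holds the observed values, Coeff the sample points, and Dim a vector of exponents such as ⍳n. Under that reading, `Coeff∘.*⌽0,Dim` builds a Vandermonde matrix and `⌹` (domino) solves the least-squares system. The function names are hypothetical. For brevity the sketch solves the normal equations by Gaussian elimination, which is not necessarily how an APL interpreter implements `⌹`.

```python
def vandermonde(xs, exponents):
    """Coeff∘.*E: outer product of the sample points and the exponents under power."""
    return [[x ** e for e in exponents] for x in xs]


def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def domino(y, v):
    """Y⌹V: least-squares solution, computed here via the normal equations (VᵀV)c = Vᵀy."""
    cols = range(len(v[0]))
    vtv = [[sum(row[i] * row[j] for row in v) for j in cols] for i in cols]
    vty = [sum(row[i] * yi for row, yi in zip(v, y)) for i in cols]
    return solve(vtv, vty)


def polyfit(const, coeff, degree):
    """Const⌹Coeff∘.*⌽0,⍳degree: coefficients ordered from x**degree down to x**0."""
    exponents = list(range(degree, -1, -1))  # ⌽0,⍳degree gives degree ... 1 0
    return domino(const, vandermonde(coeff, exponents))


# Points sampled from 2x² + 3x + 1
print([round(c, 6) for c in polyfit([1, 6, 15, 28, 45], [0, 1, 2, 3, 4], 2)])  # [2.0, 3.0, 1.0]
```

The contrast is the point of the slide. APL states the whole computation as one declarative array expression, while the explicit version needs loops and index bookkeeping.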
Imperative
Tell the system how to get it
Declarative
Tell the system what you want
Let the system find a way to get it
Declarative leaves options (e.g., access paths, join order, degree of parallelism) that an optimizer can reason about
Imperative vs Declarative
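Here is a toy illustration of "let the system find a way". The imperative function hard-codes a full scan. The declarative-style call only states the condition `column == value`, and a tiny hypothetical "engine" picks an index lookup when an index exists and falls back to a scan otherwise. `Table`, `create_index` and `select_where` are illustrative names, not a real library API.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    rows: list
    # column name -> {value: [row positions]}; a minimal, hypothetical index structure
    indexes: dict = field(default_factory=dict)

    def create_index(self, column):
        index = {}
        for pos, row in enumerate(self.rows):
            index.setdefault(row[column], []).append(pos)
        self.indexes[column] = index


def customers_in_city_imperative(table, city):
    """Imperative: the code itself fixes *how* (a full scan in row order)."""
    result = []
    for row in table.rows:
        if row["city"] == city:
            result.append(row)
    return result


def select_where(table, column, value):
    """Declarative style: the caller states *what*; the engine chooses *how*."""
    if column in table.indexes:
        # Plan A: an index lookup touches only the matching rows.
        return [table.rows[pos] for pos in table.indexes[column].get(value, [])]
    # Plan B: no index is available, so scan.
    return [row for row in table.rows if row[column] == value]
```

The caller's request stays the same before and after `create_index("city")`. Only the plan the engine chooses changes, and that freedom is what an optimizer exploits.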
Procedural
operates by changing “persistent state” (variables) with each expression
Allows side effects in control flow
i = 0;
FOR j FROM 1 TO 100 DO i = i + 1;
i is now 100.
Functional
Transforms input into output, no “persistent state”
No side effects in control flow
i = 0;
FOR j FROM 1 TO 100 DO i = i + 1;
i is now 1 (every iteration evaluates i + 1 against the unchanged binding i = 0).
Procedural vs Functional
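The two readings of the same loop can be reproduced in Python (the function names are illustrative). In the procedural version each iteration mutates `i`. The "functional reading" evaluates `i + 1` against the original binding on every iteration, so the result is 1. To actually count without mutation, the state has to be threaded through explicitly, here with a fold (`functools.reduce`).

```python
from functools import reduce


def procedural_count():
    i = 0
    for _j in range(1, 101):
        i = i + 1  # mutates i; the next iteration sees the updated value
    return i


def functional_reading():
    i = 0
    # Each iteration is a pure expression over the *same* binding i = 0,
    # so every iteration yields 1 and nothing accumulates.
    per_iteration = [i + 1 for _j in range(1, 101)]
    return per_iteration[-1]


def functional_count():
    # Counting without mutation: pass the accumulator along explicitly with a fold.
    return reduce(lambda acc, _j: acc + 1, range(1, 101), 0)


print(procedural_count(), functional_reading(), functional_count())  # 100 1 100
```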
Single objects
• Requires control flow
• Explicit Parallelism
Sets of objects
• Allows expressions at a higher level of abstraction
• Implicit Parallelism
• Objects can be
  • Object/value
  • Tuples
  • Trees
  • Graphs
A Metadata Service provides sharing of data model objects
One Item vs Sets
Data Models
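A minimal sketch of why set-based formulations allow implicit parallelism. The item-at-a-time version fixes a sequential loop. The set-based version states the transformation once over the whole collection, so whoever evaluates it can run it sequentially or hand it to an executor without changing the expression. `normalize`, `one_item` and `set_based` are hypothetical names. With CPU-bound work under CPython's GIL, threads would not speed this up; the point is only that the expression stays the same.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(name):
    return name.strip().lower()


def one_item(names):
    # Item at a time: explicit control flow fixes a sequential order.
    out = []
    for name in names:
        out.append(normalize(name))
    return out


def set_based(names, executor=None):
    # The same logic stated once over the whole set; the evaluator decides
    # whether to use plain map or an executor (Executor.map keeps input order).
    mapper = executor.map if executor is not None else map
    return list(mapper(normalize, names))


names = ["  Ann", "BO ", "Cy"]
with ThreadPoolExecutor(max_workers=2) as pool:
    print(one_item(names) == set_based(names) == set_based(names, pool))  # True
```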
Parallelism in the Language vs in Libraries
Scale-up vs Scale-out
Synchronization/Transactions
• Explicit imperative vs Implicit declarative
• ACID support
Single-node vs Parallel vs Distributed
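The "explicit imperative vs implicit declarative" synchronization bullet can be made concrete with a parallel sum, sketched here with hypothetical function names. In the first version, user code shares a mutable total and must take a lock on every update. In the second, each part is aggregated independently and the partial results are combined afterwards, so no lock appears in user code. That partial-aggregate-then-combine pattern is what a set-based engine can do on the user's behalf.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def explicit_sync_sum(values, n_threads=4):
    """Shared mutable state: every update must take the lock explicitly."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for v in chunk:
            with lock:
                total += v

    chunks = [values[k::n_threads] for k in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def implicit_sync_sum(values, n_parts=4):
    """No shared state: aggregate each part independently, then combine the partials."""
    parts = [values[k::n_parts] for k in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(sum, parts))
    return sum(partials)


data = list(range(1000))
print(explicit_sync_sum(data), implicit_sync_sum(data))  # 499500 499500
```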
Programming Languages
• Long-term data lives in a store but is not part of the language model
• Designed for tight coupling of data and application logic
• Often imperative, procedural, object-oriented over single items, with explicit or library-based parallelism
Data Processing Languages
• Long-term data is part of the language model
• Data can evolve independently of the application
• Declarative and Functional
• Set-based
• Built-in parallelism and implicit/declarative synchronization
Programming vs Data Processing
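A small illustration of "data can evolve independently of the application", using Python's built-in `sqlite3` module with an in-memory database so the example stays self-contained. The table is a named object in the query language itself. Adding a column to it does not affect a query that names only the columns it needs. The table and column names are made up for the example.

```python
import sqlite3


def city_report(conn):
    # The application query names only the columns it depends on.
    return conn.execute(
        "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cid INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ann", "New York"), ("Bo", "Newark"), ("Cy", "New York")],
)
before = city_report(conn)

# The schema evolves (a new column) without touching the application query.
conn.execute("ALTER TABLE customers ADD COLUMN segment TEXT")
after = city_report(conn)
print(before == after, after)  # True [('New York', 2), ('Newark', 1)]
```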
Writeability vs Readability
Consistency
Familiarity
Context independent
Composable
Mathematical vs natural language
Reserved Keywords?
Syntax Matters
No surprises!
Avoid complexities!
Composable
Optimizable
Implementable
Semantics Matters
Consortia/Standards Bodies:
• Slow (it took XQuery 6 years!)
• “Political” interests of participants can negatively impact the design
Individual/Small team:
• More focused
• Risk: being different for difference's sake
Evolve vs New
Create a language that addresses demand
How to create languages
Some sample use cases
Digital Crime Unit – Analyze complex attack patterns to understand botnets and to predict and mitigate future attacks by analyzing log records with complex custom algorithms
Image Processing – Large-scale image feature extraction and classification using custom code
Shopping Recommendation – Complex pattern analysis and prediction over shopping records using proprietary algorithms
• Declarativity does scaling and parallelization for you
• Extensibility is bolted on and not “native”
• Hard to work with anything other than structured data
• Difficult to extend with custom code
• Extensibility through custom code is “native”
• Declarativity is bolted on and not “native”
• User often has to care about scale and performance
• SQL is second-class, embedded in strings
• Often no code reuse/sharing across queries
• Declarativity and Extensibility are equally native to the language!
Get benefits of both!
Makes it easy for you by unifying:
• Unstructured and structured data processing
• Declarative SQL and custom imperative Code (C#, Python, R, …)
Scales up and scales out custom code within a declarative framework
The origins of U-SQL
SCOPE – Microsoft’s internal Big Data language
• SQL and C# integration model
• Optimization and Scaling model
• Runs hundreds of thousands of jobs daily
Hive
• Complex data types (Maps, Arrays)
• Data format alignment for text files
T-SQL/ANSI SQL
• Many of the SQL capabilities (windowing functions, metadata model, etc.)
U-SQL Language Philosophy
Declarative Query and Transformation Language:
• Uses SQL’s SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions
• Optimizable, Scalable
Expression-flow programming style:
• Easy-to-use functional lambda composition
• Composable, globally optimizable
Operates on Unstructured & Structured Data
• Schema on read over files
• Relational metadata objects (e.g., database, table)
Extensible from the ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined Aggregators (C#)
• User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out Framework for user code
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
// Reference the registered user assembly (presumably containing MyAgg.MySum, MyNamespace.MyFunction and MyData.Write)
REFERENCE ASSEMBLY MyDB.MyAssembly;
CREATE TABLE T( cid int, first_order DateTime
, last_order DateTime, order_count int
, order_amount float, ... );
// Schema on read: the schema is applied to the raw files at extraction time
@o = EXTRACT oid int, cid int, odate DateTime, amount float
FROM "/input/orders.txt"
USING Extractors.Csv();
@c = EXTRACT cid int, name string, city string
FROM "/input/customers.txt"
USING Extractors.Csv();
// SQL query shape with C# expressions (==, &&, StartsWith) and a user-defined aggregator
@j = SELECT c.cid, MIN(o.odate) AS firstorder
, MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
, AGG<MyAgg.MySum>(o.amount) AS totalamount
FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
WHERE c.city.StartsWith("New")
&& MyNamespace.MyFunction(o.odate) > 10
GROUP BY c.cid;
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();
INSERT INTO T SELECT * FROM @j;
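The `AGG<MyAgg.MySum>` call in the script hands each group's values to a user-defined aggregator written in C#. The Python below sketches that contract under our own assumptions. It loosely mirrors the init / accumulate / terminate shape of U-SQL's C# aggregators, but `MySum` and `group_aggregate` are hypothetical stand-ins. The real runtime also distributes the groups across vertices, which this single-process sketch does not.

```python
class MySum:
    """Hypothetical stand-in for the C# aggregator MyAgg.MySum."""

    def init(self):
        self.total = 0.0

    def accumulate(self, value):
        if value is not None:  # skip nulls, as built-in SQL aggregates do
            self.total += value

    def terminate(self):
        # Unlike SQL SUM (which yields NULL), this sketch returns 0.0 for an all-null group.
        return self.total


def group_aggregate(rows, key, column, agg_cls):
    """GROUP BY key, feeding each group's column values to its own aggregator instance."""
    states = {}
    for row in rows:
        k = row[key]
        if k not in states:
            states[k] = agg_cls()
            states[k].init()
        states[k].accumulate(row[column])
    return {k: state.terminate() for k, state in states.items()}


rows = [{"cid": 1, "amount": 10.0}, {"cid": 1, "amount": 5.5}, {"cid": 2, "amount": None}]
print(group_aggregate(rows, "cid", "amount", MySum))  # {1: 15.5, 2: 0.0}
```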
U-SQL Data Model
Files and Tables
Set-based
Unstructured Data
@s = EXTRACT a string, b int, date DateTime, file string
FROM "filepath/{date:yyyy}/{date:MM}/{date:dd}/{file}.csv"
USING Extractors.Csv(encoding: Encoding.Unicode);
• Pro: Flexible; scales with file sets and over parts of partitionable files
• Con: The system doesn’t know the data distribution or statistics; no indexing
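The file-set path in the EXTRACT above turns parts of each file path into virtual columns (`date` and `file`), which is how one script scales over many files. The sketch below approximates that idea in Python and also shows how a predicate on `date` can skip files before they are read. It handles only the `{name}` and `{name:yyyy|MM|dd}` tokens used in this example. The function names are hypothetical, and this is not how the U-SQL compiler implements file sets.

```python
import re
from datetime import date

# Tokens look like {name} or {name:format}; only yyyy, MM and dd are handled here.
TOKEN = re.compile(r"\{(\w+)(?::(\w+))?\}")
FORMATS = {"yyyy": r"\d{4}", "MM": r"\d{2}", "dd": r"\d{2}"}


def compile_fileset(pattern):
    """Turn a file-set pattern into a regex with one named group per token."""
    parts, pos = [], 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name, fmt = m.group(1), m.group(2)
        if fmt:
            parts.append(f"(?P<{name}__{fmt}>{FORMATS[fmt]})")
        else:
            parts.append(f"(?P<{name}>[^/]+)")  # a plain token matches one path segment
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def virtual_columns(regex, path):
    """Return the virtual column values for a matching path, else None."""
    m = regex.match(path)
    if m is None:
        return None
    cols, date_parts = {}, {}
    for key, value in m.groupdict().items():
        name, _, fmt = key.partition("__")
        if fmt:
            date_parts.setdefault(name, {})[fmt] = int(value)
        else:
            cols[name] = value
    for name, p in date_parts.items():
        cols[name] = date(p["yyyy"], p["MM"], p["dd"])
    return cols


def prune(paths, regex, start, end):
    """Keep only files whose 'date' virtual column lies in [start, end]."""
    selected = []
    for path in paths:
        cols = virtual_columns(regex, path)
        if cols is not None and start <= cols["date"] <= end:
            selected.append((path, cols))
    return selected
```

For example, `virtual_columns(compile_fileset("filepath/{date:yyyy}/{date:MM}/{date:dd}/{file}.csv"), "filepath/2017/02/14/log1.csv")` yields `file = "log1"` and `date = date(2017, 2, 14)`. The "Con" above still applies: the paths reveal where the data is, not how it is distributed.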
Structured Data
CREATE TABLE T (col1 int
, col2 string
, col3 SQL.MAP<string,string>
, INDEX idx CLUSTERED (col2 ASC)
PARTITIONED BY (col1)
DISTRIBUTED BY HASH (col2) );
• Pro: Provides system guarantees about data distribution, statistics, and indices that help performance and scale; objects are discoverable
• Con: Needs a schema a priori; costs additional storage and generation time
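The table definition above combines coarse partitions (`PARTITIONED BY`) with hash distribution inside each partition (`DISTRIBUTED BY HASH`). The sketch below shows the layout idea and why it helps: a lookup on known partition and distribution values reads one bucket instead of the whole table. The function names are hypothetical, and MD5 is used only as a convenient stable hash (Python's built-in `hash()` of strings is randomized per process). U-SQL's actual hash function and storage format are different.

```python
import hashlib


def bucket_of(key, buckets):
    """Map a key to a bucket with a hash that is stable across runs."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def layout(rows, partition_col, dist_col, buckets):
    """Partition by partition_col, then distribute rows by hash(dist_col) within each partition."""
    table = {}
    for row in rows:
        part = table.setdefault(row[partition_col], [[] for _ in range(buckets)])
        part[bucket_of(row[dist_col], buckets)].append(row)
    return table


def point_lookup(table, partition_value, dist_col, dist_value, buckets):
    """Read one partition and one bucket instead of scanning everything."""
    part = table.get(partition_value)
    if part is None:
        return []
    bucket = part[bucket_of(dist_value, buckets)]
    return [row for row in bucket if row[dist_col] == dist_value]
```

The same property also helps joins and aggregations: equal keys always land in the same bucket, so work on different buckets can proceed independently.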
U-SQL Familiarity
SQL
C#
Python, R
Familiar Operations
• ORDER BY FETCH n ROWS
• GROUP BY HAVING
• UNION/INTERSECT/EXCEPT
• OVER Expression: Windowing, Analytics, Ranking Functions
• JOINS: INNER, FULL/LEFT/RIGHT OUTER, CROSS, SEMI, ANTI-SEMI-JOIN
• CROSS APPLY
• PIVOT/UNPIVOT (new!)
New Operations
• SET OPERATION BY NAME
SELECT * FROM @left
INTERSECT BY NAME ON (id, *)
SELECT * FROM @right;
• OUTER UNION BY NAME
SELECT * FROM @left
OUTER UNION BY NAME ON (A, K)
SELECT * FROM @right;
• Flexible Column Sets for parameter polymorphism
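A rough model of `OUTER UNION BY NAME` as we read it: columns are matched by name rather than position, a column that exists on only one side is filled with null on the other, and the `ON (...)` list names columns that both inputs must share. Two points here are simplifying assumptions rather than confirmed semantics: this sketch keeps duplicate rows, and it raises an error when an `ON` column is missing. Check the U-SQL reference for the exact rules. Rowsets are modeled as a column list plus a list of dicts, and `outer_union_by_name` is a hypothetical name.

```python
def outer_union_by_name(left_cols, left_rows, right_cols, right_rows, on):
    """Union two rowsets by column name; columns absent on one side become None (null)."""
    missing = [c for c in on if c not in left_cols or c not in right_cols]
    if missing:
        raise ValueError(f"ON columns must exist in both inputs: {missing}")
    # Output schema: left columns first, then right-only columns.
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_cols]

    def widen(row):
        return {c: row.get(c) for c in out_cols}

    # Duplicates are kept here (bag semantics), a simplifying assumption.
    return out_cols, [widen(r) for r in left_rows] + [widen(r) for r in right_rows]


cols, rows = outer_union_by_name(
    ["A", "K", "B"], [{"A": 1, "K": "x", "B": True}],
    ["K", "A", "C"], [{"K": "y", "A": 2, "C": 3.5}],
    on=["A", "K"],
)
print(cols)  # ['A', 'K', 'B', 'C']
```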
“Top 5” Surprises for SQL Users
• AS is not as
• C# keywords and SQL keywords overlap
• Future-proofing against new reserved keywords in both languages: reserve all upper-case words as U-SQL keywords
• " vs ' vs []
• = != ==
• Remember: C# expression language
• null IS NOT NULL
• C# nulls are two-valued
• PROCEDURES but no WHILE
• No UPDATE, DELETE, or MERGE (yet)
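The "C# nulls are two-valued" surprise shows up in filters. In SQL's three-valued logic, `x = NULL` is UNKNOWN for every row, so a WHERE clause keeps nothing. Under C# semantics, `x == null` is simply true when `x` is null. A Python sketch of the difference follows. The helper names are hypothetical, and Python's `None == None` happens to behave like C# here.

```python
UNKNOWN = None  # stands in for SQL's third truth value


def sql_equals(a, b):
    """SQL three-valued logic: any comparison involving NULL is UNKNOWN."""
    if a is None or b is None:
        return UNKNOWN
    return a == b


def csharp_equals(a, b):
    """C#-style equality on nullable values: null == null is true, null == 5 is false."""
    return a == b


def where(rows, predicate):
    """A WHERE clause keeps a row only if the predicate is exactly True."""
    return [r for r in rows if predicate(r) is True]


values = [1, None, 3]
print(where(values, lambda v: sql_equals(v, None)))     # []
print(where(values, lambda v: csharp_equals(v, None)))  # [None]
```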
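U-SQL Object Model
Reusability and Discoverability
[Diagram: An ADLA account/catalog contains one or more databases; each database contains one or more schemas. Objects shown: tables (with clustered index, partitions, and statistics), views, table-valued functions (TVFs), procedures, table types, external tables, data sources, credentials, and C# assemblies. The assemblies implement the C# user objects: functions, user-defined aggregators (UDAgg), user-defined types (UDTs), extractors, outputters, processors, reducers, combiners, and appliers. Legend: "contains", "refers to", "implemented and named by" (metadata name vs C# name).]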
U-SQL Extensibility
Input:
Start Time - End Time - User Name
5:00 AM - 6:00 AM - ABC
5:00 AM - 6:00 AM - XYZ
8:00 AM - 9:00 AM - ABC
8:00 AM - 10:00 AM - ABC
10:00 AM - 2:00 PM - ABC
7:00 AM - 11:00 AM - ABC
9:00 AM - 11:00 AM - ABC
11:00 AM - 11:30 AM - ABC
11:40 PM - 11:59 PM - FOO
Output (overlapping ranges merged per user):
Start Time - End Time - User Name
5:00 AM - 6:00 AM - ABC
5:00 AM - 6:00 AM - XYZ
7:00 AM - 2:00 PM - ABC
11:40 PM - 0:40 AM - FOO
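The merge shown above is the kind of logic a user-defined reducer performs: gather all ranges for one user, sort them, and fold overlapping or touching ranges together. A single-process Python sketch follows. The function names are hypothetical, and treating touching ranges (one ends exactly when the next starts) as mergeable is our assumption. With only the input rows listed, FOO stays 11:40 PM - 11:59 PM. The 0:40 AM end in the output presumably comes from an additional FOO row that crosses midnight and does not appear in the slide's input. The sketch handles such rows by adding a day to the end time.

```python
def to_minutes(text):
    """'7:00 AM' -> 420. Hours are read modulo 12, so '12:00 AM' and '0:40 AM' both work."""
    clock, meridiem = text.split()
    hours, minutes = map(int, clock.split(":"))
    hours %= 12
    if meridiem == "PM":
        hours += 12
    return hours * 60 + minutes


def to_clock(minutes):
    hours, mins = divmod(minutes % (24 * 60), 60)
    return f"{hours % 12 or 12}:{mins:02d} {'AM' if hours < 12 else 'PM'}"


def merge_ranges(rows):
    """Merge overlapping or touching (start, end, user) ranges per user."""
    by_user = {}
    for start, end, user in rows:
        s, e = to_minutes(start), to_minutes(end)
        if e < s:  # the range crosses midnight
            e += 24 * 60
        by_user.setdefault(user, []).append((s, e))

    merged = []
    for user, ranges in by_user.items():
        ranges.sort()
        cur_s, cur_e = ranges[0]
        for s, e in ranges[1:]:
            if s <= cur_e:  # overlaps or touches the current range: extend it
                cur_e = max(cur_e, e)
            else:
                merged.append((to_clock(cur_s), to_clock(cur_e), user))
                cur_s, cur_e = s, e
        merged.append((to_clock(cur_s), to_clock(cur_e), user))
    return merged
```

In U-SQL this logic would typically be written as a C# user-defined reducer invoked with `REDUCE ... ON` the user column, so that all rows for one user reach the same reducer instance.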
U-SQL extensibility
Extend U-SQL with C#/.NET
Built-in operators, functions, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs)
• User-Defined Extractors
• User-Defined Outputters
• User-Defined Processors
  • Take one row and produce one row
  • Pass-through versus transforming
• User-Defined Appliers
  • Take one row and produce 0 to n rows
  • Used with OUTER/CROSS APPLY
• User-Defined Combiners
  • Combine rowsets (like a user-defined join)
• User-Defined Reducers
  • Take n rows and produce m rows (normally m<n)
• Scaled out with explicit U-SQL syntax that takes a UDO instance (created as part of the execution):
  • EXTRACT
  • OUTPUT
What are UDOs?
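The UDO kinds above differ mainly in how many rows go in and come out. The Python sketch below captures only those input/output shapes. The helper names are hypothetical. Real U-SQL UDOs are C# classes, and the U-SQL runtime, not user code, handles the grouping, partitioning and scale-out that these single-process functions simulate.

```python
from itertools import groupby
from operator import itemgetter


def process(rows, fn):
    """Processor shape: every input row yields exactly one output row."""
    return [fn(row) for row in rows]


def apply_rows(rows, fn):
    """Applier shape: every input row yields 0..n rows (as with CROSS APPLY)."""
    return [out for row in rows for out in fn(row)]


def reduce_on(rows, key, fn):
    """Reducer shape: all rows sharing a key are handed over together; fn returns m rows."""
    ordered = sorted(rows, key=itemgetter(key))
    return [out for k, group in groupby(ordered, key=itemgetter(key)) for out in fn(k, list(group))]


def combine_on(left, right, key, fn):
    """Combiner shape: the rows from both inputs that share a key go to fn together."""
    left_groups, right_groups = {}, {}
    for row in left:
        left_groups.setdefault(row[key], []).append(row)
    for row in right:
        right_groups.setdefault(row[key], []).append(row)
    return [out for k, ls in left_groups.items() if k in right_groups
            for out in fn(k, ls, right_groups[k])]


rows = [{"user": "a", "tags": "x;y"}, {"user": "b", "tags": ""}, {"user": "a", "tags": "z"}]
print(reduce_on(rows, "user", lambda k, g: [{"user": k, "n": len(g)}]))
# [{'user': 'a', 'n': 2}, {'user': 'b', 'n': 1}]
```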
Custom Operator Extensions
Scaled out by U-SQL
• PROCESS
• COMBINE
• REDUCE
• CROSS APPLY
• .NET API provided to build UDOs
  • Any .NET language usable
  • However, only C# is first-class in tooling
  • Use U-SQL-specific .NET DLLs
• Deploying UDOs
  • Compile DLL
  • Upload DLL to ADLS
  • Register it in a U-SQL script (CREATE ASSEMBLY) and reference it (REFERENCE ASSEMBLY)
  • Visual Studio provides tool support
• UDOs can
  • Invoke managed code
  • Invoke native code deployed with UDO assemblies
  • Invoke other language runtimes (e.g., Python, R)
  • Be scaled out by the U-SQL execution framework
• UDOs cannot
  • Communicate between different UDO invocations
  • Call web services/reach outside the vertex boundary
How to specify UDOs?
Provide integration into U-SQL Data Models (Files and Rowsets)
Integrate into the processing model and optimization model
U-SQL Scalability and Performance
Script level optimization
Scales as Data scales
• Automatic "in-lining", optimized out-of-the-box
• Per-job parallelization, visibility into execution
• Heatmap to identify bottlenecks
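"Automatic in-lining" means that assigning a rowset variable does not execute anything. The assignments compose into one expression that is evaluated (and, in U-SQL, optimized as a whole script) only when a result is needed. Python generators can mimic the deferral part of that, though not the optimization. In the sketch below, `extract`, `where` and `select` are hypothetical stand-ins for the corresponding U-SQL constructs, and `log` records when rows are actually read.

```python
def extract(rows, log):
    """Stands in for EXTRACT; records each row when it is actually read."""
    for row in rows:
        log.append(row["cid"])
        yield row


def where(rows, predicate):
    return (row for row in rows if predicate(row))


def select(rows, projection):
    return (projection(row) for row in rows)


log = []
data = [{"cid": 1, "city": "New York"}, {"cid": 2, "city": "Berlin"}]

# Like "@c = EXTRACT ...; @f = SELECT ... WHERE ...;": the assignments only compose expressions.
c = extract(data, log)
f = select(where(c, lambda r: r["city"].startswith("New")), lambda r: r["cid"])
print(log)          # [] (nothing has been read yet)

result = list(f)    # like OUTPUT: evaluating the composed expression pulls rows through
print(result, log)  # [1] [1, 2]
```

A real optimizer would go further than this sketch, for example pushing the filter into the extractor or pruning files, which is possible only because the whole script is visible as one expression.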
U-SQL is designed for Big Data Analytics
Functional, Declarative, Set-based => Scalable and Optimizable
Provides Extensibility with familiar Programming Languages
Familiarity: Evolution and Re-use
Summary

The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)

  • 1.
  • 2.
  • 3. First Language: APL  Example (arbitrary dim regression): Const⌹Coeff∘.*⌽0,Dim  High-Dimensional Arrays/Nested Arrays as Data Model  Expressions: Mathematics (declarative), parallel  Control flow: Recursion, GoTo   Syntax: Greek and Special Characters (easier to write than read) Next Languages: Pascal, Modula-2, Oberon, C/C++  Procedural, imperative (structured control flow)  One Item: Structured types and Object Models  Single-node, Parallelism/Distributed via Libraries Other experiences: Lisp, Prolog  Functional and Logic  List as Data Model  Recursion instead of control flow Data Processing Languages:  SQL: Declarative Expressions, Procedural Control Flow  DataLog: Recursion  XQuery: Tree Data Model, Declarative/Functional My Language History Imperative vs Declarative Procedural vs Functional vs Logical One item vs Sets Single-node vs Parallel vs Distributed Programming vs Data Processing
  • 4. Imperative Tell the system how to get it Declarative Tell the system what you want Let the system find a way to get it Declarative leaves options that an optimizer can reason about Imperative vs Declarative
  • 5. Procedural operates by changing “persistent state” (variables) with each expression Allows side-effects in control flow i = 0; FOR j FROM 1 TO 100 DO i = i + 1; i is now 100. Functional Transforms input into output, no “persistent state” No side-effects in control flow i = 0; FOR j FROM 1 TO 100 DO i = i + 1; i is now 1. Procedural vs Functional
  • 6. Single objects  Requires control flow  Explicit Parallelism Sets of objects  Allows higher-level abstraction expressions  Implicit Parallelism  Objects can be  Object/value  Tuples  Trees  graphs Meta Data Service provides sharing of data model objects One Item vs Sets Data Models
  • 7. Language Parallelism vs Libraries Scale-up vs Scale-out Synchronization/Transactions  Explicit imperative vs Implicit declarative  ACID support Single-node vs Parallel vs Distributed
  • 8. Programming Languages  Long-term data is in a store but not part of the language model  Designed for tight coupling of data and application logic  Often imperative, procedural, one-item object-oriented, explicit/library based parallelism Data Processing Languages  Long-term data is part of the language model  Data can evolve independently of application  Declarative and Functional  Set-based  Built-in parallelism and implicit/declarative synchronization Programming vs Data Processing
  • 9. Writeability vs Readability Consistency Familiarity Context independent Composable Mathematical vs natural language Reserved Keywords? Syntax Matters
  • 11. How to Create Languages
    Consortiums/standards bodies:
      Slow (it took XQuery 6 years!)
      “Political” interests of participants can negatively impact design
    Individual/small team:
      More focused
      Risk: different for difference's sake
    Evolve vs. create new
    Create a language that addresses demand
  • 13. Some sample use cases
    Digital Crime Unit: analyze complex attack patterns to understand botnets and to predict and mitigate future attacks, by analyzing log records with complex custom algorithms
    Image Processing: large-scale image feature extraction and classification using custom code
    Shopping Recommendation: complex pattern analysis and prediction over shopping records using proprietary algorithms
  • 14.
    Declarativity does scaling and parallelization for you
    Extensibility is bolted on and not “native”
      Hard to work with anything other than structured data
      Difficult to extend with custom code
  • 15.
    Extensibility through custom code is “native”
    Declarativity is bolted on and not “native”
      User often has to care about scale and performance
      SQL is second-class, embedded in strings
      Often no code reuse/sharing across queries
  • 16.
    Declarativity and extensibility are equally native to the language! Get the benefits of both!
    Makes it easy for you by unifying:
      Unstructured and structured data processing
      Declarative SQL and custom imperative code (C#, Python, R, …)
    Scales up and scales out custom code within a declarative framework
  • 17. The origins of U-SQL
    SCOPE: Microsoft’s internal Big Data language
      SQL and C# integration model
      Optimization and scaling model
      Runs hundreds of thousands of jobs daily
    Hive
      Complex data types (maps, arrays)
      Data format alignment for text files
    T-SQL/ANSI SQL
      Many of the SQL capabilities (windowing functions, metadata model, etc.)
  • 18. U-SQL Language Philosophy
    Declarative query and transformation language:
      Uses SQL’s SELECT FROM WHERE with GROUP BY/aggregation, joins, SQL analytics functions
      Optimizable, scalable
    Expression-flow programming style:
      Easy-to-use functional lambda composition
      Composable, globally optimizable
    Operates on unstructured and structured data:
      Schema on read over files
      Relational metadata objects (e.g., database, table)
    Extensible from the ground up:
      Type system is based on C#
      Expression language IS C#
      User-defined functions (U-SQL and C#)
      User-defined aggregators (C#)
      User-defined operators (UDOs) (C#)
    U-SQL provides the parallelization and scale-out framework for user code:
      EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
    REFERENCE ASSEMBLY MyDB.MyAssembly;
    CREATE TABLE T(cid int, first_order DateTime,
                   last_order DateTime, order_count int,
                   order_amount float, ...);
    @o = EXTRACT oid int, cid int, odate DateTime, amount float
         FROM "/input/orders.txt"
         USING Extractors.Csv();
    @c = EXTRACT cid int, name string, city string
         FROM "/input/customers.txt"
         USING Extractors.Csv();
    @j = SELECT c.cid, MIN(o.odate) AS firstorder,
                MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt,
                AGG<MyAgg.MySum>(o.amount) AS totalamount
         FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
         WHERE c.city.StartsWith("New")
               && MyNamespace.MyFunction(o.odate) > 10
         GROUP BY c.cid;
    OUTPUT @j TO "/output/result.txt"
    USING new MyData.Write();
    INSERT INTO T SELECT * FROM @j;
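To spell out what the `@j` statement computes, here is a plain-Python sketch of its join and aggregation semantics. Assumptions: customer ids are unique, `MyNamespace.MyFunction` is left out because it is opaque user code, the built-in `sum` stands in for `AGG<MyAgg.MySum>`, and COUNT ignores the nulls a left outer join produces (standard SQL behavior). Because the `StartsWith` predicate touches only customer columns, the sketch applies it before the join, the kind of rewrite a global optimizer may make without changing the result.

```python
from collections import defaultdict
from datetime import datetime

def customer_order_summary(customers, orders):
    """customers: (cid, name, city); orders: (oid, cid, odate, amount).
    Returns (cid, firstorder, lastorder, ordercnt, totalamount) rows."""
    # WHERE c.city.StartsWith("New") involves only customer columns,
    # so it can run before the join (predicate pushdown).
    kept = [c for c in customers if c[2].startswith("New")]

    # Group orders by customer id so the join becomes a lookup.
    orders_by_cid = defaultdict(list)
    for oid, cid, odate, amount in orders:
        orders_by_cid[cid].append((oid, odate, amount))

    result = []
    for cid, _name, _city in kept:
        # LEFT OUTER JOIN: a customer without orders still yields a row.
        matches = orders_by_cid.get(cid, [])
        dates = [odate for _oid, odate, _amt in matches]
        result.append((
            cid,
            min(dates) if dates else None,  # MIN(o.odate)
            max(dates) if dates else None,  # MAX(o.odate)
            len(matches),                   # COUNT(o.oid)
            # stand-in for AGG<MyAgg.MySum>(o.amount)
            sum(a for _o, _d, a in matches) if matches else None,
        ))
    return result

customers = [(1, "Ann", "New York"), (2, "Bob", "Boston"), (3, "Cy", "Newark")]
orders = [(10, 1, datetime(2016, 1, 5), 20.0), (11, 1, datetime(2016, 3, 1), 5.0),
          (12, 2, datetime(2016, 2, 1), 7.0)]
for row in customer_order_summary(customers, orders):
    print(row)
```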
  • 19. U-SQL Data Model
    Files and Tables
    Set-based
  • 20. Unstructured Data
    @s = EXTRACT a string, b int, date DateTime, file string
         FROM "filepath/{date:yyyy}/{date:MM}/{date:dd}/{file}.csv"
         USING Extractors.Csv(encoding: Encoding.Unicode);
      Pro: flexible; scales with file sets and over parts of partitionable files
      Cons: system doesn’t know data distribution or statistics; no indexing
    Structured Data
    CREATE TABLE T (col1 int
                   , col2 string
                   , col3 SQL.MAP<string,string>
                   , INDEX idx CLUSTERED (col2 ASC)
                     PARTITIONED BY (col1)
                     DISTRIBUTED BY HASH (col2)
                   );
      Pro: provides system guarantees about data distribution, statistics, and indices to help performance and scale; object discoverability
      Cons: needs a schema a priori; cost of additional storage and generation
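The `{date:yyyy}/{date:MM}/{date:dd}/{file}` pattern turns parts of each path into columns of the rowset. A rough Python sketch of that idea, assuming only the placeholder forms used on the slide: the pattern becomes a regular expression, each matching path yields `date` and `file` values, and a predicate on `date` can prune files before any are read. The function names are hypothetical, and this is an illustration of the concept, not U-SQL's implementation.

```python
import re
from datetime import date

# Regex fragments for the format specifiers used on the slide (assumed subset).
FORMATS = {"yyyy": r"\d{4}", "MM": r"\d{2}", "dd": r"\d{2}"}

def compile_file_set(pattern):
    """Build a regex with one named group per {name} or {name:format}."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+)(?::(\w+))?\}", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text
        name, fmt = m.group(1), m.group(2)
        if fmt:
            regex += f"(?P<{name}_{fmt}>{FORMATS[fmt]})"  # part of a date
        else:
            regex += f"(?P<{name}>[^/]+)"  # a whole path segment
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + r"\Z")

def virtual_columns(pattern, paths):
    """Yield the date/file values each matching path contributes."""
    rx = compile_file_set(pattern)
    for path in paths:
        m = rx.match(path)
        if m is None:
            continue  # not part of the file set
        g = m.groupdict()
        yield {"path": path,
               "date": date(int(g["date_yyyy"]), int(g["date_MM"]), int(g["date_dd"])),
               "file": g["file"]}

def prune(rows, start, end):
    """A predicate on the virtual date column selects files before reading."""
    return [r["path"] for r in rows if start <= r["date"] <= end]

pattern = "filepath/{date:yyyy}/{date:MM}/{date:dd}/{file}.csv"
paths = ["filepath/2016/03/01/a.csv", "filepath/2016/03/02/b.csv", "other/x.csv"]
cols = list(virtual_columns(pattern, paths))
print(prune(cols, date(2016, 3, 2), date(2016, 3, 31)))
```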
  • 22. Familiar Operations
      ORDER BY FETCH n ROWS
      GROUP BY HAVING
      UNION/INTERSECT/EXCEPT
      OVER expression: windowing, analytics, ranking functions
      JOINs: INNER, FULL/LEFT/RIGHT OUTER, CROSS, SEMI, ANTI-SEMI-JOIN
      CROSS APPLY
      PIVOT/UNPIVOT (new!)
    New Operations
      Set operations BY NAME:
        SELECT * FROM @left INTERSECT BY NAME ON (id, *) SELECT * FROM @right;
      OUTER UNION BY NAME:
        SELECT * FROM @left OUTER UNION BY NAME ON (A, K) SELECT * FROM @right;
      Flexible column sets for parameter polymorphism
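A sketch of the by-name alignment behind `OUTER UNION BY NAME`, with rows as Python dicts. The rules here are assumptions for illustration, not U-SQL's documented semantics: result columns are the left input's columns followed by right-only columns, a column missing on one side is filled with `None`, the `ON` columns must exist in both inputs, and duplicates are removed as with UNION.

```python
def outer_union_by_name(left_cols, left_rows, right_cols, right_rows, on):
    """Union two rowsets whose columns are matched by name, not position."""
    missing = [c for c in on if c not in left_cols or c not in right_cols]
    if missing:
        raise ValueError(f"ON columns missing from an input: {missing}")
    # Result schema: left columns, then columns only the right side has.
    cols = list(left_cols) + [c for c in right_cols if c not in left_cols]
    # Each row is projected onto the result schema; absent columns -> None.
    rows = [tuple(r.get(c) for c in cols)
            for r in list(left_rows) + list(right_rows)]
    return cols, list(dict.fromkeys(rows))  # UNION: drop duplicates, keep order

cols, rows = outer_union_by_name(
    ["A", "K", "x"], [{"A": 1, "K": "a", "x": 10}],
    ["K", "A", "y"], [{"K": "b", "A": 2, "y": True}],
    on=["A", "K"])
print(cols)  # ['A', 'K', 'x', 'y']
print(rows)  # [(1, 'a', 10, None), (2, 'b', None, True)]
```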
  • 23. “Top 5” Surprises for SQL Users
      AS is not as
        C# keywords and SQL keywords overlap
        Future-proofing against new reserved keywords in both languages: reserve all upper-case words as U-SQL keywords
      " vs ' vs []
      = != ==
        Remember: C# expression language
      null IS NOT NULL
        C# nulls are two-valued
      PROCEDURES but no WHILE
      No UPDATE, DELETE, or MERGE (yet)
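"C# nulls are two-valued" is easiest to see next to SQL's three-valued logic. In this Python sketch (hypothetical helper names), `sql_eq` returns `None`, standing for UNKNOWN, whenever an operand is null, while `csharp_eq` follows C# equality, where `null == null` is true and `null == 5` is false. A WHERE-style filter keeps only rows whose predicate is exactly true.

```python
def sql_eq(a, b):
    # SQL three-valued logic: any comparison with NULL is UNKNOWN.
    if a is None or b is None:
        return None
    return a == b

def csharp_eq(a, b):
    # C#-style two-valued equality: null == null is true.
    # Python's None happens to behave the same way here.
    return a == b

rows = [{"id": 1, "x": None}, {"id": 2, "x": 5}]
sql_keep = [r["id"] for r in rows if sql_eq(r["x"], None) is True]
cs_keep = [r["id"] for r in rows if csharp_eq(r["x"], None)]
print(sql_keep)  # []
print(cs_keep)   # [1]
```

This is why a predicate that behaves one way in T-SQL can select different rows when written with C# comparison operators in U-SQL.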
  • 24. U-SQL Object Model
    Reusability and Discoverability
  • 25. [Figure: U-SQL object model. An ADLA account/catalog contains databases [1,n]; each database contains schemas [1,n] holding [0,n] user objects: tables (with clustered index, partitions, and statistics), views, table-valued functions (TVFs), procedures, table types, and external tables, alongside C# assemblies, credentials, and data sources. C# assemblies implement C# functions, UDAggs, UDTs, extractors, outputters, processors, reducers, combiners, and appliers. Legend: contains, refers to, implemented and named by (metadata name vs C# name).]
  • 26. U-SQL Extensibility
    Input (Start Time - End Time - User Name):
      5:00 AM - 6:00 AM - ABC
      5:00 AM - 6:00 AM - XYZ
      8:00 AM - 9:00 AM - ABC
      8:00 AM - 10:00 AM - ABC
      10:00 AM - 2:00 PM - ABC
      7:00 AM - 11:00 AM - ABC
      9:00 AM - 11:00 AM - ABC
      11:00 AM - 11:30 AM - ABC
      11:40 PM - 11:59 PM - FOO
    Result (Start Time - End Time - User Name):
      5:00 AM - 6:00 AM - ABC
      5:00 AM - 6:00 AM - XYZ
      7:00 AM - 2:00 PM - ABC
      11:40 PM - 0:40 AM - FOO
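The slide shows overlapping sessions per user collapsed into single ranges, but not the code behind it. Logic like this fits a user-defined reducer (all rows of one user in, fewer rows out). A Python sketch of the merge itself, with hypothetical function names, assuming times within a single day and merging ranges that overlap or touch; ranges that cross midnight are not handled.

```python
from collections import defaultdict

def to_minutes(t):
    """'8:00 AM' -> 480, '2:00 PM' -> 840 (minutes since midnight)."""
    clock, ampm = t.split()
    h, m = map(int, clock.split(":"))
    h = h % 12 + (12 if ampm == "PM" else 0)
    return h * 60 + m

def to_clock(minutes):
    h, m = divmod(minutes, 60)
    return f"{h % 12 or 12}:{m:02d} {'AM' if h < 12 else 'PM'}"

def merge_ranges(rows):
    """rows: (start, end, user) strings -> merged (start, end, user) in minutes."""
    by_user = defaultdict(list)
    for start, end, user in rows:
        by_user[user].append((to_minutes(start), to_minutes(end)))
    merged = []
    for user in sorted(by_user):            # one "reducer call" per user
        current = None
        for s, e in sorted(by_user[user]):
            if current and s <= current[1]:  # overlaps or touches
                current[1] = max(current[1], e)
            else:
                if current:
                    merged.append((current[0], current[1], user))
                current = [s, e]
        merged.append((current[0], current[1], user))
    return merged

SLIDE_INPUT = [("5:00 AM", "6:00 AM", "ABC"), ("5:00 AM", "6:00 AM", "XYZ"),
               ("8:00 AM", "9:00 AM", "ABC"), ("8:00 AM", "10:00 AM", "ABC"),
               ("10:00 AM", "2:00 PM", "ABC"), ("7:00 AM", "11:00 AM", "ABC"),
               ("9:00 AM", "11:00 AM", "ABC"), ("11:00 AM", "11:30 AM", "ABC"),
               ("11:40 PM", "11:59 PM", "FOO")]
for s, e, u in merge_ranges(SLIDE_INPUT):
    print(f"{to_clock(s)} - {to_clock(e)} - {u}")
```

Sorting each user's ranges by start time first is what lets a single pass decide, range by range, whether to extend the current session or start a new one.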
  • 27. U-SQL extensibility
    Extend U-SQL with C#/.NET
      Built-in operators, functions, aggregates
      C# expressions (in SELECT expressions)
      User-defined aggregates (UDAGGs)
      User-defined functions (UDFs)
      User-defined operators (UDOs)
  • 28. What are UDOs? Custom Operator Extensions Scaled out by U-SQL
      User-Defined Extractors
      User-Defined Outputters
      User-Defined Processors
        Take one row and produce one row
        Pass-through versus transforming
      User-Defined Appliers
        Take one row and produce 0 to n rows
        Used with OUTER/CROSS APPLY
      User-Defined Combiners
        Combine rowsets (like a user-defined join)
      User-Defined Reducers
        Take n rows and produce m rows (normally m < n)
      Scaled out with explicit U-SQL syntax that takes a UDO instance (created as part of the execution):
        EXTRACT, OUTPUT, PROCESS, COMBINE, REDUCE, CROSS APPLY
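The UDO kinds differ mainly in their row cardinality. These Python functions are hypothetical stand-ins that only mirror those contracts (processor: 1 row to 1; applier: 1 row to 0..n; reducer: n rows to m per key group; combiner: two rowsets matched on a key). Real UDOs are .NET classes that U-SQL instantiates and scales out.

```python
from itertools import groupby

def process(rows, f):
    """Processor: one row in, one row out."""
    return [f(r) for r in rows]

def cross_apply(rows, g):
    """Applier with CROSS APPLY: one row in, 0..n rows out, each joined
    to its input row (rows producing nothing disappear)."""
    return [{**r, **extra} for r in rows for extra in g(r)]

def reduce_by(rows, key, h):
    """Reducer: all rows of one key group in, m rows out."""
    ordered = sorted(rows, key=lambda r: r[key])
    out = []
    for _k, group in groupby(ordered, key=lambda r: r[key]):
        out.extend(h(list(group)))
    return out

def combine(left, right, key, c):
    """Combiner: two rowsets in, matched on a key, like a user-defined join."""
    return [row for l in left for r in right if l[key] == r[key] for row in c(l, r)]

rows = [{"u": "a", "tags": "x;y"}, {"u": "b", "tags": ""}, {"u": "a", "tags": "z"}]
print(cross_apply(rows, lambda r: [{"tag": t} for t in r["tags"].split(";") if t]))
print(reduce_by(rows, "u", lambda g: [{"u": g[0]["u"], "count": len(g)}]))
```

An OUTER APPLY would instead keep the `b` row with empty extra columns; the sketch only models the CROSS form.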
  • 29. How to specify UDOs?
      .NET API provided to build UDOs
        Any .NET language usable
          However, only C# is first-class in tooling
        Use U-SQL-specific .NET DLLs
      Deploying UDOs
        Compile DLL
        Upload DLL to ADLS
        Register with U-SQL script
        Visual Studio provides tool support
      UDOs can
        Invoke managed code
        Invoke native code deployed with UDO assemblies
        Invoke other language runtimes (e.g., Python, R)
        Be scaled out by the U-SQL execution framework
      UDOs cannot
        Communicate between different UDO invocations
        Call web services/reach outside the vertex boundary
      Provide integration into U-SQL data models (files and rowsets)
      Integrate into the processing model and optimization model
  • 30. U-SQL Scalability and Performance
    Script-level optimization
    Scales as data scales
  • 31.
      Automatic "in-lining", optimized out-of-the-box
      Per-job parallelization and visibility into execution
      Heatmap to identify bottlenecks
  • 32. Summary
      U-SQL is designed for Big Data analytics
      Functional, declarative, set-based => scalable and optimizable
      Provides extensibility with known programming languages
      Familiarity: evolution and re-use

Editor's Notes

  1. It is not often that one designs a new query language, but sometimes a new data model or new processing requirements offer the opportunity to design one. I have been fortunate to be involved in implementing, influencing, and designing several data processing languages during my career, ranging from T-SQL through XQuery to U-SQL. In this presentation, I will present my experiences with language design: what in my opinion makes a good language (and what may make a not-so-good one), what trade-offs have to be considered, and some of the design decisions behind U-SQL.
  3. Add velocity?
  4. Hard to operate on unstructured data: even Hive requires metadata to be created to operate on unstructured data. Adding custom Java functions, aggregators, and SerDes involves a lot of steps, often requires access to the server's head node, and differs based on the type of operation. It requires many tools and steps. Some examples:
     Hive UDAgg
       Code and compile .java into .jar
         Extend AbstractGenericUDAFResolver class: does type checking, argument checking, and overloading
         Extend GenericUDAFEvaluator class: implements logic in 8 methods
       Deploy:
         Deploy jar into class path on server
         Edit FunctionRegistry.java to register as built-in
         Update the content of show functions with ant
     Hive UDF (as of v0.13)
       Code
       Load JAR into head node or at URI
       CREATE FUNCTION USING JAR to register and load the jar into the classpath for every function (instead of registering the jar once and just using the functions)
  5. Spark supports custom “inputters and outputters” for defining custom RDDs. No UDAGGs. Simple integration of UDFs, but only for the duration of the program; no reuse/sharing. Cloud Dataflow? User has to care about scale and performance.
     Spark UDAgg
       Is not yet supported (SPARK-3947)
     Spark UDF
       Write inline function
         def westernState(state: String) = Seq("CA", "OR", "WA", "AK").contains(state)
       For SQL usage, register the table
         customerTable.registerTempTable("customerTable")
       Register each UDF
         sqlContext.udf.register("westernState", westernState _)
       Call it
         val westernStates = sqlContext.sql("SELECT * FROM customerTable WHERE westernState(state)")
  6. Offers auto-scaling and performance. Operates on unstructured data without needing tables. Easy to extend declaratively with custom code: a consistent model for UDOs, UDFs, and UDAggs. Easy to query remote sources even without external tables.
     U-SQL UDAgg
       Code and compile .cs file:
         Implement IAggregate’s 3 methods: Init(), Accumulate(), Terminate()
         C# takes care of type checking, generics, etc.
       Deploy:
         Tooling: one-click registration of the assembly in the user database
         By hand: copy file to ADL; CREATE ASSEMBLY to register the assembly
       Use via AGG<MyNamespace.MyAggregate<T>>(a)
     U-SQL UDF
       Code in C#, register the assembly once, call by C# name.
  7. Remove SCOPE for external customers?
  8. Use for language experts
  9. Extensions require .NET assemblies to be registered with a database
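Note 6 describes the U-SQL user-defined aggregate contract as three methods: Init(), Accumulate(), Terminate(). A Python sketch of that shape, not the .NET `IAggregate` interface itself, with a hypothetical `MySum` like the one used via `AGG<MyAgg.MySum>` on slide 18. `run_aggregate` plays the engine, creating one aggregate instance per group; skipping null inputs is an assumption of this sketch.

```python
from abc import ABC, abstractmethod

class Aggregate(ABC):
    """Python stand-in for the three-method aggregate contract."""
    @abstractmethod
    def init(self): ...
    @abstractmethod
    def accumulate(self, value): ...
    @abstractmethod
    def terminate(self): ...

class MySum(Aggregate):
    # Hypothetical counterpart of MyAgg.MySum.
    def init(self):
        self.total = 0.0

    def accumulate(self, value):
        if value is not None:  # assumption: nulls are ignored
            self.total += value

    def terminate(self):
        return self.total

def run_aggregate(agg_cls, rows, key, col):
    """Roughly what AGG<...>(col) ... GROUP BY key asks the engine to do:
    one fresh aggregate per group, fed every value of that group."""
    states = {}
    for r in rows:
        k = r[key]
        if k not in states:
            states[k] = agg_cls()
            states[k].init()
        states[k].accumulate(r[col])
    return {k: agg.terminate() for k, agg in states.items()}

rows = [{"cid": 1, "amount": 20.0}, {"cid": 1, "amount": 5.0},
        {"cid": 2, "amount": None}]
print(run_aggregate(MySum, rows, "cid", "amount"))  # {1: 25.0, 2: 0.0}
```

Because each group gets its own instance and groups never share state, an engine is free to place different groups on different vertices.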