4. Agenda
• Power Query and the M language
• Extract, Transform and Load (ETL) with Power Query
• Data refresh techniques with PQ
• Next step
5. Introduction
• Power Query
• Get data experience
• Filter and combine
• Embedded M for repeatable mashup
• Power Query Formula Language (aka M), a functional programming language that is:
  • Mostly pure
  • Higher-order
  • Dynamically typed
  • Partially lazy
6. Elements of the language
• Expressions – the central construct
• Evaluated to a single value
• Values
  • Primitives
  • List – ordered sequence of values
  • Record – set of named fields
  • Table
  • Function
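The sketch below shows each kind of value from the slide in a single let expression. The variable names and sample data are illustrative only, and the functions used (List.Sum, Table.RowCount, #table) are from the standard library.

```m
let
    // Primitive values
    aNumber = 42,
    aText = "Power Query",
    aLogical = true,
    aDate = #date(2014, 6, 1),
    // List: an ordered sequence of values
    aList = {1, 2, 3},
    // Record: a set of named fields
    aRecord = [Name = "Contoso", City = "Copenhagen"],
    // Table: built here with the #table intrinsic (column names, then rows)
    aTable = #table({"Name", "City"}, {{"Contoso", "Copenhagen"}, {"Fabrikam", "Aarhus"}}),
    // Function: a value like any other
    addOne = (x) => x + 1,
    // The whole let expression evaluates to a single value: this record
    Result = [
        Sum = List.Sum(aList),          // 6
        Rows = Table.RowCount(aTable),  // 2
        Next = addOne(aNumber)          // 43
    ]
in
    Result
```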
7. Evaluation
• Excel-like (surprise!)
  • Nested records
  • In records
  • In lists
• Lazy evaluation
  • Lists and records (and let)
• Eager evaluation
  • Everything else
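Here is a small sketch of both ideas. Record fields refer to each other in any order, much like spreadsheet cells. Because record fields are evaluated lazily, a field that is never accessed is never computed, even if it holds an error. The record and field names are hypothetical.

```m
let
    // Excel-like: Total refers to fields defined after it
    Order = [
        Total = Quantity * Price,
        Quantity = 3,
        Price = 10
    ],
    // Lazy: Broken is never accessed, so its error is never raised
    Lazy = [Ok = 1, Broken = error "never evaluated"],
    Result = Order[Total] + Lazy[Ok]   // 3 * 10 + 1 = 31
in
    Result
```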
8. Functions and Standard Library
• A function maps a set of argument values to a single value
• (named parameters) => function body
• Standard library: a common set of definitions
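This sketch shows the `(parameters) => body` syntax, the `each` shorthand, and a function being passed to a standard-library function (List.Transform). The function names Multiply and Double are hypothetical.

```m
let
    // Named parameters with optional type annotations
    Multiply = (x as number, y as number) as number => x * y,
    // "each" is shorthand for a one-parameter function whose parameter is _
    Double = each _ * 2,
    // Functions are values, so they can be passed to other functions
    Doubled = List.Transform({1, 2, 3}, Double),   // {2, 4, 6}
    Product = Multiply(6, 7)                        // 42
in
    [Doubled = Doubled, Product = Product]
```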
10. Metadata
• Information about a value, associated with that value
• Stored as a record
• Every value has a metadata record (empty unless specified)
• Unobtrusive way to add information
• Accessed with Value.Metadata
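The `meta` keyword attaches a metadata record to a value, and Value.Metadata reads it back. The value itself behaves as before. The Unit and Source fields below are made-up tags for illustration.

```m
let
    // Attach a metadata record to a value with "meta"
    Tagged = 100 meta [Unit = "DKK", Source = "CVR"],
    // The value itself is unchanged
    Doubled = Tagged * 2,              // 200
    // Read the metadata back
    Info = Value.Metadata(Tagged),     // [Unit = "DKK", Source = "CVR"]
    // A value without explicit metadata has an empty metadata record
    NoMeta = Value.Metadata(5)         // []
in
    [Doubled = Doubled, Unit = Info[Unit], NoMeta = NoMeta]
```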
11. let ... in expression
• So far only literal values
• let allows a set of values to be:
  • Computed
  • Named
  • Used in the expression that follows in
let
    Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......")),
    RowCount = Table.RowCount(Source)
in
    RowCount
13. Error expression
• Used when evaluating an expression cannot produce a value
• Raised with error
• Handled with try
• try produces a record with HasError and either Value or Error
• try ... otherwise supplies a default value
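The sketch below raises, inspects and defaults an error. Numeric division by zero does not raise an error in M (1 / 0 evaluates to #infinity), so the hypothetical SafeDivide function raises one explicitly.

```m
let
    // Raise an error explicitly with the "error" keyword
    SafeDivide = (a as number, b as number) as number =>
        if b = 0 then error "Division by zero" else a / b,
    // "try" yields a record: HasError, plus Value or Error
    Attempt = try SafeDivide(1, 0),
    Message = if Attempt[HasError] then Attempt[Error][Message] else null,
    // "try ... otherwise" substitutes a default value on error
    WithDefault = try SafeDivide(1, 0) otherwise 0
in
    [Message = Message, WithDefault = WithDefault]
```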
14. Keywords and Operators
• Keywords: and as each else error false if in is let meta not null or otherwise section shared then true try type
  #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time
• Operators and punctuators: , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
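This sketch uses a few of the keywords and operators listed above. It builds a date and a duration with #date and #duration, concatenates text with &, builds a range with .. and makes a recursive reference with @. The names are illustrative. The exact text produced by Date.ToText depends on the culture settings, so no output is shown for Label.

```m
let
    // #date and #duration build values; + adds a duration to a date
    Start = #date(2014, 6, 1),
    Later = Start + #duration(30, 0, 0, 0),   // #date(2014, 7, 1)
    // & concatenates text (it also combines lists and records)
    Label = "Period: " & Date.ToText(Start) & " to " & Date.ToText(Later),
    // .. builds a range of numbers inside a list
    Numbers = {1..5},                          // {1, 2, 3, 4, 5}
    // @ lets a let variable refer to itself (recursion)
    Factorial = (n) => if n <= 1 then 1 else n * @Factorial(n - 1)
in
    [Label = Label, Numbers = Numbers, Fact5 = Factorial(5)]   // Fact5 = 120
```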
15. The “E” - why Power Query is great for extracting data
• Multiple data sources
Hey, wait! Where is PDW?
16. Query folding - a step toward a declarative ETL approach
• Declarative vs imperative
• Query folding is similar to predicate pushdown
• Does Power Query have a query optimizer?
• Demo
Query folding - the unofficial list:
• SQL databases
• OData and OData-based sources, such as the Windows Azure Marketplace and SharePoint lists
• Active Directory
• HDFS.Files, Folder.Files, and Folder.Contents (for basic operations on paths)
Operations that can fold (examples):
• Column removal
• Renaming
• Joins
• Type conversions
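The following is a sketch of a query whose steps are candidates for folding against a SQL database. The server name, database, table and columns are all hypothetical. Whether a given step actually folds depends on the connector and the Power Query version, so treat the comments as the intended mapping rather than a guarantee.

```m
let
    // Hypothetical server and database
    Source = Sql.Database("myserver", "SalesDb"),
    // Navigate to a hypothetical dbo.Customers table
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    // Row filter: can fold into a WHERE clause
    Filtered = Table.SelectRows(Customers, each [Country] = "Denmark"),
    // Column removal: can fold into the SELECT list
    Trimmed = Table.SelectColumns(Filtered, {"CustomerId", "Name"}),
    // Renaming: can fold into a column alias
    Renamed = Table.RenameColumns(Trimmed, {{"Name", "CustomerName"}})
in
    Renamed
```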
17. Real-life scenario – ETL for the masses
• Seen a lot of demos
• Built a lot of demos
• They are always so clean!
19. Transform
• M is how the magic happens!
• Data manipulation
• Records
• Lists
• Tables
• Merging
• Function calls
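A merge is a function call on tables. The sketch below builds two small inline tables of made-up sample data and merges them with Table.NestedJoin. That call adds a column of nested tables, which Table.ExpandTableColumn then flattens into ordinary columns.

```m
let
    // Hypothetical sample data
    Companies = #table({"CVR", "Name"}, {{"10000001", "Contoso"}, {"10000002", "Fabrikam"}}),
    Contacts = #table({"CVR", "Person"}, {{"10000001", "Anna"}, {"10000001", "Bo"}}),
    // Merge: new column "Contacts" holds a table of matching rows per company
    Merged = Table.NestedJoin(Companies, {"CVR"}, Contacts, {"CVR"}, "Contacts", JoinKind.LeftOuter),
    // Expand the nested tables into a regular "Person" column
    // (3 rows: two for Contoso, one for Fabrikam with a null Person)
    Expanded = Table.ExpandTableColumn(Merged, "Contacts", {"Person"})
in
    Expanded
```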
20. What about our scenario?
• Where should I get my data from?
• Pure Excel
• Excel and MDS/DQS/SSIS/SQL
• Web, SQL, XML, ?
• Let me show you! (Input: CVR web)
21. Let’s go with homegrown data?
• Bad web service
• Bad HTML structure
• Let’s go with local data that we can control (isolated DB, local storage)
  • SQL Server
  • Excel
• Let’s query!
22. Clean up before you merge!
• DQS: knowledge base with CVR + cleansing project with LinkedIn input = Demo2.1_AndreasStrandbyClean
• Hit ratio increased...
[Chart: Hit and Total counts (left axis, 0 to 250) and percentage (right axis, 0% to 100%) for the Clean join vs. the Nested Merge join]
23. Smarter Power Query
• Expression.Evaluate()
• Examples
• Load query text from file
• Load function from file
• Passing parameters (as constants)
• Demo
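Here is a sketch of the three uses of Expression.Evaluate listed above. Passing #shared as the environment makes the standard library visible to the evaluated text. The file path C:\PowerQuery\CleanName.pq and its contents are hypothetical: the file is assumed to hold a one-argument function such as `(t as text) => Text.Trim(t)`.

```m
let
    // 1. Evaluate query text, with the standard library in scope
    FromText = Expression.Evaluate("List.Sum({1, 2, 3})", #shared),   // 6
    // 2. Load a function from a file (hypothetical path and contents)
    LoadedFn = Expression.Evaluate(
        Text.FromBinary(File.Contents("C:\PowerQuery\CleanName.pq")),
        #shared),
    // 3. Pass parameters as constants through the environment record
    WithParams = Expression.Evaluate("Rate * Amount", [Rate = 0.25, Amount = 200])   // 50
in
    [FromText = FromText, WithParams = WithParams, Cleaned = LoadedFn("  contoso ")]
```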
25. Refreshing Power Query data – with VB6!
• Back from 2006
Plus:
• Can be scheduled
• More robust than the non-technical solution
Minus:
• VB6 – are you kidding?
• From Kim GreenLee
26. Refreshing Power Query data – with PowerShell
Plus:
• Robust
Minus:
• Hard to troubleshoot
• Cannot run as a Windows Task Scheduler task unless the task is set to run only when the user is logged on
27. Refreshing Power Query data – The non-technical way
• Let me show you!
Plus:
• Very easy
Minus:
• Not very corporate!
• The spreadsheet needs to be open
• Excel file not saved
• Locked out while it refreshes
28. Refreshing Power Query data – The non-technical way part 2
• Let me show you!
Plus:
• Very easy
• Uses the technique from the previous slide
Minus:
• Not very corporate!
• The spreadsheet needs to be open
29. Refreshing Power Query data – with SSIS
Plus:
• Robust
Minus:
• Requires a SQL Server (wait, it’s a plus!)
• Needs an SSIS / C# developer
30. Refreshing Power Query data – with SSIS
• Using DQS for cleansing input
• Let me show you !
31. How is Power Query going to be used?
• Data store accumulating interesting data points
• Hook into read only data for reporting purposes or data marts
• One file to accumulate (Produce)
• Multiple files or programs to report (Consume)
• I don’t believe in the “data steward”
• I believe someone (such as IT or the DBAs) will be in charge of procuring and monitoring data stores of disparate data
32. Conclusion
• A step toward a declarative ETL approach
• Still much work to do!
We have
• A declarative data integration language
• Only surfaced in Power Query
• Can push data to an Excel spreadsheet
Imagine...
• Connection to heterogeneous data sources