ETL for the Masses
Régis Baccaro – IBM
@regbac
Our Sponsors
Introduction

Régis Baccaro

@regbac

http://Theblobfarm.wordpress.com
http://Thelovefarm.wordpress.com

regis@baccaro.com
• Founder and lead organizer of SQL Saturday Denmark
• PASS Regional Mentor
• Works for IBM
• Passionate about the community
• .NET developer, BI dude, SharePoint fellow and accidental DBA
Agenda
• Power Query and the M language
• E and T and L with Power Query
• Data refresh techniques with PQ
• Next step
Introduction
• Power Query
• Get data experience
• Filter and combine
• Embedded M for repeatable mashup

• Power Query Formula Language (aka M)
• Mostly pure
• Higher-order
• Dynamically typed
• Partially lazy
• Functional programming language
Elements of the language
• Expressions – the central construct
• Evaluated to a single value

• Values
• Primitives
• List – ordered sequence
• Record – set of fields
• Table
• Function
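The kinds of values listed above can all be written as literals. The following is a minimal sketch in M that builds one value of each kind and returns a record summarizing them. The variable names are illustrative only, not part of any library.

```powerquery-m
let
    // Primitive values: number, text, logical and null
    Count = 42,
    Name = "Régis Baccaro",
    IsSpeaker = true,
    Missing = null,
    // List: an ordered sequence of values
    Numbers = {1, 2, 3},
    // Record: a set of named fields
    Speaker = [Name = "Régis Baccaro", Handle = "@regbac"],
    // Table: named columns and rows, built here with the #table keyword
    People = #table({"Name", "Handle"}, {{"Régis Baccaro", "@regbac"}}),
    // Function: maps its parameters to a single value
    Double = (x) => x * 2
in
    [
        Primitives = {Count, Name, IsSpeaker, Missing},
        ListCount = List.Count(Numbers),
        SpeakerHandle = Speaker[Handle],
        RowCount = Table.RowCount(People),
        Doubled = Double(21)
    ]
```

The whole `let` is itself one expression, so the query evaluates to a single value: the final record.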
Evaluation
• Excel-like (surprise!)
• Nested records
• In Records
• In Lists

• Lazy evaluation
• Lists and Records (and let)

• Eager evaluation
• Everything else
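A small sketch of the evaluation rules above, assuming a standard Power Query engine: record fields can refer to one another like worksheet cells, and record fields, list items and `let` variables are only evaluated when accessed, so the deliberate errors below are never raised. All names are made up for illustration.

```powerquery-m
let
    // Fields of a record can refer to each other, much like cells in a worksheet
    Sheet = [A = 10, B = A * 2, C = A + B],
    // Record fields are lazy: Broken is never accessed, so its error is never raised
    Settings = [Good = 1 + 1, Broken = error "never evaluated"],
    // List items are lazy too: only the item at position 0 is evaluated
    Items = {1, error "not needed"},
    // let variables are lazy as well: Unused is never evaluated
    Unused = error "not needed either"
in
    [SheetC = Sheet[C], GoodValue = Settings[Good], FirstItem = Items{0}]
```

The query returns `[SheetC = 30, GoodValue = 2, FirstItem = 1]` without any error.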
Functions and Standard Library
• Mapping from a set of values to a single value
• (named parameters) => function body

• Common set of definitions
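A hedged sketch of function definitions and of the standard library's higher-order functions. `Add`, `Greet` and `IsEven` are illustrative names, not library functions; `List.Select`, `List.Transform` and `Number.Mod` are standard library functions.

```powerquery-m
let
    // (named parameters) => function body, with optional type annotations
    Add = (x as number, y as number) as number => x + y,
    // An optional parameter is null when the caller omits it
    Greet = (name as text, optional greeting as nullable text) as text =>
        (if greeting = null then "Hello" else greeting) & ", " & name,
    // "each" is shorthand for a one-parameter function whose parameter is named _
    IsEven = each Number.Mod(_, 2) = 0,
    // Standard library functions are higher-order: they take functions as arguments
    Evens = List.Select({1, 2, 3, 4, 5, 6}, IsEven),
    Squares = List.Transform({1, 2, 3}, (n) => n * n)
in
    [Sum = Add(2, 3), Greeting = Greet("Régis"), EvenNumbers = Evens, SquareNumbers = Squares]
```

The output field names deliberately differ from the `let` variable names: inside a record, `[Evens = Evens]` would refer to the field itself and fail with a cyclic reference.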
Operators
• Meaning varies depending on kind of value

• & = text or list concatenation, and record merge
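The following sketch shows `&` applied to text, lists and records, plus `+` behaving differently for two numbers and for a date combined with a duration. The values are arbitrary. In a record merge, the right-hand operand wins for a field present in both records.

```powerquery-m
let
    // & concatenates text...
    FullName = "Power" & " " & "Query",
    // ...concatenates lists...
    AllNumbers = {1, 2} & {3, 4},
    // ...and merges records: b appears in both, so the right-hand value 3 is kept
    Merged = [a = 1, b = 2] & [b = 3, c = 4],
    // + adds numbers, but shifts a date when combined with a duration
    Sum = 1 + 1,
    NextDay = #date(2015, 1, 1) + #duration(1, 0, 0, 0)
in
    [Concatenated = FullName, Appended = AllNumbers, MergedRecord = Merged, NumberSum = Sum, DayAfter = NextDay]
```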
Metadata
• Information associated with a value
• A record
• Exists for every value
• Unobtrusive way to add information
• Accessed with Value.Metadata
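A minimal sketch of attaching and reading metadata with the `meta` keyword and `Value.Metadata`. The `Unit` and `Origin` fields are hypothetical; any record can serve as metadata.

```powerquery-m
let
    // Attach a metadata record with "meta"; the value itself is still the number 100
    Amount = 100 meta [Unit = "DKK", Origin = "hypothetical ledger"],
    // Read the metadata back with Value.Metadata
    Info = Value.Metadata(Amount),
    // Every value has metadata; for a plain value it is simply an empty record
    PlainInfo = Value.Metadata(42)
in
    [Unit = Info[Unit], DoubledAmount = Amount * 2, PlainFieldCount = Record.FieldCount(PlainInfo)]
```

Arithmetic on `Amount` still sees the number 100, which is what makes metadata an unobtrusive way to carry extra information.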
let ... in expression
• So far, only literal values
• let allows a set of values to be:
• Computed
• Named
• Used in subsequent expressions, including the expression that follows in
let
    Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......")),
    RowCount = Table.RowCount(Source)
in
    RowCount
IF expression
• Selects between two expressions based on a logical condition
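A short sketch; `Classify` is an illustrative name. Because `if` is an expression rather than a statement, the `else` branch is required and the whole construct yields a value.

```powerquery-m
let
    // Both branches produce a value, so "else" is mandatory
    Classify = (n as number) as text =>
        if n < 0 then "negative"
        else if n = 0 then "zero"
        else "positive",
    Labels = List.Transform({-5, 0, 7}, Classify)
in
    Labels
```

The query returns `{"negative", "zero", "positive"}`.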
Error expression
• Used when evaluating an expression cannot yield a value
• Raised with error
• Handled with try
• Produces an error record
• try ... otherwise supplies a default value
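A sketch of raising and handling errors with the standard `Error.Record` function. Note that in M, `1 / 0` evaluates to `#infinity` rather than raising an error, which is why the hypothetical `SafeDivide` helper raises one explicitly.

```powerquery-m
let
    // Raise an error explicitly, because 1 / 0 would just return #infinity
    SafeDivide = (a as number, b as number) as number =>
        if b = 0
        then error Error.Record("DivideByZero", "b must not be zero")
        else a / b,
    // try wraps the outcome in a record: HasError plus either Value or Error
    Attempt = try SafeDivide(10, 0),
    Reason = if Attempt[HasError] then Attempt[Error][Reason] else null,
    // try ... otherwise substitutes a default value when evaluation fails
    WithDefault = try SafeDivide(10, 0) otherwise 0,
    // The same pattern covers failing conversions of dirty input
    Parsed = try Number.FromText("not a number") otherwise null
in
    [ErrorReason = Reason, DefaultValue = WithDefault, ParsedValue = Parsed, Quotient = SafeDivide(10, 4)]
```

The query returns `"DivideByZero"` as the reason, `0` as the default, `null` for the failed conversion and `2.5` for the valid division.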
Keywords and Operators
• Keywords: and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time
• Operators and punctuators: , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
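A few of the keywords above in use: `#table` with a table type, `each`, `is`, `#date` and date subtraction yielding a duration. The order dates and amounts are made-up sample data.

```powerquery-m
let
    // #table with a table type gives the columns explicit types
    Orders = #table(
        type table [OrderDate = date, Amount = number],
        {{#date(2015, 1, 1), 100}, {#date(2015, 1, 3), 250}}
    ),
    // each [Amount] > 150 is shorthand for (_) => _[Amount] > 150
    BigOrders = Table.SelectRows(Orders, each [Amount] > 150),
    // "is" tests the type of a value
    AmountIsNumber = 250 is number,
    // Subtracting two dates yields a duration
    Span = #date(2015, 1, 3) - #date(2015, 1, 1)
in
    [BigOrderCount = Table.RowCount(BigOrders), IsNumber = AmountIsNumber, SpanDays = Duration.Days(Span)]
```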
The “E” - Why is Power Query great for extracting data?
• Multiple data sources

Hey, wait! Where is PDW?
Query folding - A step toward a declarative ETL approach
• Declarative vs. imperative
• Query folding is similar to predicate pushdown
• Does Power Query have a query optimizer?
• Demo
Query folding - the unofficial list:
• SQL databases
• OData and OData-based sources, such as the Windows Azure Marketplace and SharePoint lists
• Active Directory
• HDFS.Files, Folder.Files, and Folder.Contents (for basic operations on paths)

• Column removal
• Renaming
• Joins
• Type conversions
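A hedged sketch of a query whose steps are candidates for folding against a SQL source. The server, database, table and column names are hypothetical. Whether each step actually folds depends on the connector and the Power Query version; in the Power Query Editor, right-clicking a step and choosing View Native Query is one way to check, and the option is typically unavailable once folding has stopped.

```powerquery-m
let
    // Hypothetical server, database, table and column names
    Source = Sql.Database("localhost", "CompanyDb"),
    Companies = Source{[Schema = "dbo", Item = "Companies"]}[Data],
    // The steps read imperatively, but on a SQL source the engine can fold the
    // filter, column selection and rename into a single native SELECT ... WHERE query
    DanishOnly = Table.SelectRows(Companies, each [Country] = "DK"),
    TwoColumns = Table.SelectColumns(DanishOnly, {"CvrNumber", "Name"}),
    Renamed = Table.RenameColumns(TwoColumns, {{"Name", "CompanyName"}})
in
    Renamed
```

The same steps also run unchanged against an in-memory table; folding only changes where the work happens, not the result.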
Real-life scenario – ETL for the masses
• Seen a lot of demos
• Built a lot of demos
• They are always so clean!
Real-life scenario
Transform
• M is how the magic happens!
• Data manipulation
• Records
• Lists
• Tables

• Merging
• Function calls
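A minimal sketch of merging two tables, as the Merge Queries feature does, using `Table.NestedJoin` followed by `Table.ExpandTableColumn`. The company and contact rows are hypothetical stand-ins for the real sources.

```powerquery-m
let
    // Hypothetical in-memory tables standing in for the real sources
    Companies = #table({"CvrNumber", "Name"}, {{"1001", "Alpha ApS"}, {"1002", "Beta A/S"}}),
    Contacts = #table({"Company", "Person"}, {{"Alpha ApS", "Anna"}, {"Gamma IVS", "Bo"}}),
    // Left outer merge: every company row gets a nested table of matching contacts
    Merged = Table.NestedJoin(Companies, {"Name"}, Contacts, {"Company"}, "ContactRows", JoinKind.LeftOuter),
    // Expand the nested tables back into an ordinary column
    Expanded = Table.ExpandTableColumn(Merged, "ContactRows", {"Person"}, {"ContactPerson"})
in
    Expanded
```

"Beta A/S" has no exact match, so its `ContactPerson` is null. Exact key matching like this is one reason cleansing names before merging can raise the hit ratio.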
What about our scenario?
• Where should I get my data from?
• Pure Excel
• Excel and MDS/DQS/SSIS/SQL
• Web, SQL, XML, ?

• Let me show you! Input
• (CVR web)
Let’s go to homegrown data?
• Bad web service
• Bad HTML structure
• Let’s go with local data that we can control

Isolated DB

• SQL Server
• Excel

• Let’s Query!
Local storage
Clean up before you merge!
• DQS
Knowledge base with CVR
+ Cleansing project with LinkedIn input
= Demo2.1_AndreasStrandbyClean

• Hit ratio increased...

[Chart: Hit and Total (count axis 0 to 250, percentage axis 0% to 100%) for "Clean join" vs. "Nested Merge join"]
Smarter Power Query
• Expression.Evaluate()
• Examples
• Load query text from file
• Load function from file
• Passing parameters (as constants)

• Demo
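A hedged sketch of the `Expression.Evaluate()` patterns listed above. The file path `C:\PowerQuery\Double.pq` and its content are assumptions for illustration. Passing `#shared` as the environment gives the evaluated text access to the standard library, while a plain record such as `[Limit = 21]` injects constants as parameters.

```powerquery-m
let
    // Hypothetical file whose entire content is the M text: (x) => x * 2
    FunctionText = Text.FromBinary(File.Contents("C:\PowerQuery\Double.pq")),
    // Evaluate the text as M; #shared lets it see the standard library
    LoadedFunction = Expression.Evaluate(FunctionText, #shared),
    // Parameters passed as constants through the environment record
    Threshold = Expression.Evaluate("Limit * 2", [Limit = 21]),
    // Query text built as a string can be evaluated the same way
    Total = Expression.Evaluate("List.Sum({1, 2, 3})", #shared)
in
    [FromFile = LoadedFunction(10), FromConstant = Threshold, FromText = Total]
```

With the assumed file content, the query returns `20`, `42` and `6`.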
Refreshing Power Query data
• Different solutions
• All with flaws!
Refreshing Power Query data – with VB6!
• Back from 2006
Plus
• Can be scheduled
• More robust than the non-technical solution
Minus
• VB6 – are you kidding?
• From Kim GreenLee
Refreshing Power Query data – with PowerShell
Plus
• Robust
Minus
• Hard to troubleshoot
• Cannot run as a Windows Task Scheduler task unless the task is set to run only when the user is logged on
Refreshing Power Query data – The non-technical way
• Let me show you!
Plus
• Very easy
Minus
• Not very corporate!
• The spreadsheet needs to be open
• Excel file not saved
• Locked out when it refreshes
Refreshing Power Query data – The non-technical way part 2
• Let me show you!
Plus

Minus

Very easy

Not very corporate!

Uses technique from previous

The spreadsheet needs to be open
Refreshing Power Query data – with SSIS
Plus
• Robust
Minus
• Requires a SQL Server (wait, it’s a plus!)
• Needs an SSIS / C# developer
Refreshing Power Query data – with SSIS
• Using DQS for cleansing input

• Let me show you!
How is Power Query going to be used?
• Data store accumulating interesting data points
• Hook into read-only data for reporting purposes or data marts
• One file to accumulate (produce)
• Multiple files or programs to report (consume)
• I don’t believe in the “data steward”
• I believe someone, such as IT or the DBAs, will be in charge of procuring and monitoring data stores of disparate data.
Conclusion
• A step toward a declarative ETL approach
• Still much work to do!
We have
• A declarative data integration language
• Only surfaced in Power Query
• Can push data to an Excel spreadsheet
Imagine...
• Connection to heterogeneous data sources
THANK YOU!
@REGBAC
HTTP://THEBLOBFARM.WORDPRESS.COM
REGIS@BACCARO.COM
