Agile Data Warehousing From Start to Finish
1. Agile Data Warehousing From Start to Finish
Presenter: Davide Mauri, Architect & Mentor, SolidQ
Moderator: Alex Whittles
2. Technical Assistance
If you require assistance during the session, type your inquiry into the question pane on the right side.
Maximize your screen with the zoom button on the top of the presentation window.
Type your questions in the question pane on the right side.
3. Thank You Sponsors
Welcome to the Azure family!
Try DocumentDB today!
http://documentdb.com
Solutions from Dell help you monitor, manage, protect and improve your SQL Server environment.
http://software.dell.com/sql-pass-vc-dell-sql-server-solutions
4. www.PASSSummit.com
Planning on attending PASS Summit 2014? Start saving today!
• The world’s largest gathering of SQL Server & BI professionals
• Take your SQL Server skills to the next level by learning from the world’s SQL Server experts, in 190+ technical sessions
• Over 5000 attendees, representing 2000 companies, from 52 countries, ready to network & learn
Use discount code 24HOP14 to save $200!
$1,895 UNTIL SEPTEMBER 26, 2014
5. Davide Mauri
SolidQ Mentor
Board of Directors, SolidQ Italy
Microsoft SQL Server MVP
Works with managers to build effective,
tailor-made BI solutions for customers
@mauridb
10. Isn’t the DWH an “old” thing?
Can’t Big Data, In-Memory and all the new stuff just replace the Data Warehouse?
The answer would be “yes” if a DWH were simply a “container” of data.
But it’s much more than this.
11. What is a DWH, really?
In this new era, data is like water.
Who would ever drink untested, untrusted, uncertified data?
12. What is a DWH, really?
Would a manager or decision maker make a decision based on data whose source, integrity and correctness are unknown?
13. What is a DWH, really?
The Data Warehouse is the place where managers and decision makers look for data that is
• Correct
• Trusted
• Up to date
in order to make an informed decision
14. What is a DWH, really?
The answer is now easy:
15. What is a DWH, really?
A place to store consolidated data coming from the whole company
A place to cleanse, verify and certify data
A place where historical data is stored
A place that holds the single version of truth (if there is one!)
The core of a BI solution
User-friendly data models, designed to make data analysis easier
16. Modern Data Environment
[Diagram elements: Structured Data and Unstructured Data; Master Data, EDW, Data Mart, Big Data; BI Environment and Analytics Environment; Decision Maker and Data Scientist]
18. EDW: Reality Check
EDW is the trusted container of all company data
It cannot be created in “one day”
It has to grow and evolve with business needs.
It will never be 100% complete
20. Adapt to Survive
“50% of requirements change in the first year of a BI project”
Andreas Bitterer, Research VP, Gartner
21. Agile Principles
Small design upfront. Prototype.
Deliver quickly, deliver frequently.
Users are part of the development team!
Feedback is a key part of the success
They’ll grow with the solution and the solution will grow with them
Embrace Change!
http://agilemanifesto.org/principles.html
22. Agile Challenges
Deliver Quickly and Frequently
Challenge: keep quality high, no matter who’s doing the work
Embrace Change
Challenge: don’t introduce bugs. Change the smallest possible part. Use automated testing to preserve and assure data quality.
24. Engineering the solution
To be Agile, some engineering practices need to be included in our work model
Agility != Anarchy
Engineering:
Apply well-known models
Define, Apply & Enforce rules
Automate and/or Check rules application
Measure
Test
25. Engineering the solution
Favor the Kimball Approach (for user-facing models)
Dimensional Modeling
Facts & Measures
Dimensions
Use views to introduce abstraction layers
Reduce the “friction” between layers (source / stage / dwh / dm)
Apply the “Information Hiding Principle”
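A minimal sketch of the view-as-abstraction-layer idea, using Python's built-in sqlite3 module as a stand-in for SQL Server. All table, view and column names (dwh_customer, dm_dim_customer and so on) are hypothetical. Consumers query only the view; when the physical table is rebuilt with different column names, only the view definition changes and the consumer code keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical DWH-layer table with technical column names.
conn.executescript("""
CREATE TABLE dwh_customer (
    customer_sk  INTEGER PRIMARY KEY,
    cust_bk      TEXT NOT NULL,
    cust_nm      TEXT NOT NULL,
    etl_batch_id INTEGER NOT NULL
);
INSERT INTO dwh_customer VALUES (1, 'C001', 'Contoso', 42), (2, 'C002', 'Fabrikam', 42);

-- Data-mart view: user-friendly names, technical columns hidden.
CREATE VIEW dm_dim_customer AS
SELECT customer_sk AS CustomerKey,
       cust_bk     AS CustomerCode,
       cust_nm     AS CustomerName
FROM dwh_customer;
""")


def customer_names(connection):
    """Consumer code: depends only on the view, never on the table behind it."""
    rows = connection.execute(
        "SELECT CustomerName FROM dm_dim_customer ORDER BY CustomerKey"
    ).fetchall()
    return [name for (name,) in rows]


names_before = customer_names(conn)

# The physical layer changes (new table, new column names); only the view is redefined.
conn.executescript("""
CREATE TABLE dwh_customer_v2 (
    customer_sk  INTEGER PRIMARY KEY,
    business_key TEXT NOT NULL,
    display_name TEXT NOT NULL
);
INSERT INTO dwh_customer_v2 SELECT customer_sk, cust_bk, cust_nm FROM dwh_customer;
DROP VIEW dm_dim_customer;
CREATE VIEW dm_dim_customer AS
SELECT customer_sk  AS CustomerKey,
       business_key AS CustomerCode,
       display_name AS CustomerName
FROM dwh_customer_v2;
""")

names_after = customer_names(conn)
print(names_before, names_after)
```

The view is the only contract between layers, which is what keeps the "friction" low when the layer below it is refactored.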
26. Engineering the solution
Define & Enforce the application of well-known ETL patterns
SCD1 / SCD2
Incremental / Partition Load
Divide Et Impera
At least two SSIS solutions
Many small SSIS packages
5 Databases (STG, CFG, LOG, MD, DWH)
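A sketch of the SCD Type 2 pattern in plain Python, assuming a dimension with a single tracked attribute (city) and validity dates. The DimRow type and the apply_scd2 function are hypothetical and not part of SSIS. When a tracked attribute changes, the current version is closed and a new version with a new surrogate key is appended; an SCD Type 1 attribute would instead simply be overwritten on the current row. Deleted source members and late-arriving data are deliberately not handled.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    surrogate_key: int
    business_key: str
    city: str                 # SCD2 attribute: a change creates a new version
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd2(dimension: List[DimRow], incoming: Dict[str, str], load_date: date) -> List[DimRow]:
    """Return a new dimension after applying SCD Type 2 to a source snapshot.

    `incoming` maps business key -> city as seen in the source today.
    """
    result = list(dimension)
    next_key = max((r.surrogate_key for r in result), default=0) + 1
    # Position of the current version for each business key.
    current = {r.business_key: i for i, r in enumerate(result) if r.valid_to is None}

    for business_key, city in incoming.items():
        idx = current.get(business_key)
        if idx is not None and result[idx].city == city:
            continue  # unchanged: nothing to do
        if idx is not None:
            # Changed: close the current version...
            result[idx] = replace(result[idx], valid_to=load_date)
        # ...and add a new version (or a brand-new member) with a new surrogate key.
        result.append(DimRow(next_key, business_key, city, load_date, None))
        next_key += 1
    return result


# Usage: initial load, then a load where customer C001 has moved.
dim = apply_scd2([], {"C001": "Milan", "C002": "Rome"}, date(2014, 1, 1))
dim = apply_scd2(dim, {"C001": "Turin", "C002": "Rome"}, date(2014, 6, 1))
for row in dim:
    print(row.surrogate_key, row.business_key, row.city, row.valid_from, row.valid_to)
```

After the second load C001 has two versions (Milan, closed on 2014-06-01, and Turin, current), while C002 is untouched. Re-running the same snapshot changes nothing, which keeps the load safely repeatable.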
27. Design Pattern
“A general reusable solution to a commonly occurring problem within a given context”
31. No Monkey Work!
Let the people think and let the machines do the «monkey» work.
32. Invest in Automation?
Faster development
Reduce Costs
Embrace Change
Fewer bugs
Increase solution quality and make it consistent throughout the whole product
34. ETL Phases
«E» and «L» must be
Simple, Easy and Straightforward
Completely Automated
Completely Reusable
«E» and «L» have ZERO value in a DWH Solution
Should be done in the most economical way
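Since «E» and «L» carry no business value, they can be written once and reused for every table. A minimal sketch, assuming Python and sqlite3 in place of SSIS and a hypothetical stg_ prefix for staging tables: the column list is read from the catalog (PRAGMA table_info), so the same function loads any table without per-table code.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def extract_load(source, staging, table, prefix="stg_"):
    """Copy `table` from `source` into `staging` unchanged (truncate-and-reload).

    The column list is discovered from the source catalog, so no per-table code
    is needed. Returns the number of rows loaded.
    """
    columns = [row[1] for row in source.execute(f"PRAGMA table_info({quote_ident(table)})")]
    if not columns:
        raise ValueError(f"table {table!r} not found in source")
    target = quote_ident(prefix + table)
    column_list = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)

    staging.execute(f"DROP TABLE IF EXISTS {target}")
    staging.execute(f"CREATE TABLE {target} ({column_list})")
    rows = source.execute(f"SELECT {column_list} FROM {quote_ident(table)}").fetchall()
    staging.executemany(f"INSERT INTO {target} ({column_list}) VALUES ({placeholders})", rows)
    staging.commit()
    return len(rows)


# Usage: the same routine loads every source table.
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE invoice (id INTEGER, customer TEXT, amount REAL);
INSERT INTO invoice VALUES (1, 'C001', 100.0), (2, 'C002', 250.5);
CREATE TABLE customer (code TEXT, name TEXT);
INSERT INTO customer VALUES ('C001', 'Contoso');
""")
stg = sqlite3.connect(":memory:")
for name in ("invoice", "customer"):
    print(name, extract_load(src, stg, name))
```

Truncate-and-reload keeps the routine trivially re-runnable; an incremental or partition-based load would need extra metadata per table (for example a watermark column).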
35. Automation Tools
PowerShell / .NET
Supported by the SMO & SSIS APIs
Microsoft creates platforms, not only products!
BIML – BI Markup Language
From Varigence
Free with BIDS Helper
Full support with MIST
36. Metadata
Metadata is needed in order to make automation a repeatable process
Source to Staging Info
Staging to DWH Info
Dimension Keys
Dimension & Fact Table relationships
Extended Properties + SQL Server DMVs help keep metadata coherent
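A sketch of how metadata can drive code generation: each hypothetical DimensionLink entry records a dimension key and a fact-to-dimension relationship, and build_fact_select generates the query that swaps business keys for surrogate keys. Mapping unmatched members to -1 is an assumed convention, the metadata identifiers are assumed trusted (they are not quoted), and SCD2 dimensions would also need a filter on the current version. sqlite3 is used only to show that the generated SQL runs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionLink:
    """Hypothetical metadata entry: how a staging fact column maps to a dimension."""
    fact_column: str    # business key column in the staging fact table
    dimension: str      # dimension table name
    business_key: str   # business key column in the dimension
    surrogate_key: str  # surrogate key column in the dimension


def build_fact_select(staging_fact, measures, links):
    """Generate a SELECT that replaces business keys with surrogate keys."""
    select_cols, joins = [], []
    for i, link in enumerate(links):
        alias = f"d{i}"
        # Unknown members fall back to -1 (assumed "unknown member" convention).
        select_cols.append(f"COALESCE({alias}.{link.surrogate_key}, -1) AS {link.surrogate_key}")
        joins.append(
            f"LEFT JOIN {link.dimension} AS {alias} "
            f"ON {alias}.{link.business_key} = s.{link.fact_column}"
        )
    select_cols += [f"s.{m}" for m in measures]
    return "SELECT " + ", ".join(select_cols) + f" FROM {staging_fact} AS s " + " ".join(joins)


metadata = [
    DimensionLink("customer_code", "dim_customer", "customer_code", "customer_sk"),
    DimensionLink("product_code", "dim_product", "product_code", "product_sk"),
]
sql = build_fact_select("stg_sales", ["amount"], metadata)
print(sql)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_sk INTEGER, customer_code TEXT);
CREATE TABLE dim_product (product_sk INTEGER, product_code TEXT);
CREATE TABLE stg_sales (customer_code TEXT, product_code TEXT, amount REAL);
INSERT INTO dim_customer VALUES (1, 'C001');
INSERT INTO dim_product VALUES (10, 'P01');
INSERT INTO stg_sales VALUES ('C001', 'P01', 100.0), ('C999', 'P01', 20.0);
""")
print(conn.execute(sql + " ORDER BY s.amount DESC").fetchall())
```

Adding a dimension to a fact then means adding one metadata entry rather than hand-editing the load.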
38. Unit Testing
Data MUST be tested.
It’s like water, remember?
If trust is lost, the DWH is an
#epicfail
39. Unit Testing
Before releasing anything, data in the DW must be tested.
The user has to validate a sample of data
(e.g., the total invoice amount for January 2012)
That validated value becomes the reference value
Before each release, the same query is executed again.
If the result matches the reference value, the test is green;
otherwise the test fails
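The reference-value test described above, sketched in Python with sqlite3. The fact_invoice table, its rows and the 1500.00 reference value are invented for illustration; in a real project the query and the value would come from the user's validation.

```python
import sqlite3

# Value validated by a business user and stored as the reference (hypothetical figure).
REFERENCE_TOTAL_JAN_2012 = 1500.00

# The same query the user validated; it is re-executed before every release.
REFERENCE_QUERY = """
SELECT SUM(amount)
FROM fact_invoice
WHERE invoice_date >= '2012-01-01' AND invoice_date < '2012-02-01'
"""


def reference_test(conn, query, expected, tolerance=0.005):
    """Green (True) if the query still returns the reference value, red (False) otherwise."""
    (actual,) = conn.execute(query).fetchone()
    return actual is not None and abs(actual - expected) <= tolerance


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_invoice (invoice_id INTEGER, invoice_date TEXT, amount REAL);
INSERT INTO fact_invoice VALUES
    (1, '2012-01-10', 1000.0),
    (2, '2012-01-25', 500.0),
    (3, '2012-02-03', 300.0);
""")
print("green" if reference_test(conn, REFERENCE_QUERY, REFERENCE_TOTAL_JAN_2012) else "red")  # green

# Simulate a faulty reload that duplicates an invoice: the test turns red.
conn.execute("INSERT INTO fact_invoice VALUES (2, '2012-01-25', 500.0)")
print("green" if reference_test(conn, REFERENCE_QUERY, REFERENCE_TOTAL_JAN_2012) else "red")  # red
```

The small tolerance avoids false failures from floating-point rounding on monetary sums.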
40. Unit Testing
Of course, tests MUST be automated whenever possible
Visual Studio
NUnit extensions
NBI
BI.Quality
What to test?
Aggregated results
Specific values produced by «special» rules
Fixed bugs/tickets
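A sketch of automated tests for the three categories above, using Python's unittest as a stand-in for Visual Studio tests, NUnit or NBi. The tiny database, the «special» rule (returns stored as negative amounts) and the fixed ticket (duplicate current rows in a customer dimension) are all hypothetical examples.

```python
import sqlite3
import unittest


def build_sample_dwh():
    """Hypothetical tiny DWH, used only to make the tests runnable."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE stg_sales (id INTEGER, kind TEXT, amount REAL);
    CREATE TABLE fact_sales (id INTEGER, kind TEXT, amount REAL);
    CREATE TABLE dim_customer (customer_sk INTEGER, customer_code TEXT, is_current INTEGER);
    INSERT INTO stg_sales VALUES (1, 'SALE', 100.0), (2, 'RETURN', 30.0);
    -- Hypothetical special rule: returns are stored as negative amounts.
    INSERT INTO fact_sales
    SELECT id, kind, CASE WHEN kind = 'RETURN' THEN -amount ELSE amount END FROM stg_sales;
    INSERT INTO dim_customer VALUES (1, 'C001', 0), (2, 'C001', 1), (3, 'C002', 1);
    """)
    return conn


class DwhTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.conn = build_sample_dwh()

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def scalar(self, sql):
        return self.conn.execute(sql).fetchone()[0]

    def test_aggregated_results_match_staging(self):
        # Aggregated result: row count and total absolute amount reconcile with staging.
        self.assertEqual(self.scalar("SELECT COUNT(*) FROM fact_sales"),
                         self.scalar("SELECT COUNT(*) FROM stg_sales"))
        self.assertAlmostEqual(self.scalar("SELECT SUM(ABS(amount)) FROM fact_sales"),
                               self.scalar("SELECT SUM(amount) FROM stg_sales"))

    def test_special_rule_returns_are_negative(self):
        self.assertEqual(self.scalar(
            "SELECT COUNT(*) FROM fact_sales WHERE kind = 'RETURN' AND amount >= 0"), 0)

    def test_fixed_ticket_single_current_row_per_customer(self):
        # Regression test for a (hypothetical) fixed ticket: duplicate current rows.
        self.assertEqual(self.scalar("""
            SELECT COUNT(*) FROM (
                SELECT customer_code FROM dim_customer
                WHERE is_current = 1
                GROUP BY customer_code HAVING COUNT(*) > 1)"""), 0)


if __name__ == "__main__":
    unittest.main(argv=["dwh-tests"], exit=False)
```

Each fixed bug gets its own test, so the same defect cannot silently come back in a later release.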
43. Like What You Heard?
Davide will be presenting at PASS Summit 2014!
PreConference:
Agile Data Warehousing: Start to Finish
General Session:
Agile BI: Unit Testing and Continuous Integration
Use discount code 24HOP14
to save $200!
@mauridb
Welcome to 24 Hours of PASS: Summit Preview! We’re excited you could join us today for Davide Mauri’s session, Agile Data Warehousing: Start to Finish. This 24 Hours of PASS event consists of 24 consecutive live webcasts. Sessions will be recorded and posted online soon after the event.
My name is Alex Whittles [add brief intro about yourself] and I have a few quick introduction slides before I hand over the reins to Davide. He will speak for 40-45 minutes, and then we’ll move on to the Q&A, where you can ask any questions you may have.
[move to next slide]
If you’re having any issues, type your issue into the question pane and someone will assist you.
To maximize your screen, use the zoom button located on the top of the presentation window. Feel free to enter your questions in the Q&A field at any time. The question pane is located on the right side of your screen. Once we get to the Q&A portion of the session, I’ll read your questions to the speaker.
Note that there will be a short evaluation at the end of the session. Your feedback is important to us, so please take a moment to complete it; it will show up on your screen.
[Note to moderators: You need to determine which questions are the most relevant and ask them out loud to the presenter].
I’d like to take a moment to thank our event partners. The staging of 24 Hours of PASS would not be possible without their support and dedication; they are the reason this event is available free of charge.
Thank you to our Presenting Sponsors: Microsoft and Dell Software.
[move to next slide]
Next, as you may know, this 24 Hours of PASS is a preview of PASS Summit 2014, the largest conference for SQL Server and BI professionals. With over 5000 attendees representing 2000 companies from 52 countries, Summit is a time to share, connect and learn with your peers and industry partners. PASS Summit is not only a week of intensive learning and knowledge sharing that will offer strategic insights; it’s also a time to network and rub shoulders with industry experts.
Taking place in Seattle, WA from November 4-7, PASS Summit will feature over 190 world-class sessions across 5 topic tracks. These 24 Hours of PASS sessions provide a mere glimpse of what you can expect from PASS Summit. Find out more at www.passSummit.com and if you register by September 26 using discount code 24HOP14, you’ll get $200 off the registration fee.
[move to next slide]
And now, please allow me to present the speaker of the hour: Davide Mauri
[move to next slide, speaker’s presentation]
Like what you heard here?
Davide will be presenting at PASS Summit 2014: catch Davide in his general session, Agile BI: Unit Testing and Continuous Integration, and in the full version of this session, his PreConference Agile Data Warehousing: Start to Finish. And don’t forget to use the discount code 24HOP14 to save $200 on PASS Summit registration.
Stay tuned for our next session, DAX Formulas in Action with Alberto Ferrari, happening in a couple of minutes.