The document provides an introduction to using PowerPivot in Excel to perform advanced analytics and improve investigative analysis. It discusses how PowerPivot allows handling large datasets, more advanced calculated fields, and formatting that stays consistent when metrics are added or removed from pivots. The document demonstrates several examples of advanced measures and calculations that can be created in PowerPivot, including distinguishing unique campaigns, calculating metrics based on filters, and creating custom segments and bands. It also provides resources for learning more about PowerPivot capabilities.
How to Sharpen Your Investigative Analysis with PowerPivot
1. How to Sharpen Your
Investigative Analysis
with the New Excel
(a PowerPivot intro)
Carmen Mardiros - navabi GmbH
DA Hub 2015
2. Core component of
the Microsoft
self-serve BI stack
Fast and intelligent data modelling for the Excel pro.
Stack also includes PowerQuery (getting and
cleaning data), PowerView (reporting) and
PowerBI (online report publishing).
Integrated in Excel
In many ways it feels very familiar (especially if you
use pivot tables and charts extensively).
It’s FREE
Well, as long as you have Excel Professional Plus
2010 or 2013.
Highly recommended: Get the 2013 64-bit version
What is Power Pivot?
3. How PowerPivot
helps you to become
a better analyst
Analytics tools can’t replace you. But they can
help you become more efficient, unlock your true
potential and get the recognition you deserve.
Today is about lots of examples.
Ready to use
formulas
There is not enough time to break every formula down or
explain how PowerPivot works in detail, but I will
explain how and when to use them, what to change
about them, and what your data must look like for them
to work.
Resources to develop
your PowerPivot
skills
A few titles to help you build upon what you’ve
learned today.
What today is about
4. After today, pivot tables will
never look the
same again.
So what’s wrong with Excel
anyway?
9. 3. Must re-create formatting every time you
add a metric to a regular pivot…
Takes 8 clicks to set the formatting for Transactions and
2 more to change the title.
Remove it from the pivot and add it again? Start all over.
Every. Single. Time
10. … in PowerPivot you change once and
formatting stays the same
12. PowerQuery: Getting multiple CSVs into
PowerPivot
Connectors for many databases, Facebook, Salesforce,
Hadoop, feeds, Excel files, CSV files etc
and very soon Google Analytics
13. PowerQuery: Getting multiple CSVs into
PowerPivot
PowerQuery has its own language as well as an intuitive UI.
We use a formula to keep only one header row
from our folder of CSV files.
14. PowerQuery: Getting multiple CSVs into
PowerPivot
let
Source = Folder.Files("C:\Users\moo\Desktop\dahub"),
Tables = List.Transform(Source[Content], each
Table.PromoteHeaders(Csv.Document(_,null,null,null,1252))),
SingleTable = Table.Combine(Tables)
in
SingleTable
15. PowerQuery: Getting multiple CSVs into
PowerPivot
CSV files are combined on the fly into a single table
NOTE: *All* CSV files must have the same structure
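To make the "one header, one table" idea concrete, here is a minimal plain-Python sketch of the same combine step. It is not Power Query and not how PowerQuery does it internally; the function name `combine_csvs` and the two sample "files" are made up for illustration.

```python
import csv
import io

def combine_csvs(csv_texts):
    """Combine CSV texts that share one structure into a single table.

    Returns (header, rows). The header is kept once, not once per file.
    """
    header = None
    rows = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader)       # first row of each file is its header
        if header is None:
            header = file_header         # remember the header from the first file
        elif file_header != header:
            raise ValueError("all CSV files must have the same structure")
        rows.extend(reader)              # append the data rows only
    return header, rows

# Made-up sample data standing in for two files in the folder.
jan = "session_date,campaign_id,sessions\n2015-01-01,c1,120\n"
feb = "session_date,campaign_id,sessions\n2015-02-01,c2,80\n"
header, rows = combine_csvs([jan, feb])
```

The structure check mirrors the note above: a file with different columns stops the combine instead of silently producing a misaligned table.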
20. Has calculated columns like Excel but that’s
where similarity ends
NOTE: avoid using calculated columns
unless you absolutely have to.
They are very costly in terms of performance as
they are stored in memory.
21. Portable “measures” are the unit of work
for PowerPivot
# Sessions:=SUM('dahub_sessions'[sessions])
special equal operator (:=)
keep this explicit and eye-friendly
full column reference that is being summarised
22. Every measure is simply a building block
% Conversion Rate:=
[# Transactions]/[# Sessions]
23. Allows you to build
sophisticated
formulas
Each measure is made up of other measures.
PowerPivot resolves all the dependencies and
calculates them in the right order.
One change trickles
through your entire
reporting
If the name is ‘visits’ and your field is now
‘sessions’, you make 1 change and all your
measures update like magic.
Calculated on the fly,
not stored in
memory
Until you actually use them in a pivot, they add no
performance overhead. Maintainability heaven at no
extra cost.
Why measures are so amazing
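A rough way to picture measures as building blocks is as small functions that are only evaluated for the rows of the current pivot cell. The plain-Python sketch below is an analogy, not PowerPivot's engine; the rows and the `channel` column are made up.

```python
# Made-up rows; the column names are hypothetical.
ROWS = [
    {"channel": "ppc", "sessions": 200, "transactions": 10},
    {"channel": "seo", "sessions": 300, "transactions": 6},
]

def n_sessions(rows):
    # Base measure: plays the role of [# Sessions]
    return sum(r["sessions"] for r in rows)

def n_transactions(rows):
    # Base measure: plays the role of [# Transactions]
    return sum(r["transactions"] for r in rows)

def conversion_rate(rows):
    # Built purely from the two base measures, like [% Conversion Rate]
    sessions = n_sessions(rows)
    return n_transactions(rows) / sessions if sessions else None

def pivot_cell(rows, channel):
    # The pivot coordinates decide which rows a measure gets to see
    return [r for r in rows if r["channel"] == channel]

overall = conversion_rate(ROWS)                       # all rows
ppc_only = conversion_rate(pivot_cell(ROWS, "ppc"))   # one pivot cell
```

If the source column is renamed, only `n_sessions` changes; `conversion_rate` keeps working because it refers to the measure, not the column.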
30. # Unique Campaigns min 1000 sessions:=
CALCULATE(
[# Unique Campaigns],
FILTER(
VALUES('dahub_sessions'[campaign_id]),
[# Sessions] >= 1000
)
)
This is the formula…
but don’t try to take it in yet
31. First, PowerPivot sets the pivot coordinates
and calculates the “base” measure # Sessions
32. Then, *before* calculating [# Unique Campaigns],
it adds an additional filter that keeps only
campaigns that fit the criteria.
33. # Unique Campaigns min 1000 sessions:=
CALCULATE(
[# Unique Campaigns],
FILTER(
VALUES('dahub_sessions'[campaign_id]),
[# Sessions] >= 1000
)
)
Let’s break the formula down….
1. Pivot coordinates are set and underlying data
filtered accordingly.
2. Additional FILTER is applied
3. And only *afterwards* [# Unique Campaigns] is
calculated
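The three steps above can be mimicked in plain Python to check your understanding of the order of evaluation. This is a sketch on made-up rows, with `date` standing in for whatever is on the pivot rows; it illustrates the logic only, not how PowerPivot executes the formula.

```python
from collections import defaultdict

# Made-up sample rows.
ROWS = [
    {"date": "2015-10-01", "campaign_id": "c1", "sessions": 700},
    {"date": "2015-10-01", "campaign_id": "c1", "sessions": 400},
    {"date": "2015-10-01", "campaign_id": "c2", "sessions": 300},
    {"date": "2015-10-02", "campaign_id": "c1", "sessions": 200},
    {"date": "2015-10-02", "campaign_id": "c2", "sessions": 1500},
    {"date": "2015-10-02", "campaign_id": "c3", "sessions": 1000},
]

def unique_campaigns_min_sessions(rows, date, threshold=1000):
    # 1. Pivot coordinates: keep only the rows for this pivot cell.
    cell = [r for r in rows if r["date"] == date]
    # 2. Additional filter: total the sessions per campaign within the cell
    #    and keep the campaigns at or above the threshold.
    totals = defaultdict(int)
    for r in cell:
        totals[r["campaign_id"]] += r["sessions"]
    kept = {c for c, s in totals.items() if s >= threshold}
    # 3. Only afterwards, count the unique campaigns that survived.
    return len(kept)

day_1 = unique_campaigns_min_sessions(ROWS, "2015-10-01")
day_2 = unique_campaigns_min_sessions(ROWS, "2015-10-02")
```

Note that the threshold is tested against each campaign's total inside the cell, not against individual rows: c1 qualifies on the first day only because its two rows add up to 1100.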
34. !
What % of
all campaigns
are bringing in a
minimum of 100
sessions
each day?
35. Variations
Campaigns / ad
group / keywords /
landing pages
[# Sessions] >= 50
Size of your effectively active SEO/PPC portfolio and
how that changes over time.
Use Cost per
Conversion instead
If you have cost in your data, create a [£ Cost per
Conversion] and swap [# Sessions] with it.
Monitor the number of adgroups / keywords exceeding
the maximum budget
Combine multiple
conditions in the
FILTER
FILTER(
VALUES('dahub_sessions'[keyword_id]),
[£ Cost per Conversion] >= 50
&& [# Clicks] >= 10
)
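As a sanity check on what the combined condition keeps, here is a plain-Python sketch of the same two-condition filter. The rows are made up, and skipping keywords with zero conversions is an assumption of this sketch, not a statement about how your [£ Cost per Conversion] measure behaves.

```python
from collections import defaultdict

# Made-up sample rows; one keyword can appear on several rows.
ROWS = [
    {"keyword_id": "k1", "cost": 600.0, "conversions": 10, "clicks": 40},
    {"keyword_id": "k1", "cost": 200.0, "conversions": 2, "clicks": 10},
    {"keyword_id": "k2", "cost": 90.0, "conversions": 1, "clicks": 5},
    {"keyword_id": "k3", "cost": 100.0, "conversions": 5, "clicks": 30},
    {"keyword_id": "k4", "cost": 500.0, "conversions": 5, "clicks": 25},
]

def keywords_over_budget(rows, min_cost_per_conversion=50, min_clicks=10):
    # Aggregate per keyword first, as the measures would inside FILTER.
    totals = defaultdict(lambda: {"cost": 0.0, "conversions": 0, "clicks": 0})
    for r in rows:
        t = totals[r["keyword_id"]]
        t["cost"] += r["cost"]
        t["conversions"] += r["conversions"]
        t["clicks"] += r["clicks"]
    kept = []
    for keyword_id, t in totals.items():
        if t["conversions"] == 0:
            continue  # assumption: no conversions means no cost per conversion
        cost_per_conversion = t["cost"] / t["conversions"]
        # Both conditions must hold, like && in the FILTER above.
        if cost_per_conversion >= min_cost_per_conversion and t["clicks"] >= min_clicks:
            kept.append(keyword_id)
    return sorted(kept)

expensive = keywords_over_budget(ROWS)
```

Here k2 is expensive per conversion but drops out on the clicks condition, which is exactly what the second condition is for: it stops low-volume keywords from cluttering the list.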
36. More variations…
Campaigns and
channels bringing
most of the high
spenders
Which channels or campaigns bring the highest number
of transactions over a certain Revenue threshold?
(requires that you have a dataset with source_medium,
campaign, and transaction_id and you create a measure
# Unique Transactions using transaction_id)
Campaigns and
channels bringing
*predominantly* high
spenders
37. # Unique Transactions min £500:=
CALCULATE(
[# Unique Transactions],
FILTER(
VALUES('dahub_sessions'[transaction_id]),
[£ Transaction Revenue] >= 500
)
)
Let’s break the formula down….
NOTE: In pivot you need source_medium and / or
campaign on rows and you need transaction_id in
your data
38. Banding
The problem: too many unique values to analyse.
The solution: creating dynamic groups to “cluster”
very granular data into a small number of groups
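The idea of banding can be sketched in a few lines of plain Python. The band edges, labels and sample values below are hypothetical; pick edges that make sense for your own data.

```python
from collections import Counter

def revenue_band(value):
    # Hypothetical band edges, chosen only for illustration.
    if value < 50:
        return "under 50"
    if value < 200:
        return "50 to 199"
    if value < 500:
        return "200 to 499"
    return "500+"

# Made-up order values: eight distinct values collapse into four groups.
values = [12.5, 49.99, 50, 180, 200, 499, 500, 1250]
counts = Counter(revenue_band(v) for v in values)
```

Each value lands in exactly one band because the checks run from the lowest edge upwards, so the bands never overlap.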
42. CALCULATE is a super SUMIF
The single most powerful feature in PowerPivot
# Sessions Branded:=
CALCULATE(
[# Sessions],
'dahub_sessions'[brand_group] = "brand"
)
# Sessions Non Branded:=
CALCULATE(
[# Sessions],
'dahub_sessions'[brand_group] = "non brand"
)
This is a CALCULATE filter that gets added
*before* [# Sessions] is calculated
43. CALCULATE allows segmentation you could
never do before
If a CALCULATE filter is on a column
that’s already in the pivot, the pivot’s selection for that column gets overridden.
Remove the column from the pivot and
the calculation still works!
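The override behaviour is easier to see in a toy model. The plain-Python sketch below treats the pivot coordinates as a dictionary of column selections and lets a CALCULATE-style filter replace the entry for its column. It is an analogy on made-up rows, and the helper names (`apply_context`, `calculate`) are hypothetical.

```python
# Made-up sample rows.
ROWS = [
    {"brand_group": "brand", "device": "mobile", "sessions": 100},
    {"brand_group": "brand", "device": "desktop", "sessions": 250},
    {"brand_group": "non brand", "device": "mobile", "sessions": 400},
    {"brand_group": "non brand", "device": "desktop", "sessions": 50},
]

def apply_context(rows, context):
    # Keep the rows that match every column selection in the context.
    return [r for r in rows if all(r[col] == val for col, val in context.items())]

def n_sessions(rows, context):
    return sum(r["sessions"] for r in apply_context(rows, context))

def calculate(measure, rows, context, **filters):
    # A filter on a column replaces whatever the pivot had set for that
    # column; selections on other columns are left alone.
    new_context = {**context, **filters}
    return measure(rows, new_context)

def sessions_branded(rows, context):
    return calculate(n_sessions, rows, context, brand_group="brand")

cell = {"brand_group": "non brand", "device": "mobile"}   # a pivot cell
plain = n_sessions(ROWS, cell)            # respects the pivot: non brand + mobile
branded = sessions_branded(ROWS, cell)    # brand_group overridden: brand + mobile
no_brand_column = sessions_branded(ROWS, {"device": "desktop"})  # column not in pivot
```

The last call shows why the measure keeps working after brand_group is removed from the pivot: the filter travels with the measure.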
44. CALCULATE filters have countless uses
Determine hidden
biases in AB testing
See if your variations had a comparable % of branded /
non branded traffic which might skew the results.
Also works with device, mobile traffic and any other
dimension you might have in your data.
Works best when you
create custom
“clusters” using
SWITCH
The formulas work with any dimension in your dataset
but if you really want to unlock CALCULATE’s filtering
potential, it really pays to create custom calculated
columns using the SWITCH formula.
Create horizontal
conversion funnels
CALCULATE is the essential building block for taking
conversion funnel analysis to the next step
45. Step 1. The right data for Horizontal Funnels
To get a Funnel Step column you need to create
segments for each step in your web analytics tool and
export each one as a CSV file. Then import them into PowerPivot
using Power Query and the multiple CSV import method.
46. Step 1. The right data for Horizontal Funnels
You need these segments:
All Sessions (unsegmented)
Category Pages
Products
Add to Basket
Basket
Secure Login
Address
Confirm Order
Payment
47. Step 2. Use CALCULATE on each funnel step
# Sessions All:=
CALCULATE(
[# Sessions Funnel],
'dahub_funnel'[funnel_step] = "All"
)
…
# Sessions Payment:=
CALCULATE(
[# Sessions Funnel],
'dahub_funnel'[funnel_step] = "Payment"
)
Create a new measure for each step in the funnel
48. Step 2. Use CALCULATE on each funnel step
This allows you to create custom “goals” on the fly
out of *ANY* segment
49. Step 3. Create ratios for each funnel step
Use All Sessions as a base for division:
% Sessions Address:=
DIVIDE([# Sessions Address], [# Sessions All])
Use previous funnel step as a base for division:
% Sessions Address progress:=
DIVIDE([# Sessions Address], [# Sessions Secure Login])
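To see the difference between the two bases, here is a plain-Python sketch with made-up step counts. The `divide` helper imitates the part of DIVIDE that matters here: it returns a blank (None in Python) instead of an error when the denominator is zero.

```python
def divide(numerator, denominator):
    # Returns None (a "blank") instead of raising when the denominator is zero.
    if not denominator:
        return None
    return numerator / denominator

# Made-up session counts per funnel step.
FUNNEL = {"All": 10000, "Secure Login": 800, "Address": 600}

# Share of all sessions that reached the Address step.
pct_address = divide(FUNNEL["Address"], FUNNEL["All"])
# Share of Secure Login sessions that progressed to the Address step.
pct_address_progress = divide(FUNNEL["Address"], FUNNEL["Secure Login"])
```

The first ratio tells you how deep into the funnel traffic gets overall; the second isolates how well one specific step performs.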
50. Step 4. Add measures to Pivot and analyse
You can use *ANY* dimension you have available
in your dataset on rows. Here, it’s Landing Page
but you can use date, channel dimensions, device
etc.
You can even add an additional segmentation level
like user type (newly acquired, loyal etc.)
51. Resources
Best book for PowerPivot
novices with gradual learning
curve.
By the end it gets pretty
advanced. You learn about
relationships, how to model
multiple tables, time
intelligence functions and
much more.
52. Resources
All in one reference for
formulas for almost any
scenario. All explained and
broken down.
You need a good
understanding of PowerPivot
to begin with so don’t get this
first.
54. Bonus - Lifecycle metrics
Essential for comparing business entities (users,
customers) as well as assets (content, landing pages,
promos etc)
55. Step 1. Find First Date for each landing page
First Date Landing Page:=
CALCULATE(
MIN('dahub_sessions'[session_date]),
ALL('dahub_sessions'[session_date]),
VALUES('dahub_sessions'[landing_page_id])
)
56. Step 2. Find [# Sessions] on first day
# Sessions in first day:=
CALCULATE(
[# Sessions],
FILTER(
ALL('dahub_sessions'[session_date]),
'dahub_sessions'[session_date] = [First Date Landing Page]
),
VALUES('dahub_sessions'[landing_page_id])
)
57. Step 3. Find [# Sessions] in first 7 days
# Sessions in first 7 days:=
CALCULATE(
[# Sessions],
FILTER(
ALL('dahub_sessions'[session_date]),
'dahub_sessions'[session_date] >= [First Date Landing Page]
&& 'dahub_sessions'[session_date] < [First Date Landing Page] + 7
),
VALUES('dahub_sessions'[landing_page_id])
)
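The lifecycle logic can be checked on a toy dataset in plain Python. This sketch assumes the first date counts as day 1, so "first 7 days" means the first date through the first date plus 6 days; the rows are made up and the function names are hypothetical.

```python
from datetime import date, timedelta

# Made-up sample rows.
ROWS = [
    {"landing_page_id": "lp1", "session_date": date(2015, 10, 1), "sessions": 10},
    {"landing_page_id": "lp1", "session_date": date(2015, 10, 7), "sessions": 20},
    {"landing_page_id": "lp1", "session_date": date(2015, 10, 8), "sessions": 40},
    {"landing_page_id": "lp2", "session_date": date(2015, 10, 5), "sessions": 5},
    {"landing_page_id": "lp2", "session_date": date(2015, 10, 20), "sessions": 7},
]

def first_date(rows, page):
    # Earliest date on which the landing page received any session,
    # regardless of the dates selected in the pivot.
    return min(r["session_date"] for r in rows if r["landing_page_id"] == page)

def sessions_in_first_days(rows, page, days):
    start = first_date(rows, page)
    end = start + timedelta(days=days)   # exclusive upper bound
    return sum(
        r["sessions"]
        for r in rows
        if r["landing_page_id"] == page and start <= r["session_date"] < end
    )

lp1_first_day = sessions_in_first_days(ROWS, "lp1", 1)
lp1_first_week = sessions_in_first_days(ROWS, "lp1", 7)
lp2_first_week = sessions_in_first_days(ROWS, "lp2", 7)
```

Because every page is measured from its own first date, a page launched last week and a page launched last year can be compared like for like.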