Data visualization is perhaps one of the greatest ways to introduce new users to computer programming. Some of the greatest pedagogic innovations in computer literacy can be traced to making it easy to draw on screen. From turtle graphics and Logo to R’s ggplot2, there’s something intensely satisfying about seeing a small bit of code draw a picture on a screen. Often, however, the trough of disillusionment comes when users realize that there is a vast array of options for drawing these visualizations, and seemingly minimal guidance on how to choose among them effectively.
In this talk, I will take the audience through a journey of over 23 different visualizations, from bar charts and scatterplots through more esoteric visualizations, and discuss the tradeoffs and scenarios in which they are the most relevant visualization. We will also compare how much code it takes to generate these visualizations in a number of environments, and perhaps develop an intuition for which are the right tools for the right job from the buffet of options available to us. We will also cover the importance of making these visualizations fully reproducible, so that provenance is maintained from exploratory analysis through presentation and consumption.
PLOTCON NYC: At Least 23 Visualizations and When to Use Them in 30 Minutes
1. 25 VISUALIZATIONS
EDUARDO ARIÑO DE LA RUBIA
CHIEF DATA SCIENTIST
eduardo@dominodatalab.com
AN “OUT OF MY LEAGUE” PRODUCTION
AND WHEN TO USE THEM
13. A DISCLAIMER
There are many kinds of data
I am only talking about tabular data.
That is, arranged in a table or
systematic arrangement by columns,
rows, etc…
There is non-tabular data out there,
like networks and trees and
whatnot. I ain’t messin’ with that.
(Except maps)
COWARDLY STATEMENT
14. STANDING ON THE SHOULDERS OF GIANTS IS NICE…
This presentation is based on the work of Dr. Andrew Abela’s “Extreme Presentation” method, as
well as the Financial Times’ fantastic Chart Doctor feature. There is a lot of amazing work out there
to help you pick the right way to present your data. None of what I’m saying is my own personal
research. It’s reading other smart people’s stuff and then telling you.
CITATION
15. Product: Open/Flexible + Full-Lifecycle Support
3. Operationalize / Deploy
2. Experiment & Harden
Faster Experimentation
More Collaboration
Reproducibility & Auditing
Integrate models into the business
More Time for Research
Automatic Version Control
Environment Management
Sharing and Discussion
Publishing & Deployment
Tools
Data
Code
Compute automation
https://app.dominodatalab.com/u/earino/plotcon2016
16. DEVIATION
Emphasize variations (+/-) from a
fixed reference point. Typically
the reference point is zero but it
can also be a target or a
long-term average. Can also be used
to show sentiment
(positive/neutral/negative).
OUR CATEGORIES
CORRELATION
Show the relationship between
two or more variables. Be mindful
that, unless you tell them
otherwise, many readers will
assume the relationships you
show them to be causal (i.e. one
causes the other).
RANKING
Use where an item’s position in
an ordered list is more important
than its absolute or relative value.
Don’t be afraid to highlight the
points of interest.
DISTRIBUTION
Show values in a dataset and how
often they occur. The shape (or
‘skew’) of a distribution can be a
memorable way of highlighting
the lack of uniformity or equality
in the data.
17. CHANGE
Give emphasis to changing
trends. These can be short
(intra-day) movements or extended
series traversing decades or
centuries. Choosing the correct
time period is important to
provide suitable context for the
reader.
OUR CATEGORIES
COMPOSITION
Show how a single entity can be
broken down into its component
elements. If the reader’s interest
is solely in the size of the
components, consider a
magnitude-type chart instead.
SPATIAL
Used only when precise locations
or geographical patterns in data
are more important to the reader
than anything else.
23. SCATTERPLOT
The standard way to show the
relationship between two
continuous variables, each of
which has its own axis.
CORRELATION
24. BUBBLE
Like a scatterplot, but adds
additional detail by sizing the
circles according to a third
variable and coloring them
according to a fourth.
CORRELATION
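One detail worth getting right when sizing the circles: a common convention is to make each circle's area, rather than its radius, proportional to the third variable, so that a value twice as large looks twice as large. A minimal Python sketch of that scaling follows. The function name, the `max_radius` parameter and the sample values are illustrative assumptions, not something from the talk.

```python
import math

def bubble_radii(values, max_radius=20.0):
    """Scale radii so that circle AREA is proportional to each value.

    max_radius is a hypothetical display size for the largest bubble.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    # area ~ value  =>  radius ~ sqrt(value); negatives are clamped to zero
    return [max_radius * math.sqrt(max(v, 0) / largest) for v in values]

# Hypothetical third-variable values for three points.
radii = bubble_radii([25, 50, 100])
for value, r in zip([25, 50, 100], radii):
    print(f"value={value:>3}  radius={r:.1f}")
```

Most plotting libraries take a size argument for markers; check whether the one you use interprets it as an area or a radius before passing these numbers along.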
25. ANIMATED BUBBLE
Like a scatterplot, but adds
additional detail by sizing the
circles according to a third
variable, coloring them according
to a fourth, and animating for a fifth!
CORRELATION
26. HEAT MAP
A good way of showing the
patterns between 2 categories
of data, less good at showing
fine differences in amounts.
Ordering the entries can be
quite powerful!
CORRELATION
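To make the point about ordering concrete, here is a small Python sketch that reorders the rows and columns of a count matrix by their totals before it is drawn as a heat map. Sorting by totals is only one possible ordering (clustering is another), and the labels and counts below are made up for illustration.

```python
def order_by_totals(matrix, row_labels, col_labels):
    """Reorder rows and columns of a count matrix by descending totals."""
    row_order = sorted(range(len(matrix)), key=lambda i: -sum(matrix[i]))
    col_order = sorted(range(len(col_labels)),
                       key=lambda j: -sum(row[j] for row in matrix))
    ordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return (ordered,
            [row_labels[i] for i in row_order],
            [col_labels[j] for j in col_order])

# Hypothetical counts: rows are weekdays, columns are product lines.
counts = [[1, 9, 2],
          [7, 8, 6],
          [0, 3, 1]]
ordered, rows, cols = order_by_totals(counts, ["Mon", "Tue", "Wed"], ["A", "B", "C"])

print("     " + "  ".join(cols))
for label, row in zip(rows, ordered):
    print(f"{label}  " + "  ".join(str(v) for v in row))
```

After reordering, the largest cells gather toward the top-left corner, which tends to make any pattern easier to spot than an alphabetical layout.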
27. 3. RANKING
Use where an item’s
position in an ordered list is
more important than its
absolute or relative value.
33. 4. DISTRIBUTION
Show values in a dataset
and how often they occur.
34. HISTOGRAM
The standard way to show a
statistical distribution - keep
the gaps between columns
small to highlight the ‘shape’
of the data.
DISTRIBUTION
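Underneath any histogram is a simple binning step. The sketch below does that step in plain Python with equal-width bins and prints the bars with no gaps between them. The function name, the bin count and the sample data are assumptions chosen for illustration.

```python
def histogram(values, bins=4):
    """Count values into equal-width bins; the top edge is inclusive."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical measurements with one outlier.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram(data, bins=4)

# Adjacent rows with no blank lines between them keep the 'shape' visible.
for left, right, n in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f}) " + "#" * n)
```

The number of bins is a judgment call: too few hides the shape, too many turns it into noise.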
36. VIOLIN PLOT
Similar to a box plot but more
effective with complex
distributions (data that cannot
be summarized with a simple
average).
Also, only nerds understand it.
DISTRIBUTION
37. POPULATION PYRAMID
A standard way of showing
the age and sex breakdown of
a population distribution;
effectively, back-to-back
histograms.
DISTRIBUTION
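The "back-to-back histograms" idea can be sketched in a few lines of Python: one set of bars grows leftwards from a shared centre column and the other grows rightwards. The age bands and counts below are invented for illustration, and the text rendering is a stand-in for whatever plotting tool you actually use.

```python
def pyramid_rows(age_bands, male, female, width=20):
    """Build text rows: male bars grow leftwards, female bars rightwards."""
    scale = width / max(max(male), max(female))
    rows = []
    for band, m, f in zip(age_bands, male, female):
        left = ("#" * round(m * scale)).rjust(width)   # right-aligned against the centre
        right = "#" * round(f * scale)
        rows.append(f"{left} {band:^7} {right}")
    return rows

# Hypothetical counts (in thousands) per age band, youngest first.
bands = ["0-19", "20-39", "40-59", "60+"]
male = [30, 28, 24, 12]
female = [28, 27, 25, 16]

rows = pyramid_rows(bands, male, female)
for row in reversed(rows):   # oldest band on top, as pyramids are usually drawn
    print(row)
```

Both sides share one scale here, which is what lets a reader compare the two halves directly.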
38. 5. CHANGE
Give emphasis to changing
trends. These can be short
(intra-day) movements or
extended series
39. LINE CHART
The standard way to
show a changing time
series. If data are
irregular, consider
markers to represent data
points.
CHANGE
40. FAN CHART
Use to show the
uncertainty in future
projections. Usually this
grows the further
forward the projection.
CHANGE
41. AREA CHART
Use with care – these are
good at showing changes
to total, but seeing
change in components
can be very difficult.
CHANGE
42. CALENDAR HEAT MAP
A great way of showing
temporal patterns (daily,
weekly, monthly) – at the
expense of showing
precision in quantity.
CHANGE
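A calendar heat map is mostly a data-layout problem: each day has to land in a grid of weeks by weekdays before any color is applied. Here is a minimal Python sketch of that layout step using only the standard library. The function name and the sample values are assumptions, and weeks are taken to start on Monday.

```python
from datetime import date, timedelta

def calendar_grid(daily_values):
    """Arrange {date: value} into rows of weeks x 7 weekday columns (Mon..Sun)."""
    first = min(daily_values)
    # Monday of the week that contains the first date
    start = first - timedelta(days=first.weekday())
    weeks = (max(daily_values) - start).days // 7 + 1
    grid = [[None] * 7 for _ in range(weeks)]
    for day, value in daily_values.items():
        offset = (day - start).days
        grid[offset // 7][offset % 7] = value
    return grid

# Hypothetical daily counts for 1-14 November 2016 (value = day of month).
values = {date(2016, 11, d): d for d in range(1, 15)}
grid = calendar_grid(values)

print(" Mo  Tu  We  Th  Fr  Sa  Su")
for week in grid:
    print(" ".join(f"{v:>3}" if v is not None else "  ." for v in week))
```

Each cell of `grid` would then be mapped to a color; days outside the data range stay `None` and can be left blank.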
43. 6. COMPOSITION
Show how a single
entity can be broken
down into its
component elements.
44. STACKED COLUMN
A simple way of showing
part-to-whole relationships but can
be difficult to read with more
than a few components.
COMPOSITION
45. PIE CHART
A common way of showing
part-to-whole data – but be
aware that it’s difficult to
accurately compare the size of
the segments.
COMPOSITION
46. WAFFLE
Good for showing percentage
information. They work best
when used on whole numbers
and work well in multiple
layout form.
COMPOSITION
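Because a waffle chart has a fixed number of cells, fractional percentages have to be rounded to whole cells that still add up to the full grid. One way to do this is largest-remainder rounding, sketched below in Python. The choice of rounding method, the 10x10 grid and the survey shares are all assumptions for illustration, not part of the talk.

```python
def waffle_cells(shares, cells=100):
    """Allocate whole cells to categories using largest-remainder rounding."""
    total = sum(shares.values())
    exact = {k: v / total * cells for k, v in shares.items()}
    alloc = {k: int(x) for k, x in exact.items()}      # round everything down first
    leftover = cells - sum(alloc.values())
    # Hand the remaining cells to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

def waffle_grid(alloc, columns=10):
    """Lay the cells out row by row, one letter per cell."""
    flat = [k[0] for k, n in alloc.items() for _ in range(n)]
    return ["".join(flat[i:i + columns]) for i in range(0, len(flat), columns)]

# Hypothetical survey shares in percent.
alloc = waffle_cells({"Agree": 47.6, "Neutral": 31.9, "Disagree": 20.5})
grid = waffle_grid(alloc)
print(alloc)
print("\n".join(grid))
```

If the input is already in whole percentages, the allocation step leaves it unchanged, which is one reason these charts work best on whole numbers.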
47. 7. SPATIAL
Used only when precise
locations or geographical
patterns in data are more
important to the reader
than anything else.
48. POPULATION TILES
A great way of showing
how areas have different
population sizes and
different behaviors, not
distorted by geographic
size.
(tilegramsR is amazing)
SPATIAL
49. REGION HEX
Keeps the overall shape
and layout of the
geography so that it’s
identifiable, yet lets you
focus on the state or
province level analysis.
SPATIAL
51. AND FINALLY…
Gosh, there are a lot of choices. You mean you can’t just pick whichever one is prettiest? Well, you
can; it just may not communicate anything to anyone. That’s up to you. Understanding what you’re
trying to communicate, and what the key components of that communication are, makes the
difference between effective and ineffective data visualization.
CONCLUSION
52. THANK YOU
EDUARDO ARIÑO DE LA RUBIA
CHIEF DATA SCIENTIST
DOMINO DATA LAB
PLOTLY AND PLOTCON AND ANNA!
https://app.dominodatalab.com/u/earino/plotcon2016