Stat405
        Graphical theory & critique


                           Hadley Wickham
Tuesday, 19 October 2010
Project
                    • Generally excellent
                    • Common problems: lack of proofreading,
                      lack of flow
                    • If you’re going to throw away 85% of
                      the data, I want to know how that
                      differs from the data that you kept



Project

                    • Don’t forget to set up a meeting time
                      with me this week
                    • (I’ll be travelling from Saturday until
                      when the project is due, so if you can’t
                      meet with me this week, email Garrett
                      to set up a time)



Exploratory graphics
                    Are for you (not others). Need to be able
                    to create rapidly because your first
                    attempt will never be the most revealing.
                    Iteration is crucial for developing the best
                    display of your data.
                    Gives rise to two key questions:



What should I plot?
                      How can I plot it?


Two general tools
                                 Plot critique toolkit:
                            “graphics are like pumpkin pie”
                                Theory behind ggplot2:
                           “A layered grammar of graphics”



                            plus lots of practice...

Graphics are like
                      pumpkin pie

                           The four C’s of critiquing a graphic
                           Content
                           Construction
                           Context
                           Consumption

Content
                     What data (variables) does the
                     graph display?
                     What non-data is present?
                     What is pumpkin (essence of the
                     graphic) vs what is spice (useful
                     additional info)?

Your turn

                     Identify the data and non-data
                     on “Napoleon's march” and
                     “Building an electoral victory”.
                     Which features are the most
                     important? Which are just
                     useful background information?

Results
                     Minard’s march: (top) latitude,
                     longitude, number of troops,
                     direction, branch, city name
                     (bottom) longitude, temperature, date
                     Building an electoral victory: state,
                     number of electoral college votes,
                     winner, margin of victory

Construction

                     How many layers are on the plot?
                     What data does each layer
                     display? What sort of geometric
                     object does it use? Is it a summary
                     of the raw data? How are
                     variables mapped to aesthetics?

Perceptual mapping
                     (For continuous variables only!)

                     Best   1. Position along a common scale
                            2. Position along nonaligned scale
                            3. Length
                            4. Angle/slope
                            5. Area
                            6. Volume
                     Worst  7. Colour
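
One way to get a feel for the two ends of this ranking is to encode the same continuous variable twice, once as position and once as colour. The sketch below uses the `mpg` data that ships with ggplot2; the object names `by_position` and `by_colour` are illustrative only and do not come from the slides.

```r
library(ggplot2)

# hwy encoded as position along a common scale (top of the ranking)
by_position <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# hwy encoded as colour instead (bottom of the ranking);
# class takes over the y-axis so the points still spread out
by_colour <- ggplot(mpg, aes(displ, class, colour = hwy)) +
  geom_point()

# Print each object at the console to draw it
# by_position
# by_colour
```

Small differences in `hwy` should be much easier to read off the first plot than the second.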


Your turn
                     Answer the following questions
                     for “Napoleon's march” and
                     “Flight delays”:
                     How many layers are on the plot?
                     What data does each layer
                     display? How does it display it?

Results
                     Napoleon’s march: (top) (1) path plot with width
                     mapped to number of troops, colour to direction,
                     separate group for each branch (2) labels giving
                     city names (bottom) (1) line plot with longitude on
                     x-axis and temperature on y-axis (2) text labels
                     giving dates
                     Flight delays: (1) white circles showing 100%
                     cancellation, (2) outline of states, (3) points with
                     size proportional to percent cancellations at each
                     airport.


We can explain the
                           composition of a graphic
                           in words, but how do we
                                  create it?



“If any number of
                                 magnitudes are each
                                 the same multiple of
                                 the same number of
                                 other magnitudes,
                                 then the sum is that
                                 multiple of the sum.”
                                 Euclid, ~300 BC




                           m(Σx) = Σ(mx)
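
The same statement is shorter still in code. A minimal sketch in R, with an arbitrary multiplier `m` and vector `x` chosen only for illustration:

```r
m <- 3
x <- c(2, 5, 7)

multiple_of_sum  <- m * sum(x)   # m(Σx)
sum_of_multiples <- sum(m * x)   # Σ(mx)

# Both sides agree, as Euclid's proposition says
stopifnot(multiple_of_sum == sum_of_multiples)
```

The notation does the same job a grammar does for graphics: it makes the idea easier to state, check, and reuse.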
The grammar of graphics
                    An abstraction which makes thinking about,
                    reasoning about and communicating
                    graphics easier.
                    Developed by Leland Wilkinson, particularly
                    in “The Grammar of Graphics” 1999/2005
                    You’ve been using it in ggplot2 without
                    knowing it! But to do more, you need to
                    learn more about the theory.


What is a layer?
                    • Data
                    • Mappings from variables to aesthetics
                      (aes)
                    • A geometric object (geom)
                    • A statistical transformation (stat)
                    • A position adjustment (position)


layer(geom, stat, position, data, mapping, ...)

     layer(
       data = mpg,
       mapping = aes(x = displ, y = hwy),
       geom = "point",
       stat = "identity",
       position = "identity"
     )

     layer(
       data = diamonds,
       mapping = aes(x = carat),
       geom = "bar",
       stat = "bin",
       position = "stack"
     )

# A lot of typing!

     layer(
       data = mpg,
       mapping = aes(x = displ, y = hwy),
       geom = "point",
       stat = "identity",
       position = "identity"
     )

     # Every geom has an associated default statistic
     # (and vice versa), and position adjustment.

     geom_point(aes(displ, hwy), data = mpg)
     geom_histogram(aes(displ), data = mpg)
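
The defaults that the shortcuts fill in can be inspected on the layer objects themselves. This is a sketch that assumes a reasonably recent ggplot2 in which layers expose `$geom`, `$stat` and `$position` fields; the names `point_layer` and `hist_layer` are illustrative.

```r
library(ggplot2)

point_layer <- geom_point(aes(displ, hwy), data = mpg)
hist_layer  <- geom_histogram(aes(displ), data = mpg)

# Each layer carries the stat and position that its geom defaults to
class(point_layer$stat)[1]      # identity stat for points
class(point_layer$position)[1]  # identity position for points
class(hist_layer$stat)[1]       # binning stat for histograms
class(hist_layer$position)[1]   # stacking position for histograms
```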
# To actually create the plot
     ggplot() +
       geom_point(aes(displ, hwy), data = mpg)

     ggplot() +
       geom_histogram(aes(displ), data = mpg)




# Multiple layers
     ggplot() +
       geom_point(aes(displ, hwy), data = mpg) +
       geom_smooth(aes(displ, hwy), data = mpg)

     # Avoid redundancy:
     ggplot(mpg, aes(displ, hwy)) +
       geom_point() +
       geom_smooth()
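
Defaults set in `ggplot()` can still be overridden layer by layer. The sketch below gives one layer its own data while it inherits the plot-level mapping; the subset `two_seaters` and the red highlight are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# A subset to highlight; it keeps the displ and hwy columns,
# so the plot-level mapping still applies to it
two_seaters <- subset(mpg, class == "2seater")

highlighted <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +                                  # inherits data and mapping
  geom_point(data = two_seaters, colour = "red")  # own data, inherited mapping
```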




# Different layers can have different aesthetics
     ggplot(mpg, aes(displ, hwy)) +
       geom_point(aes(colour = class)) +
       geom_smooth()

     ggplot(mpg, aes(displ, hwy)) +
       geom_point(aes(colour = class)) +
       geom_smooth(aes(group = class), method = "lm",
         se = FALSE)




Your turn
                    For each of the following plots created with
                    qplot, recreate the equivalent ggplot code.
                    qplot(price, carat, data = diamonds)
                    qplot(hwy, cty, data = mpg, geom = "jitter")
                    qplot(reorder(class, hwy), hwy, data = mpg,
                      geom = c("jitter", "boxplot"))
                    qplot(log10(price), log10(carat),
                      data = diamonds, colour = color) +
                      geom_smooth(method = "lm")


ggplot(diamonds, aes(price, carat)) +
       geom_point()

     ggplot(mpg, aes(hwy, cty)) +
       geom_jitter()

     ggplot(mpg, aes(reorder(class, hwy), hwy)) +
       geom_jitter() +
       geom_boxplot()

     ggplot(diamonds, aes(log10(price), log10(carat),
       colour = color)) +
       geom_point() +
       geom_smooth(method = "lm")

More geoms & stats

                    See http://had.co.nz/ggplot2 for complete
                    list with helpful icons:
                    Geoms: (0d) point, (1d) line, path, (2d)
                    boxplot, bar, tile, text, polygon
                    Stats: bin, summary, sum
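
As an example of a stat that summarises the raw data, the sketch below overlays the mean highway mileage per class on the jittered observations. It assumes ggplot2 3.3.0 or later, where `stat_summary()` takes a `fun` argument (older versions used `fun.y`); the colours and sizes are arbitrary.

```r
library(ggplot2)

class_means <- ggplot(mpg, aes(class, hwy)) +
  # Raw data: one point per car, jittered horizontally only
  geom_jitter(width = 0.2, height = 0, colour = "grey60") +
  # Summary: one point per class, placed at the mean of hwy
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)
```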



Your turn

                    Go back to the descriptions of “Minard’s
                    march” and “Flight delays” that you
                    created before. Start converting your
                    textual description to ggplot2 code.
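
One possible starting point for the top panel of the march, following the textual description (a path layer plus a text layer). The data frames `troops` and `cities` below are made up purely so the code runs; they are NOT Minard's figures, and the column names are assumptions. The `linewidth` aesthetic assumes ggplot2 3.4.0 or later; older versions use `size` for line width.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
troops <- data.frame(
  long      = c(24, 28, 32, 36, 36, 32, 28, 24),
  lat       = c(55, 55.2, 54.8, 55.5, 55.5, 54.5, 54.2, 54.4),
  survivors = c(300, 250, 180, 100, 100, 60, 30, 10),
  direction = rep(c("advance", "retreat"), each = 4),
  branch    = 1
)
cities <- data.frame(
  long = c(24, 36),
  lat  = c(55, 55.5),
  city = c("Start", "End")  # placeholder labels, not real city names
)

minard_top <- ggplot(troops, aes(long, lat)) +
  # Layer 1: path, width = troops, colour = direction, one group per branch
  geom_path(aes(linewidth = survivors, colour = direction, group = branch)) +
  # Layer 2: text labels for the cities, drawn from a separate data frame
  geom_text(aes(label = city), data = cities)
```

Each clause of the description maps onto one argument: the geom names the layer type, `aes()` carries the variable-to-aesthetic mappings, and `data` supplies a different data frame where a layer needs one.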



