Stat405   Graphics for large data


                           Hadley Wickham
Thursday, 26 August 2010
Majoring in Stat

                    • Declare early (even if you’re not sure)
                    • Weekly lunches
                    • Summer opportunities
                      (research & internships)




1. Leftovers from last lecture
                2. The diamonds data
                3. Histograms and bar charts
                4. More boxplots and scatterplots
                5. Homework



# Remember: start with
library(ggplot2)

qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")

[Plot: jittered points of hwy (y axis, 15 to 40) against reorder(class, hwy) (x axis: pickup, suv, minivan, 2seater, midsize, subcompact, compact)]

qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")

[Plot: boxplots of hwy (y axis, 15 to 40) for each class, ordered by reorder(class, hwy) (x axis: pickup, suv, minivan, 2seater, midsize, subcompact, compact)]

qplot(reorder(class, hwy), hwy, data = mpg,
  geom = c("jitter", "boxplot"))

[Plot: jittered points with boxplots drawn over them; hwy (y axis, 15 to 40) against reorder(class, hwy) (x axis: pickup, suv, minivan, 2seater, midsize, subcompact, compact)]

Your turn

                    Read the help for reorder. Redraw the
                    previous plots with class ordered by
                    median hwy.
                    How would you put the jittered points on
                    top of the boxplots?
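
One possible answer to the exercise above, offered as a sketch rather than the official solution. It relies on `reorder()` taking a `FUN` argument (the default is `mean`), and on qplot() adding geoms in the order they are listed, so that naming "boxplot" first and "jitter" second draws the points last.

```r
library(ggplot2)

# Order class by median hwy instead of the default mean
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)
levels(class_by_median)

# Layers are added in the order given, so the jittered points
# are drawn after (on top of) the boxplots
qplot(reorder(class, hwy, FUN = median), hwy, data = mpg,
  geom = c("boxplot", "jitter"))
```

In recent ggplot2 releases qplot() is deprecated in favour of ggplot(), so the call may emit a warning while still producing the plot.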




Diamonds



Diamonds data
                    ~54,000 round diamonds from
                    http://www.diamondse.info/
                    Carat, colour, clarity, cut
                    Total depth, table, depth,
                    width, height
                    Price


[Diagram: a diamond in profile, labelled with x, table width, and z]

depth = z / diameter
table = table width / x * 100
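
The depth formula can be checked against the data. The sketch below assumes that "diameter" means the average of x and y and that depth is stored as a percentage, which is how the ggplot2 documentation for the diamonds dataset describes the column. The table formula cannot be checked the same way, because the table width itself is not recorded in the dataset.

```r
library(ggplot2)

# Assumption: diameter = mean of x and y, and depth is a percentage
depth_calc <- with(diamonds, z / ((x + y) / 2) * 100)

# A few rows have zero dimensions, which gives non-finite results;
# drop those before comparing with the recorded depth column
ok <- is.finite(depth_calc)
summary(abs(depth_calc[ok] - diamonds$depth[ok]))
```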

Recall

                    Write down five ways to inspect the
                    diamonds dataset.
                    You have one minute!
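
For reference, here are five candidate answers, all using base R functions. This is only one possible list, not necessarily the one given in class.

```r
library(ggplot2)   # provides the diamonds dataset

head(diamonds)      # first six rows
str(diamonds)       # variable types and a preview of the values
summary(diamonds)   # per-variable summaries
dim(diamonds)       # number of rows and columns
names(diamonds)     # variable names
```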




Your turn


                    Inspect the data and familiarise yourself
                    with the variables. If you don’t know what
                    they mean, look them up on Wikipedia.




Histograms & bar charts


Histograms and bar charts

                    Used to display the distribution of a
                    variable
                    Categorical variable → bar chart
                    Continuous variable → histogram




Always experiment with the bin width!
Examples
                # With only one variable, qplot guesses that
                # you want a bar chart or histogram
                qplot(cut, data = diamonds)

                qplot(carat, data = diamonds)
                qplot(carat, data = diamonds, binwidth = 1)
                qplot(carat, data = diamonds, binwidth = 0.1)
                qplot(carat, data = diamonds, binwidth = 0.01)
                resolution(diamonds$carat)

                last_plot() + xlim(0, 3)
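
The `resolution()` call in the examples above suggests a lower bound for the bin width. The sketch below is one way to use it: it stores the resolution of carat, lists the most common carat values, and then draws a histogram whose bin width equals that resolution. The variable name `res` is introduced here for illustration only.

```r
library(ggplot2)

# Smallest gap between distinct carat values
res <- resolution(diamonds$carat)
res

# The most common carat values; with a bin width close to the
# resolution these show up as spikes in the histogram
head(sort(table(diamonds$carat), decreasing = TRUE))

qplot(carat, data = diamonds, binwidth = res) + xlim(0, 3)
```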


Common ggplot2 technique: adding together plot components,
as in last_plot() + xlim(0, 3).

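Adding components with `+` also works on a plot stored in a variable. The following is a minimal sketch, in which the names `p`, `p_zoomed`, and `n_dropped` are introduced purely for illustration.

```r
library(ggplot2)

# Store a plot in a variable, then add a component to it with +
p <- qplot(carat, data = diamonds, binwidth = 0.1)
p_zoomed <- p + xlim(0, 3)
p_zoomed

# xlim(0, 3) drops the diamonds heavier than 3 carats
n_dropped <- sum(diamonds$carat > 3)
n_dropped
```

`last_plot()` retrieves the most recently created or modified plot, so `last_plot() + xlim(0, 3)` is the interactive shorthand for the same idea.
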
qplot(table, data = diamonds, binwidth = 1)

# To zoom in on a plot region use xlim() and ylim()
qplot(table, data = diamonds, binwidth = 1) +
  xlim(50, 70)
qplot(table, data = diamonds, binwidth = 0.1) +
  xlim(50, 70)
qplot(table, data = diamonds, binwidth = 0.1) +
  xlim(50, 70) + ylim(0, 50)

# Note that this type of zooming discards data
# outside of the plot regions
# See coord_cartesian() for an alternative
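
A minimal sketch of the `coord_cartesian()` alternative follows. It first counts the rows that `xlim(50, 70)` would discard, then zooms to the same window without discarding anything. The name `n_outside` is introduced here for illustration.

```r
library(ggplot2)

# xlim() and ylim() set scale limits: rows outside them are removed
# before the histogram is computed
n_outside <- sum(diamonds$table < 50 | diamonds$table > 70)
n_outside

# coord_cartesian() changes only the visible window, so every row
# still contributes to the bins
qplot(table, data = diamonds, binwidth = 0.1) +
  coord_cartesian(xlim = c(50, 70), ylim = c(0, 50))
```

For a histogram the difference is most visible near the edges of the window and in any bar taller than the y limit, which `ylim()` removes but `coord_cartesian()` merely clips.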


Additional variables

                    As with scatterplots, we can use aesthetics
                    or faceting. Using aesthetics creates
                    pretty, but ineffective, plots.
                    The following examples show the
                    difference when investigating the
                    relationship between cut and depth.



qplot(depth, data = diamonds, binwidth = 0.2)

[Plot: histogram of depth; x axis depth from 56 to 70, y axis count from 0 to 4000]

qplot(depth, data = diamonds, binwidth = 0.2,
  fill = cut) + xlim(55, 70)

[Plot: stacked histogram of depth, filled by cut (legend: Fair, Good, Very Good, Premium, Ideal); x axis depth from 56 to 70, y axis count from 0 to 4000]

Fill is the aesthetic for fill colour.

qplot(depth, data = diamonds, binwidth = 0.2) +
  xlim(55, 70) + facet_wrap(~ cut)

[Plot: histograms of depth, one panel per cut (Fair, Good, Very Good, Premium, Ideal); x axis depth from 56 to 70, y axis count from 0 to 2500]

Your turn

                    Explore the distribution of price.
                    How does it vary with colour, cut, or
                    clarity?
                    Practice zooming in on regions of interest.
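
One possible starting point for this exercise, as a sketch only. The bin widths and the zoom range are arbitrary choices for illustration, not values from the lecture. Note that the dataset spells the column `color`.

```r
library(ggplot2)

# Check the exact column names first: it is `color`, not `colour`
names(diamonds)

# Distribution of price, one panel per cut
qplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ cut)

# Zoom in on cheaper diamonds with a narrower bin width,
# this time with one panel per colour
qplot(price, data = diamonds, binwidth = 50) +
  xlim(0, 2500) + facet_wrap(~ color)
```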




Box and whisker plots


Boxplots

                    They show less information than a histogram,
                    but take up much less space.
                    We have already seen them used with discrete
                    x values. They can also be used with
                    continuous x values, by specifying how we
                    want the data grouped.
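
A minimal sketch of grouping by a continuous variable, assuming one box per whole-number value of table is a sensible grouping. The rounding choice and the name `table_group` are illustrative and not taken from the slides shown here.

```r
library(ggplot2)

# Hypothetical grouping: round table to the nearest whole number
table_group <- round(diamonds$table)
length(unique(table_group))   # how many boxes this produces

# The group aesthetic tells the boxplot geom how to split the data
qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))
```

A coarser or finer grouping only needs a different expression for `group`, for example rounding to the nearest 2 or 0.5.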



qplot(table, price, data = diamonds)
qplot(table, price, data = diamonds, geom = "boxplot")

[Plot: price (y axis, 5000 to 15000) against table (x axis, 50 to 90), drawn as a single boxplot with a long column of outlying points]

                                        ●   ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●     ●
                                                                          ●     ●
                                    ●   ●   ●
                                            ●   ●
                                                ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●   ●
                               ●
                               ●    ●   ●   ●   ●   ●   ●
                                                        ●   ●   ●   ●
                                                                    ●   ● ●
                               ●    ●   ●
                                        ●   ●
                                            ●   ●   ●
                                                    ●   ●
                                                        ●
                                                        ●   ●
                                                            ●   ●   ●   ●
                           ●   ●
                             ● ●    ●   ●   ●   ●
                                                ●   ●
                                                    ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●   ●
                                            ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●           ●
                               ●    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●           ●●
                                                                                ●
                               ●    ●
                                    ●   ●   ●   ●   ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●
                                                                    ●
                                    ●   ●   ●       ●
                                                    ●   ●   ●           ● ●
         15000                 ●
                               ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                        ●
                                            ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●
                                                ●
                                                    ●
                                                    ●
                                                    ●
                                                        ●
                                                        ●
                                                        ●
                                                        ●
                                                            ●
                                                            ●
                                                            ●
                                                                ●
                                                                ●
                                                                ●
                                                                    ●
                                                                    ●
                                                                    ●   ●
                                                                        ●
                                                                          ●         ●
                                                                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●   ●   ●   ●   ●
                                                                ●   ●   ●
                             ●      ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●       ●   ●
                                    ●
                                    ●   ●   ●   ●   ●   ●       ●
                                                                ●
                                                                ●   ●           ●
                                    ●
                                    ●   ●   ●   ●
                                                ●   ●           ●
                                    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●           ●
                                                                ●       ●
                                ●   ●
                                    ●   ●   ●   ●   ●           ●   ●   ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●
                                                                    ●   ●       ●●●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●           ●
                                ●   ●
                                    ●   ●   ●   ●
                                                ●               ●
                                                                ●   ●       ●
                                    ●
                                    ●   ●
                                        ●   ●   ●
                                                ●               ●
                                                                ●   ●   ●         ●
                                        ●   ●   ●               ●
                                                                ●   ●
                                                                    ●   ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●   ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●       ●●●     ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●
                                                                        ●
                                    ●   ●   ●
                                            ●   ●               ●
                                                                ●   ●   ●   ●           ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●                   ●
                                    ●   ●   ●
                                            ●   ●                   ●   ●           ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                       ●   ●   ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                       ●
                                                                    ●   ●   ●   ●
                                ●
                                ●   ●   ●   ●                           ●   ●
                                    ●   ●   ●                           ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                               ●
                                                                            ●   ●● ●        ●
                                    ●
                                    ●   ●   ●                           ●   ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                               ●   ●
                                ●
                                ●   ●   ●   ●
                                            ●                           ●         ● ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                           ●
                                ●   ●   ●
                                        ●
                                        ●   ●
                                            ●                                     ●
         10000                  ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                            ●                                   ●
 price




                                ●   ●
                                    ●   ●
                                        ●
                                ●
                                ●   ●   ●                                           ●
                                ●   ●
                                    ●   ●
                                        ●
                                ●   ●
                                    ●   ●
                                        ●
                                    ●
                                    ●   ●
                                        ●                                           ●
                                    ●
                                    ●   ●
                                    ●
                                    ●
                                    ●
                                    ●




         5000




qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))
[Plot repeated: boxplots of price against table, with an annotation on the code]
     One boxplot for
    each unique value
     of this aesthetic
qplot(table, price, data = diamonds, geom 80 "boxplot",
               50       60         70     =         90
  group = round(table))      table
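
The effect of the group aesthetic can be seen by changing the expression passed to it. The following is a minimal sketch, assuming ggplot2 is installed and still provides qplot() (it is deprecated in recent releases). The 2-unit bin width and the names p_fine, p_coarse, n_fine and n_coarse are illustrative choices, not part of the lecture.

```r
library(ggplot2)

# One box per integer value of table, as on the slide
p_fine <- qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))

# Coarser grouping: one box per 2-unit-wide bin of table
# (the bin width of 2 is an arbitrary choice for illustration)
p_coarse <- qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table / 2) * 2)

# Number of distinct group values, i.e. the number of boxes in each plot
n_fine <- length(unique(round(diamonds$table)))
n_coarse <- length(unique(round(diamonds$table / 2) * 2))
```

Printing p_fine and p_coarse one after the other shows how the choice of grouping trades detail against the number of observations behind each box.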
Scatterplots



Interpreting a scatterplot

                    • Global patterns
                    • Local patterns
                    • Deviations




Strong linear relationship. A number of outliers.




Unusual striations. Two groups? Little relationship between table and price?




Curved (exponential?) relationship. Outliers mostly cheaper than expected.


But what’s the problem with all these plots?
In pairs, brainstorm solutions for 2 minutes.

qplot(carat, price, data = diamonds)
Idea               ggplot
Small points       shape = I(".")
Transparency       alpha = I(1/50)
Jittering          geom = "jitter"
Smooth curve       geom = "smooth"
2d bins            geom = "bin2d" or geom = "hex"
Density contours   geom = "density2d"
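
Each idea in the table can be tried on the carat and price scatterplot. This is a minimal sketch, assuming ggplot2 is installed and still provides qplot() (it is deprecated in recent releases). The list name plots and its element names are illustrative, not from the lecture. geom = "hex" is left out because it additionally needs the hexbin package.

```r
library(ggplot2)

# Every variant starts from the same overplotted scatterplot:
# qplot(carat, price, data = diamonds)
plots <- list(
  small_points = qplot(carat, price, data = diamonds, shape = I(".")),
  transparency = qplot(carat, price, data = diamonds, alpha = I(1/50)),
  jittering    = qplot(carat, price, data = diamonds, geom = "jitter"),
  smooth_curve = qplot(carat, price, data = diamonds, geom = "smooth"),
  bins_2d      = qplot(carat, price, data = diamonds, geom = "bin2d"),
  contours     = qplot(carat, price, data = diamonds, geom = "density2d")
)

# Nothing is drawn until a plot is printed, for example:
# plots$transparency
```

Storing the plots in a list keeps them side by side for comparison. Printing them one at a time shows which remedy reveals the most structure in the data.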
Your turn

                    Practice doing these plots yourself.
                    Read the online documentation for each
                    plot type: http://had.co.nz/ggplot2




Homework

                    Practice your graphics/data exploration
                    skills with the diamonds or mpg data.
                    Due in one week.
                    Make sure to read the grading rubric, and
                    find a colour printer.



Asking questions

                    You have two minutes to write down as
                    many questions as you can come up with
                    that you might want to answer about the
                    diamonds data.
                    Write your best question on a piece of
                    paper and turn it in.


