SlideShare uma empresa Scribd logo
1 de 42
Baixar para ler offline
Stat405
              Visualising time & space


                             Hadley Wickham
Wednesday, 21 October 2009
1. New data: baby names by state
               2. Visualise time (done!)
               3. Visualise time conditional on space
               4. Visualise space
               5. Visualise space conditional on time



Wednesday, 21 October 2009
Baby names by state
                   Top 100 male and female baby names
                   for each state, 1960–2008.
                   480,000 records (100 * 50 * 2 * 48)
                   Slightly different variables: state, year,
                   name, sex and number.
                   To keep the data manageable, we will
                   look at the top 25 names of all time.

                                           CC BY http://www.flickr.com/photos/the_light_show/2586781132
Wednesday, 21 October 2009
Subset
                   Easier to compare states if we have
                   proportions. To calculate proportions,
                   need births. But could only find data from
                   1981.
                   Then selected 30 names that occurred
                   fairly frequently, and had interesting
                   patterns.


Wednesday, 21 October 2009
Aaron Alex Allison Alyssa Angela Ashley
                   Carlos Chelsea Christian Eric Evan
                   Gabriel Jacob Jared Jennifer Jonathan
                   Juan Katherine Kelsey Kevin Matthew
                   Michelle Natalie Nicholas Noah Rebecca
                   Sara Sarah Taylor Thomas




Wednesday, 21 October 2009
Getting started

               library(ggplot2)
               library(plyr)

               bnames <- read.csv("interesting-names.csv",
                 stringsAsFactors = F)

               matthew <- subset(bnames, name == "Matthew")




Wednesday, 21 October 2009
Time | Space



Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990          1995   2000   2005
                                           year
Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990   1995   2000     2005
qplot(year, prop, data = matthew,year
                                  geom = "line", group = state)
Wednesday, 21 October 2009
AK        AL         AR         AZ          CA        CO         CT         DC
        0.04
        0.03
        0.02
        0.01
                    DE        FL         GA         HI          IA        ID         IL         IN
        0.04
        0.03
        0.02
        0.01
                    KS        KY         LA         MA          MD        ME         MI         MN
        0.04
        0.03
        0.02
        0.01
                   MO        MS          MT         NC          NE        NH         NJ         NM
        0.04
 prop




        0.03
        0.02
        0.01
                    NV        NY         OH         OK          OR        PA         RI         SC
        0.04
        0.03
        0.02
        0.01
                    SD        TN         TX         UT          VA        VT         WA         WI
        0.04
        0.03
        0.02
        0.01
                   WV        WY
        0.04
        0.03
        0.02
        0.01
               198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000
                 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005
                                                         year
Wednesday, 21 October 2009
AK        AL         AR         AZ         CA         CO         CT         DC
        0.04
        0.03
        0.02
        0.01
                    DE        FL         GA         HI         IA         ID         IL         IN
        0.04
        0.03
        0.02
        0.01
                    KS        KY         LA         MA         MD         ME         MI         MN
        0.04
        0.03
        0.02
        0.01
                   MO        MS          MT         NC         NE         NH         NJ         NM
        0.04
 prop




        0.03
        0.02
        0.01
                    NV        NY         OH         OK         OR         PA         RI         SC
        0.04
        0.03
        0.02
        0.01
                    SD        TN         TX         UT         VA         VT         WA         WI
        0.04
        0.03
        0.02
        0.01
                   WV        WY
        0.04
        0.03
        0.02
        0.01
               198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000
                 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005

last_plot() + facet_wrap(~ state)year
Wednesday, 21 October 2009
Your turn

                   Pick some names out of the list and
                   explore. What do you see?
                   Can you write a function that plots the
                   trend for a given name?




Wednesday, 21 October 2009
show_name <- function(name) {
       name <- bnames[bnames$name == name, ]
       qplot(year, prop, data = name, geom = "line",
         group = state)
     }

     show_name("Jessica")
     show_name("Aaron")
     show_name("Juan") + facet_wrap(~ state)




Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990          1995   2000   2005
                                           year
Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




qplot(year, prop, data = matthew, geom1995 "line", 2000
                1985       1990         =          group = state) +
                                                              2005
  geom_smooth(aes(group = 1), se year size = 3)
                                 = F,
Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01


                         So we only get one smooth
                             for the whole dataset
qplot(year, prop, data = matthew, geom1995 "line", 2000
                1985       1990          =         group = state) +
                                                              2005
  geom_smooth(aes(group = 1), se year size = 3)
                                   = F,
Wednesday, 21 October 2009
Two useful tools
                   Smoothing: can be easier to perceive
                   overall trend by smoothing individual
                   functions
                   Indexing: remove initial differences by
                   indexing to the first value.
                   Not that useful here, but good tools to
                   have in your toolbox.


Wednesday, 21 October 2009
library(mgcv)
     smooth <- function(y, x) {
       as.numeric(predict(gam(y ~ s(x))))
     }

     matthew <- ddply(matthew, "state", transform,
       prop_s = smooth(prop, year))

     qplot(year, prop_s, data = matthew, geom = "line",
       group = state)




Wednesday, 21 October 2009
index <- function(y, x) {
       y / y[order(x)[1]]
     }

     matthew <- ddply(matthew, "state", transform,
       prop_i = index(prop, year),
       prop_si = index(prop_s, year))

     qplot(year, prop_i, data = matthew, geom = "line",
       group = state)
     qplot(year, prop_si, data = matthew, geom = "line",
       group = state)


Wednesday, 21 October 2009
Your turn
                   Create a plot to show all names
                   simultaneously. Does smoothing every
                   name in every state make it easier to see
                   patterns?
                   Hint: run the following R code on the next
                   slide to eliminate names with less than 10
                   years of data


Wednesday, 21 October 2009
longterm <- ddply(bnames, c("name", "state"),
     function(df) {
        if (nrow(df) > 10) {
          df
        }
     })




Wednesday, 21 October 2009
qplot(year, prop, data = bnames, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)

     longterm <- ddply(longterm, c("name", "state"),
       transform, prop_s = smooth(prop, year))

     qplot(year, prop_s, data = longterm, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)
     last_plot() + facet_wrap(scales = "free_y")



Wednesday, 21 October 2009
Space



Wednesday, 21 October 2009
Spatial plots

                   Choropleth map:
                   map colour of areas to value.
                   Proportional symbol map:
                   map size of symbols to value




Wednesday, 21 October 2009
juan2000 <- subset(bnames, name == "Juan" & year == 2000)

     # Turn map data into normal data frame
     library(maps)
     states <- map_data("state")
     states$state <- state.abb[match(states$region,
       tolower(state.name))]

     # Merge and then restore original order
     choropleth <- merge(juan2000, states,
       by = "state", all.y = T)
     choropleth <- choropleth[order(choropleth$order), ]

     # Plot with polygons
     qplot(long, lat, data = choropleth, geom = "polygon",
       fill = prop, group = group)

Wednesday, 21 October 2009
45




       40
                                                                 prop
                                                                        0.004
                                                                        0.006
 lat




                                                                        0.008
       35
                                                                         0.01




       30




                       −120   −110   −100      −90   −80   −70
                                        long
Wednesday, 21 October 2009
What’s the problem
                                                   with this map?
       45                                      How could we fix it?


       40
                                                                     prop
                                                                            0.004
                                                                            0.006
 lat




                                                                            0.008
       35
                                                                             0.01




       30




                       −120   −110   −100      −90    −80    −70
                                        long
Wednesday, 21 October 2009
ggplot(choropleth, aes(long, lat, group = group)) +
       geom_polygon(fill = "white", colour = "grey50") +
       geom_polygon(aes(fill = prop))




Wednesday, 21 October 2009
45




       40
                                                                 prop
                                                                        0.004
                                                                        0.006
 lat




                                                                        0.008
       35
                                                                         0.01




       30




                       −120   −110   −100      −90   −80   −70
                                        long
Wednesday, 21 October 2009
Problems?

                   What are the problems with this sort of
                   plot?
                   Take one minute to brainstorm some
                   possible issues.




Wednesday, 21 October 2009
Problems
                   Big areas most striking. But in the US (as
                   with most countries) big areas tend to
                   least populated. Most populated areas
                   tend to be small and dense - e.g. the East
                   coast.
                   (Another computational problem: need to
                   push around a lot of data to create these
                   plots)


Wednesday, 21 October 2009
●




       45
                        ●

                                                                                 ●




                                                                                         ●
                                                  ●




       40                                                     ●
                                                                                     ●



                                            ●                                                      prop
                                 ●                    ●
                                                                                                    ●     0.004
                             ●                                                                     ●      0.006
 lat




                                                      ●                     ●
                                                                                                   ●      0.008
       35
                                     ●      ●                                                      ●      0.010

                                                                       ●




                                                ●
       30

                                                                   ●




                       −120          −110       −100         −90           −80               −70
                                                      long
Wednesday, 21 October 2009
mid_range <- function(x) mean(range(x))
     centres <- ddply(states, c("state"), summarise,
       lat = mid_range(lat), long = mid_range(long))

     bubble <- merge(juan2000, centres, by = "state")
     qplot(long, lat, data = bubble,
       size = prop, colour = prop)

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey50") +
       geom_point(aes(size = prop, colour = prop))


Wednesday, 21 October 2009
Your turn


                   Replicate either a choropleth or a
                   proportional symbol map with the name
                   of your choice.




Wednesday, 21 October 2009
Space | Time



Wednesday, 21 October 2009
Wednesday, 21 October 2009
Your turn


                   Try and create this plot yourself. What is
                   the main difference between this plot and
                   the previous?




Wednesday, 21 October 2009
juan <- subset(bnames, name == "Juan")
     bubble <- merge(juan, centres, by = "state")

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey50") +
       geom_point(aes(size = prop, colour = prop)) +
       facet_wrap(~ year)




Wednesday, 21 October 2009
Aside: geographic data

                   Boundaries for most countries available
                   from: http://gadm.org
                   To use with ggplot2, use the fortify
                   function to convert to usual data frame.
                   Will also need to install the sp package.



Wednesday, 21 October 2009
# install.packages("sp")

     library(sp)
     load(url("http://gadm.org/data/rda/CHE_adm1.RData"))

     head(as.data.frame(gadm))
     ch <- fortify(gadm, region = "ID_1")
     str(ch)

     qplot(long, lat, group = group, data = ch,
       geom = "polygon", colour = I("white"))



Wednesday, 21 October 2009
Wednesday, 21 October 2009
This work is licensed under the Creative
      Commons Attribution-Noncommercial 3.0 United
      States License. To view a copy of this license,
      visit http://creativecommons.org/licenses/by-nc/
      3.0/us/ or send a letter to Creative Commons,
      171 Second Street, Suite 300, San Francisco,
      California, 94105, USA.




Wednesday, 21 October 2009

Mais conteúdo relacionado

Mais de Hadley Wickham (20)

22 spam
22 spam22 spam
22 spam
 
21 spam
21 spam21 spam
21 spam
 
20 date-times
20 date-times20 date-times
20 date-times
 
19 tables
19 tables19 tables
19 tables
 
18 cleaning
18 cleaning18 cleaning
18 cleaning
 
17 polishing
17 polishing17 polishing
17 polishing
 
16 critique
16 critique16 critique
16 critique
 
14 case-study
14 case-study14 case-study
14 case-study
 
13 case-study
13 case-study13 case-study
13 case-study
 
12 adv-manip
12 adv-manip12 adv-manip
12 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
10 simulation
10 simulation10 simulation
10 simulation
 
10 simulation
10 simulation10 simulation
10 simulation
 
09 bootstrapping
09 bootstrapping09 bootstrapping
09 bootstrapping
 
08 functions
08 functions08 functions
08 functions
 
07 problem-solving
07 problem-solving07 problem-solving
07 problem-solving
 
06 data
06 data06 data
06 data
 
05 subsetting
05 subsetting05 subsetting
05 subsetting
 
04 reports
04 reports04 reports
04 reports
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

15 Time Space

  • 1. Stat405 Visualising time & space Hadley Wickham Wednesday, 21 October 2009
  • 2. 1. New data: baby names by state 2. Visualise time (done!) 3. Visualise time conditional on space 4. Visualise space 5. Visualise space conditional on time Wednesday, 21 October 2009
  • 3. Baby names by state Top 100 male and female baby names for each state, 1960–2008. 480,000 records (100 * 50 * 2 * 48) Slightly different variables: state, year, name, sex and number. To keep the data manageable, we will look at the top 25 names of all time. CC BY http://www.flickr.com/photos/the_light_show/2586781132 Wednesday, 21 October 2009
  • 4. Subset Easier to compare states if we have proportions. To calculate proportions, need births. But could only find data from 1981. Then selected 30 names that occurred fairly frequently, and had interesting patterns. Wednesday, 21 October 2009
  • 5. Aaron Alex Allison Alyssa Angela Ashley Carlos Chelsea Christian Eric Evan Gabriel Jacob Jared Jennifer Jonathan Juan Katherine Kelsey Kevin Matthew Michelle Natalie Nicholas Noah Rebecca Sara Sarah Taylor Thomas Wednesday, 21 October 2009
  • 6. Getting started library(ggplot2) library(plyr) bnames <- read.csv("interesting-names.csv", stringsAsFactors = F) matthew <- subset(bnames, name == "Matthew") Wednesday, 21 October 2009
  • 7. Time | Space Wednesday, 21 October 2009
  • 8. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Wednesday, 21 October 2009
  • 9. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 qplot(year, prop, data = matthew,year geom = "line", group = state) Wednesday, 21 October 2009
  • 10. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 year Wednesday, 21 October 2009
  • 11. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 last_plot() + facet_wrap(~ state)year Wednesday, 21 October 2009
  • 12. Your turn Pick some names out of the list and explore. What do you see? Can you write a function that plots the trend for a given name? Wednesday, 21 October 2009
  • 13. show_name <- function(name) { name <- bnames[bnames$name == name, ] qplot(year, prop, data = name, geom = "line", group = state) } show_name("Jessica") show_name("Aaron") show_name("Juan") + facet_wrap(~ state) Wednesday, 21 October 2009
  • 14. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Wednesday, 21 October 2009
  • 15. 0.04 0.03 prop 0.02 0.01 qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Wednesday, 21 October 2009
  • 16. 0.04 0.03 prop 0.02 0.01 So we only get one smooth for the whole dataset qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Wednesday, 21 October 2009
  • 17. Two useful tools Smoothing: can be easier to perceive overall trend by smoothing individual functions Indexing: remove initial differences by indexing to the first value. Not that useful here, but good tools to have in your toolbox. Wednesday, 21 October 2009
  • 18. library(mgcv) smooth <- function(y, x) { as.numeric(predict(gam(y ~ s(x)))) } matthew <- ddply(matthew, "state", transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = matthew, geom = "line", group = state) Wednesday, 21 October 2009
  • 19. index <- function(y, x) { y / y[order(x)[1]] } matthew <- ddply(matthew, "state", transform, prop_i = index(prop, year), prop_si = index(prop_s, year)) qplot(year, prop_i, data = matthew, geom = "line", group = state) qplot(year, prop_si, data = matthew, geom = "line", group = state) Wednesday, 21 October 2009
  • 20. Your turn Create a plot to show all names simultaneously. Does smoothing every name in every state make it easier to see patterns? Hint: run the following R code on the next slide to eliminate names with less than 10 years of data Wednesday, 21 October 2009
  • 21. longterm <- ddply(bnames, c("name", "state"), function(df) { if (nrow(df) > 10) { df } }) Wednesday, 21 October 2009
  • 22. qplot(year, prop, data = bnames, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) longterm <- ddply(longterm, c("name", "state"), transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = longterm, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) last_plot() + facet_wrap(scales = "free_y") Wednesday, 21 October 2009
  • 24. Spatial plots Choropleth map: map colour of areas to value. Proportional symbol map: map size of symbols to value Wednesday, 21 October 2009
  • 25. juan2000 <- subset(bnames, name == "Juan" & year == 2000) # Turn map data into normal data frame library(maps) states <- map_data("state") states$state <- state.abb[match(states$region, tolower(state.name))] # Merge and then restore original order choropleth <- merge(juan2000, states, by = "state", all.y = T) choropleth <- choropleth[order(choropleth$order), ] # Plot with polygons qplot(long, lat, data = choropleth, geom = "polygon", fill = prop, group = group) Wednesday, 21 October 2009
  • 26. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 27. What’s the problem with this map? 45 How could we fix it? 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 28. ggplot(choropleth, aes(long, lat, group = group)) + geom_polygon(fill = "white", colour = "grey50") + geom_polygon(aes(fill = prop)) Wednesday, 21 October 2009
  • 29. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 30. Problems? What are the problems with this sort of plot? Take one minute to brainstorm some possible issues. Wednesday, 21 October 2009
  • 31. Problems Big areas most striking. But in the US (as with most countries) big areas tend to least populated. Most populated areas tend to be small and dense - e.g. the East coast. (Another computational problem: need to push around a lot of data to create these plots) Wednesday, 21 October 2009
  • 32. 45 ● ● ● ● 40 ● ● ● prop ● ● ● 0.004 ● ● 0.006 lat ● ● ● 0.008 35 ● ● ● 0.010 ● ● 30 ● −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 33. mid_range <- function(x) mean(range(x)) centres <- ddply(states, c("state"), summarise, lat = mid_range(lat), long = mid_range(long)) bubble <- merge(juan2000, centres, by = "state") qplot(long, lat, data = bubble, size = prop, colour = prop) ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) Wednesday, 21 October 2009
  • 34. Your turn Replicate either a choropleth or a proportional symbol map with the name of your choice. Wednesday, 21 October 2009
  • 35. Space | Time Wednesday, 21 October 2009
  • 37. Your turn Try and create this plot yourself. What is the main difference between this plot and the previous? Wednesday, 21 October 2009
  • 38. juan <- subset(bnames, name == "Juan") bubble <- merge(juan, centres, by = "state") ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) + facet_wrap(~ year) Wednesday, 21 October 2009
  • 39. Aside: geographic data Boundaries for most countries available from: http://gadm.org To use with ggplot2, use the fortify function to convert to usual data frame. Will also need to install the sp package. Wednesday, 21 October 2009
  • 40. # install.packages("sp") library(sp) load(url("http://gadm.org/data/rda/CHE_adm1.RData")) head(as.data.frame(gadm)) ch <- fortify(gadm, region = "ID_1") str(ch) qplot(long, lat, group = group, data = ch, geom = "polygon", colour = I("white")) Wednesday, 21 October 2009
  • 42. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Wednesday, 21 October 2009