REFERENCES
#1. RStudio Official Documentation (Help & Cheat Sheets)
Free Webpage) https://www.rstudio.com/resources/cheatsheets/
#2. Wickham, H. and Grolemund, G., 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly.
Free Webpage) https://r4ds.had.co.nz/
Cf) Tidyverse syntax (www.tidyverse.org), rather than base R syntax
Cf) Hadley Wickham: Chief Scientist at RStudio. Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University
3. First example)
engine size vs. fuel usage
• Research question
• Do cars with big engines use more fuel than cars with small engines?
4. Built-in data) mpg data
Cf) Miles per Gallon (MPG): how many miles a car travels on one gallon of fuel
mpg
#> # A tibble: 234 x 11
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto(… f 18 29 p comp…
#> 2 audi a4 1.8 1999 4 manua… f 21 29 p comp…
#> 3 audi a4 2 2008 4 manua… f 20 31 p comp…
#> 4 audi a4 2 2008 4 auto(… f 21 30 p comp…
#> 5 audi a4 2.8 1999 6 auto(… f 16 26 p comp…
#> 6 audi a4 2.8 1999 6 manua… f 18 26 p comp…
#> # ... with 228 more rows
help(mpg)
5. mpg {ggplot2} R Documentation
Fuel economy data from 1999 and 2008 for 38 popular models of car
Description
This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which
had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.
Usage
mpg
Format
A data frame with 234 rows and 11 variables
manufacturer: manufacturer name
model: model name
displ: engine displacement, in litres
year: year of manufacture
cyl: number of cylinders
trans: type of transmission
drv: f = front-wheel drive, r = rear wheel drive, 4 = 4wd
cty: city miles per gallon
hwy: highway miles per gallon
fl: fuel type
class: "type" of car
6. Let's plot first in a 2-D plane: x- & y-axis
x = displ, y = hwy
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
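The scatterplot answers the research question visually. As a rough numeric companion, the sketch below computes the correlation between the two plotted variables. It assumes the ggplot2 package is installed (it ships the mpg data); the name `r` is introduced here only for illustration.

```r
library(ggplot2)  # provides the built-in mpg data set

# Pearson correlation between engine displacement (displ)
# and highway miles per gallon (hwy).
r <- cor(mpg$displ, mpg$hwy)

# A negative value means larger engines tend to travel
# fewer miles per gallon, i.e. they use more fuel.
r
```

A negative `r` matches the downward trend of the points in the plot.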
12. ggplot2
• ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms (visual marks that represent data points).
• To display values, map variables in the data to visual properties of the geom (aesthetics) like size, color, and x and y locations.
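The three components named above can be added to a plot one at a time. The following is a minimal sketch, assuming ggplot2 is installed; the object name `p` is illustrative, and `coord_cartesian()` is written out only to make the default coordinate system visible.

```r
library(ggplot2)

# 1. Data set: ggplot() alone creates an empty plot bound to mpg.
p <- ggplot(data = mpg)

# 2. Geom + aesthetic mapping: points whose x and y positions
#    are taken from the displ and hwy variables.
p <- p + geom_point(mapping = aes(x = displ, y = hwy))

# 3. Coordinate system: Cartesian is already the default.
p <- p + coord_cartesian()

print(p)  # draw the plot
```

Each `+` adds one component, which is why the same pattern works for every plot type in the slides that follow.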
14. Let's map a 3rd dimension in color
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
15. Let's map a 3rd dimension in size
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
16. Let's map a 3rd dimension in alpha (transparency)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
17. Let's map a 3rd dimension in shape
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
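One caveat with the shape mapping: by default ggplot2 uses only six shapes at a time, while `class` has more levels than that, so one class goes unplotted and a warning is issued. A possible workaround, sketched here with the illustrative names `n_classes` and `p`, is to supply one shape code per class through `scale_shape_manual()`.

```r
library(ggplot2)

# Number of distinct car classes in the data.
n_classes <- length(unique(mpg$class))

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  # One shape code per class, so no class is dropped.
  scale_shape_manual(values = seq_len(n_classes))

print(p)
```

Which shape codes to use is a matter of taste; `seq_len(n_classes)` is simply the shortest way to get enough distinct ones.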
20. A calculated variable as a 3rd dimension
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))
22. Cf) Common problem: syntax error
e.g.) Location of "+": it must come at the end of a line, not at the start of the next one. The following fails:
ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
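The snippet above fails because R treats `ggplot(data = mpg)` as a complete statement and then cannot make sense of a line that starts with `+`. A corrected sketch (the name `p` is illustrative) keeps the `+` at the end of the first line:

```r
library(ggplot2)

# The trailing "+" tells R that the expression continues on the next line.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

print(p)
```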
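23. Cf) Common problem
aesthetic mapping is for variables in the data
e.g.) "blue" is not a variable, so the code below does not turn the points blue; inside aes() it is treated as a category with a single level named "blue"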
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
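To set a fixed color rather than map a variable, the color goes outside `aes()`, as an argument of the geom itself. A minimal sketch (the names `p` and `point_colours` are illustrative):

```r
library(ggplot2)

# color = "blue" sits outside aes(), so it is a constant setting
# applied to every point instead of a mapped variable.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the data ggplot2 will actually draw;
# its colour column shows the color assigned to each point.
point_colours <- unique(layer_data(p)$colour)
point_colours

print(p)
```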
26. facet: subplots that each display one subset of the data
facet_wrap(~VariableName)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
27. facet: subplots that each display one subset of the data
facet_grid(VariableName4Row ~ VariableName4Column)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
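The size of the grid follows directly from the formula: one row per distinct value on the left of `~` and one column per distinct value on the right. The sketch below counts them for `drv ~ cyl`; the names `n_rows`, `n_cols`, and `p` are illustrative.

```r
library(ggplot2)

n_rows <- length(unique(mpg$drv))  # one facet row per drive type
n_cols <- length(unique(mpg$cyl))  # one facet column per cylinder count

# Total number of cells in the grid. Some cells can be empty,
# because not every drv/cyl combination occurs in the data.
n_rows * n_cols

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

print(p)
```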
28. facet: subplots that each display one subset of the data
facet_grid(VariableName4Row ~ VariableName4Column)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
29. facet: subplots that each display one subset of the data
facet_grid(VariableName4Row ~ VariableName4Column)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
31. facet: subplots that each display one subset of the data
facet_grid(VariableName4Row ~ VariableName4Column)
-> use a variable with more unique levels (categories)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(trans ~ drv)
32. facet: subplots that each display one subset of the data
facet_grid(VariableName4Row ~ VariableName4Column)
-> use a variable with more unique levels (categories)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ trans)
47. Built-in data) diamonds data
> diamonds
# A tibble: 53,940 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
# ... with 53,930 more rows
> help(diamonds)
48. diamonds {ggplot2} R Documentation
Prices of 50,000 round cut diamonds
Description
A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:
Usage
diamonds
Format
A data frame with 53940 rows and 10 variables:
price: price in US dollars ($326–$18,823)
carat: weight of the diamond (0.2–5.01)
cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color: diamond colour, from J (worst) to D (best)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x: length in mm (0–10.74)
y: width in mm (0–58.9)
z: depth in mm (0–31.8)
depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table: width of top of diamond relative to widest point (43–95)
49. Plot in a 2-D plane: x- & y-axis
geom_bar
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
55. Plot in a 2-D plane: x- & y-axis
geom_bar(): ..prop..
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
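Two details of this call are easy to miss. First, `group = 1` puts all bars in one group, so each proportion is computed against the whole data set; without it every bar is its own group and every proportion equals 1. Second, the `..prop..` notation is the older spelling: ggplot2 3.3.0 and later provide `after_stat()`, and the dot-dot form is deprecated from 3.4.0. A sketch of the newer form, assuming such a ggplot2 version is installed (the names `p` and `bars` are illustrative):

```r
library(ggplot2)

# after_stat(prop) refers to the proportion computed by the count stat.
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Inspect the computed values: one row per bar.
bars <- layer_data(p)

# With group = 1 the proportions of all bars add up to 1.
sum(bars$prop)

print(p)
```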
56. The data after the statistical transformation
demo <- tribble(
~cut, ~freq,
"Fair", 1610,
"Good", 4906,
"Very Good", 12082,
"Premium", 13791,
"Ideal", 21551
)
demo
# # A tibble: 5 x 2
# cut freq
# <chr> <dbl>
# 1 Fair 1610
# 2 Good 4906
# 3 Very Good 12082
# 4 Premium 13791
# 5 Ideal 21551
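The frequencies typed into `demo` are the kind of counts that `geom_bar()` computes behind the scenes. As a hedged sketch, they can be recomputed from the raw data with base R's `table()`; the name `counts` is illustrative, and `responseName = "freq"` is used only so the column name matches the slide.

```r
library(ggplot2)  # provides the built-in diamonds data set

# Count the diamonds in each cut category and store the
# result as a data frame with columns cut and freq.
counts <- as.data.frame(table(cut = diamonds$cut), responseName = "freq")
counts

# The counts cover every row of the data set.
sum(counts$freq) == nrow(diamonds)
```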
57. Plot in a 2-D plane: x- & y-axis
The data after the statistical transformation
ggplot(data = demo) +
geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
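When the bar heights are already in the data, `geom_col()` is a shorter alternative to `geom_bar(stat = "identity")`. The sketch below rebuilds `demo` with base R's `data.frame()` instead of `tribble()` so that it runs with ggplot2 alone; that substitution and the names `p` and `heights` are choices made for this example.

```r
library(ggplot2)

# Same values as the demo table on the slide.
demo <- data.frame(
  cut  = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq = c(1610, 4906, 12082, 13791, 21551)
)

# geom_col() uses the y values as bar heights directly.
p <- ggplot(data = demo) +
  geom_col(mapping = aes(x = cut, y = freq))

# Bar heights as ggplot2 will draw them.
heights <- layer_data(p)$y

print(p)
```

Because `cut` is a plain character column here, the bars are ordered alphabetically rather than from Fair to Ideal.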
77. the grammar of graphics
= a formal system for building plots