An Introduction to Mapping and Spatial Modelling in R
(preview version)
By and © Richard Harris, School of Geographical Sciences, University of Bristol

An Introduction to Mapping and Spatial Modelling in R (preview version) by Richard Harris is licensed under
a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at www.social-statistics.org.
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution — You must attribute the work in the following manner: Based on An Introduction to
Mapping and Spatial Modelling R by Richard Harris (www.social-statistics.org).
Noncommercial — You may not use this work for commercial purposes. Use for education in a
recognised higher education institution (a College or University) is permissible.
Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting
work only under the same or similar license to this one.
With the understanding that:
Waiver — Any of the above conditions can be waived if you get permission from the copyright
holder (Richard Harris, rich.harris@bris.ac.uk)
Public Domain — Where the work or any of its elements is in the public domain under applicable
law, that status is in no way affected by the license.
Other Rights — In no way are any of the following rights affected by the license:
Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
The author's moral rights;
Rights other persons may have either in the work itself or in how the work is used, such as publicity
or privacy rights.
Notice — For any reuse or distribution, you must make clear to others the license terms of this
work which applies also to derivatives.
(Document version 0.1, November, 2013. Draft preview version.)

An Introduction to Mapping and Spatial Modelling in R. © Richard Harris, 2013

Introduction and contents
This document presents a short introduction to R highlighting some geographical functionality.
Specifically, it provides:
• A basic introduction to R (Session 1)
• A short 'showcase' of using R for data analysis and mapping (Session 2)
• Further information about how R works (Session 3)
• Guidance on how to use R as a simple GIS (Session 4)
• Details on how to create a spatial weights matrix (Session 5)
• An introduction to spatial regression modelling including Geographically Weighted Regression (Session 6)

Further sessions will be added in the months (more likely, years) ahead.
The document is provided in good faith and the contents have been tested by the author. However,
use is entirely at the user's risk. Absolutely no responsibility or liability is accepted by the author
for consequences arising from this document howsoever it is used. It is licensed under a Creative
Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (see above).
Before starting the following should be considered.
First, you will notice that in this document the pages and, more unusually, the lines are numbered.
The reason is educational: it makes directing a class to a specific part of a page easier and faster. For
other readers, the line numbers can be ignored.
Second, the sessions presume that, as well as R, a number of additional R packages (libraries) have
been installed and are available to use. You can install them by following the 'Before you begin'
instructions below.
Third, each session is written to be completed in a single sitting. If that is not possible, then it would
normally be possible to stop at a convenient point, save the workspace before quitting R, then
reload the saved workspace when you wish to continue. Note, however, that whereas the additional
packages (libraries) need be installed only once, they must be loaded each time you open R and
require them. Any objects that were attached before quitting R also need to be attached again to take
you back to the point at which you left off.

Before you begin
Install R. It can be downloaded from http://cran.r-project.org/.
I am currently using version 3.0.2.
Start R. Use the drop-down menus to change your working directory to somewhere
you are happy to download all the files you need for this tutorial.
At the > prompt type,
download.file("http://dl.dropboxusercontent.com/u/214159700/RIntro.zip", "Rintro.zip")
and press return.
Next, type
unzip("Rintro.zip")

All the data you need for the sessions are now available in the working directory.
If you would like to install all the libraries (packages) you need for these practicals, type
load("begin.RData")

and then
install.libs()

Please note:
this is a draft version of the document and has not as yet
been thoroughly checked for typos and other errors.

The full and most up-to-date version can be downloaded from
https://www.dropbox.com/sh/zzibpn2keilrhv3/rsuA7L_jlK

Session 1: Getting Started with R
This session provides a brief introduction to how R works and introduces some of the more
common commands and procedures. Don't worry if not everything is clear at this stage. The
purpose is to get you started, not to make you an expert user. If you would prefer to jump straight to
seeing R in action, then move on to Session 2 and come back to this introduction later.

1.1 About R
R is an open source software package, licensed under the GNU General Public Licence. You can
obtain and install it for free, with versions available for PCs, Macs and Linux. To find out what is
available, go to the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/
Being free is not necessarily a good reason to use R. However, R is also well developed, well
documented, widely used and well supported by an extensive user community. It is not just software
for 'hobbyists'. It is widely used in research, both academic and commercial. It has well developed
capabilities for mapping and spatial analysis.
In his book R in a Nutshell (O'Reilly, 2010), Joseph Adler writes, “R is very good at plotting
graphics, analyzing data, and fitting statistical models using data that fits in the computer's
memory.” Nevertheless, no software provides the perfect tool for every job and Adler adds that “it's
not good at storing data in complicated structures, efficiently querying data, or working with data
that doesn't fit in the computer's memory.”

To these caveats it should be added that R does not offer spreadsheet editing of data of the type
found, for example, in Microsoft Excel. Consequently, it is often easier to prepare and 'clean' data
prior to loading them into R. There is an add-in to R that provides some integration with Excel. Go
to http://rcom.univie.ac.at/ and look for RExcel.
A possible barrier to learning R is that it is generally command-line driven. That is, the user types a
command that the software interprets and responds to. This can be daunting for those who are used
to extensive graphical user interfaces (GUIs) with drop-down menus, tabs, pop-up menus, left or
right-clicking and other navigational tools to steer you through a process. It may mean that R takes
a while longer to learn; however, that time is well spent. Once you know the commands it is usually
much faster to type them than to work through a series of menu options. They can be easily edited
to change things such as the size or colour of symbols on a graph, and a log or script of the
commands can be saved for use on another occasion or for sharing with others.
Saying that, a fairly simple and platform independent GUI called R Commander can be installed
(see http://cran.r-project.org/web/packages/Rcmdr/index.html). Field et al.'s book Discovering
Statistics Using R provides a comprehensive introduction to statistical analysis in R using both
command-lines and R Commander.

1.2 Getting Started

Assuming R has been installed in the normal way on your computer, clicking on the link/shortcut to
R on the desktop will open the RGui, offering some drop-down menu options, and also the R
Console, within which R commands are typed and executed. The appearance of the RGui differs a
little depending upon the operating system being used (Windows, Mac or Linux) but having used
one it should be fairly straightforward to navigate around another.

Figure 1.1. Screen shot of the R Gui for Windows

1.2.1 Using R as a calculator

At its simplest, R can be used as a calculator. Typing 1 + 1 after the prompt > will (after pressing
the return/enter key, ↵) produce the result 2, as in the following example:
> 1 + 1
[1] 2

Comments are marked with a hash symbol (#); anything after it on the line is ignored
> # This is a comment, no need to type it

Some other simple mathematical expressions are given below.
> 10 - 5
[1] 5
> 10 * 2
[1] 20
> 10 - 5 * 2            # The order of operations gives priority to multiplication
[1] 0
> (10 - 5) * 2          # The use of brackets changes the order
[1] 10
> sqrt(100)             # Uses the function that calculates the square root
[1] 10
> 10^2                  # 10 squared
[1] 100
> 100^0.5               # 100 to the power 0.5, i.e. the square root again
[1] 10
> 10^3
[1] 1000
> log10(100)            # Uses the function that calculates the common log
[1] 2
> log10(1000)
[1] 3
> 100 / 5
[1] 20
> 100^0.5 / 5
[1] 2

1.2.2 Incomplete commands

If you see the + symbol instead of the usual (>) prompt it is because what has been typed is
incomplete. Often there is a missing bracket. For example,

> sqrt(
+ 100                   # The + symbol indicates that the command is incomplete
+ )
[1] 10
> (1 + 2) * (5 - 1
+ )
[1] 12

Commands broken over multiple lines can be easier to read.

> for (i in 1:10) {     # This is a simple loop
+ print(i)              # printing the numbers 1 to 10 on-screen
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

1.2.3 Repeating or modifying a previous command

If there is a mistake in a line of code that needs to be corrected or if some previously typed
commands will be repeated then the ↑ and ↓ keys on the keyboard can be used to scroll between
previous entries in the R Console. Try it!

1.3 Scripting and Logging in R
1.3.1 Scripting

You can create a new script file from the drop down menu File → New script (in Windows) or File
→ New Document (Mac OS). It is basically a text file in which you could write, for example,
a <- 1:10
print(a)

In Windows, if you move the cursor up to the required line of the script and press Ctrl + R, then it
will be run in the R Console. So, for example, move the cursor to where you have typed a <- 1:10
and press Ctrl + R. Then move down a line and do the same. The contents of a, the numbers 1 to 10,
should be printed in the R Console. If you continue to keep the focus on the Scripting window and
go to Edit in the RGui you will find an option to run everything. Similar commands are available
for other Operating Systems (e.g. Mac key + Return). You can save files and load previously saved
files.
Scripting is both good practice and good sense. It is good practice because it allows for
reproducibility of your work. It is good sense because if you need to go back and change things you
can do so easily without having to start from scratch.
Tip: It can be sensible to create the script in a simple text editor that is independent of R, such as
Notepad. Although you will not be able to use Ctrl + R in the same way, if R crashes for any reason
you will not lose your script file.
1.3.2 Logging

You can save the contents of the R Console window to a text file which will then give you a log file
of the commands you have been using (including any mistakes). The easiest way to do this is to
click on the R Console (to take the focus from the Scripting window) and then use File → Save
History (in Windows) or File → Save As (Mac). Note that graphics are not usually plotted in the R
Console and therefore need to be saved separately.

1.4 Some R Basics
1.4.1 Functions, assignments and getting help

It is helpful to understand R as an object-oriented system that assigns information to objects within
the current workspace. The workspace is simply all the objects that have been created or loaded
since beginning the session in R. Look at it this way: the objects are like box files, containing useful
information, and the workspace is a larger storage container, keeping the box files together. A useful
feature of this is that R can operate on multiple tables of data at once: they are just stored as
separate objects within the workspace.
To view the objects currently in the workspace, type
> ls()
character(0)

Doing this runs the function ls(), which lists the contents of the workspace. The result,
character(0), indicates that the workspace is empty. (Assuming it currently is).
To find out more about a function, type ? or help with the function name,
> ?ls()
> help(ls)

This will provide details about the function, including examples of its use. It will also list the
arguments required to run the function, some of which may be optional and some of which may
have default values which can be changed as required. Consider, for example,
> ?log()

A required argument is x, which is the data value or values. Typing log() omits any data and
generates an error. However, log(100) works just fine. The argument base takes a default value of e^1,
which is approximately 2.72 and means the natural logarithm is calculated. Because the default is
assumed unless otherwise stated, log(100) gives the same answer as log(100, base=exp(1)).
Using log(100, base=10) gives the common logarithm, which can also be calculated using the
convenience function log10(100).
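
The point about default arguments can be checked directly. The lines below are a minimal sketch using only base R; the object names are illustrative and not part of the original session.

```r
# Natural logarithm: the default base is e, i.e. exp(1)
natural.default  <- log(100)
natural.explicit <- log(100, base = exp(1))

# Common (base 10) logarithm, written two ways
common.base10 <- log(100, base = 10)
common.log10  <- log10(100)

# all.equal() compares numbers while allowing for tiny rounding differences
print(all.equal(natural.default, natural.explicit))
print(all.equal(common.base10, common.log10))
```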
The results of mathematical expressions can be assigned to objects, as can the outcome of many
commands executed in the R Console. When the object is given a name different to other objects
within the current workspace, a new object will be created. Where the name and object already
exist, the previous contents of the object will be over-written, without warning – so be careful!
> a <- 10 - 5
> print(a)
[1] 5
> b <- 10 * 2
> print(b)
[1] 20
> print(a * b)
[1] 100
> a <- a * b
> print(a)
[1] 100

In these examples the assignment is achieved using the combination of < and -, as in a <- 100.
Alternatively, 100 -> a could be used or, more simply, a = 100. The print(...) command can often
be omitted, though it is useful, and sometimes necessary (for example, when what you expect to
appear on-screen doesn't).
> f = a * b
> print(f)
[1] 2000
> f
[1] 2000
> sqrt(b)
[1] 4.472136
> print(sqrt(b), digits=3)          # The additional parameter now specifies the number of significant figures
[1] 4.47
> c(a,b)                            # The c(...) function combines its arguments
[1] 100  20
> c(a,sqrt(b))
[1] 100.000000   4.472136
> print(c(a,sqrt(b)), digits=3)
[1] 100.00   4.47
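
The digits argument of print() changes only what is displayed. If a rounded value is needed for later calculations, the base R functions signif() and round() can be used instead. The sketch below is illustrative and is not part of the original session.

```r
b <- 20
root <- sqrt(b)

# print() with digits changes the display only; root itself is unchanged
print(root, digits = 3)

# signif() keeps a number of significant figures,
# round() keeps a number of decimal places
root.sig <- signif(root, 3)
root.dp  <- round(root, 3)
print(c(root.sig, root.dp))
```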

1.4.2 Naming objects in the workspace

Although the naming of objects is flexible, there are some exceptions,
> _a <- 10
Error: unexpected input in "_"
> 2a <- 10
Error: unexpected symbol in "2a"

Note also that R is case sensitive, so a and A are different objects
> a <- 10
> A <- 20
> a == A
[1] FALSE
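
One way to check a proposed name before using it is the base R function make.names(), which returns a syntactically valid version of whatever it is given. If the result is identical to the input, the name was already valid. This is a sketch for illustration only; the candidate names are made up.

```r
candidates <- c("a", "A", "my.data", "_a", "2a")

# A name is syntactically valid if make.names() leaves it unchanged
is.valid <- make.names(candidates) == candidates
names(is.valid) <- candidates
print(is.valid)
```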

The following is rarely sensible because it won't appear in the workspace, although it is there,
> .a <- 10
> ls()
[1] "a" "b" "f"
> .a
[1] 10
> rm(.a, A)             # Removes the objects .a and A (see below)
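
Objects whose names begin with a full stop can still be listed by setting the all.names argument of ls() to TRUE. The sketch below uses a separate, temporary environment (an assumption made here so that nothing in your own workspace is touched); the object names are illustrative.

```r
# A scratch environment, so the example does not alter the workspace
scratch <- new.env()
assign("a", 10, envir = scratch)
assign(".a", 10, envir = scratch)

visible    <- ls(envir = scratch)                    # hidden objects omitted
everything <- ls(envir = scratch, all.names = TRUE)  # hidden objects included
print(visible)
print(everything)
```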

1.4.3 Removing objects from the workspace

From typing ls() we know when the workspace is not empty. To remove an object from the
workspace it can be referenced explicitly – as in rm(A) – or indirectly by its position in the
workspace. To see how the second of these options will work, type
> ls()

[1] "a" "b" "f"

The output returned from the ls() function is here a vector of length three where the first element is
the first object (alphabetically) in the workspace, the second is the second object, and so forth. We
can access specific elements by using notation of the form ls()[index.number]. So, the first element –
the first object in the workspace – can be obtained using,
> ls()[1]               # Get the brackets right! Some rounded, some square
[1] "a"
> ls()[2]
[1] "b"

Note how the square brackets […] are used to reference specific elements within the vector.
Similarly,

> ls()[3]
[1] "f"
> ls()[c(1,3)]
[1] "a" "f"
> ls()[c(1,2,3)]
[1] "a" "b" "f"
> ls()[c(1:3)]          # 1:3 means the numbers 1 to 3
[1] "a" "b" "f"
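
The same square-bracket notation works on any vector, not only on the output of ls(). A minimal sketch with a made-up character vector standing in for the workspace listing:

```r
obj.names <- c("a", "b", "f")   # stands in for the output of ls()

first.and.third <- obj.names[c(1, 3)]   # select by position
all.but.second  <- obj.names[-2]        # a negative index drops that element
one.to.three    <- obj.names[1:3]       # 1:3 is the same as c(1, 2, 3)

print(first.and.third)
print(all.but.second)
print(one.to.three)
```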

Using the remove function, rm(...), the first and third objects in the workspace can be removed
using

> rm(list=ls()[c(1,3)])
> ls()
[1] "b"

Alternatively, objects can be removed by name
> rm(b)

To delete all the objects in the workspace and therefore empty it, type the following code but – be
warned! – there is no undo function. Whenever rm(...) is used the objects are deleted permanently.
> rm(list=ls())
> ls()
character(0)
# In other words, the workspace is empty
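
A less drastic option than emptying the workspace is to remove only those objects whose names match a pattern, using the pattern argument of ls(). The sketch below works in a temporary environment and the object names are hypothetical; drop the envir arguments to apply the same idea to the workspace itself.

```r
scratch <- new.env()
assign("tmp.x", 1, envir = scratch)
assign("tmp.y", 2, envir = scratch)
assign("results", 3, envir = scratch)

# Names beginning with "tmp" (the pattern is a regular expression)
to.remove <- ls(envir = scratch, pattern = "^tmp")
print(to.remove)   # check the list before deleting anything

rm(list = to.remove, envir = scratch)
remaining <- ls(envir = scratch)
print(remaining)
```

Printing the list of names first is a cheap safeguard, given that rm() cannot be undone.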

1.4.4 Saving and loading workspaces

Because objects are deleted permanently, a sensible precaution prior to using rm(...) is to save the
workspace. To do so permits the workspace to be reloaded if necessary and the objects recovered.
One way to save the workspace is to use
> save.image(file.choose(new=T))

Alternatively, the drop-down menus can be used (File → Save Workspace in the Windows version
of the RGui). In either case, type the extension .RData manually, otherwise it risks being omitted,
making it harder to locate and reload what has been saved. Try creating a couple of objects in your
workspace and then save it with the name workspace1.RData.
To load a previously saved workspace, use
> load(file.choose())

or the drop-down menus.
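
If you prefer to name the file in the command rather than pick it from a dialogue box, a path can be given directly. The following is a sketch only: it writes to R's temporary directory, uses save() to store two named objects (save.image() stores the whole workspace in the same format), and the object names are made up.

```r
x <- 1:10
y <- "some text"

# Build a file name that includes the .RData extension
workspace.file <- file.path(tempdir(), "workspace1.RData")
save(x, y, file = workspace.file)

rm(x, y)                 # the objects are gone...
load(workspace.file)     # ...and are now restored from the file
print(x)
print(y)
```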
When quitting R, it will prompt to save the workspace image. If the option to save is chosen it will
be saved to the file .RData within the working directory. Assuming that directory is the default one,
the workspace and all the objects it contains will be reloaded automatically each and every time R is
opened, which could be useful but also potentially irritating. To stop it, locate and delete the file.
The current working directory is identified using the get working directory function, getwd() and
changed most easily using the drop-down menus.
> getwd()
[1] "/Users/rich_harris"

(Your working directory will differ from the above)
Tip: A good strategy for file management is to create a new folder for each project in R, saving the
workspace regularly using a naming convention such as Dec_8_1.RData, Dec_8_2.RData etc. That
way you can easily find and recover work.

1.5 Quitting R
Before quitting R, you may wish to save the workspace. To quit R use either the drop-down menus
or
> q()

As noted in Section 1.4.4 ('Saving and loading workspaces'), you will be asked whether to save the
workspace. Answering yes will save the workspace to the file .RData in the current working
directory. To exit without the prompt, use
> q(save = "no")

Or, more simply,
> q("no")

1.6 Getting Help
In addition to the use of the ? or help(…) documentation and the material available at CRAN,
http://cran.r-project.org/, R has an active user community. Helpful mailing lists can be accessed
from www.r-project.org/mail.html.
Perhaps the best all-round introduction to R is An Introduction to R, which is freely available at
CRAN (http://cran.r-project.org/manuals.html) or by using the drop-down Help menus in the RGui.
It is clear and succinct.
I also have a free introduction to statistical analysis in R which accompanies the book Statistics for
Geography and Environmental Science. It can be obtained from
http://www.social-statistics.org/?p=354.
There are many books available. My favourite, with a moderate level statistical leaning and written
with clarity is,
Maindonald, J. & Braun, J., 2007. Data Analysis and Graphics using R (2nd edition). Cambridge:
CUP.
I also find useful,
Adler, J., 2010. R in a Nutshell. O'Reilly: Sebastopol, CA.
Crawley, MJ, 2005. Statistics: An Introduction using R. Chichester: Wiley (which is a shortened
version of The R Book by the same author).

Field, A., Miles, J. & Field, Z., 2012. Discovering Statistics Using R. London: Sage.
However, none of these books is about mapping or spatial analysis (of particular interest to me as a
geographer). For that, the authoritative guide making the links between geographical information
science, geographical data analysis and R (but not really written for R newcomers) is,
Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R.
Berlin: Springer.
Also helpful is,
Ward, M.D. & Skrede Gleditsch, K., 2008. Spatial Regression Models. London: Sage. (Which uses
R code examples).
And
Chun, Y. & Griffith, D.A., 2013. Spatial Statistics and Geostatistics. London: Sage. (I found this
book a little eccentric but it contains some very good tips on its subject and gives worked examples
in R).
The following book has a short section of maps as well as other graphics in R (and is also, as the
title suggests, good for practical guidance on how to analyse surveys using cluster and stratified
sampling, for example):
Lumley, T., 2010. Complex Surveys. A Guide to Analysis Using R. Hoboken, NJ: Wiley.

Springer publish an ever-growing series of books under the banner Use R! If you are interested in
visualization, time-series analysis, Bayesian approaches, econometrics, data mining, …, then you'll
find something of relevance at http://www.springer.com/series/6991. But you may well also find
what you are looking for, free of charge, on the Internet.

Session 2: A Geographical Demonstration of R
This session provides a quick tour of some of R's functionality, with a focus on some geographical
applications. The idea here is to showcase a little of what R can do rather than providing a
comprehensive explanation of all that is going on. Aim for an intuitive understanding of the
commands and procedures but do not worry about the detail. More information about the workings
of R is given in the next session. More details about how to use R as a GIS and for spatial analysis
are given in Sessions 4, 5 and 6.
Note: this session assumes the libraries RgoogleMaps, png, sp and spdep are installed and available
for use. You can find out which packages you currently have installed by using
> row.names(installed.packages())

If the packages cannot be found then they can be installed using
install.packages(c("RgoogleMaps","png","sp","spdep")). Note that you may need administrative
rights on your computer to install the packages.
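
To see at a glance which of the four packages are still missing, the installed list can be compared with the required list. This is a sketch in base R; the object names needed and not.installed are illustrative, and the install.packages() call is left commented out so that nothing is installed unless you choose to.

```r
needed <- c("RgoogleMaps", "png", "sp", "spdep")

# Packages in 'needed' that are not among those currently installed
not.installed <- setdiff(needed, row.names(installed.packages()))

if (length(not.installed) == 0) {
  message("All required packages are installed")
} else {
  message("Still to install: ", paste(not.installed, collapse = ", "))
  # install.packages(not.installed)   # uncomment to install them
}
```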

2.1 Getting Started
As the focus of this session is on showing what R can do rather than teaching you how to do it,
you are not required to type a series of commands. Instead, they can be executed automatically
from a previously written source file (a script: see Section 1.3.1). As the commands are
executed we will ask R to echo (print) them to the screen so you can follow what is going on. At
regular intervals you will be prompted to press return before the script continues.
To begin, type,
> source(file.choose(), echo=T)

and load the source file session2.R. After some comments that you should ignore, you will be
prompted to load the .csv file schools.csv:
> ## Read in the file schools.csv file
> wait()
Please presss return
schools.data <- read.csv(file.choose())

Assuming there is no error, we will now proceed to a simple inspection of the data. Remember: the
commands you see written below are the ones that appear in the source file. You do not need to type
them yourself for this session.

2.2 Checking the data
It is always sensible to check a data table for any obvious errors.
> head(schools.data)    # Shows the first few rows of the data
> tail(schools.data)    # Shows the bottom few rows of the data

We can produce a summary of each column in the data table using
> summary(schools.data)

In this instance, each column is a continuous variable so we obtain a six-number summary of the
centre and spread of each variable.
The names of the variables are

> names(schools.data)

Next the number of columns and rows; and a check – row-by-row – to see if the data are complete
(have no missing data).
> ncol(schools.data)
> nrow(schools.data)
> complete.cases(schools.data)

It is not the most comprehensive check but everything appears to be in order.
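
With several hundred rows, the TRUE/FALSE output of complete.cases() is hard to scan by eye. A shorter check is to count the incomplete rows. The sketch below uses a small made-up data frame rather than schools.data so that it can be run on its own; the values are hypothetical.

```r
# Hypothetical data with one missing value
toy.data <- data.frame(fsm = c(0.10, 0.25, NA, 0.40),
                       attainment = c(29.1, 27.8, 28.3, 26.0))

n.incomplete <- sum(!complete.cases(toy.data))    # rows with any missing value
all.complete <- all(complete.cases(toy.data))     # TRUE only if no row has missing data
missing.by.column <- colSums(is.na(toy.data))     # missing values per column

print(n.incomplete)
print(all.complete)
print(missing.by.column)
```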

2.3 Some simple graphics
The file schools.csv contains information about the location and some attributes of schools in
Greater London (in 2008). The locations are given as a grid reference (Easting, Northing). The
information is not real but is realistic. It should not, however, be used to make inferences about real
schools in London.
Of particular interest is the average attainment on leaving primary school (elementary school) of
pupils entering their first year of secondary school. Do some schools in London attract higher
attaining pupils more than others? The variable attainment contains this information.
A stripchart and then a histogram will show that (not surprisingly) there is variation in the average
prior attainment by school.

> attach(schools.data)
> stripchart(attainment, method="stack", xlab="Mean Prior Attainment by School")
> hist(attainment, col="light blue", border="dark blue", freq=F, ylim=c(0,0.30),
+ xlab="Mean attainment")

Here the histogram is scaled so the total area sums to one. To this we can add a rug plot,
> rug(attainment)

also a density curve, a Normal curve for comparison and a legend.

> lines(density(sort(attainment)))
> xx <- seq(from=23, to=35, by=0.1)
> yy <- dnorm(xx, mean(attainment), sd(attainment))
> lines(xx, yy, lty="dotted")
> rm(xx, yy)
> legend("topright", legend=c("density curve","Normal curve"),
+ lty=c("solid","dotted"))
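
The statement that the histogram's total area sums to one when freq=F can be verified numerically. This is a sketch using simulated values (an assumption; the real attainment data are not used), with the bars computed but not drawn.

```r
set.seed(1)
sim.attainment <- rnorm(367, mean = 28, sd = 1.6)   # simulated, not the real data

h <- hist(sim.attainment, plot = FALSE)

# Area of each bar = height (density) x width; the areas should sum to 1
bar.areas <- h$density * diff(h$breaks)
total.area <- sum(bar.areas)
print(total.area)
```

Scaling the bars this way is what allows the density curve and the Normal curve to be drawn on the same vertical axis as the histogram.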

It would be interesting to know if attainment varies by school type. A simple way to consider this is
to produce a box plot. The data contain a series of dummy variables for each of a series of school
types (Voluntary Aided Church of England school: coe = 1; Voluntary Aided Roman Catholic: rc =
1; Voluntary controlled faith school: vol.con = 1; another type of faith school: other.faith = 1; a
selective school (sets an entrance exam): selective = 1). We will combine these into a single,
categorical variable then produce the box plot showing the distribution of average attainment by
school type.
First the categorical variable:
> school.type <- rep("Not Faith/Selective", times=nrow(schools.data))
# This gives each school an initial value which will
# then be replaced with its actual type
> school.type[coe==1] <- "VA CoE"
# Voluntary Aided Church of England schools are given
# the category VA CoE

> school.type[rc==1] <- "VA RC"
# Voluntary Aided Roman Catholic schools are given
# the category VA RC [etc.]
> school.type[vol.con==1] <- "VC"
> school.type[other.faith==1] <- "Other Faith"
> school.type[selective==1] <- "Selective"
> school.type <- factor(school.type)
> levels(school.type)
# A list of the categories
[1] "Not Faith/Selective" "Other Faith" "Selective" [etc.]

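The logic of the recoding can be seen more easily on a handful of made-up schools. In this sketch the two dummy variables and their values are hypothetical; the steps mirror those used for school.type.

```r
# Hypothetical dummy variables for five schools
toy.coe       <- c(1, 0, 0, 0, 1)
toy.selective <- c(0, 0, 1, 0, 0)

# Start every school in the default category...
toy.type <- rep("Not Faith/Selective", times = length(toy.coe))
# ...then overwrite the entries where a dummy variable equals 1
toy.type[toy.coe == 1] <- "VA CoE"
toy.type[toy.selective == 1] <- "Selective"

toy.type <- factor(toy.type)
print(table(toy.type))   # number of schools in each category
```

Note that the order of the replacements matters if a school could have more than one dummy equal to 1: the last matching line wins.
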
Now the box plots:
> par(mai=c(1,1.4,0.5,0.5))
# Changes the graphic margins
> boxplot(attainment ~ school.type, horizontal=T, xlab="Mean attainment", las=1,
+ cex.axis=0.8)
# Includes options to draw the boxes and labels horizontally
> abline(v=mean(attainment), lty="dashed")
# Adds the mean value to the plot
> legend("topright", legend="Grand Mean", lty="dashed")

Not surprisingly, the selective schools (those with an entrance exam) recruit the pupils with highest
average prior attainment.

Figure 2.1. A histogram with annotation in R

Figure 2.2. Mean prior attainment by school type

2.4 Some simple statistics
It appears (in Figure 2.2) that there are differences in the levels of prior attainment of pupils in
different school types. We can test whether the variation is significant using an analysis of variance.
> summary(aov(attainment ~ school.type))
             Df Sum Sq Mean Sq F value Pr(>F)
school.type   5  479.8   95.95   71.42 <2e-16 ***
Residuals   361  485.0    1.34

It is, at greater than 99.9% confidence (F = 71.42, p < 0.001).
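
The F value in the table can be reproduced from the other columns: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below retypes the rounded figures from the output above, so the result agrees only to within rounding.

```r
ss.between <- 479.8; df.between <- 5
ss.within  <- 485.0; df.within  <- 361

ms.between <- ss.between / df.between   # mean square for school.type
ms.within  <- ss.within / df.within     # mean square for the residuals
f.value <- ms.between / ms.within
print(f.value)

# Upper-tail probability of an F this large if there were no real differences
p.value <- pf(f.value, df.between, df.within, lower.tail = FALSE)
print(p.value)
```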
We might also be interested in comparing those schools with the highest and lowest proportions of
Free School Meal eligible pupils to see if they are recruiting pupils with equal or differing mean
prior attainment. We expect a difference because free school meal eligibility is used as an indicator
of a low income household and there is a link between economic disadvantage and educational
progress in the UK.
> attainment.high.fsm.schools <- attainment[fsm > quantile(fsm, probs=0.75)]
# Finds the attainment scores for schools with the highest proportions of FSM pupils
> attainment.low.fsm.schools <- attainment[fsm < quantile(fsm, probs=0.25)]
# Finds the attainment scores for schools with the lowest proportions of FSM pupils
> t.test(attainment.high.fsm.schools, attainment.low.fsm.schools)

        Welch Two Sample t-test

data: attainment.high.fsm.schools and attainment.low.fsm.schools
t = -15.0431, df = 154.164, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.437206 -2.639240
sample estimates:
mean of x mean of y
 26.58352  29.62174

It comes as little surprise to learn that those schools with the greatest proportions of FSM eligible
pupils are also those recruiting lower attaining pupils on average (mean attainment 26.6 vs 29.6, t =
-15.0, p < 0.001, the 95% confidence interval is from -3.44 to -2.64).
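
A quick consistency check on the t-test output: the confidence interval is for the difference between the two means (high FSM minus low FSM), so it should be centred on that difference and, here, lie entirely below zero. The numbers in this sketch are retyped from the output above.

```r
mean.high.fsm <- 26.58352
mean.low.fsm  <- 29.62174
ci <- c(-3.437206, -2.639240)

difference  <- mean.high.fsm - mean.low.fsm   # negative: high-FSM schools score lower
ci.midpoint <- mean(ci)

print(difference)
print(ci.midpoint)
```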
Exploring this further, the Pearson correlation between the mean prior attainment of pupils entering
each school and the proportion of them that are FSM eligible is -0.689, and significant (p < 0.001):
> round(cor(fsm, attainment),3)
> cor.test(fsm, attainment)
Pearson's product-moment correlation

data: fsm and attainment
t = -18.1731, df = 365, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7394165 -0.6313939
sample estimates:
cor
-0.6892159
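
The t statistic reported by cor.test() follows from the correlation coefficient r and the degrees of freedom, using t = r * sqrt(df / (1 - r^2)). The sketch below checks this with the figures in the output above.

```r
r  <- -0.6892159
df <- 365

t.value <- r * sqrt(df / (1 - r^2))
print(t.value)   # compare with the reported t = -18.1731

# Squaring r gives the proportion of variance in attainment associated with fsm
r.squared <- r^2
print(r.squared)
```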

Of course, the use of the Pearson correlation assumes that the relationship is linear, so let's check:
> plot(attainment ~ fsm)
> abline(lm(attainment ~ fsm))      # Adds a line of best fit (a regression line)

There is some suggestion the relationship might be curvilinear. However, we will ignore that here.
Finally, some regression models. The first seeks to explain the mean prior attainment scores for the
schools in London by the proportion of their intake who are free school meal eligible. (The result is
the line of best fit added to the scatterplot above).
The second model adds a variable giving the proportion of the intake of a white ethnic group.
The third adds a dummy variable indicating whether the school is selective or not.
> model1 <- lm(attainment ~ fsm, data=schools.data)
> summary(model1)
Call:
lm(formula = attainment ~ fsm, data = schools.data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8871 -0.7413 -0.1186  0.5487  3.6681

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  29.6190     0.1148  258.12   <2e-16 ***
fsm          -6.5469     0.3603  -18.17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.178 on 365 degrees of freedom
Multiple R-squared: 0.475, Adjusted R-squared: 0.4736
F-statistic: 330.3 on 1 and 365 DF, p-value: < 2.2e-16

> model2 <- lm(attainment ~ fsm + white, data=schools.data)
> summary(model2)

Call:
lm(formula = attainment ~ fsm + white, data = schools.data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.9442 -0.7295 -0.1335  0.5111  3.7837

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  30.1250     0.1979  152.21  < 2e-16 ***
fsm          -7.2502     0.4214  -17.20  < 2e-16 ***
white        -0.8722     0.2796   -3.12  0.00196 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.164 on 364 degrees of freedom
Multiple R-squared:  0.4887,    Adjusted R-squared:  0.4859
F-statistic: 173.9 on 2 and 364 DF,  p-value: < 2.2e-16

> model3 <- update(model2, . ~ . + selective)
# Means: take the previous model and add the variable 'selective'
> summary(model3)

Call:
lm(formula = attainment ~ fsm + white + selective, data = schools.data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.6262 -0.5620  0.0537  0.5607  3.6215

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  29.1706     0.1689 172.712   <2e-16 ***
fsm          -5.2381     0.3591 -14.586   <2e-16 ***
white        -0.2299     0.2249  -1.022    0.307
selective     3.4768     0.2338  14.872   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9189 on 363 degrees of freedom
Multiple R-squared:  0.6823,    Adjusted R-squared:  0.6796
F-statistic: 259.8 on 3 and 363 DF,  p-value: < 2.2e-16
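The adjusted R-squared penalises the multiple R-squared for the number of explanatory variables in the model. As a rough check, the usual formula can be applied to the rounded values printed in the three summaries (the object names are only for this illustration):

```r
# Values copied from the three model summaries
r.squared  <- c(model1 = 0.4750, model2 = 0.4887, model3 = 0.6823)
predictors <- c(1, 2, 3)     # number of explanatory variables in each model
n <- 367                     # number of schools

# Adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1)
adj.r.squared <- 1 - (1 - r.squared) * (n - 1) / (n - predictors - 1)

print(round(adj.r.squared, 4))
```

Any small difference from the printed values in the last decimal place arises because the R-squared values entered here have already been rounded.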

Looking at the adjusted R-squared value, each model appears to be an improvement on the one that
precedes it (marginally so for model 2). However, looking at the last (model 3), we may suspect that
we could drop the white ethnicity variable with no significant loss in the amount of variance
explained. An analysis of variance confirms that to be the case.
> model4 <- update(model3, . ~ . - white)
# Means: take the previous model but remove the variable 'white'
> anova(model4, model3)
Analysis of Variance Table

Model 1: attainment ~ fsm + selective
Model 2: attainment ~ fsm + white + selective
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    364 307.42
2    363 306.54  1   0.88222 1.0447 0.3074

The residual error, measured by the residual sum of squares (RSS), is not very different for the two
models, and that difference, 0.882, is not significant (F = 1.045, p = 0.307).
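The F statistic can be reproduced by hand from the figures in the table. It compares the reduction in the residual sum of squares per extra parameter with the residual variance of the larger model. A minimal sketch using the printed values (the object names are made up for this example):

```r
# Values copied from the analysis of variance table
rss.full <- 306.54    # residual sum of squares, model including 'white'
sum.sq   <- 0.88222   # reduction in RSS from including 'white'
df.diff  <- 1         # one extra parameter in the larger model
df.full  <- 363       # residual degrees of freedom of the larger model

# F = (change in RSS per extra parameter) / (residual variance of larger model)
f.stat <- (sum.sq / df.diff) / (rss.full / df.full)

# Upper-tail probability from the F distribution
p.value <- pf(f.stat, df.diff, df.full, lower.tail = FALSE)

print(round(c(F = f.stat, p = p.value), 4))
```

Because only one variable is dropped, F is the square of the t value reported for white in model 3 (-1.022 squared is about 1.045), which is why the two tests give the same p-value of 0.307.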

2.5 Some simple maps
For a geographer like myself, R becomes more interesting when we begin to look at its
geographical data handling capabilities.

The schools data contain geographical coordinates and are therefore geographical data.
Consequently they can be mapped. The simplest way for point data is to use a 2-dimensional plot,
making sure the aspect ratio is fixed correctly.
> plot(Easting, Northing, asp=1, main="Map of London schools")
# The argument asp=1 fixes the aspect ratio correctly

Amongst the attribute data for the schools, the variable esl gives the proportion of pupils who speak
English as an additional language. It would be interesting for the size of the symbol on the map to
be proportional to it.
> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5))
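A likely reason for the square root is that cex scales the width of a plotting symbol, so the symbol's area grows roughly with the square of cex. Taking the square root therefore keeps the area, rather than the width, in proportion to the variable. A minimal sketch, using made-up proportions rather than the schools data:

```r
# Hypothetical proportions, not taken from the schools data
esl.example <- c(0.1, 0.2, 0.4, 0.8)

# Symbol size, calculated in the same way as in the plot command
cex.values <- sqrt(esl.example * 5)

# cex scales symbol width, so (approximate) symbol area scales with cex squared
relative.area <- cex.values^2

print(data.frame(esl = esl.example,
                 cex = round(cex.values, 2),
                 area = relative.area))
```

Doubling the proportion doubles the value in the area column. Without the square root it would quadruple, which would exaggerate the differences between schools.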

It would also be nice to add a little colour to the map. We might, for example, change the default
plotting 'character' to a filled circle with a yellow background.
> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5), pch=21, bg="yellow")

A more interesting option would be to have the circles filled with a colour gradient that is related to
a second variable in the data – the proportion of pupils eligible for free school meals for example.
To achieve this, we can begin by creating a simple colour palette:
> palette <- c("yellow","orange","red","purple")

We now cut the free school meals eligibility variable into quartiles (four classes, each containing
approximately the same number of observations).
> map.class <- cut(fsm, quantile(fsm), labels=FALSE, include.lowest=TRUE)

The result is to split the fsm variable into four groups with the value 1 given to the first quarter of
the data (schools with the lowest proportions of eligible pupils), the value 2 given to the next
quarter, then 3, and finally the value 4 for schools with the highest proportions of FSM eligible
pupils.
There are now four map classes and the same number of colours in the palette. Schools in
map class 1 (those with the lowest proportions of FSM eligible pupils) will be coloured yellow, the
next class will be orange, and so forth.
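The classification and colour lookup can be tried on simulated values to see what each step returns. In this sketch fsm.example and point.colours are hypothetical names, and the data are random numbers rather than the schools data:

```r
set.seed(1)                      # makes the simulated values reproducible
fsm.example <- runif(100)        # hypothetical proportions, not the schools data

palette <- c("yellow", "orange", "red", "purple")

# quantile() returns the minimum, the three quartiles and the maximum,
# which cut() then uses as the break points for four classes
breaks <- quantile(fsm.example)
map.class <- cut(fsm.example, breaks, labels=FALSE, include.lowest=TRUE)

print(table(map.class))          # number of observations in each class

# Indexing the palette by class gives one colour per observation
point.colours <- palette[map.class]
print(head(data.frame(fsm = round(fsm.example, 3), map.class, point.colours)))
```

Without include.lowest=TRUE the observation with the smallest value would fall outside the first interval and be given the class NA, so it would be plotted without a fill colour.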
Bringing it all together,
> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5), pch=21, bg=palette[map.class])

It would be good to add a legend, and perhaps a scale bar and North arrow. Nevertheless, as a first
map in R this isn't too bad!

Figure 2.3. A simple point map in R

Why don't we be a bit more ambitious and overlay the map on a Google Maps tile, adding a legend
as we do so? This requires us to load an additional library for R and to have an active Internet
connection.
> library(RgoogleMaps)

If you get an error such as the following
Error in library(RgoogleMaps) : there is no package called ‘RgoogleMaps’

it is because the library has not been installed.
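One way to deal with this, sketched below, is to install a package only when it is missing and then load it. The function name load.or.install is made up for this example; requireNamespace(), install.packages() and library() are standard R functions. Installing a package requires an Internet connection.

```r
# Hypothetical helper: install a package if it is not yet available, then load it
load.or.install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)        # downloads the package; needs an Internet connection
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Demonstrated with a package that is always supplied with R
load.or.install("stats")
```

For the map that follows, the equivalent call would be load.or.install("RgoogleMaps").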

Assuming that the data frame, schools.data, remains in the workspace and attached (it will be if you
have followed the instructions above), and that the colour palette created above has not been
deleted, then the map shown in Figure 2.4 is created with the following code:
> MyMap <- MapBackground(lat=Lat, lon=Long)
> PlotOnStaticMap(MyMap, Lat, Long, cex=sqrt(esl*5), pch=21,
+ bg=palette[map.class])
> legend("topleft", legend=paste("<",tapply(fsm, map.class, max)),
+ pch=21, pt.bg=palette, pt.cex=1.5, bg="white", title="P(FSM-eligible)")
> legVals <- seq(from=0.2,to=1,by=0.2)
> legend("topright", legend=round(legVals,3), pch=21, pt.bg="white",
+ pt.cex=sqrt(legVals*5), bg="white", title="P(ESL)")

(If you are running the script for this session then the code you see on-screen will differ slightly.
That is because it includes some error trapping in case there is no Internet connection
available.)
Remember that the data are simulated. The points shown on the map are not the true locations of
schools in London. Do not worry about understanding the code in detail – the purpose is to see the
sort of things R can do with geographical data. We will look more closely at the detail in later
sessions.

Figure 2.4. A slightly less simple map produced in R

2.6 Some simple geographical analysis
Remember the regression models from earlier? It would be interesting to test the assumption that
the residuals exhibit independence by looking for spatial dependencies. To do this we will consider
to what degree the residual value for any one school correlates with the mean residual value for its
six nearest other schools (the choice of six is completely arbitrary).
First, we will take a copy of the schools data and convert it into an explicitly spatial object in R:
> detach(schools.data)
> schools.xy <- schools.data
> library(sp)
> attach(schools.xy)
> coordinates(schools.xy) <- c("Easting", "Northing")
> # Converts into a spatial object
> class(schools.xy)
> detach(schools.xy)
> proj4string(schools.xy) <- CRS("+proj=tmerc +datum=OSGB36")
> # Sets the Coordinate Referencing System

Second, we find the six nearest neighbours for each school.
> library(spdep)
> nearest.six <- knearneigh(schools.xy, k=6, RANN=F)
> # RANN = F to override the use of the RANN package that may not be installed

We can learn from this that the six nearest schools to the first school in the data (row 1) are schools
5, 38, 2, 40, 223 and 6:
> nearest.six$nn[1,]
[1]   5  38   2  40 223   6
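To see the principle behind knearneigh, the same idea can be sketched in base R for a handful of made-up points: calculate every pairwise distance, then for each point keep the indices of the k closest other points. This illustrates the idea only and is not the code used inside spdep.

```r
# Five hypothetical points (not from the schools data)
xy <- cbind(x = c(0, 1, 2, 3, 10),
            y = c(0, 0, 0, 0, 0))

k <- 2
d <- as.matrix(dist(xy))     # matrix of pairwise straight-line distances
diag(d) <- Inf               # a point is not its own neighbour

# For each row, the indices of the k smallest distances, nearest first
nn <- t(apply(d, 1, function(row) order(row)[1:k]))

print(nn)
```

Row 1 of the result lists points 2 and 3, in the same way that row 1 of nearest.six$nn lists the six schools closest to the first school.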

The neighbours object, nearest.six, is an object of class knn:
> class(nearest.six)

It is next converted into the more generic class of neighbours.

> neighbours <- knn2nb(nearest.six)
> class(neighbours)
[1] "nb"
> summary(neighbours)
Neighbour list object:
Number of regions: 367
Number of nonzero links: 2202
Percentage nonzero weights: 1.634877
Average number of links: 6
[etc.]

The connections between each point and its neighbours can then be plotted. It may take a few
minutes.
> plot(neighbours, coordinates(schools.xy))

Having identified the six nearest neighbours to each school we could give each equal weight in a
spatial weights matrix or, alternatively, decrease the weight with distance away (so the first nearest
neighbour gets most weight and the sixth nearest the least). Creating a matrix with equal weight
given to all neighbours is sufficient for the time being.
> spatial.weights <- nb2listw(neighbours)

(The other possibility is achieved by creating then supplying a list of general weights to the
function, see ?nb2listw)
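The difference between the two weighting schemes can be sketched for a single school. The distances below are invented, and inverse distance is only one possible way of letting the weight decrease with distance; it is chosen here for illustration.

```r
# Hypothetical distances (in metres) from one school to its six nearest neighbours
distances <- c(250, 400, 650, 900, 1200, 1500)

# Equal weights, scaled so that they sum to one
equal.weights <- rep(1 / length(distances), length(distances))

# Inverse-distance weights, also scaled to sum to one
inverse.weights <- (1 / distances) / sum(1 / distances)

print(round(rbind(equal = equal.weights, inverse = inverse.weights), 3))
```

With equal weights each neighbour contributes one sixth to the neighbourhood mean. With inverse-distance weights the nearest neighbour contributes most and the sixth nearest the least. In spdep, weights of the second kind would be supplied to nb2listw through its glist argument.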
We now have all the information required to test whether there are spatial dependencies in the
residuals. The answer is yes (Moran's I = 0.218, p < 0.001, indicating positive spatial
autocorrelation).
> lm.morantest(model4, spatial.weights)
        Global Moran's I for regression residuals

data:
model: lm(formula = attainment ~ fsm + selective, data = schools.data)
weights: spatial.weights

Moran I statistic standard deviate = 7.9152, p-value = 1.235e-15
alternative hypothesis: greater
sample estimates:
Observed Moran's I        Expectation           Variance
      0.2181914682      -0.0038585704       0.0007870118
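The observed statistic is straightforward to calculate. The sketch below does so in base R for six made-up values along a line, giving equal, row-standardised weights to each point's two nearest neighbours. It illustrates the observed Moran's I only; the expectation and variance that lm.morantest reports for regression residuals involve further terms that are not reproduced here.

```r
# Six hypothetical locations along a line, with a value that rises steadily
location <- 1:6
value <- c(10, 20, 30, 40, 50, 60)
n <- length(value)

# Row-standardised weights: 1/2 for each of the two nearest neighbours, else 0
d <- as.matrix(dist(location))
diag(d) <- Inf
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, order(d[i, ])[1:2]] <- 1 / 2
}

z <- value - mean(value)            # deviations from the mean
lagged <- as.vector(W %*% z)        # mean deviation among each point's neighbours

# Moran's I: (n / sum of weights) * cross-product of z and its lag / sum of squares
morans.i <- (n / sum(W)) * sum(z * lagged) / sum(z^2)

print(round(morans.i, 3))
```

Neighbouring values are alike in this example, so the statistic is positive, as it is for the residuals of the schools model.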

2.7 Tidying up
It is better to save your workspace regularly whilst you are working (see Section 1.4.4, 'Saving and
loading workspaces', page 10) and certainly before you finish. Don't forget to include the
extension .RData when saving. Having done so, you can tidy-up the workspace.
> save.image(file.choose(new=T))
> rm(list=ls())
# Be careful, it deletes everything!

2.8 Further Information

A simple introduction to graphics and statistical analysis in R is given in Statistics for Geography
and Environmental Science: An Introduction in R, available at
http://www.social-statistics.org/?p=354.

Download the full version from
https://www.dropbox.com/sh/zzibpn2keilrhv3/rsuA7L_jlK

An introduction to Mapping and Spatial Modelling in R (preview version)

  • 1. An Introduction to Mapping and Spatial Modelling R (preview version) By and © Richard Harris, School of Geographical Sciences, University of Bristol An Introduction to Mapping and Modelling R (preview version) by Richard Harris is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at www.social-statistics.org. You are free: to Share — to copy, distribute and transmit the work to Remix — to adapt the work Under the following conditions: Attribution — You must attribute the work in the following manner: Based on An Introduction to Mapping and Spatial Modelling R by Richard Harris (www.social-statistics.org). Noncommercial — You may not use this work for commercial purposes. Use for education in a recognised higher education institution (a College or University) is permissible. Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. With the understanding that: Waiver — Any of the above conditions can be waived if you get permission from the copyright holder (Richard Harris, rich.harris@bris.ac.uk) Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license. Other Rights — In no way are any of the following rights affected by the license: Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; The author's moral rights; Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights. Notice — For any reuse or distribution, you must make clear to others the license terms of this work which applies also to derivatives. (Document version 0.1, November, 2013. Draft preview version.) An Introduction to Mapping and Spatial Modelling in R. © Richard Harris, 2013 1
  • 2. Introduction and contents
    This document presents a short introduction to R highlighting some geographical functionality. Specifically, it provides:
    • A basic introduction to R (Session 1)
    • A short 'showcase' of using R for data analysis and mapping (Session 2)
    • Further information about how R works (Session 3)
    • Guidance on how to use R as a simple GIS (Session 4)
    • Details on how to create a spatial weights matrix (Session 5)
    • An introduction to spatial regression modelling including Geographically Weighted Regression (Session 6)
    Further sessions will be added in the months (more likely, years) ahead.
    The document is provided in good faith and the contents have been tested by the author. However, use is entirely at the user's risk. Absolutely no responsibility or liability is accepted by the author for consequences arising from this document howsoever it is used. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (see above).
    Before starting, the following should be considered.
    First, you will notice that in this document the pages and, more unusually, the lines are numbered. The reason is educational: it makes directing a class to a specific part of a page easier and faster. For other readers, the line numbers can be ignored.
    Second, the sessions presume that, as well as R, a number of additional R packages (libraries) have been installed and are available to use. You can install them by following the 'Before you begin' instructions below.
    Third, each session is written to be completed in a single sitting. If that is not possible, then it would normally be possible to stop at a convenient point, save the workspace before quitting R, then reload the saved workspace when you wish to continue. Note, however, that whereas the additional packages (libraries) need be installed only once, they must be loaded each time you open R and require them. Any objects that were attached before quitting R also need to be attached again to take you back to the point at which you left off.
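The difference between installing a package once and loading it in every session can be sketched as follows. This is only an illustration: the four package names are the ones listed for Session 2 later in this document, the object names needed, installed and not.installed are made up for the example, and the install step is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Packages that Session 2 of this document assumes are available
needed <- c("RgoogleMaps", "png", "sp", "spdep")

# Which of them are already installed on this computer?
installed <- rownames(installed.packages())
not.installed <- setdiff(needed, installed)

if (length(not.installed) > 0) {
  message("Not yet installed: ", paste(not.installed, collapse = ", "))
  # install.packages(not.installed)   # needed only once per computer
}

# Loading, by contrast, has to be repeated in every new R session
for (pkg in intersect(needed, installed)) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}
```

Running the sketch a second time in the same session does no harm: packages that are already loaded are simply left as they are.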
  • 3. Before you begin
    Install R. It can be downloaded from http://cran.r-project.org/. I am currently using version 3.0.2.
    Start R. Use the drop-down menus to change your working directory to somewhere you are happy to download all the files you need for this tutorial.
    At the > prompt type
    download.file("http://dl.dropboxusercontent.com/u/214159700/RIntro.zip", "Rintro.zip")
    and press return. Next, type
    unzip("Rintro.zip")
    All the data you need for the sessions are now available in the working directory.
    If you would like to install all the libraries (packages) you need for these practicals, type
    load("begin.RData")
    and then
    install.libs()
    Please note: this is a draft version of the document and has not as yet been thoroughly checked for typos and other errors.
    The full and most up-to-date version can be downloaded from https://www.dropbox.com/sh/zzibpn2keilrhv3/rsuA7L_jlK
  • 5. Session 1: Getting Started with R
    This session provides a brief introduction to how R works and introduces some of the more common commands and procedures. Don't worry if not everything is clear at this stage. The purpose is to get you started, not to make you an expert user. If you would prefer to jump straight to seeing R in action, then move on to Session 2 (p.13) and come back to this introduction later.
    1.1 About R
    R is an open source software package, licensed under the GNU General Public Licence. You can obtain and install it for free, with versions available for PCs, Macs and Linux. To find out what is available, go to the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/
    Being free is not necessarily a good reason to use R. However, R is also well developed, well documented, widely used and well supported by an extensive user community. It is not just software for 'hobbyists'. It is widely used in research, both academic and commercial. It has well developed capabilities for mapping and spatial analysis.
    In his book R in a Nutshell (O'Reilly, 2010), Joseph Adler writes, “R is very good at plotting graphics, analyzing data, and fitting statistical models using data that fits in the computer's memory.” Nevertheless, no software provides the perfect tool for every job and Adler adds that “it's not good at storing data in complicated structures, efficiently querying data, or working with data that doesn't fit in the computer's memory.”
    To these caveats it should be added that R does not offer spreadsheet editing of data of the type found, for example, in Microsoft Excel. Consequently, it is often easier to prepare and 'clean' data prior to loading them into R. There is an add-in to R that provides some integration with Excel. Go to http://rcom.univie.ac.at/ and look for RExcel.
    A possible barrier to learning R is that it is generally command-line driven. That is, the user types a command that the software interprets and responds to. This can be daunting for those who are used to extensive graphical user interfaces (GUIs) with drop-down menus, tabs, pop-up menus, left or right-clicking and other navigational tools to steer you through a process. It may mean that R takes a while longer to learn; however, that time is well spent. Once you know the commands it is usually much faster to type them than to work through a series of menu options. They can be easily edited to change things such as the size or colour of symbols on a graph, and a log or script of the commands can be saved for use on another occasion or for sharing with others.
    That said, a fairly simple and platform independent GUI called R Commander can be installed (see http://cran.r-project.org/web/packages/Rcmdr/index.html). Field et al.'s book Discovering Statistics Using R provides a comprehensive introduction to statistical analysis in R using both command-lines and R Commander.
    1.2 Getting Started
    Assuming R has been installed in the normal way on your computer, clicking on the link/shortcut to R on the desktop will open the RGui, offering some drop-down menu options, and also the R Console, within which R commands are typed and executed. The appearance of the RGui differs a little depending upon the operating system being used (Windows, Mac or Linux) but having used one it should be fairly straightforward to navigate around another.
  • 6. Figure 1.1. Screen shot of the R Gui for Windows
    1.2.1 Using R as a calculator
    At its simplest, R can be used as a calculator. Typing 1 + 1 after the prompt > will (after pressing the return/enter key, ↵) produce the result 2, as in the following example:
    > 1 + 1
    [1] 2
    Comments can be indicated with a hash tag and will be ignored
    > # This is a comment, no need to type it
    Some other simple mathematical expressions are given below.
    > 10 - 5
    [1] 5
    > 10 * 2
    [1] 20
    > 10 - 5 * 2       # The order of operations gives priority to multiplication
    [1] 0
    > (10 - 5) * 2     # The use of brackets changes the order
    [1] 10
    > sqrt(100)        # Uses the function that calculates the square root
    [1] 10
    > 10^2             # 10 squared
    [1] 100
    > 100^0.5          # 100 to the power 0.5, i.e. the square root again
    [1] 10
    > 10^3
    [1] 1000
    > log10(100)       # Uses the function that calculates the common log
    [1] 2
    > log10(1000)
    [1] 3
    > 100 / 5
    [1] 20
    > 100^0.5 / 5
    [1] 2
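The order of operations used in the calculator examples can be explored a little further with the short sketch below. It is an illustration rather than part of the original exercises; the object name results is made up for the example.

```r
# Multiplication and division are evaluated before addition and subtraction
10 - 5 * 2        # the same as 10 - (5 * 2)
(10 - 5) * 2      # brackets force the subtraction to happen first

# Exponentiation is evaluated before multiplication and division
100^0.5 / 5       # the same as (100^0.5) / 5
100^(0.5 / 5)     # a different calculation: 100 to the power 0.1

# Exponentiation is also evaluated before a leading minus sign
-2^2              # the same as -(2^2)
(-2)^2            # brackets make the base negative

# Collect some of the answers in one vector for inspection
results <- c(10 - 5 * 2, (10 - 5) * 2, 100^0.5 / 5, -2^2, (-2)^2)
print(results)
```

When in doubt about the order, adding brackets costs nothing and makes the intention clear to anyone reading the script later.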
  • 7. 1.2.2 Incomplete commands
    If you see the + symbol instead of the usual (>) prompt it is because what has been typed is incomplete. Often there is a missing bracket. For example,
    > sqrt(
    + 100
    + )
    [1] 10
    > (1 + 2) * (5 - 1     # The + symbol indicates that the command is incomplete
    + )
    [1] 12
    Commands broken over multiple lines can be easier to read.
    > for (i in 1:10) {    # This is a simple loop
    + print(i)             # printing the numbers 1 to 10 on-screen
    + }
    [1] 1
    [1] 2
    [1] 3
    [1] 4
    [1] 5
    [1] 6
    [1] 7
    [1] 8
    [1] 9
    [1] 10
    1.2.3 Repeating or modifying a previous command
    If there is a mistake in a line of code that needs to be corrected, or if some previously typed commands will be repeated, then the ↑ and ↓ keys on the keyboard can be used to scroll between previous entries in the R Console. Try it!
    1.3 Scripting and Logging in R
    1.3.1 Scripting
    You can create a new script file from the drop-down menu File → New script (in Windows) or File → New Document (Mac OS). It is basically a text file in which you could write, for example,
    a <- 1:10
    print(a)
    In Windows, if you move the cursor up to the required line of the script and press Ctrl + R, then it will be run in the R Console. So, for example, move the cursor to where you have typed a <- 1:10 and press Ctrl + R. Then move down a line and do the same. The contents of a, the numbers 1 to 10, should be printed in the R Console. If you continue to keep the focus on the Scripting window and go to Edit in the RGui you will find an option to run everything. Similar commands are available for other operating systems (e.g. Mac key + Return). You can save files and load previously saved files.
    Scripting is both good practice and good sense. It is good practice because it allows for reproducibility of your work. It is good sense because if you need to go back and change things you can do so easily without having to start from scratch.
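A saved script can also be run in one go from the R Console with the source() function, which Session 2 uses later. The sketch below is an illustration under stated assumptions: it writes the two-line script from the text to a temporary file (the name script.file is made up for the example) so that it can be run anywhere without choosing a folder.

```r
# Write the two-line example script to a temporary file
script.file <- tempfile(fileext = ".R")
writeLines(c("a <- 1:10", "print(a)"), script.file)

# Run every command in the file; echo = TRUE prints each command as it runs
source(script.file, echo = TRUE)
```

In practice you would replace script.file with the path to your own script, or use file.choose() to pick it from a dialogue box.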
    Tip: It can be sensible to create the script in a simple text editor that is independent of R, such as Notepad. Although you will not be able to use Ctrl + R in the same way, if R crashes for any reason you will not lose your script file.
  • 8. 1.3.2 Logging
    You can save the contents of the R Console window to a text file which will then give you a log file of the commands you have been using (including any mistakes). The easiest way to do this is to click on the R Console (to take the focus from the Scripting window) and then use File → Save History (in Windows) or File → Save As (Mac). Note that graphics are not usually plotted in the R Console and therefore need to be saved separately.
    1.4 Some R Basics
    1.4.1 Functions, assignments and getting help
    It is helpful to understand R as an object-oriented system that assigns information to objects within the current workspace. The workspace is simply all the objects that have been created or loaded since beginning the session in R. Look at it this way: the objects are like box files, containing useful information, and the workspace is a larger storage container, keeping the box files together. A useful feature of this is that R can operate on multiple tables of data at once: they are just stored as separate objects within the workspace.
    To view the objects currently in the workspace, type
    > ls()
    character(0)
    Doing this runs the function ls(), which lists the contents of the workspace. The result, character(0), indicates that the workspace is empty (assuming it currently is).
    To find out more about a function, type ? or help with the function name,
    > ?ls()
    > help(ls)
    This will provide details about the function, including examples of its use. It will also list the arguments required to run the function, some of which may be optional and some of which may have default values which can be changed as required. Consider, for example,
    > ?log()
    A required argument is x, which is the data value or values. Typing log() omits any data and generates an error. However, log(100) works just fine.
    The argument base takes a default value of e (that is, e^1, written exp(1) in R), which is approximately 2.72 and means the natural logarithm is calculated. Because the default is assumed unless otherwise stated, log(100) gives the same answer as log(100, base=exp(1)). Using log(100, base=10) gives the common logarithm, which can also be calculated using the convenience function log10(100).
    The results of mathematical expressions can be assigned to objects, as can the outcome of many commands executed in the R Console. When the object is given a name different to other objects within the current workspace, a new object will be created. Where the name and object already exist, the previous contents of the object will be over-written without warning, so be careful!
    > a <- 10 - 5
    > print(a)
    [1] 5
    > b <- 10 * 2
    > print(b)
    [1] 20
    > print(a * b)
    [1] 100
    > a <- a * b
    > print(a)
    [1] 100
  • 9. In these examples the assignment is achieved using the combination of < and -, as in a <- 100. Alternatively, 100 -> a could be used or, more simply, a = 100. The print(...) command can often be omitted, though it is useful, and sometimes necessary (for example, when what you hope should appear on-screen doesn't).
    > f = a * b
    > print(f)
    [1] 2000
    > f
    [1] 2000
    > sqrt(b)
    [1] 4.472136
    > print(sqrt(b), digits=3)        # The additional parameter now specifies
    [1] 4.47                          # the number of significant figures
    > c(a,b)                          # The c(...) function combines its arguments
    [1] 100 20
    > c(a,sqrt(b))
    [1] 100.000000 4.472136
    > print(c(a,sqrt(b)), digits=3)
    [1] 100.00 4.47
    1.4.2 Naming objects in the workspace
    Although the naming of objects is flexible, there are some exceptions,
    > _a <- 10
    Error: unexpected input in "_"
    > 2a <- 10
    Error: unexpected symbol in "2a"
    Note also that R is case sensitive, so a and A are different objects
    > a <- 10
    > A <- 20
    > a == A
    [1] FALSE
    The following is rarely sensible because the object won't appear in the workspace listing, although it is there,
    > .a <- 10
    > ls()
    [1] "a" "b" "f"
    > .a
    [1] 10
    > rm(.a, A)     # Removes the objects .a and A (see below)
    1.4.3 Removing objects from the workspace
    From typing ls() we know when the workspace is not empty. To remove an object from the workspace it can be referred to explicitly, as in rm(A), or indirectly by its position in the workspace. To see how the second of these options will work, type
    > ls()
    [1] "a" "b" "f"
  • 10. The output returned from the ls() function is here a vector of length three where the first element is the first object (alphabetically) in the workspace, the second is the second object, and so forth. We can access specific elements by using notation of the form ls()[index.number]. So, the first element (the first object in the workspace) can be obtained using,
    > ls()[1]       # Get the brackets right! Some rounded, some square
    [1] "a"
    > ls()[2]
    [1] "b"
    Note how the square brackets […] are used to reference specific elements within the vector. Similarly,
    > ls()[3]
    [1] "f"
    > ls()[c(1,3)]
    [1] "a" "f"
    > ls()[c(1,2,3)]
    [1] "a" "b" "f"
    > ls()[c(1:3)]  # 1:3 means the numbers 1 to 3
    [1] "a" "b" "f"
    Using the remove function, rm(...), the first and third objects in the workspace can be removed using
    > rm(list=ls()[c(1,3)])
    > ls()
    [1] "b"
    Alternatively, objects can be removed by name
    > rm(b)
    To delete all the objects in the workspace and therefore empty it, type the following code but be warned: there is no undo function. Whenever rm(...) is used the objects are deleted permanently.
    > rm(list=ls())
    > ls()
    character(0)    # In other words, the workspace is empty
    1.4.4 Saving and loading workspaces
    Because objects are deleted permanently, a sensible precaution prior to using rm(...) is to save the workspace. To do so permits the workspace to be reloaded if necessary and the objects recovered. One way to save the workspace is to use
    > save.image(file.choose(new=T))
    Alternatively, the drop-down menus can be used (File → Save Workspace in the Windows version of the RGui). In either case, type the extension .RData manually, otherwise it risks being omitted, making it harder to locate and reload what has been saved. Try creating a couple of objects in your workspace and then save it with the name workspace1.RData.
    To load a previously saved workspace, use
    > load(file.choose())
    or the drop-down menus.
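The save, remove and reload cycle described above can be tried safely with the sketch below. It is an illustration rather than the author's own code: it saves to R's temporary folder instead of opening a file dialogue, the object names workspace.file and restored are made up for the example, and it uses save() on two named objects so that nothing else in your workspace is touched (save.image() would write every object).

```r
# Create a couple of objects
a <- 10
b <- 20

# Save them to a file with the .RData extension typed explicitly
workspace.file <- file.path(tempdir(), "workspace1.RData")
save(a, b, file = workspace.file)

# Remove the objects: there is no undo, so the saved file is the only copy
rm(a, b)

# Reload them; load() returns the names of the objects it restores
restored <- load(workspace.file)
print(restored)
print(a + b)
```

Saving before any use of rm(...) in this way means a mistaken deletion costs only the time taken to call load() again.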
  • 11. When quitting R, it will prompt to save the workspace image. If the option to save is chosen it will be saved to the file .RData within the working directory. Assuming that directory is the default one, the workspace and all the objects it contains will be reloaded automatically each and every time R is opened, which could be useful but also potentially irritating. To stop it, locate and delete the file. The current working directory is identified using the get working directory function, getwd(), and changed most easily using the drop-down menus.
    > getwd()
    [1] "/Users/rich_harris"
    (Your working directory will differ from the above)
    Tip: A good strategy for file management is to create a new folder for each project in R, saving the workspace regularly using a naming convention such as Dec_8_1.RData, Dec_8_2.RData etc. That way you can easily find and recover work.
    1.5 Quitting R
    Before quitting R, you may wish to save the workspace. To quit R use either the drop-down menus or
    > q()
    As promised, you will be prompted whether to save the workspace. Answering yes will save the workspace to the file .RData in the current working directory (see section 1.4.4, 'Saving and loading workspaces', on page 10, above). To exit without the prompt, use
    > q(save = "no")
    Or, more simply,
    > q("no")
    1.6 Getting Help
    In addition to the use of the ? or help(…) documentation and the material available at CRAN, http://cran.r-project.org/, R has an active user community. Helpful mailing lists can be accessed from www.r-project.org/mail.html.
    Perhaps the best all-round introduction to R is An Introduction to R, which is freely available at CRAN (http://cran.r-project.org/manuals.html) or by using the drop-down Help menus in the RGui. It is clear and succinct.
    I also have a free introduction to statistical analysis in R which accompanies the book Statistics for Geography and Environmental Science. It can be obtained from http://www.social-statistics.org/?p=354.
    There are many books available. My favourite, with a moderate level statistical leaning and written with clarity, is
    Maindonald, J. & Braun, J., 2007. Data Analysis and Graphics using R (2nd edition). Cambridge: CUP.
    I also find useful,
    Adler, J., 2010. R in a Nutshell. Sebastopol, CA: O'Reilly.
    Crawley, M.J., 2005. Statistics: An Introduction using R. Chichester: Wiley (which is a shortened version of The R Book by the same author).
  • 12. Field, A., Miles, J. & Field, Z., 2012. Discovering Statistics Using R. London: Sage.
    However, none of these books is about mapping or spatial analysis (of particular interest to me as a geographer). For that, the authoritative guide making the links between geographical information science, geographical data analysis and R (but not really written for R newcomers) is,
    Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Berlin: Springer.
    Also helpful is,
    Ward, M.D. & Skrede Gleditsch, K., 2008. Spatial Regression Models. London: Sage. (Which uses R code examples).
    And
    Chun, Y. & Griffith, D.A., 2013. Spatial Statistics and Geostatistics. London: Sage. (I found this book a little eccentric but it contains some very good tips on its subject and gives worked examples in R).
    The following book has a short section of maps as well as other graphics in R (and is also, as the title suggests, good for practical guidance on how to analyse surveys using cluster and stratified sampling, for example):
    Lumley, T., 2010. Complex Surveys. A Guide to Analysis Using R. Hoboken, NJ: Wiley.
    Springer publish an ever-growing series of books under the banner Use R! If you are interested in visualization, time-series analysis, Bayesian approaches, econometrics, data mining, …, then you'll find something of relevance at http://www.springer.com/series/6991. But you may well also find what you are looking for, for free, on the Internet.
  • 13. Session 2: A Geographical Demonstration of R
    This session provides a quick tour of some of R's functionality, with a focus on some geographical applications. The idea here is to showcase a little of what R can do rather than providing a comprehensive explanation of all that is going on. Aim for an intuitive understanding of the commands and procedures but do not worry about the detail. More information about the workings of R is given in the next session. More details about how to use R as a GIS and for spatial analysis are given in Sessions 4, 5 and 6.
    Note: this session assumes the libraries RgoogleMaps, png, sp and spdep are installed and available for use. You can find out which packages you currently have installed by using
    > row.names(installed.packages())
    If the packages cannot be found then they can be installed using install.packages(c("RgoogleMaps","png","sp","spdep")). Note that you may need administrative rights on your computer to install the packages.
    2.1 Getting Started
    As the focus of this session is on showing what R can do rather than teaching you how to do it, instead of requiring you to type a series of commands, they can be executed automatically from a previously written source file (a script: see Section 1.3.1, page 7). As the commands are executed we will ask R to echo (print) them to the screen so you can follow what is going on. At regular intervals you will be prompted to press return before the script continues.
    To begin, type,
    > source(file.choose(), echo=T)
    and load the source file session2.R. After some comments that you should ignore, you will be prompted to load the .csv file schools.csv:
    > ## Read in the file schools.csv file
    > wait()
    Please presss return
    schools.data <- read.csv(file.choose())
    Assuming there is no error, we will now proceed to a simple inspection of the data. Remember: the commands you see written below are the ones that appear in the source file. You do not need to type them yourself for this session.
    2.2 Checking the data
    It is always sensible to check a data table for any obvious errors.
    > head(schools.data)     # Shows the first few rows of the data
    > tail(schools.data)     # Shows the bottom few rows of the data
    We can produce a summary of each column in the data table using
    > summary(schools.data)
    In this instance, each column is a continuous variable so we obtain a six-number summary of the centre and spread of each variable.
    The names of the variables are
    > names(schools.data)
  • 14. Next the number of columns and rows, and a row-by-row check to see if the data are complete (have no missing data).
    > ncol(schools.data)
    > nrow(schools.data)
    > complete.cases(schools.data)
    It is not the most comprehensive check but everything appears to be in order.
    2.3 Some simple graphics
    The file schools.csv contains information about the location and some attributes of schools in Greater London (in 2008). The locations are given as a grid reference (Easting, Northing). The information is not real but is realistic. It should not, however, be used to make inferences about real schools in London.
    Of particular interest is the average attainment on leaving primary school (elementary school) of pupils entering their first year of secondary school. Do some schools in London attract higher attaining pupils more than others? The variable attainment contains this information.
    A stripchart and then a histogram will show that (not surprisingly) there is variation in the average prior attainment by school.
    > attach(schools.data)
    > stripchart(attainment, method="stack", xlab="Mean Prior Attainment by School")
    > hist(attainment, col="light blue", border="dark blue", freq=F, ylim=c(0,0.30),
    + xlab="Mean attainment")
    Here the histogram is scaled so the total area sums to one. To this we can add a rug plot,
    > rug(attainment)
    also a density curve, a Normal curve for comparison and a legend.
    > lines(density(sort(attainment)))
    > xx <- seq(from=23, to=35, by=0.1)
    > yy <- dnorm(xx, mean(attainment), sd(attainment))
    > lines(xx, yy, lty="dotted")
    > rm(xx, yy)
    > legend("topright", legend=c("density curve","Normal curve"),
    + lty=c("solid","dotted"))
    It would be interesting to know if attainment varies by school type. A simple way to consider this is to produce a box plot.
    The data contain a series of dummy variables for each of a series of school types (Voluntary Aided Church of England school: coe = 1; Voluntary Aided Roman Catholic: rc = 1; Voluntary controlled faith school: vol.con = 1; another type of faith school: other.faith = 1; a selective school (sets an entrance exam): selective = 1). We will combine these into a single, categorical variable then produce the box plot showing the distribution of average attainment by school type.
    First the categorical variable:
    > school.type <- rep("Not Faith/Selective", times=nrow(schools.data))
    # This gives each school an initial value which will
    # then be replaced with its actual type
    > school.type[coe==1] <- "VA CoE"
    # Voluntary Aided Church of England schools are given
    # the category VA CoE
    > school.type[rc==1] <- "VA RC"
    # Voluntary Aided Roman Catholic schools are given
    # the category VA RC [etc.]
    > school.type[vol.con==1] <- "VC"
    > school.type[other.faith==1] <- "Other Faith"
    > school.type[selective==1] <- "Selective"
    > school.type <- factor(school.type)
    > levels(school.type)     # A list of the categories
    [1] "Not Faith/Selective" "Other Faith" "Selective" [etc.]
  • 15. Now the box plots:
    > par(mai=c(1,1.4,0.5,0.5))     # Changes the graphic margins
    > boxplot(attainment ~ school.type, horizontal=T, xlab="Mean attainment", las=1,
    + cex.axis=0.8)     # Includes options to draw the boxes and labels horizontally
    > abline(v=mean(attainment), lty="dashed")     # Adds the mean value to the plot
    > legend("topright", legend="Grand Mean", lty="dashed")
    Not surprisingly, the selective schools (those with an entrance exam) recruit the pupils with highest average prior attainment.
    Figure 2.1. A histogram with annotation in R
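The way a set of dummy variables is collapsed into one categorical variable can be seen more easily on a tiny, made-up table. The sketch below is an illustration only: the data frame toy and its five rows are invented, only three of the dummy variables are used, and the fifth row is a hypothetical school flagged as both Church of England and selective, included to show that a later assignment overwrites an earlier one.

```r
# Five invented schools described by three dummy (0/1) variables
toy <- data.frame(coe       = c(1, 0, 0, 0, 1),
                  rc        = c(0, 1, 0, 0, 0),
                  selective = c(0, 0, 0, 1, 1))

# Start every school in the default category...
school.type <- rep("Not Faith/Selective", times = nrow(toy))

# ...then overwrite the category wherever a dummy variable equals 1
school.type[toy$coe == 1] <- "VA CoE"
school.type[toy$rc == 1] <- "VA RC"
school.type[toy$selective == 1] <- "Selective"   # applied last, so it takes priority

# Convert the text labels to a factor (a categorical variable)
school.type <- factor(school.type)
levels(school.type)
table(school.type)
```

Because each assignment replaces whatever was there before, the order of the lines decides which label a school with more than one flag ends up with.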
  • 16. Figure 2.2. Mean prior attainment by school type
    2.4 Some simple statistics
    It appears (in Figure 2.2) that there are differences in the levels of prior attainment of pupils in different school types. We can test whether the variation is significant using an analysis of variance.
    > summary(aov(attainment ~ school.type))
                 Df Sum Sq Mean Sq F value Pr(>F)
    school.type   5  479.8   95.95   71.42 <2e-16 ***
    Residuals   361  485.0    1.34
    It is, at greater than 99.9% confidence (F = 71.42, p < 0.001).
    We might also be interested in comparing those schools with the highest and lowest proportions of Free School Meal eligible pupils to see if they are recruiting pupils with equal or differing mean prior attainment. We expect a difference because free school meal eligibility is used as an indicator of a low income household and there is a link between economic disadvantage and educational progress in the UK.
    > attainment.high.fsm.schools <- attainment[fsm > quantile(fsm, probs=0.75)]
    # Finds the attainment scores for schools with the highest proportions of FSM pupils
    > attainment.low.fsm.schools <- attainment[fsm < quantile(fsm, probs=0.25)]
    # Finds the attainment scores for schools with the lowest proportions of FSM pupils
    > t.test(attainment.high.fsm.schools, attainment.low.fsm.schools)
            Welch Two Sample t-test
    data:  attainment.high.fsm.schools and attainment.low.fsm.schools
    t = -15.0431, df = 154.164, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.437206 -2.639240
    sample estimates:
    mean of x mean of y
     26.58352  29.62174
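The comparison of the top and bottom quarters of schools can be tried without the schools data by simulating two variables. The sketch below is an illustration under stated assumptions: fsm.sim and attainment.sim are invented, with attainment made to fall as the free school meals proportion rises, so the numbers it produces are not the ones reported for the London schools.

```r
set.seed(1)                                        # makes the simulation repeatable
fsm.sim <- runif(200)                              # invented proportions between 0 and 1
attainment.sim <- 30 - 6 * fsm.sim + rnorm(200)    # invented scores, lower where fsm.sim is higher

# Scores for the quarter of cases with the highest, and the lowest, proportions
high <- attainment.sim[fsm.sim > quantile(fsm.sim, probs = 0.75)]
low  <- attainment.sim[fsm.sim < quantile(fsm.sim, probs = 0.25)]

# Welch two-sample t-test of the difference in means
result <- t.test(high, low)
result$estimate    # the two group means
result$conf.int    # 95% confidence interval for the difference (high minus low)
result$p.value
```

Storing the test in an object such as result, rather than only printing it, makes it easy to pull out individual values for a report.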
  • 17. It comes as little surprise to learn that those schools with the greatest proportions of FSM eligible pupils are also those recruiting lower attaining pupils on average (mean attainment 26.6 Vs 29.6, t = -15.0, p < 0.001, the 95% confidence interval is from -3.44 to 2.64). Exploring this further, the Pearson correlation between the mean prior attainment of pupils entering each school and the proportion of them that are FSM eligible is -0.689, and significant (p < 0.001): > round(cor(fsm, attainment),3) > cor.test(fsm, attainment) Pearson's product-moment correlation 10 data: fsm and attainment t = -18.1731, df = 365, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.7394165 -0.6313939 sample estimates: cor -0.6892159 Of course, the use of the Pearson correlation assumes that the relationship is linear, so let's check: > plot(attainment ~ fsm) > abline(lm(attainment ~ fsm)) 20 # Adds a line of best fit (a regression line) There is some suggestion the relationship might be curvilinear. However, we will ignore that here. Finally, some regression models. The first seeks to explain the mean prior attainment scores for the schools in London by the proportion of their intake who are free school meal eligible. (The result is the line of best fit added to the scatterplot above). The second model adds a variable giving the proportion of the intake of a white ethnic group. The third adds a dummy variable indicating whether the school is selective or not. > model1 <- lm(attainment ~ fsm, data=schools.data) > summary(model1) Call: lm(formula = attainment ~ fsm, data = schools.data) 30 Residuals: Min 1Q Median -2.8871 -0.7413 -0.1186 3Q 0.5487 Max 3.6681 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 29.6190 0.1148 258.12 <2e-16 *** fsm -6.5469 0.3603 -18.17 <2e-16 *** --Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 40 Residual standard error: 1.178 on 365 degrees of freedom Multiple R-squared: 0.475, Adjusted R-squared: 0.4736 F-statistic: 330.3 on 1 and 365 DF, p-value: < 2.2e-16 > model2 <- lm(attainment ~ fsm + white, data=schools.data) > summary(model2) An Introduction to Mapping and Spatial Modelling in R. © Richard Harris, 2013 17
  • 18. Call: lm(formula = attainment ~ fsm + white, data = schools.data) Residuals: Min 1Q Median -2.9442 -0.7295 -0.1335 10 3Q 0.5111 Max 3.7837 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 30.1250 0.1979 152.21 < 2e-16 *** fsm -7.2502 0.4214 -17.20 < 2e-16 *** white -0.8722 0.2796 -3.12 0.00196 ** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.164 on 364 degrees of freedom Multiple R-squared: 0.4887, Adjusted R-squared: 0.4859 F-statistic: 173.9 on 2 and 364 DF, p-value: < 2.2e-16 > model3 <- update(model2, . ~ . + selective) # Means: take the previous model and add the variable 'selective' > summary(model3) 20 Call: lm(formula = attainment ~ fsm + white + selective, data = schools.data) Residuals: Min 1Q -2.6262 -0.5620 30 Median 0.0537 3Q 0.5607 Max 3.6215 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 29.1706 0.1689 172.712 <2e-16 *** fsm -5.2381 0.3591 -14.586 <2e-16 *** white -0.2299 0.2249 -1.022 0.307 selective 3.4768 0.2338 14.872 <2e-16 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.9189 on 363 degrees of freedom Multiple R-squared: 0.6823, Adjusted R-squared: 0.6796 F-statistic: 259.8 on 3 and 363 DF, p-value: < 2.2e-16 Looking at the adjusted R-squared value, each model appears to be an improvement on the one that precedes it (marginally so for model 2). However, looking at the last (model 3), we may suspect that we could drop the white ethnicity variable with no significant loss in the amount of variance explained. An analysis of variance confirms that to be the case. 40 > model4 <- update(model3, . ~ . - white) # Means: take the previous model but remove the variable 'white' > anova(model4, model3) Analysis of Variance Table Model 1: attainment ~ fsm + selective Model 2: attainment ~ fsm + white + selective Res.Df RSS Df Sum of Sq F Pr(>F) An Introduction to Mapping and Spatial Modelling in R. © Richard Harris, 2013 18
1    364 307.42
2    363 306.54  1   0.88222 1.0447 0.3074

The residual error, measured by the residual sum of squares (RSS), is not very different for the two models, and that difference, 0.882, is not significant (F = 1.045, p = 0.307).

2.5 Some simple maps

For a geographer like myself, R becomes more interesting when we begin to look at its geographical data handling capabilities.

The schools data contain geographical coordinates and are therefore geographical data. Consequently they can be mapped. The simplest way for point data is to use a 2-dimensional plot, making sure the aspect ratio is fixed correctly.

> plot(Easting, Northing, asp=1, main="Map of London schools")
> # The argument asp=1 fixes the aspect ratio correctly

Amongst the attribute data for the schools, the variable esl gives the proportion of pupils who speak English as an additional language. It would be interesting for the size of the symbol on the map to be proportional to it.

> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5))

It would also be nice to add a little colour to the map. We might, for example, change the default plotting 'character' to a filled circle with a yellow background.

> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5), pch=21, bg="yellow")

A more interesting option would be to have the circles filled with a colour gradient that is related to a second variable in the data, the proportion of pupils eligible for free school meals for example. To achieve this, we can begin by creating a simple colour palette:

> palette <- c("yellow","orange","red","purple")

We now cut the free school meals eligibility variable into quartiles (four classes, each containing approximately the same number of observations).

> map.class <- cut(fsm, quantile(fsm), labels=FALSE, include.lowest=TRUE)

The result is to split the fsm variable into four groups with the value 1 given to the first quarter of the data (schools with the lowest proportions of eligible pupils), the value 2 given to the next quarter, then 3, and finally the value 4 for schools with the highest proportions of FSM eligible pupils.

There are, then, now four map classes and the same number of colours in the palette. Schools in map class 1 (and with the lowest proportion of fsm-eligible pupils) will be coloured yellow, the next class will be orange, and so forth.

Bringing it all together,

> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5), pch=21, bg=palette[map.class])

It would be good to add a legend, and perhaps a scale bar and North arrow. Nevertheless, as a first map in R this isn't too bad!
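
The classing step above can be checked in isolation. The following is a minimal sketch that uses simulated values in place of the fsm variable; the names fsm.demo, class.demo, palette.demo and colours.demo are invented for the illustration and are not part of the schools data.

```r
# Simulated stand-in for the fsm variable (NOT the schools data)
set.seed(1)
fsm.demo <- runif(200)

# Quartile break points: minimum, 25%, 50%, 75% and maximum
breaks <- quantile(fsm.demo)

# labels=FALSE returns integer class codes 1 to 4;
# include.lowest=TRUE keeps the minimum value in class 1
# (without it the minimum would be given the class NA)
class.demo <- cut(fsm.demo, breaks, labels=FALSE, include.lowest=TRUE)

# Each class should hold about a quarter of the observations
table(class.demo)

# Indexing the palette by class code gives one colour per observation
palette.demo <- c("yellow", "orange", "red", "purple")
colours.demo <- palette.demo[class.demo]
head(data.frame(fsm.demo, class.demo, colours.demo))
```

The vector colours.demo is what the bg argument of plot() receives in the map above: one fill colour for each point, chosen by its map class.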

Figure 2.3. A simple point map in R

Why don't we be a bit more ambitious and overlay the map on a Google Maps tile, adding a legend as we do so? This requires us to load an additional library for R and to have an active Internet connection.

> library(RgoogleMaps)

If you get an error such as the following

Error in library(RgoogleMaps) : there is no package called ‘RgoogleMaps’

it is because the library has not been installed.

Assuming that the data frame, schools.data, remains in the workspace and attached (it will be if you have followed the instructions above), and that the colour palette created above has not been deleted, then the map shown in Figure 2.4 is created with the following code:

> MyMap <- MapBackground(lat=Lat, lon=Long)
> PlotOnStaticMap(MyMap, Lat, Long, cex=sqrt(esl*5), pch=21,
+ bg=palette[map.class])
> legend("topleft", legend=paste("<",tapply(fsm, map.class, max)),
+ pch=21, pt.bg=palette, pt.cex=1.5, bg="white", title="P(FSM-eligible)")
> legVals <- seq(from=0.2,to=1,by=0.2)
> legend("topright", legend=round(legVals,3), pch=21, pt.bg="white",
+ pt.cex=sqrt(legVals*5), bg="white", title="P(ESL)")

(If you are running the script for this session then the code you see on-screen will differ slightly. That is because it has some error trapping included in it in case there is no Internet connection available.)

Remember that the data are simulated. The points shown on the map are not the true locations of schools in London. Do not worry about understanding the code in detail; the purpose is to see the sort of things R can do with geographical data. We will look more closely at the detail in later sessions.

Figure 2.4. A slightly less simple map produced in R

2.6 Some simple geographical analysis

Remember the regression models from earlier? It would be interesting to test the assumption that the residuals exhibit independence by looking for spatial dependencies. To do this we will consider to what degree the residual value for any one school correlates with the mean residual value for its six nearest other schools (the choice of six is completely arbitrary).

First, we will take a copy of the schools data and convert it into an explicitly spatial object in R:

> detach(schools.data)
> schools.xy <- schools.data
> library(sp)
> attach(schools.xy)
> coordinates(schools.xy) <- c("Easting", "Northing")
> # Converts into a spatial object
> class(schools.xy)
> detach(schools.xy)
> proj4string(schools.xy) <- CRS("+proj=tmerc +datum=OSGB36")
> # Sets the Coordinate Referencing System

Second, we find the six nearest neighbours for each school.

> library(spdep)
> nearest.six <- knearneigh(schools.xy, k=6, RANN=F)
> # RANN = F to override the use of the RANN package that may not be installed

We can learn from this that the six nearest schools to the first school in the data (row 1) are schools 5, 38, 2, 40, 223 and 6:

> nearest.six$nn[1,]
[1]   5  38   2  40 223   6

The neighbours object, nearest.six, is an object of class knn:

> class(nearest.six)

It is next converted into the more generic class of neighbours.

> neighbours <- knn2nb(nearest.six)
> class(neighbours)
[1] "nb"
> summary(neighbours)
Neighbour list object:
Number of regions: 367
Number of nonzero links: 2202
Percentage nonzero weights: 1.634877
Average number of links: 6
[etc.]

The connections between each point and its neighbours can then be plotted. It may take a few minutes.

> plot(neighbours, coordinates(schools.xy))

Having identified the six nearest neighbours to each school we could give each equal weight in a spatial weights matrix or, alternatively, decrease the weight with distance away (so the first nearest neighbour gets most weight and the sixth nearest the least). Creating a matrix with equal weight given to all neighbours is sufficient for the time being.

> spatial.weights <- nb2listw(neighbours)

(The other possibility is achieved by creating and then supplying a list of general weights to the function; see ?nb2listw.)

We now have all the information required to test whether there are spatial dependencies in the residuals. The answer is yes (Moran's I = 0.218, p < 0.001, indicating positive spatial autocorrelation).

> lm.morantest(model4, spatial.weights)

        Global Moran's I for regression residuals

data:
model: lm(formula = attainment ~ fsm + selective, data = schools.data)
weights: spatial.weights

Moran I statistic standard deviate = 7.9152, p-value = 1.235e-15
alternative hypothesis: greater
sample estimates:
Observed Moran's I        Expectation           Variance
      0.2181914682      -0.0038585704       0.0007870118
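
To see what the Moran's I statistic measures, it can be calculated by hand for a small simulated example. The sketch below uses base R only, with made-up coordinates and values (xy, z and the other object names are invented for the illustration). It computes Moran's I for a single variable using equal weights for the six nearest neighbours. It is not a substitute for lm.morantest, which works on regression residuals and adjusts the expectation and variance accordingly.

```r
# Simulated coordinates and values (NOT the schools data)
set.seed(42)
n <- 100
k <- 6
xy <- cbind(runif(n), runif(n))
# A value with a west-east trend, so neighbouring points tend to be alike
z <- xy[, 1] + rnorm(n, sd=0.2)

# Distance matrix; setting the diagonal to Inf stops a point
# being counted as its own neighbour
d <- as.matrix(dist(xy))
diag(d) <- Inf

# Row-standardised weights: each of the k nearest neighbours gets weight 1/k
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, order(d[i, ])[1:k]] <- 1 / k
}

# Moran's I = (n / S0) * sum_ij(w_ij * e_i * e_j) / sum_i(e_i^2)
# where e are deviations from the mean and S0 is the sum of all weights
e <- z - mean(z)
S0 <- sum(W)
I <- (n / S0) * sum(W * outer(e, e)) / sum(e^2)
I

# The spatial lag is the mean of the k neighbouring values;
# its correlation with z gives a related, informal check
lag.z <- as.vector(W %*% z)
cor(z, lag.z)
```

Because each row of the weights matrix sums to one, S0 equals n and the statistic reduces to the sum of each deviation multiplied by the mean deviation of its neighbours, divided by the sum of squared deviations. A positive value indicates that near neighbours tend to have similar values.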

2.7 Tidying up

It is better to save your workspace regularly whilst you are working (see Section 1.4.4, 'Saving and loading workspaces', page 10) and certainly before you finish. Don't forget to include the extension .RData when saving. Having done so, you can tidy up the workspace.

> save.image(file.choose(new=T))
> rm(list=ls()) # Be careful, it deletes everything!

2.8 Further Information

A simple introduction to graphics and statistical analysis in R is given in Statistics for Geography and Environmental Science: An Introduction in R, available at http://www.social-statistics.org/?p=354.

Download the full version from https://www.dropbox.com/sh/zzibpn2keilrhv3/rsuA7L_jlK