2. CONTENTS
1. Introduction to R programming language.
2. Spatial analysis in ‘R’
3. R and GIS
4. Literature review
5. Case studies
6. Summary
7. References.
3. Introduction to R language
An environment for statistical computing and graphics
- Free software
Built on a simple, interpreted programming language
Versions of R exist for Windows, macOS, Linux and other Unix
flavours
Easy to create your own functions in R
Simple GIS tasks such as topological overlay and raster algebra
can be carried out
4. R language includes
an effective data handling and storage facility,
a suite of operators for calculations on arrays, in particular
matrices,
a large, coherent, integrated collection of intermediate tools
for data analysis,
graphical facilities for data analysis and display either on-
screen or on hardcopy, and
a well-developed, simple and effective programming
language which includes conditionals, loops, user-defined
recursive functions and input and output facilities.
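The features listed above can be sketched in a few lines of base R. This is a minimal illustration only; the object names (m, factorial_rec, squares) are made up for the example.

```r
# Matrix arithmetic: operators work on whole arrays at once.
m <- matrix(1:4, nrow = 2)   # 2 x 2 matrix, filled column by column
m_squared <- m %*% m         # matrix product
m_doubled <- m * 2           # element-wise scaling

# A user-defined recursive function with a conditional.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that collects results into a vector.
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}
```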
5. R Function Libraries
Implement many common statistical procedures.
Provide excellent graphics functionality.
A convenient starting point for many data analysis projects.
Examples:
maps: draws maps of the world, the US, and
smaller areas
mapproj: performs cartographic projections
6. 6
Fig 1: The R Project for Statistical Computing
www.r-project.org
8. 8
R developers have written the R package ‘sp’ to extend R with
classes and methods for spatial data.
- Classes specify a structure and define how spatial data
are organised and stored.
- Methods are instances of functions specialised for a
particular data class.
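The relationship between a class and a method can be sketched with R's built-in S4 system. The class ToyPoints and the generic centroid are hypothetical names invented for this example and are not part of ‘sp’.

```r
library(methods)  # S4 classes and generics ship with base R

# Hypothetical toy class: a set of points held as a two-column matrix.
setClass("ToyPoints", representation(coords = "matrix"))

# A generic function and a method specialised for ToyPoints.
setGeneric("centroid", function(obj) standardGeneric("centroid"))
setMethod("centroid", "ToyPoints", function(obj) {
  colMeans(obj@coords)  # mean x and mean y
})

pts <- new("ToyPoints", coords = cbind(x = c(0, 2, 4), y = c(0, 2, 1)))
ctr <- centroid(pts)
```

The class fixes how the coordinates are stored, while the method defines what centroid means for that particular class.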
9. Analysis of spatial data in R using points:
Points are pairs of coordinates (x, y), representing events, observation posts,
individuals, cities or any other discrete object defined in space.
Let's take a look at the dataset crime, which is just a table of geographic
coordinates (decimal degrees) for crime locations in Baltimore, MD.
head(crime)
ID LONG LAT
1 -76.65159 39.23941
2 -76.47434 39.35274
3 -76.51726 39.25874
4 -76.52607 39.40707
5 -76.51001 39.33571
6 -76.70375 39.26605
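As a rough sketch, the six rows printed above can be re-entered by hand and summarised with base R. The name crime_head is made up for the example, and the first printed column is assumed to be the ID.

```r
# The six rows shown by head(crime), re-entered by hand for illustration.
crime_head <- data.frame(
  ID   = 1:6,
  LONG = c(-76.65159, -76.47434, -76.51726, -76.52607, -76.51001, -76.70375),
  LAT  = c(39.23941, 39.35274, 39.25874, 39.40707, 39.33571, 39.26605)
)

# Bounding box of the points: min and max of each coordinate.
bbox <- rbind(LONG = range(crime_head$LONG), LAT = range(crime_head$LAT))
colnames(bbox) <- c("min", "max")

# With the 'sp' package installed, the table could be promoted to a
# spatial object with: coordinates(crime_head) <- ~LONG + LAT
```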
11. Polygons and lines:
Polygons can be thought of as sequences of connected points, where the
first point is the same as the last.
- An open polygon, where the sequence of points does not result in a
closed shape with a defined area, is called a line.
In the R environment, line and polygon data are stored in objects of classes
SpatialPolygons and SpatialLines.
Class "Polygon" [package "sp"]
Name:    labpt    area    hole ringDir coords
Class: numeric numeric logical integer matrix
The data are stored as a SpatialPolygonsDataFrame, which is a subclass of
SpatialPolygons containing a data frame of attributes.
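The difference between a closed ring and an open line can be checked directly on a coordinate matrix. The sketch below uses base R only; is_closed and ring_area are hypothetical helper names, and the area is computed with the standard shoelace formula rather than taken from the area slot of an ‘sp’ object.

```r
# Vertices of a unit square; the first vertex is repeated at the end to close the ring.
ring <- cbind(x = c(0, 1, 1, 0, 0), y = c(0, 0, 1, 1, 0))

# A ring is closed when its first and last rows are identical.
is_closed <- function(coords) {
  n <- nrow(coords)
  all(coords[1, ] == coords[n, ])
}

# Shoelace formula; the absolute value makes the result independent of ring direction.
ring_area <- function(coords) {
  n <- nrow(coords)
  x <- coords[, "x"]
  y <- coords[, "y"]
  abs(sum(x[-n] * y[-1] - x[-1] * y[-n])) / 2
}

# Dropping the closing vertex leaves an open sequence of points, i.e. a line.
open_line <- ring[1:4, ]
```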
12. Preparation of a simple map in R
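Fig 4: A simple map
library(maps)     # provides the map() drawing function
library(mapdata)  # provides the high-resolution "worldHires" database
map("worldHires", "Canada",
    xlim = c(-141, -53),  # longitude range
    ylim = c(40, 85),     # latitude range
    col = "gray90",
    fill = TRUE)
Source: http://www.r-bloggers.com/maps-with-r-and-polygon-boundaries/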
13. R and GIS
The aim of integrating R and ArcGIS is to provide
an automated way of offering R scripts as ArcGIS
Geoprocessing Tools.
When an analysis consists of several steps that demand
different capabilities, this kind of interface is most
suitable.
14. 14
Examples of R packages providing an interface to
GIS:
GRASS GIS can be connected through the R package spgrass6.
R can access SAGA GIS modules through the R
package RSAGA (currently Windows, Linux, FreeBSD and probably
others); SAGA GIS is an open-source GIS with mainly raster processing
capabilities such as terrain analysis.
R can also run ArcGIS geoprocessing tools through the R
package RPyGeo (Windows only).
- RPyGeo uses Python scripts to communicate with ArcGIS.
- RPyGeo/ArcGIS operates on files (rasters and shapefiles).
15. 15
Fig 5: The workflow required to expose an R script as a Python toolbox, and how the
toolbox communicates with the original R script in order to run the algorithm.
16. Applications:
Geosciences
Water resources
Environmental science
Agriculture and soil science
Mathematics and statistics
Ecology
Geodesy
Exploitation of fossil fuels
Meteorology
17. LITERATURE REVIEW
Bivand (2001) sketches the key modes of spatial data
analysis (point pattern, continuous surface, areal/lattice) and
how they integrate into legacy GIS data models.
Roger (2007) gave a brief description of how to access data
and also covered how coordinate reference systems are
handled, because they are the foundation for spatial data
integration.
18. 18
Bajat (2012) presents the possibilities of applying the
geographically weighted regression method to mapping the
population change index in the spatial modelling of population
concentration.
Shane (2013) described some statistical and mapping
techniques developed for handling and interrogating large-
scale multi-media geochemical datasets using the R and
Python scripting languages together with GIS.
19. CASE STUDY 1
Kate (2013) utilized open-source programming languages to
statistically and spatially analyse regional-scale
geoenvironmental datasets.
Objective
To make the best use of open-source programming languages such
as R in analysing regional-scale geoenvironmental datasets, and
to develop a web mapping service and online viewer for the
datasets.
Study area
The border region and the interior of Northern Ireland.
20. 20
Fig 5: Graphical plots produced in R for quality assurance and quality
control assessment of analytical data
Methodology
R statistical analyses:
- R is employed initially to output a range of graphical plots for quality
assurance and quality control assessment of analytical data with respect to
laboratory reference materials (as shown in Fig 5).
21. 21
Exploratory data analyses are carried out to assess the data
distribution.
Multivariate analytical techniques such as robust factor analyses
and hierarchical cluster analyses are used to investigate statistical
and spatial correlations between elements.
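The exploratory and clustering steps above can be sketched in base R. The data below are synthetic and the column names (Cu, Zn, Pb) are placeholders, not values from the study; ordinary correlation-based hierarchical clustering stands in here for the robust methods the study used.

```r
set.seed(42)  # reproducible synthetic data

# Hypothetical element concentrations for 30 samples (not from the study).
n <- 30
base_signal <- rnorm(n)
geochem <- data.frame(
  Cu = base_signal + rnorm(n, sd = 0.2),  # Cu and Zn share a common signal
  Zn = base_signal + rnorm(n, sd = 0.2),
  Pb = rnorm(n)                           # Pb varies independently
)

# Exploratory summary of each element's distribution.
element_summary <- summary(geochem)

# Cluster the elements (not the samples): distance = 1 - |correlation|.
cor_matrix <- cor(geochem)
element_dist <- as.dist(1 - abs(cor_matrix))
element_tree <- hclust(element_dist, method = "average")

# Cut the tree into two groups of elements.
groups <- cutree(element_tree, k = 2)
```

Elements that fall into the same group are strongly correlated across samples, which is the kind of association the cluster analysis is meant to reveal.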
Mapping
R and Python code have been developed to automate the
process of exploratory data analysis, spatial data analysis and
data interpolation.
Maps are produced using the ArcPy mapping module.
Online viewer
Finally, a web mapping service and online viewer for the
mapped datasets, with live links to a managed database, is
developed.
23. CASE STUDY 2
Acta Silvae (2013) illustrated the use of the R programming language
in the analysis of spatial data.
Objective
The aim of this article is to demonstrate R’s potential for
spatial data processing and presentation.
Study area
A 20 ha forest on Snežnik (southern Slovenia).
Altitude ranges from 820 m to 880 m.
Silver fir and European beech are the dominant tree species. The
terrain is characterized by abundant sinkholes.
24. 24
Methodology
Manipulation of vector data:
Coordinates were recorded using GPS devices and exported to
a text file.
This text file was imported into the R environment using the
library ‘maptools’.
Spatial interpolation:
Temperature was interpolated (kriging) across
the research area using the library gstat.
A variogram model, a function describing the spatial
dependence of a random variable, has to be selected.
The point measurements were used to create a continuous
temperature field in raster format.
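Before a variogram model is selected, an empirical variogram is usually computed from the point measurements. The sketch below does this in base R on synthetic data; it is an illustration of the idea, not the gstat implementation, and the variable names, bin width and trend are assumptions made for the example.

```r
set.seed(1)  # reproducible synthetic data

# Hypothetical temperature readings at 40 random locations (not study data).
n <- 40
x <- runif(n, 0, 100)
y <- runif(n, 0, 100)
temp <- 10 + 0.05 * x + rnorm(n, sd = 0.5)  # gentle trend along x plus noise

# Pairwise separation distances and semivariances 0.5 * (z_i - z_j)^2.
d <- as.vector(dist(cbind(x, y)))
semivar <- as.vector(dist(temp))^2 / 2

# Average the semivariances within distance bins (lags).
breaks <- seq(0, 150, by = 25)
lag_bin <- cut(d, breaks = breaks)
empirical_variogram <- data.frame(
  lag   = levels(lag_bin),
  pairs = as.vector(table(lag_bin)),
  gamma = as.vector(tapply(semivar, lag_bin, mean))
)
```

A variogram model (for example spherical or exponential) would then be fitted to the gamma values before kriging.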
26. 26
LiDAR data processing:
R is used as a tool for processing large amounts of data,
here the raw LiDAR data for 1 km².
The file has a size of 539,468 KB (539 MB) and contains 20,736,221
rows and 62,208,663 data points.
In R, an algorithm is written to eliminate the points that represent
forest trees from the whole point cloud, leaving only the points of the
terrain.
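The source does not describe the filtering algorithm itself. As a rough illustration of the general idea, the sketch below applies a simple grid-minimum filter to a synthetic point cloud in base R; the cell size, tolerance and all variable names are assumptions made for the example.

```r
set.seed(7)  # reproducible synthetic data

# Hypothetical point cloud: gently sloping terrain plus vegetation returns above it.
n <- 2000
cloud <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
ground_z <- 820 + 0.03 * cloud$x   # terrain surface
is_tree <- runif(n) < 0.4          # roughly 40 % vegetation returns
cloud$z <- ground_z + ifelse(is_tree, runif(n, 2, 30), rnorm(n, sd = 0.05))

# Assign every point to a 10 m x 10 m grid cell.
cell_size <- 10
cell_id <- paste(floor(cloud$x / cell_size), floor(cloud$y / cell_size))

# Lowest elevation found in each cell, mapped back to every point.
cell_min <- ave(cloud$z, cell_id, FUN = min)

# Keep points within a tolerance of their cell minimum as terrain candidates.
tolerance <- 0.5
kept <- cloud$z - cell_min <= tolerance
ground <- cloud[kept, ]
```

A filter this simple only works on gentle slopes; on steep terrain or around sinkholes the cell size and tolerance would need care.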
27. 27
Fig. 8: The 3D point cloud (gray) of a longitudinal profile in the research area. The red points
mark the ground, as determined by the algorithm written in R.
28. 28
- A digital elevation model (DEM) was produced based on these classified
points.
Fig. 9: 3D elevation model based on LiDAR data. The surface is coloured according to
altitude.
29. SUMMARY
R has become a high-quality open-source software environment
for statistical computing and graphics.
It can serve as a high-performance GIS tool for
geospatial data production, analysis, and mapping.
R supports control flow, loops and user-defined
functions, as well as multiple input and output data formats.
It gives the opportunity to codify existing data and
functions.
The entire analysis in R is run from a written script,
which makes it simple to rerun
the analysis if needed.
30. References
Bivand, R. and Gebhardt, A. (2000) ‘Implementing functions for spatial
statistical analysis using the R language’, Journal of Geographical
Systems, 2:307-317.
Bajat, B. (2012) ‘Spatial modelling of population concentration
using geographically weighted regression method’, Journal of
the Geographical Institute SASA 01/2011; 61:151-167.
Howarth (1983) ‘Statistics and Data Analysis in
Geochemical Prospecting’, Handbook of Exploration
Geochemistry, vol. 2, pages 69-73, Elsevier, Amsterdam.
31. 31
McClelland, M. and Wang (2010) ‘A Python package for using R
in Python’, Journal of Statistical Software, Code Snippets.
Thibaul et al. (2012) ‘Using an R package for exploratory spatial data
analysis’, April 2012, volume 47, issue 2.
(2012) ‘StatDA: Statistical Analysis for Environmental Data.’
URL: http://CRAN.R-project.org/package=StatDA. R package
version 1.6.2.