Welcome
In the modern world, huge amounts of data originate from many sources, such as financial transactions,
geographical data, and e-commerce websites, and most of it arrives as raw data. The natural next question is
how to turn that raw data into meaningful assets. As Ann Winblad put it, "In its raw form, oil has little value.
Once processed and refined, it helps power the world." Because processing raw data matters so much, data
analytics and data science have emerged as fields in their own right.
Made with ❤ by:
Senthil Kumar Srinivasan
Agenda
1 Why R?
2 Ten Steps to Install R
3 Commands & Description
4 Variables and Operators
5 R Data Structures
6 Functions in R
7 About R Packages
8 R - Import Data
1. Why R?
• R is an open-source language released under the GNU
General Public License.
• R has state-of-the-art graphics capabilities. If
you want to visualize complex data, R has one of the
most comprehensive and powerful feature sets
available.
• R is cross-platform, so it can be used on
Windows, macOS, or Linux.
• At the time of writing, over 7,000 packages were
available for many different kinds of applications,
and all of them are free to use.
2. Ten Steps to Install R in Windows
1. To download R, go to
https://cran.r-project.org/
2. Select the download link for the operating system installed on your machine.
3. Select the base option from the subdirectories.
4. Download the installer (R-3.3.2-win.exe).
5. Run the downloaded installer, R-3.3.2-win.exe, to start the installation.
6. Select your preferred language.
7. Click Next on the license information screen, then enter the location where R should be installed.
8. Select the components you want to install.
9. Click Next to set the name of the Start menu folder.
10. Click Next to finish. The installation completes and
an R icon appears on your desktop.
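A quick way to check the installation is to open the R console and print the version string. This is a minimal sketch. The exact text depends on the version you installed; for the installer above it should start with "R version 3.3.2".

```r
# Print the version of R that is running
R.version.string

# Report the platform type: "windows" on Windows, "unix" on macOS and Linux
.Platform$OS.type
```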
3. Commands & Description
Function                Description
help.start()            Open the general help pages in a new browser window.
help("search")          Show the help page for the exact term search (here, the search() function).
help(search)            Same as above; the quotes are optional for a valid name.
demo()                  List all available demonstrations.
vignette()              List all available vignettes for the currently installed packages.
data()                  List all available example datasets contained in the currently loaded packages.
getwd()                 Print the current working directory.
setwd("mydirectory")    Change the current working directory to mydirectory.
ls()                    List the objects in the current workspace.
rm(objectlist)          Remove one or more objects.
options()               View or set current options.
history(#)              Display your last # commands (default = 25).
savehistory("myfile")   Save the command history to myfile (default = .Rhistory).
q()                     Quit R.
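The workspace commands can be tried together in a short session. This is a minimal sketch. The object names testA and testB are hypothetical, and getOption() is used to read one option instead of printing the full options() list.

```r
# Show where R currently reads and writes files
print(getwd())

# Create two objects, then list what is in the global workspace
testA <- 1:5
testB <- "hello"
print(ls())              # includes "testA" and "testB"

# Remove both objects and confirm they are gone
rm(testA, testB)
print(exists("testA"))   # FALSE

# Read a single option: the number of significant digits R prints
getOption("digits")
```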
4. Variables and Operators
• A variable can be thought of as storage that can hold
any R object, for example testValue <- 100.
• To assign a value to a variable, use the assignment
operator (<-), written as a less-than symbol followed
by a hyphen. It is advisable to put a space before and
after the assignment operator. The spacing is not
mandatory, but it improves the program's readability
and avoids unwanted confusion (for example, x<-3 could
be misread as x < -3).
• R variable names are case-sensitive. If we assign
100 to testValue and then type testvalue, R reports
the error "object 'testvalue' not found", because
testvalue is a different, undefined name.
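Here is the testValue example from the bullets above as a runnable sketch. tryCatch() captures the error message, so the script keeps running after the lookup of the undefined name fails.

```r
# Assign 100 to testValue using the assignment operator
testValue <- 100
print(testValue)   # [1] 100

# Names are case-sensitive: testvalue was never assigned
result <- tryCatch(testvalue, error = function(e) conditionMessage(e))
print(result)      # the "object 'testvalue' not found" message
```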
Operators
Arithmetic operators
Addition            +
Subtraction         -
Multiplication      *
Division            /
Exponent            ^
Modulus             %%
Integer division    %/%

Relational and logical operators
>     Greater than
>=    Greater than or equal to
<     Less than
<=    Less than or equal to
==    Equal to
!=    Not equal to
&&    Logical AND
||    Logical OR
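Each operator from the lists above is applied here to two sample values, a <- 7 and b <- 2. These are arbitrary values chosen for this sketch. The expected results are shown as comments.

```r
a <- 7
b <- 2

# Arithmetic operators
a + b     # 9
a - b     # 5
a * b     # 14
a / b     # 3.5
a ^ b     # 49
a %% b    # 1 (remainder)
a %/% b   # 3 (integer quotient)

# Relational operators return TRUE or FALSE
a >= b    # TRUE
a == b    # FALSE
a != b    # TRUE

# Logical operators combine single TRUE/FALSE values
(a > b) && (b > a)   # FALSE
(a > b) || (b > a)   # TRUE
```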
5. R Data Structures
R has a wide variety of objects for holding data,
including scalars, vectors, matrices, arrays, data frames,
and lists. They differ in the type of data they
can hold, how they are created, their structural
complexity, and the notation used to identify and
access individual elements. The main structures are:
Vector
Matrix
Array
Data Frame
List
5.1 Vector
Vectors are one-dimensional arrays that can hold numeric, character, or logical data.
In other words, a vector is a homogeneous data structure: all of its elements must be of the same type.
The combine function c() is used to create a vector of any of these types.
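Here is one vector of each type. The sketch also shows what "homogeneous" means in practice: if you mix types in c(), R coerces every element to a common type. The variable names are illustrative.

```r
# One vector of each type, built with the combine function c()
numericVector   <- c(1, 2, 5, 3, 6, -2, 4)
characterVector <- c("one", "two", "three")
logicalVector   <- c(TRUE, TRUE, FALSE, TRUE)

class(numericVector)     # "numeric"
class(characterVector)   # "character"
class(logicalVector)     # "logical"

# Vectors are homogeneous: mixing types coerces everything to a common type
mixed <- c(1, "a", TRUE)
class(mixed)             # "character"
mixed                    # "1" "a" "TRUE"

# Elements are accessed by position (indexing starts at 1)
numericVector[3]         # 5
numericVector[c(1, 3)]   # 1 5
```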
5.2 Matrix
In an R matrix, data is stored in rows and columns, and we can access an element using both its row index
and its column index (like a cell in an Excel sheet). A matrix is a homogeneous data structure: all of its
elements belong to the same class, and it is typically used to store numeric data. A matrix is created with the
matrix() function.
The syntax for creating a matrix in R is matrix(data, nrow, ncol, byrow, dimnames), where:
data       The elements of the matrix.
nrow       The number of rows to create.
ncol       The number of columns to create.
byrow      If TRUE, the elements are filled in by row; by default (FALSE) they are filled in by column.
dimnames   A list of names assigned to the rows and columns.
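Here matrix() is called with the parameters listed above. The sketch also contrasts byrow = TRUE with the default column-wise filling. The row and column names are arbitrary.

```r
# 2 x 3 matrix filled by row, with row and column names
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("A", "B", "C")))
print(m)
#    A B C
# r1 1 2 3
# r2 4 5 6

# The same data filled by column (the default, byrow = FALSE)
mByCol <- matrix(1:6, nrow = 2, ncol = 3)
print(mByCol)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Access elements by row and column index, or by name
m[2, 3]        # 6
m["r1", "B"]   # 2
m[1, ]         # entire first row
dim(m)         # 2 3
```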
5.3 Array
An array is a homogeneous data structure like matrices and vectors, so all of its items have the same type.
A vector is a one-dimensional array and a matrix is a two-dimensional array, but a general array can have any
number of dimensions.
The syntax for creating an array in R is myarray <- array(vector, dimensions, dimnames), where:
vector       The data for the array.
dimensions   A numeric vector giving the maximum index for each dimension.
dimnames     An optional list of dimension labels.
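This sketch builds a three-dimensional array with array(vector, dimensions, dimnames). The labels are made up for the example. R fills arrays in column-major order, so element [1, 2, 3] holds 1 + (2 - 1) * 2 + (3 - 1) * 6 = 15.

```r
# Labels for each of the three dimensions
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")

# A 2 x 3 x 4 array holding the numbers 1 to 24
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

dim(z)               # 2 3 4
z[1, 2, 3]           # 15
z["A1", "B2", "C3"]  # the same element, accessed by name
z[, , 1]             # the first 2 x 3 slice, itself a matrix
```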
5.4 Data Frame
A data frame is a heterogeneous data structure, which means it can contain elements of different classes, just
like a list. You can think of a data frame as a spreadsheet with several columns: each column is a field that holds
values of a single type, and each row holds one record.
A data frame is created with the data.frame() function: mydata <- data.frame(col1, col2, col3,…)
To view the first or last rows of the employees data frame, use head(employees) and
tail(employees) (six rows by default), or pass a row count, as in
head(employees, 2).
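Only the name employees comes from the slide. The columns and values below are hypothetical, invented so that head() and tail() have something to show. The sketch also demonstrates that each column can have a different type.

```r
# A hypothetical employees data frame; columns and values are made up for illustration
employees <- data.frame(
  id     = 1:8,
  name   = c("Asha", "Ben", "Chen", "Dina", "Eli", "Fatima", "Gopal", "Hana"),
  dept   = c("HR", "IT", "IT", "Sales", "HR", "IT", "Sales", "HR"),
  salary = c(52000, 61000, 58000, 49000, 55000, 67000, 51000, 53000),
  stringsAsFactors = FALSE
)

head(employees)      # first 6 rows (the default)
tail(employees)      # last 6 rows
head(employees, 2)   # first 2 rows only

# Columns can be of different types, which makes a data frame heterogeneous
sapply(employees, class)

# Select one column, filtered by a condition on another column
employees$salary[employees$dept == "IT"]   # 61000 58000 67000
```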
5.5 List
A list is a heterogeneous data structure: you can put items of different classes or types into it. Lists are
one-dimensional, so all elements of a list are arranged along a single dimension.
For example, a list may contain a combination of vectors, matrices, data frames, and even other lists.
You create a list with the list() function: mylist <- list(name1=object1, name2=object2, …)
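Here is a list that mixes a string, a vector, a matrix and a data frame, followed by the usual ways to reach its components. The component names are chosen for this sketch.

```r
# A list mixing a character string, a numeric vector, a matrix, and a data frame
mylist <- list(
  title  = "My First List",
  ages   = c(25, 26, 18, 39),
  scores = matrix(1:10, nrow = 5),
  staff  = data.frame(name = c("Asha", "Ben"), dept = c("HR", "IT"))
)

length(mylist)        # 4 components, arranged in one dimension
mylist$ages           # access a component by name with $
mylist[["title"]]     # or by name with [[ ]]
mylist[[3]][2, 1]     # third component (the matrix), row 2, column 1: 2
class(mylist[1])      # "list": single brackets return a sub-list
```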
6. Functions in R
A function is a set of statements organized together to perform a specific task. In R, a function is
an object, so the R interpreter can pass control to the function, along with any arguments the function needs
to accomplish its actions. The function in turn performs its task and returns control to the interpreter,
together with any result, which may be stored in other objects.
The basic syntax of an R function is
GetMultiplicationResult <- function(value1, value2) { function body }
Function name - GetMultiplicationResult
Keyword - function
Arguments - value1, value2
Body - the statements between the braces, which perform the task
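Only the name and arguments of GetMultiplicationResult appear in the text. This sketch assumes the body simply multiplies the two arguments.

```r
# Function name: GetMultiplicationResult; keyword: function; arguments: value1, value2
GetMultiplicationResult <- function(value1, value2) {
  # Assumed body: the value of the last expression is returned
  value1 * value2
}

GetMultiplicationResult(4, 5)                    # 20
GetMultiplicationResult(value2 = 3, value1 = 7)  # 21, arguments matched by name
```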
6.1 Calling a function with default arguments
We can give arguments values in the function definition and then call the function without supplying
those arguments to get a default result. An argument with such a value is known as a default argument, and it
does not have to be specified in the function call.
For example, a function GetFunctionWithDefaultArguments can give its two arguments the default
values 10 and 25, and it can then be called with no arguments at all.
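The defaults 10 and 25 come from the text. The body is hypothetical: this sketch assumes the function multiplies its arguments, so each call shows which defaults were used.

```r
# Hypothetical body: multiply the two arguments, which default to 10 and 25
GetFunctionWithDefaultArguments <- function(value1 = 10, value2 = 25) {
  value1 * value2
}

GetFunctionWithDefaultArguments()              # 250, both defaults used
GetFunctionWithDefaultArguments(2)             # 50, value2 keeps its default
GetFunctionWithDefaultArguments(value2 = 3)    # 30, value1 keeps its default
```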
6.2 Lazy Argument Evaluation
Arguments to functions are evaluated lazily, which means they are evaluated only when the function
body needs them. This is also very useful for default values of parameters: defaults are evaluated inside the
scope of the function, so you can write default values that depend on other parameters.
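This sketch shows both points about lazy evaluation. First, an argument the body never uses is never evaluated, even if it would raise an error. Second, a default can refer to another parameter. The function names are illustrative.

```r
# b is never used in the body, so it is never evaluated
UsesOnlyFirst <- function(a, b) {
  a * 2
}
UsesOnlyFirst(5)                               # 10, no error although b is missing
UsesOnlyFirst(5, stop("this is never run"))    # 10, the stop() call is never evaluated

# A default that depends on another parameter: height defaults to width
RectangleArea <- function(width, height = width) {
  width * height
}
RectangleArea(4)      # 16 (a square)
RectangleArea(4, 3)   # 12
```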
6.3 Multiple Return Values
So far we have looked at functions that return a single value, and we have seen that an explicit return
statement is not needed: by default, the value of the last expression evaluated in the function is returned.
However, if you need to return multiple values from a function, a neat way to do this is to collect them
in a list and return the list.
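This minimal sketch returns three summary values at once by packing them into a named list. The function name GetSummaryValues is hypothetical.

```r
# Return several results at once by packing them into a named list
GetSummaryValues <- function(values) {
  result <- list(mean = mean(values), min = min(values), max = max(values))
  return(result)
}

summaryValues <- GetSummaryValues(c(4, 8, 15, 16, 23, 42))
summaryValues$mean   # 18
summaryValues$min    # 4
summaryValues$max    # 42
```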
6.4 Functions as Objects
In R, functions are first-class objects, just like any other R object. This means you can inspect them,
assign them to other variables, and pass them as arguments to other functions.
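The three capabilities listed above are shown here on a small function: inspecting it, assigning it to another name, and passing it to other functions. Square and ApplyTwice are names invented for this sketch.

```r
Square <- function(x) x^2

# Assign the function to another variable and call it through that name
f <- Square
f(3)                   # 9

# Look inside the function
class(Square)          # "function"
formals(Square)        # its argument list
body(Square)           # its body: x^2

# Pass a function as an argument to another function
sapply(1:4, Square)    # 1 4 9 16
ApplyTwice <- function(fun, x) fun(fun(x))
ApplyTwice(Square, 3)  # 81
```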
7. About R Packages
An R package is a collection of items such as functions, datasets, compiled code, and documentation.
Each package is typically intended for a particular area and is used to accomplish certain tasks in that area.
Packages are available from central repositories, from which anyone can download them. When you download and
install a package on your local machine, its contents are copied from the repository and stored in a directory
(a library) on your machine.
To see a list of all packages installed on your machine, run the library() command with no arguments.
In RStudio, the results are displayed in a separate tab. You will see that several packages are already
installed; these are installed along with the base R version.
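library() with no arguments opens a listing window. To work with the same information in code, the functions .libPaths() and installed.packages() return it as R objects. This is a minimal sketch using those base functions.

```r
# Directories (libraries) where installed packages are stored on this machine
.libPaths()

# installed.packages() returns a matrix with one row per installed package
pkgs <- installed.packages()
nrow(pkgs)                           # how many packages are installed
head(pkgs[, c("Package", "Version")])

# Base packages such as stats come with every R installation
"stats" %in% rownames(pkgs)          # TRUE
```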
7.1 Load R Package
Why doesn't R load all installed packages when a session starts? R deliberately loads only a few standard
packages by default. When you install R on your local machine, only a few standard packages are installed, but
over time you may install many add-on packages. Loading all of them at startup would slow the session down, use
extra memory, and increase the chance that functions with the same name mask each other, so you load each
package when you need it.
search()                                 List all packages currently loaded and attached in the R session.
library("parallel")                      Load a package (here parallel) into memory so that its functions can be used;
                                         the quotes are optional, so library(parallel) also works.
detach(package:parallel, unload=TRUE)    Remove (detach and unload) the package from the session.
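The three commands above can be combined in one session. The parallel package ships with R, so nothing needs to be downloaded. detectCores() is used only as an example of a function that becomes available after loading.

```r
# Packages attached to the current session
search()

# Attach the parallel package and confirm it is on the search path
library(parallel)
"package:parallel" %in% search()    # TRUE
detectCores()                       # a function from parallel: number of CPU cores

# Detach and unload it again
detach(package:parallel, unload = TRUE)
"package:parallel" %in% search()    # FALSE
```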
7.2 Install R Package
To install a package, you first download it from a repository. A repository can contain many packages,
and you can download and install the ones you need from it. The main repository is CRAN, the Comprehensive
R Archive Network. It hosts thousands of packages, developed for a wide variety of application areas, and more
are added all the time.
To install a single package: install.packages("ggplot2")
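install.packages() also accepts a character vector, for example install.packages(c("ggplot2", "XLConnect")), to install several packages in one call. The helper below is hypothetical and not part of R. It is a sketch that installs only the packages that are missing, so scripts can be rerun safely. The example call uses packages that ship with R, so nothing is downloaded.

```r
# Hypothetical helper: install only the packages that are not already installed
InstallIfMissing <- function(packages) {
  installed <- rownames(installed.packages())
  missing <- setdiff(packages, installed)
  if (length(missing) > 0) {
    # install.packages() accepts a character vector, so all missing packages install in one call
    install.packages(missing)
  }
  invisible(missing)   # report which packages had to be installed
}

# stats and utils ship with R, so this call downloads nothing
InstallIfMissing(c("stats", "utils"))
```

Calling InstallIfMissing(c("ggplot2", "XLConnect")) would download whichever of the two is missing. That requires an internet connection and a CRAN mirror.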
8. Import Data
For any data analysis project in R, we first need to bring all the required data into the R environment
so that we can work on it. In real-world scenarios, data may come from a variety of sources and in
a variety of formats.
8.1 How to import data from CSV
CSV (comma-separated values) files are one of the most common file formats for storing data.
If you open a CSV file in a text editor such as Notepad, you will see that the values are separated by commas.
The file may start with a header row, and each following row holds one record whose values are separated
by commas.
employeeTrainingDetails.csv
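The contents of employeeTrainingDetails.csv are not shown, so this sketch first writes a small hypothetical CSV with made-up columns to a temporary file and then reads it back with read.csv(). For a real file, pass its path instead of csvPath.

```r
# Write a small sample CSV to a temporary file so the example is self-contained.
# The column names and values are hypothetical, not the contents of employeeTrainingDetails.csv.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "EmployeeId,Name,Training,Hours",
  "101,Asha,R Basics,8",
  "102,Ben,Data Visualization,6",
  "103,Chen,R Basics,8"
), csvPath)

# header = TRUE (the default for read.csv) uses the first row as column names
trainings <- read.csv(csvPath, header = TRUE, stringsAsFactors = FALSE)
str(trainings)
nrow(trainings)        # 3
sum(trainings$Hours)   # 22
```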
8.2 How to import data from XML?
XML (Extensible Markup Language) is another very common open format for storing data. Here is typical
XML data containing training information about two employees.
Trainingdetails.xml
8.3 How to import data from Excel?
R has several add-on packages for importing Excel files; here we use the XLConnect package. It works
on all major operating systems, such as Windows, macOS, and Linux, provided Java is installed on your machine,
and it can read both .xls and .xlsx files.
employeetraining.xlsx