R Programming
What is R?
• R is among the world's most widely used programming languages for statistics.
• R is a programming language and software environment for:
• Statistical analysis
• Graphical representation and reporting
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
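As a minimal sketch of those operators, the base R snippet below applies arithmetic to vectors, a matrix and a list. The variable names are illustrative only and are not taken from the slides.

```r
# Vectorized arithmetic: operators apply element by element
v <- c(1, 2, 3)
w <- c(10, 20, 30)
v_plus_w <- v + w          # element-wise sum
v_squared <- v^2           # element-wise power

# A 2 x 2 matrix, filled column by column
m <- matrix(1:4, nrow = 2)
m_elementwise <- m * m     # element-wise product
m_product <- m %*% m       # matrix product

# Lists can hold several vectors; sapply applies a function to each element
l <- list(a = 1:3, b = 4:6)
sums <- sapply(l, sum)
```

Note that `*` multiplies matrices element by element, while `%*%` performs matrix multiplication.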
History
• R began as an implementation of the S language. It was first designed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993.
• The latest stable release when these slides were written was dated October 31, 2014, about four months earlier. R is maintained by the R Development Core Team and distributed under the GNU General Public License.
Introduction
• R is a programming language and software environment for statistical computing and graphics.
• The R language is widely used among statisticians for developing statistical software and for data analysis.
• It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS.
• R can be downloaded and installed from the CRAN website. CRAN stands for Comprehensive R Archive Network.
R - Data Types
Primitive (or atomic) data types in R are:
• Numeric (integer, double, complex)
• Character
• Logical
• Function (a basic object type in R, although not an atomic vector type)
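One way to inspect these types is the base R function `typeof()`. The sketch below uses made-up values and collects the results in a named vector. It also shows that functions are ordinary objects but are not atomic vectors.

```r
# typeof() reports the internal storage type of a value
types <- c(
  integer   = typeof(1L),      # the L suffix makes an integer
  double    = typeof(1.5),
  complex   = typeof(2 + 3i),
  character = typeof("text"),
  logical   = typeof(TRUE)
)
print(types)

# Functions are objects too, but not atomic vectors
mean_is_function <- is.function(mean)
mean_is_atomic <- is.atomic(mean)
mean_type <- typeof(mean)      # functions written in R are closures
```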
Text Mining with R
• R is an open source language and environment for statistical computing and graphics. Its package ecosystem includes tm, SnowballC, ggplot2 and wordcloud, which are used to carry out the steps in text processing. The first prerequisite is that R and RStudio are installed on your machine.
Packages Used in Text Mining
• RSQLite, 'SQLite' interface for R
• tm, framework for text mining applications
• SnowballC, text stemming library
• wordcloud, for making word cloud visualizations
• syuzhet, text sentiment analysis
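Reading SQLite data in R
• comm <- read.csv("comments.csv", header = TRUE)  # load the comments file into a data frame
• docs <- Corpus(VectorSource(comm$comments))  # build a corpus from its comments column
• # Get all the emails sent by Hillary
• emailRaw <- paste(emailHillary$EmailBody, collapse = " // ")  # emailHillary is assumed to be a data frame with an EmailBody column; the slide does not show how it is loaded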
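The slide title mentions SQLite, while the calls shown read a CSV file. A minimal sketch of the SQLite route, using the DBI and RSQLite packages, might look like the following. The in-memory database, the `Emails` table and the `Sender` column are hypothetical stand-ins invented for this example; they are not the deck's actual data.

```r
library(DBI)

# In-memory database with a toy table, standing in for a real .sqlite file
con <- dbConnect(RSQLite::SQLite(), ":memory:")
emails <- data.frame(
  Sender = c("A", "B", "A"),
  EmailBody = c("first message", "second message", "third message"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "Emails", emails)

# Pull only the rows of interest back into a data frame
fromA <- dbGetQuery(
  con,
  "SELECT EmailBody FROM Emails WHERE Sender = 'A' ORDER BY rowid"
)
dbDisconnect(con)

# Collapse all bodies into one string, as the slide does with paste()
emailRaw <- paste(fromA$EmailBody, collapse = " // ")
```

For a database on disk, the path to the file would replace `":memory:"`.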
Cleaning Text in R
• install.packages("tm")
• install.packages("NLP")
• library("tm") - loads the text mining package
• docs <- Corpus(VectorSource(emailRaw)) - a corpus is a collection of text documents
Processing text in R
• docs <- tm_map(docs, content_transformer(tolower)) - converts all words to lower case
• docs <- tm_map(docs, removeNumbers) - removes numbers
• docs <- tm_map(docs, removeWords, stopwords("english")) - removes stop words such as "the", "is", "of"
• docs <- tm_map(docs, removePunctuation) - removes punctuation
• docs <- tm_map(docs, stripWhitespace) - removes extra white space
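The cleaning steps can be tried end to end on a toy corpus. This is a sketch under stated assumptions: the two sentences are made up, and `VCorpus` is used in place of the slides' `Corpus` call (both are tm constructors) so that each document can be read back with `as.character`.

```r
library(tm)

# Two made-up documents standing in for the email text
text <- c("The Meeting is at 10 AM,  in Room 5!", "Send the 2 reports.")
docs <- VCorpus(VectorSource(text))

docs <- tm_map(docs, content_transformer(tolower))       # lower case
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop stop words
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse runs of spaces

# Read the cleaned text back as a character vector
cleaned <- sapply(docs, as.character)
print(cleaned)
```

The order matters: stop words are matched in lower case, so `tolower` runs before `removeWords`.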
SnowballC to Stem Text
• # Text stemming (reduces words to their root form)
• library("SnowballC")
• docs <- tm_map(docs, stemDocument)
• # Remove additional stopwords
• docs <- tm_map(docs, removeWords, c("clintonemailcom", "stategov", "hrod"))
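Word Frequencies from a Term-Document Matrix
• dtm <- TermDocumentMatrix(docs)  # rows are terms, columns are documents
• m <- as.matrix(dtm)
• v <- sort(rowSums(m), decreasing = TRUE)  # total count per term, most frequent first
• d <- data.frame(word = names(v), freq = v)
• head(d, 10)  # the ten most frequent terms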
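The same frequency table can be reproduced on a small, made-up corpus where the counts are easy to check by hand. `VCorpus` is an assumption of this sketch; the slides build the corpus with `Corpus`.

```r
library(tm)

# Three tiny documents: "apple" occurs 3 times, "banana" 2, "cherry" 1
docs <- VCorpus(VectorSource(c("apple banana apple", "banana cherry", "apple")))

dtm <- TermDocumentMatrix(docs)           # one row per term, one column per document
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)  # total count of each term
d <- data.frame(word = names(v), freq = v)
print(head(d, 10))
```

By default tm keeps only terms of at least three characters, so very short words do not appear in the matrix.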
Visualizations
• # Wordcloud
• Uses two libraries - wordcloud and RColorBrewer
• # Sentiment Analysis
• Uses library - syuzhet
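A minimal sketch of both visualization steps follows. The word frequencies and the two sentences are made up for illustration, and the plotting parameters (`min.freq`, `max.words`, the "Dark2" palette) are one reasonable choice, not settings taken from the deck.

```r
library(wordcloud)
library(RColorBrewer)
library(syuzhet)

# Toy word frequencies (made up for illustration)
d <- data.frame(word = c("email", "state", "call", "meet"),
                freq = c(12, 9, 5, 3))

set.seed(1234)  # word placement is random; a fixed seed gives a repeatable layout
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words = 100,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))

# One sentiment score per sentence: above zero is positive, below zero is negative
sentences <- c("I love this wonderful plan.", "This was a terrible mistake.")
scores <- get_sentiment(sentences, method = "syuzhet")
print(scores)
```

With `random.order = FALSE`, the most frequent words are drawn first and end up near the centre of the cloud.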

Data Mining with R programming

Editor's Notes

  1. Old programming. No multithreading. Data is loaded directly into memory, which limits functionality for larger datasets. Sandbox…subsample data. Microsoft working on multicore R. H2O.