2. Course Outline
• Introduction
– Different Statistical Software
• Data Preparation, Management, Manipulation,
Summarization with:
– SPSS
– R (R Commander)
– MS Excel
• Data Tabulation and Visualization
Computational Statistics 2
3. Course Outline
• Generating Different Statistical Distributions (with
Rcmdr)
• Simple Linear Regression and Correlation
• Basic R Programming
• Developing Simple Graphical User Interface in R
• Resampling Methods
• Statistical Inference (Point and interval
estimation)
4. Course Outline
• Hypothesis testing: one- and two-sample t-tests (tests
for mean difference, proportion and variance)
• Analysis of Variance (ANOVA): one-way and two-way
ANOVA.
• Introduction to Design of Experiments
• Final Project
5. Course Workload
• 20% Theory, 80% practice
• Group Project (5 students)
• Presentation every week
• R code will be provided
• Slides are available at:
http://www.slideshare.net/hafidztio/
7. Reference Books
• John Maindonald and W. John Braun. Data Analysis and
Graphics Using R: An Example-Based Approach. 3rd
Edition. Cambridge University Press: Cambridge, 2010.
• John Fox. The R Commander: A Basic-Statistics Graphical
User Interface to R. Journal of Statistical Software,
Volume 14, Issue 9, September 2005.
• Chris Beeley. Web Application Development with R
Using Shiny. Packt Publishing: Birmingham, 2013.
• SPSS Statistics Base User’s Guide 17.0. Polar
Engineering and Consulting: Chicago, 2007.
8. Reference Books
• Jurusan Komputasi Statistik (Department of Computational
Statistics). Modul Mata Kuliah Komputasi Statistik
(Computational Statistics Course Module). 2014.
• G. Jay Kerns. Introduction to Probability and Statistics
Using R. E-book. GNU Free Documentation License.
2010.
• Geof H. Givens and Jennifer A. Hoeting. Computational
Statistics, 2nd edition. John Wiley and Sons: New
Jersey, 2013.
• Jochen Voss. Statistical Computing. E-book. 2011.
• Brent B. Welch, Ken Jones and Jeffrey Hobbs. Practical
Programming in Tcl and Tk. 4th edition. Prentice Hall
PTR: New Jersey, 2003.
13. What is Statistics?
• Statistics is the science which deals with the
collection, classification and tabulation of
numerical facts as the basis for the explanation,
description and comparison of phenomena.
14. Observations on the Bills of Mortality (1662)
Recorded plague-related deaths for 100 years
15. What is Statistics?
• Exploring data: Using graphical and numerical
techniques to study patterns and departures from
patterns (in order to interpret data)
• Sampling and experimentation: Clarifying the
question, deciding on methods of collection and analysis
to produce valid information.
• Anticipating patterns: Exploring random phenomena
using probability and simulation. Probability is our tool for
anticipating distributions...
• Statistical Inference: Estimating population parameters
and testing hypotheses
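As a small illustration of "exploring random phenomena using probability and simulation", the sketch below simulates tosses of a fair coin in R and tracks how the proportion of heads settles near the true probability. The seed, the number of tosses, and the variable names are arbitrary choices made for this example.

```r
# Simulate 10,000 tosses of a fair coin (1 = heads, 0 = tails).
set.seed(123)                      # arbitrary seed, only for reproducibility
n_tosses <- 10000
tosses <- rbinom(n_tosses, size = 1, prob = 0.5)

# Running proportion of heads after each toss.
running_prop <- cumsum(tosses) / seq_len(n_tosses)

# Overall proportion of heads; it should be close to 0.5.
prop_heads <- mean(tosses)

# Look at the pattern after 10, 100, 1000 and 10000 tosses.
print(running_prop[c(10, 100, 1000, 10000)])
```

Plotting `running_prop` with `plot(running_prop, type = "l")` shows the same pattern graphically.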
16. “Statistical thinking will one day be as
necessary for efficient citizenship as the
ability to read and write.” H. G. Wells
17. Areas of Statistics
Two areas of statistics:
Descriptive Statistics: collection, presentation,
and description of sample data.
Inferential Statistics: making decisions and
drawing conclusions about populations.
18. Descriptive Statistics
What is your conclusion?
The fatality rate is:
– 40% in the group of drivers who did not wear seat belts
– 20% in the group of drivers who did wear seat belts
• Seat belts appear to save lives
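The fatality rates above can be reproduced from a table of counts. The counts below are hypothetical, invented only so that the rates come out at 40% and 20%; the slide itself does not give the underlying numbers.

```r
# Hypothetical crash counts, chosen so the rates match the slide.
crashes <- matrix(c(80, 120,
                    40, 160),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(seat_belt = c("not worn", "worn"),
                                  outcome   = c("died", "survived")))

# Row-wise proportions give the outcome distribution within each group.
rates <- prop.table(crashes, margin = 1)

# Fatality rate per group, as a proportion.
fatality_rate <- rates[, "died"]
print(100 * fatality_rate)   # fatality rate in percent
```

This is purely descriptive: it summarizes the sample and says nothing yet about drivers in general.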
19. Inferential Statistics
• Are results applicable to the population of all drivers?
(generalization)
• Does wearing seat belts save lives? (assess strength of
evidence)
• Is the fatality rate of those not wearing seat belts higher than
the fatality rate of those wearing seat belts? (comparison)
• How many lives can be saved by wearing seat belts?
(prediction)
• Do other variables influence the conclusion? For example:
the age of driver, alcohol use, type of car, speed at impact
(ask more questions)
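The comparison question can be approached with a two-sample test of proportions. The sketch below uses `prop.test()` from base R on hypothetical counts (80 deaths among 200 unbelted drivers, 40 among 200 belted drivers); these numbers are assumptions for illustration, not data from the slides.

```r
# Hypothetical counts: deaths and number of drivers in each group.
deaths  <- c(not_worn = 80, worn = 40)
drivers <- c(not_worn = 200, worn = 200)

# Comparison: is the fatality rate higher for drivers without seat belts?
test <- prop.test(deaths, drivers, alternative = "greater")
print(test$p.value)

# Interval estimate for the difference in fatality rates
# (first group minus second group, 95% confidence by default).
ci <- prop.test(deaths, drivers)$conf.int
print(ci)
```

A small p-value supports the claim that the rates differ in the population, but it does not settle the last bullet: other variables such as age or speed would need a richer model.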
20. Statistics and the Technology
• Electronic technology has had a tremendous effect
on the field of statistics.
• Many statistical techniques are repetitive in nature:
computers and calculators are good at this.
• Lots of statistical software packages: R, MINITAB,
SYSTAT, STATA, SAS, Statgraphics, SPSS, MS Excel,
and calculators.
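One example of a repetitive technique that is tedious by hand but trivial for a computer is the bootstrap, a resampling method. The sketch below recomputes a sample mean on 2000 resamples; the data values, the seed, and the number of resamples are arbitrary assumptions for illustration.

```r
set.seed(42)                                    # arbitrary seed
x <- c(12, 15, 9, 20, 17, 11, 14, 18, 13, 16)   # hypothetical sample

# Resample with replacement many times and recompute the mean each time.
n_boot <- 2000
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# Bootstrap standard error and a 95% percentile interval for the mean.
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, c(0.025, 0.975))
print(boot_se)
print(boot_ci)
```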
32. R is HOT!
http://r4stats.com/articles/popularity/
35. What is R?
• A language and environment for statistical computing and
graphics.
• An integrated suite of software facilities for data
manipulation, calculation and graphical display.
• Introduced in 1996 by Prof. Ross Ihaka and Robert
Gentleman of the University of Auckland, NZ.
• GNU software, so it is free. Similar to the S language.
• Open source, maintained and developed by a community
of developers.
• Works on Windows, Unix, and Mac OS
36. R includes
• Effective data handling and storage facility,
• A suite of operators for calculations on arrays, in particular
matrices
• A large, coherent, integrated collection of intermediate
tools for data analysis,
• Graphical facilities for data analysis and display either on-
screen or on hardcopy
• Well-developed, simple and effective programming
language which includes conditionals, loops, user-defined
recursive functions and input and output facilities.
http://www.r-project.org/
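A few of these facilities can be seen in a handful of lines. The sketch below is only an illustration; the object and function names (`A`, `B`, `factorial_rec`, `squares`) are made up for this example.

```r
# Operators on arrays and matrices.
A <- matrix(1:4, nrow = 2)     # 2 x 2 matrix, filled column by column
B <- diag(2)                   # 2 x 2 identity matrix
doubled <- A * 2               # element-wise multiplication
product <- A %*% B             # matrix product; equals A because B is the identity

# A user-defined recursive function with a conditional.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop.
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

# Output facilities.
cat("5! =", factorial_rec(5), "\n")   # prints: 5! = 120
```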
37. Why R?
• It is not only statistical software but
also a language
• 5000 add-on packages: lots of pre-prepared packages
(http://cran.r-project.org/web/packages/)
• With many applications:
http://cran.r-project.org/web/views/,
http://www.revolutionanalytics.com/r-language-features-applications-and-extensions#thirdparty
• Access to powerful, cutting-edge analytics
38. Why R?
• Flexible (complex or standard statistical practices, Bayesian
modelling, GIS map building, building interactive web
applications, building interactive tests, etc. )
• We can make our own package and publish it
• Great Graphics and data visualization
• Can be used on high-performance computing clusters
• Well supported by the R community
(http://www.inside-r.org/r-resources-web)
• And many more…..
39. Why R?
• Can be integrated with other languages (C/C++,
Java).
• R can interact with many data sources and other
statistical packages (SAS, Stata, SPSS, and Minitab).
• For high-performance computing tasks, R can use
multiple cores, either on a single machine or across a
network.
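As a rough sketch of the multi-core point, the base `parallel` package can spread independent tasks over several worker processes on one machine. The task (`slow_task`), the number of workers (2), and the seed are assumptions chosen for illustration; `detectCores()` reports how many cores are actually available.

```r
library(parallel)

# A deliberately simple stand-in for an expensive task:
# the mean of 10,000 standard normal draws.
slow_task <- function(i) {
  mean(rnorm(1e4))
}

# Start two local worker processes.
cl <- makeCluster(2)

# Give the workers reproducible, independent random number streams.
clusterSetRNGStream(cl, 123)

# Run the task 8 times, distributed across the workers.
results <- parSapply(cl, 1:8, slow_task)

# Always shut the workers down when done.
stopCluster(cl)

print(results)
```

The same code can target several machines by passing host names to `makeCluster()`, which matches the "across a network" case.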
40. But…..
• R has no warranty
• Command Line Interface : difficult for some users.
• Users must learn a new way of thinking about data
and the sequence of data analysis
• That’s all ….. I guess
41. Companies using R in 2013
• The New York Times routinely uses R for interactive and print data
visualization.
• Google has more than 500 R users.
• The FDA supports the use of R for clinical trials of new drugs.
• The National Weather Service uses R to predict the extent of flooding
events.
• Zillow uses R to model housing prices.
• The Consumer Financial Protection Bureau uses R and other open
source tools.
• Twitter uses R for data science applications on the Twitter database.
• FourSquare uses R to develop its recommendation engine.
• Facebook uses R to model all sorts of user behaviour.
Source: Revolution Analytics
42. R Library/packages
R Base Packages
• Example packages: lme4, IsoGene, foreign, survival, zoo,
ggplot2, reshape2, nlme
43. My R Packages
• IsoGene
• IsoGeneGUI
• nea
• neaGUI
• biclustGUI
• OCRME
• More detail: http://setiopramono.wordpress.com/r-
programming/
44. R for Cutting-Edge Technologies
45. R Graphics and Visualization
• R provides a wide range of graphics and visualizations
• Basic plots: bar plots, basic 3D plots, heatmaps, etc.
• Geographic Maps
• Projection Maps
• Social Network Graphs
• Animated graphics and movies (animation)
• Motion Charts (googleVis)
• Interactive Graphics (rggobi)
• Image formats: BMP, JPEG, PDF, PNG, etc.
• and….many more………
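A basic plot can be written to one of these file formats by opening a graphics device, drawing, and closing the device. The sketch below saves a bar plot as a PDF in a temporary directory; the file name and the use of the built-in `mtcars` data set are choices made for this example. The `png()`, `jpeg()` and `bmp()` devices follow the same open, draw, close pattern.

```r
# Where to write the plot (a temporary directory, for illustration).
out_file <- file.path(tempdir(), "cylinders_barplot.pdf")

# Open a PDF graphics device; width and height are in inches.
pdf(out_file, width = 7, height = 5)

# Draw a bar plot of the number of cars per cylinder count.
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts,
        main = "Cars by number of cylinders (mtcars)",
        xlab = "Cylinders", ylab = "Count")

# Close the device so the file is actually written.
dev.off()
```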
50. R Graphical User Interfaces
• R uses a command-line interface, which advanced users
prefer: it allows direct control, is more accurate and
flexible, and makes the analysis reproducible.
• It requires good knowledge of the language, which is
difficult for beginners or infrequent users.
• R provides tools for building GUIs (R GUIs).
51. R GUI Projects
• Integrated development environments (IDEs)/script
editors aim to provide feature-rich environments for
editing R scripts and code: RStudio (www.rstudio.com)
and Architect (www.Openanalytics.eu)
• Web-based applications: Rweb (Banfield, 1999),
R.Net (www.u.arizona.edu/~ryckman/Net.php),
or gWidgetsWWW (Verzani, 2012).
52. R GUI Projects
• Python: OpenMeta-Analyst (Wallace et al, 2012)
• Java: JGR (Java GUI for R), Deducer (Fellows, 2012),
and Glotaran (Snellenburg, 2012).
• PHP: R-php (http://dssm.unipa.it/R-php/)
• Other extensions connect R to graphical toolboxes for
developing menus and dialog boxes: tcltk, Gtk.
53. RStudio
• Download from Rstudio.com
• Powerful IDE (Integrated Development Environment) for R.
60. RGUI: Shiny
• A new package from RStudio to build interactive web
applications with R.
• Really Easy!
• Build useful web applications with only a few lines of
code—no JavaScript required.
• Self-learning: http://shiny.rstudio.com/
• http://www.showmeshiny.com/
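To give a feel for "only a few lines of code", here is a minimal sketch of a Shiny app, assuming the `shiny` package is installed. The input and output IDs (`n`, `hist`) and the app's content are invented for this example; it is not the FAST application.

```r
library(shiny)

# User interface: a title, a slider for the sample size, and a plot area.
ui <- fluidPage(
  titlePanel("Histogram of random normal values"),
  sliderInput("n", "Sample size:", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Value")
  })
}

# Combine both parts into an app object.
app <- shinyApp(ui = ui, server = server)

# Launch the app only in an interactive R session.
if (interactive()) {
  runApp(app)
}
```

Moving the slider re-runs the `renderPlot()` expression automatically; no JavaScript is written by hand.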
61. RGUI using Shiny: FAST
Figure 5. FAST main page
63. Want to Learn R? Need Help?
Lots of self-learning resources:
http://www.rdatamining.com/resources/onlinedocs
Blogs:
Software   Number of blogs   Blog source
R          550               R-Bloggers.com
Python     60                SciPy.org
SAS        40                PROC-X.com, sasCommunity.org Planet
Stata      11                Stata-Bloggers.com
User groups: Stockholm R User Group, etc.
Indonesia/Jakarta?
https://sites.google.com/site/biostatinfocore/introduction-to-r