SlideShare uma empresa Scribd logo
1 de 14
Baixar para ler offline
Data Analysis Course
Basics & Terminology(Version-1)
 Venkat Reddy
Data Analysis Course
• Data analysis design document
•
•   Descriptive statistics
•   Data exploration, validation & sanitization




                                                               2
                                                                 Venkat Reddy
                                                          Data Analysis Course
•   Probability distributions examples and applications
•   Simple correlation and regression analysis
•   Multiple liner regression analysis
•   Logistic regression analysis
•   Testing of hypothesis
•   Clustering and decision trees
•   Time series analysis and forecasting
•   Credit Risk Model building-1
•   Credit Risk Model building-2
Note
• This presentation is just class notes. The course notes for Data
  Analysis Training is by written by me, as an aid for myself.
• The best way to treat this is as a high-level summary; the
  actual session went more in depth and contained other




                                                                          3
                                                                            Venkat Reddy
                                                                     Data Analysis Course
  information.
• Most of this material was written as informal notes, not
  intended for publication
• Please send questions/comments/corrections to
  venkat@trenwiseanalytics.com or 21.venkat@gmail.com
• Please check my website for latest version of this document
                                         -Venkat Reddy
What is “Statistics”?
• Statistics is the science of data that involves:
  •   Collecting
  •   Classifying
  •   Summarizing
  •   Organizing and




                                                                               Venkat Reddy
                                                                        Data Analysis Course
  •   Interpretation
Of numerical information.
• Examples:
  •   Cricket batting averages
  •   Stock price
  •   Climatology data such as rainfall amounts, average temperatures
  •   Marketing information
                                                                               4
  •   Gambling?
Key Terms
• What is Data?
   • facts or information that is relevant or appropriate to a decision
     maker
• Population?
   • the totality of objects under consideration




                                                                                 Venkat Reddy
                                                                          Data Analysis Course
• Sample?
   • a portion of the population that is selected for analysis
• Parameter?
   • a summary measure (e.g., mean) that is computed to describe a
     characteristic of the population
• Statistic?
   • a summary measure (e.g., mean) that is computed to describe a
     characteristic of the sample                                                5
Variables
• Traits or characteristics that can change values from case to
  case.
• Examples:
  •   Age




                                                                         Venkat Reddy
                                                                  Data Analysis Course
  •   Gender
  •   Income
  •   Social class




                                                                         6
Types Of Variables
• In causal relationships:
          CAUSE               EFFECT
   independent variable  dependent variable
• Independent variable: is a variable that can be controlled or




                                                                          Venkat Reddy
                                                                   Data Analysis Course
  manipulated.
• Dependent variable: is a variable that cannot be controlled or
  manipulated. Its values are predicted from the independent
  variable.
• Discrete variables are measured in units that cannot be
  subdivided. Example: Number of children
• Continuous variables are measured in a unit that can be
  subdivided infinitely. Example: Height                                  7
Lab
•   Print product sales data
•   What are cause variables, what are effect variables
•   Identify the continuous & discrete variables
•   What is the population




                                                                              Venkat Reddy
                                                                       Data Analysis Course
•   Filter data and pick a sample
•   Calculate a parameter (Mean of the population)
•   Calculate a statistic
•   How close is the statistics to parameter? Is it a good estimate?
•   Self study: Randomly pick 10 samples, calculate mean for each
    sample. Find the mean of the means & see whether it is a
    good estimate of the population mean                                      8
Descriptive Statistics
•   Gives us the overall picture about data
•   Presents data in the form of tables, charts and graphs
•   Includes summary data
•   Avoids inferences




                                                                    Venkat Reddy
                                                             Data Analysis Course
•   Examples:
    • Measures of central location
       • Mean, median, mode and midrange
    • Measures of Variation
       • Variance, Standard Deviation, z-scores



                                                                    9
Details later
Lab
•   Download product sales data
•   Run proc means to print the descriptive statistics
•   Run proc univariate to print the descriptive statistics
•   Identify Measures of central location




                                                                     Venkat Reddy
                                                              Data Analysis Course
•   Identify Measures of variation




                                                                 10
Inferential Statistics
• Take decision on overall population using a sample
• “Sampled” data are incomplete but can still be representative
  of the population
• Permits the making of generalizations (inferences) about the




                                                                         Venkat Reddy
                                                                  Data Analysis Course
  data
• Probability theory is a major tool used to analyze sampled
  data




-Details later
                                                                      11
Predictive Modeling
• The science of predicting future outcomes based on historical
  events.
• Model Building: “Developing set of equations or
  mathematical formulation to forecast future
  behaviors based on current or historical data.”




                                                                         Venkat Reddy
                                                                  Data Analysis Course
• Regression, logistic Regression, time series analysis etc.,




                                                                     12
-Details later
Statistical Computer Packages

 Typical Software
   •   SAS
   •   R
   •   SPSS
   •   MINITAB
   •   Excel
Venkat Reddy Konasani
Manager at Trendwise Analytics
venkat@TrendwiseAnalytics.com




                                      14
21.venkat@gmail.com




                                        Venkat Reddy
                                 Data Analysis Course
+91 9886 768879

Mais conteúdo relacionado

Mais de Venkata Reddy Konasani

Mais de Venkata Reddy Konasani (20)

Transformers 101
Transformers 101 Transformers 101
Transformers 101
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
 
Neural Network Part-2
Neural Network Part-2Neural Network Part-2
Neural Network Part-2
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Neural Networks made easy
Neural Networks made easyNeural Networks made easy
Neural Networks made easy
 
Decision tree
Decision treeDecision tree
Decision tree
 
Step By Step Guide to Learn R
Step By Step Guide to Learn RStep By Step Guide to Learn R
Step By Step Guide to Learn R
 
Credit Risk Model Building Steps
Credit Risk Model Building StepsCredit Risk Model Building Steps
Credit Risk Model Building Steps
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
 
Testing of hypothesis case study
Testing of hypothesis case study Testing of hypothesis case study
Testing of hypothesis case study
 
L101 predictive modeling case_study
L101 predictive modeling case_studyL101 predictive modeling case_study
L101 predictive modeling case_study
 
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau -  Data, Graphs, Filters, Dashboards and Advanced featuresLearning Tableau -  Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Online data sources for analaysis
Online data sources for analaysis Online data sources for analaysis
Online data sources for analaysis
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
R- Introduction
R- IntroductionR- Introduction
R- Introduction
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitization
 

Último

Último (20)

On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 

Introduction to Statistical Analysis

  • 1. Data Analysis Course Basics & Terminology(Version-1) Venkat Reddy
  • 2. Data Analysis Course • Data analysis design document • • Descriptive statistics • Data exploration, validation & sanitization 2 Venkat Reddy Data Analysis Course • Probability distributions examples and applications • Simple correlation and regression analysis • Multiple liner regression analysis • Logistic regression analysis • Testing of hypothesis • Clustering and decision trees • Time series analysis and forecasting • Credit Risk Model building-1 • Credit Risk Model building-2
  • 3. Note • This presentation is just class notes. The course notes for Data Analysis Training is by written by me, as an aid for myself. • The best way to treat this is as a high-level summary; the actual session went more in depth and contained other 3 Venkat Reddy Data Analysis Course information. • Most of this material was written as informal notes, not intended for publication • Please send questions/comments/corrections to venkat@trenwiseanalytics.com or 21.venkat@gmail.com • Please check my website for latest version of this document -Venkat Reddy
  • 4. What is “Statistics”? • Statistics is the science of data that involves: • Collecting • Classifying • Summarizing • Organizing and Venkat Reddy Data Analysis Course • Interpretation Of numerical information. • Examples: • Cricket batting averages • Stock price • Climatology data such as rainfall amounts, average temperatures • Marketing information 4 • Gambling?
  • 5. Key Terms • What is Data? • facts or information that is relevant or appropriate to a decision maker • Population? • the totality of objects under consideration Venkat Reddy Data Analysis Course • Sample? • a portion of the population that is selected for analysis • Parameter? • a summary measure (e.g., mean) that is computed to describe a characteristic of the population • Statistic? • a summary measure (e.g., mean) that is computed to describe a characteristic of the sample 5
  • 6. Variables • Traits or characteristics that can change values from case to case. • Examples: • Age Venkat Reddy Data Analysis Course • Gender • Income • Social class 6
  • 7. Types Of Variables • In causal relationships: CAUSE  EFFECT independent variable  dependent variable • Independent variable: is a variable that can be controlled or Venkat Reddy Data Analysis Course manipulated. • Dependent variable: is a variable that cannot be controlled or manipulated. Its values are predicted from the independent variable. • Discrete variables are measured in units that cannot be subdivided. Example: Number of children • Continuous variables are measured in a unit that can be subdivided infinitely. Example: Height 7
  • 8. Lab • Print product sales data • What are cause variables, what are effect variables • Identify the continuous & discrete variables • What is the population Venkat Reddy Data Analysis Course • Filter data and pick a sample • Calculate a parameter (Mean of the population) • Calculate a statistic • How close is the statistics to parameter? Is it a good estimate? • Self study: Randomly pick 10 samples, calculate mean for each sample. Find the mean of the means & see whether it is a good estimate of the population mean 8
  • 9. Descriptive Statistics • Gives us the overall picture about data • Presents data in the form of tables, charts and graphs • Includes summary data • Avoids inferences Venkat Reddy Data Analysis Course • Examples: • Measures of central location • Mean, median, mode and midrange • Measures of Variation • Variance, Standard Deviation, z-scores 9 Details later
  • 10. Lab • Download product sales data • Run proc means to print the descriptive statistics • Run proc univariate to print the descriptive statistics • Identify Measures of central location Venkat Reddy Data Analysis Course • Identify Measures of variation 10
  • 11. Inferential Statistics • Take decision on overall population using a sample • “Sampled” data are incomplete but can still be representative of the population • Permits the making of generalizations (inferences) about the Venkat Reddy Data Analysis Course data • Probability theory is a major tool used to analyze sampled data -Details later 11
  • 12. Predictive Modeling • The science of predicting future outcomes based on historical events. • Model Building: “Developing set of equations or mathematical formulation to forecast future behaviors based on current or historical data.” Venkat Reddy Data Analysis Course • Regression, logistic Regression, time series analysis etc., 12 -Details later
  • 13. Statistical Computer Packages Typical Software • SAS • R • SPSS • MINITAB • Excel
  • 14. Venkat Reddy Konasani Manager at Trendwise Analytics venkat@TrendwiseAnalytics.com 14 21.venkat@gmail.com Venkat Reddy Data Analysis Course +91 9886 768879