SlideShare uma empresa Scribd logo
1 de 11
Baixar para ler offline
Course notes:
Descriptive
statistics
Course notes:
Descriptive
statistics
COURSE NOTES: INFERENTIAL
STATISTICS
Distributions
In statistics, when we talk about distributions we usually mean
probability distributions.
Definition (informal): A distribution is a function that shows
the possible values for a variable and how often they occur.
Definition (Wikipedia): In probability theory and statistics, a
probability distribution is a mathematical function that, stated
in simple terms, can be thought of as providing the
probabilities of occurrence of different possible outcomes in
an experiment.
Examples: Normal distribution, Student’s T distribution, Poisson
distribution, Uniform distribution, Binomial distribution
Graphical representation
It is a common mistake to believe that the distribution is the
graph. In fact the distribution is the ‘rule’ that determines how
values are positioned in relation to each other.
Very often, we use a graph to visualize the data. Since
different distributions have a particular graphical
representation, statisticians like to plot them.
Examples:
Uniform distribution Binomial distribution
Normal distribution Student’s T distribution
Definition
The Normal Distribution
The Normal distribution is also known as Gaussian
distribution or the Bell curve. It is one of the most
common distributions due to the following reasons:
• It approximates a wide variety of random variables
• Distributions of sample means with large enough
samples sizes could be approximated to normal
• All computable statistics are elegant
• Heavily used in regression analysis
• Good track record
Examples:
• Biology. Most biological measures are normally
distributed, such as: height; length of arms, legs,
nails; blood pressure; thickness of tree barks, etc.
• IQ tests
• Stock market information
𝑁~(𝜇, 𝜎2
)
N stands for normal;
~ stands for a distribution;
μ is the mean;
𝜎2
is the variance.
The Normal Distribution
0
𝝁 = 𝟕𝟒𝟑
Origin
0
𝝁 = 𝟒𝟕𝟎
𝝈 = 𝟏𝟒𝟎
0
𝝁 = 𝟗𝟔𝟎
𝝈 = 𝟏𝟒𝟎 𝝈 = 𝟏𝟒𝟎
Controlling for the standard deviation
Keeping the standard deviation constant, the graph
of a normal distribution with:
• a smaller mean would look in the same way, but
be situated to the left (in gray)
• a larger mean would look in the same way, but
be situated to the right (in red)
The Normal Distribution
0
𝝁 = 𝟕𝟒𝟑
Origin
𝝈 = 𝟏𝟒𝟎
0
0
𝝁 = 𝟕𝟒𝟑
𝝁 = 𝟕𝟒𝟑
𝝈 = 𝟕𝟎
𝝈 = 𝟐𝟏𝟎
Controlling for the mean
Keeping the mean constant, a normal distribution
with:
• a smaller standard deviation would be situated in
the same spot, but have a higher peak and
thinner tails (in red)
• a larger standard deviation would be situated in
the same spot, but have a lower peak and fatter
tails (in gray)
The Standard Normal Distribution
𝑁~(0,1)
The Standard Normal distribution is a particular case
of the Normal distribution. It has a mean of 0 and a
standard deviation of 1.
Every Normal distribution can be ‘standardized’ using
the standardization formula:
𝑧 =
𝑥 − 𝜇
𝜎
A variable following the Standard Normal
distribution is denoted with the letter z.
Why standardize?
Standardization allows us to:
• compare different normally distributed datasets
• detect normality
• detect outliers
• create confidence intervals
• test hypotheses
• perform regression analysis
Rationale of the formula for standardization:
We want to transform a random variable from 𝑁~ μ, 𝜎2
to 𝑁~(0,1). Subtracting the mean from all observations would cause a
transformation from 𝑁~ μ, 𝜎2
to 𝑁~ 0, 𝜎2
, moving the graph to the origin. Subsequently, dividing all observations by the standard
deviation would cause a transformation from 𝑁~ 0, 𝜎2
to 𝑁~ 0,1 , standardizing the peak and the tails of the graph.
The Central Limit Theorem
The Central Limit Theorem (CLT) is one of the greatest statistical insights. It states that no matter the underlying distribution of the
dataset, the sampling distribution of the means would approximate a normal distribution. Moreover, the mean of the sampling
distribution would be equal to the mean of the original distribution and the variance would be n times smaller, where n is the size of the
samples. The CLT applies whenever we have a sum or an average of many variables (e.g. sum of rolled numbers when rolling dice).
The theorem Why is it useful? Where can we see it?
➢ No matter the distribution
➢ The distribution of 𝑥1, 𝑥2, 𝑥3, 𝑥4,
… , 𝑥𝑘 would tend to 𝑁~ μ,
𝜎2
𝑛
➢ The more samples, the closer to
Normal ( k -> ∞ )
➢ The bigger the samples, the closer
to Normal ( n -> ∞ )
The CLT allows us to assume normality
for many different variables. That is very
useful for confidence intervals,
hypothesis testing, and regression
analysis. In fact, the Normal distribution
is so predominantly observed around
us due to the fact that following the
CLT, many variables converge to
Normal.
Since many concepts and events are a
sum or an average of different effects,
CLT applies and we observe normality
all the time. For example, in regression
analysis, the dependent variable is
explained through the sum of error
terms.
Click here for a CLT simulator.
Estimators and Estimates
Estimators Estimates
Broadly, an estimator is a mathematical function that
approximates a population parameter depending only on
sample information.
Examples of estimators and the corresponding parameters:
Term Estimator Parameter
Mean ഥ
𝒙 μ
Variance 𝒔𝟐
𝝈𝟐
Correlation r ρ
An estimate is the output that you get from the estimator
(when you apply the formula). There are two types of
estimates: point estimates and confidence interval
estimates.
Point
estimates
Confidence
intervals
A single value.
Examples:
• 1
• 5
• 122.67
• 0.32
An interval.
Examples:
• ( 1 , 5 )
• ( 12 , 33)
• ( 221.78 , 745.66)
• ( - 0.71 , 0.11 )
Estimators have two important properties:
• Bias
The expected value of an unbiased estimator is the population
parameter. The bias in this case is 0. If the expected value of an
estimator is (parameter + b), then the bias is b.
• Efficiency
The most efficient estimator is the one with the smallest
variance.
Confidence intervals are much more precise than point
estimates. That is why they are preferred when making
inferences.
Confidence Intervals and the Margin of Error
Definition: A confidence interval is an interval within which we are confident (with a certain percentage of confidence) the population parameter will fall.
We build the confidence interval around the point estimate.
(1-α) is the level of confidence. We are (1-α)*100% confident that the population parameter will fall in the specified interval. Common alphas are: 0.01, 0.05, 0.1.
General formula:
[ ത
𝒙- ME, ത
𝒙 + ME ] , where ME is the margin of error.
ME = reliabilityfactor∗
𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛
𝑠𝑎𝑚𝑝𝑙𝑒 𝑠𝑖𝑧𝑒
Interval start Interval end
Point estimate
𝒛𝜶/𝟐 ∗
𝝈
𝒏
𝒕υ,𝜶/𝟐 ∗
𝒔
𝒏
Term Effect on width of CI
(1-α) ↑ ↑
𝝈 ↑ ↑
n ↑ ↓
Student’s T Distribution
The Student’s T distribution is used
predominantly for creating confidence
intervals and testing hypotheses with
normally distributed populations when
the sample sizes are small. It is
particularly useful when we don’t have
enough information or it is too costly to
obtain it.
All else equal, the Student’s T distribution
has fatter tails than the Normal distribution
and a lower peak. This is to reflect the
higher level of uncertainty, caused by the
small sample size.
Student’s T distribution
Normal distribution
A random variable following the t-distribution is denoted 𝑡υ,α , where υ are the degrees of freedom.
We can obtain the student’s T distribution for a variable with a Normally distributed population using the formula: 𝑡υ,α =
ҧ
𝑥−𝜇
𝑠/ 𝑛
Formulas for Confidence Intervals
𝒛𝜶/𝟐 ∗
𝝈
𝒏
𝒕𝒅.𝒇.,𝜶/𝟐 ∗
𝒔
𝒏
# populations
Population
variance
Samples Statistic Variance Formula
One known - z 𝜎2 ത
x ± z Τ
α 2
σ
n
One unknown - t 𝑠2 ത
x ± 𝑡𝑛−1, Τ
α 2
s
n
Two - dependent t 𝑠𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒
2 ത
d ± 𝑡𝑛−1, Τ
α 2
𝑠𝑑
n
Two Known independent z σ𝑥
2
, σ𝑦
2
( ҧ
𝑥 − ത
𝑦) ± 𝑧 Τ
𝛼 2
σ𝑥
2
𝑛𝑥
+
σ𝑦
2
𝑛𝑦
Two
unknown,
assumed
equal
independent t
𝑠𝑝
2
=
𝑛𝑥 − 1 𝑠𝑥
2
+ 𝑛𝑦 − 1 𝑠𝑦
2
𝑛𝑥 + 𝑛𝑦 − 2
( ҧ
𝑥 − ത
𝑦) ± 𝑡 Τ
𝑛𝑥+𝑛𝑦−2,𝛼 2
𝑠𝑝
2
𝑛𝑥
+
𝑠𝑝
2
𝑛𝑦
Two
unknown,
assumed
different
independent t 𝑠𝑥
2
, 𝑠𝑦
2
( ҧ
𝑥 − ത
𝑦) ± 𝑡 Τ
υ,𝛼 2
𝑠𝑥
2
𝑛𝑥
+
𝑠𝑦
2
𝑛𝑦

Mais conteúdo relacionado

Mais procurados

Ch1 The Nature of Statistics
Ch1 The Nature of StatisticsCh1 The Nature of Statistics
Ch1 The Nature of StatisticsFarhan Alfin
 
Descriptive Statistics - Thiyagu K
Descriptive Statistics - Thiyagu KDescriptive Statistics - Thiyagu K
Descriptive Statistics - Thiyagu KThiyagu K
 
Day 4 normal curve and standard scores
Day 4 normal curve and standard scoresDay 4 normal curve and standard scores
Day 4 normal curve and standard scoresElih Sutisna Yanto
 
MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION
MEASURES OF CENTRAL TENDENCY AND  MEASURES OF DISPERSION MEASURES OF CENTRAL TENDENCY AND  MEASURES OF DISPERSION
MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION Tanya Singla
 
Statistics in nursing research
Statistics in nursing researchStatistics in nursing research
Statistics in nursing researchNursing Path
 
Frequency Measures for Healthcare Professioanls
Frequency Measures for Healthcare ProfessioanlsFrequency Measures for Healthcare Professioanls
Frequency Measures for Healthcare Professioanlsalberpaules
 
Descriptive Analysis in Statistics
Descriptive Analysis in StatisticsDescriptive Analysis in Statistics
Descriptive Analysis in StatisticsAzmi Mohd Tamil
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersionRajaKrishnan M
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersionForensic Pathology
 
Stat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisStat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisForensic Pathology
 
statistics
statisticsstatistics
statisticsfoyezcse
 
15. descriptive statistics
15. descriptive statistics15. descriptive statistics
15. descriptive statisticsAshok Kulkarni
 
Ch3 Probability and The Normal Distribution
Ch3 Probability and The Normal Distribution Ch3 Probability and The Normal Distribution
Ch3 Probability and The Normal Distribution Farhan Alfin
 

Mais procurados (19)

Ch1 The Nature of Statistics
Ch1 The Nature of StatisticsCh1 The Nature of Statistics
Ch1 The Nature of Statistics
 
Descriptive Statistics - Thiyagu K
Descriptive Statistics - Thiyagu KDescriptive Statistics - Thiyagu K
Descriptive Statistics - Thiyagu K
 
Day 4 normal curve and standard scores
Day 4 normal curve and standard scoresDay 4 normal curve and standard scores
Day 4 normal curve and standard scores
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersion
 
MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION
MEASURES OF CENTRAL TENDENCY AND  MEASURES OF DISPERSION MEASURES OF CENTRAL TENDENCY AND  MEASURES OF DISPERSION
MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION
 
Statstics in nursing
Statstics in nursing Statstics in nursing
Statstics in nursing
 
Measure of Dispersion in statistics
Measure of Dispersion in statisticsMeasure of Dispersion in statistics
Measure of Dispersion in statistics
 
Measures of dispersion discuss 2.2
Measures of dispersion discuss 2.2Measures of dispersion discuss 2.2
Measures of dispersion discuss 2.2
 
Central tendency and Measure of Dispersion
Central tendency and Measure of DispersionCentral tendency and Measure of Dispersion
Central tendency and Measure of Dispersion
 
Statistics in nursing research
Statistics in nursing researchStatistics in nursing research
Statistics in nursing research
 
The Normal Distribution
The Normal DistributionThe Normal Distribution
The Normal Distribution
 
Frequency Measures for Healthcare Professioanls
Frequency Measures for Healthcare ProfessioanlsFrequency Measures for Healthcare Professioanls
Frequency Measures for Healthcare Professioanls
 
Descriptive Analysis in Statistics
Descriptive Analysis in StatisticsDescriptive Analysis in Statistics
Descriptive Analysis in Statistics
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersion
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersion
 
Stat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisStat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesis
 
statistics
statisticsstatistics
statistics
 
15. descriptive statistics
15. descriptive statistics15. descriptive statistics
15. descriptive statistics
 
Ch3 Probability and The Normal Distribution
Ch3 Probability and The Normal Distribution Ch3 Probability and The Normal Distribution
Ch3 Probability and The Normal Distribution
 

Semelhante a 1.1 course notes inferential statistics

Stat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisStat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisForensic Pathology
 
Stat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisStat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisForensic Pathology
 
ders 3 Unit root test.pptx
ders 3 Unit root test.pptxders 3 Unit root test.pptx
ders 3 Unit root test.pptxErgin Akalpler
 
ders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptxders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptxErgin Akalpler
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.pptNobelFFarrar
 
Graphical presentation of data
Graphical presentation of dataGraphical presentation of data
Graphical presentation of datadrasifk
 
Introduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptIntroduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptTripthiDubey
 
M.Ed Tcs 2 seminar ppt npc to submit
M.Ed Tcs 2 seminar ppt npc   to submitM.Ed Tcs 2 seminar ppt npc   to submit
M.Ed Tcs 2 seminar ppt npc to submitBINCYKMATHEW
 
Review & Hypothesis Testing
Review & Hypothesis TestingReview & Hypothesis Testing
Review & Hypothesis TestingSr Edith Bogue
 
Basics of biostatistic
Basics of biostatisticBasics of biostatistic
Basics of biostatisticNeurologyKota
 
best for normal distribution.ppt
best for normal distribution.pptbest for normal distribution.ppt
best for normal distribution.pptDejeneDay
 
statical-data-1 to know how to measure.ppt
statical-data-1 to know how to measure.pptstatical-data-1 to know how to measure.ppt
statical-data-1 to know how to measure.pptNazarudinManik1
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in ResearchManoj Sharma
 

Semelhante a 1.1 course notes inferential statistics (20)

estimation
estimationestimation
estimation
 
Estimation
EstimationEstimation
Estimation
 
Stat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisStat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesis
 
Stat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesisStat 4 the normal distribution & steps of testing hypothesis
Stat 4 the normal distribution & steps of testing hypothesis
 
lecture-2.ppt
lecture-2.pptlecture-2.ppt
lecture-2.ppt
 
ders 3 Unit root test.pptx
ders 3 Unit root test.pptxders 3 Unit root test.pptx
ders 3 Unit root test.pptx
 
ders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptxders 3.2 Unit root testing section 2 .pptx
ders 3.2 Unit root testing section 2 .pptx
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.ppt
 
template.pptx
template.pptxtemplate.pptx
template.pptx
 
Graphical presentation of data
Graphical presentation of dataGraphical presentation of data
Graphical presentation of data
 
Introduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptIntroduction to Statistics53004300.ppt
Introduction to Statistics53004300.ppt
 
Basic statistics
Basic statisticsBasic statistics
Basic statistics
 
M.Ed Tcs 2 seminar ppt npc to submit
M.Ed Tcs 2 seminar ppt npc   to submitM.Ed Tcs 2 seminar ppt npc   to submit
M.Ed Tcs 2 seminar ppt npc to submit
 
Review & Hypothesis Testing
Review & Hypothesis TestingReview & Hypothesis Testing
Review & Hypothesis Testing
 
Statistics excellent
Statistics excellentStatistics excellent
Statistics excellent
 
Inferential statistics-estimation
Inferential statistics-estimationInferential statistics-estimation
Inferential statistics-estimation
 
Basics of biostatistic
Basics of biostatisticBasics of biostatistic
Basics of biostatistic
 
best for normal distribution.ppt
best for normal distribution.pptbest for normal distribution.ppt
best for normal distribution.ppt
 
statical-data-1 to know how to measure.ppt
statical-data-1 to know how to measure.pptstatical-data-1 to know how to measure.ppt
statical-data-1 to know how to measure.ppt
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in Research
 

Último

Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insightsseri bangash
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageMatteo Carbone
 
Unlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfUnlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfOnline Income Engine
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMANIlamathiKannappan
 
HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsMichael W. Hawkins
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Tina Ji
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...Any kyc Account
 

Último (20)

Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insights
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usage
 
Unlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfUnlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdf
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael Hawkins
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
 

1.1 course notes inferential statistics

  • 2. Distributions In statistics, when we talk about distributions we usually mean probability distributions. Definition (informal): A distribution is a function that shows the possible values for a variable and how often they occur. Definition (Wikipedia): In probability theory and statistics, a probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment. Examples: Normal distribution, Student’s T distribution, Poisson distribution, Uniform distribution, Binomial distribution Graphical representation It is a common mistake to believe that the distribution is the graph. In fact the distribution is the ‘rule’ that determines how values are positioned in relation to each other. Very often, we use a graph to visualize the data. Since different distributions have a particular graphical representation, statisticians like to plot them. Examples: Uniform distribution Binomial distribution Normal distribution Student’s T distribution Definition
  • 3. The Normal Distribution The Normal distribution is also known as Gaussian distribution or the Bell curve. It is one of the most common distributions due to the following reasons: • It approximates a wide variety of random variables • Distributions of sample means with large enough samples sizes could be approximated to normal • All computable statistics are elegant • Heavily used in regression analysis • Good track record Examples: • Biology. Most biological measures are normally distributed, such as: height; length of arms, legs, nails; blood pressure; thickness of tree barks, etc. • IQ tests • Stock market information 𝑁~(𝜇, 𝜎2 ) N stands for normal; ~ stands for a distribution; μ is the mean; 𝜎2 is the variance.
  • 4. The Normal Distribution 0 𝝁 = 𝟕𝟒𝟑 Origin 0 𝝁 = 𝟒𝟕𝟎 𝝈 = 𝟏𝟒𝟎 0 𝝁 = 𝟗𝟔𝟎 𝝈 = 𝟏𝟒𝟎 𝝈 = 𝟏𝟒𝟎 Controlling for the standard deviation Keeping the standard deviation constant, the graph of a normal distribution with: • a smaller mean would look in the same way, but be situated to the left (in gray) • a larger mean would look in the same way, but be situated to the right (in red)
  • 5. The Normal Distribution 0 𝝁 = 𝟕𝟒𝟑 Origin 𝝈 = 𝟏𝟒𝟎 0 0 𝝁 = 𝟕𝟒𝟑 𝝁 = 𝟕𝟒𝟑 𝝈 = 𝟕𝟎 𝝈 = 𝟐𝟏𝟎 Controlling for the mean Keeping the mean constant, a normal distribution with: • a smaller standard deviation would be situated in the same spot, but have a higher peak and thinner tails (in red) • a larger standard deviation would be situated in the same spot, but have a lower peak and fatter tails (in gray)
  • 6. The Standard Normal Distribution 𝑁~(0,1) The Standard Normal distribution is a particular case of the Normal distribution. It has a mean of 0 and a standard deviation of 1. Every Normal distribution can be ‘standardized’ using the standardization formula: 𝑧 = 𝑥 − 𝜇 𝜎 A variable following the Standard Normal distribution is denoted with the letter z. Why standardize? Standardization allows us to: • compare different normally distributed datasets • detect normality • detect outliers • create confidence intervals • test hypotheses • perform regression analysis Rationale of the formula for standardization: We want to transform a random variable from 𝑁~ μ, 𝜎2 to 𝑁~(0,1). Subtracting the mean from all observations would cause a transformation from 𝑁~ μ, 𝜎2 to 𝑁~ 0, 𝜎2 , moving the graph to the origin. Subsequently, dividing all observations by the standard deviation would cause a transformation from 𝑁~ 0, 𝜎2 to 𝑁~ 0,1 , standardizing the peak and the tails of the graph.
  • 7. The Central Limit Theorem The Central Limit Theorem (CLT) is one of the greatest statistical insights. It states that no matter the underlying distribution of the dataset, the sampling distribution of the means would approximate a normal distribution. Moreover, the mean of the sampling distribution would be equal to the mean of the original distribution and the variance would be n times smaller, where n is the size of the samples. The CLT applies whenever we have a sum or an average of many variables (e.g. sum of rolled numbers when rolling dice). The theorem Why is it useful? Where can we see it? ➢ No matter the distribution ➢ The distribution of 𝑥1, 𝑥2, 𝑥3, 𝑥4, … , 𝑥𝑘 would tend to 𝑁~ μ, 𝜎2 𝑛 ➢ The more samples, the closer to Normal ( k -> ∞ ) ➢ The bigger the samples, the closer to Normal ( n -> ∞ ) The CLT allows us to assume normality for many different variables. That is very useful for confidence intervals, hypothesis testing, and regression analysis. In fact, the Normal distribution is so predominantly observed around us due to the fact that following the CLT, many variables converge to Normal. Since many concepts and events are a sum or an average of different effects, CLT applies and we observe normality all the time. For example, in regression analysis, the dependent variable is explained through the sum of error terms. Click here for a CLT simulator.
  • 8. Estimators and Estimates Estimators Estimates Broadly, an estimator is a mathematical function that approximates a population parameter depending only on sample information. Examples of estimators and the corresponding parameters: Term Estimator Parameter Mean ഥ 𝒙 μ Variance 𝒔𝟐 𝝈𝟐 Correlation r ρ An estimate is the output that you get from the estimator (when you apply the formula). There are two types of estimates: point estimates and confidence interval estimates. Point estimates Confidence intervals A single value. Examples: • 1 • 5 • 122.67 • 0.32 An interval. Examples: • ( 1 , 5 ) • ( 12 , 33) • ( 221.78 , 745.66) • ( - 0.71 , 0.11 ) Estimators have two important properties: • Bias The expected value of an unbiased estimator is the population parameter. The bias in this case is 0. If the expected value of an estimator is (parameter + b), then the bias is b. • Efficiency The most efficient estimator is the one with the smallest variance. Confidence intervals are much more precise than point estimates. That is why they are preferred when making inferences.
  • 9. Confidence Intervals and the Margin of Error Definition: A confidence interval is an interval within which we are confident (with a certain percentage of confidence) the population parameter will fall. We build the confidence interval around the point estimate. (1-α) is the level of confidence. We are (1-α)*100% confident that the population parameter will fall in the specified interval. Common alphas are: 0.01, 0.05, 0.1. General formula: [ ത 𝒙- ME, ത 𝒙 + ME ] , where ME is the margin of error. ME = reliabilityfactor∗ 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑠𝑎𝑚𝑝𝑙𝑒 𝑠𝑖𝑧𝑒 Interval start Interval end Point estimate 𝒛𝜶/𝟐 ∗ 𝝈 𝒏 𝒕υ,𝜶/𝟐 ∗ 𝒔 𝒏 Term Effect on width of CI (1-α) ↑ ↑ 𝝈 ↑ ↑ n ↑ ↓
  • 10. Student’s T Distribution The Student’s T distribution is used predominantly for creating confidence intervals and testing hypotheses with normally distributed populations when the sample sizes are small. It is particularly useful when we don’t have enough information or it is too costly to obtain it. All else equal, the Student’s T distribution has fatter tails than the Normal distribution and a lower peak. This is to reflect the higher level of uncertainty, caused by the small sample size. Student’s T distribution Normal distribution A random variable following the t-distribution is denoted 𝑡υ,α , where υ are the degrees of freedom. We can obtain the student’s T distribution for a variable with a Normally distributed population using the formula: 𝑡υ,α = ҧ 𝑥−𝜇 𝑠/ 𝑛
  • 11. Formulas for Confidence Intervals 𝒛𝜶/𝟐 ∗ 𝝈 𝒏 𝒕𝒅.𝒇.,𝜶/𝟐 ∗ 𝒔 𝒏 # populations Population variance Samples Statistic Variance Formula One known - z 𝜎2 ത x ± z Τ α 2 σ n One unknown - t 𝑠2 ത x ± 𝑡𝑛−1, Τ α 2 s n Two - dependent t 𝑠𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 2 ത d ± 𝑡𝑛−1, Τ α 2 𝑠𝑑 n Two Known independent z σ𝑥 2 , σ𝑦 2 ( ҧ 𝑥 − ത 𝑦) ± 𝑧 Τ 𝛼 2 σ𝑥 2 𝑛𝑥 + σ𝑦 2 𝑛𝑦 Two unknown, assumed equal independent t 𝑠𝑝 2 = 𝑛𝑥 − 1 𝑠𝑥 2 + 𝑛𝑦 − 1 𝑠𝑦 2 𝑛𝑥 + 𝑛𝑦 − 2 ( ҧ 𝑥 − ത 𝑦) ± 𝑡 Τ 𝑛𝑥+𝑛𝑦−2,𝛼 2 𝑠𝑝 2 𝑛𝑥 + 𝑠𝑝 2 𝑛𝑦 Two unknown, assumed different independent t 𝑠𝑥 2 , 𝑠𝑦 2 ( ҧ 𝑥 − ത 𝑦) ± 𝑡 Τ υ,𝛼 2 𝑠𝑥 2 𝑛𝑥 + 𝑠𝑦 2 𝑛𝑦