UNIVERSITY OF LIBERIA
T.J.R. FAULKNER COLLEGE OF SCIENCE, TECHNOLOGY AND ENVIRONMENTAL SCIENCES
DEPARTMENT OF MATHEMATICS
STATISTICS PROGRAM
STATS 101: INTRODUCTION TO STATISTICS
Instructor:
Mr. Mulbah K.A. Kromah,
Principal Analyst, Office of the DDGSDP &
Part-time instructor, UL department of mathematics
PLAN
01 INTRODUCTION
02 Overview of the course outline
03 History of statistics
04 Basic definitions, types of data and other key concepts
05 Descriptive statistics
06 Random variables
07 Probability distributions & Statistical inference
08 Correlation & regression
09 SUMMARY
Overview of the course outline
OBJECTIVE OF THE COURSE:
• Provide students with a brief history of statistics;
• Help students understand the basic definitions used in statistics, the types of data used and the basic sampling methods;
• Help students learn how to avoid drawing misleading conclusions;
• Introduce students to the field of descriptive statistics (data organization, visualization and summarization);
• Introduce students to the concept of random variables;
• Help students learn about the basic index numbers;
• Introduce students to probability distributions;
• Introduce students to statistical inference, correlation and regression.
ORGANIZATION OF DATA
INTRODUCTION TO DESCRIPTIVE STATISTICS
FREQUENCY DISTRIBUTION AND GRAPHS
III. ORGANIZATION OF DATA: Frequency distribution and graphs
- 3.3 Data representation using Graphs
Quantitative Data Representation
When data are quantitative, several types of representations are
often used:
 Histogram
 Frequency polygon
 Ogive
 Stem and leaf plot
 Dot plot
 Scatter plot, etc.
In this course, we will briefly explain the uses of the following graphs:
 Histogram
 Frequency polygon
 Ogive
 Stem and leaf plot
 Scatter plot (specifically time series graph)
 Bar graph
III. ORGANIZATION OF DATA: Frequency distribution and graphs
- 3.3 Data representation using Graphs
Qualitative and quantitative Data Representation
Distribution Shapes
 When describing data, it is important to recognize the shapes of the distribution
values. This is important in understanding which statistical method to use in
analyzing the data.
 A distribution can have many shapes, and one method of analyzing a distribution is
to draw a histogram or frequency polygon for the distribution.
 Distributions are most often not perfectly shaped, so it is not necessary to have an
exact shape but rather to identify an overall pattern.
Classification of Distribution Shapes
Avoid using misleading graphs
 Changing the units at the starting point
on the y axis can convey a very different
visual representation of the data.
 Avoid exaggerating a one-dimensional
increase by showing it in two
dimensions.
 Avoid omitting labels or units on the
axes of the graph.
 Always include the basic elements of a
graph (titles, units, source and notes)
Summary of Graphs and Uses of Each
Stem and Leaf Plots
• A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.
• A stem and leaf plot can be used to compare two related distributions (back-to-back stem and leaf plot).
• When analyzing a stem and leaf plot, look for peaks and gaps in the distribution. Also analyze the form of the distribution (symmetric or skewed), and check the variability of the data by looking at the spread (range, variance, SD).
How to construct a Stem and Leaf Plot?
Step 1: Arrange the data in order (Optional but very helpful).
Step 2: Separate the data according to the classes.
Step 3: Plot the data using one of the diagrams below:
Stem and leaf plot:
Leading digit (stem) | Trailing digit (leaf)
Back-to-back stem and leaf plot:
Trailing digit for dist. 2 (leaf) | Leading digit (stem) | Trailing digit for dist. 1 (leaf)
Exercise 1
The dataset below shows the ages of 30 Bomi citizens and 30 Bassa citizens extracted from the 2008 NPHC of Liberia. Use this dataset to construct stem and leaf plots for the two counties. Use a back-to-back stem and leaf plot to compare the two distributions.
Bomi citizens' ages
55 33 5 37 27
31 42 12 45 5
0 44 6 17 8
3 10 42 9 3
26 34 28 7 55
3 3 9 48 2
Bassa citizens' ages
30 28 2 10 8
40 23 26 8 3
4 62 42 29 35
2 45 5 27 26
3 40 22 0 16
41 26 11 62 6
Solution
Distribution of 30 randomly selected Bomi citizens' ages
Stem | Leaf
0 | 0 2 3 3 3 3 5 5 6 7 8 9 9
1 | 0 2 7
2 | 6 7 8
3 | 1 3 4 7
4 | 2 2 4 5 8
5 | 5 5
Distribution of 30 randomly selected Bassa citizens' ages
Stem | Leaf
0 | 0 2 2 3 3 4 5 6 8 8
1 | 0 1 6
2 | 2 3 6 6 6 7 8 9
3 | 0 5
4 | 0 0 1 2 5
5 |
6 | 2 2
Note: There are no data in the sixth class for Bassa. Do not put 0 in the leaf for this class,
just leave it blank (that is why we only wrote the stem number, 5)
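The construction steps can be sketched in Python as a check on the hand-drawn plots. This is a minimal sketch, assuming two-digit ages so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative and not part of the course material.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):          # Step 1: arrange the data in order
        plot[v // 10].append(v % 10)  # Step 2: separate the data by class
    # Keep empty stems so that gaps in the distribution stay visible.
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}

bomi = [55, 33, 5, 37, 27, 31, 42, 12, 45, 5, 0, 44, 6, 17, 8,
        3, 10, 42, 9, 3, 26, 34, 28, 7, 55, 3, 3, 9, 48, 2]
bassa = [30, 28, 2, 10, 8, 40, 23, 26, 8, 3, 4, 62, 42, 29, 35,
         2, 45, 5, 27, 26, 3, 40, 22, 0, 16, 41, 26, 11, 62, 6]

# Step 3: print one row per stem.
for name, ages in (("Bomi", bomi), ("Bassa", bassa)):
    print(name)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"{stem} | {' '.join(map(str, leaves))}")
```

The Bassa output has an empty row for stem 5, which matches the note that an empty class is left blank rather than filled with 0.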
Solution (back-to-back stem and leaf plot)
Bassa (leaf) | Stem | Bomi (leaf)
8 8 6 5 4 3 3 2 2 0 | 0 | 0 2 3 3 3 3 5 5 6 7 8 9 9
6 1 0 | 1 | 0 2 7
9 8 7 6 6 6 3 2 | 2 | 6 7 8
5 0 | 3 | 1 3 4 7
5 2 1 0 0 | 4 | 2 2 4 5 8
 | 5 | 5 5
2 2 | 6 |
Summary
Statisticians or researchers collect raw data.
To obtain much information from this data, they must organize it in some meaningful way.
A frequency distribution using classes is often used for this purpose.
Once a frequency distribution is constructed, the representation of the data by graphs
becomes easy.
The most commonly used graphs in research statistics are the histogram, frequency
polygon, ogive, bar graph, Pareto chart, time series graph, and pie graph.
Finally, a stem and leaf plot uses part of the data values as stems and part of the data
values as leaves. This graph has the advantages of a frequency distribution and a
histogram.
Exercise 2
Bomi citizens age
55 33 5 37 27
31 42 12 45 5
0 44 6 17 8
3 10 42 9 3
26 34 28 7 55
3 3 9 48 2
Bassa citizen age
30 28 2 10 8
40 23 26 8 3
4 62 42 29 35
2 45 5 27 26
3 40 22 0 16
41 26 11 62 6
Use the dataset above to construct stem and leaf plots for the two counties using the following age groupings:
0 - 9
10 - 14
15 - 24
25 - 44
45 - 54
55 - 64.
CHAPTER FOUR
DATA DESCRIPTION
IV. ORGANIZATION OF DATA
In the previous chapter, we learned how to obtain useful
information from raw data by organizing them into a
frequency distribution and then presenting the data by using
various graphs.
In this chapter, we will learn about the statistical methods
that can be used to summarize data. Our main objective will
be to find the “central number” or the “most typical case” in
our dataset and then analyze the relationship between this
number and the other numbers in the dataset.
IV. ORGANIZATION OF DATA
First, we will look at the measures of average, also called
measures of central tendency. They include the mean,
median, mode, and midrange.
Next, we will learn about measures of variation, or
measures of dispersion. These measures include the range,
variance, and standard deviation.
Lastly, we will learn how to compute and interpret measures
of position, which include percentiles, deciles, and
quartiles.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Mean
The mean, also known as the arithmetic average, is found by adding the values of the data and dividing by the total number of values.
Sample mean:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Population mean:
$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Weighted mean:
$$\bar{X} = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3 + \cdots + W_n X_n}{W_1 + W_2 + W_3 + \cdots + W_n} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$
where $W_1, W_2, W_3, \ldots, W_n$ are the weights and $X_1, X_2, X_3, \ldots, X_n$ are the values.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
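The Mean
Mean for grouped data:
$$\bar{X} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_j M_j}{n} = \frac{\sum_{i=1}^{j} f_i M_i}{n}$$
where $f_1, f_2, f_3, \ldots, f_j$ are the frequencies, $M_1, M_2, M_3, \ldots, M_j$ are the midpoints of the classes, and $n = f_1 + f_2 + \cdots + f_j$ is the total number of values.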
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Mean
Examples
1. The daily transportation costs of 6 UL students are given below:
$150 LD, $450 LD, $600 LD, $200 LD, $700 LD, $150 LD. Find the average daily transportation cost of these students.
2. Find student La Paix's GPA if he has the following grades:
An A in English 201 (4 credits), a C in Statistics 103 (3 credits), a D in Math 202 (4 credits) and an F in Statistics 203 (3 credits), given that A = 4 points, B = 3 points, C = 2 points, D = 1 point and F = 0 points.
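These two examples can be checked with a short Python sketch. It treats the transportation example as a plain sample mean and the GPA example as a weighted mean in which the credits are the weights; the variable names are illustrative.

```python
from statistics import mean

# Transportation example: plain sample mean of the six daily costs (in LD).
fares = [150, 450, 600, 200, 700, 150]
average_fare = mean(fares)

# GPA example: weighted mean, with credits as weights and grade points as values.
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
courses = [("A", 4), ("C", 3), ("D", 4), ("F", 3)]  # (grade, credits)
total_points = sum(grade_points[grade] * credits for grade, credits in courses)
total_credits = sum(credits for _, credits in courses)
gpa = total_points / total_credits

print(average_fare)   # 375
print(round(gpa, 2))  # 1.86
```

The GPA works out to 26 grade points over 14 credits, so the 4-credit courses pull the average more than the 3-credit ones.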
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Mean
Examples
3. Find the mean of the grouped data given on the right.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Median
The median is the midpoint of the data array. The symbol for the median is MD.
A data array is a dataset that has been ordered. To find the median, all one needs to do
is to arrange the dataset in order and then locate the middle number. When the
number of data values is even, the median will be the midpoint of the two middle
numbers.
The median tells us that 50% of the data values are above it while 50% are below it.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Median
We can also find the median for grouped data using the formula below:
$$MD = L_m + \frac{w}{f_m}\,(0.5n - cf_b)$$
where $L_m$ is the lower limit of the median class, $f_m$ is the frequency of the median class, $w$ is the width of the median class, $n$ is the sample size and $cf_b$ is the cumulative frequency of the class before the median class.
Note: The median class is the first class whose cumulative relative frequency reaches at least 50%.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Median
Examples
Find the median of the following datasets:
Dataset 1: 713, 300, 618, 595, 311, 401, and 292.
Dataset 2: 684, 764, 656, 702, 856, 1133, 1132, 1303.
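A quick Python check of both datasets, using the standard library's `statistics.median`, which sorts the data and averages the two middle values when the count is even:

```python
from statistics import median

dataset_1 = [713, 300, 618, 595, 311, 401, 292]            # 7 values (odd)
dataset_2 = [684, 764, 656, 702, 856, 1133, 1132, 1303]    # 8 values (even)

print(sorted(dataset_1))   # the middle (4th) value is the median
print(median(dataset_1))   # 401
print(sorted(dataset_2))   # the median is halfway between the 4th and 5th values
print(median(dataset_2))   # 810.0
```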
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Mode
The value that occurs most often in a data set is called the mode.
 Unimodal: a dataset with one mode.
 Bimodal: a dataset with two modes.
 Multimodal: a dataset with more than two modes.
 No mode: a dataset can have no mode.
Note: a dataset can have no mode, one mode, two modes or even more modes.
The mode for grouped data is the modal class. The modal class is the class with
the largest frequency.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Mode
Examples:
Find the mode in the following datasets:
Dataset 1: 20.0, 16.0, 34.3, 13, 12.5, 13, 12.4, 13.
Dataset 2: 110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752.
Dataset 3: 104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109.
Table 1: Distribution of students by major field of
studies
Table 2: frequency distribution of miles that 20
runners ran in one week.
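For the three raw datasets, the modes can be found by counting how often each value occurs. The sketch below is one way to do it; the helper `modes` is illustrative, and it returns an empty list when every value occurs only once (the "no mode" case).

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

dataset_1 = [20.0, 16.0, 34.3, 13, 12.5, 13, 12.4, 13]
dataset_2 = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]
dataset_3 = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109]

print(modes(dataset_1))  # [13]        unimodal
print(modes(dataset_2))  # []          no mode
print(modes(dataset_3))  # [104, 109]  bimodal
```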
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
Comparison of the Mean, Median and Mode
A small company consists of the owner, the manager, the salesperson, and two
technicians, all of whose annual salaries are listed here. (Assume that this is the entire
population.)
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Midrange
The midrange is a rough estimate of the middle. It is found by adding the lowest and
highest values in the data set and dividing by 2. It is a very rough estimate of the
average and can be affected by one extremely high or low value.
$$MR = \frac{X_{min} + X_{max}}{2}$$
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
The Midrange
Example
Find the midrange of this dataset and compare it with the mean. What can you say?
Dataset: 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
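A short Python sketch of this comparison, using the midrange formula (lowest plus highest value, divided by 2) alongside the sample mean:

```python
from statistics import mean

data = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]

midrange = (min(data) + max(data)) / 2   # (10 + 34.5) / 2
average = mean(data)                     # 120.2 / 8

print(midrange)           # 22.25
print(round(average, 3))  # 15.025
```

The single high value 34.5 pulls the midrange well above the mean, which illustrates why the midrange is only a rough estimate of the middle.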
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
In statistics, several measures can be used for an average. The most common measures
are the mean, median, mode, and midrange. Each has its own specific purpose and
use. However, several other averages, such as the harmonic mean, the geometric
mean, and the quadratic mean exist. Their applications are limited to specific areas.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
Properties and Uses of Central Tendency
The mean
1. It is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from
the same population and all three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data values.
5. The mean cannot be computed for the data in a frequency distribution that
has an open-ended class.
6. The mean is affected by extremely high or low values, called outliers, and may
not be the appropriate average to use in these situations.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
Properties and Uses of Central Tendency
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values
fall into the upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low
values.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
Properties and Uses of Central Tendency
The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal, such as religious
preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or
the mode may not exist for a data set.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
Properties and Uses of Central Tendency
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.
IV. ORGANIZATION OF DATA
4.1- Measures of Central Tendency
Class discussions
• Discuss the relationship between the measures of central tendency and the shape of a distribution.
• Give some practical examples of the most commonly seen distributions.
• How does the shape of a distribution determine which measure of central tendency to use?
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
To describe a dataset better, statisticians consider not only measures of central tendency but also other measures, such as measures of variation and position.
In this section, we will learn how to compute and interpret measures of variation such as the range, variance and standard deviation.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Consider this example from the Elementary Statistics
book:
A testing lab wishes to test two experimental brands of
outdoor paint to see how long each will last before
fading. The testing lab makes 6 gallons of each paint to
test. Since different chemical agents are added to each
group and only six cans are involved, these two groups
constitute two small populations. The results (in months)
are shown to the right. Find the mean of each group.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Solution
As seen, the two brands have the same mean, 35 months, but brand B varies less than brand A (indicating that brand B is more consistent).
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
The Range
The range is the highest value minus the lowest value. The symbol $R$ is used for the range.
$$R = \text{highest value} - \text{lowest value}$$
Note: The range can be greatly affected by outliers. Because of this, statisticians usually use the variance and standard deviation.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
The Variance
The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is $\sigma^2$ ($\sigma$ is the Greek lowercase letter sigma).
Population variance:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Note: The standard deviation is given by the square root of the variance.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
The Variance
Note: The standard deviation is given by the square root of the variance.
Simplest formula for finding the sample variance:
$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}$$
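The definitional and shortcut forms of the sample variance give the same result. The Python sketch below checks this on a small made-up sample (the five values are hypothetical and chosen only to keep the arithmetic easy to follow), and compares both against `statistics.variance`.

```python
from math import sqrt
from statistics import variance

sample = [10, 12, 14, 16, 18]   # hypothetical data, for illustration only
n = len(sample)
x_bar = sum(sample) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Shortcut: [n * sum(X^2) - (sum X)^2] / [n * (n - 1)].
s2_shortcut = (n * sum(x ** 2 for x in sample) - sum(sample) ** 2) / (n * (n - 1))

# Population variance divides by N instead of n - 1.
sigma2 = sum((x - x_bar) ** 2 for x in sample) / n

print(s2_definition, s2_shortcut, variance(sample))  # 10.0 10.0 10
print(sigma2)                                        # 8.0
print(round(sqrt(s2_definition), 3))                 # 3.162 (sample standard deviation)
```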
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Example
Note: The Standard deviation is given by the square root of the variance.
Find the variances of the two brands
of paints given to the right.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Variance and Standard Deviation for Grouped Data
$$s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}$$
where $X_m$ represents the class midpoint.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Variance and Standard Deviation for Grouped Data
𝑬𝒙𝒂𝒎𝒑𝒍𝒆.
Find the variance and
standard deviation of
this dataset.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Uses of the Variance and Standard Deviation
1. The variances and standard deviations can be used to determine the spread of the
data. If the variance or standard deviation is large, the data are more dispersed. This
information is useful in comparing two (or more) data sets to determine which is more
(most) variable.
2. The measures of variance and standard deviation are used to determine the
consistency of a variable. For example, in the manufacture of fittings, such as nuts and
bolts, the variation in the diameters must be small, or the parts will not fit together.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Uses of the Variance and Standard Deviation
3. The variance and standard deviation are used to determine the number of data
values that fall within a specified interval in a distribution.
4. Finally, the variance and standard deviation are used quite often in inferential
statistics.
Note: The range can be used to approximate the standard deviation. The approximation is called the range rule of thumb:
$$s \approx \frac{R}{4}$$
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Coefficient of Variation
The Coefficient of Variation (CV) is a statistic that allows us to
compare standard deviations when the units are different.
For example, we might want to compare the standard deviation of the number of hours
that Firestone employees work weekly with the standard deviation of their weekly
earnings.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Coefficient of Variation
The CV is the standard deviation divided by the mean. The result is expressed as a percentage.
Sample:
$$CV = \frac{s}{\bar{X}} \cdot 100$$
Population:
$$CV = \frac{\sigma}{\mu} \cdot 100$$
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Coefficient of Variation
Example
Suppose the mean number of hours that Firestone employees work weekly is 48 hours and the standard deviation is 3 hours. Suppose also that the mean of their weekly earnings is $15,250 LD and the standard deviation is $850 LD. Compare the variations of the two variables.
IV. ORGANIZATION OF DATA
4.2- Measures of Variation
Coefficient of Variation
Solution
Number of hours that Firestone employees work weekly:
$$CV = \frac{3}{48} \cdot 100 = 6.25\%$$
Firestone employees' weekly earnings:
$$CV = \frac{850}{15{,}250} \cdot 100 = 5.57\%$$
Interpretation: Since the coefficient of variation is smaller for the weekly earnings, the employees' weekly earnings are less variable than the number of hours they work weekly.
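The same comparison as a minimal Python sketch; the helper name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean_value * 100

cv_hours = coefficient_of_variation(3, 48)          # weekly hours worked
cv_earnings = coefficient_of_variation(850, 15250)  # weekly earnings in LD

print(round(cv_hours, 2))      # 6.25
print(round(cv_earnings, 2))   # 5.57
print(cv_earnings < cv_hours)  # True: earnings are relatively less variable
```

Because the CV has no units, hours and Liberian dollars can be compared directly even though their standard deviations cannot.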
IV. ORGANIZATION OF DATA
Group Presentation
 Divide the class into two groups;
 Each group is to make a presentation on one of the following:
a). Chebyshev’s theorem;
b). The Normal Rule.
 Each presentation should highlight the following:
• Brief description of the theorem or rule;
• Importance of the theorem or rule;
• Presentation of formula (s) if any;
• A practical example of how the theorem or rule is used in real life situations.
Note: Each group will have a maximum of 10 minutes for their presentation, including Q&As
CHAPTER FIVE
DISCRETE PROBABILITY
DISTRIBUTIONS
Discrete Probability Distributions
OBJECTIVES
After completing this chapter, you should be able to:
1 - Construct a probability distribution for a random variable.
2 - Find the mean, variance, standard deviation, and expected value
for a discrete random variable.
3 - Find the exact probability for X successes in n trials of a binomial
experiment.
4 - Find the mean, variance, and standard deviation for the variable of
a binomial distribution.
5 - Find probabilities for outcomes of variables, using the Poisson,
hypergeometric, and multinomial distributions.
INTRODUCTION
By assigning probabilities to all possible outcomes, we can make many decisions.
For example, a crime statistician at the LNP can compute the probabilities that 0, 1, 2 or more crimes will be committed next month.
A statistician at the MOT might choose to assign probabilities to the number of vehicles that will be registered next year.
Once these probabilities are assigned, statistics such as the mean $\mu$, the variance $\sigma^2$ and the standard deviation $\sigma$ can be computed for these events. With these statistics, various decisions can be made. The crime statistician will be able to compute the average number of crimes next month. The MOT statistician can easily advise the management on how many license plates should be made available next year.
PROBABILITY DISTRIBUTIONS
We first need to review the definition of a variable.
What is a variable?
A variable is a characteristic or attribute that can assume different
values. Various letters of the alphabet, such as X, Y, or Z, are used to
represent variables.
PROBABILITY DISTRIBUTIONS
A random variable is a variable whose values are determined by
chance.
A random variable can be discrete or continuous.
Discrete variables have a finite number of possible values or an
infinite number of values that can be counted.
PROBABILITY DISTRIBUTIONS
A discrete probability distribution consists of the values a random
variable can assume and the corresponding probabilities of the values.
Discrete probability distributions can be shown by using a graph or a
table. Probability distributions can also be represented by a formula.
PROBABILITY DISTRIBUTIONS
EX.
Construct a probability distribution for the number of heads when a
coin is tossed three times.
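One way to build this distribution is to list all eight equally likely outcomes of three tosses and count the heads in each. The Python sketch below does this with exact fractions; it assumes a fair coin.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads X.
heads_count = Counter(outcome.count("H") for outcome in outcomes)

# P(X) = (number of outcomes with X heads) / (total number of outcomes).
distribution = {x: Fraction(heads_count[x], len(outcomes)) for x in sorted(heads_count)}

for x, p in distribution.items():
    print(x, p)   # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```

The probabilities are each between 0 and 1 and sum to 1, as any probability distribution requires.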
PROBABILITY DISTRIBUTIONS
Two Requirements for a Probability Distribution
1. The sum of the probabilities of all the events in the sample space must equal 1; that is, $\sum P(X) = 1$.
2. The probability of each event in the sample space must be between 0 and 1 (or equal to 0 or 1). That is, $0 \le P(X) \le 1$.
MEAN, VARIANCE, STANDARD DEVIATION, AND EXPECTATION
The mean, variance, and standard deviation for a probability
distribution are computed differently from the mean, variance, and
standard deviation for samples.
How are means calculated for samples or population?
THE MEAN
Formula for the Mean of a Probability Distribution
$$\mu = X_1 \cdot P(X_1) + X_2 \cdot P(X_2) + X_3 \cdot P(X_3) + \cdots + X_N \cdot P(X_N) = \sum_{i=1}^{N} X_i \cdot P(X_i)$$
where $X_1, X_2, X_3, \ldots, X_N$ are the outcomes and $P(X_1), P(X_2), P(X_3), \ldots, P(X_N)$ are the corresponding probabilities.
THE MEAN
EX.
Find the mean of the number of heads that appear when a coin is
tossed three times.
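As a worked sketch, assuming a fair coin, the number of heads $X$ in three tosses takes the values 0, 1, 2, 3 with probabilities $\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}$, so the formula for the mean gives:

```latex
\mu = \sum X \cdot P(X)
    = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8}
    = \frac{12}{8} = 1.5
```

A single experiment can never produce 1.5 heads; the value is the long-run average over many repetitions.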
THE VARIANCE AND STANDARD DEVIATION
Formula for the Variance of a Probability Distribution
$$\sigma^2 = \sum_{i=1}^{N} \left[X_i^2 \cdot P(X_i)\right] - \mu^2$$
The SD is:
$$\sigma = \sqrt{\sum_{i=1}^{N} \left[X_i^2 \cdot P(X_i)\right] - \mu^2}$$
THE VARIANCE AND STANDARD DEVIATION
Compute the variance and standard deviation for the probability
distribution in the previous example.
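A Python sketch of this computation for the number of heads in three tosses of a fair coin, using exact fractions for the probabilities 1/8, 3/8, 3/8 and 1/8:

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of X = number of heads in three fair coin tosses.
distribution = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in distribution.items())                    # mean
sigma2 = sum(x ** 2 * p for x, p in distribution.items()) - mu ** 2  # variance
sigma = sqrt(sigma2)                                                 # standard deviation

print(mu)               # 3/2
print(sigma2)           # 3/4
print(round(sigma, 3))  # 0.866
```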
EXPECTATION
Another concept related to the mean for a probability distribution is
that of expected value or expectation.
Expected value is used in various types of games of chance, in insurance, and in other areas, such as decision theory.
EXPECTATION
The expected value of a discrete random variable of a probability distribution is the theoretical average of the variable.
$$\mu = E(X) = \sum X \cdot P(X)$$
EXPECTATION
EX 1.
One thousand tickets are sold at $1 each for a color television valued
at $350. What is the expected value of the gain if you purchase one
ticket?
EXPECTATION
SOLUTION
Outcome | Win | Lose
Gain X | 349 | -1
Probability | 1/1000 | 999/1000
$$E(X) = \sum_{i=1}^{N} X_i \cdot P(X_i) = 349 \cdot \frac{1}{1000} + (-1) \cdot \frac{999}{1000} = -\$0.65$$
EXPECTATION
EX 2.
One thousand tickets are sold at $1 each for four prizes of $100, $50,
$25, and $10. After each prize drawing, the winning ticket is then
returned to the pool of tickets. What is the expected value if you
purchase two tickets?
EXPECTATION
SOLUTION
Outcome | Win | Win | Win | Win | Lose
Gain X | $98 | $48 | $23 | $8 | -$2
Probability | 2/1000 | 2/1000 | 2/1000 | 2/1000 | 992/1000
$$E(X) = \sum_{i=1}^{N} X_i \cdot P(X_i) = 98 \cdot \frac{2}{1000} + 48 \cdot \frac{2}{1000} + 23 \cdot \frac{2}{1000} + 8 \cdot \frac{2}{1000} + (-2) \cdot \frac{992}{1000} = -\$1.63$$
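Both raffle examples can be verified with one small helper. This is a sketch using exact fractions; the function name `expected_value` is illustrative, and the gains and probabilities are the ones used in the two solutions.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(X) = sum of X * P(X) over (gain, probability) pairs."""
    assert sum(p for _, p in distribution) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in distribution)

# EX 1: one $1 ticket out of 1000, prize worth $350 (net gain $349 or -$1).
ex1 = [(349, Fraction(1, 1000)), (-1, Fraction(999, 1000))]

# EX 2: two $1 tickets out of 1000, prizes of $100, $50, $25 and $10.
ex2 = [(98, Fraction(2, 1000)), (48, Fraction(2, 1000)),
       (23, Fraction(2, 1000)), (8, Fraction(2, 1000)),
       (-2, Fraction(992, 1000))]

print(float(expected_value(ex1)))  # -0.65
print(float(expected_value(ex2)))  # -1.63
```

A negative expected value means that, on average, a ticket buyer loses money over many such raffles.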
THE BINOMIAL DISTRIBUTION
Many types of probability problems have only two outcomes or can be
reduced to two outcomes.
For example, when a coin is tossed, it can land heads or tails. When a
baby is born, it will be either male or female. In a basketball game, a
team either wins or loses.
A true/false item can be answered in only two ways, true or false.
THE BINOMIAL DISTRIBUTION
A binomial experiment is a probability experiment that satisfies the
following four requirements:
1. There must be a fixed number of trials.
2. Each trial can have only two outcomes or outcomes that can be reduced to two outcomes. These outcomes can be considered as either success or failure.
3. The outcomes of each trial must be independent of one another.
4. The probability of a success must remain the same for each trial.
A binomial experiment and its results give rise to a special probability
distribution called the binomial distribution.
THE BINOMIAL DISTRIBUTION
The outcomes of a binomial experiment and the corresponding
probabilities of these outcomes are called a binomial distribution.
NOTATION FOR THE BINOMIAL DISTRIBUTION
P(S) => probability of success
P(F) => probability of failure
p => numerical probability of a success
q => numerical probability of a failure
n => number of trials
X => number of successes in n trials
P(S) = p and P(F) = 1 - p = q
BINOMIAL PROBABILITY FORMULA
$$P(X) = \frac{n!}{(n - X)!\,X!} \cdot p^X \cdot q^{n - X}$$
BINOMIAL PROBABILITY FORMULA
A coin is tossed 3 times. Find the probability of getting exactly two
heads (Use the binomial probability formula).
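A minimal Python sketch of the binomial probability formula applied to this example, assuming a fair coin (p = q = 0.5). `math.comb(n, x)` computes n! / ((n - x)! x!); the function name `binomial_probability` is illustrative.

```python
from math import comb

def binomial_probability(n, x, p):
    """P(X) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Exactly two heads in three tosses: 3 * 0.25 * 0.5.
p_two_heads = binomial_probability(n=3, x=2, p=0.5)
print(p_two_heads)  # 0.375
```

The result agrees with counting outcomes directly: 3 of the 8 equally likely outcomes (HHT, HTH, THH) contain exactly two heads.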
MEAN, VARIANCE, AND STANDARD DEVIATION FOR THE
BINOMIAL DISTRIBUTION
Mean: $\mu = n \cdot p$
Variance: $\sigma^2 = n \cdot p \cdot q$
Standard deviation: $\sigma = \sqrt{n \cdot p \cdot q}$
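As a worked sketch, take a fair coin tossed three times, so $n = 3$ and $p = q = \frac{1}{2}$. The three formulas give:

```latex
\mu = n \cdot p = 3 \cdot \tfrac{1}{2} = 1.5, \qquad
\sigma^2 = n \cdot p \cdot q = 3 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 0.75, \qquad
\sigma = \sqrt{0.75} \approx 0.866
```

These match the values obtained from the general formulas $\mu = \sum X \cdot P(X)$ and $\sigma^2 = \sum [X^2 \cdot P(X)] - \mu^2$ applied to the distribution of the number of heads.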
CREATING A BINOMIAL DISTRIBUTION AND GRAPH IN EXCEL
See page 282 of the textbook for step-by-step instructions.
THE MULTINOMIAL DISTRIBUTION
We use Multinomial Distribution in cases where each trial has more
than two outcomes.
Ex. An experiment in which each respondent chooses a best subject (Math, English, or Biology) has three possible outcomes per trial.
THE MULTINOMIAL DISTRIBUTION
In Multinomial Distribution,
 probability of success is constant for each trial,
 outcomes are independent for a fixed number of trials,
 events are mutually exclusive.
FORMULA FOR THE MULTINOMIAL DISTRIBUTION
$$P(X) = \frac{n!}{X_1! \cdot X_2! \cdot X_3! \cdots X_k!} \cdot p_1^{X_1} \cdot p_2^{X_2} \cdot p_3^{X_3} \cdots p_k^{X_k}$$
where $X_1 + X_2 + X_3 + \cdots + X_k = n$ and $p_1 + p_2 + p_3 + \cdots + p_k = 1$.
EX.
In a large city, 50% of the people choose a movie, 30% choose dinner
and a play, and 20% choose shopping as a leisure activity. If a sample
of 5 people is randomly selected, find the probability that 3 are
planning to go to a movie, 1 to a play, and 1 to a shopping mall.
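A Python sketch of the multinomial formula applied to this example, with n = 5, counts (3, 1, 1) and probabilities (0.5, 0.3, 0.2). The function name `multinomial_probability` is illustrative.

```python
from math import factorial, prod

def multinomial_probability(counts, probabilities):
    """P(X) = n! / (X1! * X2! * ... * Xk!) * p1^X1 * p2^X2 * ... * pk^Xk."""
    n = sum(counts)
    coefficient = factorial(n) // prod(factorial(x) for x in counts)
    return coefficient * prod(p ** x for p, x in zip(probabilities, counts))

# 3 choose a movie, 1 dinner and a play, 1 shopping: 20 * 0.125 * 0.3 * 0.2.
p_leisure = multinomial_probability(counts=(3, 1, 1), probabilities=(0.5, 0.3, 0.2))
print(round(p_leisure, 4))  # 0.15
```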
THE POISSON DISTRIBUTION
A discrete probability distribution that is useful when n is large and p
is small and when the independent variables occur over a period of
time is called the Poisson distribution.
THE POISSON DISTRIBUTION
The Poisson distribution can be used when a density of items is
distributed over a given area or volume, such as the number of plants
growing per acre or the number of defects in a given length of
videotape.
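FORMULA FOR THE POISSON DISTRIBUTION
$$P(X; \lambda) = \frac{e^{-\lambda} \lambda^X}{X!}, \quad \text{where } X = 0, 1, 2, \ldots$$
Here $\lambda$ is the mean number of occurrences per unit (of time, area, volume, etc.). The letter $e$ is a constant approximately equal to 2.7183.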
EX 1.
If there are 200 typographical errors randomly distributed in a 500-
page manuscript, find the probability that a given page contains
exactly 3 errors.
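As a worked sketch, the mean number of errors per page is $\lambda = \frac{200}{500} = 0.4$, and the Poisson formula with $X = 3$ gives:

```latex
P(3; 0.4) = \frac{e^{-0.4} \cdot (0.4)^3}{3!}
          = \frac{0.6703 \cdot 0.064}{6}
          \approx 0.0072
```

So fewer than 1% of pages would be expected to contain exactly three errors.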
EX 2.
A sales firm receives, on average, 3 calls per hour on its toll-free
number. For any given hour, find the probability that it will receive the
following.
a. At most 3 calls
b. At least 3 calls
c. 5 or more calls
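The three parts can be computed by summing Poisson probabilities with λ = 3 and using complements for the "at least" cases. This is a minimal Python sketch; the function name `poisson_probability` is illustrative.

```python
from math import exp, factorial

def poisson_probability(x, lam):
    """P(X; lambda) = e^(-lambda) * lambda^X / X!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 3  # average number of calls per hour

# a. At most 3 calls: P(0) + P(1) + P(2) + P(3)
at_most_3 = sum(poisson_probability(x, lam) for x in range(4))

# b. At least 3 calls: 1 - [P(0) + P(1) + P(2)]
at_least_3 = 1 - sum(poisson_probability(x, lam) for x in range(3))

# c. 5 or more calls: 1 - [P(0) + P(1) + P(2) + P(3) + P(4)]
five_or_more = 1 - sum(poisson_probability(x, lam) for x in range(5))

print(round(at_most_3, 4))     # 0.6472
print(round(at_least_3, 4))    # 0.5768
print(round(five_or_more, 4))  # 0.1847
```

Parts a and b overlap at X = 3, which is why their probabilities add up to more than 1.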
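FORMULA FOR THE HYPERGEOMETRIC DISTRIBUTION
$$P(X) = \frac{{}_{a}C_{X} \cdot {}_{b}C_{n-X}}{{}_{a+b}C_{n}}$$
where the population contains $a$ items of one kind and $b$ items of another kind, and a sample of $n$ items is selected without replacement. $P(X)$ is the probability that the sample contains $X$ items of the first kind and $n - X$ items of the second kind.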
EX 1.
Ten people apply for a job as assistant manager of a restaurant. Five
have completed college and five have not. If the manager selects 3
applicants at random, find the probability that all 3 are college
graduates.
EX 2.
A recent study found that 2 out of every 10 houses in a neighborhood
have no insurance. If 5 houses are selected from 10 houses, find the
probability that exactly 1 will be uninsured.
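Both hypergeometric examples can be checked with the sketch below. It reads EX 1 as a = 5 graduates, b = 5 non-graduates, n = 3, X = 3, and EX 2 as a = 2 uninsured houses, b = 8 insured houses, n = 5, X = 1; the function name `hypergeometric_probability` is illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeometric_probability(a, b, n, x):
    """P(X) = aCx * bC(n - x) / (a + b)Cn, sampling without replacement."""
    return Fraction(comb(a, x) * comb(b, n - x), comb(a + b, n))

# EX 1: all 3 selected applicants are college graduates.
p_all_graduates = hypergeometric_probability(a=5, b=5, n=3, x=3)

# EX 2: exactly 1 of the 5 selected houses is uninsured.
p_one_uninsured = hypergeometric_probability(a=2, b=8, n=5, x=1)

print(p_all_graduates, round(float(p_all_graduates), 4))  # 1/12 0.0833
print(p_one_uninsured, round(float(p_one_uninsured), 4))  # 5/9 0.5556
```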
CONCLUSION
STATS 101 WK7 NOTE.pptx

  • 1. 1 1 1 1 1 1 1 1 1 UNIVERSITY OF LIBERIA T.J.R. FAULKNER COLLEGE OF SCIENCE, TECHNOLOGY AND ENVIRONMENTAL SCIENCES DEPARTMENT OF MATHEMATICS STATISTICS PROGRAM STATS 101: INTRODUCTION TO STATISTICS Instructor: Mr. Mulbah K.A. Kromah, Principal Analyst, Office of the DDGSDP & Part-time instructor, UL department of mathematics
  • 2. 2 PLAN 1 2 01 INTRODUCTION 02 Overview of the course outline 06 Random variables History of statistics 03 Basic definitions, types of data and other key concepts 04 08 Correlation & regression Descriptive statistics 05 07 Probability distributions & Statistical inference 09 SUMMARY
  • 3. 3 Overview of the course outline 3 OBJECTIVE OF THE COURSE:  Provide students with a brief history of statistics;  Help students understand the basic definitions used in statistics, the types of data used and the basic sampling methods;  Help students learn how to avoid making misleading conclusions;  Introduces students to the field of descriptive statistics (Data organization, visualization and summarization);
  • 4. 4 Overview of the course outline 4 OBJECTIVE OF THE COURSE:  Introduces students to the concept of random variables;  Help students learn about the basic index numbers;  Introduces students to probability distributions;  Introduces students to statistical inference, correlation and regression
  • 5. 5 ORGANIZATION OF DATA 5 INTRODUCTION TO DESCRIPTIVE STATISTICS FREQUENCY DISTRIBUTION AND GRAPHS
  • 6. 6 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs Quantitative Data Representation When data are quantitative, several types of representations are often used:  Histogram  Frequency polygon  Ogive  Stem and leaf plot  Dot plot  Scatter plot, etc. 6
  • 7. 7 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs Quantitative Data Representation In this course, we will provide a brief explanation of the uses of four of these graphs, namely:  Histogram  Frequency polygon  Ogive  Stem and leaf plot  Scatter plot (specifically time series graph)  Bar graph 7
  • 8. 8 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs Quantitative Data Representation 8
  • 9. 9 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs Quantitative and quantitative Data Representation 9
  • 10. 10 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 10 Distribution Shapes  When describing data, it is important to recognize the shapes of the distribution values. This is important in understanding which statistical method to use in analyzing the data.  A distribution can have many shapes, and one method of analyzing a distribution is to draw a histogram or frequency polygon for the distribution.  Distributions are most often not perfectly shaped, so it is not necessary to have an exact shape but rather to identify an overall pattern.
  • 11. 11 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 11 Classification of Distribution Shapes
  • 12. 12 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 12 Avoid using misleading graphs  Changing the units at the starting point on the y axis can convey a very different visual representation of the data.  Avoid exaggerating a one-dimensional increase by showing it in two dimensions.  Avoid omitting labels or units on the axes of the graph.  Always include the basic elements of a graph (titles, units, source and notes)
  • 13. 13 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 13 Summary of Graphs and Uses of Each
  • 14. 14 III. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 14 Stem and Leaf Plots  A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.  Stem and leaf plot can be used to compare two related distributions (back-to-back stem and leaf plot)  When analyzing a stem and leaf plot, one look for peaks and gaps in the distribution. You should also analyze the form of the distribution (symmetric or skewed). Check the variability of the data by looking at the spread (range, variance, SD).
  • 15. 15 IIII. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 15 How to construct a Stem and Leaf Plot? Step 1: Arrange the data in order (Optional but very helpful). Step 2: Separate the data according to the classes. Step 3: Plot the data using one of the diagram below: Leading digit (stem) Trailing digit for dist. 2 (leaf) Trailing digit for dist. 1 (leaf) Trailing digit (leaf) Leading digit (stem) Back-to-back stem and leaf plot
  • 16. 16 IIII. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 16 Exercise 1 Bomi citizens age 55 33 5 37 27 31 42 12 45 5 0 44 6 17 8 3 10 42 9 3 26 34 28 7 55 3 3 9 48 2 Bassa citizen age 30 28 2 10 8 40 23 26 8 3 4 62 42 29 35 2 45 5 27 26 3 40 22 0 16 41 26 11 62 6 The dataset on the right shows the ages of 30 Bomi and Bassa citizens extracted from the 2008 NPHC of Liberia. Use this dataset to construct stem and leaf plots for the two counties. Use a back- to-back stem and leaf plot to compare the two distributions.
  • 17. 17 IIII. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 17 Solution 0 0 2 3 3 3 3 5 5 6 7 8 9 9 1 0 2 7 2 6 7 8 3 1 3 4 7 4 2 2 4 5 8 5 5 5 Distribution of 30 randomly selected Bomi citizens’ age 0 0 2 2 3 3 4 5 6 8 8 1 0 1 6 2 2 3 6 6 6 7 8 9 3 0 5 4 0 0 1 2 5 5 6 2 2 Distribution of 30 randomly selected Bassa citizens’ age Note: There are no data in the sixth class for Bassa. Do not put 0 in the leaf for this class, just leave it blank (that is why we only wrote the stem number, 5)
  • 18. 18 IIII. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 18 Solution 0 0 2 3 3 3 3 5 5 6 7 8 9 9 1 0 2 7 2 6 7 8 3 1 3 4 7 4 2 2 4 5 8 5 5 5 Bomi 8 8 6 5 4 3 3 2 2 0 0 6 1 0 1 9 8 7 6 6 6 3 2 2 5 0 3 5 2 1 0 0 4 5 2 2 6 Bassa
  • 19. 19 IIII. ORGANIZATION OF DATA: Frequency distribution and graphs 19 Summary Statisticians or researchers collect raw data. To obtain much information from this data, they must organize it in some meaningful way. A frequency distribution using classes is often used for this purpose. Once a frequency distribution is constructed, the representation of the data by graphs becomes easy. The most commonly used graphs in research statistics are the histogram, frequency polygon, ogive, bar graph, Pareto chart, time series graph, and pie graph. Finally, a stem and leaf plot uses part of the data values as stems and part of the data values as leaves. This graph has the advantages of a frequency distribution and a histogram.
  • 20. 20 IIII. ORGANIZATION OF DATA: Frequency distribution and graphs - 3.3 Data representation using Graphs 20 Exercise 2 Bomi citizens age 55 33 5 37 27 31 42 12 45 5 0 44 6 17 8 3 10 42 9 3 26 34 28 7 55 3 3 9 48 2 Bassa citizen age 30 28 2 10 8 40 23 26 8 3 4 62 42 29 35 2 45 5 27 26 3 40 22 0 16 41 26 11 62 6 Use the dataset on the right to construct stem and leaf plots for the two counties using the following age groupings: 0 - 9 10 - 14 15 - 24 25 - 44 45 - 54 55 - 64.
  • 22. 22 22 IV. ORGANIZATION OF DATA In the previous chapter, we learned how to obtain useful information from raw data by organizing them into a frequency distribution and then presenting the data by using various graphs. In this chapter, we will learn about the statistical methods that can be used to summarize data. Our main objective will be to find the “central number” or the “most typical case” in our dataset and then analyze the relationship between this number and the other numbers in the dataset.
  • 23. 23 23 IV. ORGANIZATION OF DATA First, we will look at the measures of average, also called measures of central tendency. They include the mean, median, mode, and midrange. Next, we will learn about measures of variation, or measures of dispersion. These measures include the range, variance, and standard deviation. Lastly, we will learn how to compute and interpret measures of position, which include percentiles, deciles, and quartiles.
  • 24. 24 24 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Mean The mean, also known as the arithmetic average, is found by adding the values of the data and dividing by the total number of values. 𝑋 = 𝑋1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑛 𝑛 = 𝑖=1 𝑛 𝑋𝑖 𝑛 Sample mean µ= 𝑋1+𝑋2+𝑋3+⋯+𝑋𝑁 𝑁 = 𝑖=1 𝑁 𝑋𝑖 𝑁 Population mean 𝑋 = 𝑊1𝑋1 + 𝑊2𝑋2 + 𝑊3𝑋3 + ⋯ + 𝑊 𝑛𝑋𝑛 𝑊1 + 𝑊2 + 𝑊3 + ⋯ + 𝑊 𝑛 = 𝑖=1 𝑛 𝑊𝑖𝑋𝑖 𝑖=1 𝑛 𝑊𝑖 Weighted mean where 𝑊1, 𝑊2, 𝑊3, … , 𝑊 𝑛 are the weights and 𝑋1, 𝑋2, 𝑋3, … , 𝑋𝑛 are the values.
  • 25. 25 25 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Mean 𝑋 = 𝑓1𝑀1 + 𝑓2𝑀2 + 𝑓3𝑀3 + ⋯ + 𝑓𝑗𝑀𝑗 𝑛 = 𝑗=1 𝑛 𝑓𝑗𝑀𝑗 𝑛 Mean for a group data where 𝑓1, 𝑓2, 𝑓3, … , 𝑓𝑗 are the frequencies and 𝑀1, 𝑀2, 𝑀3, … , 𝑀𝑗 are the midpoints of the classes.
  • 26. 26 26 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Mean Examples 2. Find student La Paix GPA if he has the following grades: An A in English 201 (4 credits), a C in Statistics 103 (3 credits), a D in Math 202 (4 credits) and an F in Statistics 203 (3 credits), considering that A=4 points, B=3 points, C= 2 points, D= 1 point and F= 0 point. 1. The daily transportation of 6 UL students are given below: $150LD, $450LD, $600LD, $200LD, $700LD, $150LD. Find the average daily transportation of these students.
  • 27. 27 27 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Mean Examples 3- Find the average of the group data given on the right.
  • 28. 28 28 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Median The median is the midpoint of the data array. The symbol for the median is MD. A data array is a dataset that has been ordered. To find the median, all one needs to do is to arrange the dataset in order and then locate the middle number. When the number of data values is even, the median will be the midpoint of the two middle numbers. The median tells us that 50% of the data values are above it while 50% are below it.
  • 29. 29 29 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Median We can also find the median for a grouped data using the formula below: 𝑀𝐷 = 𝐿𝑚 + 𝑤 𝑓𝑚 (0.5𝑛 − 𝑐𝑓𝑏) 𝑊ℎ𝑒𝑟𝑒, 𝑳𝒎 is the lowest limit of the median class, 𝒇𝒎 is the frequency of the median class, 𝒘 is the width of the median class, 𝒏 is the sample size and 𝒄𝒇𝒃 is the cumulative frequency of the class before the median class. Note: The median class is the first class having a cumulative relative frequency greater than 50%.
  • 30. 30 30 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Median Examples Find the median of the following datasets: Dataset 1: 713, 300, 618, 595, 311, 401, and 292. Dataset 2: 684, 764, 656, 702, 856, 1133, 1132, 1303.
  • 31. 31 31 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Mode The value that occurs most often in a data set is called the mode.  Unimodal: a dataset with one mode.  Bimodal: a dataset with two modes.  Multimodal: a dataset with more than two modes.  No mode: a dataset can have no mode. Note: a dataset can have no mode, one mode, two modes or even more modes. The mode for grouped data is the modal class. The modal class is the class with the largest frequency.
  • 32. 32 32 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Mode Examples: Find the mode in the following datasets: Dataset 1: 20.0, 16.0, 34.3, 13, 12.5, 13, 12.4, 13. Dataset 2: 110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752. Dataset 3: 104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109. Table 1: Distribution of students by major field of studies Table 2: frequency distribution of miles that 20 runners ran in one week.
  • 33. 33 33 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency Comparison of the Mean, Median and Mode A small company consists of the owner, the manager, the salesperson, and two technicians, all of whose annual salaries are listed here. (Assume that this is the entire population.)
  • 34. 34 34 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Midrange The midrange is a rough estimate of the middle. It is found by adding the lowest and highest values in the data set and dividing by 2. It is a very rough estimate of the average and can be affected by one extremely high or low value. MR= 𝑋𝑚𝑖𝑛+𝑋𝑚𝑎𝑥 2
  • 35. 35 35 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency The Midrange Example Find the midrange of this dataset and compare it with the mean. What can you say? Dataset: 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
  • 36. 36 36 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency In statistics, several measures can be used for an average. The most common measures are the mean, median, mode, and midrange. Each has its own specific purpose and use. However, several other averages, such as the harmonic mean, the geometric mean, and the quadratic mean exist. Their applications are limited to specific areas.
  • 37. 37 37 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency Properties and Uses of Central Tendency The mean 1. It is found by using all the values of the data. 2. The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples. 3. The mean is used in computing other statistics, such as the variance. 4. The mean for the data set is unique and not necessarily one of the data values. 5. The mean cannot be computed for the data in a frequency distribution that has an open-ended class. 6. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.
  • 38. 38 38 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency Properties and Uses of Central Tendency The Median 1. The median is used to find the center or middle value of a data set. 2. The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution. 3. The median is used for an open-ended distribution. 4. The median is affected less than the mean by extremely high or extremely low values.
  • 39. 39 39 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency Properties and Uses of Central Tendency The Mode 1. The mode is used when the most typical case is desired. 2. The mode is the easiest average to compute. 3. The mode can be used when the data are nominal, such as religious preference, gender, or political affiliation. 4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
  • 40. 40 40 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency Properties and Uses of Central Tendency The Midrange 1. The midrange is easy to compute. 2. The midrange gives the midpoint. 3. The midrange is affected by extremely high or low values in a data set..
  • 41. 41 41 IV. ORGANIZATION OF DATA 4.1- Measures of Central Tendency Class discussions.  Discuss the effect of the measures of central tendency on the shape of a distribution.  Give some practical examples of the most commonly seen distributions.  How does the shape of a distribution determines which measures of central tendency to use.
  • 42. 42 42 IV. ORGANIZATION OF DATA 4.2- Measures of Variation In order to better describe a dataset, Statisticians do not only consider measures of central tendency, but they also look at other measures such as measures of variation and position. In this section, we will learn how to compute and interpret measures of variation such as the range, variance and standard variation.
  • 43. 43 43 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Consider this example from the Elementary Statistics book: A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical agents are added to each group and only six cans are involved, these two groups constitute two small populations. The results (in months) are shown to the right. Find the mean of each group.
  • 44. 44 44 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Solution As seen, the two brands have the same means, 35 but brand B varies less then brand A (indicating that Brand B is more consistent).
  • 45. 45 45 IV. ORGANIZATION OF DATA 4.2- Measures of Variation The Range The range is the highest value minus the lowest value. The symbol 𝑅 is used for the range. 𝑅= highest value − lowest value Note: The range can greatly be affected by outliers. Because of this, statisticians usually used variance and standard deviation.
  • 46. 46 46 IV. ORGANIZATION OF DATA 4.2- Measures of Variation The Variance The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is 𝜎2(𝜎 is the Greek lowercase letter sigma). Note: The Standard deviation is given by the square root of the variance. 𝜎2 = (𝑋 − 𝜇)2 𝑁 Population Variance 𝑠2 = (𝑋 − 𝑋)2 𝑛 − 1 Sample Variance
  • 47. 47 47 IV. ORGANIZATION OF DATA 4.2- Measures of Variation The Variance Note: The Standard deviation is given by the square root of the variance. 𝑠2 = 𝑛( 𝑋2) − ( 𝑋)2 𝑛(𝑛 − 1) Simplest formula for finding Sample Variance
  • 48. 48 48 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Example Note: The Standard deviation is given by the square root of the variance. Find the variances of the two brands of paints given to the right.
  • 49. 49 49 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Variance and Standard Deviation for Grouped Data 𝑠2 = 𝑛( 𝑓∙𝑋𝑚 2 )−( 𝑓∙𝑋𝑚)2 𝑛(𝑛−1) , where 𝑋𝑚 represents the class midpoint.
  • 50. 50 50 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Variance and Standard Deviation for Grouped Data 𝑬𝒙𝒂𝒎𝒑𝒍𝒆. Find the variance and standard deviation of this dataset.
  • 51. 51 51 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Uses of the Variance and Standard Deviation 1. The variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable. 2. The measures of variance and standard deviation are used to determine the consistency of a variable. For example, in the manufacture of fittings, such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
  • 52. 52 52 IV. ORGANIZATION OF DATA 4.2- Measures of Variation Uses of the Variance and Standard Deviation 3. The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution. 4. Finally, the variance and standard deviation are used quite often in inferential statistics. Note: The range can be used to approximate the standard deviation. The approximation is called the range rule of thumb. 𝑆 ≈ 𝑅 4
  • 53. IV. ORGANIZATION OF DATA 4.2- Measures of Variation Coefficient of Variation The Coefficient of Variation (CV) is a statistic that allows us to compare standard deviations when the units are different. For example, we might want to compare the standard deviation of the number of hours that Firestone employees work weekly with the standard deviation of their weekly earnings.
  • 54. IV. ORGANIZATION OF DATA 4.2- Measures of Variation Coefficient of Variation The CV is the standard deviation divided by the mean. The result is expressed as a percentage. Sample: \(CV = \frac{s}{\bar{X}} \cdot 100\). Population: \(CV = \frac{\sigma}{\mu} \cdot 100\).
  • 55. IV. ORGANIZATION OF DATA 4.2- Measures of Variation Coefficient of Variation Example Suppose the mean number of hours that Firestone employees work weekly is 48 hours and the standard deviation is 3 hours. Assume also that the mean of their weekly earnings is $15,250 LD and the standard deviation is $850 LD. Compare the variations of the two variables.
  • 56. IV. ORGANIZATION OF DATA 4.2- Measures of Variation Coefficient of Variation Solution Number of hours that Firestone employees work weekly: \(CV = \frac{3}{48} \cdot 100 = 6.25\%\). Firestone employees' weekly earnings: \(CV = \frac{850}{15{,}250} \cdot 100 \approx 5.57\%\). Interpretation: Since the coefficient of variation is smaller for the weekly earnings, the employees' weekly earnings are less variable, relative to their mean, than the number of hours they work weekly.
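The comparison above can be reproduced with a short sketch. The function name `coefficient_of_variation` is hypothetical; the means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100


cv_hours = coefficient_of_variation(3, 48)          # weekly hours worked
cv_earnings = coefficient_of_variation(850, 15250)  # weekly earnings in LD

print(round(cv_hours, 2), round(cv_earnings, 2))
print("earnings less variable" if cv_earnings < cv_hours else "hours less variable")
```

Because the CV is unit-free, the two values can be compared directly even though one variable is measured in hours and the other in dollars.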
  • 57. IV. ORGANIZATION OF DATA Group Presentation Divide the class into two groups. Each group is to make a presentation on one of the following: (a) Chebyshev’s theorem; (b) the Normal Rule. Each presentation should highlight the following: a brief description of the theorem or rule; the importance of the theorem or rule; a presentation of the formula(s), if any; and a practical example of how the theorem or rule is used in real-life situations. Note: Each group will have a maximum of 10 minutes for its presentation, including Q&A.
  • 59. Discrete Probability Distributions OBJECTIVES After completing this chapter, you should be able to: 1 - Construct a probability distribution for a random variable. 2 - Find the mean, variance, standard deviation, and expected value for a discrete random variable. 3 - Find the exact probability for X successes in n trials of a binomial experiment.
  • 60. Discrete Probability Distributions OBJECTIVES 4 - Find the mean, variance, and standard deviation for the variable of a binomial distribution. 5 - Find probabilities for outcomes of variables, using the Poisson, hypergeometric, and multinomial distributions.
  • 61. Discrete Probability Distributions INTRODUCTION By assigning probabilities to all possible outcomes, we can make many decisions. For example, a crime statistician at the LNP can compute the probabilities that 0, 1, 2 or more crimes will be committed next month. A statistician at the MOT might choose to assign probabilities to the number of vehicles that will be registered next year.
  • 62. Discrete Probability Distributions INTRODUCTION Once these probabilities are assigned, statistics such as the mean (\(\mu\)), variance (\(\sigma^2\)), and standard deviation (\(\sigma\)) can be computed for these events. With these statistics, various decisions can be made. The crime statistician will be able to compute the average number of crimes next month. The MOT statistician can easily advise the management on how many license plates should be made available next year.
  • 63. Discrete Probability Distributions PROBABILITY DISTRIBUTIONS We first need to review the definition of a variable. What is a variable? A variable is a characteristic or attribute that can assume different values. Various letters of the alphabet, such as X, Y, or Z, are used to represent variables.
  • 64. Discrete Probability Distributions PROBABILITY DISTRIBUTIONS A random variable is a variable whose values are determined by chance. A random variable can be discrete or continuous. Discrete variables have a finite number of possible values or an infinite number of values that can be counted.
  • 65. Discrete Probability Distributions PROBABILITY DISTRIBUTIONS A discrete probability distribution consists of the values a random variable can assume and the corresponding probabilities of the values. Discrete probability distributions can be shown by using a graph or a table. Probability distributions can also be represented by a formula.
  • 66. Discrete Probability Distributions PROBABILITY DISTRIBUTIONS EX. Construct a probability distribution for the number of heads when a coin is tossed three times.
  • 67. Discrete Probability Distributions PROBABILITY DISTRIBUTIONS Two Requirements for a Probability Distribution 1. The sum of the probabilities of all the events in the sample space must equal 1; that is, \(\sum P(X) = 1\). 2. The probability of each event in the sample space must be between 0 and 1 (or equal to 0 or 1). That is, \(0 \le P(X) \le 1\).
  • 68. Discrete Probability Distributions MEAN, VARIANCE, STANDARD DEVIATION, AND EXPECTATION The mean, variance, and standard deviation for a probability distribution are computed differently from the mean, variance, and standard deviation for samples. How are means calculated for samples or populations?
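As an illustration of the coin-toss exercise and the two requirements, the sketch below enumerates the sample space for three tosses of a fair coin and checks both conditions. The function names are hypothetical, and the fairness of the coin is an assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(tosses):
    """Map each possible number of heads to its probability for a fair coin."""
    outcomes = list(product("HT", repeat=tosses))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


def is_valid_distribution(dist):
    """Check that probabilities sum to 1 and each lies between 0 and 1."""
    probabilities = list(dist.values())
    return sum(probabilities) == 1 and all(0 <= p <= 1 for p in probabilities)


dist = heads_distribution(3)
for x, p in dist.items():
    print(x, p)
print(is_valid_distribution(dist))
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.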
  • 69. Discrete Probability Distributions THE MEAN Formula for the Mean of a Probability Distribution \(\mu = X_1 \cdot P(X_1) + X_2 \cdot P(X_2) + X_3 \cdot P(X_3) + \cdots + X_N \cdot P(X_N) = \sum_{i=1}^{N} X_i \cdot P(X_i)\), where \(X_1, X_2, X_3, \ldots, X_N\) are the outcomes and \(P(X_1), P(X_2), P(X_3), \ldots, P(X_N)\) are the corresponding probabilities.
  • 70. Discrete Probability Distributions THE MEAN EX. Find the mean of the number of heads that appear when a coin is tossed three times.
  • 71. Discrete Probability Distributions THE VARIANCE AND STANDARD DEVIATION Formula for the Variance of a Probability Distribution \(\sigma^2 = \sum_{i=1}^{N} \left[X_i^2 \cdot P(X_i)\right] - \mu^2\). The SD is: \(\sigma = \sqrt{\sum_{i=1}^{N} \left[X_i^2 \cdot P(X_i)\right] - \mu^2}\)
  • 72. Discrete Probability Distributions THE VARIANCE AND STANDARD DEVIATION Compute the variance and standard deviation for the probability distribution in the previous example.
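A minimal sketch of the mean and variance formulas, applied to the number of heads in three tosses of a fair coin. The function name is hypothetical, and the distribution is written out by hand under the assumption of a fair coin.

```python
import math
from fractions import Fraction


def distribution_summary(dist):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(x * p for x, p in dist.items())
    # Variance: sum of X^2 * P(X), minus the square of the mean.
    variance = sum(x ** 2 * p for x, p in dist.items()) - mean ** 2
    return mean, variance, math.sqrt(variance)


# Number of heads in three tosses of a fair coin.
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean, variance, std_dev = distribution_summary(heads)
print(mean, variance, round(std_dev, 3))
```

Note that the mean here is not one of the attainable outcomes; it is a long-run average.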
  • 73. Discrete Probability Distributions EXPECTATION Another concept related to the mean for a probability distribution is that of expected value or expectation. Expected value is used in various types of games of chance, in insurance, and in other areas, such as decision theory
  • 74. Discrete Probability Distributions EXPECTATION The expected value of a discrete random variable of a probability distribution is the theoretical average of the variable. \(\mu = E(X) = \sum X \cdot P(X)\)
  • 75. Discrete Probability Distributions EXPECTATION EX 1. One thousand tickets are sold at $1 each for a color television valued at $350. What is the expected value of the gain if you purchase one ticket?
  • 76. Discrete Probability Distributions EXPECTATION SOLUTION Gain X: 349 (win), -1 (lose). Probability P(X): 1/1000 (win), 999/1000 (lose). \(E(X) = \sum_{i=1}^{N} X_i \cdot P(X_i) = 349 \cdot \frac{1}{1000} + (-1) \cdot \frac{999}{1000} = -0.65\), that is, E(X) = -$0.65.
  • 77. Discrete Probability Distributions EXPECTATION EX 2. One thousand tickets are sold at $1 each for four prizes of $100, $50, $25, and $10. After each prize drawing, the winning ticket is then returned to the pool of tickets. What is the expected value if you purchase two tickets?
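  • 78. Discrete Probability Distributions EXPECTATION SOLUTION Gain X: $98, $48, $23, $8 (win), -$2 (lose). Probability P(X): 2/1000 for each of the four prizes, 992/1000 (lose). \(E(X) = \sum_{i=1}^{N} X_i \cdot P(X_i) = 98 \cdot \frac{2}{1000} + 48 \cdot \frac{2}{1000} + 23 \cdot \frac{2}{1000} + 8 \cdot \frac{2}{1000} + (-2) \cdot \frac{992}{1000} = -1.63\), that is, E(X) = -$1.63.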
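Both raffle solutions can be checked with a short sketch. The function name is hypothetical; the gains and probabilities are taken exactly as they appear in the two solution tables.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Sum of gain * probability over a list of (gain, probability) pairs."""
    return sum(gain * prob for gain, prob in outcomes)


# EX 1: one $1 ticket out of 1000 for a television valued at $350.
tv_raffle = [(349, Fraction(1, 1000)), (-1, Fraction(999, 1000))]

# EX 2: two $1 tickets, four prizes, using the solution table's probabilities.
four_prizes = [
    (98, Fraction(2, 1000)),
    (48, Fraction(2, 1000)),
    (23, Fraction(2, 1000)),
    (8, Fraction(2, 1000)),
    (-2, Fraction(992, 1000)),
]

print(float(expected_value(tv_raffle)))
print(float(expected_value(four_prizes)))
```

A negative expected value means that, averaged over many plays, the ticket buyer loses money.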
  • 79. Discrete Probability Distributions THE BINOMIAL DISTRIBUTION Many types of probability problems have only two outcomes or can be reduced to two outcomes. For example, when a coin is tossed, it can land heads or tails. When a baby is born, it will be either male or female. In a basketball game, a team either wins or loses. A true/false item can be answered in only two ways, true or false.
  • 80. Discrete Probability Distributions THE BINOMIAL DISTRIBUTION A binomial experiment is a probability experiment that satisfies the following four requirements: 1. There must be a fixed number of trials. 2. Each trial can have only two outcomes or outcomes that can be reduced to two outcomes. These outcomes can be considered as either success or failure.
  • 81. Discrete Probability Distributions THE BINOMIAL DISTRIBUTION 3. The outcomes of each trial must be independent of one another. 4. The probability of a success must remain the same for each trial. A binomial experiment and its results give rise to a special probability distribution called the binomial distribution.
  • 82. Discrete Probability Distributions THE BINOMIAL DISTRIBUTION The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.
  • 83. Discrete Probability Distributions NOTATION FOR THE BINOMIAL DISTRIBUTION P(S) => probability of success; P(F) => probability of failure; p => numerical probability of a success; q => numerical probability of a failure; P(S) = p and P(F) = 1 - p = q; n => number of trials; X => number of successes in n trials.
  • 84. Discrete Probability Distributions BINOMIAL PROBABILITY FORMULA \(P(X) = \frac{n!}{(n - X)!\,X!} \cdot p^X \cdot q^{n - X}\)
  • 85. Discrete Probability Distributions BINOMIAL PROBABILITY FORMULA A coin is tossed 3 times. Find the probability of getting exactly two heads (Use the binomial probability formula).
  • 86. Discrete Probability Distributions MEAN, VARIANCE, AND STANDARD DEVIATION FOR THE BINOMIAL DISTRIBUTION Mean: \(\mu = n \cdot p\). Variance: \(\sigma^2 = n \cdot p \cdot q\). Standard deviation: \(\sigma = \sqrt{n \cdot p \cdot q}\).
  • 87. Discrete Probability Distributions CREATING A BINOMIAL DISTRIBUTION AND GRAPH IN EXCEL See page 282 of the textbook for step-by-step instructions.
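The binomial probability formula and the summary formulas can be sketched as follows, using the three-toss coin example (n = 3, p = 0.5, X = 2). The function names are hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x): n! / ((n - x)! x!) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial variable."""
    q = 1 - p
    variance = n * p * q
    return n * p, variance, math.sqrt(variance)


# Probability of exactly two heads in three tosses of a fair coin.
p_two_heads = binomial_pmf(2, 3, 0.5)
mean, variance, std_dev = binomial_summary(3, 0.5)

print(p_two_heads)
print(mean, variance, round(std_dev, 3))
```

`math.comb(n, x)` computes the same quantity as n! / ((n - x)! x!) without building large factorials.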
  • 88. Discrete Probability Distributions THE MULTINOMIAL DISTRIBUTION We use the multinomial distribution in cases where each trial has more than two outcomes. Ex.: an experiment in which each person chooses a best subject (Math, English, or Biology).
  • 89. Discrete Probability Distributions THE MULTINOMIAL DISTRIBUTION In the multinomial distribution: the probability of each outcome is constant for each trial; the outcomes are independent for a fixed number of trials; the events are mutually exclusive.
  • 90. Discrete Probability Distributions FORMULA FOR THE MULTINOMIAL DISTRIBUTION \(P(X) = \frac{n!}{X_1! \cdot X_2! \cdot X_3! \cdots X_k!} \cdot p_1^{X_1} \cdot p_2^{X_2} \cdot p_3^{X_3} \cdots p_k^{X_k}\), where \(X_1 + X_2 + X_3 + \cdots + X_k = n\) and \(p_1 + p_2 + p_3 + \cdots + p_k = 1\).
  • 91. Discrete Probability Distributions EX. In a large city, 50% of the people choose a movie, 30% choose dinner and a play, and 20% choose shopping as a leisure activity. If a sample of 5 people is randomly selected, find the probability that 3 are planning to go to a movie, 1 to dinner and a play, and 1 to a shopping mall.
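The multinomial formula can be applied to the leisure-activity example with a short sketch. The function name is hypothetical; the counts (3, 1, 1) and probabilities (0.5, 0.3, 0.2) come from the example.

```python
import math


def multinomial_pmf(counts, probs):
    """n! / (X1! X2! ... Xk!) * p1^X1 * p2^X2 * ... * pk^Xk."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for x in counts:
        coefficient //= math.factorial(x)  # stays an exact integer
    probability = float(coefficient)
    for x, p in zip(counts, probs):
        probability *= p ** x
    return probability


# 3 choose a movie, 1 dinner and a play, 1 shopping, out of 5 people.
p_leisure = multinomial_pmf([3, 1, 1], [0.5, 0.3, 0.2])
print(round(p_leisure, 4))
```

With only two outcome categories, this function reduces to the binomial probability formula.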
  • 92. Discrete Probability Distributions THE POISSON DISTRIBUTION A discrete probability distribution that is useful when n is large and p is small and when the independent variables occur over a period of time is called the Poisson distribution.
  • 93. Discrete Probability Distributions THE POISSON DISTRIBUTION The Poisson distribution can be used when a density of items is distributed over a given area or volume, such as the number of plants growing per acre or the number of defects in a given length of videotape.
  • 94. Discrete Probability Distributions FORMULA FOR THE POISSON DISTRIBUTION \(P(X; \lambda) = \frac{e^{-\lambda} \lambda^X}{X!}\), where \(X = 0, 1, 2, \ldots\) and \(\lambda\) is the mean number of occurrences per unit (time, area, volume, etc.). The letter e is a constant approximately equal to 2.7183.
  • 95. Discrete Probability Distributions EX 1. If there are 200 typographical errors randomly distributed in a 500-page manuscript, find the probability that a given page contains exactly 3 errors.
  • 96. Discrete Probability Distributions EX 2. A sales firm receives, on average, 3 calls per hour on its toll-free number. For any given hour, find the probability that it will receive the following. a. At most 3 calls b. At least 3 calls c. 5 or more calls
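Both Poisson exercises can be sketched with the formula above. The function names are hypothetical. For EX 1 the sketch assumes the mean is 200/500 = 0.4 errors per page; for EX 2 it uses a mean of 3 calls per hour.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x), summing the probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


# EX 1: exactly 3 errors on a page, mean 0.4 errors per page.
p_three_errors = poisson_pmf(3, 200 / 500)

# EX 2: mean of 3 calls per hour.
at_most_3 = poisson_cdf(3, 3)
at_least_3 = 1 - poisson_cdf(2, 3)    # complement of 0, 1, or 2 calls
five_or_more = 1 - poisson_cdf(4, 3)  # complement of 0 through 4 calls

print(round(p_three_errors, 4))
print(round(at_most_3, 4), round(at_least_3, 4), round(five_or_more, 4))
```

"At least" and "or more" probabilities are computed through the complement because the Poisson variable has no upper bound.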
  • 97. Discrete Probability Distributions FORMULA FOR THE HYPERGEOMETRIC DISTRIBUTION \(P(X) = \frac{{}_{a}C_{X} \cdot {}_{b}C_{n-X}}{{}_{a+b}C_{n}}\)
  • 98. Discrete Probability Distributions EX 1. Ten people apply for a job as assistant manager of a restaurant. Five have completed college and five have not. If the manager selects 3 applicants at random, find the probability that all 3 are college graduates.
  • 99. Discrete Probability Distributions EX 2. A recent study found that 2 out of every 10 houses in a neighborhood have no insurance. If 5 houses are selected from 10 houses, find the probability that exactly 1 will be uninsured.
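The hypergeometric formula can be sketched for both exercises. The function name is hypothetical. For EX 2 the sketch assumes that exactly 2 of the 10 houses are uninsured (a = 2) and 8 are insured (b = 8).

```python
import math


def hypergeometric_pmf(x, a, b, n):
    """P(X = x) = C(a, x) * C(b, n - x) / C(a + b, n), sampling without replacement."""
    return math.comb(a, x) * math.comb(b, n - x) / math.comb(a + b, n)


# EX 1: 5 graduates, 5 non-graduates, select 3, all 3 are graduates.
p_all_graduates = hypergeometric_pmf(3, 5, 5, 3)

# EX 2: 2 uninsured, 8 insured (assumed), select 5, exactly 1 uninsured.
p_one_uninsured = hypergeometric_pmf(1, 2, 8, 5)

print(round(p_all_graduates, 4), round(p_one_uninsured, 4))
```

Unlike the binomial case, the probability of success changes from draw to draw here because items are not replaced.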

Editor's Notes

  1. The absolute frequencies or cumulative frequencies are placed on the y-axis, while the categories, class midpoints (frequency polygons), or class boundaries (ogive) are placed on the x-axis.
  2. It is not wrong to truncate an axis of the graph; many times it is necessary to do so. However, the reader should be aware of this fact and interpret the graph accordingly. Do not be misled if an inappropriate impression is given.
  3. Another common method used to organize data is the stem and leaf plot. It is a combination of sorting and graphing. Unlike grouped frequency distribution, a stem and leaf plot retains the actual data while showing them in graphical form.
  4. When the data values are in the tens, such as 39, the stem is 3 (leading digit) and the leaf is 9 (trailing digit). When the data values are in the hundreds, such as 435, the stem is 43 (leading digits) and the leaf is 5 (trailing digit).
  6. Most of the Bomi citizens selected are less than 10 years old. Few Bomi citizens are in their 50s. As in Bomi, most of the Bassa citizens selected are less than 10 years old. Unlike Bomi, Bassa has more citizens in their late 20s.
  9. In addition to knowing the average, you must know how the data values are dispersed. That is, do the data values cluster around the mean, or are they spread more evenly throughout the distribution? Measures of position tell where a specific data value falls within the dataset or its relative position in comparison with other data values.
  10. The mean, in most cases, is not an actual data value.
  13. The modal class is 20.5–25.5, since it has the largest frequency. Sometimes the midpoint of the class is used rather than the boundaries; hence, the mode could also be given as 23 miles. An extremely high or extremely low data value in a data set can have a striking effect on the mean of the data set. These extreme values are called outliers. This is one reason why when analyzing a frequency distribution, you should be aware of any of these values.
  14. In this example, the mean is much higher than the median or the mode. This is so because the extremely high salary of the owner tends to raise the value of the mean. In this and similar situations, the median should be used as the measure of central tendency.
  15. If the data set contains one extremely large value or one extremely small value, a higher or lower midrange value will result and may not be a typical description of the middle.
  18. When a distribution is extremely skewed, the value of the mean will be pulled toward the tail, but the majority of the data values will be greater than the mean or less than the mean (depending on which way the data are skewed); hence, the median rather than the mean is a more appropriate measure of central tendency. An extremely skewed distribution can also affect other statistics.
  20. In the previous example, the range for brand A shows that 50 months separate the largest data value from the smallest data value. For brand B, 20 months separate the largest data value from the smallest data value, which is less than one-half of brand A’s range.
  21. For example, Chebyshev’s theorem shows that, for any distribution, at least 75% of the data values will fall within 2 standard deviations of the mean. A note of caution should be mentioned here. The range rule of thumb is only an approximation and should be used when the distribution of data values is unimodal and roughly symmetric.
  22. When we have samples that are measured in the same units (usually the same variable), we can compare the variances and standard deviations directly. But what happens if the units are not the same?
  23. Many decisions in business, insurance, and other real-life situations are made by assigning probabilities to all possible outcomes pertaining to the situation and then evaluating the results. For example, a saleswoman can compute the probability that she will make 0, 1, 2, or 3 or more sales in a single day. An insurance company might be able to assign probabilities to the number of vehicles a family owns. A self-employed speaker might be able to compute the probabilities for giving 0, 1, 2, 3, or 4 or more speeches each week. Once these probabilities are assigned, statistics such as the mean, variance, and standard deviation can be computed for these events. With these statistics, various decisions can be made. The saleswoman will be able to compute the average number of sales she makes per week, and if she is working on commission, she will be able to approximate her weekly income over a period of time, say, monthly. The public speaker will be able to plan ahead and approximate his average income and expenses. The insurance company can use its information to design special computer forms and programs to accommodate its customers’ future needs. This chapter explains the concepts and applications of what is called a probability distribution. In addition, special probability distributions, such as the binomial, multinomial, Poisson, and hypergeometric distributions, are explained.
  25. Before probability distribution is defined formally, the definition of a variable is reviewed. In Chapter 1, a variable was defined as a characteristic or attribute that can assume different values. Various letters of the alphabet, such as X, Y, or Z, are used to represent variables. Since the variables in this chapter are associated with probability, they are called random variables. For example, if a die is rolled, a letter such as X can be used to represent the outcomes. Then the value that X can assume is 1, 2, 3, 4, 5, or 6, corresponding to the outcomes of rolling a single die. If two coins are tossed, a letter, say Y, can be used to represent the number of heads, in this case 0, 1, or 2.
  26. The word counted means that they can be enumerated using the numbers 1, 2, 3, etc. For example, the number of joggers in Riverview Park each day and the number of phone calls received after a TV commercial airs are examples of discrete variables, since they can be counted.
  27. A discrete probability distribution consists of the values a random variable can assume and the corresponding probabilities of the values. The probabilities are determined theoretically or by observation. Discrete probability distributions can be shown by using a graph or a table. Probability distributions can also be represented by a formula.
  28. Refer to text book for solution.
  29. Objective: Find the mean, variance, standard deviation, and expected value for a discrete random variable
  30. Note: \(\sum X \cdot P(X)\) means to sum the products of each value X and its probability P(X).
  31. Refer to text book for solution.
  32. Find the variance of a probability distribution by multiplying the square of each outcome by its corresponding probability, summing those products, and subtracting the square of the mean. Remember that the variance and standard deviation cannot be negative.
  33. Refer to text book for solution.
  34. Refer to text book for solution.
  35. The formula for the expected value is the same as the formula for the theoretical mean. The expected value, then, is the theoretical mean of the probability distribution. That is, E(X) = 𝜇. When expected value problems involve money, it is customary to round the answer to the nearest cent.
  36. Refer to text book for solution.
  37. Note that the expectation is -$0.65. This does not mean that you lose $0.65, since you can only win a television set valued at $350 or lose $1 on the ticket. What this expectation means is that the average of the losses is $0.65 for each of the 1000 ticket holders. Here is another way of looking at this situation: If you purchased one ticket each week over a long time, the average loss would be $0.65 per ticket, since theoretically, on average, you would win the set once for each 1000 tickets purchased.
  38. Refer to text book for solution.
  40. Refer to text book for solution.
  41. Refer to text book for solution.
  42. The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.
  43. The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution. In binomial experiments, the outcomes are usually classified as successes or failures. For example, the correct answer to a multiple-choice item can be classified as a success, but any of the other choices would be incorrect and hence classified as a failure.
  44. Note that 0 ≤ X ≤ n and X = 0, 1, 2, 3, . . . , n.
  45. In a binomial experiment, the probability of exactly X successes in n trials is \(P(X) = \frac{n!}{(n - X)!\,X!} \cdot p^X \cdot q^{n - X}\).
  46. This situation fits the binomial experiment because the four conditions are satisfied. There are a fixed number of trials (three). There are only two outcomes for each trial, heads or tails. The outcomes are independent of one another (the outcome of one toss in no way affects the outcome of another toss). The probability of a success (heads) is 1/2 in each case. Refer to textbook for solution.
  47. If X consists of events E1, E2, E3, . . . , Ek, which have corresponding probabilities p1, p2, p3, . . . , pk of occurring, and X1 is the number of times E1 will occur, X2 is the number of times E2 will occur, X3 is the number of times E3 will occur, etc., then the probability that X will occur is given by the formula above.
  48. Refer to the textbook for solution.
  49. Refer to textbook for solution.
  50. Refer to textbook for solution.
  51. Given a population with only two types of objects (females and males, defective and nondefective, successes and failures, etc.), such that there are a items of one kind and b items of another kind and a + b equals the total population, the probability P(X) of selecting without replacement a sample of size n with X items of type a and n - X items of type b is given by the formula above. The basis of the formula is that there are \({}_{a}C_{X}\) ways of selecting the first type of items, \({}_{b}C_{n-X}\) ways of selecting the second type of items, and \({}_{a+b}C_{n}\) ways of selecting n items from the entire population.