This is a knowledge-sharing presentation designed to help non-statisticians understand the basic fundamentals of statistics and of pharmaceutical statistics.
It shows how statistics is involved in daily life as well as in the pharmaceutical industry, among other areas.
#WhatisMeanByStatistics? #WhyStatistics? #HowStatisticsEssentialtoEverydayLife? #StatisticalApplicationsinDailyLife #Toothpaste
#IndependentDependentVariables #Tea #TypesofData #ClassificationofDiscreteVariableContinuousVariables #TypesofDataMeasurementScale
#StatisticalMethodsforAnalyzingData #ConceptofPopulationSampleandPointEstimate
#DescriptiveStatistics #InferentialStatistics
#MeasuresofCentralTendency #MeasuresofDispersion #RealLifeApplications #DataPresentation #PictorialView
#PharmaceuticalStatistics #ResearchDevelopment #Statistician
2. What is meant by Statistics?
Why Statistics?
How is statistics essential to everyday life?
Statistical applications in daily life
Independent & dependent variables
Types of data & measurement scale
Two Main Statistical Methods
Descriptive Statistics
Inferential Statistics
Concept of Population, Sample and Point estimate
Descriptive Statistics in brief:
Measures of central tendency
Measures of dispersion
AGENDA
3. STATISTICS
Statistics means -
Collection
Presentation
Analysis
Interpretation of numeric data.
Statistics is the study of populations based upon samples taken from these populations
Statistics works with numbers and helps predict the future.
Statistics are sets of mathematical equations that are used to analyze what is happening in the world
around us.
Statistics - A bunch of numbers looking for a Fight.
4. WHY STATISTICS?
For prediction of future.
Statistics tell us any trends in what happened in the past and can be useful in predicting what may happen in
the future.
Statistics plays a big role in the medical field. Before any drug is prescribed, scientists must show a statistically
valid rate of effectiveness. Statistics is behind every medical study.
Weather Forecasts - Do you watch the weather forecast sometime during the day? How do you use that
information? Have you ever heard the forecaster talk about weather models? These computer models are
built using statistics that compare prior weather conditions with current weather to predict future weather.
Genetics, Stock market, Quality testing, Political Campaign & so on.
5. HOW IS STATISTICS ESSENTIAL TO
EVERYDAY LIFE?
Most people don't realize how essential statistics is. Daily life is surrounded by the products of
statistics.
You brush your teeth. The fluoride in the toothpaste was studied by scientists using statistical methods to
carefully assure the safety and effectiveness of the ingredient and the proper concentration. The
toothpaste was formulated through a series of designed experiments that determined the optimal
formulation through statistical modeling. The toothpaste production was monitored by statistical process
control to ensure quality and consistency, and to reduce variability.
The attributes of the product were studied in consumer trials using statistical methods. The pricing,
packaging and marketing were determined through studies that used statistical methods to determine the
best marketing decisions.
6. Even the location of the toothpaste on the supermarket shelf was the result of statistically based studies.
The advertising was monitored using statistical methods. Your purchase transaction became data that was
analyzed statistically. The credit card used for the purchase was scrutinized by a statistical model to make
sure that it wasn't fraudulent.
So statistics is important to the whole process of not just toothpaste, but every product we consume,
every service we use, every activity we choose. Yet we don't need to be aware of it, since it is just an
embedded part of the process. Statistics is useful everywhere you look.
7. STATISTICS IN DAILY LIFE
Weather forecasting:
Everybody watches the weather forecast. Have you ever thought about how that information is obtained? There are
computer models built on statistical concepts. These computer models compare prior weather
with the current weather to predict future weather.
Research:
Statistical skills are used to collect data, analyze data and draw inferences.
Insurance:
Everybody has some kind of insurance, whether it is medical, home or any other insurance. Based on an
individual application some businesses use statistical models to calculate the risk of giving insurance.
Financial market/Stock market:
Statistics plays a great role here. It is key to how traders and businessmen invest and make money.
8. Quality testing:
Companies make many products on a daily basis, and every company should make sure that it sells
items of the best quality. But companies cannot test every product, so they test a statistical sample.
Prediction of disease:
Doctors predict disease based on statistical concepts. Suppose a survey shows that 75%-80% of the people surveyed
have cancer and the reason is not known. When statistics becomes involved, you can get a
better idea of how the cancer may affect the body, or whether smoking is the major reason for it.
Medical field and/or Pharmaceutical industry:
Before any drug is prescribed, scientists must show a statistically valid rate of effectiveness. Statistics is
behind every medical study.
9. Online shopping:
Every time you go to an online shopping site like Flipkart or Amazon, you see the ratings of the products,
the reviews of the customers etc. All this data that you see is nothing but statistics.
Agriculture:
How much crop was grown this year compared with the previous year, or compared with the amount
the country requires? How do different fertilizers affect the quality and size of the grains grown?
Sports:
Used to compare the test scores or run rates of different players in different test matches/series and to make
predictions for an individual player using data from previous matches (e.g. IPL matches).
Uses of statistical concepts:
Average: Calculating travelling distance, speed, students' marks, salary of workers.
Median: Used to find the height of players (basketball players), poverty line.
Mode: Public transport (maximum frequency of buses for a particular route), daily intake of rotis.
10. Election:
News reporters predict the winner of an election based on political campaigns. Here statistics
plays a strong part in estimating who will form your government.
Prediction:
For example, we set the alarm for the morning even though we do not know whether we will be alive in the
morning or not. Here we use basic statistics to make predictions.
Unemployment rate figures
Gambling:
Budget forecast
Taxation & pension forecast
Food prices
Performance appraisal of an employee:
Used rating scale
11. DEPENDENT AND INDEPENDENT VARIABLES
Independent Variables
It is the variable that is changed or controlled in a scientific experiment to test the effects on the dependent variable.
Dependent Variables
It is the variable being tested and measured in a scientific experiment. It is an outcome of an experiment.
Example (making tea):
Independent variables: Milk, Sugar, Tea powder, Cardamom, Ginger
Dependent variables: Appearance, Thickness, Taste
13. QUALITATIVE DATA VS QUANTITATIVE DATA
Quantitative data (Numerical)
Data that can be observed and measured.
Information about quantities.
Quantitative implies quantity.
E.g. Height, weight, total number of students in a class, length of your fingernails, shoe size.
Qualitative data (Quality)
Data that can be observed but not measured.
Information about qualities.
Qualitative implies quality.
E.g. Taste, smoke, beauty, size of T-shirt, the color of the sky.
14. DISCRETE VS. CONTINUOUS DATA
Discrete Data
Data that can be counted and can take only distinct, separate values.
It usually consists of whole numbers.
E.g. - Flipping a coin ('H', 'T')
Total number of students in a class.
Months in a year.
Total number of stars in the sky.
Continuous Data
Data that are measured and can take any value within a range, so the number of possible values is infinite.
It includes numbers other than whole numbers, such as decimals and fractions.
E.g. - Height, weight, length, temperature, income etc.
15. Which of the following variables are classified as
discrete variables & continuous variables?
Tablet weight
Dissolution - pass or fail criteria
Amount of active ingredient (content uniformity)
Disintegration rate
Formulation A, B or C
Friability - pass or fail criteria
Size - thickness/diameter
Change in manufacturing process old process vs. new
Impurities - present or absent
Hardness
Immediate release or sustained release
16. Identify independent and dependent variables, and
determine if these variables are discrete or continuous
Samples were taken from a specific batch of drug and randomly divided into two groups of tablets.
One group was assayed by the manufacturer's own quality control laboratories.
The second group of tablets was sent to a contract laboratory for identical analysis.
Percentage of Labeled Amount of Drug
Manufacturer | Contract Lab
101.1  98.8 | 97.5  99.1
100.6  99.0 | 101.1  98.7
100.8  98.7 | 97.8  99.5
17. MEASUREMENT SCALES
Levels of Measurements
Nominal: Consists of categories. No logical order or particular relationship (e.g. Gender - M/F).
Ordinal: Distinct categories along with order (e.g. Rating - Excellent, Good, Fair, Poor).
Interval: Named + Order. Numerical measurements in which the distance between numbers is known and of constant size (e.g. Calendar years, Time).
Ratio: Named + Order. Numerical measurements in which the distance between numbers is known and of constant size, with an additional non-arbitrary zero point (e.g. weight, height).
18. Nominal scale: It is also called the categorical variable scale.
Nominal scales are used for labeling variables, without any quantitative value. Nominal scales are
like "names" or labels.
Mode can be calculated using the nominal scale.
What is your
gender?
I- M-Male
II- F-Female
What is your hair
color?
1-Brown
2-Black
3-gray
4-Other
Where do you
live?
A-Rural
B-Urban
19. Ordinal scale: The order of the values is important and significant, but the differences between them
are not really known.
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort,
etc.
The best way to determine central tendency on a set of ordinal data is to use the mode or median.
How do you feel today?
1-Very Unhappy
2-Unhappy
3-Ok
4-Happy
5-Very Happy
How satisfied are you with
our statistical service?
1-Very Unsatisfied
2-Somewhat Unsatisfied
3-Neutral
4-Somewhat Satisfied
5-Very Satisfied
20. Interval scale: Interval scales are numeric scales in which we know both the order and the exact
differences between the values.
Example:
The classic example of an interval scale is temperature in Celsius, because the difference between
each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the
difference between 80 and 70 degrees.
Time is also a very common example of interval scale as the values are already established, constant
and measurable.
Calendar years.
Mean and median can be calculated using the interval scale.
21. Ratio scale: It is defined as a variable measurement scale that not only produces the order of variables
but also makes the difference between variables known along with information on the value of true zero.
Best examples of ratio scales are weight and height.
In market research, a ratio scale is used to calculate market share, annual sales, the price of an upcoming
product, number of consumers etc.
What is your weight in kilograms?
Less than 50 kilograms
51-70 kilograms
71-90 kilograms
91-110 kilograms
More than 110 kilograms
It contains order as well as interval.
Mean, mode and median can be calculated using the ratio scale.
22. STATISTICAL METHODS
Descriptive Statistics
• Descriptive statistics are summary statistics that quantitatively describe or summarize features of a collection of information.
• e.g., Measures of Central tendency (Mean, Median, Mode etc.)
Measures of Dispersion (Range, Variance, SD, SE, Q1, Q3, IQR, etc.)
• They do not allow us to draw any conclusions or make any inferences beyond the data at hand.
Inferential Statistics
• Inferential statistics draws conclusions about a population on the basis of a sample drawn from it.
• e.g., Hypothesis testing
Confidence interval estimation (for the population mean (µ) when σ is known and unknown).
• Systems and techniques for making probability-based decisions and accurate predictions.
23. CONCEPT OF POPULATION, SAMPLE AND
POINT ESTIMATE
How do population and sample relate to each other?
Point estimate/Statistic - It gives a single value or estimate based on a sample drawn from the population (e.g. Mean, SD).
Statistic: a value calculated from the sample.
Parameter: the corresponding value for the population.
25. MEASURES OF CENTRAL TENDENCY
A measure of central tendency is a single value that describes the way in which a group of data cluster
around a central value.
It is a way to describe the center of a data set.
There are two important aspects of a distribution:
1) the center of that distribution and 2) how the observations are dispersed within the distribution.
There are three measures associated with center:
Mean (A.M., G.M., H.M.)
Median
Mode
There are three measures associated with dispersion:
Variance
Standard deviation
Range
26. Measures of central tendency can be used when dealing with ordinal, interval or ratio scales.
Mean (A.M.):
Sum of all observations divided by total no of observations in the data set.
Use - Calculating travelling distance, students marks, salary of workers.
E.g. on average we may go to the gym 2 or 3 times.
$\text{A.M.} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
CENTER OF DISTRIBUTION
27. Geometric mean:
The nth root of the product of the data values. This measure is valid only for data that are measured absolutely
on a strictly positive scale.
Use - Mostly used in finance domain, Share market, Average growth rate of population and Consumer price
index numbers.
For example, suppose you have an investment which earns 10% the first year, 60% the second year, and 20%
the third year. What is its average rate of return? It is not the arithmetic mean, because these numbers
mean that in the first year your investment was multiplied (not added to) by 1.10, in the second year it was
multiplied by 1.60, and in the third year it was multiplied by 1.20. The relevant quantity is the geometric mean of
these three numbers: (1.10 × 1.60 × 1.20)^(1/3) ≈ 1.283, i.e. an average return of about 28.3% per year.
$\text{G.M.} = \left(\prod_{i=1}^{n} X_i\right)^{1/n}$
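The investment example can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `geometric_mean` and the variable names are chosen here for illustration only.

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    # Averaging the logarithms is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Yearly growth factors from the example: +10%, +60%, +20%.
growth_factors = [1.10, 1.60, 1.20]

gm = geometric_mean(growth_factors)
average_return_pct = (gm - 1) * 100
arithmetic_return_pct = (sum(growth_factors) / len(growth_factors) - 1) * 100

print(f"Geometric mean growth factor: {gm:.4f}")
print(f"Average yearly return (geometric): {average_return_pct:.1f}%")
print(f"Arithmetic mean, for comparison: {arithmetic_return_pct:.1f}%")
```

The arithmetic mean (30%) overstates the return, because compounding multiplies the yearly factors rather than adding them.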
28. Harmonic mean:
The reciprocal of the arithmetic mean of the reciprocals of the data values.
Use - Average speed of vehicle.
1 - For example, in the first test a typist types 400 words in 50 minutes, in the second test he types the same 400 words
in 40 minutes, and in the third test he takes 30 minutes to type the 400 words. His average typing speed
is then the harmonic mean of the three speeds (8, 10 and 13.3 words per minute), which is 10 words per minute.
2 - If a vehicle travels a certain distance d at a speed x (60 km/h) and then the same distance again at a speed
y (40 km/h), then its average speed is the harmonic mean of x and y (48 km/h).
$\text{H.M.} = \dfrac{n}{\sum_{i=1}^{n} (1/x_i)}$
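Both harmonic mean examples can be reproduced with a few lines of code. This is an illustrative sketch only; the helper `harmonic_mean` and the variable names are assumptions introduced here, not part of the original slides.

```python
def harmonic_mean(values):
    """Return n divided by the sum of the reciprocals of the values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)


# Vehicle example: the same distance is covered at 60 km/h and then at 40 km/h.
average_speed_kmh = harmonic_mean([60, 40])

# Typist example: 400 words typed in 50, 40 and 30 minutes (speeds in words per minute).
typing_speeds_wpm = [400 / 50, 400 / 40, 400 / 30]
average_typing_wpm = harmonic_mean(typing_speeds_wpm)

print(f"Average speed: {average_speed_kmh:.1f} km/h")
print(f"Average typing speed: {average_typing_wpm:.1f} words per minute")
```

The typing result agrees with a direct calculation: 1200 words typed in 120 minutes is 10 words per minute.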
29. Median:
It gives the middle value of the ordered data set. It divides the data into two equal parts.
Even number of observations: the median is the average of the two middle values.
E.g. 1, 2, 3, 4, 5, 6: the median is (3+4)/2, i.e. Median = 3.5
Odd number of observations: the median is the middle value in the data.
E.g. 1, 2, 3, 4, 5, 6, 7: the median is 4.
Use - Height of students in a basketball team.
30. Mode:
Most repeated value in the data set/ Item with maximum frequency.
e.g. What is the mode of your salary?
Series of observations 1, 2, 2, 3, 4, 4, 4, 5, 5, 6 ; the mode of the series is 4.
Public transport (Maximum frequency of buses for particular route)
What is the mode of Series of observations
2,6,7,5,3,8,7,6,5,3,2,5,4,6,8,3,4,4,7,6,5,1,5
Mode =5
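The median and mode examples can be verified with Python's standard library. This is a small sketch added for illustration; the variable names are chosen here and are not part of the original slides.

```python
from collections import Counter
from statistics import median

# Median examples: even and odd number of observations.
median_even = median([1, 2, 3, 4, 5, 6])
median_odd = median([1, 2, 3, 4, 5, 6, 7])

# Mode example: count how often each value occurs and take the most frequent one.
series = [2, 6, 7, 5, 3, 8, 7, 6, 5, 3, 2, 5, 4, 6, 8, 3, 4, 4, 7, 6, 5, 1, 5]
frequencies = Counter(series)
mode_value, mode_count = frequencies.most_common(1)[0]

print(f"Median (even case): {median_even}")
print(f"Median (odd case): {median_odd}")
print(f"Mode: {mode_value} (occurs {mode_count} times)")
```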
31. RESEARCH DESIGN
Research question
Formulate hypothesis
Collection of data
Presentation of data
Analysis of data
Interpretation of data
A research design is a framework that has been created to find answers to research questions.
32. A pharmaceutical manufacturing company produces a specific dosage form of a drug in batches (lots) of
50,000 tablets.
To define one of the population parameters, we could weigh each of the 50,000 tablets.
1. Research question: Calculate the average weight for the entire lot.
This would give us the exact weight parameter for the total batch; however, it would be a very time-consuming process.
2. Research question: Use a hardness tester to measure the hardness of each tablet. We could then determine the
average hardness of the total batch.
In the process we would destroy all 50,000 tablets. This is obviously not a good manufacturing procedure.
Parameter
33. We draw a random sample from a given population, perform a statistical analysis of this information, and
make a statement (inference) regarding the population.
It would be more practical to periodically withdraw 20 tablets during the manufacturing process and then
perform weight and hardness tests, assuming that these sample statistics are representative of the entire
population of 50,000 units.
Statistic
34. 2.0 MEASURES OF DISPERSION
Dispersion:
In statistics, dispersion is the extent to which a distribution is stretched or squeezed.
It is a way of describing how spread out a set of data is. When the dispersion is large, the values in the set are widely scattered; when it is small, the items in the set are tightly clustered.
Dispersion is also called variability, spread or scatter.
Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.
[Figure: wide spread vs. small spread]
35. Measures of Dispersion
Algebraic Measures
Absolute Measures: Range, Quartile deviation, Standard deviation, Mean deviation
Relative Measures: Coefficient of Range, Coefficient of Variation, Coefficient of Quartile Deviation, Coefficient of Mean Deviation
Graphical Measures
36. ABSOLUTE MEASURES
Range (R) - It is the difference between the largest observation and the smallest observation.
R = Largest observation - Smallest observation
Quartile deviation (QD) - It is one half of the difference between the third quartile and the first quartile.
QD = ½ (Q3 - Q1)
Standard deviation (SD) - It is a measure used to quantify the amount of variation or dispersion of
a set of data values from its mean.
$\text{SD} = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}}$
Mean deviation - In a statistical distribution, the average of the absolute values of the differences between
individual numbers and their mean.
$\text{Mean deviation from mean} = \dfrac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N}$
37. RELATIVE MEASURES
Coefficient of Range - This is a relative measure of dispersion and is based on the value of the range. It
is also called the range coefficient of dispersion. It is defined as
$\text{Coefficient of Range} = \dfrac{L - S}{L + S}$, where L is the largest and S the smallest observation.
Coefficient of Variation - It is defined as the ratio of the standard deviation to the mean.
$\text{Coefficient of Variation} = \dfrac{\text{SD}}{\text{Mean}} \times 100$
Coefficient of Quartile deviation - A relative measure of dispersion based on the quartile deviation is
called the coefficient of quartile deviation. It is defined as
$\text{Coefficient of Quartile deviation} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
38. Coefficient of Mean deviation (about mean) - It is defined as the ratio of the mean deviation to the
average used in the calculation of the mean deviation.
Coefficient of Mean deviation (about mean) = Mean deviation from mean / Mean
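The absolute and relative measures of dispersion can be computed side by side. The sketch below is illustrative only: the data set is hypothetical (it does not come from the slides), and the quartiles use the default "exclusive" convention of Python's `statistics.quantiles`, which is one of several conventions in use and is an assumption here.

```python
from statistics import mean, quantiles, stdev

# Hypothetical data, used only to illustrate the formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]

largest, smallest = max(data), min(data)
average = mean(data)

# Absolute measures
data_range = largest - smallest
q1, _, q3 = quantiles(data, n=4)           # quartiles, "exclusive" method (assumed convention)
quartile_deviation = (q3 - q1) / 2
sd = stdev(data)                            # sample SD, divisor N - 1
mean_deviation = sum(abs(x - average) for x in data) / len(data)

# Relative measures
coefficient_of_range = (largest - smallest) / (largest + smallest)
coefficient_of_variation = sd / average * 100
coefficient_of_quartile_deviation = (q3 - q1) / (q3 + q1)
coefficient_of_mean_deviation = mean_deviation / average

print(f"Range = {data_range}, QD = {quartile_deviation}, SD = {sd:.3f}, MD = {mean_deviation}")
print(f"Coefficient of range = {coefficient_of_range:.3f}")
print(f"Coefficient of variation = {coefficient_of_variation:.2f}%")
print(f"Coefficient of quartile deviation = {coefficient_of_quartile_deviation:.3f}")
print(f"Coefficient of mean deviation = {coefficient_of_mean_deviation:.2f}")
```

Relative measures are unit-free, which makes them suitable for comparing the spread of data sets measured in different units or on different scales.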
39. EXAMPLE BASED ON MEASURES OF CENTRAL TENDENCY &
MEASURES OF DISPERSION
<< XYZ >> Tablets
Sr. No Assay
1 98.3
2 99
3 100.2
4 98.1
5 98.6
6 101.2
7 96.6
8 98.2
9 97.1
10 96.4
11 96.8
12 98.5
13 98.1
14 97
15 100
Specification Limit 95% - 105%
Calculate mean, standard deviation & coefficient of
variation for assay of << XYZ >> tablets.
Mean 98.273
SD 1.4013
CV (RSD) 1.4259
Assay data arranged in ascending order:
96.4 96.6 96.8 97 97.1 98.1 98.1 98.2 98.3 98.5 98.6 99 100 100.2 101.2
Median 98.2 (the 8th of the 15 ordered values)
Mode 98.1 (the only value that occurs twice)
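The summary statistics for the assay example can be reproduced with the standard library. This is a sketch for checking the slide's figures, using the 15 assay values listed on the slide; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode, stdev

# Assay values (%) of the 15 tablets, in the order given in the table.
assay = [98.3, 99, 100.2, 98.1, 98.6, 101.2, 96.6, 98.2,
         97.1, 96.4, 96.8, 98.5, 98.1, 97, 100]

assay_mean = mean(assay)
assay_sd = stdev(assay)                 # sample SD, divisor N - 1
assay_cv = assay_sd / assay_mean * 100  # coefficient of variation (RSD), in percent
assay_median = median(assay)
assay_mode = mode(assay)

# Check every value against the specification limit of 95% - 105%.
within_specification = all(95 <= value <= 105 for value in assay)

print(f"Mean = {assay_mean:.3f}")
print(f"SD = {assay_sd:.4f}")
print(f"CV (RSD) = {assay_cv:.4f}")
print(f"Median = {assay_median}, Mode = {assay_mode}")
print(f"All values within specification: {within_specification}")
```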
41. DATA PRESENTATION
Data can be communicated in one of four different methods:
1) verbal; 2) written descriptions; 3) tables; or 4) graphic presentations.
Tabulation of Data:
For example, working in a quality control laboratory, we are requested to sample 30 tetracycline capsules
during a production run and to report the results of this sample to the supervisor. The table represents the
assay results for the random sample of 30 capsules. We could arrange the results of the 30 samples in order
from the smallest to the largest.
42. Capsule No. | Assay (mg) | Capsule No. | Assay (mg)
1 251 16 252
2 250 17 251
3 253 18 249
4 249 19 246
5 250 20 250
6 252 21 250
7 247 22 254
8 248 23 248
9 254 24 252
10 245 25 251
11 250 26 248
12 253 27 250
13 251 28 247
14 250 29 251
15 249 30 249
Capsule No. | Assay (mg) | Capsule No. | Assay (mg)
1 245 16 250
2 246 17 250
3 247 18 250
4 247 19 251
5 248 20 251
6 248 21 251
7 248 22 251
8 249 23 251
9 249 24 252
10 249 25 252
11 249 26 252
12 250 27 253
13 250 28 253
14 250 29 254
15 250 30 254
From the data we can observe that
1) most of the observations cluster near the middle of the distribution (i.e., 250 mg) and
2) the spread of outcomes varies from as small as 245 mg to as large as 254 mg.
First table: Assay of 30 Tetracycline Capsules. Second table: Ascending order of arrangement.
44. Visual display of data:
Bar graphs (block diagrams) are appropriate for visualizing the frequencies associated with different levels of a
discrete variable.
In preparing bar graphs, the horizontal plane (x-axis or abscissa) usually represents observed values or the
discrete levels of the variable (in this case <250, =250 or >250 mg.). The vertical axis (y-axis or ordinate)
represents the frequency or proportion of observations.
Assay result | < 250 mg | = 250 mg | > 250 mg
f | 11 | 7 | 12
% | 36.7% | 23.3% | 40%
[Bar diagrams: frequency (f) and percentage (%) of assay results for < 250 mg, = 250 mg and > 250 mg]
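The frequencies behind the bar diagram can be derived directly from the 30 capsule assays. The following sketch is an illustration, not the method used to prepare the original slides; it prints a simple text bar chart, and the function name `category` is chosen here.

```python
from collections import Counter

# Assay results (mg) of the 30 tetracycline capsules, in sampling order.
assay_mg = [251, 250, 253, 249, 250, 252, 247, 248, 254, 245,
            250, 253, 251, 250, 249, 252, 251, 249, 246, 250,
            250, 254, 248, 252, 251, 248, 250, 247, 251, 249]


def category(value_mg):
    """Assign an assay result to one of the three levels used in the bar graph."""
    if value_mg < 250:
        return "< 250 mg"
    if value_mg == 250:
        return "= 250 mg"
    return "> 250 mg"


counts = Counter(category(value) for value in assay_mg)

# One row per level: a bar of '#' characters, the frequency and the percentage.
for label in ("< 250 mg", "= 250 mg", "> 250 mg"):
    frequency = counts[label]
    percentage = 100 * frequency / len(assay_mg)
    print(f"{label} | {'#' * frequency} {frequency} ({percentage:.1f}%)")
```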
45. Dot plot:
Examine the location or central tendency, shape and spread of your data by plotting all of the data
points along a number line.
46. Histogram:
Examine the shape, central tendency and spread of your data by using bars to show the
frequency of data within each interval.
47. Box plot (or Box-and-whisker plot)
The box plot (or box-and-whisker plot) is a very useful way to display data.
A box plot displays the minimum, the maximum, the lower and upper quartiles (the 25th percentile and the
75th percentile, respectively), and the median (the 50th percentile) on a rectangular box aligned either
horizontally or vertically.
48. Pie charts:
Pie charts provide a method for viewing and comparing levels of a discrete variable in relationship to that
variable as a whole. Whenever a data set can be divided into parts, a pie chart may provide the
most convenient and effective method for presenting the data.
[Pie diagram: < 250 mg: 11 (37%); = 250 mg: 7 (23%); > 250 mg: 12 (40%)]
49. A percentile is a measure used in statistics indicating the value below which a given percentage of
observations in a group of observations falls. (i.e., The value below which a percentage of data falls)
For example, the 20th percentile is the value below which 20% of the observations may be found.
What does it mean to score in the 99th percentile?
Scoring in the 99th percentile means that about 99 percent of the people who appeared for the test
scored less than you, so you are in the top one percent of students.
What does "percentile" mean in a growth chart?
This is easiest to explain by example. If your 18-month-old son is in the 40th percentile for weight, that
means 40 percent of 18-month-old boys weigh the same as or less than your child, and 60 percent weigh
more.
50. Conditions:
1. If the number of observations (N) is even, then the Xth percentile is
Qx = size of the [X(N+1)/100]th observation
2. If the number of observations (N) is odd, then the Xth percentile is
Qx = size of the [XN/100]th observation
Example:
51. You are the fourth tallest person in a group of 20. Which observation is the
80th percentile?
20 is an even number, so
Q80 = size of the [80*(20+1)/100]th observation = 16.8th observation.
Q80 ≈ 17th observation
With heights arranged in ascending order, the 17th observation is the 4th tallest person.
So 80% of the people are shorter than you, and 20% of
the people are as tall as or taller than you.
1. If the number of observations is even: the 75th percentile of the PET score for 8 fresher PhD
candidates is 190.
8 is an even number, so
Q75 = size of the [75*(8+1)/100]th observation.
Q75 = 6.75th observation ≈ 7th observation
With scores arranged in ascending order, the 7th observation (190) is the second-highest score.
In percentile terms, 75% of candidates have a score less than or equal
to the 7th observation, and 25% of candidates have a score
greater than the 7th observation.
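The positional rule used in these two examples can be written as a small function. This is a sketch of the rule as stated on the slides (other textbooks and software use different percentile conventions); the function name `percentile_position` and the choice of rounding to the nearest whole observation are assumptions made here to match the worked examples.

```python
import math


def percentile_position(x, n):
    """Position (1-based, in ascending order) of the x-th percentile among n observations.

    Follows the rule stated on the slides: x(n+1)/100 when n is even and
    xn/100 when n is odd, rounded to the nearest whole observation (assumed).
    """
    if not 0 < x < 100 or n < 1:
        raise ValueError("need 0 < x < 100 and n >= 1")
    raw_position = x * (n + 1) / 100 if n % 2 == 0 else x * n / 100
    # Round half up, and keep the result inside the valid range 1..n.
    return min(max(math.floor(raw_position + 0.5), 1), n)


height_position = percentile_position(80, 20)  # group of 20 people
score_position = percentile_position(75, 8)    # 8 PhD candidates

print(f"80th percentile of 20 observations: observation {height_position}")
print(f"75th percentile of 8 observations: observation {score_position}")
```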
52. 2. If series of observations are an even numbers
Students growth percentiles:
53. A total of 10,000 people visited the INOX shopping mall over 12 hours.
1. Estimate the 30th percentile (the time by which 30% of the visitors had arrived).
Q30 = [30*(10,000+1)/100]th observation
Q30 ≈ 3000th visitor.
At what time had 3000 visitors arrived?
For this we can use a line plot of the cumulative number of visitors against time.
From the plot we can observe that 30% (3000) of the visitors had arrived by about 6.5 hours.
2. Estimate what percentile of visitors had arrived after 11 hours.
About the 95th percentile, i.e. about 9500 visitors had arrived after 11 hours.
Time (hours) People
0 0
2 350
4 1100
6 2400
8 6500
10 8850
12 10000
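The readings taken from the line plot can be approximated by linear interpolation between the tabulated points. This is only a sketch under that assumption: a smooth curve drawn through the points, as on the slide, gives slightly different readings (about 6.5 hours and about the 95th percentile). The helper `interpolate` is a name introduced here for illustration.

```python
# Cumulative number of visitors at each time point, taken from the table.
hours = [0, 2, 4, 6, 8, 10, 12]
people = [0, 350, 1100, 2400, 6500, 8850, 10000]


def interpolate(x, xs, ys):
    """Linearly interpolate y at x from increasing points (xs, ys)."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the tabulated range")


# 1. Time by which 3000 visitors (30% of 10,000) had arrived.
time_for_3000 = interpolate(3000, people, hours)

# 2. Number and percentage of visitors who had arrived after 11 hours.
arrived_by_11 = interpolate(11, hours, people)
percent_by_11 = 100 * arrived_by_11 / people[-1]

print(f"3000 visitors had arrived after about {time_for_3000:.1f} hours")
print(f"After 11 hours about {arrived_by_11:.0f} visitors ({percent_by_11:.1f}%) had arrived")
```

Straight-line interpolation gives about 6.3 hours and about 94%, close to the values read from the plot.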
55. “STATISTICAL THINKING WILL ONE DAY BE AS NECESSARY FOR EFFICIENT
CITIZENSHIP AS THE ABILITY TO READ AND WRITE.”
56. Identify discrete or continuous variables
Discrete variables:
Dissolution - pass or fail criteria
Friability - pass or fail criteria
Impurities - present or absent
Change in manufacturing process old process vs. new
Immediate release or sustained release
Formulation A, B or C
Continuous variables:
Amount of active ingredient (content uniformity)
Disintegration rate
Hardness
Size - thickness/diameter
Tablet weight
57. Identify independent and dependent variables, and
determine if these variables are discrete or continuous
Samples were taken from a specific batch of drug and randomly divided into two groups of tablets. One
group was assayed by the manufacturer's own quality control laboratories. The second group of tablets
was sent to a contract laboratory for identical analysis.
Percentage of Labeled Amount of Drug
Manufacturer | Contract Lab
101.1  98.8 | 97.5  99.1
100.6  99.0 | 101.1  98.7
100.8  98.7 | 97.8  99.5
Independent variable:
Laboratory
(manufacturer vs. contract lab)
Discrete
Dependent variable:
Assay results
(% labeled amount of drug)
Continuous
Editor's Notes
The age of your car. (Quantitative.)
The number of hairs on your knuckle. (Quantitative.)
The softness of a cat. (Qualitative.)
The color of the sky. (Qualitative.)
The number of pennies in your pocket. (Quantitative.)
I do not agree with the phrase "accurate predictions": in inferential statistics we make assumptions about the data and draw inferences from a sample, so there is a chance of inaccuracy.
Skew distribution – Used geometric mean
Movie, GYM , Playing cricket
Skew distribution (PSD-GM used)
Bike
Hardness e.g. Tablet weight
Mean, SD, RSD: you must know these.
What do you think which is best measure of central tendency?