What is a variable?

In statistics, a variable has two defining characteristics:


A variable is an attribute that describes a person, place, thing, or idea.


The value of the variable can "vary" from one entity to another.

For example, a person's hair color is a potential variable, which could
have the value of "blond" for one person and "brunette" for another.
Qualitative vs. Quantitative Variables


Variables can be classified
as qualitative (also called
categorical; e.g. age group, Likert
scale, race) or quantitative
(also called numeric).
Examples of types of data

                               Quantitative

Continuous                           Discrete

Blood pressure, height, weight,      Number of children, Number of
age                                  attacks of asthma per week

                               Categorical

Ordinal (Ordered categories)         Nominal (Unordered categories)

Grade of breast cancer               Sex (male/female)
Better, same, worse                  Alive or dead
Disagree, neutral, agree             Blood group O, A, B, AB
Graphical presentation of data

• is better understood and appreciated by humans.
• brings out the hidden patterns and trends of complex data
  sets.

Thus the reason for displaying data graphically is
twofold:
• Investigators can take a better look at the information
  collected and the distribution of the data.
• The information can be communicated to others quickly.

We shall discuss in detail some of the commonly used graphical
presentations.
Bar Charts
Bar charts are used for
qualitative variables.

Each category of the variable
studied is plotted as a bar
along the X-axis (horizontal),
and the height of each bar is
equal to the percentage or
frequency of that category,
plotted along the Y-axis (vertical).
Pie Chart

Another interesting
method of displaying
categorical (qualitative)
data is a pie diagram, also
called a circular diagram.

The angle of each sector, in degrees, is X/100 × 360,
where X is the category's percentage of the total.
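The sector-angle rule X/100 × 360 can be sketched in a few lines of Python. The blood-group counts below are hypothetical and only illustrate the arithmetic; the function name `sector_angles` is likewise an assumption.

```python
def sector_angles(counts):
    """Return the pie-sector angle in degrees for each category."""
    total = sum(counts.values())
    angles = {}
    for category, count in counts.items():
        percentage = count / total * 100      # X in the slide's formula
        angles[category] = percentage / 100 * 360
    return angles

# Hypothetical counts of blood groups in a sample of 100 people
blood_groups = {"O": 40, "A": 30, "B": 20, "AB": 10}
angles = sector_angles(blood_groups)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that no category was left out.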
Pie Chart


A pie diagram works best when
the number of categories is
between 2 and 6.

If there are more than 6
categories, try to reduce
them by combining ("clubbing")
categories; otherwise the diagram
becomes overcrowded.
Stem-and-leaf plots
This presentation is used for quantitative
data.

To construct a stem-and-leaf plot, we divide each
value into a stem component and a leaf component.

For two-digit values, the digit in the tens place becomes the
stem and the digit in the units place becomes
the leaf.

It is useful for quickly assessing whether
the data follow a "normal" distribution,
by seeing whether the stems and leaves
show a bell shape.

For example, consider a sample of 10 values of age
in years: 21, 42, 05, 11, 30, 50, 28, 27, 24, 52.
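As a minimal sketch, the stem-and-leaf construction for the ten ages above could be written as follows. It assumes two-digit values (tens digit as stem, units digit as leaf); the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]  # sample from the slide

def stem_and_leaf(values):
    """Group values by tens digit (stem); leaves are the sorted units digits."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    # Keep empty stems too, so the shape of the distribution stays visible
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For this sample the longest row is stem 2 (ages 21, 24, 27, 28), with shorter rows on either side.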
Histogram
A histogram is used for
quantitative continuous
data. On the X-axis
we plot class intervals of the
exclusive type (the upper limit of
one class is the lower limit of the next),
and on the Y-axis we plot the
frequencies.

Unlike a bar chart, a histogram
has no gaps between the bars,
because it represents quantitative
data measured on a continuous
scale.
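The frequencies behind a histogram can be sketched as below. The data are the ten ages from the stem-and-leaf slide; the class width of 10 years is an assumption, and each class is taken to include its lower limit and exclude its upper limit.

```python
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]  # sample from the stem-and-leaf slide
width = 10                                      # assumed class width in years

def class_frequencies(values, width):
    """Count values per class [lower, lower + width); upper limit excluded."""
    counts = {}
    for value in values:
        lower = (value // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

freq = class_frequencies(ages, width)
for lower, count in freq.items():
    print(f"{lower:>2}-{lower + width:<2} | {'#' * count} ({count})")
```

Because the classes share their boundaries, the bars drawn from these counts touch each other, which is the visual difference from a bar chart.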
Box-and-Whisker plot
A box-and-whisker plot conveys a large
amount of information to the
audience in a compact form.

A box-and-whisker plot can be useful
for handling many data values.

It allows people to explore data and
to draw informal conclusions when
two or more variables are present.

It shows only certain summary statistics rather
than all the data.
Box-and-Whisker plot
The box-and-whisker plot is a visual
representation of the five-number summary.

The five-number summary consists of the
median, the quartiles (lower quartile Q1 and upper
quartile Q3), and the smallest and greatest values
in the distribution.

[Figure: box-and-whisker plot labelled, from top to bottom, Maximum, Q3, Median, Q1, Minimum, with the range and the IQR marked]

Thus a box-and-whisker plot displays the

• center,
• spread,
• overall range of the distribution.
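A short sketch of the five-number summary, using the ten ages from the stem-and-leaf slide. Quartile conventions differ between textbooks and software; this example assumes the default "exclusive" method of Python's `statistics.quantiles`, so other tools may give slightly different Q1 and Q3.

```python
import statistics

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]  # sample from the stem-and-leaf slide

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method: 'exclusive'
    return min(values), q1, q2, q3, max(values)

summary = five_number_summary(ages)
labels = ["Minimum", "Q1", "Median", "Q3", "Maximum"]
for label, value in zip(labels, summary):
    print(f"{label}: {value}")
```

The box of the plot runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.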
Scatter Diagram
A scatter diagram gives a quick visual
display of the association between two
variables, both of which are measured on
a numerical continuous or numerical
discrete scale (i.e. both are quantitative).


The figure shows at a glance that weight
and age are associated: as age increases,
weight increases.



Be careful to plot the dependent
variable along the vertical (Y) axis and the
independent variable along the
horizontal (X) axis.
Scatter Diagram
 In this example weight is
 dependent on age (as age
 increases weight is likely to
 increase) but age is not dependent
 on weight (if weight increases, age
 will not necessarily increase).


 Thus, weight is the dependent
 variable, and has been plotted on Y
 axis while age is the independent
 variable, plotted along X axis.
Correlation coefficient

The degree of association is measured by
a correlation coefficient, denoted by r.

It is sometimes called Pearson's
correlation coefficient after its originator
and is a measure of linear association.
Correlation coefficient
The correlation coefficient is measured on a
scale that varies from + 1 through 0 to - 1.

Complete correlation between two variables is
expressed by either + 1 or -1.
• When one variable increases as the other increases, the
  correlation is positive (coffee vs. wakefulness).
• When one decreases as the other increases, it is negative
  (Old is gold!).
• Complete absence of correlation is represented by 0.
A perfect correlation of ± 1
occurs only when the data
points all lie exactly on a
straight line.


A correlation whose absolute value
is greater than 0.8 would be
described as strong, whereas one
whose absolute value is less than
0.5 would be described as weak.
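As a rough sketch, Pearson's r can be computed directly from its definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The age and weight figures are hypothetical, chosen so that the first pair lies exactly on a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

age = [1, 2, 3, 4, 5]                  # hypothetical ages in years
weight_linear = [10, 12, 14, 16, 18]   # hypothetical, exactly on a line
weight_noisy = [10, 12, 15, 16, 18]    # hypothetical, close to a line

r_perfect = pearson_r(age, weight_linear)         # points on a rising line
r_negative = pearson_r(age, weight_linear[::-1])  # points on a falling line
r_noisy = pearson_r(age, weight_noisy)            # strong but not perfect
print(r_perfect, r_negative, round(r_noisy, 3))
```

Only the exactly linear data reach +1 or -1; the slightly scattered data give a value just below 1.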
Correlation coefficient v/s Regression analysis

When the objective is to determine association or the
strength of relationship between two such variables, we use
the correlation coefficient (r).

If the objective is to quantify and describe the existing
relationship with a view to prediction, we use
regression analysis.

Regression is used extensively in making predictions based on
finding unknown Y values from known X values.

Multiple regression is the same as regression except
that it attempts to predict Y from two or more
independent X variables.
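To illustrate finding an unknown Y from a known X, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The slides do not say which fitting method they use, so least squares is an assumption, and the age and weight data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

age = [1, 2, 3, 4, 5]           # hypothetical ages in years (independent X)
weight = [10, 12, 15, 16, 18]   # hypothetical weights in kg (dependent Y)

slope, intercept = fit_line(age, weight)
predicted = intercept + slope * 6   # predicted weight at age 6
print(f"weight = {intercept:.1f} + {slope:.1f} * age; at age 6: {predicted:.1f} kg")
```

The correlation coefficient says how strong the linear association is; the fitted line is what allows the prediction.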
Summarising the Data: Measures of Central Tendency and Variability
Measures of Central Tendency

These describe the centre of the data set, i.e. where the observations are
concentrated. There are numerous measures of central tendency: mean,
median, mode, geometric mean and harmonic mean.

Mean (Arithmetic Mean) or Average

This is the most appropriate measure for
data following a normal distribution. It
is calculated by summing all the
observations and then dividing by the
number of observations. It is
generally denoted by x̄.

It is calculated as follows: x̄ = (sum of all observations) / (number of observations).
Mean (Arithmetic Mean) or Average
It is the simplest of the centrality measures but is
influenced by extreme values and hence at times may
give fallacious results.

It depends on all values of the data set but is affected
by the fluctuations of sampling.
Example: The serum cholesterol levels (mg/dl) of 10 subjects
were found to be as follows:

192 242 203 212 175 284 256 218 182 228
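The mean of these ten cholesterol values can be worked out as a short sketch, following the rule of summing all observations and dividing by their number.

```python
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide

total = sum(cholesterol)        # sum of all observations
n = len(cholesterol)            # number of observations
mean = total / n

print(f"sum = {total}, n = {n}, mean = {mean} mg/dl")  # sum = 2192, n = 10, mean = 219.2 mg/dl
```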
Median

When the data are skewed, another measure of central tendency called the
median is used.


The median is a positional measure: it is the middlemost observation
after all the values are arranged in ascending or descending order.


When there is an odd number of observations, there is a single
middle value, which is the median.

When there is an even number of observations, there are two
middle values, and the median is calculated by taking the mean of
these two middle observations.


It is less affected by fluctuations of sampling than the mean.
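The odd and even cases can be sketched as follows, using the serum cholesterol values from the earlier example. The nine-value list is simply the same data with the last value dropped, to show the odd case.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide

median_even = median(cholesterol)       # 10 values: mean of 212 and 218
median_odd = median(cholesterol[:-1])   # 9 values: the 5th value after sorting
print(median_even, median_odd)          # 215.0 212
```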
Mode
The mode is the most common value, i.e. the value that
repeats itself most often in the data set.

Though the mode is easy to calculate, at times it may be
impossible to calculate the mode if no value repeats
itself in the data set.

At the other extreme, it may so happen that we come
across two or more values repeating themselves the same
number of times. In such cases the distribution is said to be
bimodal or multimodal.
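The three situations described (a single mode, no mode, several modes) can be sketched with a frequency count. The helper name `modes` and the small data sets are hypothetical, and the function assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

single = modes([1, 2, 2, 3])      # one value repeats most often
none = modes([1, 2, 3])           # no value repeats
bimodal = modes([1, 1, 2, 2, 3])  # two values tie for most frequent
print(single, none, bimodal)      # [2] [] [1, 2]
```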
Measures of Relative Position (Quantiles)
Quantiles are the values that divide a set of numerical data, arranged in
increasing order, into a number of equal parts.

Quartiles divide the numerical data arranged in increasing order into four
equal parts of 25% each.
 • Thus there are 3 quartiles: Q1, Q2 and Q3.

Deciles are the values that divide the arranged data into ten equal parts of 10%
each.
 • Thus there are 9 deciles.

Percentiles are the values that divide the arranged data into a hundred equal
parts of 1% each.
 • Thus there are 99 percentiles.
 • Q) Median = ___ percentile, ____ decile and ____ quartile.
Answer


The 50th percentile, 5th decile and 2nd quartile
are all equal to the median.
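This equality can be checked with a short sketch on the serum cholesterol values from the earlier example. It relies on Python's `statistics.quantiles`, which returns the n - 1 cut points for n equal parts; its default "exclusive" method is one of several conventions.

```python
import statistics

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide

median = statistics.median(cholesterol)
q2 = statistics.quantiles(cholesterol, n=4)[1]      # 2nd of 3 quartiles
d5 = statistics.quantiles(cholesterol, n=10)[4]     # 5th of 9 deciles
p50 = statistics.quantiles(cholesterol, n=100)[49]  # 50th of 99 percentiles

print(median, q2, d5, p50)  # all four are 215.0
```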
Measures of Variability

In contrast to measures of central
tendency, which describe the
center of the data set, measures of
variability describe the spread
of the observations
around the center of the data.
Measures of Variability

Various measures of dispersion
are as follows.
• Range
• Interquartile range
• Mean deviation
• Standard deviation
• Coefficient of variation
Range
One of the simplest measures of variability is the range. The range is
the difference between the two extremes, i.e. the difference
between the maximum and minimum observations.

Range = maximum observation – minimum observation

It gives a rough idea of the dispersion of the data.

The drawback of the range is that it uses only the extreme
observations and ignores the rest.
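As a quick sketch, the range of the serum cholesterol values from the earlier example is:

```python
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide

maximum = max(cholesterol)
minimum = min(cholesterol)
data_range = maximum - minimum

print(f"range = {maximum} - {minimum} = {data_range} mg/dl")  # range = 284 - 175 = 109 mg/dl
```

Changing any of the eight values in between would leave this result untouched, which is the drawback noted above.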
Interquartile Range

Just as the range is the difference between the
extreme observations, the interquartile
range is the difference between the
two extreme quartiles.



Interquartile range = Q3 - Q1
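A minimal sketch of Q3 - Q1 for the serum cholesterol values from the earlier example. Quartile conventions vary; this assumes the default "exclusive" method of Python's `statistics.quantiles`, so hand calculations with another rule may differ slightly.

```python
import statistics

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide

q1, q2, q3 = statistics.quantiles(cholesterol, n=4)  # default method: 'exclusive'
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr} mg/dl")  # Q1 = 189.5, Q3 = 245.5, IQR = 56.0 mg/dl
```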
Standard Deviation
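Only the title of this slide survives, so the following is a sketch of the usual sample standard deviation rather than a reconstruction of the slide. It assumes the n - 1 divisor (the slide may have used n) and uses the serum cholesterol values from the earlier example.

```python
import math

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide

n = len(cholesterol)
mean = sum(cholesterol) / n
squared_deviations = [(x - mean) ** 2 for x in cholesterol]
variance = sum(squared_deviations) / (n - 1)  # assumed sample (n - 1) divisor
sd = math.sqrt(variance)

print(f"mean = {mean:.1f}, SD = {sd:.1f} mg/dl")
```

Unlike the range, every observation contributes to this measure through its squared deviation from the mean.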
Coefficient of Variation
Besides the measures of variability discussed above, we have one
more important measure called the coefficient of variation,
which compares the variability in two data sets.

• It measures variability in relation to
  the mean (or average) and is used
  to compare the relative dispersion
  in one type of data with the
  relative dispersion in another type
  of data.
• The data to be compared may be
  in the same units, in different
  units, with the same mean, or with
  different means.
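The slides do not give a formula, so this sketch assumes the common definition CV = (standard deviation / mean) × 100, with the sample (n - 1) standard deviation. It compares two data sets in different units that both appear earlier in the deck: serum cholesterol (mg/dl) and age (years).

```python
import statistics

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl, from the slide
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]                    # years, from the slide

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_cholesterol = coefficient_of_variation(cholesterol)
cv_age = coefficient_of_variation(ages)
print(f"CV cholesterol = {cv_cholesterol:.1f}%, CV age = {cv_age:.1f}%")
```

Because the CV is a percentage with no units, the two results can be compared directly: in this sample, age varies much more relative to its mean than cholesterol does.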
Graphical presentation of data