
Central Tendency & Correlation Project - 1


It's for Business Statistics students.

Published in: Education


Final Term Project Report
On
Measurement of Central Tendency & Correlation and Regression

Submitted by:
Faizan Ch (MBA-P-F13-19)
Hina Shaheen (MBA-P-F13-10)
Faisal Saeed (MBA-P-F13-07)

Submitted to: Prof. Hafiz M. Javed Iqbal
Class: MBA-Professional, 3rd Semester
The Superior University Lahore (Okara Campus)
Executive Summary
This project describes the statistical technique of correlation. Correlation is used to find the linear relationship between two quantitative variables. A scatter plot shows whether the two variables are positively correlated, negatively correlated, or not correlated. Regression is a technique for predicting the value of one variable from the values of others; in simple regression, variable Y is predicted using variable X. The project also covers central tendency, a measure of the central or typical value in a data set, including the mean, median, mode, geometric mean and harmonic mean. The best method to measure central tendency is the mean, because it is quick and easy to calculate. As described in this project, the mean is used in a variety of ways and in almost every industry.
TABLE OF CONTENTS
Correlation: History; Define Correlation; Coefficient of Correlation; Types of Correlation; Uses of Correlation; Conclusion
Regression: History; Define Regression; Coefficient of Regression; Uses of Regression; Regression Line; Regression Equation; Uses of Regression Equation; Conclusion; MCQ
Average (Central Tendency): Introduction; Define Central Tendency; Measures of Central Tendency; Mean; Median; Mode; Harmonic Mean; Geometric Mean; Conclusion; Discuss Central Tendency Advantages; Discuss Central Tendency Disadvantages; MCQ
Q.No.1: Define correlation. How many types of correlation are there? Discuss its uses.

History:
In 1846, the French physicist Auguste Bravais (1811–1863) first developed what would become the correlation coefficient. After examining forearm and height measurements, Francis Galton independently rediscovered the concept of correlation in 1888 (Bulmer 2003, pp. 191–196). He also demonstrated its application in the study of heredity, anthropology and psychology. Galton's later statistical study of the probability of extinction of surnames led to the concept of Galton–Watson stochastic processes (Bulmer 2003, pp. 182–184). This is now at the core of modern statistics and regression. Galton invented the use of the regression line (Bulmer 2003, p. 184). He was also the first to describe and explain the common phenomenon of regression toward the mean, which he first observed in his experiments on the size of the seeds of successive generations of sweet peas. He is responsible for the choice of r (for reversion or regression) to represent the correlation coefficient. In the 1870s and 1880s he was a pioneer in the use of the normal distribution to fit histograms of actual tabulated data.

Define Correlation:
a) Correlation measures the relationship, interdependence or association between two variables: when the values of one variable change, the values of the other variable also change.
or
b) Correlation coefficients measure the strength of association between two variables. The most common correlation coefficient is the Pearson product-moment correlation coefficient. It measures the strength of the linear association between variables.

Correlation Coefficient:
The coefficient of correlation is a measure of the degree of interdependence between two variables. It is denoted by r. It is a pure number that varies between -1 and +1, with a central value of zero. When r = 0, there is no linear correlation between the two variables.
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Where:
• r is the correlation coefficient between X and Y.
• n is the size of the sample.
• X is the individual's score on the X variable.
• Y is the individual's score on the Y variable.
• XY is the product of each X score times its corresponding Y score.
• $X^2$ is the individual X score squared.
• $Y^2$ is the individual Y score squared.
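The computational formula above can be checked with a minimal Python sketch. It is not part of the original project: the function name `pearson_r` and the height/weight figures are hypothetical. The code mirrors the formula term by term (n, ΣX, ΣY, ΣXY, ΣX², ΣY²).

```python
import math


def pearson_r(x, y):
    """Pearson's r via the computational formula:
    r = (n*ΣXY - ΣX*ΣY) / sqrt([n*ΣX² - (ΣX)²] * [n*ΣY² - (ΣY)²])
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of at least two points")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the variables is constant, so r is undefined
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical heights (cm) and weights (kg) of five children
heights = [100, 110, 120, 130, 140]
weights = [16, 19, 22, 26, 30]
print(round(pearson_r(heights, weights), 3))        # 0.997 (strong positive)
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))        # 1.0  (perfect positive, Y = 2X + 1)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 4, 2]))  # 0.0  (curvilinear, yet r = 0)
```

The last line shows why r = 0 means no *linear* correlation. Y clearly depends on X there, but the relationship rises and then falls, so the linear measure is zero.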
Types of Correlation

Positive Correlation:
• When two variables move in sympathy with each other in the same direction, they are said to be positively correlated. Or:
• Positive correlation occurs when an increase in one variable is accompanied by an increase in the other.
Example: An increase in the height of children is usually accompanied by an increase in their weight.

Negative Correlation:
• When two variables move in sympathy with each other in opposite directions, they are said to be negatively correlated. Or:
• Negative correlation occurs when an increase in one variable is accompanied by a decrease in the other.
Example: A decrease in the supply of a commodity is accompanied by an increase in its price.

Zero Correlation:
• When an increase or decrease in the values of one variable has no effect on the values of the other variable, the correlation is zero.

Perfect Correlation:
• Perfect correlation exists only when the changes in one variable are precisely proportional to those in the other variable. It is indicated numerically as +1 or -1. Or:
• Perfect correlation occurs when there is a linear functional dependency between the variables.

Linear Correlation:
• Correlation is said to be linear if the amount of change in one variable bears a constant ratio to the amount of change in the other variable. For example, suppose every 10% increase in price is accompanied by a 15% decrease in demand. Then the two variables are linearly correlated.

Non-Linear (Curvilinear) Correlation:
• Correlation is said to be non-linear if the ratio of change is not constant. In economic data, the ratio of change between two variables is usually not constant.

Uses of Correlation
1. A Pearson's correlation is used when you want to find a linear relationship between two variables. It can be used with a causal as well as an associative research hypothesis. It cannot be used with an attributive research hypothesis, because an attributive hypothesis is univariate.
2. Pearson's correlation should be used only when there is a linear relationship between the variables. The relationship can be positive or negative, as long as it is significant. Correlation is used for testing in within-groups studies. One possible research hypothesis for this statistical model is a positive linear relationship between the variables; another is a negative linear relationship. If there is no linear relationship between the variables, we retain the null hypothesis.
3. Pearson's correlation should be used when there is a significant effect (p < .05), that is, when there is a relationship between two variables. The correlation can be positive or negative. It cannot be used when we retain the null hypothesis, because then there is no relationship. It can be used if the null hypothesis is rejected.
4. A Pearson's correlation is used when two quantitative variables are being tested in the research hypothesis (RH). It cannot test an attributive RH, but it can test associative and causal ones. An associative hypothesis can be tested with a correlation at any time. A causal RH can be tested with a correlation only when a well-run true experiment is being conducted.
5. You use a Pearson's correlation when you believe there is a linear relationship in your data. With a correlation, you can hypothesize three things:
   • a positive linear relationship (e.g., as __ increases, __ also increases);
   • a negative linear relationship (e.g., as __ increases, __ decreases);
   • no linear relationship, in which the variables are not related.
   You normally use a Pearson's correlation only if you believe there is a linear relationship.
6. Use a Pearson's correlation to determine whether a significant linear relationship exists in a bivariate association. The three possible research hypotheses for this model are a positive linear relationship, a negative linear relationship, and no linear relationship (H0). Correlations may be used to test two quantitative variables, to predict relationships in longitudinal studies, or to identify relationships between variables.
7. Use Pearson's correlation model when you are comparing two quantitative variables whose results can be seen using a linear model or graph. A possible research hypothesis for a Pearson's correlation is: "It is hypothesized that as the number of hours spent studying increases, final exam scores will increase." This example hypothesis is causal, so it would call for a true experiment analyzed with Pearson's correlation model.
8. You use Pearson's correlation when you want to predict and find a linear relationship between two quantitative variables. So if your research hypothesis contains two variables and you want to find the linear relationship (positive or negative) between them, you would use Pearson's correlation. For example, age and GPA are both quantitative variables I wish to measure. I hypothesize that age and GPA will have a positive linear correlation.
9. Pearson's correlation is used when there are two quantitative variables and the research hypothesis identifies a linear bivariate relationship. The possible research hypotheses are a positive linear relationship, a negative linear relationship, or no linear relationship (the null hypothesis, H0). Correlation can be used with a causal research hypothesis if the study is a true experiment. It can also be used with an associative hypothesis, because Pearson's r is bivariate. Correlation cannot be used with an attributive hypothesis.
10. Pearson's correlation is used when you are working with two quantitative variables in a population. The possible research hypotheses are a positive linear relationship, a negative linear relationship, or no linear relationship at all. Correlation cannot be used to test an attributive research hypothesis. If the study is a true experiment, it can be used to test a causal hypothesis. Correlation can always be used to test an associative hypothesis.
11. ANOVA is used with either a within-groups design (wg ANOVA) or a between-groups design (bg ANOVA) when looking for a relationship in the mean differences of variables. The possible research hypotheses are based on greater-than, less-than or equal-to comparisons. ANOVA can be used in all associative situations. It can also be used in causal situations if it is a true experiment with manipulation of the IV (independent variable).

Conclusion:
Correlation is a statistical technique used to find the relationship between two quantitative variables. We have used this technique to interpret the given data and determine what type of relationship the variables possess: positive correlation, negative correlation, or no correlation. The result is obtained using the formula for the coefficient of correlation.
Q.No.2: What is regression? Describe the regression coefficients of X on Y and Y on X. Also discuss regression equations and their uses.

History:
The earliest form of regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809. The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress down towards a normal average. For Galton, regression had only this biological meaning. His work was later extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian. R.A. Fisher weakened this assumption in his works of 1922 and 1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's formulation of 1821. In the 1950s and 1960s, economists used electromechanical desk calculators to calculate regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one regression.

Regression methods continue to be an area of active research. In recent decades, new methods have been developed for:
• robust regression;
• regression involving correlated responses such as time series and growth curves;
• regression in which the predictor or response variables are curves, images, graphs or other complex data objects;
• regression methods accommodating various types of missing data;
• nonparametric regression;
• Bayesian methods for regression;
• regression in which the predictor variables are measured with error;
• regression with more predictor variables than observations;
• causal inference with regression.

Define Regression:
1. In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.
2. A statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

Regression Coefficient:
The regression coefficient is the rate of change in the expected value of the dependent variable for a unit change in the independent variable. There are two coefficients: the regression coefficient of X on Y and the regression coefficient of Y on X. They are denoted by $b_{xy}$ and $b_{yx}$ respectively.
1. Regression Coefficient of X on Y:
The regression coefficient of X on Y gives the amount by which the X variable changes for a unit change in the value of the Y variable. It is given by:
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2}$$

2. Regression Coefficient of Y on X:
The regression coefficient of Y on X gives the amount by which the Y variable changes for a unit change in the value of the X variable. It is given by:
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

Uses of Regression:
For example, an analyst may use regression analysis to determine the actual relationship between a corporation's sales and its profits by looking at data from the past several years. The regression results show whether this relationship is valid. Other factors besides sales may also determine the corporation's profits, or it may turn out that sales do not explain profits at all. In particular, researchers, analysts, portfolio managers and traders can use regression analysis to estimate historical relationships among different financial assets. They can then use this information to develop trading strategies and measure the risk contained in a portfolio.

Regression Lines:
A regression line is a line drawn on a graph showing the values of one variable associated with the corresponding mean values of the other variable. Regression lines study the average relationship between two variables. Only a straight line is used as a regression line to describe the shape of the average relationship between two variables. The straight line can be expressed by the linear equation
$$Y = a + bX$$
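As a minimal sketch of the two coefficient formulas and of the line $Y = a + bX$, the Python below computes $b_{yx}$ and $b_{xy}$ from raw sums and then fits the line of Y on X. The function names and sample data are hypothetical. The intercept uses $a = \bar{Y} - b_{yx}\bar{X}$, which follows from the fact that the least-squares line of Y on X passes through the point of means.

```python
def regression_coefficients(x, y):
    """Return (b_yx, b_xy) using the computational formulas from raw sums."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of at least two points")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # change in Y per unit change in X
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # change in X per unit change in Y
    return b_yx, b_xy


def fit_y_on_x(x, y):
    """Return (a, b) for the regression line of Y on X, Y = a + bX."""
    b_yx, _ = regression_coefficients(x, y)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    a = mean_y - b_yx * mean_x  # the line passes through (mean_x, mean_y)
    return a, b_yx


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)
a, b = fit_y_on_x(x, y)
print(b_yx, b_xy)   # slopes of Y on X and of X on Y
print(a + b * 6)    # predicted Y when X = 6 (approximately 5.8)
```

For this data $b_{yx} = 0.6$ and $b_{xy} = 1.0$. Their product, 0.6, equals $r^2$. The line of X on Y would be fitted the same way with the roles of X and Y swapped.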
When there are two series, say X and Y, there are two regression lines.

Regression Equation:
Regression equations are algebraic expressions of the regression lines. Because there are two regression lines, there are two regression equations:
(1) the regression equation of X on Y;
(2) the regression equation of Y on X.

(1) Regression Equation of X on Y:
This equation shows how the X variable changes in response to changes in the Y variable.
$$X - \bar{X} = r\,\frac{\sigma_X}{\sigma_Y}\left(Y - \bar{Y}\right)$$

(2) Regression Equation of Y on X:
This equation shows how the Y variable changes in response to changes in the X variable.
$$Y - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\left(X - \bar{X}\right)$$

Uses of Regression Equation:
Regression equations are used for two purposes. First, they provide the values for the regression lines. Second, they provide a numerical method of finding the most suitable value of X for a given value of Y, and the most suitable value of Y for a given value of X. An estimated regression equation may be used for a wide variety of business applications, such as:
• Measuring the impact on a corporation's profits of an increase in sales
• Understanding how sensitive a corporation's sales are to changes in advertising expenditures
• Seeing how a stock price is affected by changes in interest rates
Regression analysis may also be used for forecasting. For example, a regression equation may be used to forecast the future demand for a company's products. Because regression calculations can be complex, they are often carried out with specialized calculators or spreadsheet programs.

Conclusion:
CORRELATION AND REGRESSION (50 MCQs)
No.1: A process by which we estimate the value of a dependent variable on the basis of one or more independent variables is called: (a) Correlation (b) Regression (c) Residual (d) Slope
No.2: A relationship where the flow of the data points is best represented by a curve is called: (a) Linear relationship (b) Nonlinear relationship (c) Linear positive (d) Linear negative
No.3: All data points falling along a straight line is called: (a) Linear relationship (b) Non-linear relationship (c) Residual (d) Scatter diagram
No.4: The value we would predict for the dependent variable when the independent variables are all equal to zero is called: (a) Slope (b) Sum of residuals (c) Intercept (d) Difficult to tell
No.5: The predicted rate of response of the dependent variable to changes in the independent variable is called: (a) Slope (b) Intercept (c) Error (d) Regression equation
No.6: The slope of the regression line of Y on X is also called the: (a) Correlation coefficient of X on Y (b) Correlation coefficient of Y on X (c) Regression coefficient of X on Y (d) Regression coefficient of Y on X
No.7: In a simple regression equation, the number of variables involved is: (a) 0 (b) 1 (c) 2 (d) 3
No.8: If the value of any regression coefficient is zero, then the two variables are: (a) Qualitative (b) Correlated (c) Dependent (d) Independent
No.9: If one regression coefficient is greater than one, then the other will be: (a) More than one (b) Equal to one (c) Less than one (d) Equal to minus one
No.10: Determining the height of a person when his weight is given is a: (a) Correlation problem (b) Association problem (c) Regression problem (d) Qualitative problem
No.11: The dependent variable is also called: (a) Regression (b) Regressand (c) Continuous variable (d) Independent
No.12: The dependent variable is also called: (a) Regressand variable (b) Predictand variable (c) Explained variable (d) All of these
No.13: The independent variable is also called: (a) Regressor (b) Regressand (c) Predictand (d) Estimated
No.14: In the regression equation Y = a + bX, Y is called: (a) Independent variable (b) Dependent variable (c) Continuous variable (d) None of the above
No.15: In the regression equation X = a + bY, X is called: (a) Independent variable (b) Dependent variable (c) Qualitative variable (d) None of the above
No.16: In the regression equation Y = a + bX, a is called: (a) X-intercept (b) Y-intercept (c) Dependent variable (d) None of the above
No.17: The regression line always passes through: (a) (X, Y) (b) (a, b) (c) $(\bar{X}, \bar{Y})$ (d) $(\bar{X}, Y)$
No.18: An upward straight line represents a relationship that is: (a) Linear (b) Non-linear (c) Curvilinear (d) No relation
No.19: A downward straight line represents a relationship that is: (a) Linear positive (b) Linear negative (c) Non-linear (d) Curvilinear
No.20: When $b_{xy}$ is positive, then $b_{yx}$ will be: (a) Negative (b) Positive (c) Zero (d) One
No.21: When the two regression coefficients bear the same algebraic sign, the correlation coefficient is: (a) Positive (b) Negative (c) According to the two signs (d) Zero
No.22: It is possible that the two regression coefficients have: (a) Opposite signs (b) Same signs (c) No sign (d) Difficult to tell
No.23: A regression coefficient is independent of: (a) Units of measurement (b) Scale and origin (c) Both (a) and (b) (d) None of them
No.24: The sum of the differences between the actual values of Y and its values obtained from the fitted regression line is always: (a) Zero (b) Positive (c) Negative (d) Minimum
No.25: A measure of the strength of the linear relationship that exists between two variables is called: (a) Slope (b) Intercept (c) Correlation coefficient (d) Regression equation
No.26: When the ratio of variations in the related variables is constant, it is called: (a) Linear correlation (b) Nonlinear correlation (c) Positive correlation (d) Negative correlation
No.27: If both variables X and Y increase or decrease simultaneously, then the coefficient of correlation will be: (a) Positive (b) Negative (c) Zero (d) One
No.28: If the points on the scatter diagram indicate that as one variable increases the other variable tends to decrease, the value of r will be: (a) Perfect positive (b) Perfect negative (c) Negative (d) Zero
No.29: If the points on the scatter diagram show no tendency either to increase together or to decrease together, the value of r will be close to: (a) -1 (b) +1 (c) 0.5 (d) 0
No.30: If one item is fixed and unchangeable and the other item varies, the correlation coefficient will be: (a) Positive (b) Negative (c) Zero (d) Undecided
No.31: A regression model may be: (a) Linear (b) Non-linear (c) Both (a) and (b) (d) Neither (a) nor (b)
No.32: If $b_{xy} = 0.20$ and $r_{xy} = 0.50$, then $b_{yx}$ is equal to: (a) 0.20 (b) 0.25 (c) 0.50 (d) 1.25
No.33: If $r_{xy} = 0$, then: (a) $b_{yx} = 0$ (b) $b_{xy} = 0$ (c) Both (a) and (b) (d) $b_{yx} \neq b_{xy}$
No.34: When $r_{xy} > 0$, then $b_{yx}$ and $b_{xy}$ are both: (a) 0 (b) < 0 (c) > 0 (d) < 1
No.35: When $r_{xy} < 0$, then $b_{yx}$ and $b_{xy}$ will be: (a) Zero (b) Not equal to zero (c) Less than zero (d) Greater than zero
No.36: If $r_{xy} = 0.75$, then $r_{yx}$ will be: (a) 0.25 (b) 0.50 (c) 0.75 (d) -0.75
No.37: If the coefficient of correlation between the variables X and Y is r, the coefficient of correlation between $X^2$ and $Y^2$ is: (a) -1 (b) 1 (c) r (d) $r^2$
No.38: If the sum of the products of the deviations of X and Y from their means is zero, the correlation coefficient between X and Y is: (a) Zero (b) Maximum (c) Minimum (d) Undecided
No.39: If $r_{xy} = 1$, then: (a) $b_{yx} = b_{xy}$ (b) $b_{yx} > b_{xy}$ (c) $b_{yx} < b_{xy}$ (d) $b_{yx} \cdot b_{xy} = 1$
No.40: A perfect positive correlation is signified by: (a) 0 (b) -1 (c) +1 (d) -1 to +1
No.41: If the figure +1 signifies a perfect positive correlation and the figure -1 signifies a perfect negative correlation, then the figure 0 signifies: (a) A perfect correlation (b) Uncorrelated variables (c) Not significant (d) Weak correlation
No.42: If Y = -10X and X = -0.1Y, then r is equal to: (a) 0.1 (b) 1 (c) -1 (d) 10
No.43: If $b_{yx} = -0.8$ and $b_{xy} = -0.2$, then $r_{yx}$ is equal to: (a) -0.2 (b) -0.4 (c) 0.4 (d) -0.8
No.44: If $b_{yx} = 1.6$ and $b_{xy} = 0.4$, then $r_{xy}$ will be: (a) 0.4 (b) 0.64 (c) 0.8 (d) -0.8
No.45: If $b_{yx} = -2$ and $r_{xy} = -1$, then $b_{xy}$ is equal to: (a) -1 (b) -2 (c) 0.5 (d) -0.5
No.46: The measure of change in the dependent variable corresponding to a unit change in the independent variable is called: (a) Slope (b) Regression coefficient (c) Both (a) and (b) (d) Neither (a) nor (b)
No.47: $r_{xy}$ is equal to: (a) 0 (b) -1 (c) 1 (d) 0.5
No.48: The correlation coefficient between X and -X is: (a) 0 (b) 0.5 (c) 1 (d) -1
No.49: In the regression equation Y = a + bX, a and b are called: (a) Constants (b) Estimates (c) Parameters (d) Both (a) and (b)
No.50: In a correlation problem, both variables are: (a) Equal (b) Unknown (c) Fixed (d) Random
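Several of the numerical questions above (Nos. 32, 43, 44 and 45) rest on the identity $r^2 = b_{yx} \cdot b_{xy}$, with r taking the common sign of the two coefficients. The small Python sketch below works through that arithmetic. The helper names `r_from_coefficients` and `other_coefficient` are hypothetical, and the inputs are the values given in those questions.

```python
import math


def r_from_coefficients(b_yx, b_xy):
    """r = ±sqrt(b_yx * b_xy); both coefficients share the sign of r."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r


def other_coefficient(r, known_b):
    """Given r and one regression coefficient, return the other: r**2 / known_b."""
    return r ** 2 / known_b


print(other_coefficient(0.50, 0.20))    # No.32: b_yx from b_xy = 0.20 and r = 0.50
print(r_from_coefficients(-0.8, -0.2))  # No.43
print(r_from_coefficients(1.6, 0.4))    # No.44
print(other_coefficient(-1, -2))        # No.45: b_xy from b_yx = -2 and r = -1
```

Floating-point arithmetic may print values such as 0.8000000000000002 rather than a clean 0.8.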
Introduction:
In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. The term "central tendency" dates from the late 1920s. The most common measures of central tendency are the arithmetic mean, the median and the mode. A central tendency can be calculated either for a finite set of values or for a theoretical distribution, such as the normal distribution. Occasionally, authors use central tendency to denote "the tendency of quantitative data to cluster around some central value." The central tendency of a distribution is typically contrasted with its dispersion or variability; dispersion and central tendency are often characterized properties of distributions. Analysts may judge whether data has a strong or a weak central tendency based on its dispersion.

Definition:
a) A measure of central tendency (also referred to as a measure of centre or central location) is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution.
b) The term central tendency refers to the "middle" value, or perhaps a typical value, of the data, and is measured using the mean, median or mode.

Measures of Central Tendency:
There are five common measures of central tendency: (1) mean, (2) median, (3) mode, (4) harmonic mean and (5) geometric mean.

Mean:
The mean is one of the most commonly used measures for comparing data in statistics. It is defined as the sum of all the observations divided by the number of observations.
Applications of Mean:
• It helps teachers to see the average marks of their students.
• It is used in factories by the authorities to check whether the workers' benefits are being maintained.
• It is also used to compare the salaries of workers.
• It is used to calculate the average speed of anything.
• It is also used by the government to find the average income or expenses of people.
• Using it, a family can balance its expenses against its average income.
How to calculate?
Example: Find the mean of 4, 7, 2, 9, 3.
Solution: Mean = (4 + 7 + 2 + 9 + 3) / 5 = 25 / 5 = 5.

Median:
The median is defined as the middle value of a set of observations arranged in order of size. Its applications in daily life are as follows:
Applications of Median:
• It is used to measure the distribution of earnings.
• It is used to find the typical height of players, e.g. football players.
• It is used to find the middle age of the students in a class.
• It is also used to find the poverty line.
How to calculate?
Example: For the series 7, 2, 4, 9, 1, 6, 8, first arrange the series in ascending order: 1, 2, 4, 6, 7, 8, 9. The middle (4th) value is 6, so the median is 6.

Mode:
The mode is the value with the highest frequency in a data set.
Applications of Mode:
• It is used to study the influx of passengers on public transport.
• It is used for the number of games won by a team of players.
• It is used for the frequency of infants' needs.
• The mode is also used in calculating wages, in counting patients going to hospitals, in finding the most common mode of travel, etc.
How to calculate?
Example: For the data 1, 2, 3, 3, 3, 4, the mode is 3, because it occurs most often.
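The three worked examples above (mean of 4, 7, 2, 9, 3; median of 7, 2, 4, 9, 1, 6, 8; mode of 1, 2, 3, 3, 3, 4) can be reproduced with a minimal Python sketch. The hand-written functions follow the definitions in the text. The even-count rule for the median (average the two middle values) is standard but is an addition here. Python's standard `statistics` module is used only as a cross-check.

```python
import statistics


def mean(values):
    """Arithmetic mean: sum of the observations / number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value after sorting; for an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value; on a tie this sketch returns the value seen first."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return max(counts, key=counts.get)


print(mean([4, 7, 2, 9, 3]))          # 5.0
print(median([7, 2, 4, 9, 1, 6, 8]))  # 6
print(mode([1, 2, 3, 3, 3, 4]))       # 3

# Cross-check with the standard library
print(statistics.mean([4, 7, 2, 9, 3]),
      statistics.median([7, 2, 4, 9, 1, 6, 8]),
      statistics.mode([1, 2, 3, 3, 3, 4]))
```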
Harmonic Mean:
The harmonic mean is the quotient of the number of given values and the sum of the reciprocals of those values.
Applications of Harmonic Mean:
When prices are expressed as quantities (so many units per rupee), the harmonic mean should be calculated. If the same prices are expressed in money values (so many dollars per unit), the arithmetic mean gives the correct value of the average. This logic can be generalized to averaging ratios involving price, quantity, speed, time, distance, etc. If the given ratios are stated as x units per y, use the harmonic mean when the x's are given, and use the arithmetic mean when the y's are given. For example, to find the average speed in kilometers per hour, use the harmonic mean if kilometers (distance travelled) are given. When hours (time of journey) are given, use the arithmetic mean. (In this case, kilometers are x and hours are y.)
How to calculate?
Calculate the harmonic mean of the numbers 13.2, 14.2, 14.8, 15.2 and 16.1.
Solution:

| x | 1/x |
|---|---|
| 13.2 | 0.0758 |
| 14.2 | 0.0704 |
| 14.8 | 0.0676 |
| 15.2 | 0.0658 |
| 16.1 | 0.0621 |
| Total | Σ(1/x) = 0.3417 |

$$H.M. = \frac{n}{\sum \frac{1}{x}} = \frac{5}{0.3417} = 14.63$$
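A minimal Python sketch of the harmonic-mean formula, using the values from the worked table (13.2, 14.2, 14.8, 15.2, 16.1), follows. The function name is hypothetical. The sketch rejects zero and negative values, in line with the note that the harmonic mean cannot be calculated if any value is zero. The speed example uses hypothetical figures to illustrate the distance-given rule described above.

```python
import statistics


def harmonic_mean(values):
    """H.M. = n / Σ(1/x); this sketch accepts only positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch requires a non-empty list of positive values")
    return len(values) / sum(1 / v for v in values)


data = [13.2, 14.2, 14.8, 15.2, 16.1]
print(round(harmonic_mean(data), 2))             # 14.63
print(round(statistics.harmonic_mean(data), 2))  # standard-library cross-check

# Hypothetical trip: two equal distances, driven at 40 km/h and then at 60 km/h.
# The distances (kilometers) are given, so the harmonic mean of the speeds applies.
print(harmonic_mean([40, 60]))  # about 48 km/h, not the arithmetic 50
```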
Geometric Mean:
The geometric mean is a kind of mean or average that indicates the central tendency of a set of numbers. It is calculated by multiplying all the values in a data set and taking the $n^{th}$ root of the product. The geometric mean is well defined for sets of positive real numbers.
Applications of Geometric Mean:
(1) Social sciences: In the social sciences we frequently encounter it in a number of ways. For example, human population growth is expressed as a percentage. So when population growth needs to be averaged, the geometric mean is the most relevant.
(2) Surveys and studies: In surveys and studies, too, the geometric mean becomes relevant. For example, a survey may find that the economic status of a poor neighborhood has been improving over the years. The researchers should then quote the geometric mean of the improvement, averaged over the years in which the survey was conducted. The arithmetic mean would not make sense in this case.
(3) Economics: In economics, we see percentage growth in interest accumulation. If you start with a sum of money that is compounded for interest, the mean you should look for is the geometric mean. Many financial instruments, such as bonds, yield a fixed percentage return. When quoting their "average" return, it is the geometric mean that should be quoted.
How to calculate?
Find the geometric mean of the following values: 15, 12, 13, 19, 10.
Solution:

| x | log x |
|---|---|
| 15 | 1.1761 |
| 12 | 1.0792 |
| 13 | 1.1139 |
| 19 | 1.2788 |
| 10 | 1.0000 |
| Total | 5.6480 |
Here $\sum \log x = 5.648$ and $n = 5$, so
$$G.M. = \text{antilog}\left(\frac{\sum \log x}{n}\right) = \text{antilog}\left(\frac{5.648}{5}\right) = \text{antilog}\,(1.1296) = 13.48$$

Conclusion:
Even a simple idea like the average has many uses, and there are more uses we have not covered (center of gravity, weighted averages, expected value). The key points are:
• The "average item" can be seen as the item that could replace all the others.
• The type of average depends on how the existing items are used (added, multiplied, used as rates, or used as exclusive choices).
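Returning to the geometric-mean example worked above (15, 12, 13, 19, 10), this minimal Python sketch follows the same log-table method: average the base-10 logarithms, then take the antilog. It compares the result with the direct $n^{th}$ root of the product. The function name is hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean_via_logs(values):
    """G.M. = antilog(Σ log10(x) / n), mirroring the log-table method."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty list of positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log  # antilog, base 10


data = [15, 12, 13, 19, 10]
print(round(geometric_mean_via_logs(data), 2))       # 13.48
print(round(math.prod(data) ** (1 / len(data)), 2))  # nth root of the product: 13.48
```

Both routes agree. Working with logarithms was the practical choice before calculators, because it turns the product of many values into a sum.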
Q.No.2: What are the advantages and disadvantages of the measures of central tendency?

Arithmetic Mean:
Advantages:
• It is rigidly defined by a mathematical formula, so different persons would calculate the same value of the mean from the given data.
• It is based on all the observations of a series and is least affected by fluctuations of sampling.
• It is easy to calculate and easy to understand.
• It is determinate; it is not indefinite.
• It can be used for further analysis and treatment.
• It provides a good standard of comparison.
• It is the best known of the averages.
Disadvantages:
• It may give undue weight to extreme values; that is, a few very large (or very small) items can pull it away from the bulk of the data.
• It cannot be located by inspection, whereas the mode and median can be.
• It cannot be computed accurately in the case of open-ended distributions.
• It may not lie in the middle of the series if the series is skewed.
• It may sometimes give fallacious conclusions. For example, two series that are entirely different from each other may give the same average.

Median:
Advantages:
• It is easy and quick to calculate.
• It is easily located in individual and discrete series.
• It is not affected by the values of extreme items.
• It can be found even for distributions with open classes at either end.
• It is suitable for skewed distributions.
Disadvantages:
• It is not as familiar an average as the arithmetic mean.
• It cannot be used for further mathematical processing.
• The median cannot be calculated unless the values are arranged according to size.
• It is not based on all the observations.
  24. Mode:
Advantages:
• It is easy and quick to calculate.
• It is easy to understand.
• Extreme values do not affect its value.
• It can be determined for open-end distributions.
• It can be found at once by inspection from ungrouped data.
• It is very useful for meteorological forecasts.
Disadvantages:
• It is ill-defined.
• It is not based on all the observations in the data set.
• It cannot be used for further mathematical processing.
• There may be more than one mode in a data set.
• There is no mode if no value occurs more often than the others.
Harmonic Mean:
Advantages:
• It is based on all the values.
• It is defined by a mathematical formula.
• It strikes a balance by giving more importance to small values and less importance to big values.
• It is capable of further algebraic treatment.
• It is used for averaging rates and times.
• It is more suitable where there is much skewness or dissimilarity in the series.
Disadvantages:
• It is difficult to understand and calculate.
• It cannot be calculated if any one of the values is zero.
• It gives too much importance to smaller values.
Geometric Mean:
Advantages:
• It is based on all the values.
• It gives more importance to small values and less importance to big values, and thus maintains a balance.
• It is defined by a mathematical formula.
• It is a suitable average for percentages and ratios.
• It is capable of further algebraic treatment.
• It is less affected by extreme values than the arithmetic mean.
  25. Disadvantages:
• The geometric mean is not easy to calculate.
• It becomes zero if any observation is zero.
• It cannot be calculated if any observation is negative.
• It is not widely known.
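The limitations listed for the mode, harmonic mean and geometric mean can be seen in a small Python sketch. `statistics.multimode` is from the standard library. `harmonic_mean_textbook` and `geometric_mean_textbook` are hypothetical helpers that follow the textbook definitions (H.M. = n / Σ(1/X), G.M. = n-th root of the product), not the conventions of any particular library. The data are invented.

```python
import math
from statistics import multimode

# Mode: a data set can have several modes ...
print(multimode([1, 2, 2, 3, 3]))  # [2, 3]
# ... or none in the textbook sense, when every value occurs equally often
print(multimode([1, 2, 3, 4]))     # [1, 2, 3, 4]

def harmonic_mean_textbook(values):
    """H.M. = n / sum(1/X); undefined when any value is zero."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / v for v in values)

def geometric_mean_textbook(values):
    """G.M. = (X1 * X2 * ... * Xn) ** (1/n); not computed for negative values."""
    if any(v < 0 for v in values):
        raise ValueError("geometric mean is not computed for negative values")
    return math.prod(values) ** (1 / len(values))

# A single zero observation makes the geometric mean zero
print(geometric_mean_textbook([4, 0, 9]))  # 0.0

# A zero blocks the harmonic mean; a negative value blocks the geometric mean
for func in (harmonic_mean_textbook, geometric_mean_textbook):
    try:
        func([4, 0, -9])
    except ValueError as err:
        print(func.__name__, "->", err)
```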
  26. MEASURES OF CENTRAL TENDENCY (50 MCQs)
No.1: Any measure indicating the center of a set of data, arranged in increasing or decreasing order of magnitude, is called a measure of:
(a) Skewness (b) Symmetry (c) Central tendency (d) Dispersion
No.2: Of the following, the measure of central tendency is:
(a) The raw score (b) The mean (c) The range (d) S.D.
No.3: The total of all the observations divided by the number of observations is called the:
(a) Arithmetic mean (b) Geometric mean (c) Median (d) Harmonic mean
No.4: When computing the arithmetic mean of a frequency distribution, each value in a class is taken to be equal to the:
(a) Class mark (b) Lower limit (c) Upper limit (d) Lower class boundary
No.5: The sample mean is a:
(a) Parameter (b) Statistic (c) Variable (d) Constant
No.6: The population mean μ is a:
(a) Discrete variable (b) Continuous variable (c) Parameter (d) Sampling unit
No.7: The arithmetic mean is highly affected by:
(a) Moderate values (b) Extremely small values (c) Odd values (d) Extremely large values
No.8: If a constant value is added to every observation, the new arithmetic mean is obtained by:
(a) Subtracting the constant (b) Adding the constant (c) Multiplying by the constant (d) Dividing by the constant
  27. No.9: If the arithmetic mean of 20 values is 10, then the sum of these 20 values is:
(a) 10 (b) 20 (c) 200 (d) 20 + 10
No.10: Ten families have an average of 2 boys. How many boys do they have together?
(a) 2 (b) 10 (c) 12 (d) 20
No.11: Given X1 = 20 and X2 = -20, the arithmetic mean will be:
(a) Zero (b) Infinity (c) Impossible (d) Difficult to tell
No.12: The mean of 10 observations is 10. All the observations are increased by 10%. The mean of the increased observations will be:
(a) 10 (b) 1.1 (c) 10.1 (d) 11
No.13: When the values in a series are not of equal importance, we calculate the:
(a) Arithmetic mean (b) Geometric mean (c) Weighted mean (d) Mode
No.14: When all the values in a series occur an equal number of times, it is not possible to calculate the:
(a) Arithmetic mean (b) Geometric mean (c) Harmonic mean (d) Mode
No.15: The midpoint of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest, is called the:
(a) Mean (b) Median (c) Lower quartile (d) Upper quartile
No.16: The suitable average for qualitative data is the:
(a) Mean (b) Median (c) Mode (d) Geometric mean
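Python's `statistics.mean` can check the arithmetic behind No.9 to No.12 and the earlier question on adding a constant. The ten-observation series below is a hypothetical example chosen so that its mean is 10. It is not data from the report.

```python
import math
from statistics import mean

data = [6, 14, 8, 12, 10, 10, 9, 11, 7, 13]  # hypothetical, mean = 10
print(mean(data))  # 10

# Adding a constant to every observation adds that constant to the mean
shifted = [x + 5 for x in data]
print(mean(shifted))  # 15

# No.9: the sum of n values equals n times their mean
print(20 * 10)  # 200
# No.10: 10 families averaging 2 boys have 10 * 2 boys in total
print(10 * 2)  # 20
# No.11: mean of 20 and -20
print(mean([20, -20]))  # 0

# No.12: increasing every observation by 10% multiplies the mean by 1.1
increased = [x * 1.10 for x in data]
print(math.isclose(mean(increased), 11))  # True
```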
  28. No.17: We must arrange the data before calculating the:
(a) Mean (b) Median (c) Mode (d) Geometric mean
No.18: If the smallest observation in a data set is decreased, the average that is not affected is the:
(a) Mode (b) Median (c) Mean (d) Harmonic mean
No.19: The suitable average for the shoe sizes of children is the:
(a) Mean (b) Mode (c) Median (d) Geometric mean
No.20: The measurement that corresponds to the largest frequency in a set of data is called the:
(a) Mean (b) Median (c) Mode (d) Percentile
No.21: Which of the following averages cannot be calculated for the observations 2, 2, 4, 4, 6, 6, 8, 8?
(a) Mean (b) Median (c) Mode (d) All of the above
No.22: The mode of the series 0, 0, 0, 2, 2, 3, 3, 8, 10 is:
(a) 0 (b) 2 (c) 3 (d) No mode
No.23: A distribution with two modes is called:
(a) Unimodal (b) Bimodal (c) Multimodal (d) Normal
No.24: The best average for percentage rates and ratios is the:
(a) Arithmetic mean (b) Lower and upper quartiles (c) Geometric mean (d) Harmonic mean
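The standard-library `statistics.median` and `statistics.multimode` can check No.18, No.21 and No.22. When `multimode` returns every distinct value, all values tie and the series has no mode in the textbook sense. The replacement value -50 is a hypothetical choice.

```python
from statistics import median, multimode

# No.21: every value occurs twice, so no value is more frequent than the rest
q21 = [2, 2, 4, 4, 6, 6, 8, 8]
print(multimode(q21))  # [2, 4, 6, 8]

# No.22: 0 occurs three times, more than any other value
q22 = [0, 0, 0, 2, 2, 3, 3, 8, 10]
print(multimode(q22))  # [0]

# No.18: lowering the smallest observation does not move the median
lowered = [-50] + q22[1:]
print(median(q22), median(lowered))  # 2 2
```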
  29. No.25: The modal letter of the word “STATISTICS” is:
(a) S (b) T (c) Both S and I (d) Both S and T
No.26: In a moderately asymmetrical series, the arithmetic mean, median and mode are related as:
(a) Mean - Mode = 3(Mean - Median) (b) Mean - Median = 2(Median - Mode) (c) Median - Mode = (Mean - Median) / 2 (d) Mode - Median = 2Mean - 2Median
No.27: In a moderately skewed distribution, the mean is equal to:
(a) (3Median - Mode) / 2 (b) (2Mean + Mode) / 3 (c) 3Median - 2Mean (d) 3Median - Mode
No.28: In a moderately asymmetrical distribution, the value of the median is given by:
(a) 3Median + 2Mean (b) 2Mean + Mode (c) (2Mean + Mode) / 3 (d) (3Median - Mode) / 2
No.29: For a moderately skewed distribution, the value of the mode is calculated as:
(a) 2Mean - 3Median (b) 3Median - 2Mean (c) 2Mean + Mode (d) 3Median - Mode
No.30: Averages are affected by a change of:
(a) Origin (b) Scale (c) Both (a) and (b) (d) None of the above
No.31: If all the values in a series are the same, then:
(a) A.M = G.M = H.M (b) A.M ≠ G.M ≠ H.M (c) A.M > G.M > H.M (d) A.M < G.M < H.M
No.32: For a given set of data, the average with the least value is the:
(a) Mean (b) Median (c) Harmonic mean (d) Geometric mean
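No.25 to No.29 can be checked in a few lines of Python. The letter count uses `collections.Counter`. The mean and median (52 and 50) are hypothetical values. They show that Karl Pearson's empirical relation, Mode = 3 Median - 2 Mean, agrees with its rearrangements. The relation is only an approximation for moderately skewed distributions.

```python
from collections import Counter

# No.25: modal letter(s) of "STATISTICS"
counts = Counter("STATISTICS")
top = max(counts.values())
print(sorted(letter for letter, c in counts.items() if c == top))  # ['S', 'T']

# No.26 to No.29: empirical relation for moderately skewed data
mean_, median_ = 52, 50           # hypothetical values
mode_ = 3 * median_ - 2 * mean_   # No.29: Mode = 3 Median - 2 Mean
print(mode_)                                   # 46
print(mean_ - mode_ == 3 * (mean_ - median_))  # True  (No.26)
print((3 * median_ - mode_) / 2)               # 52.0  (No.27: mean)
print((2 * mean_ + mode_) / 3)                 # 50.0  (No.28: median)
```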
  30. No.33: Which pair of averages cannot be calculated when one of the numbers in the series is zero?
(a) Geometric mean and Median (b) Harmonic mean and Mode (c) Simple mean and Weighted mean (d) Geometric mean and Harmonic mean
No.34: The geometric mean and harmonic mean of two values are 8 and 16 respectively. The arithmetic mean of the values is:
(a) 4 (b) 16 (c) 24 (d) 128
No.35: The arithmetic mean and geometric mean of two observations are 4 and 8 respectively. The harmonic mean of these two observations is:
(a) 4 (b) 8 (c) 16 (d) 32
No.36: If the arithmetic mean and harmonic mean of two positive numbers are 4 and 16, then their geometric mean is:
(a) 4 (b) 8 (c) 16 (d) 64
No.37: The geometric mean and harmonic mean of the values 3, -11, 0, 63, -14, 100 are:
(a) 0 and 3 (b) 3 and -3 (c) 0 and 0 (d) Impossible
No.38: The geometric mean of a set of positive numbers X1, X2, X3, ..., Xn is less than or equal to their arithmetic mean but greater than or equal to their:
(a) Harmonic mean (b) Median (c) Mode (d) Lower and upper quartiles
No.39: If all the values of a variable are non-zero and non-negative, then:
(a) A.M > G.M > H.M (b) G.M > A.M > H.M (c) H.M > G.M > A.M (d) A.M < G.M < H.M
No.40: For an open-end frequency distribution, it is not possible to find the:
(a) Arithmetic mean (b) Geometric mean (c) Harmonic mean (d) All of the above
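For two positive numbers, the three means satisfy G.M.² = A.M. × H.M., which is the relation behind No.34 to No.36. They also satisfy A.M. ≥ G.M. ≥ H.M., which is the relation behind No.38 and No.39. The Python sketch below checks both on the hypothetical pair 4 and 16, then applies the identity to the figures given in the questions. The helper names `am`, `gm` and `hm` are illustrative.

The figures in No.34 to No.36 (for example G.M. = 8 with H.M. = 16) cannot actually come from two positive numbers, because they break A.M. ≥ G.M. ≥ H.M. The identity only reproduces the intended answers.

```python
import math

def am(a, b):
    """Arithmetic mean of two numbers."""
    return (a + b) / 2

def gm(a, b):
    """Geometric mean of two positive numbers."""
    return math.sqrt(a * b)

def hm(a, b):
    """Harmonic mean of two positive numbers: 2ab / (a + b)."""
    return 2 * a * b / (a + b)

a, b = 4, 16  # hypothetical pair of positive numbers
print(am(a, b), gm(a, b), hm(a, b))                      # 10.0 8.0 6.4
print(math.isclose(gm(a, b) ** 2, am(a, b) * hm(a, b)))  # True
print(am(a, b) >= gm(a, b) >= hm(a, b))                  # True

# Applying G.M.^2 = A.M. x H.M. to the figures given in the questions
print(8 ** 2 / 16)        # No.34: A.M. = 4.0
print(8 ** 2 / 4)         # No.35: H.M. = 16.0
print(math.sqrt(4 * 16))  # No.36: G.M. = 8.0
```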
  31. No.41: If the harmonic mean of two numbers X1 and X2 is 6.4 and X2 = 16, then X1 is:
(a) 4 (b) 10 (c) 16 (d) 20
No.42: The harmonic mean of the values 5, 9, 11, 0, 17, 13 is:
(a) 9.5 (b) 6.2 (c) 0 (d) Impossible
No.43: The harmonic mean gives less weight to:
(a) Small values (b) Large values (c) Positive values (d) Negative values
No.44: The appropriate average for calculating the average speed of a journey is the:
(a) Median (b) Arithmetic mean (c) Mode (d) Harmonic mean
No.45: The ratio of the number of items to the sum of the reciprocals of the items is called the:
(a) Arithmetic mean (b) Geometric mean (c) Harmonic mean (d) Mode
No.46: The geometric mean is suitable when the values are given as:
(a) Proportions (b) Ratios (c) Percentage rates (d) All of the above
No.47: The geometric mean of 2, 4, 8 is:
(a) 6 (b) 4 (c) 14/3 (d) 8
No.48: If any value in a series is negative, then we cannot calculate the:
(a) Mean (b) Median (c) Geometric mean (d) Harmonic mean
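No.41, No.42, No.44 and No.47 can be worked exactly with Python's `fractions.Fraction`, which avoids floating-point rounding noise. `harmonic_mean_exact` is a hypothetical helper implementing H.M. = n / Σ(1/X). The speeds of 60 and 40 km/h over equal distances are an invented illustration of No.44.

```python
import math
from fractions import Fraction

def harmonic_mean_exact(values):
    """H.M. = n / sum(1/X) in exact arithmetic; a zero value raises ZeroDivisionError."""
    return len(values) / sum(1 / Fraction(v) for v in values)

# No.41: 2 / H = 1/X1 + 1/X2, so 1/X1 = 2/H - 1/X2
h, x2 = Fraction("6.4"), Fraction(16)
x1 = 1 / (2 / h - 1 / x2)
print(x1)  # 4

# No.44: equal distances at 60 km/h and 40 km/h give an average speed equal to the H.M.
print(harmonic_mean_exact([60, 40]))  # 48

# No.42: a zero value makes the harmonic mean impossible to compute
try:
    harmonic_mean_exact([5, 9, 11, 0, 17, 13])
except ZeroDivisionError:
    print("harmonic mean undefined")

# No.47: the geometric mean of 2, 4, 8 is the cube root of 64
print(round(math.prod([2, 4, 8]) ** (1 / 3), 10))  # 4.0
```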
  32. No.49: If each observation of a variable X is increased by 20%, then the geometric mean is also increased by:
(a) 20 (b) 1/20 (c) 20% (d) 100%
No.50: The suitable average for computing the average percentage increase in population is the:
(a) Geometric mean (b) Harmonic mean (c) Combined mean (d) Population mean
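No.49 and No.50 rest on the multiplicative nature of the geometric mean. Scaling every observation by 1.2 scales the G.M. by 1.2. Likewise, the G.M. of yearly growth factors is the constant factor that reproduces the same total growth. The growth rates below (10%, 20%, 5%) are hypothetical, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
from statistics import geometric_mean, mean

# No.49: increasing every observation by 20% increases the G.M. by 20%
x = [2, 4, 8]  # hypothetical data
print(math.isclose(geometric_mean([v * 1.2 for v in x]), 1.2 * geometric_mean(x)))  # True

# No.50: average yearly growth of a population over three years
rates = [0.10, 0.20, 0.05]        # hypothetical yearly growth rates
factors = [1 + r for r in rates]
total_growth = math.prod(factors)  # overall growth factor after three years

g = geometric_mean(factors)
print(math.isclose(g ** 3, total_growth))  # True: compounding at the G.M. rate matches
a = 1 + mean(rates)
print(math.isclose(a ** 3, total_growth))  # False: the arithmetic mean overstates growth
print(f"average growth rate: {g - 1:.2%}")
```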