Employee data

Vivek Kumar Enrollment no – 09BS0002756
Lakshami Through Sarasawati
Summary: This study interprets a set of employee data. It finds that education level and job category are both
gender biased: among all the employees, females are less educated than males. It also finds that a higher
education level is needed for better jobs.
People work to satisfy their different needs by earning money (Lakshami). As per this study, more educated
(Sarasawati) people get better jobs. Hence we can conclude that it is not Lakshami vs. Sarasawati; instead it is
“Lakshami Through Sarasawati”.
________________________________________________________________________________________
1. Introduction of Data
The employee data has been interpreted and the results are explained in this report.
The data has nine attributes: gender, date of birth, education level (years), job
category, current salary, beginning salary, months since hire, previous experience
and minority classification. Two new attributes are derived from these nine: male
(a binary value for gender, 0 or 1) and age (derived from date of birth).
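The two derived attributes can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the field names `gender` and `birth_date`, the coding 1 = male, and the reference date passed as `as_of` are all assumptions, and the example record is hypothetical.

```python
from datetime import date


def derive_attributes(record, as_of):
    """Return a copy of record with the derived 'male' and 'age' fields.

    Assumptions (not stated in the report): 'male' is coded 1 for males and
    0 for females, and age is counted in completed years on the date as_of.
    """
    male = 1 if record["gender"].strip().lower() in ("m", "male") else 0
    born = record["birth_date"]
    # Subtract one year if the birthday has not yet occurred in as_of's year.
    had_birthday = (as_of.month, as_of.day) >= (born.month, born.day)
    age = as_of.year - born.year - (0 if had_birthday else 1)
    return {**record, "male": male, "age": age}


# Hypothetical record, used only to show the call.
example = {"gender": "Male", "birth_date": date(1952, 2, 3)}
derived = derive_attributes(example, as_of=date(1995, 1, 1))
print(derived["male"], derived["age"])
```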
2. Analysis
2.1. Test whether females are less educated than males, i.e. is
education level gender biased? (Refer: Appendix I)
a. Hypothesis:
H0: μMale = μFemale (null hypothesis: the mean education level is the same for males and females)
Ha: μMale ≠ μFemale (alternative hypothesis: the mean education level is not the same for males and females)
Significance level α = 0.05 (i.e. rejection region - reject the null hypothesis if p-value ≤ 0.05)
b. Nature of Data and appropriate statistical tool:
In this case, the attributes “education level” and “gender” need to be interpreted
from the employee data. Here “education level” is a continuous variable and
“gender” is a categorical variable. From the Q-Q plot (Figure 1, Appendix I), the
continuous variable “education level” is approximately normal in nature, since
the observed values in the Q-Q plot lie close to the expected values. We also
found that the skewness and kurtosis of “education level” are -0.114 and -0.265,
which are within the acceptable range to treat the data as approximately normal
and proceed with the independent sample t-test. We also need to identify any
outliers in education level. A box plot was drawn to detect outliers, but none
were identified for education level (number of years of education).
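The normality and outlier checks described above can be sketched in a few lines. This is an illustration under assumptions: it uses plain moment-based skewness and excess kurtosis (SPSS applies small-sample corrections, so its values can differ slightly), the usual 1.5 × IQR box-plot rule for outliers, and a hypothetical list of education years rather than the study's data.

```python
import statistics


def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def iqr_outliers(values):
    """Values outside the 1.5 * IQR fences, as flagged on a box plot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in values if x < low or x > high]


# Hypothetical education years; the value 40 is planted to show an outlier.
sample_years = [8, 12, 12, 12, 15, 15, 16, 16, 19, 40]
skewness, excess_kurtosis = skew_kurtosis(sample_years)
print(skewness, excess_kurtosis, iqr_outliers(sample_years))
```

Values of skewness and excess kurtosis close to zero, as reported for “education level”, are consistent with an approximately normal distribution.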
c. Independent sample t-test: Discussion of Result
Before proceeding with the independent sample t-test (Appendix I), it is
necessary to compare the variance of “education level” for males and females. In
“Levene’s Test for Equality of Variances” (Table 2, Appendix I), the significance
is less than 0.05, i.e. the variances of education level for the two groups
cannot be assumed equal. Since the variances are not equal for males and
females, we read the t-test result under “Equal variances not assumed”. The
significance of the t-test under “Equal variances not assumed” is .000 (less
than .05) and hence the null hypothesis is rejected. We conclude that the mean
education level for males and females is not equal. The mean education levels
for males and females are 14.43 and 12.37 years respectively (Table 1,
Appendix I). Since the mean education level for females is lower than that for
males, we can say that females are less educated than males.
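The unequal-variance t statistic can be reproduced from the group statistics in Table 1. This is a sketch, assuming that the “Equal variances not assumed” row corresponds to Welch's t-test with the Welch-Satterthwaite degrees of freedom; the p-value uses a normal approximation, which is reasonable here only because the degrees of freedom are large. The function name `welch_t` is illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Group statistics from Table 1 (male first, then female).
t_stat, df = welch_t(14.43, 2.979, 258, 12.37, 2.319, 216)

# Two-sided p-value via the normal approximation (an assumption; exact
# values need the t distribution, e.g. from a statistics package).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
print(round(t_stat, 2), round(df, 1), p_approx < 0.05)
```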
2.2. Test whether more education gives a better job (Refer: Appendix II)
a. Hypothesis:
H0: μClerical = μCustodial = μManager (null hypothesis: the mean education level is equal for all job categories)
Ha: Not all the means are equal (alternative hypothesis)
Significance level α = 0.05 (i.e. rejection region - reject the null hypothesis if p-value ≤ 0.05)
b. Nature of Data and appropriate statistical tool:
In this case “education level” is a continuous variable and “job category” is a
categorical variable with more than two categories (i.e. Custodial, Clerical
and Manager). The normality check for “education level” is explained in the
previous section of this report, where education level was found to be
approximately normal. Since “job category” has more than two groups, we need
to apply ANOVA instead of the independent sample t-test.
c. ANOVA : Discussion of Result
Null hypothesis will be rejected since by ANOVA test we found that F=68.49 and
p=.000 (which is less than .05). Rejections of Null hypothesis conclude that
education level for all category of job is not equal. Now we have calculated the
mean of education level for all three categories. We found mean for manager,
clerical and custodian is 17.25, 12.87 and 10.87 respectively. It shows maximum
educational level is required for manager and least is required for custodian.
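As a cross-check, a one-way ANOVA F statistic can be rebuilt from the per-category N, mean and standard deviation in Table 4. This is a sketch, not the SPSS procedure: the inputs are rounded summary statistics, so the recomputed F is only approximate and need not equal the value printed in Table 3. Note that three groups give 2 between-groups degrees of freedom. The function name `anova_from_summary` is illustrative.

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per category.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (N, mean, std. deviation) per employment category, from Table 4.
table4 = [
    (363, 12.87, 2.333),  # Clerical
    (27, 10.19, 2.219),   # Custodial
    (84, 17.25, 1.612),   # Manager
]
f_stat, df_between, df_within = anova_from_summary(table4)
print(round(f_stat, 1), df_between, df_within)
```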
2.3. Test whether job category is gender biased (Refer: Appendix III)
a. Hypothesis:
H0: Job category is independent of gender
Ha: Job category is NOT independent of gender
Significance level α = 0.05 (i.e. rejection region - reject the null hypothesis if p-value ≤ 0.05)
b. Nature of Data and appropriate statistical tool: Here we need to test the
relationship between two categorical variables, job category and gender. To
test the relationship between two categorical variables we use a chi-square
test. In SPSS, the chi-square test can be run through Crosstabs. One further
requirement must be checked before proceeding with the chi-square test: no
expected frequency in the contingency table should be less than five.
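c. Chi-Square : Discussion of Result - The result indicates that there is a
statistically significant relationship between job category and gender at the
0.05 significance level (chi-square with two degrees of freedom = 79.277,
p = 0.000), so the null hypothesis of independence is rejected. No cell has an
expected count below five (minimum expected count 12.30; Table 6, Appendix III).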
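The Pearson chi-square value can be reproduced from the observed counts in Table 5. This is a minimal sketch with an illustrative function name; the closed-form p-value exp(-x/2) holds only for two degrees of freedom, which is the case for this 2 × 3 table.

```python
import math


def chi_square_independence(observed):
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Observed counts from Table 5: rows Female, Male; columns Clerical,
# Custodial, Manager.
observed = [[206, 0, 10], [157, 27, 74]]
chi2, df, expected = chi_square_independence(observed)
min_expected = min(min(row) for row in expected)

# For df == 2 the chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 3), df, round(min_expected, 2), p_value < 0.05)
```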
3. Conclusion: This study interprets a set of employee data. It finds that
education level and job category are both gender biased: among all the
employees, females are less educated than males. It also finds that a higher
education level is needed for better jobs.

Appendix I : Independent Sample t-test
Group Statistics
                            Gender   N     Mean    Std. Deviation   Std. Error Mean
Educational Level (years)   Male     258   14.43   2.979            .185
                            Female   216   12.37   2.319            .158
Table 1 : Group Statistics, From Independent sample t-test
Table 2 : Independent Sample t-test
Figure 1 : Q-Q plot for "Education Level" to check the normality

Appendix II : ANOVA for education level and Job Category
Table 3 : ANOVA for Education level and Job Category
Educational Level (years)
Employment Category   Mean    N     Std. Deviation
Clerical              12.87   363   2.333
Custodial             10.19   27    2.219
Manager               17.25   84    1.612
Total                 13.49   474   2.885
Table 4: Mean for Job category, from ANOVA
Appendix III : Cross Tab
Gender * Employment Category Crosstabulation
                                    Employment Category
                                    Clerical   Custodial   Manager   Total
Gender   Female   Count             206        0           10        216
                  Expected Count    165.4      12.3        38.3      216.0
                  % within Gender   95.4%      .0%         4.6%      100.0%
         Male     Count             157        27          74        258
                  Expected Count    197.6      14.7        45.7      258.0
                  % within Gender   60.9%      10.5%       28.7%     100.0%
Total             Count             363        27          84        474
                  Expected Count    363.0      27.0        84.0      474.0
                  % within Gender   76.6%      5.7%        17.7%     100.0%
Table 5 : Contingency table, from Cross Tab
ANOVA
Educational Level (years)
                 Sum of Squares   df    Mean Square   F        Sig.
Between Groups   498.852          1     498.852       68.495   .000
Within Groups    3437.615         472   7.283
Total            3936.466         473

Table 6 : Chi-square Test, from Cross tab
Chi-Square Tests
                     Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square   79.277a   2    .000
Likelihood Ratio     95.463    2    .000
N of Valid Cases     474
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
