Collected data by conducting a survey about MOOCs among fellow classmates and used Python to create edge lists from the survey data: students and their skills, and students and the MOOC websites where they take courses.
Visualized the student network in UCINET and computed the densities among clusters in the network.
Performed hypothesis testing to see whether a characteristic of a student affects their position (centrality) in the network.
Social Network Analysis based on MOOCs (Massive Open Online Courses)
Social Network Analysis Report on Students' Participation in MOOCs (Massive Open Online Courses)
2. Social Network Analysis - Students participation in MOOCs (Massive Open Online Classes)
Contents
Acknowledgements
Executive Summary
Background
Key findings
Discussion
Introduction
Background
Aim
Approach
Methodology
Networks
Hierarchical Clustering
Correspondence analysis
Density
Core Periphery
Conclusion and areas of improvement
References
Acknowledgements
The authors would like to thank the students of the University of Texas at Arlington enrolled in the Social
Network Analysis course for their support in taking the online survey, which helped us obtain the
data for this Social Network Analysis report on “Students participation in MOOCs (Massive Open
Online Classes).”
We would also like to thank our academic faculty member, Dr. Sridhar Nerur, who helped us with
conception, data gathering and network analysis techniques, peer-reviewed the
outline version of the report, and provided thoughtful suggestions.
Executive Summary
Background
This research tests the use of social network analysis, applied to MOOC survey data, as
a tool to understand more systematically why graduate students take online courses. As such,
the report provides one example of how social network analysis can be used, but the
approach could also be applied to other types of networks of graduate students. The
research was undertaken to understand current MOOCs and addresses two research
questions:
1. What can social network analysis tell us about MOOCs?
2. How useful are the social network analysis outputs for developing MOOCs aimed at
improving the skills of people enrolling in the courses?
For this, sixty individuals studying at the University of Texas at Arlington were identified as
the starting point for the analysis. Further details about how the social network analysis
was planned can be found in the later part of this report.
Key findings
● The common skill that most students want to develop (the hot skill)
● The most common website individuals use to develop skills (the most useful source)
● Ties between individuals on the basis of common skills and the websites they use to
develop those skills
● The most central people in the above networks, and whether gender plays a role in the
centrality of an individual
● Skills people tend to develop based on years of work experience, and preferred
websites based on work experience
An overall network of 60 individuals was obtained, using the online survey questionnaire.
Discussion
The study demonstrated the potential of the social network analysis approach to help
and guide faculty and developers who design MOOCs. The approach could be applied at other
universities as well, where students enroll in MOOCs alongside their regular courses. As the
example used data only from University of Texas at Arlington graduate students, the results
are not intended to be representative of MOOCs in general; they reflect only what was obtained
through the survey among peers.
Introduction
Background
Social network analysis aims to understand a community (or sample of individuals) by
systematically identifying and mapping relationships connecting members as a network. It
assists in identifying key individuals, groups within the network, and associations between
individuals. Hence, it helps understand the structure of the network and the profile or
functions of individuals within it, and uses statistical tools to illustrate this.
This report serves as an example of how social network analysis can be used to explain
why graduate students take online courses. It is intended to inform possible future work on
understanding where social networks can enhance MOOCs, including other factors such as
the most useful skills and resources.
Aim
The research aims to test the use of social network analysis on MOOC survey data
collected among peers, to help understand which MOOC courses students most commonly
take to enhance their skills. It focuses on three research questions:
1. What can social network analysis tell us about MOOCs?
2. What can social network analysis tell us about students signing up for a MOOC?
3. How useful are the social network analysis outputs for developing MOOCs?
Approach
Because it uses a relatively small amount of survey data from one university (The University of
Texas at Arlington), this work is an example of using social network analysis; the findings are
not designed to be representative of MOOC networks nationally.
Data Collection
The research started by identifying the primary and secondary research questions for
the survey questionnaire, in line with the aim. Graduate students were identified as the
primary sources, and their responses to multiple-choice, checkbox, Yes/No and similar
questions were recorded in a secured Excel database.
Data Coding
Python was used to create the edge lists. The edge lists formed were people and skills, individuals
and their gender, people and affiliated websites, etc. In each edge list, the names of individuals
were replaced by IDs. The edge lists were converted into the respective two-mode and one-mode
matrices by UCINET.
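The data coding step above can be sketched as follows. This is a minimal illustration and not the script used for the report: the survey rows, the names, and the helper functions `build_edge_list` and `two_mode_matrix` are all hypothetical, and the two-mode matrix is built here in Python only to show the idea (the report did that conversion in UCINET).

```python
# Hypothetical survey rows: (respondent name, skills chosen, websites used).
survey_rows = [
    ("Alice", ["Python", "SQL"], ["Coursera"]),
    ("Bob", ["Python"], ["Udemy", "Coursera"]),
    ("Carol", ["R", "SQL"], ["Datacamp"]),
]

# Replace each name with an anonymous numeric ID, as described above.
ids = {name: i + 1 for i, (name, _, _) in enumerate(survey_rows)}


def build_edge_list(rows, column):
    """Return (ID, item) pairs for one multi-select survey column."""
    edges = []
    for row in rows:
        person_id = ids[row[0]]
        for item in row[column]:
            edges.append((person_id, item))
    return edges


def two_mode_matrix(edges):
    """Turn an edge list into a persons-by-items 0/1 incidence matrix."""
    persons = sorted({p for p, _ in edges})
    items = sorted({item for _, item in edges})
    present = set(edges)
    matrix = [[1 if (p, item) in present else 0 for item in items]
              for p in persons]
    return persons, items, matrix


person_skill_edges = build_edge_list(survey_rows, 1)
person_website_edges = build_edge_list(survey_rows, 2)
persons, skills, person_skill_matrix = two_mode_matrix(person_skill_edges)
```

Each row of `person_skill_matrix` is one respondent ID and each column one skill, which is the two-mode layout the later analyses start from.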
Data analysis and interpretation
There is no set way of undertaking social network analysis; it depends upon the questions
being investigated. We used UCINET, along with Python and R, to produce and explain
the key social network analysis findings and statistics in this report.
Limitations
Several choices made in this analysis lead to potential limitations.
Data Accuracy: The analysis can only be as good as the data it is based on. The accuracy
and comprehensiveness of the MOOC survey data are uncertain. They can be influenced by
individual activity and by the respondents' understanding of what they were doing while
completing the survey. This is, however, a limitation of the social network analysis
approach in general.
Simplifications: The analysis involved some simplification; for example, the GPA of the
graduate students, although collected in the survey, was not included. Other aspects that were
not examined but may be essential include skills previously learned while working, undergraduate degree,
social ties, etc.
Limits were placed on the data collection to keep it to a manageable size; this
limited the size and complexity of the network produced.
Methodology
The following section describes the techniques that were used to
answer the questions posed in the introduction.
Networks
Person_skills Network
We formed the person-to-courses network, which is a two-mode network of
students and the courses they have completed on various MOOC websites. We
created an edge list (edgelist2 format) of individuals and the courses they have taken.
The name of each person was replaced by an ID, a unique value assigned to every individual
who took the survey. The edge list was converted into a two-mode matrix by UCINET.
From the network diagram we can infer that Python and SQL are the most central skills in the
network.
Person_websites Network
We formed the person-to-website network, which is a two-mode network of
students and their preferred websites for developing technical skills. We created an edge list
(edgelist2 format) of the IDs that represent individuals and the respective websites used. The
edge list was converted into a two-mode matrix by UCINET.
Person_Person_skills Network
We created a one-mode network from the Person_skills network to find out who is the
most central person, based on the ties between students formed by the common courses
they have taken. We computed degree centrality, beta centrality, eigenvector centrality and
betweenness centrality. The results are shown in the accompanying image.
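As a rough illustration of how a one-mode person-to-person network can be derived from a two-mode matrix, the sketch below counts the items each pair of people shares and then sums each row to get a valued degree score. The incidence matrix is made up, the function names are hypothetical, and only degree centrality is shown; beta, eigenvector and betweenness centrality, which the report computed in UCINET, are not reproduced here.

```python
# Hypothetical persons-by-skills incidence matrix (rows are person IDs 1..4).
incidence = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]


def project_rows(matrix):
    """One-mode projection: entry [i][j] counts items shared by rows i and j."""
    n = len(matrix)
    projected = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # leave the diagonal at zero (no self-ties)
                projected[i][j] = sum(a * b for a, b in zip(matrix[i], matrix[j]))
    return projected


def degree_centrality(adjacency):
    """Valued degree: the sum of tie strengths in each row."""
    return [sum(row) for row in adjacency]


person_person = project_rows(incidence)
degrees = degree_centrality(person_person)
print(degrees)  # [2, 1, 2, 1]
```

In this toy example persons 1 and 3 are tied for the highest degree because each shares a skill with two other people.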
Person_Person_websites Network
We created a one-mode network from the Person_websites network to find out who is
the most central person, based on the common websites students choose for online courses.
We computed degree centrality, beta centrality, eigenvector centrality and betweenness
centrality. The results are shown in the accompanying image.
Male_skills Network
We created a one-mode network of the male students from the Person_male network to
find out who is the most central person among males based on skills. We computed
degree centrality, beta centrality, eigenvector centrality and betweenness centrality. The
results are shown in the accompanying image.
Male_websites Network
We created a one-mode network of the male students from the Person_male network to
find out who is the most central person among males based on the websites where they take online
courses. We computed degree centrality, beta centrality, eigenvector centrality and
betweenness centrality. The results are shown in the accompanying image.
Female_skills Network
We created a one-mode network of the female students from the Person_female network
to find out who is the most central person among females based on skills. We computed
degree centrality, beta centrality, eigenvector centrality and betweenness centrality. The
results are shown in the accompanying image.
Female_websites Network
We created a one-mode network of the female students from the Person_female network
to find out who is the most central person among females based on the websites where they take
online courses. We computed degree centrality, beta centrality, eigenvector centrality and
betweenness centrality. The results are shown in the accompanying image.
Does the gender of an individual affect centrality of person_person_skill network?
To check whether gender has an effect on centrality, a linear regression was performed with
centrality as the dependent variable and gender as the independent variable. The
regression results are shown below:
Call:
lm(formula = centrality ~ gender)

Residuals:
    Min      1Q  Median      3Q     Max
-75.714 -23.640   1.286  24.360  67.360

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   73.640      6.362  11.575   <2e-16 ***
gender         3.074      8.330   0.369    0.713
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 31.81 on 58 degrees of freedom
Multiple R-squared: 0.002343, Adjusted R-squared: -0.01486
F-statistic: 0.1362 on 1 and 58 DF, p-value: 0.7134
According to the above results, females are estimated to have a degree centrality score
3.074 higher than males on average.
A t-test was performed to check whether this difference is significant.
Null hypothesis: The coefficient of gender in the regression equation is equal to zero.
Alternative hypothesis: The coefficient of gender in the regression equation is not equal to
zero.
Critical values of t: -2.663287 to +2.663287.
t-test statistic: 0.3690699.
The p-value (0.713) is high.
Since the test statistic falls within the range of the critical values, we fail to reject the null hypothesis.
There is no evidence that the gender of an individual affects centrality.
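The decision rule used in this test can be checked with a few lines of arithmetic. The sketch below only re-derives the t statistic from the rounded estimate and standard error printed in the regression output, so it differs slightly from the 0.3690699 reported from the unrounded values. The critical value is copied from the report; it looks like the two-sided 1% value for 58 degrees of freedom, but the report does not state the significance level, so treat that as an assumption.

```python
# Values copied from the regression output above.
estimate = 3.074        # coefficient of gender
std_error = 8.330       # standard error of that coefficient
critical_t = 2.663287   # two-sided critical value quoted in the report

# The t statistic of a coefficient is its estimate divided by its standard error.
t_statistic = estimate / std_error

# Reject the null only if the statistic falls outside [-critical_t, +critical_t].
reject_null = abs(t_statistic) > critical_t

print(round(t_statistic, 3), reject_null)  # 0.369 False
```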
Does the gender of an individual affect centrality of person_person_website network?
To check whether gender affects centrality, a linear regression was performed with the centrality
of person_person_website as the dependent variable and gender as the independent
variable. The regression results are shown below:
Call:
lm(formula = centrality ~ gender)

Residuals:
    Min      1Q  Median      3Q     Max
-80.743 -21.743   1.009  28.383  76.257

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   96.240      7.112  13.532   <2e-16 ***
gender        -3.497      9.312  -0.376    0.709
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 35.56 on 58 degrees of freedom
Multiple R-squared: 0.002426, Adjusted R-squared: -0.01477
F-statistic: 0.1411 on 1 and 58 DF, p-value: 0.7086
According to the above results, females are estimated to have a degree centrality score
3.497 lower than males on average.
A t-test was performed to check whether this difference is significant.
Null hypothesis: The coefficient of gender in the regression equation is equal to zero.
Alternative hypothesis: The coefficient of gender in the regression equation is not equal to
zero.
Critical values of t: -2.663287 to +2.663287.
t-test statistic: -0.3755722.
The p-value (0.709) is high.
Since the test statistic falls within the range of the critical values, we fail to reject the null hypothesis.
There is no evidence that gender has any effect on centrality.
Hierarchical Clustering
We performed hierarchical clustering on the person-to-person network based on skills to find out who
is close to whom in terms of skills. First, we formed the one-mode person-to-person
network based on skills and then gave that data set as input for Structural Equivalence
(Network->Roles & Positions->Structural->Concor->Standard). This produced the dendrogram
shown in the figure, along with the partition diagram and levels. People with
similar skills are joined together in a cluster. The output also shows the density matrix and the R-squared
value (65.9%), which is significant.
Hierarchical Clustering - Skills
Hierarchical Clustering - Websites
We performed hierarchical clustering on the person-to-person network based on websites to find out
who is close to whom in terms of the websites where they take online courses. First, we formed
the one-mode person-to-person network based on those websites and then
gave that data set as input for Structural Equivalence
(Network->Roles & Positions->Structural->Concor->Standard). This produced the dendrogram
shown in the figure, along with the partition diagram and levels. Individuals who choose the same websites
for online courses are joined together in a cluster. The output also shows the density matrix
and the R-squared value (53.6%), which is significant.
Hierarchical clustering in RStudio was done by calculating the Euclidean distance
between students on the basis of Critical Thinking, Collaborating with People,
Creative Work, Leadership and Years of Work Experience
Survey values for the first four attributes above were low, medium and high. Hierarchical
clustering needs numerical values, so we converted the responses
low, medium and high to the numbers 1, 2 and 3. We also took into consideration the
attribute years of experience, whose ordinal response categories were converted to average
values. With all values numeric, the basis for calculating the Euclidean distance was ready.
We used the Euclidean distance for single-linkage hierarchical
clustering because it measures the closeness between students who have similar interests and
years of experience. As the range for years of experience was 0 to 5, which
differs from the range of the other four parameters, we normalized the values of
all five parameters.
As a result, we obtained a hierarchical clustering that groups the students who are very close
to each other in terms of their responses.
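The procedure described above (numeric coding, normalization, Euclidean distance, single linkage) can be sketched in plain Python. This is only an illustration under stated assumptions: the four students and their values are invented, min-max scaling is assumed as the normalization because the report does not say which method was used, and the report's own clustering was done in RStudio rather than with this code.

```python
import math

# Hypothetical responses: four attitude items coded low/medium/high = 1/2/3,
# followed by years of work experience (0 to 5).
students = {
    "S1": [1, 2, 3, 2, 0.5],
    "S2": [1, 2, 3, 2, 1.0],
    "S3": [3, 3, 1, 1, 4.0],
    "S4": [3, 2, 1, 1, 5.0],
}


def min_max_normalize(rows):
    """Rescale every column to the 0..1 range so no attribute dominates."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # avoid division by zero
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(names, points):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(names[i] for i in clusters[a] + clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


names = list(students)
merges = single_linkage(names, min_max_normalize(list(students.values())))
for members, distance in merges:
    print(members, round(distance, 3))
```

The merge history is the information a dendrogram draws: the earliest merges, at the smallest distances, are the students whose responses are most alike.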
Correspondence analysis
For correspondence analysis, we found a useful R package called “FactoMineR”,
so we decided to do the correspondence analysis using R.
Correspondence analysis for years of work experience and skills.
The correspondence analysis in the figure above shows that Python is
considered the most important skill by the majority of people. Students with
around 0 to 1 year of experience take an interest in learning R and statistical tools. As
experience increases towards five years, students increasingly take an interest in
Cloud, AWS, and Azure. Students with around four years of experience tend to
learn SQL and Big Data more than other skills.
In the figure above, when it comes to selecting a website for MOOC courses,
Datacamp and Udemy are chosen by a majority of students. Students with no experience go
with Udemy and Coursera. Students with around four years of experience prefer Datacamp.
Interestingly, students with around five years of experience do not favor any of
the websites for MOOC courses, and when they do take one they tend to select Databricks and
edX.
Density
To test the distribution of density across the network, a statistical test was performed to
compare the density of the people_people_Skill and people_people_Website networks
against a test value. Before performing the tests, the networks were dichotomized
at values 0, 1 and 2. Please refer below for the results of the tests:
The above images are the results of the t-test for the people_people_Skill and people_people_Website
networks dichotomized at a threshold of 0.
The above images are the results of the t-test for the people_people_Skill and people_people_Website
networks dichotomized at a threshold of 1.
The above images are the results of the t-test for the people_people_Skill and people_people_Website
networks dichotomized at a threshold of 2.
Null hypothesis: Density is equal to 1.000
Alternative hypothesis: Density is not equal to 1.000
In all three scenarios above, we observe that the test statistic is large and significant (p =
0.0002). Hence, the null hypothesis is rejected.
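To make the quantity being tested concrete, the sketch below dichotomizes a small valued person-to-person matrix at thresholds 0, 1 and 2 and computes the density at each. The matrix is made up, the function names are hypothetical, and the rule "a tie exists when the value is strictly greater than the threshold" is an assumption about how the dichotomization was configured in UCINET. The significance test itself is not reproduced.

```python
def dichotomize(valued, threshold):
    """Binary matrix with a tie wherever the valued tie exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in valued]


def density(binary):
    """Share of possible off-diagonal ties that are present."""
    n = len(binary)
    ties = sum(binary[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


# Hypothetical valued person-by-person matrix (counts of shared skills).
valued = [
    [0, 3, 1, 0],
    [3, 0, 2, 1],
    [1, 2, 0, 0],
    [0, 1, 0, 0],
]

densities = {t: density(dichotomize(valued, t)) for t in (0, 1, 2)}
for threshold, value in densities.items():
    print(threshold, round(value, 3))
```

Raising the threshold keeps only stronger ties, so the density can only stay the same or fall, which is why the test was repeated at each cut-off.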
Core Periphery
Person_Person_skills - Core Periphery
We performed the core-periphery analysis on the person-to-person one-mode network tied on the
basis of common courses taken using MOOCs. As shown in the above images, there
is a set of core nodes and a set of periphery nodes. The fit was measured using correlation, and its value is 80.8%.
The images also show the block diagram and the interactions between the core and periphery
nodes. There is more interaction between the core nodes and periphery to core nodes as
compared with core to periphery and periphery to core nodes.
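The correlation-based fit mentioned above can be illustrated with a small sketch. It assumes the discrete core-periphery model in which the ideal pattern has a tie whenever at least one endpoint is in the core and no tie between two periphery nodes, and it measures fit as the Pearson correlation between the observed and ideal off-diagonal cells. The two example networks, the chosen core set and the function names are hypothetical; UCINET additionally searches for the best partition, which is not shown here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def core_periphery_fit(adjacency, core):
    """Correlate observed ties with an ideal core-periphery pattern."""
    n = len(adjacency)
    observed, ideal = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                observed.append(adjacency[i][j])
                # Ideal pattern: tie present if either endpoint is in the core.
                ideal.append(1 if (i in core or j in core) else 0)
    return pearson(observed, ideal)


core = {0, 1}  # assumed core membership for the toy example

# A network that matches the ideal pattern exactly.
perfect = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
]

# The same network with one periphery-periphery tie added (2-3)
# and one core-periphery tie removed (1-4).
noisy = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
]

perfect_fit = core_periphery_fit(perfect, core)
noisy_fit = core_periphery_fit(noisy, core)
```

A fit of 1 means the observed ties match the ideal pattern exactly; departures from the pattern, as in `noisy`, pull the correlation below 1.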
Person_Person_skills - Factions
We also performed faction analysis using 4 blocks and obtained a fitness value of 326. We repeated the
faction analysis several times and got the same fitness value. The colors in the
network diagram show the faction each node belongs to.
Person_Person_websites - Core Periphery
We performed the core-periphery analysis on the person-to-person one-mode network tied by common
courses taken using MOOCs. As shown in the above images, there is a set of core
nodes and a set of periphery nodes. The fit was measured using correlation, and its value is 80.8%. The images
also show the block diagram and the interactions between the core and periphery nodes. There is
more interaction between the core nodes and periphery to core nodes as compared with the
core to periphery and periphery to core nodes.
We can also see that some nodes that are part of the core of the person_person_website
network are also in the core of the person_person_skills network. Hence, we can infer that people
tend to use the same MOOC websites to develop similar skills.
Person_Person_websites - Factions
We also performed faction analysis using 4 blocks and obtained a fitness value of 478. We repeated the
faction analysis several times and got the same fitness value. The colors in the
network diagram show the faction each node belongs to.
Conclusion and areas of improvement
● As part of this project, various concepts of social network analysis, such as centrality,
density, hierarchical clustering, correspondence analysis, core-periphery analysis
and factions, were applied to two networks: people_people_skills and
people_people_website.
● From the analysis using these techniques, we found that Python and SQL
are the hot skills every student aspires to learn. Coursera and Udemy are the most
useful sources for developing various skills.
● We also obtained useful insights about an individual’s learning preferences based on
prior work experience.
● The analysis could be useful to MOOC websites for recommending courses based on
the learner’s profile.
● This project can be extended by obtaining data from different geographical regions,
so that the preferred technical skills in different locations can be analyzed.
● The project mainly focused on the data analysis domain and could be extended to other
domains.
● Other factors that influence an individual to take up a course, such as the cost of the course,
can be used for analysis.
● All the methodologies used in this project focused on students’ learning preferences.
References
1. Stephen Borgatti, Martin Everett and Jeffrey Johnson. 2013. Analyzing Social Networks.
SAGE Publications. ISBN: 9781446247419.
2. Robert Hanneman and Mark Riddle. 2005. Introduction to social network methods.