1. Automatic classification of students in online
courses using machine learning techniques
D. Monllao Olive
School of Computer Science and Software Engineering
The University of Western Australia
Crawley WA 6009, AUSTRALIA
Principal supervisor: Dr Du Huynh
Co-supervisor: Assoc/Prof Mark Reynolds
External supervisor: Dr Martin Dougiamas
Master of Philosophy - part time student
2. Contents
1. Problem description
1.1. Online education and Moodle
1.2. Detection of students at risk of dropping out of courses
1.3. Students’ engagement in online courses
2. Literature review
3. Aim
4. Progress
5. Methodology
6. Timeline
3. Online education and Moodle
● Traditional education is better for social interactions
● Online education offers more flexibility but relies more on self-discipline
● Moodle stats (May 2017 https://moodle.net/stats/)
○ 103 million users worldwide
○ 12 million courses
○ 215 million forum posts
○ 589 million quiz questions
4. Students at risk of dropping out of courses
● Students who are not engaged in the course don’t participate
● Different stakeholders are interested in reducing online course dropout rates
○ Students
○ Teachers
○ Educational institutions
5. Students’ engagement in online courses
● Engaged students participate in the course activities
● Engagement is not as easy to detect in online courses as in face-to-face
education
● Some examples of engagement indicators:
○ Regular accesses to the course
○ Replies to other course participants’ forum posts
○ Quick reply to teacher’s feedback
○ Percentage of accessed course resources
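Indicators like the ones above can be computed directly from a student’s activity log. The sketch below is a minimal stdlib-only Python illustration; the log format, action labels, and resource IDs are made up for the example and do not reflect Moodle’s real log schema:

```python
# Hypothetical per-student activity log entries: (timestamp, action, object_id).
# Field names and action labels are illustrative, not Moodle's actual schema.
logs = [
    ("2017-05-01 09:00", "course_view", None),
    ("2017-05-01 09:05", "resource_view", "res1"),
    ("2017-05-03 18:30", "course_view", None),
    ("2017-05-03 18:40", "forum_reply", "post7"),
    ("2017-05-03 18:55", "resource_view", "res2"),
]
course_resources = {"res1", "res2", "res3", "res4"}

def engagement_indicators(logs, course_resources):
    """Compute a few simple engagement indicator values from an activity log."""
    days_accessed = {ts.split()[0] for ts, _, _ in logs}
    forum_replies = sum(1 for _, action, _ in logs if action == "forum_reply")
    viewed = {obj for _, action, obj in logs if action == "resource_view"}
    return {
        "distinct_access_days": len(days_accessed),
        "forum_replies": forum_replies,
        "resources_accessed_pct": len(viewed & course_resources) / len(course_resources),
    }

print(engagement_indicators(logs, course_resources))
```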
6. Literature review - Learning analytics
1. From an educational point of view
● Description of online students’ engagement indicators [1] [2]
● Factor analysis and correlations between indicators and student retention
● Limitations:
○ Not very empirically rigorous
○ Limited dataset studied; results biased towards a few courses
○ Indicators only correlate individually
[1] Katrina A. Meyer. Student engagement in online learning: What works and why. ASHE Higher Education Report, 40(6):1–114, 2014.
[2] Kate S. Hone and Ghada R. El Said. Exploring the factors affecting MOOC retention: A survey study. Computers & Education, 98:157–168, 2016.
7. Literature review - Educational data mining
2. From a data mining point of view:
● Some recent studies use machine learning techniques such as Decision Trees, Association rules, and Evolutionary algorithms [3] [4]
● Limitations:
○ Limited dataset studied
○ Basic student engagement indicators
[3] Carlos Marquez-Vera, Alberto Cano, Cristobal Romero, Amin Yousef Mohammad Noaman, Habib Mousa Fardoun, and Sebastian Ventura. Early dropout prediction using data mining: a case study with high school students. Expert Systems, 33(1):107–124, 2016.
[4] J. M. Luna, C. Castro, and C. Romero. MDM tool: A data mining framework integrated into Moodle. Computer Applications in Engineering Education, 25(1):90–102, 2017.
8. Aim
To find the model that best predicts students at risk of dropping out of
any ongoing Moodle course.
9. Aim - How to achieve it?
● By using datasets from multiple different institutions
○ To prevent the model from overfitting to courses of a particular institution or format
● By selecting a subset of the student engagement indicators from the literature
○ To discard indicators that don’t correlate well
● By adding course information to the training dataset
○ To make the model adaptable to all sorts of courses
● By limiting the studied activity logs to the most relevant time range
○ To improve the model accuracy
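One of the steps above, discarding indicators that don’t correlate well with dropout, could be sketched as a simple correlation filter. This is a stdlib-only Python illustration: the indicator names, the toy values, the labels, and the 0.5 threshold are all made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: indicator values per student, plus a dropout label (1 = dropped out).
samples = {
    "access_days":   [10, 2, 8, 1, 9, 0],
    "forum_replies": [ 3, 0, 5, 1, 4, 0],
    "quiz_attempts": [ 2, 2, 3, 3, 2, 2],  # deliberately unrelated to the label
}
dropped_out = [0, 1, 0, 1, 0, 1]

# Keep only indicators whose absolute correlation with the label is strong enough.
threshold = 0.5
selected = [name for name, values in samples.items()
            if abs(pearson(values, dropped_out)) >= threshold]
print(selected)
```

In practice the thresholding would be replaced by a proper feature-selection step, but the idea of ranking indicators by their relationship with the dropout label is the same.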
10. Progress
● Student engagement indicators literature review
● Moodle analytics API developed (https://github.com/moodlehq/moodle-tool_inspire)
○ Machine learning backend plugins. Shipped with Python (Tensorflow) and PHP (php-ml)
○ Highly extensible
○ Prototype: http://prototype.moodle.net/inspirephase1/
○ Experimental model included: Students at risk of dropping out
● Contributions to the most popular PHP machine learning library
○ https://github.com/php-ai/php-ml/graphs/contributors
15. Methodology - Overview
1. Training dataset preparation from raw Moodle sites data
○ One sample for each student enrolment in each course of each Moodle site
○ Features: Student engagement indicator calculations, course information, and
information about the included activity log time range
○ Label: Did the student drop out of the course?
○ Output: A .csv file
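Step 1 above, writing one labelled sample per student enrolment to a .csv file, could look roughly like this in stdlib-only Python. The column names and values are placeholders invented for the sketch, not the real feature set:

```python
import csv
import io

# One row per student enrolment: indicator features, course information,
# time-range information, and the dropout label. Column names are illustrative.
header = ["access_days", "forum_replies", "resources_pct",
          "course_weeks", "weeks_elapsed", "dropped_out"]
rows = [
    [12, 4, 0.8, 16, 8, 0],  # an engaged student who completed the course
    [ 1, 0, 0.1, 16, 8, 1],  # a disengaged student who dropped out
]

# Written to an in-memory buffer here; the real pipeline writes a .csv file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```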
16. Methodology - Overview
2. Machine learning training and performance evaluation
a. Inputs: A .csv file
b. Cross-validation (hyperparameter tuning)
c. Prediction model performance evaluation
■ The process is repeated multiple times
d. Outputs: The average performance (Matthews correlation coefficient) and the standard
deviation across all performance evaluations
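The evaluation step above can be sketched with a hand-rolled Matthews correlation coefficient and repeated runs. The label/prediction pairs below are made up; in the real pipeline each run would come from a separate train/evaluate cycle.

```python
import math
import statistics

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = dropped out)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Three made-up evaluation runs: (true labels, predicted labels).
runs = [
    ([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0]),
    ([1, 0, 1, 1, 0, 0], [1, 0, 1, 1, 0, 0]),
    ([1, 0, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0]),
]
scores = [mcc(t, p) for t, p in runs]

# Report the mean and standard deviation across all evaluations.
print(round(statistics.mean(scores), 3), round(statistics.stdev(scores), 3))
```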
17. Methodology - Parameters
● Repeat the described process with different parameters:
a. Using different subsets of student engagement indicators
b. Adding more course information when required
c. Limiting the student activity logs that are used
d. Using different machine learning algorithms
■ e.g. Neural networks, Support vector machines, Random forests...
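Repeating the process over all combinations of parameters amounts to a grid of runs. This sketch only enumerates the grid; the subset names, algorithm labels, and time ranges are placeholders, and the actual train-and-evaluate call is elided.

```python
import itertools

# Illustrative parameter grid: each run pairs an indicator subset with a
# learning algorithm and a studied activity-log time range.
indicator_subsets = [("access_days",), ("access_days", "forum_replies")]
algorithms = ["neural_network", "svm", "random_forest"]
time_ranges = ["first_quarter", "first_half"]

runs = list(itertools.product(indicator_subsets, algorithms, time_ranges))
for subset, algo, time_range in runs:
    # A real train_and_evaluate(subset, algo, time_range) call would go here.
    pass

print(len(runs))  # 2 subsets x 3 algorithms x 2 time ranges = 12 runs
```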
18. Timeline
Task / Milestone (Date)
● Training courses and literature review (September 2016 - May 2017)
● Thesis proposal seminar and proposal submission to the Graduate Research School at UWA (May 2017)
● Learning Analytics and Educational Data Mining survey (June 2017 - December 2017)
● Paper describing the analytics framework developed and used for this research (November 2017 - July 2018)
● Paper detailing different combinations of parameters and results (July 2018 - December 2019)
● Deadline to nominate examiners and thesis submission (August 2020 - September 2020)