NCCU Ensemble Models Predict MOOC Dropout

Team: NCCU
A Linear Ensemble of Classification Models
with Novel Backward Cumulative Features
for MOOC Dropout Prediction
Chih-Ming Chen, Man-Kwan Shan, 
Ming-Feng Tsai, Yi-Hsuan Yang, 
Hsin-Ping Chen, Pei-Wen Yeh, 
and Sin-Ya Peng
Research Center for 
Information Technology Innovation, 
Academia Sinica
Department of Computer Science,
National Chengchi University

a linear combination of several models
the proposed data engineering method
it can generate many distinct feature sets

Key Point Summary
Latent Space Representation 
— Clustering Model 
— Skip-Gram Model
Backward Cumulative Features 
— Generate 30 distinct sets of features
Linear Model + Tree-based Model
alleviate the feature sparsity problem
alleviate the bias problem of statistical features
good match
weakness when using sparse features

Workflow
Train Data
(75%)
Validate Data
(25%)
Train Data
[Figure: histograms "Training Date Distribution" and "Testing Date Distribution"; x-axis: date, 10/27/2013 to 7/27/2014; y-axis: count, 0 to 200,000 (training) and 0 to 140,000 (testing)]
Split the training data based on 
the time distribution. 
stable results
— 2 settings
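
The slide does not say exactly how the time distribution drives the split. One possible reading, sketched below, is a 75% / 25% split stratified by date, so both parts follow the same date distribution. The record layout `(enrolment_id, date_string)` and the function name `split_by_date` are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def split_by_date(records, train_fraction=0.75, seed=0):
    """Split enrolments so that every date gives about train_fraction of its rows to training.

    records: list of (enrolment_id, date_string) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    by_date = defaultdict(list)
    for enrolment_id, day in records:
        by_date[day].append(enrolment_id)

    train, validate = [], []
    for day in sorted(by_date):
        ids = by_date[day]
        rng.shuffle(ids)                      # random assignment within one date
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        validate.extend(ids[cut:])
    return train, validate

# Toy data: 8 enrolments on one date, 4 on another (made-up values).
records = [(i, "2013-10-27") for i in range(8)] + [(i, "2014-01-27") for i in range(8, 12)]
train_ids, validate_ids = split_by_date(records)
```

A purely chronological cut (earliest 75% for training) would be another reasonable reading of the same slide.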

Workflow
Train Data
(75%)
Validate Data
(25%)
Learned Model
Offline Evaluation
Test Data
Train Data
Submission
method 1
cross-validation
— 2 settings
check if it leads to better performance

Workflow
Train Data
(75%)
Validate Data
(25%)
Learned Model
Offline Evaluation
Test Data
Train Data
Learned 
Model
Submission
method 2
— 2 settings
check if it leads to better performance

Workflow
Train Data
(75%)
Validate Data
(25%)
Learned Model
Offline Evaluation
Test Data
Train Data
Learned 
Model
Submission
— 2 settings

Prediction Model Overview
Logistic Regression
Gradient Boosting Classifier
Raw Data
Support Vector Classifier
Student
Course
Time
A classical approach to a general prediction task.
— 2 solutions
Features

Logistic Regression
Gradient Boosting Classifier
Gradient Boosting Decision Trees
Raw Data
Support Vector Classifier
Student
Course
Time
Backward Cumulation
feature engineering tailored to the MOOC dataset
— 2 solutions
Features

Logistic Regression
Gradient Boosting Classifier
Gradient Boosting Decision Trees
Linear Combination
Final Prediction
Raw Data
Support Vector Classifier
Student
Course
Time
Backward
Cumulation
— 2 solutions
Features

Logistic Regression
Gradient Boosting Classifier
Gradient Boosting Decision Trees
Linear Combination
Final Prediction
Raw Data
Support Vector Classifier
Student
Course
Time
Backward
Cumulation
solution 1
solution 2
xgboost
scikit-learn
— 2 solutions
http://scikit-learn.org/stable/
https://github.com/dmlc/xgboost

Logistic Regression
Gradient Boosting Classifier
Gradient Boosting Decision Trees
Linear Combination
Final Prediction
Raw Data
Support Vector Classifier
Student
Course
Time
Backward
Cumulation
Feature Extraction / Feature Engineering
— 2 solutions

Feature Extraction
Student
Course
Time
Enrolment ID
• Bag-of-words Feature 
— Boolean (0/1) 
— Term Frequency (TF)
• Probability Value 
— Ratio 
— Naive Bayes
• Latent Space Feature 
— Clustering 
— DeepWalk
Raw 
Data

Feature Extraction
Student
Course
Time
Enrolment ID
— Ratio 
— Naive Bayes
— Clustering 
— DeepWalk
Raw 
Data
describing the status
e.g. 
the month of the course 
the number of registrations
video 5
problem 10
wiki 0
discussion 2
navigation 0
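
A minimal sketch of the two bag-of-words variants, using the event counts shown above. It assumes that "Term Frequency" means the raw count per event type (a normalised frequency is equally plausible); the names `EVENT_TYPES` and `bag_of_words` are illustrative.

```python
# Event types taken from the example on the slide.
EVENT_TYPES = ["video", "problem", "wiki", "discussion", "navigation"]

def bag_of_words(event_counts):
    """Return (boolean, term_frequency) vectors ordered as EVENT_TYPES."""
    tf = [event_counts.get(event, 0) for event in EVENT_TYPES]   # raw counts (assumption)
    boolean = [1 if count > 0 else 0 for count in tf]            # 0/1 presence flags
    return boolean, tf

counts = {"video": 5, "problem": 10, "wiki": 0, "discussion": 2, "navigation": 0}
boolean_vec, tf_vec = bag_of_words(counts)
# boolean_vec -> [1, 1, 0, 1, 0]
# tf_vec      -> [5, 10, 0, 2, 0]
```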

Feature Extraction
Student
Course
Time
Enrolment ID
— Ratio 
— Naive Bayes
— Clustering 
— DeepWalk
Raw 
Data
dropout probability
objects: O = {O1, O2, …, Od}
P(dropout | containing objects) := P(O1|dropout) … P(Od|dropout)
e.g. 
the dropout ratio of the course
estimate the probability from observed data
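
The two probability features can be sketched as follows. The ratio is the observed share of dropouts. The second score follows the product of per-object likelihoods written above, computed in log space. Add-alpha smoothing, the Bernoulli-style estimate of P(object | dropout), and all function names are assumptions, since the slides do not specify the estimator.

```python
import math
from collections import Counter

def course_dropout_ratio(labels):
    """Fraction of enrolments labelled 1 (dropout)."""
    return sum(labels) / len(labels)

def fit_likelihoods(object_sets, labels, alpha=1.0):
    """Estimate P(object | dropout) as the smoothed share of dropout enrolments containing it."""
    dropout_sets = [objs for objs, y in zip(object_sets, labels) if y == 1]
    counts = Counter(o for objs in dropout_sets for o in set(objs))
    n = len(dropout_sets)
    vocabulary = {o for objs in object_sets for o in objs}
    return {o: (counts[o] + alpha) / (n + 2 * alpha) for o in vocabulary}

def log_score(objects, likelihoods):
    """log of P(O1|dropout) ... P(Od|dropout) over the objects of one enrolment."""
    return sum(math.log(likelihoods[o]) for o in set(objects) if o in likelihoods)

# Toy data (made up): objects touched by four enrolments and their labels.
object_sets = [{"v1", "p1"}, {"v1"}, {"p1", "p2"}, {"v1", "p2"}]
labels = [1, 1, 0, 1]

ratio = course_dropout_ratio(labels)
likelihoods = fit_likelihoods(object_sets, labels)
score = log_score({"v1", "p1"}, likelihoods)
```

Working in log space avoids numerical underflow when an enrolment contains many objects.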

Feature Extraction
Student
Course
Time
Enrolment ID
— Ratio 
— Naive Bayes
— Clustering 
— DeepWalk
Raw 
Data
Latent Topic
K-means Clustering on 
1. registered courses 
2. containing objects
some features are sparse
DeepWalk / Skip-Gram 
for obtaining a dense 
feature representation

DeepWalk
https://github.com/phanein/deepwalk
Goal: find a representation for each node of a graph.
It extends word2vec's Skip-Gram model.

DeepWalk
Goal: find a representation for each node of a graph.
It extends word2vec's Skip-Gram model.
The core idea is to model context information
(in practice, the node's neighbours).
Similar objects are mapped to nearby points in the latent space.

From DeepWalk to the MOOC Problem
U1
U2
U3
Course A
Course B
Course C
Course D
Course E

U1
U2
U3
Course A
Course B
Course C
Course D
U1
Course B
Course E
Course C
U2 U1
Random Walk
Treat random walks on the heterogeneous graph as sentences.

U1
U2
U3
Course A
Course B
Course C
Course D
U1
Course B
Course E
Course C
U2 U1
Random Walk
Treat random walks on the heterogeneous graph as sentences.
U1: 0.3 0.2 -0.1 0.5 -0.8
Course B: 0.1 0.3 -0.5 1.2 -0.3
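
The random-walk step can be sketched with the standard library alone. The edge list below is made up (the slide's diagram does not survive in text form), and the parameters `walks_per_node` and `length` are illustrative choices, not values from the slides. The resulting "sentences" would then be passed to a Skip-Gram trainer to obtain dense vectors like the ones shown above.

```python
import random

def build_graph(enrolments):
    """Undirected user-course graph stored as an adjacency dict."""
    graph = {}
    for user, course in enrolments:
        graph.setdefault(user, []).append(course)
        graph.setdefault(course, []).append(user)
    return graph

def random_walk(graph, start, length, rng):
    """Uniform random walk of `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def generate_sentences(graph, walks_per_node=2, length=6, seed=0):
    """Start walks_per_node walks from every node; each walk is one sentence."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(walks_per_node):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            sentences.append(random_walk(graph, node, length, rng))
    return sentences

# Hypothetical enrolment edges between users U1..U3 and courses A..E.
enrolments = [("U1", "Course A"), ("U1", "Course B"), ("U2", "Course B"),
              ("U2", "Course C"), ("U2", "Course E"), ("U3", "Course C"),
              ("U3", "Course D")]
graph = build_graph(enrolments)
sentences = generate_sentences(graph)
```

Because the graph is bipartite, every walk alternates between user nodes and course nodes, so users and courses end up in the same embedding space.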

Performance
Bag-of-words: > 0.890
Bag-of-words, Probability: > 0.901
Bag-of-words, Probability, Naive Bayes: > 0.902
Bag-of-words, Probability, Naive Bayes, Latent Space: > 0.903
Backward Cumulation
Models Combination

Backward Cumulative Features — Motivation
Logs Table
     10/13 10/14 10/15 10/16 10/17 10/18 10/19 10/20 10/21 10/22 10/23 10/24 10/25 10/26 10/27
U1   O     X     O     X     O     O     X     O     X     X     X     X     X     X     O
U2   X     X     X     X     O     X     X     X     X     X     O     O     O     X     X
U3   X     X     X     X     X     X     X     X     X     X     X     O     X     O     X

Backward Cumulative Features — Motivation
O X O X O O X O O X X X X X O
different period
Logs Table
different number of logs
10/13 10/14 10/15 10/16 10/17 10/18 10/19 10/20 10/21 10/22 10/23 10/24 10/25 10/26 10/27
U1
U2
U3

Backward Cumulative Features
Consider only the logs in the last N days, e.g. N=2.
10/13 10/14 10/15 10/16 10/17 10/18 10/19 10/20 10/21 10/22 10/23 10/24 10/25 10/26 10/27
U1
U2
U3
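
A minimal sketch of backward cumulation: for each N from 1 to 30, count events per type using only the logs from the last N days. It assumes the window is measured backwards from a reference day `end_date` (inclusive), and that the counted features are per-event-type totals; both are assumptions, as are the function name and the toy log entries.

```python
from collections import Counter
from datetime import date, timedelta

EVENT_TYPES = ["video", "problem", "wiki", "discussion", "navigation"]

def backward_cumulative_features(logs, end_date, max_days=30):
    """Return {N: counts per event type over the last N days up to end_date}.

    logs: list of (day, event_type) pairs for one enrolment (hypothetical layout).
    """
    feature_sets = {}
    for n in range(1, max_days + 1):
        window_start = end_date - timedelta(days=n - 1)   # N calendar days, inclusive
        counts = Counter(event for day, event in logs
                         if window_start <= day <= end_date)
        feature_sets[n] = [counts[event] for event in EVENT_TYPES]
    return feature_sets

# Made-up logs; the year is arbitrary.
logs = [(date(2013, 10, 27), "video"),
        (date(2013, 10, 26), "problem"),
        (date(2013, 10, 20), "video")]
feature_sets = backward_cumulative_features(logs, end_date=date(2013, 10, 27))
# feature_sets[1] -> [1, 0, 0, 0, 0]
# feature_sets[2] -> [1, 1, 0, 0, 0]
# feature_sets[8] -> [2, 1, 0, 0, 0]
```

Each value of N yields one feature set, which gives the 30 distinct sets mentioned in the summary.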

Raw Data
Student
Course
Time
Backward Cumulation
N=1: Feature Set 1
N=2: Feature Set 2
N=3: Feature Set 3
...
N=29: Feature Set 29
N=30: Feature Set 30
— 2 strategies

Raw Data
Student
Course
Time
Backward Cumulation
N=1: Feature Set 1
N=2: Feature Set 2
N=3: Feature Set 3
...
N=29: Feature Set 29
N=30: Feature Set 30
Classifier
Strategy 1. Concatenate all features.
— 2 strategies

Raw Data
Student
Course
Time
Backward Cumulation
N=1: Feature Set 1 -> Classifier
N=2: Feature Set 2 -> Classifier
N=3: Feature Set 3 -> Classifier
...
N=29: Feature Set 29 -> Classifier
N=30: Feature Set 30 -> Classifier
Average
Strategy 2. Build 30 distinct models.
— 2 strategies
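
The two strategies differ only in where the feature sets are merged. The sketch below uses three toy feature sets instead of 30 and a stand-in scoring function in place of a trained classifier; `toy_model` and both helper names are hypothetical.

```python
def concatenate_feature_sets(feature_sets):
    """Strategy 1: join all per-N vectors into one long vector for a single classifier."""
    return [value for n in sorted(feature_sets) for value in feature_sets[n]]

def average_model_predictions(models, feature_sets):
    """Strategy 2: model N scores feature set N; the final score is the plain mean."""
    scores = [models[n](feature_sets[n]) for n in sorted(feature_sets)]
    return sum(scores) / len(scores)

def toy_model(features):
    """Stand-in for a trained classifier: maps a feature vector to a score in [0, 1]."""
    return min(1.0, sum(features) / 4)

feature_sets = {1: [1, 0], 2: [1, 1], 3: [2, 1]}       # made-up values
models = {n: toy_model for n in feature_sets}

concatenated = concatenate_feature_sets(feature_sets)   # input for strategy 1
averaged = average_model_predictions(models, feature_sets)
```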

Logistic Regression
Gradient Boosting Classifier
Gradient Boosting Decision Trees
Linear Combination
Final Prediction
Raw Data
Support Vector Classifier
Student
Course
Time
Backward
Cumulation
solution 1 * 0.5
solution 2 * 0.5
xgboost
scikit-learn
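
The final step is a weighted sum of the two solutions' predictions with weights 0.5 and 0.5, as shown above. A minimal sketch, with made-up prediction values and a hypothetical helper name:

```python
def linear_combination(predictions, weights):
    """Weighted sum of several prediction lists, one weight per list."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction list")
    size = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) for i in range(size)]

solution_1 = [0.9, 0.2, 0.6]   # e.g. dropout probabilities from solution 1 (made up)
solution_2 = [0.7, 0.4, 0.8]   # e.g. dropout probabilities from solution 2 (made up)
final_prediction = linear_combination([solution_1, solution_2], [0.5, 0.5])
```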

What We Learned from the Competition
• Teamwork is important
— share ideas 
— share solutions
• Model diversity & feature diversity 
— diverse models / features can capture different characteristics of the data
• Understand the data
— the goal 
— the evaluation metric 
— the data structure

— share ideas 
— share solutions
— the goal 
Start earlier …
several things to be discussed, e.g.
Feature Format
Data Partition
Feature Scale

changecandy [at] gmail.com
Any Questions?
