VSS 2011 Data Mining (Thursday, 10:45)

Towards the Development of a Real-Time
Decision Support System for Online Learning,
Teaching and Administration

Kerry Rice, Ed.D.
Associate Professor and Chair
Andy Hung, Ed.D.
Assistant Professor
Yu-Chang Hsu, Ph.D.
Assistant Professor

M.S. in Educational Technology
Masters in Educational Technology
Ed.D. in Educational Technology
K-12 Online Teaching Endorsement

Graduate Certificates:
Online Teaching - K12 & Adult Learner
Technology Integration Specialist
School Technology Coordinator
Online Teacher PD Portal
Game Studio: Mobile Game Design
Learning Technology Design Lab

EDTECH Fast Facts

• Largest graduate program at BSU
• Fully online, self-support program
• Served over 1,200 unique students last year
• Interdisciplinary partnerships with Math, Engineering,
Geoscience, Nursing, Psychology, Literacy, Athletics.
• Partnerships with iNACOL, AECT, ISTE, Google,
Stanford, IDLA, Connections Academy, K12, Inc., ID
State Department of Education, Discovery Education,
Nicolaus Copernicus University, Poland
• First dual degree program – National University of
Tainan.
• Saves 200+ tons of CO2 emissions annually

Image created using wordle: http://www.wordle.net/

Going Virtual! Research Series

2007: The Status of Professional Development

• Who delivered/received PD?
• When and how was PD delivered?
• Content and sequence of PD?

2008: Unique Needs and Challenges

• Amount of PD?
• Preferred delivery format?
• Most important topics for PD?

2009: Effective Professional Development of K-12 Online Teachers

• Program evaluations
• Complexities of measuring “effectiveness”

2010: The Status of PD and Unique Needs of K-12 Online Teachers

• Revisit questions from 2007 & 2008
• What PD have you had? What do you need?

2011: Development of an Educational Data Mining model

• Pass Rate Predictive Model
• Engagement
• Association Rules


Going Virtual! 2007, 2008, and 2010
Descriptive

Going Virtual! 2007
258 Respondents
• 167 K-12 online teachers
• 61 Administrators
• 14 Trainers
Over 40 virtual schools and online programs
Over 30 states

Going Virtual! 2008
884 K-12 Online Teachers
• 727 virtual schools
• 99 supplemental programs
• 54 brick and mortar online programs
Over 60 virtual schools and online programs
Over 30 states

Going Virtual! 2010
830 K-12 Online Teachers
• 417 Virtual School
• 318 Supplemental
• 81 Blended
• 12 Brick and Mortar
Over 50 virtual schools and online programs
Over 40 states & 24 countries
Going Virtual 2011
Evaluative

Goals:
• Program evaluation
• Develop cloud-based, real-time Decision Support System (DSS)
• Link PD effectiveness to student outcomes

Traditional
• Virtual Charter
• Supplemental Program

With DATA MINING
• Online Teacher PD Workshops
• Online Graduate Courses
• End of Year Program Evaluation

Traditional Evaluation Systems

[Diagram: indicators grouped under Teacher Effectiveness, Program, and Student Outcomes]
Indicators shown: Highly qualified?; AYP?; Performance; Parent Satisfaction; Improved Test Scores; Participation; Annual Performance; Parent Satisfaction; Attendance; Range of implementation; ISAT/DWA; Student Satisfaction; Self-Efficacy; Knowledge of STS; Satisfaction

Leveraging Data Systems

[Diagram: measures grouped under PD Effectiveness, Teacher Effectiveness, and Student Outcomes]
Measures shown: Quality; Change in teaching practice; Satisfaction; Usefulness; Quantity AND Quality of Interaction; Engagement (shown twice); Course Design; Dropout Rate; Performance; Learning Patterns
Data sources shown: Self report; Low-level data

Data Mining

Data mining techniques can be applied in online
environments to understand hidden relationships
between logged activities, learner experiences, and
performance. They can be used in education to track learner
behaviors, identify struggling students, depict learning
preferences, improve course design, personalize
instruction, and predict student performance.

Educational Data Mining

Special Challenges
• Learning behaviors are complex
• Target variables (learning outcomes/performance)
require wide range of assessments and indicators
• Goal of improving online teaching and learning is hard
to quantify
• Limited number of DM techniques suitable to meet
educational goals
• Only interactions that occur in the LMS can be tracked
through data mining. What if learning occurs outside
the LMS?
• Still a very intensive process to identify rules and
patterns

DM Applications in Education
• Pattern discovery (data visualization, clustering, sequential
path analysis)
– Track students’ learning progress
– Identify outliers (outstanding or at-risk students)
– Depict students’ learning preferences (learner profiling)
– Identify relationships of course components (web
mining)
• Predictive Modeling (decision tree analysis)
– Suggest personalized activities (classification prediction)
– Foresee student performance (numeric prediction)
– Adaptive evaluation system development
• Algorithm generation: analysis methods can be integrated
into platforms.

Data Preprocessing

• Data Collection
• Data Cleaning
• Session Identification
• Behavior Identification
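
The session identification step can be illustrated with a minimal Python sketch. It assumes each log record is a (user ID, timestamp) pair and that more than 30 minutes of inactivity starts a new session. Both the record layout and the 30-minute threshold are illustrative assumptions, not details given in these slides.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; the slides do not state the value used.
SESSION_GAP = timedelta(minutes=30)

def identify_sessions(records, gap=SESSION_GAP):
    """Group (user_id, timestamp) log records into per-user sessions.

    A new session starts whenever the time since the user's previous
    record exceeds `gap`. Returns {user_id: [[timestamps...], ...]}.
    """
    sessions = {}
    for user_id, ts in sorted(records):
        user_sessions = sessions.setdefault(user_id, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap:
            user_sessions[-1].append(ts)   # continue the current session
        else:
            user_sessions.append([ts])     # start a new session
    return sessions

# Hypothetical log records, for illustration only.
logs = [
    ("u1", datetime(2010, 3, 1, 9, 0)),
    ("u1", datetime(2010, 3, 1, 9, 10)),
    ("u1", datetime(2010, 3, 1, 13, 0)),
    ("u2", datetime(2010, 3, 1, 9, 5)),
]
sessions = identify_sessions(logs)
session_counts = {user: len(s) for user, s in sessions.items()}
```

Once sessions exist, per-session behaviors (reading, posting, and so on) can be labeled in the behavior identification step.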

3 Data Mining Studies

• Study #1: Teacher Training Workshops 2010
– Survey Data + Data Mining + Student Outcomes
• Study #2: Graduate Courses 2010
– Data Mining + Student Outcomes (no demographic data)
• Study #3: End of Year K-12 Program Evaluation
(2009 – 2010)
– Data Mining + Student Outcomes + Demographic Data
+ Survey Data

Study #1: Teacher Training Workshops 2010

• Survey Data + Data Mining + Student Outcomes
• Research Goal: To demonstrate the potential
applications of data mining with a case study
– Program evaluation of workshop quality for continuous
improvement of design and delivery.
– Evaluation of PD impact on both teachers (and
students).

Study #1: Teacher Training Workshops 2010

• Blackboard
• 103 participants
• 31,417 learning logs
• clustering analysis, sequential association
analysis, and decision tree analysis
• Engagement variables
– Frequency of logins
– Length of time online (survey and data mining)
– Frequency of content access
– Number of discussion posts

Learning Paths

• Association Rule Analysis
– Participants tended to switch between content and
discussion within one session.
– Different types of interactions (content-participant,
participant-instructor, and participant-participant) were
well facilitated in the workshops overall.
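
As a rough illustration of how a rule such as "content access → discussion access within one session" might be scored, the sketch below computes support, confidence, and lift. It assumes each session has already been reduced to a set of activity types, and it ignores the order of events, so it is a plain co-occurrence rule rather than the sequential association analysis used in the study. The function name and sample sessions are hypothetical.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent.

    `sessions` is a list of sets, one set of activity types per session.
    """
    n = len(sessions)
    n_a = sum(1 for s in sessions if antecedent in s)
    n_c = sum(1 for s in sessions if consequent in s)
    n_both = sum(1 for s in sessions if antecedent in s and consequent in s)
    support = n_both / n                          # share of sessions with both
    confidence = n_both / n_a if n_a else 0.0     # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0 # confidence vs. base rate
    return support, confidence, lift

# Hypothetical sessions, for illustration only.
sessions = [
    {"content", "discussion"},
    {"content", "discussion"},
    {"content"},
    {"discussion"},
    {"content", "discussion", "grades"},
]
support, confidence, lift = rule_metrics(sessions, "content", "discussion")
```

A lift above 1 would suggest the two activities co-occur more often than chance; a lift below 1 suggests the opposite.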

Performance
Pass Rate Predictive Model
• Decision Tree Analysis
– Grades and pass rate improved
(from 88% to 92% and from
89% to 94%, respectively)
when participants logged into
the LMS more than 10 times
over six weeks. The average
for both rose further, to 98%,
when the frequency of
logins increased to 17 times.

Increased logins = Increased performance
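
A decision tree arrives at a cut such as "more than 10 logins" by comparing outcomes on either side of each candidate threshold. The sketch below shows a single split chosen by weighted Gini impurity. It is an assumed, simplified stand-in for whatever tool the study used, and the login counts and pass/fail labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(logins, passed):
    """Find the login-count cut that minimizes weighted Gini impurity.

    Returns (threshold, pass_rate_at_or_below, pass_rate_above).
    """
    pairs = list(zip(logins, passed))
    best = None
    for t in sorted(set(logins))[:-1]:      # candidate cuts: x <= t vs. x > t
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct login counts")
    return best[1:]

# Hypothetical participants: login counts and pass (1) / fail (0).
logins = [3, 5, 8, 10, 12, 15, 17, 20]
passed = [0, 0, 1, 0, 1, 1, 1, 1]
threshold, rate_low, rate_high = best_threshold(logins, passed)
```

With this toy data the chosen cut is 10 logins; real data would of course yield its own threshold.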

Quality of Experience

Engagement
• Clustering + Survey Questions
– More time spent online = more time spent offline.
– Previous online teaching experience = more hours
spent both online and offline.

DM Conclusions

• Interaction and engagement were important factors in
learning outcomes.
• The results indicate that the workshops were well
facilitated, in terms of interaction.
• Participants who had online teaching experience could be
expected to have a higher engagement level but prior
online learning experience did NOT show a similar
relationship.
• Time spent online and average course logins were directly
related to engagement and performance. Specifically, more
time spent online and a higher frequency of logins were
associated with increased engagement and improved
performance.

Overall Conclusions

• Two factors influenced expectation ratings:
– Practical new knowledge
– Ease of locating information
• Three factors influenced satisfaction ratings:
– Usefulness of subject-matter
– Well-structured website
– Sufficient technical support
• Instructor quality was related to:
– Stimulated interest
– Preparation for class
– Respectful treatment of students
– Peer collaboration
– Assessments aligned to course objectives
– Support services for technical problems

Study #2: Graduate Courses 2010

• Data Mining + Student Outcomes (no
demographic data)
• Research Goal: To demonstrate the potential
applications of data mining with a case study
– Generate personalized advice
– Identify struggling students
– Adjust teaching strategies
– Improve course design
– Data Visualization
• Study Design
– Comparative (between and within courses)
– Random course selection

Study #2: Graduate Course 2010

• Moodle
• Two graduate courses (X and Y)
• Each with two sections
– X1 (18 students)
– X2 (19 students)
– Y1 (18 students)
– Y2 (22 students)
• 2,744,433 server logs

Study #2: Graduate Course 2010

• Variables
– IDs (user and session)
– Learning Behaviors (reading materials, posting discussions)
– Time/duration
– Grades or pass/fail (dependent variables)

Learner Behaviors
Weekday Student Patterns
Weekday Course Patterns

Weekday and Time Patterns of Learning
Behaviors

• Reading is the major activity; Similar patterns
• Sunday => reply discussions
• Monday & Tuesday, between 1pm and midnight
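
Weekday and time-of-day patterns like these can be tabulated directly from timestamped log events. The sketch below counts events by weekday, hour, and action. The event layout and the action names ("read", "reply") are assumptions for illustration, not the study's schema.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_hour_counts(events):
    """Count events per (weekday name, hour of day, action)."""
    counts = Counter()
    for ts, action in events:
        counts[(WEEKDAYS[ts.weekday()], ts.hour, action)] += 1
    return counts

# Hypothetical events, for illustration only.
events = [
    (datetime(2010, 3, 1, 14, 5), "read"),    # a Monday
    (datetime(2010, 3, 1, 14, 40), "read"),
    (datetime(2010, 3, 2, 21, 0), "read"),    # a Tuesday
    (datetime(2010, 3, 7, 20, 15), "reply"),  # a Sunday
]
counts = weekday_hour_counts(events)

# Collapse the hour dimension to get weekday-by-action totals.
by_weekday = Counter()
for (day, hour, action), n in counts.items():
    by_weekday[(day, action)] += n
```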

Shared Student Characteristics
Course X

Shared Student Characteristics
Course Y

Predictive Analysis – Course X
Discussion board posts and
replies were the most
important variable for
predicting performance
(27+ replies = better
performance)

Some lower performers
had high reply numbers (>
43)

Cluster analysis revealed
that students tended to
only read discussions.

Predictive Analysis – Course Y
Number of discussion
board posts read was the
most important predictor of
performance (378+ =
better performance)

Fewer discussions read +
more replies (54+ = better
performance)

The design of course Y
improved the quality of
discussions and influenced
student behaviors.

Study #3: End of Year K-12 Program Evaluation

• Demographics + Survey Data + Data Mining +
Student Outcomes
• Research Goal: Large scale program evaluation
– How can the proposed program evaluation framework
support decision making at the course and institutional
level?
– Identify key variables and examine potential
relationships between teacher and course satisfaction,
student behaviors, and student performance outcomes

Study #3: End of Year K-12 Program Evaluation
(2009 – 2010)

• Blackboard LMS
• 7500 students
• 883 courses
• 23,854,527 learning logs (over 1 billion records)

Total Variables = 22

stuID, Age, City, District, Grade_Avg, Click_Avg,
Content_Access_Avg, Course_Access_Avg, Page_Access_Avg,
DB_Entry_Avg, Tab_Access_Avg, Login_Avg, Module_Avg, Gender,
HSGradYear, School, No_Course, No_Fail, No_Pass, Pass rate,
cSatisfaction_Avg, iSatisfaction_Avg

Engagement

• Average frequency of logins per course
• Average frequency of tab accesses per course
• Average frequency of module accesses per course
• Average frequency of clicks per course
• Average frequency of course accesses (from the
Blackboard portal)
• Average frequency of page accesses per course (page tool)
• Average frequency of course content accesses per course
(content tool)
• Average number of discussion board entries per course
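
Each engagement measure above is a per-course average for a student. The sketch below shows one way such averages might be derived, assuming one row of activity counts per student and course. The row layout, key names, and course codes are hypothetical.

```python
from collections import defaultdict

def per_course_averages(rows):
    """Average each activity count across a student's courses.

    `rows` is a list of dicts with keys 'stuID', 'course', and one
    numeric count per engagement measure (e.g. 'logins', 'db_entries').
    Assumes exactly one row per student and course.
    Returns {stuID: {measure + '_avg': value}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_courses = defaultdict(int)
    for row in rows:
        student = row["stuID"]
        n_courses[student] += 1
        for key, value in row.items():
            if key not in ("stuID", "course"):
                totals[student][key] += value
    return {
        student: {f"{k}_avg": v / n_courses[student] for k, v in measures.items()}
        for student, measures in totals.items()
    }

# Hypothetical rows, for illustration only.
rows = [
    {"stuID": "s1", "course": "ALG1", "logins": 40, "db_entries": 6},
    {"stuID": "s1", "course": "BIO1", "logins": 20, "db_entries": 2},
    {"stuID": "s2", "course": "ALG1", "logins": 10, "db_entries": 0},
]
averages = per_course_averages(rows)
```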

Cluster Analysis - by Student
Spring 2010

Cluster Analysis - by Student

• High engagement = high performance
• The optimal number of courses = 1 to 2 per semester
• Older students (age > 16.91) tended to take more than two
courses, with pass rates ranging from 54.09% to 56.11%
• High-engaged students demonstrated engagement levels
twice that of low-engaged students
• Female students were more active than male students in
online discussions (higher DB_Entry_Avg)
• Female students had higher pass rates than male students
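
Grouping students into high- and low-engagement clusters can be sketched with plain k-means (Lloyd's algorithm). The slides do not say which clustering algorithm or software was used, so k-means is an assumption here. The two features (an engagement index and a pass rate, both scaled to 0-1) and the data points are hypothetical.

```python
def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.

    `centroids` are the starting centers; returns (centroids, labels).
    """
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's center
        if new_centroids == centroids:
            break                          # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical students: (engagement index, pass rate), both scaled 0-1.
points = [(0.2, 0.5), (0.3, 0.55), (0.25, 0.6),
          (0.8, 0.9), (0.85, 0.95), (0.9, 0.85)]
centroids, labels = kmeans(points, [points[0], points[3]])
```

In practice, features on different scales (for example raw click counts and ages) would need standardizing before clustering, since k-means relies on distances.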

Cluster Analysis – by Course

The lowest-performing courses identified (Math, Science, and
English) were analyzed with cluster analysis.
• High-engaged + high performance = good design and good
implementation?
• High engaged + low performance = bad design and good
implementation?
• Low engaged + low performance = bad design and bad
implementation?


Subject areas in which the level of activity was
consistent with student outcomes:
– High Performance and High Engagement = Driver
Education, Electives, Foreign Language, Health, and
Social Studies
– Low Engagement and Low Performance = English

Subject areas in which the level of activity was
inconsistent with student outcomes:
– High Engagement and Low Performance = Math and
Science. Why?
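
The engagement and performance groupings above can be expressed as a simple quadrant check that flags courses whose activity level does not match their outcomes. The cut points and the per-course numbers below are hypothetical, chosen only so the labels mirror the subject areas named on the slide.

```python
def classify_courses(courses, engagement_cut, performance_cut):
    """Label each course high/low on engagement and on performance.

    `courses` maps a course name to (engagement, pass_rate). The two
    cut points are supplied by the analyst (assumed values here).
    """
    labels = {}
    for name, (engagement, pass_rate) in courses.items():
        e = "high" if engagement >= engagement_cut else "low"
        p = "high" if pass_rate >= performance_cut else "low"
        labels[name] = (e, p)
    return labels

# Hypothetical per-course values: (average clicks, pass rate).
courses = {
    "Health": (120, 0.90),
    "Foreign Language": (110, 0.88),
    "Math": (130, 0.55),
    "Science": (125, 0.58),
    "English": (60, 0.50),
}
labels = classify_courses(courses, engagement_cut=100, performance_cut=0.70)

# Courses where engagement and performance disagree merit a closer look.
inconsistent = sorted(name for name, (e, p) in labels.items() if e != p)
```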


• Regardless of the content area or level of engagement, low
performance courses were entry-level
• Most high-engaged, high performance courses were
advanced level courses.
• Regardless of Math, Science, or English subject-matter,
entry level courses tended to have lower performance
whether students were categorized as low-engaged or high-
engaged.
• The reasons students enrolled in a course may influence
their engagement level and performance. Student survey
responses indicated that students who retook courses they
had previously failed tended to demonstrate lower
engagement and lower performance.

Predictive Analysis – Pass Rate

• Positive correlation between engagement level and
performance (higher engaged => higher performance)
• Engagement level and gender have stronger effects on
student final grades than age, school district, school, and
city. For most students, high engaged => high performance
• Overall, female students performed better than male
students
• Students who were around 16 years old or younger
performed better than those who were 18 years or older.
• Compared with other Blackboard components such as
discussion board entries and content access, tab access
had negative effects on student performance (higher
tab access => lower performance)

Predictive Analysis – Course Satisfaction

• Students with higher average final grades (> 73.25) had
higher course satisfaction.
• Students who passed all courses or passed some of their
courses had higher course satisfaction than all-failed
students.
• Students who took two or more courses in Spring 2010,
whether they passed those courses or not, had higher
course satisfaction.
• Female students had higher course satisfaction than male
students.
• Online behaviors (i.e., frequency of page accessed and
number of discussion board entries) had minor effects on
course satisfaction (higher frequency/number => higher
course satisfaction).

Predictive Analysis – Instructor Satisfaction

• Students with higher average final grades (> 73.25%) indicated
higher instructor satisfaction.
• Students who took two or more courses in Spring 2010, whether
they passed those courses or not, showed higher instructor
satisfaction.
• Female students indicated higher instructor satisfaction than male
students.
• Online behaviors (frequency of module accessed) had minor
effects on instructor satisfaction (higher frequency => higher
instructor satisfaction).
• Older students (> 17.5 years old) had higher instructor
satisfaction.

Regression Analysis

• Spring 2010 – Survey data + Data Mining
• Purpose: To identify which variables contributed
significantly toward students’ average final grade.
• Positive (higher values, higher average final grade)
– Self-reported GPA (Likert-scale type of response)
– Satisfaction toward positive experience (Likert-scale type of
response)
– Satisfaction toward course content (Likert-scale type of
response)
– Time on coursework (Likert-scale type of response)
– Course access (based on LMS server log data)
• Negative (higher values, lower average final grade)
– Effort and challenge (based on Likert-scale type of response on
the survey)
– Tab access (based on LMS server log data)

Conclusions

• Higher-engaged students usually had higher
performance
– This held only for courses that were well designed and
well implemented. In this study, entry-level courses tended
to have lower performance whether students were
categorized as low-engaged or high-engaged.
• Satisfaction and engagement levels could not
guarantee high performance

Characteristics of successful students

• Female
• 16.5 years or younger
• Took one or two courses per semester
• Took Foreign Language or Health course
• Lived in larger cities

Characteristics of at-risk students

• Male
• 18 years or older
• Took more than two courses per semester
• Took entry-level courses in Math, Science, or
English
• Lived in smaller cities
