As the spotlight of increased transparency and accountability continues to shine on higher education, more granular data on student retention and graduation has become a critical component of decision making for both faculty and staff. An extensive program-level retention and graduation report is needed to inform faculty and staff about the outcomes of their efforts and how to improve going forward. While this kind of data is valuable for reflection and summative assessment, there is a growing need for data to become more predictive, so that preventative steps can be taken in a more formative style of assessment. This session will explore the reporting of program-level retention and graduation and what the future holds for more predictive insights through data mining and machine learning.
4. The Problem
• We are asked for program-level data
• Lack of program-level retention and graduation rates
• Complicated (particularly for undergrads)
• Different programs serve different purposes
• High stakes – program prioritization (AA driven)
• Reports lumped all non-retained together (whether they graduated or stopped out)
7. What We Wanted
• Solid & simple approach (easy to explain and defend)
• Fair
• Useful for all types of programs
• Meaningful for decision-making (high- and low-level)
• Not overly complicated display
• Illuminates
• Overall performance
• Historic trends
• When are students lost
• Something that can be generated yearly w/o too much effort
8. 5 Outcomes
• Five possible outcomes for each student that declares a given major
• Retained in program
• Graduated in program
• Retained in different program
• Graduated in different program
• Not retained (stop-out/drop-out)
• Exclusive and exhaustive
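The five flags above can be sketched as one exclusive-and-exhaustive classification. This is a minimal sketch; the field names (`enrolled`, `graduated`, `current_major`, `cohort_major`) are hypothetical stand-ins for the real student-record columns:

```r
# Assign exactly one of the 5 outcome flags to a student.
# Each branch returns, so the flags are exclusive and exhaustive.
flag_outcome <- function(enrolled, graduated, current_major, cohort_major) {
  if (graduated && current_major == cohort_major) return("Graduated in program")
  if (graduated)                                  return("Graduated in different program")
  if (enrolled && current_major == cohort_major)  return("Retained in program")
  if (enrolled)                                   return("Retained in different program")
  "Not retained (stop-out/drop-out)"
}
```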
9. General Approach
• Based on cohorts:
• A student is placed in a program cohort the 1st time they declare a given program
• Each student in the cohort is flagged as one of the 5 possible outcomes for each ½-year interval (each regular semester)
• At each interval we report where the members of the cohort fall
• Each student will only appear in one cohort for a program (usually)
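The interval report described above can be sketched in a few lines of base R, using toy data in place of the real cohort table:

```r
# Toy data: one row per student per semester interval, with an outcome flag
flags <- data.frame(
  interval = c(1, 1, 1, 2, 2, 2),
  outcome  = c("Retained in program", "Retained in program", "Not retained",
               "Graduated in program", "Retained in program", "Not retained")
)
# Share of the cohort in each outcome at each interval (each row sums to 1)
report <- prop.table(table(flags$interval, flags$outcome), margin = 1)
```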
13. Summary
All students lumped into 3 groups
This is the most summarized data we can (read: are willing to) provide.
The bottom line = Bold 3-group number
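One plausible collapse of the five flags into three summary groups can be sketched as follows; the exact grouping and labels used in the real report are assumptions here:

```r
# Hypothetical mapping from the 5 outcome flags to 3 summary groups
to_group <- function(flag) {
  if (grepl("^Graduated", flag)) return("Graduated")
  if (grepl("^Retained", flag))  return("Still enrolled")
  "Not retained"
}
flags5 <- c("Graduated in program", "Graduated in different program",
            "Retained in program", "Retained in different program",
            "Not retained (stop-out/drop-out)")
groups3 <- vapply(flags5, to_group, character(1))
```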
14. But how does that compare?
This is average program data to use as a comparison
15. Horizontal stacked bar
• Normally we would never do this, but …
What people are asking:
1. How is the program performing?
2. How are students performing overall at institution?
3. How many are dropping out?
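A horizontal stacked bar along these lines can be sketched in base R; the proportions and group names below are illustrative, not real program data:

```r
# Toy proportions: one program vs. the institutional average (rows sum to 1)
m <- rbind(Program = c(0.55, 0.30, 0.15),
           Average = c(0.50, 0.35, 0.15))
colnames(m) <- c("Graduated", "Still enrolled", "Not retained")
# Each column of t(m) becomes one horizontal bar, stacked by group
barplot(t(m), horiz = TRUE, legend.text = colnames(m),
        xlab = "Share of cohort", main = "Program vs. institutional average")
```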
16. We graph the 5 flags too
• Snapshots at 1 year, 4 years, and 6 years
• Transitions to completers and drop-outs
• Compare performance over time
19. If we study learning as a data science, we can
reverse engineer the human brain and tailor
learning techniques to maximize the chances of
student success. This is the biggest revolution that
could happen in education, turning it into a data-
driven science, and not such a medieval set of
rumors professors tend to carry on.
-Sebastian Thrun
31. ### CREATING MODEL ###
library(rpart)       # recursive partitioning (decision trees)
library(rpart.plot)  # prp() tree plotting

train_new <- DT[1:3032, ]
test_new  <- DT[3033:3790, ]
test_new$fake_retention <- NULL

# Create a new model `my_tree`
my_tree <- rpart( fake_retention ~ SAT + HS.GPA + Gender_Factor + Race_Eth,
                  data = train_new, method = "class",
                  control = rpart.control(cp = 0.0001) )

# Visualize your new decision tree
prp( my_tree, type = 4, extra = 100, faclen = 0, under = TRUE,
     compress = FALSE, ycompress = FALSE )

# Make your prediction using `my_tree` and `test_new`
my_prediction <- predict( my_tree, test_new, type = "class" )

# Create a data frame with two columns: ID & Retained; Retained holds the predictions
vector_studentID <- test_new$ID
my_solution <- data.frame( ID = vector_studentID, Retained = my_prediction )
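A natural next step is to check the tree's accuracy on held-out rows. Since `DT` itself is not shown, this sketch builds a toy stand-in data set; the variable names, signal, and split sizes are invented for illustration:

```r
library(rpart)

# Toy stand-in for the real student data (hypothetical values)
set.seed(42)
toy <- data.frame(SAT    = rnorm(300, 1000, 150),
                  HS.GPA = runif(300, 2.0, 4.0))
toy$fake_retention <- factor(ifelse(toy$HS.GPA + rnorm(300, sd = 0.4) > 3,
                                    "Yes", "No"))
train <- toy[1:240, ]
test  <- toy[241:300, ]

fit  <- rpart(fake_retention ~ SAT + HS.GPA, data = train, method = "class")
pred <- predict(fit, test, type = "class")

# Confusion matrix and overall accuracy on the held-out rows
conf_mat <- table(Predicted = pred, Actual = test$fake_retention)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
```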