4. WHY?
We have career outcome data to
derive better insights about higher
education
5. Common questions from user studies
Prospective students:
I want to be a pediatrician. Where should I go to school?
I don’t know what I want but I am an A student. So?
Current students:
Show me the internship / job opportunities.
Should I double major / change my major?
Recent graduates:
Show me the job opportunities.
Should I consider further education?
6. The Answer for the type A’s
Show me the career outcome data per school / field of study / degree
7. The Answer for the exploratory kind
Show me the career outcome data in a form that allows for
serendipitous discoveries
build me some data products to help me draw insights
from aggregate data
build me some data products that are delightful
8. OK! Let’s start building some data
products for students!
type A’s and non type A’s, we have answers for you
12. Data Science for Higher Ed
A case study
From plumbing to fixture.
From standardization to delightful data products.
13. Standardization
•
Standardization is about understanding our data, and
building the foundational layer that maps <school_name> to
<school_id> so that we can build data products on top
•
Entity resolution
•
Recognizable entities
•
Typeahead
16. Recognizable entities
•
User types in "University of California, Berkeley": easy
•
User types in "UCB": hard / ambiguous / alias not understood
•
User types in "東京大学": harder / canonical name not understood
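The three cases above can be sketched as a lookup over canonical names, aliases, and localized names. This is a minimal illustration, not the production system: the `SCHOOLS` table, its ids, and the alias lists are hypothetical sample data, and exact matching after normalization is an assumed simplification.

```python
import unicodedata

# Hypothetical recognized dataset: school_id -> canonical name, aliases, localized names.
# Ids and alias lists are illustrative only.
SCHOOLS = {
    1: {"name": "University of California, Berkeley",
        "aliases": ["UC Berkeley", "UCB"], "localized": []},
    2: {"name": "University of Tokyo",
        "aliases": ["UTokyo"], "localized": ["東京大学"]},
    3: {"name": "University of Colorado Boulder",
        "aliases": ["CU Boulder", "UCB"], "localized": []},
}


def normalize(text):
    """Unicode-normalize, case-fold, drop commas, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.replace(",", " ").split())


def build_index(schools):
    """Map every known string (canonical, alias, localized) to the ids it may mean."""
    index = {}
    for school_id, record in schools.items():
        names = [record["name"], *record["aliases"], *record["localized"]]
        for name in names:
            index.setdefault(normalize(name), set()).add(school_id)
    return index


def resolve(raw, index):
    """Return (school_id, status); school_id is None unless exactly one school matches."""
    candidates = index.get(normalize(raw), set())
    if len(candidates) == 1:
        return next(iter(candidates)), "resolved"
    if candidates:
        return None, "ambiguous"
    return None, "unrecognized"


INDEX = build_index(SCHOOLS)
```

With this sample data, the full name and the localized name each resolve to a single id, while "UCB" is reported as ambiguous because two records list it as an alias.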
17. Recognizable entities
•
You don’t know what you don’t know
•
Your standardization is only as good as your recognized
dataset
•
LinkedIn data is very global
18. Recognizable entities
•
IPEDS for US school data
•
Crowdsourcing for non-US school + government data
•
internal and external with schema spec’ed out
•
Alias – bootstrap from member data
19. Typeahead
•
Plug the hole from the front(-end) as soon as you can
•
Invest in a good typeahead early on so that you don’t even
need to standardize
•
Helps standardization rate tremendously
•
Make sure you have aliases and localized strings in your
typeahead
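A typeahead that covers aliases and localized strings can be sketched as prefix search over one sorted table. This is only an illustration under assumptions: the `ENTRIES` list and its ids are hypothetical, and a real typeahead would also rank suggestions, which is omitted here.

```python
import bisect

# Hypothetical entries: (display string, school_id). Canonical names, aliases,
# and localized strings all point at the same standardized id.
ENTRIES = [
    ("University of California, Berkeley", 1),
    ("UC Berkeley", 1),
    ("University of Tokyo", 2),
    ("UTokyo", 2),
    ("東京大学", 2),
]


def build_typeahead(entries):
    """Sort entries by case-folded text so prefix matches are contiguous."""
    return sorted((text.casefold(), text, school_id) for text, school_id in entries)


def suggest(prefix, table, limit=5):
    """Return up to `limit` (display string, school_id) pairs matching the prefix."""
    key = prefix.casefold()
    # First row whose folded text is >= the prefix; matches follow contiguously.
    start = bisect.bisect_left(table, (key,))
    results = []
    for folded, text, school_id in table[start:]:
        if not folded.startswith(key):
            break
        results.append((text, school_id))
        if len(results) == limit:
            break
    return results


TABLE = build_typeahead(ENTRIES)
```

Because every suggestion carries a `school_id`, a member who picks one never enters free text, so nothing is left to standardize after the fact.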
20. Plumbing? Check.
Onto building delightful* data products
*The level of delightfulness is directly correlated to
how good your standardization layer is.
23. Similar schools
•
Aggregate profile per school based on alumni data
•
Industry, job title, job function, company, skills, etc.
•
Feature engineering and balancing
•
Dot-product of 2 aggregate profiles = school similarity
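The steps above can be sketched as follows. The feature names and sample records are hypothetical, and scaling each profile to unit length before taking the dot product is an assumption made here for the sketch; the slides do not say how profiles are balanced.

```python
from collections import Counter
from math import sqrt


def aggregate_profile(alumni):
    """Build a school profile from alumni records.

    `alumni` is a list of dicts such as {"industry": "software", "title": "engineer"}.
    Each (feature_type, value) pair is counted, then the vector is scaled to
    unit length (an assumed balancing step, not one stated in the slides).
    """
    counts = Counter()
    for person in alumni:
        for feature_type, value in person.items():
            counts[(feature_type, value)] += 1
    norm = sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {key: c / norm for key, c in counts.items()}


def similarity(profile_a, profile_b):
    """Dot product of two sparse aggregate profiles."""
    if len(profile_a) > len(profile_b):
        profile_a, profile_b = profile_b, profile_a
    return sum(w * profile_b.get(key, 0.0) for key, w in profile_a.items())


# Hypothetical alumni records for three schools.
school_a = [{"industry": "software", "title": "engineer"},
            {"industry": "software", "title": "manager"}]
school_b = school_a * 2  # same mix of alumni, twice as many
school_c = [{"industry": "healthcare", "title": "nurse"}]

sim_ab = similarity(aggregate_profile(school_a), aggregate_profile(school_b))
sim_ac = similarity(aggregate_profile(school_a), aggregate_profile(school_c))
```

With unit-length profiles, two schools with the same mix of alumni score close to 1 regardless of size, and schools with no shared features score 0.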
24. Similar schools – issues
•
Observation #1: similarity identified between tiny
specialized schools and big research institutions
•
Observation #2: similarity identified between non-US
specialized schools and big US research institutions
26. Similar schools – issues
•
Observation: no data
•
New community colleges and non-US
schools have very sparse data
•
Solution: attribute-based similarity
•
From IPEDS and crowdsourced data
Kyoritsu Women's University
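One way to sketch attribute-based similarity is the fraction of shared attributes on which two schools agree. The attribute names and the two sample records below are hypothetical placeholders for fields that would come from IPEDS or crowdsourced data, and simple matching is an assumed metric, not necessarily the one used in the talk.

```python
def attribute_similarity(attrs_a, attrs_b):
    """Fraction of attributes known for both schools that have equal values."""
    shared = [
        key for key in attrs_a
        if key in attrs_b and attrs_a[key] is not None and attrs_b[key] is not None
    ]
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if attrs_a[key] == attrs_b[key])
    return matches / len(shared)


# Hypothetical schools with no alumni data, described only by attributes.
school_x = {"control": "private", "level": "4-year",
            "womens_college": True, "country": "JP"}
school_y = {"control": "private", "level": "4-year",
            "womens_college": False, "country": "US"}

score = attribute_similarity(school_x, school_y)
```

Because it needs no alumni records, this score is available even for new community colleges and non-US schools where the aggregate profile is empty.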
31. Wikipedia stories
•
Lightweight school standardization
•
Name mapping
•
✓ Name feature ✕ profile feature ✕ network feature
•
Even when you are notable, your name isn’t unique
•
Crowdsourcing for evaluation
•
Profile from LinkedIn vs profile from Wikipedia
33. Are we done? Do we have notable
alumni for all schools?
Same issue as with similar schools: data sparseness
34. Who’s notable - Success stories
•
Many schools don’t have notable alumni section in Wikipedia
•
Success stories based on LinkedIn data
•
Features of success
•
CXO’s at Fortune companies
•
Generalizes to high seniority at top companies
•
But what does it mean to be
•
Senior
•
A top company
•
An alum
•
They all depend on…
35. Standardization
•
Degree standardization - alumni
•
Company standardization
•
IBM vs international brotherhood of magicians
•
Title & seniority standardization
•
founder of the gloria lau franchise vs founder of LinkedIn
•
VP in financial sector vs VP in software engineering industry
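The VP example suggests that seniority has to be looked up per industry, not per title alone. A minimal sketch under assumptions: the alias table, the 1 to 5 scale, and every level below are hypothetical values chosen only to show the shape of the lookup.

```python
# Hypothetical title aliases mapping raw strings to a standardized title.
TITLE_ALIASES = {
    "vp": "vice president",
    "v.p.": "vice president",
    "vice president": "vice president",
}

# Hypothetical seniority scale: 1 (entry level) to 5 (executive).
BASE_LEVEL = {"vice president": 4}

# Hypothetical industry-specific overrides: the same title can mean
# different seniority in different industries.
INDUSTRY_LEVEL = {
    ("finance", "vice president"): 3,
    ("software", "vice president"): 5,
}


def standardize_title(raw_title):
    """Map a raw title string to a standardized title, or None if unknown."""
    return TITLE_ALIASES.get(raw_title.strip().casefold())


def seniority_level(raw_title, industry):
    """Return a seniority level for the title in the given industry, or None."""
    title = standardize_title(raw_title)
    if title is None:
        return None
    return INDUSTRY_LEVEL.get((industry, title), BASE_LEVEL.get(title))
```

For example, `seniority_level("VP", "finance")` returns a lower level than `seniority_level("VP", "software")` with these sample tables, and an unrecognized title returns `None` so it is not silently counted as senior.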