2. But first...
● How many of you …
● own a computer...?
● use Google...?
● have a Facebook account...?
● buy things on Amazon...?
● use Last.fm/Spotify...?
3. Web Companies
● Offer you
● Personalised services and advertisements
● Who you know, what you are looking for,
recommendations for what you may like
● Make money by
● Doing useful things with the data you create
● Selling advertisements, giving recommendations
● Everything is centred around “you”
4. Data Mining (web)
● Using huge amounts of data
● Clicks on links
● Ratings for movies
● Friendships
● With data mining algorithms to
● Understand (predict) how people behave
● Build systems that will help them
5. Forget the web
● 40,000+ people die on European roads each
year
● Congestion costs an estimated 1% of EU GDP
(about 100 billion euros)
● Transport accounts for 30% of total energy
consumption in the EU
● It is expected that 70% of the world population
will live in cities by 2050
6. Lots of problems to be solved
● Road safety/traffic monitoring
● Reducing congestion
● Building sustainable transport networks
● Urban navigation
7. Another question
● How many of you...
● have a smart phone?
● have an Oyster card?
8. Oyster cards/Mobile Phones
● These devices produce data that is very similar
to the data you create online
● Talking/texting friends
● Checking in to/rating locations
● Travelling around London with your Oyster card
9. Example
● On Last.fm, you only listen to rock music
● On TfL, you only travel on buses
● Both are implicit indications of your preferences
11. My research
● Can we use the technologies that work so well
for web companies to solve problems in cities?
● Today's examples:
● Personalised tube services
● Ticket recommendations
12. Today's examples
● A very brief introduction to the things that
people doing data mining are interested in:
● Clustering
● Regression
● Ranking
● Classification
16. Clustering
● We are looking for the different habits that
travellers may have
● Clustering is a process of automatically
organising data into groups, so that each group
has very similar members
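To make the idea of clustering concrete, here is a minimal sketch using k-means, one common clustering algorithm. The slides do not say which algorithm was used, so k-means is an assumption, and so are the two features (trips per day, share of trips at peak hours) and all the traveller values.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Group points into k clusters; returns (centroids, labels)."""
    # Assumption: start from the first k points (real code would randomise).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return centroids, labels

# Hypothetical travellers: (average trips per day, share of trips at peak).
travellers = [(2.0, 0.9), (0.3, 0.2), (2.2, 0.8),
              (0.5, 0.1), (1.9, 0.85), (0.4, 0.3)]
centroids, labels = k_means(travellers, k=2)
print(labels)  # commuter-like travellers share one label, occasional ones the other
```

The algorithm is never told what a "commuter" is; the two groups emerge only because their members sit close together in the data.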
20. Predicting travel time
● How long will it take me to get there?
● Every time you travel, you create some data
● From where + what time → to where + what time
● Everyone is creating their own data
● When you want to travel, can we give you a
personalised travel time estimate?
21. How does it work?
● We design algorithms that leverage this data:
● Self-similarity: how long it took you before
● Familiarity: people who are similar to you
● Context: time you are travelling
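One way to picture how the three signals could be combined is the sketch below. It is not the algorithm from the research: it simply averages three means, uses "other travellers on the same route" as a crude stand-in for "people who are similar to you", and the function name `predict_minutes`, the trip fields, and the sample trips are all made up for illustration.

```python
from statistics import mean

def predict_minutes(trips, user, origin, dest, hour):
    """Blend three simple estimates of a trip's duration in minutes."""
    same_route = [t for t in trips
                  if (t["origin"], t["dest"]) == (origin, dest)]
    # Self-similarity: how long this route took the same user before.
    own = [t["minutes"] for t in same_route if t["user"] == user]
    # Stand-in for familiarity: how long it took other travellers.
    others = [t["minutes"] for t in same_route if t["user"] != user]
    # Context: how long it took anyone at the same hour of day.
    same_hour = [t["minutes"] for t in same_route if t["hour"] == hour]
    estimates = [mean(x) for x in (own, others, same_hour) if x]
    # With no history at all for this route, there is nothing to predict from.
    return mean(estimates) if estimates else None

# Hypothetical trip records.
trips = [
    {"user": "a", "origin": "X", "dest": "Y", "hour": 8, "minutes": 20},
    {"user": "a", "origin": "X", "dest": "Y", "hour": 18, "minutes": 24},
    {"user": "b", "origin": "X", "dest": "Y", "hour": 8, "minutes": 30},
]
estimate = predict_minutes(trips, user="a", origin="X", dest="Y", hour=8)
```

A real system would weight the three estimates rather than average them equally, and would learn those weights from past trips.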
22. How well does it work?
● On average, how much error is there in the
predictions?
● Using the mean trip time, 11.45 minutes
● Using zone-zone mean time, 8.56 minutes
● Using journey planner, ~6 minutes
● Using our algorithms, < 3 minutes
● Combined algorithm: 2.92 minutes
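The figures above read as an average error in minutes. Assuming that means the mean absolute error (the slides do not name the exact measure), it could be computed as in this sketch. The trip times and predictions are invented and are not the results reported above.

```python
def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, ignoring its direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical true trip times, in minutes.
actual = [22.0, 35.0, 18.0, 41.0]

# Baseline: always predict the mean trip time.
mean_time = sum(actual) / len(actual)
baseline = [mean_time] * len(actual)

# Hypothetical output of a personalised predictor.
personalised = [24.0, 33.0, 19.0, 38.0]

baseline_error = mean_absolute_error(baseline, actual)
personalised_error = mean_absolute_error(personalised, actual)
print(baseline_error, personalised_error)
```

Comparing every method against the same simple baseline is what makes a number like "under 3 minutes" meaningful.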
24. Station Alerts
● How often do you get to a station and find that
there is a problem?
● Travel alerts/disruptions: you need to look
manually for what is relevant to you
● But every time you touch in at a station, you are
showing that you are potentially interested in
what happens there
25. Ranking
● Is the process of making an ordered list
● We can automatically make a unique list for each
person
● The stations will be sorted according to how
relevant they are to your travels
26. How does it work?
● Each station has a weight (a number) that we
use to sort them
● At first, the weight is just how popular the station is
● We increase the weights of stations you visit often
● We increase the weights of stations that are similar
to the ones you visit often
– Similar, in this case, means that “people who travel
to/from station X also travel to/from station Y”
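The weighting scheme above can be sketched as follows. The boost sizes, the popularity scores, and the `co_travel` table (which stations are "also travelled to/from") are assumptions chosen for illustration, not values from the research.

```python
from collections import Counter

def rank_stations(popularity, my_visits, co_travel,
                  visit_boost=1.0, similar_boost=0.5):
    """Return station names sorted from most to least relevant."""
    # At first, the weight is just how popular the station is.
    weights = dict(popularity)
    for station, count in Counter(my_visits).items():
        # Increase the weight of stations this person visits often.
        weights[station] = weights.get(station, 0.0) + visit_boost * count
        # Increase the weight of stations similar to the visited ones.
        for neighbour in co_travel.get(station, ()):
            weights[neighbour] = (weights.get(neighbour, 0.0)
                                  + similar_boost * count)
    # Highest weight first; ties are broken alphabetically.
    return sorted(weights, key=lambda s: (-weights[s], s))

# Hypothetical popularity scores and travel history.
popularity = {"Oxford Circus": 3.0, "Euston": 2.0,
              "Angel": 1.0, "Camden Town": 0.2}
co_travel = {"Angel": ["Camden Town"]}

everyone = rank_stations(popularity, [], co_travel)
personal = rank_stations(popularity, ["Angel"] * 3, co_travel)
print(everyone)
print(personal)
```

With no travel history the list is the same for everybody; three visits to Angel are enough to move it to the top of this person's list.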
27. Does it work?
● We use a metric called percentile ranking
● Smaller values are better
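As a rough sketch of the metric, assume percentile ranking means the average position, in the personalised list, of the stations a person actually goes on to visit, scaled so that 0.0 is the top of the list and 1.0 is the bottom. This is one common definition, and the exact variant used in the research may differ. The station names are placeholders.

```python
def percentile_ranking(ranked, visited):
    """Mean position of visited stations: 0.0 = top of list, 1.0 = bottom."""
    # Assumes the list has at least two stations and contains every
    # visited station (list.index raises ValueError otherwise).
    last = len(ranked) - 1
    positions = [ranked.index(station) / last for station in visited]
    return sum(positions) / len(positions)

ranked = ["A", "B", "C", "D", "E"]  # hypothetical personalised list

good = percentile_ranking(ranked, ["A", "B"])  # visited stations near the top
bad = percentile_ranking(ranked, ["D", "E"])   # visited stations near the bottom
print(good, bad)
```

A small value means the stations that mattered were already near the top of the list, which is why smaller is better.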
30. Paying for travel
● Is it cheaper to use pay as you go?
● Which travel card is best?
● … how do you decide?
● The cheapest fare will depend on where you
need to go, when you need to travel, and how
you tend to go there (bus, train)
31. Wasting money?
● The Oyster card data we have shows what
ticket people were using
● We can use their trips to compute what the
cheapest fare would have been, and then see
how much money they could have saved
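The comparison can be sketched like this: price a set of trips under every ticket option and keep the cheapest. The fares, pass prices, and ticket names below are invented for the example and are not real TfL prices, and the real fare rules (zones, peak times, caps) are far more detailed.

```python
def cheapest_ticket(trips, single_fares, pass_prices):
    """Return (ticket name, cost) for the cheapest way to pay for the trips."""
    # Option 1: pay a single fare for every trip.
    options = {"pay as you go": sum(single_fares[mode] for mode in trips)}
    # Other options: buy a pass, then pay singles for modes it does not cover.
    for name, (price, covered_modes) in pass_prices.items():
        uncovered = sum(single_fares[mode] for mode in trips
                        if mode not in covered_modes)
        options[name] = price + uncovered
    return min(options.items(), key=lambda item: item[1])

# Hypothetical weekly prices.
single_fares = {"tube": 2.0, "bus": 1.0}
pass_prices = {
    "bus pass": (8.0, {"bus"}),
    "travel card": (20.0, {"tube", "bus"}),
}

week = ["tube"] * 12 + ["bus"] * 2           # one traveller's trips in a week
paid = sum(single_fares[mode] for mode in week)  # they used pay as you go
best_name, best_cost = cheapest_ticket(week, single_fares, pass_prices)
wasted = paid - best_cost
print(best_name, best_cost, wasted)
```

Summing `wasted` over all travellers gives the total amount that could have been saved.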
32. Wasting Money!
● Based on the data, travellers could save about
£200 million per year
● If they were buying the cheapest tickets for their
travel needs
● Can we help them buy the best fare?
33. Classification
● Is the process of assigning some data to a
group. In our case,
● Data = a person's travel habits
● Group = the cheapest ticket
34. How does it work?
● We used decision trees: an automatic way of
recursively partitioning data and discovering
rules to classify data
35. Example
● Neal's travel habits:
● 2.5 average trips per day
● 85% trips on the tube / 15% trips on buses
● 75% of trips during peak hours
● 95% of trips between Zone 1 and Zone 2
● Decision tree says: Neal should buy a Zone 1-2
travel card
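To show what recursive partitioning looks like, here is a minimal decision tree sketch. It is a textbook-style learner using Gini impurity, not the implementation from the research. It uses only two of Neal's features (trips per day, share of trips on the tube), and the training travellers, their labels, and the "bus pass" ticket are made up.

```python
from collections import Counter

def gini(labels):
    """Impurity of a group: 0.0 when every member has the same label."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; returns a nested dict or a leaf label."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        # Try every observed value (except the smallest) as a threshold.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[f] < threshold]
            right = [i for i, r in enumerate(rows) if r[f] >= threshold]
            score = sum(len(side) * gini([labels[i] for i in side])
                        for side in (left, right))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no feature can split these rows any further
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold, left, right = best
    return {
        "feature": f,
        "threshold": threshold,
        "below": build_tree([rows[i] for i in left],
                            [labels[i] for i in left], depth + 1, max_depth),
        "at_or_above": build_tree([rows[i] for i in right],
                                  [labels[i] for i in right], depth + 1, max_depth),
    }

def classify(tree, row):
    """Follow the discovered rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        side = "below" if row[tree["feature"]] < tree["threshold"] else "at_or_above"
        tree = tree[side]
    return tree

# Hypothetical travellers: (trips per day, share of trips on the tube),
# each labelled with the ticket that was cheapest for them.
rows = [(0.4, 0.9), (0.6, 0.2), (2.6, 0.8), (3.0, 0.9), (2.8, 0.1), (2.4, 0.05)]
labels = ["pay as you go", "pay as you go",
          "Zone 1-2 travel card", "Zone 1-2 travel card",
          "bus pass", "bus pass"]

tree = build_tree(rows, labels)
neal = (2.5, 0.85)  # 2.5 trips per day, 85% of trips on the tube
recommendation = classify(tree, neal)
print(recommendation)
```

On this toy data the tree first splits on trips per day and then on tube share, which is the kind of readable rule that makes decision trees attractive for explaining a recommendation.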
36. Does this work?
● We can ask our algorithm to predict what the
best ticket for a person will be, and see if it
predicts correctly
● We picked a group of travellers that could have
saved £479,583.91
● Our algorithm is > 98% accurate; if this group had
followed our recommendations, it would have
saved £473,918.38
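The two amounts above can be compared directly. The short calculation below only restates the slide's figures; note that the share of savings captured is a different quantity from the classification accuracy, although the two happen to be close here.

```python
possible_saving = 479_583.91   # what the group could have saved (from the slide)
achieved_saving = 473_918.38   # what following the recommendations would save

# Share of the possible savings that the recommendations capture.
share_captured = achieved_saving / possible_saving * 100
# Money still left on the table after following the recommendations.
missed = possible_saving - achieved_saving

print(f"{share_captured:.2f}% of the possible savings captured")
print(f"£{missed:,.2f} missed")
```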
38. Summary
● Data mining for the city
● People are already carrying around Oyster cards
and mobile phones, and creating lots of useful data
about their movements
● There are a lot of problems that can be tackled
using data mining
39. Summary
● We have looked at examples of
● Clustering: grouping people's behaviours
● Regression: predicting travel times
● Ranking: making an ordered list of stations
● Classification: recommending the best ticket