In a marketing analytics class, we were responsible for mimicking a company: coming up with a question to be solved, building an analytical model to answer it, and determining whether we asked the right question.
2. Business Understanding
❖Business Understanding: TLC's board wants a better understanding of
the customers they are serving during large holidays like New Year's
Day. TLC would like to predict which customers tip versus those who do
not tip to increase employee motivation. This is because we hypothesize
that larger groups tip better, especially on holidays such as New Year's
Day. However, the current consensus among the drivers is that large
groups are less profitable and more annoying.
❖Business Question: What customer segment is more likely to tip taxi
drivers?
❖Success Metric: We will consider this model a success if we discover a
distinct trend between group size and tip amount.
3. Data Understanding
The data was obtained from the New York City Taxi & Limousine Commission.
The data set covers only trips recorded during January 2019.
The Data Dictionary includes categorical, text, nominal, and quantitative data.
The fields we will be using within our model include:
•Ride Length (quantitative)
•Passenger Count (quantitative)
•Fare Amount (quantitative)
•Tip Amount (quantitative)
•Tip form (categorical)
4. Data Preparation
❖We began by deleting two columns where no taxi driver filled out
the requested information. We may come back another time to
discover any significance of the data absence.
❖We deleted entries that fell outside the month of January
❖We narrowed the data to trips on 1/1/2019
❖We calculated ride length by converting the pick-up and drop-off
fields to a time format and subtracting the pick-up time from the
drop-off time
❖In a new sheet we put the tpep_pickup_datetime,
tpep_dropoff_datetime, Ride_Length, passenger_count,
trip_distance, payment_type, fare_amount, and tip_amount columns to
run with the model
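The preparation steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the spreadsheet workflow we actually used: the timestamp layout in TIME_FORMAT, the helper name prepare_trips, and the sample rows are all assumptions made for the example.

```python
from datetime import datetime

# Assumed timestamp layout; the actual export format is not stated in this deck.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def prepare_trips(rows, day="2019-01-01"):
    """Keep trips picked up on `day` and add Ride_Length in minutes."""
    prepared = []
    for row in rows:
        pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
        if pickup.strftime("%Y-%m-%d") != day:
            continue  # drop trips picked up on any other day
        ride_minutes = (dropoff - pickup).total_seconds() / 60
        prepared.append({**row, "Ride_Length": ride_minutes})
    return prepared


# Made-up rows for illustration only; these are not taken from the TLC data.
sample_rows = [
    {"tpep_pickup_datetime": "2019-01-01 00:10:00",
     "tpep_dropoff_datetime": "2019-01-01 00:25:30",
     "passenger_count": 1, "fare_amount": 14.0, "tip_amount": 3.0},
    {"tpep_pickup_datetime": "2018-12-31 23:50:00",
     "tpep_dropoff_datetime": "2019-01-01 00:05:00",
     "passenger_count": 2, "fare_amount": 11.0, "tip_amount": 0.0},
    {"tpep_pickup_datetime": "2019-01-01 13:00:00",
     "tpep_dropoff_datetime": "2019-01-01 13:42:00",
     "passenger_count": 4, "fare_amount": 38.5, "tip_amount": 6.0},
]

prepared = prepare_trips(sample_rows)
```

Filtering on the pick-up date is one possible reading of "trips on 1/1/2019"; a trip that starts late on 12/31 and ends after midnight is dropped under this choice.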
5. Modeling
❖We chose to use the Customer Segmentation Model. We wanted
to group different customers and compare the segments to ride length, trip
fare, and tip amount.
❖The variables we considered were the pickup time, drop-off time, passenger
count, trip distance, fare amount, payment type, and tip amount. We calculated
ride length by doing a transformation on the pickup and drop-off times. We
also considered a location variable, but the location was based on TLC's own
area codes, not general zip codes or boroughs. Since TLC is focused on group size
and tip amount, we did not use location in the final model.
❖We assessed the results through segmentations featured on the following slides
as created by Solver.
❖We assessed the validity of the results by plotting the data as box-and-whisker
plots so we could visually see statistical outliers.
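For reference, the outliers that a box-and-whisker plot draws as individual points are commonly defined by the 1.5 × IQR rule. The sketch below assumes that rule and Python's "inclusive" quartile method; the spreadsheet tool we used may compute quartiles slightly differently, and the tip values are made up for illustration.

```python
import statistics


def whisker_fences(values):
    """Return (lower, upper) fences using the 1.5 * IQR box-plot rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the values that fall outside the whisker fences."""
    lower, upper = whisker_fences(values)
    return [v for v in values if v < lower or v > upper]


# Made-up tip amounts for one segment; not taken from the TLC data.
tips = [0.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 20.0]
fences = whisker_fences(tips)
outliers = find_outliers(tips)
```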
8. Modeling
❖This graph shows tip amount per cluster. Segment 2 shows many outliers
that are important to investigate. This shows that single riders who ride
more than 30 min have the highest spread of tips. This is our first clue
that distance, and not party size, may be more useful in determining tip
amount.
9. Modeling
This graph shows the fare amount per cluster. The graph shows that taxi
drivers in segments four and five have negative tipping amounts. This is
most likely due to mistaken entries from the taxi drivers. The mistakes may
derive from these segments being rides of less than 10 min.
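A quick way to see where the negative entries sit is to count them per segment. The sketch below is illustrative only: the "segment" key, the helper name, and the sample rows are hypothetical, and the field argument lets the same check run on either tip_amount or fare_amount.

```python
from collections import Counter


def negative_counts_by_segment(trips, field="tip_amount"):
    """Count trips per segment whose `field` value is below zero."""
    counts = Counter()
    for trip in trips:
        if trip[field] < 0:
            counts[trip["segment"]] += 1
    return dict(counts)


# Made-up rows with a hypothetical "segment" label; not taken from the TLC data.
sample_trips = [
    {"segment": 1, "fare_amount": 12.0, "tip_amount": 2.0},
    {"segment": 4, "fare_amount": -4.5, "tip_amount": 0.0},
    {"segment": 4, "fare_amount": 6.0, "tip_amount": -1.0},
    {"segment": 5, "fare_amount": -3.0, "tip_amount": -0.5},
    {"segment": 5, "fare_amount": -2.5, "tip_amount": 0.0},
]

negative_tips = negative_counts_by_segment(sample_trips, "tip_amount")
negative_fares = negative_counts_by_segment(sample_trips, "fare_amount")
```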
10. Modeling
This graph shows the trip distance per cluster. It is again
showing that segment two has the longest ride mileage.
11. Modeling
This graph shows the number of riders per cluster. While
segment two had the largest tipping spread in the earlier
graph, it tends to have a very low number of riders, which
contradicts our earlier hypothesis. The segments with the most
riders are segments 1 and 4.
12. Model Evaluation
We found through our 5 customer segments that the
distance of the trip bears no correlation to whether passengers
tipped on the ride. However, we discovered two factors that affected
whether riders tipped: the amount of time they spent in the taxi and
the payment type. Our model shows that if riders spent, on
average, at least 10 minutes in the taxi, they were
likely to tip. The other factor was payment type: groups tipped
when a credit card was used and did not tip when cash was used.
We believe this to be a consequence of the data being
self-reported: cash tips may not be reported and may be
pocketed by the driver.
Group size, trip distance, and the amount of the fare did not
change the tip amount in a way that we could see.
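The two comparisons above (payment type, and rides of at least 10 minutes) amount to a tip rate per group. A minimal sketch, assuming trips are stored as dictionaries with a Ride_Length in minutes; the helper name, the text payment labels, and the sample rows are made up for illustration and do not reproduce our results.

```python
from collections import defaultdict


def tip_rate_by(trips, key):
    """Share of trips with a positive recorded tip, grouped by key(trip)."""
    tipped = defaultdict(int)
    total = defaultdict(int)
    for trip in trips:
        group = key(trip)
        total[group] += 1
        if trip["tip_amount"] > 0:
            tipped[group] += 1
    return {group: tipped[group] / total[group] for group in total}


# Made-up rows; payment labels are written as text for readability.
sample_trips = [
    {"payment_type": "credit card", "Ride_Length": 15.5, "tip_amount": 3.0},
    {"payment_type": "credit card", "Ride_Length": 8.0, "tip_amount": 0.0},
    {"payment_type": "credit card", "Ride_Length": 25.0, "tip_amount": 4.0},
    {"payment_type": "cash", "Ride_Length": 12.0, "tip_amount": 0.0},
    {"payment_type": "cash", "Ride_Length": 5.0, "tip_amount": 0.0},
]

by_payment = tip_rate_by(sample_trips, lambda t: t["payment_type"])
by_length = tip_rate_by(sample_trips, lambda t: t["Ride_Length"] >= 10)
```

Because only recorded tips are counted, a low cash tip rate in this kind of summary reflects what was reported, not necessarily what riders paid.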
13. Recommendation
We recommend that we dismiss this model and create a new
model that focuses on the distance of the cab ride instead of
the group size to see any possible correlation of tipping
patterns. While we still believe there is a way to increase
driver motivation, this model helped us determine which
variables may be more beneficial to focus on.
A possible fix to our model is to include more segmentations
to identify more trends in our segmentation.