Lets eat presentation_final_20160521

Let’s Eat!
Brad Binder, Lesley Chapman,
Jon Froiland, David Lee

Introduction
History:
Since 1979 there have been services that review
and rank restaurants (Zagat)
Today:
According to Nielsen, Americans have on
average 41 apps on their smartphones, many of
which provide a recommendation service

Introduction
A variety of restaurant recommendation apps
have been created
Features include finding restaurants, making reservations,
and surfacing healthy options
A Restaurant Recommender would aim to help
users save money and time, and could help cure
buyer’s remorse

Problem Summary
We need a tool that resolves the challenge of
finding a restaurant in your area based upon
specific cuisine and menu item criteria
entered by the user

Hypothesis
Hypothesis: The Restaurant Recommender will recommend a
better-matched restaurant than selecting a restaurant
by chance alone
H0 (null hypothesis): The restaurant a user finds through the app is no
better liked than one found by chance alone
HA (alternative hypothesis): The Restaurant Recommender app
will provide a better restaurant suggestion to the user compared
to chance alone

Data Ingestion
• WORM (write once, read many) storage
–Stored HTML menu pages in one location
where they could be read many times
• Parsed HTML with BeautifulSoup
–Built out a list of “Restaurant” objects
• GET requests to WMATA API to pull metro
station data
–JSON data parsed with pandas read_json()
function
Pipeline: Ingestion → Wrangling → Analysis → Modeling → Visualization
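
The HTML parsing step can be illustrated with a minimal sketch. The page layout, the CSS class names, and the `Restaurant` and `MenuItem` fields below are assumptions made for illustration; the deck does not show the real menu pages or the project's class definitions.

```python
from dataclasses import dataclass, field

from bs4 import BeautifulSoup


@dataclass
class MenuItem:
    name: str
    price: str
    description: str


@dataclass
class Restaurant:
    name: str
    cuisine: str
    menu: list = field(default_factory=list)


# Hypothetical stored menu page; the real pages will be structured differently.
SAMPLE_HTML = """
<html><body>
  <h1 class="restaurant-name">Sample Bistro</h1>
  <span class="cuisine">French</span>
  <ul class="menu">
    <li class="item">
      <span class="item-name">Onion Soup</span>
      <span class="item-price">$8.00</span>
      <p class="item-desc">Caramelized onions, gruyere crouton</p>
    </li>
    <li class="item">
      <span class="item-name">Steak Frites</span>
      <span class="item-price">$24.00</span>
      <p class="item-desc">Grilled hanger steak, fries</p>
    </li>
  </ul>
</body></html>
"""


def parse_restaurant(html):
    """Build one Restaurant object from a stored HTML menu page."""
    soup = BeautifulSoup(html, "html.parser")
    restaurant = Restaurant(
        name=soup.find("h1", class_="restaurant-name").get_text(strip=True),
        cuisine=soup.find("span", class_="cuisine").get_text(strip=True),
    )
    # Each <li class="item"> becomes one MenuItem.
    for li in soup.select("ul.menu li.item"):
        restaurant.menu.append(
            MenuItem(
                name=li.select_one(".item-name").get_text(strip=True),
                price=li.select_one(".item-price").get_text(strip=True),
                description=li.select_one(".item-desc").get_text(strip=True),
            )
        )
    return restaurant


restaurant = parse_restaurant(SAMPLE_HTML)
print(restaurant.name, len(restaurant.menu))
```

Because the pages are stored once and only read afterwards, a parser like this can be rerun over the same files whenever the extraction logic changes.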

Wrangling and Munging
• Most of the project time was spent wrangling the data and
building Restaurant objects
–Removing duplicate and incomplete
records
–Standardizing inconsistent fields (e.g. price)
–Aggregating and grouping
–Correcting data types
• Merged restaurant and WMATA data using
Euclidean distance
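
Two of the wrangling steps above can be sketched with the standard library alone. This assumes that "merging using Euclidean distance" means attaching the nearest metro station to each restaurant, which is one plausible reading of the slide. The function names, record fields, and coordinates are illustrative, not the project's own.

```python
import math
import re


def normalize_price(raw):
    """Turn a price string such as '$8.00' or '12.5 USD' into a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    return float(match.group()) if match else None


def nearest_station(lat, lon, stations):
    """Return the station with the smallest Euclidean distance in (lat, lon)."""
    return min(stations, key=lambda s: math.hypot(s["lat"] - lat, s["lon"] - lon))


# Illustrative records only; coordinates are approximate.
stations = [
    {"name": "Metro Center", "lat": 38.8983, "lon": -77.0281},
    {"name": "Dupont Circle", "lat": 38.9095, "lon": -77.0436},
]
restaurant = {"name": "Sample Bistro", "lat": 38.9100, "lon": -77.0440, "price": "$8.00"}

restaurant["price"] = normalize_price(restaurant["price"])
restaurant["station"] = nearest_station(
    restaurant["lat"], restaurant["lon"], stations
)["name"]
print(restaurant)
```

Euclidean distance on raw latitude and longitude is only an approximation of ground distance, but within a single metro area it usually ranks nearby stations in the same order.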

Data Overview
964 Total Restaurants
115,517 Total Menu Items
• Restaurant data includes:
–Name
–Location (address, latitude, longitude)
–Type of cuisine
–Menu (item, price, description)
• WMATA data includes:
–Station name
–Location (latitude, longitude)
–Metro Line

Analysis
10 cities
964 Restaurants
115,517 Menu Items

Analysis
964 Restaurants
115,517 Menu Items

Washington, D.C.

Feature Selection
• Four feature extraction pipelines using sklearn
–Chunking
–Cuisine Type
• TfidfVectorizer
–Extract keywords and assign significance score
– Tokenize and chunk parts of speech using nltk
• LabelBinarizer
–Convert cuisine types to binary features
• FeatureUnion
–Concatenate the outputs of the feature extractors into one feature matrix
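
A minimal sketch of how these three sklearn pieces can fit together, under stated assumptions. `FieldSelector` and `CuisineBinarizer` are hypothetical helper classes written for this example, and the record layout is invented. The nltk tokenizing and chunking step is left out, so `TfidfVectorizer` uses its default tokenizer. The adapter is there because `LabelBinarizer` is designed for labels and cannot be dropped into a `Pipeline` step as-is.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class FieldSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pull one field out of a list of dict records."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [record[self.key] for record in X]


class CuisineBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical adapter so LabelBinarizer can act as a feature transformer."""

    def fit(self, X, y=None):
        self.binarizer_ = LabelBinarizer(sparse_output=True).fit(X)
        return self

    def transform(self, X):
        return self.binarizer_.transform(X)


# Invented sample records.
restaurants = [
    {"menu": "margherita pizza with fresh basil", "cuisine": "Italian"},
    {"menu": "spicy tuna roll and miso soup", "cuisine": "Japanese"},
    {"menu": "carne asada tacos with salsa verde", "cuisine": "Mexican"},
]

features = FeatureUnion([
    ("menu_tfidf", Pipeline([
        ("select", FieldSelector("menu")),
        ("tfidf", TfidfVectorizer(stop_words="english")),
    ])),
    ("cuisine", Pipeline([
        ("select", FieldSelector("cuisine")),
        ("binarize", CuisineBinarizer()),
    ])),
])

# One row per restaurant: TF-IDF keyword columns followed by cuisine columns.
matrix = features.fit_transform(restaurants)
print(matrix.shape)
```

Keeping each extractor in its own named pipeline makes it easy to drop one of them later, for example the cuisine columns, without touching the rest.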

Modeling and Prediction
• Transformation pipelines and transformed
feature vectors are pickled
• KMeans models are fitted on the training
restaurant data, then pickled
• User inputs entered via Flask are stored as
a training instance
• The relevant pipeline and model are loaded to
transform the input and predict its cluster
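
The fit, pickle, load, and predict cycle described above could look roughly like the following sketch. The menu strings are invented, only two clusters are used so the toy data stays readable, and the model is pickled to an in-memory byte string rather than to a file. The Flask layer is omitted; `user_input` stands in for whatever text the form would submit.

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Invented training data: one string of menu text per restaurant.
menus = [
    "margherita pizza pasta carbonara tiramisu",
    "pepperoni pizza lasagna pasta bolognese",
    "spicy tuna roll salmon sashimi miso soup",
    "salmon roll tempura udon miso soup",
]

# Transformation and clustering are bundled so they are pickled together.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
])
model.fit(menus)

# Serialize the fitted pipeline, then restore it as a serving process would.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

user_input = "pizza and pasta"
cluster = restored.predict([user_input])[0]
print("predicted cluster:", cluster)
print("training labels:", restored.named_steps["kmeans"].labels_)
```

Pickling the vectorizer together with the model guarantees that user input is transformed with the same vocabulary the clusters were fitted on.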

K=15

Reporting and Visualization
• Restaurant recommendations are determined
by similarity within a matched cluster
–“Similarity” is measured with sklearn’s pairwise
Euclidean distance between the test data and the
training instances in the feature space; the closest
instances are recommended
• Predictions are exported into an interactive
Tableau visualization
–Allows the user flexibility in making a selection
through filtering and visual indicators
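
The within-cluster ranking can be sketched as follows. The two-dimensional feature vectors, the restaurant names, and the `recommend` helper are all invented for illustration; the real feature space comes from the extraction pipelines and has many more dimensions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances


def recommend(query, features, names, kmeans, top_n=3):
    """Rank the members of the query's cluster by Euclidean distance to the query."""
    query = np.asarray(query, dtype=float).reshape(1, -1)
    cluster = kmeans.predict(query)[0]
    # Indices of the training instances assigned to the same cluster.
    members = np.flatnonzero(kmeans.labels_ == cluster)
    distances = euclidean_distances(query, features[members])[0]
    order = np.argsort(distances)[:top_n]
    return [names[members[i]] for i in order]


# Toy training instances forming two well-separated groups.
features = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
names = ["A", "B", "C", "D", "E", "F"]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

print(recommend([9.5, 10.2], features, names, kmeans))
```

Restricting the distance computation to one cluster keeps the candidate list short, at the cost of missing a close neighbor that happens to sit just across a cluster boundary.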

Results
• Some predictions are good, others not so
good
–Some clusters still contain a “hodge podge”
• Removing the “cuisine type” feature helped to
eliminate what we saw as overfitting
• Different k values produced better results in some
cases, worse in others
• Additional features (price, ratings, metro)
would require more clusters and MORE DATA

Conclusions
• More data matters more than a “better” model
• Might improve results using transformations
like Singular Value Decomposition (SVD) or
Latent Dirichlet Allocation (LDA)
– Better model analysis
• With more data, improve our tokenizer
– Incorporate stemming, improve chunking
• Incorporate user feedback into the prediction
model (e.g. via the Flask interface)
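
As a rough illustration of the SVD idea mentioned above, sklearn's `TruncatedSVD` can reduce a sparse TF-IDF matrix to a few dense components before clustering. This is a sketch of a possible follow-up, not something the project implemented. The menu strings and the choice of two components are assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented menu text, one string per restaurant.
menus = [
    "margherita pizza pasta carbonara tiramisu",
    "pepperoni pizza lasagna pasta bolognese",
    "spicy tuna roll salmon sashimi miso soup",
    "salmon roll tempura udon miso soup",
]

tfidf = TfidfVectorizer().fit_transform(menus)

# Project the sparse keyword space onto two latent components.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(tfidf)

print(tfidf.shape, "->", reduced.shape)
```

The reduced matrix could then be passed to KMeans in place of the raw TF-IDF features.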

Additional Opportunities
• “Waiter-caller” function that would allow users to log in, use
the restaurant map search function, click on a restaurant, and
be matched up with menu items based on keyword matches,
as opposed to reading through an entire menu to find
relevant items.
–Would require more knowledge and implementation of
JavaScript, CSS, and Jinja in the Flask environment.
• Sentiment analyzer was developed but not integrated. It would
allow users to go to a restaurant and input a review. The review
would then be analyzed, giving back a recommended score (1-5)
to the user.
–Similar requirements
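
The keyword matching behind the “Waiter-caller” idea could be sketched with the standard library as below. This is a guess at the simplest workable version, since the function was never built; the menu records, the `match_menu_items` name, and the scoring rule (count of shared words) are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_menu_items(menu, keywords):
    """Return names of items sharing a word with the keywords, best match first."""
    wanted = tokenize(" ".join(keywords))
    scored = []
    for item in menu:
        hits = wanted & tokenize(item["name"] + " " + item["description"])
        if hits:
            scored.append((len(hits), item["name"]))
    # More shared words first; ties broken alphabetically by item name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Invented menu for one restaurant.
menu = [
    {"name": "Onion Soup", "description": "caramelized onions, gruyere crouton"},
    {"name": "Steak Frites", "description": "grilled hanger steak, fries"},
    {"name": "Mushroom Risotto", "description": "arborio rice, gruyere, thyme"},
]

print(match_menu_items(menu, ["gruyere", "soup"]))
```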

