# Walmart sales



1. Kaggle Walmart sales prediction. Data Science, General Assembly in DC, 19 May 2014. Yifan Li
2. Contents: the problem, the data, the model, the result, conclusion.
3. The problem: predict Walmart sales figures. This is a regression problem, given (1) past sales and markdown (sale) events, and (2) the associated CPI, temperature, unemployment, fuel_price, store type, and store size.
4. The data: weekly sales per store. [Figures: line plot and histogram of weekly sales]
5. The data (cont.): which feature to choose, store type or store size? [Figures: type vs. size in the training data and in the test data]
6. The data (cont.): handling missing values. Missing markdown data is filled with zero. Missing CPI is filled with a linear prediction from temperature and fuel price; missing unemployment is filled with a linear prediction from CPI and temperature.
7. The model: linear regression with lm {stats}; regularization with glmnet {glmnet}; least absolute deviation with rq {quantreg}.
8. The result: linear regression.
9. The result (cont.): regularization (the result did not improve).
10. The result (cont.): linear regression with weights (1 for a normal week, 5 for a holiday week).
11. The result (cont.): least absolute deviation with weights.
12. The result (cont.): first Kaggle experience. 44 submissions (maximum 5 submissions per day). Ranked 592nd of 694 (18416.22852 points), beating the all-zero benchmark (647th, 22265.71813 points).
13. Conclusion: linear regression is a good starting point for feature selection (which takes a lot of time). Using a model that corresponds to the evaluation method may improve the score. The top five on the leaderboard use autoregression and random forests; with limited data, a more sophisticated algorithm would be beneficial.
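
The missing-value handling on slide 6 can be sketched as follows. This is a minimal illustration, not the author's code: the slides use R, whereas this sketch uses plain Python with a hand-written least-squares fit. The column names (`Temperature`, `Fuel_Price`, `CPI`, `MarkDown1`), the helper names, and all numbers are assumptions made up for the example.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * p for a, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(xs, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fill_zero(rows, column):
    """Replace missing (None) values in one column with zero."""
    return [{**r, column: 0.0 if r[column] is None else r[column]} for r in rows]


def impute_linear(rows, target, predictors):
    """Fill missing target values with a linear prediction from the predictors."""
    known = [r for r in rows if r[target] is not None]
    coef = fit_linear([[r[p] for p in predictors] for r in known],
                      [r[target] for r in known])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None:
            r[target] = coef[0] + sum(c * r[p] for c, p in zip(coef[1:], predictors))
        filled.append(r)
    return filled


# Made-up rows; CPI here is exactly 200 + 0.1 * Temperature + 2 * Fuel_Price.
rows = [
    {"Temperature": 40.0, "Fuel_Price": 3.0, "CPI": 210.0, "MarkDown1": None},
    {"Temperature": 50.0, "Fuel_Price": 3.5, "CPI": 212.0, "MarkDown1": 1500.0},
    {"Temperature": 60.0, "Fuel_Price": 3.2, "CPI": 212.4, "MarkDown1": None},
    {"Temperature": 70.0, "Fuel_Price": 3.8, "CPI": 214.6, "MarkDown1": 320.0},
    {"Temperature": 55.0, "Fuel_Price": 3.4, "CPI": None, "MarkDown1": None},
]
rows = fill_zero(rows, "MarkDown1")
rows = impute_linear(rows, "CPI", ["Temperature", "Fuel_Price"])
print(rows[-1]["CPI"])
```

Because the made-up CPI values lie exactly on a plane, the imputed value for the last row should come out very close to 212.3. The same `impute_linear` call with `"Unemployment"` as the target and `["CPI", "Temperature"]` as predictors would cover the second case on the slide.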
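
The remark that a model matching the evaluation method may improve the score can be illustrated with a small sketch. The slides do not name the metric; the assumption here is a weighted mean absolute error in which holiday weeks count five times, matching the weights on slide 10. The data is invented, and the constant-prediction models are a deliberate simplification (standard-library Python rather than the R functions named on slide 7): a weighted mean minimizes weighted squared error, while a weighted median minimizes weighted absolute error.

```python
def wmae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error; holiday weeks count holiday_weight times."""
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    total = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return total / sum(weights)


def weighted_mean(values, weights):
    """Constant prediction that minimizes weighted squared error."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Constant prediction that minimizes weighted absolute error."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


# Made-up weekly sales for one department; the fifth week is a holiday week.
sales = [100.0, 110.0, 90.0, 105.0, 400.0, 95.0, 102.0, 98.0]
holiday = [False, False, False, False, True, False, False, False]
weights = [5.0 if h else 1.0 for h in holiday]
n = len(sales)

zero_score = wmae(sales, [0.0] * n, holiday)  # all-zero benchmark
mean_score = wmae(sales, [weighted_mean(sales, weights)] * n, holiday)
median_score = wmae(sales, [weighted_median(sales, weights)] * n, holiday)
print(zero_score, mean_score, median_score)
```

On this toy data the absolute-error fit scores lower than the squared-error fit, and both beat the all-zero prediction, which mirrors the ordering reported on slides 10 to 12 without reproducing those results.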