https://predictiveanalyticsworld.de/de/berlin2017/programm/#session37621
AutoScout24 is the largest Europe-wide online car marketplace. With more than 2.4 million listings across Europe, AutoScout24 has access to large amounts of data about historical and current market prices and wants to use this data to help its users make informed decisions about buying and selling cars. We created a live price estimation service for used vehicles, based on a Random Forest prediction model, that is continuously delivered to the end user. Learn how automated verification using live test data sets in our delivery pipeline allows us to release model improvements with confidence at any time.
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at AutoScout24
1. Predictive Analytics World Berlin | 14.11.2017 | Christian Deger | @cdeger
21. Lessons learned
Form a cross-functional team of
data scientists & software engineers!
Software engineers
… learn how data scientists work
… and understand the quirks of a prediction model
Data scientists
… learn about unit testing, stable interfaces, Git, etc.
… get quick feedback about the impact of their work
Model and product iterations become much faster!
22. Lessons learned
Generating gigabytes of Java code
is a challenge for the JVM
Use the G1 garbage collector
Turn off Tiered Compilation
Do extensive warm-ups
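The slide does not show the generated code or the exact JVM settings, so the following is only a minimal sketch under stated assumptions. It assumes each tree of the forest is emitted as a method of nested if/else statements and that the service calls the model many times before taking traffic. The class name, feature names, thresholds and prices are all hypothetical. On a HotSpot JVM, the first two tips usually correspond to the flags `-XX:+UseG1GC` and `-XX:-TieredCompilation`.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical shape of generated model code. Feature names, thresholds
// and leaf prices are invented for illustration only.
final class GeneratedForestSketch {

    // One "generated" regression tree over mileage (km) and age (years).
    static double tree0(double mileageKm, double ageYears) {
        if (ageYears < 3.0) {
            if (mileageKm < 30_000.0) {
                return 24_000.0;
            }
            return 20_000.0;
        }
        if (mileageKm < 100_000.0) {
            return 12_000.0;
        }
        return 7_000.0;
    }

    // A second "generated" tree that splits on mileage first.
    static double tree1(double mileageKm, double ageYears) {
        if (mileageKm < 50_000.0) {
            return ageYears < 5.0 ? 22_000.0 : 14_000.0;
        }
        return ageYears < 8.0 ? 10_000.0 : 6_000.0;
    }

    // A Random Forest regression averages the predictions of its trees.
    static double predict(double mileageKm, double ageYears) {
        return (tree0(mileageKm, ageYears) + tree1(mileageKm, ageYears)) / 2.0;
    }

    // Calls the model repeatedly with random inputs so the JIT compiler can
    // compile the hot paths before real requests arrive.
    static double warmUp(int iterations) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        double sink = 0.0; // accumulate results so the calls are not optimized away
        for (int i = 0; i < iterations; i++) {
            sink += predict(random.nextDouble(0.0, 300_000.0),
                            random.nextDouble(0.0, 20.0));
        }
        return sink;
    }

    public static void main(String[] args) {
        warmUp(200_000); // the iteration count is an arbitrary assumption
        System.out.println(predict(20_000.0, 2.0));
    }
}
```

A real forest would have far more and far deeper trees, which is where the code size and memory pressure mentioned on the slide would come from. The warm-up loop matters more with tiered compilation turned off, because methods then run in the interpreter until the optimizing compiler has processed them.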
24. Lessons learned
Applying Continuous Delivery to data science
is useful regardless of the technology stack
Also applied successfully to a Python- and
Spark-based project
Even more useful when models must evolve quickly
because of rapidly changing inputs
(e.g. user interaction)
25. Conclusions
Continuous Delivery allows us to bring prediction
model changes live very quickly.
Only extensive automated end-to-end tests
provide confidence to deploy to production
automatically.
Java code generation allows for very low response
times and excellent scalability under high load, but
requires plenty of memory.
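The automated verification mentioned in the conclusions could look roughly like the following quality gate in a delivery pipeline. This is a sketch, not the talk's actual implementation: the error metric (median absolute percentage error), the threshold, the class name and the sample prices are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical pipeline step: reject a new model if its error on a
// test data set exceeds a threshold. Metric and threshold are assumptions.
final class ModelVerificationSketch {

    // Median of the absolute percentage errors between predicted and
    // actual prices. Assumes all actual prices are positive.
    static double medianAbsolutePercentageError(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double[] errors = new double[predicted.length];
        for (int i = 0; i < predicted.length; i++) {
            errors[i] = Math.abs(predicted[i] - actual[i]) / actual[i];
        }
        Arrays.sort(errors);
        int mid = errors.length / 2;
        return errors.length % 2 == 1
                ? errors[mid]
                : (errors[mid - 1] + errors[mid]) / 2.0;
    }

    // The gate passes only if the median error stays within the threshold.
    static boolean passesGate(double[] predicted, double[] actual, double maxError) {
        return medianAbsolutePercentageError(predicted, actual) <= maxError;
    }

    public static void main(String[] args) {
        double[] actual = {10_000.0, 20_000.0, 30_000.0};    // invented listing prices
        double[] predicted = {11_000.0, 19_000.0, 30_000.0}; // invented model output
        if (!passesGate(predicted, actual, 0.08)) {
            System.err.println("Model rejected: error above threshold");
            System.exit(1); // a non-zero exit code fails the pipeline step
        }
        System.out.println("Model accepted");
    }
}
```

Failing the step with a non-zero exit code is what lets the pipeline block a deployment without manual review, which is the precondition for deploying to production automatically.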