This document discusses collaborating with data scientists and practicing DataOps. It begins by describing a data scientist's work and expectations. In a data science project, the key steps are data cleaning, analysis, validation, splitting, model training, and model validation. Developing a data science product requires additional steps: model scaling, updating, deployment, monitoring, logging, and optimization. The document advocates consistent workflows, collaborative modeling, continuous improvement, automated deployment, reproducible results, and quality monitoring when practicing DataOps. DataOps combines development and operations to continuously deliver high-quality data, bringing together data professionals from a range of roles. Examples are provided for implementing DataOps in practice with platforms such as Kubeflow and Paperspace.
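The project steps listed above (cleaning, splitting, training, validation) can be sketched as a minimal pipeline. This is an illustrative sketch only: the function names and the trivial mean-baseline "model" are assumptions for demonstration, not part of the source.

```python
import random
import statistics

def clean(rows):
    # Data cleaning: drop records with missing values
    return [r for r in rows if None not in r]

def split(rows, test_frac=0.25, seed=0):
    # Splitting: shuffle, then hold out a validation set
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def train(rows):
    # "Training" a trivial baseline: always predict the mean target
    mean_y = statistics.mean(y for _, y in rows)
    return lambda x: mean_y

def validate(model, rows):
    # Model validation: mean absolute error on held-out data
    return statistics.mean(abs(model(x) - y) for x, y in rows)

data = [(x, 2 * x) for x in range(20)] + [(None, 0)]
train_set, valid_set = split(clean(data))
model = train(train_set)
print(round(validate(model, valid_set), 2))
```

In a real project each step would use a data science stack (e.g. pandas and scikit-learn), but the step boundaries stay the same.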
29. Automated ML Accelerates Model Development
[Diagram] Input (enter data, define goals, apply constraints) → intelligently test multiple models in parallel → Output (optimized model)
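The diagram's core idea, fitting several candidate models in parallel and keeping the best scorer, can be sketched as follows. This is a hedged illustration: the candidate fitters, the `automl` helper, and the mean-absolute-error scoring are assumptions chosen for the example, not an actual AutoML implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

# Candidate "models": each maps training data to a prediction function.
def fit_mean(train):
    # Baseline: predict the mean of the training targets
    m = statistics.mean(y for _, y in train)
    return lambda x: m

def fit_linear(train):
    # Ordinary least squares for y = a*x + b
    xs = [x for x, _ in train]
    mx = statistics.mean(xs)
    my = statistics.mean(y for _, y in train)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def score(fit, train, valid):
    # Lower mean absolute error on held-out data is better
    model = fit(train)
    return statistics.mean(abs(model(x) - y) for x, y in valid)

def automl(candidates, train, valid):
    # Test multiple candidate models in parallel; return the best scorer
    with ThreadPoolExecutor() as pool:
        errors = list(pool.map(lambda f: score(f, train, valid), candidates))
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best](train)

train_data = [(x, 3 * x + 1) for x in range(10)]
valid_data = [(x, 3 * x + 1) for x in range(10, 15)]
model = automl([fit_mean, fit_linear], train_data, valid_data)
print(round(model(20), 1))  # → 61.0 (the linear fit wins on linear data)
```

Production AutoML systems additionally search over hyperparameters and feature pipelines, but the select-by-validation-score loop above is the same shape.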
30. Automated ML Customer Testimonials
• Press coverage from public preview: CNET, VentureBeat, PRNewswire
“I quite like your AutoML function. It gives me good results compared to
other libraries I tested before (TPOT and auto-sklearn), which I believe
were only looking at scores and often gave me models that overfit my
data. And of course the model from your suggested code is better.”
- Big oil company
“I will start with AutoML and use the algorithm that AutoML
recommends to further tune the model.”
- Data Scientist
“I actually enjoy being able to use AutoML in a Jupyter notebook. The
DataRobot interface was nice for non-experts, but for someone like me,
it felt a bit basic.”
- Data Scientist