Pariveda's Ryan Gross presented on the ways that companies are transforming themselves using data and data science. Many of the challenges that organizations run into are cultural and/or process related. The presentation goes through a framework for getting your organization started successfully with Data Science.
Readiness for DS Value Realization (Forming → Storming → Norming → Performing)

Forming
• Not currently in a position to build a production machine learning solution
• Requires major data engineering to get ready
• Can also implement POCs where data is ready to show the benefits of ML to executives

Storming
• Putting first machine learning MVP solutions into production
• Educating the business on utilizing ML predictions
• Sourcing data for ML models ad hoc
• Hiring leadership roles on the data science team

Norming
• Building data lake platforms and data governance processes to make data available
• Using change management to adopt ML solutions across the business
• Filling out roles on the data science team
• Business actively looking

Performing
• Executing against a value-driven backlog of ML opportunities
• Scaling up the data science function, supplementing with platforms
• Business understands data-driven decision making
• Utilizing controlled experiments during roll-out
Organizations now collect an enormous volume of data about their customers, products, and processes
Those organizations with the capability to turn data into actionable insights leap ahead of their competitors – delighting customers, reducing costs, and opening new markets
Human experts and traditional decision support tools become overwhelmed as the amount of data increases
Distinguishing subtle differences across hundreds or thousands of interacting factors is difficult
The sheer volume of data to be analyzed becomes a limiting factor
Machine learning is being used by a rapidly increasing number of organizations to overcome the challenge of generating insights from large data sets
Cheap computing power and easy access to advanced algorithms have lowered the barrier to entry – ML is not just for academics or bleeding-edge companies anymore
There are three kinds of data problems that really push the big data envelope, commonly known as the Vs of Big Data. Most people agree on at least three of them (volume, velocity, and variety) because they help us categorize the technical problem, while the others tend to be dimensions of the solution or business problem.
The value is in decoupling storage from processing
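To make the decoupling concrete, here is a minimal sketch in plain Python. A local JSON-lines file stands in for object storage (e.g. S3); the writer and the reader are hypothetical illustrations, not part of any actual reference architecture, and the point is only that the processing step can be rerun or replaced without touching how the data is stored:

```python
import json
import tempfile
from pathlib import Path

def store(records, path):
    """Persist raw records as JSON lines -- a stand-in for object storage.
    Storage knows nothing about how the data will later be processed."""
    with open(path, "a") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def process(path):
    """A separate, independently scalable step reads the stored data and
    computes an aggregate. It can be changed without changing storage."""
    totals = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            totals[record["region"]] = totals.get(record["region"], 0) + record["amount"]
    return totals

# Example: two independent writers, one later processing pass
lake = Path(tempfile.mkdtemp()) / "sales.jsonl"
store([{"region": "east", "amount": 10}, {"region": "west", "amount": 5}], lake)
store([{"region": "east", "amount": 7}], lake)
print(process(lake))  # {'east': 17, 'west': 5}
```

Because storage and processing only meet at the file format, either side can scale or evolve on its own schedule.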
Pariveda Solutions has developed a Big Data/AWS reference architecture referred to as the “analytics pipeline”
Expands on the familiar [ Extract ] [ Transform ] [ Load ] pattern
Terminology-wise, this maps directly to the Analytics Pipeline
Extract = Ingest/Collect
Transform = Store (Model)/Process (Enhance/Transform)
Load = Consume/Visualize (Distribute)
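The ETL-to-pipeline mapping above can be sketched as a chain of stages. The function names mirror the pipeline terminology from the slide; the bodies are toy placeholders, not the actual reference architecture:

```python
def ingest(source):
    """Extract -> Ingest/Collect: pull raw events from a source system."""
    return list(source)

def store(raw):
    """Transform -> Store (Model): land the raw data in a modeled form."""
    return [{"value": float(x)} for x in raw]

def process(stored):
    """Transform -> Process (Enhance/Transform): derive an analytic result."""
    return sum(row["value"] for row in stored) / len(stored)

def consume(result):
    """Load -> Consume/Visualize (Distribute): hand the result to consumers."""
    return f"average = {result:.1f}"

# Run the pipeline end to end on a toy source
print(consume(process(store(ingest([1, 2, 3, 6])))))  # average = 3.0
```

In a real deployment each stage would be a separate service or job, but the data flow between them is the same.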
The data science work provides insight and value, but operationalizing that work is the challenge.
We haven’t yet fully figured out how data science work fits into the agile software development process
More Info: Jon Landers, Ryan Gross
One thing we didn’t cover in the prior section is how to get your models into production
Most of the marketecture around Machine Learning will tell you that it’s this easy to deploy your first model as a RESTful service!
But then you remember that you’re modeling the real world, and you’ve read some HBR articles about how the real world changes fast, so you’ll need to retrain the model to keep up
So you build an API to trigger retraining and validation, and trigger it on a timer
Then you build automated deployment of the new model and everyone is happy: the marketecture was right!
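The retrain-validate-deploy loop described above can be sketched in a few lines. Everything here is a simplified stand-in: `MeanModel` is a toy estimator, the `deployed_model` variable stands in for whatever the RESTful service would serve, and the validation tolerance is an arbitrary assumption, but the gating logic (only swap in a candidate that passes validation) is the point:

```python
import statistics

class MeanModel:
    """Toy 'model' that predicts the training mean; stands in for a real estimator."""
    def fit(self, y):
        self.mean_ = statistics.mean(y)
        return self

    def predict(self):
        return self.mean_

deployed_model = None  # what the (hypothetical) REST endpoint would serve

def validate(model, holdout, tolerance=2.0):
    """Gate a candidate: mean absolute error on a holdout must stay small."""
    mae = statistics.mean(abs(model.predict() - y) for y in holdout)
    return mae <= tolerance

def retrain_and_deploy(train, holdout):
    """What the timer-triggered retraining API would do: fit, validate,
    and only swap in the new model if validation passes."""
    global deployed_model
    candidate = MeanModel().fit(train)
    if validate(candidate, holdout):
        deployed_model = candidate
        return True
    return False

# First scheduled run: the candidate passes validation and is deployed
assert retrain_and_deploy(train=[10, 11, 9], holdout=[10, 12])
print(deployed_model.predict())  # prints 10
```

A candidate that fails validation leaves the currently deployed model untouched, which is exactly why the validation gate sits between retraining and deployment.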
More Info: Ryan Gross
Let’s go back to your prediction pipeline
When the next transaction, day, week, or month goes by, we can check our predictions against the actual values, detecting the need to re-train our models using the new actuals.
If the prediction is off by too much, we can alert the team so they can figure out why.
If, as is common, it’s just drift because the real world changed, we can re-train the model
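The prediction-versus-actuals check above can be sketched as a simple drift monitor. This is a deliberately simplistic illustration, not the actual pipeline: the 10% threshold is an arbitrary assumption, and a real system would alert the team for investigation rather than jump straight to retraining:

```python
def check_drift(predictions, actuals, threshold=0.1):
    """Compare last period's predictions to the actuals that have since
    arrived. Returns 'ok', or 'retrain' when the mean absolute percentage
    error exceeds the threshold, signaling drift worth acting on."""
    errors = [abs(p - a) / abs(a) for p, a in zip(predictions, actuals)]
    mape = sum(errors) / len(errors)  # mean absolute percentage error
    return "retrain" if mape > threshold else "ok"

# Predictions held up this period...
print(check_drift([100, 205, 150], [102, 200, 148]))  # ok
# ...but drifted badly the next, so we alert and retrain on the new actuals
print(check_drift([100, 205, 150], [140, 260, 200]))  # retrain
```

Running this check on every new batch of actuals is what closes the loop between monitoring and the scheduled retraining described earlier.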