This document summarizes the agenda and key topics from a 4-day course on data science for finance. Day 4 focuses on deploying machine learning models in production and recapping the overall course. The presentation discusses the challenges of moving models from prototype to production and introduces QuSandbox as a platform for adopting data science and AI in the enterprise. QuSandbox provides tools for model management, experimentation and deployment through a user portal and APIs.
3. MODULE 1:
• Data Science in Finance
Orientation to the Credit risk case study
Lab 1:
Exploring datasets in Python to make sense of the data
MODULE 2:
• Machine Learning in 30 minutes!
Lab 2: Credit risk case study
Building your first model
Agenda
4. MODULE 3:
• Evaluating machine learning models: The metrics
Lab 3: Credit risk case study
Understanding and tuning your model
MODULE 4:
• Deployment of machine learning models and prediction through APIs
Lab 4: Credit risk case study
Deploying your model and predicting interest rates
Agenda
5.
Day 1: Data pre-processing & EDA
Day 2: Building a machine learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production; Recap
7.
Claim:
• Machine learning is better for fraud detection, identifying arbitrage opportunities and trade execution
Caution:
• Beware of imbalanced class problems
• A model that gives 99% accuracy may still not be good enough
1. Machine learning is not a generic solution to all problems
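To make the 99% accuracy caution concrete, here is a small Python sketch. The 1% fraud rate and the sample size are illustrative assumptions, not course data: a "model" that never flags fraud still scores 99% accuracy while detecting none of the fraud cases.

```python
import numpy as np

n = 10_000
y_true = np.zeros(n, dtype=int)
y_true[:100] = 1  # assumption: exactly 1% of cases are fraud

y_pred = np.zeros(n, dtype=int)  # naive "model": never flags fraud

accuracy = np.mean(y_pred == y_true)
# Recall: share of actual fraud cases that the model flags
recall = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)

print(f"accuracy = {accuracy:.3f}")  # 0.990
print(f"recall   = {recall:.3f}")    # 0.000: every fraud case is missed
```

Metrics that focus on the minority class, such as recall, precision or a confusion matrix, expose the problem that accuracy hides.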
8.
Claim:
• Our models work on the datasets we have tested them on
Caution:
• Do we have enough data?
• How do we handle bias in datasets?
• Beware of overfitting
• Historical analysis is not prediction
2. A prototype model is not your production model
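A minimal Python sketch of why "it works on the data we tested it on" is not enough. It assumes synthetic data with purely random labels (so there is no real signal to learn) and uses a 1-nearest-neighbour classifier, chosen here only because it memorizes its training set:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(400, 5))
y = rng.integers(0, 2, size=400)  # random labels: nothing real to learn

# Fit on the first 200 rows, evaluate on the last 200 (a simple holdout)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

def predict_1nn(X_fit, y_fit, X_query):
    """1-nearest-neighbour classifier: copy the label of the closest training row."""
    # Pairwise squared Euclidean distances, shape (n_query, n_fit)
    d2 = ((X_query[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=2)
    return y_fit[np.argmin(d2, axis=1)]

train_acc = np.mean(predict_1nn(X_train, y_train, X_train) == y_train)
test_acc = np.mean(predict_1nn(X_train, y_train, X_test) == y_test)
print(f"in-sample accuracy:     {train_acc:.2f}")  # 1.00: each row finds itself
print(f"out-of-sample accuracy: {test_acc:.2f}")   # typically near coin-flip level
```

The gap between in-sample and out-of-sample accuracy is the overfitting warning sign. With financial time series, the holdout period should also come after the training period, since historical analysis is not prediction.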
9.
AI and Machine Learning in Production
https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966
Kristy Roth from HSBC:
“It’s been somewhat easy - in a funny way - to get going using sample data, [but] then you hit the real problems,” Roth said.
“I think our early track record on PoCs or pilots hides a little bit the underlying issues.”
Matt Davey from Societe Generale:
“We’ve done quite a bit of work with RPA recently and I have to say we’ve been a bit disillusioned with that experience.”
“The PoC is the easy bit: it’s how you get that into production and shift the balance.”
10.
Claim:
• It works. We don’t know how!
Caution:
• It’s still not a proven science
• Interpretability or “auditability” of models is important
• Transparency in the codebase is paramount with the proliferation of open-source tools
• Skilled data scientists who are knowledgeable about algorithms and their appropriate usage are key to successful adoption
3. We are just getting started!
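One model-agnostic way to make a model more auditable is permutation feature importance. The sketch below assumes synthetic data and a plain least-squares model (neither comes from the course). It shuffles one feature at a time and reports how much the mean squared error grows; a large increase means the model relies heavily on that feature.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 1_000
X = rng.normal(size=(n, 3))
# Hypothetical target: depends strongly on feature 0, weakly on 1, not at all on 2
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fit ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def mse(X_eval):
    """Mean squared error of the fitted model against y for the given inputs."""
    pred = np.column_stack([np.ones(len(X_eval)), X_eval]) @ coef
    return np.mean((y - pred) ** 2)

baseline = mse(X)
importance = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
    importance.append(mse(X_perm) - baseline)

for j, imp in enumerate(importance):
    print(f"feature {j}: error increase = {imp:.3f}")
```

For brevity the error is measured on the training data here; in practice it is usually measured on a held-out set.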
11.
Claim:
• Machine learning models are more accurate than traditional models
Caution:
• Is accuracy the right metric?
• How do we evaluate the model? RMSE or R²?
• How does the model behave in different regimes?
4. Choose the right metrics for evaluation
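A sketch of computing RMSE and R² separately for each market regime. The observations, predictions and "calm"/"stressed" regime labels are hypothetical values made up for illustration:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 6.0, 1.0, 7.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 3.5, 4.0, 2.5, 5.0])
regime = np.array(["calm"] * 4 + ["stressed"] * 4)  # hypothetical regime labels

print(f"overall : RMSE={rmse(y, y_hat):.3f}  R2={r_squared(y, y_hat):.3f}")
for name in np.unique(regime):
    mask = regime == name
    print(f"{name:8s}: RMSE={rmse(y[mask], y_hat[mask]):.3f}  "
          f"R2={r_squared(y[mask], y_hat[mask]):.3f}")
```

In this toy data the model reaches R² = 0.98 in the calm regime but only about 0.52 in the stressed regime, which is the kind of difference a single aggregate number can hide.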
12.
Claim:
• Machine learning and AI will replace humans in most applications
Caution:
• Beware of the hype!
• Just because it worked sometimes doesn’t mean the organization can run on autopilot
• Will we have true AI or Augmented Intelligence?
• Managing model risk through robust risk management is paramount to the success of the organization.
• We are just getting started!
5. Are we there yet?
https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
15. QuSandbox: The platform for adopting Data Science and AI in the Enterprise
2018 Copyright QuantUniversity LLC.
16.
• QuSandbox is an end-to-end, workflow-based system that enables the creation and deployment of data science workflows within the enterprise, primarily for ML and AI applications.
• Our environment supports AWS and Google Cloud Platform and incorporates model and data provenance throughout the model development life cycle.
• The solution can also be hosted on-premises to leverage custom hardware and software integrations.
Executive Summary
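Model and data provenance means being able to say which data, code and software produced a given model. A minimal Python sketch of recording that alongside a trained model, assuming a hypothetical record layout and a placeholder code version string (this is not QuSandbox's actual format):

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata

def package_versions(names):
    """Look up installed versions; None marks a package that is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

def provenance_record(data: bytes, code_version: str, packages) -> dict:
    """Hypothetical provenance record: data fingerprint, code version, environment."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "code_version": code_version,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": package_versions(packages),
    }

# Hypothetical usage: in practice `data` would be the raw bytes of the training file
data = b"loan_id,grade,int_rate\n1,A,7.5\n2,C,13.1\n"
record = provenance_record(data, code_version="git:abc1234", packages=["numpy", "pandas"])
print(json.dumps(record, indent=2))
```

Hashing the raw data bytes means any change to the training file, however small, produces a different fingerprint.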
21.
Quant/Enterprise use cases
• Create an environment that can support multiple platforms and programming languages
• Enable remote running of applications
• Try out a GitHub submission or someone else’s code
• Facilitate the creation of Docker images to build replicable containers
• Create prototyping environments for Data Science/Quant teams
• Enable Data scientists/Quants to deploy their solutions
• Enable running multiple tasks and jobs
• Enable concurrent running of multiple experiments
• Integrate seamlessly with the cloud to scale up computations
Use cases
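A sketch of running several experiment configurations concurrently with Python's standard library. `run_experiment` is a hypothetical stand-in for training and scoring a model; here it evaluates a toy score that peaks at learning_rate=0.1, depth=4.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

def run_experiment(learning_rate, depth):
    """Hypothetical experiment: returns a made-up validation score for one config."""
    # Replace with real model training and evaluation
    score = -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 4) ** 2
    return {"learning_rate": learning_rate, "depth": depth, "score": score}

# Every combination of the candidate hyperparameter values (3 x 3 = 9 runs)
grid = list(product([0.01, 0.1, 0.3], [2, 4, 6]))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_experiment, lr, d) for lr, d in grid]
    results = [f.result() for f in as_completed(futures)]  # collected as runs finish

best = max(results, key=lambda r: r["score"])
print(f"ran {len(results)} experiments; best: {best}")
```

Threads keep the sketch portable. For CPU-heavy training in CPython, `ProcessPoolExecutor` (used under an `if __name__ == "__main__":` guard) is usually the better fit, and scaling further typically means dispatching the same jobs to cloud machines.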
22.
Fintech use cases
• Demonstrate solutions to enterprises
• Create customized enterprise trials for companies that don’t permit installation of vendor software prior to procurement
• Manage quick updates
• Enable effective integration and hosting of services (REST APIs)
• Deploy custom services on QuSandbox
Use cases
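A minimal sketch of hosting a model as a REST-style service, using only the Python standard library. The linear coefficients, feature names (`grade_num`, `dti`) and the `/predict` route are hypothetical placeholders, not the course's credit risk model or QuSandbox's API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical linear model for an interest rate, in percent
INTERCEPT = 5.0
COEFS = {"grade_num": 2.0, "dti": 0.05}

def predict_interest_rate(features: dict) -> float:
    """Score one applicant; raises KeyError if a required feature is missing."""
    return INTERCEPT + sum(coef * float(features[name]) for name, coef in COEFS.items())

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body, status = {"int_rate": predict_interest_rate(payload)}, 200
        except (ValueError, KeyError, TypeError) as exc:  # bad JSON or missing/invalid fields
            body, status = {"error": str(exc)}, 400
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(host="127.0.0.1", port=8000):
    """Start a blocking server.
    Try: curl -X POST localhost:8000/predict -d '{"grade_num": 3, "dti": 20}'
    """
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping the scoring logic in a plain function separates the model from the transport, so the same function can be unit-tested directly or moved behind a production web framework later.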
23.
Academic use cases
• Enable the creation of course material and exercises that can be shared
• Enable students and workshop participants to focus on data science experiments rather than on environment setup
Use cases
33.
Creating replicable environments
Create replicable environments (Code + software + data) through an easy point-and-click tool, and publish them to Docker Hub or manage them internally
Share them with target users
34.
User portal
• Run multiple experiments in pre-created environments (Code + software + data)
• Deploy your own solutions
• Run any Docker image or GitHub submission on the cloud
41.
Day 1: Data pre-processing & EDA
Day 2: Building a machine learning model
Day 3: Evaluating different models and model selection
Day 4: Deploying your model in production; Recap
42.
• Complete the assessments for the certificate. The deadline is 9/15/2018.
Next Steps
45.
About us:
• Data Science, Quant Finance and Machine Learning Startup
• Technologies: MATLAB, Python and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
46. Thank you for attending Day 4!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.