End-to-end example: Consumer loan acceptance scoring using Kubeflow

Credo's presentation from the Kubeflow Summit 2019 on how Kubeflow can be used in the financial industry.


  1. End-to-end example: Consumer loan acceptance scoring using Kubeflow
     Radovan Parrak, Lead Data Scientist, Credo
     radovan.parrak@credo.be
  2. Situation
     European company; this example relates to credit acceptance scoring.
     ● Analytics environment
       ○ AWS
       ○ Banking: very high requirements on IT security and regulatory compliance
     ● Objectives
       ○ Put hundreds of data products live
       ○ A single development >> deployment >> delivery environment
       ○ Batch first, then also real-time
     ● Typical modelling project
       ○ Structured data
       ○ Supervised learning
       ○ Internalizing interpretable models and hybrid pipelines
  3. Requirements
     ● Data Scientists
       ○ Hybrid, integrated & cloud-based development environment
         ■ Python
         ■ PySpark (locally + remotely on a Spark cluster)
         ■ R
         ■ SQL
       ○ Version control (scripts & artifacts)
     ● ML DevOps
       ○ Seamless deployment of hybrid pipelines
         ■ dependency hassle (packages & data)
       ○ Trigger-based scheduling & orchestration of runs
       ○ Monitoring & dashboarding
       ○ Version control (runs & pipelines)
  4. Architecture
     ● AWS Cloud [existing]
       ○ Infrastructure, connections, security
       ○ S3, Spark cluster, virtual machines, ...
     ● AWS EKS: managed Kubernetes service
     ● Kubeflow (v0.6)
       ○ Notebook development environment
       ○ Pipelines for deployment & delivery
     ● AWS ECR
     ● Custom notebook servers
     ● ElasticStack: dashboarding environment
     ● Open challenges
       ○ Upgrading/migrating Kubeflow (> v1)
  5. Development
     Status quo
     ● Model fitting done via Kubeflow Notebooks
     ● Custom JupyterLab notebook servers
       ○ Python, R, (Py)Spark
       ○ SQL extension
     ● Kubeflow namespace used as a project directory
       ○ PVCs per namespace
       ○ PVs mounted from S3 buckets
       ○ Shared and private data per namespace
     Next tasks on our roadmap
     1. Remote Spark job submission
     2. Modelling pipeline templates
     3. Katib for hyper-parameter tuning in our use case (XGBoost)
  6. Deployment
     Status quo
     ● Kubeflow Pipelines as the primary deployment object
     ● Same underlying development & deployment containers
     ● Manual script & artifact injection
       ○ copy from the project namespace to the kubeflow namespace
       ○ referencing in ContainerOp
     Next tasks on our roadmap
     1. Explore and implement Fairing
     2. Deploy directly from the project namespace
     3. Explore Istio for A/B testing and canary deployments
  7. Delivery
     Status quo
     ● Parameterized batch runs (manual)
     ● Real-time delivery via a Flask app directly on Kubernetes
     ● Acceptance scores delivered to a dedicated S3 bucket
     ● Score monitoring via Kibana dashboards (ElasticStack)
     Next tasks on our roadmap
     1. Explore trigger-based scheduling & orchestration possibilities (cron?)
     2. Explore Metadata
     3. Explore Nuclio
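The real-time path above can be sketched as a small Flask app. Everything here is illustrative: the model is stubbed with a toy scoring function, and the endpoint path, feature names, and S3 bucket name are assumptions, with the S3 delivery shown as a commented-out boto3 call rather than presented as the actual implementation:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def score_application(features: dict) -> float:
    """Hypothetical stand-in for the fitted acceptance model.

    Returns a toy score in [0, 1] based on an income/loan ratio;
    NOT the real model from the presentation.
    """
    income = float(features.get("income", 0.0))
    amount = float(features.get("loan_amount", 1.0))
    total = income + amount
    ratio = income / total if total > 0 else 0.0
    return round(ratio, 4)

@app.route("/score", methods=["POST"])
def score():
    # Expects a JSON body with the applicant's features.
    features = request.get_json(force=True)
    acceptance_score = score_application(features)
    # In the deck, scores are also delivered to a dedicated S3 bucket,
    # e.g. with boto3 (bucket and key layout are made up here):
    # boto3.client("s3").put_object(
    #     Bucket="acceptance-scores",
    #     Key=f"scores/{features['application_id']}.json",
    #     Body=json.dumps({"score": acceptance_score}),
    # )
    return jsonify({"acceptance_score": acceptance_score})

# To serve in-cluster:
# app.run(host="0.0.0.0", port=8080)
```

Running the app as a plain Deployment plus Service on the same Kubernetes cluster keeps the real-time path next to the batch pipelines, which matches the "directly on Kubernetes" choice in the slide.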
  8. Thanks!
     Radovan Parrak, Lead Data Scientist
     www.credo.be
     radovan.parrak@credo.be