
Data science apps: beyond notebooks

Jupyter notebooks are transforming the way we look at computing, coding and problem solving. But is this the only “data scientist experience” that this technology can provide?

In this webinar, Natalino will sketch how you could use Jupyter to create interactive and compelling data science web applications and provide new ways of data exploration and analysis. In the background, these apps are still powered by well understood and documented Jupyter notebooks.

He will present an architecture composed of four parts: a server-only Jupyter gateway, a Scala/Spark Jupyter kernel, a Spark cluster, and an Angular/Bootstrap web application.

Published in: Data & Analytics

  1. Data Science Apps: Beyond Notebooks. Natalino Busa, Head of Data Science
  2. LinkedIn and Twitter: @natbusa
  3. Learning: The Scientific Method. Hans Christian Ørsted, "First Introduction to General Physics" (1811): observation, hypothesis, deduction, synthesis, experiment. Source: https://en.m.wikipedia.org/wiki/History_of_scientific_method. Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC 3.0 BY.
  4. Data Scientist Experience
  5. CloudTools AI & ML
  6. The Jupyter Project: http://jupyter.org
  7. Jupyter notebook: what is it? The Jupyter Notebook
  8. Jupyter notebook: why?
     • Language of choice: The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.
     • Share notebooks: Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.
     • Interactive widgets: Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.
     • Big data integration: Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.
  9. Text Cell, Code Cell, Cell Input, Cell Output. Menus: Edit, Run, Kernel, Widgets. Kernel Type. Cell output: ASCII, HTML, image, etc.
  10. Architecture of a Jupyter Notebook: ∅MQ, Notebook files, HTTP, Websockets
  11. Architecture of a Jupyter Notebook
      • Modular architecture: Web App, Server, Kernel
      • Kernels: Python, R, Scala, Julia, Bash, SPARKQL
      • Web App: asynchronous, rich editing, syntax highlighting, export and share
  12. Jupyter Notebook: Narratives and Use Cases. Narratives are collaborative, shareable, publishable, and reproducible. We believe that Narratives help both yourself and other researchers by sharing your use of Jupyter projects, technical specifics of your deployment, and installation and configuration tips so that others can learn from your experiences. From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html
  13. Jupyter is more than Notebooks
  14. Examples of Jupyter-powered narratives
  15. Orioles: A powerful educational narrative
  16. Orioles: A powerful educational narrative: ∅MQ, Notebook files, HTTP, Websockets, Video files, Docker Containers
  18. Build your own narrative! What do you need?
      1. Understand how to communicate with the Jupyter server. Two ways: websockets or HTTP API endpoints.
      2. Build your own web application. Many ways: e.g. Angular, Polymer, Dart, etc.
  19. Example: autoscience demo. Purpose:
      • Quick exploration of data sets
      • No coding required
      • Visual analysis of outliers
  24. Jupyter Gateway: expose API endpoints. Declare the endpoint. Produce the JSON payload. GET http://localhost:8800/cog/datasets/1
  25. Jupyter Gateway: consume the data. Consume the JSON payload: GET http://localhost:8800/cog/datasets/1

        app.controller('datasetCtrl', function ($scope, $routeParams, $http) {
          var id = $routeParams.id;
          $http({
            method: 'GET',
            url: '/cog/datasets/' + id
          }).then(function successCallback(response) {
            // this callback will be called asynchronously
            // when the response is available
            $scope.d = response.data;
          }, function errorCallback(response) {
            // called asynchronously if an error occurs
            // or server returns response with an error status.
          });
        });
  26. Jupyter Gateway: consume the data. Render the Angular scope object ($scope.d):

        <div class="row">
          <div class="col-md-9 offset-md-2">
            <p class="small">{{d.ds.rows}} obs. of {{d.ds.cols}} variables <br/>
            NA rows:{{d.ds.na.rows}}, columns:{{d.ds.na.cols}}</p>
          </div>
        </div>
        ...
        <tr ng-repeat="v in d.vars">
          <td><a href="#/ds/{{d.ds.id}}/variables/{{v.id}}">{{v.name}}</a></td>
          <td class="small">{{ v.sample.toString() }}</td>
          <td>{{v.type.vtype}}</td>
          <td>{{v.type.tcoerce}}</td>
          <td>{{v.type.unique}}</td>
          <td>{{v.type.nan}}</td>
          <td>{{v.type.valid}}</td>
          <td>{{v.type.quality}}</td>
        ...
  28. Jupyter: Docker stacks. Docker container: Jupyter notebook + Apache Toree
  29. Dockerize your Jupyter gateway API. Add the Jupyter gateway, then add the notebook which powers the API:

        FROM jupyter/all-spark-notebook
        ...
        # add some extra packages
        ADD packages /srv/
        RUN pip install -r /srv/packages
        # install the kernel gateway
        RUN pip install jupyter_kernel_gateway
        ENV JUPYTER_GATEWAY=1
        # REST API is designed as notebooks
        ADD notebooks /srv/notebooks
  30. Dockerize your Jupyter gateway API

        IMAGE=autoscience/kernel_gateway
        docker build -t $(IMAGE) .
        docker run --rm -ti -p 8888:8888 $(IMAGE) \
            jupyter kernelgateway \
            --KernelGatewayApp.ip=0.0.0.0 \
            --KernelGatewayApp.port=8888 \
            --KernelGatewayApp.api=notebook-http \
            --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb
  31. Dockerize your Jupyter gateway API: ∅MQ, Notebook files, HTTP REST API, Docker Containers
  32. Summary
      • Jupyter notebook is a great way to create and share data-driven use cases and projects
      • Jupyter is more than notebooks: gateway, kernels, hub, etc.
      • Narratives powered by Jupyter: O’Reilly Orioles; build your own (autoscience example)
  33. Resources
  34. LinkedIn and Twitter: @natbusa
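
Slide 24 names two steps, declaring the endpoint and producing the JSON payload, but the notebook cells themselves are not part of this transcript. The following is a minimal sketch of what such a cell could look like, assuming a Python kernel and the kernel gateway's notebook-http mode, where a leading comment such as `# GET /cog/datasets/:id` binds a cell to a route, the request arrives as a JSON string named `REQUEST`, and whatever the cell prints becomes the response body. The `DATASETS` dictionary, the `dataset_payload` helper and all values are hypothetical; only the field names are taken from the template on slide 26.

```python
import json

# Hypothetical stand-in for the real data set summaries. Field names follow
# the template on slide 26; the values are made up for illustration.
DATASETS = {
    "1": {
        "ds": {"id": 1, "rows": 150, "cols": 1, "na": {"rows": 0, "cols": 0}},
        "vars": [
            {
                "id": 0,
                "name": "sepal_length",
                "sample": [5.1, 4.9, 4.7],
                "type": {
                    "vtype": "numeric",
                    "tcoerce": "float",
                    "unique": 35,
                    "nan": 0,
                    "valid": 150,
                    "quality": 1.0,
                },
            }
        ],
    }
}


def dataset_payload(request_json):
    """Return the JSON summary for the data set id found in the request path."""
    request = json.loads(request_json)
    dataset_id = request.get("path", {}).get("id")
    # Unknown ids yield an empty object in this sketch.
    return json.dumps(DATASETS.get(dataset_id, {}))


# The gateway injects REQUEST; this fallback only lets the sketch run standalone.
REQUEST = globals().get("REQUEST", json.dumps({"path": {"id": "1"}}))

# In the notebook, the two lines below would form their own cell, with the
# route annotation as the first line of that cell:
# GET /cog/datasets/:id
print(dataset_payload(REQUEST))
```

Keeping the lookup in a plain function and leaving only the `print` call in the annotated cell makes the payload logic testable outside the gateway.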

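
The template on slide 26 implies a particular payload shape: a `ds` object with row, column and NA counts, and a `vars` list whose entries carry a `sample` and a `type` object. As a rough illustration of that shape, the sketch below walks the same fields in Python and builds the values the Angular template would display. The function names and the sample payload are hypothetical; only the field names come from the slide.

```python
def summarize_dataset(d):
    """Build the two summary lines the template renders from the payload."""
    ds = d["ds"]
    header = f'{ds["rows"]} obs. of {ds["cols"]} variables'
    missing = f'NA rows:{ds["na"]["rows"]}, columns:{ds["na"]["cols"]}'
    return [header, missing]


def variable_rows(d):
    """Return one row per variable, in the column order of the ng-repeat table."""
    rows = []
    for v in d["vars"]:
        t = v["type"]
        rows.append([
            v["name"],
            # Comma-joined like JavaScript's Array.prototype.toString for simple values
            ",".join(str(s) for s in v["sample"]),
            t["vtype"],
            t["tcoerce"],
            t["unique"],
            t["nan"],
            t["valid"],
            t["quality"],
        ])
    return rows


# Hypothetical payload with made-up values, shaped as the template expects.
sample_payload = {
    "ds": {"id": 1, "rows": 150, "cols": 1, "na": {"rows": 0, "cols": 0}},
    "vars": [
        {
            "id": 0,
            "name": "sepal_length",
            "sample": [5.1, 4.9, 4.7],
            "type": {
                "vtype": "numeric",
                "tcoerce": "float",
                "unique": 35,
                "nan": 0,
                "valid": 150,
                "quality": 1.0,
            },
        }
    ],
}

for line in summarize_dataset(sample_payload):
    print(line)
for row in variable_rows(sample_payload):
    print(row)
```

A payload missing any of these keys would raise a `KeyError` here, whereas the Angular template would simply render an empty value.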