Data Science in Ruby? Is it possible? Is it fast? Should we use it?
• Rodrigo Urubatan
• rodrigo@urubatan.dev
• http://urubatan.dev
• http://twitter.com/urubatan
Anyone here work
with Data Science?
• Data Scientist?
• Data Engineer?
• Developers of applications that use data?
• Statisticians?
What exactly
is Data
Science?
The process of extracting meaning from data and interpreting it
The use of statistics and machine learning to clean and manipulate data
The use of computer software to collect, clean, manipulate, and interpret data
A cool name for the combination of Data Mining and Business Intelligence (older buzzwords that were used for a long time for exactly what we call Data Science today, but with more expensive tool sets)
Can Ruby do Data
Science?
Can Ruby do
Data Science?
(Long Answer)
INTEGRATION WITH OTHER TOOLS
DATA MANIPULATION
DISTRIBUTED COMPUTING
DATA STRUCTURES
DATA SETS
STATISTICS
VISUALIZATION
INTERACTIVE COMPUTING
Interactive
Computing
iruby — Ruby kernel
for Jupyter.
iruby-rails —
Integration library for
IRuby and Rails.
Standing on
the shoulders
of giants
(integration)
pycall — Bridge into
the Python world.
rserve-client — Ruby
connector for Rserve,
R's binary server.
Data
manipulation
kiba — lightweight Ruby ETL
(Extract-Transform-Load)
framework.
jongleur — Workflow
manager using DAG
definitions to execute ETL
tasks
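To make the Extract-Transform-Load idea concrete, here is a minimal plain-Ruby sketch. It does not use kiba or jongleur and does not reflect their DSLs; the in-memory `source`, the row shape, and the cleaning rule are all assumptions chosen for illustration.

```ruby
# Hypothetical in-memory source standing in for a CSV file or a database table.
source = ["alice,30", "bob,", "carol,41"]

# Extract: split each raw line into fields (the -1 keeps trailing empty fields).
extracted = source.lazy.map { |line| line.split(",", -1) }

# Transform: drop rows with a missing age, then normalize names and types.
transformed = extracted
  .reject { |(_name, age)| age.to_s.empty? }
  .map { |(name, age)| { name: name.capitalize, age: Integer(age) } }

# Load: write the cleaned rows to a destination (an Array here, for simplicity).
destination = []
transformed.each { |row| destination << row }

puts destination.inspect
```

The lazy enumerator keeps the three stages separate while still processing one row at a time, which is roughly the shape an ETL framework formalizes.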
Distributed
Computing
ruby-spark — Ruby
Interface to Apache
Spark 1.x.x.
jruby-spark — JRuby
based bindings
for Apache Spark.
Data
Structures
daru — Data Frame and Vector
structures with comprehensive
manipulation and visualization methods.
numo-narray — n-dimensional
Numerical Array for Ruby.
nmatrix — dense and sparse linear
algebra library for Ruby via SciRuby.
Data Sets
rdatasets — Data sets
available in R via Rdatasets.
red-datasets — Growing
collection of publicly
available data sets such as
CIFAR-10, Iris, MNIST etc
Statistics
rb-gsl — Ruby interface to the GNU
Scientific Library. [dep: GSL]
simple_stats — Enumerable patches
for descriptive statistics.
enumerable-statistics — fast
implementation of descriptive
statistics for the Enumerable module.
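As a rough illustration of what "descriptive statistics for Enumerable" means, here is a minimal plain-Ruby sketch. The module name `SketchStats` and its methods are hypothetical; they are not the API of simple_stats or enumerable-statistics, which should be preferred in real code.

```ruby
# Hypothetical helper module, for illustration only.
module SketchStats
  module_function

  # Arithmetic mean; sum(0.0) forces floating-point arithmetic.
  def mean(values)
    values.sum(0.0) / values.count
  end

  # Sample variance (divides by n - 1).
  def variance(values)
    m = mean(values)
    values.sum(0.0) { |v| (v - m)**2 } / (values.count - 1)
  end

  # Sample standard deviation.
  def stddev(values)
    Math.sqrt(variance(values))
  end
end

data = [2, 4, 4, 4, 5, 5, 7, 9]
puts SketchStats.mean(data)
puts SketchStats.variance(data)
puts SketchStats.stddev(data)
```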
Visualization
• matplotlib — Ruby based wrapper
around matplotlib. [dep: matplotlib]
• mathematical — PNG and MathML
renderings for your equations.
• daru-view — interactive plotting gem
for web applications (any Ruby web
application framework such as Rails,
Sinatra, Nanoc, or Hanami) and IRuby
notebooks. It is a plugin gem for daru.
• daru-plotly — Plotly based
visualization for Daru.
The 3 Major
Ruby Data
Science
Projects
SciRuby project
NMatrix-centric gems
NMatrix
Daru
GnuplotRB
Statsample
Ruby Numo project
Numo::NArray-centric gems
Numo::NArray
Numo::FFTE
Numo::FFTW
Numo::Gnuplot
RedDataTools project
Apache Arrow-centric gems
RedArrow
RedChainer
RedArrowGSL
RedArrowNMatrix
RedArrowNumoNArray
Doing data science in
Ruby is Hard!
Ruby
X
Python
Ruby
Daru
NMatrix/NArray
Python
Pandas
Numpy
Simple number operation with numpy
“Same” number operation with NMatrix
Simple DataFrame operation with Pandas
“Same” DataFrame operation with Daru
Ruby and Ruby on Rails are
way better for writing business
web applications!
We can even do
really good Machine
Learning with Ruby
(but that is a subject
for another
presentation)
And my objective is to
help Ruby developers use
the best tools for each job so
they can solve hard
problems with fewer bugs and
have more free time.
pycall to the
rescue
pycall lets you use Python libraries from
your Ruby code very naturally, as if you
were calling a Ruby library
pycall consists of a Ruby binding
library for libpython.so and an
object-oriented protocol for communication
between Ruby and Python
Simple pycall
code
Ok, so what
are the best
work
patterns?
Python is way better than Ruby for
Data Science
Ruby is better for business web
applications
Best patterns for integration are
(IMHO)
• Pointing both applications to the same
database
• Exchanging data through JSON or some similar
serialization
• Calling Python directly through pycall
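The JSON exchange pattern can be sketched from the Ruby side as follows. This is only an illustration under stated assumptions: the field names (`rows`, `predictions`, `id`, `amount`, `score`) are hypothetical, and the hard-coded `response` string stands in for whatever a Python service or script would actually return.

```ruby
require "json"

# Hypothetical records the Ruby application wants analyzed.
rows = [
  { "id" => 1, "amount" => 19.9 },
  { "id" => 2, "amount" => 5.0 }
]

# Serialize the rows so they can be sent over HTTP, a queue, or a shared file.
payload = JSON.generate({ "rows" => rows })

# Stand-in for the JSON document the Python side would send back (assumed shape).
response = '{"predictions": [{"id": 1, "score": 0.87}, {"id": 2, "score": 0.12}]}'

# Parse the reply and index the scores by record id.
scores = JSON.parse(response)
             .fetch("predictions")
             .to_h { |prediction| [prediction["id"], prediction["score"]] }

puts payload
puts scores.inspect
```

Because both sides only agree on a JSON shape, the Ruby web application and the Python analysis code can be deployed and changed independently.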
References
• RubyConf 2017 – Using Ruby in Data Science by Kenta Murata (@mrkn)
• Big Data analysis in Ruby
• Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst)
• Progress of Ruby/Numo: Numerical Computing for Ruby
• SciRuby
• Ruby::Numo
• Ruby Machine Learning resources
• Ruby Data Science Resources
• PyCall
Any questions? Talk to
me!
• @urubatan
• https://urubatan.dev
• rodrigo@urubatan.dev
Other Data
Structure
Libraries
• spreadsheet — manipulation library for MS
Excel spreadsheets
• mdarray — Array structure for JRuby
• cumo — CUDA-aware numerical Array library
with NArray similar interface.
Other statistics libraries
statsample — basic and advanced statistics for Ruby. [dep: GSL]
statsample-glm — extension of statsample by Generalized Linear Models.
statsample-bivariate-extension — extension of statsample by Bivariate Correlations.
statsample-timeseries — extension of statsample by Time Series estimators.
pca — Principal Component Analysis (PCA) in Ruby.
descriptive-statistics — descriptive extensions for the Enumerable module or standalone usage.
distribution — probabilistic distributions and descriptive measures for them.
statistics2 — Normal, Chi-square, t- and F- probability distributions for Ruby.
General
Format IO
• https://github.com/fiksu/rcsv
• ox — Optimized for speed XML parser and
object marshaller.
• oj — High-speed JSON parser.
• Markdown
• Nokogiri
• CSV
Database
Adapters
• pg
• Mongo
• MySQL

Mais conteúdo relacionado

Mais procurados

Combinational circuit and Sequential circuit
Combinational circuit and Sequential circuitCombinational circuit and Sequential circuit
Combinational circuit and Sequential circuitPoornima Santhosh
 
Basic Concepts of Networking
Basic Concepts of NetworkingBasic Concepts of Networking
Basic Concepts of NetworkingVivin NL
 
Chap ii.BCD code,Gray code
Chap ii.BCD code,Gray codeChap ii.BCD code,Gray code
Chap ii.BCD code,Gray codeBala Ganesh
 
Error detection & correction codes
Error detection & correction codesError detection & correction codes
Error detection & correction codesRevathi Subramaniam
 
Matlab: Procedures And Functions
Matlab: Procedures And FunctionsMatlab: Procedures And Functions
Matlab: Procedures And Functionsmatlab Content
 
COMPUTER NETWORKING
COMPUTER NETWORKINGCOMPUTER NETWORKING
COMPUTER NETWORKINGKiran Buriro
 
Network Fundamentals: Ch6 - Addressing the Network IP v4
Network Fundamentals: Ch6 - Addressing the Network IP v4Network Fundamentals: Ch6 - Addressing the Network IP v4
Network Fundamentals: Ch6 - Addressing the Network IP v4Abdelkhalik Mosa
 
Microelectronic circuits and devices: chapter one
Microelectronic circuits and devices: chapter oneMicroelectronic circuits and devices: chapter one
Microelectronic circuits and devices: chapter onemuhabaw amare
 

Mais procurados (12)

Logic families
Logic  familiesLogic  families
Logic families
 
Combinational circuit and Sequential circuit
Combinational circuit and Sequential circuitCombinational circuit and Sequential circuit
Combinational circuit and Sequential circuit
 
3 capa de red
3 capa de red3 capa de red
3 capa de red
 
Basic Concepts of Networking
Basic Concepts of NetworkingBasic Concepts of Networking
Basic Concepts of Networking
 
Chap ii.BCD code,Gray code
Chap ii.BCD code,Gray codeChap ii.BCD code,Gray code
Chap ii.BCD code,Gray code
 
Iot module2
Iot module2Iot module2
Iot module2
 
Error detection & correction codes
Error detection & correction codesError detection & correction codes
Error detection & correction codes
 
Matlab: Procedures And Functions
Matlab: Procedures And FunctionsMatlab: Procedures And Functions
Matlab: Procedures And Functions
 
COMPUTER NETWORKING
COMPUTER NETWORKINGCOMPUTER NETWORKING
COMPUTER NETWORKING
 
Network Fundamentals: Ch6 - Addressing the Network IP v4
Network Fundamentals: Ch6 - Addressing the Network IP v4Network Fundamentals: Ch6 - Addressing the Network IP v4
Network Fundamentals: Ch6 - Addressing the Network IP v4
 
Microelectronic circuits and devices: chapter one
Microelectronic circuits and devices: chapter oneMicroelectronic circuits and devices: chapter one
Microelectronic circuits and devices: chapter one
 
Network layer logical addressing
Network layer logical addressingNetwork layer logical addressing
Network layer logical addressing
 

Semelhante a Data science in ruby is it possible? is it fast? should we use it?

Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?Rodrigo Urubatan
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Benjamin Nussbaum
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"Portland R User Group
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFMLconf
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpcDr Reeja S R
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 

Semelhante a Data science in ruby is it possible? is it fast? should we use it? (20)

Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Session 2
Session 2Session 2
Session 2
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Bertenthal
BertenthalBertenthal
Bertenthal
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 

Mais de Rodrigo Urubatan

2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...Rodrigo Urubatan
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d frameworkRodrigo Urubatan
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuídaRodrigo Urubatan
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRodrigo Urubatan
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddRodrigo Urubatan
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotoRodrigo Urubatan
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problemsRodrigo Urubatan
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JSRodrigo Urubatan
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Rodrigo Urubatan
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JSRodrigo Urubatan
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextRodrigo Urubatan
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores javaRodrigo Urubatan
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPRodrigo Urubatan
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu clienteRodrigo Urubatan
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Rodrigo Urubatan
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time agoRodrigo Urubatan
 

Mais de Rodrigo Urubatan (20)

Ruby code smells
Ruby code smellsRuby code smells
Ruby code smells
 
2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d framework
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuída
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDD
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bdd
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remoto
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problems
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores java
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HP
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
 
Mini curso rails 3
Mini curso rails 3Mini curso rails 3
Mini curso rails 3
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time ago
 

Último

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Último (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Data Science in Ruby? Is it possible? Is it fast? Should we use it?

  • 1. Data Science in Ruby? Is it possible? Is it Fast? Should we use it? • Rodrigo Urubatan • rodrigo@urubatan.dev • http://urubatan.dev • http://twitter.com/urubatan
  • 2. Anyone here work with Data Science? • Data Scientists? • Data Engineers? • Developers of applications that use data? • Statisticians?
  • 3. What exactly is Data Science? • The process of extracting meaning from data and interpreting it • The use of statistics and machine learning to clean and manipulate data • The use of computer software to collect, clean, manipulate and interpret data • A cool name for the combination of Data Mining and Business Intelligence (other buzzwords that were used for a long time for exactly what we call Data Science today, but with more expensive tool sets)
  • 4. Can Ruby do Data Science?
  • 5. Can Ruby do Data Science? (Long Answer) • INTEGRATION WITH OTHER TOOLS • DATA MANIPULATION • DISTRIBUTED COMPUTING • DATA STRUCTURES • DATA SETS • STATISTICS • VISUALIZATION • INTERACTIVE COMPUTING
  • 6. Interactive Computing • iruby: Ruby kernel for Jupyter. • iruby-rails: Integration library for IRuby and Rails.
  • 7. Standing on the shoulders of giants (integration) • pycall: Bridge into the Python world. • rserve-client: Ruby connector for Rserve, R's binary server.
  • 8. Data manipulation • kiba: Lightweight Ruby ETL (Extract-Transform-Load) framework. • jongleur: Workflow manager using DAG definitions to execute ETL tasks.
  • 9. Distributed Computing • ruby-spark: Ruby interface to Apache Spark 1.x.x. • jruby-spark: JRuby-based bindings for Apache Spark.
  • 10. Data Structures • daru: Data Frame and Vector structures with comprehensive manipulation and visualization methods. • numo-narray: n-dimensional Numerical Array for Ruby. • nmatrix: Dense and sparse linear algebra library for Ruby via SciRuby.
  • 11. Data Sets • rdatasets: Data sets available in R via Rdatasets. • red-datasets: Growing collection of publicly available data sets such as CIFAR-10, Iris, MNIST, etc.
  • 12. Statistics • rb-gsl: Ruby interface to the GNU Scientific Library. [dep: GSL] • simple_stats: Enumerable patches for descriptive statistics. • enumerable-statistics: Fast implementation of descriptive statistics for the Enumerable module.
  • 13. Visualization • matplotlib: Ruby-based wrapper around matplotlib. [dep: matplotlib] • mathematical: PNG and MathML renderings for your equations. • daru-view: Interactive plotting gem for web applications (any Ruby web application framework such as Rails/Sinatra/Nanoc/Hanami) and IRuby notebooks. It is a plugin gem for daru. • daru-plotly: Plotly-based visualization for Daru.
  • 14. The 3 Major Ruby Data Science Projects • SciRuby project (NMatrix-centric gems): NMatrix, Daru, GnuplotRB, Statsample • Ruby Numo project (Numo::NArray-centric gems): Numo::NArray, Numo::FFTE, Numo::FFTW, Numo::Gnuplot • RedDataTools project (Apache Arrow-centric gems): RedArrow, RedChainer, RedArrowGSL, RedArrowNMatrix, RedArrowNumoNArray
  • 15. Doing data science in Ruby is Hard!
  • 21. Ruby and Ruby on Rails are way better to write business web applications!
  • 22. We can even do really good Machine Learning with Ruby (but that is a subject for another presentation)
  • 23. And my objective is to help Ruby developers use the best tools for each job, so they can solve hard problems with fewer bugs and have more free time.
  • 24. pycall to the rescue • pycall lets you use Python libraries from your Ruby code very naturally, as if you were calling a Ruby library. • pycall consists of a Ruby binding library for libpython.so and an object-oriented protocol for communication between Ruby and Python.
  • 26. Ok, so what are the best work patterns? Python is way better than Ruby for Data Science. Ruby is better for web business applications. Best patterns for integration are (IMHO): • Pointing both applications to the same database • Exchanging data through JSON or some similar serialization • Calling Python directly through pycall
  • 27. References • Ruby Conf 2017 – Using Ruby in Data Science by Kenta Murata (@mrkn) • Big Data analysis in Ruby • Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst) • Progress of Ruby/Numo: Numerical Computing for Ruby • SciRuby • Ruby::Numo • Ruby Machine Learning resources • Ruby Data Science Resources • PyCall
  • 28. Any questions? Talk to me! • @urubatan • https://urubatan.dev • rodrigo@urubatan.dev
  • 29. Other Data Structure Libraries • spreadsheet: Manipulation library for MS Excel spreadsheets. • mdarray: Array structure for JRuby. • cumo: CUDA-aware numerical Array library with an NArray-like interface.
  • 30. Other statistics libraries • statsample: Basic and advanced statistics for Ruby. [dep: GSL] • statsample-glm: Extension of statsample with Generalized Linear Models. • statsample-bivariate-extension: Extension of statsample with Bivariate Correlations. • statsample-timeseries: Extension of statsample with Time Series estimators. • pca: Principal Component Analysis (PCA) in Ruby. • descriptive-statistics: Descriptive extensions for the Enumerable module or standalone usage. • distribution: Probabilistic distributions and descriptive measures for them. • statistics2: Normal, Chi-square, t- and F- probability distributions for Ruby.
  • 31. General Format IO • https://github.com/fiksu/rcsv • ox: XML parser and object marshaller optimized for speed. • oj: High-speed JSON parser. • Markdown • Nokogiri • CSV
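
The second integration pattern from slide 26, exchanging data through JSON, can be sketched with nothing but Ruby's standard `json` library. This is only an illustration under stated assumptions: the helper names `build_payload` and `merge_predictions`, the payload layout (`schema`/`rows`) and the reply layout (`predictions`) are all hypothetical and do not come from the talk; the Python side is represented by a hard-coded reply string.

```ruby
require 'json'

# Serialize rows (an array of hashes) into a compact JSON payload that a
# Python service could load. The "schema"/"rows" layout is an assumption.
def build_payload(rows)
  schema = rows.empty? ? [] : rows.first.keys
  JSON.generate({ 'schema' => schema, 'rows' => rows.map(&:values) })
end

# Parse a hypothetical reply of the form {"predictions": [...]} and attach
# each prediction to its row, returning new hashes (inputs are not mutated).
def merge_predictions(rows, reply_json)
  predictions = JSON.parse(reply_json).fetch('predictions')
  unless predictions.size == rows.size
    raise ArgumentError, 'row/prediction count mismatch'
  end
  rows.zip(predictions).map { |row, pred| row.merge('prediction' => pred) }
end

rows = [
  { 'sepal_length' => 5.1, 'sepal_width' => 3.5 },
  { 'sepal_length' => 6.2, 'sepal_width' => 2.9 }
]

payload = build_payload(rows)

# Stand-in for what the Python side might send back.
reply = '{"predictions": ["setosa", "versicolor"]}'
merged = merge_predictions(rows, reply)
```

In a real setup the payload would travel over HTTP, a queue or a shared file; the transport is deliberately left out here because the slide does not specify one.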

Editor's Notes

  1. 53s
  2. Try to interact with the audience 41s (1:34)
  3. Quick comment of what is data science 1:44s (3:15)
  4. Quick answer: yes. But let's dive a little deeper, since you can do everything, but whether you should depends on what you want to do. 43s (3:58)
  5. 1:53s (5:51) There are lots of data science libraries for Ruby: for statistics, data manipulation, data visualization, integration with Python and R, distributed computing, and machine learning. It appears we have everything we need! But not everything is as great as it seems, so let's check some of the options in depth.
  6. 45s (6:36)
  7. 38s (7:14)
  8. 1:14 (8:28)
  9. 44s (10:12)
  10. 25s (10:37)
  11. 28s (11:05)
  12. 1:25 (12:30)
  13. 1:45 (14:15)
  14. 4:48 (19:03) SciRuby. Drawbacks: NMatrix is slow for large amounts of data (there is a bug open for that); Daru has less functionality than Pandas for practical DS work; there is a lot less documentation. Benefits: you only need Ruby; NMatrix supports in-memory sparse matrices; you can use data frames with Daru. Data frames are the basic data structure for manipulating and visualizing living data in data science: a 2D table data structure, like a SQL table. Ruby Numo. Benefits: you only need Ruby; Numo::NArray is faster than both NMatrix and pure Ruby. Drawbacks: no sparse matrix support; no data frame support; even less documentation. In summary: for data science, SciRuby is better because it has Daru; for scientific computing, Numo is better because NMatrix is too slow. But I didn't forget about RedDataTools: it supports Apache Arrow, and its core developer, Kouhei Sutou, is also a member of the Apache Arrow PMC. It is too young to use in production, though, and right now it only supports data I/O; manipulation is not supported.
  15. 10s (19:13)
  16. 54s (20:07) The most used libraries for data cleaning and transformation in Python are Pandas and NumPy, and we have the corresponding Daru and NMatrix/NArray, but there are some problems. For starters, the documentation of the Ruby versions is ages behind the Python libraries, mainly because there are a lot fewer users. Also, Daru has fewer features than Pandas, NMatrix gets slow for big amounts of data, and NArray is a lot faster but not compatible with Daru. But things are improving.
  17. 50s (20:57)
  18. 1:36s (21:33)
  19. 31s (22:04)
  20. 51s (22:55)
  21. 20s (23:15)
  22. 10s (23:25)
  23. 15s (23:40)
  24. 1:13 (24:53)
  25. 1:08s (26:01) PyCall can work with most Python libraries, but to make our lives easier, it already has wrappers for numpy, pandas, matplotlib, seaborn, scikit-learn, and tensorflow. Even though it wraps Python libraries, it is a lot faster than using the native Ruby libraries (thanks, Kenta Murata, for this great project).
  26. 1:45s (27:46)
  27. 1:11 (28:57)
  28. 40s
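
Note 14 describes a data frame as "a 2D table data structure, like a SQL table". The idea can be illustrated with a deliberately tiny plain-Ruby sketch. The class `TinyFrame` and its methods (`where`, `row`, `mean`) are hypothetical names invented for this example; this is not Daru's API, and it ignores everything a real data frame library handles (indexes, missing values, types, performance).

```ruby
# A toy "data frame": a table stored as named, equal-length columns.
# TinyFrame is a hypothetical illustration, not Daru's API.
class TinyFrame
  attr_reader :columns

  def initialize(columns)
    lengths = columns.values.map(&:size).uniq
    raise ArgumentError, 'columns must have equal length' if lengths.size > 1
    @columns = columns
  end

  def row_count
    first = @columns.values.first
    first ? first.size : 0
  end

  # One row as a hash of column name => value.
  def row(index)
    @columns.transform_values { |col| col[index] }
  end

  # Keep only rows for which the block is truthy (comparable to SQL WHERE).
  def where
    keep = (0...row_count).select { |i| yield(row(i)) }
    TinyFrame.new(@columns.transform_values { |col| col.values_at(*keep) })
  end

  # Arithmetic mean of a numeric column.
  def mean(name)
    col = @columns.fetch(name)
    col.sum.to_f / col.size
  end
end

frame = TinyFrame.new(
  city: %w[Porto Lisbon Braga],
  temp_c: [18.0, 22.0, 17.0]
)

# Rows warmer than 17.5 degrees, as a new frame.
warm = frame.where { |r| r[:temp_c] > 17.5 }
```

Storing the table by column rather than by row is a design choice made here because column-wise operations such as `mean` then touch a single array; whether Daru or Pandas do the same internally is not claimed.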