A technical introduction to the ownR Enterprise Analytics Platform for R and Python. Focuses on features delivered to the R and Python users such as a comfortable deployment process with CI/CD to productionize models without IT involvement and on embedding models into the enterprise for wider audiences. Addresses questions in the Joel Test for Data Science.
Statistics notes ,it includes mean to index numbers
ownR platform extended technical introduction
1. 1ownR Enterprise R platform
ownR Enterprise Analytics Platform
for R and Python
A technical introduction
2. 2ownR Enterprise R platform
Motivation
1. Can new hires get set up in the environment to run analyses
on their first day?
2. Can data scientists utilize the latest tools/packages without
help from IT?
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions?
The “Joel Test” for Data Science
3. 3ownR Enterprise R platform
Motivation
5. Does collaboration happen through a system other than
email?
6. Can predictive models be deployed to production without
custom engineering or infrastructure work?
7. Is there a single place to search for past research and
reusable data sets, code, etc.?
8. Do your data scientists use the best tools money can buy?
Source: https://blog.dominodatalab.com/joel-test-data-science/
+1 Can you access your statistics without using R/Python?
The “Joel Test” for Data Science
4. 4ownR Enterprise R platform
Ideal Process
From code to production
SCM
Data scientist /
Developer:
• commits code
5. 5ownR Enterprise R platform
Ideal Process
From code to production
SCM Package
CI / CD
• Documentation
• Unit tests
• Build
6. 6ownR Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
CI / CD
• Upload to
repository
7. 7ownR Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
Prod
CI/CD
• Deploy the
application
8. 8ownR Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
ProdProdProd
Scale the
application
Load Balancer
9. 9ownR Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
ProdProdProd
Load Balancer
Excel or web
front-end
Scheduler Secure
Shiny app
REST API REST APINon-technical end
user’s access:
10. 10ownR Enterprise R platform
ownR repository
1. Multiple repositories
• Shared and Private repositories
• Local public repository mirrors
2. Share your applications with colleagues for reusability
3. Search public and internal packages
4. Dependency management across all repositories
5. Maintain multiple (historical) versions of the same package
The application repository of the ownR platform
11. 11ownR Enterprise R platform
ownR repository
1. Can new hires get set up in the environment to run analyses
on their first day?
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions?
The “Joel Test” for Data Science
12. 12ownR Enterprise R platform
ownR repository
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work?
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R/Python?
The “Joel Test” for Data Science
13. 13ownR Enterprise R platform
roveR
1. Preconfigured, separated R environments
2. Open-source R package
3. Linking R environments to the ownR repository
4. Document the dependencies in a DESCRIPTION file
5. Install dependencies with specific versions from the repository
Note: Python solves this with virtualenv, YAML and pip.
Open source R virtual environment
14. 14ownR Enterprise R platform
roveR + ownR repository
1. Can new hires get set up in the environment to run analyses
on their first day? ✅
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions? ✅
The “Joel Test” for Data Science
15. 15ownR Enterprise R platform
roveR + ownR repository
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R/Python? ✅
The “Joel Test” for Data Science
16. 16ownR Enterprise R platform
ownR production applications
1. Access deployed R & Python applications via REST API
2. Allows other applications to integrate analytics calculations
developed in R or Python without porting to a third language
3. Also supporting Flask and Shiny apps for web applications
4. Scalable solution with load balancing
5. Secure access out-of-the-box
• Using API keys for API applications
• User authentication & authorization for web applications
Scalable secure R & Python applications
17. 17ownR Enterprise R platform
ownR platform
1. Can new hires get set up in the environment to run analyses
on their first day? ✅
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops? ✅
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions? ✅
The “Joel Test” for Data Science
18. 18ownR Enterprise R platform
ownR platform
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy? ✅
+1 Can you access your statistics without using R/Python? ✅
The “Joel Test” for Data Science