Data Analytics using the Cloud
- Challenges and Opportunities for India
Introduction
AJAY OHRI
Author 1,2,3 Thinker 1,2
Founder, DECISIONSTATS
ohri2007@gmail.com http://linkedin.com/in/ajayohri
What comes next?
Data Analytics- Older Paradigms
Thoughts on Stats and Computer Science
Overview - Data Storage, Cloud Computing
Data Analytics
older paradigms - SAS and SPSS languages, ETL and data warehouses (DWs)
newer paradigms - R and Python, Scala and Hadoop
More machine learning, less classical stats
Is statistics lagging behind computer science?
Classical statistics - built for too little data
Big Data era - the cost of throwing data away is more than the cost of storing it
Machine learning - seems to be the current flavor
Data Storage
older paradigms - RDBMS and Spreadsheets
structure and interactivity
newer paradigms - NoSQL, Hadoop, cloud-enabled spreadsheets (?)
Cloud Computing- defined by NIST
http://www.nist.gov/itl/csd/cloud-102511.cfm
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
or
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Service Models for Cloud Computing
SaaS - Software as a service
IaaS - Infrastructure as a service
PaaS - Platform as a service
Service Models for Cloud Computing
IaaS - Infrastructure as a service
http://media.amazonwebservices.com/IDC_Business_Value_of_AWS_Accelerates_Over_time.pdf
http://www.gartner.com/technology/reprints.do?id=1-1IMDMZ5&ct=130819&st=sb
Service Models for Cloud Computing
PaaS - Platform as a service
http://www.gartner.com/technology/research/cloud-computing/report/paas-cloud.jsp
http://www.forrester.com/search?N=20033+10001&sort=3&everything=true&source=browse&
Service Models for Cloud Computing
SaaS - Software as a service
http://www.forrester.com/Software--as--a--Service-%28SaaS%29
http://www.gartner.com/newsroom/id/1963815
http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/
http://my.gartner.com/portal/server.pt?open=512&objID=202&&PageID=5553&mode=2&in_hi_userid=2&cached=true&resId=2332215&ref=AnalystProfile
http://www.gartner.com/it-glossary/software-as-a-service-saas/
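Deployment Models for Cloud Computing (as defined in NIST SP 800-145, cited above)
Private - provisioned for exclusive use by a single organization
Community - provisioned for exclusive use by a community of organizations with shared concerns
Public - provisioned for open use by the general public
Hybrid - a composition of two or more distinct cloud infrastructures (private, community or public)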
Data Analytics (traditional) - Porter’s Model
Threat of Mobility - Low (lock-in)
Industry Rivalry - Medium (many)
Supplier Power - High (software, hardware)
Buyer Power - Medium
Substitutes - Low (not many alternatives to SAS, SPSS)
Data Analytics (cloud based) - Porter’s Model
Threat of Mobility - High (easy to switch, as data and analytics are cloud based)
Industry Rivalry - High (global providers)
Supplier Power - Low (open source, free, GPL)
Buyer Power - High (lots of options: outsource, insource, crowdsource)
Substitutes - High (lots of options: Python, R, Julia etc.)
Data Analytics in India - Porter’s Diamond Model
Chance - Favorable supply of engineers, mature outsourcing and service industry, rapid domestic growth
Factor Conditions - Good service industry
Firm Strategy - Relative lack of an ecosystem hampers analytics entrepreneurs
Demand Conditions - High
Government - Little or no interference
India in traditional Data Analytics
Strengths - reliable pool of experienced engineering talent
Weaknesses - inability or unwillingness to invest in huge upfront capex for analytics hardware and software
Opportunities - ability to navigate upstream
Threats - based on cost-based arbitrage rather than skill-based value addition, thus vulnerable to competition
India in Cloud Based Data Analytics
Strengths - experienced service industry with a huge pool of trained engineering and analytical talent
Weaknesses - lack of domain depth; relative lack of an ecosystem for cutting-edge analytics entrepreneurship; slow to embrace open source
Opportunities - no more capital expenditure needed on software and hardware; virtualization offers secure delivery from any location
Threats - risk management needs to be more mature; lack of data privacy regulations
Biggest Challenge to using Cloud
Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors
Most of the cloud infrastructure is based in the United States of America
Biggest Challenge to using Cloud == NSA?
Unfortunately, the US government taps this information for both security and economic advantage
Unfortunately, American companies seek and get economic advantages for such cooperation
Unfortunately, in the age of cyber war, with its biggest proponent just across the border, we have no critical infrastructure as a service for our economic players
In the future, you won't need the United Nations to sanction countries. You just switch off their internet and their economy will shut down.
Could foreign digital infrastructure be used to introduce Stuxnet-like viruses into the domestic supply chain?
India may be self-reliant in agriculture and semi-reliant in arms manufacturing, but we are totally dependent on others for next-generation and even current-generation computing
Biggest Opportunities to using Cloud
Build our critical digital grid using local companies - POSSIBLE
Build our next generation of cyber warriors and cyber farmers - VERY POSSIBLE
Teach more distributed computing earlier ;)
EU-style regulation to ensure Indian citizens' data stays within the Indian state's administrative boundaries and within reach of the Indian legal system
Compare the Aadhaar card with the information held in emails, on social networks and on personal computers?
Better regulation - POSSIBLE OR NOT POSSIBLE? DEPENDS ON ELECTIONS
Moving on to Cloud Based Data Analytics
Open source analytics tools like Python and R support distributed computing
Memory is no longer a problem on the cloud (especially for R)
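The split / summarise / combine pattern behind most distributed analytics can be sketched with the Python standard library alone. This is a minimal illustration, not a production setup: a local thread pool stands in for cluster workers, and the function names (partial_stats, combine, distributed_mean_variance) are hypothetical, not taken from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Map step: summarise one chunk as (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def combine(parts):
    """Reduce step: merge partial summaries into mean and population variance."""
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mean = s / n
    return mean, ss / n - mean * mean


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def distributed_mean_variance(values, chunk_size=1000, workers=2):
    """Summarise each chunk on a worker, then combine the small summaries.

    Assumption: a thread pool is used only to keep the sketch self-contained;
    a process pool or a cluster scheduler would follow the same shape.
    """
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunks))
    return combine(parts)


mean, variance = distributed_mean_variance(list(range(1, 10001)))
print(mean, variance)
```

Only the three-number summaries travel back from the workers, which is why the full dataset never has to fit in one machine's memory.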
Existing Data Analytics in India
Lots of analytics outsourcing
Both SAS and SPSS are present
Open source analytics is on the rise, but there is still a palpable lack of awareness
Data - ETL - Data Warehouse - SQL Query - Stats Software MINDSET
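The Data - ETL - Data Warehouse - SQL Query - Stats Software chain named above can be sketched end to end in a few lines of Python. Everything here is an illustrative assumption: the sample CSV, the sales table and its columns are made up, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw extract; in practice this would come from a source system.
RAW_CSV = """region,amount
north, 120.5
south,80
north,99.5
south,
east,45
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows with no amount, cast to float."""
    clean = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if amount:
            clean.append((row["region"].strip(), float(amount)))
    return clean


def load(rows):
    """Load: write cleaned rows into an in-memory table (the 'warehouse')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def query_amounts(conn, region):
    """SQL query: pull one region's amounts, sorted ascending."""
    cur = conn.execute(
        "SELECT amount FROM sales WHERE region = ? ORDER BY amount", (region,)
    )
    return [r[0] for r in cur.fetchall()]


conn = load(transform(extract(RAW_CSV)))
north = query_amounts(conn, "north")
# Stats step: the part traditionally handed to a separate stats package.
summary = {"n": len(north), "mean": statistics.mean(north)}
print(summary)
```

Each stage hands a finished artifact to the next, which is the sequential mindset the slide describes; cloud-based tools tend to blur these boundaries.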
Existing Data Analytics in India
Cloud computing relies heavily on Linux for efficiency
Your Windows CERTIFICATIONS can hinder your IT department’s mindset on the cloud
Data science requires cross-functional learning
Developments in Stats Software
A New Hope - Julia, Pandas
http://julialang.org/
http://pandas.pydata.org/
The Empire Strikes Back - SAS
http://www.sas.com/en_us/software/cloud.html
https://www.sas.com/en_us/software/sas-hadoop.html
Return of the Jedi
http://www.r-bloggers.com/
a few Developments in Analytics
Revolution R on the cloud (AWS)
www.revolutionanalytics.com/RRE-AWS
SAS on the cloud
http://blogs.sas.com/content/sascom/2013/04/29/start-planning-now-for-sas-9-4/
http://www.allanalytics.com/author.asp?section_id=1411&doc_id=262924
Apache Spark and R
http://amplab-extras.github.io/SparkR-pkg/
a few Developments on the Cloud
Amazon http://aws.amazon.com/
Google https://cloud.google.com/products/
IBM http://www.ibm.com/cloud-computing/in/en/
Oracle https://cloud.oracle.com/java
a few Developments in R
RHadoop Project
https://github.com/RevolutionAnalytics/RHadoop/wiki
OpenCPU Project
https://www.opencpu.org/
rOpenSci Project
http://blog.programmableweb.com/2013/03/20/pw-interview-karthik-ram-ropensci-wrapping-all-science-apis/
The future of Open Cloud
R + Python on OpenStack ?
There is a fair chance that Apache Hadoop-related projects like Shark / Spark will be part of it, and we may need Hadoop-based data warehouse solutions (?)
We need to hedge against US policy interference
Education and developer ecosystems have to keep pace
Thank You


Data analytics using the cloud challenges and opportunities for india

  • 1. Data Analytics using the Cloud - Challenges and Opportunities for India
  • 2. Introduction AJAY OHRI Author 1,2,3 Thinker 1,2 Founder, DECISIONSTATS ohri2007@gmail.com http://linkedin.com/in/ajayohri
  • 3. What comes next? Data analytics: older paradigms. Thoughts on stats and computer science. Overview: data storage, cloud computing.
  • 4. Data Analytics. Old(er) paradigms: SAS and SPSS languages, ETL and DWs. Newer paradigms: R and Python, Scala and Hadoop. More machine learning, less classical stats.
  • 5. Is statistics lagging behind computer science? Classical statistics: too few data. Big Data era: the cost of throwing data away is more than the cost of storing it. Machine learning seems to be the flavor.
  • 6. Data Storage. Older paradigms: RDBMS and spreadsheets (structure and interactivity). New paradigms: NoSQL, Hadoop, cloud-enabled spreadsheets (?)
  • 7. Cloud Computing, defined by NIST (http://www.nist.gov/itl/csd/cloud-102511.cfm or http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf): "cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction"
  • 11. Service Models for Cloud Computing: SaaS (Software as a Service), IaaS (Infrastructure as a Service), PaaS (Platform as a Service)
  • 12. Service Models for Cloud Computing IaaS - Infrastructure as a service http://media.amazonwebservices.com/IDC_Business_Value_of_AWS_Accelerates_Over_time.pdf http://www.gartner.com/technology/reprints.do?id=1-1IMDMZ5&ct=130819&st=sb
  • 13. Service Models for Cloud Computing PaaS - Platform as a service http://www.gartner.com/technology/research/cloud-computing/report/paas-cloud.jsp http://www.forrester.com/search?N=20033+10001&sort=3&everything=true&source=browse&
  • 14. Service Models for Cloud Computing SaaS - Software as a service http://www.forrester.com/Software--as--a--Service-%28SaaS%29 http://www.gartner.com/newsroom/id/1963815 http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/ http://my.gartner.com/portal/server.pt?open=512&objID=202&&PageID=5553&mode=2&in_hi_userid=2&cached=true&resId=2332215&ref=AnalystProfile http://www.gartner.com/it-glossary/software-as-a-service-saas/
  • 15. Deployment Models for Cloud Computing: Private, Community, Public, Hybrid
  • 16. Data Analytics (traditional): Porter’s Model. Threat of Mobility: Low (lock-in). Industry Rivalry: Medium (many). Supplier Power: High (software, hardware). Buyer Power: Medium. Substitutes: Low (not many alternatives to SAS, SPSS).
  • 17. Data Analytics (cloud based): Porter’s Model. Threat of Mobility: High (easy to switch, as data and analytics are cloud based). Industry Rivalry: High (global providers). Supplier Power: Low (open source, free, GPL). Buyer Power: High (lots of options: outsource, insource, crowdsource). Substitutes: High (lots of options: Python, R, Julia, etc.).
  • 18. Data Analytics in India: Porter’s Diamond Model. Chance: favorable supply of engineers, mature outsourcing and service industry, rapid growth domestically. Factor Conditions: good service industry. Firm Strategy: relative lack of ecosystem hampers analytics entrepreneurs. Demand Conditions: high. Government: little or no interference.
  • 19. India in traditional Data Analytics. Strengths: reliable pool of experienced engineering talent. Weaknesses: inability or unwillingness to invest in huge upfront capex for hardware and software for analytics. Opportunities: ability to navigate upstream. Threats: based on cost-based arbitrage rather than skill-based value addition, thus vulnerable to competition.
  • 20. India in Cloud Based Data Analytics. Strengths: experienced service industry with a huge pool of trained engineering and analytical talent. Weaknesses: lack of deep domain depth; relative lack of ecosystem for cutting-edge analytics entrepreneurship; slow to embrace open source. Opportunities: no more capital expenditure needed in software and hardware; virtualization offers secure delivery from any location. Threats: risk management needs to be more mature; lack of data privacy regulations.
  • 21. Biggest Challenge to using Cloud. Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors. Most of the cloud infrastructure is based in the United States of America.
  • 22. Biggest Challenge to using Cloud ==NSA?
  • 23. Biggest Challenge to using Cloud. Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors. Most of the cloud infrastructure is based in the United States of America. Unfortunately, the US Govt taps the information for both security and economic advantages. Unfortunately, American companies seek and get economic advantages for such cooperation. Unfortunately, in the age of cyber war, and with its biggest proponent across the border, we have no critical infrastructure as a service for economic players. In the future, you won't need the United Nations to sanction countries. You just switch off their internet and their economy will shut off. Could foreign digital infrastructure be used to introduce Stuxnet-like viruses into the domestic supply chain? India may be self-reliant in agriculture and semi-reliant in manufacturing arms, but we are totally dependent in new-generation and even current-generation computing.
  • 24. Biggest Opportunities to using Cloud. Build our critical digital grid using local companies: POSSIBLE. Build our next generation of cyber warriors and cyber farmers: VERY POSSIBLE. Teach more distributed computing earlier ;) Regulation like the EU's to ensure Indian citizens' data stays within the Indian state’s administrative boundaries and within reach of the Indian legal system. Compare the Aadhaar card with information in emails, social networks, on the personal computer?? Better regulation: POSSIBLE OR NOT POSSIBLE, DEPENDS ON ELECTIONS?
  • 25. Moving onto Cloud Based Data Analytics. Open source analytics tools like Python and R support distributed computing. Memory is no problem now (especially for R) on the cloud.
  • 26. Existing Data Analytics in India. Lots of analytics outsourcing. Both SAS and SPSS are present. Open source analytics is on the rise, but there is still a palpable lack of awareness. Data → ETL → Data Warehouse → SQL Query → Stats Software MINDSET.
  • 27. Existing Data Analytics in India. Cloud computing explicitly uses Linux for efficiency. Your Windows CERTIFICATIONS can hinder your IT department’s mindset on the cloud. Data science requires cross-functional learning.
  • 28. Developments in Stats Software. A New Hope: Julia, Pandas ( http://julialang.org/ , http://pandas.pydata.org/ ). The Empire Strikes Back: SAS ( http://www.sas.com/en_us/software/cloud.html , https://www.sas.com/en_us/software/sas-hadoop.html ). Return of the Jedi: http://www.r-bloggers.com/
  • 29. A few Developments in Analytics. Revolution R on the cloud (AWS): www.revolutionanalytics.com/RRE-AWS ; SAS on the cloud: http://blogs.sas.com/content/sascom/2013/04/29/start-planning-now-for-sas-9-4/ and http://www.allanalytics.com/author.asp?section_id=1411&doc_id=262924 ; Apache Spark and R: http://amplab-extras.github.io/SparkR-pkg/
  • 30. A few Developments on the Cloud. Amazon: http://aws.amazon.com/ ; Google: https://cloud.google.com/products/ ; IBM: http://www.ibm.com/cloud-computing/in/en/ ; Oracle: https://cloud.oracle.com/java
  • 31. A few Developments in R. RHadoop Project: https://github.com/RevolutionAnalytics/RHadoop/wiki ; OpenCPU Project: https://www.opencpu.org/ ; rOpenSci Project: http://blog.programmableweb.com/2013/03/20/pw-interview-karthik-ram-ropensci-wrapping-all-science-apis/
  • 32. The future of Open Cloud. R + Python on OpenStack? There is a fair chance that Apache Hadoop related projects like Shark / Spark will be there. We need Hadoop-based data warehouse solutions (?). We need to hedge against US policy interference. Education and developer ecosystems have to keep pace.
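
The slides mention that open source tools like Python and R support distributed computing, and suggest teaching it earlier. As a rough illustration only (not taken from the deck), the sketch below shows the split / map / reduce idea behind Hadoop-style processing in plain Python. The function names (`split_into_chunks`, `map_chunk`, `combine`, `distributed_mean`) are hypothetical, and everything runs in one process; on a real cluster each chunk would sit on a different machine and the map step would run there.

```python
from functools import reduce


def split_into_chunks(values, n_chunks):
    """Split a list into contiguous chunks (stand-ins for data blocks on separate nodes)."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(chunk):
    """Map step: summarise one chunk as a (sum, count) pair, using only that chunk."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) summaries into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, n_chunks=4):
    """Compute a mean from per-chunk summaries instead of scanning all data in one place."""
    partials = map(map_chunk, split_into_chunks(values, n_chunks))
    total, count = reduce(combine, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot take the mean of empty data")
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))
```

Only the small (sum, count) pairs need to travel between workers, never the raw data, which is one reason memory on a single machine matters less once the work is spread out.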