3. What comes next?
Data Analytics - Older Paradigms
Thoughts on Stats and Computer Science
Overview - Data Storage, Cloud Computing
4. Data Analytics
Older paradigms -
SAS and SPSS languages, ETL and data warehouses (DWs)
Newer paradigms -
R and Python, Scala and Hadoop
More machine learning, less classical stats
5. Is statistics lagging behind computer science?
Classical statistics - built for too little data
Big Data era - the cost of throwing data away is higher than the cost of storing it
Machine learning - seems to be the current flavor
6. Data Storage
Older paradigms - RDBMS and spreadsheets
structure and interactivity
Newer paradigms - NoSQL, Hadoop,
cloud-enabled spreadsheets (?)
7. Cloud Computing - defined by NIST
http://www.nist.gov/itl/csd/cloud-102511.cfm
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
or
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
11. Service Models for Cloud Computing
SaaS - Software as a service
IaaS - Infrastructure as a service
PaaS - Platform as a service
12. Service Models for Cloud Computing
IaaS - Infrastructure as a service
http://media.amazonwebservices.com/IDC_Business_Value_of_AWS_Accelerates_Over_time.pdf
http://www.gartner.com/technology/reprints.do?id=1-1IMDMZ5&ct=130819&st=sb
13. Service Models for Cloud Computing
PaaS - Platform as a service
http://www.gartner.com/technology/research/cloud-computing/report/paas-cloud.jsp
http://www.forrester.com/search?N=20033+10001&sort=3&everything=true&source=browse&
14. Service Models for Cloud Computing
SaaS - Software as a service
http://www.forrester.com/Software--as--a--Service-%28SaaS%29
http://www.gartner.com/newsroom/id/1963815
http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/
http://my.gartner.com/portal/server.pt?open=512&objID=202&&PageID=5553&mode=2&in_hi_userid=2&cached=true&resId=2332215&ref=AnalystProfile
http://www.gartner.com/it-glossary/software-as-a-service-saas/
16. Data Analytics (traditional) - Porter’s Model
Threat of Mobility - Low (lock-in)
Industry Rivalry - Medium (many players)
Supplier Power - High (software, hardware)
Buyer Power - Medium
Substitutes - Low (not many alternatives to SAS, SPSS)
17. Data Analytics (cloud based) - Porter’s Model
Threat of Mobility - High (easy to switch, as data and analytics are cloud based)
Industry Rivalry - High (global providers)
Supplier Power - Low (open source, free, GPL)
Buyer Power - High (lots of options: outsource, insource, crowdsource)
Substitutes - High (lots of options: Python, R, Julia etc.)
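The two Porter's Model slides can be read side by side. The following is a minimal sketch in Python, with the ratings copied from the slides. The numeric scale (Low=1, Medium=2, High=3) and the names `SCALE` and `shift` are assumptions made only for this comparison; they are not part of Porter's model.

```python
# Ratings copied from the two Porter's Model slides.
# The numeric scale is an assumption used only to compare them.
SCALE = {"Low": 1, "Medium": 2, "High": 3}

traditional = {
    "Threat of Mobility": "Low",
    "Industry Rivalry": "Medium",
    "Supplier Power": "High",
    "Buyer Power": "Medium",
    "Substitutes": "Low",
}

cloud_based = {
    "Threat of Mobility": "High",
    "Industry Rivalry": "High",
    "Supplier Power": "Low",
    "Buyer Power": "High",
    "Substitutes": "High",
}

def shift(before, after):
    """Return, per force, how many scale steps the rating moved."""
    return {force: SCALE[after[force]] - SCALE[before[force]] for force in before}

changes = shift(traditional, cloud_based)
for force, delta in changes.items():
    print(f"{force}: {traditional[force]} -> {cloud_based[force]} ({delta:+d})")
```

On these ratings, Supplier Power is the only force that falls when moving to the cloud; every other force rises.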
18. Data Analytics in India - Porter’s Diamond Model
Chance - favorable supply of engineers, mature outsourcing and service industry, rapid domestic growth
Factor Conditions - good service industry
Firm Strategy - relative lack of ecosystem hampers analytics entrepreneurs
Demand Conditions - High
Government - little or no interference
19. India in traditional Data Analytics
Strengths - reliable pool of experienced engineering talent
Weaknesses - inability or unwillingness to invest in huge upfront capex for hardware and software for analytics
Opportunities - ability to navigate upstream
Threats - based on cost-based arbitrage rather than skill-based value addition, thus vulnerable to competition
20. India in Cloud Based Data Analytics
Strengths - experienced service industry with huge pool of trained engineering and analytical talent
Weaknesses - lack of domain depth; relative lack of ecosystem for cutting-edge analytics entrepreneurship; slow to embrace open source
Opportunities - no more capital expenditure needed in software and hardware; virtualization offers secure delivery from any location
Threats - risk management needs to be more mature; lack of data privacy regulations
21. Biggest Challenge to using Cloud
Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors
Most of the cloud infrastructure is based in the United States of America
Unfortunately, the US government taps this information for both security and economic advantage
Unfortunately, American companies seek and get economic advantages for such cooperation
Unfortunately, in the age of cyber war, with its biggest proponent just across the border, we have no critical infrastructure as a service for economic players
In the future, you won't need the United Nations to sanction countries. You just switch off their internet and their economy will shut off.
Can foreign digital infrastructure be used to introduce Stuxnet-like viruses into the domestic supply chain?
India may be self-reliant in agriculture and semi-reliant in arms manufacturing, but we are totally dependent on others for new-generation and even current-generation computing
24. Biggest Opportunities in using Cloud
Build our critical digital grid using local companies - POSSIBLE
Build our next generation of cyber warriors and cyber farmers - VERY POSSIBLE
Teach more distributed computing earlier ;)
Regulation like the EU's to ensure Indian citizens' data stays within the Indian state's administrative boundaries and within reach of the Indian legal system
Compare Aadhaar card with information in emails, social networks, on the personal computer ??
Better regulation - POSSIBLE OR NOT POSSIBLE - DEPENDS ON ELECTIONS?
25. Moving on to Cloud Based Data Analytics
Open source analytics like Python and R
Support distributed computing
Memory is no problem now (especially for R) on the cloud
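The idea behind distributed computing (split the data, summarise each piece independently, then combine the summaries) can be sketched on a single machine. This is a minimal illustration using only Python's standard library. The chunk size, worker count and function names are assumptions, and the thread pool merely stands in for separate cloud machines.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(values, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def summarise(chunk):
    """Map step: reduce one chunk to a small (sum, count) pair."""
    return sum(chunk), len(chunk)

def distributed_mean(values, size=1000, workers=4):
    """Reduce step: combine per-chunk summaries into an overall mean.

    Assumes `values` is a non-empty list of numbers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarise, chunks(values, size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1, 10001))      # illustrative data: the integers 1..10000
print(distributed_mean(data))     # mean computed from 10 partial summaries
```

Because each worker only ever holds one chunk and returns two numbers, the full data set never has to fit in a single worker's memory, which is the point the slide makes about memory on the cloud.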
26. Existing Data Analytics in India
Lots of Analytics Outsourcing
Both SAS and SPSS are present
Open source analytics on the rise, but still a palpable lack of awareness
Data - ETL - Data Warehouse - SQL Query - Stats Software MINDSET
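The traditional chain named above (data, ETL, data warehouse, SQL query, stats software) can be walked through end to end in a few lines. This is a hedged sketch using only Python's standard library: the sample data, column names and table name are made up for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw extract; columns and values are invented for illustration.
RAW = """region,sales
north, 120
south,95
north,80
south,
east,110
"""

def extract(text):
    """Extract: read rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: strip whitespace, drop rows with missing sales, cast to float."""
    clean = []
    for row in rows:
        sales = row["sales"].strip()
        if sales:
            clean.append((row["region"].strip(), float(sales)))
    return clean

def load(rows):
    """Load: write clean rows into an in-memory SQLite table (the 'warehouse')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

conn = load(transform(extract(RAW)))

# SQL query step: pull the slice to analyse.
north = [amount for (amount,) in conn.execute(
    "SELECT amount FROM sales WHERE region = ? ORDER BY amount", ("north",))]

# Stats software step: summarise the query result.
print(north, statistics.mean(north))
```

Each stage hands a finished result to the next, which is why this chain tends to shape a sequential, batch-oriented mindset.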
27. Existing Data Analytics in India
Cloud computing predominantly uses Linux for efficiency
Your Windows CERTIFICATIONS can hinder your IT department’s mindset on the cloud
Data science requires cross-functional learning
28. Developments in Stats Software
A New Hope - Julia, Pandas
http://julialang.org/
http://pandas.pydata.org/
The Empire Strikes Back - SAS
http://www.sas.com/en_us/software/cloud.html
https://www.sas.com/en_us/software/sas-hadoop.html
Return of the Jedi
http://www.r-bloggers.com/
29. A few Developments in Analytics
Revolution R on the cloud (AWS)
www.revolutionanalytics.com/RRE-AWS
SAS on the cloud
http://blogs.sas.com/content/sascom/2013/04/29/start-planning-now-for-sas-9-4/
http://www.allanalytics.com/author.asp?section_id=1411&doc_id=262924
Apache Spark and R
http://amplab-extras.github.io/SparkR-pkg/
30. A few Developments on the Cloud
Amazon http://aws.amazon.com/
Google https://cloud.google.com/products/
IBM http://www.ibm.com/cloud-computing/in/en/
Oracle https://cloud.oracle.com/java
31. A few Developments in R
RHadoop Project
https://github.com/RevolutionAnalytics/RHadoop/wiki
OpenCPU Project
https://www.opencpu.org/
rOpenSci Project
http://blog.programmableweb.com/2013/03/20/pw-interview-karthik-ram-ropensci-wrapping-all-science-apis/
32. The future of Open Cloud
R + Python on OpenStack ?
There is a fair chance that Apache Hadoop-related projects like Shark / Spark will be part of it, and we need a Hadoop-based data warehouse solution (?)
We need to hedge against US policy interference
Education and developer ecosystems have to keep pace