3. Iceberg
Estimating TCO is hard. Like an iceberg, many costs are hidden.
Example: integration of Big Data within the existing ecosystem.
4. Hadoop Implementations
Hadoop deployment methods, from Bare Metal to Cloud:
On Premise, Hadoop Appliance, Hadoop Hosting, Hadoop as a Service
Sample Vendors
Hortonworks; IBM, EMC; AWS EMR
Cloudera; Oracle, Teradata; Rackspace; Altiscale
MapR; VMware; GoGrid; Qubole
5. On-Premise Cost Categories
Cost Group: Items
Hardware/Infrastructure Costs: Servers, Peripherals, Network, Storage
Communication Costs: Local Area Network, Wide Area Network, Remote Access
Software Costs: License/Subscription Fees
Implementation Costs: Development/customization/integration, Training, Consulting, Non-Functional Testing (Performance, Capacity, Security, etc.)
Management Costs: Hardware & software upgrades, Hardware & software administration, Legal Cost
Support Costs: Support staff, Staff training, Travel, Support contracts, Overhead labor, High Availability Cost, Disaster Recovery Cost, Ticketing & Troubleshooting Cost, Monitoring Cost, Internal Audit Cost
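The cost groups above can be rolled up into a single on-premise TCO figure. The following is a minimal sketch, assuming each item has already been given a cost for the planning period. The function names and every amount are hypothetical placeholders, not figures from the slides:

```python
from typing import Dict

# Cost group -> item -> cost. Groups and items mirror the slide;
# every amount is a made-up placeholder, not data from the deck.
CostGroups = Dict[str, Dict[str, float]]


def subtotal_by_group(costs: CostGroups) -> Dict[str, float]:
    """Sum the item costs inside each cost group."""
    return {group: sum(items.values()) for group, items in costs.items()}


def total_cost_of_ownership(costs: CostGroups) -> float:
    """Add up all group subtotals into one TCO figure."""
    return sum(subtotal_by_group(costs).values())


example_costs: CostGroups = {
    "Hardware/Infrastructure": {"Servers": 100.0, "Storage": 50.0},
    "Software": {"License/Subscription Fees": 25.0},
    "Support": {"Support staff": 75.0, "Monitoring": 10.0},
}

print(subtotal_by_group(example_costs))
print(total_cost_of_ownership(example_costs))
```

Keeping the items grouped makes it easier to spot the "hidden" groups (support, management) that are often left out of a first estimate.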
6. Managing Risk
Risk Group: Items
Vendor: Vendor Viability, Control on Technical Architecture, Data Protection, Loss of Intellectual Property, Loss of Privacy
Internal IT: Vendor Viability, Control on Technical Architecture, Data Protection, Loss of Intellectual Property, Loss of Privacy
7. Sample calculation
Inputs
Average Monthly HDFS (TB): 1500
Peak HDFS over Monthly (TB): 100
Monthly HDFS Growth (TB): 20
Average Monthly Compute ('000 SH): 20
Peak Compute (SH): 1400
Planning Cycle (Months): 36
Purchased Distribution: No
Hadoop Admin Costs: Included
Data from S3: Yes
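To see how the storage inputs interact over the planning cycle, here is a rough sketch of a capacity projection. It rests on assumptions the slides do not state: the average monthly HDFS volume is treated as the starting point, growth is linear, and the peak figure is extra headroom on top. The function name `projected_hdfs_tb` is hypothetical:

```python
from typing import List


def projected_hdfs_tb(base_tb: float, growth_tb_per_month: float,
                      peak_tb: float, months: int) -> List[float]:
    """Capacity needed in each month 1..months.

    Assumed model (not from the slides): linear growth from the base
    volume, plus a fixed peak allowance on top.
    """
    return [base_tb + growth_tb_per_month * m + peak_tb
            for m in range(1, months + 1)]


# Inputs taken from the sample calculation above.
capacity = projected_hdfs_tb(base_tb=1500, growth_tb_per_month=20,
                             peak_tb=100, months=36)
print(capacity[0])   # month 1
print(capacity[-1])  # month 36
```

Under these assumptions the requirement grows from 1620 TB in month 1 to 2320 TB in month 36, so sizing on the starting volume alone would understate the need.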
8. Results without considering risk
[Bar chart: Cost over 36 Months for Hadoop as a service, On Premise, Amazon EMR, and Hadoop Distribution on EC2; vertical axis from 0 to 8,000,000]
9. Managing Risk (Vendor) – Sample data
Managing Risk | Risk Factor | Weight (%) | Calculated Risk
Vendor Viability | 2 | 40 | 0.8
Control on Technical Architecture | 1 | 20 | 0.2
Data Protection | 2 | 15 | 0.3
Loss of Intellectual Property | 1 | 10 | 0.1
Loss of Privacy | 2 | 15 | 0.3
Total | | | 1.7
Vendor Viability: 1 - No risk, 5 - Very high risk with vendor viability
Control on Technical Architecture: 1 - No need to control, 5 - Compelling need to control technical architecture
Data Protection: 1 - High data protection provided by architecture and process, 5 - No data protection
Loss of Intellectual Property: 1 - No IP, 5 - High business impact with the loss of IP
Loss of Privacy: 1 - No privacy issue for the solution, 5 - High business impact with loss of data
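The Calculated Risk column is the risk factor multiplied by its weight, and the total is the sum over all items. A small sketch that reproduces the vendor total from the sample data (the function name `weighted_risk` is illustrative, and the check that weights sum to 100 is an added assumption):

```python
from typing import List, Tuple

# (item, risk factor on the 1-5 scale, weight in percent), from the sample table.
vendor: List[Tuple[str, int, int]] = [
    ("Vendor Viability", 2, 40),
    ("Control on Technical Architecture", 1, 20),
    ("Data Protection", 2, 15),
    ("Loss of Intellectual Property", 1, 10),
    ("Loss of Privacy", 2, 15),
]


def weighted_risk(items: List[Tuple[str, int, int]]) -> float:
    """Sum of risk factor x weight, with weights expressed in percent."""
    total_weight = sum(weight for _, _, weight in items)
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    # Sum the integer products first, then divide once, to avoid
    # accumulating floating-point rounding error.
    return sum(factor * weight for _, factor, weight in items) / 100


print(weighted_risk(vendor))  # 1.7
```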
10. Managing Risk (Internal IT) – Sample data
Managing Risk | Risk Factor | Weight (%) | Calculated Risk
Vendor Viability | 1 | 40 | 0.4
Control on Technical Architecture | 1 | 20 | 0.2
Data Protection | 2 | 15 | 0.3
Loss of Intellectual Property | 1 | 10 | 0.1
Loss of Privacy | 2 | 15 | 0.3
Total | | | 1.3
Vendor Viability: 1 - No risk, 5 - Very high risk with vendor viability
Control on Technical Architecture: 1 - No need to control, 5 - Compelling need to control technical architecture
Data Protection: 1 - High data protection provided by architecture and process, 5 - No data protection
Loss of Intellectual Property: 1 - No IP, 5 - High business impact with the loss of IP
Loss of Privacy: 1 - No privacy issue for the solution, 5 - High business impact with loss of data
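Comparing the two sample profiles item by item shows where the gap between the vendor total (1.7) and the internal IT total (1.3) comes from. A short sketch using the sample values from both tables; the helper names are hypothetical:

```python
from typing import Dict

# Weights (percent) and risk factors (1-5) from the two sample tables.
WEIGHTS: Dict[str, int] = {
    "Vendor Viability": 40,
    "Control on Technical Architecture": 20,
    "Data Protection": 15,
    "Loss of Intellectual Property": 10,
    "Loss of Privacy": 15,
}
vendor_factors = {"Vendor Viability": 2, "Control on Technical Architecture": 1,
                  "Data Protection": 2, "Loss of Intellectual Property": 1,
                  "Loss of Privacy": 2}
internal_factors = {"Vendor Viability": 1, "Control on Technical Architecture": 1,
                    "Data Protection": 2, "Loss of Intellectual Property": 1,
                    "Loss of Privacy": 2}


def contributions(factors: Dict[str, int]) -> Dict[str, float]:
    """Calculated risk per item: risk factor x weight / 100."""
    return {item: factors[item] * WEIGHTS[item] / 100 for item in WEIGHTS}


def score_gap(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, float]:
    """Items whose calculated risk differs between a and b, with a - b."""
    ca, cb = contributions(a), contributions(b)
    return {item: round(ca[item] - cb[item], 2)
            for item in WEIGHTS if ca[item] != cb[item]}


print(score_gap(vendor_factors, internal_factors))  # {'Vendor Viability': 0.4}
```

In the sample data the whole 0.4 difference comes from Vendor Viability, the most heavily weighted item.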
11. Results after considering risk
[Bar chart: Cost over 36 Months for Hadoop as a service, On Premise, Amazon EMR, and Hadoop Distribution on EC2; vertical axis from 0 to 14,000,000]
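The slides do not state how the risk totals are applied to the costs. One simple possibility, purely an assumption here and not the author's confirmed method, is to scale each option's base cost by its risk total. In the sketch below the function name and the base cost are hypothetical placeholders; only the risk totals (1.7 and 1.3) come from the sample data:

```python
def risk_adjusted_cost(base_cost: float, risk_score: float) -> float:
    """Scale a base cost by a risk score (assumed rule, not stated in the slides)."""
    return base_cost * risk_score


# Placeholder base cost for illustration only; not a figure from the deck.
hypothetical_base_cost = 1_000_000

for label, score in [("Vendor", 1.7), ("Internal IT", 1.3)]:
    adjusted = round(risk_adjusted_cost(hypothetical_base_cost, score))
    print(label, adjusted)
```

Under this assumed rule, two options with the same base cost diverge as soon as their risk profiles differ, which is why the ranking of options can change between the two charts.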
13. On-Premise Implementation – When?
• Well-defined use cases with a demonstrated ROI
• Developed and tuned Hadoop applications
• IT team with experience and bandwidth to manage/maintain Hadoop and the integrated hardware/software stack, as well as troubleshoot job problems
• Sufficient number of nodes to support:
o Growth in data sets
o “Bursty” nature of jobs
14. On-Premise Implementation – Company Profile
• Large enterprise with a strategic need for Big Data Analytics
• Moved from an exploratory stage to enterprise adoption
• Committed IT resources to support the Hadoop hardware/software stack
15. Hadoop as a Service – The Continuum
• Vendors manage the hardware
• Vendors install Hadoop
• Vendors manage Hadoop
16. Vendors Manage The Hardware
For Organizations that:
• Want to create a small cluster for a relatively short period of time, for training and software development purposes.
• Have a short-term processing need and no internal capacity to support it.
• Do not have an IT organization that can install, manage, maintain and operate the Hadoop hardware/software stack, and fix “broken” jobs.
17. Vendors Install Hadoop
For Organizations that:
• Have a short-term need or small-scale Hadoop requirement.
• Have Hadoop applications that are “bursty.”
• Have an IT organization that can operate the Hadoop hardware/software stack, manage scaling the cluster, and fix “broken” jobs.
• Do not need to tailor the hardware to their specific requirements.
18. Vendors Manage Hadoop
For Organizations that:
• Do not have an IT organization that can install, manage, maintain and operate the Hadoop hardware/software stack, and fix “broken” jobs.
• Do not have the required IT hardware infrastructure.
• May need an “always on” Hadoop environment.
• Need service providers that:
o Can handle all aspects of the IT support for Hadoop.
o Can provide comprehensive SLAs.
o May offer hardware optimized for Hadoop.