Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function data management and analytics that solve the enterprise's most pressing data and analytics challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
Estimating the Total Costs of Your Cloud Analytics Platform
1. Estimating the Total
Costs of Your Cloud
Analytics Platform
Presented by: William McKnight
“#1 Global Influencer in Cloud Computing” Thinkers360
President, McKnight Consulting Group
A 2-time Inc. 5000 Company
@williammcknight
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET
With William McKnight
3. ChaosSearch helps modern organizations
Know Better™ by activating the data lake for
analytics.
The ChaosSearch Data Lake Platform indexes customers’ cloud
data, rendering it fully searchable and enabling analytics at scale
with massive reductions of time, cost and complexity.
9. "Our SRE teams used to struggle with managing
the vast amount of logs it takes to support
millions of users in real time in a consistent
manner across all our product lines. With
ChaosSearch, we are able to use a singular
solution for our various logs without the hassle
of managing the logging tools as well."
Joel Snook, Director, DevOps Engineering
ChaosSearch Replaces Elasticsearch for Log Analytics
Activate your cloud object storage to become a hot, analytical data lake.
12. William McKnight
President, McKnight Consulting Group
• Consulted to Pfizer, Scotiabank, Fidelity, TD
Ameritrade, Teva Pharmaceuticals, Verizon, and many
other Global 1000 companies
• Frequent keynote speaker and trainer internationally
• Hundreds of articles, blogs and white papers in
publication
• Focused on delivering business value and solving
business problems utilizing proven, streamlined
approaches to information management
• Former Database Engineer, Fortune 50 Information
Technology executive, and Ernst & Young Entrepreneur
of the Year Finalist
• Owner/consultant: Data strategy and implementation
consulting firm
Author of Information Management: Strategies for Gaining a
Competitive Advantage with Data (The Savvy Manager's Guide series)
13. Data is Under Management when it is…
• In a leveragable platform
• In an appropriate platform for its profile and
usage
• Meeting high non-functional requirements (availability,
performance, scalability, stability, durability,
security)
• Captured at the most granular level
• Held to a data quality standard (as
defined by Data Governance)
15. Total Cost of Ownership is More Than Just
Cloud Costs
• Autonomous Administration
• Lack of Platform Features Leads to Increased
Configuration and Management
– stored procedures, referential integrity and uniqueness capabilities
– mission critical options for backup and disaster recovery, which
typically includes a standby database
– full ANSI-SQL compliance
• Performance
16. Cost Predictability and Transparency
• The cost profile options for cloud databases are straightforward
if you accept the defaults for simple workloads or
proof-of-concept (POC) environments.
• Initial entry costs and inadequately scoped environments can
artificially lower expectations of the true costs of jumping into a
cloud data warehouse environment.
• With some platforms, you pay for compute resources as a function
of time, but you also choose the hourly rate based on certain
enterprise features you need.
• With some platforms, you pay for bytes processed and the
underlying architecture is unknown. The environment is scaled
automatically without affecting price. There is also a cost-per-
hour flat rate where you would need to calculate how long it
would take to run your queries to completion to predict costs.
• Customers need to analyze current workloads, performance,
and concurrency and project those into realistic pricing in
alternative platforms.
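As a rough illustration of how the two common models diverge, here is a minimal sketch comparing a pay-for-compute-time rate against a pay-per-bytes-processed rate. Every dollar figure and the workload profile are hypothetical placeholders, not vendor quotes.

```python
# Sketch: comparing two common cloud pricing models with hypothetical rates.
# All dollar figures below are illustrative assumptions, not vendor quotes.

def time_based_cost(hourly_rate, cluster_hours, nodes):
    """Pay-for-compute model: rate x nodes x hours the cluster runs."""
    return hourly_rate * nodes * cluster_hours

def scan_based_cost(rate_per_tb, tb_scanned):
    """Pay-per-bytes-processed model: rate x terabytes scanned."""
    return rate_per_tb * tb_scanned

# A hypothetical month: 8 hours/day on a 4-node cluster vs. 50 TB scanned.
monthly_time = time_based_cost(hourly_rate=2.00, cluster_hours=8 * 30, nodes=4)
monthly_scan = scan_based_cost(rate_per_tb=5.00, tb_scanned=50)

print(f"time-based: ${monthly_time:,.2f}")  # $1,920.00
print(f"scan-based: ${monthly_scan:,.2f}")  # $250.00
```

Which model is cheaper depends entirely on the workload shape, which is why projecting current workloads into each vendor's pricing units matters.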
17. Cost Consciousness and Licensing Structure
• Be on the lookout for cost optimizations like not
paying when the system is idle, compression to save
storage costs, and moving or isolating workloads to
avoid contention.
• Look for the ability to directly operate on compact
open file formats such as Parquet and ORC.
• Also, costs can spin out of control if you have to pay
a separate license for each deployment option or
each machine learning algorithm.
• Finally, also consider if you will be paying per user,
per node, per terabyte, per CPU, per hour, etc.
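One way to compare those licensing units is to normalize each to an annual cost for a single deployment profile. The profile and all unit prices below are hypothetical, chosen only to show the mechanics of the comparison:

```python
# Sketch: normalizing different licensing units to an annual cost for one
# hypothetical deployment. All unit prices are illustrative assumptions.

profile = {"users": 50, "nodes": 8, "terabytes": 40, "cluster_hours": 6000}

pricing_models = {
    "per-user": lambda p: 1200.0 * p["users"],          # $/user/year
    "per-node": lambda p: 9000.0 * p["nodes"],          # $/node/year
    "per-TB":   lambda p: 2000.0 * p["terabytes"],      # $/TB/year
    "per-hour": lambda p: 12.0   * p["cluster_hours"],  # $/cluster-hour
}

for name, cost_fn in pricing_models.items():
    print(f"{name:9s} ${cost_fn(profile):>10,.2f}")
```

The same deployment can differ widely in cost across units, so rank vendors against your own profile, not their headline rates.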
18. Cloud Data Warehousing
Data professionals who used to be valued for tuning
queries are now valued for tuning costs.
19. What is a Node?
• Azure SQL Data Warehouse is scaled by Data Warehouse Units (DWUs) which
are bundled combinations of CPU, memory, and I/O. According to Microsoft,
DWUs are “abstract, normalized measures of compute resources and
performance.”
• Amazon Redshift uses EC2-like instances with tightly coupled compute and
storage, so a Redshift "node" is a node in the more conventional sense.
• Snowflake “nodes” are loosely defined as a measure of virtual compute
resources. Their architecture is described as “a hybrid of traditional shared-
disk database architectures and shared-nothing database architectures.” Thus,
it is difficult to infer what a “node” actually is.
• Google BigQuery does not use the concept of a node at all, but instead refers
to “slots” as “a unit of computational capacity required to execute SQL
queries,” which is also a vague and abstract concept.
20. Understanding Pricing 1/2
• The price-performance metric is dollars per query-hour ($/query-hour).
– This is defined as the normalized cost of running a workload.
– It is calculated by multiplying the vendor's hourly rate by the number of compute
nodes in the cluster, then dividing that amount by the aggregate execution time of the workload.
• To determine pricing, each platform has options. Buyers should be
aware of all their pricing options.
• For Azure SQL Data Warehouse, you pay for compute resources as a
function of time.
– The hourly rate for SQL Data Warehouse varies slightly by region.
– Also add the separate storage charge to store the data (compressed) at a rate of $
per TB per hour.
• For Amazon Redshift, you also pay for compute resources (nodes) as a
function of time.
– Redshift also has reserved instance pricing, which can be substantially cheaper than
on-demand pricing, is available with 1- or 3-year commitments, and is cheapest when
paid in full upfront.
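To see why reserved pricing matters for predictability, here is a sketch comparing on-demand against reserved-instance cost over a commitment term. The hourly rate and the 60% discount are illustrative assumptions, not actual Redshift prices; real discounts vary by node type, term, and payment option.

```python
# Sketch: on-demand vs. reserved-instance compute cost over a commitment.
# Rates and the discount are hypothetical placeholders, not AWS prices.

HOURS_PER_YEAR = 24 * 365  # 8,760

def on_demand_cost(hourly_rate, nodes, years):
    """Cost of running the cluster continuously at the on-demand rate."""
    return hourly_rate * nodes * HOURS_PER_YEAR * years

def reserved_cost(on_demand_hourly, nodes, years, discount):
    """Reserved pricing modeled as a flat discount off the on-demand rate."""
    return on_demand_hourly * (1 - discount) * nodes * HOURS_PER_YEAR * years

od = on_demand_cost(0.25, nodes=4, years=3)
ri = reserved_cost(0.25, nodes=4, years=3, discount=0.60)  # all-upfront, 3-yr
print(f"on-demand: ${od:,.0f}  reserved: ${ri:,.0f}")
```

The trade-off is commitment: reserved pricing only wins if the cluster actually runs for most of the term.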
21. Understanding Pricing 2/2
• For Snowflake, you pay for compute resources as a function of time—
just like SQL Data Warehouse and Redshift.
– However, you choose the hourly rate based on certain enterprise features you need
(“Standard”, “Premier”, “Enterprise”/multi-cluster, “Enterprise for Sensitive Data”
and “Virtual Private Snowflake”)
• With Google BigQuery, one option is to pay for bytes processed at $
per TB
– There’s also BigQuery flat rate
• Azure SQL Data Warehouse pricing is found at https://azure.microsoft.com/en-us/pricing/details/sql-
data-warehouse/gen2/.
• Amazon Redshift pricing is found at https://aws.amazon.com/redshift/pricing/.
• Snowflake pricing is found at https://www.snowflake.com/pricing/.
• Google BigQuery pricing is found at https://cloud.google.com/bigquery/pricing.
22. Pricing Gotchas: Memory Pressure on Scale
Out Compute
• Whenever a data warehouse does not have enough memory to build a
join hash table and keep it in memory, it has to spill it to disk
– This is costly in terms of performance, because the DBMS has to do
double work writing, sorting, and reading the hash table information all on
disk—rather than in memory
• If you provision a medium-sized cluster and let it scale out to
two medium clusters during busy hours to handle the higher
concurrency, a large JOIN can still spill to disk on one of the clusters
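A back-of-the-envelope check makes the point: the build side of a hash join must fit in the memory of the one cluster executing that query, so scaling out to a second cluster adds concurrency, not memory per query. All figures below are hypothetical:

```python
# Sketch: why scaling *out* does not help a single large join.
# A hash join's build table must fit in the memory of the cluster
# running that query, not the fleet's combined memory. Figures are
# hypothetical.

def join_spills(build_table_gb, mem_per_node_gb, nodes_per_cluster):
    """True if the build side exceeds one cluster's usable memory."""
    return build_table_gb > mem_per_node_gb * nodes_per_cluster

# One medium cluster: 4 nodes x 32 GB = 128 GB of hash-table headroom.
# Adding a second identical cluster does not change this per-query limit.
print(join_spills(build_table_gb=150, mem_per_node_gb=32, nodes_per_cluster=4))
# -> True: the 150 GB build side spills to disk on whichever cluster runs it
```

Scaling *up* to a larger cluster, by contrast, raises the per-query memory ceiling and can eliminate the spill.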
23. Pricing Gotchas: Scale Out Impact on Cost
• If an additional identical cluster is deployed
to handle the additional user queries, the
cost doubles for the time period the
additional cluster is up and running
35. Project ROI & TCO
ROI = Benefit / TCO
TCO = Infrastructure + Software + FTE + Consulting
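The slide's formula can be sketched as a two-line calculation; all dollar figures below are illustrative, not estimates for any real project:

```python
# Sketch of the slide's ROI formula: benefit divided by total cost of
# ownership, where TCO = infrastructure + software + FTE + consulting.
# All figures are illustrative placeholders, in $M.

def tco(infrastructure, software, fte, consulting):
    """Total cost of ownership as the sum of the four cost buckets."""
    return infrastructure + software + fte + consulting

def roi(benefit, total_cost):
    """Return on investment as a simple benefit-to-cost ratio."""
    return benefit / total_cost

cost = tco(infrastructure=4.0, software=3.0, fte=2.5, consulting=1.5)
print(f"TCO = ${cost}M, ROI = {roi(benefit=22.0, total_cost=cost):.1f}x")
# TCO = $11.0M, ROI = 2.0x
```

Note that FTE and consulting often dominate infrastructure, which is why TCO is more than just cloud costs.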
36. Design Your Benchmark
• What are you benchmarking?
– Query performance
– Load performance
– Query performance with concurrency
– Ease of use
• Competition
• Queries, Schema, Data
• Scale
• Cost
• Query Cut-Off
• Number of runs/cache
• Number of nodes
• Tuning allowed
• Vendor Involvement
• Any free third party, SaaS, or on-demand software (e.g., Apigee or SQL
Server)
• Any not-free third party, SaaS, or on-demand software
• Instance type of nodes
• Measure Price/Performance!
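As a sketch only, the checklist above could be wired into a minimal timing harness that folds a platform's hourly cost into the price-performance measurement the slide calls for. `run_query` is a placeholder you would implement per platform under test:

```python
# Minimal sketch of a price-performance benchmark harness: time each query
# over several runs, then fold the cluster's hourly rate into a cost figure.
# `run_query` and the rates are placeholders supplied per platform.
import time

def benchmark(queries, run_query, runs=3):
    """Return total elapsed seconds across all queries and runs.
    Runs include cold and warm caches; drop the first run if you
    want a warmed-cache measurement, per your benchmark design."""
    total = 0.0
    for _ in range(runs):
        for q in queries:
            start = time.perf_counter()
            run_query(q)  # execute against the platform under test
            total += time.perf_counter() - start
    return total

def compute_cost(elapsed_seconds, cluster_hourly_rate):
    """Dollars of compute consumed while the workload ran."""
    return cluster_hourly_rate * (elapsed_seconds / 3600.0)
```

Keeping the query set, data, scale, and tuning rules identical across platforms is what makes the resulting cost numbers comparable.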
38. Summary
• Large Project Stack costs between $7M-$23M (to get full ML-based project to
production) and $19M-$43M over 2 years for the enterprise.
• Buyer Beware
– The total cost of ownership of cloud analytics platforms scales up too. Demand for
analytics at your company will only increase in the coming years.
• Hardware (CPU, memory, and input/output) is often the biggest performance
bottleneck of a database management system.
– Most cloud analytical products scale hardware in powers of 2
– In many systems, you can add more memory here or more CPU there at a more
fractional cost.
• Remember “only pay for what you use” is a two-sided coin.
• The true gauge of value is price-performance. Thus, we recommend that you
demand reliable performance at a predictable price from your analytical
platform.
• The true gauge of project efficacy is ROI.
39. Estimating the Total
Costs of Your Cloud
Analytics Platform
Presented by: William McKnight
“#1 Global Influencer in Cloud Computing” Thinkers360
President, McKnight Consulting Group
A 2-time Inc. 5000 Company
@williammcknight
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET
#AdvAnalytics