Database Survival:
Food for Thought
Robin Bloor, Ph.D.
An Early Thought
Data Lakes and Databases are not
very different things…
Irrespective of what the data lake
enthusiasts claim
Everything in flux
• Hardware (network, storage, servers)
• Data Sources
• Data Staging
• Data Volumes
• Data Flow
• Data Governance
• Query Languages
• Data Usage
• Data Structures
• Schema definition
• Ingest speeds
• Data Workloads
• Applications
The Data Lake Picture
[Diagram: the DATA LAKE sits at the center, with the following labels around it.
Sources: Servers, Desktops, Mobile, Network Devices, Embedded Chips, RFID, IoT, The Cloud, OSes, VMs, Log Files, Sys Mgt Apps, ESBs, Web Services, SaaS, Business Apps, Office Apps, BI Apps, Workflow, Data Streams, Social...
Lake services: Ingest, Data Cleansing, Data Security, Metadata Mgt, Data Lake Mgt, Data Governance, Life Cycle Mgt, Archive
Consumers: Real-Time Apps, Transform & Aggregate, Search & Query, BI, Visual'n & Analytics, Other Apps
Extracts to: Databases, Data Marts, Other Apps]
• Data Lakes (Yes!):
  – Ingest points for data for the sake of governance
  – Analytics sandboxes
  – Good places for cool and cold data, and hence archive
• Data Lakes (No!):
  – OLTP databases
  – Fast query engines
  – High user concurrency
  – Big Data analytics apps
  – Unusually structured data (NoSQL, graph, etc.)
You don’t have one data lake; you have many.
Data lakes do not manage data well.
Streaming
There’s a spectrum of streaming
capability and thus a spectrum of
streaming platforms:
Spark, in-memory DBMS, SQLstream
Database Workload Parameters
• Read-intensive vs. write-intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long data latency
• Predictable vs. unpredictable data access patterns
• Simple vs. complex data types
Horses for Courses
• Relational row store databases for conventional OLTP
• Relational databases for ACID requirements
• Parallel databases (row or column) for unpredictable or variable query workloads
• Specialized databases for complex data query workloads (graph, etc.)
• NoSQL (KVS, DHT) for high scale OLTP
• NoSQL (KVS, DHT) for low latency read-mostly data access
• NoSQL / Hadoop / Spark for scale-out batch analytic workloads
• Cloud Databases can be any of the above
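The pairings in this list can be read as a lookup from workload traits to a database family. The following is a minimal sketch of that idea, not a selection method from the talk: the `Workload` record, the `suggest_database` function, and the order in which the rules are checked are all assumptions made for illustration, and a real choice would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of a workload, using traits named on the slide."""
    needs_acid: bool = False
    oltp: bool = False
    high_scale: bool = False
    complex_data: bool = False
    read_mostly_low_latency: bool = False
    batch_analytics: bool = False
    unpredictable_queries: bool = False


def suggest_database(w: Workload) -> str:
    """Return the database family the slide pairs with this workload.

    The precedence of the checks (ACID first, then OLTP, and so on) is an
    assumption; the slide lists the pairings without ranking them.
    """
    if w.needs_acid:
        return "relational database"
    if w.oltp:
        return "NoSQL (KVS, DHT)" if w.high_scale else "relational row store"
    if w.complex_data:
        return "specialized database (graph, etc.)"
    if w.read_mostly_low_latency:
        return "NoSQL (KVS, DHT)"
    if w.batch_analytics:
        return "NoSQL / Hadoop / Spark"
    if w.unpredictable_queries:
        return "parallel database (row or column)"
    return "no single obvious fit; compare candidates"


conventional = suggest_database(Workload(oltp=True))
web_scale = suggest_database(Workload(oltp=True, high_scale=True))
```

Because a cloud database "can be any of the above", the sketch deliberately returns a family rather than a deployment model.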
Database Tools
• Have you noticed how databases are not self-running?
• DBAs are in short supply and the need for them is increasing
• Database diversity doesn’t help in this area.
• DBA Tools:
  – SQL analysis
  – Performance analysis
  – Security management
  – Capacity planning
  – Database deployment
• We meet the same problem with data lakes, except that there are very few tools
Confidential – Not for distribution Copyright © 2017 Datical. All rights reserved.
Database Automation. Business Innovation.
Bloor Roundtable Webcast
Mind the Gap
[Chart: # of Releases vs. Time for Database Deployments and Application Releases, annotated "The Deployment Gap"]
Organizations have a Deployment Gap
• Current DB deployment methods are
outpaced by Agile software release
process
• DB changes are:
– Manually intensive
– Slow
– Full of Risk
CIO Magazine Survey: Digital Transformation and the Database
• Slowing Releases: 75% of DBAs report that release delays due to the database change process are increasing.
• Need Automation: Over half of respondents report automation is key to release speed. The same proportion report that current automation does not meet their needs.
• Errors Increasing: 30% report errors have increased over the last 12 months. 42% of DBAs report the same.
• 90% of Dev Managers say the database change process delays releases.
• 91% of DBAs say the database change process delays releases.
Close the Gap
[Chart: # of Releases vs. Time for Database Deployments and Application Releases, captioned "The Deployment Gap"]
Datical Closes the Deployment Gap
• Datical DB radically improves and simplifies the application release process by automating database change
What Makes Us Different
Change Management Simulator
Simulate the impact of database changes before they are deployed
Dynamic Rules Engine
Automatically enforce DBA rules across all proposed database changes
Database Code Packager
Creates database continuous integration by unifying application and database changes
Deployment Monitoring Console
Automatically monitor the status of every database deployment across the enterprise
Expand, Migrate, Contract
• Move from Data Clump to Coordinate class.
https://martinfowler.com/bliki/ParallelChange.html
Expand, Migrate, Contract
• Split a column into two columns
• Add new columns, populate with UPDATE & Substring
• Drop the old column
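The three steps of the column split (add new columns, populate them with UPDATE and a substring function, drop the old column) can be sketched end to end. This is an illustration on an in-memory SQLite database, not Datical's tooling: the `person` table, its `full_name` column, and the sample rows are hypothetical, and the sketch assumes each name contains exactly one space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Starting point: one column holding "first last" (hypothetical schema).
cur.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT)")
cur.executemany(
    "INSERT INTO person (full_name) VALUES (?)",
    [("Ada Lovelace",), ("Grace Hopper",)],
)

# Expand: add the new columns next to the old one.
cur.execute("ALTER TABLE person ADD COLUMN first_name TEXT")
cur.execute("ALTER TABLE person ADD COLUMN last_name TEXT")

# Migrate: populate the new columns with UPDATE and substr().
# Assumes exactly one space separates the two parts of the name.
cur.execute(
    """
    UPDATE person
    SET first_name = substr(full_name, 1, instr(full_name, ' ') - 1),
        last_name  = substr(full_name, instr(full_name, ' ') + 1)
    """
)

# Contract: drop the old column. A table rebuild is used here because
# ALTER TABLE ... DROP COLUMN is only available in newer SQLite releases.
cur.executescript(
    """
    CREATE TABLE person_new (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT
    );
    INSERT INTO person_new SELECT id, first_name, last_name FROM person;
    DROP TABLE person;
    ALTER TABLE person_new RENAME TO person;
    """
)
conn.commit()

rows = cur.execute(
    "SELECT first_name, last_name FROM person ORDER BY id"
).fetchall()
columns = [info[1] for info in cur.execute("PRAGMA table_info(person)")]
```

Between the expand and contract steps both the old and the new columns exist, which is what lets application code move over gradually instead of in one coordinated release.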
Respond faster by automating the deployment of database
changes.
1. Eliminates back and forth between Dev, QA and DBAs
2. Integrates with your tools and processes
3. Automated deployment: validated database changes are automatically deployed with Datical to different environments right alongside application changes.
[Diagram: DB CHANGES and APP CHANGES move together through CODE, BUILD, TEST and then through TEST, STAGE, PRODUCTION]
Perform higher by massively increasing productivity,
efficiency, and ROI.
DB Professional
Database pros avoid time-consuming review of change scripts to focus on strategically moving the business forward.
Developer/QA
Devs package, review, and validate database changes alongside app code changes with the push of a button.
Business Executive
Business delivers experiences faster and more often while reducing error and maximizing other app release investments.
80% Less Time on Database Change Management Tasks* (Days & Weeks → Hours)
90% Decrease in Deployment Errors to Test and Production*
* Benchmarked from Datical customers.
Big Data Analytics and Hybrid
Architectures
Steve Sarsfield
Steve.Sarsfield@hpe.com
– Blogger
my.vertica.com
data-governance.blogspot.com
– Author of “The Data Governance Imperative”
– Contact
– Twitter: @stevesarsfield
– steve.sarsfield@hpe.com
About the Speaker
Steve Sarsfield
Vertica Team
Picking a DB
Structure
• Does the data fit into a nice clean data model?
• Will the schema lack clarity or be dynamic?
Analytics
• What question(s) do
you want to ask of the
data?
• Short running queries
• Long, deep analytics
including predictive
Size
• Is the data “Big Data”
or will it ever be big
data?
Also:
• Cost per Terabyte
• Staffing considerations
• Familiarity with
technologies
• Company Financials
• Company Ancillary
Portfolio
• Community & Openness
Needing different kinds of analysis is common
Short, fast queries
– Weather Application: Tell me the current temperature and pressure
– Security Analytics: Are there any attacks happening right now?
Deeper analytics with bigger data sets
– Weather Application: What was the high/low for my area? What was the high/low for my region? What was the average temperature? Highest and lowest of all time?
– Security Analytics: What IP and where are most of my events coming from? Has traffic spiked compared to historical? Has any event happened like this over the last three years?
Machine learning and predictive
– Weather Application: Can we predict conditions tomorrow?
– Security Analytics: What new events should we be tracking to predict security events?
HPE Vertica Enterprise
– Columnar storage and advanced compression
– Maximum performance and scalability
HPE Vertica
All built on the same trusted and proven HPE Vertica Core SQL Engine
Core HPE Vertica SQL Engine
• Advanced Analytics
• Open ANSI SQL Standards ++
• R, Python, Java, Spark, Scala
• In-database machine learning
HPE Vertica for SQL on Hadoop
– Native support for ORC and Parquet
– Support for industry-leading distributions
– No helper node or single point of failure
HPE Vertica In the Cloud
– Get up and running quickly in the cloud
– Flexible, enterprise-class cloud deployment options
The Appeal of Vertica
Requirement Proof
Extreme Optimization
• Columnar design for high performance analytics
• Aggressive compression
• Scalable to petabyte scale
Total Cost of Ownership
• Simple and predictable pricing
• No penalty for additional hardware or connected users
Ready for your Enterprise
• SQL compliant to 100% of the TPC-DS benchmark queries
• Secure and ACID compliant
• No single point of failure
Open and Compatible
• Open platform – Standards compliant SQL, Python, Java
• Working with open source community on Spark, Hadoop, Kafka, etc.
Vertica Enterprise Unique Value to expand the data warehouse
Hadoop Data Lake Vertica Big Data Warehouse
CREATE TABLE customer_visits (
customer_id bigint,
visit_num int)
PARTITIONED BY (page_view_dt date)
STORED AS ORC;
Customer information in Hadoop Customer information in Data Warehouse
SELECT customers.customer_id FROM orders RIGHT OUTER JOIN customers
ON orders.customer_id = customers.customer_id
GROUP BY customers.customer_id HAVING COUNT(orders.customer_id) = 0;
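The SELECT statement above returns customers that have no orders. A small sketch of the same logic on an in-memory SQLite database follows; it does not exercise Vertica or Hadoop, the sample rows are invented, and the join is written as a LEFT OUTER JOIN with the tables swapped (an equivalent form) because RIGHT OUTER JOIN is only available in recent SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- Invented sample data: customer 2 has placed no orders.
    INSERT INTO customers (customer_id) VALUES (1), (2), (3);
    INSERT INTO orders (order_id, customer_id) VALUES (10, 1), (11, 1), (12, 3);
    """
)

# Every customer is kept by the outer join; COUNT(orders.customer_id)
# skips NULLs, so it is 0 exactly for customers with no matching order.
query = """
    SELECT customers.customer_id
    FROM customers LEFT OUTER JOIN orders
      ON orders.customer_id = customers.customer_id
    GROUP BY customers.customer_id
    HAVING COUNT(orders.customer_id) = 0
"""
customers_without_orders = [row[0] for row in conn.execute(query)]
```

With the sample rows, only customer 2 survives the HAVING clause.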
Vertica Engine
Querying data that sits
BOTH in the data
warehouse and Hadoop
is our unique value.
Most solutions require that
you move the data.
ROS
§ Leveraging Web Logs to gain customer insight
§ Sensor and IOT data for pre-emptive service
§ Marketing Programs Tracking
§ Tracking impact of application updates
§ Many more uses
Machine Learning in Vertica 8.0.1
Algorithm: Example
Linear Regression: Demand Forecasting
Model the demand for a service or good (response) based on its features (predictors); for example, demand for different models of laptops based on monitor size, weight, price, operating system, etc.
Logistic Regression: Engineering
Predicting the likelihood that a particular mechanical part of a system will malfunction or require maintenance (response) based on operating conditions and diagnostic measurements (predictors)
K-means: Fraud Detection
Identify individual observations that don’t align to a distinct group (cluster) and identify types of clusters that are more likely to be at risk of
Naïve Bayes: Categorization
Using conditional probabilities, identify whether items belong in one group or another. Used in email spam detection, language detection, sentiment analysis and document sorting
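To make the demand forecasting row concrete, here is a minimal sketch of a one-predictor linear regression fitted by ordinary least squares. It is plain Python and does not use Vertica's in-database functions; the `fit_line` helper and the price and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the predictor, and how the predictor and response move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: laptop price (predictor) and units sold (response).
prices = [500.0, 700.0, 900.0, 1100.0]
units_sold = [400.0, 300.0, 200.0, 100.0]

slope, intercept = fit_line(prices, units_sold)
predicted_units_at_800 = slope * 800.0 + intercept
```

A real demand model would use several predictors (monitor size, weight, operating system and so on), which needs a multiple regression rather than this single-variable fit.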
Support the whole workflow of predictive analytics
Perhaps the ultimate architecture is all-inclusive
Apache Spark, Hadoop and Kafka
Data Warehouse (Vertica)
Optimal Use Case
– Deep Analysis
– Massive scale
– Many concurrent users
Kafka
Data Lake (Hadoop)
Optimal Use Case
– Data lake
– Warm, cold storage
– Data discovery
– ETL
Operational Analytics (Spark)
Optimal Use Case
– Small, fast running queries
– ETL and complex event processing
– Operational analytics
Features (Spark):
– Vertica performs optimized data load from Spark
– Spark runs queries on Vertica data
Features (Hadoop):
– Analyze-in-place without data movement via native ORC and Parquet readers
– Any Hadoop
– Run ON the Hadoop cluster or ON Vertica cluster
Features (Kafka):
– Share data between applications that support Kafka
– Data streaming into Vertica
Vertica makes data matter
Purpose built for Big Data from the first line of code
Fast Analytics: Gain insight into your data 50x-1,000x faster than legacy products
Massive scalability: Infinitely scale your solution by adding an unlimited number of low cost nodes
Open architecture: Built-in support for Hadoop, R, and a range of ETL and BI tools
Optimized data storage: Store 10x-30x more data per server than row databases with patented columnar compression
HPE Vertica Community Edition
Download and install community edition. Manage and analyze up to 1 TB of data across three nodes for an unlimited time.
Try it on my.vertica.com

 

Horses for Courses: Database Roundtable

  • 1. Database Survival: Food for Thought. Robin Bloor, PhD
  • 2. An Early Thought: Data lakes and databases are not very different things, irrespective of what the data lake enthusiasts claim.
  • 3. Everything in flux: hardware (network, storage, servers); data sources; data staging; data volumes; data flow; data governance; query languages; data usage; data structures; schema definition; ingest speeds; data workloads; applications.
  • 4. The Data Lake Picture. [Diagram labels: Data Cleansing; Data Security; Ingest; Metadata Management; Real-Time Apps; Transform & Aggregate; Search & Query; BI, Visualization & Analytics; Other Apps; Data Lake Management; Data Governance; DATA LAKE; To Databases, Data Marts, Other Apps; Archive; Life Cycle Management; Extracts. Sources: servers, desktops, mobile, network devices, embedded chips, RFID, IoT, the cloud, OSes, VMs, log files, systems management apps, ESBs, web services, SaaS, business apps, office apps, BI apps, workflow, data streams, social...] Data lakes (Yes!): ingest points for data for the sake of governance; analytics sandboxes; good places for cool and cold data, and hence archive. Data lakes (No!): OLTP databases; fast query engines; high user concurrency; Big Data analytics apps; unusually structured data (NoSQL, graph, etc.). You don’t have one data lake; you have many. Data lakes do not manage data well.
  • 5. Streaming: There’s a spectrum of streaming capability and thus a spectrum of streaming platforms: Spark, in-memory DBMS, SQLstream.
  • 6. Database Workload Parameters: read-intensive vs. write-intensive; mutable vs. immutable data; immediate vs. eventual consistency; short vs. long data latency; predictable vs. unpredictable data access patterns; simple vs. complex data types.
  • 7. Horses for Courses: relational row store databases for conventional OLTP; relational databases for ACID requirements; parallel databases (row or column) for unpredictable or variable query workloads; specialized databases for complex data query workloads (graph, etc.); NoSQL (KVS, DHT) for high scale OLTP; NoSQL (KVS, DHT) for low latency read-mostly data access; NoSQL / Hadoop / Spark for scale-out batch analytic workloads. Cloud databases can be any of the above.
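The pairings on slide 7 can be read as a simple lookup from workload to database category. The sketch below only transcribes the slide; the names `HORSES_FOR_COURSES` and `suggest_database` are hypothetical, and a real selection would also weigh the workload parameters from slide 6.

```python
# Hypothetical lookup table transcribed from the "Horses for Courses" slide.
# Keys are workload descriptions; values are the database category the slide pairs them with.
HORSES_FOR_COURSES = {
    "conventional OLTP": "relational row store database",
    "ACID requirements": "relational database",
    "unpredictable or variable query workloads": "parallel database (row or column)",
    "complex data query workloads": "specialized database (graph, etc.)",
    "high scale OLTP": "NoSQL (KVS, DHT)",
    "low latency read-mostly data access": "NoSQL (KVS, DHT)",
    "scale-out batch analytic workloads": "NoSQL / Hadoop / Spark",
}


def suggest_database(workload: str) -> str:
    """Return the category the slide lists for a workload, or a fallback if it is not listed."""
    return HORSES_FOR_COURSES.get(workload, "no suggestion on the slide")


if __name__ == "__main__":
    for workload in ("conventional OLTP", "high scale OLTP", "time series"):
        print(f"{workload}: {suggest_database(workload)}")
```

Note that two different workloads map to the same NoSQL category, which is why the table is keyed by workload rather than by database type.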
  • 8. Database Tools: Have you noticed how databases are not self-running? DBAs are in short supply and the need for them is increasing. Database diversity doesn’t help in this area. DBA tools: SQL analysis; performance analysis; security management; capacity planning; database deployment. We meet the same problem with data lakes, except that there are very few tools.
  • 9. Database Automation. Business Innovation. Bloor Roundtable Webcast. Confidential – Not for distribution. Copyright © 2017 Datical. All rights reserved.
  • 11. Mind the Gap. [Chart: # of releases over time, showing database deployments, application releases, and the Deployment Gap.] Organizations have a Deployment Gap: current DB deployment methods are outpaced by the Agile software release process. DB changes are manually intensive, slow, and full of risk.
  • 13. CIO Magazine Survey: Digital Transformation and the Database. Slowing releases: 75% of DBAs report that release delays due to the database change process are increasing. Need automation: over half of respondents report that automation is key to release speed; the same share report that current automation does not meet their needs. Errors increasing: 30% report that errors have increased over the last 12 months; 42% of DBAs report the same. Dev managers say the database change process delays releases: 90%. DBAs say the database change process delays releases: 91%.
  • 15. Close the Gap. [Chart: # of releases over time, showing database deployments, application releases, and the Deployment Gap.] Datical closes the Deployment Gap: Datical DB radically improves and simplifies the application release process by automating database change.
  • 16. What Makes Us Different. Change Management Simulator: simulate the impact of database changes before they are deployed. Dynamic Rules Engine: automatically enforce DBA rules across all proposed database changes. Database Code Packager: creates database continuous integration by unifying application and database changes. Deployment Monitoring Console: automatically monitor the status of every database deployment across the enterprise.
  • 17-19. Expand, Migrate, Contract: move from Data Clump to Coordinate class. https://martinfowler.com/bliki/ParallelChange.html
  • 20. Expand, Migrate, Contract: split a column into two columns. https://martinfowler.com/bliki/ParallelChange.html
  • 21. Expand, Migrate, Contract: add new columns, populate with UPDATE and substring. https://martinfowler.com/bliki/ParallelChange.html
  • 22. Expand, Migrate, Contract: drop the old column. https://martinfowler.com/bliki/ParallelChange.html
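The column split walked through on slides 20 to 22 can be sketched end to end against an in-memory SQLite database. Everything named here is an assumption made for illustration: the `person` table, its `full_name`, `first_name`, and `last_name` columns, and the rule that a name splits at its first space. The contract step rebuilds the table rather than issuing `DROP COLUMN`, because older SQLite versions lack that statement; this is not how Datical performs the change.

```python
import sqlite3


def split_name_column(conn: sqlite3.Connection) -> None:
    """Split person.full_name into first_name and last_name in three phases."""
    cur = conn.cursor()

    # Expand: add the new columns alongside the old one.
    cur.execute("ALTER TABLE person ADD COLUMN first_name TEXT")
    cur.execute("ALTER TABLE person ADD COLUMN last_name TEXT")

    # Migrate: populate the new columns with UPDATE and substr().
    # Assumes every full_name contains a space separating first and last name.
    cur.execute(
        """
        UPDATE person
        SET first_name = substr(full_name, 1, instr(full_name, ' ') - 1),
            last_name  = substr(full_name, instr(full_name, ' ') + 1)
        """
    )

    # Contract: drop the old column. A table rebuild stands in for DROP COLUMN here;
    # note that CREATE TABLE ... AS SELECT does not carry over constraints.
    cur.execute(
        "CREATE TABLE person_new AS SELECT id, first_name, last_name FROM person"
    )
    cur.execute("DROP TABLE person")
    cur.execute("ALTER TABLE person_new RENAME TO person")
    conn.commit()


def demo() -> list:
    """Build a two-row example table, run the split, and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.executemany(
        "INSERT INTO person (full_name) VALUES (?)",
        [("Ada Lovelace",), ("Alan Turing",)],
    )
    split_name_column(conn)
    rows = conn.execute(
        "SELECT first_name, last_name FROM person ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

In a live system the three phases would normally ship as separate releases, so that application code reading the old column keeps working until every reader has moved to the new ones.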
  • 23. Respond faster by automating the deployment of database changes. (1) Eliminates back and forth between Dev, QA and DBAs. (2) Integrates with your tools and processes. (3) Automated deployment: validated database changes are automatically deployed with Datical to different environments right alongside application changes. [Pipeline diagram labels: Code, Build, Test; DB Changes, App Changes; Test, Stage, Production.]
  • 24. Perform higher by massively increasing productivity, efficiency, and ROI. DB Professional: database pros avoid time-consuming review of change scripts to focus on strategically moving the business forward. Developer/QA: devs package, review, and validate database changes alongside app code changes with the push of a button. Business Executive: business delivers experiences faster and more often while reducing error and maximizing other app release investments. Less time on database change management tasks* (days and weeks → hours): 80%. Decrease in deployment errors to test and production*: 90%. * Benchmarked from Datical customers.
  • 25. Big Data Analytics and Hybrid Architectures. Steve Sarsfield, Steve.Sarsfield@hpe.com
  • 26. About the Speaker: Steve Sarsfield, Vertica Team. Blogger: my.vertica.com, data-governance.blogspot.com. Author of “The Data Governance Imperative”. Contact: Twitter @stevesarsfield; steve.sarsfield@hpe.com
  • 27. Picking a DB. Structure: Does the data fit into a nice clean data model? Will the schema lack clarity or be dynamic? Analytics: What question(s) do you want to ask of the data? Short running queries? Long, deep analytics including predictive? Size: Is the data “Big Data”, or will it ever be big data? Also: cost per terabyte; staffing considerations; familiarity with technologies; company financials; company ancillary portfolio; community and openness.
  • 28. Needing different kinds of analysis is common: short, fast queries; deeper analytics with bigger data sets; machine learning and predictive. Weather application: Tell me the current temperature and pressure. What was the high/low for my area? What was the high/low for my region? What was the average temperature? Highest and lowest of all time? Can we predict conditions tomorrow? Security analytics: Are there any attacks happening right now? What IP and where are most of my events coming from? Has traffic spiked compared to historical? Has any event like this happened over the last three years? What new events should we be tracking to predict security events?
  • 29. HPE Vertica: all built on the same trusted and proven HPE Vertica Core SQL Engine. Core HPE Vertica SQL Engine: advanced analytics; open ANSI SQL standards ++; R, Python, Java, Spark, Scala; in-database machine learning. HPE Vertica Enterprise: columnar storage and advanced compression; maximum performance and scalability. HPE Vertica for SQL on Hadoop: native support for ORC and Parquet; support for industry-leading distributions; no helper node or single point of failure. HPE Vertica in the Cloud: get up and running quickly in the cloud; flexible, enterprise-class cloud deployment options.
  • 30. The Appeal of Vertica (requirement: proof). Extreme optimization: columnar design for high performance analytics; aggressive compression; scalable to petabyte scale. Total cost of ownership: simple and predictable pricing; no penalty for additional hardware or connected users. Ready for your enterprise: SQL compliant to 100% of the TPC-DS benchmark queries; secure and ACID compliant; no single point of failure. Open and compatible: open platform with standards-compliant SQL, Python, Java; working with the open source community on Spark, Hadoop, Kafka, etc.
  • 31. Vertica Enterprise: unique value to expand the data warehouse. [Diagram labels: Hadoop Data Lake; Vertica Big Data Warehouse; Vertica Engine; ROS.] Customer information in Hadoop: CREATE TABLE customer_visits (customer_id bigint, visit_num int) PARTITIONED BY (page_view_dt date) STORED AS ORC; Customer information in the data warehouse: SELECT customers.customer_id FROM orders RIGHT OUTER JOIN customers ON orders.customer_id = customers.customer_id GROUP BY customers.customer_id HAVING COUNT(orders.customer_id) = 0; Querying data that sits BOTH in the data warehouse and Hadoop is our unique value. Most solutions require that you move the data. Uses: leveraging web logs to gain customer insight; sensor and IoT data for pre-emptive service; marketing programs tracking; tracking impact of application updates; many more uses.
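The SELECT statement on slide 31 is an anti-join: it returns customers that have no matching rows in `orders`. The sketch below reproduces that logic against an in-memory SQLite database rather than Vertica. It rewrites the RIGHT OUTER JOIN as the equivalent LEFT OUTER JOIN with the tables swapped, since older SQLite versions do not support RIGHT JOIN, and the sample rows and the `order_id` column are invented for illustration.

```python
import sqlite3


def customers_without_orders(conn: sqlite3.Connection) -> list:
    """Return the ids of customers that have no rows in the orders table."""
    query = """
        SELECT customers.customer_id
        FROM customers
        LEFT OUTER JOIN orders
            ON orders.customer_id = customers.customer_id
        GROUP BY customers.customer_id
        HAVING COUNT(orders.customer_id) = 0
        ORDER BY customers.customer_id
    """
    return [row[0] for row in conn.execute(query)]


def demo() -> list:
    """Create three customers, give two of them orders, and list the one without."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers (customer_id) VALUES (?)", [(1,), (2,), (3,)])
    # Invented sample data: customers 1 and 3 have orders, customer 2 has none.
    conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", [(1,), (1,), (3,)])
    result = customers_without_orders(conn)
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

`COUNT(orders.customer_id)` counts only non-NULL values, so a customer whose outer-joined order columns are all NULL gets a count of zero and passes the HAVING filter.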
  • 32. Machine Learning in Vertica 8.0.1: support the whole workflow of predictive analytics. Algorithms and examples: Linear regression (demand forecasting): model the demand for a service or good (response) based on its features (predictors), for example demand for different models of laptops based on monitor size, weight, price, operating system, etc. Logistic regression (engineering): predict the likelihood that a particular mechanical part of a system will malfunction or require maintenance (response) based on operating conditions and diagnostic measurements (predictors). K-means (fraud detection): identify individual observations that don’t align to a distinct group (cluster) and identify types of clusters that are more likely to be at risk of [...]. Naïve Bayes (categorization): using fuzzy logic, identify items that belong in one group or another; used in email spam detection, language detection, sentiment analysis and document sorting.
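As a rough illustration of the linear regression row on slide 32, the sketch below fits a one-predictor least-squares line in plain Python. It does not use Vertica's in-database functions; the function names `fit_line` and `predict` and the price and demand figures are invented for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of the predictor.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Invented data: laptop price (predictor) against units demanded (response).
    prices = [500, 700, 900, 1100]
    demand = [400, 320, 240, 160]
    model = fit_line(prices, demand)
    print(model, predict(model, 800))
```

The slide's laptop example has several predictors (monitor size, weight, price, operating system), which would call for multiple regression; the single-predictor case is shown only because it fits in a few lines.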
  • 33. Perhaps the ultimate architecture is all-inclusive: Apache Spark, Hadoop and Kafka. Data Warehouse (Vertica), optimal use case: deep analysis; massive scale; many concurrent users. Data Lake (Hadoop), optimal use case: data lake; warm, cold storage; data discovery; ETL. Operational Analytics (Spark), optimal use case: small, fast running queries; ETL and complex event processing; operational analytics. Spark features: Vertica performs optimized data load from Spark; Spark runs queries on Vertica data. Hadoop features: analyze-in-place without data movement via native ORC and Parquet readers; any Hadoop; run ON the Hadoop cluster or ON the Vertica cluster. Kafka features: share data between applications that support Kafka; data streaming into Vertica.
  • 34. Vertica makes data matter. Purpose built for Big Data from the first line of code. Fast analytics: gain insight into your data 50x-1,000x faster than legacy products. Massive scalability: infinitely scale your solution by adding an unlimited number of low cost nodes. Open architecture: built-in support for Hadoop, R, and a range of ETL and BI tools. Optimized data storage: store 10x-30x more data per server than row databases with patented columnar compression. HPE Vertica Community Edition: download and install Community Edition. Manage and analyze up to 1 TB of data across three nodes for an unlimited time. Try it on my.vertica.com