There was a time when the Enterprise Data Warehouse (EDW) was the only way to provide a 360-degree analytical view of the business. In recent years many organizations have deployed disparate analytics alternatives to the EDW, including cloud data warehouses, machine learning frameworks, graph databases, geospatial tools, and other technologies. Often these new deployments have resulted in the creation of analytical silos that are too complex to integrate, seriously limiting global insights and innovation.
Join guest speaker Jim Curtis of 451 Research and Pivotal’s Jacque Istok for an interactive discussion about some of the overarching trends affecting the data warehousing market, as well as how to build a next-generation data platform to accelerate business innovation. During this webinar you will learn:
- The significance of multi-cloud, infrastructure-agnostic analytics
- What is working and what isn’t, when it comes to analytics integration
- The importance of seamlessly integrating all your analytics in one platform
- How to innovate faster, taking advantage of open source and agile software
Speakers: James Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research; and Jacque Istok, Head of Data, Pivotal
2. Copyright (C) 2016 451 Research LLC
451 Research is a leading IT research & advisory company
Founded in 2000
300+ employees, including over 120 analysts
2,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
50,000+ IT professionals, business users and consumers in our research community
Over 52 million data points published each quarter and 4,500+ reports published each year
3,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
4. The Traditional Enterprise Data Warehouse: Common characteristics
[Diagram labels: Enterprise Applications; Data Warehouse; Decision Makers; Data Analysts; IT Pros]
5. So what’s driving the change?
• Data, data, and more data
• Desire for greater, deeper insight
• Data storage locations (data gravity)
• Need for broader data access
• Changes in hardware
• Looking for environment choice
• Need to reduce costs
• Leverage open source technologies
7. Times are a changin’ – What’s driving the change?
“You can’t stop the waves, but you can learn to surf.”
- Jon Kabat-Zinn
8. Adapt and Expand Our Field of Vision
[Diagram labels: Enterprise Applications; Data Warehouse; Decision Makers; Data Analysts; IT Pros]
9. Expanded Processing Choices
[Diagram labels: Enterprise Applications; Cloud Storage; Data Warehouse; Hadoop; Spark; AI+ML; Decision Makers; Data Analysts; IT Pros]
10. Leads to Expansion of Data Sources
[Diagram labels: Enterprise Applications; Cloud Storage; Mobile Apps; Bots; IoT Devices and Sensors; Social Media; Log and Clickstream Data; Data Warehouse; Hadoop; Spark; AI+ML; Decision Makers; Data Analysts; IT Pros]
11. Which Leads to More Advanced Decision-Making Processes
[Diagram labels: Enterprise Applications; Cloud Storage; Mobile Apps; Bots; IoT Devices and Sensors; Social Media; Log and Clickstream Data; Data Warehouse; Hadoop; Spark; AI+ML; Business Users; Data-Driven Applications; Data Scientists; Decision Makers; Data Analysts; IT Pros; OT Users]
15. Key takeaways
Data analytics is moving away from monolithic, static systems toward what are, in a sense, integrated platforms architected to function in a variety of environments.
Data has played, and will continue to play, the most critical role in an analytic environment, but the ability to carry out ever more sophisticated analysis (pervasive intelligence) on that data is not a nice-to-have but a requirement.
Data access is likewise becoming the expectation for organizations, because processing power and capability are only as good as an organization’s ability to communicate those insights.
17. Pivotal Data Suite Use Case Applied to Predictive Maintenance
Great organizations leverage software, analytics, and insights to take better actions and fundamentally change or pioneer entirely new operational business models
20. Using Our Process For Solving Analytics
Pair Programming / Solutioning
Retros
Iterative Development
Greenplum Open Data Platform
ANSI-compliant SQL
Standups
User Centric Data Science
21. Provide the Business With Solutions
“You’ve got to start with
the customer experience
and work back toward
the technology - not the
other way around.”
- Steve Jobs
22. User Centered Design
“A design approach that supports the entire
development process with user-centered
activities, in order to create a product that is easy
to use and of added value to the intended users.”
24. Users, Users, Users
Different Users Want Different Things
IT
● Tasked with legacy system integration
● Controls security access to comply with policy and laws
● Operationalization
● Enterprise Architecture
Developers
● Build applications to interoperate
● Develop reports and dashboards
● Extract and Transform data
Business Analysts
● Subject Matter Experts
● Primary consumer of analytical models
● SQL or BI expert
Data Scientists
● Mathematically astute
● Intellectual curiosity, analytical exploration
● Domain Knowledge
● Communication in the form of visualization
● SQL and analytical libraries expert
25. A Modern Data Platform Must Be Built for Diverse Analytics
26. Building A Next Generation Data Platform: Common Functional Requirements
CAPTURE, ANALYZE, APPLY
1. High speed ingestion (e.g. sensors, financial transactions)
2. Data consolidation (join and cross reference)
3. ANSI SQL for structured data
4. Higher level data constructs (e.g. Graph, Geospatial, Text)
5. SLA driven queries on big data
6. Real time access for consumers and applications
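As a minimal sketch of requirements 2 and 3 (data consolidation through a join, expressed in standard SQL), the Python example below uses an in-memory SQLite database as a stand-in for the platform. The table names, columns, and sample values are hypothetical and do not come from the webinar; the same query shape would be expected to run on any ANSI SQL engine.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the analytics platform.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: sensor readings (ingested data) and device reference data.
conn.executescript("""
CREATE TABLE devices (device_id TEXT PRIMARY KEY, site TEXT);
CREATE TABLE sensor_readings (device_id TEXT, temperature REAL);
INSERT INTO devices VALUES ('d1', 'plant-a'), ('d2', 'plant-b');
INSERT INTO sensor_readings VALUES ('d1', 70.5), ('d1', 72.5), ('d2', 65.0);
""")


def average_temperature_by_site(connection):
    """Join readings to device reference data and aggregate per site."""
    query = """
        SELECT d.site, AVG(r.temperature) AS avg_temperature
        FROM sensor_readings AS r
        JOIN devices AS d ON d.device_id = r.device_id
        GROUP BY d.site
        ORDER BY d.site
    """
    return connection.execute(query).fetchall()


result = average_temperature_by_site(conn)
for site, avg_temperature in result:
    print(site, avg_temperature)
```

The join cross-references each reading with its device record, so the aggregate is reported per site rather than per device.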
27. Building A Next Generation Data Platform: Common Non-Functional Requirements
OPEN, PRODUCTION PROVEN, SCALE OUT, SECURE
1. Open Source Software
2. Infrastructure Agnostic: Multi-cloud & on premises
3. Production Proven
4. Scale Out Growth
5. Security Centric
28.
29. Next Generation Data Platform
USERS: IT, Dev, Business Analysts, Data Scientists
ANALYTICAL APPLICATIONS: Custom Apps, BI / Reporting, Machine Learning, AI
NATIVE INTERFACES: ANSI SQL; Teradata SQL and Other DB SQL; Apache MADlib (ML/Statistics/Graph); Python, R, Java, Perl, C (Programmatic); Apache SOLR (Text); PostGIS (GeoSpatial)
PIVOTAL GREENPLUM PLATFORM: Massively Parallel (MPP); PostgreSQL Kernel; Petabyte Scale Loading; Query Optimizer (GPORCA); Workload Manager; Polymorphic Storage; Command Center; SQL Compatibility (Hyper-Q)
MULTI-STRUCTURED DATA SOURCES & PIPELINES: Structured Data (JDBC, ODBC); JSON, Apache AVRO, Apache Parquet & XML; Local Storage; Other RDBMSes; Spark; GemFire; Cloud Object Storage; HDFS; Kafka; ETL; Spring Cloud Data Flow
FLEXIBLE DEPLOYMENT: On-Premises, Public Clouds, Private Clouds, Fully Managed Clouds
30. Pivotal Greenplum
Powerful, MPP, and multi-cloud analytics on petabyte-scale data
Challenges
• Legacy scale-up DBs are expensive to operate
• Hadoop doesn’t fit low-latency, iterative analytics with high user concurrency
• Multiple environments with messy, disjointed structured and unstructured data
Greenplum Delivers
• Multi-cloud, open-source analytics data platform
• Massively parallel processing with machine learning and ANSI SQL compliance
• Unify and query structured and unstructured data from native, HDFS, and cloud storage, including text, spatial, and graph data
Benefits
• Scales linearly with hardware for optimal cost and performance
• Faster workflow; train models in parallel, publish to DB for rapid parallel scoring
• Analyze more types of data more quickly for faster, deeper insights
31. Infrastructure Agnostic Data Platform
Run Your Analytics Anywhere
On-Premises, Private Cloud, Public Cloud
32. Bring Analytics to Your Data, Faster
• Distributed machine learning on large-scale data sets
• Over 50 open-source machine learning, statistical, graph, and math functions
• Massively parallel execution within Pivotal Greenplum
• Apache Top Level Project: http://madlib.apache.org
33. Bring Text Analytics to Your Data, Faster
• Distributed free text search and text analytics on large-scale data sets
• A Universal Query Processor that accepts queries with mixed syntax from supported Solr query processors
• Faceted search results
• Term highlighting in results
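To make faceted results and term highlighting concrete, here is a small pure-Python sketch. It is not how Solr or Greenplum implements these features; the sample documents, the `category` facet field, and the `<em>` highlight markers are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical sample documents; "category" is the field used for faceting.
DOCUMENTS = [
    {"id": 1, "category": "pump", "text": "Bearing vibration detected on pump"},
    {"id": 2, "category": "valve", "text": "Valve vibration within tolerance"},
    {"id": 3, "category": "pump", "text": "Pump inspection completed"},
    {"id": 4, "category": "pump", "text": "Excess VIBRATION after restart"},
]


def search(documents, term):
    """Return facet counts and highlighted text for documents matching term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = [doc for doc in documents if pattern.search(doc["text"])]
    # Facets: how many matching documents fall into each category.
    facets = Counter(doc["category"] for doc in hits)
    # Highlighting: wrap each matched term in <em> markers, keeping its case.
    highlighted = {
        doc["id"]: pattern.sub(lambda m: f"<em>{m.group(0)}</em>", doc["text"])
        for doc in hits
    }
    return facets, highlighted


facets, highlighted = search(DOCUMENTS, "vibration")
print(dict(facets))
print(highlighted[4])
```

Facet counts let a user narrow the result set by category, while highlighting shows where in each document the query term matched.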
34. Bring Spatial Analytics to Your Data, Faster
• Adds spatial functions such as distance and area, the ability to manipulate polygons, and special data types and indexes that speed up processing
• Key Features include:
Points
Lines
Polygons
Perimeter
Area
Intersection
Contains
Distance
Long/Lat
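The key features listed above can be illustrated with a short, self-contained Python sketch covering distance between long/lat points, polygon area, and a contains test. This is not PostGIS code and it makes simplifying assumptions: distance uses the haversine formula on a spherical Earth with a mean radius of 6371 km, while area and containment treat coordinates as planar. In PostGIS itself, comparable operations are exposed as SQL functions such as ST_Distance, ST_Area, and ST_Contains.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Assumption: mean Earth radius, spherical model.


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two long/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polygon_area(vertices):
    """Planar polygon area via the shoelace formula; vertices given in order."""
    total = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_contains(vertices, point):
    """Ray-casting point-in-polygon test; boundary points are not special-cased."""
    x, y = point
    inside = False
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(rectangle))
print(polygon_contains(rectangle, (1, 1)))
print(round(haversine_km(0, 0, 0, 1), 2))
```

A spatially enabled database runs operations like these next to the data and uses spatial indexes to avoid testing every geometry, which a plain loop like this one does not attempt.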
35. Bring R to Your Data, Faster
• Open source analytical language most often compared to SAS
• Great for statistical learning
36. Bring Python to Your Data, Faster
• Great language to use for syntax
• Many packages and libraries are available for Python, making tasks like machine learning practical and easy
38. What Does It Take To Build A Next Generation Data Platform?
39. Pivotal Greenplum: Learn More
● Find out more about Pivotal Greenplum at
○ https://pivotal.io/pivotal-greenplum
● OR learn more about the open source project at
○ http://greenplum.org/
● OR give it a try yourself at
○ Amazon AWS or Microsoft Azure or Google GCP (coming soon)
○ or via Download