Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
5. Self-Service Data Delivery Environment
Scope
• Shared/managed environment for data producers and consumers
• Corporate and non-corporate data source mash-ups
• Responsive delivery of data products with real-time data access
• Bridged data environments, across technology and business domains
Implementation
• Branded data virtualization implementation using the Denodo Platform
• Included:
• Governance (e.g. data request process)
• Data Catalog (for end-users)
• Drivers (e.g. for BI & analytics tool integration)
• VDP Client (for data engineers and analysts)
• VDP Server (with optimized data sources)
To create and use data services for analytics, reports, and apps
6. Data Architecture at Anadarko
Data Sources
IoT/Edge
Sensor Data
Machine Data
Internet Data
Images and Video
Enterprise
Structured Data Sources
Unstructured Content
Cloud
FTP
Databases
Web Services
Processing
Events (real-time)
Virtualize (real-time)
Streams (real-time)
Change Data Capture (real-time)
ETL (batch)
Data Ingestion
Streams (real-time)
Change Data Capture (real-time)
ETL (batch)
Data Integration
Data Lake
Batch DW NoSQL
Hadoop
YARN/Workload Management
HDFS
Data Environment
Data Compute
CPU/GPU/TPU
Data Cache
In-Memory
Data Warehouse
EDW In-Memory Data Mart
ODS Historian
Data Virtualization
Federation
Abstraction
Data Services
Optimization
Security
Governance
Analytics
Predictive Analytics
Statistical Analytics
Text Analytics
Data Mining
Data Insights
Data Access
Data Discovery
Self-Service
Search
Applications
Real-time Decision Management
Alerts
Reporting
Dashboards/Ad-hoc
Canned
Metadata Management, Data Governance, Data Security
8. IT – Business Dilemma
IT Architecture is Unmanageable & Brittle because:
IT Focuses on Data Collection & Storage
Business Focuses on Data Visualization & Analysis
No One Focused on Data Delivery
– So teams create hundreds to thousands of brittle direct connections and replicate large volumes of data
Inventory System (MS SQL Server)
Product Catalog (Web Service - SOAP)
BI / Reporting: JDBC, ODBC, ADO.NET
Web / Mobile: WS – REST JSON, XML, HTML, RSS
MS Excel: Denodo Excel Add-in
Log files (.txt/.log files)
CRM (MySQL)
Billing System (Web Service - REST)
Big Data, Cloud (Hadoop, Web)
Product Data (CSV)
ETL
Portals: JSR168 / 286, MS Web Parts
SOA, Middleware, Enterprise Apps: WS – SOAP, Java API
Customer Voice (Internet, Unstructured)
9. IT and Business Going in Different Directions
BI Benchmark Report
High Cost - IT spends ~1% of Revenue on ETL & Storage
▪ 75% of data stored is not used – large £ wasted
▪ 90% of all queries are for current data, which is not available from traditional EDW or data lakes
Long Time – Months to Build ETL Process & Data Marts
▪ 2+ Months to add new data source to an EDW
▪ 1 – 2 Months to build complex dashboard or report
IT Slowing Down
By 2020
▪ 500% growth in Data & Device Avalanche
▪ Due to lack of data accessibility today, < 0.5% of all data is ever analyzed and used
Source:
Business Speeding Up
To remain competitive, by 2020, Business Decision Speed & Analysis Sophistication Requires 300% Increase
Source:
10. The Promise of Self-Service Initiatives
• Let business users access the data that they need and stop IT being a bottleneck
• That’s the vision as sold by many BI tool vendors
• i.e. give me the tools and access to the data and stand back ☺
11. Self-Service Initiatives
• First wave of self-service initiatives was driven by ‘shadow IT’ and spreadsheets
• More recently using desktop analytics tools
• Tableau, Qlik, Trifacta, …
• Do these initiatives really work in practice?
12. Self-Service Issues…
• Tools are designed for data analysts (or power users)
• Users who are happy finding, wrangling, cleansing data
• Creating calculations, aggregations within the data
• What about the other business users?
• People who don’t want to spend hours fighting the spreadsheet…
• Spreadsheets and desktop tools are isolated
• Sitting on one desktop or shared via email
• Ultimately, can you trust the numbers?
• Where did the data come from? How has it been manipulated?
13. Rob van der Meulen, Gartner
Gartner predicts that by 2018 most business users will have
access to self-service tools, but that only one in 10 initiatives
will be sufficiently well-governed to avoid data inconsistencies
that negatively impact the business.
15. Self-Service with Guardrails
• Don’t build just for the ‘data cowboys’
• Create pre-integrated, pre-calculated data services
• Saves the user having to do this themselves
• Ensures consistency of calculations, etc.
• But allow the cowboys to ‘roam and wrangle’
• Even the cowboys can only access ‘approved’ data sources
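The guardrail idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Denodo Platform implements it: the source names, the sample rows, and the functions `read_source` and `revenue_by_region` are all hypothetical. It shows one pre-calculated data service for regular business users and raw access for power users that is limited to an allow-list of approved sources.

```python
# Hypothetical allow-list of governed sources; the names are illustrative only.
APPROVED_SOURCES = {"crm", "billing"}

# Toy in-memory "sources" standing in for real systems.
SOURCES = {
    "crm": [
        {"customer_id": 1, "region": "EMEA"},
        {"customer_id": 2, "region": "APAC"},
    ],
    "billing": [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 1, "amount": 80.0},
        {"customer_id": 2, "amount": 50.0},
    ],
    # Exists, but nobody has approved it, so it must not be readable.
    "scratch_spreadsheet": [{"customer_id": 1, "amount": 999.0}],
}


class UnapprovedSourceError(Exception):
    """Raised when someone asks for a source outside the allow-list."""


def read_source(name):
    """Raw access for power users, limited to approved sources."""
    if name not in APPROVED_SOURCES:
        raise UnapprovedSourceError(f"{name!r} is not an approved source")
    return list(SOURCES[name])


def revenue_by_region():
    """Pre-integrated, pre-calculated service for regular business users."""
    region_of = {row["customer_id"]: row["region"] for row in read_source("crm")}
    totals = {}
    for row in read_source("billing"):
        region = region_of[row["customer_id"]]
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(revenue_by_region())
```

Because the calculation lives in one shared function, every consumer gets the same totals, while a call such as `read_source("scratch_spreadsheet")` fails instead of quietly mixing in ungoverned data.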
16. A Few Simple Rules…
1. Users come in all shapes and sizes
• Who are they? What data do they need? What flexibility do they want?
2. Connect to all of the data (but start with the most important)
• What data is needed by the users? Open access or pre-aggregated and pre-calculated?
3. Use the language that the business understands
• e.g. to Finance it’s an ‘account’, but to Customer Care it’s a ‘customer’. Don’t force people to change terminology…support multiple semantic mappings (to the language of the consumer)
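Rule 3 can be illustrated with a small Python sketch. The canonical field names (`party_id`, `party_name`), the audiences, and the `present` function are assumptions made up for this example. The point is only that a single stored record can be exposed under several vocabularies without copying or changing the underlying data.

```python
# Canonical record as it might be stored in a hypothetical source system.
CANONICAL_ROW = {"party_id": 42, "party_name": "Acme Ltd", "balance": 1500.0}

# One mapping per consumer group: canonical field -> the term that group uses.
# Fields missing from a mapping are simply not exposed to that audience.
SEMANTIC_MAPPINGS = {
    "finance": {
        "party_id": "account_id",
        "party_name": "account_name",
        "balance": "balance",
    },
    "customer_care": {
        "party_id": "customer_id",
        "party_name": "customer_name",
    },
}


def present(row, audience):
    """Rename the fields of one row into the vocabulary of the given audience."""
    mapping = SEMANTIC_MAPPINGS[audience]
    return {mapping[field]: value for field, value in row.items() if field in mapping}


if __name__ == "__main__":
    print(present(CANONICAL_ROW, "finance"))
    print(present(CANONICAL_ROW, "customer_care"))
```

Finance sees an account and Customer Care sees a customer, yet both views are derived from the same row, so the numbers cannot drift apart.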
17. BI and Analytics Reference Architecture
IT: Flexible Source Architecture
Business: Flexible Tool Choice
IT can now move at slower speed w/o affecting business
Business can now make faster & more sophisticated decisions as all data is accessible by any tool of choice
22. The true potential of Self-Service Analytics
• Companies have always been challenged to deliver data to their end-users faster
• Business users are waiting on BI Developers to deliver dashboards
• BI Developers are waiting on ETL to load data in a warehouse
• Data Scientists need access to all data and they want it in the (raw) detail format
• The typical approach to this challenge is to build a Data Lake
• Often this results in a vast data store with no overriding metadata
• Cryptic column names, no defined relationships between different Data Sets
• Solution – Build a Virtual Data Lake with Denodo
• Faster and cheaper to deploy along with enterprise level metadata defining data relationships
• Allow end users true self-service analytics…but with guard rails
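As a rough illustration of what a metadata layer adds over a raw data lake, the Python sketch below leaves two toy sources with cryptic column names in place and combines them only at query time. Everything in it is hypothetical (the well and reading data, `DATASETS`, `RELATIONSHIPS`, `scan`, `joined_view`), and it is not the Denodo API. It only shows business-friendly names and a declared relationship driving a join without copying data into a new store.

```python
# Two toy sources with cryptic column names, standing in for real systems.
WELLS = [{"w_id": 7, "w_nm": "Well A"}, {"w_id": 8, "w_nm": "Well B"}]
READINGS = [
    {"wid": 7, "prs": 101.5},
    {"wid": 8, "prs": 99.2},
    {"wid": 7, "prs": 102.1},
]

# Hypothetical metadata layer: business-friendly column names per dataset.
DATASETS = {
    "wells": {"rows": WELLS, "columns": {"w_id": "well_id", "w_nm": "well_name"}},
    "readings": {"rows": READINGS, "columns": {"wid": "well_id", "prs": "pressure"}},
}

# Declared relationships: (left dataset, right dataset, shared business key).
RELATIONSHIPS = [("readings", "wells", "well_id")]


def scan(dataset):
    """Yield rows from a source on demand, renamed according to the metadata."""
    entry = DATASETS[dataset]
    for row in entry["rows"]:
        yield {entry["columns"][name]: value for name, value in row.items()}


def joined_view(left, right):
    """Join two datasets at query time using the declared relationship."""
    key = next(k for l, r, k in RELATIONSHIPS if (l, r) == (left, right))
    lookup = {row[key]: row for row in scan(right)}
    for row in scan(left):
        yield {**lookup[row[key]], **row}


if __name__ == "__main__":
    for row in joined_view("readings", "wells"):
        print(row)
```

The consumer asks for readings joined to wells and gets readable column names back, without needing to know that the key is called `wid` in one source and `w_id` in the other.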
24. Summary – Key Takeaways
• Data Virtualization provides a common and consistent view of data across the organization
• No more arguments about data sources and veracity ☺
• Data Virtualization provides a platform for self-service with guardrails
• Supports both ‘data cowboys’ (with limits) and regular business users
• Accelerates self-service initiatives – no more analysis silos – while retaining control and governance
26. Next steps
Download Denodo Express:
www.denodoexpress.com
Access Denodo Platform in the Cloud!
30 day FREE trial available!
Denodo for Azure: www.denodo.com/TrialAzure/PackedLunch
Denodo for AWS: www.denodo.com/TrialAWS/PackedLunch
27. Next session
Data Virtualization – An Introduction
Thursday, July 19, 2017 | 11:00am PT | 2:00pm ET
Paul Moxon
VP Data Architectures & Chief Evangelist, Denodo