David Stodder
TDWI Senior Director of Research, BI
November 14, 2018
Accelerating Value
from Data Lakes
Sponsors
DAVID STODDER
Senior Research Director
Business Intelligence
TDWI
Agenda
• Data lake: Definitions and in
context with other systems
• Trends and drivers
• Impact of the cloud on data lakes
• Use cases: analytics, AI, BI
• Recommendations
• Hortonworks presentation
• Arcadia Data presentation
• Audience Q&A
The Data Lake: What Is It?
• Method for organizing large volumes of
highly diverse data from diverse
sources
– For broad data exploration and discovery,
plus advanced analytics
– Depending on the platform, a data lake may
handle many data structures
• Ingest first, prep later: Make data
available ASAP
– Persisting data in its raw, detailed state
• For multiple use cases: advanced
analytics, AI/machine learning, and BI
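The "ingest first, prep later" idea above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `raw/<source>/ingest_date=...` directory layout, the function names, and the sample IoT records are all assumptions made for the example.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, payload: bytes, day: date) -> Path:
    """Persist a payload byte-for-byte under a source/date partition (hypothetical layout)."""
    target_dir = lake_root / "raw" / source / f"ingest_date={day.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after the number of files already landed in this partition.
    target = target_dir / f"part-{len(list(target_dir.iterdir())):05d}.jsonl"
    target.write_bytes(payload)
    return target


def prep_on_demand(path: Path) -> list[dict]:
    """Parse and clean only when a use case actually asks for the data."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        record["device"] = record["device"].strip().lower()
        rows.append(record)
    return rows


lake = Path(tempfile.mkdtemp())
raw = b'{"device": " Sensor-A ", "temp_c": 21.5}\n{"device": "SENSOR-B", "temp_c": 19.0}\n'
landed = land_raw(lake, "iot", raw, date(2018, 11, 14))  # available immediately, untouched
prepared = prep_on_demand(landed)                         # cleaned later, per use case
```

The raw file stays in its original state, so a different use case can apply different preparation to the same bytes later.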
What’s Driving Adoption?
• Data-driven ambitions: Organizations
want to realize value from all of their data,
not just well-known datasets (e.g., in the
data warehouse)
– Hadoop/Spark ecosystems; the NoSQL
database revolution
• Instrumentation: Organizations have
more data coming online
– IoT, geolocation, multichannel customer
behavior, social media, text, images
Data Lakes: Becoming Established
• Similar percentage of organizations surveyed indicate plans to use a data lake to either augment or replace data warehouse
Source: TDWI
Impact of the Cloud on Data Lakes
• “Data gravity” pulls data into the cloud
– Interest in reducing data movement
– Accessing and analyzing data where it is created;
30% using BI in the cloud and 38% plan to
– Overflowing the banks of on-premises data lakes:
streaming data into cloud storage; 32% in TDWI
research see this use case
• Cloud storage (e.g., object, file, block) used
to augment – or rival – Hadoop ecosystem
data lakes
Data Warehouses and Data Lakes
are Complementary, and They Integrate Well
RELATIONAL DATA WAREHOUSE | HADOOP-BASED DATA LAKE
RDBMS for relational requirements | Hadoop for diverse data, scalability, low cost
Data of recognized high value | Candidate data of potential value
Mostly refined calculated data | Mostly detailed source data
Known entities, tracked over time | Raw material for discovering entities & facts
Data conforms to enterprise standards | Fidelity to original format and condition
Data integration up front | Data prep on demand
Data transformed a priori | Data repurposed later, as needs arise
Typically schema on write | Typically schema on read
A priori metadata improvement | Metadata developed on read, in many cases
SOURCE: TDWI 2017 Best Practices Report on Data
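The "schema on write" versus "schema on read" row in the comparison above can be made concrete with a small sketch. It assumes a toy two-field warehouse schema and JSON-lines records in the lake; `WAREHOUSE_SCHEMA` and both function names are hypothetical and stand in for whatever engine actually enforces or applies the schema.

```python
import json

WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}  # hypothetical schema


def write_with_schema(table: list[dict], record: dict) -> None:
    """Schema on write: reject anything that does not conform before storing it."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for name, expected in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    table.append(record)


def read_with_schema(raw_lines: list[str], fields: dict) -> list[dict]:
    """Schema on read: keep raw text, then project and cast only at query time."""
    rows = []
    for line in raw_lines:
        doc = json.loads(line)
        rows.append({name: (cast(doc[name]) if name in doc else None)
                     for name, cast in fields.items()})
    return rows


warehouse: list[dict] = []
write_with_schema(warehouse, {"customer_id": 7, "amount": 12.5})

lake_lines = [
    '{"customer_id": "7", "amount": "12.5", "clickstream": ["a", "b"]}',
    '{"customer_id": "8"}',
]
# The same raw lines serve two different "schemas", chosen by the reader.
as_sales = read_with_schema(lake_lines, {"customer_id": int, "amount": float})
as_clicks = read_with_schema(lake_lines, {"customer_id": int, "clickstream": list})
```

The warehouse path fails fast on nonconforming data; the lake path accepts everything and defers interpretation, which is why missing fields surface as `None` only when a reader asks for them.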
BI and Analytics on Data Lakes: Growing
• “Replacing” ETL: An original attraction of data lakes
– 20%-30% plan data lake to augment or replace data warehouse
• SQL-on-Hadoop and Hadoop/Spark-native BI
– Engines for intensive SQL workloads on data lakes
• Fast interaction: moving from batch to micro-batch and
faster data interaction
– Relevance for analytics and AI/ML
– Growing relevance for operational reporting
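The move from batch to micro-batch mentioned above can be sketched as follows. This is only an illustration of the pattern: the event shape (`store`, `sales`), the batch size, and the function names are assumptions, and a production system would use a streaming engine rather than an in-memory list.

```python
from itertools import islice
from typing import Iterable, Iterator


def micro_batches(events: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield small fixed-size batches instead of waiting for the full data set."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def running_totals(events: Iterable[dict], size: int) -> list[dict]:
    """Update per-store totals after every micro-batch and snapshot the result."""
    totals: dict[str, int] = {}
    snapshots = []
    for batch in micro_batches(events, size):
        for event in batch:
            totals[event["store"]] = totals.get(event["store"], 0) + event["sales"]
        snapshots.append(dict(totals))  # a report could refresh from each snapshot
    return snapshots


events = [
    {"store": "A", "sales": 10},
    {"store": "B", "sales": 5},
    {"store": "A", "sales": 7},
    {"store": "B", "sales": 1},
    {"store": "A", "sales": 2},
]
snapshots = running_totals(events, size=2)
```

With a batch size of 2, totals become visible three times over five events instead of once at the end, which is the property that matters for operational reporting.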
Overview: Many Potential Use Cases
Source: TDWI
REFERENCE ARCHITECTURE w/HADOOP-BASED DATA LAKE
Modern Data Warehouse Environment (DWE)
DIVERSE SYSTEM PLATFORMS: Web, Client/Server, Clusters, Racks, Grids, Clouds, Hybrid Combinations…
CRM, SFA, ERP…
Financials, billing, operations,
call center, human resources…
Traditional Data
Dimensions, cubes, subject
areas, time series, metrics...
Data for reports, dashboards…
Set-Based Analytics (SQL,
OLAP)
Core Warehouse
Based on clouds, appliances,
columns, graph, sandboxes,
and other specialized analytics
Specialized DBMSs
Machine Data, IoT, Mobile Data,
Web Data, Social Media…
New Data, Big Data
CROSS-SYSTEM INTEGRATION: Application Integration, Data Integration, Interfaces, Metadata; Queries, Virtualization…
Many Ingestion Methods
Many Delivery Methods
Massive Store of
Raw Source Data
(actual Data Lake)
Algorithmic
Analytics
Departmental
Data Domains
(Mktg, Sales)
Data Landing
and Staging
ETL/ELT
Processing
Stream Capture
Complex Event
Processing
Ingestion Data Lake
Hadoop
SOURCE: TDWI 2017 Best Practices Report on Data
Users’ Concerns RE: Data Lakes
Addressing these ensures success
• Lack of data governance (DG) (41%)
• Inadequate skills for big data (32%), Hadoop
(32%), data integration (32%)
• Lack of compelling business case (31%) or
sponsorship (28%)
• Exposing sensitive data (28%)
• Immaturity of data lake concept (27%)
• OTHER: Self-service tools for end users,
Business metadata, Automation for DG
UNKNOWN
TERRITORY
AHEAD
SOURCE: TDWI 2017 Best Practices Report on Data
MYTH: If we build it, they will come.
• They won’t come unless you have:
– Compelling business case, sponsorship, and funding
– Right tools for self service BI, visualization, analytics, AI/ML
• They won’t stay without:
– Ability to find governed, trusted data
– Plan for controlled expansion
• They won’t succeed without:
– Lake & tool training; consulting help
Data Catalog: Important to Gaining Value
from Lakes & Data Architecture
• Objective: Sharing
knowledge about
distributed data;
improving quality,
efficiency, curation, and
governance
• Not all catalogs are alike
– Some cover only single BI,
DW, data marts
– Others specialize in legacy
systems
• Growing for easier data
discovery in data lake
Central data catalog
(Data definitions, attributes,
and other metadata)
Users, applications, and services
(Search, browse, query, filter)
Databases,
data
warehouses,
data marts,
and data
lakes
ETL,
source-to-
target
mappings
Applications, logs,
operational
systems
BI/OLAP
systems
(with their
own
metadata)
Mainframe,
files, other
legacy
systems
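The central catalog in the diagram above (definitions, attributes, and other metadata, exposed through search, browse, and filter) can be sketched as a small in-memory structure. The class names, fields, and sample entries are hypothetical; real catalogs add lineage, ownership, and access control on top of this.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    system: str          # e.g. "data warehouse", "data lake", "application"
    description: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy central catalog: one place to register and look up data assets."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Case-insensitive match on name, description, or tag."""
        needle = term.lower()
        return sorted(
            e.name for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or needle in {t.lower() for t in e.tags}
        )

    def filter_by_system(self, system: str) -> list[str]:
        return sorted(e.name for e in self._entries.values() if e.system == system)


catalog = DataCatalog()
catalog.register(CatalogEntry("sales_fact", "data warehouse", "Daily sales by store", {"finance"}))
catalog.register(CatalogEntry("clickstream_raw", "data lake", "Raw web click events", {"web", "customer"}))
catalog.register(CatalogEntry("crm_accounts", "application", "Customer account master", {"customer"}))

customer_assets = catalog.search("customer")
lake_assets = catalog.filter_by_system("data lake")
```

A single search spans the warehouse, the lake, and an operational application, which is the point of a shared catalog as opposed to one scoped to a single BI tool.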
Empowerment Expectations for
Business Analytics via a Data Lake
• Users expect self-service access to lake data
– Without it, the lake is a failure to them
• Integrated sequence of self-service best practices.
– Data access, exploration, prep, visualization, and analysis
– This requires friendly business metadata or equivalent semantics
• Advanced analytics that complement older analytics.
– Firms want more than just OLAP & SQL-based analytics.
– Hadoop-based, algorithmic analytic processing: mining, stats, graph…
• Analytic value from human language, text, other unstructured data.
– Hadoop-based lake is natural fit for this untapped data.
Recommendations
• Focus on business use cases that a lake can address.
Identify the desired ROI
• Prioritize use cases – as there will be many proposed!
• Evaluate demand for BI and business analytics access
and interaction with data
• Evaluate potential of shared data catalog for improving
data discovery and management of data lake
• Think about the big picture: the enterprise data
architecture (involving cloud and on premises)
Thank You!
David Stodder
Senior Director of Research for Business Intelligence
TDWI (www.tdwi.org)
dstodder@tdwi.org
(415) 859-9933
Poll Question
• What do you see as the #1 role (or potential role) of a data
lake in your organization?
– To support advanced analytics and AI/machine learning
– For offloading data warehouse processes (e.g., ETL)
– To replace the data warehouse
– Place to collect operational and streaming data
– To support self-service BI, data discovery, and visualization
– Not sure, don’t know
ALI BAJWA
Partner Solution Engineering,
Hortonworks
21 © Hortonworks Inc. 2011 – 2017 All Rights Reserved Hortonworks Confidential. For Internal Use Only.
Data Lake 101:
Challenges and Best Practices
Ali Bajwa
Partner Solution Engineering
Market
Drivers
Infinitely Scalable
(Billions of files, Exabytes)
Low TCO
(Less Storage Overhead)
BIGGER
REAL-TIME
SQL
One SQL Layer
(Across Historical, Real-time)
TRUSTED
Data Swamp->Data Lake
SMARTER
Deep Learning frameworks
(TensorFlow, Caffe)
GPU Pooling/Isolation
Faster time to deployment
(Containerized Micro-Services)
FASTER
App 1
App 2
Data Science
Database
Security & Governance
Operational Data Store
HDP Core
Release Agility
(De-coupled HDP Components)
S3, ADLS/WASB, GCS with Truly
Incremental Replication
HYBRID
An Autonomous Car Use Case with Datalake 3.0
BIGGER 312 PB! Reduced to 156 PB
FASTER Hours! Deploy with Containers in minutes
SMARTER Separate Cluster & Months! Pool GPUs
& Train with TensorFlow in hours
HYBRID Silos! Share Data/Models across geos
TRUSTED Security & Data Quality on by default
50 cars across geos-4TB/day x 2 years
On-Prem/Cloud
HDP 3.0 Cluster for Training
Inference
Data
Model
-Store training data
-Pre-process data
-Pool GPUs into an unified pool
-Deploy containerized TensorFlow
-Train the CNN model
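The "312 PB reduced to 156 PB" figures can be reproduced with simple arithmetic under a few assumptions that the slide itself does not state: data is collected on 260 days per year (weekdays only), the larger figure reflects 3x replication, and the smaller figure reflects erasure coding with 6 data and 3 parity blocks (1.5x overhead). Treat this as one plausible reading of the numbers, not as the presenter's own calculation.

```python
CARS = 50
TB_PER_CAR_PER_DAY = 4
YEARS = 2
COLLECTION_DAYS_PER_YEAR = 260  # assumption: weekdays only; not stated on the slide

raw_tb = CARS * TB_PER_CAR_PER_DAY * COLLECTION_DAYS_PER_YEAR * YEARS
raw_pb = raw_tb / 1000  # decimal petabytes

REPLICATION_FACTOR = 3.0       # assumption: three full copies of every block
ERASURE_CODING_FACTOR = 9 / 6  # assumption: 6 data + 3 parity blocks = 1.5x

replicated_pb = raw_pb * REPLICATION_FACTOR
erasure_coded_pb = raw_pb * ERASURE_CODING_FACTOR

print(f"raw data:       {raw_pb:.0f} PB")
print(f"3x replication: {replicated_pb:.0f} PB")
print(f"erasure coding: {erasure_coded_pb:.0f} PB")
```

Under these assumptions the raw volume is 104 PB, and the storage footprint halves when the overhead drops from 3x to 1.5x.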
Security Challenges of Data Lake
Central repository of critical and sensitive
data
Data maintained over long duration
External ecosystem is in flux
Users can access and analyze data in new
and different ways
Watch Towers
Limited Entry Points
Moat
Kerberos
Securing your data lake
High Hard Walls
Check Identity
Inner Walls
Firewall
HDFS Encryption
LDAP/AD
HDP 3.0
Apache Knox
Apache Ranger
Key Data Management Considerations for “Trusted” Data Lakes
DATA LINEAGE
MASTER DATA MANAGEMENT
SEARCH AND INDEX
DATA RETENTION AND ARCHIVAL
DATA QUALITY AND PROFILING
SECURITY AND ACCESS CONTROL
METADATA MANAGEMENT
AUDITING
BUSINESS DEFINITIONS
GOVERNANCE
HDP Architecture: Dynamic Tag-based Security Policies
Classification
Prohibition
Time
Location
Policies
PDP
Resource
Cache
Ranger
Manage Access Policies
and Audit Logs
Track Metadata
and Lineage
Atlas Client
Subscribers
to Topic
Gets
Metadata
Updates
Atlas
Metastore
Tags
Assets
Entitle
s
Streams
Pipelines
Feeds
Hive Tables
HDFS Files
HBase Tables
Entities
in Data
Lake
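The idea behind tag-based policies in the diagram above is that access rules attach to a classification tag rather than to each individual table or file, with optional time and location conditions. The sketch below illustrates that idea only. It does not use the Apache Ranger or Apache Atlas APIs; the `TagPolicy` structure, asset names, group names, and the allow-by-default behavior for untagged assets are all assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TagPolicy:
    tag: str                                         # classification the policy applies to
    allowed_groups: frozenset
    expires: Optional[date] = None                   # time-based condition
    allowed_locations: Optional[frozenset] = None    # location-based condition


# Hypothetical tag assignments, as a metadata service might track them.
ASSET_TAGS = {
    "hive:sales.customers.ssn": {"PII"},
    "hdfs:/lake/raw/weather": set(),
}

POLICIES = [
    TagPolicy("PII", frozenset({"compliance"}),
              expires=date(2019, 1, 1), allowed_locations=frozenset({"US"})),
]


def is_allowed(asset: str, group: str, today: date, location: str) -> bool:
    """Deny if any policy on any of the asset's tags fails; untagged assets pass (toy default)."""
    for tag in ASSET_TAGS.get(asset, set()):
        for policy in (p for p in POLICIES if p.tag == tag):
            if group not in policy.allowed_groups:
                return False
            if policy.expires is not None and today > policy.expires:
                return False
            if policy.allowed_locations is not None and location not in policy.allowed_locations:
                return False
    return True


webinar_day = date(2018, 11, 14)
ok = is_allowed("hive:sales.customers.ssn", "compliance", webinar_day, "US")
```

Tagging a new column as `PII` would bring it under the same policy without writing a new rule, which is the practical benefit of policies keyed on tags.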
Data Lake Design Best Practices
Trusted Data Lake: Trusted Data from All Sources in a Single Place
• Store both structured and unstructured data, in both raw and “prepared” forms
• Data sourcing and derivation should tie to the use case roadmap
• Capture data from the right sources, at the right frequency, and at the right quality
• Data can come from internal and external (partner) sources, including freely available public data sources
– E.g., government data sources, social media, weather
• Govern and document the data pipelines that are built, to avoid the data swamp
• Enrich with metadata to promote collaboration and crowdsourcing
• Just enough data protection
– A data lake will almost always contain some “sensitive” data: personal data such as PII, PCI, PHI, etc.
– Put rational security and privacy controls in place
• Support the goal of making “all” data available to “all” teams responsibly, for BI/analytics or for data science
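One way to read "just enough data protection" is that sensitive fields are protected at access time while the rest of the record stays usable. The sketch below shows that reading with a toy masking function; the field list, role names, and token length are assumptions, and an unsalted hash is only a pseudonym, so it should not be taken as adequate protection for real PII, PCI, or PHI data.

```python
import hashlib

SENSITIVE_FIELDS = {"email", "card_number"}  # hypothetical classification


def protect(record: dict, role: str) -> dict:
    """Return the record as a given role may see it (hypothetical roles)."""
    if role == "steward":
        return dict(record)  # stewards see original values
    masked = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            # Replace the value with a stable token so joins and counts still work.
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            masked[key] = digest[:12]
        else:
            masked[key] = value
    return masked


row = {"email": "pat@example.com", "card_number": "4111111111111111", "region": "West"}
analyst_view = protect(row, "analyst")
steward_view = protect(row, "steward")
```

Analysts can still group by region or count distinct customers through the token, while the original values remain available only to the role that needs them.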
HDF HDP
Next Generation Data Problems
My Data Is Spread Across Multiple
Clusters and Data Sources
I Store & Analyze Data From
ERP/CRM, Systems, IoT/ Mobile
Devices, Social Media, Geo
Location etc.
Some of my data is on-premise, some
is in the cloud. I move my data from
cloud to on-premise & vice versa
between different clouds
Dataplane: Bring All Data Under Management
From the edge, through movement, to rest
Hortonworks DataPlane Service
a foundational platform for the delivery of data
solutions that will:
• Support enterprise hybrid deployment
strategy and adoption of cloud
• Common Metadata, Security and Governance
across all deployments
• Simplified enterprise data asset management
• Extensible to new services: Services enablement
layer for rapidly bringing new solutions to market
HORTONWORKS
DATAPLANE
SERVICE
MULTIPLE
CLUSTERS AND
SOURCES
MULTIHYBRID
Manage, Secure, Govern
DATA AT REST
Hortonworks
Data Platform
DATA IN MOTION
Hortonworks
Data Flow
DataPlane Service: At a Glance
Native Capabilities Clusters & Data Sources
Core Services Extensibility, Metering, Telemetry
DPS PLATFORM
Data Management
Services
Data Processing
Services
Data Infrastructure
Services
DLM DSS DAS* CB*
DPS EXTENSIBLE SERVICES
SMM*
DALE KIM
Senior Director,
Products/Solutions,
Arcadia Data
Arcadia Data. Proprietary and Confidential
Accelerating Value from Data Lakes
Dale Kim, Sr. Director, Products/Solutions
November 14, 2018
Arcadia Data Mission: To Connect Business Users to Big Data
Founding team from Teradata
Aster, HPE 3PAR, IBM DB2
Architected to solve challenges
around big data analytics
Growing customer base
in the Fortune 500
Investors
Strong Performer: Hadoop Native BI Wave Report.
”Put your BI where your data is”
Gartner names Arcadia Data a Cool
Vendor for IoT Analytics.
Recent Awards
Customers
Winner Datanami Editors Choice Award for Best
Big Data Product and Technology: Data
Visualization
Data Drives Market Disruption
Campaign Analysis Application
Understand high-level metrics with the ability to drill
down to details
Augment analysis with a
variety of data types &
sources such as actual
display ad images
Data Drives Market Disruption
Retail Store Geographic Analysis
YoY Growth
metrics plotted by
county for the
chosen sub-brand
Trellising allows for
quick trend analysis
across multiple stores.
Here showing store
sales vs trade area
sales to correlate
potential shifts in buying
pattern
Choose a
specific state to
drill down to
county level
Data Drives Market Disruption
Retail Store Drill Down
Interactive maps allow for
easy visualization of spatial
data zooming into details
Getting Insight From Big Data Is Hard
Platforms that very few
people can use
Business users that email
Excel sheets around
Traditional BI on Data Lakes
A Framework for Data Lake BI
Native
Architecture
Smart
Acceleration
Connected
Ecosystem
AI Scoring and Recommendations
What Can Search Do for You?
Instant Visuals – AI-Based Visualization Recommendations
Select data fields, then one click…
Visualization Builder Recommended Visualizations
shows which visuals best represent your data.
Query acceleration for
scale, performance,
and concurrency
Smart Acceleration Leverages What Is Learned during Data Discovery
Ad hoc queries
Arcadia Enterprise makes
recommendations –
build these with a click.
Data Lake
Cluster
• Fast query responses
• Minimal modeling
• Live acceleration (no downtime)
All Granular
Data
Analytical
Views
Accelerated
application queries
Sample Results of Benchmark Tests
Significant Acceleration for Dashboards
Improvement of 21x to 88x
Efficient Concurrency Scaling
Responsiveness at 95 Concurrent Users
https://www.arcadiadata.com/lp/esg-technical-review-native-bi-performance-for-data-lakes/
Three Ways Legacy BI Tools Cannot Scale on Data Lakes
1. ODBC to Big Data SQL Engine
– PRO: Can query all the data in Hadoop
– CON: Can’t scale on user concurrency
2. Extract Data to BI Server
– PRO: Scales to more users
– CON: Can’t query all the data
3. Bolt-On an External BI Accelerator
– PRO: More users on more data
– CON: Complicated architecture, multiple security models, more software products, edge nodes, data movement, etc.
[Diagram labels: ODBC, extract, query, results, BI Server, BI Accelerator, Hadoop nodes, Edge nodes]
How Arcadia Data Scales for Data Lakes
ArcEngine. The powerful query
acceleration engine runs
directly on the compute nodes
of the cluster.
Analytical Views. Queries are
accelerated via pre-computed
AVs. These are stored right next
to the raw data in HDFS or cloud
object stores.
ArcViz. The lightweight visualization server
handles the UI, and includes a data coherent
cache to boost performance/concurrency
while always honoring data and permission
changes to the underlying store.
Hadoop Cluster
(Hive, Spark, YARN, HDFS, etc.)
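The general technique behind pre-computed views is to aggregate granular rows once and answer repeated dashboard queries from the smaller aggregate. The sketch below shows that technique in plain Python; it is not Arcadia Data's implementation, and the row shape, function names, and sample figures are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical granular rows, standing in for raw data in the lake.
granular_rows = [
    {"state": "CA", "store": 1, "sales": 100.0},
    {"state": "CA", "store": 2, "sales": 50.0},
    {"state": "NV", "store": 3, "sales": 70.0},
]


def build_view(rows: list[dict], dimension: str, measure: str) -> dict:
    """Pre-compute one aggregate per dimension value (done once, ahead of queries)."""
    view = defaultdict(float)
    for row in rows:
        view[row[dimension]] += row[measure]
    return dict(view)


def total_sales(state: str, view: dict = None) -> float:
    """Answer from the pre-computed view when one exists, else scan granular rows."""
    if view is not None:
        return view.get(state, 0.0)  # one lookup, independent of row count
    return sum(r["sales"] for r in granular_rows if r["state"] == state)


sales_by_state = build_view(granular_rows, "state", "sales")
fast = total_sales("CA", sales_by_state)
slow = total_sales("CA")
```

Both paths return the same answer; the view path avoids rescanning granular data on every request, which is where concurrency gains come from when many users open the same dashboard.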
Companies Are Now Choosing Two BI Standards for Their Enterprise
Data Warehouse Data Lake
BI Standard for
Data Warehouse
(RDBMS)
BI Standard for
Data Lake
(HDFS, Object Store)
Data Warehouse BI Architecture
BI Server
Analytic Process
Optimize Physical
Semantic Layer
Secure Data
Load Data
Big Data Requirements
Native Connection
Semi-Structured
Parallel
Real-time
Data Warehouse
(RDBMS)
Data Lake BI Architecture – The Arcadia Data Way
BI Server
Analytic Process
Optimize Physical
Semantic Layer
Secure Data
Load Data
Big Data Requirements
Native Connection
Semi-Structured
Parallel
Real-time
Data Warehouse
(RDBMS)
Data Lake
(HDFS, Cloud Object Storage)
Arcadia Data was built
from inception to
run natively within data lakes
Social media: @arcadiadata
arcadiadata.com
How Neustar MarketShare
Scales Their Data Lake
Search-Based BI
in Action
Download
Arcadia Instant
https://www.arcadiadata.com/lp/how-neustar-scales-saas-based-reporting-and-analytics-for-1000s-of-customer-users/
http://watch.arcadiadata.com/watch/NSf1mMENjYfTY2cjpuGWPS
arcadiadata.com/instant
www.arcadiadata.com/resources
QUESTIONS?
tdwi.org
CONTACT INFORMATION
If you have further questions or comments:
David Stodder, TDWI
dstodder@tdwi.org
Dale Kim, Arcadia Data
dale@arcadiadata.com
Ali Bajwa, Hortonworks
abajwa@hortonworks.com
TDWI Conference
Keynotes, Educational Classes, Networking, and More
Las Vegas, NV, February 10-15, 2019
http://www.tdwi.org/lasvegas
TDWI Strategy Summit
Case Studies, Expert Talks, and Leadership Panels
Las Vegas, NV, February 11-12, 2019
https://tdwi.org/events/strategy-summits/las-vegas
Learn More in Las Vegas!

Mais conteúdo relacionado

Mais de Arcadia Data

Big Data vs. Big Risk: Real-Time Trade Surveillance in Financial Markets
Big Data vs. Big Risk: Real-Time Trade Surveillance in Financial MarketsBig Data vs. Big Risk: Real-Time Trade Surveillance in Financial Markets
Big Data vs. Big Risk: Real-Time Trade Surveillance in Financial MarketsArcadia Data
 
RegTech: Leveraging Alternative Data for Compliance
RegTech: Leveraging Alternative Data for ComplianceRegTech: Leveraging Alternative Data for Compliance
RegTech: Leveraging Alternative Data for ComplianceArcadia Data
 
How to Scale BI and Analytics with Hadoop-based Platforms
How to Scale BI and Analytics with Hadoop-based PlatformsHow to Scale BI and Analytics with Hadoop-based Platforms
How to Scale BI and Analytics with Hadoop-based PlatformsArcadia Data
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
 
BI on Big Data Presentation
BI on Big Data PresentationBI on Big Data Presentation
BI on Big Data PresentationArcadia Data
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI StandardsArcadia Data
 
Four Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics StrategyFour Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics StrategyArcadia Data
 

Mais de Arcadia Data (7)

Big Data vs. Big Risk: Real-Time Trade Surveillance in Financial Markets
Big Data vs. Big Risk: Real-Time Trade Surveillance in Financial MarketsBig Data vs. Big Risk: Real-Time Trade Surveillance in Financial Markets
Big Data vs. Big Risk: Real-Time Trade Surveillance in Financial Markets
 
RegTech: Leveraging Alternative Data for Compliance
RegTech: Leveraging Alternative Data for ComplianceRegTech: Leveraging Alternative Data for Compliance
RegTech: Leveraging Alternative Data for Compliance
 
How to Scale BI and Analytics with Hadoop-based Platforms
How to Scale BI and Analytics with Hadoop-based PlatformsHow to Scale BI and Analytics with Hadoop-based Platforms
How to Scale BI and Analytics with Hadoop-based Platforms
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
 
BI on Big Data Presentation
BI on Big Data PresentationBI on Big Data Presentation
BI on Big Data Presentation
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI Standards
 
Four Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics StrategyFour Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics Strategy
 

Último

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Accelerating Value from Data Lakes

  • 1. David Stodder TDWI Senior Director of Research, BI November 14, 2018 Accelerating Value from Data Lakes
  • 3. DAVID STODDER Senior Research Director Business Intelligence TDWI
  • 4. Agenda • Data lake: Definitions and in context with other systems • Trends and drivers • Impact of the cloud on data lakes • Use cases: analytics, AI, BI • Recommendations • Hortonworks presentation • Arcadia Data presentation • Audience Q&A
  • 5. The Data Lake: What Is It? • Method for organizing large volumes of highly diverse data from diverse sources – For broad data exploration and discovery, plus advanced analytics – Depending on the platform, a data lake may handle many data structures • Ingest first, prep later: Make data available ASAP – Persisting data in its raw, detailed state • For multiple use cases: advanced analytics, AI/machine learning, and BI
  • 6. What’s Driving Adoption? • Data-driven ambitions: Organizations want to realize value from all of their data, not just well-known datasets (e.g., in the data warehouse) – Hadoop/Spark ecosystems; the NoSQL database revolution • Instrumentation: Organizations have more data coming online – IoT, geolocation, multichannel customer behavior, social media, text, images
  • 7. Data Lakes: Becoming Established • Similar percentage of organizations surveyed indicate plans to use a data lake to either augment or replace data warehouse Source: TDWI
  • 8. Impact of the Cloud on Data Lakes • “Data gravity” pulls data into the cloud – Interest in reducing data movement – Accessing and analyzing data where it is created; 30% using BI in the cloud and 38% plan to – Overflowing the banks of on-premises data lakes: streaming data into cloud storage; 32% in TDWI research see this use case • Cloud storage (e.g., object, file, block) used to augment – or rival – Hadoop ecosystem data lakes
  • 9. Data Warehouses and Data Lakes are Complementary, and They Integrate Well
      RELATIONAL DATA WAREHOUSE | HADOOP-BASED DATA LAKE
      RDBMS for relational requirements | Hadoop for diverse data, scalability, low cost
      Data of recognized high value | Candidate data of potential value
      Mostly refined calculated data | Mostly detailed source data
      Known entities, tracked over time | Raw material for discovering entities & facts
      Data conforms to enterprise standards | Fidelity to original format and condition
      Data integration up front | Data prep on demand
      Data transformed a priori | Data repurposed later, as needs arise
      Typically schema on write | Typically schema on read
      A priori metadata improvement | Metadata developed on read, in many cases
      SOURCE: TDWI 2017 Best Practices Report on Data
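The "schema on write" versus "schema on read" contrast on this slide can be illustrated with a minimal Python sketch. Everything in it is assumed for illustration: the `SCHEMA` fields, the in-memory lists standing in for a warehouse and a lake, and the function names are hypothetical and do not come from the presentation or from any product API.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"device_id": str, "temp_c": float}


def write_validated(store, record):
    """Schema on write: reject records that do not conform before storing."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    # Only the modeled fields are kept; anything else is dropped up front.
    store.append({field: record[field] for field in SCHEMA})


def write_raw(store, line):
    """Schema on read: persist the raw line untouched, whatever it contains."""
    store.append(line)


def read_with_schema(store):
    """Apply the schema only at read time; skip lines that do not fit it."""
    for line in store:
        try:
            rec = json.loads(line)
            yield {"device_id": str(rec["device_id"]),
                   "temp_c": float(rec["temp_c"])}
        except (ValueError, KeyError, TypeError):
            continue  # the raw line stays in the lake for other readers


warehouse, lake = [], []
write_validated(warehouse, {"device_id": "a1", "temp_c": 21.5})
for raw in ['{"device_id": "a1", "temp_c": "21.5", "extra": 1}', "not json"]:
    write_raw(lake, raw)
rows = list(read_with_schema(lake))
```

In this sketch the lake keeps both raw lines, including the one that cannot be parsed, while the reader decides at query time which fields to interpret. That mirrors the "fidelity to original format" and "data prep on demand" rows of the comparison.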
  • 10. BI and Analytics on Data Lakes: Growing • “Replacing” ETL: An original attraction of data lakes – 20%-30% plan data lake to augment or replace data warehouse • SQL-on-Hadoop and Hadoop/Spark-native BI – Engines for intensive SQL workloads on data lakes • Fast interaction: moving from batch to micro-batch and faster data interaction – Relevance for analytics and AI/ML – Growing relevance for operational reporting
  • 11. Overview: Many Potential Use Cases Source: TDWI
  • 12. REFERENCE ARCHITECTURE w/HADOOP-BASED DATA LAKE Modern Data Warehouse Environment (DWE) DIVERSE SYSTEM PLATFORMS: Web, Client/Server, Clusters, Racks, Grids, Clouds, Hybrid Combinations… CRM, SFA, ERP… Financials, billing, operations, call center, human resources… Traditional Data Dimensions, cubes, subject areas, time series, metrics... Data for reports, dashboards… Set-Based Analytics (SQL, OLAP) Core Warehouse Based on clouds, appliances, columns, graph, sandboxes, and other specialized analytics Specialized DBMSs Machine Data, IoT, Mobile Data, Web Data, Social Media… New Data, Big Data CROSS-SYSTEM INTEGRATION: Application Integration, Data Integration, Interfaces, Metadata; Queries, Virtualization… Many Ingestion Methods Many Delivery Methods Massive Store of Raw Source Data (actual Data Lake) Algorithmic Analytics Departmental Data Domains (Mktg, Sales) Data Landing and Staging ETL/ELT Processing Stream Capture Complex Event Processing Ingestion Data Lake Hadoop SOURCE: TDWI 2017 Best Practices Report on Data
  • 13. Users’ Concerns RE: Data Lakes Addressing these ensures success • Lack of data governance (DG) (41%) • Inadequate skills for big data (32%), Hadoop (32%), data integration (32%) • Lack of compelling business case (31%) or sponsorship (28%) • Exposing sensitive data (28%) • Immaturity of data lake concept (27%) • OTHER: Self-service tools for end users, Business metadata, Automation for DG UNKNOWN TERRITORY AHEAD SOURCE: TDWI 2017 Best Practices Report on Data
  • 14. MYTH: If we build it, they will come. • They won’t come unless you have: – Compelling business case, sponsorship, and funding – Right tools for self service BI, visualization, analytics, AI/ML • They won’t stay without: – Ability to find governed, trusted data – Plan for controlled expansion • They won’t succeed without: – Lake & tool training; consulting help
  • 15. Data Catalog: Important to Gaining Value from Lakes & Data Architecture • Objective: Sharing knowledge about distributed data; improving quality, efficiency, curation, and governance • Not all catalogs are alike – Some cover only single BI, DW, data marts – Others specialize in legacy systems • Growing for easier data discovery in data lake • Diagram: Central data catalog (Data definitions, attributes, and other metadata); Users, applications, and services (Search, browse, query, filter); Databases, data warehouses, data marts, and data lakes; ETL, source-to-target mappings; Applications, logs, operational systems; BI/OLAP systems (with their own metadata); Mainframe, files, other legacy systems
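As a rough illustration of the central catalog idea on this slide (one place to search metadata about assets that live in different systems), here is a minimal Python sketch. The `CatalogEntry` fields, the sample entries, and the `search` function are all hypothetical. Real catalog products expose much richer models and their own APIs.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str          # asset name (hypothetical examples below)
    system: str        # where the asset lives: warehouse, lake, ...
    description: str   # business-friendly definition
    tags: set = field(default_factory=set)


CATALOG = [
    CatalogEntry("sales.orders", "data warehouse", "Refined order facts", {"finance"}),
    CatalogEntry("raw/clickstream", "data lake", "Raw web click events", {"web", "behavior"}),
    CatalogEntry("raw/sensor_logs", "data lake", "IoT sensor readings", {"iot"}),
]


def search(catalog, keyword, system=None):
    """Return names of entries whose name, description, or tags contain keyword.

    The optional `system` argument filters by source system, similar to the
    search/browse/filter functions shown for catalog users on the slide.
    """
    kw = keyword.lower()
    hits = []
    for entry in catalog:
        if system is not None and entry.system != system:
            continue
        text = " ".join([entry.name, entry.description, *sorted(entry.tags)]).lower()
        if kw in text:
            hits.append(entry.name)
    return hits


lake_iot = search(CATALOG, "iot", system="data lake")
```

The point of the sketch is only that discovery runs against metadata, not against the data itself, so the same query can cover warehouse tables and lake files alike.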
  • 16. Empowerment Expectations for Business Analytics via a Data Lake • Users expect self-service access to lake data – Without it, the lake is a failure to them • Integrated sequence of self-service best practices. – Data access, exploration, prep, visualization, and analysis – This requires friendly business metadata or equivalent semantics • Advanced analytics that complement older analytics. – Firms want more than just OLAP & SQL-based analytics. – Hadoop-based, algorithmic analytic processing: mining, stats, graph… • Analytic value from human language, text, other unstructured data. – Hadoop-based lake is natural fit for this untapped data.
  • 17. Recommendations • Focus on business use cases that a lake can address. Identify the desired ROI • Prioritize use cases – as there will be many proposed! • Evaluate demand for BI and business analytics access and interaction with data • Evaluate potential of shared data catalog for improving data discovery and management of data lake • Think about the big picture: the enterprise data architecture (involving cloud and on premises)
  • 18. Thank You! David Stodder Senior Director of Research for Business Intelligence TDWI (www.tdwi.org) dstodder@tdwi.org (415) 859-9933
  • 19. Poll Question • What do you see as the #1 role (or potential role) of a data lake in your organization? – To support advanced analytics and AI/machine learning – For offloading data warehouse processes (e.g., ETL) – To replace the data warehouse – Place to collect operational and streaming data – To support self-service BI, data discovery, and visualization – Not sure, don’t know
  • 20. ALI BAJWA Partner Solution Engineering, Hortonworks
  • 21. © Hortonworks Inc. 2011 – 2017 All Rights Reserved Hortonworks Confidential. For Internal Use Only. Data Lake 101: Challenges and Best Practices Ali Bajwa Partner Solution Engineering
  • 22. Market Drivers Infinitely Scalable (Billions of files, Exabytes) Low TCO (Less Storage Overhead) BIGGER REAL-TIME SQL One SQL Layer (Across Historical, Real-time) TRUSTED Data Swamp -> Data Lake SMARTER Deep Learning frameworks (TensorFlow, Caffe) GPU Pooling/Isolation Faster time to deployment (Containerized Micro-Services) FASTER App 1 App 2 Data Science database Security & Governance Operational Data Store HDP Core Release Agility (De-coupled HDP Components) S3, ADLS/WASB, GCS with Truly Incremental Replication HYBRID
  • 23. An Autonomous Car Use Case with Datalake 3.0 • BIGGER: 312 PB! Reduced to 156 PB • FASTER: Hours! Deploy with Containers in minutes • SMARTER: Separate Cluster & Months! Pool GPUs & Train with TensorFlow in hours • HYBRID: Silos! Share Data/Models across geos • TRUSTED: Security & Data Quality on by default • 50 cars across geos, 4TB/day x 2 years • On-Prem/Cloud HDP 3.0 Cluster for Training (Inference, Data, Model) – Store training data – Pre-process data – Pool GPUs into a unified pool – Deploy containerized TensorFlow – Train the CNN model
  • 24. © Hortonworks Inc. 2011 – 2018. All Rights Reserved. Security Challenges of Data Lake • Central repository of critical and sensitive data • Data maintained over long duration • External ecosystem is in flux • Users can access and analyze data in new and different ways
  • 25. Securing your data lake: Watch Towers, Limited Entry Points, Moat, Kerberos, High Hard Walls, Check Identity, Inner Walls, Firewall, HDFS Encryption, LDAP/AD, HDP 3.0, Apache Knox, Apache Ranger
  • 26. Key Data Management Considerations for “Trusted” Data Lakes: DATA LINEAGE, MASTER DATA MANAGEMENT, SEARCH AND INDEX, DATA RETENTION AND ARCHIVAL, DATA QUALITY AND PROFILING, SECURITY AND ACCESS CONTROL, METADATA MANAGEMENT, AUDITING, BUSINESS DEFINITIONS, GOVERNANCE
  • 27. HDP Architecture: Dynamic Tag-based Security Policies Classification Prohibition Time Location Policies PDP Resource Cache Ranger Manage Access Policies and Audit Logs Track Metadata and Lineage Atlas Client Subscribers to Topic Gets Metadata Updates Atlas Metastore Tags Assets Entities Streams Pipelines Feeds Hive Tables HDFS Files HBase Tables Entities in Data Lake
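The idea behind tag-based policies (access decisions keyed to tags attached to assets, with optional time and location conditions) can be sketched in plain Python. This is not the Apache Ranger or Apache Atlas API. The `TagPolicy` class, the `ASSET_TAGS` mapping, the sample policy, and the allow-by-default rule for untagged assets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TagPolicy:
    tag: str                                       # classification tag, e.g. "PII"
    allowed_groups: frozenset                      # groups that may read tagged assets
    expires: Optional[datetime] = None             # time-based condition
    allowed_locations: Optional[frozenset] = None  # location-based condition


# Hypothetical metadata store: asset -> set of tags attached to it.
ASSET_TAGS = {
    "hive:sales.customers": {"PII"},
    "hdfs:/logs/web": set(),
}

POLICIES = [
    TagPolicy("PII", frozenset({"compliance"}),
              expires=datetime(2030, 1, 1),
              allowed_locations=frozenset({"US"})),
]


def is_allowed(asset, group, location, now):
    """Deny if any policy matching one of the asset's tags is violated.

    Assumption of this sketch: assets with no matching tag are allowed.
    """
    tags = ASSET_TAGS.get(asset, set())
    for policy in POLICIES:
        if policy.tag not in tags:
            continue
        if group not in policy.allowed_groups:
            return False
        if policy.expires is not None and now >= policy.expires:
            return False
        if policy.allowed_locations is not None and location not in policy.allowed_locations:
            return False
    return True


ok = is_allowed("hive:sales.customers", "compliance", "US", datetime(2024, 1, 1))
```

Because the policy is attached to the tag and not to a table or file path, tagging a new asset as "PII" in the metadata store would bring it under the same rule without writing a new policy. That is the sense in which the slide calls the policies dynamic.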
  • 28. Data Lake Design Best Practices. Trusted Data Lake: Trusted Data from All Sources in a Single Place • Store both structured and unstructured data both in raw and “prepared” forms • Data sourcing and derivation should tie to the use case roadmap • Capture data from the right sources, at the right frequency, and right quality • Data can be from internal and external (partners) sources including freely available public data sources – E.g. Government data sources, Social Media, Weather, etc. • Govern and document the data pipelines that are built – avoid the data swamp • Enrich with metadata for promoting collaboration and crowdsourcing • Just enough data protection – Data lake will almost always contain some “sensitive” data: personal data such as PII, PCI, PHI etc. – Rational security and privacy controls in place • Support goal of making “all” data available to “all” teams responsibly for BI/Analytics or for Data Science
  • 29. HDF, HDP. Next Generation Data Problems • My Data Is Spread Across Multiple Clusters and Data Sources • I Store & Analyze Data From ERP/CRM, Systems, IoT/Mobile Devices, Social Media, Geo Location etc. • Some of my data is on-premise, some is in the cloud. • I move my data from cloud to on-premise & vice versa between different clouds
  • 30. Dataplane: Bring All Data Under Management. From the edge, through movement, to rest. Hortonworks DataPlane Service: a foundational platform for the delivery of data solutions that will: • Support enterprise hybrid deployment strategy and adoption of cloud • Common Metadata, Security and Governance across all deployments • Simplified enterprise data asset management • Extensible to new services: Services enablement layer for rapidly bringing new solutions to market HORTONWORKS DATAPLANE SERVICE MULTIPLE CLUSTERS AND SOURCES MULTIHYBRID Manage, Secure, Govern DATA AT REST Hortonworks Data Platform DATA IN MOTION Hortonworks Data Flow
  • 31. DataPlane Service: At a Glance Native Capabilities Clusters & Data Sources Core Services Extensibility, Metering, Telemetry DPS PLATFORM Data Management Services Data Processing Services Data Infrastructure Services DLM DSS DAS* CB* DPS EXTENSIBLE SERVICES SMM*
  • 33. Arcadia Data. Proprietary and Confidential. Accelerating Value from Data Lakes Dale Kim, Sr. Director, Products/Solutions November 14, 2018
  • 34. Arcadia Data Mission: To Connect Business Users to Big Data • Founding team from Teradata Aster, HPE 3PAR, IBM DB2 • Architected to solve challenges around big data analytics • Growing customer base in the Fortune 500 • Investors • Strong Performer: Hadoop Native BI Wave Report. “Put your BI where your data is” • Gartner names Arcadia Data a Cool Vendor for IoT Analytics. • Recent Awards • Customers • Winner Datanami Editors Choice Award for Best Big Data Product and Technology: Data Visualization
  • 35. Data Drives Market Disruption: Campaign Analysis Application. Understand high-level metrics with the ability to drill down to details. Augment analysis with a variety of data types & sources such as actual display ad images
  • 36. Data Drives Market Disruption: Retail Store Geographic Analysis. YoY Growth metrics plotted by county for the chosen sub-brand. Trellising allows for quick trend analysis across multiple stores. Here showing store sales vs trade area sales to correlate potential shifts in buying pattern. Choose a specific state to drill down to county level
  • 37. Data Drives Market Disruption: Retail Store Drill Down. Interactive maps allow for easy visualization of spatial data zooming into details
  • 38. Getting Insight From Big Data Is Hard • Platforms that very few people can use • Business users that email Excel sheets around • Traditional BI on Data Lakes
  • 39. A Framework for Data Lake BI • Native Architecture • Smart Acceleration • Connected Ecosystem • AI Scoring and Recommendations
  • 40. What Can Search Do for You?
  • 41. What Can Search Do for You?
  • 42. Instant Visuals – AI-Based Visualization Recommendations. Select data fields, then one click… Visualization Builder. Recommended Visualizations shows which visuals best represent your data.
  • 43. Smart Acceleration Leverages What Is Learned during Data Discovery. Query acceleration for scale, performance, and concurrency. Ad hoc queries: Arcadia Enterprise makes recommendations – build these with a click. Data Lake Cluster • Fast query responses • Minimal modeling • Live acceleration (no downtime) All Granular Data Analytical Views Accelerated application queries
  • 44. Sample Results of Benchmark Tests • Significant Acceleration for Dashboards: Improvement of 21x to 88x • Efficient Concurrency Scaling: Responsiveness at 95 Concurrent Users https://www.arcadiadata.com/lp/esg-technical-review-native-bi-performance-for-data-lakes/
  • 45. Three Ways Legacy BI Tools Cannot Scale on Data Lakes • ODBC to Big Data SQL Engine – PRO: Can query all the data in Hadoop – CON: Can’t scale on user concurrency • Extract Data to BI Server – PRO: Scales to more users – CON: Can’t query all the data • Bolt-On an External BI Accelerator – PRO: More users on more data – CON: Complicated architecture, multiple security models, more software products, edge nodes, data movement, etc. • Diagram labels: extract, query results, BI Accelerator, ODBC, BI Server, Hadoop nodes, Edge nodes
  • 46. How Arcadia Data Scales for Data Lakes • ArcEngine. The powerful query acceleration engine runs directly on the compute nodes of the cluster. • Analytical Views. Queries are accelerated via pre-computed AVs. These are stored right next to the raw data in HDFS or cloud object stores. • ArcViz. The lightweight visualization server handles the UI, and includes a data coherent cache to boost performance/concurrency while always honoring data and permission changes to the underlying store. • Hadoop Cluster (Hive, Spark, YARN, HDFS, etc.)
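The general technique behind pre-computed analytical views (answering a query from a stored aggregate and falling back to the granular data only when the view cannot serve it) can be sketched as follows. This is a generic illustration in Python, not Arcadia Data's implementation. The sample rows, `build_analytical_view`, and `total_sales` are hypothetical.

```python
from collections import defaultdict

# Hypothetical granular rows: (state, day, sale amount).
RAW = [
    ("CA", "2018-11-01", 120.0),
    ("CA", "2018-11-02", 80.0),
    ("TX", "2018-11-01", 50.0),
]


def build_analytical_view(rows):
    """Pre-compute per-state totals and row counts once, ahead of queries."""
    view = defaultdict(lambda: [0.0, 0])
    for state, _day, amount in rows:
        view[state][0] += amount
        view[state][1] += 1
    return {state: tuple(agg) for state, agg in view.items()}


def total_sales(state, view=None, rows=RAW):
    """Answer from the view when it covers the query, else scan the raw rows."""
    if view is not None and state in view:
        return view[state][0]  # one lookup, no scan of granular data
    return sum(amount for s, _day, amount in rows if s == state)


view = build_analytical_view(RAW)
fast = total_sales("CA", view)
slow = total_sales("CA")
```

Both paths return the same total. The view simply trades some storage and build time for cheaper repeated dashboard queries, which is the trade-off the slide describes when it says the views sit next to the raw data.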
  • 47. Companies Are Now Choosing Two BI Standards for Their Enterprise • Data Warehouse: BI Standard for Data Warehouse (RDBMS) • Data Lake: BI Standard for Data Lake (HDFS, Object Store)
  • 48. Data Warehouse BI Architecture BI Server Analytic Process Optimize Physical Semantic Layer Secure Data Load Data Big Data Requirements Native Connection Semi-Structured Parallel Real-time Data Warehouse (RDBMS)
  • 49. Data Lake BI Architecture – The Arcadia Data Way BI Server Analytic Process Optimize Physical Semantic Layer Secure Data Load Data Big Data Requirements Native Connection Semi-Structured Parallel Real-time Data Warehouse (RDBMS) Data Lake (HDFS, Cloud Object Storage) Arcadia Data was built from inception to run natively within data lakes
  • 50. Social media: @arcadiadata arcadiadata.com • How Neustar MarketShare Scales Their Data Lake: https://www.arcadiadata.com/lp/how-neustar-scales-saas-based-reporting-and-analytics-for-1000s-of-customer-users/ • Search-Based BI in Action: http://watch.arcadiadata.com/watch/NSf1mMENjYfTY2cjpuGWPS • Download Arcadia Instant: arcadiadata.com/instant • www.arcadiadata.com/resources
  • 52. CONTACT INFORMATION If you have further questions or comments: David Stodder, TDWI dstodder@tdwi.org Dale Kim, Arcadia Data dale@arcadiadata.com Ali Bajwa, Hortonworks abajwa@hortonworks.com tdwi.org
  • 53. Learn More in Las Vegas! • TDWI Conference: Keynotes, Educational Classes, Networking, and More. Las Vegas, NV, February 10-15, 2019. http://www.tdwi.org/lasvegas • TDWI Strategy Summit: Case Studies, Expert Talks, and Leadership Panels. Las Vegas, NV, February 11-12, 2019. https://tdwi.org/events/strategy-summits/las-vegas