Big Data and Data Virtualization

GAIN BETTER INSIGHTS FROM BIG DATA
USING RED HAT JBOSS DATA VIRTUALIZATION

Red Hat Corporation
January 5, 2014

Red Hat is…

“By running tests and executing numerous examples for specific teams, we were able to prove […] not
only would the solution work, but it will perform better & at a fraction of the costs.”
MICHAEL BLAKE, Director, Systems & Architecture


RED HAT Confidential

Agenda
● Data challenges getting bigger
● Red Hat Big Data Strategy and Platform
● Data Virtualization Overview
● Customer Use Case for Big Data integration using Data Virtualization
● Demo
● Q&A


Data Driven Economy
Data is becoming the new raw material of business: an economic input almost on a par with capital and labor.
“Every day I wake up and ask, ‘how can I flow data better, manage data better, analyze data better?’”
CIO - Wal-Mart



Data Challenges Getting Bigger
Big Data, Cloud, and Mobile

Existing Data Integration approaches are not sufficient
● Extracting and moving data adds latency and cost
● Every project solves data access and integration in a different way
● Solutions are tightly coupled to data sources
● Poor flexibility and agility

[Diagram: How to align? Data consumers under constant change (BI Reports, Operational Reports, Enterprise Applications, SOA Applications, Mobile Applications) are separated by integration complexity from siloed and complex data sources (Hadoop, NoSQL, Cloud Apps, Data Warehouse & Databases, Mainframe, XML, CSV & Excel Files, Enterprise Apps).]


Business Objective
Turn Data into Actionable Information
Only 28% of users have any meaningful data access
Over 70% of BI project effort lies in the integration of source data

● Reduce costs for finding and accessing highly fragmented data
● Improve time to market for new products and services by simplifying data access and integration
● Deliver IT solution agility necessary to capitalize on constantly changing market conditions
● Transform fragmented data into actionable information that delivers competitive advantage



Red Hat’s Big Data Strategy
● Reduce the information gap by cost-effectively making ALL data easily consumable for analytics

[Diagram: Data to Actionable Information Cycle, with stages labeled Data, Capture, Process, Integrate, Analytics]


Red Hat Big Data Platform

[Diagram: Hadoop touchpoints across the Red Hat stack]
● Middleware: Hadoop Integration with JBoss Data Virtualization
● Platform: RHEL Platform Integration & Optimization
● Apache Hadoop on Fedora: Fedora Big Data SIG
● Hadoop: Hadoop Distributions
● Storage: Hadoop On Red Hat Storage
● Cloud / Virtualization: Hadoop On OpenStack



What does Data Virtualization software do?
Turn Fragmented Data into Actionable Information
Data Virtualization software virtually unifies data spread across disparate sources and makes it available to applications as a single consolidated data source.

[Diagram: DATA CONSUMERS (BI Reports, SOA Applications) get easy, real-time information access through a Virtual Consolidated Data Source. Data Virtualization Software (Consume, Compose, Connect) virtualizes, abstracts, and federates siloed and complex DATA SOURCES (Oracle DW, SAP, Hadoop, Salesforce.com).]

The data virtualization software implements a three-step process to bridge data sources and data consumers:
● Connect: Fast access to data from diverse data sources.
● Compose: Easily create unified virtual data models and views by combining and transforming data from multiple sources.
● Consume: Expose consistent information to data consumers in the right form through standard data access methods.

Turn Fragmented Data into Actionable Information

[Diagram: JBoss Data Virtualization architecture, providing easy, real-time information access]
● Data Consumers: Mobile Applications; ESB, ETL; BI Reports & Analytics; SOA Applications & Portals
● Consume: Standard based Data Provisioning (JDBC, ODBC, SOAP, REST, OData)
● Compose: Unified Virtual Database / Common Data Model (Unified Customer View, Unified Product View, Unified Supplier View)
● Connect: Native Data Connectivity (Virtualize, Abstract, Federate)
● Also shown: Design Tools, Dashboard, Optimization, Caching, Security, Metadata
● Data Sources (siloed & complex): Hadoop, NoSQL, Cloud Apps, Data Warehouse & Databases, Mainframe, XML, CSV & Excel Files, Enterprise Apps


JBoss Data Virtualization:
Supported Data Sources
Enterprise RDBMS:
• Oracle
• IBM DB2
• Microsoft SQL Server
• Sybase ASE
• MySQL
• PostgreSQL
• Ingres
Enterprise EDW:
• Teradata
• Netezza
• Greenplum
Hadoop:
• Apache
• Hortonworks
• Cloudera
• More coming…
Office Productivity:
• Microsoft Excel
• Microsoft Access
• Google Spreadsheets
Specialty Data Sources:
• ModeShape Repository
• Mondrian
• MetaMatrix
• LDAP
NoSQL:
• JBoss Data Grid
• MongoDB
• More coming…
Enterprise & Cloud Applications:
• Salesforce.com
• SAP
Technology Connectors:
• Flat Files, XML Files, XML over HTTP
• SOAP Web Services
• REST Web Services
• OData Services

Key New Features and Capabilities
● Data connectivity enhancements
– Hadoop Integration (Hive – Big Data)
– NoSQL (MongoDB – Tech Preview) and JBoss Data Grid
– OData support (SAP integration)
● Developer Productivity improvements
– New VDB Designer 8 and integration with JBoss Developer Studio v7
– Enhanced column level security
– VDB import/reuse, and native queries
● Simplify deployment and packaging
– Requires JBoss EAP only; included with subscription
– Removes the dependency on SOA Platform
● Business Dashboard
– New rapid data reporting/visualization capability


JBoss Data Virtualization – Use Cases

Self-Service Business Intelligence
The virtual, reusable data model provides a business-friendly representation of data. Users can interact with their data without knowing the complexities of the database or where the data is stored, and multiple BI tools can acquire data from a centralized data layer. Gain better insights from Big Data by using JBoss Data Virtualization to integrate with existing information sources.

360° Unified View
Deliver a complete view of master and transactional data in real time. The virtual data layer serves as a unified, enterprise-wide view of business information that improves users’ ability to understand and leverage enterprise data.

Agile SOA Data Services
A data virtualization layer delivers the missing data services layer to SOA applications. JBoss Data Virtualization increases agility and loose coupling through virtual data stores, without the need to touch underlying sources, and through data services that encapsulate the data access logic, allowing multiple business services to acquire data from a centralized data layer.

Regulatory Compliance
The data virtualization layer delivers data firewall functionality. JBoss Data Virtualization improves data quality via centralized access control, a robust security infrastructure, and a reduction in physical copies of data, thus reducing risk. Furthermore, the metadata repository catalogs enterprise data locations and the relationships between the data in various data stores, enabling transparency and visibility.



Big Data integration
use case

Retail Customer Use Case

Gain Better Insight from Big Data for Intelligent Inventory Management
● Objective:
– Right merchandise, at right time and price
● Problem:
– Cannot utilize social data and sentiment analysis with their inventory and purchase management system
● Solution:
– Leverage JBoss Data Virtualization to mashup Sentiment analysis data with inventory and purchasing system data. Leveraged BRMS to optimize pricing and stocking decisions.

[Diagram: Analytical Apps and JBoss BRMS (Data Driven Decision Management) sit on top of JBoss Data Virtualization (Consume, Compose, Connect), which draws on Hive (Sentiment Analysis), Inventory Databases, and the Purchase Mgmt Application.]

Better Together - Big Data and Data Virtualization
Hadoop not another Silo - Customers Combine Multiple Technologies
● Combine structured and unstructured analysis
– Augment data warehouse with additional external sources, such as social media
● Combine high velocity and historical analysis
– Analyze and react to data in motion; adjust models with deep historical analysis
● Reuse structured data for analysis
– Experimentation and ad-hoc analysis with structured data


Better Together - Big Data and Data Virtualization
Capture, Process and Integrate Data Volume, Velocity, Variety

[Diagram: layered stack]
● Integrate & Analyze: BI Analytics (historical, operational, predictive); SOA Composite Applications; Data Integration
● Capture & Process: In-memory Cache (JBoss Data Grid); Messaging and Event Processing (JBoss A-MQ and JBoss BRMS); Hadoop
● Data types: Structured Data, Streaming Data, Semi-Structured Data
● Infrastructure: Red Hat Storage; Red Hat Enterprise Linux & Virtualization


Consider...
● Inconsistent, Incomplete Information
● Uninformed, Delayed Decisions
● Costly Business Risk and Exposure

How would your organization change…
● If data were readily reusable in place rather than requiring significant effort to build new intermediary data tiers?
● If data could be repurposed quickly into new applications and business processes?
● If all applications and business processes could get all of the information needed in the form needed, where needed and when needed?


Red Hat JBoss Middleware

[Diagram: product portfolio grouped under ACCELERATE, INTEGRATE, and AUTOMATE]

Business Process Management
• JBoss BRMS
• JBoss BPM Suite

Application Integration
• JBoss A-MQ
• JBoss Fuse
• JBoss Fuse Service Works

Data Integration
• JBoss Data Virtualization

Foundation
• JBoss EAP
• JBoss Web Server
• JBoss Data Grid

Management Tools
• JBoss Operations Network

Development Tools
• JBoss Developer Studio

User Interaction
• JBoss Portal

Big Data Integration using JBoss Data Virtualization

Demo

Demo Scenario
● Objective:
– Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales
● Problem:
– Cannot utilize social data and sentiment analysis with sales management system
● Solution:
– Leverage JBoss Data Virtualization to mashup Sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data.

[Diagram: Excel Powerview and DV Dashboard analyze the aggregated data exposed by JBoss Data Virtualization (Consume, Compose, Connect).]
SOURCE 1: Hive/Hadoop contains Twitter data including sentiment
SOURCE 2: MySQL data that includes ticket and merchandise sales
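
As a rough illustration of the demo objective, the sketch below joins per-day sentiment with per-day sales and computes a Pearson correlation. It is not the demo code: the rows are invented placeholders rather than real Twitter or sales figures, the day labels and the function names mashup and pearson are hypothetical, and plain Python lists stand in for the Hive and MySQL sources.

```python
from math import sqrt

# Hypothetical stand-in for SOURCE 1 (Hive): (day, average sentiment score).
sentiment_rows = [
    ("day1", 0.2), ("day2", 0.4), ("day3", 0.6), ("day4", 0.8), ("day5", 0.9),
]

# Hypothetical stand-in for SOURCE 2 (MySQL): (day, ticket and merchandise sales).
sales_rows = [
    ("day1", 100.0), ("day2", 200.0), ("day3", 300.0), ("day4", 400.0),
]


def mashup(sentiment, sales):
    """Join both sources on the day label, keeping only days present in both."""
    sales_by_day = dict(sales)
    return [
        (day, score, sales_by_day[day])
        for day, score in sentiment
        if day in sales_by_day
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


combined = mashup(sentiment_rows, sales_rows)
scores = [score for _, score, _ in combined]
amounts = [amount for _, _, amount in combined]
correlation = pearson(scores, amounts)
```

With these placeholder rows the join keeps the four days that appear in both sources. A correlation near 1 would suggest that sentiment tracks sales, although a correlation over so few points is only a starting hint and not evidence of predictive power.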

Demonstration System Requirements
• JDK
– Oracle JDK 1.6, 1.7 or OpenJDK 1.6 or 1.7

• JBoss Data Virtualization v6 Beta
– http://jboss.org/products/datavirt.html

• JBoss Developer Studio
– http://jboss.org/products

• JBoss Integration Stack Tools (Teiid)
– https://devstudio.jboss.com/updates/7.0-development/integration-stack/

• Slides, Code and References for demo
– https://github.com/DataVirtualizationByExample/Mashup-with-Hive-and-MySQL

• Hortonworks Data Platform (A VM for testing Hive/Hadoop)
– http://hortonworks.com/products/hdp-2/#install

• Red Hat Storage
– http://www.redhat.com/products/storage-server/


Why Red Hat for Big Data?
● Transform ALL data into actionable information
– Cost Effective, Comprehensive Platform
– Community based Innovation
– Enterprise Class Software and Support

[Diagram: Data to Actionable Information Cycle, with stages labeled Data, Capture, Process, Integrate, Information]
