Meet the experts dwo bde vds v7
1. Meet the Experts Series
How to use the Informatica Big Data Edition and Vibe Data Stream for Hadoop-based Data Warehouse Offloading
Informatica Product Desk
Murthy Mathiprakasam, Principal Product Marketing Manager
Sumeet Agrawal, Principal Product Manager
Jeff Rydz, Director of Big Data Solutions
Amrish Thakkar, Senior Product Manager
Knowledge Series
2. Informatica Big Data Edition
Standard Edition: High Productivity Data Integration (Profile, Parse, ETL)
Governance Edition: Comprehensive Data Governance (Discover, Profile, Parse, ETL, Cleanse, Lineage & Glossary)
Includes restricted use Vibe Data Stream
3. Informatica PowerExchange & Vibe Data Stream
Vibe Data Stream: Real-time Data Integration (Multiple Targets, Real-Time Collection, Easy Deployment, Highly Available, Guaranteed Delivery, Continuous Streaming)
PowerExchange: Batch Data Integration (Cloud & SaaS Apps, Relational & Flat Files, Hadoop & NoSQL, MPP Appliances, Social Data, Enterprise Applications)
4. What’s the Mission of Every Data Services Team?
Your Mission: Deploy the right workloads on the right platforms so the right people get the right data at the right time.
7. Data Warehouses Are Not Optimized For Modern Needs
More Data Supply, More Data Demand
[Chart: Data Warehouse Resource Utilization: Transformations / Data Loads 80%, Analytical Queries 20%. Source: Appfluent]
8. Hadoop Can Help Drive Efficiency & Scalability
Source Data: Machine/Device, Cloud, Relational, Mainframe, Social Media, Web Logs
Hadoop: Focused on Data Preparation
Data Warehouse: Focused on Analytics
9. But Enterprises Are Approaching Hadoop With Caution
Slow Time to Production
Challenging to Staff
Risk of Rework
10. Informatica Helps Lower Costs & Lower Risks
5X Developer Productivity
Easier to Staff
Easier to Adopt Innovations
Informatica Big Data Edition: Discover, Profile, Parse, ETL, Cleanse, Lineage & Glossary
Greater Efficiency Today, Higher Confidence For Tomorrow
14. Data Is Growing and More Distributed
[Chart: data volume (TB) over time for social media, web logs, and sensor data]
15. But Organizations Are Struggling to Harness It
Incomplete Data Sets
Expensive to Store
Low Fidelity Analytics
16. Informatica Helps Lower Costs & Harness Real Time Data
Ingest Higher Data Volumes
Lower Cost Storage
Analytics On All Data
Informatica Vibe Data Stream: 10X Faster Streaming Technology
Data warehouse storage and CPU utilization are constrained by the growing supply of data and the growing demand for analytics
Pushdown data transformations consume excess CPU cycles
Analytical performance suffers
Forces expansion of expensive platforms
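To make the offloading argument concrete, the worked arithmetic below reads the Appfluent resource-utilization figure on slide 7 as 80% transformations and data loads versus 20% analytical queries. It is illustrative only and assumes the warehouse is fully utilized, that offloading moves all transformation load to Hadoop, and that analytical throughput scales linearly with the freed capacity.

```latex
\begin{aligned}
u_T &= 0.80 \quad \text{(share used by transformations / data loads)}\\
u_A &= 0.20 \quad \text{(share used by analytical queries)}\\
\text{capacity available to analytics after offload} &= u_A + u_T = 1.00\\
\text{analytics headroom} &= \frac{u_A + u_T}{u_A} = \frac{1.00}{0.20} = 5
\end{aligned}
```

Under these assumptions, offloading transformations would leave up to five times the current analytical capacity on the same warehouse, which is why pushdown transformations otherwise force expansion of the platform.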
In addition, Informatica was quick to adopt this new data platform, so organizations could apply the ETL and data quality skills they already had. In fact, with Informatica, developers can increase their productivity by up to 5x while dramatically lowering both the infrastructure costs and the ongoing operational costs associated with BI/DW.
Hadoop is ideally suited for virtually unlimited data storage and processing and for complex data analytics, often at 10 to 100 times less cost than traditional systems.
But when Hadoop first began growing in popularity, tooling was scarce, so developers had to hand-code ETL workloads in new languages using MapReduce, a new shared-nothing processing paradigm.
So while organizations could dramatically lower the cost of their infrastructure, ongoing operational labor costs remained a challenge. Hadoop developer skills are in high demand and can therefore be difficult to find and retain.
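To show what hand-coding an ETL workload in the MapReduce paradigm involves, here is a minimal single-process Python sketch. It is not Hadoop code and uses no Hadoop API; the log format (`<host> <status> <bytes>`) and the function names are hypothetical. The developer writes the map step (extract and cleanse), the framework groups by key (the shuffle, simulated here), and the developer writes the reduce step (aggregate).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical web-log lines in the form "<host> <status> <bytes>".
LOG_LINES = [
    "web01 200 512",
    "web02 500 0",
    "web01 200 1024",
    "web02 200 256",
    "bad line",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Extract and cleanse: emit (host, bytes) for well-formed, successful requests."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return  # drop malformed records
    host, status, size = parts
    if status == "200":
        yield host, int(size)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate: total bytes served per host."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))  # {'web01': 1536, 'web02': 256}
```

Even this small aggregation needs three hand-written stages plus explicit cleansing logic, which illustrates the labor cost the notes describe.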
Publish Subscribe
Vibe Data Stream for Machine Data efficiently collects streaming data at high volume (throughput), high velocity (speed), and high scale (a large number of endpoints) from a wide variety of sources over LAN and WAN environments, enabling real-time and big data analytics, operational intelligence, and enterprise data warehousing.
Some of the features and benefits of Vibe Data Stream are:
High-performance (>10X) real-time solution built on Ultra Messaging (UM), a fast and reliable high-performance messaging technology
UM is a brokerless messaging system, which eliminates many issues of traditional broker-based systems, such as a single point of failure, multiple hops, and bottlenecks. This enables high performance and reliability with lower operational costs.
High-throughput streaming with guaranteed delivery
Out-of-the-box support for a wide variety of data sources (sensors, mobile devices, log files, IoT, etc.)
High availability and reliability
Enterprise grade: simplified configuration, deployment, administration, and monitoring
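The brokerless idea behind UM can be sketched in a few lines of Python. This is a conceptual model only and does not reflect the Ultra Messaging API: the `TopicDirectory` and `Publisher` classes are hypothetical. The directory is used only to discover who subscribes to a topic; message payloads go directly from the publisher to each subscriber, so there is no broker process on the data path to become a single point of failure, an extra hop, or a bottleneck.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber callback receives (topic, payload).
Handler = Callable[[str, bytes], None]


class TopicDirectory:
    """Hypothetical discovery table mapping topic -> subscriber handlers.

    It is consulted only to find peers; payloads never pass through it.
    """

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def resolve(self, topic: str) -> List[Handler]:
        return list(self._subs[topic])


class Publisher:
    """Delivers each message straight to every resolved subscriber (one hop, no broker)."""

    def __init__(self, directory: TopicDirectory, topic: str) -> None:
        self.directory = directory
        self.topic = topic

    def publish(self, payload: bytes) -> int:
        receivers = self.directory.resolve(self.topic)
        for handler in receivers:
            handler(self.topic, payload)  # direct delivery to the subscriber
        return len(receivers)


if __name__ == "__main__":
    directory = TopicDirectory()
    batch_sink: List[bytes] = []   # hypothetical batch target
    stream_sink: List[bytes] = []  # hypothetical stream target
    directory.subscribe("weblogs", lambda t, p: batch_sink.append(p))
    directory.subscribe("weblogs", lambda t, p: stream_sink.append(p))
    pub = Publisher(directory, "weblogs")
    print(pub.publish(b"GET /index.html 200"))  # 2
```

A real brokerless system also has to handle network transport, loss recovery, and late-joining subscribers; the sketch only shows why removing the broker removes a hop.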
The Vibe Data Stream front end is integrated into the Informatica Admin Console, allowing users to manage and monitor the topology from within the Admin Console.
Vibe Data Stream uses Apache ZooKeeper for configuration management. Once the user has defined the topology, deploying it pushes the configuration into ZooKeeper. As VDS nodes come up, they pull their configuration from ZooKeeper and begin operating. New sources and targets can be deployed without affecting nodes that are already running. Users can also add multiple nodes and load-balance traffic across them.
VDS nodes use Ultra Messaging as their infrastructure and are therefore very lightweight. This allows you to embed a VDS node in devices with limited resources (CPU, memory, etc.).
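The deploy-then-pull configuration flow described above can be modeled with a small Python sketch. This is not the ZooKeeper client API and not how VDS is implemented: `ConfigStore` is a hypothetical in-memory stand-in for a ZooKeeper-like store, and the `/vds/nodes/<node_id>` path layout is an assumption. It shows three behaviors from the notes: deploying pushes per-node configuration into the store, nodes pull their configuration when they start, and deploying a new source later does not touch nodes that are already running. Traffic is then spread across nodes round-robin.

```python
import itertools
from typing import Dict, List, Optional


class ConfigStore:
    """In-memory stand-in for a ZooKeeper-like store: path -> versioned config."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}
        self._versions: Dict[str, int] = {}

    def put(self, path: str, config: dict) -> int:
        self._versions[path] = self._versions.get(path, 0) + 1
        self._data[path] = dict(config)
        return self._versions[path]

    def get(self, path: str) -> Optional[dict]:
        cfg = self._data.get(path)
        return dict(cfg) if cfg is not None else None


def deploy_topology(store: ConfigStore, topology: Dict[str, dict]) -> None:
    """Push each node's configuration under a hypothetical /vds/nodes/<node_id> path."""
    for node_id, config in topology.items():
        store.put(f"/vds/nodes/{node_id}", config)


class Node:
    def __init__(self, node_id: str, store: ConfigStore) -> None:
        self.node_id = node_id
        self.store = store
        self.config: Optional[dict] = None
        self.handled: List[str] = []

    def start(self) -> None:
        """On startup, pull this node's configuration from the store."""
        self.config = self.store.get(f"/vds/nodes/{self.node_id}")
        if self.config is None:
            raise RuntimeError(f"no configuration deployed for {self.node_id}")

    def handle(self, record: str) -> None:
        self.handled.append(record)


def round_robin(nodes: List[Node], records: List[str]) -> None:
    """Spread records evenly across running nodes."""
    if not nodes:
        raise ValueError("no nodes available")
    for node, record in zip(itertools.cycle(nodes), records):
        node.handle(record)


if __name__ == "__main__":
    store = ConfigStore()
    deploy_topology(store, {"node-a": {"source": "syslog"}, "node-b": {"source": "syslog"}})
    a, b = Node("node-a", store), Node("node-b", store)
    a.start()
    b.start()
    deploy_topology(store, {"node-c": {"source": "sensor"}})  # running nodes are untouched
    round_robin([a, b], ["r1", "r2", "r3"])
    print(a.handled, b.handled)  # ['r1', 'r3'] ['r2']
```

A real deployment would rely on ZooKeeper watches and sessions for change notification and failure detection, which this sketch deliberately omits.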
High performance/efficient streaming data collection over LAN/WAN
GUI for easy configuration, deployment, and use
Continuous ingestion of data generated in real time (sensors, logs, etc.) from machine-generated and other data sources
Enable real-time interactions & response
Real-time delivery directly to multiple targets (batch/stream processing)
Highly available; efficient; scalable
Available ecosystem of lightweight agents (sources and targets)
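Real-time delivery to multiple targets with guaranteed delivery is often built from sequence numbers, acknowledgments, and retransmission. The Python sketch below illustrates that general technique (at-least-once delivery with receiver-side deduplication). It is not a description of VDS internals, and the `Target` and `FanOutSender` classes and the simulated loss sets are hypothetical. The sender keeps every unacknowledged record per target and resends it, and each target ignores duplicates by sequence number, so a lost ack does not produce a duplicate record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Target:
    """A delivery target (e.g. batch or stream processing) with simulated loss."""

    name: str
    drop_once: Set[int] = field(default_factory=set)      # message lost on first try
    drop_ack_once: Set[int] = field(default_factory=set)  # delivered, but ack lost once
    received: List[bytes] = field(default_factory=list)
    seen: Set[int] = field(default_factory=set)

    def deliver(self, seq: int, payload: bytes) -> bool:
        """Return True (ack) on success; deduplicate retransmissions by sequence number."""
        if seq in self.drop_once:
            self.drop_once.discard(seq)
            return False  # simulated message loss: nothing stored, no ack
        if seq not in self.seen:
            self.seen.add(seq)
            self.received.append(payload)
        if seq in self.drop_ack_once:
            self.drop_ack_once.discard(seq)
            return False  # stored, but the ack never reaches the sender
        return True


class FanOutSender:
    """Sends each record to all targets and keeps unacked records per target."""

    def __init__(self, targets: List[Target]) -> None:
        self.targets = targets
        self.next_seq = 0
        self.pending: Dict[str, Dict[int, bytes]] = {t.name: {} for t in targets}

    def send(self, payload: bytes) -> int:
        seq = self.next_seq
        self.next_seq += 1
        for t in self.targets:
            self.pending[t.name][seq] = payload
            if t.deliver(seq, payload):
                del self.pending[t.name][seq]
        return seq

    def retransmit(self) -> int:
        """Resend everything still unacknowledged; return how many records remain pending."""
        for t in self.targets:
            queue = self.pending[t.name]
            for seq in sorted(queue):  # sorted() copies keys, so deleting is safe
                if t.deliver(seq, queue[seq]):
                    del queue[seq]
        return sum(len(q) for q in self.pending.values())


if __name__ == "__main__":
    batch = Target("batch", drop_once={1})
    stream = Target("stream", drop_ack_once={0})
    sender = FanOutSender([batch, stream])
    for rec in [b"r0", b"r1", b"r2"]:
        sender.send(rec)
    print(sender.retransmit())  # 0
    print(batch.received)       # [b'r0', b'r2', b'r1']
    print(stream.received)      # [b'r0', b'r1', b'r2']
```

Note that the batch target receives `r1` after `r2`: this sketch guarantees delivery and no duplicates, but not ordering, which a production system would restore by buffering on sequence number.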
Big Data Edition Trials for Cloudera and Hortonworks
Free trial for Vibe Data Stream
Data Warehouse Optimization reference architecture co-written with Cloudera
Informatica – Cloudera collaborative training course
Data Warehouse optimization whitepaper for Informatica and MapR
INFA/Hortonworks/Teradata joint webinar
Tuesday, September 16, 2014
Hadoop 2.0: YARN to Further Optimize Data Processing
12:00 PM Eastern / 9:00 AM Pacific
Data is increasing exponentially in both type and volume, creating opportunities for businesses. To fully realize the potential of this new data, analysts recommend shifting from a single platform to a data ecosystem. Multiple systems are needed to exploit the variety and volume of data sources, and a flexible data repository such as a data lake is needed to store the data. Technologically, Apache Hadoop 2 enables true data lake architectures; in particular, the introduction of YARN added a pluggable framework that enabled new data access patterns in addition to MapReduce. An intelligent data management layer is needed to manage metadata and usage patterns and to track consumption across these data platforms. Join us in this webinar as our panel of experts discusses how Hadoop can be used alongside the Enterprise Data Warehouse and with Data Integration tools to optimize data processing workloads for more efficient use of resources.