Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integration
DataWorks Summit · Technology · 1 like · 1,867 views · 29 slides
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integration
1. Faster, cheaper, easier… and successful! Best practices for Big Data Integration (2014-06-05)
© 2014 IBM Corporation | IBM Confidential
2. Big Data Integration Is Critical For Success With Hadoop
Most Hadoop initiatives involve collecting, moving, transforming, cleansing, integrating, exploring, and analysing volumes of disparate data sources and types.
"By most accounts, 80% of the development effort in a big data project goes into data integration… and only 20% goes towards data analysis."
Source: Extract, Transform, and Load Big Data With Apache Hadoop (white paper), https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
3. Why is 80% of the effort in "data integration"?
Inhibitors, both traditional and Big Data:
• Heterogeneity of data sources
• Diverse formats
• Data issues
• Missing or bad requirements
• Complexity
• Lack of understanding
• Optimizing performance
To be useful, the meaning and accuracy of the data should never be in question. As such, data needs to be made fit-for-purpose so that it is used correctly and consistently.
4. Isn't there any good news? YES
Without effective Big Data Integration, you won't have consumable data: most Hadoop initiatives will end up achieving "garbage in, garbage out" faster, against larger data volumes, at much lower total cost than without Hadoop.
5. Getting consumable data from the data lake
The data lake unfettered vs. adding integration and governance discipline.
6. Five Best Practices for Big Data Integration
1. No hand coding anywhere, for any purpose
2. One data integration and governance platform for the enterprise
3. Massively scalable data integration wherever it needs to run
4. World-class data governance across the enterprise
5. Robust administration and operations control across the enterprise
7. Best Practice #1: No hand coding anywhere, for any purpose
What does this mean? No hand coding for any aspect of Big Data Integration:
• Data access and movement across the enterprise
• Data integration logic
• Assembling data integration jobs from logic objects
• Assembling larger workflows
• Data governance
• Operational and administrative management
8. Cost of hand coding vs. market-leading tooling (pharmaceutical customer example)
Hand coding / legacy DI:
• 30 man-days to write
• Almost 2,000 lines of code (71,000 characters)
• No documentation
• Difficult to re-use
• Difficult to maintain
DI tooling / Information Server:
• 2 days to write
• Graphical
• Self-documenting
• Reusable and more maintainable
• Improved performance
Result: 87% saving in development costs. Our largest customers concluded years ago that they will not succeed with Big Data initiatives without Information Server.
9. Best Practice #1: No hand coding anywhere, for any purpose
• Lower costs: DI tooling reduces labor costs by 90% over hand coding, and one set of skills and best practices is leveraged across all projects.
• Faster time to value: DI tooling reduces project timelines by 90% over hand coding, and much less time is required to add new sources and new DI processes.
• Higher quality data: data profiling and cleansing are very difficult to implement using hand coding.
• Effective data governance: requires world-class data integration tooling to support objectives like impact analysis and data lineage.
10. Best Practice #2: One data integration and governance platform for the enterprise
What does this mean?
• Build a job once and run it anywhere on any platform in the enterprise without modification
• Access, move, and load data between a variety of sources and targets across the enterprise
• Support a variety of data integration paradigms: batch processing, federation, change data capture, SOA enablement of data integration tasks, real-time with transactional integrity, and self-service for business users
• Support the establishment of world-class data governance across the enterprise
11. Self-service Big Data Integration on demand: InfoSphere Data Click
• Provides a simple web-based interface for any user
• Moves data in batch or real time in a few clicks
• Policy choices which are then automated, without any coding
• Optimized runtime
• Automatically captures metadata for built-in governance
"I have a feeling before long Gartner will be telling us if we're not doing this something is wrong." – an IBM customer
12. Optimized for Hadoop with blazing fast HDFS speeds
Extends the same easy drag-and-drop paradigm: simply add your Hadoop server name and port number. The Information Server engine uses streaming parallelization techniques to pipe data in and out at massive scale. A performance study ran at up to 15 TB/hr before the HDFS disks were completely saturated.
13. Make data available to Hadoop in real time
• Non-invasive record capture: read data from transactional database logs to minimize impact on source systems
• High-speed data replication: low-latency capture and delivery of real-time information
• Consistently current Hadoop data: data is available in Hadoop moments after it was committed in source databases, accelerating analytics currency
14. Best Practice #3: Massively Scalable Data Integration Wherever It Needs to Run
Outside the Hadoop environment
• Case 1: InfoServer parallel engine running against any traditional data source
• Case 2: Push processing into a parallel database
Between environments
• Case 3: Move and process data in parallel between environments
Within the Hadoop environment
• Case 4: Push processing into Hadoop MapReduce
• Case 5: InfoServer parallel engine running against HDFS without MapReduce
What does this mean
• Design once: develop the logic in the same manner regardless of execution platform
• Scale anywhere: execute the logic in any of the 5 patterns for scalable data integration … no single pattern is sufficient
Information Server is the only Big Data Integration platform supporting all 5 use cases
15. Information Server is Big Data Integration
Dynamic
• Instantly get better performance as hardware resources are added
Extendable
• Add a new server to scale out through a simple text file edit (or, in a grid configuration, automatically via integration with grid management software)
Data partitioned
• In true MPP fashion (like Hadoop), data is partitioned across the parallel DI engine to scale out the I/O
[Diagram: source data flows through Transform, Cleanse, and Enrich stages into the EDW; scaling from a uniprocessor (sequential) to a 4-way parallel SMP system to a 64-way parallel MPP clustered system]
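The "data partitioned" point is the core MPP trick: hash each row on a key so that every parallel node owns a disjoint slice of the data, and rows with the same key always land on the same node (which is what makes partitioned joins and aggregations possible). A minimal sketch, with the field name `cust_id` chosen purely for illustration:

```python
def hash_partition(rows, key, n_partitions):
    """Distribute rows across n partitions by hashing the partitioning
    key; equal keys always map to the same partition, so each parallel
    node can transform, join, or aggregate its slice independently."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts

# Example: 64 rows spread over a 4-way parallel configuration.
rows = [{"cust_id": i % 8, "amount": i} for i in range(64)]
parts = hash_partition(rows, "cust_id", 4)
```

Adding a node means adding a partition: the same flow logic runs unchanged, only the degree of parallelism in the configuration file grows.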
16. Information Server: Customer stories with Big Data using the Information Server MPP engine
• Global bank: 200,000 programs built in Information Server on a grid/cluster of low-cost commodity hardware
• Data services company: runs text analytics across 200 million medical documents in a weekend, creating indexes to support optimal retrieval by users
• Global bank: desensitizes 200 TB of data in one weekend each month to populate its dev environments
• Health care: processes 50,000 tps with complex transformation and guaranteed delivery
• Health care: Information Server powered grid processing of over 40 trillion records each month
17. Where should you run scalable data integration?
Run in the database
Advantages:
• Exploit the database MPP engine
• Minimize data movement
• Leverage the database for joins/aggregations
• Works best when data is already clean
• Frees up cycles on the ETL server
• Use excess capacity on the RDBMS server
• Database is faster for some processes
Disadvantages:
• Very expensive hardware and storage
• Can force 100% reliance on ELT
• Degradation of query SLAs
• Not all ETL logic can be pushed into the RDBMS (with an ELT tool or hand coding)
• Can’t exploit commodity hardware
• Usually requires hand coding
• Limitations on complex transformations
• Limited data cleansing
• Database is slower for some processes
• ELT can consume RDBMS capacity (capacity planning is nontrivial)
Run in the DI engine
Advantages:
• Exploit the ETL MPP engine
• Exploit commodity hardware and storage
• Exploit a grid to consolidate SMP servers
• Perform complex transforms (data cleansing) that can’t be pushed into the RDBMS
• Free up capacity on the RDBMS server
• Process heterogeneous data sources (not stored in the database)
• ETL server is faster for some processes
Disadvantages:
• ETL server is slower for some processes (data already stored in relational tables)
• May require extra hardware (low-cost hardware)
Run in Hadoop
Advantages:
• Exploit the MapReduce MPP engine
• Exploit commodity hardware and storage
• Free up capacity on the database server
• Support processing of unstructured data
• Exploit Hadoop’s capabilities for persisting data (e.g. updating and indexing)
• Low-cost archiving of history data
Disadvantages:
• Not all ETL logic can be pushed into MapReduce (with an ELT tool or hand coding)
• Can require complex programming
• MapReduce will usually be much slower than a parallel database or a scalable ETL tool
• Risk: Hadoop is still a young technology
Big Data Integration requires a balanced approach that supports all of the above
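The "run in the database" option is ELT pushdown: instead of pulling rows out, the tool renders the flow as SQL and lets the database's MPP engine do the work. A minimal sketch of what such a generator produces; the table and column names (`staging.orders`, `fx_rate`, …) are hypothetical:

```python
def pushdown_sql(source, target, exprs, predicate=None):
    """Render a simple transform flow as one INSERT ... SELECT so the
    work runs inside the database (ELT) instead of in the DI engine.
    `exprs` maps each target column to the SQL expression that fills it."""
    cols = ", ".join(f"{expr} AS {alias}" for alias, expr in exprs.items())
    sql = f"INSERT INTO {target} SELECT {cols} FROM {source}"
    if predicate:
        sql += f" WHERE {predicate}"
    return sql

sql = pushdown_sql(
    "staging.orders", "dw.orders",
    {"order_id": "id", "amount_usd": "amount * fx_rate"},
    predicate="amount > 0",
)
```

The disadvantages listed above fall out directly: anything that cannot be expressed as such SQL (fuzzy matching, address standardization, multi-source cleansing) has to run in the DI engine or be hand coded.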
18. Automated MapReduce Job Generation
• Leverage the same UI and the same stages to automatically build MapReduce jobs
• Drag and drop stages onto the canvas to create a job, rather than having to learn MapReduce programming
• Push the processing to the data for patterns where you don’t want to transport the data over the network
19. Automated MapReduce Job Generation
• Build integration jobs with the same data flow tool and stages
• Automatically creates MapReduce code
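What the generated code has to express is the MapReduce pattern itself: map each record to key/value pairs, shuffle by key, then reduce each key's values. A minimal in-process model of that pattern (an illustration of the paradigm, not the code DataStage emits):

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal in-process model of MapReduce: apply the mapper to each
    record, group (shuffle) the emitted values by key, then apply the
    reducer once per key."""
    shuffle = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):   # map phase emits (key, value) pairs
            shuffle[key].append(value)
    return {key: reducer(key, values) for key, values in shuffle.items()}

# Example: total sales per region.
sales = [("east", 10), ("west", 5), ("east", 7)]
totals = map_reduce(
    sales,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda key, values: sum(values),
)
# totals == {"east": 17, "west": 5}
```

The point of job generation is that the developer draws the aggregation as stages on a canvas and the tool produces the mapper and reducer, rather than the developer writing them by hand.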
20. So why is pushdown of ETL into MapReduce not sufficient for Big Data Integration?
Big Data Integration also requires running scalable data integration workloads outside of the Hadoop MapReduce environment. Complex data integration logic can’t be pushed into a parallel database or MapReduce easily and efficiently, or at all in some cases:
• IBM’s experience with customers’ early Hadoop initiatives has shown that much of their data integration processing logic can’t be pushed into MapReduce
• Without Information Server, these more complex data integration processes would have to be hand coded to run in MapReduce, increasing project time, cost, and complexity
MapReduce has significant and known performance limitations:
• For processing large data volumes with complex transformations (including data integration)
• Many Big Data vendors and researchers are focusing on bypassing MapReduce’s performance limitations
• DataStage will process data integration workloads 10x–15x faster than MapReduce
21. Best Practice #4: World-Class Data Governance Across the Enterprise
What does this mean
• Both IT and the line of business need to have a high degree of confidence in the data
• Confidence requires that data is understood to be of high quality, secure, and fit for purpose:
  • Where does the data in my report come from?
  • What is being done with it inside of Hadoop?
  • Where was it before reaching our data lake?
• Oftentimes these requirements extend from regulations within the specific industry
22. Why Is Data Governance Critical for Big Data?
• How well do your business users understand the content of the information in your Big Data stores?
• Are you measuring the quality of your information in Big Data?
23. All data needs to build confidence via a fully governed data lifecycle
Find
• Leverage terms, labels, and collections to find governed, curated data sources
Curate
• Add labels, terms, and custom properties to relevant assets
Collect
• Use collections to capture assets for a specific analysis or governance effort
Collaborate
• Share collections for additional curation and governance
Govern
• Create and reference IG policies and rules
• Apply DQ, masking, archiving, cleansing, … to data
Offload
• Copy data in one click to HDFS for analysis or warehouse augmentation
Analyze
• Perform analyses on offloaded data
Reuse & Trust
• Understand how data is being used today via lineage for analyses and reports
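One governance step above, masking, is worth making concrete: for dev/test copies the goal is usually data that is no longer identifiable but is still deterministic, so the same input always masks to the same output and joins across tables keep working. A toy sketch of that idea (not InfoSphere Optim's actual algorithm; the `keep` prefix is an illustrative choice):

```python
import hashlib

def mask(value, keep=2):
    """Toy deterministic masking: keep a short prefix for readability and
    replace the rest with a stable hash fragment. Equal inputs always
    yield equal outputs, so masked keys remain joinable."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return value[:keep] + "****" + digest

masked = mask("4111111111111111")
```

Production masking tools add format preservation (a masked card number still looks like a card number) and key management, but determinism is the property that keeps masked environments usable.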
24. Best Practice #5: Robust Administration and Operations Control Across the Enterprise
What does this mean
Operations management for Big Data Integration
• Provides quick answers for operators, developers, and other stakeholders as they monitor the run-time environment
• Workload management allocates resource priority in a shared-services environment and queues workload on a busy system
• Performance analysis provides insight into resource consumption to help understand when the system may need more resources
• Build workflows that include Hadoop activities defined via Oozie directly alongside other data integration activities
Administrative management for Big Data Integration
• Web-based installer for all integration and governance capabilities
• Highly available configurations for meeting 24/7 requirements
• Instantly provision/deploy a new project instance
• Centralized authentication, authorization, and session management
• Audit logging of security-related events to promote SOX compliance
25. Combined Workflows for Big Data
• Simple design paradigm for workflows (same as for job design)
• Mix any Oozie activity right alongside other data integration activities
• Allows users to have the data sourcing, ETL, analytics, and delivery of information all controlled through a single coordinating process
• Monitor all stages through the Operations Console’s web-based interface
26. Automotive manufacturer uses Big Data Integration to build out a global data warehouse
Challenges
• Need to manage massive amounts of vehicle data – about 5 TB per day
• Need to understand, incorporate, and correlate a variety of data sources to better understand problems and product quality issues
• Need to share information across functional teams for improved decision making
Business benefits
• Doubled the number of models receiving J.D. Power initial quality study awards in one year
• Improved and streamlined decision making and system efficiency
• Lowered warranty costs
IT benefits
• Single infrastructure to consolidate structured, semi-structured, and unstructured data; simplified management
• Optimized the existing Teradata environment – size, performance, and TCO
• High-performance ETL for in-database transformations
27. Other Examples of Proven Value of IBM Big Data Integration
European telco
• ELT pushdown into the database and Hadoop was not sufficient for Big Data Integration
• InfoSphere DataStage runs some DI processes faster than the parallel database and MapReduce
Wireless carrier
• InfoSphere Information Server can transform a dirty Hadoop lake into a clean Hadoop lake
• IBM met requirements for processing 25 terabytes in 24 hours
• The capabilities of InfoSphere Information Server and InfoSphere Optim data masking all helped to produce a clean Hadoop lake
Insurance company
• IBM could shred complex XML claims messages and flatten them for Hadoop reporting and analysis
• IBM could meet all requirements for large-scale batch processing
• Information Server could adjust for changes in XML structure while tolerating future unanticipated changes
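The insurance case hinges on "shredding": flattening a nested XML claim message into one flat row per repeating element so it can be loaded for reporting. A minimal sketch with a made-up claim schema (`claim`, `insured`, `line` are hypothetical element names, not the customer's actual format); because it only reads the elements it knows about, unrelated schema additions don't break it:

```python
import xml.etree.ElementTree as ET

CLAIM = """<claim id="C-100">
  <insured name="A. Smith"/>
  <lines>
    <line code="RX" amount="42.50"/>
    <line code="ER" amount="310.00"/>
  </lines>
</claim>"""

def shred_claim(xml_text):
    """Flatten one nested claim message into one flat row per claim
    line, repeating the header fields on every row. Unknown elements
    are ignored, so unanticipated schema changes are tolerated."""
    root = ET.fromstring(xml_text)
    claim_id = root.get("id")
    insured = root.find("insured").get("name")
    return [
        {"claim_id": claim_id, "insured": insured,
         "code": line.get("code"), "amount": float(line.get("amount"))}
        for line in root.iter("line")
    ]

rows = shred_claim(CLAIM)
```

At scale the same per-message function runs inside a partitioned parallel flow, one message at a time, which is what makes batch shredding of large claim volumes feasible.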
28. Summing it up: increase ROI for Hadoop via the 5 Best Practices for Big Data Integration
Speed productivity
• Graphical design is easier to use than hand coding
Simplify heterogeneity
• Common method for diverse data sources
Shorten project cycles
• Pre-built components reduce cost and timelines
Promote object reuse
• Build once, share, and run anywhere (ETL/ELT/real-time)
Reduce operational cost
• Provides a robust framework to manage data integration
Protect from changes
• Isolation from underlying technologies as they continue to evolve
29. Thank you