Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption
Krishna Potluri – Big Data Tech Lead, Progressive
@TenduYogurtcu – CTO, Syncsort
Data Liberation, Integrity & Integration for Next-Generation Analytics

Marquee global customer base of leaders and emerging businesses across all major industries

Trusted Industry Leadership
We provide unique data management solutions and expertise to over 2,500 large enterprises worldwide, with an unmatched focus on customer success & value.

Best Quality, Top Performance, Lower Costs
Our proven software efficiently delivers all critical enterprise data assets, with the highest integrity, to Big Data environments, on premises or in the cloud.

Highly Acclaimed & Award Winning
• Data Quality "Leader" in Gartner Magic Quadrant
• IT World Awards® 2016 "Innovations in IT" Gold Winner
• Database Trends & Applications "Companies That Matter Most in Data"
Syncsort Makes All Data Accessible & Usable for Analytics

Data Access & Transformation
• Mainframe Access & Integration for Application Data
• High-Performance ETL
• Mainframe Access & Integration for Machine Data

Data Quality
• Big Data Quality & Integration
• Data Enrichment & Validation
• Data Governance
• Customer 360

Data Infrastructure Optimization
• Enterprise Data Warehouse Optimization
• Application Modernization
• Mainframe Optimization
Syncsort + Hortonworks Solution: Enabling the Enterprise Data Lake
Fast. Secure. Enterprise Grade.
ETL Onboarding and Mainframe Access & Integration
• Free up your budget: dramatically reduce EDW costs and avoid continual upgrades
• Get fresher data: from right-time to real-time
• Keep all data as long as you want: from weeks to months, years, and beyond
• Optimize your data warehouse: faster database queries for faster analysis
• Blend new data, fast: from structured to unstructured, from mainframe to the cloud
• Deliver powerful new insights: combine mainframe data with Big Data
• Securely access mainframe data when you need it: directly access and translate mainframe data
• Reduce storage costs: go from $100K/TB to $2K/TB by migrating data to HDFS
• Use your MIPS wisely: save an average of up to $7K per MIPS when offloading batch workloads to Hadoop
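A back-of-the-envelope check of the storage claim above, as a small Python sketch. The per-TB prices are the slide's figures; the 50 TB volume in the usage line is a hypothetical example, and the sketch counts storage only, not compute, licensing, or migration effort.

```python
# Storage-cost comparison using the slide's per-TB figures.
# The 50 TB volume below is a hypothetical example, not a Progressive number.

EDW_COST_PER_TB = 100_000   # "$100K/TB" (slide figure)
HDFS_COST_PER_TB = 2_000    # "$2K/TB" (slide figure)


def storage_savings(terabytes: float) -> tuple[float, float, float]:
    """Return (edw_cost, hdfs_cost, fraction_saved) for a data volume."""
    if terabytes <= 0:
        raise ValueError("terabytes must be positive")
    edw_cost = terabytes * EDW_COST_PER_TB
    hdfs_cost = terabytes * HDFS_COST_PER_TB
    # Storage-only saving; ignores compute, licences, and migration work.
    fraction_saved = 1 - hdfs_cost / edw_cost
    return edw_cost, hdfs_cost, fraction_saved


if __name__ == "__main__":
    edw, hdfs, saved = storage_savings(50)  # hypothetical 50 TB
    print(f"EDW: ${edw:,.0f}  HDFS: ${hdfs:,.0f}  saved: {saved:.0%}")
```

Because both prices scale linearly with volume, the fraction saved is 98% at any size; only the absolute dollar figures depend on the volume chosen.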
Syncsort + Hortonworks Reference Architecture
• Apache Ambari integration
• Deploy DMX-h across the cluster
• Monitor DMX-h jobs
• Process in MapReduce or Spark
• Source relational and non-relational data (including mainframes)
• Syncsort DMX-h & HDF – batch and real time
• Out-of-the-box integration, interoperability & certifications
• Kerberos-secured clusters
• Apache Sentry/Ranger security certified
• Early beta, release certification
• Metadata lineage export from DMX
• Atlas integration
Our Strategy: Simplify Big Data Integration
• Deploy on premises or in the cloud
• Choose among multiple execution frameworks: Hadoop, Spark, Spark 2.0, Linux, Unix, Windows
• Integrate streaming and batch data with a single data pipeline for innovative applications such as IoT
• Future-proof applications to avoid rewriting jobs to take advantage of innovations in new execution frameworks
• Access and integrate ALL enterprise data sources, including mainframe, for advanced analytics
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption
Krishna Potluri, Big Data Tech Lead
Big Data Use Cases: Telematics, Ads, Claims
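Tools and Ingestion Flow – Past
[Figure: External Data and Progressive Data loaded in batch into the Hortonworks Cluster with Sqoop/Java and Hive/Java, landing in Hive (Text) staging and Hive (ORC) target tables; processing with Oozie, Pig, Hive, and MapReduce; Hive-based instrumentation and balancing; jobs driven by a scheduler; an MPP appliance accessing the data through PolyBase, batch and on-demand.]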
Support Issues
• Developer skill set: mainframe and SSIS
• Hadoop upgrade issues
• Maven, Java, and Artifactory
• Custom Oozie actions
• Custom Oozie triggers
• Pig and Hive UDFs
• Custom Pig loaders
• Complex N-way Hive reports using JDBC
• Custom Hive SerDes
• Complex Oozie workflows, bundles, and coordinators
Again, Support Issues
[Figure: ingest and ETL requests from IT warehouse teams and business analysts wait in a queue for the Ingestion Team, a centralized team for Hadoop and Microsoft PDW.]
1. Custom ETL in Pig and Hive:
• Tied up resources
• Maintenance and knowledge-transfer issues
2. Ingestion projects:
• No consistency in code
• Reconciliation and balancing issues
• Each developer had their own pattern
3. Projects were queued up!
Ingestion/ETL Tool Selection
Requirements:
• Industrial strength
• Runs natively on Hadoop
• Supports all data sources
• Usable by non-Java programmers
• Security/Cloud/IDE
• Pricing/Stability
• Enterprise support
• Ability to customize frameworks
• Keeps pace with Hortonworks
• Fits with internal Build and Elevate
Tools evaluated: Syncsort, Talend, Informatica, Actian, CDAP, Cascading
Selected: Syncsort
Data Lake Ingestion – Current
[Figure: External Data and Progressive Data loaded in batch with DMX-h into the Hortonworks Cluster, landing in Hive (Text) staging and Hive (ORC) target tables; Microsoft PDW accessing the data through PolyBase, batch and on-demand; DMX-h batch jobs driven by a scheduler.]
Syncsort Implementation
[Figure: Hortonworks Cluster with a Name Node, a Secondary Name Node, and Data Nodes each running DMX-h; edge nodes running the DMX-h daemon behind a firewall; a developer machine with the DMX-h client (interactive); a build machine (deploy) with TFS; non-prod and prod environments.]
Sqoop vs. Syncsort DMX-h

                             Sqoop                   Syncsort DMX-h
Connectivity                 Database driver JARs    DataDirect
In-memory transformations    Not supported           Supported
Parallelism                  Supported               Supported
Edge-node ingestion          Not supported           Supported
Choosing the Right Interface for the Job

                     Syncsort DMX-h DTL    Syncsort DMX-h GUI
Complexity           High                  Low
Development Time     High                  Medium
Flexibility          High                  Medium
Reuse                High                  Medium
Syncsort DMX-h DTL vs. GUI – Visual
[Figure: the same copy task shown side by side in DMX-h DTL and in the DMX-h GUI.]
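CDC Patterns – Syncsort DMX-h DTL
[Figure: source extract modes No CDC, Incremental Extract, Full Extract, and CDC On Source; stages Staging, Raw, History, and Target (current view) stored as Hive (Text), Hive (ORC) with a 7-day rolling partition, and Hive (ORC) daily partitioned; DMX-h DTL applies insert/update/delete (I/U/D) or insert/update (I/U) changes.]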
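The CDC patterns above end in a current view built by applying inserts, updates, and deletes (I/U/D) to each table. The Python sketch below shows that apply step in memory. It is not DMX-h DTL: the `Change` record layout, the use of a sequence number for ordering, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Change:
    """One CDC record (hypothetical layout, not a DMX-h format)."""
    key: Any                                  # primary key of the source row
    op: str                                   # "I" insert, "U" update, "D" delete
    seq: int                                  # ordering, e.g. a commit sequence number
    row: dict = field(default_factory=dict)   # full row image for I/U


def apply_changes(current: dict, changes: list[Change]) -> dict:
    """Return a new current view after applying changes in seq order."""
    view = dict(current)  # leave the caller's snapshot untouched
    for ch in sorted(changes, key=lambda c: c.seq):
        if ch.op in ("I", "U"):
            # Upsert: an insert for an existing key or an update for a
            # missing key both end with the latest row image in the view.
            view[ch.key] = ch.row
        elif ch.op == "D":
            view.pop(ch.key, None)  # deleting an absent key is a no-op
        else:
            raise ValueError(f"unknown CDC op {ch.op!r} for key {ch.key!r}")
    return view
```

For example, applying an insert of key 2, an update of key 1, and a delete of key 2 to `{1: {"id": 1, "status": "open"}}` leaves only key 1 with status "closed". An I/U-only pattern would simply never receive "D" records, and treating deletes of missing keys as no-ops makes re-running the same batch safe.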
Syncsort DataFunnel™
[Figure: the DataFunnel UI and DMX-h DTL ingesting multiple tables from multiple data sources into HDFS and Hive, with schema generation and in-memory transformations.]
Hadoop Ingestor
[Figure: the Hadoop Metadata Collector (source metadata via JDBC and Hive), the Hadoop Job Configuration UI (Create Job, Edit Job), and the Hadoop Ingestor Service, which uses Sqoop and DMX-h code to load Table 1 through Table n into HDFS.]
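The Hadoop Ingestor's core idea, one code base in which every table is ingested the same way from collected metadata, can be sketched as deriving a uniform job configuration per table. This is a hypothetical Python illustration: `TableMeta`, the pattern names, and the `staging.`/`raw.`/`target.` Hive naming are assumptions, and the real service generated DMX-h code rather than dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Metadata a collector might gather per source table (hypothetical)."""
    source: str                  # source system alias, e.g. a JDBC connection name
    table: str
    columns: tuple[str, ...]
    key_columns: tuple[str, ...]
    cdc_pattern: str             # hypothetical names: "full", "incremental", "cdc_on_source"


VALID_PATTERNS = {"full", "incremental", "cdc_on_source"}


def build_job_config(meta: TableMeta) -> dict:
    """Derive one job config; every table goes through the same template."""
    if meta.cdc_pattern not in VALID_PATTERNS:
        raise ValueError(f"{meta.table}: unsupported pattern {meta.cdc_pattern!r}")
    if meta.cdc_pattern != "full" and not meta.key_columns:
        raise ValueError(f"{meta.table}: change-based patterns need key columns")
    name = f"{meta.source}_{meta.table}".lower()
    return {
        "job_name": f"ingest_{name}",
        "pattern": meta.cdc_pattern,
        "select_columns": list(meta.columns),
        "merge_keys": list(meta.key_columns),
        # Hypothetical Hive naming convention for the three stages.
        "staging_table": f"staging.{name}",
        "raw_table": f"raw.{name}",
        "target_table": f"target.{name}",
    }


def build_all(metas: list[TableMeta]) -> list[dict]:
    """Generate configs for every table with the same code path."""
    return [build_job_config(m) for m in metas]
```

Validating the metadata up front (a known pattern, keys present for change-based loads) is one way to keep a single template safe to apply to hundreds of tables.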
Hadoop Ingestor Service Flow
[Figure: several ingestion patterns (including Patterns 2, 3, and 4) moving each table through Hive Staging, Hive Raw, and Hive Target tables with DMX-h DTL (one pattern also uses Sqoop), with a Hive reconciliation step after each stage and shared logging, metrics, and balancing.]
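Each stage in the flow above is followed by a Hive reconciliation step, and the benefits below mention balancing and row-count metrics. A minimal Python sketch of a balancing check that compares row counts between consecutive stages is shown here. It assumes the counts have already been collected (for example with `SELECT COUNT(*)` in Hive); the stage names and the rule that counts must match exactly are assumptions, since a current-view target fed by deletes would legitimately hold fewer rows.

```python
from dataclasses import dataclass


@dataclass
class BalanceResult:
    table: str
    from_stage: str
    to_stage: str
    from_count: int
    to_count: int

    @property
    def balanced(self) -> bool:
        # Assumed rule: a pass-through stage must keep every row.
        return self.from_count == self.to_count


def balance(table: str, stage_counts: list[tuple[str, int]]) -> list[BalanceResult]:
    """Compare the row counts of each consecutive pair of stages."""
    return [
        BalanceResult(table, src, dst, src_n, dst_n)
        for (src, src_n), (dst, dst_n) in zip(stage_counts, stage_counts[1:])
    ]


def report(results: list[BalanceResult]) -> list[str]:
    """Format one log/metrics line per comparison."""
    return [
        f"{r.table} {r.from_stage}->{r.to_stage}: "
        f"{r.from_count} vs {r.to_count} "
        f"{'OK' if r.balanced else 'MISMATCH'}"
        for r in results
    ]
```

For instance, counts of 1000 (source), 1000 (staging), and 998 (raw) for a hypothetical `claim` table produce one OK line and one MISMATCH line, which a scheduler could use to fail the job.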
Hadoop Ingestor + Syncsort DMX-h (Benefits)
• One code base for ingestion
• Choice of CDC patterns
• Every table is ingested the same way
• Jobs don't need to be recoded for new execution frameworks; DMX-h shields us
• Every table is reconciled and balanced
• Metrics are captured for run time and row counts
• Ingestion time remains the same regardless of the number of tables
"Ingestion has gone from days to hours, and Progressive IT has a single entry point and code base for Data Lake ingestion."
– Ingestion Team
Thank You
Learn more! Visit us at Booth #1102.
Get a demo & pick up your Data Liberator t-shirt!

Mais conteúdo relacionado

Mais procurados

Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at WalgreensDataWorks Summit
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OSCuneyt Goksu
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryDataWorks Summit/Hadoop Summit
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...DataWorks Summit
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightDataWorks Summit/Hadoop Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...DataWorks Summit
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 

Mais procurados (20)

Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at Walgreens
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OS
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data Discovery
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 

Semelhante a Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption

Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA
 
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...Precisely
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantagePrecisely
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectPrecisely
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Precisely
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
Streaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesStreaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesPrecisely
 
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...Precisely
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseAltibase
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectHow to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectPeak Hosting
 
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Precisely
 

Semelhante a Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption (20)

Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
 
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with Connect
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
Streaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesStreaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use Cases
 
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- Altibase
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Talend introduction v1
Talend introduction v1Talend introduction v1
Talend introduction v1
 
How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectHow to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data Project
 
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption

  • 1. Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption. Krishna Potluri, Big Data Tech Lead, Progressive; @TenduYogurtcu, CTO, Syncsort
  • 2. Data Liberation, Integrity & Integration for Next-Generation Analytics. Marquee global customer base of leaders and emerging businesses across all major industries. Trusted Industry Leadership: we provide unique data management solutions and expertise to over 2,500 large enterprises worldwide, with an unmatched focus on customer success and value. Best Quality, Top Performance, Lower Costs: our proven software efficiently delivers all critical enterprise data assets with the highest integrity to Big Data environments, on premises or in the cloud. Highly Acclaimed & Award Winning: • Data Quality “Leader” in Gartner Magic Quadrant • IT World Awards® 2016 “Innovations in IT” Gold Winner • Database Trends & Applications “Companies That Matter Most in Data”. Data Access & Transformation: • Mainframe Access & Integration for Application Data • High-Performance ETL • Mainframe Access & Integration for Machine Data. Data Quality: • Big Data Quality & Integration • Data Enrichment & Validation • Data Governance • Customer 360. Data Infrastructure Optimization: • Enterprise Data Warehouse Optimization • Application Modernization • Mainframe Optimization
  • 3. Syncsort Makes All Data Accessible & Usable for Analytics
  • 4. Syncsort + Hortonworks Solution: Enabling the Enterprise Data Lake. Fast. Secure. Enterprise Grade. Focus areas: ETL Onboarding; Mainframe Access & Integration. • Free up your budget: dramatically reduce EDW costs and avoid continual upgrades • Get fresher data: from right-time to real-time • Keep all data as long as you want: from weeks to months, years, and beyond • Optimize your data warehouse: faster database queries for faster analysis • Blend new data, fast: from structured to unstructured, from mainframe to the cloud • Deliver powerful new insights: combine mainframe data with Big Data • Securely access mainframe data when you need it: directly access and translate mainframe data • Reduce storage costs: go from $100K/TB to $2K/TB by migrating data to HDFS • Use your MIPS wisely: save an average of up to $7K per MIPS when offloading batch workloads to Hadoop
  • 5. Syncsort + Hortonworks Reference Architecture: • Apache Ambari integration • Deploy DMX-h across the cluster • Monitor DMX-h jobs • Process in MapReduce or Spark • Source relational and non-relational data (including mainframes) • Syncsort DMX-h & HDF: batch and real time • Out-of-the-box integration, interoperability & certifications • Kerberos-secured clusters • Apache Sentry/Ranger security certified • Early beta and release certification • Metadata lineage export from DMX • Atlas integration
  • 6. Our Strategy: Simplify Big Data Integration • Deploy on premises or in the cloud • Choose among multiple execution frameworks and platforms: Hadoop, Spark, Spark 2.0, Linux, Unix, Windows • Integrate streaming and batch data with a single data pipeline for innovative applications such as IoT • Future-proof applications: avoid rewriting jobs to take advantage of innovations in new execution frameworks • Access and integrate ALL enterprise data sources, including mainframe, for advanced analytics
  • 7. Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to Increase Enterprise Adoption. Krishna Potluri, Big Data Tech Lead
  • 8. Big Data Use Cases: Telematics, Ads, Claims
  • 9. Tools and Ingestion Flow (Past). Diagram components: external data and Progressive data sources; batch loads into the Hortonworks cluster via Sqoop/Java and Hive/Java; Hive text staging tables and Hive ORC target tables; Oozie scheduling Pig, Hive, and MapReduce batch jobs; Hive instrumentation and balancing; an MPP appliance with batch and on-demand access via PolyBase.
  • 10. Support Issues: • Developer skill set: Mainframe and SSIS • Hadoop upgrade issues • Maven, Java, and Artifactory • Custom Oozie actions • Custom Oozie triggers • Pig and Hive UDFs • Custom Pig loaders • Complex N-way Hive reports using JDBC • Custom Hive SerDes • Complex Oozie workflows, bundles, and coordinators
  • 11. Centralized Team for Hadoop (diagram): IT warehouse teams and business analysts submit work to a queue handled by a single ingestion team, which performs ingest and ETL into Hadoop and Microsoft PDW.
  • 12. Again, Support Issues: 1. Custom ETL in Pig and Hive: • Tied up resources • Maintenance and knowledge-transfer issues 2. Ingestion projects: • No consistency in code • Reconciliation and balancing issues • Each developer had their own pattern 3. Projects were queued up!
  • 13. Ingestion/ETL Tool Selection. Requirements: • Industrial strength • Runs natively on Hadoop • Supports all data sources • Usable by non-Java programmers • Security/Cloud/IDE • Pricing/Stability • Enterprise support • Ability to customize frameworks • Keeps pace with Hortonworks • Fits the internal Build and Elevate process. Tools evaluated: Syncsort, Talend, Informatica, Actian, CDAP, Cascading; selected: Syncsort.
  • 15. Syncsort Implementation (architecture diagram): a Hortonworks cluster with a NameNode, a Secondary NameNode, and data nodes each running DMX-h; non-prod and prod edge nodes running the DMX-h daemon; developer machines with the DMX-h client behind a firewall for interactive development; a build machine deploying from TFS.
  • 16. Sqoop vs. Syncsort DMX-h. Connectivity: Sqoop uses database driver JARs, DMX-h uses DataDirect drivers. In-memory transformations: Sqoop not supported, DMX-h supported. Parallelism: supported by both. Edge-node ingestion: Sqoop not supported, DMX-h supported.
  • 17. Choosing the Right Interface for the Job: Syncsort DMX-h DTL vs. Syncsort DMX-h GUI. Complexity: DTL high, GUI low. Development time: DTL high, GUI medium. Flexibility: DTL high, GUI medium. Reuse: DTL high, GUI medium.
  • 18. Syncsort DMX-h DTL vs. GUI (Visual): a copy task shown as a DMX-h DTL definition and as a DMX-h GUI task.
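  • 19. CDC Patterns: Syncsort DMX-h DTL. Source extraction: no CDC (incremental extract or full extract) or CDC on source. Layers: Staging in Hive (Text) with a 7-day rolling partition; Raw History in Hive (ORC), daily partitioned; Target in Hive (ORC) as a current view. DMX-h DTL jobs load each layer, applying inserts/updates/deletes (I/U/D) or inserts/updates (I/U) depending on the source pattern.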
  • 20. Syncsort DataFunnel™: from the DataFunnel UI, select multiple tables from multiple data sources. DataFunnel generates the Hive schema and DMX-h DTL, then ingests into HDFS/Hive with in-memory transformations.
  • 21. Hadoop Ingestor (diagram): a Hadoop Metadata Collector gathers source metadata over JDBC; a Hadoop Job Configuration UI creates and edits jobs; the Hadoop Ingestor Service runs the jobs using DMX-h code or Sqoop, landing data in HDFS and Hive.
  • 22. Hadoop Ingestor Service Flow: each table (Table 1 through Table n) is ingested through one of several patterns that move data from Hive staging to Hive raw to Hive target using DMX-h DTL (one pattern uses Sqoop). A Hive reconciliation step follows each stage, and every run produces logging, metrics, and balancing output.
  • 23. Hadoop Ingestor + Syncsort DMX-h (Benefits): • One code base for ingestion • Choice of CDC patterns • Every table is ingested the same way • Jobs don’t need to be recoded for a new framework; DMX-h shields us • Every table is reconciled and balanced • Metrics are captured for run time and row counts • Ingestion time remains the same no matter the table count. “Ingestion has gone from days to hours, and Progressive IT has a single entry point and code base for Data Lake ingestion.” (Ingestion Team)
  • 25. Learn More! Visit us at Booth #1102. Get a demo and pick up your Data Liberator t-shirt!
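Slide 19 describes moving change data from a raw history layer into a current-view target. A CDC-enabled source supplies I/U/D records, while an incremental extract supplies only I/U records. The talk implements this in DMX-h DTL, which is not shown.

The Python sketch below illustrates only the general technique, under assumed record shapes. Each change is a hypothetical `(op, key, row)` tuple, and changes are ordered by change sequence. A second function shows how I/U/D records can be derived for a source with no CDC by diffing two full extracts. The function names and the table contents are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, object, Row]  # hypothetical shape: (op, key, row); row unused for deletes


def apply_changes(current: Dict[object, Row], changes: Iterable[Change]) -> Dict[object, Row]:
    """Replay I/U/D records (already ordered by change sequence) onto a copy
    of the current view and return the new current view."""
    view = dict(current)  # leave the caller's view untouched
    for op, key, row in changes:
        if op in ("I", "U"):
            view[key] = row          # insert, or overwrite with the latest version
        elif op == "D":
            view.pop(key, None)      # delete the key if it is present
        else:
            raise ValueError(f"unknown op {op!r}")
    return view


def diff_snapshots(old: Dict[object, Row], new: Dict[object, Row]) -> List[Change]:
    """Derive I/U/D records from two full extracts keyed by primary key."""
    changes: List[Change] = []
    for key, row in new.items():
        if key not in old:
            changes.append(("I", key, row))
        elif old[key] != row:
            changes.append(("U", key, row))
    for key in old:
        if key not in new:
            changes.append(("D", key, {}))
    return changes


# Hypothetical full extracts from two consecutive days
yesterday = {1: {"status": "active"}, 2: {"status": "active"}}
today = {1: {"status": "lapsed"}, 3: {"status": "active"}}
changes = diff_snapshots(yesterday, today)
print(changes)  # [('U', 1, {'status': 'lapsed'}), ('I', 3, {'status': 'active'}), ('D', 2, {})]
print(apply_changes(yesterday, changes) == today)  # True
```

With an incremental extract, the change stream contains no `"D"` records. In that case deleted source rows persist in the current view unless they are detected some other way.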

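Slides 22 and 23 say that every table is reconciled and balanced, and that run-time and row-count metrics are captured. The talk does not show how this is done.

The Python sketch below illustrates one common approach: it compares row counts between consecutive stages and times the check. It assumes that each stage should hold the same number of rows for a given batch. That holds for a full-extract batch, but not for a target that has applied deletes. The stage names, the `RunMetrics` record, and the lambdas standing in for `SELECT COUNT(*)` queries are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunMetrics:
    """Hypothetical per-table metrics record for one ingestion run."""
    table: str
    row_counts: Dict[str, int] = field(default_factory=dict)
    seconds: float = 0.0
    balanced: bool = False
    issues: List[str] = field(default_factory=list)


def reconcile(table: str, stages: Dict[str, Callable[[], int]]) -> RunMetrics:
    """Count rows at each stage (in dict order) and flag any stage whose count
    differs from the previous stage. Each callable returns a row count, for
    example by running SELECT COUNT(*) against that stage's table."""
    metrics = RunMetrics(table)
    start = time.perf_counter()
    prev_name, prev_count = None, None
    for name, count_rows in stages.items():
        n = count_rows()
        metrics.row_counts[name] = n
        if prev_count is not None and n != prev_count:
            metrics.issues.append(f"{prev_name}={prev_count} vs {name}={n}")
        prev_name, prev_count = name, n
    metrics.seconds = time.perf_counter() - start
    metrics.balanced = not metrics.issues
    return metrics


# Hypothetical counts standing in for source, Hive staging, and Hive raw queries
result = reconcile("claims", {"source": lambda: 1000, "staging": lambda: 1000, "raw": lambda: 998})
print(result.balanced, result.issues)  # False ['staging=1000 vs raw=998']
```

Keeping the counting functions as callables lets the same check run against any stage, whether that stage is a JDBC source, a Hive table, or a file count.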
Editor's Notes

  1. Move Access and Transformation to the top
  2. We focus on two main use cases. The first is ETL or ELT onboarding: moving data processing off expensive platforms like Teradata and onto Hadoop. The second, one of our biggest use cases, is what we call mainframe access and integration.
  3. Syncsort/Hortonworks reference architecture: deployed by Ambari on every node; handles data movement and transformation; runs on MapReduce or Spark.
  4. Syncsort’s data integration has always delivered the ability to process large data volumes in less time with fewer resources. However, performance and efficiency are just starting points. It became apparent in speaking with our customers a few years ago, particularly when “Big Data” and Hadoop took off, that they were facing new challenges with a common theme of complexity. The rapid evolution of Big Data technologies presents several challenges on its own. New technologies require new specialized skills that continue to be in short supply, and very expensive if you can find them. Because execution frameworks continue to improve, customers don’t want to feel locked in: they don’t want to redevelop all their jobs to take advantage of innovative new frameworks. A great example is the progression from MapReduce v1 to MapReduce v2 to Spark. New sources and types of data add to the complexity as well, with streaming sources as a recent example. There are also connectivity and skills challenges. Many of our customers are large enterprises that still rely heavily on the mainframe, and these companies found it very difficult to integrate the mainframe into the rest of their Big Data integration strategy. Organizations were building data lakes but then struggling to fill them. We heard from customers and partners alike that ingesting data from all their enterprise sources (mainframe, data warehouse, etc.) into Hadoop was a big problem to solve. So our product strategy has focused not only on delivering data integration products with exceptional performance, efficiency, and lower TCO, but also on simplifying data integration for all enterprise data sources and across all platforms (Linux, Unix, Windows, Hadoop, Spark), on premises or in the cloud.
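The note above describes organizations struggling to fill their data lakes. Slides 20 and 21 describe DataFunnel and the Hadoop Ingestor generating Hive schemas for many tables from source metadata, but neither tool's internals are shown.

As an illustration of metadata-driven schema generation only, the Python sketch below turns a list of source columns into a Hive `CREATE TABLE ... STORED AS ORC` statement. The table is partitioned by a load-date column, in the spirit of the daily-partitioned ORC layers on slide 19. The type mapping, the `load_date` column name, and the `hive_ddl` function are hypothetical, and the mapping covers only a handful of types.

```python
from typing import List, Tuple

# Hypothetical mapping from source (RDBMS) type names to Hive types; far from complete.
TYPE_MAP = {
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "DOUBLE": "DOUBLE",
}


def hive_ddl(db: str, table: str, columns: List[Tuple[str, str]],
             partition_col: str = "load_date") -> str:
    """Build a Hive CREATE TABLE statement for an ORC table partitioned by a
    string load-date column. `columns` is a list of (name, source_type)."""
    cols = []
    for name, src_type in columns:
        base = src_type.upper().split("(")[0]  # drop length, e.g. VARCHAR(20) -> VARCHAR
        if base not in TYPE_MAP:
            raise ValueError(f"no Hive mapping for {src_type}")
        cols.append(f"  `{name}` {TYPE_MAP[base]}")
    return (
        f"CREATE TABLE IF NOT EXISTS {db}.{table} (\n"
        + ",\n".join(cols)
        + f"\n)\nPARTITIONED BY (`{partition_col}` STRING)\nSTORED AS ORC;"
    )


print(hive_ddl("raw", "claims", [("claim_id", "BIGINT"), ("status", "VARCHAR(20)")]))
```

Hive does not allow the partition column to appear in the regular column list, so the sketch keeps `load_date` out of `columns`. Failing loudly on an unmapped type is a deliberate choice: silently falling back to `STRING` could hide schema drift in a source system.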