2018 Big Data Trends:
Liberate, Integrate, and Trust Your Data
Paige Roberts, Big Data Product Marketing Manager
Today’s Speaker
Paige Roberts
Product Marketing Manager
• DMX/DMX-h
• DataFunnel™
• DMX Change Data Capture
Agenda
Who is Syncsort and Why Did We Do This Survey
Big Picture on Big Data
Who Participated in the Big Data Trends Survey
5 Big Data Trends and How Syncsort Addresses Them
– 1. More Enterprise Data Flows Into the Data Lake
– 2. Data Quality Moves to Center Stage
– 3. Data Governance Expands
– 4. Data Lakes Stay Fresher
– 5. Big Data: Stronger than Ever
How Syncsort Addresses These Trends
Questions
Who is Syncsort?
Syncsort: Trusted Industry Leadership
• 500+ experienced & talented data professionals
• >7,000 customers
• Founded in 1968 – 50 years of market leadership & award-winning customer support
• 84 of the Fortune 100 are customers
• 3x revenue growth in the last 12 months
The global leader in Big Iron to Big Data
Use Cases & Strategic Partnerships
Data Infrastructure Optimization
• Mainframe Optimization
• Application Modernization
• EDW Optimization
• Cross-Platform Capacity Management

Data Availability
• High Availability & Disaster Recovery
• Mission-Critical Migration
• Cross-Platform Data Sharing
• IBM i Data Security & Audit

Data Integration
• Mainframe Access & Integration for Machine Data
• Mainframe Access & Integration for App Data
• High-performance ETL

Data Quality
• Data Governance
• Customer 360
• Big Data Quality & Integration
• Data Enrichment & Validation

Big Iron to Big Data: A fast-growing market segment composed of solutions that optimize traditional data systems and deliver mission-critical data from these systems to next-generation analytic environments.
Big Picture on Big Data
Advantages of the Modern Big Data Architecture
What do customers want to use their Hadoop clusters for?
1. ETL
2. Analytics*
3. Data Blending
4. Active Archive
5. EDW / Mainframe Optimization
Implementation Challenges
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Teams need to learn a new set of skills, and Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.

[Chart: Big Data Challenges – % of respondents who rated each a top challenge (1 or 2) for Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, and Connectivity]
Who Participated in the Survey
Who Participated in the Big Data Trends Survey?
Main Industries Represented:
1. Financial Services
2. Healthcare
3. Information Services
4. Government
5. Retail
6. Insurance

Main Roles Represented:
1. Data Architects
2. Developers
3. IT Managers
4. Data Scientists
5. Variety of other roles
Five 2018 Big Data Trends
1. More Data Flows Into the Data Lake
Implementation Challenges
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Teams need to learn a new set of skills, and Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.

[Chart: Big Data Challenges – % of respondents who rated each a top challenge (1 or 2) for Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, and Connectivity]
What data do people need to get into their Hadoop clusters?
1. Relational Databases
2. Enterprise Data Warehouses
3. NoSQL Databases and Third-Party Data
4. Cloud Repositories
5. Mainframe Data
6. Web / Mobile / Social Media Data
7. AIX Power Systems and IBM i Data
8. Machine / Sensor Data

[Chart: % of respondents who need each source in the data lake – RDBMS 69%; the remaining bars (Enterprise Data Warehouse, NoSQL Databases, Files from Third-Party Data Providers or Partners, Cloud Repositories, Mainframe, Web/Mobile/Social Media, AIX Power Systems, Machines/Sensors, IBM i, Other) show values ranging from 62% down to 0.5%]
How Valuable is Mainframe and IBM i Data in a Data Lake?
Over 97% of respondents with mainframes believe it’s valuable to access and integrate that data in the data lake.
Over 90% of organizations that have IBM i say it is valuable to integrate that data with Hadoop.
Populating the Data Lake with Progressive
Challenge:
• Easily access and integrate operational data, such as Claims Liability, Policy, Customer and Incident data, for advanced analytics.
• Fill a Hortonworks Data Lake with 500+ tables from mainframe DB2, Oracle and SQL Server, for cost-effective storage and analytics.
• Track day-to-day changes in the data.

Solution:
• DMX DataFunnel easily and quickly ingested all database tables with the click of a button.
• DMX-h used on the Hortonworks Data Platform cluster to determine daily changes from both full and incremental data files.

Benefit / Business Value:
• Simplicity: Single tool to ingest, detect changes and populate the data lake.
• Faster Development & Implementation: DataFunnel ingested data much faster than using open source tools.
• Skills: Developers don’t need in-depth knowledge of Hadoop.
• Insight: Better analytics with readily accessible operational data.
• Compliance: Ability to build audit trails & keep the EDW current.
• Agility: Reclaim development time by automating, optimizing and future-proofing development.
• Costs: Lower archival costs.

The Progressive Group of Insurance Companies lives up to its name by being one step ahead of the insurance industry, innovating with the latest technology to make it easy to understand, buy and use auto insurance. They began offering the first drive-in claims office in 1937, pioneered online auto insurance policy sales in 1997, and customize premiums based on customers’ actual driving patterns. Progressive has been recognized as a top business technology innovator by InformationWeek 17 years in a row.
2. Data Quality Moves to Center Stage
Implementation Challenges
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Teams need to learn a new set of skills, and Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.

[Chart: Big Data Challenges – % of respondents who rated each a top challenge (1 or 2) for Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, and Connectivity]
Big Data deemed untrustworthy by business managers/leaders
Only 33% of senior execs have a high level of trust in the accuracy of their Big Data analytics. ~ KPMG 2016
85% of global execs say major investments are needed to update existing data platforms, including data cleaning and consolidation. ~ Bain 2015
59% of global execs do not believe their company has the capabilities to generate meaningful business insights from their data. ~ Bain 2015
Three Insights on Data Quality in Big Data Architectures
The greater the diversity of data, the greater the need for data quality processes.
– Over 60% of respondents said storing enterprise-wide data was critical to supporting their business.
– Respondents cited an average of four sources each.
– Respondents who identified five or more sources were 4X as likely to name data quality as a critical factor in a successful data lake implementation.

Financial services and insurance industries are the most focused on data quality and governance.
– Highly regulated industries, with high cost of non-compliance.
– 60% in these industries named data quality as most critical compared to 40% in other industries.

Not everyone is making the connection between quality and business benefits.
– 70% of respondents who did not include data quality as a top priority put Advanced/Predictive Analytics as their top use case.
– Increased reliance of executives on analytics insights should go hand-in-hand with trusted, high quality data.
Washing Out Money Laundering at a Large UK-Based Bank
Business Challenge:
• Selected BAE Systems’ NetReveal as its new Anti-Money Laundering (AML) solution, operating on a Hadoop data lake.
• Hadoop functionality was key to meeting next-gen AML transaction monitoring and FCA compliance demands using an efficient, inexpensive distributed architecture.
• Needed a new data quality solution for party/entity matching in Hadoop to support the new AML solution.

Solution:
• Trillium Quality for Big Data was selected after a competitive RFP process as the solution of choice for party/entity matching in the data lake.

Benefit / Business Value:
• Proven speed and performance in Hadoop using integrated DMX-h Intelligent eXecution functionality.
• Ability to leverage existing Trillium Software System skills, i.e., visual creation of data quality jobs.
• Proven domain expertise: TSS is in active use elsewhere in the company, and the Trillium team also showed its domain expertise, such as proper SWIFT processing.
• Native processing of data quality jobs within the Hadoop “financial crimes database” at high performance and massive scale.
• Will support AML compliance for many years to come.

A UK-based bank serving over 30 million customers, providing current (checking) accounts, savings, personal loans, credit cards and mortgages. Employing over 75,000 people, this bank funds a large percentage of UK new-build properties and lends to many first-time UK home buyers.
3. Data Governance Expands
Implementation Challenges
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Teams need to learn a new set of skills, and Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.

[Chart: Big Data Challenges – % of respondents who rated each a top challenge (1 or 2) for Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, and Connectivity]
Data Quality & Data Governance Work Together
DATA QUALITY: The processes that help ensure data is understood, corrected and monitored to ensure TRUST and COMPLIANCE.

DATA GOVERNANCE: Collection of practices and processes which help ensure the formal management of data assets within an organization.

Source: Data Governance vs Data Quality: Managing Data-Driven Solutions. www.dataversity.com

[Diagram: overlapping data quality and data governance activities – Data Availability, Data Compliance, Defining Key Data Elements, Assigning Data Stewards, Data Consistency, Data Cleansing, Enrichment, Monitoring, Standardization, Defining Policies, Consistent Analytics, Metrics & Reporting, Parsing, Matching, Discovery & Profiling, Data Lineage]
Data Quality Processing for Compliance
Cleanse data while improving contextual understanding:
• Parse data values from unstructured fields into useful, usable new attributes.
• Verify and enrich global postal addresses.
• Standardize values for matching and linking.
• Enrich data with external, third-party sources to create comprehensive, unified records.
• Link records spanning multiple sources of personal data related to the same customer.
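As a rough, generic illustration of these steps (parsing, standardization, and record linkage), and not of Trillium Quality itself, here is a minimal Python sketch; the record layout, field names, and matching rule are assumptions made up for the example:

```python
# Minimal, illustrative data quality pass: parse a free-text name field,
# standardize a postcode, and link records that appear to describe the
# same person. A toy sketch of the pattern, not Trillium Quality for Big Data.
import re
from collections import defaultdict

records = [
    {"id": 1, "name": "Dr. Jane Q. Smith", "postcode": "sw1a1aa"},
    {"id": 2, "name": "SMITH, JANE",       "postcode": "SW1A 1AA"},
    {"id": 3, "name": "John Doe",          "postcode": "EC2V 7HH"},
]

def parse_name(raw):
    """Split a free-text name into parsed first/last attributes (simplified)."""
    raw = re.sub(r"\b(dr|mr|mrs|ms)\.?\s+", "", raw, flags=re.I).strip()
    if "," in raw:                        # "LAST, FIRST" layout
        last, first = [p.strip() for p in raw.split(",", 1)]
    else:                                 # "FIRST [MIDDLE] LAST" layout
        parts = raw.split()
        first, last = parts[0], parts[-1]
    return {"first": first.title(), "last": last.title()}

def standardize_postcode(raw):
    """Uppercase and strip spacing so equivalent postcodes compare equal."""
    return re.sub(r"\s+", "", raw).upper()

def match_key(rec):
    """Standardized key used to link records for the same person."""
    return (rec["last"].lower(), rec["first"][0].lower(), rec["postcode"])

clusters = defaultdict(list)
for rec in records:
    rec.update(parse_name(rec["name"]))
    rec["postcode"] = standardize_postcode(rec["postcode"])
    clusters[match_key(rec)].append(rec["id"])

print(dict(clusters))   # records 1 and 2 link into the same cluster
```

In practice the same steps run at data-lake scale, with reference data (postal files, third-party sources) driving the standardization and enrichment.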
4. Data Lakes Stay Fresher
Implementation Challenges
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Teams need to learn a new set of skills, and Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.

[Chart: Big Data Challenges – % of respondents who rated each a top challenge (1 or 2) for Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, and Connectivity]
Keeping the Data Lake Fresh: Even Harder Than You Think
• Keeping data in the data lake fresh is difficult, especially when the source is mainframe data.
• Transactional sources change with each transaction – often millions per day.
• Each source has its own way of tracking data changes.
• Some Hadoop targets such as Hive don’t even support fast updating.
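A common workaround when the Hive target can't be updated quickly in place is to land captured changes in a staging table and periodically reconcile them with the base snapshot. The PySpark sketch below illustrates that base-plus-delta pattern under assumed table names and columns (customer_id, op_type, op_ts); it is not a description of how Syncsort's CDC products work internally.

```python
# Base-plus-delta reconciliation: merge the last snapshot with captured
# changes, keep the newest version of each key, drop deleted keys, and
# rewrite the result. Table and column names here are assumptions.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

base   = spark.table("lake.customers")              # last reconciled snapshot
deltas = spark.table("lake.customers_cdc_staging")  # captured inserts/updates/deletes

# Assume deltas carries the same business columns plus op_type ('I','U','D') and op_ts.
merged = (
    base.withColumn("op_type", F.lit("I"))
        .withColumn("op_ts", F.lit(0).cast("long"))
        .unionByName(deltas)
)

latest_first = Window.partitionBy("customer_id").orderBy(F.col("op_ts").desc())

reconciled = (
    merged.withColumn("rn", F.row_number().over(latest_first))
          .filter("rn = 1")                          # newest row per key wins
          .filter(F.col("op_type") != "D")           # drop keys whose latest op is a delete
          .drop("rn", "op_type", "op_ts")
)

# In practice, write to a staging table or partition and swap, so readers
# never see a half-written snapshot.
reconciled.write.mode("overwrite").saveAsTable("lake.customers_reconciled")
```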
Mastering Data Assets with Guardian
Guardian Life Insurance has 150 years of protection solutions, a long history of strong, successful customer relationships, and 20 years on the Fortune 500 list. Guardian uses state-of-the-art technology to drive awareness and engagement for optimal results. Flexible funding options to meet each customer’s unique needs, fast and accurate claims, and long-term financial strength have led to award-winning, customer-focused service.

“We found DMX-h to be very usable and easy to ramp up in terms of skills. Most of all, Syncsort has been a very good partner in terms of support and listening to our needs.” – Alex Rosenthal, Enterprise Data Office

“Syncsort’s DataFunnel™ has been a powerful tool in our data lake strategy. We were able to ingest into Hadoop over 800 tables from one source system … with one press of the button.”

Business Challenge:
• Include mainframe data in comprehensive data-as-a-service for internal self-service analytics.
• Ingest to HDFS hundreds of mainframe DB2 tables, hundreds of Oracle tables and 11 VSAM data sets.
• Time-to-market for analytics projects was unacceptable (6-12 months) and not repeatable.
• 100TB of DB2/z data to monitor for changes; batch CDC couldn’t keep it current fast enough.

Solution:
• DMX-h to easily load VSAM data to HDFS; connect, transfer and translate data.
• DMX DataFunnel to quickly and easily load over 800 tables from DB2 and Oracle.
• Migrated 49 COBOL and 14 JCL jobs from the mainframe to DMX-h.
• DMX CDC grabs delta changes in real time and pushes them directly to Hive.

Benefit / Business Value:
• Hard-to-access mainframe data all included for comprehensive analytics.
• Simplified transformation processes and reused data assets.
• Hundreds of man-hours saved.
• 1.4 terabytes of Oracle data loaded in 3.5 hours.
• No 3rd-party software installed on the mainframe.
• Shortened time-to-market for data and analytics projects.
• Centralized, standardized, reusable data assets that are searchable, accessible and managed.
• Increased ease of self-service customized report building & dashboarding.
• 50 different business applications depend on this data, which is now better managed and more current, making their analytics output more trustworthy.
5. Big Data: Stronger Than Ever
Benefits Businesses are Actually Getting from Big Data
• Increase Productivity
• Reduce Costs
• Next-Gen Analytics
• Increase Revenue and Growth
• Archive Data
• Increase Agility
• Get More for EDW / Mainframe Investment
• Retain Data for Compliance
• Free Mainframe Resources and Reduce Costs
Insurance Company Moves Historical Data to Azure Cloud
Before:
• One year of sales data available to key business apps, stored on expensive DASD storage.
• 97 TB of historical data stored on unreadable, inaccessible virtual tape.
• Key business applications had no daily access to historical data; Syncsort MFX could run several jobs to access that data in a few WEEKS if it was needed for a quote, etc.

Solution:
• Syncsort MFX converted the virtual tape data to mainframe variable-length format.
• Syncsort DMX used over 300 copybooks to translate the mainframe variable-length data into human-readable text and remove duplicates.
• Microsoft Azure Data Import Service put all 97 TB into Cloudera CDH in the Azure cloud.
• Key business applications moved to the Cloud.
• All sales data encrypted securely in the Cloud.
• Applications now have instant access to all 97 TB of historical data.
Before: Current data on expensive mainframe DASD; 18 years of older sales data on inaccessible virtual tape. A mainframe app does quotes and checks sold cases and rejects, but has NO ACCESS to the archived data.

After (with MFX, DMX & Azure): A cloud app gives quotes, reports sold cases and rejects in seconds, with instant access to all data.
How Syncsort Addresses the Trends
Syncsort Helps You Beat the Challenges of Big Data
• Get mainframe data into Hadoop easily, in Hadoop format, or even original mainframe format.
• Secure, govern, manage and monitor the entire process.
• Bridge the Big Iron to Big Data skills gap.
• Reduce development time from weeks to days.
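As a toy illustration of what "translating original mainframe format" involves (EBCDIC text plus packed-decimal fields, normally driven by a COBOL copybook), here is a small Python sketch; the record layout and sample bytes are invented for the example:

```python
# Decode a made-up mainframe-style record: a 12-byte EBCDIC name field
# followed by a 4-byte COMP-3 (packed decimal) amount. Copybook-driven
# tools automate exactly this kind of field-by-field translation.
def unpack_comp3(raw: bytes, scale: int = 2) -> float:
    """Decode a COBOL COMP-3 (packed decimal) field."""
    nibbles = []
    for b in raw:
        nibbles.extend((b >> 4, b & 0x0F))
    sign = -1 if nibbles[-1] == 0x0D else 1          # last nibble is the sign
    value = int("".join(str(d) for d in nibbles[:-1]))
    return sign * value / (10 ** scale)

# Build a sample record: "JANE SMITH" in EBCDIC (code page 037) + 12345.67 packed.
record = "JANE SMITH".ljust(12).encode("cp037") + bytes.fromhex("1234567C")

name   = record[:12].decode("cp037").rstrip()
amount = unpack_comp3(record[12:16])
print(name, amount)                                   # JANE SMITH 12345.67
```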
Get Your Database Data into Hadoop at the Press of a Button
DMX DataFunnel™
• Funnel hundreds of tables at once into your data lake
‒ Extract, map and move whole DB schemas in one invocation
‒ Extract from DB2, Oracle, Teradata, Netezza, S3, Redshift …
‒ To SQL Server, Postgres, Hive, Redshift and HDFS
‒ Automatically create target tables
• Process multiple funnels in parallel on edge node or data nodes
‒ Order data flows by dependencies
‒ Leverage DMX-h high performance data processing engine
• Filter unwanted data before extraction
‒ Data type filtering
‒ Table, record or column exclusion / inclusion
• In-flight transformations and cleansing
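As a rough sketch of this whole-schema "funnel" pattern using generic open tooling (Spark JDBC) rather than DataFunnel itself; the connection URL, credentials, and table list are placeholders:

```python
# Illustrative whole-schema ingest: pull a filtered list of source tables in
# parallel over JDBC and land each as a Hive table. A generic sketch of the
# pattern only; connection details and table names are made up.
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

JDBC_URL = "jdbc:oracle:thin:@//source-host:1521/ORCLPDB"   # hypothetical source
PROPS = {"user": "etl_user", "password": "***", "driver": "oracle.jdbc.OracleDriver"}

tables   = ["CLAIMS", "POLICIES", "CUSTOMERS", "TEMP_LOAD"]  # would come from the source catalog
excluded = {"TEMP_LOAD"}                                     # table-level exclusion filter

def funnel_one(table):
    """Extract one source table and create/overwrite the matching Hive table."""
    df = spark.read.jdbc(JDBC_URL, table, properties=PROPS)
    df.write.mode("overwrite").saveAsTable(f"lake.{table.lower()}")
    return table

with ThreadPoolExecutor(max_workers=4) as pool:
    for done in pool.map(funnel_one, [t for t in tables if t not in excluded]):
        print(f"loaded {done}")
```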
Trillium Quality for Big Data
Intelligent Execution enables deployment to Hadoop MapReduce and Spark.
Verify and enrich global postal addresses using global postal reference sources.
Enrich data from external, third-party sources to create comprehensive, unified
records, enabling 360-degree views of the customer and other key business entities.
Identify records that belong to the same domain (i.e., household or business).
Parse data values to their correct fields and standardize for better matching.
Match like records and eliminate duplicates.
Easily Create Data Quality Workflows on Hadoop Without MapReduce or Spark Coding
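To complement the parsing and matching sketch earlier, here is a small, generic illustration of the "match like records and eliminate duplicates" step with a simple survivorship rule (keep the most complete record); the field names, values, and rule are assumptions, not Trillium behavior:

```python
# Toy matching + survivorship: group candidate duplicates by a standardized
# key, then keep the record with the most populated fields as the survivor.
from collections import defaultdict

records = [
    {"id": 1, "first": "Jane", "last": "Smith", "postcode": "SW1A1AA", "phone": None},
    {"id": 2, "first": "Jane", "last": "Smith", "postcode": "SW1A1AA", "phone": "02079460000"},
    {"id": 3, "first": "John", "last": "Doe",   "postcode": "EC2V7HH", "phone": None},
]

def match_key(rec):
    """Blocking/matching key built from standardized fields."""
    return (rec["last"].lower(), rec["first"][0].lower(), rec["postcode"])

groups = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec)

def completeness(rec):
    """Survivorship score: count of non-empty fields."""
    return sum(1 for v in rec.values() if v not in (None, ""))

survivors = [max(group, key=completeness) for group in groups.values()]
print([r["id"] for r in survivors])   # record 2 survives the Jane Smith pair
```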
Syncsort Enables Governance
Metadata and data lineage for Hive, Avro and Parquet through HCatalog
Metadata lineage from DMX/DMX-h, Trillium Quality for Big Data
– Simplify audits, analytics dashboards, metrics
– Run-time job metadata and lineage REST API
– Integrate with enterprise metadata repositories like ASG
Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Business and structural metadata
– Audit and track data from source to cluster
Apache Atlas ingestion lineage integration
– Audit and track data from source to cluster
– Detailed field level lineage
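As a hedged sketch of what "ingestion lineage integration" can look like at the API level, here is a minimal example that registers a Process entity linking an input table to an output table through Apache Atlas's v2 REST endpoint; the host, credentials, type names, and qualified names are assumptions that vary by deployment, and this is not Syncsort's own integration code:

```python
# Register ingest lineage in Apache Atlas: a Process entity whose inputs and
# outputs reference already-registered dataset entities. Host, credentials,
# and entity names below are placeholders for illustration.
import requests

ATLAS = "http://atlas-host:21000/api/atlas/v2"
AUTH = ("admin", "admin")                      # hypothetical credentials

process_entity = {
    "entity": {
        "typeName": "Process",                 # or a custom subtype for your ingest tool
        "attributes": {
            "qualifiedName": "ingest.claims_to_lake@cluster1",
            "name": "ingest claims DB to data lake",
            "inputs":  [{"typeName": "rdbms_table",
                         "uniqueAttributes": {"qualifiedName": "claims.policies@cluster1"}}],
            "outputs": [{"typeName": "hive_table",
                         "uniqueAttributes": {"qualifiedName": "lake.policies@cluster1"}}],
        },
    }
}

resp = requests.post(f"{ATLAS}/entity", json=process_entity, auth=AUTH)
resp.raise_for_status()
print(resp.json().get("guidAssignments"))      # GUIDs Atlas assigned to new entities
```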
Syncsort Real-Time Change Data Capture
Keep data in sync in real time:
• Without overloading networks.
• Without affecting source database performance.
• Without coding or tuning.

Sources: IBM DB2, IBM Informix, Oracle, Oracle RAC, Sybase, MS SQL Server
Targets: HDFS, Hive, IBM DB2, IBM Informix, Oracle, Oracle RAC, Sybase, MS SQL Server, Teradata, MySQL, PostgreSQL

Dependable – Reliable transfer of data even if connectivity fails on either side.
Fast – Captures changes in the source as they happen. Updates table statistics for faster queries.
Flexible – Writes to HDFS and all Hive tables, including those backed by text, ORC, Parquet or Avro, and most major RDBMSs. Even updates Hive versions that don’t support updates.

Real-Time Replication with Transformation
Conflict Resolution, Collision Monitoring, Tracking and Auditing
Implementation Challenges
[Chart: Big Data Challenges – % of respondents who rated each a top challenge (1 or 2) for Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, and Connectivity]

1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Teams need to learn a new set of skills, and Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.
Design Once, Deploy Anywhere
• Use existing ETL skills
• No need to worry about mappers, reducers, big side or small side of joins, etc.
• Automatic optimization for best performance, load balancing, etc.
• No changes or tuning required, even if you change execution frameworks
• Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x
• Run multiple execution frameworks in a single job

Single GUI – Execute Anywhere!
Intelligent Execution – Insulate your organization from the underlying complexities of Big Data.
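As a toy analogue of the "design once, deploy anywhere" idea (not DMX-h's Intelligent eXecution engine), the sketch below defines the job logic once against a tiny engine-neutral interface and picks the execution framework at run time; the input file name and column are invented:

```python
# One job definition, two execution engines: the load/filter/write steps are
# supplied per engine, while the job logic itself never changes.
import pandas as pd

def job(load, filter_rows, write):
    """Engine-agnostic job: load -> filter -> write."""
    df = load("orders.csv")                    # hypothetical input with an 'amount' column
    df = filter_rows(df, "amount > 100")
    write(df, "big_orders")

def run_on_pandas():
    job(load=pd.read_csv,
        filter_rows=lambda df, cond: df.query(cond),
        write=lambda df, name: df.to_parquet(f"{name}.parquet"))

def run_on_spark():
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    job(load=lambda path: spark.read.csv(path, header=True, inferSchema=True),
        filter_rows=lambda df, cond: df.filter(cond),
        write=lambda df, name: df.write.mode("overwrite").parquet(f"{name}.parquet"))

if __name__ == "__main__":
    run_on_pandas()    # switching engines is a one-line change: run_on_spark()
```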
Syncsort Makes ALL Data Accessible & Usable – Ready for Analytics
Get the ebook: 2018 Big Data Trends: Liberate, Integrate and Trust
http://www.syncsort.com/en/Resource-Center/BigData/eBooks/2018-Big-Data-Trends-Liberate-Integrate-Trust
Contact Syncsort sales to get the latest Syncsort info: http://www.syncsort.com/en/ContactSales
Questions
THANK YOU!
More Related Content

What's hot

¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
Data IQ Argentina
 

What's hot (20)

Neo4j Graph Data Platform: Making Your Data More Intelligent
Neo4j Graph Data Platform: Making Your Data More IntelligentNeo4j Graph Data Platform: Making Your Data More Intelligent
Neo4j Graph Data Platform: Making Your Data More Intelligent
 
5 ways big data benefits consumers
5 ways big data benefits consumers5 ways big data benefits consumers
5 ways big data benefits consumers
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analytics
 
Big, small or just complex data?
Big, small or just complex data?Big, small or just complex data?
Big, small or just complex data?
 
Guide to big data analytics
Guide to big data analyticsGuide to big data analytics
Guide to big data analytics
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences
3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences
3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences
 
Hacked: Threats, Trends and the Power of Connected Data
Hacked: Threats, Trends and the Power of Connected DataHacked: Threats, Trends and the Power of Connected Data
Hacked: Threats, Trends and the Power of Connected Data
 
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of thingsBig Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
 
Essential Tools For Your Big Data Arsenal
Essential Tools For Your Big Data ArsenalEssential Tools For Your Big Data Arsenal
Essential Tools For Your Big Data Arsenal
 
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
 
Big Data 2.0
Big Data 2.0Big Data 2.0
Big Data 2.0
 
Bigdata
BigdataBigdata
Bigdata
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data
Big dataBig data
Big data
 
Delivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data LakeDelivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data Lake
 
Big Data Expo 2015 - Cisco Connected Analytics
Big Data Expo 2015 - Cisco Connected AnalyticsBig Data Expo 2015 - Cisco Connected Analytics
Big Data Expo 2015 - Cisco Connected Analytics
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
IoT and Big Data
IoT and Big DataIoT and Big Data
IoT and Big Data
 

Similar to 2018 Big Data Trends: Liberate, Integrate, and Trust Your Data

¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
CompTIA
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Denodo
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives
☁Jake Weaver ☁
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
Denodo
 

Similar to 2018 Big Data Trends: Liberate, Integrate, and Trust Your Data (20)

Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced Analytics
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
 
Foundational Strategies for Trust in Big Data Part 3: Data Lineage
Foundational Strategies for Trust in Big Data Part 3: Data LineageFoundational Strategies for Trust in Big Data Part 3: Data Lineage
Foundational Strategies for Trust in Big Data Part 3: Data Lineage
 
Driving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsDriving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data Assets
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 
Hadoop Perspectives for 2017
Hadoop Perspectives for 2017Hadoop Perspectives for 2017
Hadoop Perspectives for 2017
 
Hadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyHadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata Company
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
7 Big Data Challenges and How to Overcome Them
7 Big Data Challenges and How to Overcome Them7 Big Data Challenges and How to Overcome Them
7 Big Data Challenges and How to Overcome Them
 
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives
 
Modern Data Challenges require Modern Graph Technology
Modern Data Challenges require Modern Graph TechnologyModern Data Challenges require Modern Graph Technology
Modern Data Challenges require Modern Graph Technology
 
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
TDWI Spotlight: Enabling Data Self-Service with Security, Governance, and Reg...
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
 
Intel Big Data Analysis Peer Research Slideshare 2013
Intel Big Data Analysis Peer Research Slideshare 2013Intel Big Data Analysis Peer Research Slideshare 2013
Intel Big Data Analysis Peer Research Slideshare 2013
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overview
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
 
Don’t Bring Old Problems to Your New Cloud Data Warehouse
Don’t Bring Old Problems to Your New Cloud Data Warehouse Don’t Bring Old Problems to Your New Cloud Data Warehouse
Don’t Bring Old Problems to Your New Cloud Data Warehouse
 
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerEnsuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
 

More from Precisely

How to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdfHow to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdf
Precisely
 
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter MassendatenZukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
Precisely
 
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Precisely
 
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3fTestjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Precisely
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
Precisely
 
Moving IBM i Applications to the Cloud with AWS and Precisely
Moving IBM i Applications to the Cloud with AWS and PreciselyMoving IBM i Applications to the Cloud with AWS and Precisely
Moving IBM i Applications to the Cloud with AWS and Precisely
Precisely
 
Automate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellenceAutomate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center Excellence
Precisely
 

More from Precisely (20)

How to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdfHow to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdf
 
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter MassendatenZukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Crucial Considerations for AI-ready Data.pdf
Crucial Considerations for AI-ready Data.pdfCrucial Considerations for AI-ready Data.pdf
Crucial Considerations for AI-ready Data.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10
 
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
 
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
 
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3fTestjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
 
Data Innovation Summit: Data Integrity Trends
Data Innovation Summit: Data Integrity TrendsData Innovation Summit: Data Integrity Trends
Data Innovation Summit: Data Integrity Trends
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Optimisez la fonction financière en automatisant vos processus SAP
Optimisez la fonction financière en automatisant vos processus SAPOptimisez la fonction financière en automatisant vos processus SAP
Optimisez la fonction financière en automatisant vos processus SAP
 
SAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
SAPS/4HANA Migration - Transformation-Management + nachhaltige InvestitionenSAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
SAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
 
Automatisierte SAP Prozesse mit Hilfe von APIs
Automatisierte SAP Prozesse mit Hilfe von APIsAutomatisierte SAP Prozesse mit Hilfe von APIs
Automatisierte SAP Prozesse mit Hilfe von APIs
 
Moving IBM i Applications to the Cloud with AWS and Precisely
Moving IBM i Applications to the Cloud with AWS and PreciselyMoving IBM i Applications to the Cloud with AWS and Precisely
Moving IBM i Applications to the Cloud with AWS and Precisely
 
Effective Security Monitoring for IBM i: What You Need to Know
Effective Security Monitoring for IBM i: What You Need to KnowEffective Security Monitoring for IBM i: What You Need to Know
Effective Security Monitoring for IBM i: What You Need to Know
 
Automate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellenceAutomate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center Excellence
 
5 Keys to Improved IT Operation Management
5 Keys to Improved IT Operation Management5 Keys to Improved IT Operation Management
5 Keys to Improved IT Operation Management
 
Unlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter TomorrowUnlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter Tomorrow
 
Navigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar DeckNavigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar Deck
 

Recently uploaded

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

2018 Big Data Trends: Liberate, Integrate, and Trust Your Data

  • 1. 2018 Big Data Trends: Liberate, Integrate, and Trust Your Data Paige Roberts, Big Data Product Marketing Manager
  • 2. Today’s Speaker 2Syncsort Confidential and Proprietary - do not copy or distribute Product Marketing Manager •DMX/DMX-h •DataFunnel™ •DMX Change Data Capture Paige Roberts
  • 3. Agenda Who is Syncsort and Why Did We Do This Survey Big Picture on Big Data Who Participated in the Big Data Trends Survey 5 Big Data Trends and How Syncsort Addresses Them – 1. More Enterprise Data Flows Into the Data Lake – 2. Data Quality Moves to Center Stage – 3. Data Governance Expands – 4. Data Lakes Stay Fresher – 5. Big Data: Stronger than Ever How Syncsort Addresses These Trends Questions 3Syncsort Confidential and Proprietary - do not copy or distribute
  • 5. Syncsort: Trusted Industry Leadership Syncsort Confidential and Proprietary - do not copy or distribute 5 500+ Experienced & Talented Data Professionals >7,000 Customers 1968 50 Years of Market Leadership & Award-Winning Customer Support 84 of Fortune 100 are Customers 3x Revenue Growth In Last 12 Months The global leader in Big Iron to Big Data
  • 6. Use Cases & Strategic Partnerships Syncsort Confidential and Proprietary - do not copy or distribute Data Infrastructure Optimization • Mainframe Optimization • Application Modernization • EDW Optimization • Cross-Platform Capacity Management Data Availability • High Availability & Disaster Recovery • Mission-Critical Migration • Cross-Platform Data Sharing • IBM i Data Security & Audit • Mainframe Access & Integration for Machine Data • Mainframe Access & Integration for App Data • High-performance ETL Data Integration Data Quality • Data Governance • Customer 360 • Big Data Quality & Integration • Data Enrichment & Validation Big Iron to Big Data A fast-growing market segment composed of solutions that optimize traditional data systems and deliver mission-critical data from these systems to next-generation analytic environments. 6
  • 7. Big Picture on Big Data
  • 8. Advantages of the Modern Big Data Architecture 8Syncsort Confidential and Proprietary - do not copy or distribute
  • 9. What do customers want to use their Hadoop clusters for? 9Syncsort Confidential and Proprietary - do not copy or distribute 1.ETL 2.Analytics* 3.Data Blending 4.Active Archive 5.EDW / Mainframe Optimization
  • 10. Implementation Challenges 10Syncsort Confidential and Proprietary - do not copy or distribute 1. Data Quality: Assessing and improving quality of data as it enters and/or in the data lake. 2. Skills/Staff: Need to learn a new set of skills, Hadoop programmers are difficult to find and/or expensive. 3. Data Governance: Including data lake in governance initiatives and meeting regulatory compliance. 4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech. 5. Fresh Data (CDC): Difficult to keep data lake up-to- date with changes made on other platforms. 6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark. 7. Data Movement: Difficult to move data in and out of Hadoop/Spark. 0 5 10 15 20 25 30 35 40 45 % of People Who Consider this a Top Challenge (Rated 1 or 2) Big Data Challenges Data Quality Skills Governance Rapid Change CDC Mainframe Data Movement Cost Connectivity
  • 11. Who Participated in the Survey
  • 12. Who Participated in the Big Data Trends Survey? 12Syncsort Confidential and Proprietary - do not copy or distribute Main Industries Represented: 1. Financial Services 2. Healthcare 3. Information Services 4. Government 5. Retail 6. Insurance 1. Data Architects 2. Developers 3. IT Managers 4. Data Scientists 5. Variety of other roles
  • 13. Five 2018 Big Data Trends
  • 14. 1. More Data Flows Into the Data Lake
  • 15. Implementation Challenges 15Syncsort Confidential and Proprietary - do not copy or distribute 1. Data Quality: Assessing and improving quality of data as it enters and/or in the data lake. 2. Skills/Staff: Need to learn a new set of skills, Hadoop programmers are difficult to find and/or expensive. 3. Data Governance: Including data lake in governance initiatives and meeting regulatory compliance. 4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech. 5. Fresh Data (CDC): Difficult to keep data lake up-to- date with changes made on other platforms. 6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark. 7. Data Movement: Difficult to move data in and out of Hadoop/Spark. 0 5 10 15 20 25 30 35 40 45 % of People Who Consider this a Top Challenge (Rated 1 or 2) Big Data Challenges Data Quality Skills Governance Rapid Change CDC Mainframe Data Movement Cost Connectivity
  • 16. What data do people need to get into their Hadoop clusters? 16Syncsort Confidential and Proprietary - do not copy or distribute 1. Relational Databases 2. Enterprise Data Warehouses 3. NoSQL Databases and Third Party Data 4. Cloud repositories 5. Mainframe data 6. Web / Mobile / Social Media data 7. AIX Power Systems and IBM I data 8. Machine / Sensor data 69% RDMS 46% Enterprise Data Warehouse 45% 41% 32% 30% 30% 0.5% 18% 62% NoSQL Databases Files from Third Party Data, Providers or Partners Cloud Repositories Mainframe Web/Mobil/Social Media AIX Power Systems Machines/Sensors Other IBM i 16 %
  • 17. How Valuable is Mainframe and IBM i Data in a Data Lake? 17Syncsort Confidential and Proprietary - do not copy or distribute Over 97% of respondents with mainframes believe its valuable to access and integrate that data in the data lake. Over 90% of organizations that have IBM i say it is valuable to integrate that data with Hadoop.
  • 18. Populating the Data Lake with Progressive • Easily access and integrate operational data, such as Claims Liability, Policy, Customer and Incident data, for advanced analytics. • Fill Hortonworks Data Lake with 500+ tables from Mainframe DB2, Oracle and SQL Server, for cost-effective storage and analytics • Track day-to-day changes in the data Challenge Solution • DMX DataFunnel easily and quickly ingested all database tables with the click of a button • DMX-h used on Hortonworks Data Platform cluster to determine daily changes from both full and incremental data files • Simplicity: Single tool to ingest, detect changes and populate the data lake • Faster Development & Implementation: DataFunnel ingested data much faster than using open source tools. • Skills: Developers don’t need in-depth knowledge of Hadoop • Insight: Better analytics with readily-accessible operational data • Compliance –Ability to build audit trails & keep the EDW current • Agility: Reclaim development time by automating, optimizing and future-proofing development • Costs: Lower archival costs The Progressive Group of Insurance Companies lives up to its name by being one step ahead of the insurance industry, innovating with the latest technology to make it easy to understand, buy and use auto insurance. They began offering the first drive-in claims office in 1937, pioneered online auto insurance policy sales in 1997, and customize premiums based on customer’s actual driving patterns. Progressive has been recognized as a top business technology innovator by InformationWeek 17 years in a row. Benefit Business Value
  • 19. 2. Data Quality Moves to Center Stage
  • 20. Implementation Challenges 20Syncsort Confidential and Proprietary - do not copy or distribute 1. Data Quality: Assessing and improving quality of data as it enters and/or in the data lake. 2. Skills/Staff: Need to learn a new set of skills, Hadoop programmers are difficult to find and/or expensive. 3. Data Governance: Including data lake in governance initiatives and meeting regulatory compliance. 4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech. 5. Fresh Data (CDC): Difficult to keep data lake up-to- date with changes made on other platforms. 6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark. 7. Data Movement: Difficult to move data in and out of Hadoop/Spark. 0 5 10 15 20 25 30 35 40 45 % of People Who Consider this a Top Challenge (Rated 1 or 2) Big Data Challenges Data Quality Skills Governance Rapid Change CDC Mainframe Data Movement Cost Connectivity
  • 21. Big Data deemed untrustworthy by business managers/leaders 21Syncsort Confidential and Proprietary - do not copy or distribute Only 33% of senior execs have a high level of trust in the accuracy of their Big Data analytics. ~ KPMG 2016 85% of global execs say major investments are needed to update existing data platform, including data cleaning and consolidating. ~ Bain 2015 59% of global execs do not believe their company has capabilities to generate meaningful business insights from their data. ~ Bain 2015
  • 22. Three Insights on Data Quality in Big Data Architectures The greater the diversity of data, the greater the need for data quality processes. – Over 60% of respondents said storing enterprise- wide data was critical to supporting their business. – Respondents cited an average of four sources each. – Respondents who identified five or more sources were 4X as likely to name data quality as a critical factor in a successful data lake implementation. 22Syncsort Confidential and Proprietary - do not copy or distribute Financial services and insurance industries are the most focused on data quality and governance. – Highly regulated industries, with high cost of non-compliance. – 60% in these industries named data quality as most critical compared to 40% in other industries. Not everyone is making the connection between quality and business benefits. – 70% of respondents who did not include data quality as a top priority put Advanced/Predictive Analytics as their top use case. – Increased reliance of executives on Analytics insights should go hand-in-hand with trusted, high quality data.
  • 23. Washing Out Money Laundering at a Large UK-Based Bank • Selected BAE Systems’ NetReveal as new Anti-Money Laundering (AML) solution, operating on a Hadoop data lake. • Hadoop functionality was key to meeting next-gen AML transaction monitoring and FCA compliance demands using an efficient, inexpensive distributed architecture. • Needed a new data quality solution for party/entity matching in Hadoop to support its new Anti-Money Laundering solution. • Trillium Quality for Big Data was selected after a competitive RFP process as solution of choice for party/entity matching in the data lake. • Proven speed and performance in Hadoop using integrated DMX-h Intelligent eXecution functionality. • Ability to leverage existing Trillium Software System skills; i.e, visual creation of data quality jobs. • Proven domain expertise. TSS is in active use elsewhere in the company. The Trillium team also showed its domain expertise, such as proper SWIFT processing. • Native processing of data quality jobs within Hadoop “financial crimes database” at high performance and massive scale. • Will support AML compliance for many years to come. Business Challenge Solution Benefit Business Value A UK-based bank serving over 30 million customers, providing current (checking) accounts, savings, personal loans, credit cards and mortgages. Employing over 75,000 people, this bank funds a large percentage of UK new-build properties and lends to many first-time UK home buyers.
  • 25. Implementation Challenges
25Syncsort Confidential and Proprietary - do not copy or distribute
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Need to learn a new set of skills; Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.
[Chart: Big Data Challenges – % of people who consider each a top challenge (rated 1 or 2), scale 0–45%: Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, Connectivity]
  • 26. Data Quality & Data Governance Work Together
26Syncsort Confidential and Proprietary - do not copy or distribute
DATA QUALITY: The processes that help ensure data is understood, corrected and monitored to ensure TRUST and COMPLIANCE.
DATA GOVERNANCE: The collection of practices and processes which help ensure the formal management of data assets within an organization.
Related practices across the two disciplines: Data Availability, Data Compliance, Defining Key Data Elements, Assigning Data Stewards, Data Consistency, Data Cleansing, Enrichment, Monitoring, Standardization, Defining Policies, Consistent Analytics, Metrics & Reporting, Parsing, Matching, Discovery & Profiling, Data Lineage.
Source: "Data Governance vs Data Quality: Managing Data-Driven Solutions," www.dataversity.com
  • 27. Data Quality Processing for Compliance
27Syncsort Confidential and Proprietary - do not copy or distribute
Cleanse data while improving contextual understanding:
– Parse data values from unstructured fields into useful, usable new attributes.
– Verify and enrich global postal addresses.
– Standardize values for matching and linking.
– Enrich data with external, third-party sources to create comprehensive, unified records.
– Link records spanning multiple sources of personal data related to the same customer.
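To make these cleansing steps concrete, the snippet below is a minimal, self-contained Python sketch of the parse, standardize and link pattern. It is illustrative only and is not the Trillium Quality or DMX-h interface; the field names, regular expressions and match key are assumptions made for the example.

```python
# Sketch of parse -> standardize -> link for personal data; illustrative only.
import re
from collections import defaultdict

def parse_contact(raw):
    """Parse a free-text contact field into discrete attributes."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", raw)
    phone = re.search(r"\+?\d[\d\s()-]{7,}", raw)
    name = re.sub(r"<[^>]*>|\+?\d[\d\s()-]{7,}", "", raw).strip()
    return {
        "name": name,
        "email": email.group(0).lower() if email else None,
        "phone": re.sub(r"\D", "", phone.group(0)) if phone else None,
    }

def standardize(record):
    """Standardize values so records from different sources can be matched."""
    record["name"] = " ".join(record["name"].upper().replace(".", "").split())
    return record

def link_records(sources):
    """Group records that appear to describe the same customer."""
    linked = defaultdict(list)
    for source_name, rows in sources.items():
        for raw in rows:
            rec = standardize(parse_contact(raw))
            key = rec["email"] or (rec["name"], rec["phone"])  # simple match key
            linked[key].append({"source": source_name, **rec})
    return linked

if __name__ == "__main__":
    sources = {
        "crm": ["Jane Q. Smith <Jane.Smith@Example.com> +44 20 7946 0000"],
        "web": ["Jane Q Smith <jane.smith@example.com>"],
    }
    for key, records in link_records(sources).items():
        print(key, "->", len(records), "linked records")
```

A production implementation would add postal address verification against reference data and third-party enrichment, which the slide covers but the sketch omits.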
  • 28. 4. Data Lakes Stay Fresher
  • 29. Implementation Challenges
29Syncsort Confidential and Proprietary - do not copy or distribute
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Need to learn a new set of skills; Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.
[Chart: Big Data Challenges – % of people who consider each a top challenge (rated 1 or 2), scale 0–45%: Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, Connectivity]
  • 30. Keeping the Data Lake Fresh: Even Harder Than You Think
30Syncsort Confidential and Proprietary - do not copy or distribute
Keeping data in the data lake fresh is difficult, especially when the source is mainframe data.
– Transactional sources change with each transaction – often millions per day.
– Each source has its own way of tracking data changes.
– Some Hadoop targets, such as Hive, don’t even support fast updating.
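As a rough illustration of why this is hard, the PySpark sketch below shows the merge job teams often end up writing when a target such as Hive cannot apply updates in place: keep only the latest captured change per key, overlay those changes on the base table, and rewrite the result. This is a generic pattern under assumed table and column names (customer_id, change_ts, op), not a description of how Syncsort's CDC products work internally.

```python
# Generic "merge the deltas" pattern for a target that cannot update in place.
# Table and column names (customer_id, change_ts, op) are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("cdc-merge-sketch").enableHiveSupport().getOrCreate()

base = spark.table("lake.customers")               # current snapshot in the data lake
deltas = spark.table("staging.customer_changes")   # captured inserts/updates/deletes

# Keep only the most recent change per key.
latest = (deltas
          .withColumn("rn", F.row_number().over(
              Window.partitionBy("customer_id").orderBy(F.col("change_ts").desc())))
          .filter("rn = 1")
          .drop("rn"))

# Untouched base rows, plus the surviving inserts/updates (deletes are dropped).
merged = (base.join(latest.select("customer_id"), "customer_id", "left_anti")
              .unionByName(latest.filter("op != 'D'").drop("op", "change_ts")))

merged.write.mode("overwrite").saveAsTable("lake.customers_refreshed")
```

Every refresh rewrites a large slice of the table, which is exactly the cost a change data capture product aims to avoid or hide.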
  • 31. Mastering Data Assets with Guardian
Guardian Life Insurance has 150 years of protection solutions, a long history of strong, successful customer relationships, and 20 years on the Fortune 500 list. Guardian uses state-of-the-art technology to drive awareness and engagement for optimal results. Flexible funding options to meet each customer’s unique needs, fast and accurate claims, and long-term financial strength have led to award-winning, customer-focused service.
“We found DMX-h to be very usable and easy to ramp up in terms of skills. Most of all, Syncsort has been a very good partner in terms of support and listening to our needs.” – Alex Rosenthal, Enterprise Data Office
“Syncsort’s DataFunnel™ has been a powerful tool in our data lake strategy. We were able to ingest into Hadoop over 800 tables from one source system … with one press of the button.”
Business Challenge
• Include mainframe data in comprehensive data-as-a-service for internal self-service analytics.
• Ingest to HDFS hundreds of mainframe DB2 tables, hundreds of Oracle tables and 11 VSAM data sets.
• Time-to-market for analytics projects was unacceptable (6-12 months) and not repeatable.
• 100 TB of DB2/z data to monitor for changes; batch CDC couldn't keep it current fast enough.
Solution
• DMX-h to easily load VSAM data to HDFS; connect, transfer and translate data.
• DMX DataFunnel to quickly and easily load over 800 tables from DB2 and Oracle.
• Migrated 49 COBOL and 14 JCL jobs from the mainframe to DMX-h.
• DMX CDC grabs delta changes in real time and pushes them directly to Hive.
Benefit
• Hard-to-access mainframe data all included for comprehensive analytics.
• Simplified transformation processes and reused data assets.
• Hundreds of man-hours saved.
• 1.4 terabytes of Oracle data loaded in 3.5 hours.
• No 3rd-party software installed on the mainframe.
Business Value
• Shortened time-to-market for data and analytics projects.
• Centralized, standardized, reusable data assets that are searchable, accessible and managed.
• Increased ease of self-service customized report building & dashboarding.
• 50 different business applications depend on this data, which is now better managed and more current, making their analytics output more trustworthy.
  • 32. Big Data: Stronger Than Ever
  • 33. Benefits Businesses Are Actually Getting from Big Data
33Syncsort Confidential and Proprietary - do not copy or distribute
– Increase Productivity
– Reduce Costs
– Next-Gen Analytics
– Increase Revenue and Growth
– Archive Data
– Increase Agility
– Get More for EDW/Mainframe Investment
– Retain Data for Compliance
– Free Mainframe Resources and Reduce Costs
  • 34. Insurance Company Moves Historical Data to Azure Cloud
34Syncsort Confidential and Proprietary - do not copy or distribute
– One year of sales data was available to key business apps, stored on expensive DASD storage.
– 97 TB of historical data was stored on unreadable, inaccessible virtual tape; key business applications had no daily access to it. Syncsort MFX could run several jobs to retrieve that data in a few WEEKS if it was needed for a quote, etc.
– Syncsort MFX converted the virtual tape data to mainframe variable format.
– Syncsort DMX used over 300 copybooks to translate the mainframe variable data into human-readable text and remove duplicates.
– Microsoft Azure Data Import Service put all 97 TB into Cloudera CDH in the Azure cloud.
– Key business applications moved to the cloud; all sales data is encrypted securely in the cloud.
– Applications now have instant access to all 97 TB of historical data.
Before: The mainframe app (quotes, sold-case and reject checks) could reach only 1 year of sales data on expensive mainframe DASD; 18 years of sales data sat on virtual tape with NO ACCESS.
After (with MFX, DMX & Azure Cloud): The app gives quotes and reports sold cases and rejects in seconds, with instant access to all data.
  • 36. Syncsort Helps You Beat the Challenges of Big Data
36Syncsort Confidential and Proprietary - do not copy or distribute
• Get mainframe data into Hadoop easily, in Hadoop format or even original mainframe format.
• Secure, govern, manage and monitor the entire process.
• Bridge the Big Iron to Big Data skills gap.
• Reduce development time from weeks to days.
  • 37. Get Your Database Data into Hadoop, at the Press of a Button
37Syncsort Confidential and Proprietary - do not copy or distribute
DMX DataFunnel™
• Funnel hundreds of tables at once into your data lake
‒ Extract, map and move whole DB schemas in one invocation
‒ Extract from DB2, Oracle, Teradata, Netezza, S3, Redshift …
‒ To SQL Server, Postgres, Hive, Redshift and HDFS
‒ Automatically create target tables
• Process multiple funnels in parallel on edge node or data nodes
‒ Order data flows by dependencies
‒ Leverage the DMX-h high-performance data processing engine
• Filter unwanted data before extraction
‒ Data type filtering
‒ Table, record or column exclusion/inclusion
• In-flight transformations and cleansing
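For orientation only, the sketch below is a plain PySpark/JDBC loop that shows the shape of the work DataFunnel automates: discovering source tables, filtering out unwanted ones, and loading several tables in parallel into the data lake. It is not the DataFunnel interface; the JDBC URL, credentials, schema name and TMP_ filter are hypothetical.

```python
# Hypothetical schema-wide ingestion loop; DataFunnel automates and optimizes this.
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bulk-ingest-sketch").enableHiveSupport().getOrCreate()

JDBC_URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"   # hypothetical source database
PROPS = {"user": "etl_user", "password": "***", "driver": "oracle.jdbc.OracleDriver"}

def source_tables():
    """List candidate tables from the source catalog, excluding unwanted ones."""
    catalog = spark.read.jdbc(
        JDBC_URL,
        "(SELECT table_name FROM all_tables WHERE owner = 'SALES') t",
        properties=PROPS)
    return [row[0] for row in catalog.collect() if not row[0].startswith("TMP_")]

def ingest(table):
    """Copy one source table into the data lake as a Hive table."""
    df = spark.read.jdbc(JDBC_URL, f"SALES.{table}", properties=PROPS)
    df.write.mode("overwrite").saveAsTable(f"lake.sales_{table.lower()}")
    return table

# Load several tables concurrently; Spark schedules the underlying jobs.
with ThreadPoolExecutor(max_workers=4) as pool:
    for name in pool.map(ingest, source_tables()):
        print(f"loaded SALES.{name} into lake.sales_{name.lower()}")
```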
  • 38. Trillium Quality for Big Data
38Syncsort Confidential and Proprietary - do not copy or distribute
Easily create data quality workflows on Hadoop without MapReduce or Spark coding:
– Intelligent Execution enables deployment to Hadoop MapReduce and Spark.
– Verify and enrich global postal addresses using global postal reference sources.
– Enrich data from external, third-party sources to create comprehensive, unified records, enabling 360-degree views of the customer and other key business entities.
– Identify records that belong to the same domain (i.e., household or business).
– Parse data values to their correct fields and standardize for better matching.
– Match like records and eliminate duplicates.
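As a rough illustration of the matching and de-duplication step only, the PySpark sketch below builds a crude match key from standardized fields and keeps one record per key. Real entity matching, as in Trillium, relies on far richer parsing, reference data and scoring; the table and column names here (lake.parties, full_name, postal_code, updated_at) are assumptions.

```python
# Crude match-key de-duplication; a stand-in for real entity matching.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("dedupe-sketch").enableHiveSupport().getOrCreate()

parties = spark.table("lake.parties")   # assumed table of customer/party records

# Build a simple match key from a standardized name plus postal code.
keyed = parties.withColumn(
    "match_key",
    F.concat_ws("|",
                F.upper(F.regexp_replace("full_name", r"[^A-Za-z ]", "")),
                F.regexp_replace("postal_code", r"\s", "")))

# Keep the most recently updated record per match key; the rest are duplicates.
best_first = Window.partitionBy("match_key").orderBy(F.col("updated_at").desc())
deduped = (keyed.withColumn("rank", F.row_number().over(best_first))
                .filter("rank = 1")
                .drop("rank", "match_key"))

deduped.write.mode("overwrite").saveAsTable("lake.parties_golden")
```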
  • 39. Syncsort Enables Governance
39Syncsort Confidential and Proprietary - do not copy or distribute
Metadata and data lineage for Hive, Avro and Parquet through HCatalog.
Metadata lineage from DMX/DMX-h and Trillium Quality for Big Data:
– Simplify audits, analytics dashboards and metrics
– Run-time job metadata and lineage REST API
– Integrate with enterprise metadata repositories like ASG
Cloudera Navigator certified integration:
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Business and structural metadata
– Audit and track data from source to cluster
Apache Atlas ingestion lineage integration:
– Audit and track data from source to cluster
– Detailed field-level lineage
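Conceptually, run-time lineage integration means that every job publishes which datasets it read and which it wrote. The small Python helper below shows that idea as a REST call to a metadata repository; the endpoint path and payload shape are illustrative assumptions, not the documented DMX-h, Navigator or Atlas interfaces.

```python
# Illustrative lineage event publisher; endpoint and payload are assumptions.
import requests

def publish_lineage(job_name, inputs, outputs,
                    repo_url="http://metadata-repo.example.com/api/lineage"):
    """Record which datasets a job read and wrote."""
    event = {
        "process": job_name,
        "inputs": inputs,     # e.g. ["oracle://sales.ORDERS"]
        "outputs": outputs,   # e.g. ["hive://lake.orders"]
    }
    resp = requests.post(repo_url, json=event, timeout=10)
    resp.raise_for_status()
    return resp.status_code

if __name__ == "__main__":
    publish_lineage("nightly_orders_ingest",
                    inputs=["oracle://sales.ORDERS"],
                    outputs=["hive://lake.orders"])
```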
  • 40. Syncsort Real-Time Change Data Capture
40Syncsort Confidential and Proprietary - do not copy or distribute
Keep data in sync in real time:
– Without overloading networks.
– Without affecting source database performance.
– Without coding or tuning.
Targets: HDFS, Hive, IBM DB2, IBM Informix, Oracle, Oracle RAC, Sybase, MS SQL Server, Teradata, MySQL, PostgreSQL
Sources: IBM DB2, IBM Informix, Oracle, Oracle RAC, Sybase, MS SQL Server
Dependable – Reliable transfer of data even if connectivity fails on either side.
Fast – Captures changes in the source as they happen. Updates table statistics for faster queries.
Flexible – Writes to HDFS, all Hive tables (including those backed by text, ORC, Parquet or Avro) and most major RDBMSs. Even updates Hive versions that don’t support updates.
Real-time replication with transformation, conflict resolution, collision monitoring, tracking and auditing.
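At its core, the consuming side of change data capture applies an ordered stream of insert, update and delete events to a keyed target. The minimal Python sketch below shows that apply logic in memory; the event shape (key, op, row) is an assumption for the example, and a production CDC product adds reliable transport, conflict resolution, collision monitoring and auditing on top.

```python
# Minimal CDC apply loop; event shape is assumed for illustration.
def apply_changes(target, events):
    """Apply insert/update/delete events, in order, to an in-memory keyed table."""
    for event in events:
        key, op = event["key"], event["op"]
        if op in ("I", "U"):
            target[key] = event["row"]    # upsert: last write per key wins
        elif op == "D":
            target.pop(key, None)         # tolerate deletes for unseen keys
    return target

if __name__ == "__main__":
    changes = [
        {"key": 1, "op": "I", "row": {"id": 1, "status": "new"}},
        {"key": 1, "op": "U", "row": {"id": 1, "status": "active"}},
        {"key": 2, "op": "I", "row": {"id": 2, "status": "new"}},
        {"key": 2, "op": "D"},
    ]
    print(apply_changes({}, changes))   # -> {1: {'id': 1, 'status': 'active'}}
```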
  • 41. Implementation Challenges
41Syncsort Confidential and Proprietary - do not copy or distribute
1. Data Quality: Assessing and improving the quality of data as it enters and/or resides in the data lake.
2. Skills/Staff: Need to learn a new set of skills; Hadoop programmers are difficult to find and/or expensive.
3. Data Governance: Including the data lake in governance initiatives and meeting regulatory compliance.
4. Rapid Change: Frameworks and tools evolve fast, and it’s difficult to keep up with the latest tech.
5. Fresh Data (CDC): Difficult to keep the data lake up-to-date with changes made on other platforms.
6. Mainframe: Difficult to move mainframe data in and out of Hadoop/Spark.
7. Data Movement: Difficult to move data in and out of Hadoop/Spark.
[Chart: Big Data Challenges – % of people who consider each a top challenge (rated 1 or 2), scale 0–45%: Data Quality, Skills, Governance, Rapid Change, CDC, Mainframe, Data Movement, Cost, Connectivity]
  • 42. Design Once, Deploy Anywhere
42Syncsort Confidential and Proprietary - do not copy or distribute
Intelligent Execution insulates your organization from the underlying complexities of Big Data: design in a single GUI, execute anywhere.
• Use existing ETL skills.
• No need to worry about mappers, reducers, the big or small side of joins, etc.
• Automatic optimization for best performance, load balancing, etc.
• No changes or tuning required, even if you change execution frameworks.
• Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x.
• Run multiple execution frameworks in a single job.
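The design-once idea can be pictured as an engine-neutral job definition that is bound to an execution framework only at run time. The Python sketch below illustrates the pattern in miniature; it is a conceptual analogy, not Syncsort's Intelligent Execution implementation, and only a trivial local engine is wired up.

```python
# Engine-neutral dataflow definition; frameworks are chosen at run time.
from typing import Callable, Iterable, List

class Dataflow:
    def __init__(self):
        self.steps: List[Callable[[Iterable], Iterable]] = []

    def map(self, fn):
        self.steps.append(lambda rows: (fn(r) for r in rows))
        return self

    def filter(self, pred):
        self.steps.append(lambda rows: (r for r in rows if pred(r)))
        return self

    def run(self, rows, engine="local"):
        # A real optimizer would translate the same steps to MapReduce, Spark, etc.
        if engine == "local":
            for step in self.steps:
                rows = step(rows)
            return list(rows)
        raise NotImplementedError(f"engine '{engine}' is not wired up in this sketch")

if __name__ == "__main__":
    job = Dataflow().filter(lambda r: r["amount"] > 0).map(lambda r: r["amount"] * 2)
    print(job.run([{"amount": 10}, {"amount": -5}], engine="local"))   # -> [20]
```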
  • 43. Syncsort Makes ALL Data Accessible & Usable – Ready for Analytics
43Syncsort Confidential and Proprietary - do not copy or distribute
Get the ebook, 2018 Big Data Trends: Liberate, Integrate and Trust:
http://www.syncsort.com/en/Resource-Center/BigData/eBooks/2018-Big-Data-Trends-Liberate-Integrate-Trust
Contact Syncsort sales to get the latest Syncsort info:
http://www.syncsort.com/en/ContactSales
  • 44. Questions 44Syncsort Confidential and Proprietary - do not copy or distribute