The New Trillium DQ:
Big Data Insights When and
Where You Need Them
Harald Smith
Speaker
Harald Smith
• Director of Product Marketing, Syncsort
• 20+ years in Information Management with a focus
on data quality, integration, and governance
• Co-author of Patterns of Information Management
• Author of two Redbooks on Information Governance
and Data Integration
• Blogs on Dataversity and InfoWorld
Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics
KPMG 2016 Global CEO Outlook
92% of executives are concerned about the negative impact of data and analytics on corporate reputation
KPMG 2017 Global CEO Outlook
80% of AI/ML projects are stalling due to poor data quality
Dimensional Research, 2019
ALL Data Needs Data Quality
“Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.”
EY “Trust in Data and Why it Matters”, 2017.
The importance of data quality
in the enterprise:
• Decision making
• Customer centricity
• Compliance
• Machine learning & AI
Key Outcomes
• Maximize the value of data quality across your organization
• Deploy and leverage data quality capabilities consistently when and
where needed
• Leverage the resources and skills your organization has invested in
whether on-premise or in the cloud
• Scale to address the data challenges you face and deliver high quality
results you can trust for critical business decisions
• Integrate best-in-class data quality into your data governance framework
to ensure visibility across your organization
• Ensure global data requirements are addressed
Trillium DQ version 16
• Single cross-platform scalable architecture
• Native Big Data connectivity
• Distributed execution for all functions
• Full, rich data quality capabilities and familiar interface
• Design-once, deploy-anywhere data quality projects
• Out-of-the-box data governance integration with Collibra
• Broad location and geoenrichment data options
Trillium DQ v16 Highlights
Ensures consistent use, processing, and outcomes for traditional or distributed platforms, on-premise or in the cloud
Trillium DQ – common scalable architecture
[Diagram labels: UI Server or Edge Node; ODBC, Native RDBMS, Delimited, Fixed, COBOL; Distributed Cluster; Distributed HDFS / Distributed Execution / Distributed Storage; Name Node; Trillium DQ Metadata; Delimited; HDFS]
• 2x faster data cleansing and matching on a small distributed cluster – more nodes, faster time
• 3x faster data profiling on a small distributed cluster – more nodes, faster time with linear scaling
• 2x faster data profiling even on traditional platforms
Key Outcomes
• More sources of data
• Higher volumes of data
• Faster processing of data
• Fit limited time windows
• Utilize Big Data investments
• Reduced disk space usage
Scalable Architecture
Match to corporate credit data with Syncsort Trillium
Challenge
Ensure accurate corporate credit ratings of 330M global companies for clients within contracted timeframes.
• Could not scale to deliver ratings to clients within SLAs – impacting client fulfillment
• Need to process >800M records daily
• Lacked flexibility to address issues with similar company names including volume and variety of data sources
Solution
Trillium DQ for Big Data on Amazon EMR:
• Cleansed, standardized and matched over 130 million recs/hour on basic 10-node test cluster
• Processing full transaction volume daily, and business is growing
• Met the business SLAs with ability to scale
Impact
• Delivered higher levels of matching/data accuracy and satisfied contracts
• Saved software costs – Replaced multiple solutions – Melissa Data, Oracle de-dupe, ...
• Saved Amazon cluster costs and left room for company growth
“We can’t afford to miss or mix up information about businesses with similar names. Companies count on our highly accurate predictive scoring to provide fast, accurate ratings for their potential customers and vendors.”
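As a rough check of the figures in this case study, the sketch below estimates how long the daily volume would take at the test-cluster rate. It assumes throughput stays constant at the quoted 130 million records per hour and treats 800 million as a lower bound on the daily volume; the slide does not state the size of the production cluster, so this is only an illustration.

```python
# Figures quoted on the slide; the constant-throughput assumption is ours.
RECORDS_PER_HOUR = 130_000_000   # measured on a basic 10-node test cluster
DAILY_RECORDS = 800_000_000      # the slide says ">800M", so a lower bound


def hours_to_process(records: int, records_per_hour: int) -> float:
    """Return the wall-clock hours needed at a constant throughput."""
    return records / records_per_hour


hours = hours_to_process(DAILY_RECORDS, RECORDS_PER_HOUR)
print(f"{hours:.1f} hours for the daily volume")  # prints "6.2 hours for the daily volume"
```

Under these assumptions the daily load fits well inside a 24-hour window, which is consistent with the claim that the SLAs were met with room to scale.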
Key Outcomes
• Reduce the time for business analysts to discover and understand
data on Hadoop platforms
• Allow business analysts who understand the data but have little
technical expertise to quickly find data and run data profiling in
three steps
• Let analysts explore results and drill down to details within
seconds per view, then review and report on data issues to
business leaders
• Scale to large volumes of data sources & attributes so that
business analysts can understand the contents of any data source
needed for business decisions
Trillium Discovery
• Delivers enterprise-trusted Trillium Discovery on traditional and distributed
Hadoop platforms for high-volume, scalable data profiling
• Provides complete Trillium Discovery data profiling for analysis & review
• Attribute metadata, value & pattern frequencies, key & dependency analysis,
cross-source join analysis, drill down to any outlier or issue, and more…
• Provides easily configured native connectivity for Big Data sources
• Provides management and monitoring of task execution
• Integrates with the security frameworks (Kerberos, AD, LDAP) of
Big Data platforms
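To make "value & pattern frequencies" concrete, here is a minimal generic sketch of column profiling in plain Python. It is not Trillium Discovery's implementation; the function names, the letter-to-"A" and digit-to-"9" pattern convention, and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Map letters to 'A', digits to '9', and keep other characters as-is."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values: list[str]) -> dict:
    """Return basic attribute metadata plus value and pattern frequencies."""
    non_empty = [v for v in values if v.strip()]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "value_freq": Counter(non_empty),
        "pattern_freq": Counter(to_pattern(v) for v in non_empty),
    }


# Hypothetical sample: postal codes with one malformed entry and one blank.
sample = ["02134", "10001", "1000A", "", "02134"]
profile = profile_column(sample)
print(profile["pattern_freq"].most_common())
```

A rare pattern such as `9999A` among mostly `99999` values is the kind of outlier an analyst would then drill down into.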
Trillium Discovery – Data Profiling at Scale
3 Steps to Run: Select Source, Run Profiling, Explore Profiles
Native Connectors
▪ HDFS source directories
▪ …
Execute Profiling (1 … n)
Stored Profiling Results
▪ Metadata & Statistics
▪ Frequency Distributions
▪ Drilldown Indices
Evaluate Business Rules
Drilldown to Issues
Share & Govern Results
▪ Integration (APIs)
▪ Notification
▪ Collaboration
Key Outcomes
• Match and link any data entity – customers, suppliers, products, etc. –
into a trusted single view to support a broad array of business-critical
use cases (e.g. Customer 360, fraud, AML)
• Parse and standardize complex multi-domain data, extended with
enrichment and verification of critical address and geolocation data –
all leveraging out-of-the-box templates
• Utilize “design once, deploy anywhere” approach to speed time-to-value
and focus on building data quality business logic while letting the
product handle the technical aspects of framework execution with no
coding or tuning required
• Leverage the high-performance compute power of distributed Hadoop
frameworks to process high volumes within targeted time windows to
meet critical Service Level Agreements (SLAs)
Trillium Quality
• Integrate, parse, standardize, and match new and legacy customer data
from multiple disparate sources.
• Provide high-quality entity resolution through multi-domain deduplication
and matching with the most comprehensive set of match comparisons
available, including fuzzy matching, distance comparisons, and more.
• Standardize, enhance, and match international data sets with postal and
country-code validation.
• Deploy data quality workflows as native MapReduce processes for optimal
efficiency.
• Process hundreds of millions of records of data.
• Increase processing efficiency.
• Support failover through Hadoop’s fault-tolerant design; during a node
failure, processing is redirected to another node.
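The fuzzy matching mentioned above can be illustrated with a small generic sketch. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure; this is not Trillium Quality's matching engine, and the normalization steps and the 0.85 threshold are assumptions picked only for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same entity when similarity meets the threshold."""
    return similarity(a, b) >= threshold


# Hypothetical company names: a casing variant, a typo, and an unrelated name.
print(is_match("Acme Corp.", "ACME corp"))
print(is_match("Acme Corporation", "Acme Corporaton"))
print(is_match("Acme Corporation", "Zenith Holdings"))
```

In practice a single-field ratio like this is only a starting point; matching companies with similar names usually combines several fields and comparison types.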
Syncsort Trillium Delivers Data You Can Trust
• Data Profiling
• Business Rules & Data Quality Assessment
• Data Validation, Standardization, Enrichment & more
• Matching, Entity Resolution & Verification
Products: Trillium Discovery; Trillium Quality + Global Address Verification; Trillium DQ/Trillium DQ for Big Data
Operational Integrations and Data Governance:
• Customer 360
• AI/ML
• Analytics & Reporting
• Collibra DGC
• BI tools
Global Bank
Challenge
Meet AML transaction monitoring and Financial Conduct Authority (FCA) compliance
• Data volume too large and too widely scattered to analyze
• Disparate data sources – Mainframe, RDBMS, Cloud, etc.
• Maximize the value/ROI of the data lake
Solution
Trillium Quality for Big Data to support next-generation AML transaction monitoring and FCA compliance
• Cluster-native data verification, enrichment, and demanding multi-field entity resolution executing natively on Spark within financial crimes database
• Unmodified mainframe “Golden Records” stored on Hadoop
Impact
• Ensure Anti-Money Laundering regulatory compliance is met through financial crimes data lake – high performance results at massive scale.
• Achieve fast time to value with flexible deployment and ease of use
• Ensure the data lake is a trusted source of data feeding critical machine learning-based fraud detection
• Expanding use to additional Customer Engagement solutions and applications.
Trillium DQ + Collibra DGC
Trillium Discovery
• Market-leading, best-of-breed
data quality solution
• Profile and understand all the
critical data
• Leverage highly flexible business
rules for the right metrics
• Find ALL the DQ issues
Out-of-the-box integration of DQ
metrics with Collibra DGC
✓ Bi-directional solution
✓ Automated & synchronized
✓ Configurable to organizational
needs for all profiling results –
broad API support
Collibra DGC
• Market-leading, best-of-breed
data governance solution
• Establish a common
understanding of the business
• Automate governance and
stewardship tasks
• Interact with common workflows
Deploy Trillium’s bi-directional data
quality integration to ensure:
✓ All key business rules are
implemented and validated
✓ DQ metrics are automatically
delivered to those who need to
know when they need to know
Delivers fully integrated data quality with Collibra
Collibra Data Governance Center
✓ Enables non-technical users to define business
policies and data quality rules in plain
language
✓ Makes data quality metrics and performance
available to all users
Trillium Discovery
✓ Automatically receives business rules so technical
user can convert to executable data quality rules
✓ Constantly runs data quality metrics on desired
schedule, automatically delivers results back to
Collibra dashboards
Rulebooks to Rules
Quality test Results
Bi-directional connectivity Constant sync
Metrics falling below thresholds can trigger a workflow in Collibra Issue Management
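The threshold idea can be sketched generically as follows. This does not call any Collibra or Trillium API; `MetricResult`, `breaches`, the rule names, and the 95% threshold are hypothetical, and the hand-off to an issue workflow is represented only by a print statement.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    rule: str    # name of the data quality rule (hypothetical)
    passed: int  # rows that satisfied the rule
    total: int   # rows evaluated

    @property
    def score(self) -> float:
        """Pass rate in [0, 1]; an empty run scores 0.0 so it gets flagged."""
        return self.passed / self.total if self.total else 0.0


def breaches(results: list[MetricResult], threshold: float) -> list[MetricResult]:
    """Return the metrics whose pass rate is below the threshold."""
    return [r for r in results if r.score < threshold]


# Hypothetical results from a scheduled data quality run.
results = [
    MetricResult("customer_email_populated", 970, 1000),
    MetricResult("postal_code_valid", 880, 1000),
]
threshold = 0.95
flagged = breaches(results, threshold)
for r in flagged:
    # Placeholder for handing the breach to an issue-management workflow.
    print(f"open issue for {r.rule}: {r.score:.0%} < {threshold:.0%}")
```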
Connection to/from Collibra is straightforward
Packaged
Workflow
• Out-of-the-box packaged workflow with Trillium Discovery
✓ Easy to set up and run – no complex technical requirements
✓ Part of delivered product – use immediately; no add-on charges; fully supported
• Automatically connects to and delivers content via REST APIs
✓ Collibra provides a single self-service API which facilitates connecting integrations to Collibra DGC
✓ Trillium Discovery provides standard, documented REST APIs – easy to extend application;
insulated from underlying product changes; same APIs used by UI, so always tested
DNB
Challenge
Poor, inconsistent customer data, and aggressive timelines to address regulatory compliance requirements (BCBS239, GDPR, and AML)
• Focus on whether DNB can measure Data Quality in an ongoing manner
• Concerns around Customer Sanctions Screening and Transaction Monitoring
Solution
Trillium DQ with Collibra DGC to:
• Profile, analyse and provide measurement of data quality concerns
• Integrate data quality rules and metrics between the tools to ensure management has immediate knowledge of improvements/issues
Impact
Pilot phase for 2 branches completed July 2019
• Able to provide proof that data wasn’t “missing”, but pinpointed a number of quality issues requiring improvements
• Able to report to regulators on the findings with proof rather than previous hearsay
Spun off requirements to provide similar work for all branches AND Head Office
Addressing Master Data Analysis on customer data and associated cleanup
See: The Data Journey at DNB: Data Driven Customer Centricity
• Rich set of capabilities to discover, classify, profile, and evaluate data across
platforms including big data, cloud.
Don’t need to move data off the cluster and can provide drilldown to all issues
• High performance standardization and matching for entity resolution with
global coverage in batch & real time.
Meet challenging time windows for critical analytics and regulations
• Native connectivity, execution, and storage for optimized Big Data processing.
Take full advantage of the cluster to expand and scale
• Design once, deploy anywhere architecture that future proofs existing
applications.
Leverage the skills you already have
• Easy to connect to & integrate with CRM, ERP, MDM, enrichment, and Data
Governance solutions.
Deliver consistent data quality processing and results throughout the organization
Trillium DQ
Available end of month
• Linux
• Cloudera
• CDH 5.8.3, 5.11, 5.15.2, 5.16.2
• HDP 2.6.4
• Google Cloud Platform
• Amazon EMR (Trillium Quality – now; Trillium Discovery - coming soon)
• Windows (coming soon)
Trillium DQ evaluates & transforms your data for trusted business insights
• Turn your data into a trusted view of your customers, products and more
• Power machine learning and advanced analytics with reliable, fit-for-purpose data
• Gain actionable business insights from high-volume disparate data sets from across the enterprise
• Deploy industry-leading data quality processes at massive scale, with no coding or Big Data skills required
Next Steps
For more information on Trillium DQ and our other Syncsort
Trillium data quality solutions, please visit:
https://www.syncsort.com/en/solutions/data-quality
https://www.syncsort.com/en/products/trillium-dq
https://www.syncsort.com/en/products/trillium-dq-for-big-data
Questions?
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

The New Trillium DQ: Big Data Insights When and Where You Need Them

  • 1. The New Trillium DQ: Big Data Insights When and Where You Need Them. Harald Smith
  • 2. Speaker: Harald Smith • Director of Product Marketing, Syncsort • 20+ years in Information Management with a focus on data quality, integration, and governance • Co-author of Patterns of Information Management • Author of two Redbooks on Information Governance and Data Integration • Blogs on Dataversity and InfoWorld
  • 3. ALL Data Needs Data Quality. Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics (KPMG 2016 Global CEO Outlook). 92% of executives are concerned about the negative impact of data and analytics on corporate reputation (KPMG 2017 Global CEO Outlook). 80% of AI/ML projects are stalling due to poor data quality (Dimensional Research, 2019). “Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.” (EY, “Trust in Data and Why it Matters”, 2017). The importance of data quality in the enterprise: • Decision making • Customer centricity • Compliance • Machine learning & AI
  • 4. Key Outcomes • Maximize the value of data quality across your organization • Deploy and leverage data quality capabilities consistently when and where needed • Leverage the resources and skills your organization has invested in, whether on-premise or in the cloud • Scale to address the data challenges you face and deliver high-quality results you can trust for critical business decisions • Integrate best-in-class data quality into your data governance framework to ensure visibility across your organization • Ensure global data requirements are addressed. Trillium DQ version 16
  • 5. Trillium DQ v16 Highlights • Single cross-platform scalable architecture • Native Big Data connectivity • Distributed execution for all functions • Full, rich data quality capabilities and familiar interface • Design-once, deploy-anywhere data quality projects • Out-of-the-box data governance integration with Collibra • Broad location and geoenrichment data options
  • 6. Trillium DQ – common scalable architecture. Ensures consistent use, processing, and outcomes for traditional or distributed platforms, on-premise or in the cloud. [Architecture diagram. Labels: UI; Server or Edge Node; sources: ODBC, Native RDBMS, Delimited, Fixed, COBOL, Delimited HDFS; Distributed Cluster with Name Node; Distributed HDFS / Distributed Execution / Distributed Storage; Trillium DQ Metadata]
  • 7. Scalable Architecture. 2x faster data cleansing and matching on a small distributed cluster (more nodes, faster time). 3x faster data profiling on a small distributed cluster (more nodes, faster time with linear scaling). 2x faster data profiling even on traditional platforms. Key Outcomes • More sources of data • Higher volumes of data • Faster processing of data • Fit limited time windows • Utilize Big Data investments • Reduced disk space usage
  • 8. Match to corporate credit data with Syncsort Trillium. Challenge: Ensure accurate corporate credit ratings of 330M global companies for clients within contracted timeframes. • Could not scale to deliver ratings to clients within SLAs, impacting client fulfillment • Need to process >800M records daily • Lacked flexibility to address issues with similar company names, including volume and variety of data sources. Solution: Trillium DQ for Big Data on Amazon EMR • Cleansed, standardized, and matched over 130 million records/hour on a basic 10-node test cluster • Processing full transaction volume daily, and business is growing • Met the business SLAs with ability to scale. Impact: • Delivered higher levels of matching/data accuracy and satisfied contracts • Saved software costs by replacing multiple solutions (Melissa Data, Oracle de-dupe, ...) • Saved Amazon cluster costs and left room for company growth. “We can’t afford to miss or mix up information about businesses with similar names. Companies count on our highly accurate predictive scoring to provide fast, accurate ratings for their potential customers and vendors.”
  • 9. Trillium Discovery: Key Outcomes • Reduce the time for business analysts to discover and understand data on Hadoop platforms • Allow business analysts who understand the data but have little technical expertise to quickly find data and run data profiling in three steps • Let analysts explore results and drill down to details within seconds per view, to review and then report on data issues to business leaders • Scale to large volumes of data sources & attributes so that business analysts can understand the contents of any data source needed for business decisions
  • 10. Trillium Discovery • Delivers enterprise-trusted Trillium Discovery on traditional and distributed Hadoop platforms for high-volume, scalable data profiling • Provides complete Trillium Discovery data profiling for analysis & review: attribute metadata, value & pattern frequencies, key & dependency analysis, cross-source join analysis, drill down to any outlier or issue, and more… • Provides easily configured native connectivity for Big Data sources • Provides managing and monitoring for task execution • Integrates with the security frameworks (Kerberos, AD, LDAP) of Big Data platforms
  • 11. Trillium Discovery – Data Profiling at Scale. 3 Steps to Run: Select Source, Run Profiling, Explore Profiles. [Workflow diagram. Labels: Native Connectors (HDFS source directories, …); Execute Profiling across nodes 1 to n; Stored Profiling Results (Metadata & Statistics, Frequency Distributions, Drilldown Indices); Evaluate Business Rules; Drilldown to Issues; Share & Govern Results (Integration (APIs), Notification, Collaboration)]
  • 12. Trillium Quality: Key Outcomes • Match and link any data entity (customers, suppliers, products, etc.) into a trusted single view to support a broad array of business-critical use cases (e.g. Customer 360, fraud, AML) • Parse and standardize complex multi-domain data, extended with enrichment and verification of critical address and geolocation data, all leveraging out-of-the-box templates • Utilize a “design once, deploy anywhere” approach to speed time-to-value and focus on building data quality business logic while letting the product handle the technical aspects of framework execution with no coding or tuning required • Leverage the high-performance compute power of distributed Hadoop frameworks to process high volumes within targeted time windows to meet critical Service Level Agreements (SLAs)
  • 13. Trillium Quality • Integrate, parse, standardize, and match new and legacy customer data from multiple disparate sources. • Provide high-quality entity resolution through multi-domain deduplication and matching with the most comprehensive set of match comparisons available, including fuzzy matching, distance comparisons, and more. • Standardize, enhance, and match international data sets with postal and country-code validation. • Deploy data quality workflows as native MapReduce processes for optimal efficiency. • Process hundreds of millions of records of data. • Increase processing efficiency. • Support failover through Hadoop’s fault-tolerant design; during a node failure, processing is redirected to another node.
  • 14. Syncsort Trillium Delivers Data You Can Trust. [Diagram. Capabilities: Data Profiling; Business Rules & Data Quality Assessment; Data Validation, Standardization, Enrichment & more; Matching, Entity Resolution & Verification. Products: Trillium Discovery; Trillium Quality + Global Address Verification; Trillium DQ/Trillium DQ for Big Data. Targets: Operational Integrations; Analytics & Reporting; Data Governance; with examples Customer 360, AI/ML, Collibra DGC, BI tools]
  • 15. Global Bank. Challenge: Meet AML transaction monitoring and Financial Conduct Authority (FCA) compliance • Data volume too large, diversely scattered to analyze • Disparate data sources (Mainframe, RDBMS, Cloud, etc.) • Maximize the value/ROI of the data lake. Solution: Trillium Quality for Big Data to support next-generation AML transaction monitoring and FCA compliance • Cluster-native data verification, enrichment, and demanding multi-field entity resolution executing natively on Spark within the financial crimes database • Unmodified mainframe “Golden Records” stored on Hadoop. Impact: • Ensure Anti-Money Laundering regulatory compliance is met through the financial crimes data lake, with high-performance results at massive scale • Achieve fast time to value with flexible deployment and ease of use • Ensure the data lake is a trusted source of data feeding critical machine learning-based fraud detection • Expanding use to additional Customer Engagement solutions and applications.
  • 16. Trillium DQ + Collibra DGC. Trillium Discovery: • Market-leading, best-of-breed data quality solution • Profile and understand all the critical data • Leverage highly flexible business rules for the right metrics • Find ALL the DQ issues. Collibra DGC: • Market-leading, best-of-breed data governance solution • Establish a common understanding of the business • Automate governance and stewardship tasks • Interact with common workflows. Out-of-the-box integration of DQ metrics with Collibra DGC: ✓ Bi-directional solution ✓ Automated & synchronized ✓ Configurable to organizational needs for all profiling results, with broad API support. Deploy Trillium’s bi-directional data quality integration to ensure: ✓ All key business rules are implemented and validated ✓ DQ metrics are automatically delivered to those who need to know when they need to know
  • 17. Delivers fully integrated data quality with Collibra. Collibra Data Governance Center: ✓ Enables non-technical users to define business policies and data quality rules in plain language ✓ Makes data quality metrics and performance available to all users. Trillium Discovery: ✓ Automatically receives business rules so a technical user can convert them to executable data quality rules ✓ Runs data quality metrics continually on the desired schedule and automatically delivers results back to Collibra dashboards. [Diagram labels: Rulebooks to Rules; Quality test Results; Bi-directional connectivity; Constant sync] Metrics falling below thresholds can trigger a workflow in Collibra Issue Management.
  • 18. Connection to/from Collibra is straightforward. Packaged Workflow • Out-of-the-box packaged workflow with Trillium Discovery ✓ Easy to set up and run, with no complex technical requirements ✓ Part of the delivered product: use immediately; no add-on charges; fully supported • Automatically connects to and delivers content via REST APIs ✓ Collibra provides a single self-service API which facilitates connecting integrations to Collibra DGC ✓ Trillium Discovery provides standard, documented REST APIs: easy to extend the application; insulated from underlying product changes; same APIs used by the UI, so always tested
  • 19. DNB. Challenge: Poor, inconsistent customer data, and aggressive timelines to address regulatory compliance requirements (BCBS239, GDPR, and AML) • Focus on whether DNB can measure Data Quality in an ongoing manner • Concerns around Customer Sanctions Screening and Transaction Monitoring. Solution: Trillium DQ with Collibra DGC to: • Profile, analyse and provide measurement of data quality concerns • Integrate data quality rules and metrics between the tools to ensure management has immediate knowledge of improvements/issues. Impact: Pilot phase for 2 branches completed July 2019 • Able to provide proof that data wasn’t “missing”, but pinpointed a number of quality issues requiring improvements • Able to report to regulators on the findings with proof rather than previous hearsay. Spun off requirements to provide similar work for all branches AND Head Office. Addressing Master Data Analysis on customer data and associated cleanup. See: The Data Journey at DNB: Data Driven Customer Centricity
  • 20. Trillium DQ • Rich set of capabilities to discover, classify, profile, and evaluate data across platforms, including big data and cloud. No need to move data off the cluster, and drilldown is available to all issues • High-performance standardization and matching for entity resolution with global coverage in batch & real time. Meet challenging time windows for critical analytics and regulations • Native connectivity, execution, and storage for optimized Big Data processing. Take full advantage of the cluster to expand and scale • Design once, deploy anywhere architecture that future-proofs existing applications. Leverage the skills you already have • Easy to connect to & integrate with CRM, ERP, MDM, enrichment, and Data Governance solutions. Deliver consistent data quality processing and results throughout the organization
  • 21. Available end of month • Linux • Cloudera • CDH 5.8.3, 5.11, 5.15.2, 5.16.2 • HDP 2.6.4 • Google Cloud Platform • Amazon EMR (Trillium Quality: now; Trillium Discovery: coming soon) • Windows (coming soon)
  • 22. Trillium DQ evaluates & transforms your data for trusted business insights • Turn your data into a trusted view of your customers, products and more • Power machine learning and advanced analytics with reliable, fit-for-purpose data • Gain actionable business insights from high-volume disparate data sets from across the enterprise • Deploy industry-leading data quality processes at massive scale, with no coding or Big Data skills required
  • 23. Next Steps. For more information on Trillium DQ and our other Syncsort Trillium data quality solutions, please visit: https://www.syncsort.com/en/solutions/data-quality , https://www.syncsort.com/en/products/trillium-dq , https://www.syncsort.com/en/products/trillium-dq-for-big-data
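
Slide 10 lists "value & pattern frequencies" among the data profiling outputs. The sketch below illustrates that general idea in plain Python. It is not Trillium Discovery's implementation or API: the function names, the `A`/`9` pattern symbols, and the sample postal codes are all assumptions made for illustration.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to a character-class pattern.

    Assumed convention (hypothetical): letters become 'A', digits become '9',
    and every other character is kept as-is.
    """
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Compute basic profiling statistics for one column of string values."""
    # Treat None and the empty string as missing values.
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "value_frequencies": Counter(non_null),
        "pattern_frequencies": Counter(value_pattern(v) for v in non_null),
    }


# Hypothetical sample data: a postal code column with mixed formats.
postal_codes = ["02139", "10001", "1000", "SW1A 1AA", None, "10001"]
profile = profile_column(postal_codes)

# Rare patterns are candidates for review, e.g. a four-digit code in a
# column dominated by five-digit codes.
for pattern, freq in profile["pattern_frequencies"].most_common():
    print(pattern, freq)
```

Sorting patterns by frequency puts the dominant format first and leaves outliers at the bottom of the list, which is the kind of starting point an analyst would drill down from.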