SQL Server 2012
SQL Server 2014
Melissa Data MVP
What is record matching?
DQS Matching Process
DQS Data Matching Principles
Modification, removal, or correction of data that is incorrect or incomplete,
either computer-assisted or interactively.
Identification of duplicates in a rules-based process, so that you
can perform de-duplication. Verifying data quality using a
reference data provider. Use reference data services from
Azure Marketplace providers.
Analysis of data for insight into its quality during the domain
management, matching, and data cleansing processes.
Profiling is a powerful tool in a DQS data quality solution.
Determine the state of data quality activities. Validate that the
data quality solution is doing what it was designed to do.
DQS is a knowledge-driven solution, analyzing data using
knowledge built with DQS. Create data quality processes
to enhance the knowledge of data, continuously
improving data quality.
• Create a Matching Policy
• Data Quality Matching
• Match Similar Data
Master Data Services Configuration Manager
Tool to create and configure Master Data Services
databases and web applications.
Master Data Manager
Web application for performing administrative tasks
(such as creating a model or business rule); users also
access it to modify master data.
Tool to create packages of your model objects and
data, for deploying to other environments.
Master Data Services web service
Web service that developers can use to develop custom
solutions for Master Data Services.
Master Data Services Add-in for Excel
Manage data and create new entities and attributes.
Record matching is the task of identifying
records that match the same real-world entity.
The Cost of Duplicate Data
…a few examples…
Direct marketing communications are doubled up unnecessarily.
Product shipments and customer-site based services could be sent
to the wrong address due to an incorrect duplicate record being
Your sales reporting may be inaccurate due to an over-inflated
number of customers.
Inaccurate sales analysis due to sales being split between multiple
records that represent the same customer, resulting in an
undervaluing of some key customers.
Where do Duplicate Records come from?
Poorly designed software: No verification of existing records upon entry.
Data validation: Human errors can creep into the system when field input is not validated.
Company merging and
Merging systems may result in duplicates in the merged
Change of attributes: The same person may appear not to exist in the database if some of the attributes were changed (e.g., address, name, etc.).
There are different ways to represent the same person or address in a database, e.g., "Doctor Robert Smith" vs. "Dr. Bob Smith".
Data is ‘fuzzy’ in nature (spelling mistakes, abbreviations, etc.).
How Do Data Issues Affect Matching?
Matching Results Reasoning
Identifies exact and approximate matches, enabling
removal of duplicate data.
Enables creating a matching policy interactively using a
Ensures that values that are equivalent, but were entered
in a different format or style, are in fact rendered
A matching policy is prepared in the knowledge base.
A matching policy consists of matching rules that
assess how well one record matches another.
Specify in the rule whether records’ values have to be
an exact match, similar, or prerequisite.
Train your policy by running and tuning each rule
Identify the attributes in your data that are most
significant for matching.
Create domains/composite domains based on your data
Define matching rules.
Single domains: Birth Date, Gender, Email, Phone
Composite Domain Full Name: F. Name, M. Name, L. Name
Composite Domain Full Address: Street, City, State, Country
Similarity: select Similar if field values can be similar. Select Exact if field values must be identical.
Weight: determines the contribution of each domain in the rule to the overall
matching score for two records.
Prerequisite: validates whether field values return a 100% match; otherwise the records are
not considered a match.
Minimum matching score: the threshold at or above which two records are
considered to be a match.
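How these four settings might combine can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DQS implementation: the names DomainRule, matching_score and is_match are hypothetical, weights are assumed to sum to 1.0, scores are on a 0 to 1 scale, and difflib stands in for whatever similarity measure DQS uses internally.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class DomainRule:
    """One domain inside a matching rule (hypothetical structure)."""
    domain: str
    kind: str            # "exact", "similar", or "prerequisite"
    weight: float = 0.0  # share of the overall score; unused for prerequisites


def field_similarity(a: str, b: str, kind: str) -> float:
    # Assumption: difflib's ratio stands in for the real similarity measure.
    if kind == "similar":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    # "exact" and "prerequisite" both require identical values.
    return 1.0 if a == b else 0.0


def matching_score(rec_a: dict, rec_b: dict, rules: list) -> float:
    score = 0.0
    for rule in rules:
        sim = field_similarity(rec_a[rule.domain], rec_b[rule.domain], rule.kind)
        if rule.kind == "prerequisite":
            if sim < 1.0:
                return 0.0  # a failed prerequisite rules out a match
            continue        # prerequisites carry no weight
        score += rule.weight * sim
    return score


def is_match(rec_a: dict, rec_b: dict, rules: list, min_score: float = 0.8) -> bool:
    # Records match when the score is at or above the minimum matching score.
    return matching_score(rec_a, rec_b, rules) >= min_score


rules = [
    DomainRule("gender", "prerequisite"),
    DomainRule("name", "similar", weight=0.6),
    DomainRule("city", "exact", weight=0.4),
]
a = {"gender": "M", "name": "Robert Smith", "city": "Seattle"}
b = {"gender": "M", "name": "Robert Smyth", "city": "Seattle"}
c = {"gender": "F", "name": "Robert Smith", "city": "Seattle"}
print(is_match(a, b, rules), is_match(a, c, rules))
```

Records a and b differ by one letter in the name and share the prerequisite value, so they clear the threshold; record c fails the prerequisite and is rejected regardless of the other fields.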
Domains of type ‘Date’, ‘Integer’ or ‘Decimal’ can be matched using the
‘Similar’ property by assigning a tolerance either in percentage or integer.
Field values that fall within the defined tolerance are considered a match.
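A tolerance check of this kind could look like the sketch below. The helper names are hypothetical, and two points are assumptions rather than documented DQS behavior: the percentage is taken relative to the larger of the two values, and an integer tolerance on dates is interpreted as a number of days.

```python
from datetime import date


def within_tolerance(a: float, b: float, tolerance: float, percent: bool = False) -> bool:
    """Return True when a and b differ by no more than the tolerance."""
    diff = abs(a - b)
    if percent:
        # Assumption: percentage is measured against the larger magnitude.
        limit = max(abs(a), abs(b)) * tolerance / 100.0
    else:
        limit = tolerance
    return diff <= limit


def dates_within_days(d1: date, d2: date, days: int) -> bool:
    # Assumption: an integer tolerance on a Date domain means days.
    return abs((d1 - d2).days) <= days


print(within_tolerance(100, 104, 5, percent=True))
print(dates_within_days(date(1980, 1, 1), date(1980, 1, 3), 2))
```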
Uniqueness: Low
Usage:
• Define as Prerequisite
• Define with lower weights
Domains: Gender, City, State
Uniqueness: High
Usage:
• Define as Similar or Exact
• Define with higher weights
Description: Provides highly identifiable
information and is highly
Domains: Names (First, Last,
Address Line 1
Completeness: Low
Usage: Do not use or define with low weight
Description: High level of missing values
Completeness: High
Usage: Include for matching if the column provides highly identifiable information
Description: Low level of missing values
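To decide where a column falls on the uniqueness and completeness scales, both can be estimated from a sample of the data. The sketch below is a rough illustration, not what DQS profiling computes: profile_column is a hypothetical helper, and the list of tokens treated as missing is an assumption.

```python
def profile_column(values, null_tokens=("", "unknown", "99999")):
    """Return (completeness, uniqueness) for a column, each between 0 and 1."""

    def is_missing(v):
        # Assumption: None and a few placeholder tokens count as missing.
        return v is None or str(v).strip().lower() in null_tokens

    present = [v for v in values if not is_missing(v)]
    total = len(values)
    # Completeness: share of rows that carry a real value.
    completeness = len(present) / total if total else 0.0
    # Uniqueness: share of distinct values among the non-missing rows.
    uniqueness = len(set(present)) / len(present) if present else 0.0
    return completeness, uniqueness


gender = ["M", "F", "M", None]
email = ["a@x.com", "b@x.com", "Unknown"]
print(profile_column(gender))
print(profile_column(email))
```

In this toy sample, gender repeats values while every non-missing email is distinct, which is why a column like email would typically carry more weight in a rule than a column like gender.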
• The Matching Results tab displays statistics for the current and
previous run of a matching rule.
• Restore the previous rule.
The DQS matching system uses the knowledge accumulated in the
knowledge base to propose matching candidates. This knowledge
Synonyms, Syntax Errors and their Leading Value (by domain)
Domain Values and their synonyms and syntax errors are used
by the matching system to find identical or similar records.
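The effect of leading values can be illustrated with a small lookup. This is a sketch only: the LEADING_VALUES table and both helper names are made up for the example, and a real knowledge base holds this mapping per domain rather than in one flat dictionary.

```python
# Hypothetical synonym / syntax-error table: key -> leading value.
LEADING_VALUES = {
    "bob": "Robert",
    "rob": "Robert",
    "robrt": "Robert",   # syntax error (typo)
    "dr": "Doctor",
    "dr.": "Doctor",
}


def to_leading_value(value: str, synonyms: dict) -> str:
    """Map a value to its leading value; unknown values pass through."""
    cleaned = value.strip()
    return synonyms.get(cleaned.lower(), cleaned)


def same_after_normalization(a: str, b: str, synonyms: dict) -> bool:
    # Compare case-insensitively once both sides use their leading value.
    return to_leading_value(a, synonyms).lower() == to_leading_value(b, synonyms).lower()


print(to_leading_value("Bob", LEADING_VALUES))
print(same_after_normalization("Dr.", "Doctor", LEADING_VALUES))
```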
Term-Based Relations (TBR)
TBR improves consistency of data attribute values by
transforming data values to a single form using user-defined
term relations. In matching, TBRs are only applied in-memory
for boosting matching accuracy.
Nulls and Equivalents (“Unknown”, “99999”…)
Manage values that represent missing data by linking to the
‘DQS_Null’ value to assure that they are considered as a
String 1 | String 2 | Similarity Score | Character
175 CLEARBROOK ROAD P.O. BOX 535 | 175 CLEARBROOK ROAD P.O.BOX 535 | 0.92  1.00 | .
1834 E. 42ND STREET | 1834 E. 42ND. ST. | 0.695  0.857 | .
1721 DE KALB AVE, NE | 1721 DE KALB AVE NE | 0.88  1.00 | ,
14538 S. GARFIELD AVE., BLDG. 1-B | 14538 S GARFIELD AVE BLDG 1B | 0.676  0.944 | , . -
#704, SJ Technoville BD, 60-19 | 704 SJ Technoville BD 60 19 | 0.65  1.00 | # , -
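The rows above suggest that special characters alone can pull a similarity score down. A minimal sketch of that effect follows, assuming a simple normalization (replace punctuation with a space, then collapse whitespace) and using Python's difflib as a stand-in measure. It is not the DQS algorithm, so its raw scores will not reproduce the numbers in the table.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Replace every character that is not a letter, digit, or whitespace
    # with a space, then collapse runs of whitespace.
    return " ".join(re.sub(r"[^0-9A-Za-z\s]", " ", text).split())


def similarity(a: str, b: str) -> float:
    # Stand-in similarity measure; DQS uses its own.
    return SequenceMatcher(None, a, b).ratio()


pairs = [
    ("175 CLEARBROOK ROAD P.O. BOX 535", "175 CLEARBROOK ROAD P.O.BOX 535"),
    ("1721 DE KALB AVE, NE", "1721 DE KALB AVE NE"),
    ("#704, SJ Technoville BD, 60-19", "704 SJ Technoville BD 60 19"),
]
for s1, s2 in pairs:
    raw = similarity(s1, s2)
    cleaned = similarity(normalize(s1), normalize(s2))
    print(f"{raw:.3f} -> {cleaned:.3f}")
```

For these three pairs the strings become identical after normalization, so the cleaned score is a perfect match while the raw score is not.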
A Matching project is performed in three steps:
Mapping - map source columns to domains.
Matching - run matching and view the results; it includes additional
functionality such as:
• Reject records
• Filter results by ‘Matched’ & ‘Unmatched’ and by matching
• Display clusters in two different methods (overlapping and
non-overlapping)
Export - export both matching results (clusters) and survivors
In Overlapping clusters a record may appear more than once in various clustered
results. This structure may be harder to read since the same record exists in multiple
In Non-Overlapping clusters, the system unifies clusters containing the same
record. This structure is easier to read as you won't repeat the same observation
(A~B), (B~C)
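Unifying clusters that share a record amounts to taking connected components over the matched pairs. The sketch below uses a small union-find for that; merge_overlapping is a hypothetical helper written for illustration and says nothing about how DQS implements it.

```python
def merge_overlapping(pairs):
    """Merge matched pairs that share a record into non-overlapping clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    # Sort for a stable, readable result.
    return sorted(sorted(cluster) for cluster in clusters.values())


# (A~B) and (B~C) overlap on B, so they collapse into a single cluster.
print(merge_overlapping([("A", "B"), ("B", "C"), ("D", "E")]))
```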
Check the Rejected box to move the records out of the proposed cluster upon
moving to the next page in the activity. Unlike the Cleansing Data Project where
records move between tabs instantly, the rejected records are not removed from
the clusters on the user interface.
DQS Client User Interface
Exported Matching Results
Matching and Survivorship results can be exported to a SQL table,
Excel or CSV file for further analysis or consumption.
in a matching rule
In a Matching Rule Minimum matching score parameter