SESUG Paper 153-2018
Obtain better data accuracy using reference tables
Kiran Venna, SMACT Works, Inc.
Suryakiran Pothuraju, Unify solutions Inc.
Presenter
Suryakiran Pothuraju is a SAS Certified Base and Advanced
Programmer. He uses SAS in his daily job for analysis,
data management, and reporting. @Suryakiran is also
very active in the SAS Community and currently holds the PROC
STAR role.
Suryakiran Pothuraju
SAS Programmer, Unify solutions Inc
Agenda
• Motivation
• Introduction
• Metadata check for each table using
appropriate reference table.
• Data check for many tables using a single
reference table.
Motivation
• Limitations of Manual Checking
• Data Integrity
Introduction
• Data accuracy is a very important
component of data quality.
• Inaccurate data often leads to
ambiguous insights.
• Data accuracy has two main
components: data in the right form and data with the
right values.
Introduction – Contd..
• Reference tables can improve data
accuracy.
• Metadata checks and data checks with the
help of reference tables are the topics of this
presentation.
Metadata check for each table using
appropriate reference table
Metadata Check
• Creating reference table with appropriate
metadata.
• Comparing column names of source
tables with reference table.
• Validating length, position and data type of
columns in source table with reference
table.
• Reporting results of metadata check.
Creating reference table with
appropriate metadata
proc sql;
create table ref_paylist
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num,
Hired num informat=date7. format=date7.);
quit;
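To try the metadata checks on the next slides, it helps to have a test table that differs from the reference table in known ways. The sketch below is an assumption and not part of the original slides: it creates a hypothetical work.test_paylist1 in which Jobcode is one character longer than in ref_paylist and an extra column, Dept, is appended at the end.

```sas
/* Hypothetical test table (not from the original slides). */
proc sql;
   create table test_paylist1
      (IdNum   char(4),
       Gender  char(1),
       Jobcode char(4),   /* ref_paylist declares char(3) */
       Salary  num,
       Birth   num,
       Hired   num informat=date7. format=date7.,
       Dept    char(3));  /* extra column, absent from ref_paylist */
quit;
```

With this table, the column-name comparison should report Dept and the attribute comparison should report Jobcode. Because the extra column is placed last, the positions of the other columns stay the same.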
Comparing column names of external
file with reference table
proc sql noprint;
/* Store either the unmatched column names or a success message in a macro variable */
select
case when name is missing
then "ALL COLUMN NAMES MATCHED"
else name
end into :COLUMN_NAME_MATCHORNOMATCH separated by ','
from
(select coalesce(a.name, b.name) as name, count(*)
/* columns of the reference table */
from (select name
from dictionary.columns
where upcase(libname) = upcase('work')
and upcase(memname) = upcase('ref_paylist')) a
full join
/* columns of the table being checked */
(select name
from dictionary.columns
where upcase(libname) = upcase('work')
and upcase(memname) = upcase('test_paylist1')) b
on upcase(a.name) = upcase(b.name)
/* keep only names that exist in one table but not in the other */
where a.name is missing
or b.name is missing);
quit;
Validating attributes of columns of
source table with reference table
proc sql;
/* Store either the columns whose attributes differ or a success message in a macro variable */
select
case when name is missing
then "ALL COLUMN Type position length MATCHED"
else name
end into :COLUMN_Values_MATCHorNOMATCH separated by ','
from
(select coalesce(a.name, b.name) as name, count(*)
from (select name, varnum, type, length
from dictionary.columns
where upcase(libname) = upcase('work')
and upcase(memname) = upcase('ref_paylist')) a
inner join
(select name, varnum, type, length
from dictionary.columns
where upcase(libname) = upcase('work')
and upcase(memname) = upcase('test_paylist1')) b
on upcase(a.name) = upcase(b.name)
/* keep only columns whose position, type, or length differs */
where a.varnum ne b.varnum
or a.type ne b.type
or a.length ne b.length);
quit;
Reporting results of metadata check
data temp;
length column $500;
/* Row 1: result of the column-name comparison */
if index(strip("&COLUMN_NAME_MATCHORNOMATCH"), "ALL COLUMN") > 0
then column = "&COLUMN_NAME_MATCHORNOMATCH";
else column = catx(' ', "Unmatched columns is/are", "&COLUMN_NAME_MATCHORNOMATCH");
output;
/* Row 2: result of the type, length, and position comparison */
if index(strip("&COLUMN_Values_MATCHorNOMATCH"), "ALL COLUMN") > 0
then column = "&COLUMN_Values_MATCHorNOMATCH";
else column = catx(' ', "Unmatched columns for type or length or position is/are", "&COLUMN_Values_MATCHorNOMATCH");
output;
run;
Reporting results of metadata check-
contd..
Proc report data =temp nowd;
column column;
define column/display;
compute column;
if index(column, "ALL COLUMN") = 0 then call define
(_col_, "style", "style={background = red}");
else call define
(_col_, "style", "style={background = green}");
endcomp;
run;
Data check for many tables using a
single reference table.
Data check
• For the metadata check, a reference table is
needed for each table.
• For the data check, a single reference table
can be used to check many external files
or source tables.
• First step: creating and populating the
reference table.
• Second step: capturing bad records.
Creating and populating reference table
proc sql;
create table referencetable
(column_name char(55),
column_value char(20),
column_flag char(1),
column_bucket num);
quit;
proc sql;
insert into referencetable values('Gender', 'F', 'Y',1);
insert into referencetable values('Gender', 'M', 'Y',1);
insert into referencetable values('Jobcode', '123', 'Y',1);
insert into referencetable values('Jobcode', '111', 'Y',1);
insert into referencetable values('Jobcode', '121', 'Y',1);
insert into referencetable values('Salary', '1000', 'Y',2);
insert into referencetable values('Birth', '/[0-9]{8}/', 'Y',3);
quit;
Bucketing gives flexibility: the column_bucket value selects the kind of check.
Bucket 1 is an equal-to comparison, bucket 2 is a greater-than-or-equal-to
comparison, and bucket 3 is a pattern match.
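The bucket queries on the following slides read a table named test_playlist, whose contents the slides do not show. The sketch below builds a small hypothetical version of it so the queries can be tried. The rows are invented for illustration, and each bad row is meant to trip exactly one of the checks shown on the slides.

```sas
/* Hypothetical sample data (not from the original slides). */
data test_playlist;
   input IdNum $ Gender $ Jobcode $ Salary Birth;
   datalines;
1001 F 123 2500 19800115
1002 X 111 1500 19751230
1003 M 121 900 19900704
1004 M 123 3000 1988
;
run;
```

Against the reference table populated above, row 1002 should be caught by the bucket 1 check on Gender (X is not a listed value), row 1003 by the bucket 2 check on Salary (900 is below 1000), and row 1004 by the bucket 3 check on Birth (1988 is not eight digits).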
Capturing bad records in Bucket 1
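title "code for checking first bucket (equal to comparison) to capture inaccurate records";
proc sql;
select distinct a.idnum
, a.Gender
/* a row with no match in the reference table is flagged "N";    */
/* testing b.column_value directly also catches a blank Gender,  */
/* because SAS treats missing = missing as true                  */
, case when b.column_value is not missing
then b.column_flag
else "N"
end as gender_code_flag
from test_playlist a
left join referencetable b
on a.Gender = b.column_value
/* compare only against the values listed for Gender */
and b.column_name = "Gender"
where calculated gender_code_flag = "N";
quit;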
Capturing bad records in Bucket 2
title "code for checking second bucket (greater than or equal to) to capture inaccurate records";
proc sql;
select idnum
, salary
, case when Salary ge column_value then column_flag
else "N"
end as Salary_flag
from
(select idnum, Salary from test_playlist) a
cross join
(select input(column_value,8.) as column_value, column_flag
from referencetable
where column_name = "Salary") b
where calculated Salary_flag = "N";
quit;
Capturing bad records in Bucket 3
title "code for checking third bucket (pattern match) to capture inaccurate records";
proc sql;
select idnum
, Birth
, column_value
/* trim() drops the trailing blanks that pad column_value before it is used as a regular expression */
, case when prxmatch(trim(column_value), trim(put(Birth,8.))) > 0
then column_flag
else "N"
end as Birth_flag
from test_playlist a
left join
referencetable b
on trim(b.column_name) = "Birth"
where calculated Birth_flag = 'N';
quit;
All three bucket checks can easily be wrapped in a macro that analyzes
every variable assigned to each bucket.
Macro (partial) to analyze multiple
variables in each bucket
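%macro data_check(ref=,test=);
%local i ref_col1 curr_col1 forflag forflag1 final_flag;
/* Quoted, comma-separated list of the bucket 1 column names in the reference table */
proc sql noprint;
select distinct cats('"', column_name, '"') into :ref_col1 separated by ','
from &ref
where column_bucket = 1;
quit;
/* Bucket 1 columns that exist in the test table; &test must be a two-level name (libref.table) */
proc sql noprint;
select distinct name into :curr_col1 separated by ' '
from dictionary.columns
where catx('.', libname, memname) = upcase("&test")
and name in (&ref_col1);
quit;
/* Loop over the matching columns and build the flag name for each one */
%do i = 1 %to %sysfunc(countw(&curr_col1, %str( )));
%let val&i = %scan(&curr_col1, &i);
%let forflag = &&val&i;
%let forflag1 = _flag;
%let final_flag = &forflag&forflag1;
%put &final_flag;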
Macro (partial) to analyze multiple
variables- contd..
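title "Inaccurate records captured for variable &&val&i in bucket 1";
proc sql;
select a.idnum
, a.&&val&i
/* a row with no match in the reference table is flagged "N" */
, case when b.column_value is not missing
then b.column_flag
else "N"
end as &final_flag
from &test a
left join &ref b
on a.&&val&i = b.column_value
/* compare only against the values listed for this variable */
and b.column_name = "&&val&i"
where calculated &final_flag = "N";
quit;
%end;
%mend data_check;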
All three buckets can be combined in a single
macro, which is shown in the appendix of the
paper.
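A call to the partial macro might look like the sketch below. The table names are assumptions carried over from the earlier slides. Because the macro compares &test with catx('.', libname, memname), the test table has to be passed as a two-level name. Upper or lower case does not matter, since the macro applies upcase() to it.

```sas
/* Hypothetical call; both table names are assumptions. */
%data_check(ref=work.referencetable, test=work.test_playlist);
```

The ref= value is used directly in a FROM clause, so a one-level name such as referencetable would also resolve there. The two-level form is used here only for symmetry.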
Conclusion
• Reference tables help find data inaccuracies before
they make their way into the target table.
• Manually checking data is often time-consuming
and inaccurate.
• A reference table can be easily rebuilt or
enhanced as requirements change, without
making any changes to the target tables.
Acknowledgements
Thanks to Vinay Srikanth Marapaka, Vikas
Chhabra, Charannag Devarapalli, Ravi
Ganesana, Nagaraju Dande, Sarika Elisetti,
and Anvesh Reddy Perati for helping me with
proofreading and for their valuable
suggestions.
