Bringing OpenClinica Data into SAS
rick.watts@ualberta.ca, 780-248-1170
CRIC and OpenClinica
- CRIC supports a wide variety of studies: 'regulatory' clinical trials and many different types of academic study, of variable size and complexity.
- Investigators design their own CRFs; CRIC has limited control over design strategies and CRF consistency.
- Analysis requirements and data formats vary: SPSS, Stata, SAS, Excel.
- CRIC's preferred data handling tool is SAS.
OpenClinica Export
- OpenClinica exports seem difficult for our users to work with.
- Data structures vary depending on the data content: CRF versions (repeated as extra columns) and group contents (number of repeats).
- Multi-select objects are difficult to handle and must be 'broken' into separate variables for analysis.
- Null values are represented as text in otherwise numeric variables.
The Challenge
We wanted to:
- Produce consistently usable data for minimal up-front effort.
- Get data that could easily be transferred into different formats.
- Produce tall, thin, de-normalized data sets suitable for data management purposes.
- Leverage CRF metadata to add value: dataset labels, variable labels, SAS formats and informats, and SAS special missing values.
The Solution
- Create 'SAS friendly' XML to be read by the XML LIBNAME engine.
- Create a SAS XML map file to assign labels, data types, informats and formats.
- Generate a CNTLIN data set in the XML suitable for use by PROC FORMAT.
- Note: the XML file can also be imported directly into MS Access.
Development Approach
SAS macros or an external utility? The task is high complexity:
- Ensure OpenClinica metadata is translated into legal SAS names.
- Map the OC hierarchy to SAS data sets: CRFs, sections, groups and data items to tables, rows and columns.
- De-duplicate object names.
We had no resource available to develop complex macros.
The Choice
A command-line Java utility:
- A programmer was available (otherwise I would have had to write SAS code myself!)
- Capable development environment
- Portable (Windows / Linux)
- Callable from within SAS
Data Processing
1. Enter connection parameters and study identifier (interactively or on the command line)
2. Connect to Postgres via JDBC
3. Read the study metadata
4. Manipulate the metadata
5. Write the map file
6. Read the study data
7. Write the data file
Metadata Manipulations: Legalize Names
- SAS names are at most 32 characters and must start with a letter or underscore; format names cannot end in a number.
- De-duplicate names: multiple CRFs may contain the same section and response option names, so duplicate names have numbers and underscores appended.
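The legalization and de-duplication rules above can be sketched as follows. This is a hypothetical Python illustration of the rules the slides describe, not the utility's actual Java code:

```python
import re

def legalize_sas_name(name, is_format=False, max_len=32):
    """Replace illegal characters, force a legal first character,
    truncate to max_len, and keep format names from ending in a digit."""
    name = re.sub(r'[^A-Za-z0-9_]', '_', name)
    if not re.match(r'[A-Za-z_]', name):      # must not start with a digit
        name = '_' + name
    name = name[:max_len]
    if is_format and name[-1].isdigit():      # format names cannot end in a number
        name = name[:max_len - 1] + '_'
    return name

def deduplicate(names):
    """Append _1, _2, ... to repeated names, as the utility does when
    several CRFs reuse the same section or response option name."""
    seen, result = {}, []
    for n in names:
        count = seen.get(n, 0)
        seen[n] = count + 1
        result.append(n if count == 0 else f"{n}_{count}")
    return result
```

For example, `legalize_sas_name('2nd visit (date)')` yields `'_2nd_visit__date_'`, and a format name ending in a digit gains a trailing underscore.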
Metadata Manipulations: CRFs
- There is no 'top level' mapping between CRFs and data sets.
- CRF section -> SAS data set: CRF sections contain logically grouped data; CRFs may not!
- CRFs containing multiple sections result in multiple output data sets.
- Every data item contained within a section is output to the same data set.
- Section label -> dataset name; section title -> dataset label.
Metadata Manipulations: Groups -> Rows
- Ungrouped section data is repeated in each row.
- Each group repeat becomes a separate row in the data set.
- Rows are numbered to provide a unique key based on their order within the group.
- Multiple groups contained within the same section are merged based on order within the groups; where groups contain unequal numbers of rows, missing values result.
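The merge behaviour described above can be sketched in Python (a simplified illustration, assuming each group arrives as a list of per-repeat dicts; the real utility works row-by-row against the database):

```python
from itertools import zip_longest

def merge_groups(*groups):
    """Merge repeating groups from the same CRF section into rows.
    Repeats are matched purely by their order within each group, and a
    row number provides the unique key. Shorter groups contribute no
    values (i.e. missing values) to the trailing rows."""
    rows = []
    for i, parts in enumerate(zip_longest(*groups), start=1):
        row = {'row_num': i}
        for part in parts:
            row.update(part or {})   # None pads a group that ran out of repeats
        rows.append(row)
    return rows
```

With two repeats of medications but only one dose recorded, the second row simply lacks a dose value, matching the slide's "missing values result".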
Metadata Manipulations: CRF Items -> Dataset Variables
- item_name -> variable name; description_label -> variable label.
- Calculate the length of character variables: SAS has no support for VARCHARs, and explicitly specifying variable length saves considerable space on disk.
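The length calculation amounts to scanning the data for the longest observed value per character variable. A minimal sketch (hypothetical helper, with SAS's minimum length of 1 as the floor):

```python
def char_lengths(rows, char_vars):
    """Work out an explicit SAS LENGTH for each character variable:
    the longest value observed in the data, with a minimum of 1."""
    lengths = {v: 1 for v in char_vars}
    for row in rows:
        for v in char_vars:
            lengths[v] = max(lengths[v], len(row.get(v) or ''))
    return lengths
```

These lengths would then be written into the XML map file so the variables are created at exactly the needed width.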
Multi-select and Checkbox Items
- A new column is created for each response value.
- Column names are based on item_name.
- Columns are labeled based on item_label and the response option value.
- Columns contain 1 or 0 to indicate selected or unselected.
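The expansion can be sketched as below. The exact column-naming scheme (item_name plus option code) is an assumption; the slides only say names are "based on item_name":

```python
def expand_multiselect(item_name, selected_codes, option_codes):
    """Turn one multi-select item into one 0/1 indicator column per
    response option, ready for analysis without further splitting."""
    return {f'{item_name}_{code}': int(code in selected_codes)
            for code in option_codes}
```

For instance, a `symptoms` item with options 1-3 and values 1 and 3 ticked becomes three indicator columns, with `symptoms_2` set to 0.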
Response Options
- Response option lists become SAS formats and informats.
- Format names are created from the CRF item's response_label, then legalized and de-duplicated.
- If separate CRFs contain identical response option lists, only one format results.
- Formats and informats are written to the XML as a new data table, which is used as a CNTLIN data set for PROC FORMAT.
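The "identical lists share one format" rule is essentially a lookup keyed on the canonical option list. A hypothetical sketch (item tuples and the use of response_label as the raw format name are assumptions consistent with the slides):

```python
def assign_formats(items):
    """Give each distinct response-option list a single format name,
    so identical lists in separate CRFs share one format.
    items: (item_name, response_label, options_dict) tuples."""
    formats = {}   # canonical option list -> format name
    mapping = {}   # item name -> format name
    for item_name, response_label, options in items:
        key = tuple(sorted(options.items()))
        if key not in formats:
            # in practice the name would also be legalized/de-duplicated
            formats[key] = response_label
        mapping[item_name] = formats[key]
    return mapping
```

Two items on different CRFs with the same yes/no option list both map to the first format name seen, so only one format is emitted to the CNTLIN table.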
Missing Values: CRF Dates
Informats are created to read numeric data and handle OpenClinica null values.

  proc format;
    invalue crfdate
      'ASKU' = .k
      'NA'   = .a
      'NASK' = .d
      'NI'   = .i
      'NP'   = .p
      'OTH'  = .o
      'UNK'  = .u
      other  = [mmddyy10.];
  run;
Missing Values: Numeric Response Options

  proc format;
    invalue bestnull
      'ASKU' = .k
      'NA'   = .a
      'NASK' = .d
      'NI'   = .i
      'NP'   = .p
      'OTH'  = .o
      'UNK'  = .u
      other  = [best10.];
  run;
Missing Values: Response Options
Formats are created for CRF data.

  proc format;
    value yesno
      0  = 'No'
      1  = 'Yes'
      .k = 'ASKU'
      .a = 'NA'
      .d = 'NASK'
      .i = 'NI'
      .p = 'NP'
      .o = 'OTH'
      .u = 'UNK';
  run;
Missing Values: Dates

  proc format;
    value crfdate
      .k = 'ASKU'
      .a = 'NA'
      .d = 'NASK'
      .i = 'NI'
      .p = 'NP'
      .o = 'OTH'
      .u = 'UNK'
      other = [date9.];
  run;
Missing Values: Numeric Data

  proc format;
    value bestnull
      .k = 'ASKU'
      .a = 'NA'
      .d = 'NASK'
      .i = 'NI'
      .p = 'NP'
      .o = 'OTH'
      .u = 'UNK'
      other = [best10.];
  run;
Data Set Output: CRF Data
- One data set per CRF section.
- Each row contains: study ID, site ID, subject ID, study event name, event start and end date, CRF name, and CRF version.
Output Data Sets
- Subject data: list of subjects including site, secondary ID, group, etc.
- Event data: list of subjects' study events including start date, end date and status.
- CRF status: list of subject CRFs including event details, CRF version, creation date, completion date and status.
- Discrepancies
Output Data Sets (continued)
- Data for removed subjects is not exported.
- PHI data remains encrypted.
Interactive Execution

  C:\> java -jar export.jar
  ----------------------------------------
               Export Output:
  ----------------------------------------
            MAP FILE: export.map.xml
         EXPORT FILE: export.xml
  ----------------------------------------
  Postgresql driver loaded

  Enter Database url (default: localhost):
  Database port (default: 5432):
  Database name (default: openclinica):
  username (default: clinica):
  password:

  Enter Export file name (default: derived from study):
  Enter Map file name (default: derived from study):
Interactive Execution (continued)

  Successful connection to database openclinica on jdbc:postgresql://localhost:5432/

  Please choose a study:
  ----------------------
   1) Study1
   2) Study2
   3) Study3
   4) Study4
  ==> 1

  Retrieving study metadata
  Creating subject table
  Writing formats to .xml file
  Writing subjects to .xml file
  Retrieving study item data
  Writing study item data to file
  Complete
  Files generated:
    study1.map.xml
    Study1.xml
Command Line Options
- Command line options may be used rather than prompts.
- Options include: host, database, ID and password; study OID; file names; suppression of the map file; and creation of 'SPSS friendly' SAS data sets (minimal formatting allows data sets to be exported to SPSS using PROC EXPORT).
- Command line options allow the utility to be executed from within SAS.
SAS Code: Define Libraries

  libname ocdata92 xml92 "data_file.xml" xmlmap="map_file.map" access=readonly;
  libname library  "c:\project\fmt";
  libname studylib "c:\project\data";
SAS Code: Execute the Import

  %let scommand  = java -Xmx256m -jar c:\export\export.jar;
  %let shost     = -h 10.11.12.13;
  %let sport     = -p 5432;
  %let sstudy    = -soid S_STDY1234;
  %let sdatabase = -D openclinica;
  %let suser     = -U dbuserid;
  %let spswd     = -P password;
  %let spss      = ;

  x "&scommand &shost &sport &sstudy &sdatabase &suser &spswd &smapFile &sdataFile &spss";
SAS Code: Create the Format Catalog from the XML

  proc sort data=ocdata92.fmtlib out=work.fmtlib;
    by fmtname type start;
  run;

  proc format cntlin=work.fmtlib library=library fmtlib;
  run;
SAS Code: Copy the Data Sets

  proc datasets library=ocdata92;
    copy out=studylib;
    exclude fmtlib;
  quit;
Do It!
Import into SAS. If we have time: XML structures, import into Access, import into Excel.
Contact
Rick Watts
rick.watts@ualberta.ca
780-248-1170
