2. CRIC and OpenClinica
CRIC supports a wide variety of studies
- ‘Regulatory’ clinical trials
- Many different types of academic study
- Variable size and complexity
Investigators design their own CRFs
- CRIC has limited control over design strategies and CRF consistency.
Analysis requirements and data formats vary
- SPSS, Stata, SAS, Excel
- CRIC’s preferred data handling tool is SAS.

3. OpenClinica Export
OpenClinica exports seem difficult for our users to work with.
- Data structures vary depending on the data content.
  - CRF versions (repeat as extra columns)
  - Group contents (number of repeats)
- Multi-select objects difficult to handle.
  - Must be ‘broken’ into separate variables for analysis.
- Null values represented as text in otherwise numeric variables

4. The Challenge
We wanted to:
- Produce consistently usable data for minimal up-front effort.
- Get data that could easily be transferred into different formats.
- Produce tall, thin, de-normalized data sets suitable for data management purposes.
- Leverage CRF metadata to add value:
  - Dataset labels
  - Variable labels
  - SAS formats and informats
  - SAS special missing values

5. The Solution
- Create ‘SAS friendly’ XML to be read by the XML Libname engine.
- Create a SAS XML Map file to assign labels, data types, informats and formats.
- Generate a CNTLIN data set in the XML suitable for use by PROC FORMAT.
Note: The XML file can also be imported directly into MS Access.

6. Development Approach
SAS macros or external utility?
- High complexity
  - Ensure OpenClinica metadata is translated into legal SAS names.
  - Map OC hierarchy to SAS data sets: CRFs, sections, groups and data items to tables, rows and columns.
  - De-duplicate object names
- No resource to develop complex macros

7. The Choice
Command Line Java Utility
- Programmer available (I would have to write SAS code myself!)
- Capable development environment
- Portable (Windows / Linux)
- Callable from within SAS

8. Data Processing
- Enter connection parameters and study identifier (interactively or command line)
- Connect to Postgres via JDBC
- Read study metadata
- Manipulate the metadata
- Write map file
- Read study data
- Write data file

9. Metadata Manipulations
Legalize Names
- SAS names <= 32 characters
- Must start with a letter or underscore
- Format names cannot end in a number
De-duplicate names
- Multiple CRFs may contain the same section and response option names.
- Duplicate names have numbers and underscores appended.

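The naming rules on this slide can be sketched in Java, the language of the utility. This is a minimal illustration and not the utility's actual code: the class `SasNames`, its method names, the choice of "_" as the replacement character, and the exact suffix scheme ("_1", or "_1_" for format names) are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: legalizes and de-duplicates names using the rules on the slide. */
final class SasNames {
    static final int MAX_LEN = 32; // SAS name length limit quoted on the slide

    // Upper-cased names already handed out (SAS names are case-insensitive).
    private final Set<String> used = new HashSet<>();

    /** Replaces illegal characters, fixes the first character and truncates. */
    static String legalize(String raw, boolean isFormatName) {
        // Assumption: every character outside [A-Za-z0-9_] becomes an underscore.
        String name = raw.trim().replaceAll("[^A-Za-z0-9_]", "_");
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "_" + name; // must start with a letter or underscore
        }
        if (name.length() > MAX_LEN) {
            name = name.substring(0, MAX_LEN);
        }
        if (isFormatName && Character.isDigit(name.charAt(name.length() - 1))) {
            // Format names cannot end in a number: make room, then append "_".
            if (name.length() == MAX_LEN) {
                name = name.substring(0, MAX_LEN - 1);
            }
            name = name + "_";
        }
        return name;
    }

    /** Returns a legal name not issued before, appending a numbered suffix if needed. */
    String unique(String raw, boolean isFormatName) {
        String base = legalize(raw, isFormatName);
        String candidate = base;
        int n = 1;
        while (!used.add(candidate.toUpperCase(Locale.ROOT))) {
            // Hypothetical suffix scheme: "_1", "_2", ... ("_1_" for format names).
            String suffix = "_" + n + (isFormatName ? "_" : "");
            n++;
            int keep = Math.min(base.length(), MAX_LEN - suffix.length());
            candidate = base.substring(0, keep) + suffix;
        }
        return candidate;
    }

    public static void main(String[] args) {
        SasNames names = new SasNames();
        System.out.println(names.unique("Visit Date", false));
        System.out.println(names.unique("visit date", false));
        System.out.println(names.unique("yes_no_1", true));
    }
}
```

One `SasNames` instance would be kept per name space (for example one for data set names and one for format names), so that names are only de-duplicated against others of the same kind.
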
10. Metadata Manipulations
CRFs
- No ‘top level’ mapping between CRFs and data sets.
CRF Section -> SAS data set
- CRF sections contain logically grouped data – CRFs may not!
- CRFs containing multiple sections result in multiple output data sets.
- Every data item contained within a section is output to the same data set.
- Section label -> dataset name
- Section title -> dataset label

11. Metadata Manipulations
Groups -> Rows
- Ungrouped section data repeated in each row
- Each repeat becomes a separate row in the data set.
- Rows are numbered to provide a unique key based on their order within the group.
- Multiple groups contained within the same section are merged based on order within the groups.
- Where groups contain unequal numbers of rows, missing values result.

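The merge described on this slide can be illustrated with a small Java sketch. It is not the utility's code: the class `GroupMerge`, the `row_number` column name, and the use of maps of strings for rows are assumptions chosen to keep the example short. It also assumes that the first row of a group lists all of that group's columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: merges repeating groups of one section by row order. */
final class GroupMerge {

    /**
     * @param ungrouped section items that are not part of any group
     * @param groups    each group is a list of repeats (column name -> value)
     * @return one output row per repeat position, shorter groups padded with nulls
     */
    static List<Map<String, String>> merge(Map<String, String> ungrouped,
                                           List<List<Map<String, String>>> groups) {
        int rowCount = 1; // assumption: a section without repeats still gives one row
        for (List<Map<String, String>> group : groups) {
            rowCount = Math.max(rowCount, group.size());
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            // Ungrouped data is repeated in every row.
            Map<String, String> row = new LinkedHashMap<>(ungrouped);
            row.put("row_number", String.valueOf(i + 1)); // hypothetical key column
            for (List<Map<String, String>> group : groups) {
                if (i < group.size()) {
                    row.putAll(group.get(i));
                } else if (!group.isEmpty()) {
                    // This group has fewer repeats: its columns are missing here.
                    for (String column : group.get(0).keySet()) {
                        row.put(column, null);
                    }
                }
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> meds = List.of(Map.of("med", "aspirin"), Map.of("med", "ibuprofen"));
        List<Map<String, String>> events = List.of(Map.of("ae", "headache"));
        for (Map<String, String> row : merge(Map.of("visit_date", "01/02/2011"), List.of(meds, events))) {
            System.out.println(row);
        }
    }
}
```

In the example in `main`, the second output row carries the second medication but a missing adverse event, which is the unequal-rows case from the last bullet.
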
12. Metadata Manipulations
CRF items -> dataset variables
- Item_name -> variable name
- Description_label -> variable label
- Calculate length of character variables
  - SAS has no support for VARCHARs.
  - Explicitly specifying variable length saves considerable space on disk.

13. Multi-select and Checkbox items
- A new column is created for each response value.
- Column names based on item_name
- Columns labeled based on item_label and response option value.
- Columns contain 1 or 0 to indicate selected or unselected.

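As a rough illustration of this expansion, the Java sketch below turns one stored multi-select value into a set of 1/0 columns. It assumes the stored value is a comma-separated list of selected option values, and the class `MultiSelect` and the column naming pattern `item_name` + "_" + option value are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: expands a multi-select item into one 1/0 column per option. */
final class MultiSelect {

    /**
     * @param itemName     CRF item name, used as the column name prefix
     * @param optionValues all response option values defined for the item
     * @param stored       stored value, assumed comma-separated (may be null or empty)
     */
    static Map<String, Integer> expand(String itemName, List<String> optionValues, String stored) {
        Set<String> selected = new HashSet<>();
        if (stored != null) {
            for (String value : stored.split(",")) {
                if (!value.trim().isEmpty()) {
                    selected.add(value.trim());
                }
            }
        }
        // Keep the option order so the columns come out in CRF order.
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String option : optionValues) {
            columns.put(itemName + "_" + option, selected.contains(option) ? 1 : 0);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(expand("symptoms", List.of("1", "2", "3"), "1,3"));
    }
}
```

In practice the generated column names would still need to pass through the name legalization step, since an option value may contain characters that are not legal in a SAS name.
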
14. Response Options
- Response option lists become SAS formats and informats.
- Format names created from CRF item’s response_label.
- Format names legalized and de-duplicated.
- If separate CRFs contain identical response option lists only one format results.
- Formats and informats are written to the XML as a new data table.
- This is used as a CNTLIN data set for PROC FORMAT.

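A possible shape for this step is sketched below in Java. It is an assumption-laden illustration rather than the utility's implementation: the class `FormatRegistry` and its methods are invented for the example, format names are assumed to be legalized already, and every format is written with TYPE "N" (numeric format). The row layout uses the standard CNTLIN variables FMTNAME, TYPE, START and LABEL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: one format per distinct response option list, as CNTLIN rows. */
final class FormatRegistry {
    // Option list (value -> label) mapped to the format name that already covers it.
    private final Map<Map<String, String>, String> nameByOptions = new HashMap<>();
    // Each row is {FMTNAME, TYPE, START, LABEL}.
    private final List<String[]> rows = new ArrayList<>();

    /** Registers an option list and returns the format name to attach to the variable. */
    String register(String formatName, Map<String, String> options) {
        Map<String, String> key = new LinkedHashMap<>(options); // defensive copy
        String existing = nameByOptions.get(key);
        if (existing != null) {
            return existing; // identical list seen before: reuse its format
        }
        nameByOptions.put(key, formatName);
        for (Map.Entry<String, String> option : key.entrySet()) {
            rows.add(new String[] {formatName, "N", option.getKey(), option.getValue()});
        }
        return formatName;
    }

    List<String[]> cntlinRows() {
        return rows;
    }

    public static void main(String[] args) {
        FormatRegistry registry = new FormatRegistry();
        Map<String, String> yesNo = new LinkedHashMap<>();
        yesNo.put("1", "Yes");
        yesNo.put("0", "No");
        System.out.println(registry.register("YESNO", yesNo));
        System.out.println(registry.register("YESNO_B", yesNo));
        for (String[] row : registry.cntlinRows()) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Map equality ignores entry order, so two CRFs that list the same options in a different order still share one format in this sketch.
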
15. Missing Values
Informats are created to read numeric data and handle OpenClinica null values.
CRF Dates
    proc format;
      invalue crfdate
        'ASKU' = .k
        'NA'   = .a
        'NASK' = .d
        'NI'   = .i
        'NP'   = .p
        'OTH'  = .o
        'UNK'  = .u
        other  = [mmddyy10.];
    run;

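Because the same null-value codes apply to every numeric informat, the mapping could be held in one table and reused. The Java sketch below generates an INVALUE statement like the one on this slide for any fallback informat. It is only an illustration: the class `NullInformat` is hypothetical, and the utility itself is described as writing informats to a CNTLIN data set rather than as PROC FORMAT source text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: OpenClinica null codes mapped to SAS special missing values. */
final class NullInformat {
    // Code -> special missing value, copied from the crfdate informat on the slide.
    static final Map<String, String> NULL_CODES = new LinkedHashMap<>();
    static {
        NULL_CODES.put("ASKU", ".k");
        NULL_CODES.put("NA", ".a");
        NULL_CODES.put("NASK", ".d");
        NULL_CODES.put("NI", ".i");
        NULL_CODES.put("NP", ".p");
        NULL_CODES.put("OTH", ".o");
        NULL_CODES.put("UNK", ".u");
    }

    /** Builds an INVALUE statement that falls back to the given informat. */
    static String invalue(String informatName, String fallbackInformat) {
        StringBuilder sb = new StringBuilder("invalue " + informatName);
        for (Map.Entry<String, String> code : NULL_CODES.entrySet()) {
            sb.append("\n  '").append(code.getKey()).append("' = ").append(code.getValue());
        }
        sb.append("\n  other = [").append(fallbackInformat).append("];");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(invalue("crfdate", "mmddyy10."));
    }
}
```
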
20. Data Set Output
CRF Data
- One data set per CRF section
- Each row contains:
  - Study ID
  - Site ID
  - Subject ID
  - Study event name
  - Event start and end date
  - CRF Name
  - CRF Version

21. Output Data Sets
- Subject Data: list of subjects including site, secondary ID, group, etc.
- Event Data: list of subjects' study events including start date, end date and status.
- CRF Status: list of subject CRFs including event details, CRF version, creation date, completion date and status.
- Discrepancies

22. Output Data Sets
- Data for removed subjects is not exported.
- PHI data remains encrypted.

23. Interactive Execution
    C:> java -jar export.jar
    ----------------------------------------
    Export Output:
    ----------------------------------------
    MAP FILE: export.map.xml
    EXPORT FILE: export.xml
    ----------------------------------------
    Postgresql driver loaded
    Enter Database url (default: localhost):
    Database port (default: 5432):
    Database name (default: openclinica):
    username (default: clinica):
    password:
    Enter Export file name (default: derived from study):
    Enter Map file name (default: derived from study):

24. Interactive Execution
    Successful connection to database openclinica on jdbc:postgresql://localhost:5432/
    Please choose a study:
    ----------------------
    1) Study1
    2) Study2
    3) Study3
    4) Study4
    ==> 1
    Retrieving study metadata
    Creating subject table
    Writing formats to .xml file
    Writing subjects to .xml file
    Retrieving study item data
    Writing study item data to file
    Complete
    Files generated:
    study1.map.xml
    Study1.xml

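The prompts in the interactive session each offer a default. A minimal Java sketch of how blank answers might fall back to those defaults and be combined into a PostgreSQL JDBC URL is shown below. The class `ConnectionSettings` and its methods are hypothetical, and including the database name in the URL is an assumption (the session output above shows the URL only up to the port).

```java
/** Illustrative only: applies the prompt defaults and builds a PostgreSQL JDBC URL. */
final class ConnectionSettings {
    // Defaults as shown in the interactive prompts.
    static final String DEFAULT_HOST = "localhost";
    static final String DEFAULT_PORT = "5432";
    static final String DEFAULT_DATABASE = "openclinica";

    /** Returns the trimmed answer, or the default when the answer is null or blank. */
    static String orDefault(String answer, String defaultValue) {
        return (answer == null || answer.trim().isEmpty()) ? defaultValue : answer.trim();
    }

    /** Builds a URL of the form jdbc:postgresql://host:port/database. */
    static String jdbcUrl(String host, String port, String database) {
        return "jdbc:postgresql://" + orDefault(host, DEFAULT_HOST)
                + ":" + orDefault(port, DEFAULT_PORT)
                + "/" + orDefault(database, DEFAULT_DATABASE);
    }

    public static void main(String[] args) {
        // Pressing Enter at every prompt is modelled here as empty strings.
        System.out.println(jdbcUrl("", "", ""));
    }
}
```

The resulting string is what would be passed to `java.sql.DriverManager.getConnection` together with the user name and password.
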
25. Command Line Options
Command line options may be used rather than prompts. Options include:
- Host, database, ID and password
- Study OID
- File names
- Suppression of map file
- Creation of ‘SPSS friendly’ SAS data sets
  - Minimal formatting allows data sets to be exported to SPSS using PROC EXPORT.
Command line options allow the utility to be executed from within SAS.

28. SAS Code
Create the Format Catalog from the XML
    proc sort data=ocdata92.fmtlib out=work.fmtlib;
      by fmtname type start;
    run;
    proc format cntlin=work.fmtlib library=library fmtlib;
    run;

29. SAS Code
Copy the Data Sets
    proc datasets library=ocdata92;
      copy out=studylib;
      exclude fmtlib;
    quit;

30. Do It!
- Import into SAS
If we have time:
- XML Structures
- Import into Access
- Import into Excel