Bringing OpenClinica Data into SAS

Rick Watts, rick.watts@ualberta.ca, 780-248-1170

CRIC and OpenClinica
CRIC supports a wide variety of studies:
- ‘Regulatory’ clinical trials
- Many different types of academic study
- Studies of variable size and complexity
Investigators design their own CRFs, so CRIC has limited control over design strategies and CRF consistency.
Analysis requirements and data formats vary: SPSS, Stata, SAS, Excel.
CRIC's preferred data handling tool is SAS.

OpenClinica Export
OpenClinica exports seem difficult for our users to work with:
- Data structures vary depending on the data content:
  - CRF versions (repeated as extra columns)
  - Group contents (number of repeats)
- Multi-select objects are difficult to handle; they must be ‘broken’ into separate variables for analysis.
- Null values are represented as text in otherwise numeric variables.

The Challenge
We wanted to:
- Produce consistently usable data with minimal up-front effort.
- Get data that could easily be transferred into different formats.
- Produce tall, thin, de-normalized data sets suitable for data management purposes.
- Leverage CRF metadata to add value: data set labels, variable labels, SAS formats and informats, and SAS special missing values.

The Solution
- Create ‘SAS friendly’ XML to be read by the SAS XML LIBNAME engine.
- Create a SAS XML Map file to assign labels, data types, informats and formats.
- Generate a CNTLIN data set in the XML suitable for use by PROC FORMAT.
Note: the XML file can also be imported directly into MS Access.

Development Approach
SAS macros or an external utility?
- High complexity:
  - Ensure OpenClinica metadata is translated into legal SAS names.
  - Map the OC hierarchy to SAS data sets: CRFs, sections, groups and data items to tables, rows and columns.
  - De-duplicate object names.
- No resources to develop complex macros.

The Choice
A command line Java utility:
- Programmer available (otherwise I would have to write the SAS code myself!)
- Capable development environment
- Portable (Windows / Linux)
- Callable from within SAS

Data Processing
- Enter connection parameters and study identifier (interactively or on the command line).
- Connect to PostgreSQL via JDBC.
- Read study metadata.
- Manipulate the metadata.
- Write the map file.
- Read study data.
- Write the data file.

Metadata Manipulations: Legalize Names
- SAS names must be 32 characters or fewer.
- Names must start with a letter or underscore.
- Format names cannot end in a number.
De-duplicate names:
- Multiple CRFs may contain the same section and response option names.
- Duplicate names have numbers and underscores appended.
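
A minimal Java sketch of the legalize and de-duplicate rules, written in Java only because the utility is. It is not the utility's actual code: SasNames, legalize and Registry.unique are hypothetical names, and replacing illegal characters with underscores, prefixing an underscore when a name starts with a digit, and adding a trailing underscore to de-duplicated format names are assumptions consistent with the rules above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the "legalize and de-duplicate" step; not the utility's code.
final class SasNames {
    static final int MAX_LEN = 32;  // SAS name limit from the slide

    // Replaces illegal characters with '_', fixes the first character, truncates,
    // and (for format names) keeps the name from ending in a digit.
    static String legalize(String raw, boolean isFormatName) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '_';
            sb.append(ok ? c : '_');
        }
        // Names must start with a letter or underscore.
        if (sb.length() == 0 || Character.isDigit(sb.charAt(0))) {
            sb.insert(0, '_');
        }
        String name = sb.length() > MAX_LEN ? sb.substring(0, MAX_LEN) : sb.toString();
        if (isFormatName && Character.isDigit(name.charAt(name.length() - 1))) {
            name = (name.length() < MAX_LEN ? name : name.substring(0, MAX_LEN - 1)) + "_";
        }
        return name;
    }

    // Issues unique names; comparison is upper-cased because SAS names are case-insensitive.
    static final class Registry {
        private final Set<String> used = new HashSet<>();
        private final boolean formatNames;

        Registry(boolean formatNames) {
            this.formatNames = formatNames;
        }

        String unique(String raw) {
            String base = legalize(raw, formatNames);
            String candidate = base;
            for (int n = 2; used.contains(candidate.toUpperCase(Locale.ROOT)); n++) {
                // Format names get a trailing '_' so they still do not end in a digit.
                String suffix = "_" + n + (formatNames ? "_" : "");
                // Shorten the base so base + suffix still fits in 32 characters.
                int keep = Math.min(base.length(), MAX_LEN - suffix.length());
                candidate = base.substring(0, keep) + suffix;
            }
            used.add(candidate.toUpperCase(Locale.ROOT));
            return candidate;
        }
    }
}
```

With a variable-name Registry, three items named AGE, age and AGE come out as AGE, age_2 and AGE_3, because SAS treats names case-insensitively.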

Metadata Manipulations: CRFs
There is no ‘top level’ mapping between CRFs and data sets.
CRF section -> SAS data set:
- CRF sections contain logically grouped data; CRFs may not!
- CRFs containing multiple sections result in multiple output data sets.
- Every data item contained within a section is output to the same data set.
- Section label -> data set name
- Section title -> data set label
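
The section-to-data-set rule amounts to a grouping step, sketched below in Java. CrfItem, SectionKey and SectionMapper are hypothetical types, and keying on CRF name plus section label (rather than the label alone) is an assumption made so that same-named sections on different CRFs stay in separate data sets until their names are de-duplicated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified item metadata: which CRF and section an item belongs to.
record CrfItem(String crfName, String sectionLabel, String itemName) {}

// A section is identified by its CRF plus its label, since labels can repeat across CRFs.
record SectionKey(String crfName, String sectionLabel) {}

final class SectionMapper {
    // Groups items so that every item in a section ends up in that section's data set.
    // LinkedHashMap keeps data sets and variables in the order the metadata lists them.
    static Map<SectionKey, List<String>> itemsBySection(List<CrfItem> items) {
        Map<SectionKey, List<String>> datasets = new LinkedHashMap<>();
        for (CrfItem item : items) {
            SectionKey key = new SectionKey(item.crfName(), item.sectionLabel());
            datasets.computeIfAbsent(key, k -> new ArrayList<>()).add(item.itemName());
        }
        return datasets;
    }
}
```

A CRF with two sections therefore yields two entries (two data sets), which matches the rule that multi-section CRFs produce multiple output data sets.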

Metadata Manipulations: Groups -> Rows
- Ungrouped section data is repeated in each row.
- Each group repeat becomes a separate row in the data set.
- Rows are numbered to provide a unique key based on their order within the group.
- Multiple groups contained within the same section are merged based on order within the groups.
- Where groups contain unequal numbers of rows, missing values result.
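
The merge-by-order rule can be sketched in Java as follows. GroupMerger and the ROW key column name are hypothetical, items are represented as name/value maps for simplicity, and null stands in for a SAS missing value; every repeat of a group is assumed to carry the same item names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupMerger {
    // Builds the rows of one section data set.
    //   ungrouped: item -> value for items outside any repeating group
    //   groups:    one list of repeats per group; each repeat is item -> value
    // Row i takes repeat i from every group; a group with fewer repeats
    // contributes nulls (missing values) to the remaining rows.
    static List<Map<String, String>> merge(Map<String, String> ungrouped,
                                           List<List<Map<String, String>>> groups) {
        int rowCount = 1;  // a section with no repeats still yields one row
        for (List<Map<String, String>> g : groups) {
            rowCount = Math.max(rowCount, g.size());
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("ROW", String.valueOf(i + 1));  // order-based key within the section
            row.putAll(ungrouped);                  // ungrouped data repeated on every row
            for (List<Map<String, String>> g : groups) {
                if (i < g.size()) {
                    row.putAll(g.get(i));
                } else if (!g.isEmpty()) {
                    // This group has run out of repeats: its items are missing here.
                    for (String item : g.get(0).keySet()) {
                        row.put(item, null);
                    }
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

For a section with two medication repeats and one adverse-event repeat, the second row carries the second medication and a missing adverse-event value.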

Metadata Manipulations: CRF Items -> Data Set Variables
- Item_name -> variable name
- Description_label -> variable label
- Calculate the length of character variables:
  - SAS has no support for VARCHARs.
  - Explicitly specifying variable length saves considerable disk space.
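
A short Java sketch of the length calculation. It assumes the length is taken from the longest stored value (the slide does not say whether the utility uses the data or the CRF metadata) and assumes a UTF-8 SAS session, where character lengths are counted in bytes. CharLengths and sasLength are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class CharLengths {
    static final int SAS_MAX_CHAR_LEN = 32767;  // largest SAS character variable, in bytes

    // Smallest length that holds every observed value, counted in UTF-8 bytes.
    // Values longer than the SAS maximum would be truncated at that length.
    static int sasLength(Iterable<String> values) {
        int max = 1;  // SAS character variables must be at least 1 byte long
        for (String v : values) {
            if (v != null) {
                max = Math.max(max, v.getBytes(StandardCharsets.UTF_8).length);
            }
        }
        return Math.min(max, SAS_MAX_CHAR_LEN);
    }
}
```

Declaring LENGTH from this value instead of a generous fixed width is where the disk saving comes from.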

Multi-select and Checkbox Items
- A new column is created for each response value.
- Column names are based on item_name.
- Columns are labeled based on item_label and the response option value.
- Columns contain 1 or 0 to indicate selected or unselected.
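
A Java sketch of the one-column-per-response-value expansion. It assumes the stored answer is a comma-separated list of selected option values (for example "1,3"); MultiSelect, expand and the itemName_value column naming are hypothetical, and real column names would still go through legalization. Treating an unanswered item as missing rather than all zeros is also an assumption.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MultiSelect {
    // Expands one multi-select/checkbox answer into one indicator column per response value.
    static Map<String, Integer> expand(String itemName, List<String> optionValues, String answer) {
        Set<String> chosen = new HashSet<>();
        if (answer != null) {
            for (String s : answer.split(",")) {
                chosen.add(s.trim());
            }
        }
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String value : optionValues) {
            // 1 = selected, 0 = not selected; null (missing) if the item was never answered.
            Integer flag = answer == null ? null : (chosen.contains(value) ? 1 : 0);
            columns.put(itemName + "_" + value, flag);
        }
        return columns;
    }
}
```

An answer of "1,3" against options 1, 2 and 3 gives SYMPT_1 = 1, SYMPT_2 = 0 and SYMPT_3 = 1.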

Response Options
- Response option lists become SAS formats and informats.
- Format names are created from the CRF item's response_label.
- Format names are legalized and de-duplicated.
- If separate CRFs contain identical response option lists, only one format results.
- Formats and informats are written to the XML as a new data table, which is used as a CNTLIN data set for PROC FORMAT.
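
A Java sketch of format sharing that also emits rows shaped like a PROC FORMAT CNTLIN data set (FMTNAME, TYPE, START and LABEL are standard CNTLIN variables). FormatBuilder and CntlinRow are hypothetical, deciding between numeric ('N') and character ('C') formats by checking whether every option value looks like a number is an assumption, and the special missing value rows are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One row of a PROC FORMAT CNTLIN data set: FMTNAME, TYPE ('N' or 'C'), START, LABEL.
record CntlinRow(String fmtname, String type, String start, String label) {}

final class FormatBuilder {
    // Response option list (value -> text) -> name of the format already generated for it.
    private final Map<Map<String, String>, String> issued = new HashMap<>();
    private final List<CntlinRow> rows = new ArrayList<>();

    // Returns the format to use for an option list. Rows are emitted only the first time
    // a list is seen, so identical lists on different CRFs share one format.
    // proposedName is assumed to be legalized and de-duplicated already.
    String formatFor(String proposedName, Map<String, String> options) {
        Map<String, String> key = Map.copyOf(options);  // immutable snapshot used as lookup key
        String existing = issued.get(key);
        if (existing != null) {
            return existing;
        }
        issued.put(key, proposedName);
        boolean numeric = options.keySet().stream().allMatch(v -> v.matches("-?\\d+(\\.\\d+)?"));
        for (Map.Entry<String, String> e : options.entrySet()) {
            rows.add(new CntlinRow(proposedName, numeric ? "N" : "C", e.getKey(), e.getValue()));
        }
        return proposedName;
    }

    List<CntlinRow> rows() {
        return rows;
    }
}
```

Because map equality ignores order, a second CRF listing Yes/No in a different order still reuses the first format.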

Missing Values
Informats are created to read numeric data and handle OpenClinica null values.
CRF dates:
    proc format;
      invalue crfdate
        'ASKU' = .k  'NA' = .a  'NASK' = .d  'NI' = .i
        'NP' = .p  'OTH' = .o  'UNK' = .u
        other = [mmddyy10.];
    run;

Numeric response options:
    proc format;
      invalue bestnull
        'ASKU' = .k  'NA' = .a  'NASK' = .d  'NI' = .i
        'NP' = .p  'OTH' = .o  'UNK' = .u
        other = [best10.];
    run;
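
The null-value table is identical for every informat, so it lends itself to generation from one place. This Java sketch prints the equivalent PROC FORMAT INVALUE source from a single table; the utility itself writes formats to the XML as a CNTLIN table, so treat this as an illustration of the mapping rather than its output. NullInformats and invalue are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NullInformats {
    // OpenClinica null-value codes and the SAS special missing value used for each (from the slides).
    static final Map<String, String> NULL_CODES = new LinkedHashMap<>();

    static {
        NULL_CODES.put("ASKU", ".k");
        NULL_CODES.put("NA", ".a");
        NULL_CODES.put("NASK", ".d");
        NULL_CODES.put("NI", ".i");
        NULL_CODES.put("NP", ".p");
        NULL_CODES.put("OTH", ".o");
        NULL_CODES.put("UNK", ".u");
    }

    // Builds an INVALUE statement that reads null codes as special missing values
    // and hands every other value to baseInformat (e.g. best10. or mmddyy10.).
    static String invalue(String name, String baseInformat) {
        StringBuilder sb = new StringBuilder("proc format;\n  invalue ").append(name);
        for (Map.Entry<String, String> e : NULL_CODES.entrySet()) {
            sb.append("\n    '").append(e.getKey()).append("' = ").append(e.getValue());
        }
        sb.append("\n    other = [").append(baseInformat).append("];\nrun;\n");
        return sb.toString();
    }
}
```

Calling invalue("bestnull", "best10.") and invalue("crfdate", "mmddyy10.") reproduces the two informats above from the same table, which keeps their null handling in step.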

Formats are created for CRF data.
Response options:
    proc format;
      value yesno
        0 = 'No'  1 = 'Yes'
        .k = 'ASKU'  .a = 'NA'  .d = 'NASK'  .i = 'NI'
        .p = 'NP'  .o = 'OTH'  .u = 'UNK';
    run;

Dates:
    proc format;
      value crfdate
        .k = 'ASKU'  .a = 'NA'  .d = 'NASK'  .i = 'NI'
        .p = 'NP'  .o = 'OTH'  .u = 'UNK'
        other = [date9.];
    run;

Numeric data:
    proc format;
      value bestnull
        .k = 'ASKU'  .a = 'NA'  .d = 'NASK'  .i = 'NI'
        .p = 'NP'  .o = 'OTH'  .u = 'UNK'
        other = [best10.];
    run;

Data Set Output
CRF data: one data set per CRF section. Each row contains:
- Study ID
- Site ID
- Subject ID
- Study event name
- Event start and end date
- CRF name
- CRF version

Output Data Sets
- Subject data: list of subjects including site, secondary ID, group, etc.
- Event data: list of subjects' study events including start date, end date and status.
- CRF status: list of subject CRFs including event details, CRF version, creation date, completion date and status.
- Discrepancies

- Data for removed subjects is not exported.
- PHI data remains encrypted.

Interactive Execution
    C:\> java -jar export.jar
    ----------------------------------------
    Export Output:
    ----------------------------------------
    MAP FILE: export.map.xml
    EXPORT FILE: export.xml
    ----------------------------------------
    Postgresql driver loaded
    Enter Database url (default: localhost):
    Database port (default: 5432):
    Database name (default: openclinica):
    username (default: clinica):
    password:
    Enter Export file name (default: derived from study):
    Enter Map file name (default: derived from study):

    Successful connection to database openclinica on jdbc:postgresql://localhost:5432/
    Please choose a study:
    ----------------------
    1) Study1
    2) Study2
    3) Study3
    4) Study4
    ==> 1
    Retrieving study metadata
    Creating subject table
    Writing formats to .xml file
    Writing subjects to .xml file
    Retrieving study item data
    Writing study item data to file
    Complete
    Files generated:
    study1.map.xml
    Study1.xml

Command Line Options
Command line options may be used instead of prompts. Options include:
- Host, database, ID and password
- Study OID
- File names
- Suppression of the map file
- Creation of ‘SPSS friendly’ SAS data sets: minimal formatting allows the data sets to be exported to SPSS using PROC EXPORT.
Command line options allow the utility to be executed from within SAS.

SAS Code
Define libraries:
    libname ocdata xml92 "data_file.xml" xmlmap="map_file.map" access=readonly;
    libname library "c:\project\fmt";
    libname stdylib "c:\project\data";

Execute the import:
    %let scommand  = java -Xmx256m -jar c:\export\export.jar;
    %let shost     = -h 10.11.12.13;
    %let sport     = -p 5432;
    %let sstudy    = -soid S_STDY1234;
    %let sdatabase = -D openclinica;
    %let suser     = -U dbuserid;
    %let spswd     = -P password;
    %let smapFile  = ;  /* map file option, left empty here */
    %let sdataFile = ;  /* data file option, left empty here */
    %let spss      = ;
    x "&scommand &shost &sport &sstudy &sdatabase &suser &spswd &smapFile &sdataFile &spss";

Create the format catalog from the XML:
    proc sort data=ocdata.fmtlib out=work.fmtlib;
      by fmtname type start;
    run;

    proc format cntlin=work.fmtlib library=library fmtlib;
    run;

Copy the data sets:
    proc datasets library=ocdata;
      copy out=stdylib;
      exclude fmtlib;
    quit;

Do It!
- Import into SAS
- If we have time: XML structures, import into Access, import into Excel

Contact
Rick Watts
rick.watts@ualberta.ca
780-248-1170
