Using Define-XML for Dataset Design

Partner Content

In the past, sponsors submitting to FDA were required to submit a PDF describing their submission datasets. As we all know, PDF is great for viewing on-screen or printing, but the information inside can't readily be interpreted by a computer. Enter CDISC's Define-XML model.

Define-XML standardizes how to describe datasets in a machine-readable manner. It can be used to define any tabular dataset structure, though it's primarily used to describe SDTM, ADaM, and SEND datasets for regulatory submission. FDA now requires that all submissions use Define-XML to describe their datasets.
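To make "machine-readable" concrete, here is a minimal Python sketch that reads a cut-down Define-XML-style fragment and lists each dataset with its variables. The sample content, the OIDs, and the `list_datasets` helper are illustrative assumptions. A real Define-XML file carries many more attributes, including ones from the Define-XML extension namespace, which are left out here.

```python
import xml.etree.ElementTree as ET

# Define-XML is an extension of CDISC ODM, so the core elements live in the ODM namespace.
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, heavily simplified fragment for illustration only.
SAMPLE_DEFINE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV1" Name="Example">
      <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No">
        <ItemRef ItemOID="IT.DM.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.DM.AGE" Mandatory="No"/>
      </ItemGroupDef>
      <ItemDef OID="IT.DM.USUBJID" Name="USUBJID" DataType="text" Length="20"/>
      <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer" Length="3"/>
    </MetaDataVersion>
  </Study>
</ODM>"""


def list_datasets(define_xml: str) -> dict:
    """Return {dataset name: [(variable name, data type), ...]}."""
    root = ET.fromstring(define_xml)
    # Index variable definitions by OID so dataset references can be resolved.
    items = {item.get("OID"): item for item in root.iterfind(".//odm:ItemDef", NS)}
    datasets = {}
    for group in root.iterfind(".//odm:ItemGroupDef", NS):
        variables = []
        for ref in group.iterfind("odm:ItemRef", NS):
            item = items[ref.get("ItemOID")]
            variables.append((item.get("Name"), item.get("DataType")))
        datasets[group.get("Name")] = variables
    return datasets


if __name__ == "__main__":
    for name, variables in list_datasets(SAMPLE_DEFINE).items():
        print(name, variables)
```

Because the structure is plain XML, the same information that once sat in a PDF can be queried, compared, and reused by other tools.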

If you're currently starting your Define-XML at the end of your study, you're missing many of the benefits it can bring to your end-to-end study process. This post provides an overview of how it can be used throughout your process to drive efficiencies.


Define SDTM, ADaM and SEND datasets upfront

Many organizations define their CRFs, collect data, and only then think about converting to SDTM datasets. There are two problems with this process.

  1. They don't know if all relevant SDTM data is being collected during CRF design.
  2. They don't have a definition for what they want to submit, and so can't verify if the submission data is what they intended.

This can lead to incomplete data, protocol amendments, complex mapping to standardized CDISC datasets, increased QA, and a longer study process.

The solution is to define your study, end-to-end, right at the start. You’ll know before you even start collecting your data that your CRFs are correct. Then, they can easily be converted into submission datasets that’ll satisfy the regulator.

The first step is to define your submission datasets upfront, using Define-XML. The right dataset design software, like the Formedix clinical metadata repository and study automation platform, can help you rapidly define SDTM, SEND, and ADaM datasets and export their definition as Define-XML.


Check compliance of Define-XML dataset definitions with SDTM, SEND, and ADaM

You can verify that your submission dataset designs comply with CDISC standards before collecting any data, by running standard validation tools. Once you have the datasets defined, the next step is to define the mappings to them.



Mapping from EDC to submission datasets

Some EDC systems support exporting data in ODM format that matches your study design. However, most people use tools that work with tabular datasets, and datasets are still by far the most popular type of data export from an EDC system. If you have created your study from an ODM study specification, the datasets will be similar to the ODM, but they're not the same. When mapping your collected data to SDTM or SEND, you need to know what the datasets coming from your EDC system will look like.

The Formedix platform can predict these EDC export datasets and generate a Define-XML describing them. This enables you to define the mappings to your submission datasets before collecting any data. If your CRFs are designed using CDASH, then mapping should be easy.
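As a rough illustration of what a mapping defined ahead of data collection might look like, the Python sketch below renames CDASH-style collection variables to their SDTM counterparts for a demographics dataset. The `DM_MAPPING` table, the `map_rows` helper, and the sample row are hypothetical. This is not how the Formedix platform represents mappings, and real mappings usually also involve derivations and controlled terminology.

```python
# Hypothetical mapping table: collected (CDASH-style) name -> SDTM variable name.
# Many CDASH names match SDTM directly; others differ, e.g. BRTHDAT -> BRTHDTC.
DM_MAPPING = {
    "SUBJID": "SUBJID",
    "BRTHDAT": "BRTHDTC",
    "SEX": "SEX",
}


def map_rows(rows, mapping, constants=None):
    """Rename the keys of each collected row and add constant columns."""
    constants = constants or {}
    mapped = []
    for row in rows:
        # Fail loudly if the EDC export contains a column the mapping does not cover.
        unmapped = set(row) - set(mapping)
        if unmapped:
            raise KeyError(f"no mapping defined for: {sorted(unmapped)}")
        target = dict(constants)
        target.update({mapping[source]: value for source, value in row.items()})
        mapped.append(target)
    return mapped


if __name__ == "__main__":
    collected = [{"SUBJID": "001", "BRTHDAT": "1980-02-14", "SEX": "F"}]
    print(map_rows(collected, DM_MAPPING, constants={"DOMAIN": "DM"}))
```

Writing the mapping as data rather than code means it can be reviewed and tested against the predicted export structure before the first subject is enrolled.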


Verify CRO datasets against your Define-XML specification

If you're using a CRO to generate your submission datasets, how do you know what they've delivered is correct? If you’ve defined your datasets upfront using Define-XML, you can automatically verify whether the delivered data conforms to your original specification. This greatly reduces the amount of QA resources required and will show up any problems much faster. No more having to manually check data against an Excel or PDF specification!
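The kind of automated check described above can be sketched in a few lines of Python. The example assumes the specification has already been extracted from the Define-XML into a simple dictionary. The `spec` layout, the `check_dataset` helper, and the sample rows are all hypothetical, and a production check would cover far more (codelists, ordering, keys, and so on).

```python
def check_dataset(spec, rows):
    """Compare delivered rows against a specification and return a list of issues.

    spec maps variable name -> {"type": "text" or "integer",
                                "length": int, "mandatory": bool}.
    """
    issues = []
    expected = set(spec)
    for n, row in enumerate(rows, start=1):
        # Structural checks: every specified variable present, nothing extra.
        for name in sorted(expected - set(row)):
            issues.append(f"row {n}: missing variable {name}")
        for name in sorted(set(row) - expected):
            issues.append(f"row {n}: unexpected variable {name}")
        # Value checks for the variables that are present.
        for name, rule in spec.items():
            if name not in row:
                continue
            value = row[name]
            if value is None or value == "":
                if rule["mandatory"]:
                    issues.append(f"row {n}: {name} is mandatory but empty")
                continue
            if rule["type"] == "integer" and not isinstance(value, int):
                issues.append(f"row {n}: {name} should be integer, got {value!r}")
            if len(str(value)) > rule["length"]:
                issues.append(f"row {n}: {name} exceeds length {rule['length']}")
    return issues


if __name__ == "__main__":
    spec = {
        "USUBJID": {"type": "text", "length": 20, "mandatory": True},
        "AGE": {"type": "integer", "length": 3, "mandatory": False},
    }
    delivered = [
        {"USUBJID": "S1-001", "AGE": 34},
        {"USUBJID": "", "AGE": "old"},
        {"USUBJID": "S1-003"},
    ]
    for issue in check_dataset(spec, delivered):
        print(issue)
```

Each reported issue points at a specific row and variable, which is what makes this faster than reading a delivery against an Excel or PDF specification by eye.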


View as PDF or HTML

Define-XML is great for computers, but it's not something most people want to look at. Thankfully it can easily be converted into PDF or HTML, making it simple for anyone to understand.


Working with legacy data

Organizations often have lots of legacy data in XPT datasets for which they have no machine-readable metadata. There may not even be something akin to an Excel description of the data, or if there is, it may be incomplete or incorrect. To help make better use of this data, it's possible to generate Define-XML metadata directly from the XPT datasets. This makes it easy to understand the content of the datasets and make appropriate use of them.
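The sketch below shows the general idea in Python: infer a data type and length for each column and emit ODM-style `ItemGroupDef` and `ItemDef` elements. It assumes the columns have already been read from the XPT file into plain Python lists (for example via `pandas.read_sas` with `format="xport"`). The `describe_dataset` and `infer_item` helpers and the OID naming scheme are hypothetical, and the output is a draft fragment rather than a complete, valid Define-XML document.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
ET.register_namespace("", ODM_NS)  # serialize ODM as the default namespace


def _tag(name):
    return f"{{{ODM_NS}}}{name}"


def infer_item(values):
    """Guess an ODM data type and a length from the observed values."""
    present = [v for v in values if v is not None and v != ""]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(isinstance(v, int) and is_number(v) for v in present):
        data_type = "integer"
    elif present and all(is_number(v) for v in present):
        data_type = "float"
    else:
        data_type = "text"
    length = max((len(str(v)) for v in present), default=1)
    return data_type, length


def describe_dataset(name, columns):
    """Build draft metadata for one dataset; columns maps name -> list of values."""
    root = ET.Element(_tag("MetaDataVersion"), OID=f"MDV.{name}", Name=f"Draft {name}")
    group = ET.SubElement(root, _tag("ItemGroupDef"),
                          OID=f"IG.{name}", Name=name, Repeating="Yes")
    for column in columns:
        ET.SubElement(group, _tag("ItemRef"),
                      ItemOID=f"IT.{name}.{column}", Mandatory="No")
    for column, values in columns.items():
        data_type, length = infer_item(values)
        ET.SubElement(root, _tag("ItemDef"), OID=f"IT.{name}.{column}",
                      Name=column, DataType=data_type, Length=str(length))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(describe_dataset("VS", {
        "USUBJID": ["S1-001", "S1-002"],
        "VSSTRESN": [120.5, 80, None],
    }))
```

Metadata inferred this way only reflects what is in the data, so labels, codelists, and origins would still need to be added by someone who knows the study.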

As you can see, Define-XML isn't just something you should be using because you have to. It can bring real benefits to your study process.


Formedix can help…

The Formedix clinical metadata repository and study automation platform has CDISC standards built in to the end-to-end process. We keep up to date with CDISC and NCI standards, and we support older versions too. This keeps you compliant from start to finish, so you can be confident your submissions will be correct!



4 January, 2021