Creating an Events SDTM domain

Introduction

This article describes how to create an events SDTM domain using the {sdtm.oak} package. Examples are currently presented and tested in the context of the AE domain.

Before reading this article, users should review the “Creating an Interventions Domain” article, which explains {sdtm.oak} concepts such as oak_id_vars and condition_add in detail. It also offers guidance on which mapping algorithms or functions to use for different mappings and explains how they work.

In this article, we will dive directly into programming and provide further explanation only where it is required.

Programming workflow

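The workflow follows the sections of this article. The steps from reading in the raw data through mapping the topic variable and the remaining variables are repeated for each raw dataset before proceeding to the final steps: creating the SDTM derived variables and adding labels and attributes.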

Read in data

Read all the raw datasets into the environment. In this example, the raw dataset name is ae_raw. Users can read it from the {pharmaverseraw} package using the code below:

# Attach the packages used throughout this article
library(sdtm.oak)
library(dplyr)

ae_raw <- pharmaverseraw::ae_raw

| PATNUM | FOLDER | IT.AETERM | AEOUTCOME | AEDECOD | IT.AESEV | IT.AESER | IT.AEREL | AEDTCOL | IT.AESTDAT | IT.AEENDAT |
|---|---|---|---|---|---|---|---|---|---|---|
| 701-1015 | AE | Application Site Erythema | Not Recovered/not Resolved | APPLICATION SITE ERYTHEMA | Mild Adverse Event | No | Probably Related | 01/16/2014 | 01/03/2014 | NA |
| 701-1015 | AE | Application Site Pruritus | Not Recovered/not Resolved | APPLICATION SITE PRURITUS | Mild Adverse Event | No | Probably Related | 01/16/2014 | 01/03/2014 | NA |
| 701-1015 | AE | Diarrhoea | Recovered/Resolved | DIARRHOEA | Mild Adverse Event | No | Remote | 01/16/2014 | 01/09/2014 | 01/11/2014 |
| 701-1023 | AE | Atrioventricular Block Second Degree | Not Recovered/not Resolved | ATRIOVENTRICULAR BLOCK SECOND DEGREE | Mild Adverse Event | No | Possibly Related | 08/27/2012 | 08/26/2012 | NA |
| 701-1023 | AE | Erythema | Not Recovered/not Resolved | ERYTHEMA | Mild Adverse Event | No | Possibly Related | 08/27/2012 | 08/07/2012 | 08/30/2012 |
| 701-1023 | AE | Erythema | Not Recovered/not Resolved | ERYTHEMA | Moderate Adverse Event | No | Probably Related | 08/27/2012 | 08/07/2012 | NA |
| 701-1023 | AE | Erythema | Recovered/Resolved | ERYTHEMA | Mild Adverse Event | No | Possibly Related | 09/02/2012 | 08/07/2012 | 08/30/2012 |
| 701-1028 | AE | Application Site Erythema | Not Recovered/not Resolved | APPLICATION SITE ERYTHEMA | Mild Adverse Event | No | Possibly Related | 08/01/2013 | 07/21/2013 | NA |
| 701-1028 | AE | Application Site Pruritus | Not Recovered/not Resolved | APPLICATION SITE PRURITUS | Mild Adverse Event | No | Probably Related | 08/14/2013 | 08/08/2013 | NA |
| 701-1034 | AE | Application Site Pruritus | Not Recovered/not Resolved | APPLICATION SITE PRURITUS | Mild Adverse Event | No | Probably Related | 09/25/2014 | 08/27/2014 | NA |

Create oak_id_vars

ae_raw <- ae_raw %>%
  generate_oak_id_vars(
    pat_var = "PATNUM",
    raw_src = "ae_raw"
  )

| oak_id | raw_source | patient_number | PATNUM | FOLDER | IT.AETERM | AEOUTCOME | AEDECOD |
|---|---|---|---|---|---|---|---|
| 1 | ae_raw | 701-1015 | 701-1015 | AE | Application Site Erythema | Not Recovered/not Resolved | APPLICATION SITE ERYTHEMA |
| 2 | ae_raw | 701-1015 | 701-1015 | AE | Application Site Pruritus | Not Recovered/not Resolved | APPLICATION SITE PRURITUS |
| 3 | ae_raw | 701-1015 | 701-1015 | AE | Diarrhoea | Recovered/Resolved | DIARRHOEA |
| 4 | ae_raw | 701-1023 | 701-1023 | AE | Atrioventricular Block Second Degree | Not Recovered/not Resolved | ATRIOVENTRICULAR BLOCK SECOND DEGREE |
| 5 | ae_raw | 701-1023 | 701-1023 | AE | Erythema | Not Recovered/not Resolved | ERYTHEMA |
| 6 | ae_raw | 701-1023 | 701-1023 | AE | Erythema | Not Recovered/not Resolved | ERYTHEMA |
| 7 | ae_raw | 701-1023 | 701-1023 | AE | Erythema | Recovered/Resolved | ERYTHEMA |
| 8 | ae_raw | 701-1028 | 701-1028 | AE | Application Site Erythema | Not Recovered/not Resolved | APPLICATION SITE ERYTHEMA |
| 9 | ae_raw | 701-1028 | 701-1028 | AE | Application Site Pruritus | Not Recovered/not Resolved | APPLICATION SITE PRURITUS |
| 10 | ae_raw | 701-1034 | 701-1034 | AE | Application Site Pruritus | Not Recovered/not Resolved | APPLICATION SITE PRURITUS |
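
To make the three added columns concrete, the following base R sketch mimics what the output above shows: oak_id is a row counter, raw_source repeats the raw dataset name, and patient_number copies PATNUM. It is an illustration under those assumptions, not the {sdtm.oak} implementation, and add_id_vars_sketch is a hypothetical helper name.

```r
# Hypothetical helper: illustrates the three id columns shown in the output
# above. This is NOT the sdtm.oak implementation of generate_oak_id_vars().
add_id_vars_sketch <- function(raw_dat, pat_var, raw_src) {
  ids <- data.frame(
    oak_id = seq_len(nrow(raw_dat)),          # one sequential id per raw record
    raw_source = rep(raw_src, nrow(raw_dat)), # name of the raw dataset
    patient_number = raw_dat[[pat_var]],      # copy of the patient identifier
    stringsAsFactors = FALSE
  )
  # Place the id columns first, followed by the original raw columns
  cbind(ids, raw_dat)
}

# Three records taken from the ae_raw listing in this article
toy_raw <- data.frame(
  PATNUM = c("701-1015", "701-1015", "701-1023"),
  IT.AETERM = c("Application Site Erythema", "Application Site Pruritus", "Erythema"),
  stringsAsFactors = FALSE
)

toy_ids <- add_id_vars_sketch(toy_raw, pat_var = "PATNUM", raw_src = "ae_raw")
print(toy_ids)
```

These id columns are what the mapping functions later receive through id_vars = oak_id_vars(), so each mapped value can be tied back to the raw record it came from.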

Read in the DM domain

The DM domain is needed later to derive the study day variables AESTDY and AEENDY. The code in the following sections assumes it is available as a data frame named dm that contains the reference dates RFXSTDTC and RFXENDTC.

Read in CT

Controlled Terminology (CT) is part of the SDTM specification and is prepared by the user. In this example, the study controlled terminology file is sdtm_ct.csv. Users can read it from the {sdtm.oak} package using the code below:

study_ct <- read.csv(system.file("raw_data/sdtm_ct.csv",
  package = "sdtm.oak"
))
codelist_code term_code term_value collected_value term_preferred_term term_synonyms
C66726 C25158 CAPSULE Capsule Capsule Dosage Form cap
C66726 C25394 PILL Pill Pill Dosage Form
C66726 C29167 LOTION Lotion Lotion Dosage Form
C66726 C42887 AEROSOL Aerosol Aerosol Dosage Form aer
C66726 C42944 INHALANT Inhalant Inhalant Dosage Form
C66726 C42946 INJECTION Injection Injectable Dosage Form
C66726 C42953 LIQUID Liquid Liquid Dosage Form
C66726 C42968 PATCH Patch patch Patch Dosage Form
C66726 C42998 TABLET Tablet Tablet Dosage Form tab
C66728 C25629 BEFORE Prior Prior
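
The relationship between collected_value and term_value can be illustrated with a small base R lookup. The sketch below uses three rows copied from the listing above and a hypothetical helper, recode_ct_sketch, that matches on collected values only. It is a simplification for illustration and does not reproduce how assign_ct() resolves terms.

```r
# Three rows copied from the study_ct listing in this article
toy_ct <- data.frame(
  codelist_code = c("C66726", "C66726", "C66728"),
  term_value = c("CAPSULE", "PILL", "BEFORE"),
  collected_value = c("Capsule", "Pill", "Prior"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode collected values to term values within one codelist
recode_ct_sketch <- function(x, ct, codelist) {
  # Restrict the terminology to the requested codelist
  clst <- ct[ct$codelist_code == codelist, ]
  # Position of each collected value in the codelist (NA when not found)
  idx <- match(x, clst$collected_value)
  # Values without a match are returned unchanged (an assumption of this sketch)
  ifelse(is.na(idx), x, clst$term_value[idx])
}

recoded <- recode_ct_sketch(c("Capsule", "Pill", "Gel"), toy_ct, codelist = "C66726")
print(recoded)
```

The ct_clst argument used in the mappings below plays the role of codelist here: it selects which codelist of the study CT a raw variable is recoded against.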

Map Topic Variable

The topic variable is mapped first. It is the primary variable in the SDTM domain, and the remaining variables add further definition to it. In this example, the topic variable is AETERM, which is mapped from the raw dataset column IT.AETERM. The mapping logic is: map the collected value in the IT.AETERM variable of the ae_raw dataset to AE.AETERM.

This mapping does not involve any controlled terminology. The assign_no_ct function is used for mapping. Once the topic variable is mapped, the Qualifier, Identifier, and Timing variables can be mapped.

ae <-
  # Derive topic variable
  # Map AETERM using assign_no_ct, raw_var=IT.AETERM, tgt_var=AETERM
  assign_no_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AETERM",
    tgt_var = "AETERM",
    id_vars = oak_id_vars()
  )

| oak_id | raw_source | patient_number | AETERM |
|---|---|---|---|
| 1 | ae_raw | 701-1015 | Application Site Erythema |
| 2 | ae_raw | 701-1015 | Application Site Pruritus |
| 3 | ae_raw | 701-1015 | Diarrhoea |
| 4 | ae_raw | 701-1023 | Atrioventricular Block Second Degree |
| 5 | ae_raw | 701-1023 | Erythema |
| 6 | ae_raw | 701-1023 | Erythema |
| 7 | ae_raw | 701-1023 | Erythema |
| 8 | ae_raw | 701-1028 | Application Site Erythema |
| 9 | ae_raw | 701-1028 | Application Site Pruritus |
| 10 | ae_raw | 701-1034 | Application Site Pruritus |

Map Rest of the Variables

The Qualifiers, Identifiers, and Timing Variables can be mapped in any order.

ae <- ae %>%
  # Map AEOUT using assign_ct, raw_var=AEOUTCOME, tgt_var=AEOUT
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "AEOUTCOME",
    tgt_var = "AEOUT",
    ct_spec = study_ct,
    ct_clst = "C66768",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESEV using assign_ct, raw_var=IT.AESEV, tgt_var=AESEV
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESEV",
    tgt_var = "AESEV",
    ct_spec = study_ct,
    ct_clst = "C66769",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESER using assign_ct, raw_var=IT.AESER, tgt_var=AESER
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESER",
    tgt_var = "AESER",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AEACN using assign_no_ct, raw_var=IT.AEACN, tgt_var=AEACN
  assign_no_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AEACN",
    tgt_var = "AEACN",
    id_vars = oak_id_vars()
  ) %>%
  # Map AEREL using assign_ct, raw_var=IT.AEREL, tgt_var=AEREL
  # The user-added codelist AEREL is in the study CT
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AEREL",
    tgt_var = "AEREL",
    ct_spec = study_ct,
    ct_clst = "AEREL",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESCAN using assign_ct, raw_var=AESCAN, tgt_var=AESCAN
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "AESCAN",
    tgt_var = "AESCAN",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESCONG using assign_ct, raw_var=AESCNO, tgt_var=AESCONG
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "AESCNO",
    tgt_var = "AESCONG",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESDISAB using assign_ct, raw_var=AEDIS, tgt_var=AESDISAB
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "AEDIS",
    tgt_var = "AESDISAB",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESDTH using assign_ct, raw_var=IT.AESDTH, tgt_var=AESDTH
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESDTH",
    tgt_var = "AESDTH",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESHOSP using assign_ct, raw_var=IT.AESHOSP, tgt_var=AESHOSP
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESHOSP",
    tgt_var = "AESHOSP",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESLIFE using assign_ct, raw_var=IT.AESLIFE, tgt_var=AESLIFE
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESLIFE",
    tgt_var = "AESLIFE",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESOD using assign_ct, raw_var=AESOD, tgt_var=AESOD
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "AESOD",
    tgt_var = "AESOD",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  ) %>%
  # Map AEDTC using assign_datetime, raw_var=AEDTCOL, tgt_var=AEDTC
  assign_datetime(
    raw_dat = ae_raw,
    raw_var = "AEDTCOL",
    tgt_var = "AEDTC",
    raw_fmt = c("m/d/y"),
    id_vars = oak_id_vars()
  ) %>%
  # Map AESTDTC using assign_datetime, raw_var=IT.AESTDAT
  assign_datetime(
    raw_dat = ae_raw,
    raw_var = "IT.AESTDAT",
    tgt_var = "AESTDTC",
    raw_fmt = c("m/d/y"),
    id_vars = oak_id_vars()
  ) %>%
  # Map AEENDTC using assign_datetime, raw_var=IT.AEENDAT
  assign_datetime(
    raw_dat = ae_raw,
    raw_var = "IT.AEENDAT",
    tgt_var = "AEENDTC",
    raw_fmt = c("m/d/y"),
    id_vars = oak_id_vars()
  )
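
The raw_fmt = c("m/d/y") argument describes how the collected dates are laid out: month, day, then year. As a rough base R analogue (a sketch under that assumption, not the parsing logic of {sdtm.oak}), the same month/day/year strings can be converted to ISO 8601 dates with as.Date() and format(). The helper name to_iso_date_sketch is hypothetical, and the sketch handles complete dates only.

```r
# Hypothetical helper: convert "mm/dd/yyyy" strings to ISO 8601 "yyyy-mm-dd".
# Base R only; missing inputs stay missing.
to_iso_date_sketch <- function(x) {
  parsed <- as.Date(x, format = "%m/%d/%Y")
  format(parsed, "%Y-%m-%d")
}

# Values taken from AEDTCOL, IT.AESTDAT and IT.AEENDAT in the ae_raw listing
iso_dates <- to_iso_date_sketch(c("01/16/2014", "01/03/2014", NA))
print(iso_dates)
```

For the first raw record this gives 2014-01-16 and 2014-01-03, with the missing end date left as NA.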

Repeat Map Topic and Map Rest

This step is required only if the raw data source has more than one topic variable to map. Here there is only one topic variable and no additional topic variable mappings, so users can proceed to the next step.

Create SDTM derived variables

The SDTM derived variables, and any other SDTM mappings that apply to all records in the ae dataset produced in the previous step, can be created now.

ae <- ae %>%
  dplyr::mutate(
    STUDYID = ae_raw$STUDY,
    DOMAIN = "AE",
    USUBJID = paste0("01-", ae_raw$PATNUM),
    AELLT = ae_raw$AELLT,
    AELLTCD = ae_raw$AELLTCD,
    AEDECOD = ae_raw$AEDECOD,
    AEPTCD = ae_raw$AEPTCD,
    AEHLT = ae_raw$AEHLT,
    AEHLTCD = ae_raw$AEHLTCD,
    AEHLGT = ae_raw$AEHLGT,
    AEHLGTCD = ae_raw$AEHLGTCD,
    AEBODSYS = ae_raw$AEBODSYS,
    AEBDSYCD = ae_raw$AEBDSYCD,
    AESOC = ae_raw$AESOC,
    AESOCCD = ae_raw$AESOCCD,
    AETERM = toupper(AETERM)
  ) %>%
  derive_seq(
    tgt_var = "AESEQ",
    rec_vars = c("USUBJID", "AETERM")
  ) %>%
  derive_study_day(
    sdtm_in = .,
    dm_domain = dm,
    tgdt = "AESTDTC",
    refdt = "RFXSTDTC",
    study_day_var = "AESTDY"
  ) %>%
  derive_study_day(
    sdtm_in = .,
    dm_domain = dm,
    tgdt = "AEENDTC",
    refdt = "RFXENDTC",
    study_day_var = "AEENDY"
  ) %>%
  dplyr::select(
    "STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM", "AELLT", "AELLTCD", "AEDECOD", "AEPTCD", "AEHLT", "AEHLTCD", "AEHLGT",
    "AEHLGTCD", "AEBODSYS", "AEBDSYCD", "AESOC", "AESOCCD", "AESEV", "AESER", "AEACN", "AEREL", "AEOUT", "AESCAN", "AESCONG",
    "AESDISAB", "AESDTH", "AESHOSP", "AESLIFE", "AESOD", "AEDTC", "AESTDTC", "AEENDTC", "AESTDY", "AEENDY"
  )

| STUDYID | DOMAIN | USUBJID | AESEQ | AETERM | AELLT | AELLTCD | AEDECOD | AEPTCD | AEHLT | AEHLTCD | AEHLGT | AEHLGTCD | AEBODSYS | AEBDSYCD | AESOC | AESOCCD | AESEV | AESER | AEACN | AEREL | AEOUT | AESCAN | AESCONG | AESDISAB | AESDTH | AESHOSP | AESLIFE | AESOD | AEDTC | AESTDTC | AEENDTC | AESTDY | AEENDY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDISCPILOT01 | AE | 01-701-1015 | 1 | APPLICATION SITE ERYTHEMA | APPLICATION SITE REDNESS | 10003058 | APPLICATION SITE ERYTHEMA | NA | HLT_0617 | NA | HLGT_0152 | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 10018065 | MILD | N | NA | PROBABLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2014-01-16 | 2014-01-03 | NA | 2 | NA |
| CDISCPILOT01 | AE | 01-701-1015 | 2 | APPLICATION SITE PRURITUS | APPLICATION SITE ITCHING | 10003047 | APPLICATION SITE PRURITUS | NA | HLT_0317 | NA | HLGT_0338 | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 10018065 | MILD | N | NA | PROBABLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2014-01-16 | 2014-01-03 | NA | 2 | NA |
| CDISCPILOT01 | AE | 01-701-1015 | 3 | DIARRHOEA | DIARRHEA | 10012727 | DIARRHOEA | NA | HLT_0148 | NA | HLGT_0588 | NA | GASTROINTESTINAL DISORDERS | NA | GASTROINTESTINAL DISORDERS | 10017947 | MILD | N | NA | REMOTE | RECOVERED/RESOLVED | No | N | N | N | N | N | No | 2014-01-16 | 2014-01-09 | 2014-01-11 | 8 | -172 |
| CDISCPILOT01 | AE | 01-701-1023 | 1 | ATRIOVENTRICULAR BLOCK SECOND DEGREE | AV BLOCK SECOND DEGREE | 10003851 | ATRIOVENTRICULAR BLOCK SECOND DEGREE | NA | HLT_0415 | NA | HLGT_0086 | NA | CARDIAC DISORDERS | NA | CARDIAC DISORDERS | 10007541 | MILD | N | NA | POSSIBLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2012-08-27 | 2012-08-26 | NA | 22 | NA |
| CDISCPILOT01 | AE | 01-701-1023 | 2 | ERYTHEMA | ERYTHEMA | 10015150 | ERYTHEMA | NA | HLT_0284 | NA | HLGT_0192 | NA | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | NA | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 10040785 | MILD | N | NA | POSSIBLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2012-08-27 | 2012-08-07 | 2012-08-30 | 3 | -2 |
| CDISCPILOT01 | AE | 01-701-1023 | 3 | ERYTHEMA | LOCALIZED ERYTHEMA | 10024781 | ERYTHEMA | NA | HLT_0284 | NA | HLGT_0192 | NA | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | NA | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 10040785 | MODERATE | N | NA | PROBABLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2012-08-27 | 2012-08-07 | NA | 3 | NA |
| CDISCPILOT01 | AE | 01-701-1023 | 4 | ERYTHEMA | ERYTHEMA | 10015150 | ERYTHEMA | NA | HLT_0284 | NA | HLGT_0192 | NA | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | NA | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 10040785 | MILD | N | NA | POSSIBLE | RECOVERED/RESOLVED | No | N | N | N | N | N | No | 2012-09-02 | 2012-08-07 | 2012-08-30 | 3 | -2 |
| CDISCPILOT01 | AE | 01-701-1028 | 1 | APPLICATION SITE ERYTHEMA | APPLICATION SITE ERYTHEMA | 10003041 | APPLICATION SITE ERYTHEMA | NA | HLT_0617 | NA | HLGT_0152 | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 10018065 | MILD | N | NA | POSSIBLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2013-08-01 | 2013-07-21 | NA | 3 | NA |
| CDISCPILOT01 | AE | 01-701-1028 | 2 | APPLICATION SITE PRURITUS | APPLICATION SITE ITCHING | 10003047 | APPLICATION SITE PRURITUS | NA | HLT_0317 | NA | HLGT_0338 | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 10018065 | MILD | N | NA | PROBABLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2013-08-14 | 2013-08-08 | NA | 21 | NA |
| CDISCPILOT01 | AE | 01-701-1034 | 1 | APPLICATION SITE PRURITUS | APPLICATION SITE ITCHING | 10003047 | APPLICATION SITE PRURITUS | NA | HLT_0317 | NA | HLGT_0338 | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | NA | GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 10018065 | MILD | N | NA | PROBABLE | NOT RECOVERED/NOT RESOLVED | No | N | N | N | N | N | No | 2014-09-25 | 2014-08-27 | NA | 58 | NA |
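
The AESTDY and AEENDY values above are consistent with the usual SDTM study day convention: dates on or after the reference date count from 1, earlier dates are negative, and there is no day 0. A minimal base R sketch of that rule follows. The helper study_day_sketch is hypothetical, and the reference dates are inferred from the output rows above rather than read from a real DM dataset, so treat them as assumptions.

```r
# Hypothetical helper: SDTM-style study day relative to a reference date.
# Dates on or after the reference get +1 (the reference date is day 1);
# earlier dates are negative; there is no day 0.
study_day_sketch <- function(date, ref) {
  diff_days <- as.integer(as.Date(date) - as.Date(ref))
  ifelse(diff_days >= 0, diff_days + 1L, diff_days)
}

# Reference dates below are inferred from the output above (assumption)
aestdy <- study_day_sketch("2014-01-03", ref = "2014-01-02") # AESTDTC, subject 01-701-1015
aeendy <- study_day_sketch("2012-08-30", ref = "2012-09-01") # AEENDTC, subject 01-701-1023
print(c(aestdy, aeendy))
```

With these inferred reference dates the sketch returns 2 and -2, matching the AESTDY of the first record and the AEENDY of the fifth record in the output.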

Add Labels and Attributes

Yet to be developed.