Documentation for 'get_compile_data' Function
Md Aminul Islam Prodhan
2025-01-02
Source: vignettes/get_compile_data.Rmd
Purpose
The `get_compile_data` function retrieves study data from the DM (Demographics) domain, applies a series of filters, and compiles the remaining records into a cleaned data frame. First, it removes recovery animals by filtering the DM data using information from the DS (Disposition) domain. Then, if the study involves rats or mice, it also removes toxicokinetic (TK) animals by excluding every USUBJID present in the PC (Pharmacokinetics Concentrations) domain. The resulting data set therefore contains only the target population relevant to the study’s primary analysis. The function can read data from either an SQLite database or `.xpt` files.
Function Parameters (Arguments)
| Parameter | Type | Description | Mandatory/Default |
|---|---|---|---|
| `studyid` | Character | The study ID number, which uniquely identifies the study within the database. | Mandatory |
| `path_db` | Character | The path to the data source: either an SQLite database file or a directory containing `.xpt` files. | Mandatory |
| `fake_study` | Boolean | Indicates whether the study data was generated using the SENDsanitizer package. | Optional (Default: FALSE) |
| `use_xpt_file` | Boolean | Specifies whether to use the `.xpt` file format when dealing with data generated by the SENDsanitizer package. | Optional (Default: FALSE) |
Output
Returns a cleaned `data.frame` with the following columns:
- STUDYID
- USUBJID
- Species
- SEX
- ARMCD
- SETCD
The cleaned data is now ready to be used for further analysis.
Implementation Details
The `get_compile_data` function performs the following steps to build the `compile_data` data frame for a given study:
Database Connection
The function connects to an SQLite database or reads `.xpt` files from the location specified by `path_db`.
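As a minimal sketch of how the two data sources might be handled (not necessarily how the package does it), the hypothetical helper `read_domain()` below reads one domain either from an SQLite table or from an `.xpt` file. The table naming (upper-case domain code) and the file naming (lower-case, e.g. `dm.xpt`) are assumptions, and the DBI, RSQLite, and haven packages must be installed.

```r
# Hypothetical helper, not part of the package: load one SEND domain
# from an SQLite database or from a directory of .xpt files.
read_domain <- function(path_db, domain, use_xpt_file = FALSE) {
  if (use_xpt_file) {
    # Assumption: one file per domain, named after the lower-case domain code
    xpt_path <- file.path(path_db, paste0(tolower(domain), ".xpt"))
    return(as.data.frame(haven::read_xpt(xpt_path)))
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), path_db)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always release the connection
  # Assumption: one table per domain, named after the upper-case domain code
  DBI::dbReadTable(con, toupper(domain))
}

# Build small throwaway sources to show both paths
dm_example <- data.frame(STUDYID = "1234123", USUBJID = "1234123-001")

db_file <- tempfile(fileext = ".db")
con <- DBI::dbConnect(RSQLite::SQLite(), db_file)
DBI::dbWriteTable(con, "DM", dm_example)
DBI::dbDisconnect(con)

xpt_dir <- tempfile("xpt_")
dir.create(xpt_dir)
haven::write_xpt(dm_example, file.path(xpt_dir, "dm.xpt"))

dm_from_db  <- read_domain(db_file, "DM")
dm_from_xpt <- read_domain(xpt_dir, "DM", use_xpt_file = TRUE)
```

Both calls should return the same single-row DM table, so downstream filtering code does not need to know which source was used.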
Data Fetching
The function retrieves data from the following SEND domains based on the input parameters:
- DM (Demographics): Provides animal-level information.
- DS (Disposition): Identifies recovery animals using the `DSDECOD` column.
- PC (Pharmacokinetics Concentrations): Excludes TK animals for rats and mice based on `USUBJID`.
- TX (Trial Sets): Determines dose levels such as “vehicle” or “HD.”
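The fetch step above could look roughly like the following when the source is an SQLite database. This is a sketch under assumptions: `fetch_domain()` is a hypothetical name, each domain is assumed to be stored in a table named after its domain code, and each table is assumed to carry a `STUDYID` column. The study ID is passed as a bound parameter rather than pasted into the SQL string.

```r
# In-memory database with two studies, standing in for a real SEND database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "DM", data.frame(
  STUDYID = c("1234123", "9999999"),
  USUBJID = c("1234123-001", "9999999-001")
))
DBI::dbWriteTable(con, "DS", data.frame(
  STUDYID = c("1234123", "9999999"),
  USUBJID = c("1234123-001", "9999999-001"),
  DSDECOD = c("TERMINAL SACRIFICE", "TERMINAL SACRIFICE")
))

# Hypothetical helper: fetch all rows of one domain for one study
fetch_domain <- function(con, domain, studyid) {
  # Restrict the table name to the domains this function works with
  domain <- match.arg(domain, c("DM", "DS", "PC", "TX"))
  sql <- paste0("SELECT * FROM ", domain, " WHERE STUDYID = ?")
  DBI::dbGetQuery(con, sql, params = list(studyid))
}

dm <- fetch_domain(con, "DM", "1234123")
ds <- fetch_domain(con, "DS", "1234123")

DBI::dbDisconnect(con)
```

Only the rows belonging to study `1234123` come back, which keeps later filtering steps scoped to a single study.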
Filtering Steps
Filtering Recovery Animals
Recovery animals are excluded by filtering the DM data based on `DSDECOD` values in the DS domain.
Filtering Toxicokinetic (TK) Animals
For studies involving rats or mice, the function removes animals whose `USUBJID` appears in the PC domain.
Dose Selection
The function identifies and retains animals assigned to either the “vehicle” group or the “high-dose” (HD) group by applying dose-ranking logic from the TX domain, where “Control” groups are reclassified as “vehicle.”
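The first two filters can be illustrated with a small base R sketch. It is not the package's implementation: `filter_animals()` is a hypothetical name, the species is passed in as a plain argument, and the `DSDECOD` value `"RECOVERY SACRIFICE"` is an assumption about how recovery animals are coded. Dose selection is left out because the ranking logic depends on TX details not given here.

```r
# Toy domain tables (made-up subject IDs)
dm <- data.frame(
  STUDYID = "1234123",
  USUBJID = c("A-01", "A-02", "A-03", "A-04"),
  SEX     = c("M", "F", "M", "F"),
  ARMCD   = c("1", "1", "4", "4"),
  SETCD   = c("1", "1", "4", "4")
)
ds <- data.frame(
  USUBJID = c("A-01", "A-02", "A-03", "A-04"),
  DSDECOD = c("TERMINAL SACRIFICE", "RECOVERY SACRIFICE",
              "TERMINAL SACRIFICE", "TERMINAL SACRIFICE")
)
pc <- data.frame(USUBJID = c("A-04", "A-04"))  # A-04 has PK samples

# Hypothetical helper: drop recovery animals, then TK animals for rodents
filter_animals <- function(dm, ds, pc, species) {
  # Assumption: recovery animals are flagged by this DSDECOD value
  recovery_ids <- ds$USUBJID[toupper(ds$DSDECOD) == "RECOVERY SACRIFICE"]
  kept <- dm[!dm$USUBJID %in% recovery_ids, , drop = FALSE]

  # TK animals are removed only for rats and mice
  if (toupper(species) %in% c("RAT", "MOUSE")) {
    kept <- kept[!kept$USUBJID %in% unique(pc$USUBJID), , drop = FALSE]
  }
  kept
}

rat_result <- filter_animals(dm, ds, pc, species = "RAT")
dog_result <- filter_animals(dm, ds, pc, species = "DOG")
```

With this toy data, the rat call keeps `A-01` and `A-03`, while the dog call also keeps `A-04` because the PC-based filter is skipped for non-rodent species.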
Example Usage
# Example usage with an SQLite database
df <- get_compile_data(
  studyid = "1234123",
  path_db = "path/to/database.db"
)
# Example usage with .xpt files
df <- get_compile_data(
  studyid = "1234123",
  path_db = "path/to/files",
  fake_study = TRUE,
  use_xpt_file = TRUE
)
Required Libraries
This function requires the following R packages:
- DBI
- RSQLite
- data.table
- dplyr
- haven
- tidyr
- stringr
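One way to confirm that these packages are available before calling the function is sketched below. `find_missing_packages()` is a hypothetical helper written for this example; it relies only on base R's `requireNamespace()`.

```r
required_pkgs <- c("DBI", "RSQLite", "data.table", "dplyr",
                   "haven", "tidyr", "stringr")

# Hypothetical helper: return the packages that cannot be loaded
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be installed with `install.packages()`.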
Notes
- The function assumes standard SEND domains and column names.
- For non-standard data, adjustments may be needed.
- Check your database or `.xpt` files to ensure compatibility with the function.
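A compatibility check along these lines could be run on the loaded domains before calling the function. This is a sketch under assumptions: `find_missing_columns()` is a hypothetical helper, and the required column lists are inferred from the columns mentioned in this document rather than taken from the package source.

```r
# Assumption: columns this document mentions for each domain
required_columns <- list(
  DM = c("STUDYID", "USUBJID", "SEX", "ARMCD", "SETCD"),
  DS = c("USUBJID", "DSDECOD"),
  PC = c("USUBJID")
)

# Hypothetical helper: report missing domains and missing columns
find_missing_columns <- function(domains, required = required_columns) {
  problems <- list()
  for (name in names(required)) {
    if (is.null(domains[[name]])) {
      problems[[name]] <- "domain not found"
    } else {
      missing_cols <- setdiff(required[[name]], names(domains[[name]]))
      if (length(missing_cols) > 0) problems[[name]] <- missing_cols
    }
  }
  problems
}

# Toy input: DM lacks SETCD, DS is complete, PC is absent
domains <- list(
  DM = data.frame(STUDYID = "1234123", USUBJID = "A-01", SEX = "M", ARMCD = "1"),
  DS = data.frame(USUBJID = "A-01", DSDECOD = "TERMINAL SACRIFICE")
)
problems <- find_missing_columns(domains)
```

An empty result means every listed domain and column is present; otherwise each entry names what is missing for that domain.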