This function trains a Random Forest model, computes the ROC curve, and calculates the AUC (Area Under the Curve). It supports optional preprocessing (imputation, rounding, and undersampling) as well as hyperparameter tuning.

Usage

get_auc_curve_with_rf_model(
  Data = NULL,
  path_db = NULL,
  rat_studies = FALSE,
  studyid_metadata,
  fake_study = FALSE,
  use_xpt_file = FALSE,
  Round = FALSE,
  Impute = FALSE,
  best.m = NULL,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method,
  output_individual_scores = TRUE,
  output_zscore_by_USUBJID = FALSE
)

Arguments

Data

A data frame containing the training data. If NULL, data will be fetched from the database.

path_db

A string representing the path to the SQLite database used to fetch data when Data is NULL.

rat_studies

Logical; whether to filter for rat studies. Defaults to FALSE.

studyid_metadata

A data frame containing metadata associated with study IDs.

fake_study

Logical; whether to use fake study IDs for data simulation. Defaults to FALSE.

use_xpt_file

Logical; whether to use an XPT file for input data. Defaults to FALSE.

Round

Logical; whether to round numerical values. Defaults to FALSE.

Impute

Logical; whether to perform imputation on missing values. Defaults to FALSE.

best.m

The 'mtry' hyperparameter for Random Forest. If NULL, it is determined by the function.

reps

A numeric value indicating the number of repetitions for cross-validation. Required; no default is set.

holdback

Numeric; either 1 or a fraction (e.g., 0.75) used for holdback during cross-validation. Required; no default is set.

Undersample

Logical; whether to perform undersampling. Defaults to FALSE.

hyperparameter_tuning

Logical; whether to perform hyperparameter tuning. Defaults to FALSE.

error_correction_method

Character; one of "Flip", "Prune", or "None", specifying the method of error correction.

output_individual_scores

Logical; whether to output individual scores. Defaults to TRUE.

output_zscore_by_USUBJID

Logical; whether to output z-scores by subject ID. Defaults to FALSE.

Value

This function does not return any explicit value. It generates:

  • The AUC (Area Under the Curve) printed to the console.

  • A ROC curve plot with the calculated AUC value.

  • Various performance metrics (e.g., True Positive Rate, False Positive Rate), displayed in the plot.

Details

The function prepares data for training a Random Forest model. When Data is NULL, it first fetches data from an SQLite database or generates synthetic data (if fake_study is TRUE). It then applies the selected preprocessing options, such as imputation, rounding, and undersampling. The model is trained using the Random Forest algorithm, and performance is evaluated via the ROC curve and AUC metric.
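As a rough illustration of one of the preprocessing options, the sketch below shows random undersampling of the majority class in base R. It is not the package's implementation: the toy data frame, the `Target` column name, and the strategy of sampling each class down to the minority class size are all assumptions made for this example.

```r
# Illustrative sketch only: random undersampling in base R.
# The data frame and the Target column are hypothetical.
set.seed(1)
toy <- data.frame(
  score  = rnorm(100),
  Target = factor(c(rep("0", 80), rep("1", 20)))
)

# Size of the smallest class
n_min <- min(table(toy$Target))

# Draw n_min row indices from each class, without replacement
keep <- unlist(lapply(
  split(seq_len(nrow(toy)), toy$Target),
  function(idx) idx[sample.int(length(idx), n_min)]
))
balanced <- toy[keep, ]

# Class counts after undersampling
print(table(balanced$Target))
```

Indexing with `sample.int()` rather than calling `sample()` on the indices directly avoids the surprising behaviour of `sample()` when a class contains a single row.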

The function also allows for hyperparameter tuning and error correction. After training the model, predictions are made, and the AUC is calculated and visualized with a ROC curve plot.
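The train, predict, and evaluate steps described above can be sketched with the randomForest and ROCR packages listed under See also. This is a minimal sketch on synthetic data, not the function's internal code: the toy data frame, the `Target` column, the 75/25 split, and the `mtry` and `ntree` values are assumptions chosen for illustration, and the function's own holdback semantics may differ.

```r
# Illustrative sketch only: synthetic data, not the package's internals.
library(randomForest)
library(ROCR)

set.seed(42)
n <- 200
# Hypothetical predictors and a binary outcome loosely tied to x1
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$Target <- factor(ifelse(toy$x1 + rnorm(n, sd = 0.5) > 0, "1", "0"))

# Assumed split: 75% of rows for training, 25% held back for testing
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit the forest; mtry plays the role described for best.m
rf <- randomForest(Target ~ ., data = train, mtry = 2, ntree = 200)

# Predicted probability of class "1" on the held-back rows
prob <- predict(rf, newdata = test, type = "prob")[, "1"]

# ROC curve (TPR vs. FPR) and AUC via ROCR
pred <- prediction(prob, test$Target)
roc  <- performance(pred, measure = "tpr", x.measure = "fpr")
auc  <- performance(pred, measure = "auc")@y.values[[1]]

print(auc)
plot(roc, main = sprintf("ROC curve (AUC = %.3f)", auc))
abline(a = 0, b = 1, lty = 2)  # chance line for reference
```

Repeating the split and fit inside a loop and averaging the resulting AUC values would be one way to mimic repeated cross-validation as controlled by reps.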

See also

randomForest, ROCR

Examples

if (FALSE) { # \dontrun{
# Example 1: Using real data from the database
# This is a placeholder example. Replace the path with a valid database.
get_auc_curve_with_rf_model(Data = NULL, path_db = "path/to/database.db",
                            rat_studies = TRUE, reps = 10,
                            holdback = 0.75, error_correction_method = "Prune")

# Example 2: Using sample data (if applicable)
# sample_data is a placeholder for a data frame you supply.
# error_correction_method must be one of "Flip", "Prune", or "None".
get_auc_curve_with_rf_model(Data = sample_data, path_db = NULL,
                            rat_studies = FALSE, reps = 5,
                            holdback = 0.5, error_correction_method = "None")
} # }