Compute and Plot AUC Curve with Random Forest Model
Source: R/get_auc_curve_with_rf_model.R
This function trains a Random Forest model, computes the ROC (Receiver Operating Characteristic) curve, and calculates the AUC (area under the ROC curve). It supports optional preprocessing and modeling steps such as imputation, rounding, undersampling, and hyperparameter tuning.
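As a rough illustration of what the AUC measures, the sketch below computes it from predicted class-1 probabilities using the rank-sum (Mann-Whitney U) identity. The AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as 1/2. The helper `auc_rank()` is hypothetical and not part of this package, and the package's own AUC computation may differ.

```r
# Hypothetical helper (not part of the package): AUC from the rank-sum
# (Mann-Whitney U) identity. `labels` must be 0/1; `scores` are predicted
# probabilities for class 1.
auc_rank <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  # rank() gives tied scores their average rank, so ties count as 1/2
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One positive (0.4) is outscored by a negative (0.6): 3 of 4 pairs are ordered correctly
auc_rank(c(0.9, 0.4, 0.6, 0.1), c(1, 1, 0, 0))
#> [1] 0.75
```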
Usage
get_auc_curve_with_rf_model(
  Data = NULL,
  path_db = NULL,
  rat_studies = FALSE,
  studyid_metadata,
  fake_study = FALSE,
  use_xpt_file = FALSE,
  Round = FALSE,
  Impute = FALSE,
  best.m = NULL,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method,
  output_individual_scores = TRUE,
  output_zscore_by_USUBJID = FALSE
)
Arguments
- Data
A data frame containing the training data. If NULL, the data are fetched from the database.
- path_db
A string giving the path to the SQLite database from which data are fetched when Data is NULL.
- rat_studies
Logical; whether to filter for rat studies. Defaults to FALSE.
- studyid_metadata
A data frame containing metadata associated with study IDs.
- fake_study
Logical; whether to use fake study IDs for data simulation. Defaults to FALSE.
- use_xpt_file
Logical; whether to use an XPT file for input data. Defaults to FALSE.
- Round
Logical; whether to round numerical values. Defaults to FALSE.
- Impute
Logical; whether to impute missing values. Defaults to FALSE.
- best.m
The mtry hyperparameter for Random Forest (the number of variables randomly sampled as split candidates at each node). If NULL, the function determines it.
- reps
A numeric value giving the number of cross-validation repetitions. It has no default and must be supplied.
- holdback
Numeric; either 1 or a fraction (e.g., 0.75) specifying the holdback used during cross-validation.
- Undersample
Logical; whether to perform undersampling. Defaults to FALSE.
- hyperparameter_tuning
Logical; whether to perform hyperparameter tuning. Defaults to FALSE.
- error_correction_method
Character; one of "Flip", "Prune", or "None", specifying the error-correction method.
- output_individual_scores
Logical; whether to output individual scores. Defaults to TRUE.
- output_zscore_by_USUBJID
Logical; whether to output z-scores by subject ID (USUBJID). Defaults to FALSE.
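Setting Undersample = TRUE enables undersampling, but the exact scheme the package uses is not documented on this page. A minimal base-R sketch, assuming simple random undersampling of every class down to the size of the smallest one; `undersample()`, the label column name `Target`, and the toy data are all hypothetical:

```r
# Hypothetical sketch: random undersampling of every class down to the size
# of the smallest class. `label_col` names the outcome column.
undersample <- function(df, label_col, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.character(df[[label_col]])
  counts <- table(y)
  n_min <- min(counts)
  keep <- unlist(lapply(names(counts), function(cls) {
    idx <- which(y == cls)
    # Larger classes are subsampled; here length(idx) >= 2, so sample()
    # draws from the indices themselves rather than from 1:idx
    if (length(idx) > n_min) sample(idx, n_min) else idx
  }))
  df[sort(keep), , drop = FALSE]
}

# Toy data: 3 positives and 7 negatives (hypothetical)
df <- data.frame(x = 1:10, Target = c(rep(1, 3), rep(0, 7)))
balanced <- undersample(df, "Target", seed = 42)
table(balanced$Target)
```

The minority class is always kept whole, so undersampling only discards rows; it never duplicates them, which is the main contrast with oversampling.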
Value
This function does not return any explicit value. Instead, it produces:
- The AUC (Area Under the Curve), printed to the console.
- A ROC curve plot annotated with the calculated AUC value.
- Performance metrics (e.g., True Positive Rate, False Positive Rate), displayed in the plot.
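The ROC curve and AUC outputs listed above can be sketched in base R as follows, assuming a vector of predicted class-1 probabilities and 0/1 labels. `roc_points()` and `trapz_auc()` are hypothetical helpers, and the package's own plot and metric calculations may differ. Each distinct score serves as a threshold, and the AUC is the trapezoidal area under the resulting (FPR, TPR) points.

```r
# Hypothetical sketch (base R only): empirical ROC points and trapezoidal AUC.
# `labels` are 0/1; `scores` are predicted probabilities for class 1.
roc_points <- function(scores, labels) {
  thr <- sort(unique(scores), decreasing = TRUE)  # one threshold per distinct score
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  tpr <- vapply(thr, function(t) sum(scores >= t & labels == 1) / n_pos, numeric(1))
  fpr <- vapply(thr, function(t) sum(scores >= t & labels == 0) / n_neg, numeric(1))
  # Prepend (0, 0); the lowest threshold already yields (1, 1)
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under a piecewise-linear curve via the trapezoidal rule
trapz_auc <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

scores <- c(0.9, 0.4, 0.6, 0.1)
labels <- c(1, 1, 0, 0)
pts <- roc_points(scores, labels)
auc <- trapz_auc(pts$fpr, pts$tpr)
cat(sprintf("AUC = %.3f\n", auc))  # prints AUC = 0.750
plot(pts$fpr, pts$tpr, type = "l", xlab = "False Positive Rate",
     ylab = "True Positive Rate", main = sprintf("ROC curve (AUC = %.3f)", auc))
abline(0, 1, lty = 2)  # diagonal = chance-level classifier
```

Grouping tied scores into a single threshold makes ties appear as diagonal segments, which gives the same 1/2 credit per tied pair as the rank-based formula.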
Details
The function prepares data for training a Random Forest model, either from the supplied Data frame or, when Data is NULL, by fetching it from an SQLite database, or by generating synthetic data (if fake_study is TRUE). It then processes the data using the optional imputation, rounding, and undersampling steps. The model is trained using the Random Forest algorithm, and performance is evaluated via the ROC curve and its AUC.
The function also supports optional hyperparameter tuning and error correction (see error_correction_method). After training, predictions are made, and the AUC is calculated and visualized in a ROC curve plot.
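The Impute and Round options are described only at a high level. As an illustration, assuming median imputation of numeric columns (a stand-in; the package's actual imputation method is not stated here) followed by rounding, the two preprocessing steps might look like this. `impute_median()`, `round_numeric()`, and the toy data are hypothetical:

```r
# Hypothetical sketch: median imputation of numeric columns. The package's
# own imputation strategy is not documented here; median is a stand-in.
impute_median <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    # Skip non-numeric columns and columns that are entirely NA
    if (is.numeric(x) && anyNA(x) && !all(is.na(x))) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
      df[[col]] <- x
    }
  }
  df
}

# Hypothetical sketch: round every numeric column to `digits` places
round_numeric <- function(df, digits = 0) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], round, digits = digits)
  df
}

raw <- data.frame(score = c(1.26, NA, 3.14), sex = c("M", "F", "M"))
clean <- round_numeric(impute_median(raw), digits = 1)
clean$score
#> [1] 1.3 2.2 3.1
```

Imputing before rounding means the fill value (here the median 2.2) is itself rounded consistently with the observed values.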
Examples
if (FALSE) { # \dontrun{
# Example 1: Using real data from the database
# This is a placeholder example. Replace the path with a valid database.
get_auc_curve_with_rf_model(Data = NULL, path_db = "path/to/database.db", rat_studies = TRUE, reps = 10,
  holdback = 0.75, error_correction_method = "Prune")
# Example 2: Using a data frame already in memory (sample_data is a placeholder)
get_auc_curve_with_rf_model(Data = sample_data, path_db = NULL, rat_studies = FALSE, reps = 5,
  holdback = 0.5, error_correction_method = "None")
} # }