Generate Prediction Plot for Random Forest Model — get_prediction_plot

This function builds a random forest model and uses it for prediction. For each test repetition, it trains the model on the training data and predicts on the test data. It then generates a histogram of the predicted probabilities for the outcome variable (LIVER).

Usage

get_prediction_plot(
  Data = NULL,
  path_db,
  rat_studies = FALSE,
  studyid_metadata = NULL,
  fake_study = FALSE,
  use_xpt_file = FALSE,
  Round = FALSE,
  Impute = FALSE,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method,
  testReps
)

Arguments

Data: A data frame containing the dataset to use for training and testing. If NULL, the function fetches and formats the data from the database using the get_Data_formatted_for_ml_and_best.m function.
path_db: A string indicating the path to the database that contains the dataset.
rat_studies: A logical flag indicating whether to use rat studies data. Defaults to FALSE.
studyid_metadata: A data frame containing metadata related to the study IDs. Defaults to NULL.
fake_study: A logical flag indicating whether to use fake study data. Defaults to FALSE.
use_xpt_file: A logical flag indicating whether to use an XPT file. Defaults to FALSE.
Round: A logical flag indicating whether to round the predictions. Defaults to FALSE.
Impute: A logical flag indicating whether to impute missing values. Defaults to FALSE.
reps: An integer specifying the number of repetitions for cross-validation.
holdback: A numeric value indicating the proportion of data to hold back for testing during cross-validation.
Undersample: A logical flag indicating whether to perform undersampling on the dataset to balance the classes. Defaults to FALSE.
hyperparameter_tuning: A logical flag indicating whether to perform hyperparameter tuning. Defaults to FALSE.
error_correction_method: A string specifying the error correction method to be used. Possible values are "Flip", "Prune", or "None".
testReps: An integer specifying the number of test repetitions for model evaluation.
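
To make the holdback and Undersample arguments more concrete, here is a minimal base R sketch of what a holdback split and class undersampling can look like. The toy data frame and the helper names split_holdback() and undersample() are hypothetical and written only for illustration; the package's internal implementation may differ.

```r
# Hypothetical toy data; LIVER is the binary outcome named on this page.
set.seed(42)
toy <- data.frame(
  LIVER  = factor(rep(c("N", "Y"), times = c(70, 30))),
  score1 = rnorm(100),
  score2 = rnorm(100)
)

# Hold back a proportion of rows for testing (assumed meaning of `holdback`).
split_holdback <- function(df, holdback) {
  n_test    <- round(nrow(df) * holdback)
  test_idx  <- sample(seq_len(nrow(df)), n_test)
  train_idx <- setdiff(seq_len(nrow(df)), test_idx)
  list(
    train = df[train_idx, , drop = FALSE],
    test  = df[test_idx, , drop = FALSE]
  )
}

# Randomly drop rows so that every class has as many rows as the smallest class.
undersample <- function(df, outcome = "LIVER") {
  idx_by_class <- split(seq_len(nrow(df)), df[[outcome]])
  n_min <- min(lengths(idx_by_class))
  keep <- unlist(lapply(idx_by_class, function(i) {
    i[sample.int(length(i), n_min)]
  }))
  df[sort(keep), , drop = FALSE]
}

parts          <- split_holdback(toy, holdback = 0.2)
balanced_train <- undersample(parts$train)
table(balanced_train$LIVER)
```

Only the training portion is undersampled in this sketch, so the held-back rows keep their original class balance; whether the function does the same is an assumption.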

Value

A ggplot object representing the histogram of predicted probabilities for the LIVER variable across test repetitions.
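
Because the return value is a ggplot object, it can be extended with further ggplot2 layers or saved to disk. The sketch below uses a stand-in histogram built from random numbers, since calling the function itself requires a database; the object name stand_in, the labels, and the output file name are illustrative assumptions.

```r
library(ggplot2)

# Stand-in for the object returned by get_prediction_plot():
# a histogram of made-up probabilities, used only for illustration.
set.seed(7)
stand_in <- ggplot(data.frame(prob = runif(50)), aes(x = prob)) +
  geom_histogram(bins = 10)

# Add labels and a theme, as with any other ggplot object.
customized <- stand_in +
  labs(
    title = "Predicted probabilities for LIVER",
    x = "Predicted probability",
    y = "Count"
  ) +
  theme_minimal()

# Save the plot to a temporary file.
out_file <- file.path(tempdir(), "liver_prediction_hist.png")
ggsave(out_file, plot = customized, width = 6, height = 4)
```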

Details

The function works as follows:

If Data is NULL, the function fetches the data and the best model configuration by calling the get_Data_formatted_for_ml_and_best.m function.
The dataset is divided into training and test sets for each repetition (testReps).
If Undersample is enabled, undersampling is applied to balance the dataset.
A random forest model is trained on the training data and predictions are made on the test data.
The predictions are averaged over the test repetitions and a histogram is plotted to visualize the distribution of predicted probabilities for LIVER.
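
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the function's actual source: it uses the randomForest and ggplot2 packages, a simulated data set, and a hypothetical id column to match predictions across repetitions. The undersampling step is omitted because the simulated classes are already balanced.

```r
library(randomForest)
library(ggplot2)

# Simulated data; `id`, `x1`, and `x2` are hypothetical column names.
set.seed(1)
toy <- data.frame(
  id    = paste0("S", 1:100),
  LIVER = factor(rep(c("N", "Y"), each = 50)),
  x1    = rnorm(100, mean = rep(c(0, 1), each = 50)),
  x2    = rnorm(100)
)

test_reps <- 5    # plays the role of `testReps`
holdback  <- 0.2  # proportion of rows held back for testing

preds <- vector("list", test_reps)
for (r in seq_len(test_reps)) {
  # New random train/test split for each repetition.
  test_idx <- sample(nrow(toy), round(nrow(toy) * holdback))
  train <- toy[-test_idx, ]
  test  <- toy[test_idx, ]

  # Train on the training rows, then predict class probabilities on the test rows.
  rf     <- randomForest(LIVER ~ x1 + x2, data = train, ntree = 200)
  prob_y <- predict(rf, newdata = test, type = "prob")[, "Y"]

  preds[[r]] <- data.frame(id = test$id, prob = prob_y)
}

# Average each row's predicted probability over the repetitions
# in which it landed in the test set.
all_preds <- do.call(rbind, preds)
mean_prob <- aggregate(prob ~ id, data = all_preds, FUN = mean)

# Histogram of the averaged probabilities.
p <- ggplot(mean_prob, aes(x = prob)) +
  geom_histogram(bins = 20) +
  labs(x = "Mean predicted probability (LIVER = Y)", y = "Count")
```

With random splits, a given row may appear in the test set in several repetitions or in none, so the averaged table can contain fewer rows than the full data set.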

Examples

if (FALSE) { # \dontrun{
# Example function call
get_prediction_plot(
  path_db = "path_to_db",
  rat_studies = FALSE,
  reps = 10,
  holdback = 0.2,
  Undersample = TRUE,
  hyperparameter_tuning = FALSE,
  error_correction_method = "Flip",
  testReps = 5
)
} # }