Generate Prediction Plot for Random Forest Model
Source: R/get_prediction_plot.R
This function builds random forest models and uses them for prediction. Over multiple test repetitions, it trains a model on the training data and predicts on the test data. It then generates a histogram of the predicted probabilities for the outcome variable (LIVER).
Usage
get_prediction_plot(
  Data = NULL,
  path_db,
  rat_studies = FALSE,
  studyid_metadata = NULL,
  fake_study = FALSE,
  use_xpt_file = FALSE,
  Round = FALSE,
  Impute = FALSE,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method,
  testReps
)
Arguments
- Data
A data frame containing the dataset to use for training and testing. If NULL, the function attempts to fetch and format the data from the database using the get_Data_formatted_for_ml_and_best.m function.
- path_db
A string indicating the path to the database that contains the dataset.
- rat_studies
A logical flag indicating whether to use rat studies data. Defaults to
FALSE
.- studyid_metadata
A data frame containing metadata related to the study IDs. Defaults to
NULL
.- fake_study
A logical flag indicating whether to use fake study data. Defaults to
FALSE
.- use_xpt_file
A logical flag indicating whether to use an XPT file. Defaults to
FALSE
.- Round
A logical flag indicating whether to round the predictions. Defaults to
FALSE
.- Impute
A logical flag indicating whether to impute missing values. Defaults to
FALSE
.- reps
An integer specifying the number of repetitions for cross-validation.
- holdback
A numeric value indicating the proportion of data to hold back for testing during cross-validation.
- Undersample
A logical flag indicating whether to undersample the dataset to balance the classes. Defaults to FALSE.
- hyperparameter_tuning
A logical flag indicating whether to perform hyperparameter tuning. Defaults to FALSE.
- error_correction_method
A string specifying the error correction method to use. Possible values are "Flip", "Prune", or "None".
- testReps
An integer specifying the number of test repetitions for model evaluation.
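The roles of holdback and Undersample can be illustrated with a minimal base-R sketch. It is not the package's code: it assumes a data frame whose binary outcome column is named LIVER and coded 0/1 (the actual encoding is not stated here), and the helpers split_holdback() and undersample() are hypothetical.

```r
# Hypothetical helper: randomly hold back a proportion of rows for testing
split_holdback <- function(df, holdback) {
  n_test <- round(nrow(df) * holdback)
  test_idx <- sample.int(nrow(df), n_test)
  is_test <- seq_len(nrow(df)) %in% test_idx  # logical mask is safe even if n_test == 0
  list(train = df[!is_test, , drop = FALSE],
       test  = df[is_test, , drop = FALSE])
}

# Hypothetical helper: undersample every class down to the size of the smallest one
undersample <- function(df, outcome = "LIVER") {
  y <- as.character(df[[outcome]])
  counts <- table(y)
  n_min <- min(counts)
  keep <- unlist(lapply(names(counts), function(cls) {
    idx <- which(y == cls)
    idx[sample.int(length(idx), n_min)]  # index via sample.int to avoid sample()'s length-1 quirk
  }))
  df[sort(keep), , drop = FALSE]
}

# Usage on toy data: 20 positives, 80 negatives
set.seed(1)
toy <- data.frame(x = rnorm(100), LIVER = rep(c(1, 0), times = c(20, 80)))
parts <- split_holdback(toy, holdback = 0.2)
balanced <- undersample(parts$train)
table(balanced$LIVER)
```

Undersampling is applied only to the training split here, so the test split keeps the original class balance; whether the package does the same is not stated above.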
Value
A ggplot object representing the histogram of predicted probabilities for the LIVER variable across test repetitions.
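Because the return value is a ggplot object, it can be printed, extended with further layers, or saved with ggplot2::ggsave(). The sketch below builds a comparable histogram with ggplot2 from simulated probabilities. The data frame pred_df and its column prob are hypothetical and do not reflect the function's internal data layout.

```r
library(ggplot2)

# Hypothetical predicted probabilities for the LIVER outcome
set.seed(7)
pred_df <- data.frame(prob = runif(50))

# Histogram of predicted probabilities, similar in spirit to the returned plot
plt <- ggplot(pred_df, aes(x = prob)) +
  geom_histogram(binwidth = 0.1, boundary = 0) +
  labs(x = "Predicted probability of LIVER", y = "Count")

# A ggplot object can be modified with additional layers before printing
plt <- plt + theme_minimal()
print(plt)
```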
Details
The function works as follows:
- If Data is NULL, the function fetches the data and the best model configuration by calling the get_Data_formatted_for_ml_and_best.m function.
- For each of the testReps repetitions, the dataset is divided into training and test sets.
- If Undersample is enabled, undersampling is applied to balance the dataset.
- A random forest model is trained on the training data, and predictions are made on the test data.
- The predictions are averaged over the test repetitions, and a histogram is plotted to visualize the distribution of predicted probabilities for LIVER.
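The repeat, predict, and average steps above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: it swaps in logistic regression (stats::glm) for the random forest so the sketch has no extra dependencies, assumes a 0/1 LIVER column, assumes holdback sets the test proportion in each repetition, and omits undersampling. The function average_test_predictions() is hypothetical.

```r
# Hypothetical sketch: average held-out predicted probabilities over test repetitions
average_test_predictions <- function(df, holdback, testReps, outcome = "LIVER") {
  n <- nrow(df)
  prob_sum <- numeric(n)  # running sum of predicted probabilities per row
  n_seen <- integer(n)    # number of repetitions in which each row was in the test set
  form <- stats::as.formula(paste(outcome, "~ ."))
  for (r in seq_len(testReps)) {
    is_test <- seq_len(n) %in% sample.int(n, round(n * holdback))
    # Stand-in model: logistic regression instead of a random forest
    fit <- stats::glm(form, data = df[!is_test, , drop = FALSE],
                      family = stats::binomial())
    p <- stats::predict(fit, newdata = df[is_test, , drop = FALSE], type = "response")
    prob_sum[is_test] <- prob_sum[is_test] + p
    n_seen[is_test] <- n_seen[is_test] + 1L
  }
  # Rows never held back get NA rather than a misleading 0
  ifelse(n_seen > 0, prob_sum / n_seen, NA_real_)
}

# Usage on simulated data
set.seed(42)
toy <- data.frame(x = rnorm(200))
toy$LIVER <- rbinom(200, 1, stats::plogis(2 * toy$x))
avg_prob <- average_test_predictions(toy, holdback = 0.25, testReps = 20)

# Bin counts of the averaged probabilities (hist() drops the NA entries)
hist(avg_prob, breaks = seq(0, 1, by = 0.1), plot = FALSE)$counts
```

Averaging per row, rather than pooling all predictions, gives each sample a single probability regardless of how often it landed in a test split.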