This function builds a random forest model using the randomForest
package, evaluates it through cross-validation,
and computes performance metrics such as sensitivity, specificity, and accuracy.
It optionally applies undersampling to handle class imbalance and supports custom settings for the number of predictors sampled at each split.
Arguments
- Data
Mandatory, data frame. The input dataset, which must include a column named Target_Organ as the response variable.
- Undersample
Optional, logical. If TRUE, balances the dataset by undersampling the majority class. Default is FALSE.
- best.m
Optional, numeric or NULL. Specifies the number of predictors sampled at each split (mtry). If NULL, the randomForest default is used.
- testReps
Mandatory, integer. The number of cross-validation repetitions. Must be at least 2.
- Type
Mandatory, numeric. Specifies the importance metric type: 1 for Mean Decrease Accuracy or 2 for Mean Decrease Gini (see the sketch after this list).
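The Type argument appears to correspond to the type argument of randomForest's importance() function; a minimal sketch of the distinction, assuming that mapping:

# Assumed mapping of Type onto randomForest::importance(); not taken from this
# function's source, only an illustration of the two metrics
library(randomForest)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(rf, type = 1)   # Type = 1: permutation-based Mean Decrease Accuracy
importance(rf, type = 2)   # Type = 2: impurity-based Mean Decrease Gini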
Value
A list with the following elements:
- performance_metrics
A vector of aggregated performance metrics, including sensitivity, specificity, and accuracy.
- raw_results
A list containing the raw sensitivity, specificity, and accuracy values for each cross-validation fold (see the access example after this list).
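The element names inside raw_results are not spelled out above; assuming they mirror the metric names and hold one numeric value per repetition, the per-fold spread could be inspected along these lines:

# Hypothetical follow-up on a 'results' object returned by get_rf_model_with_cv()
# (see the Examples section below); element names inside raw_results are assumed
results$performance_metrics          # aggregated sensitivity, specificity, accuracy
results$raw_results$sensitivity      # assumed name: per-repetition sensitivity values
sapply(results$raw_results, sd)      # spread of each metric across repetitions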
Details
This function splits the input data into training and testing subsets for each of the testReps cross-validation repetitions.
If undersampling is enabled, the function balances the training set to reduce class imbalance.
A random forest model is trained on the training set, and predictions are evaluated on the test set. The results are aggregated to provide summary performance metrics.
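The actual implementation is not reproduced here; the following is a minimal sketch of the procedure described above, assuming a random 70/30 split per repetition, undersampling via caret::downSample, and caret::confusionMatrix for the per-repetition metrics (the Type/importance step is omitted).

# Minimal sketch of the described cross-validation loop; the split ratio, helper
# choices, and the positive class coding ("1") are assumptions
library(randomForest)
library(caret)

rf_cv_sketch <- function(Data, Undersample = FALSE, best.m = NULL, testReps = 5) {
  metrics <- replicate(testReps, {
    idx   <- sample(nrow(Data), size = floor(0.7 * nrow(Data)))
    train <- Data[idx, ]
    test  <- Data[-idx, ]
    if (Undersample) {
      # downSample() drops majority-class rows until the classes are balanced
      train <- downSample(x = train[, setdiff(names(train), "Target_Organ")],
                          y = factor(train$Target_Organ), yname = "Target_Organ")
    }
    train$Target_Organ <- factor(train$Target_Organ)
    rf <- if (is.null(best.m)) {
      randomForest(Target_Organ ~ ., data = train)
    } else {
      randomForest(Target_Organ ~ ., data = train, mtry = best.m)
    }
    pred <- predict(rf, newdata = test)
    cm   <- confusionMatrix(pred,
                            factor(test$Target_Organ, levels = levels(train$Target_Organ)),
                            positive = "1")
    c(sensitivity = unname(cm$byClass["Sensitivity"]),
      specificity = unname(cm$byClass["Specificity"]),
      accuracy    = unname(cm$overall["Accuracy"]))
  })
  rowMeans(metrics)  # aggregate across repetitions
}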
Examples
if (FALSE) { # \dontrun{
# Load necessary libraries
library(randomForest)
library(caret)
# Example dataset
Data <- iris
Data$Target_Organ <- ifelse(Data$Species == "setosa", 1, 0)
Data <- Data[, -5] # Remove Species column
# Run the function
results <- get_rf_model_with_cv(Data = Data,
Undersample = TRUE,
best.m = 2,
testReps = 5,
Type = 2)
# Print results
print(results$performance_metrics)
} # }