
This function builds a random forest model using the randomForest package, evaluates it through cross-validation, and computes performance metrics such as sensitivity, specificity, and accuracy. It optionally applies undersampling to handle class imbalance and supports custom settings for the number of predictors sampled at each split.

Usage

get_rf_model_with_cv(Data, Undersample = FALSE, best.m = NULL, testReps, Type)

Arguments

Data

Mandatory, data frame. The input dataset, which must include a column named Target_Organ as the response variable.

Undersample

Optional, logical. If TRUE, balances the dataset by undersampling the majority class. Default is FALSE.

best.m

Optional, numeric or NULL. Specifies the number of predictors sampled at each split. If NULL, the default value of randomForest is used.

testReps

Mandatory, integer. The number of cross-validation repetitions. Must be at least 2.

Type

Mandatory, numeric. Specifies the importance metric type: 1 for Mean Decrease Accuracy or 2 for Mean Decrease Gini.

Value

A list with the following elements:

  • performance_metrics: A vector of aggregated performance metrics, including sensitivity, specificity, and accuracy.

  • raw_results: A list containing raw sensitivity, specificity, and accuracy values for each cross-validation fold.
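For illustration, the returned list can be inspected as sketched below. This assumes results holds the value returned by get_rf_model_with_cv() and that each element of raw_results is a numeric vector with one value per repetition; the element names follow the Value section above.

results$performance_metrics          # aggregated sensitivity, specificity, and accuracy
sapply(results$raw_results, mean)    # mean of each raw metric across repetitions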

Details

This function splits the input data into training and testing subsets over the specified number of cross-validation repetitions (testReps). If undersampling is enabled, the function balances the training set to reduce class imbalance. A random forest model is trained on the training set, and predictions are evaluated on the test set. The results are aggregated across repetitions to provide summary performance metrics.
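The package's internal implementation is not reproduced here, but the general pattern described above can be sketched as follows. The fold construction via caret::createFolds(), the caret::downSample() call, the default mtry, and the restriction to accuracy (sensitivity and specificity are omitted for brevity) are assumptions for illustration only.

library(randomForest)
library(caret)

sketch_rf_cv <- function(Data, testReps, Undersample = FALSE, best.m = NULL) {
  Data$Target_Organ <- factor(Data$Target_Organ)
  folds <- caret::createFolds(Data$Target_Organ, k = testReps)
  acc <- numeric(testReps)

  for (i in seq_along(folds)) {
    train <- Data[-folds[[i]], ]   # training subset for this repetition
    test  <- Data[folds[[i]], ]    # held-out test subset

    if (Undersample) {
      # Downsample the majority class so both classes are equally represented
      predictors <- setdiff(names(train), "Target_Organ")
      train <- caret::downSample(x = train[, predictors],
                                 y = train$Target_Organ,
                                 yname = "Target_Organ")
    }

    mtry <- if (is.null(best.m)) floor(sqrt(ncol(train) - 1)) else best.m
    fit  <- randomForest(Target_Organ ~ ., data = train, mtry = mtry)

    pred   <- predict(fit, newdata = test)
    acc[i] <- mean(as.character(pred) == as.character(test$Target_Organ))
  }

  c(accuracy = mean(acc))  # aggregate across repetitions
}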

Examples

if (FALSE) { # \dontrun{
# Load necessary libraries
library(randomForest)
library(caret)

# Example dataset: binary response derived from iris
Data <- iris
Data$Target_Organ <- ifelse(Data$Species == "setosa", 1, 0)
Data <- Data[, -5]  # Remove Species column

# Run the function
results <- get_rf_model_with_cv(Data = Data,
                                Undersample = TRUE,
                                best.m = 2,
                                testReps = 5,
                                Type = 2)

# Print results
print(results$performance_metrics)
} # }