Skip to contents

Introduction

The get_rf_model_with_cv function implements a random forest-based modeling pipeline with cross-validation to assess model performance. It includes optional undersampling for handling imbalanced data and provides detailed metrics for evaluating model accuracy.

Function Overview

get_rf_model_with_cv <- function(Data,
                                 Undersample = FALSE,
                                 best.m = NULL, # any numeric value or call function to get it
                                 testReps, # testReps must be at least 2;
                                 Type) {
  ...
}

Purpose

This function:

  • Builds a random forest model using the randomForest package.
  • Performs cross-validation to evaluate model metrics.
  • Optionally applies undersampling to balance datasets.
  • Returns aggregated performance metrics.

Parameters

Parameter Type Description
Data Data Frame Input dataset. Must include a Target_Organ column as the response variable.
Undersample Logical If TRUE, balances the dataset by undersampling the majority class.
best.m Numeric/NULL Number of predictors sampled at each split. If NULL, default is used.
testReps Integer Number of cross-validation folds (must be >= 2).
Type Numeric Type of importance metric (1 for Mean Decrease Accuracy, 2 for Gini).

Outputs

The function returns a list containing:

  1. performance_metrics: Aggregated performance metrics including sensitivity, specificity, and accuracy.
  2. raw_results: Raw data of sensitivity, specificity, and accuracy for each cross-validation fold.

Cross-Validation Workflow

Data Preparation

  • Splits data into training and testing subsets based on the specified testReps.
  • Optionally applies undersampling to balance the training set.

Model Training

  • Trains a random forest model using the randomForest package.

Prediction and Metrics Calculation

  • Predicts probabilities on the test set.
  • Computes metrics (sensitivity, specificity, accuracy, etc.) using the caret package.

Performance Summary

  • Aggregates performance metrics across cross-validation folds.

Example Usage

# Load necessary libraries
library(randomForest)
library(caret)

# Example dataset
data(Data)
Data$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)

# Run the function
results <- get_rf_model_with_cv(Data = iris[, -5], 
                                Undersample = TRUE, 
                                best.m = 2, 
                                testReps = 5, 
                                Type = 2)

# Print results
print(results$performance_metrics)

Conclusion

The get_rf_model_with_cv function is a powerful tool for evaluating random forest models with cross-validation, especially for datasets with class imbalance. Adjust parameters such as Undersample and best.m to optimize performance for your specific dataset.