Documentation: get_rf_model_with_cv

Introduction

The get_rf_model_with_cv function fits random forest classifiers and uses cross-validation to estimate how well they perform on held-out data. It can optionally undersample the majority class to handle imbalanced data, and it reports sensitivity, specificity, and accuracy for each fold as well as aggregated across folds.

Function Overview

get_rf_model_with_cv <- function(Data,
                                 Undersample = FALSE,
                                 best.m = NULL, # numeric mtry, fixed or computed beforehand; NULL = default
                                 testReps,      # number of CV folds; must be at least 2
                                 Type) {        # importance type: 1 = Mean Decrease Accuracy, 2 = Gini
  ...
}

Purpose

This function:

Builds a random forest model using the randomForest package.
Performs cross-validation to evaluate model metrics.
Optionally applies undersampling to balance datasets.
Returns aggregated performance metrics.

Parameters

Parameter	Type	Description
`Data`	Data Frame	Input dataset. Must include a `Target_Organ` column as the response variable.
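`Undersample`	Logical	If `TRUE`, balances the training set by undersampling the majority class.
`best.m`	Numeric/NULL	Number of predictors sampled as split candidates at each node (`mtry` in randomForest). If `NULL`, the randomForest default is used (floor(sqrt(p)) for classification, where p is the number of predictors).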
`testReps`	Integer	Number of cross-validation folds (must be >= 2).
`Type`	Numeric	Type of importance metric (`1` for Mean Decrease Accuracy, `2` for Gini).
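The constraints in the Parameters table can be checked before any model is fitted. The following is a minimal sketch, not the function's actual code: `check_rf_cv_args` is a hypothetical helper, and the upper bound on `best.m` (at most the number of predictor columns) is an assumption based on the valid range of `mtry` in randomForest.

```r
# Hypothetical helper (not part of get_rf_model_with_cv): validates arguments
# against the constraints listed in the Parameters table.
check_rf_cv_args <- function(Data, Undersample, best.m, testReps, Type) {
  stopifnot(is.data.frame(Data), "Target_Organ" %in% names(Data))
  n_predictors <- ncol(Data) - 1  # all columns except Target_Organ (assumption)
  stopifnot(
    is.logical(Undersample), length(Undersample) == 1,
    # Assumed valid range for mtry: 1 .. number of predictors
    is.null(best.m) || (is.numeric(best.m) && best.m >= 1 && best.m <= n_predictors),
    is.numeric(testReps), testReps >= 2, testReps == round(testReps),
    Type %in% c(1, 2)
  )
  invisible(TRUE)
}

# Usage: passes silently for valid input, stops with an error if testReps < 2
d <- iris[, -5]
d$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)
check_rf_cv_args(d, Undersample = TRUE, best.m = 2, testReps = 5, Type = 2)
```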

Outputs

The function returns a list containing:

performance_metrics: Performance metrics (including sensitivity, specificity, and accuracy) aggregated across the cross-validation folds.
raw_results: Per-fold values of sensitivity, specificity, and accuracy.

Cross-Validation Workflow

Data Preparation

Splits the data into testReps folds; each fold is held out once as the test set while the remaining folds form the training set.
Optionally applies undersampling to balance the training set.
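The data-preparation step could look roughly like the sketch below. It assumes stratified folds (each fold keeps the original class mix) and random undersampling applied to the training set only; both choices, and the helper names `make_stratified_folds` and `undersample`, are hypothetical rather than taken from the function's source.

```r
# Hypothetical helpers sketching the data-preparation step (not the
# function's actual code).

# Assign each row to one of k folds, stratified by class so every fold
# keeps roughly the original class proportions.
make_stratified_folds <- function(y, k) {
  stopifnot(k >= 2)
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle this class's rows, then deal them round-robin into folds 1..k
    folds[idx[sample.int(length(idx))]] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

# Randomly keep only as many rows of each class as the minority class has.
undersample <- function(df, response = "Target_Organ") {
  y <- as.character(df[[response]])
  n_min <- min(table(y))
  keep <- unlist(lapply(unique(y), function(cls) {
    idx <- which(y == cls)
    idx[sample.int(length(idx), n_min)]
  }))
  df[sort(keep), , drop = FALSE]
}

# Usage: hold out fold 1 and balance only the training part
set.seed(42)
Data <- iris[, -5]
Data$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)
folds <- make_stratified_folds(Data$Target_Organ, k = 5)
train <- Data[folds != 1, ]
test  <- Data[folds == 1, ]
train_bal <- undersample(train)
table(train_bal$Target_Organ)  # 40 rows of each class
```

Undersampling only the training part leaves the test fold's class mix unchanged, so the metrics are computed on data that resembles the original distribution rather than an artificially balanced one.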

Model Training

Trains a random forest model using the randomForest package.
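A minimal sketch of the training step, assuming `Target_Organ` is coded 0/1 and every other column is a predictor. It uses real randomForest calls (`randomForest()`, `predict(..., type = "prob")`, `importance(..., type = Type)`), but the wrapper `fit_and_score_fold` is hypothetical. It also returns class probabilities for the held-out fold, which is the input the prediction and metrics step needs. When `best.m` is `NULL`, `mtry` is left to the randomForest default.

```r
library(randomForest)

# Hypothetical wrapper (not the function's actual code): train on the
# training folds, return held-out probabilities and variable importance.
fit_and_score_fold <- function(train, test, best.m = NULL, Type = 1,
                               response = "Target_Organ") {
  predictors <- setdiff(names(train), response)
  x_train <- train[, predictors, drop = FALSE]
  # A factor response makes randomForest fit a classification forest
  y_train <- factor(train[[response]], levels = c(0, 1))
  model <- if (is.null(best.m)) {
    randomForest(x = x_train, y = y_train, importance = TRUE)  # default mtry
  } else {
    randomForest(x = x_train, y = y_train, mtry = best.m, importance = TRUE)
  }
  # Probability of class "1" for each held-out row
  prob <- predict(model, newdata = test[, predictors, drop = FALSE],
                  type = "prob")[, "1"]
  list(model = model,
       prob = prob,
       # Type = 1: mean decrease in accuracy; Type = 2: mean decrease in Gini
       importance = importance(model, type = Type))
}

# Usage: 100 random rows for training, the remaining 50 for testing
set.seed(1)
d <- iris[, -5]
d$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)
idx <- sample(nrow(d), 100)
fit <- fit_and_score_fold(d[idx, ], d[-idx, ], best.m = 2, Type = 2)
head(fit$prob)
```

`importance = TRUE` is required for `Type = 1` (mean decrease in accuracy); the Gini measure used by `Type = 2` is always computed for classification forests.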

Prediction and Metrics Calculation

Predicts class probabilities for the held-out test fold.
Computes metrics (sensitivity, specificity, accuracy, etc.) using the caret package.
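The function computes these metrics with caret. As a dependency-free illustration of what the numbers mean, the sketch below swaps in plain base R. It assumes class `1` is the positive class and a 0.5 probability threshold; neither is confirmed by the source, and `fold_metrics` is a hypothetical helper.

```r
# Base-R stand-in for the sensitivity, specificity, and accuracy that caret
# reports. Assumptions: class 1 is positive, threshold is 0.5.
fold_metrics <- function(prob, truth, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & truth == 1)
  tn <- sum(pred == 0 & truth == 0)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  # Yields NaN if a fold contains no positive (or no negative) cases
  c(Sensitivity = tp / (tp + fn),
    Specificity = tn / (tn + fp),
    Accuracy    = (tp + tn) / length(truth))
}

# Usage: five held-out cases
m <- fold_metrics(prob = c(0.9, 0.4, 0.2, 0.7, 0.1), truth = c(1, 1, 0, 0, 0))
print(m)  # Sensitivity 0.5, Specificity 0.667, Accuracy 0.6
```

In caret, the comparable call is `confusionMatrix(data, reference, positive = "1")` with `data` and `reference` as factors sharing the same levels. Setting `positive` matters: with two levels, caret treats the first level as positive by default, which for levels `c(0, 1)` would be class 0.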

Performance Summary

Aggregates performance metrics across cross-validation folds.
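One way the per-fold metrics could be combined into the two returned elements is sketched below. It assumes the summary is the mean and standard deviation across folds; the source does not say which statistics performance_metrics holds. `summarise_folds` is a hypothetical helper, and the fold values in the usage lines are made up for illustration.

```r
# Hypothetical aggregation step: one row per fold in raw_results,
# summarised by mean and standard deviation in performance_metrics.
summarise_folds <- function(fold_list) {
  raw_results <- as.data.frame(do.call(rbind, fold_list))
  performance_metrics <- data.frame(
    Metric = names(raw_results),
    Mean   = vapply(raw_results, mean, numeric(1)),
    SD     = vapply(raw_results, sd, numeric(1)),
    row.names = NULL
  )
  list(performance_metrics = performance_metrics, raw_results = raw_results)
}

# Usage with made-up values for three folds
fold_list <- list(
  c(Sensitivity = 0.90, Specificity = 0.80, Accuracy = 0.85),
  c(Sensitivity = 0.70, Specificity = 0.90, Accuracy = 0.80),
  c(Sensitivity = 0.80, Specificity = 0.70, Accuracy = 0.75)
)
out <- summarise_folds(fold_list)
print(out$performance_metrics)
```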

Example Usage

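# Load necessary libraries
library(randomForest)
library(caret)

# Example dataset: recode iris as a binary problem (setosa = 1, other species = 0)
data(iris)
Data <- iris[, -5]  # keep the four numeric predictors, drop Species
Data$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)

# Run the function with undersampling and 5-fold cross-validation
set.seed(123)  # random forests are stochastic; fix the seed for reproducibility
results <- get_rf_model_with_cv(Data = Data,
                                Undersample = TRUE,
                                best.m = 2,
                                testReps = 5,
                                Type = 2)

# Print results
print(results$performance_metrics)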

Conclusion

The get_rf_model_with_cv function estimates the out-of-sample performance of a random forest classifier with cross-validation and can optionally undersample the majority class, which makes it useful for imbalanced datasets. Compare the cross-validated metrics across settings of Undersample and best.m to choose the configuration that works best for your data.


Your Name

2025-01-02