Documentation: get_rf_model_with_cv
Your Name
2025-01-02
Source: vignettes/get_rf_model_with_cv.Rmd
Introduction
The `get_rf_model_with_cv` function implements a random forest modeling pipeline that uses cross-validation to estimate model performance. It can optionally undersample the majority class to handle imbalanced data. It reports sensitivity, specificity, and accuracy for each cross-validation fold, along with aggregated summaries of those metrics.
Function Overview
```r
get_rf_model_with_cv <- function(Data,
                                 Undersample = FALSE,
                                 best.m = NULL,  # mtry: a numeric value, or the result of a function that computes it
                                 testReps,       # number of cross-validation folds; must be at least 2
                                 Type) {         # variable importance type: 1 or 2
  ...
}
```
Purpose
This function:

- Builds a random forest model using the `randomForest` package.
- Performs cross-validation to estimate sensitivity, specificity, and accuracy.
- Optionally applies undersampling to balance the classes.
- Returns aggregated performance metrics.
Parameters
| Parameter | Type | Description |
|---|---|---|
| `Data` | Data frame | Input dataset. Must include a `Target_Organ` column as the response variable. |
| `Undersample` | Logical | If `TRUE`, balances the training data by undersampling the majority class. Defaults to `FALSE`. |
| `best.m` | Numeric or `NULL` | Number of predictors sampled at each split. If `NULL` (the default), a default value is used. |
| `testReps` | Integer | Number of cross-validation folds. Must be at least 2. |
| `Type` | Numeric | Variable importance measure: `1` for mean decrease in accuracy, `2` for mean decrease in Gini impurity. |
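The parameter names suggest that `best.m` is passed to `randomForest()` as its `mtry` argument and that `Type` selects the `type` argument of `importance()`. That mapping is an assumption, because the function body is not shown. The sketch below makes those two calls directly on a binary version of `iris`. Here `dat` is a hypothetical example dataset.

```r
library(randomForest)

set.seed(42)  # make the forest reproducible

# Hypothetical example data: 1 for setosa, 0 otherwise.
# The response is a factor so randomForest fits a classifier rather than a regression.
dat <- iris[, 1:4]
dat$Target_Organ <- factor(ifelse(iris$Species == "setosa", 1, 0))

best.m <- 2  # assumed to play the role of mtry
rf <- randomForest(Target_Organ ~ ., data = dat,
                   mtry = best.m,
                   ntree = 200,
                   importance = TRUE)  # needed for the accuracy-based measure (type = 1)

# Type = 1: mean decrease in accuracy; Type = 2: mean decrease in Gini impurity
imp_acc  <- importance(rf, type = 1)
imp_gini <- importance(rf, type = 2)
print(imp_acc)
print(imp_gini)
```

Note that `importance = TRUE` must be set when the forest is fitted. Without it, `randomForest` computes only the Gini-based measure, and a request for `type = 1` fails.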
Outputs
The function returns a list containing:

- `performance_metrics`: aggregated performance metrics, including sensitivity, specificity, and accuracy.
- `raw_results`: the sensitivity, specificity, and accuracy obtained on each cross-validation fold.
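How the per-fold results are aggregated is not documented. The sketch below assumes that `raw_results` holds one row per fold and that `performance_metrics` reports the mean and standard deviation of each metric. It also assumes class `1` is the positive class. `fold_metrics()` and `summarise_folds()` are hypothetical helpers, not part of the package, and the two folds use made-up predictions.

```r
# Hypothetical helper: metrics for one fold, treating `positive` as the positive class
fold_metrics <- function(truth, pred, positive = "1") {
  truth <- as.character(truth)
  pred  <- as.character(pred)
  tp <- sum(pred == positive & truth == positive)  # true positives
  tn <- sum(pred != positive & truth != positive)  # true negatives
  fp <- sum(pred == positive & truth != positive)  # false positives
  fn <- sum(pred != positive & truth == positive)  # false negatives
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / length(truth))
}

# Hypothetical aggregation: mean and standard deviation of each metric across folds
summarise_folds <- function(raw_results) {
  data.frame(metric = colnames(raw_results),
             mean   = colMeans(raw_results),
             sd     = apply(raw_results, 2, sd),
             row.names = NULL)
}

# Two toy folds (one row per fold)
raw_results <- rbind(
  fold_metrics(truth = c(1, 1, 0, 0), pred = c(1, 0, 0, 0)),
  fold_metrics(truth = c(1, 1, 0, 0), pred = c(1, 1, 0, 1))
)
performance_metrics <- summarise_folds(raw_results)
print(performance_metrics)
```

The example usage loads `caret`, so the function may use `caret::confusionMatrix()` instead. That function reports the same quantities in its `byClass` (`Sensitivity`, `Specificity`) and `overall` (`Accuracy`) components.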
Cross-Validation Workflow
Data Preparation
- Splits the data into training and testing subsets according to `testReps`, the number of cross-validation folds.
- Optionally applies undersampling to balance the training set.
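Because the function body is not shown, the following is only a sketch of these two steps under stated assumptions:

- Rows are assigned to `testReps` folds at random, without stratification.
- Undersampling randomly drops majority-class rows from the training set until both classes are equally frequent.
- The held-out fold is left untouched.

`assign_folds()` and `undersample()` are hypothetical helpers, and `dat` is a hypothetical example dataset.

```r
# Hypothetical helper: assign each of n rows to one of k folds at random
assign_folds <- function(n, k) {
  stopifnot(k >= 2, n >= k)
  sample(rep_len(seq_len(k), n))
}

# Hypothetical helper: keep an equal number of rows from each response class
undersample <- function(data, response = "Target_Organ") {
  idx_by_class <- split(seq_len(nrow(data)), data[[response]])
  n_min <- min(lengths(idx_by_class))
  keep <- unlist(lapply(idx_by_class,
                        function(idx) idx[sample.int(length(idx), n_min)]))
  data[sort(keep), , drop = FALSE]
}

set.seed(1)
dat <- iris[, 1:4]
dat$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)  # 50 ones, 100 zeros

testReps <- 5
folds <- assign_folds(nrow(dat), k = testReps)
balanced_counts <- vector("list", testReps)
for (i in seq_len(testReps)) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  train_bal <- undersample(train)  # balance the training rows only
  balanced_counts[[i]] <- table(train_bal$Target_Organ)
  # A full pipeline would fit randomForest on train_bal and evaluate it on test here.
}
```

Undersampling only the training rows keeps the test fold representative of the original class distribution. `caret` also provides `createFolds()` for folds stratified by the response and `downSample()` for undersampling. The package may rely on these rather than hand-written helpers.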
Example Usage
```r
# Load necessary libraries
library(randomForest)
library(caret)

# Example dataset: the four iris measurements plus a binary response
# (1 = setosa, 0 = any other species)
Data <- iris[, -5]
Data$Target_Organ <- ifelse(iris$Species == "setosa", 1, 0)

# Run the function
results <- get_rf_model_with_cv(Data = Data,
                                Undersample = TRUE,
                                best.m = 2,
                                testReps = 5,
                                Type = 2)

# Print results
print(results$performance_metrics)
```