Random Forest Model with Cross-validation and Exclusion
Source: R/get_zone_exclusioned_rf_model_with_cv.R
get_zone_exclusioned_rf_model_with_cv.Rd
This function fits a Random Forest classification model with cross-validation. It can optionally undersample the training data to balance the classes, marks predictions whose probability falls within a user-defined range as indeterminate, and calculates model performance metrics such as sensitivity, specificity, and accuracy. It also tracks the proportion of indeterminate predictions and returns a performance summary aggregated across the test repetitions.
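The undersampling step mentioned here can be sketched in base R. This is only an illustration under stated assumptions: the helper name undersample is hypothetical, and reducing every class to the size of the smallest class is an assumed strategy, not necessarily what the function does internally.

```r
# Hypothetical helper: downsample every class to the size of the smallest one.
# The target column name defaults to Target_Organ, as named in the arguments.
undersample <- function(data, target = "Target_Organ") {
  # Row indices grouped by class label
  idx_by_class <- split(seq_len(nrow(data)), data[[target]])
  n_min <- min(lengths(idx_by_class))
  # Draw n_min rows from each class without replacement
  keep <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), n_min)]
  }), use.names = FALSE)
  data[sort(keep), , drop = FALSE]
}

set.seed(1)
# Toy data (made up for illustration): 7 rows of class 0, 3 rows of class 1
toy <- data.frame(
  feature = seq_len(10),
  Target_Organ = c(rep(0, 7), rep(1, 3))
)

balanced <- undersample(toy)
table(balanced$Target_Organ)
```

After the call, both classes contribute the same number of rows to the training data.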
Usage
get_zone_exclusioned_rf_model_with_cv(
  Data = NULL,
  Undersample = FALSE,
  best.m = NULL,
  testReps,
  indeterminateUpper,
  indeterminateLower,
  Type
)
Arguments
- Data
A data frame containing the features and the target variable Target_Organ, used to train the Random Forest model.
- Undersample
A logical value indicating whether to undersample the training data to balance the classes. Defaults to FALSE.
- best.m
A numeric value giving the number of variables (mtry) to try at each split in the Random Forest model. This can be set manually or determined through optimization.
- testReps
An integer specifying the number of test repetitions. This must be at least 2, as the function relies on multiple test sets to assess model performance.
- indeterminateUpper
A numeric value giving the upper bound of the indeterminate zone. Predictions whose predicted probability lies between indeterminateLower and indeterminateUpper are marked as indeterminate.
- indeterminateLower
A numeric value giving the lower bound of the indeterminate zone. Predictions whose predicted probability lies between indeterminateLower and indeterminateUpper are marked as indeterminate.
- Type
An integer indicating the type of feature importance to use in the Random Forest model: typically 1 for "Mean Decrease Accuracy" or 2 for "Mean Decrease Gini".
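The role of indeterminateLower and indeterminateUpper can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name classify_with_zone, the 0.5 decision cutoff, and the inclusive treatment of the bounds are all assumptions, and the probabilities and labels are made up.

```r
# Hypothetical helper: label predictions, leaving those inside the
# indeterminate zone as NA. Inclusive bounds and the 0.5 cutoff are assumptions.
classify_with_zone <- function(prob, lower, upper, cutoff = 0.5) {
  stopifnot(lower <= upper)
  label <- ifelse(prob >= cutoff, 1L, 0L)
  label[prob >= lower & prob <= upper] <- NA_integer_
  label
}

prob   <- c(0.05, 0.30, 0.45, 0.55, 0.70, 0.95)  # predicted P(class = 1)
actual <- c(0L, 1L, 1L, 0L, 1L, 1L)              # true classes

pred <- classify_with_zone(prob, lower = 0.4, upper = 0.6)

# Proportion of predictions that fall inside the indeterminate zone
prop_indeterminate <- mean(is.na(pred))

# Metrics computed on the determinate predictions only
keep <- !is.na(pred)
sensitivity <- sum(pred[keep] == 1L & actual[keep] == 1L) / sum(actual[keep] == 1L)
specificity <- sum(pred[keep] == 0L & actual[keep] == 0L) / sum(actual[keep] == 0L)
accuracy    <- mean(pred[keep] == actual[keep])
```

Widening the zone generally trades coverage for confidence: more predictions are left undecided, and the metrics describe only the cases the model commits to.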
Value
A list containing two components:
- performance_metrics
A vector with the aggregated performance metrics, including sensitivity, specificity, accuracy, and others, calculated across all test repetitions.
- raw_results
A list containing the raw performance metrics for each repetition, including sensitivity, specificity, and accuracy.
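How per-repetition results might be combined into performance_metrics can be sketched like this. The numbers are toy values made up for illustration, and averaging with colMeans is an assumption; the documentation does not state which aggregation the function applies or the exact shape of raw_results.

```r
# Toy per-repetition metrics (illustrative values, not real results).
raw_results <- list(
  c(sensitivity = 0.80, specificity = 0.70, accuracy = 0.75),
  c(sensitivity = 0.90, specificity = 0.60, accuracy = 0.75),
  c(sensitivity = 0.70, specificity = 0.80, accuracy = 0.75)
)

# Stack repetitions into a matrix: one row per repetition, one column per metric.
raw_matrix <- do.call(rbind, raw_results)

# Assumed aggregation: the mean of each metric across repetitions.
performance_metrics <- colMeans(raw_matrix)

# Assemble a list with the two component names described above.
result <- list(
  performance_metrics = performance_metrics,
  raw_results = raw_results
)
```

Keeping the raw per-repetition values alongside the summary makes it possible to inspect the spread across repetitions rather than the mean alone.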