Random Forest Model with Cross-validation and Exclusion

This function fits a Random Forest classification model with cross-validation. It can optionally undersample the training data, mark predictions that fall inside a probability zone as indeterminate, and calculate performance metrics such as sensitivity, specificity, and accuracy. It also tracks the proportion of indeterminate predictions and returns a performance summary aggregated across the test repetitions.

Usage

get_zone_exclusioned_rf_model_with_cv(
  scores_data_df,
  Undersample = FALSE,
  best.m = NULL,
  testReps,
  indeterminateUpper,
  indeterminateLower,
  Type
)

Arguments

Undersample: A logical value indicating whether to perform undersampling to balance the classes in the training data. Defaults to FALSE.
best.m: A numeric value giving the number of variables (mtry) to try at each split in the Random Forest model. It can be set manually or determined through optimization. Defaults to NULL.
testReps: An integer specifying the number of test repetitions. This must be at least 2, as the function relies on multiple test sets to assess the model performance.
indeterminateUpper: A numeric value giving the upper bound of the indeterminate zone. Predictions whose predicted probability lies between indeterminateLower and indeterminateUpper are marked as indeterminate.
indeterminateLower: A numeric value giving the lower bound of the indeterminate zone. Predictions whose predicted probability lies between indeterminateLower and indeterminateUpper are marked as indeterminate.
Type: An integer selecting the variable importance measure used by the Random Forest model: 1 for "Mean Decrease Accuracy" or 2 for "Mean Decrease Gini".
scores_data_df: A data frame containing the features and the target variable Target_Organ, used to train the Random Forest model.
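
The indeterminate zone defined by indeterminateLower and indeterminateUpper can be illustrated with a minimal base R sketch. The probabilities, the bound values, the exclusive comparison, and the 0.5 class cutoff are all assumptions made for illustration; the function's actual handling may differ.

```r
# Hypothetical predicted probabilities for the positive class
prob <- c(0.05, 0.35, 0.50, 0.65, 0.92)

# Assumed bounds; in practice these are supplied by the caller
indeterminateLower <- 0.4
indeterminateUpper <- 0.6

# Assumption: both bounds are treated as exclusive
is_indeterminate <- prob > indeterminateLower & prob < indeterminateUpper

# Indeterminate predictions become NA; the rest are classified
# with an assumed cutoff of 0.5
pred_class <- ifelse(is_indeterminate, NA_integer_, as.integer(prob >= 0.5))

# Proportion of predictions that fall inside the indeterminate zone
prop_indeterminate <- mean(is_indeterminate)
```

Widening the zone excludes more borderline predictions, which tends to raise the metrics on the remaining cases at the cost of a larger indeterminate proportion.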

Value

A list containing two components:

performance_metrics: A vector with the aggregated performance metrics, including sensitivity, specificity, accuracy, and others, calculated across all test repetitions.
raw_results: A list containing the raw performance metrics for each repetition, including sensitivity, specificity, and accuracy.
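
As a rough sketch of how per-repetition metrics could relate to the aggregated summary, the base R code below computes sensitivity, specificity, and accuracy for two hypothetical repetitions and averages them. The helper rep_metrics, the toy labels, and the use of a simple mean for aggregation are assumptions for illustration, not the function's confirmed internals.

```r
# Hypothetical helper: metrics for one repetition, with indeterminate
# (NA) predictions excluded before counting
rep_metrics <- function(truth, pred) {
  keep <- !is.na(pred)
  truth <- truth[keep]
  pred <- pred[keep]
  tp <- sum(pred == 1 & truth == 1)  # true positives
  tn <- sum(pred == 0 & truth == 0)  # true negatives
  fp <- sum(pred == 1 & truth == 0)  # false positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives
  c(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy = (tp + tn) / length(truth)
  )
}

# Two made-up repetitions (the minimum allowed by testReps)
raw_results <- list(
  rep_metrics(truth = c(1, 1, 0, 0, 1), pred = c(1, 0, 0, NA, 1)),
  rep_metrics(truth = c(1, 0, 0, 0, 1), pred = c(1, 0, 1, 0, NA))
)

# Assumed aggregation: average each metric across repetitions
performance_metrics <- colMeans(do.call(rbind, raw_results))
```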
