
Performs cross-validation with repeated test splits on a Random Forest model, calculates feature importance using Gini importance, and returns the top n most important features.


  Data = NULL,
  Undersample = FALSE,
  best.m = NULL,



Data
A data frame containing the training data (rows as samples, columns as features). The first column is assumed to be the target variable.


Undersample
A logical value indicating whether to apply under-sampling to balance the classes in the training data. Default is FALSE.
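As a rough illustration of what under-sampling means here, the sketch below down-samples every class to the size of the smallest class. The helper `undersample_rows()` is hypothetical and not part of the package; the source does not say which balancing scheme the function actually uses.

```r
# Hypothetical helper (not from the package): keep n_min random rows per class,
# where n_min is the size of the smallest class.
undersample_rows <- function(df, target_col = 1) {
  y <- df[[target_col]]
  n_min <- min(table(y))
  # Row indices grouped by class label
  idx_by_class <- split(seq_len(nrow(df)), y)
  keep <- unlist(lapply(idx_by_class, function(idx) {
    # Index into idx explicitly so a class with a single row is handled safely
    idx[sample.int(length(idx), n_min)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(
  label = factor(c(rep("a", 8), rep("b", 3))),  # target in the first column
  x1 = rnorm(11)
)
balanced <- undersample_rows(toy)
table(balanced$label)
```

After balancing, both classes in the toy data contain three rows each.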


best.m
A numeric value giving the number of variables to consider at each split of the Random Forest model (or a function that determines this number). Default is NULL.
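One possible way to choose a value for this argument is to search over `mtry` with `randomForest::tuneRF()` and keep the setting with the lowest out-of-bag error. This is only a sketch: it assumes the randomForest package is installed and uses the built-in `iris` data as a stand-in, and the source does not state how the package itself selects `best.m`.

```r
# Sketch assuming the randomForest package is available.
library(randomForest)

set.seed(42)
x <- iris[, 1:4]      # stand-in feature columns
y <- iris$Species     # stand-in target

# tuneRF() returns a matrix with columns "mtry" and "OOBError"
tune_res <- tuneRF(x, y, ntreeTry = 100, stepFactor = 2,
                   improve = 0.01, trace = FALSE, plot = FALSE)

# Keep the mtry value with the lowest out-of-bag error
best_m <- unname(tune_res[which.min(tune_res[, "OOBError"]), "mtry"])
best_m
```

The resulting number could then be supplied as `best.m`.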


testReps
A numeric value indicating the number of test repetitions (must be at least 2).


Type
A numeric value indicating the type of importance to calculate: 1 for Mean Decrease Accuracy, 2 for Mean Decrease Gini.
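The two codes match the `type` argument of `randomForest::importance()`. The sketch below shows both measures on the built-in `iris` data. It assumes the randomForest package is installed and that this argument is passed through to `importance()`, which the source does not confirm.

```r
# Sketch assuming the randomForest package is available.
library(randomForest)

set.seed(42)
# importance = TRUE is needed so that the permutation-based measure is computed
rf <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)

# type = 1: mean decrease in accuracy (permutation-based)
imp_accuracy <- importance(rf, type = 1)

# type = 2: mean decrease in node impurity (Gini index for classification)
imp_gini <- importance(rf, type = 2)

imp_gini
```

Each call returns a one-column matrix with one row per feature.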


nTopImportance
A numeric value indicating the number of top features to return, ranked by importance score.


A list containing:


A matrix of Gini importance scores for each feature across the different cross-validation iterations. The matrix has rows representing features and columns representing test iterations.
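To make the layout concrete, the toy matrix below has features as rows and test iterations as columns, and ranks features by their mean score across iterations. The feature names and values are made up, and averaging is an assumed aggregation rule; the source says only that scores are aggregated and sorted.

```r
# Toy importance matrix: 4 features (rows) x 3 test iterations (columns).
imp <- matrix(
  c(0.9, 1.1, 1.0,   # geneA
    0.2, 0.1, 0.3,   # geneB
    2.0, 1.8, 2.2,   # geneC
    0.5, 0.7, 0.6),  # geneD
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("iter1", "iter2", "iter3"))
)

# Assumed aggregation: mean across iterations, sorted in decreasing order
mean_imp <- sort(rowMeans(imp), decreasing = TRUE)

n_top <- 2
top <- head(mean_imp, n_top)
names(top)  # "geneC" "geneA"
```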


This function trains a Random Forest model using cross-validation with specified repetitions and calculates the feature importance using Gini importance scores. The function also supports optional under-sampling to balance the class distribution in the training set.

The function performs the following steps:

  • Initializes performance metric trackers.

  • Prepares the input data for cross-validation.

  • Performs cross-validation, where each repetition involves training the model on a subset of data and testing on the remaining data.

  • Optionally applies under-sampling to the training data.

  • Trains a Random Forest model on each fold and calculates Gini importance scores.

  • Aggregates and sorts the Gini importance scores to identify the top features.

  • Plots the importance of top features.
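The steps above can be sketched roughly as follows. This is not the package's implementation: it assumes the randomForest package is installed, uses `iris` as a stand-in data set with the target moved to the first column, and picks a hold-out fraction (`test_frac`) and `mtry` value for illustration only. Performance tracking, under-sampling, and plotting are omitted.

```r
# Sketch assuming the randomForest package is available.
library(randomForest)

set.seed(123)
dat <- data.frame(Species = iris$Species, iris[, 1:4])  # target in first column
test_reps <- 5     # number of test repetitions
test_frac <- 0.2   # hypothetical hold-out fraction; not stated in the source

features <- colnames(dat)[-1]
imp_mat <- matrix(
  NA_real_, nrow = length(features), ncol = test_reps,
  dimnames = list(features, paste0("rep", seq_len(test_reps)))
)

for (r in seq_len(test_reps)) {
  # Random split into training and test rows for this repetition
  test_idx <- sample(nrow(dat), size = floor(test_frac * nrow(dat)))
  train <- dat[-test_idx, ]

  # Fit the forest on the training rows only
  rf <- randomForest(x = train[, -1], y = train[[1]], mtry = 2, ntree = 100)

  # Store Mean Decrease Gini for every feature in column r
  imp_mat[, r] <- importance(rf, type = 2)[features, 1]
}

# Aggregate across repetitions (mean is an assumption) and keep the top 3
top_features <- names(sort(rowMeans(imp_mat), decreasing = TRUE))[1:3]
top_features
```

Here `imp_mat` has the same shape as the returned importance matrix described above: one row per feature and one column per repetition.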


if (FALSE) { # \dontrun{
# Example of calling the function
result <- get_imp_features_from_rf_model_with_cv(
  Data = scores_df,
  Undersample = FALSE,
  best.m = 3,
  testReps = 5,
  Type = 2,
  nTopImportance = 10
)
} # }