The get_ml_data_and_tuned_hyperparameters function joins the input scores with study metadata and prepares the result for random forest analysis. Processing covers data preprocessing plus optional imputation, rounding, undersampling, error correction, and hyperparameter tuning.

Usage

get_ml_data_and_tuned_hyperparameters(
  Data,
  studyid_metadata,
  Impute = FALSE,
  Round = FALSE,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method = NULL
)

Arguments

Data

data.frame. Input data frame containing scores, typically named scores_df.

studyid_metadata

data.frame. Metadata containing STUDYID values, used for joining with Data.

Impute

logical. Indicates whether to impute missing values in the dataset using random forest imputation. Default is FALSE.

Round

logical. Specifies whether to round specific numerical columns according to predefined rules. Default is FALSE.

reps

integer. Number of repetitions for cross-validation. A value of 0 skips repetition.

holdback

numeric. Fraction of data to hold back for testing. A value of 1 performs leave-one-out cross-validation.

Undersample

logical. Indicates whether to undersample the training data to balance the target classes. Default is FALSE.

hyperparameter_tuning

logical. Specifies whether to perform hyperparameter tuning for the random forest model. Default is FALSE.

error_correction_method

character. Specifies the method for error correction. Can be "Flip", "Prune", or NULL. Default is NULL.
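
The roles of holdback and Undersample can be pictured with a minimal base-R sketch. This is not the package's implementation: the toy data frame, the column name target, and all variable names below are assumptions made for illustration only.

```r
# Illustrative only: toy data with a hypothetical binary column `target`.
set.seed(42)
toy <- data.frame(
  score_a = runif(20),
  score_b = runif(20),
  target  = factor(rep(c("yes", "no"), times = c(14, 6)))
)

holdback <- 0.25  # fraction of rows reserved for testing

# Random holdout split: sample the test rows, keep the rest for training.
n_test   <- floor(holdback * nrow(toy))
test_idx <- sample(nrow(toy), n_test)
test     <- toy[test_idx, ]
train    <- toy[-test_idx, ]

# Undersample the training set: keep as many rows of each class as the
# smallest class has, so the target classes end up balanced.
n_min    <- min(table(train$target))
keep_idx <- unlist(lapply(
  split(seq_len(nrow(train)), train$target),
  function(idx) idx[sample.int(length(idx), n_min)]
))
train_balanced <- train[keep_idx, ]

table(train_balanced$target)  # both classes now have n_min rows
```

In this sketch only the training rows are undersampled, matching the description of Undersample above; the held-back test rows keep their original class proportions.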

Value

A list containing:

rfData

The processed data after preprocessing and any requested error correction.

best.m

The best value of the mtry hyperparameter found for the random forest model.
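
For readers unfamiliar with mtry tuning, the following is a hedged sketch of how a best mtry value can be chosen with the randomForest package. Whether this function uses tuneRF internally is an assumption not confirmed by this page; the iris data and the variable names are illustrative only, and the randomForest package is assumed to be installed.

```r
library(randomForest)  # assumed to be installed

set.seed(1)
x <- iris[, 1:4]       # predictors (illustrative data, not scores_df)
y <- iris$Species      # class labels

# tuneRF searches over mtry using the out-of-bag (OOB) error and returns
# a matrix with the columns "mtry" and "OOBError".
tuned <- tuneRF(x, y, ntreeTry = 100, trace = FALSE, plot = FALSE)

# Pick the mtry value with the lowest OOB error (the first one on ties).
best_m <- tuned[which.min(tuned[, "OOBError"]), "mtry"]

# Fit a forest with the selected mtry.
fit <- randomForest(x, y, mtry = best_m, ntree = 100)
```

A value selected this way plays the same role as the best.m element returned here: it is passed as mtry when the final model is fitted.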

Examples

if (FALSE) { # \dontrun{
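# Example usage:
# scores_df is assumed to be a data frame of scores created beforehand.
Data <- scores_df
# Metadata with one row per study, including the STUDYID column.
studyid_metadata <- read.csv("path/to/study_metadata.csv")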
result <- get_ml_data_and_tuned_hyperparameters(
  Data = Data,
  studyid_metadata = studyid_metadata,
  Impute = TRUE,
  Round = TRUE,
  reps = 10,
  holdback = 0.75,
  Undersample = TRUE,
  hyperparameter_tuning = TRUE,
  error_correction_method = "Flip"
)
rfData <- result$rfData
best_mtry <- result$best.m
} # }