Get Random Forest Data and Tuned Hyperparameters
Source: R/get_ml_data_and_tuned_hyperparameters.R
get_ml_data_and_tuned_hyperparameters.Rd
The get_ml_data_and_tuned_hyperparameters function processes input data and metadata to prepare data for random forest analysis. It includes steps for data preprocessing, optional imputation, rounding, error correction, and hyperparameter tuning.
Usage
get_ml_data_and_tuned_hyperparameters(
  Data,
  studyid_metadata,
  Impute = FALSE,
  Round = FALSE,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method = NULL
)
Arguments
- Data
data.frame. Input data frame containing scores, typically named scores_df.
- studyid_metadata
data.frame. Metadata containing STUDYID values, used for joining with Data.
- Impute
logical. Indicates whether to impute missing values in the dataset using random forest imputation. Default is FALSE.
- Round
logical. Specifies whether to round specific numerical columns according to predefined rules. Default is FALSE.
- reps
integer. Number of repetitions for cross-validation. A value of 0 skips repetition.
- holdback
numeric. Fraction of data to hold back for testing. A value of 1 performs leave-one-out cross-validation.
- Undersample
logical. Indicates whether to undersample the training data to balance the target classes. Default is FALSE.
- hyperparameter_tuning
logical. Specifies whether to perform hyperparameter tuning for the random forest model. Default is FALSE.
- error_correction_method
character. Specifies the method for error correction. Can be "Flip", "Prune", or NULL. Default is NULL.
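To make the holdback and Undersample arguments more concrete, the following base R sketch shows one common way a hold-back split and class undersampling can be done. It is illustrative only and is not taken from the package source: the toy data frame, the column name Target, and the score columns are hypothetical, and the function's internal logic may differ.

```r
# Illustrative sketch only; not the package's internal implementation.
# The toy data and the column name `Target` are hypothetical.
set.seed(42)
toy <- data.frame(
  STUDYID = sprintf("S%02d", 1:20),
  Target  = factor(rep(c("Y", "N"), times = c(14, 6))),
  score_a = runif(20),
  score_b = runif(20)
)

holdback <- 0.25  # fraction of rows reserved for testing

# Hold back a random fraction of rows as the test set.
n_test   <- max(1, round(holdback * nrow(toy)))
test_idx <- sample(seq_len(nrow(toy)), n_test)
train    <- toy[-test_idx, ]
test     <- toy[test_idx, ]

# Undersample: keep as many rows per class as the rarest class has.
n_min <- min(table(train$Target))
keep  <- unlist(lapply(
  split(seq_len(nrow(train)), train$Target),
  function(idx) idx[sample.int(length(idx), n_min)]
))
train_balanced <- train[keep, ]

table(train_balanced$Target)
```

After undersampling, both classes appear equally often in train_balanced, while the held-back test rows are left untouched.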
Value
A list containing:
- rfData
The final processed data after preprocessing and error correction.
- best.m
The best mtry hyperparameter determined for the random forest model.
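As a rough illustration of what tuning mtry involves, the sketch below uses tuneRF() from the randomForest package on the built-in iris data and then fits a forest with the selected value. Whether get_ml_data_and_tuned_hyperparameters uses tuneRF() internally is an assumption here, not something this page states; the variable names are hypothetical.

```r
# Illustrative sketch only; assumes the randomForest package is installed.
# This page does not confirm that the function tunes mtry this way.
library(randomForest)

set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

# tuneRF() returns a matrix: column 1 is mtry, column 2 is the OOB error.
tuned <- tuneRF(x, y, trace = FALSE, plot = FALSE)

# Keep the mtry with the lowest out-of-bag error (first one on ties).
best_m <- tuned[which.min(tuned[, 2]), 1]

# Fit a forest using the selected mtry.
fit <- randomForest(x, y, mtry = best_m, ntree = 500)
```

In the same way, the best.m element of the returned list could be passed as the mtry argument when fitting a model on rfData.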
Examples
if (FALSE) { # \dontrun{
  # Example usage:
  Data <- scores_df
  studyid_metadata <- read.csv("path/to/study_metadata.csv")
  result <- get_ml_data_and_tuned_hyperparameters(
    Data = Data,
    studyid_metadata = studyid_metadata,
    Impute = TRUE,
    Round = TRUE,
    reps = 10,
    holdback = 0.75,
    Undersample = TRUE,
    hyperparameter_tuning = TRUE,
    error_correction_method = "Flip"
  )
  rfData <- result$rfData
  best_mtry <- result$best.m
} # }