Get Representation Tree from Random Forest Model
Source: R/get_reprtree_from_rf_model.R
get_reprtree_from_rf_model.Rd
This function trains a Random Forest model on a provided dataset and generates a representation tree (ReprTree) from the trained model. Its arguments control preprocessing (rounding and imputation), random undersampling of the training data, hyperparameter tuning, and the error correction method.
Usage
get_reprtree_from_rf_model(
  Data = NULL,
  path_db,
  rat_studies = FALSE,
  studyid_metadata = NULL,
  fake_study = FALSE,
  use_xpt_file = FALSE,
  Round = FALSE,
  Impute = FALSE,
  reps,
  holdback,
  Undersample = FALSE,
  hyperparameter_tuning = FALSE,
  error_correction_method
)
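As a rough illustration of the signature above, the following sketch assembles a set of arguments and calls the function only if it is available in the current session. The database path and the values chosen for reps and holdback are placeholders picked for illustration, not recommended settings.

```r
# Hypothetical argument values; only the argument names come from the Usage block.
call_args <- list(
  path_db = "path/to/database.db",   # placeholder path, replace with a real one
  reps = 1,                          # illustrative value
  holdback = 0.3,                    # illustrative value
  Undersample = TRUE,
  error_correction_method = "None"
)

# Call the function only when the package that provides it is loaded.
if (exists("get_reprtree_from_rf_model", mode = "function")) {
  do.call(get_reprtree_from_rf_model, call_args)
}
```

Arguments left out of the list keep the defaults shown in the Usage block.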
Arguments
- Data
A data frame containing the dataset to train the Random Forest model. If NULL, data is fetched using the get_Data_formatted_for_ml_and_best.m function.
- path_db
A character string representing the path to the database used for fetching or processing the data.
- rat_studies
A logical flag indicating whether rat studies are used (default: FALSE).
- studyid_metadata
A data frame containing metadata related to study IDs (default: NULL).
- fake_study
A logical flag indicating whether to use fake study data (default: FALSE).
- use_xpt_file
A logical flag indicating whether to use the XPT file format for data input (default: FALSE).
- Round
A logical flag indicating whether to round the data before processing (default: FALSE).
- Impute
A logical flag indicating whether to impute missing values in the data (default: FALSE).
- reps
An integer specifying the number of repetitions to perform for cross-validation or resampling.
- holdback
A numeric value representing the fraction of data to hold back for testing.
- Undersample
A logical flag indicating whether undersampling should be applied to balance the dataset (default: FALSE).
- hyperparameter_tuning
A logical flag indicating whether hyperparameter tuning should be performed (default: FALSE).
- error_correction_method
A character string specifying the method for error correction. Must be one of 'Flip', 'Prune', or 'None'.
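Because error_correction_method accepts only three values, it can help to validate the value before calling the function. The helper below is a hypothetical sketch using base R's match.arg; the source does not say how the function itself checks this argument.

```r
# Hypothetical helper: validates a value against the three documented choices.
# Note that match.arg also accepts unambiguous partial matches (e.g. "Pr").
check_error_correction_method <- function(error_correction_method) {
  match.arg(error_correction_method, choices = c("Flip", "Prune", "None"))
}

valid <- check_error_correction_method("Prune")

# An unsupported value raises an error, captured here with try().
invalid <- try(check_error_correction_method("Other"), silent = TRUE)
```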
Value
A plot of the first tree from the Random Forest model is displayed. The function does not return the ReprTree object explicitly, but it is generated and used for plotting.
Details
The function performs the following steps:
- Data Preparation: If Data is NULL, it is fetched using the get_Data_formatted_for_ml_and_best.m function. Data is then split into training (70%) and testing (30%) sets. If Undersample is TRUE, the training data is balanced using undersampling.
- Model Training: A Random Forest model is trained using the randomForest::randomForest function. The target variable is Target_Organ, and the model uses the best hyperparameter (best.m). The number of trees is set to 500.
- ReprTree Generation: The reprtree::ReprTree function is used to generate the representation tree from the trained Random Forest model.
- Visualization: The first tree from the Random Forest model is plotted using the reprtree::plot.getTree function.
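
The data preparation and training steps can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data frame, the class labels, the undersample helper, and the use of best_m as the mtry value are all hypothetical. The model is fitted only if the randomForest package is installed.

```r
set.seed(42)

# Synthetic stand-in for the formatted data (hypothetical columns and labels).
n <- 200
toy <- data.frame(
  Target_Organ = factor(rep(c("yes", "no"), times = c(50, 150))),
  score_a = rnorm(n),
  score_b = rnorm(n)
)

# 70/30 split into training and testing sets.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Random undersampling: keep as many rows per class as the smallest class has.
undersample <- function(df, target = "Target_Organ") {
  groups <- split(seq_len(nrow(df)), df[[target]])
  n_min <- min(lengths(groups))
  keep <- unlist(lapply(groups, function(idx) idx[sample.int(length(idx), n_min)]))
  df[keep, , drop = FALSE]
}
train_bal <- undersample(train)

# Train a 500-tree forest on the balanced data when randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  best_m <- 1  # assumption: stands in for best.m and is passed as mtry
  rf <- randomForest::randomForest(
    Target_Organ ~ ., data = train_bal, mtry = best_m, ntree = 500
  )
  # The ReprTree generation and plotting steps would follow here using
  # reprtree::ReprTree and reprtree::plot.getTree (not called in this sketch).
}
```

Undersampling is applied only to the training rows, so the test set keeps its original class proportions.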