Generates a (list of) lgb.Dataset. Unsupported for clusters. Requires the Matrix and lightgbm packages.
Laurae.lgb.dmat(data, label = NULL, missing = NA, save_names = NULL, save_keep = TRUE, clean_mem = FALSE, progress_bar = TRUE, ...)
data | Type: matrix or dgCMatrix or data.frame or data.table or filename, or potentially a list of any of them. When a list is provided, it generates the appropriate |
---|---|
label | Type: numeric, or a list of numeric. The label of associated rows in |
missing | Type: numeric. The value used to represent missing values in |
save_names | Type: character or NULL, or a list of characters. If names are provided, the generated |
save_keep | Type: logical, or a list of logicals. When names are provided, |
clean_mem | Type: logical. Whether to force garbage collection at the end of each matrix construction in order to reclaim RAM. Defaults to FALSE. |
progress_bar | Type: logical. Whether to print a progress bar in case of list inputs. Defaults to TRUE. |
... | More arguments to pass to |
The lgb.Dataset
library(Matrix)
library(lightgbm)
set.seed(0)

# Generate lgb.Dataset from matrix
random_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
random_labels <- runif(1000, 0, 1)
lgb_from_mat <- Laurae.lgb.dmat(data = random_mat,
                                label = random_labels,
                                missing = NA)

# Generate lgb.Dataset from data.frame
random_df <- data.frame(random_mat)
random_labels_2 <- runif(1000, 0, 1)
lgb_from_df <- Laurae.lgb.dmat(data = random_df,
                               label = random_labels,
                               missing = NA)

# Generate lgb.Dataset from respective elements of a list with progress bar
# while keeping memory usage as low as theoretically possible
random_list <- list(random_mat, random_df)
random_labels_3 <- list(random_labels, random_labels_2)
lgb_from_list <- Laurae.lgb.dmat(data = random_list,
                                 label = random_labels_3,
                                 missing = NA,
                                 progress_bar = TRUE,
                                 clean_mem = TRUE)
#>   |                                                  | 0 % ~calculating
#>   |+++++++++++++++++++++++++                         | 50% ~00s
#>   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 00s

# Generate lgb.Dataset from respective elements of a list and keep only first
# while keeping memory usage as low as theoretically possible
lgb_from_list <- Laurae.lgb.dmat(data = random_list,
                                 label = random_labels_3,
                                 missing = NA,
                                 save_keep = c(TRUE, FALSE),
                                 clean_mem = TRUE)
#>   |                                                  | 0 % ~calculating
#>   |+++++++++++++++++++++++++                         | 50% ~00s
#>   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 00s
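
For list inputs, the arguments describe each element of data being paired with the element of label at the same position, with optional garbage collection after each construction. The sketch below illustrates that pairing using lightgbm's own lgb.Dataset directly. It is not the Laurae implementation: build_datasets is a hypothetical helper written for illustration only, it ignores missing, save_names, save_keep and progress_bar, and the conversion of a data.frame with as.matrix is an assumption about how such inputs could be handled.

```r
library(lightgbm)

# Hypothetical helper (not part of the Laurae package): build one lgb.Dataset
# per list element by pairing data[[i]] with label[[i]].
build_datasets <- function(data, label, clean_mem = FALSE) {
  stopifnot(length(data) == length(label))
  out <- vector("list", length(data))
  for (i in seq_along(data)) {
    x <- data[[i]]
    # Assumption: data.frame inputs are coerced to a numeric matrix first.
    if (is.data.frame(x)) {
      x <- as.matrix(x)
    }
    out[[i]] <- lgb.Dataset(data = x, label = label[[i]])
    # Optionally reclaim memory after each element, as clean_mem describes.
    if (clean_mem) {
      gc(verbose = FALSE)
    }
  }
  out
}

set.seed(0)
example_data <- list(
  matrix(runif(200, 0, 1), nrow = 20),
  data.frame(matrix(runif(200, 0, 1), nrow = 20))
)
example_labels <- list(runif(20, 0, 1), runif(20, 0, 1))
example_datasets <- build_datasets(example_data, example_labels, clean_mem = TRUE)
```

Running gc() after every element costs time on each iteration, so it is only worth enabling when the inputs are large enough that peak RAM is the constraint.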