Parallel CSV Converter - parallel.csv • LauraeDS

Parallelizes the writing of separate CSV files (reading is still sequential) in order to store them in fst format. Also overwrites `fst::threads_fst`. Requires the data.table and fst packages.

parallel.csv(file, compress = 35, progress_bar = TRUE, clean_mem = FALSE,
  cl = NULL, max_threads = max(ifelse(is.null(cl), parallel::detectCores(),
  ifelse(!is.list(cl), round(parallel::detectCores()/cl),
  round(parallel::detectCores()/length(cl)))), 1), wkdir = NULL, ...)

Arguments

file	Type: vector of characters. Path to all files to read.
compress	Type: numeric. Compression rate to use. Defaults to `35`.
progress_bar	Type: logical. Whether to print a progress bar. Defaults to `TRUE`.
clean_mem	Type: logical. Whether to force garbage collection at the end of each file read in order to reclaim RAM. Defaults to `FALSE`.
cl	Type: cluster or integer. A parallel cluster for parallelized calls. Used only when `progress_bar = TRUE`. Exports most of the variables (`compress`, `max_threads`, `clean_mem`, `wkdir`) to the cluster and removes them at the end. When it is a number, creates and destroys a cluster with the specified number of parallel workers. Defaults to `NULL`.
max_threads	Type: numeric. The maximum number of threads allowed, used to set `fst::threads_fst`. Make sure the number of `cl` workers multiplied by `max_threads` is not larger than the number of threads on your computer. Defaults to `max(ifelse(is.null(cl), parallel::detectCores(), ifelse(!is.list(cl), round(parallel::detectCores() / cl), round(parallel::detectCores() / length(cl)))), 1)`, which means at least 1 thread, with the thread count adjusted automatically to the number of cores available per cluster worker. Note that it takes the rounded value, which might over- or under-allocate threads.
wkdir	Type: character. The working directory, when using a cluster. Defaults to `NULL`.
...	Other arguments to pass to `fst::write.fst`.
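
The default for `max_threads` can be easier to follow when written out as a small function. The sketch below restates the documented default expression; `default_max_threads` and its `cores` parameter are hypothetical names introduced here for illustration (they are not part of LauraeDS), and `cores` stands in for `parallel::detectCores()` so the results are reproducible.

```r
# Hypothetical helper restating the documented default of `max_threads`.
# `cores` replaces parallel::detectCores() so results do not depend on the machine.
default_max_threads <- function(cl = NULL, cores = parallel::detectCores()) {
  workers <- if (is.null(cl)) {
    1            # no cluster: a single process may use every core
  } else if (!is.list(cl)) {
    cl           # `cl` given as a number of workers
  } else {
    length(cl)   # `cl` given as a cluster object (a list of nodes)
  }
  # Rounded cores per worker, never less than 1 thread
  max(round(cores / workers), 1)
}

default_max_threads(NULL, cores = 8)  # 8 threads for the single process
default_max_threads(3, cores = 8)     # round(8 / 3) = 3, so 3 * 3 = 9 threads in total
default_max_threads(4, cores = 10)    # round(10 / 4) = 2, so 4 * 2 = 8 threads in total
default_max_threads(16, cores = 8)    # round(8 / 16) = 0, raised to the minimum of 1
```

The second and third calls show the rounding caveat: 3 workers on 8 cores request 9 threads (over-allocation), while 4 workers on 10 cores request only 8 (under-allocation). Passing `max_threads` explicitly avoids both.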

Value

The element or the list of fst file names.

Examples

# NOT RUN {
# Cannot pass CRAN checks. Disabled.
# Do it on your own files!
library(fst) # devtools::install_github("fstPackage/fst@e060e62")
library(data.table)
library(parallel)

parallel.csv(c("file_1.csv", "file_2.csv"), max_threads = 1, progress_bar = TRUE)
parallel.csv(paste0("file_", 1:100, ".csv"), max_threads = 1, progress_bar = TRUE, cl = 8)
# }
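
For orientation, the following is a rough sketch of what converting a single CSV file to fst involves, using `data.table::fread`, `fst::threads_fst`, and `fst::write_fst`. It is not the LauraeDS implementation: the helper name `csv_to_fst` and the rule that the output name replaces `.csv` with `.fst` are assumptions made for this example only.

```r
# Sketch only: sequential CSV-to-fst conversion of one file.
# `csv_to_fst` and the ".csv" -> ".fst" naming rule are assumptions,
# not the behavior documented for parallel.csv.
csv_to_fst <- function(file, compress = 35, max_threads = 1) {
  fst::threads_fst(max_threads)                 # cap the threads fst may use
  dt <- data.table::fread(file)                 # read the CSV into memory
  out <- sub("\\.csv$", ".fst", file)           # assumed output file name
  fst::write_fst(dt, out, compress = compress)  # write the compressed fst file
  out                                           # return the fst file name
}
```

Calling such a helper over many files, one file per cluster worker, is the kind of work that `parallel.csv` distributes through `cl`.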
