Group-wise caching of operations on a data frame.
flow_dfg(..., fn = NULL, fn_id = NULL, group_by = NULL, flow_options = get_flow_options())
Argument | Description |
---|---|
... | Named arguments to pass to fn. |
fn | The function to apply to the data frame. It must accept a data frame as the first argument. |
fn_id | Optional id to uniquely identify the function. By default, rflow functions reuse the cache if the same function is given. The id allows the user to suppress console messages and to explicitly indicate whether to reuse the old cache or create a new one. |
group_by | A character vector of column names. If provided, any groups already present on the data frame are ignored. |
flow_options | List of options created using get_flow_options. |
Returns the flow object.
Function fn will receive only the rows and groups that changed; it may drop some of the rows, but must not add any new rows.
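To illustrate the row rule above, here is a minimal sketch in base R. The function keep_positive, the columns id and value, and the mock data frame are hypothetical names chosen for illustration; the mock stands in for the changed rows that fn would receive and does not call rflow itself.

```r
# Hypothetical fn: keeps only rows with a positive value; it never adds rows.
keep_positive <- function(df) {
  df[df$value > 0, , drop = FALSE]
}

# Mock input standing in for the changed rows that fn would receive.
changed_rows <- data.frame(
  id = c(1L, 2L, 3L),
  value = c(-1, 2.5, 4),
  stringsAsFactors = FALSE
)

result <- keep_positive(changed_rows)

# The output has no more rows than the input, and every row came from the input.
stopifnot(nrow(result) <= nrow(changed_rows))
stopifnot(all(result$id %in% changed_rows$id))
```

A function that appended summary rows or duplicated records would violate the rule, since the output could then contain rows that were never in the input.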
The function fn may return fewer or more columns, or modify existing columns, as long as it always returns a consistent schema (i.e., the same column names and data types) for all calls.
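One way to keep the schema consistent is to coerce column types explicitly inside fn, so the result does not depend on the types of a particular input. The sketch below uses base R only; add_ratio and schema_of are hypothetical helpers written for this example and are not part of rflow.

```r
# Hypothetical fn: coerces one column and adds another, so the output has
# the same column names and types on every call.
add_ratio <- function(df) {
  df$value <- as.numeric(df$value)  # always double, even if the input is integer
  df$ratio <- df$value / 100        # new column, always double
  df
}

# Hypothetical helper: summarises a schema as column names plus first class.
schema_of <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

# Two calls with differently typed inputs (integer vs. double value column).
first_call  <- add_ratio(data.frame(id = 1:2, value = c(10L, 20L)))
second_call <- add_ratio(data.frame(id = 3L, value = 7.5))

# Both results share the same schema.
stopifnot(identical(schema_of(first_call), schema_of(second_call)))
```

Without the explicit as.numeric() call, the value column would be integer in the first result and double in the second, which is the kind of inconsistency the text warns against.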
The data frame df passed to fn will include two additional columns, ..row_hash.. and ..group_hash.., which must be returned as is so that changes can be identified.
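The sketch below shows a function that respects the hash-column requirement. It runs on a mock data frame in base R: the hash values are made up for illustration, and double_value and the value column are hypothetical names. According to the text above, the real columns are added to the data frame before it reaches fn.

```r
# Hypothetical fn: modifies a data column and leaves both hash columns untouched.
double_value <- function(df) {
  df$value <- df$value * 2
  df
}

# Mock input with made-up hash values, mimicking what fn is described as receiving.
input <- data.frame(
  value = c(1, 2, 3),
  "..row_hash.." = c("r1", "r2", "r3"),
  "..group_hash.." = c("g1", "g1", "g2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

output <- double_value(input)

# Both hash columns come back exactly as they went in.
hash_cols <- c("..row_hash..", "..group_hash..")
stopifnot(identical(output[hash_cols], input[hash_cols]))
```

Operations that select a subset of columns or rebuild the data frame from scratch are the usual ways to lose these columns by accident, so a check like the one above can be useful while developing fn.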
Arguments fn, fn_id and flow_options, when provided, must be named. Argument fn must always be provided.
```r
library(dplyr)  # provides %>%, group_by(), mutate() and collect()
library(rflow)

# each function takes a data frame and returns it with one column modified
dfg_fn <- function(df) {
  df %>% dplyr::mutate(Sepal.Length = Sepal.Length * 2)
}

dfg_fn2 <- function(df) {
  df %>% dplyr::mutate(Petal.Length = Petal.Length * 3)
}

iris <- iris %>% dplyr::group_by(Species)
dfg_flow <- flow_dfg(iris, fn = dfg_fn)
collected_dfg <- dfg_flow %>% collect()

# when a change in group is made, the flow object updates its state
iris[1, "Species"] <- "virginica"
dfg_flow <- flow_dfg(iris, fn = dfg_fn)
collected_dfg <- dfg_flow %>% collect()

# the flow element can also become input for another flow_dfg function
# in order to allow multiple, chained computations
collected_dfg2 <- dfg_flow %>%
  flow_dfg(fn = dfg_fn2, group_by = "Species") %>%
  collect()
```