Group-wise caching of operations on a data frame.

flow_dfg(..., fn = NULL, fn_id = NULL, group_by = NULL,
  flow_options = get_flow_options())

Arguments

...	Named arguments to pass to `fn`. The first argument must be a `data.frame` or `tibble`. Row names are not supported. If no `group_by` values are provided, the data frame must be grouped.
fn	The function to apply to the data frame. It must accept a data frame as the first argument. `fn` may also apply `group_by` operations if the data frame given as input is not already grouped.
fn_id	Optional id to uniquely identify the function. By default, rflow functions reuse the cache if the same function is given. The id allows the user to suppress console messages and to explicitly indicate whether to reuse the old cache or create a new one.
group_by	A character vector of column names. If provided, groups already present will be ignored.
flow_options	List of options created using `get_flow_options`.

Value

The flow object.

Details

Function `fn` receives only the rows and groups that have changed; it may drop some of these rows, but will not add any new rows. `fn` may return fewer or more columns, or modify existing columns, as long as it always returns a consistent schema (i.e., the same column names and data types) for all calls. The data frame `df` passed to `fn` includes two additional columns, `..row_hash..` and `..group_hash..`, which must be returned as is so that changes can be identified.
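As a rough illustration of this contract, the sketch below builds a small stand-in for the data frame that `fn` might receive and applies a function that changes the column set while leaving the two hash columns untouched. The hash values (`"r1"`, `"g1"`, and so on) and the function name `add_ratio` are made up for the example; in real use rflow adds the hash columns itself, and their actual contents will differ.

```r
# Stand-in input: rflow would add the two hash columns itself. The values
# here are placeholders so that the sketch runs without the package.
df <- head(datasets::iris, 3)
df[["..row_hash.."]]   <- c("r1", "r2", "r3")
df[["..group_hash.."]] <- "g1"

# Hypothetical fn: adds one column, drops another, and returns the
# hash columns exactly as they were received.
add_ratio <- function(df) {
  df$Sepal.Ratio <- df$Sepal.Length / df$Sepal.Width
  df$Petal.Width <- NULL
  df
}

out <- add_ratio(df)

# The schema is the same whichever subset of rows fn is called on.
out_one_row <- add_ratio(df[1, , drop = FALSE])
same_schema <- identical(names(out), names(out_one_row)) &&
  identical(lapply(out, class), lapply(out_one_row, class))
```

Because `add_ratio` only adds and removes columns unconditionally, it produces the same column names and types for any subset of rows, which is the kind of consistency the schema requirement asks for.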

Arguments `fn`, `fn_id` and `flow_options`, when provided, must be named. Argument `fn` must always be provided.

Examples

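# doubles Sepal.Length and returns the modified data frame
dfg_fn <- function(df) {
    df %>%
        dplyr::mutate(Sepal.Length = Sepal.Length * 2)
}

# triples Petal.Length and returns the modified data frame
dfg_fn2 <- function(df) {
    df %>%
        dplyr::mutate(Petal.Length = Petal.Length * 3)
}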

iris <- iris %>%
    dplyr::group_by(Species)
dfg_flow <- flow_dfg(iris, fn = dfg_fn)
#> New cache: fn=dfg_fn / fn_id=1 / fn_key=aefdebf33429da8b
collected_dfg <- dfg_flow %>% collect()

# when a change in group is made, the flow object updates its state
iris[1, "Species"] <- "virginica"
dfg_flow <- flow_dfg(iris, fn = dfg_fn)
#> Reusing cache: fn=dfg_fn / fn_id=1 / fn_key=aefdebf33429da8b
collected_dfg <- dfg_flow %>% collect()

# the flow element can also become input for another flow_dfg function
# in order to allow multiple, chained computations
collected_dfg2 <- dfg_flow %>%
    flow_dfg(fn = dfg_fn2, group_by = "Species") %>%
    collect()
#> New cache: fn=dfg_fn2 / fn_id=1 / fn_key=23dfb53ab256088c
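The regrouping step in the examples above (moving one row from setosa to virginica) can be pictured with a small base R sketch of group-wise change detection. This is a conceptual illustration only and not rflow's implementation, which relies on the `..row_hash..` and `..group_hash..` columns; `changed_groups` is a hypothetical helper introduced for this example.

```r
# Conceptual illustration only: this is NOT how rflow detects changes.
# changed_groups() is a hypothetical helper that compares two versions of a
# data frame group by group and reports which groups would need recomputing.
changed_groups <- function(old, new, group_col) {
  old_parts <- split(old, as.character(old[[group_col]]))
  new_parts <- split(new, as.character(new[[group_col]]))
  is_changed <- vapply(names(new_parts), function(g) {
    # A group counts as changed if it is new or its rows differ.
    !identical(old_parts[[g]], new_parts[[g]])
  }, logical(1))
  names(new_parts)[is_changed]
}

old <- datasets::iris
new <- old
new[1, "Species"] <- "virginica"  # same edit as in the examples above

changed <- changed_groups(old, new, "Species")
```

Here `changed` contains `"setosa"` and `"virginica"`: the first group lost a row and the second gained one, while `"versicolor"` is untouched and would not need to be recomputed.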
