Title: | A 'dplyr' Interface for Crunch |
---|---|
Description: | In order to facilitate analysis of datasets hosted on the Crunch data platform <https://crunch.io/>, the 'crplyr' package implements 'dplyr' methods on top of the Crunch backend. The usual methods 'select', 'filter', 'group_by', 'summarize', and 'collect' are implemented in such a way as to perform as much computation on the server and pull as little data locally as possible. |
Authors: | Greg Freedman Ellis [aut, cre], Jonathan Keane [aut], Neal Richardson [aut], Mike Malecki [aut], Gordon Shotwell [aut], Aljaž Sluga [aut] |
Maintainer: | Greg Freedman Ellis <[email protected]> |
License: | LGPL (>= 3) |
Version: | 0.4.1 |
Built: | 2024-11-11 05:16:19 UTC |
Source: | https://github.com/crunch-io/crplyr |
Crunch Cubes can be expressed as a long data frame instead of a multidimensional array. In this form each dimension of the cube is a variable and the cube values are expressed as columns for each measure. This is useful both to better understand what each entry of a cube represents, and to work with the cube result using tidyverse tools.
as_cr_tibble(x, ...)
x |
a CrunchCube |
... |
further arguments passed on to |
The cr_tibble class is a subclass of tibble that has extra metadata to allow ggplot2::autoplot() to work. If you find that this extra metadata is getting in the way, you can use as_tibble() to get a true tibble.
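As a rough local illustration of the long form described above, the sketch below uses a base R contingency table as a stand-in for a CrunchCube. This is an assumption made for illustration only: it does not call as_cr_tibble() or contact a Crunch server. Each dimension becomes a column, and the cell values become a single measure column.

```r
# Local analogy only: a base R contingency table stands in for a CrunchCube.
# `cube` is a 3 x 3 array of counts with dimensions `cyl` and `gear`.
cube <- xtabs(~ cyl + gear, data = mtcars)
dim(cube)

# Long form: one row per cell, one column per dimension, and one column
# (named "count" here) holding the measure.
long <- as.data.frame(cube, responseName = "count")
head(long)
```

The long data frame has one row for every combination of dimension values, including combinations whose count is zero.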
The Crunch autoplot methods generate ggplots that are tailored to various Crunch objects. This allows you to visualize the object without bringing it into memory. You can select between three families of plots (dot, tile, and bar), which will attempt to accommodate the dimensionality of the plotted object. These plots can be further extended and customized with other ggplot methods.
## S3 method for class 'DatetimeVariable'
autoplot(object, ...)

## S3 method for class 'NumericVariable'
autoplot(object, ...)

## S3 method for class 'CategoricalVariable'
autoplot(object, ...)

## S3 method for class 'CategoricalArrayVariable'
autoplot(object, ...)

## S3 method for class 'MultipleResponseVariable'
autoplot(object, ...)

## S3 method for class 'CrunchCube'
autoplot(object, ...)

## S3 method for class 'CrunchCubeCalculation'
autoplot(object, plot_type = "dot", ...)

## S3 method for class 'tbl_crunch_cube'
autoplot(object, plot_type = c("dot", "tile", "bar"), measure, ...)
object |
A Crunch variable or cube aggregation |
... |
additional plotting arguments |
plot_type |
One of |
measure |
The measure you wish to plot. This will usually be |
A ggplot object.
This function brings a Crunch dataset into memory so that you can work with the data using R functions. Since this can create a long-running query, it is recommended that you filter the dataset down as much as possible before running collect().
## S3 method for class 'CrunchDataset'
collect(x, ...)

## S3 method for class 'GroupedCrunchDataset'
collect(x, ...)
x |
A Crunch Dataset |
... |
Other arguments passed to |
When collecting a grouped CrunchDataset, the grouping will be preserved.
A tbl_df or grouped_df.
## Not run:
ds %>%
  group_by(cyl) %>%
  select(cyl, gear) %>%
  collect()
## End(Not run)
This function applies a CrunchLogicalExpression filter to a CrunchDataset. It's a "tidy" way of doing ds[ds$var == val,].
## S3 method for class 'CrunchDataset'
filter(.data, ..., .preserve = FALSE)
.data |
A |
... |
filter expressions |
.preserve |
Relevant when the |
.data with the filter expressions applied.
## Not run:
ds %>%
  select(cyl, gear) %>%
  filter(cyl > 4) %>%
  collect()
## End(Not run)
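The equivalence between the bracket form and the "tidy" form can be tried on a local data frame. The sketch below is an analogy using plain dplyr on the built-in mtcars data, not a CrunchDataset, so nothing here runs on the Crunch server.

```r
library(dplyr)

# Base R bracket subsetting on a logical condition.
base_rows <- mtcars[mtcars$cyl == 6, ]

# The "tidy" equivalent: the same condition as a filter expression.
tidy_rows <- mtcars %>% filter(cyl == 6)

# Both approaches keep the same rows in the same order.
nrow(base_rows)
nrow(tidy_rows)
```

With a CrunchDataset, the same filter() call is described above as applying a CrunchLogicalExpression, so the rows are only pulled locally once collect() is called.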
This function is deprecated; use filter() instead.
Applies a CrunchLogicalExpression filter to a CrunchDataset. It's a "tidy" way of doing ds[ds$var == val,].
## S3 method for class 'CrunchDataset'
filter_(.data, ..., .dots)
.data |
A |
... |
filter expressions |
.dots |
More dots! |
.data with the filter expressions applied.
group_by() sets grouping variables that affect what summarize() computes. ungroup() removes any grouping variables.
## S3 method for class 'CrunchDataset'
group_by(.data, ..., .add = FALSE)

## S3 method for class 'CrunchDataset'
ungroup(x, ...)
.data |
For |
... |
references to variables to group by, passed to
|
.add |
Logical: add the variables in |
x |
For |
Note that group_by() only supports grouping on variables that exist in the dataset, not ones that are derived on the fly. dplyr::group_by() supports that by calling mutate() internally, but mutate() is not yet supported in crplyr.
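To make the distinction concrete, the sketch below shows what "derived on the fly" means in plain dplyr on a local data frame. It is an illustration of the dplyr behavior only; the column name heavy and the threshold are made up for the example, and per the note above this pattern is not expected to work on a CrunchDataset.

```r
library(dplyr)

# Plain dplyr on a local data frame: `heavy` is not a column of mtcars,
# so group_by() creates it on the fly before grouping.
by_weight <- mtcars %>%
  group_by(heavy = wt > 3) %>%
  summarize(count = n())

by_weight
```

With crplyr, the grouping variable would need to exist in the dataset already before it is passed to group_by().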
group_by() returns a GroupedCrunchDataset object (a CrunchDataset with grouping annotations). ungroup() returns a CrunchDataset.
## Not run:
ds %>%
  group_by(cyl) %>%
  select(cyl, gear) %>%
  collect()
## End(Not run)
This is a subclass of crunch::CrunchDataset that has a field for recording "group_by" expressions.
## Not run:
ds <- loadDataset("Your dataset name")
class(ds)
## "CrunchDataset"
grouped_ds <- group_by(ds, var1)
class(grouped_ds)
## "GroupedCrunchDataset"
## End(Not run)
This method exists only to return a more helpful error message: mutate() has not been implemented yet. You can, however, derive expressions on the fly in summarize().
## S3 method for class 'CrunchDataset'
mutate(.data, ...)
.data |
A crunch Dataset |
... |
Other arguments, currently ignored |
This function uses "tidy select" methods of subsetting the columns of a dataset. It's another way of doing ds[,vars].
## S3 method for class 'CrunchDataset'
select(.data, ...)
.data |
A |
... |
names of variables in |
.data with only the selected variables.
## Not run:
ds %>%
  select(contains("ear")) %>%
  filter(gear > 4) %>%
  collect()
## End(Not run)
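The correspondence between ds[,vars] and "tidy select" can be sketched on a local data frame. The example below uses plain dplyr on the built-in mtcars data as an analogy; it is not run against a CrunchDataset.

```r
library(dplyr)

# Base R column subsetting by name.
base_cols <- mtcars[, c("cyl", "gear")]

# The "tidy select" equivalent, using bare variable names.
tidy_cols <- mtcars %>% select(cyl, gear)

# Selection helpers match on names; in mtcars only "gear" contains "ear".
ear_cols <- mtcars %>% select(contains("ear"))

names(tidy_cols)
names(ear_cols)
```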
This is an alternate interface to crunch::crtabs() that, in addition to being "tidy", makes it easier to query multiple measures at the same time.
## S3 method for class 'CrunchDataset'
summarise(.data, ...)
.data |
A |
... |
named aggregations to include in the resulting table. |
Note that while mutate() is not generally supported in crplyr, you can derive expressions on the fly in summarize().
A tbl_crunch_cube or cr_tibble of results. This subclass of tibble allows ggplot2::autoplot() to work, but can get in the way in some tidyverse operations. You may wish to convert to a tibble using as_tibble().
## Not run:
ds %>%
  filter(cyl == 6) %>%
  group_by(vs) %>%
  summarize(hp=mean(hp), sd_hp=sd(hp), count=n())
## End(Not run)
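For readers without a Crunch dataset at hand, the shape of a multi-measure summary can be previewed with plain dplyr on the built-in mtcars data. This is a local analogy, not the crplyr method: it returns an ordinary tibble rather than a tbl_crunch_cube, and the measure names mean_hp, sd_hp, and count are chosen for this example.

```r
library(dplyr)

# Several named aggregations in one summarize() call, computed per group.
# Distinct output names are used so that no measure shadows the `hp` column.
result <- mtcars %>%
  filter(cyl == 6) %>%
  group_by(vs) %>%
  summarize(
    mean_hp = mean(hp),
    sd_hp   = sd(hp),
    count   = n()
  )

result
```

The result has one row per value of the grouping variable and one column per named measure.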
Style ggplots according to Crunch style.
theme_crunch(base_size = 12, base_family = "sans")
base_size |
Base text size |
base_family |
Base text family |
This function returns the unweighted counts from a Crunch dataset or grouped Crunch dataset. It can only be used from within a summarise() call. If your dataset is unweighted, then unweighted_n() is equivalent to n().
unweighted_n()
## Not run:
ds %>%
  group_by(cyl) %>%
  summarize(
    raw_counts = unweighted_n(),
    mean = mean(wt)
  )
## End(Not run)
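The difference between a weighted and an unweighted count can be sketched locally. The example below is an analogy in plain dplyr with a made-up survey data frame and an explicit weight column (both are assumptions for illustration); it does not call unweighted_n(), which is only available inside crplyr's summarise().

```r
library(dplyr)

# Hypothetical survey data: five respondents with survey weights.
survey <- data.frame(
  group  = c("a", "a", "b", "b", "b"),
  weight = c(0.5, 1.5, 2, 1, 1)
)

counts <- survey %>%
  group_by(group) %>%
  summarize(
    weighted_count   = sum(weight),  # what a weighted count represents
    unweighted_count = n()           # the raw number of rows
  )

counts
```

When every weight equals 1, the two columns coincide, which mirrors the statement above that unweighted_n() is equivalent to n() on an unweighted dataset.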