--- title: "Variable Folders and Organization" description: "Variables within datasets are organized in folders. The crunch package provides tools for creating folders and moving variables among them." output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Variable Folders and Organization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- [Previous: array variables](array-variables.html) ```{r, results='hide', echo=FALSE, message=FALSE} library(crunch) options(width=120) ``` In the web application, variables in a dataset are displayed in a list on the left side of the screen. Typically, when you import a dataset, the variable list is flat, but it can be organized into an accordion-like hierarchy. The variable organizer in the Crunch GUI allows you to organize your variables visually, but you can also manage this metadata from R using the `crunch` package. ## File-system like This variable hierarchy can be thought of like a file system on your computer, with files (variables) organized into directories (folders). As such, the main functions you use to manage it are reminiscent of a file system. * `cd()`, changes directories, i.e. selects a folder * `mkdir()` makes a directory, i.e. creates a folder * `mv()` moves variables and folders to a different folder * `rmdir()` removes a directory, i.e. deletes a folder Like a file system, you can express the "path" to a folder as a string separated by a "/" delimiter, like this: ```r mkdir(ds, "Brands/Cars and Trucks/Domestic") ``` If your folder names should legitimately have a "/" in them, you can set a different character to be the path separator. See `?mkdir` or any of the other functions' help files for details. Paths can be expressed relative to the current object---a folder, or in this case, the dataset, which translates to its top-level `"/"` root folder in path specification---and the file system's special path segments `".."` (go up a level) and `"."` (this level) are also supported. We'll use those in examples below. You can also specify paths as a vector of path segments, like ```r mkdir(ds, c("Brands", "Cars and Trucks", "Domestic")) ``` which is equivalent to the previous example. One or the other way may be more convenient, depending on what you're trying to accomplish. These four functions all take a dataset or a folder as the first argument, and they return the same object passed to it, except for `cd`, which returns the selected folder. As such, they are designed to work with `magrittr`-style piping (`%>%`) for convenience in chaining together steps, though they don't require that you do. ## Viewing the folders To get started, let's pick up the dataset we used in the [array variables vignette](array-variables.html) and view its starting layout. We can do that by selecting the root folder ("/") and printing it ```r library(magrittr) ds %>% cd("/") %>% print() ``` (The `print()` isn't strictly necessary here as `cd` will return the folder and thus it will print by default, but we'll use different `print` arguments later, so it's included here both for explicitness and illustration.) It's flat---there are no folders here, only variables. If you're importing data from a `data.frame` or a file, like an SPSS file, this is where you'll begin. ## Creating folders Let's make some folders and move some variables into them. To start, I know that the demographic variables are at the back of the dataset, so let's make a "Demos" folder and move variables 21 to 37 into it: ```r ds %>% mkdir("Demos") %>% mv(21:37, "Demos") ``` Now when I print the top-level directory again, I see a "Demos" folder and don't see those demographic variables: ```r ds %>% cd("/") ``` `mv()` can reference variables or folders within a level in several ways. Numeric indices like we just did probably won't be the most common way you'll do it: names work just as well and are more transparent. Let's move the first variable, `perc_skipped`, into "Demos" as well ```r ds %>% mv("perc_skipped", "Demos") %>% cd("Demos") ## To print the folder contents ``` > A side note: although the last step of that chain was `cd()`, we haven't changed state in our R session. There is no "working folder" set globally. `cd()` is a function that returns a folder; if we had assigned the return from the function (pipeline) to some object, we could then pass that in to another function to "start" in that folder. Another way we can identify variables is by using the `dplyr`-like functions `starts_with`, `ends_with`, `matches`, and `contains`. Let's use `matches` to move all of the questions about Edward Snowden or Bradley (Chelsea) Manning to a folder for the topical questions in this week's survey: ```r ds %>% mkdir("This week") %>% mv(matches("manning|snowden", ignore.case = TRUE), "This week") ``` We can also select all variables in a folder using the `variables` function (or all folders within a folder using `folders`). Let's move all remaining variables from the top level folder to a folder called "Tracking questions". To do this, we do need to explicitly change to the top level folder. ```r ds %>% cd("/") mkdir("Tracking questions") %>% mv(., variables(.), "Tracking questions") ``` > (Curious about the "dot" notation? See the [magrittr docs](https://magrittr.tidyverse.org/articles/magrittr.html).) The reason we change to the top level folder here is that there is a subtle difference between passing `ds` to `mv()` versus `cd(ds, "/")`. Whatever object, dataset or folder, that is passed into `mv()` determines the scope from which the objects to move are selected. If you pass the dataset in, you can select any variables in the dataset, regardless of what folder they're in. If you pass in a folder, you're selecting just from that folder's contents. It can be convenient to find all variables that match some criteria across the whole dataset to move them, but sometimes we don't want that. In this case, we wanted only the variables sitting in the top level folder, not nested in other folders, so we wanted `variables(cd(ds, "/"))` and not `variables(ds)`. Now, our variable tree has some structure. Let's use `print(folder, depth = 1)` to see these folders and their contents one level deep: ```r ds %>% cd("/") %>% print(depth = 1) ``` ## Nested folders We can create folders within folders as well. In the "This week" folder, we have a set of questions about Edward Snowden. Let's nest them inside their own subfolder inside "This week": ```r ds %>% cd("This week") %>% mkdir("Snowden") %>% mv(matches("snowden", ignore.case = TRUE), "Snowden") %>% cd("..") %>% print(depth = 2) ``` Note how we used `".."` to change folders up a level, as you can in a file system . We did that just so we can print the folder structure at the top level (and to illustrate that you can specify relative paths :). You could also do this using the full path segments. `mkdir` will recursively make all path segments it needs in order to ensure that the target folder exists. ```{r, eval=FALSE} ds %>% mkdir("This week/Snowden") %>% mv(matches("snowden", ignore.case = TRUE), "This week/Snowden") %>% cd("/") %>% print(depth = 2) ``` ## Renaming folders and folder contents Folders themselves have names, which we can set with `setName()`: ```r ds %>% cd("Demos") %>% setName("Demographics") ``` We can also set the names of the objects contained in a folder with `setNames()`: ```r ds %>% cd("Demos") %>% setNames(c("Birth Year", "Gender", "Political Ideo. (3 category)", "Political Ideo. (7 category)", "Political Ideo. (7 category; other)", "Race", "Education", "Marital Status", "Phone", "Family Income", "Region", "State", "Weight", "Voter Registration (new)", "Is a voter?", "Voter Registration (old)", "Voter Registration")) ``` ## Ordering within folders Unlike files in a file system, variables within folders are ordered. Let's move "Demographics" to the end. One way to do that is with the `setOrder` function. This lets you provide a specific order, but it requires you to specify all of the folder's contents. Let's use that function to put "Tracking questions" first: ```r ds %>% cd("/") %>% setOrder(c("Tracking questions", "This week", "Demographics")) ``` ## Deleting folders The cleanest way to delete a folder is with `rmdir()`: ```r ds %>% rmdir("This week/Snowden") ``` This deletes the folder and all variables contained within it. [Next: transforming and deriving](derive.html)