Formatting data as a list is sometimes necessary. However, retrieving this kind of non-tabular information for analysis can be challenging. This workshop will introduce students to the motivations and techniques for storing and parsing list objects in R. Some familiarity with R will be helpful.

Introduction

Compared to the data frame, vector and matrix, the list is under-represented in many introductory R tutorials. This likely has less to do with the relative importance of lists, and more to do with their potential complexity. However, an understanding of how to create, curate and manipulate objects of this type can prove immensely useful.

The list is one of the most versatile data types in R thanks to its ability to accommodate heterogeneous elements. A single list can contain multiple elements, regardless of their types or whether these elements contain further nested data. So you can have a list of a list of a list of a list of a list …

Garrett Grolemund and Hadley Wickham’s R For Data Science includes a section on lists. They use a helpful simile for the list as a shaker filled with packets of pepper[1]. To retrieve individual “grains” of pepper, you’d have to first access the shaker … then the packet inside the shaker … then the pepper inside the packet.

Still confused? Here’s another way of thinking about it: the list is like a movie. Each movie has a cast, crew, budget, script, etc. These elements may have different dimensions (more cast members than crew) and be of different types (budget is a number, script is a series of characters), yet they are all part of the same movie.
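
To make the movie analogy concrete, here is a minimal sketch of such a list. Every value below is an invented placeholder, not data about a real film; the point is only that elements of different types and lengths can sit side by side.

```r
# A hypothetical movie stored as a list; all values are made-up placeholders.
movie <- list(
  cast   = c("Lead Actor", "Supporting Actor", "Extra"),  # character vector of length 3
  crew   = c("Director", "Editor"),                        # character vector of length 2
  budget = 1500000,                                        # a single number
  script = "INT. KITCHEN - DAY"                            # a single string
)

# Each element keeps its own type ...
sapply(movie, class)

# ... and its own length
lengths(movie)
```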

We’ll use a brief review of R basics as a vehicle to get started with lists.

R Basics

To do anything interesting in R, you must assign values (or expressions that produce values) to objects. The syntax for assignment is the name of the object followed by a <- operator and the expression to be evaluated.

x <- 3
y <- 2 + x

Although the two are mostly equivalent, the <- should be used in place of the = to improve code legibility and reduce potential mistakes … we’ll see why this is important when we start creating “named” lists.
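
A short sketch of why the distinction matters inside a function call such as list(). The object names here (named, unnamed, b) are arbitrary and chosen only for illustration.

```r
# Inside a function call, `=` names an argument; it does not create an object.
named <- list(a = 1)
names(named)
## [1] "a"

# `<-` inside a call creates `b` in the calling environment and passes the
# value along unnamed, which is rarely what we want.
unnamed <- list(b <- 2)
names(unnamed)
## NULL
b
## [1] 2
```

Keeping <- for assignment and = for naming arguments makes it easier to see at a glance which of the two is happening.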

Every object has a class, which can be accessed using the class() function. Certain functions are specific to a given class. Other functions can behave differently depending on the class of the input. The “list” class is what we are interested in for this tutorial.

One of the most fundamental types of objects is the vector. A vector is a series of elements from 1 to n. Each element can be accessed by an identifier (“index”) using square brackets, e.g. y[1]. We will make extensive use of a modified version of this syntax in order to manipulate list items.
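
As a quick recap of class() and vector indexing, consider the sketch below. The vector scores is a hypothetical example and is not used elsewhere in this lesson.

```r
scores <- c(10, 20, 30)   # hypothetical numeric vector

class(scores)
## [1] "numeric"

scores[1]                 # first element (R counts from 1)
## [1] 10

scores[c(1, 3)]           # several elements at once
## [1] 10 30

# Wrapping the same values in list() changes the class of the object
class(list(scores))
## [1] "list"
```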

Creating Lists

The most direct way to create a list is with the list() function.

slamwins <- list(17,14,14,12,11)

To confirm that the object we’ve created is indeed a “list” we can use class() as described above.

class(slamwins)
## [1] "list"

OK. Let’s see what a list looks like as printed output …

slamwins
## [[1]]
## [1] 17
## 
## [[2]]
## [1] 14
## 
## [[3]]
## [1] 14
## 
## [[4]]
## [1] 12
## 
## [[5]]
## [1] 11

Indexing Lists

The printed output above isn’t pretty, but it does include some hints as to how we can isolate specific elements of the list. In this case there are double square brackets (e.g. [[1]]) as well as single square brackets (e.g. [1]). As with vectors, data frames and matrices, the bracket notation is used for indexing. However, a list can have multiple levels of indices. The value in the double brackets represents the number of the parent element in the list. The value in the single brackets represents the number of the element in that parent element of the list. We can chain this notation together to access granular parts of our list.

slamwins[[2]][1]
## [1] 14
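
The difference between single and double brackets is easy to miss, so here is a small sketch. It uses a copy of the values above under the assumed name wins so that the slamwins object is left untouched.

```r
wins <- list(17, 14, 14, 12, 11)   # a copy of the values used above

# Single brackets keep the container: the result is still a list (of length 1)
class(wins[2])
## [1] "list"

# Double brackets open the container and return the element itself
class(wins[[2]])
## [1] "numeric"

# Arithmetic therefore needs double brackets
wins[[1]] - wins[[5]]
## [1] 6
```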

If we’d prefer a more explicit way to access elements of a list, then we can give them names. Assigning a character vector of the same length as the list to names() sets the name of each corresponding element.

names(slamwins) <- c("Federer", "Sampras", "Nadal", "Djokovic", "Borg")
slamwins
## $Federer
## [1] 17
## 
## $Sampras
## [1] 14
## 
## $Nadal
## [1] 14
## 
## $Djokovic
## [1] 12
## 
## $Borg
## [1] 11

Another way to set names is to do so while creating the list.

slamwins <- list(Federer = 17, Sampras = 14, Nadal = 14, Djokovic = 12, Borg = 11)
slamwins
## $Federer
## [1] 17
## 
## $Sampras
## [1] 14
## 
## $Nadal
## [1] 14
## 
## $Djokovic
## [1] 12
## 
## $Borg
## [1] 11

With our list named now we can use the $ operator to extract specific values by key.

slamwins$Federer
## [1] 17
# federer has ? more titles than borg
slamwins$Federer - slamwins$Borg
## [1] 6
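
Named elements can also be reached with double brackets and a quoted name, which is handy when the name is stored in another object. A minimal sketch, using a copy of the list under the assumed name wins:

```r
wins <- list(Federer = 17, Sampras = 14, Nadal = 14, Djokovic = 12, Borg = 11)

# [["name"]] is equivalent to $name ...
wins[["Nadal"]]
## [1] 14

# ... and also works when the name is held in another object
player <- "Djokovic"
wins[[player]]
## [1] 12

# $ treats what follows as a literal name, so this looks for an element
# called "player" and finds nothing
wins$player
## NULL
```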

The example above could be considered a minimal viable list … there’s a single level of named elements, which just as easily could have been stored as a vector. Let’s add another layer of data nested into our list object.

slamwins <- 
    list(
        Federer = 
            list(
                AUS = 4, 
                FR = 1,
                WIM = 7,
                US = 5),
        Sampras = 
            list(
                AUS = 2,
                FR = 0,
                WIM = 7,
                US = 5),
        Nadal = 
            list(
                AUS = 1,
                FR = 9,
                WIM = 2,
                US = 2),
        Djokovic = 
            list(
                AUS = 6,
                FR = 1,
                WIM = 3,
                US = 2),
        Borg = 
            list(
                AUS = 0,
                FR = 6,
                WIM = 5,
                US = 0)
    )

In this case we have created a named list of 5 named lists, each of which has 4 named values.
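
Values in a nested list are reached by chaining the indexing operators, one level at a time. The sketch below uses a two-player excerpt of the data stored under the assumed name nested.

```r
# A two-player excerpt of the nested list above
nested <- list(
  Federer = list(AUS = 4, FR = 1, WIM = 7, US = 5),
  Nadal   = list(AUS = 1, FR = 9, WIM = 2, US = 2)
)

# Chain $ (or [[ ]]) to step down one level at a time
nested$Nadal$FR
## [1] 9
nested[["Federer"]][["WIM"]]
## [1] 7

# str() prints a compact outline of the whole structure
str(nested)
```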

But wait … we’re missing something … we have the number of slam wins by event but what about the total number of wins per player?

Editing Lists

One way to solve the problem we’re encountering would be to use the indexing syntax discussed earlier to match our “totals” with the appropriate list item. That would basically amount to using a for loop:

totals <- c(17, 14, 14, 12, 11)

for (i in seq_along(slamwins)) {
    
    slamwins[[i]]$Total <- totals[i]

}

slamwins
## $Federer
## $Federer$AUS
## [1] 4
## 
## $Federer$FR
## [1] 1
## 
## $Federer$WIM
## [1] 7
## 
## $Federer$US
## [1] 5
## 
## $Federer$Total
## [1] 17
## 
## 
## $Sampras
## $Sampras$AUS
## [1] 2
## 
## $Sampras$FR
## [1] 0
## 
## $Sampras$WIM
## [1] 7
## 
## $Sampras$US
## [1] 5
## 
## $Sampras$Total
## [1] 14
## 
## 
## $Nadal
## $Nadal$AUS
## [1] 1
## 
## $Nadal$FR
## [1] 9
## 
## $Nadal$WIM
## [1] 2
## 
## $Nadal$US
## [1] 2
## 
## $Nadal$Total
## [1] 14
## 
## 
## $Djokovic
## $Djokovic$AUS
## [1] 6
## 
## $Djokovic$FR
## [1] 1
## 
## $Djokovic$WIM
## [1] 3
## 
## $Djokovic$US
## [1] 2
## 
## $Djokovic$Total
## [1] 12
## 
## 
## $Borg
## $Borg$AUS
## [1] 0
## 
## $Borg$FR
## [1] 6
## 
## $Borg$WIM
## [1] 5
## 
## $Borg$US
## [1] 0
## 
## $Borg$Total
## [1] 11

There are a couple of potential issues with this code. The main one is that we need to know what the totals are ahead of time. It would be much better to calculate them dynamically in case our underlying data changes … or in case we’re performing a calculation that’s not as simple as a sum. Another problem is that this approach relies on a for loop, a construct that works in R but can be problematic[2].

Enter the “apply” functions …

For this lesson, the two most relevant members of this family of functions are lapply() and sapply(), both of which allow you to pass other functions to each element of a list.

Before we start working with these functions, we need to restore our list to the state it was in before we ran the loop to add the sums for each element. Assigning NULL to an element effectively deletes that element from the list.

for (i in seq_along(slamwins)) {
    
    slamwins[[i]]$Total <- NULL
    
}
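
The deletion behaviour of NULL can be seen in isolation on a small, hypothetical list (the name player is an assumption for this sketch). It also shows the usual workaround for the rarer case where an element should be kept but hold NULL.

```r
player <- list(AUS = 4, FR = 1, Total = 5)   # hypothetical small list

player$Total <- NULL        # removes the element entirely
names(player)
## [1] "AUS" "FR"

# To keep an element whose value is NULL, assign list(NULL) with single brackets
player["Total"] <- list(NULL)
length(player)
## [1] 3
is.null(player$Total)
## [1] TRUE
```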

And because we have nested data (lists within lists within lists …) we also need to understand how to use unlist() in order to apply our functions appropriately. The unlist() function simply returns a “flat” version of all of the elements in the list as a vector. You can specify whether this is recursive (i.e. flattens out all lists of lists) and whether to retain or discard any names you have for your list.
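
The two options mentioned here correspond to the recursive and use.names arguments of unlist(). A brief sketch on a two-player excerpt (stored under the assumed name nested) shows their effect:

```r
nested <- list(
  Federer = list(AUS = 4, FR = 1),
  Borg    = list(AUS = 0, FR = 6)
)

# Default: flatten every level and join the names with a dot
# (Federer.AUS, Federer.FR, Borg.AUS, Borg.FR)
flat <- unlist(nested)
flat[["Borg.FR"]]
## [1] 6

# Discard the names
unlist(nested, use.names = FALSE)
## [1] 4 1 0 6

# Remove only the top level: the result is still a list, now of length 4
one_level <- unlist(nested, recursive = FALSE)
class(one_level)
## [1] "list"
length(one_level)
## [1] 4
```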

In this context, we’ll use unlist() in conjunction with lapply() to reduce the complexity of our original list.

lapply(slamwins, unlist)
## $Federer
## AUS  FR WIM  US 
##   4   1   7   5 
## 
## $Sampras
## AUS  FR WIM  US 
##   2   0   7   5 
## 
## $Nadal
## AUS  FR WIM  US 
##   1   9   2   2 
## 
## $Djokovic
## AUS  FR WIM  US 
##   6   1   3   2 
## 
## $Borg
## AUS  FR WIM  US 
##   0   6   5   0

The lapply() function will go to each element in the highest level of the list, and perform an arbitrary action. In this case, we’ve “unlisted” each of the player lists in our slamwins object. It is important to understand that lapply() always returns a list. So essentially we’ve just created another list, which we could then use within another lapply() call.

lapply(lapply(slamwins, unlist), sum)
## $Federer
## [1] 17
## 
## $Sampras
## [1] 14
## 
## $Nadal
## [1] 14
## 
## $Djokovic
## [1] 12
## 
## $Borg
## [1] 11
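
If a plain vector of totals is more convenient than a list, the outer lapply() can be swapped for sapply(). This is a sketch on a two-player excerpt; the names nested, totals_list and totals_vec are assumptions made for illustration.

```r
nested <- list(
  Federer = list(AUS = 4, FR = 1, WIM = 7, US = 5),
  Borg    = list(AUS = 0, FR = 6, WIM = 5, US = 0)
)

totals_list <- lapply(lapply(nested, unlist), sum)   # a list, as above
totals_vec  <- sapply(lapply(nested, unlist), sum)   # simplified to a named vector

class(totals_list)
## [1] "list"
totals_vec
## Federer    Borg 
##      17      11 
```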

Now that we’ve figured out how to calculate the values we’re interested in, we just need to append them to the original list. One of the keys here is appreciating that lapply() can take any function (including one that we write ourselves … an “anonymous function”[3]) and apply that operation to each element in the list. Another point worth noting is that the c() function works on lists. Most introductory R tutorials include examples of using c() to create a vector, and it works very similarly for lists: it appends either a single item or a list of items onto the list. (In the code that follows, each player’s data has already been unlisted, so c() is appending to a named vector.)

slamwins <- lapply(lapply(slamwins, unlist), function(x) c(x, Total = sum(x)))
slamwins
## $Federer
##   AUS    FR   WIM    US Total 
##     4     1     7     5    17 
## 
## $Sampras
##   AUS    FR   WIM    US Total 
##     2     0     7     5    14 
## 
## $Nadal
##   AUS    FR   WIM    US Total 
##     1     9     2     2    14 
## 
## $Djokovic
##   AUS    FR   WIM    US Total 
##     6     1     3     2    12 
## 
## $Borg
##   AUS    FR   WIM    US Total 
##     0     6     5     0    11
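
Note that the result above holds a named vector for each player, because unlist() was applied first. To see c() appending to an actual list, consider this sketch on a hypothetical single-player list; the extra field names (Hand, Active) and their values are placeholders, not real data.

```r
player <- list(AUS = 4, FR = 1, WIM = 7, US = 5)

# Append a single named item; wrapping it in list() keeps the result a list
player <- c(player, list(Total = 17))
class(player)
## [1] "list"

# Append several items at once, of different types (placeholder values)
player <- c(player, list(Hand = "right", Active = FALSE))
length(player)
## [1] 7
```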

Converting Lists

Using the subsetting and manipulation features above we can perform a wide variety of manipulations on our list object. But ultimately (especially if you’re familiar with the “Tidyverse” approach to using R) it may be helpful to cast list data in a tabular format … a data frame.

as.data.frame(slamwins)
##       Federer Sampras Nadal Djokovic Borg
## AUS         4       2     1        6    0
## FR          1       0     9        1    6
## WIM         7       7     2        3    5
## US          5       5     2        2    0
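## Total      17      14    14       12   11

To get one row per player instead, we can bind the vectors together by row and store the player names in their own column.

datmat <- do.call(rbind, slamwins)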
datdf <- as.data.frame(datmat, row.names = FALSE)
datdf$player <- row.names(datmat)
datdf
##   AUS FR WIM US Total   player
## 1   4  1   7  5    17  Federer
## 2   2  0   7  5    14  Sampras
## 3   1  9   2  2    14    Nadal
## 4   6  1   3  2    12 Djokovic
## 5   0  6   5  0    11     Borg
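
A nested list of lists can also be converted without unlisting first. The sketch below is one possible approach, shown on a two-player excerpt under the assumed names nested, rows and df: each inner list becomes a one-row data frame, and the rows are then stacked.

```r
nested <- list(
  Federer = list(AUS = 4, FR = 1, WIM = 7, US = 5),
  Borg    = list(AUS = 0, FR = 6, WIM = 5, US = 0)
)

# Turn each player's list into a one-row data frame, then stack the rows
rows <- lapply(nested, as.data.frame)
df <- do.call(rbind, rows)

# Totals can now be computed on the table itself
df$Total <- rowSums(df)

# Add the player names last so rowSums() only sees numeric columns
df$player <- names(nested)
df
```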

Lists “In The Wild”

The above is a contrived example. In practice, you’re much more likely to encounter lists written by other people (or applications) than to code out a list of your own. The example data we’ll use will be pulled from an Application Programming Interface (API) for the github.com website[4]. Like many other web APIs, the data comes out in JavaScript Object Notation (JSON). JSON is a format for storing and transmitting “semi-structured” data[5]. Keys and values are paired together to facilitate parsing[6]. When read into R, JSON is interpreted as a list.
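
To see how key-value pairs in JSON map onto a named list, here is a minimal sketch that parses a JSON string directly. The snippet is a hypothetical record invented for illustration; it assumes the jsonlite package is installed.

```r
library(jsonlite)

# A hypothetical JSON snippet shaped like a small repository record
json <- '{"name": "example-repo", "watchers": 26, "owner": {"login": "someone"}}'

# simplifyVector = FALSE keeps every JSON object and array as an R list
repo <- fromJSON(json, simplifyVector = FALSE)

class(repo)
## [1] "list"
names(repo)
## [1] "name"     "watchers" "owner"
repo$owner$login
## [1] "someone"
```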

Example

GitHub is a platform for sharing, storing and managing code. Projects can be defined in a “repository” structure. The example that follows will look at repositories for a single user: Hadley Wickham.

To read the data into R, we can use the fromJSON() function from the jsonlite package. For this example, we pull each page of results (in this case, we know a priori that there are two pages) and make sure to pass the simplifyVector = FALSE argument after the URL.

library(jsonlite)
had1 <- fromJSON("https://api.github.com/users/hadley/repos?page=1&per_page=100", simplifyVector = FALSE)
had2 <- fromJSON("https://api.github.com/users/hadley/repos?page=2&per_page=100", simplifyVector = FALSE)

The data are stored in two separate lists, so we need to combine them with the c() function. Since the original objects are no longer necessary (and may be large), it’s probably a good idea to remove them.

had <- c(had1,had2)
rm(had1, had2)
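
With more than a couple of pages, reading and combining by hand gets tedious. The following is a sketch of one way to generalise the steps above; read_repo_pages() is a hypothetical helper written for this lesson, not a jsonlite function. The fetch argument is there so a stand-in function can replace the network call when testing.

```r
# Hypothetical helper: read several pages of a user's repositories and
# combine them into a single list.
read_repo_pages <- function(user, pages, per_page = 100,
                            fetch = function(url) {
                              jsonlite::fromJSON(url, simplifyVector = FALSE)
                            }) {
  # Build one URL per page, following the pattern used above
  urls <- paste0("https://api.github.com/users/", user,
                 "/repos?page=", pages, "&per_page=", per_page)
  results <- lapply(urls, fetch)   # one list per page
  do.call(c, results)              # concatenate the pages into one list
}

# Usage (requires network access and the jsonlite package):
# had <- read_repo_pages("hadley", pages = 1:2)
```

Because the intermediate page objects only exist inside the function, there is nothing left over to remove afterwards.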

The first item of interest is to know how many elements are in this list:

length(had)
## [1] 200

It’s also helpful to take a peek at the data structure:

had[[1]]
## $id
## [1] 40423928
## 
## $name
## [1] "15-state-of-the-union"
## 
## $full_name
## [1] "hadley/15-state-of-the-union"
## 
## $owner
## $owner$login
## [1] "hadley"
## 
## $owner$id
## [1] 4196
## 
## $owner$avatar_url
## [1] "https://avatars0.githubusercontent.com/u/4196?v=3"
## 
## $owner$gravatar_id
## [1] ""
## 
## $owner$url
## [1] "https://api.github.com/users/hadley"
## 
## $owner$html_url
## [1] "https://github.com/hadley"
## 
## $owner$followers_url
## [1] "https://api.github.com/users/hadley/followers"
## 
## $owner$following_url
## [1] "https://api.github.com/users/hadley/following{/other_user}"
## 
## $owner$gists_url
## [1] "https://api.github.com/users/hadley/gists{/gist_id}"
## 
## $owner$starred_url
## [1] "https://api.github.com/users/hadley/starred{/owner}{/repo}"
## 
## $owner$subscriptions_url
## [1] "https://api.github.com/users/hadley/subscriptions"
## 
## $owner$organizations_url
## [1] "https://api.github.com/users/hadley/orgs"
## 
## $owner$repos_url
## [1] "https://api.github.com/users/hadley/repos"
## 
## $owner$events_url
## [1] "https://api.github.com/users/hadley/events{/privacy}"
## 
## $owner$received_events_url
## [1] "https://api.github.com/users/hadley/received_events"
## 
## $owner$type
## [1] "User"
## 
## $owner$site_admin
## [1] FALSE
## 
## 
## $private
## [1] FALSE
## 
## $html_url
## [1] "https://github.com/hadley/15-state-of-the-union"
## 
## $description
## NULL
## 
## $fork
## [1] FALSE
## 
## $url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union"
## 
## $forks_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/forks"
## 
## $keys_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/keys{/key_id}"
## 
## $collaborators_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/collaborators{/collaborator}"
## 
## $teams_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/teams"
## 
## $hooks_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/hooks"
## 
## $issue_events_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/issues/events{/number}"
## 
## $events_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/events"
## 
## $assignees_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/assignees{/user}"
## 
## $branches_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/branches{/branch}"
## 
## $tags_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/tags"
## 
## $blobs_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/blobs{/sha}"
## 
## $git_tags_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/tags{/sha}"
## 
## $git_refs_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/refs{/sha}"
## 
## $trees_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/trees{/sha}"
## 
## $statuses_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/statuses/{sha}"
## 
## $languages_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/languages"
## 
## $stargazers_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/stargazers"
## 
## $contributors_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/contributors"
## 
## $subscribers_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/subscribers"
## 
## $subscription_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/subscription"
## 
## $commits_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/commits{/sha}"
## 
## $git_commits_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/commits{/sha}"
## 
## $comments_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/comments{/number}"
## 
## $issue_comment_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/issues/comments{/number}"
## 
## $contents_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/contents/{+path}"
## 
## $compare_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/compare/{base}...{head}"
## 
## $merges_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/merges"
## 
## $archive_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/{archive_format}{/ref}"
## 
## $downloads_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/downloads"
## 
## $issues_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/issues{/number}"
## 
## $pulls_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/pulls{/number}"
## 
## $milestones_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/milestones{/number}"
## 
## $notifications_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/notifications{?since,all,participating}"
## 
## $labels_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/labels{/name}"
## 
## $releases_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/releases{/id}"
## 
## $deployments_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/deployments"
## 
## $created_at
## [1] "2015-08-09T03:22:26Z"
## 
## $updated_at
## [1] "2017-03-09T18:03:51Z"
## 
## $pushed_at
## [1] "2015-08-10T20:29:10Z"
## 
## $git_url
## [1] "git://github.com/hadley/15-state-of-the-union.git"
## 
## $ssh_url
## [1] "git@github.com:hadley/15-state-of-the-union.git"
## 
## $clone_url
## [1] "https://github.com/hadley/15-state-of-the-union.git"
## 
## $svn_url
## [1] "https://github.com/hadley/15-state-of-the-union"
## 
## $homepage
## NULL
## 
## $size
## [1] 4519
## 
## $stargazers_count
## [1] 26
## 
## $watchers_count
## [1] 26
## 
## $language
## [1] "R"
## 
## $has_issues
## [1] TRUE
## 
## $has_downloads
## [1] TRUE
## 
## $has_wiki
## [1] TRUE
## 
## $has_pages
## [1] FALSE
## 
## $forks_count
## [1] 10
## 
## $mirror_url
## NULL
## 
## $open_issues_count
## [1] 0
## 
## $forks
## [1] 10
## 
## $open_issues
## [1] 0
## 
## $watchers
## [1] 26
## 
## $default_branch
## [1] "master"

Some of the elements and sub-elements of this particular list are nested (lists of lists) … but overall this data is formatted in a friendly, parseable format. Each parent element has the same number of children, which are named and defined as “key : value” pairs.

So if we wanted to extract a specific child element from one of its parents, we could use something like the following:

had[[5]]$language
## [1] "Python"

We mentioned sapply() above, and now we can put it into action. This function will be useful in extracting the same child elements from different parents. To do so, we’ll need to define an anonymous function to apply across the list. Note that sapply() is similar to lapply() but tries to simplify its result to a vector, matrix or array; it falls back to returning a list only when the results cannot be simplified.

sapply(had, function(x) x$watchers)
##   [1]   26   13    0  915    5   69   50    8    4  255    5   12    6    1
##  [15]    0    8   13    1    0    8    5    6    5   34  102   20   17    4
##  [29]   38  116    2   44    7 1434    5    3   17    1 1683    0    1   72
##  [43]    5   40    1    4   11    3    0    6   18    3   57    3    3    3
##  [57]   15    7  351    4    3   74    7   24    3    8    8    0   28    4
##  [71]    4    2    2    0  593    9    2    1   11    8   76    0    3    4
##  [85]   91    3    0   28    6  288   21    9    5    2   11   66    7    0
##  [99]  175   38   14  236   10    3    8   25   33    3   39    4    0  203
## [113]  450    4   31    6   22   30    8  113   38    6    3   81   29  166
## [127]   58    2    6    2  171  617    7   27    6    4    1    1    0    5
## [141]    1    0    4    1   28    1   17    0  151    5    6    9    0    1
## [155]   33    1    7   22   29    1    7  744    3    0   15    7   73   85
## [169]   27    0    3   21    0   14    5    1    3    6    3    5    1  107
## [183]    2    6  381  173    7    2    1   17   16    1    4    8    1    1
## [197]    0    4    0    8
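
One thing to watch for: sapply() only simplifies when every result has the same length. Fields that are NULL for some repositories (such as description in the record printed earlier) prevent that. The sketch below uses a small hypothetical stand-in list named repos, and shows vapply() as a stricter alternative from the same family of functions.

```r
# Hypothetical stand-in for the repository list
repos <- list(
  list(name = "a", watchers = 26, description = NULL),
  list(name = "b", watchers = 13, description = "a demo")
)

# Every result has length one: sapply() simplifies to a vector
sapply(repos, function(x) x$watchers)
## [1] 26 13

# Some results are NULL: sapply() cannot simplify and returns a list
class(sapply(repos, function(x) x$description))
## [1] "list"

# vapply() states the expected type and length up front and stops with an
# error on a mismatch; here NULL is replaced with NA so each result is one string
descriptions <- vapply(
  repos,
  function(x) if (is.null(x$description)) NA_character_ else x$description,
  character(1)
)
is.na(descriptions)
## [1]  TRUE FALSE
```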

We’ve successfully extracted the child element of interest from each of the parent elements in the list. However, this vector could be hard to interpret since the elements are divorced from the larger context. One solution might be to assign names to the original list, which will give sapply() a named vector output.

names(had) <- sapply(had, function(x) x$name)
sapply(had, function(x) x$watchers)
##  15-state-of-the-union      15-student-papers               500lines 
##                     26                     13                      0 
##                  adv-r                appdirs             assertthat 
##                    915                      5                     69 
##              babynames         beautiful-data                  bench 
##                     50                      8                      4 
##                 bigvis         bigvis-infovis         boxplots-paper 
##                    255                      5                     12 
##                  broom                builder             cellranger 
##                      6                      1                      0 
##              classifly             clusterfly       cocktail-balance 
##                      8                     13                      1 
##             commonmark         cran-downloads        cran-logs-dplyr 
##                      0                      8                      5 
##          cran-packages              cranatics             crantastic 
##                      6                      5                     34 
##        data-baby-names          data-counties      data-fuel-economy 
##                    102                     20                     17 
##               data-gbd    data-housing-crisis            data-movies 
##                      4                     38                    116 
##            data-stride                decumar             densityvis 
##                      2                     44                      7 
##               devtools           directlabels              distpower 
##                   1434                      5                      3 
##                 docker                   docs                  dplyr 
##                     17                      1                   1683 
##          dplyrimpaladb                   drat                 dtplyr 
##                      0                      1                     72 
##                eggnogr               evaluate              example-r 
##                      5                     40                      1 
##              extrafont              fec-dplyr        fivethirtyeight 
##                      4                     11                      3 
##                 foobar                fortify            fueleconomy 
##                      0                      6                     18 
##                gdtools                   gg2v             ggenealogy 
##                      3                     57                      3 
##                  ggmap                 ggplot                ggplot1 
##                      3                      3                     15 
##        ggplot2-bayarea           ggplot2-book           ggplot2-docs 
##                      7                    351                      4 
##          ggplot2movies                 ggstat               ggthemes 
##                      3                     74                      7 
##                 gtable              gun-sales              hadladdin 
##                     24                      3                      8 
##      hadley.github.com              hclpicker                  helpr 
##                      8                      0                     28 
##     herndon-ash-pollin               hflights      highlighting-kate 
##                      4                      4                      2 
##                httpbin                 httpuv                   httr 
##                      2                      0                    593 
##                  ideas              imvisoned                 kmeans 
##                      9                      2                      1 
##                   l1tf                 layers               lazyeval 
##                     11                      8                     76 
##                leaflet          leaflet-shiny                legends 
##                      0                      3                      4 
##               lineprof                 linval                   lme4 
##                     91                      3                      0 
##                 lobstr               localmds              lubridate 
##                     28                      6                    288 
##                 lvplot           lvplot-paper          maplight-data 
##                     21                      9                      5 
##      markdown-licenses                 meifly                memoise 
##                      2                     11                     66 
##       mexico-mortality                minimal                 modelr 
##                      7                      0                    175 
##                 monads                 mturkr             multidplyr 
##                     38                     14                    236 
##                 mutatr              mutatrGui            nasaweather 
##                     10                      3                      8 
##                  neiss           nycflights13               olctools 
##                     25                     33                      3 
##            oldbookdown                packman               PivotalR 
##                     39                      4                      0 
##                pkgdown                   plyr              pop-flows 
##                    203                    450                      4 
##                 precis          prodplotpaper           productplots 
##                     31                      6                     22 
##                  profr                  proto                   pryr 
##                     30                      8                    113 
##               purrrlyr          qtpaint-demos      r-devel-san-clang 
##                     38                      6                      3 
##            r-internals            r-on-github                 r-pkgs 
##                     81                     29                    166 
##               r-python               r-source               r-travis 
##                     58                      2                      6 
##                 r-yaml                   r2d3                   r4ds 
##                      2                    171                    617 
##    ranking-correlation               rappdirs              rastermap 
##                      7                     27                      6 
##                rblocks                Rcereal                   Rcpp 
##                      4                      1                      1 
##           rcpp-gallery           RcppDateTime           rcpplonglong 
##                      0                      5                      1 
##           RcppProgress            rcrunchbase                  RCurl 
##                      0                      4                      1 
##          reactive-docs               ReadStat                recipes 
##                     28                      1                     17 
##                 reprex                reshape                  rfmt2 
##                      0                    151                      5 
##               rifftron                    rio riotworkshop.github.io 
##                      6                      9                      0 
##                  rJava                  rlang              rmarkdown 
##                      1                     33                      1 
##                 rminds               roxygen3                 rsmith 
##                      7                     22                     29 
##                RSQLite                    rv2                  rvest 
##                      1                      7                    744 
##              rworldmap                   rydn                     S3 
##                      3                      0                     15 
##            scagnostics                 scales                 secure 
##                      7                     73                     85 
##              sfhousing                    sfr                  shiny 
##                     27                      0                      3 
##           shinySignals               simpleS4               sinartra 
##                     21                      0                     14 
##             spatialVis               sqlutils       stat405-practice 
##                      5                      1                      3 
##      stat405-resources  STAT545-UBC.github.io             stationaRy 
##                      6                      3                      5 
##              strptimer                svglite                syuzhet 
##                      1                    107                      2 
##              tanglekit               testthat              tidy-data 
##                      6                    381                    173 
##                toc-vis               unittest          USAboundaries 
##                      7                      2                      1 
##          usdanutrients                  vctrs                   vega 
##                     17                     16                      1 
##          vis-migration                   vita                warncpp 
##                      4                      8                      1 
##               webreadr                 webuse                 weeder 
##                      1                      0                      4 
##         weight-and-see            wesanderson 
##                      0                      8

Exercise

  • How many times are these repositories forked on average?
  • Is Hadley a “site_admin” on any of these repositories?
  • Try reading the data from GitHub again. Make sure to use the simplifyVector = TRUE argument instead. What happened?

Other Methods

There are many, many ways to work with lists. What follows is a very brief nod to a few features from packages that help address list complexity.

rlist

rlist includes a set of very useful tools for list manipulation[7].

Some highlights:

  • list.map()
  • list.sort()
  • list.filter()
  • list.group()
  • list.table()

library(rlist)
list.map(had, created_at)
list.sort(had, forks_count)
list.filter(had, size > 50000)
list.group(had, language)
list.table(had, fork)
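
For comparison, some of these operations have rough counterparts in base R. The sketch below is an approximation on a small hypothetical list named repos; it is not a claim about how rlist works internally, and the results may differ from rlist in details such as naming.

```r
# Hypothetical stand-in for the repository list
repos <- list(
  list(name = "a", size = 60000, forks_count = 2, language = "R"),
  list(name = "b", size = 100,   forks_count = 9, language = "Python"),
  list(name = "c", size = 75000, forks_count = 5, language = "R")
)

# Roughly list.filter(repos, size > 50000)
big <- Filter(function(x) x$size > 50000, repos)

# Roughly list.sort(repos, forks_count)
by_forks <- repos[order(sapply(repos, function(x) x$forks_count))]

# Roughly list.group(repos, language)
by_language <- split(repos, sapply(repos, function(x) x$language))

sapply(big, function(x) x$name)
## [1] "a" "c"
sapply(by_forks, function(x) x$name)
## [1] "a" "c" "b"
length(by_language$R)
## [1] 2
```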

purrr

According to its author, Hadley Wickham, the purrr package “… fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr”[8]. This is especially useful when lists are used for programmatic purposes, like writing functions or packages. But there are applications for interactive list manipulation with purrr as well. The following are particularly helpful:

  • map(): allows functions to be passed to each element of the list (roughly analogous to sapply() or lapply())
  • flatten(): removes one level of nesting from a list (roughly analogous to unlist() with recursive = FALSE)
  • transpose(): turns a list inside out (transpose() then transpose() will revert the list back to original state)
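
A brief sketch of map() and transpose() in use, on a small hypothetical list named repos. It assumes the purrr package is installed; map_chr() and map_dbl() are typed variants of map() from the same package.

```r
library(purrr)

# Hypothetical stand-in for the repository list
repos <- list(
  list(name = "a", watchers = 26),
  list(name = "b", watchers = 13)
)

# map() always returns a list, like lapply(); a string extracts by name
map(repos, "name")

# Typed variants return a vector of the stated type
map_chr(repos, "name")
## [1] "a" "b"
map_dbl(repos, "watchers")
## [1] 26 13

# transpose() turns a list of records into a list of fields
by_field <- transpose(repos)
names(by_field)
## [1] "name"     "watchers"
```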

References