--- title: "Introduction to cppcontainers" author: "Christian Düben" date: "Updated: November 22, 2024" output: rmarkdown::html_vignette: number_sections: true vignette: > %\VignetteIndexEntry{Introduction to cppcontainers} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) require("cppcontainers") ``` # Motivation
This package incorporates C++ Standard Template Library containers into R. In doing so, it meets various objectives and targets different groups of R users.
First, it speeds up development workflows. If you previously wanted to make use of C++ containers, you had to write C++ code and wait for compilation every time you changed it. To evaluate the data held in C++, you also had to write print or export functions for each setup. cppcontainers does these steps for you. It wraps C++ containers, their methods, and helper functions and compiles them when the package is installed. There is no need to write C++ code or wait for compilation after the package has been installed. You can use the containers as interactively and without waiting time as you do with R's native data frames.
Second, it allows users who do not have a compiler installed (e.g., without Rtools on Windows) to use C++ containers, through the binary version of the package.
Third, it makes C++ easily accessible to R users. Much of the R community does not understand C++ and does not want to undertake the effort of learning it. This can impede collaboration between the team members who are familiar with C++ and those who are not. cppcontainers circumvents this problem by abstracting away much of the underlying C++ code, framing everything in an R-style syntax (incl. 1-based indices), and explaining functions in a language intuitive to R users.
Fourth, it pushes R towards a general purpose programming language. One of the big differences between R and general purpose languages is the availability of data structures. In R, almost everything is some sort of vector; a vector of primitives, a vector of pointers, etc. Even scalars are vectors of length one. This simplicity makes R very accessible and easy to learn for its audience, that is largely comprised of people with a background in quantitative, empirical methods but without training in software engineering. However, users who come from a general purpose programming language, such as C++ or Python, likely miss a larger collection of data structures. cppcontainers fills this gap.
Fifth, this package brings the performance of C++ Standard Template Library containers to R. These data structures are tailored towards certain use cases and can speed up execution times. Of course, there is some overhead involved compared to entirely writing the code in C++. Yet, by providing a thin wrapper around `Rcpp::XPtr`, cppcontainers attempts to minimize this overhead.
It is important to keep this motivation in mind. There is a lot more what you can do with C++ and consequently Rcpp, than what this package implements. This, e.g., includes nested containers. The aim of this package is not to incorporate all features of an entire other language into R. It adds functionality in one specific domain.
cppcontainers wraps many of the Standard Template Library's container types. The following list briefly introduces them in terms that are easy to grasp for users with little technical expertise. It avoids a discussion of performance characteristics or thread-safety implications.
**Associative containers** - Set: unique, sorted elements. - Unordered set: unique, unsorted elements. - Multiset: non-unique, sorted elements. - Unordered multiset: non-unique, unsorted elements. - Map: key-value pairs sorted by unique keys. - Unordered map: key-value pairs with unique, unsorted keys. - Multimap: key-value pairs sorted by non-unique keys. - Unordered multimap: key-value pairs with non-unique, unsorted keys. **Container adapters** - Stack: elements in a last-in, first-out order. The last element added is the first to be removed. - Queue: elements in a first-in, first-out order. The first element added is the first to be removed. - Priority queue: non-unique, sorted elements. **Sequence containers** - Vector: non-unique, unsorted elements. Similar to R vectors and space-efficient. - Deque: non-unique, unsorted elements with efficient manipulation on both sides of the double-ended queue. - Forward list: non-unique, unsorted elements iterable into one direction. - List: non-unique, unsorted elements iterable in both directions.There is a lot to know on the efficiency of element access at different positions, element insertion and removal, space efficiency, etc. It is too much to cover in this vignette. The list above primarily elaborates on uniqueness and sorting. It grants a first idea of what container types the package wraps and on which container types to read up on in other sources of documentation.
In this package, containers can hold primitives of four different types: integers, doubles, strings, and booleans. These correspond to the integer, numeric, character, and logical types in R. Native C++ strings are different from R strings, which is why the string type requires comparatively more conversions and can be a little less efficient to work with than the alternatives. Booleans are also differently implemented, but are converted more implicitly than strings are.
Much of STL containers' power lies in nested structures. In C++, you can, e.g., assemble a vector of sets or a queue of multimaps. This increases the universe of potential setups exponentially. Since an R wrapper package needs to hardcode all options, cppcontainers does not implement nested compositions and only targets primitive elements of the four mentioned types.
This package covers many of the container types' methods and maintains their original names. To make them more intuitive for R users, it replaces the object-oriented notation (`x.size()`) with the in R more commonly used functional notation (`size(x)`). cppcontainers omits methods that return values such as pointers or iterators, which are not directly utilizable in R. One example is the `equal_range` method for maps.
A few methods are modified to make them more convenient for R users. This particularly includes indexing and vector inputs. Methods like `count` and `contains` only accept a single value in C++, but allow for a vector in this package. cppcontainers solves it with iterative procedures on the C++ side.
An exception from the naming convention are the functions creating the objects. To avoid clashes with, e.g., `base::vector` and `utils::stack`, this package prefixes the respective methods with `cpp_`. `cpp_set` creates a set, `cpp_vector` creates a vector, etc. Another outlier is `remove`, which the package implements as `remove.` (with a dot suffix) to avoid compatibility problems with `base::remove`.
The following table illustrates which methods apply to which container type. Italicized names do not refer to original C++ methods, but to functions added by this package. These aid in accessing the objects' data from R.
## Associative Containers | method | set | unordered set | multiset | unordered multiset | map | unordered map | multimap | unordered multimap | | :--------------- | :-----: | :-----------: | :------: | :----------------: | :-----: | :-----------: | :------: | :----------------: | | == | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | [ | | | | | ✓ | ✓ | | | | assign | | | | | | | | | | at | | | | | ✓ | ✓ | | | | back | | | | | | | | | | bucket_count | | ✓ | | ✓ | | ✓ | | ✓ | | capacity | | | | | | | | | | clear | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | contains | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | count | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | emplace | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | emplace_after | | | | | | | | | | emplace_back | | | | | | | | | | emplace_front | | | | | | | | | | empty | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | erase | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | erase_after | | | | | | | | | | flip | | | | | | | | | | front | | | | | | | | | | insert | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | insert_after | | | | | | | | | | insert_or_assign | | | | | ✓ | ✓ | | | | load_factor | | ✓ | | ✓ | | ✓ | | ✓ | | max_bucket_count | | ✓ | | ✓ | | ✓ | | ✓ | | max_load_factor | | ✓ | | ✓ | | ✓ | | ✓ | | max_size | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | merge | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | pop | | | | | | | | | | pop_back | | | | | | | | | | pop_front | | | | | | | | | | *print* | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | push | | | | | | | | | | push_back | | | | | | | | | | push_front | | | | | | | | | | rehash | | ✓ | | ✓ | | ✓ | | ✓ | | remove. | | | | | | | | | | reserve | | ✓ | | ✓ | | ✓ | | ✓ | | resize | | | | | | | | | | reverse | | | | | | | | | | shrink_to_fit | | | | | | | | | | size | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | sort | | | | | | | | | | *sorting* | | | | | | | | | | splice | | | | | | | | | | splice_after | | | | | | | | | | *to_r* | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | top | | | | | | | | | | try_emplace | | | | | ✓ | ✓ | | | | *type* | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | unique | | | | | | | | | ## Container Adapters | method | stack | queue | priority queue | | :--------------- | :-----: | :-----: | :------------: | | == | ✓ | ✓ | | | [ | | | | | assign | | | | | at | | | | | back | | ✓ | | | bucket_count | | | | | capacity | | | | | clear | | | | | contains | | | | | count | | | | | emplace | ✓ | ✓ | ✓ | | emplace_after | | | | | emplace_back | | | | | emplace_front | | | | | empty | ✓ | ✓ | ✓ | | erase | | | | | erase_after | | | | | flip | | | | | front | | ✓ | | | insert | | | | | insert_after | | | | | insert_or_assign | | | | | load_factor | | | | | max_bucket_count | | | | | max_load_factor | | | | | max_size | | | | | merge | | | | | pop | ✓ | ✓ | ✓ | | pop_back | | | | | pop_front | | | | | *print* | ✓ | ✓ | ✓ | | push | ✓ | ✓ | ✓ | | push_back | | | | | push_front | | | | | rehash | | | | | remove. | | | | | reserve | | | | | resize | | | | | reverse | | | | | shrink_to_fit | | | | | size | ✓ | ✓ | ✓ | | sort | | | | | *sorting* | | | ✓ | | splice | | | | | splice_after | | | | | *to_r* | ✓ | ✓ | ✓ | | top | ✓ | | ✓ | | try_emplace | | | | | *type* | ✓ | ✓ | ✓ | | unique | | | | ## Sequence Containers | method | vector | deque | forward list | list | | :--------------- | :-----: | :-----: | :----------: | :-----: | | == | ✓ | ✓ | ✓ | ✓ | | [ | ✓ | ✓ | | | | assign | ✓ | ✓ | ✓ | ✓ | | at | ✓ | ✓ | | | | back | ✓ | ✓ | | ✓ | | bucket_count | | | | | | capacity | ✓ | | | | | clear | ✓ | ✓ | ✓ | ✓ | | contains | | | | | | count | | | | | | emplace | ✓ | ✓ | | ✓ | | emplace_after | | | ✓ | | | emplace_back | ✓ | ✓ | | ✓ | | emplace_front | | ✓ | ✓ | ✓ | | empty | ✓ | ✓ | ✓ | ✓ | | erase | ✓ | ✓ | | ✓ | | erase_after | | | ✓ | | | flip | ✓ | | | | | front | ✓ | ✓ | ✓ | ✓ | | insert | ✓ | ✓ | | ✓ | | insert_after | | | ✓ | | | insert_or_assign | | | | | | load_factor | | | | | | max_bucket_count | | | | | | max_load_factor | | | | | | max_size | ✓ | ✓ | ✓ | ✓ | | merge | | | ✓ | ✓ | | pop | | | | | | pop_back | ✓ | ✓ | | ✓ | | pop_front | | ✓ | ✓ | ✓ | | *print* | ✓ | ✓ | ✓ | ✓ | | push | | | | | | push_back | ✓ | ✓ | | ✓ | | push_front | | ✓ | ✓ | ✓ | | rehash | | | | | | remove. | | | ✓ | ✓ | | reserve | ✓ | | | | | resize | ✓ | ✓ | ✓ | ✓ | | reverse | | | ✓ | ✓ | | shrink_to_fit | ✓ | ✓ | | | | size | ✓ | ✓ | | ✓ | | sort | | | ✓ | ✓ | | *sorting* | | | | | | splice | | | | ✓ | | splice_after | | | ✓ | | | *to_r* | ✓ | ✓ | ✓ | ✓ | | top | | | | | | try_emplace | | | | | | *type* | ✓ | ✓ | ✓ | ✓ | | unique | | | ✓ | ✓ |The manual pages provide ample examples on the classes and methods in this package. The subsequent code just showcases a few of the functionalities outside of any meaningful workflow.
```{r} ## set s <- cpp_set(c(4, 2, 3)) s insert(s, c(4, 6)) s print(s, n = -2) ## unordered set s <- cpp_unordered_set(3:5) s emplace(s, 2) s type(s) ## multiset s <- cpp_multiset(c(3, 3, 6, 2)) s contains(s, 2) to_r(s) ## unordered multiset s <- cpp_unordered_multiset(c(3, 3, 6, 2)) s count(s, 3) rehash(s) ## map m <- cpp_map(c("Alice", "Bob"), c(TRUE, FALSE)) m m["Bob"] insert_or_assign(m, FALSE, "Alice") m ## unordered map m1 <- cpp_unordered_map(c("Alice", "Bob"), 3:4) m1 m2 <- cpp_unordered_map(c("Bob", "Jane"), 6:7) m2 merge(m1, m2) m1 m2 at(m1, "Jane") ## multimap m <- cpp_multimap(c("Alice", "Bob", "Bob"), 3:5) m clear(m) empty(m) ## unordered multimap m <- cpp_multimap(c("Alice", "Bob", "Bob"), 3:5) m size(m) erase(m, "Bob") m ## stack s1 <- cpp_stack(3:4) s1 s2 <- cpp_stack(3:4) s2 s1 == s2 pop(s1) s1 ## queue q <- cpp_queue(c(2.1, 3.3)) q push(q, 2.7) q back(q) ## priority queue p <- cpp_priority_queue(c(2.4, 1.3, 4.2, 1.5)) p top(p) sorting(p) ## vector v <- cpp_vector(2:4) v reserve(v, 10) capacity(v) ## deque d <- cpp_deque(c("Alice", "Bob", "Bob")) d push_front(d, "Jane") d pop_front(d) d ## forward list l <- cpp_forward_list(c(4, 2, 6)) l sort(l) l resize(l, 7) l ## list l <- cpp_list(c("Alice", "Bob", "Bob", "Alice")) unique(l) l print(l, from = 3) ```