Returns document subsets of a tokens object that meet certain conditions, including direct logical operations on docvars (document-level variables). tokens_subset functions identically to subset.data.frame, using non-standard evaluation to evaluate conditions based on the docvars in the tokens object.

tokens_subset(x, subset, select, ...)

Arguments

x

tokens object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as false

select

expression indicating the docvars to select from the tokens object; or a tokens object, in which case the returned tokens will contain the same documents, in the same order, as that tokens object, even if these are empty.

...

not used
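The handling of missing values in subset can be illustrated with a minimal sketch. It assumes the quanteda package is attached; the lang docvar and the object names toks_gt1 and toks_en are hypothetical, introduced only for this illustration.

```r
library(quanteda)

# Same toy corpus as in the Examples, except that grp is missing for d2
# and a hypothetical second docvar, lang, has been added.
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, NA, 2, 3),
                                    lang = c("en", "en", "de", "en")))
toks <- tokens(corp)

# For d2 the condition evaluates to NA; missing values are taken as false,
# so d2 is not kept.
toks_gt1 <- tokens_subset(toks, grp > 1)
docnames(toks_gt1)

# Conditions on several docvars can be combined with the usual logical operators.
toks_en <- tokens_subset(toks, grp > 1 & lang == "en")
docnames(toks_en)
```

Checking the result with docnames() rather than printing the tokens keeps the comparison independent of the print format.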

Value

tokens object, with a subset of documents (and docvars) selected according to arguments

See also

subset.data.frame
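Because the conditions are evaluated in the same way as in subset.data.frame, a condition can be tried out on a plain data frame first. The following base R sketch uses a hypothetical data frame dv standing in for the docvars of a tokens object; it does not call tokens_subset itself.

```r
# Hypothetical stand-in for the docvars of four documents; grp is missing for d2.
dv <- data.frame(grp = c(1, NA, 2, 3),
                 row.names = c("d1", "d2", "d3", "d4"))

# Non-standard evaluation: grp is looked up inside dv.
# Rows where the condition is NA are treated as FALSE and dropped.
kept <- subset(dv, grp > 1)
rownames(kept)
#> [1] "d3" "d4"
```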

Examples

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# selecting on a docvars condition
tokens_subset(toks, grp > 1)
#> tokens from 2 documents.
#> d3 :
#> [1] "b" "b" "c" "e"
#>
#> d4 :
#> [1] "e" "e" "f" "a" "b"
#>

# selecting on a supplied vector
tokens_subset(toks, c(TRUE, FALSE, TRUE, FALSE))
#> tokens from 2 documents.
#> d1 :
#> [1] "a" "b" "c" "d"
#>
#> d3 :
#> [1] "b" "b" "c" "e"
#>

# selecting on a tokens
toks1 <- tokens(c(d1 = "a b b c", d2 = "b b c d"))
toks2 <- tokens(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
tokens_subset(toks1, subset = toks2)
#> tokens from 3 documents.
#> d1 :
#> [1] "a" "b" "b" "c"
#>
#> d2 :
#> [1] "b" "b" "c" "d"
#>
#> d3 :
#> character(0)
#>

tokens_subset(toks1, subset = toks2[c(3,1,2)])
#> tokens from 3 documents.
#> d3 :
#> character(0)
#>
#> d1 :
#> [1] "a" "b" "b" "c"
#>
#> d2 :
#> [1] "b" "b" "c" "d"
#>