Returns the subset of documents in a dfm that meet certain conditions, including direct logical operations on docvars (document-level variables). dfm_subset works in the same way as subset.data.frame: it uses non-standard evaluation to evaluate conditions on the docvars in the dfm.
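The sketch below shows the subset.data.frame semantics that dfm_subset follows. It is a base-R illustration, not quanteda's implementation. It works on a plain data frame that stands in for the docvars. The unevaluated condition is captured and then evaluated with the docvars columns visible as variables, and a missing result counts as false. The names docvars_df and keep_docs are hypothetical.

```r
# Hypothetical stand-in for the docvars of a four-document dfm
docvars_df <- data.frame(grp = c(1, 1, NA, 3),
                         row.names = c("d1", "d2", "d3", "d4"))

# Hypothetical helper mimicking subset.data.frame()'s row selection
keep_docs <- function(df, condition) {
  # capture the condition unevaluated, then evaluate it with the
  # data frame's columns visible as variables (non-standard evaluation)
  r <- eval(substitute(condition), df, parent.frame())
  # missing values are taken as FALSE, as in subset.data.frame()
  r & !is.na(r)
}

keep <- keep_docs(docvars_df, grp > 1)
keep                          # FALSE FALSE FALSE TRUE
rownames(docvars_df)[keep]    # "d4"

# base R's subset() keeps the same row
subset(docvars_df, grp > 1)
```

Document d3 has a missing grp, so `grp > 1` returns NA for it. That NA is treated as false, and d3 is dropped.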

dfm_subset(x, subset, select, ...)

Arguments

x

dfm object to be subsetted

subset

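logical expression, evaluated using the docvars, indicating the documents to keep. A logical vector with one element per document can also be supplied, as in the examples. Missing values are taken as false.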

select

expression indicating the docvars to select from the dfm; or a dfm object. In the latter case, the returned dfm contains the same documents, in the same order, as that dfm, even if some of those documents are empty. See Details.

...

not used

Value

dfm object, with a subset of documents (and docvars) selected according to the arguments

Details

To select or subset features, see dfm_select instead.

When select is a dfm, the returned dfm matches the dfm used for selection in both the number and the order of its documents. This is the document-level counterpart of calling dfm_select with a dfm as pattern. That function matches features, whereas dfm_subset matches documents.
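The document matching can be pictured with a plain count matrix. The base-R sketch below illustrates the behaviour in that form. It is not how quanteda implements it. match_docs is a hypothetical helper. The counts are those of dfm1 from the examples (features a to d), and the document names are those of dfm2.

```r
# Hypothetical helper: return the rows (documents) of count matrix `x`
# in the order given by `docs`, using an all-zero row for any document
# that does not occur in `x`
match_docs <- function(x, docs) {
  out <- matrix(0, nrow = length(docs), ncol = ncol(x),
                dimnames = list(docs, colnames(x)))
  found <- docs %in% rownames(x)
  out[found, ] <- x[docs[found], , drop = FALSE]
  out
}

# Feature counts of dfm1 from the examples: d1 = "a b b c", d2 = "b b c d"
x1 <- matrix(c(1, 2, 1, 0,
               0, 2, 1, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("d1", "d2"), c("a", "b", "c", "d")))

# documents of dfm2 in their original order, then as in dfm2[c(3, 1, 2), ]
match_docs(x1, c("d1", "d2", "d3"))
match_docs(x1, c("d3", "d1", "d2"))
```

Document d3 does not occur in x1, so it comes back as an all-zero row. Reordering the selection documents reorders the result in the same way.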

See also

subset.data.frame

Examples

testcorp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                     d3 = "b b c e", d4 = "e e f a b"),
                   docvars = data.frame(grp = c(1, 1, 2, 3)))
testdfm <- dfm(testcorp)
# selecting on a docvars condition
dfm_subset(testdfm, grp > 1)
#> Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
#> 2 x 6 sparse Matrix of class "dfm"
#>     features
#> docs a b c d e f
#>   d3 0 2 1 0 1 0
#>   d4 1 1 0 0 2 1
# selecting on a supplied vector
dfm_subset(testdfm, c(TRUE, FALSE, TRUE, FALSE))
#> Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
#> 2 x 6 sparse Matrix of class "dfm"
#>     features
#> docs a b c d e f
#>   d1 1 1 1 1 0 0
#>   d3 0 2 1 0 1 0
# selecting on a dfm
dfm1 <- dfm(c(d1 = "a b b c", d2 = "b b c d"))
dfm2 <- dfm(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
dfm_subset(dfm1, subset = dfm2)
dfm_subset(dfm1, subset = dfm2[c(3,1,2), ])