`dfm_subset.Rd`

Returns document subsets of a dfm that meet certain conditions,
including direct logical operations on docvars (document-level variables).
`dfm_subset` functions identically to `subset.data.frame`,
using non-standard evaluation to evaluate conditions based on the
docvars in the dfm.
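Because the behaviour is described as identical to `subset.data.frame`, the non-standard evaluation can be illustrated with base R alone. The sketch below is not a quanteda call: the data frame `docvars_df` and its columns `grp` and `year` are made-up stand-ins for the docvars of a dfm.

```r
# Base R only: a small data frame standing in for a dfm's docvars.
# `docvars_df`, `grp`, and `year` are hypothetical names for illustration.
docvars_df <- data.frame(
  grp  = c(1, 1, 2, 3),
  year = c(2001, 2002, 2003, 2004),
  row.names = c("d1", "d2", "d3", "d4")
)

# `grp > 1` is evaluated inside the data frame, so `grp` refers to the
# column and does not need to exist in the calling environment.
kept <- subset(docvars_df, grp > 1)
rownames(kept)
#> [1] "d3" "d4"

# `select` is evaluated the same way and picks columns by bare name.
kept_year <- subset(docvars_df, grp > 1, select = year)
names(kept_year)
#> [1] "year"
```

The same bare-name style is what `dfm_subset(testdfm, grp > 1)` relies on in the examples on this page.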

dfm_subset(x, subset, select, ...)

Argument | Description |
---|---|
x | dfm object to be subsetted |
subset | logical expression indicating the documents to keep: missing values are taken as false |
select | expression, indicating the docvars to select from the dfm; or a dfm object, in which case the returned dfm will contain the same documents as the original dfm, even if these are empty. See Details. |
... | not used |
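The rule that missing values in `subset` are taken as false matches what base R's `subset.data.frame` does, so it can be shown without quanteda. This is a sketch on a hypothetical data frame (`docvars_na` is an invented name), contrasted with plain logical indexing, which treats `NA` differently.

```r
# Hypothetical docvars with a missing group for document d2.
docvars_na <- data.frame(
  grp = c(1, NA, 2, 3),
  row.names = c("d1", "d2", "d3", "d4")
)

# subset() treats the NA produced by `NA > 1` as FALSE, so d2 is dropped.
kept_na <- subset(docvars_na, grp > 1)
rownames(kept_na)
#> [1] "d3" "d4"

# Plain logical indexing instead returns an extra all-NA row for d2.
indexed <- docvars_na[docvars_na$grp > 1, , drop = FALSE]
nrow(indexed)
#> [1] 3
```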

dfm object, with a subset of documents (and docvars) selected according to arguments

To select or subset *features*, see `dfm_select` instead.

When `select` is a dfm, then the returned dfm will be equal in
document dimension and order to the dfm used for selection. This is the
document-level version of using `dfm_select` where `pattern` is a dfm:
that function matches features, while `dfm_subset` will match documents.
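One way to picture document matching is as re-indexing the rows of a count matrix against a reference set of document names. The sketch below uses a plain base R matrix and a hypothetical helper, `match_docs()`; it is not quanteda's implementation, and the zero-filled rows for unmatched documents are an assumption based on the statement that the result contains the reference documents "even if these are empty".

```r
# Hypothetical helper, not part of quanteda: conform the rows of a count
# matrix `x` to the document names in `ref_docs`, in that order.
match_docs <- function(x, ref_docs) {
  # Start from an all-zero matrix with the reference documents as rows.
  out <- matrix(0, nrow = length(ref_docs), ncol = ncol(x),
                dimnames = list(docs = ref_docs, features = colnames(x)))
  # Copy over the rows for documents present in both.
  shared <- intersect(ref_docs, rownames(x))
  out[shared, ] <- x[shared, , drop = FALSE]
  out
}

# Counts for d1 = "a b b c" and d2 = "b b c d", filled column by column.
x <- matrix(c(1, 0, 2, 2, 1, 1, 0, 1), nrow = 2,
            dimnames = list(docs = c("d1", "d2"),
                            features = c("a", "b", "c", "d")))

# Reference documents in a different order, including one absent from `x`.
res <- match_docs(x, c("d3", "d1", "d2"))
rownames(res)
#> [1] "d3" "d1" "d2"
```

Here `d3` comes back as an empty (all-zero) row, and the row order follows the reference names, not the order in `x`.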

`subset.data.frame`

    testcorp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                         d3 = "b b c e", d4 = "e e f a b"),
                       docvars = data.frame(grp = c(1, 1, 2, 3)))
    testdfm <- dfm(testcorp)

    # selecting on a docvars condition
    dfm_subset(testdfm, grp > 1)
    #> Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
    #> 2 x 6 sparse Matrix of class "dfm"
    #>     features
    #> docs a b c d e f
    #>   d3 0 2 1 0 1 0
    #>   d4 1 1 0 0 2 1

    # selecting on a supplied vector
    dfm_subset(testdfm, c(TRUE, FALSE, TRUE, FALSE))
    #> Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
    #> 2 x 6 sparse Matrix of class "dfm"
    #>     features
    #> docs a b c d e f
    #>   d1 1 1 1 1 0 0
    #>   d3 0 2 1 0 1 0

    # selecting on a dfm
    dfm1 <- dfm(c(d1 = "a b b c", d2 = "b b c d"))
    dfm2 <- dfm(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
    dfm_subset(dfm1, subset = dfm2)
    #> Error in get(".SigLength", envir = env): object '.SigLength' not found
    dfm_subset(dfm1, subset = dfm2[c(3,1,2), ])
    #> Error in get(".SigLength", envir = env): object '.SigLength' not found