Combine documents in a dfm by a grouping variable, which can also be one of the document variables (docvars) attached to the dfm. This is functionally identical to using the "groups" argument of dfm().

dfm_group(x, groups = NULL, fill = FALSE)

Arguments

x

a dfm

groups

either: a character vector containing the names of document variables to be used for grouping; or a factor (or an object that can be coerced to a factor) whose length, or number of rows, equals the number of documents. See groups for details.

fill

logical; if TRUE and groups is a factor, use all levels of the factor when forming the new "documents" of the grouped dfm. Levels with no matching documents become documents with zero counts for every feature. Has no effect if the groups variable(s) are not factors.

Value

dfm_group returns a dfm with one document per unique group combination, whose cell values are the feature counts of the original documents summed within each group. This currently erases any docvars in the dfm.
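The summing described above can be illustrated in base R on an ordinary documents-by-features count matrix. This is only a sketch of the semantics, assuming a dfm behaves like such a matrix; it uses base R's rowsum() rather than quanteda's implementation, and it reproduces the grouped counts shown in the Examples.

```r
# A plain count matrix standing in for a dfm (assumption: 4 documents x 4
# features, the same toy texts as in the Examples below).
counts <- matrix(c(2, 1, 0, 0,   # "a a b"
                   1, 1, 2, 0,   # "a b c c"
                   1, 0, 1, 2,   # "a c d d"
                   1, 0, 2, 1),  # "a c c d"
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("text", 1:4), c("a", "b", "c", "d")))
grp <- c("grp1", "grp1", "grp2", "grp2")

# rowsum() adds up the rows that share a group value; the result has one
# row per unique group, ordered by the sorted group values.
grouped <- rowsum(counts, grp)
print(grouped)
```

Each row of `grouped` corresponds to one group, and each cell is the column total over that group's documents.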

Setting fill = TRUE offers a way to "pad" a dfm with document groups that were not observed but for which an empty document is still needed. If groups is a factor of dates, for instance, then fill = TRUE ensures that the grouped dfm has exactly one row per date (that is, per factor level), regardless of whether any documents previously existed with that date.
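The padding behaviour can be sketched in base R on a plain count matrix. group_sum() below is a hypothetical helper, not a quanteda function: it sums rows by a factor with rowsum() and, when fill = TRUE, adds an all-zero row for every factor level that has no documents. The date labels are made up for illustration.

```r
# Hypothetical helper (not part of quanteda): sum the rows of a count matrix
# by a factor, optionally keeping every level of that factor.
group_sum <- function(counts, groups, fill = FALSE) {
  groups <- as.factor(groups)
  out <- rowsum(counts, groups)            # only observed levels appear here
  if (fill) {
    padded <- matrix(0, nrow = nlevels(groups), ncol = ncol(counts),
                     dimnames = list(levels(groups), colnames(counts)))
    padded[rownames(out), ] <- out         # unobserved levels stay all-zero
    out <- padded
  }
  out
}

counts <- matrix(c(2, 1, 0, 0,
                   1, 1, 2, 0,
                   1, 0, 1, 2,
                   1, 0, 2, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("text", 1:4), c("a", "b", "c", "d")))

# A factor of dates in which "2020-01-02" is a level but has no documents.
dates <- factor(c("2020-01-01", "2020-01-01", "2020-01-03", "2020-01-03"),
                levels = c("2020-01-01", "2020-01-02", "2020-01-03"))

group_sum(counts, dates)               # 2 rows: observed dates only
group_sum(counts, dates, fill = TRUE)  # 3 rows: one per date level
```

With fill = TRUE the result has one row per factor level, so "2020-01-02" appears as a document whose counts are all zero.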

Examples

mycorpus <- corpus(c("a a b", "a b c c", "a c d d", "a c c d"),
                   docvars = data.frame(grp = c("grp1", "grp1", "grp2", "grp2")))
mydfm <- dfm(mycorpus)
dfm_group(mydfm, groups = "grp")
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>       features
#> docs   a b c d
#>   grp1 3 2 2 0
#>   grp2 2 0 3 3
dfm_group(mydfm, groups = c(1, 1, 2, 2))
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>     features
#> docs a b c d
#>    1 3 2 2 0
#>    2 2 0 3 3

# equivalent
dfm(mydfm, groups = "grp")
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>       features
#> docs   a b c d
#>   grp1 3 2 2 0
#>   grp2 2 0 3 3
dfm(mydfm, groups = c(1, 1, 2, 2))
#> Document-feature matrix of: 2 documents, 4 features (25% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>     features
#> docs a b c d
#>    1 3 2 2 0
#>    2 2 0 3 3