List the most (or least) frequently occurring features in a dfm, either for the dfm as a whole or separately for each document or group of documents.

topfeatures(x, n = 10, decreasing = TRUE, scheme = c("count", "docfreq"),
  groups = NULL)

Arguments

x

the dfm object whose features will be returned

n

how many top features should be returned

decreasing

If TRUE, return the n most frequent features; otherwise return the n least frequent features

scheme

one of "count" for the total frequency of each feature (within each group, if groups is supplied), or "docfreq" for the document frequency of each feature, i.e. the number of documents in which it occurs

groups

either a character vector containing the names of document variables to be used for grouping, or a factor (or an object that can be coerced into a factor) whose length equals the number of documents. See groups for details.
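To make the effect of n, decreasing and scheme concrete, here is a minimal base-R sketch. It is not quanteda's implementation. It assumes a plain numeric matrix `m` with documents as rows and features as columns, standing in for a dfm, and a hypothetical helper `top_features_sketch()`. The "count" scheme sums each column, and the "docfreq" scheme counts the non-zero cells in each column. The scores are then sorted and the first n are kept.

```r
# Toy document-feature matrix: rows are documents, columns are features.
# Hypothetical data standing in for a quanteda dfm.
m <- matrix(c(3, 0, 1,
              2, 0, 1,
              0, 6, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("us", "nation", "freedom")))

# Hypothetical helper (not part of quanteda): score each feature,
# sort the scores, and keep the first n
top_features_sketch <- function(m, n = 10, decreasing = TRUE,
                                scheme = c("count", "docfreq")) {
  scheme <- match.arg(scheme)
  scores <- if (scheme == "count") {
    colSums(m)      # total occurrences of each feature
  } else {
    colSums(m > 0)  # number of documents containing each feature
  }
  head(sort(scores, decreasing = decreasing), n)
}

top_features_sketch(m, n = 2)                      # nation = 6, us = 5
top_features_sketch(m, n = 2, scheme = "docfreq")  # freedom = 3, us = 2
top_features_sketch(m, n = 2, decreasing = FALSE)  # freedom = 3, us = 5
```

The two schemes can rank the same features differently. "nation" is the most frequent feature by total count, but it appears in only one document, so it drops to the bottom under "docfreq".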

Value

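A named numeric vector of feature counts (or document frequencies, if scheme = "docfreq"), where the names are the feature labels. If groups is given, the result is instead a named list of such vectors, with one vector per group.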
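The grouped return value can be pictured with another base-R sketch. It is an assumption-laden stand-in, not quanteda's grouping code. It uses base R's `rowsum()` to add up the rows (documents) that share a hypothetical group label. It then sorts each group's totals and keeps the top n, which produces a named list with one named vector per group.

```r
# Same toy document-feature matrix as a stand-in for a dfm (hypothetical data)
m <- matrix(c(3, 0, 1,
              2, 0, 1,
              0, 6, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("us", "nation", "freedom")))

# Hypothetical grouping: one label per document (row of m)
groups <- c("Reagan", "Reagan", "Bush")

# Sum feature counts within each group; rowsum() sorts the groups alphabetically
grouped <- rowsum(m, group = groups)

# For each group, sort its feature totals and keep the top n (here n = 2);
# sapply() over the group names with simplify = FALSE returns a named list
top_by_group <- sapply(rownames(grouped), function(g) {
  head(sort(grouped[g, ], decreasing = TRUE), 2)
}, simplify = FALSE)

top_by_group
# $Bush:   nation = 6, freedom = 1
# $Reagan: us = 5, freedom = 2
```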

Examples

mydfm <- dfm(corpus_subset(data_corpus_inaugural, Year > 1980), remove_punct = TRUE)
mydfm_nostopw <- dfm_remove(mydfm, stopwords("english"))

# most frequent features
topfeatures(mydfm)
#>  the  and   of   to  our   we    a   in   is that
#> 1100  927  761  584  565  539  426  337  300  293
topfeatures(mydfm_nostopw)
#>      us    must america     new  people   world  nation    time     can freedom
#>     165      99      98      93      90      87      79      73      72      71

# least frequent features
topfeatures(mydfm_nostopw, decreasing = FALSE)
#>     hatfield      mondale        baker       moomaw    momentous   occurrence
#>            1            1            1            1            1            1
#>    routinely       unique       really every-4-year
#>            1            1            1            1

# top features of individual documents
topfeatures(mydfm_nostopw, n = 5, groups = docnames(mydfm_nostopw))
#> $`1981-Reagan`
#>         us government       must    believe     people
#>         25         16         10         10          9
#>
#> $`1985-Reagan`
#>      us  people   world     one freedom
#>      27      16      15      14      13
#>
#> $`1989-Bush`
#>    new     us    can nation  world
#>     14     13     11     10     10
#>
#> $`1993-Clinton`
#>   world    must america      us  people
#>      18      18      15      13      12
#>
#> $`1997-Clinton`
#>     new      us century  nation    time
#>      29      27      20      13      12
#>
#> $`2001-Bush`
#>       us citizens  country    story   nation
#>       11        9        9        9        8
#>
#> $`2005-Bush`
#> freedom liberty america   every  nation
#>      25      15      12      10       9
#>
#> $`2009-Obama`
#>     us    can nation    new   must
#>     23     13     12     11      8
#>
#> $`2013-Obama`
#>     us   must people   time    can
#>     21     17     11     10      7
#>
#> $`2017-Trump`
#>  america american   people  country      one
#>       18       11       10        9        8
#>

# grouping by president last name
topfeatures(mydfm_nostopw, n = 5, groups = "President")
#> $Reagan
#>         us government     people      world        one
#>         52         29         25         23         22
#>
#> $Bush
#> freedom      us  nation america     can
#>      36      27      27      27      24
#>
#> $Clinton
#>      us     new   world    must america
#>      40      38      28      28      26
#>
#> $Obama
#>     us   must    can nation people
#>     44     25     20     18     18
#>
#> $Trump
#>  america american   people  country      one
#>       18       11       10        9        8
#>

# features by document frequencies
tail(topfeatures(mydfm, scheme = "docfreq", n = 200))
#>   congress       said throughout       came      heart       find
#>          7          7          7          7          7          7