Apply a stemmer to words. This is a wrapper around wordStem from the SnowballC package, designed so that the stemmer can be called without loading the entire SnowballC package. wordStem uses Martin Porter's stemming algorithm and the C libstemmer library generated by Snowball.

tokens_wordstem(x, language = quanteda_options("language_stemmer"))

char_wordstem(x, language = quanteda_options("language_stemmer"))

dfm_wordstem(x, language = quanteda_options("language_stemmer"))

Arguments

x

a character, tokens, or dfm object whose words are to be stemmed. If the input is tokenized text, the tokenization must be word-based.

language

the name of a recognized language, as returned by getStemLanguages, or a two- or three-letter ISO-639 code corresponding to one of these languages (see references for the list of codes)

Value

tokens_wordstem returns a tokens object whose word types have been stemmed. char_wordstem returns a character object whose word types have been stemmed. dfm_wordstem returns a dfm object whose word types (features) have been stemmed, and recombined to consolidate features made equivalent because of stemming.

References

http://snowball.tartarus.org/

http://www.iso.org/iso/home/standards/language_codes.htm for the ISO-639 language codes

See also

wordStem

Examples

# example applied to tokens
txt <- c(one = "eating eater eaters eats ate",
         two = "taxing taxes taxed my tax return")
th <- tokens(txt)
tokens_wordstem(th)
#> tokens from 2 documents.
#> one :
#> [1] "eat"   "eater" "eater" "eat"   "ate"
#>
#> two :
#> [1] "tax"    "tax"    "tax"    "my"     "tax"    "return"
#>
# simple example
char_wordstem(c("win", "winning", "wins", "won", "winner"))
#> [1] "win"    "win"    "win"    "won"    "winner"
# example applied to a dfm
(origdfm <- dfm(txt))
#> Document-feature matrix of: 2 documents, 11 features (50% sparse).
#> 2 x 11 sparse Matrix of class "dfm"
#>     features
#> docs eating eater eaters eats ate taxing taxes taxed my tax return
#>  one      1     1      1    1   1      0     0     0  0   0      0
#>  two      0     0      0    0   0      1     1     1  1   1      1
dfm_wordstem(origdfm)
#> Document-feature matrix of: 2 documents, 6 features (50% sparse).
#> 2 x 6 sparse Matrix of class "dfm"
#>     features
#> docs eat eater ate tax my return
#>  one   2     2   1   0  0      0
#>  two   0     0   0   4  1      1
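
The examples above all rely on the default stemmer language. As a rough sketch of how the language argument might be used, the following stems a few German words, once with the language name and once with its two-letter ISO-639 code. The sample words are hypothetical and chosen only for illustration, and the sketch assumes the SnowballC package is installed so that getStemLanguages can be called directly. No output is shown because the exact stems depend on the installed Snowball version.

```r
library(quanteda)

# hypothetical sample words, used only for illustration
words <- c("laufen", "gelaufen", "spielen")

# languages recognized by the stemmer (assumes SnowballC is installed)
langs <- SnowballC::getStemLanguages()
"german" %in% langs

# stem using the full language name ...
stem_by_name <- char_wordstem(words, language = "german")

# ... or using the ISO-639 code, which should select the same stemmer
stem_by_code <- char_wordstem(words, language = "de")

# the two calls are expected to agree
identical(stem_by_name, stem_by_code)
```

The same language argument applies to tokens_wordstem and dfm_wordstem. To change the default for a whole session rather than per call, the language_stemmer entry of quanteda_options is the setting these functions read.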