Returns the number of syllables in texts. For English words found in the default syllable dictionary, data_int_syllables (taken from the CMU pronunciation dictionary), the syllable count is looked up exactly. For any word not in the dictionary, the syllable count is estimated by counting vowel clusters.
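The vowel-cluster estimate can be illustrated with a few lines of base R. This is only a sketch of the general idea, not the package's implementation: the helper name `estimate_syllables` is hypothetical, the vowel set (a, e, i, o, u, y) is an assumption, and mapping vowel-free tokens to NA is chosen to mirror the NA shown for punctuation in the examples on this page.

```r
# Hypothetical helper: estimate syllables by counting runs of vowels.
# Assumes the vowel set a, e, i, o, u, y; the package's own rule may differ.
estimate_syllables <- function(words) {
  words <- tolower(words)
  # gregexpr() locates each maximal run of vowels; regmatches() extracts them
  clusters <- regmatches(words, gregexpr("[aeiouy]+", words))
  n <- lengths(clusters)
  # Tokens with no vowels at all (such as ".") get NA rather than 0
  n[n == 0L] <- NA_integer_
  n
}

estimate_syllables(c("cat", "Brexit", "."))
```

A rule this simple will miscount words with silent vowels, which is why a dictionary lookup is preferred whenever a word is available there.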

data_int_syllables is a quanteda-supplied data object: a named numeric vector of syllable counts whose names are the words counted. It is the default object used to count English syllables. The object can be accessed directly, but we strongly encourage you to access it only through the nsyllable() wrapper function.

nsyllable(x, syllable_dictionary = quanteda::data_int_syllables,
  use.names = FALSE)

Arguments

x

character vector or tokens object whose syllables will be counted

syllable_dictionary

optional named integer vector of syllable counts where the names are lower case tokens. Defaults to the quanteda data object data_int_syllables, an English pronunciation dictionary from CMU.

use.names

logical; if TRUE, assign the tokens as the names of the syllable count vector
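How the three arguments interact can be sketched in base R. The function below is a hypothetical stand-in written for illustration, not the code behind nsyllable(): the name `count_syllables_sketch`, the toy dictionary, the vowel set used for the fallback, and the choice to use the lowercased tokens as names are all assumptions.

```r
# Hypothetical sketch, not the package's implementation.
count_syllables_sketch <- function(x, syllable_dictionary, use.names = FALSE) {
  keys <- tolower(x)                            # dictionary names are lower case
  counts <- unname(syllable_dictionary[keys])   # NA where a word is missing
  missing <- is.na(counts)
  # Fallback for missing words: count vowel runs (assumed vowels: a e i o u y)
  est <- lengths(regmatches(keys[missing], gregexpr("[aeiouy]+", keys[missing])))
  est[est == 0L] <- NA_integer_                 # e.g. punctuation has no vowels
  counts[missing] <- est
  # Assumption: names are the lowercased tokens
  if (use.names) names(counts) <- keys
  counts
}

# Toy dictionary with made-up coverage; a real one is far larger
toy_dict <- c(cat = 1L, syllable = 3L)
count_syllables_sketch(c("Cat", "Brexit", "."), toy_dict, use.names = TRUE)
```

In this call "Cat" is resolved from the toy dictionary after lowercasing, "Brexit" falls back to the vowel-run estimate, and "." ends up as NA because it matches neither route.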

Value

If x is a character vector, a numeric vector of the syllable counts for each element, named by the tokens when use.names = TRUE. If x is a tokens object, a list of syllable counts where each list element corresponds to the tokens in a document.

Note

All tokens are automatically converted to lowercase to perform the matching with the syllable dictionary, so there is no need to perform this step prior to calling nsyllable().

Examples

# character
nsyllable(c("cat", "syllable", "supercalifragilisticexpialidocious",
            "Brexit", "Administration"), use.names = TRUE)
#>                                cat                           syllable
#>                                  1                                  3
#> supercalifragilisticexpialidocious                             Brexit
#>                                 13                                  2
#>                     administration
#>                                  5
# tokens
txt <- c(doc1 = "This is an example sentence.",
         doc2 = "Another of two sample sentences.")
nsyllable(tokens(txt, remove_punct = TRUE))
#> $doc1
#> [1] 1 1 1 3 2
#>
#> $doc2
#> [1] 3 1 1 2 3
#>
# punctuation is not counted
nsyllable(tokens(txt), use.names = TRUE)
#> $doc1
#>     this       is       an  example sentence        .
#>        1        1        1        3        2       NA
#>
#> $doc2
#>   another        of       two    sample sentences         .
#>         3         1         1         2         3        NA
#>