Get or replace the texts in a corpus, with grouping options. Works for plain character vectors too, if groups is a factor.

texts(x, groups = NULL, spacer = "  ")

texts(x) <- value

# S3 method for corpus
as.character(x, ...)

## Arguments

| Argument | Description |
| --- | --- |
| x | a corpus or character object |
| groups | either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. See groups for details. |
| spacer | when concatenating texts by using groups, the spacing added between texts. (Default is two spaces.) |
| value | character vector of the new texts |
| ... | unused |

## Value

For texts, a character vector of the texts in the corpus.

For texts <-, the corpus with its texts replaced by value.

For as.character(x), the same character vector as texts(x).

## Details

as.character(x), where x is a corpus, is equivalent to calling texts(x).

## Note

When groups is supplied, texts that share a value of groups are concatenated into a single text, separated by spacer. The order in which texts are aggregated is not specified.
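The grouping behaviour described in this note can be approximated in base R. The sketch below is not quanteda's implementation: group_texts() is a hypothetical helper introduced only for illustration, and it assumes groups is a factor (or coercible to one) with one value per text.

```r
# Base-R sketch of grouped concatenation; NOT quanteda's implementation.
# group_texts() is a hypothetical helper, named here only for illustration.
group_texts <- function(x, groups, spacer = "  ") {
  groups <- as.factor(groups)
  stopifnot(length(groups) == length(x))
  # split() partitions the texts by factor level;
  # paste(collapse = spacer) joins the texts within each group
  vapply(split(x, groups), paste, character(1), collapse = spacer)
}

txts <- c(d1 = "First text.", d2 = "Second text.", d3 = "Third text.")
grouped <- group_texts(txts, groups = c("A", "B", "A"))
print(grouped)
```

This sketch happens to keep the input order within each group; the note above makes no such guarantee for texts(), so code should not rely on it.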

As a matter of good text analysis practice, you are strongly encouraged not to modify the substance of the texts in a corpus. That sort of processing is better performed in downstream operations. For instance, do not lowercase the texts in a corpus, or you will never be able to recover the original case. Instead, apply tokens_tolower after applying tokens to a corpus, or use the option tolower = TRUE in dfm.
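The same principle can be shown without quanteda. The following base-R sketch is only a stand-in for the tokens / tokens_tolower workflow: the object names are hypothetical, and whitespace splitting is assumed here as a crude substitute for a real tokenizer.

```r
# Base-R stand-in for the tokens() / tokens_tolower() workflow.
# Object names are hypothetical; no quanteda functions are called.
original <- c(doc1 = "The Senate met in Washington.")

# Crude whitespace tokenization, standing in for a real tokenizer
toks <- strsplit(original, "[[:space:]]+")

# Lowercase the derived tokens only; the source text is never overwritten
toks_lower <- lapply(toks, tolower)

print(toks_lower[[1]])
print(original)
```

Because only the derived object is lowercased, the original capitalization remains available for any later step that needs it.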

## Examples

nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))
#> 1789-Washington 1793-Washington      1797-Adams  1801-Jefferson  1805-Jefferson
#>            8618             790           13876           10136           12907
# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))
#>      Adams  Jefferson Washington
#>      13876      23045       9410
# grouping a character vector using a factor
nchar(data_char_ukimmig2010[1:5])
#>          BNP    Coalition Conservative       Greens       Labour
#>        18567         1471         2692         3841         3854
nchar(texts(data_corpus_inaugural[1:5],
            groups = as.factor(data_corpus_inaugural[1:5, "President"])))
#>      Adams  Jefferson Washington
#>      13876      23045       9410
# replacing the texts of a corpus (British to American spellings)
BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.",
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <-
    stringi::stri_replace_all_regex(texts(BritCorpus),
                                    c("ise", "([nlb])our", "nium"),
                                    c("ize", "$1or", "num"),
                                    vectorize_all = FALSE)
texts(BritCorpus)
#>                                           text1
#> "We must prioritize honor in our neighborhood."
#>                                           text2
#>                 "Aluminum is a valorous metal."
texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
#>                                           text1
#> "We must prioritize honor in our neighborhood."
#>                                           text2
#>                            "New text number 2."