Convert a quanteda dfm object to a format usable by other text analysis packages. The general function convert provides easy conversion from a dfm to the document-term representation used by each text analysis package for which a conversion is defined.

convert(x, to = c("lda", "tm", "stm", "austin", "topicmodels", "lsa",
  "matrix", "data.frame", "tripletlist"), docvars = NULL)



x: a dfm to be converted

to: target conversion format, given as the name of the package into whose document-term matrix representation the dfm will be converted:

"lda": a list with components "documents" and "vocab" as needed by the function lda.collapsed.gibbs.sampler from the lda package

"tm": a DocumentTermMatrix from the tm package

"stm": the format for the stm package

"austin": the wfm format from the austin package

"topicmodels": the "dtm" format as used by the topicmodels package

"lsa": the "textmatrix" format as used by the lsa package

"data.frame": a data.frame where each feature is a variable

"tripletlist": a named "triplet" format list consisting of document, feature, and frequency

docvars: optional data.frame of document variables used as the meta information in conversion to the stm package format. Supplying it makes it possible to select only the document variables that correspond to documents with non-zero counts.
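The alignment that the docvars argument is meant to support can be sketched in base R. This is a minimal illustration under stated assumptions, not quanteda's implementation: the objects counts and meta and their contents are made up for the example, and only base R functions are used.

```r
# Toy document-feature counts (hypothetical data). The first document has
# no features left, for example because it contained only punctuation.
counts <- matrix(
  c(0, 0, 0,
    2, 1, 0,
    0, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("punctOnly", "doc2", "doc3"), c("alpha", "beta", "gamma"))
)

# Hypothetical document variables, one row per document, in the same order.
meta <- data.frame(year = c(1973, 1977, 1981), row.names = rownames(counts))

# Keep only the documents with at least one non-zero count, and subset
# the document variables with the same logical index so rows stay aligned.
nonempty <- rowSums(counts) > 0
counts_kept <- counts[nonempty, , drop = FALSE]
meta_kept <- meta[nonempty, , drop = FALSE]

rownames(meta_kept)
#> [1] "doc2" "doc3"
```

Using one logical index for both objects is what keeps the metadata rows matched to the documents that survive.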


A converted object determined by the value of to (see above). See the documentation of the conversion target package for more detailed descriptions of the return formats.
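To make the shape of the "tripletlist" return value more concrete, here is a small base R sketch that builds a list with the same three components from an ordinary count matrix. It is an illustration only and assumes nothing about how convert works internally; the matrix counts is made-up data.

```r
# Toy document-feature counts (hypothetical data).
counts <- matrix(
  c(2, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("alpha", "beta", "gamma"))
)

# Positions of the non-zero cells as (row, col) index pairs.
nz <- which(counts > 0, arr.ind = TRUE)

# One entry per non-zero cell: which document, which feature, how often.
triplet <- list(
  document  = rownames(counts)[nz[, "row"]],
  feature   = colnames(counts)[nz[, "col"]],
  frequency = counts[nz]
)
str(triplet)
```

Each of the three vectors has one element per non-zero cell, which is why the str() output in the examples reports the same length for document, feature, and frequency.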


mycorpus <- corpus_subset(data_corpus_inaugural, Year > 1970)
quantdfm <- dfm(mycorpus, verbose = FALSE)

# austin's wfm format
identical(dim(quantdfm), dim(convert(quantdfm, to = "austin")))
#> [1] TRUE

# stm package format
stmdfm <- convert(quantdfm, to = "stm")
#> Error in get(".SigLength", envir = env): object '.SigLength' not found
str(stmdfm)
#> Error in str(stmdfm): object 'stmdfm' not found

# triplet
triplet <- convert(quantdfm, to = "tripletlist")
str(triplet)
#> List of 3
#>  $ document : chr [1:8389] "1973-Nixon" "1981-Reagan" "1989-Bush" "2005-Bush" ...
#>  $ feature  : chr [1:8389] "mr" "mr" "mr" "mr" ...
#>  $ frequency: num [1:8389] 3 3 6 1 1 69 52 130 124 142 ...

# illustrate what happens with zero-length documents
quantdfm2 <- dfm(c(punctOnly = "!!!", mycorpus[-1]), verbose = FALSE)
rowSums(quantdfm2)
#> Error in rowSums(quantdfm2): 'x' must be an array of at least two dimensions
stmdfm2 <- convert(quantdfm2, to = "stm", docvars = docvars(mycorpus))
#> Error in get(".SigLength", envir = env): object '.SigLength' not found
str(stmdfm2)
#> Error in str(stmdfm2): object 'stmdfm2' not found

# NOT RUN {
# tm's DocumentTermMatrix format
tmdfm <- convert(quantdfm, to = "tm")
str(tmdfm)

# topicmodels package format
str(convert(quantdfm, to = "topicmodels"))

# lda package format
ldadfm <- convert(quantdfm, to = "lda")
str(ldadfm)
# }
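
Because the lda example above is not run, the following base R sketch shows roughly what a list with "documents" and "vocab" components looks like. It assumes the layout described in the lda package documentation for lda.collapsed.gibbs.sampler: each document is a two-row integer matrix holding 0-based vocabulary indices in the first row and counts in the second. The helper to_lda_like and the matrix counts are hypothetical names introduced for illustration; this is not quanteda's conversion code.

```r
# Toy document-feature counts (hypothetical data).
counts <- matrix(
  c(2, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("alpha", "beta", "gamma"))
)

# Hypothetical helper: build an lda-style list from a count matrix.
to_lda_like <- function(m) {
  documents <- lapply(seq_len(nrow(m)), function(i) {
    present <- which(m[i, ] > 0)
    # Row 1: 0-based index into vocab; row 2: count of that word.
    rbind(as.integer(present - 1L), as.integer(m[i, present]))
  })
  names(documents) <- rownames(m)
  list(documents = documents, vocab = colnames(m))
}

ldalike <- to_lda_like(counts)
ldalike$vocab
ldalike$documents$doc1
```

In this sketch, doc1 contains "alpha" twice and "gamma" once, so its matrix has the columns (0, 2) and (2, 1).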