Package-level

quanteda-package

An R package for the quantitative analysis of textual data

quanteda_options()

Get or set package options for quanteda

Data

Built-in data objects.

data_char_sampletext

A paragraph of text for testing various text-based functions

data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos

data_corpus_dailnoconf1991

Confidence debate from 1991 Irish Parliament

data_corpus_inaugural

US presidential inaugural address texts

data_corpus_irishbudget2010

Irish budget speeches from 2010

data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)

Corpus functions

Functions for constructing and manipulating corpus class objects.

corpus()

Construct a corpus object

corpus_reshape()

Recast the document units of a corpus

corpus_sample()

Randomly sample documents from a corpus

corpus_segment() char_segment()

Segment texts on a pattern match

corpus_subset()

Extract a subset of a corpus

docvars() `docvars<-`()

Get or set document-level variables

head(<corpus>) tail(<corpus>)

Return the first or last part of a corpus

metacorpus() `metacorpus<-`()

Get or set corpus metadata

metadoc() `metadoc<-`()

Get or set document-level metadata

texts() `texts<-`() as.character(<corpus>)

Get or assign corpus texts

as.corpus(<corpuszip>)

Coerce a compressed corpus to a standard corpus
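
A minimal sketch of how the corpus functions listed above fit together, assuming quanteda is installed and attached. The sample texts and the `party` variable are invented for illustration and do not come from the package data.

```r
library(quanteda)

# Invented sample texts; the vector names become the document names.
txts <- c(d1 = "First text.", d2 = "A second, longer text.", d3 = "Third text.")
corp <- corpus(txts)

# Attach a document-level variable (hypothetical "party" label).
docvars(corp, "party") <- c("A", "B", "A")

# Keep only the documents whose party is "A".
corp_a <- corpus_subset(corp, party == "A")

ndoc(corp)        # number of documents in the full corpus
docnames(corp_a)  # names of the documents that survived the subset
```

The subset condition is evaluated against the document-level variables, so any variable set through `docvars<-` can be used there.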

Tokens functions

Functions for constructing and manipulating tokens class objects.

tokens()

Tokenize a set of texts

tokens_compound()

Convert token sequences into compound tokens

tokens_lookup()

Apply a dictionary to a tokens object

tokens_ngrams() char_ngrams() tokens_skipgrams()

Create ngrams and skipgrams from tokens

tokens_select() tokens_remove() tokens_keep()

Select or remove tokens from a tokens object

tokens_replace()

Replace types in a tokens object

tokens_subset()

Extract a subset of a tokens object

tokens_tolower() tokens_toupper()

Convert the case of tokens

tokens_tortl() char_tortl()

[Experimental] Change direction of words in tokens

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

types()

Get word types from a tokens object

as.tokens() as.list(<tokens>) unlist(<tokens>) as.character(<tokens>) is.tokens() `+`(<tokens>) c(<tokens>)

Coercion, checking, and combining functions for tokens objects
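
As a rough illustration of chaining the tokens functions above, the sketch below tokenizes one invented sentence, lowercases it, drops two hand-picked words, and forms bigrams. It assumes quanteda is attached; the word list passed to `tokens_remove()` is an arbitrary choice for the example, not a recommended stopword list.

```r
library(quanteda)

# Tokenize a single invented sentence, dropping punctuation.
toks <- tokens(c(d1 = "The quick brown fox jumps over the lazy dog."),
               remove_punct = TRUE)

# Normalize case, then remove two hand-picked words.
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, c("the", "over"))

# Form bigrams from the remaining adjacent tokens.
bigrams <- tokens_ngrams(toks, n = 2)

ntoken(toks)           # tokens left after removal
as.character(bigrams)  # bigrams as a plain character vector
```

Note that the bigrams are built from tokens that are adjacent after removal, so pairs can span a position where a word was dropped.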

Character functions

Functions for constructing and manipulating character objects.

char_tolower() char_toupper()

Convert the case of character objects

corpus_segment() char_segment()

Segment texts on a pattern match

tokens_ngrams() char_ngrams() tokens_skipgrams()

Create ngrams and skipgrams from tokens

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

Text matrix functions

Functions for constructing and manipulating a document-feature matrix (dfm) or feature co-occurrence matrix object.

dfm()

Create a document-feature matrix

dfm_compress() fcm_compress()

Recombine a dfm or fcm by combining identical dimension elements

dfm_group()

Combine documents in a dfm by a grouping variable

dfm_lookup()

Apply a dictionary to a dfm

dfm_sample()

Randomly sample documents or features from a dfm

dfm_select() dfm_remove() dfm_keep() fcm_select() fcm_remove() fcm_keep()

Select features from a dfm or fcm

dfm_replace()

Replace features in a dfm

dfm_subset()

Extract a subset of a dfm

dfm_sort()

Sort a dfm by frequency of one or more margins

dfm_tfidf()

Weight a dfm by tf-idf

dfm_tolower() dfm_toupper() fcm_tolower() fcm_toupper()

Convert the case of the features of a dfm or fcm and recombine the counts

dfm_trim()

Trim a dfm using frequency threshold-based feature selection

dfm_weight() dfm_smooth()

Weight the feature frequencies in a dfm

tokens_wordstem() char_wordstem() dfm_wordstem()

Stem the terms in an object

docfreq()

Compute the (weighted) document frequency of a feature

head(<dfm>) tail(<dfm>)

Return the first or last part of a dfm

as.dfm() is.dfm()

Coercion and checking functions for dfm objects

as.data.frame(<dfm>)

Convert a dfm to a data.frame

as.matrix(<dfm>)

Coerce a dfm to a matrix or data.frame

fcm()

Create a feature co-occurrence matrix

fcm_sort()

Sort an fcm in alphabetical order of the features
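
The following sketch shows one plausible way to build and inspect a document-feature matrix with the functions in this section. It assumes quanteda is attached and uses three invented toy documents; the default settings of `dfm_tfidf()` are used as-is.

```r
library(quanteda)

# Three invented toy documents.
txts <- c(d1 = "apple banana apple",
          d2 = "banana cherry",
          d3 = "apple cherry cherry cherry")

# Build the dfm from tokens.
d <- dfm(tokens(txts))

ndoc(d)   # number of documents
nfeat(d)  # number of distinct features

# The two most frequent features across all documents.
top2 <- topfeatures(d, 2)

# Re-weight the counts by tf-idf using the function's defaults.
d_tfidf <- dfm_tfidf(d)
```

Because the weighted object is still a dfm, it can be passed to the other `dfm_*` functions listed here in the same way as the raw counts.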

Text statistics

Functions for computing statistics from texts and dfm objects.

textstat_collocations() is.collocations()

Identify and score multi-word expressions

textstat_dist() textstat_simil()

Similarity and distance computation between documents or features

textstat_lexdiv()

Calculate lexical diversity

textstat_frequency()

Tabulate feature frequencies

textstat_keyness()

Calculate keyness statistics

textstat_readability()

Calculate readability

sparsity()

Compute the sparsity of a document-feature matrix

topfeatures()

Identify the most frequent features in a dfm

Dictionary functions

Constructor and utility functions for working with dictionaries.

dictionary()

Create a dictionary

as.dictionary() is.dictionary()

Coercion and checking functions for dictionary objects

as.yaml()

Convert quanteda dictionary objects to the YAML format
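
To show how a dictionary connects to the lookup functions listed earlier, here is a small sketch under stated assumptions: quanteda is attached, and the two-key dictionary and the sample sentence are made up for the example (they are unrelated to `data_dictionary_LSD2015`).

```r
library(quanteda)

# A hypothetical two-key dictionary.
dict <- dictionary(list(pos = c("good", "great"),
                        neg = c("bad", "awful")))

# One invented sentence, converted to a dfm.
toks <- tokens(c(d1 = "A good, great day with one bad moment."))
d <- dfm(toks)

# Collapse the features into counts per dictionary key.
d_dict <- dfm_lookup(d, dict)

as.matrix(d_dict)  # one column per dictionary key
```

The same dictionary could be applied before building the dfm with `tokens_lookup()`, which is the usual choice when the dictionary contains multi-word entries.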

Phrase discovery functions

Functions for exploring and detecting keywords and phrases.

textstat_collocations() is.collocations()

Identify and score multi-word expressions

kwic() is.kwic()

Locate keywords-in-context

Text plot functions

Plot functions for representing text and the analysis of texts.

textplot_influence()

Influence plot for text scaling models

textplot_keyness()

Plot word keyness

textplot_network() as.network(<fcm>)

Plot a network of feature co-occurrences

textplot_scale1d()

Plot a fitted scaling model

textplot_wordcloud()

Plot features as a wordcloud

textplot_xray()

Plot the dispersion of key word(s)

Text model functions

Functions for fitting analytic models from text matrices.

textmodel_affinity()

Class affinity maximum likelihood text scaling model

textmodel_ca()

Correspondence analysis of a document-feature matrix

textmodel_lsa()

Latent Semantic Analysis

textmodel_nb()

Naive Bayes classifier for texts

textmodel_wordfish()

Wordfish text model

textmodel_wordscores()

Wordscores text model

Utility functions

R-like functions to return counts and object information.

ndoc() nfeat() nfeature()

Count the number of documents or features

nscrabble()

Count the Scrabble letter values of text

nsentence()

Count the number of sentences

nsyllable()

Count syllables in a text

ntoken() ntype()

Count the number of tokens or types

docnames() `docnames<-`()

Get or set document names

featnames()

Get the feature labels from a dfm

Miscellaneous functions

phrase() is.phrase()

Declare a compound character to be a sequence of separate pattern matches

as.list(<dist>)

Coerce a dist object into a list

convert()

Convert a dfm to a non-quanteda format

bootstrap_dfm()

Bootstrap a dfm

spacy_parse(<corpus>)

Extensions for and from spacy_parse objects