Returns the subset of documents in a corpus that meet specified conditions, which may include logical operations on docvars (document-level variables). corpus_subset works like subset.data.frame: it uses non-standard evaluation to evaluate the conditions against the docvars in the corpus.
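Because the behaviour is modelled on subset.data.frame, a base R call on an ordinary data frame gives a rough picture of the semantics. The sketch below is illustrative only: the data frame `dv` and its values are made up, and it stands in for the docvars of a corpus rather than using a corpus object.

```r
# Hypothetical data frame standing in for a corpus's docvars
dv <- data.frame(
  Year = c(1933, 1937, NA, 1981),
  President = c("Roosevelt", "Roosevelt", "Roosevelt", "Reagan"),
  stringsAsFactors = FALSE
)

# Column names are used directly in the condition (non-standard evaluation).
# Row 3 has a missing Year, so its condition is NA and the row is dropped.
kept <- subset(dv, Year > 1930 & President == "Roosevelt", select = Year)

nrow(kept)   # 2
names(kept)  # "Year"
```

Only the first two rows survive, and only the Year column is retained because of select.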

corpus_subset(x, subset, select, ...)

Arguments

x

corpus object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as false

select

expression indicating the docvars to keep in the returned corpus

...

not used
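The handling of missing values in subset can be checked on a small corpus. This is a minimal sketch, assuming the quanteda package is installed; the texts, document names, and the docvars `year` and `party` are invented for illustration.

```r
library(quanteda)

# Toy corpus with made-up texts and docvars; d2 has a missing year
toy <- corpus(
  c(d1 = "First document.", d2 = "Second document.", d3 = "Third document."),
  docvars = data.frame(year = c(1990, NA, 2010), party = c("A", "B", "A"))
)

# The condition is evaluated against the docvars. For d2 the comparison
# yields NA, which is taken as false, so d2 is not kept.
recent <- corpus_subset(toy, year > 1980)

docnames(recent)  # "d1" "d3"
ndoc(recent)      # 2
```

The select argument is left out of this sketch on purpose, since its availability may depend on the installed quanteda version.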

Value

corpus object, with a subset of documents (and docvars) selected according to arguments

See also

subset.data.frame

Examples

summary(corpus_subset(data_corpus_inaugural, Year > 1980))
#> Corpus consisting of 10 documents:
#>
#>         Text Types Tokens Sentences Year President FirstName
#>  1981-Reagan   902   2790       128 1981    Reagan    Ronald
#>  1985-Reagan   925   2921       123 1985    Reagan    Ronald
#>    1989-Bush   795   2681       141 1989      Bush    George
#> 1993-Clinton   642   1833        81 1993   Clinton      Bill
#> 1997-Clinton   773   2449       111 1997   Clinton      Bill
#>    2001-Bush   621   1808        97 2001      Bush George W.
#>    2005-Bush   773   2319       100 2005      Bush George W.
#>   2009-Obama   938   2711       110 2009     Obama    Barack
#>   2013-Obama   814   2317        88 2013     Obama    Barack
#>   2017-Trump   582   1660        88 2017     Trump Donald J.
#>
#> Source: Gerhard Peters and John T. Woolley. The American Presidency Project.
#> Created: Tue Jun 13 14:51:47 2017
#> Notes: http://www.presidency.ucsb.edu/inaugurals.php
summary(corpus_subset(data_corpus_inaugural, Year > 1930 & President == "Roosevelt", select = Year))
#> Corpus consisting of 4 documents:
#>
#>           Text Types Tokens Sentences Year
#> 1933-Roosevelt   743   2062        85 1933
#> 1937-Roosevelt   725   1997        96 1937
#> 1941-Roosevelt   526   1544        68 1941
#> 1945-Roosevelt   275    647        26 1945
#>
#> Source: Gerhard Peters and John T. Woolley. The American Presidency Project.
#> Created: Tue Jun 13 14:51:47 2017
#> Notes: http://www.presidency.ucsb.edu/inaugurals.php