Calculate the readability of text(s) using one of a variety of computed indexes.

textstat_readability(x, measure = c("all", "ARI", "ARI.simple", "Bormuth",
  "Bormuth.GP", "Coleman", "Coleman.C2", "Coleman.Liau", "Coleman.Liau.grade",
  "Coleman.Liau.short", "Dale.Chall", "Dale.Chall.old", "Dale.Chall.PSK",
  "Danielson.Bryan", "Danielson.Bryan.2", "Dickes.Steiwer", "DRP", "ELF",
  "Farr.Jenkins.Paterson", "Flesch", "Flesch.PSK", "Flesch.Kincaid", "FOG",
  "FOG.PSK", "FOG.NRI", "FORCAST", "FORCAST.RGL", "Fucks", "Linsear.Write",
  "LIW", "nWS", "nWS.2", "nWS.3", "nWS.4", "RIX", "Scrabble", "SMOG", "SMOG.C",
  "SMOG.simple", "SMOG.de", "Spache", "Spache.old", "Strain",
  "Traenkle.Bailer", "Traenkle.Bailer.2", "Wheeler.Smith", "meanSentenceLength",
  "meanWordSyllables"), remove_hyphens = TRUE, min_sentence_length = 1,
  max_sentence_length = 10000, ...)

Arguments

x

a character or corpus object containing the texts

measure

character vector defining the readability measure to calculate. Matches are case-insensitive.

remove_hyphens

if TRUE, treat the constituent words of hyphenated words as separate terms for the purpose of computing word lengths, e.g. "decision-making" counts as two terms of 8 and 6 characters respectively, rather than as a single word of 15 characters
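The character counts quoted for "decision-making" can be checked with a few lines of base R. This is only a sketch of the arithmetic, assuming a plain split on the hyphen; it is not a description of how the function tokenizes internally.

```r
# Base R only; illustrates the length arithmetic, not the package internals.
word <- "decision-making"

# Analogue of remove_hyphens = FALSE: the hyphenated form is one term
nchar(word)

# Analogue of remove_hyphens = TRUE: split on the hyphen, measure each part
parts <- strsplit(word, "-", fixed = TRUE)[[1]]
nchar(parts)
```

The first call gives 15 and the second gives 8 and 6, matching the figures above.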

min_sentence_length, max_sentence_length

set the minimum and maximum sentence lengths (in tokens, excluding punctuation) to include in the computation of readability. This makes it easy to exclude "sentences" that may not really be sentences, such as section titles, table elements, and other cruft that might be in the texts following conversion.

For finer-grained control, consider filtering sentences first, including through pattern-matching, using corpus_trim.
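To make the effect of the length bounds concrete, here is a minimal base R sketch of sentence filtering by token count. The helper filter_sentences and its naive sentence splitter are hypothetical and exist only for illustration; the package's own segmentation and token counting may well differ.

```r
# Hypothetical helper, base R only; not how textstat_readability() is implemented.
filter_sentences <- function(text, min_len = 1, max_len = 10000) {
  # Naive split: break at whitespace that follows ., ! or ?
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  # Count tokens as runs of letters, digits, or apostrophes (punctuation ignored)
  n_tokens <- lengths(regmatches(sentences, gregexpr("[[:alnum:]']+", sentences)))
  # Keep only sentences whose token count falls within the bounds
  sentences[n_tokens >= min_len & n_tokens <= max_len]
}

txt <- "Results. The model converged after ten iterations. Table 2."
filter_sentences(txt, min_len = 3)
```

With min_len = 3, the one-token heading "Results." and the two-token fragment "Table 2." are dropped, leaving only the full sentence.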

...

not used

Value

textstat_readability returns a data.frame of documents and their readability scores.

Examples

txt <- c("Readability zero one. Ten, Eleven.", "The cat in a dilapidated tophat.")
textstat_readability(txt, "Flesch.Kincaid")
#>   document Flesch.Kincaid
#> 1    text1      13.705000
#> 2    text2       8.383333
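As a sanity check on the two Flesch.Kincaid scores, the standard Flesch-Kincaid grade-level formula can be applied by hand. The sketch below assumes that formula and uses word, sentence, and syllable counts tallied manually for the two example texts; the helper flesch_kincaid is illustrative and not part of the package.

```r
# Standard Flesch-Kincaid grade level; the counts passed in below are hand tallies.
flesch_kincaid <- function(n_words, n_sentences, n_syllables) {
  0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59
}

# "Readability zero one. Ten, Eleven.": 5 words, 2 sentences, 12 syllables
fk1 <- flesch_kincaid(5, 2, 12)

# "The cat in a dilapidated tophat.": 6 words, 1 sentence, 11 syllables
fk2 <- flesch_kincaid(6, 1, 11)

c(text1 = fk1, text2 = fk2)
```

Under these counts the formula yields 13.705 and 8.383333, in agreement with the reported scores.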
textstat_readability(txt, c("FOG", "FOG.PSK", "FOG.NRI"))
#>   document       FOG  FOG.PSK FOG.NRI
#> 1    text1 17.000000 4.608659 -1.3875
#> 2    text2  9.066667 3.254382 -1.2600
inaugReadability <- textstat_readability(data_corpus_inaugural, "all")
#> Error in get(".SigLength", envir = env): object '.SigLength' not found
cor(inaugReadability[,-1])
#> Error in is.data.frame(x): object 'inaugReadability' not found
textstat_readability(data_corpus_inaugural, measure = "Flesch.Kincaid")
#>           document Flesch.Kincaid
#>  1 1789-Washington      28.432155
#>  2 1793-Washington      17.151759
#>  3      1797-Adams      28.692087
#>  4  1801-Jefferson      19.295924
#>  5  1805-Jefferson      21.917088
#>  6    1809-Madison      25.557611
#>  7    1813-Madison      18.034452
#>  8     1817-Monroe      14.443950
#>  9     1821-Monroe      16.773058
#> 10      1825-Adams      19.748273
#> 11    1829-Jackson      21.893148
#> 12    1833-Jackson      19.958757
#> 13   1837-VanBuren      20.263524
#> 14   1841-Harrison      19.355855
#> 15       1845-Polk      16.406306
#> 16     1849-Taylor      23.748805
#> 17     1853-Pierce      16.438241
#> 18   1857-Buchanan      16.110868
#> 19    1861-Lincoln      13.661936
#> 20    1865-Lincoln      11.774929
#> 21      1869-Grant      14.008426
#> 22      1873-Grant      15.082444
#> 23      1877-Hayes      20.297851
#> 24   1881-Garfield      14.042973
#> 25  1885-Cleveland      19.052937
#> 26   1889-Harrison      14.944583
#> 27  1893-Cleveland      18.248838
#> 28   1897-McKinley      15.854298
#> 29   1901-McKinley      12.728640
#> 30  1905-Roosevelt      13.959254
#> 31       1909-Taft      17.099757
#> 32     1913-Wilson      11.707659
#> 33     1917-Wilson      11.780913
#> 34    1921-Harding      13.214469
#> 35   1925-Coolidge      11.626699
#> 36     1929-Hoover      13.259377
#> 37  1933-Roosevelt      11.640683
#> 38  1937-Roosevelt      10.619558
#> 39  1941-Roosevelt       9.687985
#> 40  1945-Roosevelt       9.175680
#> 41     1949-Truman      11.328244
#> 42 1953-Eisenhower      10.328852
#> 43 1957-Eisenhower       8.094908
#> 44    1961-Kennedy      11.822997
#> 45    1965-Johnson       7.550602
#> 46      1969-Nixon       9.235663
#> 47      1973-Nixon      12.288965
#> 48     1977-Carter      11.670742
#> 49     1981-Reagan       9.798608
#> 50     1985-Reagan      10.420294
#> 51       1989-Bush       7.147029
#> 52    1993-Clinton      10.374204
#> 53    1997-Clinton       9.828863
#> 54       2001-Bush       8.918201
#> 55       2005-Bush      11.036277
#> 56      2009-Obama      10.229426
#> 57      2013-Obama      11.734767
#> 58      2017-Trump       9.163084