textmodel_affinity implements the maximum likelihood supervised text scaling method described in Perry and Benoit (2017).

textmodel_affinity(x, y, exclude = NULL, smooth = 0.5, ref_smooth = 0.5,
  verbose = FALSE)

Arguments

x

the dfm or bootstrap_dfm object on which the model will be fit. Does not need to contain only the training documents, since the index of these will be matched automatically.

y

vector of training classes/scores associated with each document in x. As in the examples below, documents that are not used for training are given NA.
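A minimal sketch of how such a y vector might be built by document name rather than typed out by position. The document names are taken from the example output on this page (R1 to R5 plus V1). The object names doc_names and y are illustrative only. Marking non-training documents with NA follows the pattern in the examples and is not a statement about the package internals.

```r
# Document names as they appear in the example output (R1..R5 plus V1).
doc_names <- c("R1", "R2", "R3", "R4", "R5", "V1")

# Start with every document unlabelled (NA), then label only the training ones.
y <- setNames(rep(NA_character_, length(doc_names)), doc_names)
y[c("R1", "R5")] <- c("L", "R")

# Drop the names to get a plain vector, one entry per document.
unname(y)
```

Building the vector by name makes it harder to misalign labels and documents when x contains many unlabelled documents.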

exclude

a set of words to exclude from the model

smooth

a smoothing parameter for class affinities; defaults to 0.5 (Jeffreys prior). A plausible alternative would be 1.0 (Laplace prior).

ref_smooth

a smoothing parameter for token distributions; defaults to 0.5
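To illustrate what a smoothing value of 0.5 versus 1.0 does to a token distribution, here is a small base R sketch of additive smoothing. It assumes the parameter acts as a pseudo-count added to every feature before normalising, which this page does not spell out. The counts and the helper smooth_dist are hypothetical and are not part of the package.

```r
# Hypothetical token counts for one reference class (not package data).
counts <- c(tax = 8, spend = 2, border = 0)

# Additive smoothing (assumed form): add a pseudo-count to every feature,
# then normalise so the probabilities sum to one.
smooth_dist <- function(counts, pseudo) {
  (counts + pseudo) / sum(counts + pseudo)
}

unsmoothed <- smooth_dist(counts, 0)    # zero count stays at probability zero
jeffreys   <- smooth_dist(counts, 0.5)  # pseudo-count of 0.5
laplace    <- smooth_dist(counts, 1.0)  # pseudo-count of 1.0

round(rbind(unsmoothed, jeffreys, laplace), 3)
```

Under this assumption, a larger pseudo-count moves more probability mass onto features that were rare or unseen in the training documents.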

verbose

logical; if TRUE print diagnostic information during fitting.

References

Perry, Patrick O. and Kenneth Benoit. (2017) "Scaling Text with the Class Affinity Model". arXiv:1710.08963 [stat.ML].

Examples

(fitted <- textmodel_affinity(data_dfm_lbgexample, y = c("L", NA, NA, NA, "R", NA)))
#> Call:
#> textmodel_affinity.dfm(x = data_dfm_lbgexample, y = c("L", NA,
#>     NA, NA, "R", NA))
#>
#> Training documents per class: L: 1, R: 1
#> Total training features: 37
predict(fitted)
#> Predicted textmodel of type: affinity
#>
#> Estimated coefficients:
#>
#>          L    s.e.       R    s.e.    chi2
#> R1 0.99950 0.00071 0.00050 0.00071     9.3
#> R2 0.99941 0.00083 0.00059 0.00083  9415.8
#> R3 0.50000 0.02734 0.50000 0.02734 24864.5
#> R4 0.00059 0.00083 0.99941 0.00083  9415.8
#> R5 0.00050 0.00071 0.99950 0.00071     9.3
#> V1 0.99867 0.00187 0.00133 0.00187 19458.5
#>
#> Some diagnostics here about how many words were not found in training vocabulary.
predict(fitted, newdata = data_dfm_lbgexample[6, ])
#> Predicted textmodel of type: affinity
#>
#> Estimated coefficients:
#>
#>    L   s.e.      R   s.e.  chi2
#> V1 1 0.0019 0.0013 0.0019 19459
#>
#> Some diagnostics here about how many words were not found in training vocabulary.
# NOT RUN {
# compute bootstrapped SEs
bsdfm <- bootstrap_dfm(data_corpus_dailnoconf1991, n = 10, remove_punct = TRUE)
textmodel_affinity(bsdfm, y = c("Govt", "Opp", "Opp", rep(NA, 55)))
# }