`textmodel_wordfish.Rd`

Estimate Slapin and Proksch's (2008) "wordfish" Poisson scaling model of one-dimensional document positions using conditional maximum likelihood.
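The model being estimated can be made concrete with a minimal base R sketch. This is not quanteda's implementation. It assumes the standard wordfish form, in which each count \(y_{ij}\) of feature \(j\) in document \(i\) is Poisson with \(\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i\). The helper `wordfish_loglik` and the toy values are hypothetical.

```r
# Hypothetical helper (not part of quanteda): Poisson log-likelihood of the
# wordfish model, log(lambda_ij) = alpha_i + psi_j + beta_j * theta_i
wordfish_loglik <- function(y, alpha, psi, beta, theta) {
  # y is a documents x features matrix of counts;
  # outer(alpha, psi, "+")[i, j] = alpha_i + psi_j
  # outer(theta, beta)[i, j]     = theta_i * beta_j
  eta <- outer(alpha, psi, "+") + outer(theta, beta)
  lambda <- exp(eta)
  sum(dpois(y, lambda, log = TRUE))
}

# Toy data: 3 documents, 4 features (made-up values)
y <- matrix(c(5, 2, 0, 1,
              3, 3, 1, 2,
              1, 4, 2, 6), nrow = 3, byrow = TRUE)
alpha <- c(0.5, 0.6, 0.7)         # document fixed effects
psi   <- c(0.2, 0.4, -0.5, 0.1)   # feature fixed effects
beta  <- c(-0.8, 0.1, 0.6, 0.9)   # feature marginal effects
theta <- c(-1, 0, 1)              # document positions

ll <- wordfish_loglik(y, alpha, psi, beta, theta)
ll
```

Conditional maximum likelihood alternates between maximizing an objective like this over the document-level parameters with the feature-level parameters held fixed, and the reverse.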

textmodel_wordfish(x, dir = c(1, 2), priors = c(Inf, Inf, 3, 1), tol = c(1e-06, 1e-08), dispersion = c("poisson", "quasipoisson"), dispersion_level = c("feature", "overall"), dispersion_floor = 0, sparse = FALSE, abs_err = FALSE, svd_sparse = TRUE, residual_floor = 0.5)

Argument | Description |
---|---|
x | the dfm on which the model will be fit |
dir | sets global identification by specifying the indexes for a pair of documents such that \(\hat{\theta}_{dir[1]} < \hat{\theta}_{dir[2]}\) |
priors | prior precisions for the estimated parameters \(\alpha_i\), \(\psi_j\), \(\beta_j\), and \(\theta_i\) (in that order), where \(i\) indexes documents and \(j\) indexes features |
tol | tolerances for convergence. The first value is a convergence threshold for the log-posterior of the model; the second is the tolerance for the difference in parameter values between iterations of the conditional maximum likelihood (which conditionally estimates the document-level, then the feature-level parameters) |
dispersion | sets whether a quasi-Poisson quasi-likelihood should be used instead of the Poisson likelihood; options are `"poisson"` (the default) or `"quasipoisson"`, with the unit of the dispersion parameter set by `dispersion_level` |
dispersion_level | sets the unit level for the dispersion parameter; options are `"feature"` (a separate dispersion parameter for each feature, the default) or `"overall"` (a single dispersion parameter) |
dispersion_floor | constraint for the minimal underdispersion multiplier in the quasi-Poisson model, used to minimize the distorting effect of terms with rare term or document frequencies that appear to be severely underdispersed. Default is 0, but this only applies if `dispersion = "quasipoisson"` |
sparse | specifies whether the dfm is kept sparse rather than coerced to a dense matrix (default `FALSE`) |
abs_err | specifies how convergence is assessed (default `FALSE`) |
svd_sparse | uses SVD to initialize the starting values of theta; only applies when `sparse = TRUE` |
residual_floor | specifies the threshold for the residual matrix when calculating the SVDs; only applies when `sparse = TRUE` |
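The `dir` and `dispersion_floor` arguments can be illustrated with a base R sketch. It is not quanteda's internal code. The model depends on \(\beta_j\) and \(\theta_i\) only through their product, so flipping the sign of both leaves the likelihood unchanged, and `dir` selects one of the two mirror-image solutions. The feature-level dispersion uses a Pearson-type estimator raised to at least `dispersion_floor`. That estimator is an assumption for illustration, and the helpers `identify_direction` and `feature_dispersion` are hypothetical.

```r
# Hypothetical helper: enforce theta[dir[1]] < theta[dir[2]] by flipping the
# signs of theta and beta together, which leaves every theta_i * beta_j unchanged
identify_direction <- function(theta, beta, dir = c(1, 2)) {
  if (theta[dir[1]] > theta[dir[2]]) {
    theta <- -theta
    beta <- -beta
  }
  list(theta = theta, beta = beta)
}

# Hypothetical helper (assumed estimator): feature-level Pearson dispersion
#   phi_j = sum_i (y_ij - lambda_ij)^2 / lambda_ij / (n_docs - 1),
# with any value below dispersion_floor raised to dispersion_floor
feature_dispersion <- function(y, lambda, dispersion_floor = 0) {
  phi <- colSums((y - lambda)^2 / lambda) / (nrow(y) - 1)
  pmax(phi, dispersion_floor)
}

theta <- c(0.9, -0.2, -0.7)
beta  <- c(0.5, -1.2, 0.3)
fixed <- identify_direction(theta, beta, dir = c(1, 3))
fixed$theta  # c(-0.9, 0.2, 0.7), so that theta[1] < theta[3]

# Toy counts and fitted means (each row of lambda is c(4, 0.5, 2.5))
y <- matrix(c(4, 0, 2,
              5, 1, 2,
              3, 0, 3), nrow = 3, byrow = TRUE)
lambda <- matrix(c(4, 0.5, 2.5), nrow = 3, ncol = 3, byrow = TRUE)
feature_dispersion(y, lambda)                          # c(0.25, 0.75, 0.15)
feature_dispersion(y, lambda, dispersion_floor = 0.5)  # c(0.5, 0.75, 0.5)
```

A floor of 0.5 leaves the overdispersed second feature alone and pulls the two apparently underdispersed features up to 0.5.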

An object of class `textmodel_fitted_wordfish`. This is a list containing:

- global identification of the dimension
- estimated document positions \(\hat{\theta}_i\)
- estimated document fixed effects \(\hat{\alpha}_i\)
- estimated feature marginal effects \(\hat{\beta}_j\)
- estimated word fixed effects \(\hat{\psi}_j\)
- document labels
- feature labels
- regularization parameter for betas in Poisson form
- log likelihood at convergence
- standard errors for the \(\hat{\theta}_i\)
- the dfm to which the model was fit

The returns match those of Will Lowe's R implementation of `wordfish` (see the austin package), except that here we have renamed `words` to `features`. (This return list may change.) We have also followed the practice begun with Slapin and Proksch's early implementation of the model, which used a regularization parameter of \(\mathrm{se}(\sigma) = 3\), through the third element in `priors`.
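One reading of a regularization parameter of \(\sigma = 3\) on the feature marginal effects is a Gaussian prior \(\beta_j \sim N(0, 3^2)\). Such a prior adds a ridge-type penalty \(-\sum_j \beta_j^2 / (2\sigma^2)\) to the log-likelihood. The base R sketch below assumes that interpretation. The helper `beta_log_prior` is hypothetical, and quanteda's exact parameterization of `priors` may differ.

```r
# Hypothetical helper: log-density of independent N(0, sigma^2) priors on the
# feature marginal effects beta_j (assumed reading of se(sigma) = 3)
beta_log_prior <- function(beta, sigma = 3) {
  sum(dnorm(beta, mean = 0, sd = sigma, log = TRUE))
}

beta <- c(-0.8, 0.1, 0.6, 0.9)
# Penalty relative to beta = 0: equals -sum(beta^2) / (2 * sigma^2),
# so large |beta_j| are shrunk toward zero
penalty <- beta_log_prior(beta) - beta_log_prior(rep(0, length(beta)))
penalty
```

A smaller \(\sigma\) shrinks the \(\beta_j\) more strongly. This guards against extreme feature weights for words that occur in only a few documents.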

In the rare case that the warning "The algorithm did not converge." appears, removing some documents may help.

Slapin, Jonathan and Sven-Oliver Proksch. 2008. "A Scaling Model for Estimating Time-Series Party Positions from Texts." *American Journal of Political Science* 52(3):705-722.

Lowe, Will and Kenneth Benoit. 2013. "Validating Estimates of Latent Traits from Textual Data Using Human Judgment as a Benchmark." *Political Analysis* 21(3):298-313. http://doi.org/10.1093/pan/mpt002

```r
(wf <- textmodel_wordfish(data_dfm_lbgexample, dir = c(1, 5)))
summary(wf, n = 10)
coef(wf)
predict(wf)
predict(wf, se.fit = TRUE)
predict(wf, interval = "confidence")

## Not run:
# A larger example using the Irish budget 2010 corpus
ie2010dfm <- dfm(data_corpus_irishbudget2010, verbose = FALSE)
(wf1 <- textmodel_wordfish(ie2010dfm, dir = c(6, 5)))

# Quasi-Poisson fits with and without a floor on underdispersion
(wf2a <- textmodel_wordfish(ie2010dfm, dir = c(6, 5),
                            dispersion = "quasipoisson", dispersion_floor = 0))
(wf2b <- textmodel_wordfish(ie2010dfm, dir = c(6, 5),
                            dispersion = "quasipoisson", dispersion_floor = .5))

# Compare the estimated dispersion parameters under the two floors
plot(wf2a$phi, wf2b$phi, xlab = "Min underdispersion = 0",
     ylab = "Min underdispersion = .5", xlim = c(0, 1.0), ylim = c(0, 1.0))

# Empty plot, then label all terms in grey and five random
# underdispersed terms in black
plot(wf2a$phi, wf2b$phi, xlab = "Min underdispersion = 0",
     ylab = "Min underdispersion = .5", xlim = c(0, 1.0), ylim = c(0, 1.0),
     type = "n")
underdispersedTerms <- sample(which(wf2a$phi < 1.0), 5)
which(featnames(ie2010dfm) %in% names(topfeatures(ie2010dfm, 20)))
text(wf2a$phi, wf2b$phi, wf2a$features, cex = .8, col = "grey90")
text(wf2a$phi[underdispersedTerms], wf2b$phi[underdispersedTerms],
     wf2a$features[underdispersedTerms], cex = .8, col = "black")

# Compare with Will Lowe's austin implementation, if it is installed
if (require(austin)) {
    wf_austin <- austin::wordfish(quanteda::as.wfm(ie2010dfm), dir = c(6, 5))
    cor(wf1$theta, wf_austin$theta)
}
## End(Not run)
```