Estimate Slapin and Proksch's (2008) "wordfish" Poisson scaling model of one-dimensional document positions using conditional maximum likelihood.
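
For orientation, the model of Slapin and Proksch (2008) is usually written as below. The symbol $$y_{ij}$$ (the count of feature $$j$$ in document $$i$$) and $$\lambda_{ij}$$ (its expected value) are notation assumed here for illustration. The four parameters use the same names as in the Arguments section.

$$y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad \log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i$$

Because $$\beta_j \theta_i = (-\beta_j)(-\theta_i)$$, the sign of the scale is not determined by the likelihood alone, which is why a direction has to be fixed through dir.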

textmodel_wordfish(x, dir = c(1, 2), priors = c(Inf, Inf, 3, 1),
tol = c(1e-06, 1e-08), dispersion = c("poisson", "quasipoisson"),
dispersion_level = c("feature", "overall"), dispersion_floor = 0,
sparse = TRUE, abs_err = FALSE, svd_sparse = TRUE,
residual_floor = 0.5)

## Arguments

x

the dfm on which the model will be fit

dir

set global identification by specifying the indexes for a pair of documents such that $$\hat{\theta}_{dir[1]} < \hat{\theta}_{dir[2]}$$.

priors

prior precisions for the estimated parameters $$\alpha_i$$, $$\psi_j$$, $$\beta_j$$, and $$\theta_i$$, where $$i$$ indexes documents and $$j$$ indexes features

tol

tolerances for convergence. The first value is a convergence threshold for the log-posterior of the model. The second value is the tolerance for the difference in parameter values between iterations of the conditional maximum likelihood procedure (which conditionally estimates the document-level, then the feature-level parameters).

dispersion

sets whether the model is fit with a Poisson likelihood ("poisson") or with a quasi-Poisson quasi-likelihood that includes a dispersion parameter ("quasipoisson")

dispersion_level

sets the unit level for the dispersion parameter: "feature" for term-level variances, or "overall" for a single dispersion parameter

dispersion_floor

constraint for the minimal underdispersion multiplier in the quasi-Poisson model. Used to minimize the distorting effect of terms with rare term or document frequencies that appear to be severely underdispersed. Default is 0. Applies only if dispersion = "quasipoisson".

sparse

specifies whether the "dfm" is coerced to dense

abs_err

specifies how the convergence is considered

svd_sparse

uses svd to initialize the starting values of theta; only applies when sparse = TRUE

residual_floor

specifies the threshold for the residual matrix when calculating the svds; only applies when sparse = TRUE
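
The role of dir can be illustrated with a small base R simulation. This is a minimal sketch, not the package's internal code: the helper names `log_lambda()` and `orient()` and all parameter values are hypothetical. It assumes the usual wordfish form in which the log expected count is the document effect plus the feature effect plus the product of the feature's marginal effect and the document position.

```r
# Hypothetical illustration in base R; not the package's implementation.
set.seed(42)
n_docs  <- 4
n_feats <- 6
alpha <- rnorm(n_docs)             # document fixed effects
psi   <- rnorm(n_feats)            # feature fixed effects
beta  <- rnorm(n_feats)            # feature marginal effects
theta <- c(-1.2, -0.3, 0.4, 1.1)   # document positions

# log(lambda_ij) = alpha_i + psi_j + beta_j * theta_i
log_lambda <- function(alpha, psi, beta, theta) {
  outer(alpha, psi, "+") + outer(theta, beta)
}
lambda <- exp(log_lambda(alpha, psi, beta, theta))

# Simulated document-feature counts (documents in rows)
counts <- matrix(rpois(length(lambda), lambda), nrow = n_docs)

# Flipping the sign of both theta and beta leaves lambda unchanged,
# so the direction of the scale has to be fixed by convention.
lambda_flipped <- exp(log_lambda(alpha, psi, -beta, -theta))

# Hypothetical helper: orient the scale so theta[dir[1]] < theta[dir[2]]
orient <- function(theta, beta, dir) {
  if (theta[dir[1]] > theta[dir[2]]) {
    theta <- -theta
    beta  <- -beta
  }
  list(theta = theta, beta = beta)
}
oriented <- orient(-theta, -beta, dir = c(1, 4))
```

Here `oriented$theta` recovers the original ordering of documents 1 and 4, which is the kind of constraint that dir = c(1, 4) expresses.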

## Value

An object of class textmodel_fitted_wordfish. This is a list containing:

dir

global identification of the dimension

theta

estimated document positions

alpha

estimated document fixed effects

beta

estimated feature marginal effects

psi

estimated word fixed effects

docs

document labels

features

feature labels

sigma

regularization parameter for betas in Poisson form

ll

log likelihood at convergence

se.theta

standard errors for theta-hats

x

dfm to which the model was fit
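
The lower and upper columns in the printed output in the Examples section appear consistent with theta plus or minus 1.96 times se.theta. This is inferred from the printed numbers rather than stated by the documentation. A minimal base R sketch, using values copied from that output:

```r
# Values copied from the printed output in the Examples section
theta <- c(R1 = -1.3313038, R2 = -0.6062624, R3 = 0.0633325)
se    <- c(R1 = 0.01573038, R2 = 0.01215633, R3 = 0.01129470)

# Assumption: a normal-approximation 95% interval with multiplier 1.96
z  <- 1.96
ci <- cbind(theta = theta,
            lower = theta - z * se,
            upper = theta + z * se)
print(round(ci, 4))
```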

## Details

The returned elements match those of Will Lowe's R implementation of wordfish (see the austin package), except that words are called features here. (This return list may change.) We have also followed the practice begun with Slapin and Proksch's early implementation of the model, which used a regularization parameter of se$$(\sigma) = 3$$, set here through the third element of priors.

## Note

In the rare situation where the warning message "The algorithm did not converge." appears, removing some documents may help.

## References

Slapin, Jonathan and Sven-Oliver Proksch. 2008. "A Scaling Model for Estimating Time-Series Party Positions from Texts." American Journal of Political Science 52(3), 705-722.

Lowe, Will and Kenneth Benoit. 2013. "Validating Estimates of Latent Traits from Textual Data Using Human Judgment as a Benchmark." Political Analysis 21(3), 298-313. http://doi.org/10.1093/pan/mpt002

## Examples

textmodel_wordfish(data_dfm_lbgexample, dir = c(1,5))
#> Fitted wordfish model:
#> Call:
#> 	textmodel_wordfish.dfm(x = data_dfm_lbgexample, dir = c(1, 5))
#>
#> Estimated document positions:
#>
#>   Documents      theta         SE       lower       upper
#> 1        R1 -1.3313038 0.01573038 -1.36213533 -1.30047223
#> 2        R2 -0.6062624 0.01215633 -0.63008885 -0.58243603
#> 3        R3  0.0633325 0.01129470  0.04119488  0.08547011
#> 4        R4  0.7026667 0.01266607  0.67784116  0.72749217
#> 5        R5  1.5012178 0.01775940  1.46640943  1.53602626
#> 6        V1 -0.3296508 0.01194618 -0.35306530 -0.30623627
#>
#> Estimated feature scores: showing first 30 beta-hats for features
#>
#>           A           B           C           D           E           F
#>  -7.4064130  -8.0776290  -9.7350167 -10.7146273 -11.5831921 -11.3612887
#>           G           H           I           J           K           L
#> -11.5594175 -10.4555689  -9.7338716  -8.6724718  -7.6191931  -6.6315804
#>           M           N           O           P           Q           R
#>  -5.6178715  -4.6855543  -3.7792851  -2.7246986  -1.8090842  -0.7117806
#>           S           T           U           V           W           X
#>   0.2594746   1.2559890   2.3935800   3.3679906   4.4773749   5.2773752
#>           Y           Z          ZA          ZB          ZC          ZD
#>   6.1121066   6.9830059   7.8338385   8.8092101   9.7279003  10.4986535
# NOT RUN {
ie2010dfm <- dfm(data_corpus_irishbudget2010, verbose = FALSE)
(wfm1 <- textmodel_wordfish(ie2010dfm, dir = c(6,5)))

# quasi-Poisson fits with two different dispersion floors
(wfm2a <- textmodel_wordfish(ie2010dfm, dir = c(6,5),
                             dispersion = "quasipoisson", dispersion_floor = 0))
(wfm2b <- textmodel_wordfish(ie2010dfm, dir = c(6,5),
                             dispersion = "quasipoisson", dispersion_floor = .5))

# compare the estimated dispersion values from the two fits
plot(wfm2a@phi, wfm2b@phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
     xlim = c(0, 1.0), ylim = c(0, 1.0))

# same axes on an empty plot, then label the points with feature names
plot(wfm2a@phi, wfm2b@phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
     xlim = c(0, 1.0), ylim = c(0, 1.0), type = "n")
underdispersedTerms <- sample(which(wfm2a@phi < 1.0), 5)
which(featnames(ie2010dfm) %in% names(topfeatures(ie2010dfm, 20)))
text(wfm2a@phi, wfm2b@phi, wfm2a@features,
     cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "grey90")
text(wfm2a@phi[underdispersedTerms], wfm2b@phi[underdispersedTerms],
     wfm2a@features[underdispersedTerms],
     cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "black")

# compare document positions with the austin implementation, if installed
if (require(austin)) {
    wfmodelAustin <- austin::wordfish(quanteda::as.wfm(ie2010dfm), dir = c(6,5))
    cor(wfm1@theta, wfmodelAustin$theta)
}
# }