Testing Bayesian measures of relevance in discourse

Authors

  • Alex Warstadt
  • Omar Agha

DOI:

https://doi.org/10.18148/sub/2022.v26i0.1034

Abstract

Despite their popularity among linguists, categorical notions of relevance based on partial answerhood have well-known problems. Gradient information-theoretic measures of utility adopted in cognitive modeling, on the other hand, have not been tested as measures of relevance in discourse. We begin to fill this gap by experimentally evaluating two gradient measures of question under discussion (QUD) relevance in question-answer pairs in comparison to the categorical theory: entropy reduction, which measures the degree to which an answer decreases uncertainty about the resolution of the QUD; and KL divergence, which measures the degree to which an answer changes the probability distribution over the alternatives. Our experiments provide decisive evidence against the categorical theory of relevance, but do not give strong support to any one gradient measure. Both KL divergence and entropy reduction have systematic failure modes, and are less predictive of relevance judgments than comparatively unmotivated measures like the difference between prior and posterior. We outline several criteria for an adequate gradient theory of relevance, and identify candidate measures for future investigation.
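The two gradient measures compared here can be made concrete for a discrete QUD. Below is a minimal sketch, assuming the QUD is represented as prior and posterior probability distributions over a finite set of alternative resolutions; the function names and example distributions are illustrative, not the authors' implementation.

    import math

    def entropy(p):
        # Shannon entropy (in bits) of a discrete distribution given as a list of probabilities.
        return -sum(x * math.log2(x) for x in p if x > 0)

    def entropy_reduction(prior, posterior):
        # How much the answer decreases uncertainty about the QUD's resolution.
        return entropy(prior) - entropy(posterior)

    def kl_divergence(posterior, prior):
        # How much the answer shifts the distribution over alternatives: D_KL(posterior || prior).
        return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

    # Illustrative QUD with three alternative resolutions.
    prior     = [1/3, 1/3, 1/3]   # before the answer
    posterior = [0.7, 0.2, 0.1]   # after the answer

    print(entropy_reduction(prior, posterior))   # positive: the answer reduced uncertainty
    print(kl_divergence(posterior, prior))       # positive: the answer changed the distribution

Both quantities are zero when the answer leaves the distribution over alternatives untouched, and they come apart when an answer reshuffles probability mass without lowering overall uncertainty, which is one source of the divergent predictions discussed in the abstract.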

Published

2022-12-22

How to Cite

Warstadt, A., & Agha, O. (2022). Testing Bayesian measures of relevance in discourse. Proceedings of Sinn und Bedeutung, 26, 865–886. https://doi.org/10.18148/sub/2022.v26i0.1034