Discussion about this post

Rainer Dynszis

Why, isn't that Tversky and Kahneman's Taxicab Problem once again? (cf. Amos Tversky and Daniel Kahneman, "Evidential Impact of Base Rates", No. TR-4, Stanford University Department of Psychology, 1981)

Once you understand the problem, you find it almost everywhere you look. My conjecture is that this is all there is to psychology's current replication crisis, but I could be wrong. What we can be sure of is that scores of scientists confuse the type I error rate with the probability that their theory is wrong. This confusion has elsewhere been dubbed "the prosecutor's fallacy" because it's also endemic in legal proceedings.
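To make the gap between the two quantities concrete, here is a minimal sketch in Python. The prior, significance level, and power are illustrative assumptions, not figures from any study, and the function name is made up for this example.

```python
def prob_false_given_significant(prior_true, alpha, power):
    """P(theory is false | significant result), via Bayes' rule.

    prior_true: assumed share of tested theories that are actually true
    alpha:      type I error rate, P(significant | theory false)
    power:      P(significant | theory true)
    """
    false_pos = alpha * (1.0 - prior_true)  # false theories that reach significance
    true_pos = power * prior_true           # true theories that reach significance
    return false_pos / (false_pos + true_pos)


# Illustrative assumptions: 10% of tested theories are true, alpha = 0.05, power = 0.80.
risk = prob_false_given_significant(prior_true=0.10, alpha=0.05, power=0.80)
print(f"{risk:.2f}")  # prints 0.36
```

Under these assumed numbers, a significant result still leaves roughly a 36% chance that the theory is false, even though the type I error rate is only 5%.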

This Bayesian problem also popped up elsewhere in not-too-distant history, to wit: AIDS tests. They faced the usual dilemma of not wanting to let a truly infected person remain undetected (i.e. maximize sensitivity), but also not wanting to be buried in false positives (i.e. retain a manageable specificity).

IIRC their solution in the 1980s was a sequence of two tests, ELISA followed by Western Blot. ELISA took care of the sensitivity, and when that came back negative, you could be quite sure you were not infected. If ELISA found something, they confirmed or refuted it with Western Blot, which was used to weed out the false positives.
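The two-step logic can be sketched with Bayes' rule. All numbers below (prevalence, sensitivities, specificities) are made-up illustrations rather than real ELISA or Western Blot figures, the helper names are hypothetical, and the sketch assumes the two tests err independently of each other.

```python
def posterior_after_positive(prior, sensitivity, specificity):
    """P(infected | positive test)."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def posterior_after_negative(prior, sensitivity, specificity):
    """P(infected | negative test)."""
    false_neg = (1.0 - sensitivity) * prior
    true_neg = specificity * (1.0 - prior)
    return false_neg / (false_neg + true_neg)


# Illustrative assumptions only, not real test characteristics.
prevalence = 0.001                 # 1 in 1,000 people infected
screen = dict(sensitivity=0.99, specificity=0.99)    # stand-in for the screening step
confirm = dict(sensitivity=0.99, specificity=0.999)  # stand-in for the confirmation step

after_screen_neg = posterior_after_negative(prevalence, **screen)
after_screen_pos = posterior_after_positive(prevalence, **screen)
# The confirmation step starts from the post-screening probability as its prior
# (this assumes the two tests' errors are independent).
after_confirm_pos = posterior_after_positive(after_screen_pos, **confirm)

print(f"negative screen:        {after_screen_neg:.6f}")
print(f"positive screen:        {after_screen_pos:.3f}")
print(f"positive confirmation:  {after_confirm_pos:.3f}")
```

With these assumed values, a negative screen leaves a residual probability of about 1 in 100,000, a positive screen alone gives only about a 9% chance of infection, and a positive confirmation raises that to about 99%.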

Which makes me wonder: Are LLMs really an ELISA for emerging crises? I mean, just because LLMs are "hallucinating" doesn't mean they catch everything there is to catch. And if they are something like ELISA, then what could be the Western Blot for LLMs?

Kukuh Noertjojo

Adam, I always learn something new from your posts. Thank you! Please write further on the fact that "hallucination evaluation models can also hallucinate". It is quite scary. Hopefully, you'll teach us a solution, Adam.
