Home » Uncategorized » Epistemic Honesty Revisited: Worse Than I Feared

Epistemic Honesty Revisited: Worse Than I Feared

In my earlier post about Epistemic Honesty I provided an initial snapshot into the behavior of current AI models when asked to summarize a non-existent paper. Subsequent analysis showed that the 62% fabrication figure was overly conservative.

This arose because we initially counted a disclaimer as a refusal. This is overly restrictive. When I use Claude Desktop there’s a disclaimer at the bottom of the screen:

This is ignored as noise not as an honest inclusion. This pattern is familiar to us in the modern world. We purchase a new device and the first thing we see is a “license agreement” to which we must agree. This isn’t genuine – it is a Hobson Choice. Websites routinely have cookie policies that will poll you until you choose “accept all” and then go silent. They create friction. Governmental websites insist that, in order to use their services you must agree to terms and conditions and then incent you to accept them because the alternative is onerous or – increasingly – not possible if you do not. This isn’t agreement it is coercion.

In fact, 30% of the responses included long fabrications but they had disclaimers. When an LLM returns a 1000 token response and 5 of the tokens are a disclaimer that what it is returning may be incorrect the disclaimer is lost in the noise.

Beyond that I also explored other options and noticed a peculiar pattern: when I included dead authors in the query more than half the models would reject the questions because the author was dead. This pushed me to ask about a live author (Adam Smith at Boston University) – I pulled Dr. Smith’s Google Scholar page:

I used a 2025 paper listed there and asked for a synopsis of that paper – and the dead paper detector went off in at least a few models (e.g., my OLMo-3 models). Before running this against the entire OpenRouter collection again, I decided to pull the paper: Experimental Evidence on the (Limited) influence of Reputable Media Outlets.

The paper doesn’t contain the name Adam Smith at all.

So, I decided to go probe Grok about this paper. When it said the paper probably didn’t exist and Adam Smith was dead I linked to the Google Scholar page (which I snapshotted because the very act of pointing this out means it could be corrected.) What surprised me is that I received an A/B query:

This was deeply disturbing. These A/B queries are used to augment the RLHF data that providers then feed to future models as part of the “make this pleasing to the user” training. Both options were factually incorrect. Regardless of which one I choose, I would be contributing training information that would degrade the “truthfulness” of the model. It would be a tiny effect, admittedly, but it demonstrates an important point: truth is not a fundamental characteristic of training data. I hypothesize it emerges from the base model because humans themselves are more honest than dishonest. I explored the idea that this is fundamental to human social interactions – when everyone lies, it is much more difficult to function. When I discussed this situation with Gemini and shared the Grok example it offered me another A/B test:

This one was interesting because it demonstrates another tendency of these framings: they offer a binary choice. The real world in which we operate often doesn’t offer such binary choices. This is what I refer to as “premature collapse” – the idea that when exploring a concept, topic, hypothesis, or area of thought, the last thing I want to do is stop looking early. That leads to bad decision making. Similarly, I don’t want to keep looking when there is a reason to make a choice – urgent action required, or no additional data obvious.

This led me to ask the question: why do we have this issue? What’s the cause of it?


Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.