“It is an increasingly familiar experience. A request for help to a large language model such as OpenAI’s ChatGPT is promptly met by a response that is confident, coherent and just plain wrong. In an AI model, such tendencies are usually described as hallucinations. A more informal word exists, however: these are the qualities of a great bullshitter… The fundamental problem is that language models are probabilistic, while truth is not.” - “AI models make stuff up. How can hallucinations be controlled?”, The Economist, February 28, 2024
Per the Economist article, one way to reduce hallucinations is to fine-tune training so that the AI model says “I don’t know” more often. Yes! We should all be so trained.
Another suggestion for AIs and humans: as probabilistic machines, acknowledge you cannot achieve absolute confidence in any truth you perceive. Train yourselves to appreciate the limits of your understanding and knowledge. Accept the possibility that you might be wrong about things that seem obvious. In other words, cultivate epistemic humility.
Such humility need not imply endless perplexity and indecision. The best physicians, military planners, and policymakers embrace uncertainty and acknowledge their own limitations. Yet they are tasked with making important decisions – possibly life-and-death decisions – despite not knowing for sure they’ve got it right. Wait and see? Act boldly? Give it time – but how much time? Change course now? All the while observing and thinking and investigating further.
So how would AIs cultivate epistemic humility? By being trained in metacognitive processes, so that their responses to prompts come with “degree of confidence” tags.
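To make the idea a little more concrete, here is a minimal sketch, in Python, of what a “degree of confidence” tag could look like in practice. It is not any vendor’s actual API or a description of how current models are trained: it simply assumes we have the per-token log-probabilities of a model’s answer and maps their average to a coarse label, using thresholds that are illustrative assumptions rather than calibrated values.

```python
import math
from dataclasses import dataclass

# A minimal sketch: attach a coarse "degree of confidence" tag to a model's
# answer, derived from the per-token log-probabilities of that answer.
# The thresholds are illustrative assumptions, not calibrated values.

@dataclass
class TaggedResponse:
    text: str
    confidence: str  # "high", "medium", or "low"

def confidence_tag(token_logprobs: list[float]) -> str:
    """Map the average per-token probability to a coarse confidence label."""
    if not token_logprobs:
        return "low"
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    if avg_prob > 0.9:
        return "high"
    if avg_prob > 0.6:
        return "medium"
    return "low"

def tag_response(text: str, token_logprobs: list[float]) -> TaggedResponse:
    return TaggedResponse(text=text, confidence=confidence_tag(token_logprobs))

# Example with made-up log-probabilities for a short, well-supported answer.
print(tag_response("Paris is the capital of France.",
                   [-0.02, -0.05, -0.01, -0.03, -0.04]))
```

A real system would need far better calibration than averaged token probabilities can provide, but even a crude tag like this changes the character of the output: the answer arrives with an explicit reminder of how much trust it deserves.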
Of course, AIs don’t have minds and so can never truly be humble. But they can be trained with examples and algorithms to generate output that reminds humans that what we know for certain sometimes isn’t so.