How many “R”s are in the word strawberry?
Some of you might remember the strawberry controversy that left Gen-AI enthusiasts thoroughly confused.
For those who haven’t heard of it: in earlier versions of ChatGPT, if you asked how many times the letter “R” appears in strawberry, it would confidently say “two.” But… that’s wrong! And it left people wondering: how could something so smart miss something so simple?
This isn’t a one-off glitch. Large Language Models (LLMs) like ChatGPT often make surprisingly basic errors on the simplest questions, even while acing complex ones. Back in late 2022, Meta (you know, formerly Facebook) introduced an AI tool called Galactica, designed to be a helpful resource for science researchers and students. It was trained on millions of pieces of scientific content—textbooks, papers, encyclopedias, websites, and even lecture notes. But, as it turns out, the AI had a bit of a "hallucination" problem. Users soon discovered that it made up citations and studies that didn’t exist. Similar issues were reported by users around the globe across different LLMs, and I was one of them :)
I once asked ChatGPT for stats to back up an article. It gave me exactly what I needed, complete with sources. I was so happy to have saved hours of digging through research papers—until I realized it completely made them up. The stats, the titles, the authors—it was all beautifully fabricated (the audacity!). That was my first real encounter with AI hallucinations. And yes, it was as mind-blowing as it sounds.
So, what are AI Hallucinations?
AI hallucination occurs when a large language model (LLM)—often a generative AI—detects patterns or objects that simply aren’t there. This results in responses or outputs that don’t quite add up, producing information that can seem bizarre or completely inaccurate to human observers. It’s like the AI is “imagining” things that don’t actually exist, leading to absurd, often nonsensical, results. Another term for an AI hallucination is confabulation.
How do these AI hallucinations happen in the first place?
AI hallucinations usually happen when the model is trained on too small a dataset, leaving it with too little information to work with, or when biases sneak into the training data itself. Whatever the root cause, these hallucinations can lead the AI to wander away from the facts, common sense, or both.
Typically, when we ask a question or make a request, we expect a solid, relevant answer from the AI. But every now and then, the AI comes up with something totally out of left field—responses that aren’t based on its training data, that are misinterpreted by its own algorithms, or that just don’t fit any recognizable logic. In other words, it starts “hallucinating” its answer.
AI hallucinations can sometimes lead to what’s called “generative anthropomorphism”—basically when people start feeling like the AI has human-like qualities. It happens when users think what the AI generates is real, even if it’s creating mythical scenes or writing things that don’t make logical sense. It’s like a mirage that tricks users into believing something imaginary is real, which can even affect their decisions.
LLMs (Large Language Models) are often “black box” systems, meaning no one really knows why they produce certain hallucinations. Fixing these errors isn’t easy either. The models have a set training cutoff, and going back to tweak the training data can be energy-intensive and costly. Plus, AI infrastructure itself is expensive, so finding and fixing the root cause of a hallucination can quickly rack up costs. That’s why it often falls on users—not the AI providers—to keep an eye out for any hallucinations and use a bit of caution when interpreting AI responses. That being said, there are some things that can be done to reduce these hallucinations.
How to prevent AI Hallucinations?
Train your system with real-life niche documents
When training an AI model, it’s important to stick to the relevant data. If you're building a model to identify text and images in engineering drawings, you’ll want to feed it various types of engineering documents. Accurate, well-labeled data is a must—poor or missing labels lead to misunderstandings and faulty predictions. Similarly, avoid synthetic data unless it precisely matches the layout and style of real engineering drawings; otherwise, it can confuse the model.
Finally, ensure your training data is diverse, balanced, and well-structured. This approach reduces bias and improves the model’s accuracy and reliability. In short, relevant, high-quality data sets are the key to building smarter, more effective AI models.
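To make this concrete, here’s a minimal sketch of such a pre-training sanity check in Python. The file name drawing_annotations.csv and the label column are hypothetical, and pandas is assumed to be available:

```python
# Minimal pre-training data check (hypothetical file and column names).
import pandas as pd

# Assume each row pairs a snippet from an engineering drawing with a label
# such as "dimension", "tolerance", or "material".
df = pd.read_csv("drawing_annotations.csv")

# Flag rows with missing or empty labels before they reach the training set.
missing = df[df["label"].isna() | (df["label"].astype(str).str.strip() == "")]
print(f"Rows with missing labels: {len(missing)}")

# Check class balance: a heavily skewed label distribution is a common source of bias.
print(df["label"].value_counts(normalize=True).round(3))
```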
Get the knack for prompt engineering
You need to engineer your prompts in a certain way to prevent hallucinations. There are different prompt techniques you can try out. Here’s a strategic approach:
- Role-Based and Goal-Based Prompts define the context and purpose of the response. For example, assign a specific role and objective: “This is a component manufacturing drawing, extract the critical values specified in the catalog.”
- Conditional Prompting reminds the AI to handle uncertainty carefully, e.g., “If unsure, request clarification rather than speculate.”
- Instructional Prompting gives clear guidance on how to approach the task. For example, “The value extracted should include both the upper and lower limit.”
- Example-Driven Prompting provides a template for the AI to analyze and create similar responses, e.g., “Here’s a sample of tolerance values extracted in tabular format. Create similar findings [insert the sample].”
- Sequential Prompting: This is slightly more complex than the previous ones and is also related to the concept of contextual stacking. Read more about this in the next section.
“In simple terms, whenever you enter a prompt, just make sure to specify the role, goal, and instruction with an example and mention not to give any data if it’s not present in the document.”
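To make this concrete, here’s a minimal sketch that folds role, goal, instruction, conditional handling, and an example into one extraction prompt. It assumes the OpenAI Python SDK (v1.x); the model name, the drawing text, and the example output format are placeholders, so adapt them to whatever stack you actually use:

```python
# Minimal sketch of a role/goal/instruction/example prompt (OpenAI Python SDK v1.x assumed).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Role + goal: tell the model what it is looking at and what to achieve.
role_and_goal = (
    "This is a component manufacturing drawing. "
    "Extract the critical tolerance values specified in the catalog."
)

# Instructional + conditional: how to answer, and what to do when unsure.
instructions = (
    "Each extracted value must include both the upper and lower limit. "
    "If a value is not present in the document, reply 'NOT FOUND' rather than guessing."
)

# Example-driven: show the output format you expect.
example = "Example row: | Bore diameter | 24.98 mm | 25.02 mm |"

document_text = "<paste the drawing text or OCR output here>"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=0,        # keep extraction deterministic
    messages=[
        {"role": "system", "content": role_and_goal},
        {"role": "user", "content": f"{instructions}\n{example}\n\nDocument:\n{document_text}"},
    ],
)
print(response.choices[0].message.content)
```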
Try out these prompts multiple times until you find the ones that give the most accurate responses. Finally, check these answers with industry experts to make sure your prompts are actually hitting the mark. A lot of research and trial and error goes into making an AI model perform well.
Why is it important to engineer your prompts?
There’s no “one-size-fits-all” in AI. For example, if you develop an image detection model, you can’t expect it to handle all types of images—like engineering drawings, medical X-rays, and property condition photos—with equal accuracy. Even though you use the same technology, each use case needs its own customization. You really need to narrow down to the specific task you’re trying to perform with AI. And this is not easy. That’s why it’s important to provide clear, task-specific prompts to ensure the accuracy of your results.
Build the context
Now, let’s dive into context stacking. The idea here is to build up the AI’s understanding by stacking bits of information or prompts, helping it minimize irrelevant or incorrect outputs. Here’s a breakdown:
- Incremental Contextual Input: Rather than giving one big question, feed smaller, layered chunks of context. For example, an AI analyzing a climate report would first receive broad information about the report type, followed by the climate trends and geographical specifics.
- Sequential Prompts: The AI processes each piece in sequence, starting with the overall structure, moving to specific notations, and ending with the finer details, which keeps responses aligned with the task. Here’s an example: imagine you need to extract certain data from a climate report. It’s a totally unstructured document in a custom template. To make sure the system does not hallucinate, you can break down your prompt into multiple sections or sequences like this (see the code sketch after this list):
> General Overview: "Explain the purpose of the report on climate data and summarize the key objectives of the research."
> Data Breakdown: "Describe the types of data used in the study, such as temperature changes, precipitation, and CO2 levels, and outline their sources."
> Analysis Methods: "List and explain the methods used to analyze climate data, including statistical tools and models."
> Findings and Conclusions: "Summarize the report’s key findings about climate trends and any recommendations for further research or action."
- Cross-Referencing External Data: Sometimes, context stacking includes cross-referencing outside sources or additional data to ensure accuracy, for example verifying temperature trends against similar years in historical data to confirm consistency.
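Here’s a minimal sketch of how that sequential stacking could look in code, again assuming the OpenAI Python SDK (v1.x) with a placeholder model name. Each prompt is sent along with the accumulated conversation, so every answer stays anchored to the report and to the answers that came before it:

```python
# Minimal sketch of sequential prompting / context stacking (OpenAI Python SDK v1.x assumed).
from openai import OpenAI

client = OpenAI()

report_text = "<paste the climate report text here>"

sequential_prompts = [
    "Explain the purpose of the report on climate data and summarize the key objectives of the research.",
    "Describe the types of data used in the study, such as temperature changes, precipitation, and CO2 levels, and outline their sources.",
    "List and explain the methods used to analyze climate data, including statistical tools and models.",
    "Summarize the report's key findings about climate trends and any recommendations for further research or action.",
]

# Start the stack with the document itself and a guardrail against guessing.
messages = [
    {"role": "system", "content": "Answer only from the report below. If the report does not contain the answer, say so."},
    {"role": "user", "content": f"Report:\n{report_text}"},
]

for prompt in sequential_prompts:
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model="gpt-4o-mini", temperature=0, messages=messages)
    answer = response.choices[0].message.content
    # Append the answer so the next prompt builds on everything established so far.
    messages.append({"role": "assistant", "content": answer})
    print(f"\n--- {prompt}\n{answer}")
```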
Adjust the temperature
There is a parameter called "temperature" which you can adjust to control the randomness or creativity of the AI's responses. This temperature setting influences how deterministic (predictable) or varied (creative) the outputs will be. The higher the temperature, the higher the chances of hallucinations.
Here’s how it works:
- Low Temperature (e.g., 0 to 0.5): A lower temperature makes the model’s output more focused and predictable, as it tends to choose high-probability words. This setting is ideal for tasks requiring accuracy and consistency, like technical explanations, straightforward data extraction, or any scenario where hallucinations (incorrect or unrelated answers) need to be minimized.
- High Temperature (e.g., 0.7 to 1): A higher temperature makes the output more diverse and creative, increasing the model's likelihood of selecting lower-probability words. There will be times when you have to use higher temperatures, i.e., when you seek subjective answers. This is useful in creative tasks, such as brainstorming or generating story ideas. However, the risk of hallucinations also increases.
- Dynamic Adjustment: Modern AI agents dynamically adjust their “temperature” based on the query type, giving them the flexibility to switch between precision and creativity. This makes them versatile, as they can respond in a way that feels both helpful and human-like, depending on what the user asks. Dynamic adjustments can work wonders when employed properly and open the door to immense possibilities in the world of modern AI agents. Stay tuned, we’ll be posting a detailed blog on this soon!
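For reference, here’s a minimal sketch of how the temperature parameter is set in an API call, assuming the OpenAI Python SDK (v1.x) with a placeholder model name:

```python
# Minimal sketch of the temperature knob (OpenAI Python SDK v1.x assumed).
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Low temperature: focused, repeatable answers for extraction-style tasks.
print(ask("Extract the tolerance range from: 'shaft diameter 10.00-10.05 mm'.", temperature=0))

# High temperature: more varied wording, useful for brainstorming, but with a higher risk of hallucination.
print(ask("Suggest five creative blog titles about AI hallucinations.", temperature=0.9))
```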
If you’ve scrolled straight to the end, here’s a quick summary. AI often makes things up; this is called hallucination (the fancier word for it is confabulation), and managing it can be tricky. Training your system with plenty of niche, relevant data and using the right prompts, with added context and checks, is the way out. A hallucination is nothing but the AI trying to fill in gaps, much like humans do when uncertain. Rather than viewing it as a drawback, the best approach is to modulate this tendency so it can handle both complex data-driven tasks and creative ventures.