Wednesday, April 24, 2024

Making AI Less of a Hallucination

A Practical Take on Prompt Engineering

A robot looking at itself in a mirror, reaching out to touch its image. Source: DALL-E 3

Imagine crafting a sleek, minimalist generative AI application designed to smooth over our all-too-human shortcomings in spelling and grammar. Sounds practical, right? Well, here's a lesson I learned along the way: use prompt engineering to keep the app from producing pesky hallucinations.

Agile development

I quickly assembled this app using just fifty-ish lines of Python code. I also crafted a system prompt to guide the large language model (LLM) towards generating the desired output.

Here’s a peek at the original prompt:

You are an advanced language model trained to identify and correct spelling and grammar errors while enhancing the clarity and conciseness of professional communication. Please review the text provided below, correct any errors, revise the content for professional clarity and conciseness without altering the original meaning. Respond back to this prompt with your revised text, followed by an itemized list that highlights the corrections you made. Please format your response as markdown.
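The app wrapping that prompt is simple. Here's a minimal sketch of the idea, assuming the OpenAI Python SDK and an `OPENAI_API_KEY` environment variable; the model name and function names are illustrative, not the exact code from my repo:

```python
"""Minimal sketch of a copy-editing app: send a system prompt plus the
user's text to an LLM and return the markdown response."""

SYSTEM_PROMPT = (
    "You are an advanced language model trained to identify and correct "
    "spelling and grammar errors while enhancing the clarity and "
    "conciseness of professional communication. ..."  # full prompt above
)

def build_messages(text: str) -> list[dict]:
    """Pair the fixed system prompt with the user's text to review."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]

def proofread(text: str, model: str = "gpt-4o") -> str:
    """Call the Chat Completions API and return the model's markdown reply."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(text),
    )
    return response.choices[0].message.content
```

The whole trick is that the system prompt rides along with every request, so the model's behavior is steered before it ever sees the user's text.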

Well, perhaps too agile...

After running a few emails and blog posts through the app, I discovered that its output was a bit too footloose and fancy-free. The LLM got confused about splitting the results into the two distinct sections I wanted. Plus, it started seeing spelling and grammar errors that were not there: classic LLM hallucinations.

A confused robot, hallucinating. Source: DALL-E 3

The hallucinations were more frequent when the submitted text was clean in the first place. The LLM appeared confused, trying to please the boss by correcting nonexistent errors.

Here is the original input text:

[Screenshot: the original input text used to test the LLM]

And here is the output the LLM generated. The blue box highlights an "error" that isn't one, and the purple box highlights a "correction" to text that wasn't wrong:

[Screenshot: app output with the original system prompt]

Making adjustments

No problem, though. I sharpened my pencil and rewrote the prompt to be more specific about what I wanted to see and how it should be formatted:

You are an advanced model trained to identify and correct English language spelling and grammar errors while enhancing the clarity and conciseness of professional communication. Please review the text provided below, correct any errors, and revise the content for professional clarity and conciseness without altering the original meaning. Respond to this prompt with two sections. The first section shall be titled Revised Text:, and contain your revised text. The second section shall be titled Corrections:, and contain a bulleted list highlighting the corrections you made. If you cannot make corrections to the provided text, just say the provided text is grammatically correct. Finally, please emit your response in markdown format so it can be streamed inside a web application. From a styling perspective, when you generate the section headers, use level two markup, e.g., ## Revised Text:, ## Corrections:.

This revision of the prompt clearly states the requirements. It also emphasizes what the model should and should not return.
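Because the prompt now pins down the exact section headers, the app can sanity-check a reply before rendering it. This helper is a hypothetical addition for illustration, not part of the original fifty-ish lines:

```python
# Guard against format drift: confirm the model's markdown reply contains
# the two required level-two headers before streaming it to the browser.
REQUIRED_HEADERS = ("## Revised Text:", "## Corrections:")

def is_well_formed(reply: str) -> bool:
    """Return True if the reply follows the format the prompt demands."""
    # The prompt allows a fallback with no sections when nothing needs fixing.
    if "grammatically correct" in reply:
        return True
    return all(header in reply for header in REQUIRED_HEADERS)
```

A check like this turns a silently mangled response into something the app can detect and retry.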

Cleaner output

With that increased focus and direction, the LLM got its act together and emitted well-organized responses. The new prompt provided the necessary additional context, eliminating the hallucinations. Now the generated content is consistent and reliable, just as we expect.

[Screenshot: app output with the improved system prompt]

Feel free to fork

Curiosity piqued? Dive into the code on my GitHub and experiment with different models and prompts.

This example of AI refinement not only enhances written communication but also offers a glimpse into the future of personalized technology, accessible straight from your laptop. If you'd like to discuss this further, feel free to connect with me on LinkedIn.
