The Biggest Problem with Generative AI No One Is Talking About

If you put the same prompt into ChatGPT twice, you will generally get a different (albeit similar) output each time. For example (GPT-4o):

Chat 1:

Chat 2:

Why is this, when I asked for the same thing? To answer that, we first have to understand how ChatGPT decides what it wants to say.

How does ChatGPT know what to say?

What ChatGPT does is predict the word with the highest probability of coming next in the sentence, based on its training data (the internet). Think of a really advanced version of the predictive text on your phone, as if you were to keep pressing the middle suggestion at the top of your keyboard over and over.
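To make that concrete, here is a toy sketch in Python of “pick the most likely next word”. The words and probabilities are entirely made up for illustration; the real model works over tens of thousands of tokens with probabilities it has learned.

```python
# Toy illustration of next-word prediction (made-up probabilities, not the real model).
prompt = "The cat sat on the"

# Pretend these are the model's learned probabilities for the next word.
next_word_probs = {
    "mat": 0.62,
    "sofa": 0.21,
    "roof": 0.09,
    "moon": 0.08,
}

# Pick the word with the highest probability and append it to the sentence.
best_word = max(next_word_probs, key=next_word_probs.get)
print(prompt, best_word)  # The cat sat on the mat
```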

Why do we get different outputs?

ChatGPT has a setting called temperature, which controls how random its choice of the next word is. The higher the temperature, the more random (and creative) ChatGPT will be. The lower the temperature, the more conservative ChatGPT will be, and the more likely you are to get the same output from the same input prompt. Unfortunately, the ChatGPT interface gives us no control over temperature (and despite what the AI gurus say, you cannot set it just by writing it into the prompt), so I can’t show you the difference if we change it.
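For the curious, here is a minimal sketch of what temperature does to the word choice. The scores and words are made up and this is not OpenAI’s code; it just shows how scaling the scores before sampling makes the output more or less predictable.

```python
import math
import random

# Made-up "scores" for candidate next words (illustration only).
logits = {"mat": 2.0, "sofa": 1.0, "roof": 0.2, "moon": 0.1}

def sample_word(logits, temperature):
    # Divide the scores by the temperature, then convert them to probabilities (softmax).
    # A low temperature sharpens the distribution; a high temperature flattens it.
    scaled = {word: score / temperature for word, score in logits.items()}
    total = sum(math.exp(s) for s in scaled.values())
    probs = {word: math.exp(s) / total for word, s in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

print([sample_word(logits, 0.2) for _ in range(5)])  # almost always "mat"
print([sample_word(logits, 2.0) for _ in range(5)])  # a much more varied mix
# A temperature of 0 is treated as "always take the top-scoring word" (greedy).
```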

But in a sense, none of that even matters. Even if the temperature is 0, the same input prompt can still produce a different output, and this is a major problem. (At temperature 0 the model is supposed to always pick the most likely word, yet in practice small numerical effects, such as floating-point rounding on GPUs and how requests are batched together on the servers, can nudge the probabilities just enough to change which word comes out on top.)
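To see how tiny numerical effects can creep in at all, here is a small illustration of a general property of floating-point arithmetic (nothing OpenAI-specific): adding the same numbers in a different order can give a slightly different answer, and on GPUs the order of such additions can vary from run to run.

```python
# Floating-point addition is not associative: grouping changes the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 vs 0.6

# Summing the same values in a different order can also differ slightly.
import random
values = [0.1, 1e16, -1e16, 0.2, 3.3e-7] * 1000
shuffled = values[:]
random.shuffle(shuffled)
print(sum(values) == sum(shuffled))  # usually False
```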

Practical Use Problems

In highly regulated industries like healthcare and banking especially, before a piece of software can be used, we must know what it is going to output given a specific input. We must also be confident that, given the same input again, we will get the same output from the software. This is the basis of software testing. Imagine “DoctorGPT” diagnosing whether you have lung cancer: half the time it says you don’t and the other half it says you do. What a nightmare that would be.

Extreme examples aside, this randomness is a serious cause for concern when it comes to ensuring consistency in generative AI solutions. So why do we see so many potential use cases for it almost everywhere, the most highly regulated industries included?

Executives are always in need of new toys

It has always baffled me that some of the biggest concerns with traditional machine learning (pre-ChatGPT) in industry were around trust and transparency, namely “black-box” models. Yet when ChatGPT (the biggest black box of them all) was released into the world, these same people were first in line to think of new, extravagant (and sometimes dangerous) ways to use the technology. Ultimately, traditional machine learning lacks the wow factor generative AI has; it simply isn’t as sexy. But how can we depend on inconsistent outputs?

Simple answer: we can’t. Humans do not trust anything or anyone that says one thing one day and something different the next. It’s the same reason you don’t trust politicians or your ex-partner. Some may argue that if the outputs are “similar”, then it is okay, provided the contexts are the same. My response would be: define “similar”. This is where you start to need business domain expertise to validate that the outputs are indeed “similar”. And if you need a business expert to check every AI output (i.e. a human-in-the-loop), why have the AI in the first place? It wastes the human’s time, as they essentially have to monitor the AI to ensure it does not make a mistake. Why not just let them get on with their job as they always have done?

“Yes, we are doing AI” – Every CEO in 2024

Unfortunately, executive leadership doesn’t think that way. Some executives like to throw all their toys out of the pram if a competitor is having success with a technology they aren’t. Somehow it is not their fault; instead, it is the fault of their employees for not having the foresight. The employees then scramble to produce something similar, but they will almost always fail, because you cannot develop and productionize an AI solution in a matter of months, especially if you don’t have the right expertise.

So what is the solution here? Well, my advice to anyone who asks is this: if you cannot get the basics of advanced analytics working in your organization (I’m talking basic machine learning like random forests and k-means clustering), what makes you think you can develop and productionize a generative AI solution, one that will generate different outputs every time, not to mention potentially false information? Take control of your company’s AI capabilities properly instead of inflating the generative AI bubble even more… believe me, it will pop, and your colleagues will throw you under the bus when it does.

The case for basic machine learning

I don’t dislike generative AI. The technology is revolutionary and has endless uses; however, I do think it is way overhyped. One key advantage of basic machine learning is that it is deterministic: you can put in the same input as many times as you please and you will always get the same output. Ask yourself: “Do I need an LLM, or do I just need a basic machine learning model?” Ask an expert who isn’t trying to sell you an AI solution and they will most likely tell you a simple technique will do.
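As a quick illustration of that determinism, here is a sketch using scikit-learn (assuming it is installed, and with the random seed fixed, as you would for any reproducible model). Train the same model twice on the same data and it gives identical predictions, every single time.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Two separately trained forests with the same data and the same seed.
model_a = RandomForestClassifier(random_state=42).fit(X, y)
model_b = RandomForestClassifier(random_state=42).fit(X, y)

# Same input, same output, every time.
print((model_a.predict(X) == model_b.predict(X)).all())  # True
print((model_a.predict(X) == model_a.predict(X)).all())  # True
```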

You can start building trust in AI by using simpler techniques before you move on to the big ones. Unless you are a billionaire, you wouldn’t commute to work every day by airplane; you would drive, take public transport, or walk, because that makes sense for your specific purpose of commuting. With this in mind, don’t use an airplane to commute to work!