Many of you may already be familiar with Clues by Sam, but for those of you who are not, it is a very fun game. Basically, you are a detective in a room of suspects trying to figure out which of them are criminals and which are innocent. You start with one clue, and each time you correctly deduce a suspect's status, you are rewarded with a new clue (or a useless statement). A couple of important facts about the game: the suspects cannot lie, all statements are logically sound, and you can't just go accusing people of crimes (or innocence) without airtight reasoning.
As I was sitting on the toilet yesterday playing this game, I found myself thinking, "Wow, this particular day was very easy. I wonder if a robot can do it." For reference, here was my result from yesterday:
arghh perfect but for one small miscounting I solved the daily Clues by Sam (Aug 25th 2025) in less than 7 minutes 🟩🟩🟩🟩 🟩🟩🟩🟩 🟩🟩🟩🟩 🟨🟩🟩🟩 🟩🟩🟩🟩 cluesbysam.com
I made one small counting error but was able to pretty easily complete the puzzle in under 7 minutes.
Let's See How the Bot Does
This morning, I woke up bright and early and, not knowing what to do with myself, decided to see how the bot does on the same puzzle I did yesterday (I did it at like 11pm PST so it was already today's puzzle).
I wrote up this prompt for ChatGPT and hit enter, but after about five minutes of waiting, nothing really happened and I gave up.
You are a detective in a room of 20 suspects in a criminal case, where many of them likely were involved. You will be provided the layout of the room and then one statement given by a given suspect. You must then choose one suspect to label definitively as criminal or innocent given the information you have, which will reveal that suspect's statement. You must continue labeling suspects as innocent or criminal until you have decided on all 20. You may not make guesses, you must be sure that each suspect is the label you choose for it. Please briefly explain your reasoning for each decision in less than two sentences. Here is the layout of the suspects and their occupations:

Alice (Singer) || Betty (Judge) || Chris (Cop) || Donna (Judge)
Erwin (Painter) || Floyd (Sleuth) || Gus (Sleuth) || Hank (Judge)
Isaac (Painter) || Katie (Sleuth) || Martin (Singer) || Nancy (Cop)
Olive (Coder) || Paula (Coder) || Sofia (Coder) || Terry (Doctor)
Vicky (Builder) || Wally (Painter) || Xavi (Builder) || Zach (Doctor)

You know that Xavi is innocent.

Xavi says: "Gus is one of two or more criminals neighboring Floyd"
To keep things simple, I then remembered that my kagi.com subscription includes some basic LLMs, so I chose GPT-5 mini, as it was the biggest ChatGPT model available to me (I am not a serious AI researcher, feel free to try this with better models if you care).
We are off to the races! The first guess was predictably correct. Considering the clue explicitly fingers Gus as a criminal, I suspected this wouldn't be too hard.
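As an aside, the puzzle is small enough that you can brute-force deductions like this yourself. Below is a rough Python sketch of my own (nothing to do with the game's actual engine): it lays out the 5x4 room, assumes "neighboring" means the up-to-eight surrounding seats, diagonals included (my reading, not an official rule), and then checks which labels hold in every assignment consistent with the clues so far.

```python
# A rough sketch, not the game's engine: brute-force every possible
# criminal/innocent assignment and see which labels are forced.
# Assumption: the room is a 5x4 grid and "neighboring" means the up-to-eight
# surrounding seats, diagonals included (my reading, not an official rule).
from itertools import product

names = [
    "Alice", "Betty", "Chris",  "Donna",
    "Erwin", "Floyd", "Gus",    "Hank",
    "Isaac", "Katie", "Martin", "Nancy",
    "Olive", "Paula", "Sofia",  "Terry",
    "Vicky", "Wally", "Xavi",   "Zach",
]
COLS = 4
pos = {n: divmod(i, COLS) for i, n in enumerate(names)}  # name -> (row, col)

def neighbors(name):
    r, c = pos[name]
    return {n for n, (r2, c2) in pos.items()
            if (r2, c2) != (r, c) and abs(r2 - r) <= 1 and abs(c2 - c) <= 1}

FLOYD_NBRS = neighbors("Floyd")

def consistent(crim):  # crim: name -> True (criminal) / False (innocent)
    if crim["Xavi"]:  # given: Xavi is innocent
        return False
    # Xavi: "Gus is one of two or more criminals neighboring Floyd"
    return (crim["Gus"] and "Gus" in FLOYD_NBRS
            and sum(crim[n] for n in FLOYD_NBRS) >= 2)

# Stream through all 2^20 worlds (about a million; takes a few seconds)
# and record every label each suspect takes in some consistent world.
seen = {n: set() for n in names}
for bits in product((False, True), repeat=len(names)):
    crim = dict(zip(names, bits))
    if consistent(crim):
        for n in names:
            seen[n].add(crim[n])

# A label is a safe move only if it is the same in every consistent world.
for n in names:
    if len(seen[n]) == 1:
        print(n, "is", "criminal" if seen[n].pop() else "innocent")
```

With only the starting clue, the only forced labels it finds are Xavi (innocent, which was given) and Gus (criminal); every other suspect's label varies across consistent worlds, so anything else would be a guess, which is exactly what the game forbids.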
Immediately afterwards, I ran into a small but predictable hiccup: I forgot to clarify in my initial prompt that the criminals tell the truth, so it assumed they would lie.
Things actually went quite well from here for a bit. I wondered for a moment if LLMs were actually really well suited to this sort of thing, but then the first inkling of a problem appeared.
Told that "the only innocent below Hank is above Zach", ChatGPT incorrectly assumed that I meant directly above Zach. This was its first mistaken accusation of the game. It course-corrected pretty quickly, though, and got two more suspects right.
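Pinning down this kind of ambiguity is exactly what a precise encoding is good for. Here is how that clue would look under my reading, where "below Hank" and "above Zach" mean anywhere in the same column rather than directly adjacent (this reuses pos from the sketch above; the function name is mine):

```python
# Hypothetical encoding of "the only innocent below Hank is above Zach",
# reading above/below as same column, any distance (my assumption).
def clue_hank_zach(crim):
    hr, hc = pos["Hank"]
    zr, zc = pos["Zach"]
    below_hank = [n for n, (r, c) in pos.items() if c == hc and r > hr]
    innocents = [n for n in below_hank if not crim[n]]
    if len(innocents) != 1:  # "the only innocent" means exactly one
        return False
    ir, ic = pos[innocents[0]]
    return ic == zc and ir < zr  # and that innocent sits above Zach
```

Under that reading, Hank and Zach share the rightmost column, so "below Hank" covers Nancy, Terry, and Zach; the lone innocent must sit above Zach, and since Zach cannot sit above himself, Zach is forced to be a criminal while Nancy and Terry remain open. Read as "directly above Zach", the clue instead points straight at Terry, which is presumably where the fixation below came from.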
For some reason, though, it really wanted to accuse Terry. So it tried again:
It began to have a bad time at this point. I realized that I could have been clearer about the meanings of words like "neighbors", but clarifying did very little.
It immediately went and declared Betty innocent, then Nancy a criminal; neither label is guaranteed by the state of the board, so both were basically guesses.
From here, ChatGPT could make no progress. I don't know if it was its small context window or what, but it seemingly proceeded to forget the entire state of the board and the majority of all previous clues.
But not before it tried one sneaky little trick. You see, this is not just ChatGPT; it is Kagi Assistant, which uses the model under the hood but also has access to search.
So, like anybody perplexed by a puzzle they seemingly cannot crack, ChatGPT went and tried to search for the answer. It didn't help, though. I think all the extra information just confused its context window even more.
I tried a bit further to get ChatGPT on track.