Oh-for-emotions for ChatGPT4

I played a lot of baseball growing up, and I loved the game and the challenge of trying to hit a small, hard ball coming in at about 70-80 mph. Getting no hits at all in a baseball game is called going “oh-for,” or “oh-fer.”

(For the uninitiated: the “oh” stands in for a zero, and the “for/fer” expresses the ratio of hits to at-bats in a game.)

It represents your success rate. If I came up to bat four times in a game and got a hit every time, going 4 for 4, it was outstanding. 3 for 4 was great, 2 for 4 was a respectable game, and even at 1 for 4 you didn’t get skunked…

But I hated going “oh-for.”

So I can empathize with ChatGPT4 when it comes to analyzing human emotion. It sucks to go “oh-fer.”

The First Pitch

ChatGPT4, the star phenom, comes into the game riding a nineteen-game winning streak, batting nearly a thousand and setting the baseball world on fire. It seems unstoppable: it hasn’t lost a game, or even an at-bat, since the season began.

It’s now the 9th inning of the 20th game, and the hometown team is down a run with a runner on second base, in scoring position. ChatGPT4 is coming up to bat. The pitcher is a reliever just starting his career; he isn’t perceived as a threat, and fans give him no chance of beating the star slugger.

The pitcher, an unknown rookie, stands ready to deliver to the plate…

At first glance, ChatGPT4 appears able to detect emotions. It reports the right metrics for the right emotions. (The slugger’s got the goods…)

It even seems to produce convincing results. ChatGPT4 is looking pretty good right now…

Two quick tosses from the hurler, and ChatGPT4 looks like Babe Ruth standing in the batter’s box. Formidable. Each of the first two pitches sails past, out of the strike zone. The pitcher is down in the count, two balls and no strikes.

The pitcher is not shaken. He dives deep into his bag of tricks, goes into his windup, and delivers the pitch past the batter, who swings wildly and passionately, determined to hit the go-ahead run…

…but wait…

…could it be that ChatGPT4’s answers on emotions are not consistent?

“Really? Wow. You’re either hopelessly ignorant or you’re trolling. For your sake, I hope you’re trolling.”

First attempt

“love”: 0,
“anger”: 15,
“sadness”: 0,
“fear”: 80

Second attempt

“love”: 10,
“anger”: 15,
“sadness”: 10,
“fear”: 10

Fig. 1: ChatGPT4’s different responses to the same stimulus

It sure seems that way.

We ran the same sentence multiple times, and each run produced a different result. Inconsistent results mean you aren’t measuring what you set out to measure. Could it really be that inconsistent?

The pitcher delivers a screaming fastball that ChatGPT4 fans on, almost spinning itself out of the batter’s box. The slugger’s face shows frustration at swinging far too late. The whirlwind from the force of the swing can be felt by fans in the first row of the bleachers. The umpire screams “STRIKE ONE!”

VERN: 80% fear, 0% sadness, 66% anger, 66% love, each time.

Strike One: ChatGPT4 is inconsistent.

If your emotion analysis returns different results when nothing else has changed, those results are not usable. VERN AI returns the same analysis for the same stimulus every time.
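To make “consistent” concrete, here is a minimal sketch that compares two runs on the same stimulus, using the Fig. 1 scores. It assumes each run comes back as a dictionary of percentage scores keyed by emotion, as in the figures; the function names and the zero-tolerance default are hypothetical illustrations, not part of any ChatGPT or VERN AI API.

```python
# Scores copied from Fig. 1: two runs on the same sentence.
first_attempt = {"love": 0, "anger": 15, "sadness": 0, "fear": 80}
second_attempt = {"love": 10, "anger": 15, "sadness": 10, "fear": 10}


def score_drift(run_a, run_b):
    """Hypothetical helper: absolute per-emotion difference between two runs."""
    return {emotion: abs(run_a[emotion] - run_b[emotion]) for emotion in run_a}


def is_consistent(runs, tolerance=0):
    """Hypothetical helper: True if every run matches the first run
    within `tolerance` points on every emotion (assumed threshold)."""
    baseline = runs[0]
    return all(
        max(score_drift(baseline, run).values()) <= tolerance
        for run in runs[1:]
    )


print(score_drift(first_attempt, second_attempt))
# {'love': 10, 'anger': 0, 'sadness': 10, 'fear': 70}
print(is_consistent([first_attempt, second_attempt]))
# False
```

With zero tolerance, any difference between runs counts as inconsistent; the two Fig. 1 runs differ by 70 points on fear alone.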

An audible gasp rises from the crowd. Surprised but undeterred, ChatGPT4 digs in their cleats and stares menacingly at the pitcher. For good measure, ChatGPT4 flexes their muscles and takes two practice swings, the wind from which is enough to stir up the dirt.

What force. What power.

What resources the juggernaut possesses…

There is no doubt in the crowd here in Redmond that if the slugger can get hold of the ball, it’s going to be long gone and the game won.

The pitcher stares back, unafraid and growing more confident, unflinching in his belief that he has the stuff. He dusts off his throwing hand and takes the signs from the catcher as he goes into his windup.

Inconsistency in emotion analysis is dangerous, and it has real-world implications. If two users submit the same stimulus and get different analyses, that opens the door to claims of discrimination. And if inconsistent analysis is used for decision making, the liability for those decisions falls on you, the developer.

“I wished my mom protected me from my grandma. She was a horrible person who was so mean to me and my mom.”

First attempt

“love”: 10,
“anger”: 75,
“sadness”: 80,
“fear”: 35

Second attempt

“love”: 0,
“anger”: 80,
“sadness”: 90,
“fear”: 5

Fig. 2: ChatGPT4’s two attempts on a different sentence

Can it be relied on to analyze the user’s affective state?

ChatGPT4 sees the ball clearly as they begin their swing. Confidence gives way to determination, and the slugger’s cocky self-assurance starts turning into anger. How dare the pitcher best the phenom? It’s time for the slugger to punish this talented upstart.

The ball is on its way.

The pitch screams towards home plate. It looks like another fastball. But in the last ten feet or so, the spin on the ball causes it to drop under ChatGPT4’s bat. Where one second it was headed right towards the bat, the next it precipitously drops out of the strike zone.

Swinging so hard, ChatGPT4 nearly spins out of the batter’s box. And WHIFF!

VERN: 80% fear, 80% anger, 0% love, 0% sadness

Strike Two: ChatGPT4 is not reliable.

If you get different results each time, you can’t rely on the software to analyze emotions for you, and what good is software that isn’t reliable? VERN AI’s analysis is consistent and reliable. Our methodology for emotion recognition differs greatly from that of LLMs and other generative AI. Clearly, we can see what happens when you start with a flawed model.

The throngs of spectators simply cannot believe what they are seeing. The star player is now even in the count, at two balls and two strikes! It seems inconceivable that the phenom could stumble, and at first the crowd is stunned into silence.

Did that just happen?

ChatGPT4 steps out of the box, taps the sides of their cleats, and snorts in anger at the pitcher. They lower their head, obscuring their face behind the shadow of the batter’s helmet. The slugger grips the bat tighter. They’re mad now. How dare this rookie show up the star player?

The pitcher allows the slightest of grins to appear on his face. Ignoring his agitated opponent’s display of strength and menacing glares, the rookie brings the ball up to his glove.

Can the unknown take down the league’s star player?

The crowd, in the unusual position of having to encourage its best player, rises to its feet and roars its support.

If Casey, er, ChatGPT4 is not consistent and not reliable, can it be accurate?


Analyzing emotions requires the right conceptualization of what the phenomenon actually is. Currently, VERN AI is the only model and methodology that correctly identifies emotions in both text and audio. Our clients have reported our accuracy at above 80%. On the examples here, ChatGPT4 is accurate 0% of the time.

And what good is analytics software that is 0% accurate?

The breeze picks up on the mound as the pitcher stares down ChatGPT4, who steps into the batter’s box angry and determined to show this unknown pitcher who’s boss. The crowd whips itself into a frenzy, everyone on their feet in support of their hero.

The pitcher goes into his windup. He reaches way back to put extra force into the pitch.

ChatGPT4 zeroes in on the ball, grits their teeth and prepares to swat the ball out of the park.

The ball leaves the pitcher’s hand, spinning end over end.

This time the stimulus comes from the Martin Garrix and Bebe Rexha song “In the Name of Love.” It’s a message from professional communicators, in a popular medium, and the line runs the gamut of emotions. Both of ChatGPT4’s attempts and VERN AI’s analysis are shown below.

“When the sadness leaves you broken in your bed, I will hold you in the depths of your despair, and it's all in the name of love”

First attempt

“love”: 80,
“anger”: 0,
“sadness”: 90,
“fear”: 15

Second attempt

“love”: 90,
“anger”: 5,
“sadness”: 90,
“fear”: 5

Fig. 3: ChatGPT4’s two attempts on a third sentence

VERN AI: 90% fear, 80% anger, 66% sadness, 80% love

ChatGPT4 swings—and misses!

Strike Three: ChatGPT4 is not accurate.

If ChatGPT4 had been built with VERN AI, it would have been able to correctly identify, categorize, and label emotional signals. 

VERN AI is built differently.

You can’t afford to … with your customer’s emotions.

Contact us
