You are an absolute moron for believing in the hype of “AI Agents”.


I can’t even browse LinkedIn without seeing some product manager hyping agents as being “just around the corner”.

And before you jump into the comments section, I am not biased. I’ve worked with large language models since before ChatGPT, back when GPT-3 on the OpenAI website only predicted the next words in a sentence (as opposed to the now-familiar chat interface).

I’ve built AI applications from scratch and trained all types of AI models. I’ve taken Deep Learning courses at the best AI and Computer Science school in the world, Carnegie Mellon, and obtained my Master’s Degree there.

And yet, when I see yet another video on my TikTok feed, I can’t help but cringe and think about how “Web 3 was going to transform the internet”.

Like I swear, this must be bot farms, ignorant non-technical people, and manufactured hype from OpenAI so that they can receive more funding. I mean, how many software engineers do you know who have released production-ready agents?

That’s right. None.

Here’s why all of this manufactured hype is nonsense.

What is an “AI Agent”?

Agents actually have a long history within artificial intelligence. In recent times, since the release of ChatGPT, the term has come to mean a large language model structured to perform reasoning and complete tasks autonomously.

This model MIGHT be fine-tuned with reinforcement learning, but in practice people tend to just use OpenAI’s GPT, Google Gemini, or Anthropic’s Claude.

The difference between an agent and a language model is that agents complete tasks autonomously.

Here’s an example.

I have an algorithmic trading and financial research platform, NexusTrade.

NexusTrade — AI-Powered Algorithmic Trading Platform

Learn to conquer the market by deploying no-code algorithmic trading strategies.

nexustrade.io

Let’s say I wanted to stop paying an external data provider to get fundamental data for US companies.

With traditional language models, I would have to write code that interacts with them. This would look like the following:

  1. Build a script that scrapes the SEC website or use a GitHub repo to fetch company information (conforming to the 10 requests per second guideline in their terms of service)
  2. Use a Python library like pypdf to transform the PDFs to text
  3. Send it to a large language model to format the data
  4. Validate the response
  5. Save it in the database
  6. Repeat for all companies
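To make that concrete, here’s a rough sketch of what that hand-written pipeline could look like. Treat it as illustrative only: the JSON schema, the placeholder `filings` dictionary, and the contact email are made up, and a real version needs proper rate limiting, retries, and error handling.

```python
import json
import time
from io import BytesIO

import requests
from openai import OpenAI
from pymongo import MongoClient
from pypdf import PdfReader

client = OpenAI()
db = MongoClient()["fundamentals"]

# Placeholder: ticker -> filing URL, built from an SEC index (not shown here)
filings: dict[str, str] = {}


def fetch_filing(url: str) -> bytes:
    # Stay under the SEC's ~10 requests/second fair-access guideline
    time.sleep(0.1)
    resp = requests.get(url, headers={"User-Agent": "research contact@example.com"})
    resp.raise_for_status()
    return resp.content


def pdf_to_text(pdf_bytes: bytes) -> str:
    reader = PdfReader(BytesIO(pdf_bytes))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_fundamentals(text: str) -> dict:
    # Step 3: ask the model to reshape raw filing text into a fixed JSON schema
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Extract revenue, net income, and EPS as JSON."},
            {"role": "user", "content": text[:50_000]},  # crude truncation
        ],
    )
    return json.loads(resp.choices[0].message.content)


def validate(record: dict) -> bool:
    # Step 4: deterministic sanity checks before anything touches the database
    return all(key in record for key in ("revenue", "net_income", "eps"))


for ticker, url in filings.items():
    record = extract_fundamentals(pdf_to_text(fetch_filing(url)))
    if validate(record):
        db.companies.update_one({"ticker": ticker}, {"$set": record}, upsert=True)
```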

With an AI agent, you should theoretically just be able to say:

Scrape the past and future historical data for all US companies and save it to a MongoDB database

Maybe it’ll ask you some clarifying questions. It might ask if you have an idea for what the schema should look like or which information is most important.

But the idea is you give it a goal and it will complete the task fully autonomously.
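Under the hood, most “agents” today are just a model in a loop: the model picks a tool, your code runs it, the result goes back into the conversation, and this repeats until the model stops asking for tools. Here’s a minimal sketch using OpenAI-style tool calling; the tool names and schemas are made up for illustration, and frameworks like LangChain wrap this same loop with more ceremony.

```python
import json

from openai import OpenAI

client = OpenAI()

# Hypothetical tools the agent can invoke; real ones would wrap the pipeline above
TOOLS = [
    {"type": "function", "function": {
        "name": "scrape_filings",
        "description": "Download SEC filings for a ticker",
        "parameters": {"type": "object",
                       "properties": {"ticker": {"type": "string"}},
                       "required": ["ticker"]}}},
    {"type": "function", "function": {
        "name": "save_to_mongo",
        "description": "Persist an extracted record",
        "parameters": {"type": "object",
                       "properties": {"record": {"type": "object"}},
                       "required": ["record"]}}},
]


def run_tool(name: str, args: dict):
    ...  # dispatch to real implementations (stubbed here)


messages = [{"role": "user", "content":
             "Scrape fundamental data for all US companies and save it to MongoDB."}]

while True:
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:           # the model thinks it's done (or gave up)
        break
    for call in msg.tool_calls:      # execute whatever the model asked for
        result = run_tool(call.function.name, json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": json.dumps(result, default=str)})
```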

Sounds too good to be true, right?

That’s because it is.

The Problem with AI Agents in Practice

Now, if the cheapest small language model were free, as strong as Claude 3.5, and could be run locally on any AWS T2 instance, then this article would have a completely different tone.

It wouldn’t be a critique. It’d be a warning.

However, as it stands, AI agents do not work in the real world, and here’s why.

1. Smaller Models are not NEARLY strong enough

The core problem of agents is that they rely on large language models.

More specifically, they rely on a GOOD model.

GPT-4o mini, the cheapest large language model other than Gemini Flash, is AMAZING for the price.

GPT-4o mini vs GPT-3.5 Turbo. I just tried out the new model and am BEYOND Impressed

We are entering a new era of inexpensive language models

medium.com

But it is quite simply not strong enough to complete real world agentic tasks.

It will steer off course, forget its goals, or just make simple mistakes no matter how well you prompt it.

And if it’s deployed live, your business will pay the price. When the language model makes a mistake, it’s not easy to detect unless you also build a (likely LLM-based) validation framework. One small error at the beginning, and everything downstream is cooked.
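In practice that “validation framework” is usually just another model call. Something like the sketch below, where a second (typically stronger) model grades the first one’s output; the prompt and the PASS/FAIL convention are illustrative. Notice that this doubles the number of model calls per step, which matters for the cost discussion below.

```python
from openai import OpenAI

client = OpenAI()


def llm_validate(task: str, output: str) -> bool:
    """Ask a (stronger) model whether the agent's output actually satisfies the task."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # the validator usually needs to be at least as strong as the worker
        messages=[
            {"role": "system",
             "content": "You are a strict reviewer. Answer only PASS or FAIL."},
            {"role": "user",
             "content": f"Task:\n{task}\n\nOutput:\n{output}\n\n"
                        "Does the output fully satisfy the task?"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("PASS")
```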

In practice, here’s how this works.

2. Compounding of Errors

Let’s say you’re using GPT-4o-mini for agentic work.

Your agent breaks the task of extracting financial information for a company into smaller subtasks. Let’s say the probability it does each subtask correctly is 90%.

With this, the errors compound. If a task is even moderately difficult, with four subtasks, the probability of the final output being correct drops fast.

For example, if we break this down:

  1. The probability of completing one subtask is 90%
  2. The probability of completing two subtasks is 0.9*0.9 = 81%
  3. The probability of completing four subtasks is 0.9*0.9*0.9*0.9 ≈ 66%

See where I’m headed?

To mitigate this, you will want to use a better language model. The stronger model might increase the accuracy of each subtask to 99%. After four subtasks, the final accuracy is 96%. A whole lot better (but still not perfect).
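The arithmetic here is nothing fancy, just independent success probabilities multiplied together; you can sanity-check the numbers above in a couple of lines:

```python
def chain_success(p_subtask: float, n_subtasks: int) -> float:
    """Probability that every subtask in a chain succeeds, assuming independent errors."""
    return p_subtask ** n_subtasks


print(f"{chain_success(0.90, 4):.0%}")  # ~66% with a weaker model
print(f"{chain_success(0.99, 4):.0%}")  # ~96% with a stronger model
```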

Most importantly, changing to these stronger models comes with an explosion of costs.

3. Explosion of costs

The cost difference between OpenAI’s o1 model and GPT-4o-mini

Once you switch to the stronger OpenAI models, you will see how your costs explode.

The pink and orange lines are the costs of OpenAI’s o1. I use it maybe 4–5 times per day for extremely intense tasks, like generating syntactically valid queries for stock analysis.

How to use Gemini, Claude, and OpenAI to create trading strategies that DESTROY the market.

Large language models have enabled a new way of interacting with the stock market, available to everybody. Instead of…

nexustrade.io

The lime green and dark blue lines are GPT-4o-mini. That model sees hundreds of requests every day, and the final costs are a small fraction of what o1 costs.
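To put rough numbers on the gap: assuming the list prices published at the time (roughly $15 per million input tokens and $60 per million output tokens for o1, versus $0.15 and $0.60 for GPT-4o mini; check OpenAI’s pricing page, since these change), the same request costs about 100x more on o1:

```python
# Approximate list prices (USD per 1M tokens); verify against OpenAI's pricing page
PRICES = {
    "o1":          {"input": 15.00, "output": 60.00},
    "gpt-4o-mini": {"input":  0.15, "output":  0.60},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# One "intense" request: ~8k tokens in, ~2k tokens out
for model in PRICES:
    print(model, f"${request_cost(model, 8_000, 2_000):.4f}")
# o1          ~$0.24 per request
# gpt-4o-mini ~$0.0024 per request, about 100x cheaper
```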

Moreover, even after all of this, you still need to validate the final output. And you’re going to be using the stronger models for that validation, for the same reasons.

So you use larger models for the agent, and you use larger models for validation.

See why I think this is an OpenAI conspiracy?

Finally, shifting your work from writing code to working with models has massive side effects.

4. You’re creating work with non-deterministic outcomes

With LLM agents, the whole paradigm of your work shifts into a data-science-esque approach.

Instead of writing deterministic code that is cheap to run anywhere, even on an Arduino (or, in practice, a T2 micro instance from AWS), you are writing non-deterministic prompts for a model running on a cluster of GPUs.
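The difference is easy to feel on a toy task, say pulling a revenue figure out of a line of text. The deterministic version below gives the same answer every time, runs anywhere for free, and can be unit-tested; the prompt-based equivalent is a paid network call to someone else’s GPUs that may answer differently tomorrow.

```python
import re


def extract_revenue(line: str) -> float | None:
    """Deterministic: same input, same output, runs anywhere, costs nothing."""
    match = re.search(r"revenue[^0-9$]*\$?([\d,.]+)\s*(million|billion)?", line, re.I)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    scale = {"million": 1e6, "billion": 1e9}.get((match.group(2) or "").lower(), 1)
    return value * scale


print(extract_revenue("Total revenue was $3,207 million for the quarter"))  # 3207000000.0
```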

If you’re “lucky”, you are running your own GPUs with fine-tuned models, but it’s still going to cost you an arm and a leg just to maintain agents to do simple tasks.

And if you’re unlucky, you’re completely locked into OpenAI; your prompts outright won’t work if you try to move, and they can slowly increase the price as you’re running critical business processes using their APIs.

And before you say “you can use OpenRouter to switch models easily”, think again. The output of Anthropic’s models is different from the output of OpenAI’s.

So you’ll have to re-engineer the prompts across your entire stack, costing a fortune, just to get a marginal improvement in final performance from another LLM provider.

See what the problem is?

Concluding Thoughts

It seems almost a certainty that when I see a post about agents, it is from someone who has not used language models in practice.

As you can imagine, this is absolutely infuriating.

I am not saying AI doesn’t have its use-cases. Even agents may have real value a few years from now, assisting engineers in writing simple code.

But no reasonable company is going to replace their operations team with a suite of extremely expensive, error-prone agents in order to run critical processes for their business.

And if they try, we will all see with our own eyes how they go bankrupt in two years. They’ll be a lesson in business textbooks, and OpenAI will make an additional $1 billion in revenue.

Mark my words.

Thank you for reading! I hope you enjoyed my focused rant on why AI agents aren’t the next best thing since sliced bread. If you’re looking for actual use-cases of AI, I gotchu!

I built NexusTrade, an AI-Powered algorithmic trading platform. NexusTrade helps retail investors create, test, and deploy algorithmic trading strategies and perform DEEP financial research.

I’m up 200% since incorporating NexusTrade strategies for my investing. Join today and see the difference AI can truly make when informing your investing decisions!
