Cheap models are not always cheap once workflow debugging is counted

fgghyyfk · June 25

I run a small cross-border import business. My background is a bit mixed: I did frontend work years ago, then product/UED, and since 2014 I have mostly been dealing with operations, process, people and cost.

AI pulled me back into building internal tools this year.

The first thing I tried to build was not a toy chatbot. I wanted an internal workflow for operations: different product categories, different compliance rules, different memories, and several agents handing work to each other.

My early assumption was simple: use cheaper models wherever possible, and only pay for expensive models when absolutely necessary. In practice, that was too naive.

For small tasks, cheaper models worked well: category classification, forbidden-word checks, simple extraction, anything with narrow input and narrow output. Those are easy to test and easy to roll back.

But when I used cheaper models for multi-agent workflows with external memory, things became expensive in a different way. Context from category A would leak into category B, handoffs between agents became messy, and I spent hours trying to debug whether the problem was the prompt, the memory layer, tool calls, retries, or the model itself.
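One way to picture the leak between categories is a memory layer that is not keyed by category. The sketch below is a minimal illustration, not the setup described in this post: `ScopedMemory` and `build_context` are hypothetical names, and the store is a plain in-process dictionary rather than a real external memory service.

```python
from collections import defaultdict


class ScopedMemory:
    """Toy memory store that keys every note by category (hypothetical design)."""

    def __init__(self):
        self._store = defaultdict(list)

    def remember(self, category, note):
        # Notes are only ever written under their own category key.
        self._store[category].append(note)

    def recall(self, category):
        # Return a copy so one agent cannot mutate what another agent reads.
        return list(self._store.get(category, []))


def build_context(memory, category, task):
    """Assemble a prompt context from a single category's notes only."""
    notes = memory.recall(category)
    return "\n".join([f"[category: {category}]", *notes, f"Task: {task}"])


memory = ScopedMemory()
memory.remember("cosmetics", "Ingredient list is mandatory")
memory.remember("toys", "Age label is mandatory")
toys_context = build_context(memory, "toys", "Check the new listing")
```

With a layout like this, a handoff that passes only `category` and `task` cannot drag category A's notes into category B's prompt, which also removes one suspect when debugging.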

One small GLM experiment also surprised me on cost. The lesson was not “GLM is expensive”; it was that uncontrolled context, retries and tool calls can make even a supposedly cheap setup painful.
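The arithmetic behind that surprise can be sketched roughly as follows. Everything here is an illustrative assumption: the function name, the price, the token counts, and the simplification that each tool call re-sends the full context and that a failed run is retried from scratch until it succeeds.

```python
def estimate_run_cost(price_per_1k_tokens, context_tokens, output_tokens,
                      tool_calls, retry_rate):
    """Rough expected cost of one workflow run (illustrative model only)."""
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    # Assumption: every tool call triggers another model call with full context.
    model_calls = 1 + tool_calls
    tokens_per_attempt = model_calls * (context_tokens + output_tokens)
    # Expected number of attempts when each attempt fails independently
    # with probability retry_rate and is retried until it succeeds.
    expected_attempts = 1 / (1 - retry_rate)
    return price_per_1k_tokens * tokens_per_attempt / 1000 * expected_attempts


# Made-up numbers: same per-token price, very different workloads.
narrow_task = estimate_run_cost(0.001, 500, 100, tool_calls=0, retry_rate=0.0)
agent_workflow = estimate_run_cost(0.001, 20000, 500, tool_calls=8, retry_rate=0.5)
```

Under these assumptions the per-token price never changes, yet the workflow run costs several hundred times more than the narrow task, purely from context size, tool calls and retries.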

My current rule is:

cheap models or relay providers for narrow, verifiable, low-risk tasks;
Claude/Codex-style tools for long-context, multi-agent, business-critical workflows;
never put core workflows on a provider path I cannot inspect, log, or replace.
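The three rules above can be written down as a small routing function. This is a sketch under stated assumptions: the `Task` and `Provider` fields, the tier names and the provider names are all hypothetical, and in practice the flags would come from your own judgement of each task.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    narrow: bool       # small, well-defined input and output
    verifiable: bool   # result can be checked automatically
    low_risk: bool     # a wrong answer is cheap to roll back
    core: bool         # business-critical workflow


@dataclass
class Provider:
    name: str
    tier: str          # "cheap" or "premium" (hypothetical labels)
    inspectable: bool  # can be logged, inspected and replaced


def choose_provider(task, providers):
    """Pick the first provider that satisfies the three rules."""
    simple = task.narrow and task.verifiable and task.low_risk
    wanted_tier = "cheap" if simple else "premium"
    for provider in providers:
        if provider.tier != wanted_tier:
            continue
        # Rule 3: core workflows never run on an opaque provider path.
        if task.core and not provider.inspectable:
            continue
        return provider
    raise LookupError(f"no acceptable provider for task {task.name!r}")


providers = [
    Provider("relay-x", tier="cheap", inspectable=False),
    Provider("opaque-premium", tier="premium", inspectable=False),
    Provider("direct-premium", tier="premium", inspectable=True),
]
word_check = Task("forbidden-word check", True, True, True, core=False)
handoff = Task("compliance handoff", False, False, False, core=True)
```

Here `choose_provider(word_check, providers)` lands on the cheap relay, while `choose_provider(handoff, providers)` skips the opaque premium path and picks the inspectable one.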

The question I am still testing: where do you draw the line? At what point is saving tokens more expensive than just paying for the model that saves debugging time?
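One rough way to frame that line is to price debugging time next to token spend. The sketch below assumes the simplest possible model, where total cost is token cost plus debugging hours times an hourly rate; the function names and all figures are made up for illustration.

```python
def total_cost(token_cost, debug_hours, hourly_rate):
    """Token spend plus the cost of the time spent debugging."""
    return token_cost + debug_hours * hourly_rate


def break_even_debug_hours(cheap_token_cost, premium_token_cost, hourly_rate):
    """Extra debugging hours at which the cheap setup stops being cheaper."""
    return (premium_token_cost - cheap_token_cost) / hourly_rate


# Made-up monthly figures: 5 vs 65 in token spend, time valued at 30 per hour.
hours = break_even_debug_hours(5.0, 65.0, 30.0)
cheap_with_debugging = total_cost(5.0, debug_hours=3, hourly_rate=30.0)
premium_no_debugging = total_cost(65.0, debug_hours=0, hourly_rate=30.0)
```

With these numbers the cheap setup only wins if it costs fewer than two extra hours of debugging per month; at three hours it is already the more expensive option.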

plumberg · June 25

Start saving by not wasting tokens to write this

itachikonoha · June 25

Your question is subjective, and the answer also depends on how much knowledge and experience you have with the tools you will be using.

fgghyyfk · June 25

That’s fair. The more I understand the stack, the more I can push cheap models safely. When the failure mode is opaque, I usually end up paying for stability instead.

ScreenReader · June 25

this is why /plan is invented for your agentic gooning

dbadude · June 25

amen

stable_genius · June 25

You say little about the quality of the data you're feeding into the AI. Having relevant, complete and unpolluted data available is crucial to achieving good results.

The quality of your data sets the absolute ceiling for AI performance. If the data you're feeding in is of dubious quality, the results will not be good no matter which model you use.
