How to choose a model for your agent
A practical, no-hype guide to picking the right model for your Totebot agent: the everyday default, when to go cheaper, when to go stronger, and what each choice costs against your plan.
Every Totebot agent runs on an AI model, and you can switch it whenever you like. The choice changes three things at once: how well your agent thinks, how fast it replies, and how much of your plan each reply consumes.
There is no single best model. There is a best model for a given job at a given budget. This guide gives you a way to decide, grounded in the models Totebot actually offers and what each one costs against your message allowance.
If you want the short answer first: keep the default, GPT-5.4 Mini, or switch to GPT-5.1 for a touch more capability at the same 1x cost. Read on for when to move cheaper and when to move stronger.
How model choice works in Totebot
Before choosing, it helps to know what the setting does.
You pick a model per agent in the agent's settings, choosing from models by OpenAI, Anthropic, and Google. Three behaviors matter in practice:
- New agents start on GPT-5.4 Mini. It is a fast, low-cost default that handles the common case well, so it is a sensible baseline.
- Switching is safe. Conversations that are already in progress finish on the model they started with. Only new conversations use your new choice, so you can change models without disrupting live chats.
- The Free plan offers a limited set of basic models. The selector shows what your plan can use. Paid plans unlock the full catalog, including the strongest models.
The message multiplier is your real cost lever
This is the part most people miss, and it matters more than anything else about a model.
Your plan gives you a pool of messages per month: 50 on Free, 1,000 on Basic, 6,000 on Pro, 20,000 on Scale. Every AI reply draws from that pool. How much it draws is the model's message multiplier.
- A multiplier of 1 means one reply costs one message from your pool.
- A multiplier of 2 means one reply costs two.
- The strongest models cost three or five messages per reply.
So the same plan buys very different amounts of conversation depending on the model. On the Pro plan with 6,000 messages:
| Model multiplier | Replies per month |
|---|---|
| 1x | 6,000 |
| 2x | 3,000 |
| 3x | 2,000 |
| 5x | 1,200 |
The ratios are the same on every plan. A 5x model always gives you a fifth of the replies a 1x model would.
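The arithmetic behind the table is a single division. Here is a minimal sketch using the pool sizes quoted above. The function name and the choice to round down to whole replies are assumptions made for illustration, not a description of how Totebot meters usage internally.

```python
# Message pools per plan, as quoted in this guide.
PLAN_POOLS = {"Free": 50, "Basic": 1_000, "Pro": 6_000, "Scale": 20_000}


def replies_per_month(pool: int, multiplier: int) -> int:
    """Whole replies a pool can pay for when each reply costs `multiplier` messages.

    Rounding down is an assumption: a partial reply is treated as not affordable.
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return pool // multiplier


# Reproduce the Pro plan table.
for multiplier in (1, 2, 3, 5):
    replies = replies_per_month(PLAN_POOLS["Pro"], multiplier)
    print(f"{multiplier}x -> {replies:,} replies")
```

Swap `"Pro"` for another plan name to see the same ratios on a different pool.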
Tip: If you use up your pool, the agent pauses until the next month unless you enable the pay-as-you-go add-on, which bills extra replies at 4 cents per message. The multiplier still applies there: one extra reply from a 5x model costs 20 cents, while one from a 1x model costs 4 cents. Picking a 1x model for routine traffic is the single biggest thing you can do to control cost.
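To see how the multiplier compounds on overage, here is a rough sketch of the pay-as-you-go arithmetic. The 4 cent rate comes from the tip above. The function name and the figure of 500 extra replies are hypothetical, and actual invoices may round or bundle charges differently.

```python
OVERAGE_CENTS_PER_MESSAGE = 4  # pay-as-you-go rate quoted in this guide


def overage_cost_cents(extra_replies: int, multiplier: int) -> int:
    """Cost in cents of replies sent after the monthly pool is used up."""
    return extra_replies * multiplier * OVERAGE_CENTS_PER_MESSAGE


# Hypothetical busy month: 500 replies beyond the pool.
for multiplier in (1, 5):
    cents = overage_cost_cents(500, multiplier)
    print(f"{multiplier}x model: {cents:,} cents")
```

Working in whole cents keeps the sums exact and avoids floating-point rounding.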
The three forces you are balancing
Every model choice trades between three things, and you rarely get all three at once.
- Quality of thinking. Can the model follow multi-step requests, look things up correctly, and stay coherent across a long conversation?
- Speed. How quickly does the reply arrive? Customers feel slowness directly, especially in a live web chat.
- Cost. How many messages does each reply consume, multiplied across thousands of conversations?
A useful way to hold this in your head: cheaper, faster models are excellent at the common case, and stronger models earn their multiplier on the hard case. Most agents see far more common cases than hard ones, which is why an everyday model wins for the majority.
The models, grouped by how you should think about them
Instead of ranking everything, sort the catalog into three groups by job. The selector in your dashboard shows each model's multiplier, so you always see the cost before you commit.
Everyday models (1x)
These are your defaults: strong all-rounders at one message per reply. GPT-5.4 Mini, the platform default, lives here, alongside models like GPT-5.1, Claude Haiku, and the fast Gemini models. For most support and sales agents, an everyday model is the right choice, full stop. If your volume is very high and your questions are simple, the catalog also includes lighter 1x models tuned for speed.
Step up (2x)
When better reasoning starts to pay off but you do not need the absolute top, models like GPT-5.4 and Gemini 3.1 Pro step up for double the message cost. They hold longer conversations together better and make stronger recommendations.
Maximum (3x to 5x)
The Claude Sonnet models (3x), and the Claude Opus models and GPT-5.5 (5x), are the most capable available. Reserve them for agents that genuinely need deep thinking, because they consume your pool fastest. Use them where a wrong answer is expensive, not for shipping questions.
Match the model to the job
The groups tell you what things cost. The job tells you what you need.
High-volume support with clear answers
If most conversations are shipping questions, returns, and product lookups, the answers already live in your knowledge base. The model's job is to find and phrase them well, not to think deeply.
Use a 1x everyday model. You get quick replies and the lowest cost on exactly the conversations you have most of.
Consultative sales
If your agent guides customers through considered purchases, compares options, and handles objections, quality pays off. A stronger model holds the thread of a longer conversation and makes better recommendations.
An everyday model covers most of this. If you sell high-value items where a better recommendation clearly drives revenue, a 2x model is worth it, because each conversation is worth more.
Complex troubleshooting and edge cases
If your agent diagnoses problems or walks customers through tricky multi-step help, the maximum group earns its multiplier. It makes fewer mistakes on long, complicated conversations, and a mistake there costs customer trust.
Images and attachments
If customers send photos or PDFs, confirm attachments are enabled on your plan (Basic and up), and lean toward the flagship models, which handle images most reliably.
Pick by plan
Your plan shapes the sensible choice, because it sets your message pool.
- Free (50 messages): you choose from a limited set of basic models, all at 1x. That keeps every reply cheap while you validate the agent; it is not meant for production volume.
- Basic (1,000 messages): a 1x everyday model. That is up to 1,000 replies. If you reach for stronger reasoning, remember a 3x model drops you to about 333.
- Pro (6,000 messages): an everyday model for the main agent, with room to run a second agent on a stronger model for a specialized job.
- Scale (20,000 messages): you have the most headroom. Run an everyday model for volume and reserve a stronger model for the conversations that deserve it.
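If you already have a rough idea of monthly volume, you can run the plan question the other way around: given the replies you expect and a model's multiplier, which is the smallest pool that covers them? The sketch below assumes the pool sizes listed above and skips the Free plan for anything above 1x, since Free offers only 1x models. The function name and the sample volumes are hypothetical.

```python
from typing import Optional

# Message pools per plan, as quoted in this guide.
PLAN_POOLS = {"Free": 50, "Basic": 1_000, "Pro": 6_000, "Scale": 20_000}


def smallest_plan_for(expected_replies: int, multiplier: int) -> Optional[str]:
    """Return the smallest plan whose pool covers the expected monthly replies."""
    messages_needed = expected_replies * multiplier
    for plan, pool in sorted(PLAN_POOLS.items(), key=lambda item: item[1]):
        if plan == "Free" and multiplier > 1:
            continue  # Free only offers 1x models
        if pool >= messages_needed:
            return plan
    return None  # nothing covers it without the pay-as-you-go add-on


print(smallest_plan_for(800, 1))    # Basic
print(smallest_plan_for(800, 3))    # Pro
print(smallest_plan_for(5_000, 5))  # None
```

The sketch looks only at message pools. It does not account for other plan differences, such as attachments.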
A simple decision process
If you want a starting point rather than a study:
- Start with the default, or GPT-5.1 if you want a touch more capability at the same cost.
- Run it for a week and read the conversations.
- Separate the failures. If the agent gives wrong answers because knowledge is missing, fix the knowledge, not the model. A bigger model cannot know facts you never gave it.
- If the agent has the right knowledge but still struggles on your hard conversations, move up one group and compare.
- If your conversations are simple and you never see hard cases, confirm a light 1x model holds quality and enjoy the speed.
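The review step in that process can be written down as a small lookup, which is handy if several people review conversations and you want consistent calls. This is an illustrative sketch only: the `Finding` categories and their names are assumptions, and the "keep the current model" fallback is implied by the process above, not stated in it.

```python
from enum import Enum, auto


class Finding(Enum):
    MISSING_KNOWLEDGE = auto()  # wrong answers because the facts were never provided
    STRUGGLES_ON_HARD = auto()  # knowledge is there, hard conversations still go badly
    ONLY_SIMPLE = auto()        # conversations are simple, no hard cases seen
    NO_ISSUES = auto()          # nothing notable after a week (assumed category)


def next_step(finding: Finding) -> str:
    """Map what a week of conversation review showed to the next action."""
    if finding is Finding.MISSING_KNOWLEDGE:
        return "fix the knowledge, keep the model"
    if finding is Finding.STRUGGLES_ON_HARD:
        return "move up one group and compare"
    if finding is Finding.ONLY_SIMPLE:
        return "try a lighter 1x model and confirm quality holds"
    return "keep the current model"


print(next_step(Finding.MISSING_KNOWLEDGE))  # fix the knowledge, keep the model
```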
The mistake to avoid is reaching for the most powerful model first. It feels safe, but it multiplies your cost on every message, including the thousands of simple ones, to buy thinking you only need on a few.
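A quick worked example makes the size of that mistake concrete. The traffic mix below (1,900 simple replies and 100 hard ones in a month) is hypothetical. The split scenario assumes the hard conversations go to a separate agent on a stronger model, and how they get routed there is outside this sketch.

```python
def messages_used(simple_replies: int, hard_replies: int,
                  simple_multiplier: int, hard_multiplier: int) -> int:
    """Messages drawn from the pool for a month of mixed traffic."""
    return simple_replies * simple_multiplier + hard_replies * hard_multiplier


simple, hard = 1_900, 100  # hypothetical monthly mix

everything_on_5x = messages_used(simple, hard, 5, 5)
split_by_job = messages_used(simple, hard, 1, 5)

print(f"All traffic on a 5x model: {everything_on_5x:,} messages")  # 10,000
print(f"1x for simple, 5x for hard: {split_by_job:,} messages")     # 2,400
```

With this mix, putting everything on the 5x model uses more than four times the messages of the split setup, and the hard conversations get the same model either way.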
Test the way your customers behave
Whatever you choose, validate it against real behavior, not a few clever prompts. Open the live preview and run the conversations your customers actually have:
- Your three most common questions
- One question that requires combining two pieces of knowledge
- One question you have no knowledge for, to see how it handles not knowing
- One long, meandering conversation that changes topic partway through
Run these on two candidate models and compare. Because ongoing conversations keep the model they started with, you can switch the agent and test freely without touching live chats. The right model is usually obvious within ten minutes of honest testing.
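If you want a record of the comparison, a simple scorecard is enough. The sketch below assumes you judge each conversation by hand in the live preview and note a pass or fail. The scenario labels mirror the list above, while the model labels and verdicts are placeholders, not real results.

```python
# The six test conversations suggested above.
SCENARIOS = [
    "common question 1",
    "common question 2",
    "common question 3",
    "combines two pieces of knowledge",
    "no knowledge available",
    "long conversation that changes topic",
]


def tally(verdicts: dict[str, dict[str, bool]]) -> dict[str, int]:
    """Count passed scenarios per model; unrecorded scenarios count as failures."""
    return {
        model: sum(results.get(scenario, False) for scenario in SCENARIOS)
        for model, results in verdicts.items()
    }


# Placeholder verdicts entered by hand after testing two candidates.
verdicts = {
    "candidate A": {s: True for s in SCENARIOS[:4]},
    "candidate B": {s: True for s in SCENARIOS},
}
print(tally(verdicts))  # {'candidate A': 4, 'candidate B': 6}
```

Weigh the scores against each model's multiplier. A cheaper model that passes the scenarios you see most often may still be the better pick.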
The short version
Start with GPT-5.4 Mini, the default, or GPT-5.1 as your everyday model. Step up to a 2x model for high-value sales, and reserve the 3x and 5x models for genuinely hard problems, remembering that a 5x model spends your plan five times as fast. Then let real conversations, not hype, settle the question.