How do I make agents respond faster?
AI agent response time scales with the combined length of the input and output text, measured in tokens per request. For example, 100 tokens is roughly 75 words, though this ratio varies by model, writing style, and language. Deliberate, focused instructions help your agents reason clearly and avoid unnecessary processing.
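As a quick way to reason about prompt budgets, the 100-tokens-per-75-words rule of thumb above can be turned into a rough estimator. This is only a heuristic sketch; actual token counts depend on the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from word count, using the
    ~100 tokens per 75 words rule of thumb. Real counts
    vary by tokenizer, writing style, and language."""
    words = len(text.split())
    return round(words * 100 / 75)
```

For precise counts, use the tokenizer that ships with your model instead of this approximation.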
Here are some recommendations to help your agents respond faster.
- Edit summarization prompts to include only essential instructions relevant to your specific use case. The default prompts for supervisor and workflow agents are broadly written to cover many scenarios, so remove any general or redundant sections to streamline processing and improve response speed.
- Minimize the use of input and output tokens with concise prompts. Use retrieval-augmented generation (RAG) instead of overloading the context window, and set specific output length limits.
- Specify response length to avoid overly verbose answers.
- Use smaller, specialized agents that work in parallel, and cache static instructions so they are not reprocessed on every request.
- Include only relevant information. Remove unnecessary details from context and documents to streamline processing.
- Consider a multi-agent approach, where specialized agents handle distinct tasks, such as research, coding, or Q&A, instead of loading all information into a single prompt.
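The parallel multi-agent pattern from the list above can be sketched with Python's standard library. The agent functions here are hypothetical stand-ins for real model calls; in practice each would wrap a request with its own narrow system prompt.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialized agents; in practice each would call a model
# with a short, task-specific system prompt (research, coding, Q&A).
def research_agent(query: str) -> str:
    return f"research: {query}"

def coding_agent(query: str) -> str:
    return f"code: {query}"

def qa_agent(query: str) -> str:
    return f"answer: {query}"

def run_in_parallel(query: str) -> dict:
    """Fan the query out to specialized agents concurrently instead of
    sending one oversized prompt to a single agent."""
    agents = {"research": research_agent, "coding": coding_agent, "qa": qa_agent}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in agents.items()}
        return {name: f.result() for name, f in futures.items()}
```

Because each agent sees only the context relevant to its task, every individual request is shorter, and the slowest agent, rather than the sum of all of them, bounds the overall latency.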
By carefully selecting relevant context and breaking complex tasks into smaller, focused agents, you can improve both the speed and efficiency of your AI agents.
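As an illustration of selecting relevant context, here is a minimal sketch that trims retrieved chunks (assumed to be pre-sorted by relevance, as a RAG retriever typically returns them) to a fixed token budget, using the rough words-to-tokens ratio mentioned earlier:

```python
def select_context(chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep the most relevant chunks (assumed pre-sorted by relevance)
    until the estimated token budget is exhausted."""
    selected, used = [], 0
    for chunk in chunks:
        # Rough words -> tokens estimate (~100 tokens per 75 words).
        cost = round(len(chunk.split()) * 100 / 75)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```

Capping the context this way keeps input token counts, and therefore response times, predictable even when the retriever returns many candidate passages.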