Yuval Avidani
The Shift From Answering to Doing
The landscape of Artificial Intelligence is shifting rapidly. We are moving from the era of "Chatbots" (passive responders that wait for input and produce text) to the era of "Agents" (active systems that can plan, execute, and course-correct toward goals). This isn't just an incremental improvement; it's a fundamental change in what AI systems can accomplish.
This matters to all of us as developers because agents represent the next major platform shift. Just as mobile apps expanded what software could do beyond desktop applications, agents expand what AI can do beyond chat interfaces. The developers who understand how to build effective agent systems will have an outsized impact over the next 3-5 years.
What Actually Defines an Agent?
An agent isn't just an LLM with a nice prompt. It's an LLM embedded in a system with specific capabilities that enable autonomous operation:
1. Tools: The Ability to Take Action
Agents can interact with the world beyond generating text:
```python
import json

import requests
from langchain.tools import Tool

# db, sandbox, and send_email are assumed application-level helpers

# Database access
search_tool = Tool(
    name="database_search",
    func=lambda q: db.query("SELECT * FROM products WHERE name LIKE ?", [f"%{q}%"]),
    description="Search the product database for items matching a query"
)

# Web browsing
browse_tool = Tool(
    name="web_browser",
    func=lambda url: requests.get(url).text[:5000],
    description="Fetch and return the contents of a webpage"
)

# Code execution (sandboxed)
code_tool = Tool(
    name="python_repl",
    func=lambda code: sandbox.run(code),  # sandboxed execution
    description="Execute Python code in a sandboxed environment and return the result"
)

# External APIs
email_tool = Tool(
    name="send_email",
    func=lambda args: send_email(json.loads(args)),
    description="Send an email. Args: {to: str, subject: str, body: str}"
)
```
These tools transform an LLM from "something that outputs text" to "something that can read databases, browse the web, run code, send emails, and interact with any API."
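To make the mechanics concrete, here is a minimal sketch of the dispatch step, assuming a generic `llm.complete(str) -> str` interface and a JSON reply convention (neither is part of LangChain; both are illustrative choices):

```python
import json

def dispatch(llm, tools: list, request: str) -> str:
    # Index the Tool objects defined above by name
    tool_index = {t.name: t for t in tools}
    menu = "\n".join(f"- {t.name}: {t.description}" for t in tools)
    prompt = (
        f"Available tools:\n{menu}\n\n"
        f"Request: {request}\n"
        'Reply with JSON: {"tool": "<name>", "input": "<argument>"}'
    )
    # llm.complete is an assumed generic interface, not a LangChain API
    choice = json.loads(llm.complete(prompt))
    tool = tool_index[choice["tool"]]
    return tool.func(choice["input"])
```

A real agent loop would repeat this step, feeding each result back to the model, until the model decides no further tool call is needed.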
2. Planning: Breaking Down Complex Goals
Agents can decompose high-level objectives into actionable steps:
```python
class PlanningAgent:
    def plan(self, goal: str) -> list[str]:
        prompt = f"""
        Goal: {goal}

        Break this goal into concrete, actionable steps.
        Each step should be something I can verify as complete.
        Order steps by dependency - earlier steps first.

        Steps:
        """
        response = self.llm.complete(prompt)
        return self.parse_steps(response)

    def execute_plan(self, goal: str):
        steps = self.plan(goal)
        results = []
        for step in steps:
            result = self.execute_step(step, results)
            results.append(result)
            # Re-plan if the step failed or new information emerged
            if result.requires_replanning:
                remaining_steps = self.replan(goal, results)
                steps = steps[:len(results)] + remaining_steps
        return results
```
This planning capability transforms agents from "one-shot responders" to "goal-oriented systems" that can adapt as circumstances change.
3. Memory: Maintaining State Across Interactions
Agents remember what happened, what worked, and what didn't:
Short-term Memory (Working Context):
```python
class WorkingMemory:
    def __init__(self, max_tokens: int = 8000):
        self.messages = []
        self.max_tokens = max_tokens

    def add(self, message: dict):
        self.messages.append(message)
        self._truncate_if_needed()

    def _truncate_if_needed(self):
        while self._count_tokens() > self.max_tokens:
            # Remove the oldest non-system message
            for i, msg in enumerate(self.messages):
                if msg["role"] != "system":
                    self.messages.pop(i)
                    break
```
Long-term Memory (Persistent Knowledge):
```python
class LongTermMemory:
    def __init__(self, vector_db):
        self.db = vector_db

    def remember(self, content: str, metadata: dict):
        embedding = self.embed(content)
        self.db.upsert(embedding, content, metadata)

    def recall(self, query: str, k: int = 5) -> list[str]:
        query_embedding = self.embed(query)
        results = self.db.search(query_embedding, k=k)
        return [r.content for r in results]
```
Episodic Memory (Specific Experiences):
```python
from datetime import datetime

class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def store_episode(self, task: str, actions: list, outcome: str, success: bool):
        episode = {
            "task": task,
            "actions": actions,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now()
        }
        self.episodes.append(episode)

    def retrieve_similar_episodes(self, task: str) -> list:
        # Find past experiences with similar tasks
        similar = self.search(task)
        # Prioritize successful episodes
        return sorted(similar, key=lambda e: e["success"], reverse=True)
```
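A brief sketch of how the three layers might combine at inference time: before planning, the agent pulls relevant facts and similar past episodes into its working context. The prompt-assembly details here are assumptions, not a prescribed design:

```python
def build_context(task: str, working: WorkingMemory,
                  long_term: LongTermMemory, episodic: EpisodicMemory) -> list:
    facts = long_term.recall(task, k=3)                       # persistent knowledge
    episodes = episodic.retrieve_similar_episodes(task)[:2]   # successful attempts first
    summary = (
        "Relevant knowledge:\n" + "\n".join(facts) +
        "\n\nPast attempts:\n" + "\n".join(e["outcome"] for e in episodes)
    )
    working.add({"role": "system", "content": summary})
    return working.messages  # feed this to the LLM as its context
```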
> "The future of AI is not just about better models, but about better systems that use those models."
Key Frameworks and Their Approaches
We're seeing an explosion of frameworks for building agents, each with a different philosophy:
LangChain / LangGraph
The industry standard for LLM orchestration. LangChain provides primitives (chains, tools, memory), while LangGraph adds graph-based workflows for complex multi-step agents:
```python
from typing import TypedDict

from langgraph.graph import StateGraph

# Define agent state
class AgentState(TypedDict):
    messages: list
    plan: list[str]
    current_step: int
    results: list

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("planner", planning_node)
workflow.add_node("executor", execution_node)
workflow.add_node("validator", validation_node)
workflow.add_node("replanner", replanning_node)

workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_conditional_edges("executor", should_continue)
workflow.add_edge("validator", "replanner")

agent = workflow.compile()
```
Strengths: Mature ecosystem, extensive documentation, production-proven.
Weaknesses: Can feel over-abstracted for simple use cases; steep learning curve.
AutoGPT and Open Source Autonomous Agents
The open-source pioneer that demonstrated what fully autonomous agents could look like:
```python
class AutoGPTStyle:
    def run(self, goal: str):
        while not self.goal_achieved(goal):
            # Generate thoughts about current state
            thoughts = self.think()
            # Decide on action
            action = self.decide(thoughts)
            # Execute action
            result = self.act(action)
            # Reflect on outcome
            self.reflect(result)
            # Update goal progress
            self.update_progress()
```
Strengths: Demonstrates autonomous agent architecture, good for learning.
Weaknesses: Often too autonomous (prone to running away on open-ended goals), heavy token usage, reliability issues.
Claude Agent SDK / Anthropic's Approach
Anthropic's approach emphasizes tool use and computer control:
```python
from anthropic import Anthropic

client = Anthropic()

# Note: depending on SDK version, the computer-use tools may require the beta
# messages API and a beta header; check Anthropic's current documentation.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[
        {
            "name": "computer",
            "type": "computer_20241022",
            "display_width_px": 1024,
            "display_height_px": 768
        },
        {
            "name": "bash",
            "type": "bash_20241022"
        }
    ],
    messages=[{"role": "user", "content": "Open a web browser and search for..."}]
)
```
Strengths: Deep tool integration, computer use capability, reliable.
Weaknesses: Anthropic-specific, with a smaller surrounding ecosystem.
Microsoft Semantic Kernel
Microsoft's contender, designed for enterprise .NET integration:
```csharp
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-4", apiKey)
    .Build();

// The plugin type parameters were lost in extraction; CalendarPlugin and
// EmailPlugin are illustrative names.
kernel.ImportPluginFromType<CalendarPlugin>();
kernel.ImportPluginFromType<EmailPlugin>();

var result = await kernel.InvokePromptAsync(
    "Schedule a meeting with John next week and send him an email about it"
);
```
Strengths: .NET native, enterprise features, Microsoft ecosystem integration.
Weaknesses: Less Python community support, smaller ecosystem.
The Mindset Shift: From Scripts to Boundaries
Building agents requires a fundamental shift in how we think about programming:
Traditional Programming
```python
# Explicit: every step defined
def process_order(order):
    validate(order)
    check_inventory(order.items)
    calculate_total(order)
    charge_payment(order.customer)
    fulfill_order(order)
    send_confirmation(order)
```
Agent-Based Approach
```python
# Declarative: define capabilities and constraints
agent = Agent(
    goal="Process customer orders efficiently",
    tools=[
        inventory_check,
        payment_processor,
        fulfillment_system,
        email_sender
    ],
    constraints=[
        "Never charge more than the calculated total",
        "Always verify inventory before committing",
        "Send confirmation within 5 minutes of successful order"
    ]
)

result = agent.run("Process order #12345")
```
You're no longer writing a script; you're defining a probabilistic system's boundaries. The agent decides how to achieve the goal within those boundaries, adapting to situations you didn't explicitly program.
Real-World Agent Architectures
ReAct Pattern (Reasoning + Acting)
The most common agent pattern, alternating between thinking and doing:
```
Thought: I need to find the customer's order history to process their refund.
Action: database_query("SELECT * FROM orders WHERE customer_id = 123")
Observation: Found 3 orders: #456 (completed), #457 (refunded), #458 (pending)

Thought: Order #458 is pending. I should check if this is the order they want refunded.
Action: send_message("Is order #458 the one you'd like refunded?")
Observation: Customer confirms yes.

Thought: I can now process the refund for order #458.
Action: process_refund(order_id=458)
Observation: Refund processed successfully.

Thought: I should confirm the refund with the customer.
Action: send_email(to="[email protected]", subject="Refund Confirmed"...)
```
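Under the hood, a ReAct loop is just a parse-and-execute cycle over that transcript format. Here is a minimal, framework-free sketch, assuming an `Action: tool_name(argument)` convention and a generic `llm.complete` interface (both are illustrative assumptions):

```python
import re

def react_loop(llm, tools: dict, task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        output = llm.complete(transcript + "Thought:")
        transcript += "Thought:" + output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:")[-1].strip()
        # Parse an action of the form: Action: tool_name(argument)
        match = re.search(r"Action:\s*(\w+)\((.*)\)", output)
        if match is None:
            continue  # no action proposed; let the model keep reasoning
        name, arg = match.group(1), match.group(2).strip('"')
        observation = tools[name](arg) if name in tools else f"Unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"
```

The step limit matters: without it, a confused model can repeat the same action indefinitely.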
Multi-Agent Systems
Complex tasks often benefit from multiple specialized agents:
```python
class ResearchTeam:
    def __init__(self):
        self.researcher = Agent(
            role="Research Analyst",
            tools=[web_search, paper_search, database_query],
            goal="Find comprehensive information on topics"
        )
        self.writer = Agent(
            role="Technical Writer",
            tools=[text_editor, diagram_generator],
            goal="Create clear, accurate documentation"
        )
        self.reviewer = Agent(
            role="Quality Reviewer",
            tools=[fact_checker, grammar_checker],
            goal="Ensure accuracy and quality of content"
        )

    def create_report(self, topic: str) -> str:
        # Research phase
        research = self.researcher.run(f"Research {topic}")
        # Writing phase
        draft = self.writer.run(f"Write report based on: {research}")
        # Review phase
        review = self.reviewer.run(f"Review and improve: {draft}")
        return review.final_output
```
Hierarchical Agents
Orchestrator agents that delegate to specialist agents:
```python
class OrchestratorAgent:
    def __init__(self):
        self.specialists = {
            "coding": CodingAgent(),
            "research": ResearchAgent(),
            "writing": WritingAgent(),
            "data": DataAnalysisAgent()
        }
        self.default_agent = GeneralistAgent()  # illustrative fallback for unclassified tasks

    def route_task(self, task: str) -> Agent:
        # Use an LLM to classify the task type
        task_type = self.classify(task)
        return self.specialists.get(task_type, self.default_agent)

    def run(self, task: str):
        agent = self.route_task(task)
        return agent.run(task)
```
The Challenges We're Still Solving
Reliability and Consistency
Agents are probabilistic. The same task might succeed 95% of the time and fail mysteriously 5% of the time. Production systems need:
```python
class RobustAgent:
    def run_with_retries(self, task: str, max_retries: int = 3):
        for attempt in range(max_retries):
            try:
                result = self.run(task)
                if self.validate(result):
                    return result
            except Exception as e:
                self.log_failure(task, attempt, e)
            # Adjust approach before the next attempt
            self.adjust_strategy(attempt)
        return self.fallback(task)
```
Cost Management
Autonomous agents can consume thousands of tokens per task:
```python
class CostAwareAgent:
    def __init__(self, budget_per_task: float = 0.50):
        self.budget = budget_per_task
        self.spent = 0.0
        self.complete = False

    def run(self, task: str):
        while not self.complete and self.spent < self.budget:
            tokens_used = self.step()
            self.spent += self.calculate_cost(tokens_used)
            # Downgrade to a cheaper model as the budget runs low
            if self.spent > self.budget * 0.8:
                self.switch_to_cheaper_model()
```
Human-in-the-Loop
Most production agents need human oversight for critical decisions:
```python
class SupervisedAgent:
    async def run(self, task: str):  # async so approval requests can be awaited
        plan = self.create_plan(task)
        for step in plan:
            if step.requires_approval:
                approved = await self.request_human_approval(step)
                if not approved:
                    step = self.get_alternative(step)
            result = self.execute(step)
            # Escalate low-confidence results for human review
            if result.confidence < 0.7:
                await self.notify_human(result)
```
Where Agents Excel (and Where They Don't)
Great For:
- Research and synthesis: Gathering information from multiple sources
- Multi-step workflows: Tasks requiring several coordinated actions
- Adaptive processes: Situations where the path isn't known upfront
- Routine automation: Repetitive tasks with variation
Not Great For:
- High-stakes decisions: Where errors are costly or irreversible
- Real-time requirements: When latency matters (agents are slow)
- Simple queries: Overhead isn't worth it for single-turn responses
- Precise deterministic output: When exact reproduction is required
The Future: What's Coming
Agents as the New API
Instead of calling functions, you'll describe desired outcomes:
```python
# Today
response = openai.chat.completions.create(model="gpt-4", messages=[...])

# Future
result = agent.achieve("Book a restaurant for 4 people Saturday night near downtown")
```
Persistent Agents
Agents that run continuously, monitoring and acting:
```python
import asyncio

class PersistentAgent:
    async def run_forever(self):
        while True:
            events = await self.monitor_environment()
            for event in events:
                if self.should_act(event):
                    await self.handle_event(event)
            await asyncio.sleep(self.poll_interval)
```
Agent Marketplaces
Pre-built agents for specific domains - sales, legal, medical, engineering - that you configure rather than build:
```python
from agent_marketplace import SalesAgent

agent = SalesAgent(
    crm_connection=salesforce_creds,
    email_connection=gmail_creds,
    calendar_connection=google_calendar_creds,
    personality="professional but friendly",
    constraints=["Never discount more than 20%", "Always follow up within 24 hours"]
)
```
My Take: Start Building Now
In my opinion, agents are the most important development in applied AI since the transformer. They transform LLMs from impressive demos into genuine productivity tools. The transition from "AI that answers questions" to "AI that accomplishes tasks" is happening now.
If you're building AI-powered applications, you should be exploring agent architectures. Start simple - a single agent with 2-3 tools solving a specific problem. Learn the patterns (ReAct, planning, memory). Experience the failure modes (loops, hallucinations, cost overruns). Build intuition for what works.
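A first project can be as small as the `react_loop` sketch above wired to two tools, with the step limit doubling as a guard against the looping failure mode (`web_search` and `llm` here are placeholders, not real libraries):

```python
tools = {
    "search": lambda q: web_search(q),    # placeholder search function
    "calculate": lambda e: str(eval(e))   # fine for experiments, not production
}
answer = react_loop(llm, tools, "What is 15% of the average of 80 and 120?")
print(answer)
```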
The developers who deeply understand how to build reliable, cost-effective, human-supervised agent systems will be extraordinarily valuable over the next 5 years. The learning curve is steep but the payoff is substantial.
The future isn't just better chatbots. It's AI that actually gets things done.
Start building agents today.
