I'm building a simple agent accessible over SMS for a family member. One of their use cases is finding recipes. A problem I ran into was that doing a web search for recipes would pull tons of web pages into the context, effectively clobbering the system prompt that told the agent to format responses in a manner suited for SMS. I solved this by creating a recipe tool that uses a sub-agent to do the web search and return the most promising recipe to the main agent. When the main agent uses this tool instead of performing the web search itself, it is successfully able to follow the system prompt's directions to format and trim the recipe for SMS. Using this sub-agent to prevent information from entering the context dramatically improved the quality of responses. More context is not always better!
I bring this up because this article discusses context management mostly in terms of context windows having a maximum size. I think that context management is far more than that. I'm still new to this building agents thing, but my experience suggests that context problems start cropping up well before the context window fills up.
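For anyone curious, the tool is roughly shaped like this (a sketch; web_search() and llm() are placeholders for whatever search API and model client you use):

    # Rough sketch of the recipe tool. web_search() and llm() are
    # placeholders for your search API and model client of choice.
    def find_recipe(query: str) -> str:
        pages = web_search(f"{query} recipe")   # many long web pages
        # Only the sub-agent ever sees the raw pages; the main agent doesn't.
        best = llm(
            system="Pick the single most promising recipe and return only "
                   "its title, ingredients, and steps as plain text.",
            user="\n\n".join(pages),
        )
        return best  # only this short string enters the main agent's context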
edoceo 6 days ago [-]
Are you in the USA? How do you get around those 10DLC limits on typical SMS/API things (e.g. Twilio)? Or did you go through that process (which seems like a lot for a private use case)?
colonCapitalDee 6 days ago [-]
I am in the USA! Although these days that exclamation point doesn't feel great...
I'm using an old Android phone (Pixel 2 from 2017), a $5-a-month unlimited SMS plan from Tello, and https://github.com/capcom6/android-sms-gateway. For bonus points (I wanted to roll my own security, route messages from different numbers to prod and ppe instances of my backend, and dedup messages) I built a little service in Go that acts as an intermediary between my backend and android-sms-gateway. I deploy this service to the Android device using ADB; android-sms-gateway talks to it, and it talks to my backend. I also rooted the Android device so I could disable battery management for all apps (don't do this if you want to walk around with the phone, of course). It works pretty well!
I plan to open-source this eventually TM, but first I need to decouple my personal deployment infra from the bits useful to everyone else
_0ffh 6 days ago [-]
You mean sub-agent as in the formatting agent calls on the search-and-filter agent? In that case you might just make a pipeline. Use a search agent, then a filter agent (or maybe only one search-and-filter agent), then a formatting agent. Lots of tasks work better with a fixed pipeline than with freely communicating agents.
tfirst 6 days ago [-]
The article addresses this specific use under the 'Claude Code Subagents' section.
> The benefit of having a subagent in this case is that all the subagent’s investigative work does not need to remain in the history of the main agent, allowing for longer traces before running out of context.
jauntywundrkind 6 days ago [-]
This very narrow, very specific, single-purpose, task-oriented subagent was one of the first things talked about in this very lovely recent & popular submission (along with other fun-to-read points):
What makes Claude Code so damn good (and how to recreate that magic in your agent)!? https://minusx.ai/blog/decoding-claude-code/ https://news.ycombinator.com/item?id=44998295
Hey, thanks for linking this. That was super informative
faangguyindia 6 days ago [-]
Isn't history just a list of dicts containing a role and a message? You can evict any entry at any point.
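e.g., assuming the usual OpenAI-style message format:

    # History is just a list of {"role": ..., "content": ...} dicts.
    history = [
        {"role": "system", "content": "Format answers for SMS."},
        {"role": "user", "content": "find me a lasagna recipe"},
        {"role": "tool", "content": "<20k tokens of scraped web pages>"},
        {"role": "assistant", "content": "Here's a trimmed lasagna recipe..."},
    ]

    # Evict the bulky tool output once it has served its purpose,
    # keeping the system prompt and the conversational turns.
    history = [m for m in history if m["role"] != "tool"]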
sitkack 6 days ago [-]
The large models have all the recipes memorized, you don't need to do a search.
simianwords 6 days ago [-]
Why are you reinventing the wheel? Just use gpt api with search turned on.
knlam 6 days ago [-]
You want to be able to use multiple providers, so if I'm not happy with the result from GPT, I can switch to Perplexity or something else. Being able to plug and play is very powerful when you are building agent/subagent systems.
simianwords 6 days ago [-]
Each LLM provider already has search built in that you can turn on.
Claude, Gemini, Grok, and GPT all have this. I get your point about being able to plug and play, but all these providers come with the functionality built in.
faangguyindia 6 days ago [-]
There’s both “no multi-agent system” and “multi-agent system,” depending on how you look at it. In reality, you’re always hitting the same /chat/completions API, which itself has no awareness of any agents. Any notion of an agent comes purely from the context and instructions you provide.
Separating agents has a clear advantage. For example, suppose you have a coding agent with a set of rules for safely editing code. Then you also have a code search task, which requires a completely different set of rules. If you try to combine 50 rules for code editing with 50 rules for code searching, the AI can easily get confused.
It’s much more effective to delegate the search task to a search agent and the coding task to a code agent. Think of it this way: when you need to switch how you approach a problem, it helps to switch to a different “agent”, a different mindset with rules tailored for that specific task.
Do I need to think differently about this problem? If yes, you need a different agent!
So yes, conceptually, using separate agents for separate tasks is the better approach.
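A rough sketch of what I mean (chat_completion() is a placeholder for whatever provider you hit; the rules are made up):

    # Both "agents" are the same API call; only the instructions differ.
    SEARCH_RULES = "You locate relevant files. Never edit anything."
    EDIT_RULES   = "You edit code safely. Make minimal diffs."

    def run_agent(rules: str, task: str) -> str:
        # Placeholder for POST /chat/completions against your provider.
        return chat_completion(messages=[
            {"role": "system", "content": rules},
            {"role": "user", "content": task},
        ])

    files = run_agent(SEARCH_RULES, "Find where auth tokens are validated")
    patch = run_agent(EDIT_RULES, f"Fix the expiry bug in: {files}")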
datadrivenangel 6 days ago [-]
Calling a different prompt template an 'agent' doesn't help communicate meaningful details about an overall system design. Unnecessary verbiage or abstraction in this case.
adastra22 6 days ago [-]
It is what it is. That ship has sailed.
eab- 6 days ago [-]
There's both "no multi-program system" and "multi-program system", depending on how you look at it. In reality, you're always executing the same machine code, which itself has no awareness of programs.
stirfish 6 days ago [-]
This unironically helped me work through a bug just now
jmull 6 days ago [-]
> By using React, you embrace building applications with a pattern of reactivity and modularity, which people now accept to be a standard requirement, but this was not always obvious to early web developers.
This is quite a whopper. For one thing, the web started off reactive. It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way (though, I'm sorry, IMO that doesn't actually include React). Second, "modularity" has been a thing for quite some time before the web existed. (If you want to get down to it, separating and organizing your processes in information systems predates computers.)
pmontra 6 days ago [-]
> the web started off reactive
I was there but I didn't notice reactivity. Maybe we are using two different definitions of reactivity. Do you care to elaborate?
I agree that "It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way".
CuriouslyC 6 days ago [-]
We're in the context engineering stone age. You the engineer shouldn't be trying to curate context, you should be building context optimization/curation engines. You shouldn't be passing agents context like messages, they should share a single knowledge store with the parent, and the context optimizer should just optimally pack their context for the task description.
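To make that concrete, the core of such an engine is basically a packing problem. A toy sketch (the scoring and token-counting functions are placeholders):

    # Toy context packer: greedily fill the window with the items that
    # score highest for the task, from a knowledge store shared by all agents.
    def pack_context(store, task, budget_tokens, score, count_tokens):
        ranked = sorted(store, key=lambda item: score(item, task), reverse=True)
        packed, used = [], 0
        for item in ranked:
            cost = count_tokens(item)
            if used + cost <= budget_tokens:
                packed.append(item)
                used += cost
        return packed

Everything interesting lives in the scoring function; the packer itself can stay dumb.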
hansvm 6 days ago [-]
You're not wrong. This is just a storage/retrieval problem. But ... the current systems have limits. If you want commercial success in <3yrs, are any of those ideas remotely viable?
CuriouslyC 6 days ago [-]
Oh yeah, and if you tried to do one now it'd be a bad idea because I'm almost done :)
The agentic revolution is very different from the chatbot/model revolution because agents aren't a model problem, they're a tools/systems/process problem. Honestly the models we have now are very close to good enough for autonomous engineering, but people aren't giving them the right tools, the right processes, we aren't orchestrating them correctly, most people have no idea how to benchmark them to tune them, etc. It's a new discipline and it's very much in its infancy.
adastra22 6 days ago [-]
Dude, ChatGPT isn’t even 3 years old. That’s an eternity.
pglevy 6 days ago [-]
Not an engineer but I think this is where my mind was going after reading the post. Seems like what will be useful is continuously generated "decision documentation." So the system has access to what has come before in a dynamic way. (Like some mix of RAG with knowledge graph + MCP?) Maybe even pre-outlining "decisions to be made," so if an agent is checking in, it could see there is something that needs to be figured out but hasn't been done yet.
CuriouslyC 6 days ago [-]
I actually have an "LLM as a judge" loop on all my codebases. I have an architecture panel that debates improvements given an optimization metric and convergence criteria, and I feed their findings into a deterministic spec generator (CUE with validation) that can emit unit/e2e tests and scaffold Terraform. It's pretty magical.
This CUE spec gets decomposed into individual tasks by an orchestrator that does research per ticket and bundles it.
adastra22 6 days ago [-]
I think we are all building the same thing. If only there was an open source framework for aggregating all our work together.
kordlessagain 6 days ago [-]
Great insight!
*We're hand-crafting context like medieval scribes when we should be building context compilers.*
nickreese 6 days ago [-]
Is there a framework for this?
CuriouslyC 6 days ago [-]
I have one that's currently still cooking. I have good experimental validation for it, but I need to go back and tune the latency and improve the install story. It should help any model quite a bit, but you have to hack other agents to integrate it into their API call machinery; I have a custom agent I've built that makes it easy to inject, though.
The best one is Google ADK; I must say they are quite thoughtful about all the use cases.
CuriouslyC 5 days ago [-]
ADK is a nice framework, but it's still stuck in the agent-as-atomic-chatbot-that-goes-out-and-does-stuff model. The reality is you want your agents to all be running within an orchestrator service, because it's much more efficient in pretty much every way. The way you separate concerns here is to have the agents emit intents, and have those intents go to a queue to be acted upon safely and securely by executors. This system is more secure, performant, and observable than the ADK setup.
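Roughly (a sketch, not the real thing; apply() and is_allowed() are whatever your policy layer provides):

    import queue

    intents = queue.Queue()

    # Agents never touch the world directly; they only emit intents.
    def agent_step(agent_id, llm_output):
        intents.put({"agent": agent_id, "action": llm_output})

    # A separate executor validates and applies them, so policy checks,
    # rate limits, and audit logging all live in one place.
    def executor_loop(apply, is_allowed):
        while True:
            intent = intents.get()
            if is_allowed(intent):
                apply(intent)
            intents.task_done()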
sippeangelo 6 days ago [-]
"should just"?
CuriouslyC 6 days ago [-]
It's really not hard. It's just all the IR/optimization machinery we already have applied to a shared context tree with locality bias.
curl-up 6 days ago [-]
In the context compression approach, why aren't the agents labeled as subagents instead? The compressed context is basically a "subtask".
This is my main issue with all these agentic frameworks - they always conveniently forget that there is nothing "individual" about the thing they label "an agent" and draw a box around.
Such "on demand" agents, spawned directly from previous LLM output, are never in any way substantially different from dynamic context compression/filtering.
I think the only sensible framework is to think in terms of tools, with clear interfaces, and a single "agent" (a single linear interaction chain) using those tools towards a goal. Such tools could be LLM-based or not. Forcing a distinction between a "function tool" and an "agent that does something" doesn't make sense.
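i.e. something like this (a sketch; llm() stands in for your model client):

    # A "tool" is just a name and a callable. Whether the callable is a
    # deterministic function or wraps an LLM is an implementation detail
    # the calling agent never sees.
    CONVERSIONS = {("g", "oz"): 0.035274}

    def unit_convert(amount: float, frm: str, to: str) -> float:
        return amount * CONVERSIONS[(frm, to)]   # plain function tool

    def find_recipe(query: str) -> str:
        # LLM-backed "tool"; llm() is a placeholder for your model client.
        return llm(system="Return the single best recipe, plain text.",
                   user=query)

    TOOLS = {"unit_convert": unit_convert, "find_recipe": find_recipe}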
jskalc92 6 days ago [-]
I think the most common implementation of "subagents" doesn't get full context of a conversation, rather just an AI-generated command.
Here the task is fulfilled with the full context so far, and then compressed. Might work better IMO.
adastra22 6 days ago [-]
In my experience it does not work better. There are two context-related benefits to subagent tool calls: (1) the subagent's trials and deliberations don't poison the caller's context [this is preserved here]; and (2) the called agent isn't unduly influenced by the caller's context [this is where the problem lies].
The latter is really helpful for getting a coding assistant to settle on a high quality solution. You want critic subagents to give fresh and unbiased feedback, and not be influenced by arbitrary decisions made so far. This is a good thing, but inheriting context destroys it.
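Concretely, the critic call looks something like this (a sketch; chat() stands in for whatever client you use):

    # The critic gets the artifact, not the caller's conversation history,
    # so its feedback isn't anchored on decisions already made upstream.
    def fresh_critique(artifact: str) -> str:
        return chat(messages=[
            {"role": "system", "content": "You are a blunt code reviewer. "
                                          "List concrete problems only."},
            {"role": "user", "content": artifact},
        ])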
peab 6 days ago [-]
Yeah, i agree with thinking of things as a single agent + tools.
From the perspective of the agent, whether the tools are deterministic functions, or agents themselves, is irrelevant.
OutOfHere 6 days ago [-]
I think one can simplify it further. There is no agent; it's just tools all the way.
The caveat is that multiple tools must be able to run asynchronously.
Context compression of one's own lengthy context is also useful and necessary as the message chain starts to get too large.
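A minimal sketch of both ideas (the tools are assumed to be async callables; summarize() is a placeholder):

    import asyncio

    async def run_tools(tools, args):
        # Fire independent tool calls concurrently instead of one per turn.
        return await asyncio.gather(*(t(a) for t, a in zip(tools, args)))

    def compress(history, summarize, keep_last=6, limit=50):
        # Once the message chain gets long, fold the old turns into a summary.
        if len(history) <= limit:
            return history
        head, tail = history[:-keep_last], history[-keep_last:]
        return [{"role": "system", "content": summarize(head)}] + tail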
sputknick 6 days ago [-]
This is very similar to the conclusion I have been coming to over the past 6 months. Agents are like really unreliable employees that you have to supervise, and correct so often that it's a waste of time to delegate to them. The approach I'm trying to develop for myself is much more human-centric. For now I just directly supervise all actions done by an AI, but I would like to move to something like this: https://github.com/langchain-ai/agent-inbox where I as the human am the conductor of the work agents do, and they check in with me for further instructions or corrections.
worik 6 days ago [-]
> Agents are like really unreliable employees,
Yes
But employees (should) get better over time, they learn from me
mreid 6 days ago [-]
Is it concerning to anyone else that the "Simple & Reliable" and "Reliable on Longer Tasks" diagrams look kind of like the much maligned waterfall design process?
worik 6 days ago [-]
One reason it is concerning:
I am mostly worried that I am wrong in my opinion that "agents" are a bad paradigm for working with LLMs.
I have been using LLMs since I got my first Open AI API key, I think "human in the loop" is what makes them special
I have massively increased my fun, and significantly increased my productivity using just the raw chat interface.
It seems to me that building agents to do work that I am responsible for is the opposite of fun, and a productivity sink, as I correct the rare (but I must check for them) bananas mistakes these agents inevitably make.
adastra22 6 days ago [-]
The thing is, the same agent that made the bananas mistake is also quite good at catching that mistake (if called again with fresh context). This results in convergence on working, non-bananas solutions.
worik 4 days ago [-]
Look up The Old Lady who Swallowed a Fly. Or The King, the Mice and his Cheese
What you propose makes things worse, not better
LLMs are magnificent tools, but there needs to be a human hand holding them.
Nothing I have seen anywhere, yet, challenges my view that "agents" will not be a good idea until we have better technology, that there is no sign of yet (?), than LLMs.
CuriouslyC 6 days ago [-]
Waterfall is just a better process with agents. Agile is garbage when inserting yourself in the loop causes the system to drop to 10% velocity.
amelius 6 days ago [-]
It looks more like alchemy, tbh.
DarkNova6 6 days ago [-]
To me it seems more like the typical trap of a misfit bounded context.
adastra22 6 days ago [-]
> As of June 2025, Claude Code is an example of an agent that spawns subtasks. However, it never does work in parallel with the subtask agent, and the subtask agent is usually only tasked with answering a question, not writing any code.
Has this changed since June? Because I’ve been experimenting over the last month with Claude Code subagents that work in parallel and agents which write code (doing both simultaneously is inadvisable for obvious reasons, at least without workspace separation).
clbrmbr 6 days ago [-]
I’ve been quite successful since June doing parallel edits just on different components within the same codebase. But I’ve not been able to do it with “auto-accept” because I need a way to course correct if one of the agents goes off the rails.
kikuska 1 days ago [-]
Forgive me I can't figure out how to reply on the original thread but I saw you used a script to build an Anki deck with the native plant species in your area. I'd love to do the same but I've never coded in my life and doing it manually would take forever. Would you be willing to share how this is done? Thanks!
clbrmbr 21 hours ago [-]
Ooo so glad you asked. That's something I am super passionate about. I built those decks during the COVID lockdowns when I needed to connect with nature a bit more. Those Anki decks were incredibly efficient for learning to recognize many plants and mushroom species!
I actually did very little coding. It was more like just some light scripting, and today with LLMs you should be able to figure it out even without coding experience.
Here's how I did it:
1. I fetched information on the 500 most commonly observed plant species in my area (Northeastern USA and Canada) on iNaturalist. I think I did this just by downloading the web pages with wget. Inside of those downloaded files were the common and scientific names of the species as well as a link to the photo.
2. I then wrote a very simple script to download all of those images.
3. Finally, I had a script that I think just created a CSV file that I was able to load into Anki and I created my template, making sure to have a notes section so that as I'm studying, I add interesting facts, such as the genus Eupatorium being named after an emperor infamous for his use of poisons [1].
A note on copyright: although I would have liked to have a deck that I could freely share with people on the internet, I really wanted to have the best possible photos to help with identification. So I chose to use the primary photograph associated with each species on iNaturalist, and most of these are copyrighted without any Creative Commons license. So it's totally fair use to use them for your own edification, but I'm unable to share one of these decks with you directly without getting permission from hundreds of iNat contributors. (A rough sketch of what steps 2 and 3 looked like is below.)
[1] https://en.wikipedia.org/wiki/Mithridates_VI_Eupator
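Something along these lines, from memory, so treat the file names and CSV layout as assumptions rather than exactly what I ran:

    import csv, requests

    # species.csv is assumed to already hold one row per species:
    # common name, scientific name, photo URL (scraped from the iNat pages).
    rows = list(csv.reader(open("species.csv")))

    deck = []
    for i, (common, scientific, photo_url) in enumerate(rows):
        filename = f"plant_{i:03d}.jpg"
        with open(filename, "wb") as f:                  # step 2: fetch photos
            f.write(requests.get(photo_url).content)
        # Anki shows <img> tags if the file is placed in its media folder
        # and "Allow HTML in fields" is enabled on import.
        deck.append([f'<img src="{filename}">', common, scientific, ""])

    with open("anki_import.csv", "w", newline="") as f:  # step 3: deck CSV
        csv.writer(f).writerows(deck)   # photo, common name, scientific, notes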
Yay this is awesome thank you!! I will try this out :) cheers for the detailed guide, had no idea about wget - that will help a ton!
kordlessagain 6 days ago [-]
I wrote something that watches the directories, and started working on a node graph to visualize relationships between changes and dates, things like that: https://github.com/kordless/gnosis-flow. If that is useful to you, let me know!
adastra22 6 days ago [-]
Isn’t it easier to just write scripts to launch Claude in different worktrees?
If you actually read the Claude article, it says the same things as the Cognition article; it just has a different definition of multi-agent.
photochemsyn 6 days ago [-]
Don't hide your content from people using NoScript, how about that for starters...
And oh great, another Peter Thiel company boosted to the top of HN, really?
> "Cognition AI, Inc. (also known as Cognition Labs), doing business as Cognition, is an artificial intelligence (AI) company headquartered in San Francisco in the US State of California. The company developed Devin AI, an AI software developer...Originally, the company was focused on cryptocurrency, before moving to AI as it became a trend in Silicon Valley following the release of ChatGPT...
With regards to fundraising, the company was backed by Peter Thiel's Founders Fund which provided $21 million of funding to it in early 2024, valuing the company at $350 million.[2] In April 2024, Founders Fund led a $175 million investment into Cognition valuing the company at $2 billion making it a Unicorn."
The bubble's gonna pop, and you'll have so much egg on your face. This stuff is just compilers with extra compute and who got rich off compilers? VC people...
CuriouslyC 6 days ago [-]
Counterpoint, we'll crack autonomous coding within ~3 years, and with increases in token efficiency the speed of software development is going to go crazy. Project managers and salespeople are gonna take over big tech.
pellmellism 6 days ago [-]
are we calling customers project managers now?
ramesh31 6 days ago [-]
Don't listen to anyone who tells you how to build an agent. This stuff has never existed before in the history of the world, and literally everyone is just figuring it out as we go. Work from the simplest basic building blocks possible and do what works for your use case. Eventually things will be figured out, and you can worry about "best practices" then. But it's all just conjecture right now.
OutOfHere 6 days ago [-]
I would go further and consider that an agent is an unnecessary concept. "Asynchronous tool calling, with context compression" is more on point for solving complex problems.
pbd 6 days ago [-]
The 'dilution effect' is real - even with plenty of context space left, agents start losing track of their original goals around the 50k token mark. It's not just about fitting information in, it's about maintaining coherent reasoning chains. Single agents with good prompt engineering often outperform elaborate multi-agent orchestrations.
wewtyflakes 6 days ago [-]
This resonates heavily with our experience. We ended up using one agent + actively managed context, with the smartness baked into how we manage that context for that one agent, rather than attempting to manage expectations/context across a team of agents.
skybrian 6 days ago [-]
Restricting output from a subagent (not allowing arbitrary strings anywhere in the output) seems like a way to minimize the risks of prompt injection attacks, though? Sometimes you only need to get a boolean or an enum back from a query.
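Yes, e.g. (a sketch; llm() is a placeholder for the subagent call):

    ALLOWED = {"yes", "no", "unsure"}

    def ask_subagent(question: str, untrusted_text: str) -> str:
        # The untrusted text can say anything it likes; only one of three
        # tokens can ever make it back into the caller's context.
        raw = llm(system=f"Answer with exactly one of: {sorted(ALLOWED)}",
                  user=f"{question}\n\n---\n{untrusted_text}").strip().lower()
        return raw if raw in ALLOWED else "unsure"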
nextworddev 6 days ago [-]
Anecdotally Devin has been one of the worst coding agents I tried, to the point where I didn’t even bother asking for my unused credits to be refunded. That was 2 months ago, so things may have changed.
Der_Einzige 6 days ago [-]
If you don't use the phrase "structured generation" or "constrained generation" in your discussion of how to build Agents, you're doing it wrong.
pengli1707 6 days ago [-]
It's not about "do not build multi-agents", it's about "do not build parallelized multi-agents".
skissane 6 days ago [-]
> Principle 1: Share context, and share full agent traces, not just individual messages
I was playing around with this task: give a prompt to a low-end model, get the response, and then get the higher-end model to evaluate the quality of the response.
And one thing I've noticed is that while the higher-end model sometimes detects when the low-end model is misinterpreting the prompt (e.g. it blatantly didn't understand some aspect of it and just hallucinated), it still often allows itself to be controlled by the low-end model's framing... e.g. if the low-end model takes a negative attitude to an ambiguous text, the high-end model will propose moderating the negativity... but the thing it doesn't realise is that if given the prompt without the low-end model's response, it might not have adopted that negative attitude at all.
So one idea I had... a tool which enables the LLM to get its own "first impression" of a text... so it can give itself the prompt, and see how it would react to it without the framing of the other model's response, and then use that as additional input into its evaluation...
So this is an important point this post doesn't seem to understand – sometimes less is more, sometimes leaving stuff out of the context is more useful than putting it in
> It turns out subagent 1 actually mistook your subtask and started building a background that looks like Super Mario Bros. Subagent 2 built you a bird, but it doesn’t look like a game asset and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications
It seems to me there is another way to handle this... allow the final agent to go back to the subagent and say "hey, you did the wrong thing, this is what you did wrong, please try again"... maybe with a few iterations it will get it right... at some point, you need to limit the iterations to stop an endless loop, and either the final agent does what it can with a flawed response, or escalate to a human for manual intervention (even the human intervention can be a long-running tool...)
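i.e. something like this (a sketch; escalate_to_human() is a hypothetical hook):

    def run_with_retries(subagent, task, review, max_rounds=3):
        feedback = None
        for _ in range(max_rounds):
            result = subagent(task, feedback)   # redo the subtask with critique
            ok, feedback = review(result)       # final agent checks the output
            if ok:
                return result
        # Bounded loop: give up and hand off rather than iterating forever.
        return escalate_to_human(task, result, feedback)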
avereveard 6 days ago [-]
"I designed a bad system so all system of these class must be bad"
They really handing out ai domain to anyone these days.
TuringNYC 6 days ago [-]
What software are you using to create these beautiful diagrams in the article?
behnamoh 6 days ago [-]
How is this fundamentally any different than Erlang/Elixir concepts of supervisors controlling their child processes? It seems like the AI industry keeps re-discovering several basic techniques that have been around since the 80s.
I'm not surprised—most AI "engineers" are not really good software engineers; they're often "vibe engineers" who don't read academic papers on the subject and keep re-inventing the wheel.
If someone asked me why I think there's an AI bubble, I'd point exactly to this situation.
madrox 6 days ago [-]
Software engineering in general is pretty famous for unironically being disdainful of anything old while simultaneously reinventing the past. This new wave is nothing new in that regard.
I'm not sure that means the people who do this aren't good engineers, though. If someone rediscovers something in practice rather than through learning theory, does that make them bad at something, or simply inexperienced? I think it's one of the strengths of the profession that there isn't a singular path to reach the height of the field.
ramchip 6 days ago [-]
I've done a lot of Erlang and I don't see the relation? Supervisors are an error isolation tool, they don't perform the work, break it down, combine results, or act as a communication channel. It's kind of the point that supervisors don't do much so they can be trusted to be reliable.
jll29 6 days ago [-]
Yes, people re-discover stuff, mostly because no-one reads older papers. I also thought of Erlang and OAA.
In the early 2000s, we used Open Agent Architecture (OAA) [1], which had a beautiful (declarative) Prolog-like notation for writing goals, and the framework would pick & combine the right agents (all written in different languages, but implementing the OAA interface through proxy libraries) to achieve the specified goals.
This was all on boxes within the same LAN, but conceptually, this could have been generalized.
[1] https://medium.com/dish/75-years-of-innovation-open-agent-ar...
I blame managers who get all giddy about reducing head count. Sure, this year you get a -1% on developer time (seniors believe they get a 20% increase when it's really a decrease for using AI).
But then next year and the year after, the technical debt will be to the point where they just need to throw out the code and start fresh.
Then the head count must go up. Typical short term gains for long term losses/bankruptcy
antonvs 6 days ago [-]
> seniors believe they get a 20% increase when its really a decrease for using AI
There’s no good evidence to support that claim. Just one study which looked at people with minimal AI experience. Essentially, the study found that effective use of AI has a learning curve.
WalterSear 6 days ago [-]
Apart from requiring entirely the opposite solution?
With respect, if there's an AI bubble, I can't see it for all the sour grapes, every time it's brought up, anywhere.
antonvs 6 days ago [-]
A lot of it seems to be resistance to change. People are afraid their skillset may be losing relevance, and instead of making any effort to adapt, they try to resist the change.
WalterSear 6 days ago [-]
I suspect there's more to it than that. Some people are sprinting with this stuff, but it seems that many more are bouncing off it, bruised. It's a tell. Something is different.
It's an entirely new way of thinking, nobody is telling you the rules of the game. Everything that didn't work last month works this month, and everything you learned two months ago, you need to throw away. Coding assistants are inscrutable, overwhelming and bristling with sharp edges. It's easier than ever to paint yourself into a corner.
Back when it took weeks to put out a feature, you were insulated from the consequences of bad architecture, coding, and communication skills: by the time things got bad enough to be noticed, the work had been done months ago and everyone on the team had touched the code. Now you can see the consequences of poor planning, poor skills, and poor articulation being run to their logical conclusion in an afternoon.
I'm sure there are more reasons.
ggandhi 6 days ago [-]
"It is now 2025 and React (and its descendants) dominates the way developers build sites and apps." Is there any research which tells react is dominating or most of the internet is not vanilla HTML but react?
CityOfThrowaway 6 days ago [-]
There are two ways to answer this:
1. On one hand, walled gardens like Facebook, Instagram, YouTube, etc, are most of the internet and they decidedly use React (or similar) frameworks. So from that perspective, the statement is sorta trivially true.
2. There may well be a horde of websites that are pure HTML rendering. But, those sites are not largely being developed by developers – they are being generated by platforms (like Squarespace, etc.) so are out of the scope of "sites and apps built by developers"
All this is stated without data, of course.
jwpapi 6 days ago [-]
Why is this article on top of HN? This is nothing breaking/new/interesting/astonishing.
The principles are super basic to get the first time you build an agent.
The real problem is to get reliability. If you have reliability and clear defined input and output you can easily go parallel.
This seems like a bad 5th-grade homework assignment.
makk 6 days ago [-]
It doesn’t feel like news, yeah.
I would emphasize, though, that getting clearly defined input _that remains stable_ is hard. Often something is discovered during implementation of a task that informs changes in other task definitions. A parallel system has to deal with this or the results of the parallel tasks diverge.
ramchip 6 days ago [-]
Personally I found the article informative and well-written. I had been wondering for a while why Claude Code didn't more aggressively use sub-agents to split work, and it wasn't obvious to me (I don't build agents for a living).
clbrmbr 6 days ago [-]
Perhaps because the author is the most prominent agent builder outside of the big labs.