At this point, I've settled on Visual Studio Code with GitHub Copilot's GPT-5 preview.
It's an agent, unlike previous offerings that basically had you copy/pasting code and then error messages back and forth. It's not perfect by any means, but it works in the folder you point it at, writes the code you've asked for, and then fixes its own mess when things break, like Python indentation going awry.
I'm a Pascal programmer with only a passing familiarity with Python, but I've now got a mass of code it has written for my passion project (I'm old, and retired), with a suitable warning posted at the start of the README.md for anyone who wanders in.
The thing you have to do is keep an eye on your goals and keep it on track with your project. My project is nearly impossible, involving too many levels of abstraction for the LLM to keep up with... it's close, but not quite there. I've had to pull it out of a few rabbit holes.[1] (To be fair, they seemed like good places to dig for answers.)
If you treat the output of LLMs as instant legacy code, and the LLM itself as a junior programmer, you might just do OK with an agent-based code-generation tool.

Good luck

[1] https://en.wikipedia.org/wiki/Down_the_rabbit_hole
I’ve been in the exact same spiral—new tool drops every Tuesday, I install it, it feels cool for twenty minutes, then I’m back to copy-pasting code like it’s 2019. The thing that finally broke the cycle for me was admitting that the tools weren’t the problem; my process was.
So I stopped reading launch posts and started eavesdropping. In practice that looks like:
- Keeping a muted Discord tab pinned for one competitor tool. I skim #feature-requests once a day, not for ideas but to see which promises still haven't been kept and still leave users unhappy.
- Sorting Reddit threads by “controversial.” The bitter, down-voted rants are where the real friction lives.
- On Show HN, I scroll straight to the third-level comments. That’s where the folks who actually tried the thing roast it in detail.
Those three habits give me a short, evergreen list of “this is still broken” problems. Everything else is noise.
From that list I distilled three rules I actually stick to:
1. *Context on purpose.* Before I ask the model for anything, I spend 90 seconds writing a tiny “context manifest”: file paths, line ranges, and a one-sentence goal. Sounds fussy, but it kills the “oh crap I forgot utils.py” loop. (There's a rough sketch of what I mean after this list.)
2. *Tokens are cash.* I run with a token counter always visible. If I’m about to ship 1,200 tokens for a 3-line fix, I stop and pare it down like I’m on a 1990s data plan. The constraint hurts for a week, then it becomes a game.
3. *One-screen flow.* Editor left, prompt box right, diff viewer bottom. No browser tabs, no terminal hopping. Alt-tab was costing me more mental RAM than the actual coding.
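For the first two rules, here's a minimal sketch of the idea in Python, assuming the tiktoken library for the counting; the file paths, line ranges, and the goal in it are made-up examples, not anything from a real project or a particular tool:

    # Rough sketch: build a tiny "context manifest", assemble the prompt from it,
    # and check the token cost before sending anything. Paths, line ranges, and
    # the goal are hypothetical.
    import pathlib
    import tiktoken

    manifest = {
        "goal": "Fix the off-by-one error in the pagination helper.",
        "files": {                                   # path -> (first, last) line, 1-based inclusive
            "app/views.py": (118, 142),
            "app/utils.py": (1, 40),
        },
    }

    def build_prompt(m):
        parts = [m["goal"]]
        for path, (start, end) in m["files"].items():
            lines = pathlib.Path(path).read_text().splitlines()
            snippet = "\n".join(lines[start - 1:end])
            parts.append(f"--- {path} (lines {start}-{end}) ---\n{snippet}")
        return "\n\n".join(parts)

    prompt = build_prompt(manifest)
    enc = tiktoken.get_encoding("cl100k_base")       # close enough for budgeting
    print(f"{len(enc.encode(prompt))} tokens")       # huge number for a 3-line fix? trim the manifest

The point isn't the script, it's the ritual: you can't forget utils.py if you have to name it, and you can't ignore the cost when the number is staring at you.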
It’s not sexy, and it definitely isn’t “AI-native,” but it’s the first workflow that hasn’t crumbled after a month. Maybe it’ll help you too.