The irony of this, is that Microsoft was trying to push CoPilot everywhere, however eventually Apple, Google and JetBrains have their own AI integrations, taking CoPilot out of the loop.
Slowly the AI craziness at Microsoft is taking the similar shape, of going all in at the begining and then losing to the competition, that they also had with Web (IE), mobile (Windows CE/Pocket PC/WP 7/WP 8/UWP), the BUILD sessions that used to be all about UWP with the same vigour as they are all AI nowadays, and then puff, competition took over even if they started later, because Microsoft messed up delivery among everyone trying to meet their KPIs and OKRs.
I also love the C++ security improvements on this release.
nonethewiser 6 hours ago [-]
>The irony of this, is that Microsoft was trying to push CoPilot everywhere, however eventually Apple, Google and JetBrains have their own AI integrations, taking CoPilot out of the loop.
What is the irony? Microsoft integrated copilot in Vscode, bing, etc. Apple is integrating claude in Xcode, Jetbrains has their own AI.
Microsoft moved first with putting AI into their products then other companies put other AI into their products. Nothing about this seems ironic or in any way surprising.
__alexs 6 hours ago [-]
The irony is that most people don't know how to use the word ironic. Personally I blame Alanis Morissette.
SAI_Peregrinus 2 hours ago [-]
It's ironic that a song about irony contains no actual examples of irony. But since that's ironic, is it actually an appropriately named song?
yunwal 59 minutes ago [-]
The song contains many instances of cosmic irony[0], as well as a few instances of other types of irony (e.g. "Well isn't this nice?").
Rain on your wedding day is not cosmic irony or any other kind. It's simply misfortune.
arcticbull 4 hours ago [-]
The irony is biting.
ergocoder 6 hours ago [-]
Yeah, there's no irony.
Apple and Google will never choose to integrate Microsoft's services or products willingly.
It would have been more surprising if they decided to depend on Microsoft.
pjmlp 4 hours ago [-]
The irony is that Microsoft has several cases where it gets there first, only to be left behind when competition catches up.
Bing is irrelevant, VSCode might top in some places, but it is cursor and Claude that people are reaching for, VS is really only used by people like myself that still care about Windows development or console SDKs, otherwise even for .NET people are switching to Rider.
raincole 13 hours ago [-]
Microsoft owns 49% of OpenAI so why they should worry? JetBrains just proudly announce that they now use GPT-5 by default.
> going all in at the begining and then losing to the competition
Sure, but there are counter examples too. Microsoft went late to the party of cloud computing. Today Azure is their main money printing machine. At some point Visual Studio seemed to be a legacy app only used for Windows-specific app development. Then they released VSCode and boom! It became the most popular editor by a huge margin[0].
Anecdotally: Azure is the Teams of cloud services - nobody uses it voluntarily or because it's technically the best solution.
They use it because the corporation mandates it.
pjmlp 10 hours ago [-]
Partially, I still consider the Web shell and VSCode based editing experience the best of clound vendors as replacement from what started to me as telnet and X forwarding on the university DG/UX servers.
AWS is the worst of this experience, even IBM Cloud has better tooling in this regard, GCP is somehow in the middle, others like Vercel/Netlify naturally don't offer this kind of setup.
thevillagechief 3 hours ago [-]
Truer words have never been spoken. Every time I hear an exec say we're a Microsoft shop so we've got to use copilot/Azure, I wonder if they hear themselves.
no_wizard 6 hours ago [-]
Azure isn’t great but AWS continues to be worse by a mile. I don’t know why anyone puts up with their terrible SDKs and poor documentation.
IMO Firebase should be the gold standard of how to do cloud platforms
wiether 6 hours ago [-]
The fact that you're talking about SDKs and comparing AWS to Firebase... probably means that your usecase is very specific and explains why you don't like AWS.
ffsm8 5 hours ago [-]
Fwiw, it was AWS that started the serverless hypetrain and kept pushing it until ppl started to forget what AWS was known for up to that point
hbn 5 hours ago [-]
I'm assuming if you use Firebase you get Google's customer support (or lack thereof)
ryanjshaw 12 hours ago [-]
I’ve used Meet, Slack, Zoom and Teams extensively. Teams beats the others by miles in my opinion.
const_cast 9 hours ago [-]
Zoom is pretty good for video meetings and especially for video conferences. I've never tried to use it for chat, but I imagine it's pretty lackluster. The really nice thing about teams is that it does both in one place.
But you know what's super underrated and I think could really take a hold on the business world? Discord! The video calls are so good! And multiple streams at the same time? Zoom can't do that!
The channels, too, just blow everything teams has out of the water. The video quality is better, its way faster, has more features, and they actually work. The audio filtering stuff actually works.
I really think with the right marketing they could take over the world. Honestly can't believe they haven't tried it yet.
no_wizard 6 hours ago [-]
They would need to do a lot of alterations to make Discord viable in the business world, but I do agree the bones of the platform are miles ahead of Slack or Teams.
It’s a shame they don’t have an enterprise / business tailored product based on it
giancarlostoro 7 hours ago [-]
Well... Slack feels like they have one person working on it dev wise, I havent used Meet in a while, but if its still a in-browser only thing, yikes, and Zoom... that is some legacy feeling app, I dont know how anyone can love Zoom. They bought out KeyBase and didn't even build a better platform. KeyBase was top tier, I'm still sour that the dev team basically stopped maintaining KeyBase after Zoom bought them out.
Teams took the best bits from Skype and whatever that other service Microsoft had for businesses and their phones and started over basically.
I still have pet peeves about Teams (like why dont the 'Teams' within Teams have proper group chats like Slack would, its ridiculous!) but it could be way worse. After years of screensharing hell I can finally move the stupid top bar out of my way when trying to hit 'Debug' within Visual Studio at least.
dcow 6 hours ago [-]
Keybase works entirely fine to this day. What sucks is everyone stopped using it solely to retaliate against the acquisition with thin justifications in speculation that Zoom would ruin keybase. Well that didn't happen, Keybase’s own users ruined Keybase. And now everyone just uses Discord because I guess the encryption didn't actually matter when it counted. Sad but familiar security story.
giancarlostoro 6 hours ago [-]
Agree. I stopped using it because its basically frozen, I dont think they've done any updates, the lights are on but nobody's home.
If anyone who was an original stakeholder for Keybase is reading this, please bring it back in some way someday. I'm assuming Zoom probably made you guys sign some insane non-compete sadly.
In a sea of garbage chat services all built using Electron and other bloatware, Keybase was a breath of fresh air.
hu3 8 hours ago [-]
I used to hate Teams but they seem to have fixed it for me.
It works decently enough in web, mobile and desktop.
pac0 2 hours ago [-]
You are joking right?
pjmlp 10 hours ago [-]
One day we will be able to have threaded conversations....
marliechiller 8 hours ago [-]
is this sarcasm? Teams is by far the worst ive used out of those mentioned
PapaPalpatine 8 hours ago [-]
Yeah, if it’s not sarcasm, it’s a very good indicator that their opinion should be ignored because no one in their right mind would say Teams is better.
CharlesW 3 hours ago [-]
Teams is objectively better, but it's hard to beat the emotional connection that geeks have to Slack from its pre-Salesforce days.
jabart 6 hours ago [-]
Visual Studio is a bad example. It's used for Windows, Web, and Mobile. The big difference between the two is the cost. Visual Studio Pro is $100/month, Enterprise is $300/month, while VSCode is free. It was an incredibly smart marketing play by Microsoft to do that.
orphea 12 hours ago [-]
> At some point Visual Studio seemed to be a legacy app only used for Windows-specific app development. Then they released VSCode and boom!
I'm not sure what the point is. Visual Studio is still Windows-only; VS Code is not related to it in any shape or form, the name is deliberately misleading.
raincole 11 hours ago [-]
The point is MS was so, so, so late to the party of cross-platform developer tools. And then suddenly they won the game.
foobarchu 2 hours ago [-]
It really helps that VSCode isn't intrinsically tied to any other Microsoft products. It doesn't demand you use any specific platforms/languages/compilers, it doesn't demand you use GitHub, it runs on all major operating systems in basically the same way, it doesn't demand or even politely ask that you sign into a Microsoft account.
If it weren't for the name, you'd never know it was even a Microsoft product.
orphea 10 hours ago [-]
Ah, it makes total sense to me now. Thanks.
jen20 8 hours ago [-]
Indeed I heard directly from someone involved that the VS Code team understood the reputation of Visual Studio and wanted to call the product “Code” instead, and the compromise with marketing leadership was the the binary was called “code”.
monkeyelite 7 hours ago [-]
Visual studio is good though. I wish I could it use it instead of code or Xcode
jlarocco 7 hours ago [-]
I use it 5 days a week, and unless you're talking about an ancient version from the 90s, I don't understand how you can say that.
icemelt8 11 hours ago [-]
> VS Code is not related to it in any shape or form
Except they are made by the same company? and literally own the trademark for both?
hennell 9 hours ago [-]
Oracle literally own the trademark for both Java and JavaScript.
JumpCrisscross 13 hours ago [-]
> Microsoft owns 49% of OpenAI
Power at OpenAI seems orthogonal to ownership, precedent or even frankly their legal documents.
yokoprime 14 hours ago [-]
CoPilot isn't anything Microsoft is trying to sell outside of their own products. And with GitHub Copilot there is no "copilot" model to choose, you can choose between Anthropic, OpenAI and Google models.
Sure UWP never caught on, but you know why? Win32, which by the way is also Microsoft, was way to popular and more flexible. Devs weren't going to re-write their apps to UWP in order to support phones.
lenkite 8 hours ago [-]
People were writing to UWP. There were hundreds of UWP apps that got cancelled and abandoned when Microsoft ditched their Windows Phone once Nadella got in. He kill Windows Phone, he killed native Edge (Chakra JS) and a lot of other stuff to focus fully on Cloud and then AI.
Before that ex-Microsoft guy was responsible for killing Nokia OS/Meego too in favor of Windows Phone - which got abandoned. What a train-wreck of errors leading to the mobile phone duopoly today.
Just because you can’t or won’t win the market with your opportunistic investment, doesn’t mean you should let your competitors completely annihilate you by taking that investment for themselves.
Google, Apple, FB or AWS would have been suitors for that licensing deal if MS didn’t bite.
rajnathani 14 hours ago [-]
About GitHub Copilot in specific: One big negative was how when GPT-4 became available that Microsoft didn't upgrade paying Copilot users to it, they simply branded this "coming soon"/"beta" Copilot X for a while. We simply cancelled the only Copilot subscription we had at work.
WhyNotHugo 12 hours ago [-]
Copilot subscription?
I've been getting monthly emails that my free access for GitHub Copilot has been renewed for another month… for years. I've never used it, I thought that all GitHub users got it for free.
If you are a student or maintain a popular open source project, they give it to you for free. I’m guessing you might fall under that category.
onion2k 14 hours ago [-]
"Taking Copilot out of the loop" if you ignore the massive ecosystems of Github, Visual Studio, and Visual Studio Code.
nihonde 13 hours ago [-]
Different CoPilot product. Typical Microsoft naming confusion.
recursive 8 hours ago [-]
There's another copilot?
airstrike 6 hours ago [-]
It's hard to find anything at Microsoft that isn't named Copilot these days
ziml77 1 hours ago [-]
Even Office! It's Microsoft 365 Copilot now...
JumpCrisscross 13 hours ago [-]
Microsoft mistook a product game for a distribution one. AI quality is heterogenous and advancing enough that people will make an effort to use the one they like best. And while CoPilot is excellently distributed, it’s a crap product, in large part due to the limits Microsoft put on GPT.
delta_p_delta_x 5 hours ago [-]
> I also love the C++ security improvements on this release.
These are courtesy of LLVM/Clang (which Xcode ships with), rather than Xcode itself.
pjmlp 4 hours ago [-]
LLVM is only relevant thanks to Apple in first place, otherwise it would still be an university project if at all, clang was born at Apple, and some of their employees are responsible for those improvements in collaboration with Google, presented at a LLVM Developers Meeting.
jnsaff2 11 hours ago [-]
What confuses me about MS Copilot is that there are (according to ChatGPT) 12 distinct services that are all Copilot:
Microsoft Copilot (formerly Bing Chat)
Microsoft 365 Copilot
Microsoft Copilot Studio
GitHub Copilot
Microsoft Security Copilot
Copilot for Azure
Copilot for Service
Sales Copilot
Copilot for Data & Analytics (Fabric)
Copilot Pro
Copilot Vision
const_cast 9 hours ago [-]
Out of all of big tech, Microsoft is by far the worst at naming stuff. Its comically bad most of the time.
jen20 8 hours ago [-]
Copilot.NET Live Ultimate Edition N for Developers
estimator7292 7 hours ago [-]
(With Copilot)
culopatin 4 hours ago [-]
SP2
ramses0 2 hours ago [-]
*Series
mcv 11 hours ago [-]
I use IntelliJ with the Copilot plugin, using Claude. My employer has a big subscription for everything from Microsoft, and that includes Copilot, so that's free for me. But somehow Copilot also gives me access to Claude. No idea how that works.
vbezhenar 9 hours ago [-]
Taking out of the loop? I have a feeling that vscode copilot has huge market share. It's more like competitors are slowly eating small piece of pie.
icemelt8 14 hours ago [-]
umm I don't know what you are talking about, I use a Github Copilot 40 USD subscription in VSCode to code using various models, and this is the industry standard now in my region, as most employers are now giving employees the 10 USD subscription.
pjmlp 11 hours ago [-]
In what industry? Not where I am where unless customers actually sign a permission no one is allowed to come close to those tools in project delivery.
jfoster 13 hours ago [-]
Also OpenAI pioneered but now the many competitors seem to have either caught up or surpassed them. They might still retain a significant brand recognition advantage as long as they don't fall too far behind, though.
vbezhenar 9 hours ago [-]
Which competitor has alternative to ChatGPT Pro? I have Claude subscription and Opus 4.1 is not on the same level. ChatGPT Pro thinks for 5-10 minutes, while Opus either doesn't think at all or thinks very briefly. And response quality is absolutely different. ChatGPT Pro solves problems, Opus does not. Is there any competitor with "Pro" product which spends significant amount of computing for a single query?
Larrikin 7 hours ago [-]
Click the research button in the Claude prompt. It usually asks a few follow up questions then does exactly what you're describing.
SmartestUnknown 6 hours ago [-]
Deepthink with Gemini AI Ultra plan. Daily limit of 10 might be restrictive for you though.
wordofx 13 hours ago [-]
Almost no one uses copilot unless they are not allowed to use anything else or don’t know any better. MS could have been a leader in this space but MS couldn’t understand why people didn’t like copilot but loved the competition.
transcriptase 11 hours ago [-]
Once co-pilot tendrils and icons began appearing in all of my orgs tools, they announced we would no longer be able to expense subscriptions for others. Only those who haven’t used ChatGPT Pro, Claude, Gemini, etc have anything good to say about copilot.
celeryd 12 hours ago [-]
Maybe because Microsoft is a shit company and anything they do is sus af. And everyone knows it. And I'm tired of pretending like it's not. I wouldn't trust Microsoft to babysit my mortal enemy's kids.
Maybe if they weren't literally the borg people would open their hearts and wallets to Redmond. They saw that Windows 10 was a privacy nightmare and what did they do? They doubled down in Windows 11. Not that I care but it plays really poorly. Every nerd on the internet spouts off about Recall even though it's not even enabled if you install straight to the latest build.
They bought GitHub and now it's a honeypot. We live in a world where we have to assume GitHub is adversarial.
_NSAKEY???
Fuck you Microsoft.
Makes sense karma catches up to them. Maybe if their mission statement and vision were pure or at least convincing they would win hearts and minds.
Lammy 16 hours ago [-]
Interesting to think about how Apple get to make product decisions based on Gatekeeper OCSP analytics now that every app launch phones home. They must know exactly how popular VSCode is.
Facebook got excoriated for doing that with Onavo but I guess it's Good Actually when it's done in the name of protecting my computer from myself lol
doctorpangloss 15 hours ago [-]
Apple doesn't need telemetry to send emails about their favorite coding AI to the 2 Xcode users
kennywinker 14 hours ago [-]
Off by about 33,999,998 users, but still a decent dunk.
34 million developers? That number doesn't even pass a basic sniff test. Are there 34 million people that have Xcode installed? That I can believe.
_puk 13 hours ago [-]
If my experience is anything to go by - a good proportion of this will be people accidentally double clicking a .md (or other random text suffix), and cursing whilst they wait for XCode to slowly load enough that they can quit it and open the file in a proper lightweight editor..
slipperydippery 4 hours ago [-]
Yep, I open Xcode several times per year, but haven't done it on purpose since... uh, 2014 or so?
tomashubelbauer 13 hours ago [-]
I feel like the #1 reason to install Xcode is to get Git working on macOS. Yours is probably #2. I wouldn't bet money on iOS/macOS development sitting at #3.
Lammy 3 hours ago [-]
Command Line Tools for Xcode is a separate smaller package than the full graphical Xcode tho: `xcode-select --install`
oarsinsync 12 hours ago [-]
> At Apple's World Wide Developer Conference on Monday, Tim Cook mentioned that there are now 34 million registered developers with the company's platform.
I think that means either:
* they have revenues of $3.4b/year just from the $100 annual fees, or
* some decent percentage of people have signed up for a free developer account and then never done anything with it (like me)
doctorpangloss 4 hours ago [-]
We found one of the users!
wahnfrieden 16 hours ago [-]
This won't make a dent. It still doesn't support any agentic operation.
The real news is when Codex CLI / Claude Code get integrated, or Apple introduces a competitor offering to them.
Until then this is a toy and should not be used for any serious work while these far better tools exist.
alwillis 16 hours ago [-]
I just installed it—definitely not a toy.
Compared to stock Claude Code, this version of Claude knows a lot more about SwiftUI and related technologies. The following is output from Claude in Xcode on an empty project. Claude Code gives a generic response when it looked at the same project:
What I Can Help You With
• SwiftUI Development: Layout, state management, animations, etc.
• iOS/macOS App Architecture: MVVM, data flow, navigation
• Apple Frameworks: Core Data, CloudKit, MapKit, etc.
• Testing: Both traditional XCTest and the new Swift Testing framework
• Performance & Best Practices: Swift concurrency, memory management
Example of What We Could Do Right Now
Looking at your current ContentView.swift, I could help you:
• Transform this basic "Hello World" into a recovery tracking interface
• Add navigation, data models, or user interface components
• Implement proper architecture patterns for your Recovery Tracker app
manmal 15 hours ago [-]
If a bunch of markdown files forced into the context is “knowing”, then yes. They are usually located at /Applications/Xcode-beta.app/Contents/PlugIns/IDEIntelligenceChat.framework/Versions/A/Resources/AdditionalDocumentation
You are free to point Claude Code to that folder, or make a slash command that loads their contents. Or, start CC with -p where the prompt is the content of all those files.
Claude Code integration in Xcode would be very cool indeed, but I might still stick with VSCode for pure coding.
alwillis 2 hours ago [-]
No, this isn't a bunch of markdown files, has been in the works since at least May and has been dogfooded inside Apple [1]: "Apple Partners With Anthropic for Claude-Powered AI Coding Platform"
What is special about it besides the IDE integration? It is a bunch of markdown files with a search tool. It has no agentic capability so it is far behind state of the art dev tooling. Apple devs internally use Claude Code.
brailsafe 15 hours ago [-]
> Claude Code integration in Xcode would be very cool indeed, but I might still stick with VSCode for pure coding.
I'm sticking with VSCode too, but it's a bit silly to suggest that anyone is using XCode because it's their preferred IDE. It's just the one that's necessary for any non-trivial Apple platform development.
Adding a code generator isn't a marketing ploy to get people to switch editors, it's just a small concession to the many hapless souls stuck dealing with Apple on the professional side, or masochistically building mac SwiftUI apps just to remind themselves what pain feels like.
wahnfrieden 6 hours ago [-]
I actually continue to use Xcode (in vim mode now that they have that) purely because of the way tabs work… in Vim and Xcode I’m able to have the same file open across multiple tabs and window-tabs, allowing me to arrange sets of files for particular tasks. But in VS Code it sends me to another window when I want to view a file next to another one, just because it’s already open elsewhere. I can’t stand this behavior as it slows me down and breaks my ability to see the files I want to see next to each other without many extra steps to rearrange things over and over. A ticket requesting this behavior change has been open for years with no progress.
manmal 13 hours ago [-]
I mean you can stay in VSCode for most activities if you hate Xcode that much (I can relate btw). Plugins like Sweetpad make this possible. My approach now is to develop all logic in small Swift packages and run swift test in VSCode (or Claude Code), so I only absolutely need Xcode for debugging and building releases. Every once in a while I try SwiftUI previews, but those are usually broken anyways.
wahnfrieden 29 minutes ago [-]
If it is severely less capable - and not even any cheaper to use - then it’s a toy! A penny farthing can get you somewhere just as a car can but only one is perhaps of professional utility even if the other used to be at one point too
ako 16 hours ago [-]
Isn’t that easy to add with some rules and guidelines documents? I usually ask Claude code to research modern best practices for SwiftUI apps and to summarize the learnings in a rules file that will be part of the SwiftUI project.
theshrike79 15 hours ago [-]
Yes and no. Proper Agentic coding tools like Claude Code are a bit more than just a bunch of markdown rulesets.
For example: it uses Haiku as a model to run tools and most likely has automatic translations for when the model signals it wants to search or find something -> either use the built-in search or run find/fd/grep/rg
All that _can_ be done by prompting, but - as always with LLMS - prompts are more like suggestions.
ghurtado 15 hours ago [-]
I'm as crazy about AI as the next dev, but that has to be the weakest example of AI capability that I have ever seen.
wahnfrieden 6 hours ago [-]
People are just excited to see Apple finally integrate something useful. But it’s not 1/100th as useful as agentic tools (CC, Codex) which have access to the exact same markdown files as the Apple one. There’s nothing else special about Apple’s offering, and it has no agentic capability. It is a waste of time to use it.
einrealist 14 hours ago [-]
Its not shipping the model in Xcode. You are still sending your data off to a remote provider, hoping that this provider behaves nicely with all this data and that the government will never force the provider to reveal your data.
ygritte 14 hours ago [-]
They are already forcing OpenAI to keep all logs. Go figure.
einrealist 13 hours ago [-]
And people talk to GPT about very private things, using it as a shrink. What can go wrong.
3 hours ago [-]
kridsdale1 4 hours ago [-]
China wishes they had that level of access to their people’s thoughts.
marci 8 hours ago [-]
Anthropic has a strong stance on privacy. They won't rug pull.
3 days ago I saw another Claude praising submission on HN, and finally I signed up for it, to compare it with copilot.
I asked 2 things.
1. Create a boilerplate Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured. It generated garbage devicetree which didn't even compile. When I pointed it out, it apologized and generated another one that didn't compile. It configured also non-existent drivers, and for some reason it enabled monkey test support (but not test support).
2. I asked it to create 7x10 monochromatic pixelmaps, as C integer arrays, for numeric characters, 0-9. I also gave an example. It generated them, but number eight looked like zero. (There was no cross in ether 0 nor 8, so it wasn't that. Both were just a ring)
What am I doing wrong? Or is this really the state of the art?
simonw 15 hours ago [-]
"What am I doing wrong?"
Your first prompt is testing Claude as an encyclopedia: has it somehow baked into its model weights the exactly correct skeleton for a "Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured"?
Frequent LLM users will not be surprised to see it fail that.
The way to solve this particular problem is to make a correct example available to it. Don't expect it to just know extremely specific facts like that - instead, treat it as a tool that can act on facts presented to it.
For your second example: treat interactions with LLMs as an ongoing conversation, don't expect them to give you exactly what you want first time. Here the thing to do next is a follow-up prompt where you say "number eight looked like zero, fix that".
diggan 9 hours ago [-]
> For your second example: treat interactions with LLMs as an ongoing conversation, don't expect them to give you exactly what you want first time. Here the thing to do next is a follow-up prompt where you say "number eight looked like zero, fix that".
Personally, I treat those sort of mistakes as "misunderstandings" where I wasn't clear enough with my first prompt, so instead of adding another message (and increasing context further, making the responses worse by each message), I rewrite my first one to be clearer about that thing, and regenerate the assistant message.
Basically, if the LLM cannot one-shot it, you weren't clear enough, and if you go beyond the total of two messages, be prepared for the quality of responses to really sink fast. Even by the second assistant message, you can tell it's having an harder time keeping up with everything. Many models brag about their long contexts, but I still feel like the quality of responses to be a lot worse even once you reach 10% of the "maximum context".
varispeed 9 hours ago [-]
You also need to state your background somehow and at what level you want the answer to be. I often found LLM would give answer that what I ask is too complex and would take months to do. Then you have to say like ignore these constraints and assume I am already an expert in the field, outline a plan how to achieve this and that. Then drill down on the plan points. It's a bit of work, but its fascinating.
Or it would say to do X it involves very complex math, instead you could (and proceeds with stripped down solution that doesn't meet goals). So you can tell it to ignore the concerns about complexity and assume that I understand all of it and it is easy to me. Then it goes on creating the solution that actually has legs. But you need to refine it further.
OtherShrezzing 15 hours ago [-]
It’s good at doing stuff like “host this all in Docker. Make a Postgres database with a Users table. Make a FastAPI CRUD endpoint for Users. Make a React site with a homepage, login page, and user dashboard”.
It’ll successfully produce _something_ like that, because there’s millions of examples of those technologies online. If you do anything remotely niche, you need to hold its hand far more.
The more complicated your requirements are, the closer you are to having “spicy autocomplete”. If you’re just making a crud react app, you can talk in high level natural language.
ranguna 13 hours ago [-]
Did you try claude code and spend actual time going back and forth with it, reviewing it's code and providing suggestions; Instead of just expecting things to work first try with minimal requirements?
I see claude code as pair programming with a junior/mid dev that knows all fields of computer engineering. I still need to nudge it here and there, it will still make noob mistakes that I need to correct and I let it know how to properly do things when it gets them wrong. But coding sessions have been great and productive.
In the end, I use it when working with software that I barely know. Once I'm up and running, I rarely use it.
johnisgood 12 hours ago [-]
> Did you try claude code and spend actual time going back and forth with it, reviewing it's code and providing suggestions; Instead of just expecting things to work first try with minimal requirements?
I did, but I always approached LLM for coding this way and I have never been let down. You need to be as specific as possible, be a part of the whole process. I have no issues with it.
gattilorenz 13 hours ago [-]
FWIW, I used Gemini to write an Objective-C app for Apple Rhapsody (!) that would enumerate drivers currently loaded by the operating systems (more or less save level of difficulty as the OP, I'd say?), using the PDF manual of NextStep's DriverKit as context.
It... sort of worked well? I had to have a few back-and-forth because it tried to use Objective-C features that did not exist back then (e.g. ARC), but all in all it was a success.
So yeah, niche things are harder, but on the other hand I didn't have to read 300 pages of stuff just to do this...
thefoyer 12 hours ago [-]
I remember writing obj-c naturally by hand. Before swift was even a twinkle in tim cooks eye. One of my favorite languages to program in I had a lot of fun writing ios apps back in the day it seems like
yard2010 9 hours ago [-]
I member obj c, using it was a profound experience, it was so different from other languages I felt like an anthropologist.
Also, fun names like
`makeFunctionNameInCommentLongAndDescriptiveWithNaturalLanguage:(NSLanguage *)language`
fauigerzigerk 13 hours ago [-]
I agree, but I think there's an important distinction to be made.
In some cases, it just doesn't have the necessary information because the problem is too niche.
In other cases, it does have all the necessary information but fails to connect the dots, i.e. reasoning fails.
It is the latter issue that is affecting all LLMs to such a degree that I'm really becoming very sceptical of the current generation of LLMs for tasks that require reasoning.
They are still incredibly useful of course, but those reasoning claims are just false. There are no reasoning models.
fx0x309 14 hours ago [-]
In other words, the vibe coders of this world are just redundant noobs who don't really belong on the marketplace. They've written the same bullshit CRUD app every month for the past couple of years and now they've turned to AI to speed things up
stpedgwdgfhgdd 14 hours ago [-]
Last week I asked Claude to improve a piece of code that downloads all AWS RDS certificates to just the ones needed for that AWS region. It figured out several ways to determine the correct region, made a nice tradeoff and suggested the most reliable way. It rewrote the logic to download the right set, did some research to figure out the right endpoint in between. It only made one mistake, it fallback mechanism was picking EU, which was not correct. Maybe 1 hour of work. On my own it would have taken me close to a working day to figure it all out.
Towaway69 13 hours ago [-]
This is just a thought experiment.
I don't mean to be treading on feet but I'm noticing this more and more in the debates around AI. Imagine if there are developers out there that could have done this task in 30 mins without AI.
The level of performanace of AI solutions is heavily related to the experience level of the developer and of the problem space being tackled - as this thread points out.
Unfortunately the marketing around AI ignores this and makes every developer not using AI for coding seem like a dinosauer, even though they might well be faster in solving their particular problems.
AI is moving problem solving skills from coding to writing the correct prompts and teaching AI to do the right thing - which, again, is subjective, since the "right thing" for one developer isn't the "right thing" for the another developer. "Right thing" being the correct solution, the understandable solution, the fastest solution, etc depending on the needs of the developer using the AI.
WalterSear 12 hours ago [-]
IMHO, the thirty minute developer would still save 10 minutes by vibe coding. That marketing's not wrong.
Spelling out exactly what you want and checking/fixing what you receive is still faster than typing out the code. Moreover, nobody's job involves nothing but brainiac coding, day after day. You have to clean up and lay foundations, whatever level you are at.
Towaway69 11 hours ago [-]
> IMHO, the thirty minute developer would still save 10 minutes by vibe coding. That marketing's not wrong.
For me, that's too general. Of course, perhaps for this particular, specific problem it might be true. But as this thread points out, anything niche and AI fails to help productively. Of course then comes the marketing: just wait, AI will be able to cover those niche cases also.
> want and checking/fixing what you receive is still faster than typing out the code
Then I do wonder why there are developers at all. After all that's what AI is so good at - if one believes the marketing - being precise and describing exactly what needs to be done. Surely it must be faster having two AIs talking to each and hammering out the code.
And even typing is subjective: ten fingers versus two, versus four .. etc. There are developers that can type faster than they can think - in certain cases.
There is also the developer in flow versus the stop and go using an AI prompts to get it just right. I dunno, if it comes true, then thankfully there won't be any humans to create bugs in code but somehow, I can't see it happening.
TheOtherHobbes 9 hours ago [-]
There are two ways to do this. One is to one-shot or maybe few-shot a solution. Maybe this works. Maybe it doesn't. Sometimes it works if you copy a solution from [Product 1] to [Product 2] and say "Fix this."
The other is to look at the non-working solution you get, read through it, and think "Oh, I didn't know about that framework/system/product/library, that's neat" and then do some combination of further research and more hand-holding to get to something that does work.
This is useful, more or less, no matter what your level.
It's also good for explaining core industry tooling you've maybe never used before. If you're new to Postgres/NoSQL/AWS/Docker/SwiftUI/whatever it can talk you through it and give you an instant bootcamp with entry-level examples and decent solutions.
And for providing fixes for widely known bugs and issues in products that may not be widely known to you (yet.)
IME ChatGPT5 is pretty solid with most science/tech up to undergrad. It gets hallucinatory past that, and it's still flattering, which is annoying, but you can tell it to cut that out.
Generally you can use it as a dumb offshore developer, or as an infinitely patient private tutor.
That latter option is very useful. The first, not always.
socksy 6 hours ago [-]
> The level of performanace of AI solutions is heavily related to the experience level of the developer and of the problem space being tackled - as this thread points out.
>
> Unfortunately the marketing around AI ignores this and makes every developer not using AI for coding seem like a dinosauer, even though they might well be faster in solving their particular problems.
You're not necessarily wrong, but I think it's worth noting that very few developers are only ever coding deep in their one domain that they're good at. There's just too many things to be deeply good at everything. For example, it's common that infra and CI tasks are stuff that most developers haven't learned by heart, because you don't tend to touch them very often.
Claude shines here — I've made a lot more useful GitHub Actions jobs recently, because while I could automate something, if I know I'm going to have to look up API docs (especially multiple APIs I'm not super familiar with) then I tend to figure that the automation will lose out the trade-off between doing the task (see https://xkcd.com/1205/). Claude being able to hash out those rapidly, and in a way that's easily verifiable that it's doing the right thing, has changed that arithmetic for me substantially.
dns_snek 4 hours ago [-]
> Maybe 1 hour of work. On my own it would have taken me close to a working day to figure it all out.
1. Find out how to access metadata about the node running my code (assumption: some kind of an environment variable) [1-10 minutes depending on familiarity with AWS]
2. Google "RDS certificates" and find the bundle URL after skimming the page [1] for important info [1-5 minutes]
3. Write code to download the certificate bundle, fallback being "global-bundle.pem" if step 1 failed for some reason? [5-20 minutes depending on all the bells and whistles you need]
Did I miss anything or completely misunderstand the task?
edit: I asked Claude Sonnet 4 to write robust code for a Node.JS application that downloads RDS CA bundle for the AWS region that the code is currently running in and saves it at the supplied filesystem path.
0. It generated about 250 lines of code
1. Fallback was us-east (not global)
2. The download URLs for each region were hardcoded as KV pairs instead of being constructed dynamically
3. Half of the regions were missing
4. It wrote a function that verifies whether the certificate bundle looks valid (i.e. includes a PEM header)... but only calls it on the next application startup, instead of doing so before saving a potentially invalid certificate bundle to disk and proceeding with the application startup.
5. When I complained that half of my instances are downloading global bundles instead of regional ones (because they're not present in the hardcoded list), it:
- incorrectly concluded that not all regions have CA bundles available and hardcoded a duplicate list in 2 places containing regions that are known to offer CA bundles (which is all of them). These lists were even shorter than the last ones.
- wrote a completely unnecessary function that checks whether a regional CA bundle exists with a HEAD request before actually downloading it with a GET request, adding another 50 lines of code
Now I'm having to scrutinize 300 lines of code to make sure it's nothing doing something even more unexpected.
LinuxAmbulance 13 hours ago [-]
I think the majority of coders out there write the same CRUD app over and over again in different flavors. That's what the majority of businesses seem to pay for.
If a business needs the equivalent of a Toyota Corolla, why be upset about the factory workers making the millionth Toyota Corolla?
shafyy 13 hours ago [-]
> I think the majority of coders out there write the same CRUD app over and over again in different flavors
In my experience, that's not entirely true. Sure, a lot of app are CRUD apps, but they are not the same. The spice lies in the business logic, not in programming the CRUD operations. And then of course, scaling, performance, security, organization, etc etc.
reverius42 10 hours ago [-]
Good thing LLMs are really good at unique business logic, scaling, performance, security, organization, etc etc.!
(edit: /s to indicate sarcasm)
wsc981 8 hours ago [-]
Yeah, my experience with LÖVR [0] and LLM (ChatGPT) has been quite horrible. Since it's very niche and quite recently quite a big API change has happened, which I guess the model wasn't trained on. So it's kind of useless for that purpose.
Trying two things and giving up. It's like opening a REPL for a new language, typing some common commands you're familiar with, getting some syntax errors, then giving up.
You need how to learn to use your tools to get the best out of them!
Start by thinking about what you'd need to tell a new Junior human dev you'd never met before about the task if you could only send a single email to spec it out. There are shortcuts, but that's a good starting place.
In this case, I'd specifically suggest:
1. Write a CLAUDE.md listing the toolchains you want to work with, giving context for your projects, and listing the specific build, test etc. commands you work with on your system (including any helpful scripts/aliases you use). Start simple; you can have claude add to it as you find new things that you need to tell it or that it spends time working out (so that you don't need to do that every time).
2. In your initial command, include a pointer to an example project using similar tech in a directory that claude can read
3. Ask it to come up with a plan and ask for your approval before starting
designerarvid 16 hours ago [-]
I guess many find comfort in being able to task an ai with assignments that it cannot complete. Most sr developers I work with take this approach. It's not really a good way of assessing the usefulness of a tool though.
xoac 15 hours ago [-]
He asked what he was doing wrong?
_boffin_ 15 hours ago [-]
too big of tasks. break them down and then proceed from there. have it build out task lists in a TASKS.md. review those tasks. do you agree? no? work with it to refine. implement one by one. have it add the tests. refactor after awhile as {{model}} doesn't like to do utility functions a lot. right now, about +50k lines in to a project that's vibecoded. i sit back and direct and it plays.
Imagine the CS 100 class where they ask you to make a PB&J. saying for it to make it, there's a lot of steps, but determine known the steps. implement each step. progress.
throwawayffffas 12 hours ago [-]
Too big and requiring too much niche specific knowledge, you somehow have to inject that knowledge and allow it to iterate.
SkyPuncher 15 hours ago [-]
This is the way.
I run interviews at my company. We allow/encourage AI.
The number one failure method is people throwing all of the requirements in upfront. They get one good pass then fail.
_boffin_ 6 hours ago [-]
I was part of a shop that did the Pivotal Way and we had Inceptions where the PM, engineers, and a tester or two would be sequestered in a conference room for the day to bang out task lists that went into mid-level fidelity. Technical considerations were debated and sometimes in a heated way, but we never got into implementation—just structure and flow to ensure it jives.
…this reeeeaaaallllyyyy feels like that
CSSer 14 hours ago [-]
I'm inclined to agree with this approach because someone not using AI who fails would likely fail for the same reasons. If you can't logically distill a problem into parts you can't obtain a solution.
999900000999 14 hours ago [-]
Think of Claude as a typical software developer.
If you just selected a random developer do you think they're going to have any idea why your talking about?
The issue is LLMs will never say, sorry, IDK how to do this. Like a stressed out intern they just make up stuff and hope it passes review.
a_wild_dandan 15 hours ago [-]
> What am I doing wrong?
Providing a woefully inadequate descriptions to others (Claude & us) and still expecting useful responses?
VMG 13 hours ago [-]
> It configured also non-existent drivers, and for some reason it enabled monkey test support (but not test support).
If it doesn't have the underlying base data, it tends to hallucinates. (It's getting a bit difficult to tell when it has underlying data, because some models autonomously search the web). The models are good at transforming data however, so give it access to whatever data it needs.
Also let it work in a feedback loop: tell it to compile and fix the compile errors. You have to monitor it because it will sometimes just silence warnings and use invalid casts.
> What am I doing wrong? Or is this really the state of the art?
It may sound silly, but it's simply not good at 2D
throwawayffffas 13 hours ago [-]
> It may sound silly, but it's simply not good at can2D.
It's not silly at all, it's not very good at layouts either, it can generally make layouts but there is a high chance for subtle errors, element overlaps, text overflows, etc.
Mostly because it's a language model, i.e it doesn't generally see what it makes, you can send screenshots apparently and it will use it's embedded vision model, but I have not tried that.
mm263 15 hours ago [-]
Try this prompt: Create a detailed step by step plan to implement a boilerplate Zephyr project skeleton for Pi Pico with configured st7789 SPI display drivers
Ask Opus or Gemini 2.5 Pro to write a plan. Then ask the other to critique it and fix mistakes. Then ask Sonnet to implement
mm263 15 hours ago [-]
I tried this myself and IMO, this might be basic and day-to-day for you, with unambiguous correct paths to follow, but this is pretty niche nevertheless. LLMs thrive when there's a wealth of examples and I struggle to Google what you asked myself, meaning that LLM will perform even worse than my try.
prawn 14 hours ago [-]
I found that second line works well for image prompts too. Tell one AI to help you with a prompt, and then take it back to the others to generate images.
yangikan 6 hours ago [-]
Is there a way to do this kind of design->critique->implement without switching tools? Like an end-to-end solution that consults multiple LLMs?
mm263 5 hours ago [-]
Claude code with Zen MCP. Kiro, but you don’t get a second LLM opinion.
alansammarone 13 hours ago [-]
There's a lot of people caricaturing the obvious fact that any model works best in distribution.
The more esoteric your stack, and the more complex the request, the more information it needs to have. The information can be given either through doing research separately (personally, I haven't had good results when asking Claude itself to do research, but I did have success using the web chat UI to create an implementation plan), or being more specific with your prompt.
As an aside, I have more than 10 years of experience, mostly with backend Python, and I'd have no idea what your prompts mean. I could probably figure it out after some google searches, tho. That's also true of Claude.
Here's an example of a prompt that I used recently when working on a new codebase. The code is not great, the math involved is non trivial (it's research-level code that's been productionized in hurry). This literally saved 4 hours of extremely boring work, digging through the code to find various hardcoded filenames, downloading them, scp'ing them, and using them to do what I want. It one-shotted it.
> The X pipeline is defined in @airflow/dags/x.py, and Y in `airflow/dags/y.py` and the relevant task is `compute_X`, and `compute_Y`, respectively. Your task is to:
> 1. Analyze the X and Y DAGs and and how `compute_X` functions are called in that particular context, including it's arguments. If we're missing any files (we're probably missing at least one), generate a .sh file with aws cli or curl commands necessary for downloading any missing data (I don't have access to S3 from this machine, but I do have in a remote host). Use, say, `~/home` as the remote target folder.
> 2. If we needed to download anything from S3, i.e. from the remote host, output rsync/scp commands I can use to copy them to my local folder, keeping the correct/expected directory structure. Note that direct inputs reside under `data/input`, while auxiliary data resides in other folders under `data`. Do not run them, simply output them. You can use for example `scp user@server.org ...`
> 3. Write another snapshot test for X under `tests/snapshot`, and one for Y. Use a pattern as similar as possible to the other tests there. Do not attempt to run the tests yet, since I'll need to download the data first.
> If you need any information from Airflow, such as logs or output values, just ask and I can provide them. Think hard.
baq 12 hours ago [-]
> What am I doing wrong? Or is this really the state of the art?
You're treating the tool like it was an oracle. The correct way is to treat it as a somewhat autistic junior dev: give it examples and process to follow, tell it to search the web, read the docs, how to execute tests. Especially important is either directly linking or just copy pasting any and all relevant documentation.
The tool has a lossily compressed knowledge database of the public internet and lots of books. You want to fix the relevant lossy parts in the context. The less popular something is, the more context will be needed to fill the gaps.
mexicocitinluez 9 hours ago [-]
> The correct way is to treat it as a somewhat autistic junior dev: give it examples and process to follow, tell it to search the web, read the docs, how to execute tests. Especially important is either directly linking or just copy pasting any and all relevant documentation.
Like "Translate this pdf to html using X as a templating language". It shines at stuff like that.
As a dev, I encounter tons of one-off scenarios like this.
dboon 15 hours ago [-]
Real vibe coding is fake, especially for something niche like what you asked it to do. Imagine a hyperactive eidetic fresh out of high school was literally sitting in the other room. What would you tell her? That’s a good rule of thumb for the level of detail and guidance
postalcoder 15 hours ago [-]
You can no longer answer "what is the state of the art” by pointing to a model.
Generating a state-of-the-art response to your request involves a back-and-forth with the agent about your requirements, having a agent generate and carry out a deep research plan to collect documentation, then having the agent generate and carry out a development plan to carry it out.
So while Claude is not the best model in terms of raw IQ, the reason why it's considered the best coding model is because of its ability to execute all these steps in one go which, in aggregate, generates a much better result (and is less likely to lose its mind).
adastra22 14 hours ago [-]
> So while Claude is not the best model in terms of raw IQ
Which one is, and by what metric? I always end up back at Claude after trying other models because it is so much better at real world applications.
rubicon33 4 hours ago [-]
LLMs are actually terrible at generating art unless they're specifically trained for that type of work. Its crazy how many times I've asked for some UI elements to be drawn using a graphics context and it comes out totally wrong.
XenophileJKO 15 hours ago [-]
Ok. several tips I can give.
1. Setup a sub-agent to do RESEARCH. It is important that it only has read-only and web access tools.
2. Use planning mode and also ask the agent to use the subagent to research best pratices with the tech that you are wanting to do, before it builds a plan.
3. When ever it gets hung up.. tell it to use the sub-agent to research the solution.
That will get you a lot better initial solution. I typically use Sonnet for the sub-agents and Opus for the main agent, but sonnet all around should be fine too for the most part.
AHTERIX5000 14 hours ago [-]
I've had similar experiences when working on non-web tech.
There are parts in the codebase I'd love some help such as overly complex C++ templates and it almost never works out. Sometimes I get useful pointers (no pun intended) what the problem actually is but even that seems a bit random. I wonder if it's actually faster or slower than traditional reading & thinking myself.
pell 15 hours ago [-]
In my experience Claude is quite good at the popular stacks in the JavaScript, Python and PHP world. It struggled quite a bit when I asked it non-trivial questions in C or Rust for example. For smaller languages (e.g., Crystal) it seems to hallucinate a lot. I think since a lot of people work in JS, Python and PHP, that’s where Claude is very valuable and that’s where a lot of the praise feel justified too.
adastra22 14 hours ago [-]
I have had no problems with using Claude on large rust projects. The compiler errors usually point it towards fixing its mistakes (just like they do for me).
johnisgood 12 hours ago [-]
Feed it Crystal documentation and example code. That is what I did with more obscure programming languages and it worked out well in the end.
lm28469 12 hours ago [-]
The only way I manage to get any benefits from LLMs is to use them as an interactive rubber duck.
Dump your thoughts in a somewhat arranged manner, tell it about your plan, the current status, the end goal, &c. After that tell it to write 0 code for now but to ask questions and find gaps in your plan. 30% of it will be bullshit but the rest is somewhat useable. Then you can ask for some code but if you care about quality or consistency with you existing code base you probably will have to rewrite half of it, and that's if the code works in the first place
Garbage in garbage out is true for training but it's also true for interactions
5 hours ago [-]
herbst 14 hours ago [-]
You didn't specify any architecture design. Your prompts are about 10% of what would be needed to one shot this. This is what you do wrong.
awalsh128 15 hours ago [-]
One of the things you can do is provide a guidance file like CLAUDE.md including not only style preferences but also domain knowledge so it has greater context and knows where to look. Just ask it make one and then update and change as needed.
mathiaspoint 9 hours ago [-]
I think you need play around with some of the early codegen models so you can get a better intuition for how LLMs work/fail.
rustystump 15 hours ago [-]
Tbh dawg, those tasks sound intentionally obtuse. It looks like u are doing more esoteric work than the crud react slop us mortals play in on the daily which is where ai shines.
not_your_vase 15 hours ago [-]
I work almost exclusively with embedded devices, with low level code (mostly C, Rust, Assembly and related frameworks) - and that's where I also ask for help from LLMs.
xoac 15 hours ago [-]
Did you intentionally pick your career to make the AI look bad?
adastra22 14 hours ago [-]
It works fine in those domains. I speak from experience. You need CI tools the agent can access, and lots of tests.
baka367 15 hours ago [-]
I find it useful to ask it to build a design document first and push to add details where i see it lacking.
After a few iteration i then ask it to implement the design doc to mostly-better results.
jm547ster 13 hours ago [-]
Sounds like you picked some obscure tasks to test it that would obviously have low representation in the data set? That is not to say it can't be helpful augmenting some lower represented frameworks/tools - just you'll need to equip it with better context (MCPs/Docs/Instruction files)
A key skill in using an LLM agentic tool is being discerning in which tasks to delegate to it and which to take on yourself. Try develop that skill and maybe you will have better luck.
pjmlp 15 hours ago [-]
I managed to get most AIs to generate C# code when I ask for Java stuff, so it is always a kind of template generator that still isn't quite there.
mexicocitinluez 9 hours ago [-]
That's interesting. I use it mainly for C# and Javascript/Frontend stuff.
I wonder if it's because there are maybe millions of MSDN articles, but I don't know if a Java analog to MSDN exists.
jki275 2 hours ago [-]
Claude is bad at embedded. Not sure why, it just is what it is for now.
thefoyer 12 hours ago [-]
What an odd thing to ask it. I installed claude code and ran it from my terminal. Just asked it to simply give me a node based rest API with X endpoints with these jobs, and then I told it to write the unreal engine c++ to consume those endpoints. 2500 lines of code later, it worked.
bakugo 14 hours ago [-]
What you're doing wrong is that you're asking it for something more complicated than babby's first webapp in javascript/python.
When people say things like "I told Claude what I wanted and it did it all on the first try!", that's what they mean. Basic web stuff that that is already present in the model's training data in massive volumes, so it has no issue recreating it.
No matter how much AI fanatics try to convince you otherwise, LLMs are not actually capable of software engineering and never will be. They are largely incapable of performing novel tasks that are not already well represented in their weights, like the ones you tried.
johnisgood 12 hours ago [-]
What they are not capable of is replacing YOU, the human who is supposed to be part of the whole process (incl. architectural). I do not think that this is a limitation. In fact, I like being part of the process.
zer00eyz 15 hours ago [-]
> What am I doing wrong?
My coding ranges from "exotic" to "boiler plate" on any given day.
> Create a boilerplate Zephyr project skeleton, for Pi Pico
Yea... Asking Claude to help you with a low documentation build root system is going to go about the same way, I know first hand about how this works.
> I asked it to create 7x10 monochromatic pixelmaps
Wrong tool for the job here. I dont think IDE and Pixelmaps have as large of an intersection as you think they do. Claude thinks in tokens not pixels.
Pick a common language (js, python, rust, golang) pick something easy (web page, command line script, data ingestion) and start there. See what it can do and does well, then start pushing into harder things.
Cloudef 14 hours ago [-]
If you ask more than a single function, its more trouble than worth
johnfn 13 hours ago [-]
The thing you are doing wrong is asking it to solve hard problems. Claude Code excels at solving fairly easy, but tedious stuff. Refactors that are brainless but take an hour. It will knock those out of the park. Fire up a git worktree and let it spin on your tedious API changes and stuff while you do the hard stuff. Unfortunately, you'll still need to use your brain for that.
stevenhuang 14 hours ago [-]
So I've used Zephyr. The thing you're doing wrong is expecting LLMs to scaffold you a bunch of files from a relatively niche domain. Zephyr is also a mess of complexity with poor documentation. You should ask it to consult official docs and ask it to use existing tools (west etc) and board defs to do the scaffolding.
iamshs 14 hours ago [-]
I just had AI write me a scraper and download 5TB of invaluable data which I had been eyeing for a long time. All in ten days. At the end of it, I still don’t know anything about python. It’s a bliss for people like me. All dependencies installed themselves. I look forward to using it even more.
One frustration was the code changed so much in ChatGPT so had to be lots of prompts. But I had no idea what the code was anyways. Understood vibe coding. Just used ChatGPT on a whim. Liked the end result.
eggsandbeer 16 hours ago [-]
Write some hooks dawg
varenc 19 hours ago [-]
> In the OpenAI API, “GPT-5” corresponds to the “minimal” reasoning level, and “GPT-5 (Reasoning)” corresponds to the “low” reasoning level. (159135374)
It's interesting that the highest level of reasoning that GPT-5 in XCode supports is actually the "low" reasoning level. Wonder why.
lukasb 18 hours ago [-]
Yeah I don't get why they don't support Opus given that you're bringing your own API key.
nezubn 18 hours ago [-]
you can use the API key, and it’ll give you access to all the model.
This is Claude sign in using your account. If you’ve signed up for Claude Pro or Max then you can use it directly. But, they should give access to Opus as well.
natch 17 hours ago [-]
They should document it that way.
breadwinner 18 hours ago [-]
It seems every IDE now has AI built-in. That's a problem if you're working on highly confidential code. You never know when the AI is going to upload code snippets to the server for analysis.
baby 17 hours ago [-]
Not trying to be mean but I would expect comments on HN on these kind of stories to be from people who have used AI in IDEs at this point. There is no AI integration that runs automatically on a codebase.
ygritte 16 hours ago [-]
This could change on a daily basis, and it's a valid concern anyway.
paradite 17 hours ago [-]
There is automatic code indexing from Cursor.
Autocomple is also automatically triggered when you place your cursor inside the code.
rafram 17 hours ago [-]
Yes, Cursor, “The AI Code Editor.”
LostMyLogin 17 hours ago [-]
Cursor is an AI IDE and not what they are describing.
factorialboy 13 hours ago [-]
> There is no AI integration that runs automatically on a codebase.
Don't be naive.
TiredOfLife 12 hours ago [-]
This is HN. 10 years ago that would be true, but now I expect 99% of commenters to have newer used the thing they are talking about or used it once 20 years ago for 10 minutes, or even nkt read the article.
lalo2302 13 hours ago [-]
Gitkraken does
nh43215rgb 18 hours ago [-]
> "add their existing paid Claude account to Xcode and start using Claude Sonnet 4"
Wont work by default if I'm reading this correctly
tcoff91 18 hours ago [-]
Neovim and Emacs don’t have it built in. Use open source tools.
simonh 17 hours ago [-]
They both support it via plugins. Xcode doesn’t enable it by default, you need to enable it and sign into an account. It’s not really all that different.
renewiltord 17 hours ago [-]
[flagged]
TheDong 16 hours ago [-]
What commonly gets installed in those cases is actual malware, a RAT (Remote Admin Tool) that lets the attacker later run commands on your laptop (kinda like an OpenSSH server, but also punching a hole through nat and with a server that they can broadcast commands broadly to the entire fleet).
If the attacker wants to use AI to assist in looking for valuables on your machine, they won't install AI on your machine, they'll use the remote shell software to pop a shell session, and ask AI they're running on one of their machines to look around in the shell for anything sensitive.
If an attacker has access to your unlocked computer, it is already game over, and LLM tools is quite far down the list of dangerous software they could install.
Maybe we should ban common RAT software first, like `ssh` and `TeamViewer`.
jumploops 16 hours ago [-]
> They won’t install AI on your machine
Actually they’ll just the AI you already have on your machine[0]
In this attack, the malware would use Claude Code (with your credentials) to scan your own machine.
Much easier than running the inference themselves!
You know, I should have realized this was a troll account with the previous comment.
I guess that's on me for being oblivious enough that it took this obvious of a comment for me to be sure you're intentionally trolling. Nice work.
OsrsNeedsf2P 16 hours ago [-]
If you're worried about someone accessing your unlocked computer to install LLMs, you might need to rethink your security model.
renewiltord 16 hours ago [-]
They could install anything. Including Claude Code and then run it in background as agent to exfiltrate data. I'm a security professional. This is unacceptable
BalinKing 16 hours ago [-]
I think the parent commenter was pointing out that, instead of installing Claude Code, they could just install actual malware. It's like that phrase Raymond Chen always uses: "you're already on the other side of the airtight hatchway."
renewiltord 16 hours ago [-]
Yes but Claude Code could install malware when I'm not paying attention. And when I remove with MalwareBytes it will return because LLMs are not AGI.
BalinKing 7 hours ago [-]
Isn't the general advice that if malware has been installed specifically due to physical access, then the entire machine should be considered permanently compromised? That is to say, if someone has access to your unlocked machine, I've heard that it's way too late for MalwareBytes to be reliable....
eddyg 16 hours ago [-]
You should install LuLu if you’re that concerned. There are far more nefarious ways of “getting your data”.
Yes. I am so worried as well. This is why I installed an AI to double-check if the password I entered is correct when logging in. Fight fire with fire
viraptor 16 hours ago [-]
This is not a realistic concern. If you're working on highly confidential code (in a serious meaning of that phrase), your while environment is already either offline or connecting only through a tightly controlled corporate proxy. There's no accidental leaks to AI from those environments.
dijit 15 hours ago [-]
thanks for giving the security department more reasons to think that way.
I spent the last 6 months trying to convince them not to block all outbound traffic by default.
postalcoder 15 hours ago [-]
The right middle ground is running Little Snitch in alert mode. The initial phase of training the filters and manually approving requests is painful, but it's a lot better than an air gap.
dijit 14 hours ago [-]
that’s what I do, but since it’s in my control the security teams don’t like it. ;)
troupo 15 hours ago [-]
There are ranges of security concerns and high confidentiality.
For most corporate code (that is highly confidential) you still have proper internet access, but you sure as hell can't just send your code to all AI providers just because you want to, just because it's built into your IDE.
jama211 18 hours ago [-]
Well that depends on whether you give it access or not, apple’s track record with privacy gives me some hope
Mashimo 12 hours ago [-]
On IDEA the organisation who controls the license can disable the build in (remote) AI. (Not the local auto complete one)
But I guess the user could still get a 3rd party plugin.
c_ehlen 14 hours ago [-]
Most of the big corporations will have a special contract with the AI labs with 0 retention policies.
I do not think this will be an issue for big companies.
sneak 17 hours ago [-]
No. It’s always something you have to turn on or log into.
Also, there are plenty of editors and IDEs that don’t.
Let’s stop pretending like you’re being forced into this. You aren’t.
bsimpson 17 hours ago [-]
Sublime Text doesn't by default.
UltraSane 17 hours ago [-]
People working on highly confidential code will NOT have access to the public internet.
abenga 16 hours ago [-]
There is a gulf and many shades between "this code should never be on an internet-connected device" and "it doesn't matter if this code is copied everywhere by absolutely anyone".
UltraSane 16 hours ago [-]
To me "highly confidential" would mean "isolated from the internet" or else it isn't going to be "highly confidential" for very long.
troupo 15 hours ago [-]
Have you seen a lot of code from Klarna, Storytel, Spotify (companies I've worked at)?
None of these companies are isolated from the internet.
UltraSane 8 hours ago [-]
I bet their devs are.
XorNot 17 hours ago [-]
If it's that confidential you should be on an airgapped network.
There's simply no way to properly secure network connected developer systems.
fny 17 hours ago [-]
You do know: when it's enabled.
doctorpangloss 15 hours ago [-]
many enterprises store their code on GitHub, owned by Microsoft, operator of Copilot
you can use Claude via bedrock and benefit from AWS trust
Gemini? Google owns your e-mail. Maybe you're even one of those weirdos who doesn't use Google for e-mail - I bet your recipient does.
I hate that most browsers are willing to render React SPAs.
n2h4 16 hours ago [-]
lynx, elinks, and w3m don't
aurareturn 8 hours ago [-]
It was sarcasm :)
bigyabai 18 hours ago [-]
Really?
jama211 18 hours ago [-]
“Boycott” is a pretty strong term. I’m sensing a strong dislike of ai from you which is fine but if you dislike a feature most people like it shouldn’t be surprising to you that you’ll find yourself mostly catered to by more niche editors.
isodev 18 hours ago [-]
I think it's a pretty good word, let's not forget how LLMs learned about code in the first place... by "stealing" all the snippets they can get their curl hands on.
jama211 2 hours ago [-]
Ah the classic “I don’t want to acknowledge how right that person is about their point, so instead I’ll ignore what they said and divert attention to another point entirely”.
You’re just angry and adding no value to this conversation because of it
astrange 16 hours ago [-]
And by reading the docs, and by autogenerating code samples and testing them against verifiers, and by paying a lot of people to write sample code for sample questions.
Your claim: "by reading the docs, and by autogenerating code samples and testing them against verifiers, and by paying a lot of people to write sample code for sample questions."
Your link: "Grade school math problems from a hardcoded dataset with hardcoded answers" [1]
GSM8K consists of 8.5K high quality grade school math word problems. Each problem takes between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer.
--- end quote ---
khafra 13 hours ago [-]
My two claims:
1. OpenAI has been doing verifier-guided training since last year.
2. No SOTA model was trained without verified reward training for math and programming.
I supported the first claim with a document describing what OpenAI was doing last year; the extrapolation should have been straightforward, but it's easy for people who aren't tracking AI progress to underestimate the rate at which it occurs. So, here's some support for my second claim:
> the extrapolation should have been straightforward,
Indeed."By late next month you'll have over four dozen husbands" https://xkcd.com/605/
> So, here's some support for my second claim:
I don't think any of these links support the claim that "No SOTA model was trained without verified reward training for math and programming"
https://arxiv.org/abs/2507.06920: "We hope this work contributes to building a scalable foundation for reliable LLM code evaluation"
https://arxiv.org/abs/2506.11425: A custom agent with a custom environment and a custom training dataset on ~800 predetermined problems. Also "Our work is limited to Python"
I couldn't get it to properly syntax highlight and autosuggest even after spending over an hour hunting through all sorts of terrible documentation for kate, clangd, etc. It also completely hides all project files that aren't in source control, and the only way to stop it is to disable the git plugin. What a nightmare. Maybe I'll try VSCodium next.
typpilol 18 hours ago [-]
I thought vscodium was just vscode but open source. Won't any issues in vscode also be present in vscodium?
gary_0 16 hours ago [-]
It can't access most Microsoft online services including Copilot, which happens to disable most of the features I don't want. (I understand this is both by design, as well as because Microsoft forbids unofficial forks from doing so.)
ygritte 14 hours ago [-]
However, MS do everything they can to make plugins not work in VSCodium. And the plugin marketplaces are separate now.
sneak 17 hours ago [-]
Many of the popular features in VS Code are provided by plugins that are not open source and thus not provided with VSCodium.
isodev 18 hours ago [-]
Kate is brilliant.
armadyl 17 hours ago [-]
If you're on macOS there's Code Edit as a native solution (fully open source, not VC backed, MIT licensed), but it's currently in active development: https://www.codeedit.app/.
Otherwise there's VSCodium which is what I'm using until I can make the jump to Code Edit.
yycettesi 12 hours ago [-]
Okay dann lass die Ablage erst laufen ohne Teig dann kannst du mit Teig machen wenn du übergaben machst zwischen 13:30 und 14:00 Uhr dann bitte schichtführer/in Bescheid sagen bzw. geben tschüss
qbane 18 hours ago [-]
How about Sublime Text (not really an IDE, just text editor)
kristopolous 19 hours ago [-]
Neovim, emacs?
mr_toad 18 hours ago [-]
Amusing that Emacs that came out of the MIT AI lab, and heavily uses Lisp, a language that used to be en vogue for AI research.
PessimalDecimal 18 hours ago [-]
Amusing is one word for it. Expert systems were all the rage until they weren't. We'll see how LLMs do by comparison.
kristopolous 16 minutes ago [-]
People are trying to achieve the same thing - rules based systems with decision trees. That's still one of the most lucrative use cases.
vrighter 13 hours ago [-]
The so-called "guardrails" used for LLM are very close to expert systems, imo.
Since the landscape of potentially malicious inputs in plain english is practically infinite, without any particular enforced structure for the queries you make of it, means that those "guardrails" are, in effect, an expert system. An ever growing pile of if-then statements. Didn't work then, won't work now.
monkeyelite 7 hours ago [-]
You are word associating. The ideas in each part of that chain are unrelated.
That’s not really native support for LLMs? It’s supporting some LSP feature for completions.
vrighter 13 hours ago [-]
Neovim already supports LSP servers. The fact that a language server exists for anything, doesn't make neovim (or any other editor) "support" the technology. It doesn't, what it does support is LSP, and it doesn't and couldn't care less what language/slop the LSP is working with.
brigandish 18 hours ago [-]
You have to enable it and install a language server, that's not the same as an LLM being baked in.
simonh 17 hours ago [-]
It’s not baked in, in that sense. You still have to enable it in XCode and link it to a Claude account. It’s basically the same.
brigandish 16 hours ago [-]
At the level of "Having to configure something to use it", they're the same, but then that's the same as the hundreds of other config options then. I think we can be slightly more precise than that.
In Neovim the choice of language server and the choice of LLM is up to the user, (possibly even the choice of this API, I believe, having only skimmed the PR) while both of those choices are baked in to XCode, so they're not the same thing.
simonh 13 hours ago [-]
That's fair enough, but it's the opposite complaint, that XCode's LLM support is more limited because it is proprietary. That's a perfectly valid and reasonable objection, of course.
Gosh, it's almost like a proper IDE has synonymous features with LLMs
drusepth 19 hours ago [-]
Ironically, you could probably vibe code your own.
15 hours ago [-]
rs186 18 hours ago [-]
Good luck getting just scroll bar right with vibe coding. You'll be surprised how much engineering is done to get that part work smoothly.
CamperBob2 17 hours ago [-]
If enough examples are in-distribution, the model's scroll bar implementation will work just fine. (Eventually, after the human learns what to ask for and how to ask for it.)
Why wouldn't it?
shakna 15 hours ago [-]
Most programs today regularly have bugs with scrolling. Thus, an LLM will produce for you... A buggy piece of code.
adastra22 14 hours ago [-]
LLMs are not Xerox machines. They can, in fact, produce better code than is in their training set.
mirkodrummer 13 hours ago [-]
That is funny for how much is wrong. Ask the LLMs to vibe code a text editor and you'll get a React app using Supabase. Engineering !== Token prediction
drusepth 1 hours ago [-]
I think this comment exposes an important point to make: people have different opinions of what "vibe coding" even means. If I were to ask an LLM to vibe code a text editor, I guarantee you I wouldn't get a React app using Supabase -- because I'd give it pages of requirements documentation and tell it not only what I want, but the important decisions on how to make it.
Obviously no model is going to one-shot something like a full text editor, but there's an ocean of difference between defining vibe coding as prompting "Make me a text editor" versus spending days/weeks going back and forth on architecture and implementation with a model while it's implementing things bottom-up.
Both seem like common definitions of the term, but only one of them will _actually_ work here.
adastra22 11 hours ago [-]
Non sequitur?
I have used agentic coding tools to solve problems that have literally never been solved before, and it was the AI, not me, that came up with the answer.
If you look under the hood, the multi-layered percqptratrons in the attention heads of the LLM are able to encode quite complex world models, derived from compressing its training set in a which which is formally as powerful as reasoning. These compressed model representations are accessible when prompted correctly, which express as genuinely new and innovative thoughts NOT in the training set.
mirkodrummer 10 hours ago [-]
> I have used agentic coding tools to solve problems that have literally never been solved before, and it was the AI, not me, that came up with the answer.
Would you show us? Genuinely asking
CamperBob2 4 hours ago [-]
Ask the LLMs to vibe code a text editor, and you'll get pretty much what you deserve in return for zero effort of your own.
Ask the best available models -- emphasis on models -- for help designing the text editor at a structural rather than functional level first, being specific about what you want and emphasizing component-level test whenever possible, and only then follow up with actual code generation, and you'll get much better results.
ZYbCRq22HbJ2y7 18 hours ago [-]
Do you really think so? Have you ever explored the source of something like:
I've worked in 3 different WYSIWYG editors for web and desktop applications over the years, lightly contributed to a handful of other open-source editors, and spent plenty of time building my own personal editors from scratch (and am currently using gpt-5 to fix my own human bugs in a rewrite of the Notebook.ai text editor that I re-re-implemented ~8 years ago).
Editors are incredibly complex and require domain knowledge to guide agents toward the correct architecture and implementation (and away from the usual naive pitfalls), but in my experience the latest models reason about and implement features/changes just fine.
PessimalDecimal 18 hours ago [-]
Doesn't have to. The LLM will do it! We're done with code, aren't we?
wfhrto 16 hours ago [-]
Code is still there, but humans are done dealing with it. We're at a higher level of abstraction now. LLMs are like compilers, operating at a higher level. Nobody programs assembly language any more, much less machine language, even though the machine language is still down there in the end.
ZYbCRq22HbJ2y7 16 hours ago [-]
> Nobody programs assembly language
They certainly do, and I can't really follow the analogy you are building.
> We're at a higher level of abstraction now.
To me, an abstraction higher than a programming language would be natural language or some DSL that approximates it.
At the moment, I don't think most people using LLMs are reading paragraphs to maintain code. And LLMs aren't producing code in natural language.
That isn't abstraction over language, it is an abstraction over your computer use to make the code in language. If anything, you are abstracting yourself away.
Furthermore, if I am following you, you are basically saying, you have to make a call to a (free or paid) model to explain your code every time you want to alter it.
I don't know how insane that sounds to most people, but to me, it sounds bat-shit.
guluarte 18 hours ago [-]
even nvim is getting native support for llms
__MatrixMan__ 18 hours ago [-]
It doesn't matter how they feel about LLMs, ignoring their battle hardened plugin system and going native would be bad architecture.
tcoff91 18 hours ago [-]
It’s just native support for ghost text. It’s not llm specific
Of course it is, because that would be an aggressively stupid thing to do. Like boycotting syntax highlighting, spellckecking, VCS integration or a dozen other features that are th whole pint of IDEs.
If you don’t want to use LLM coding assistants – or if you can’t, or it’s not a technology suitable for your work – nobody cares. It’s totally fine. You don’t need to get performatively enraged about it.
pzo 11 hours ago [-]
"Claude in Xcode is now available in the Intelligence settings panel, allowing users to seamlessly add their existing paid Claude account to Xcode and start using Claude Sonnet 4"
Headline quite misleading. So not exactly that it will ship in Xcode but will allow connect to paid account.
elpakal 3 hours ago [-]
it’s funny how much i’ve seen mods here nitpick titles, yet this one gets to slide? Not only does the title summarize one small part of this page, but it’s also a bit misleading. Models will not ship IN Xcode, Xcode can now send your code to the API.
Archonical 19 hours ago [-]
This is great. I've been using Xcode with a separate terminal to run Claude Code, which has been a painful setup.
jacurtis 18 hours ago [-]
Agreed. Claude Code is an amazing experience with Jetbrains IDEs, but for some reason Xcode just hates having claude directly edit the files.
folli 16 hours ago [-]
How do you use it with Jetbrains? Junie? Or just as a separate CLI session?
I use VS Code with Claude Code, then I just use Xcode to build and launch
oefrha 18 hours ago [-]
The annoying thing is the official Swift extension can sometimes flag errors in perfectly fine code with zero problem in Xcode. So I’m forced to live with persistent “errors” when editing in VS Code/Cursor.
heywoods 18 hours ago [-]
I’m building my first iOS app ever so I know it has much more to do with me not understanding Xcode but getting builds to succeed after making changes with Claude code has been a nightmare. If you or anyone have any tips, guides, prayers, incantations for how to get changes in one to not clobber the other and leave me in xproj symlink hell I would be so grateful.
isodev 18 hours ago [-]
Same, only it's Zed for me and Claude Code in a terminal
rusinov 12 hours ago [-]
What was your problem with it? I see it running in a terminal more convenient (can point it to read local files outside of a project folder, for example)
spike021 18 hours ago [-]
if i could just get claude to properly remember it can directly edit the xcode project file, that'd be great.
for whatever reason it ignores my directive that it can from the CLAUDE file at least half the time. one time it even decided it needed to generate a fancy python script to do it. bizarre.
hellonoko 15 hours ago [-]
You can use VSCode and XCode will automatically update when the files change.
kelnos 19 hours ago [-]
How so? I don't use xcode, but I much prefer having an agent in its own "app" so to speak.
jama211 18 hours ago [-]
Likely so it can auto suggest, directly edit code, integrate properly etc
mirkodrummer 13 hours ago [-]
But they won't fix the infinite number of bugs Xcode has, its slowness and subpar ux
terminalbraid 8 hours ago [-]
I find the xcode experience so awful I generally keep a few terminals with some curated nvim and other tools to make up for things like anything git-related, diffs, LLM integration, etc. (fwiw the swift LSP is also pretty good)
This isn't going to change my workflow at this point.
tensility 4 hours ago [-]
Oddly, I thought that the irony was that this was released at the same time as Anthropic changing Claude's terms of service to eliminate user privacy. So much for Apple being a privacy-first company. I guess it's okay as long as they are just piping their developers' code to a third party, eh?
havefunbesafe 7 hours ago [-]
Why would this be an improvement to using Claude in the terminal? Xcode does not have an integrated terminal.
Jaxkr 18 hours ago [-]
The “Cursor for Xcode” startups just got Sherlocked…
siva7 17 hours ago [-]
Were there really such startups? It's so obviously a bad idea..
Like… you’d expect a company to evaluate the potential for competition, right? But these AI companies are obviously not actual companies with any business model, most are just trying to grab some investors money while they can surf the hype.
Anything’s better than the current Xcode autocomplete.
My pet peeve is it will try to autocomplete any string you start typing with just random crap it thinks you might want in a string.
raxxorraxor 13 hours ago [-]
I think all autocomplete solution are crappy, no matter how sophisticated the AI. It is surprising how often the obvious choice is wrong, but it often just is. I deactivated it.
Generating some code is fine, but I now prefer the deterministic autocomplete for my types I have available in my current context.
TiredOfLife 11 hours ago [-]
Maybe you have not used a good one. Even the small locall only 100MB models Jetbrains uses are fine. Codeium/Windsurf one is good.
mrtksn 18 hours ago [-]
Weren’t the AI API’s converging? Why not let the users use whatever LLM they like.
drak0n1c 16 hours ago [-]
Apple really should open it up to any model provider that has an “OpenAI-style API” by letting the user put in a base URL, api key, model id, and a few params like context limit as needed.
Razengan 16 hours ago [-]
Xcode (26) already has that.
thefoyer 12 hours ago [-]
I haven't opened up xcode in years (thank god) I suppose this was inevitable. They have to keep up
misiek08 14 hours ago [-]
"Be ready for AIpple Revolution! We are making programming something different that hasn’t happen before! We are the first to introduce AI assisted agent coding with full integration with Siri, visionOS and so much more. New, holistic approach to creativity and efficiency"
benjismith 15 hours ago [-]
Sonnet only?
CharlesW 3 hours ago [-]
I'd encourage you to read TFA, but why bother? The submitter clearly didn't either.
camillomiller 14 hours ago [-]
>> Coding intelligence provides inconsistent results when modifying files that contain thousands of lines.
Under the known issues
adastra22 14 hours ago [-]
This is true of all LLM agents. It’s a context window problem.
Razengan 16 hours ago [-]
Does anybody know why Anthropic doesn't let you remove your payment info from your account, or how to get support from them?
I bought a Pro subscription, the send button on their dumb chatbot box is disabled for me (on Safari), and I still get "capacity constraints' limits. Filed a chargeback with my bank just because of the audacity of their post-purchase experience. ChatGPT-5 works good enough for coding too.
From my experience with Claude Opus it seems like it tries to be "too smart" and doesn't seem to keep up with the latest APIs. It suggested some code for a iOS/macOS project that was only valid on tvOS, and other gaffs.
adastra22 14 hours ago [-]
The Pro plan ($20/mo?) is not and never was unlimited.
nsonha 16 hours ago [-]
Does it have agent mode? Copilot for XCode has it and provides both GPT and Claude models, free or paid
wahnfrieden 16 hours ago [-]
They also upgraded the GPT-4.1 (actually a special Apple variant) to GPT-5 by default, with the option to use GPT-5-thinking, using your ChatGPT subscription. I don't know if it's a special Apple variant of GPT-5 but this is a big upgrade and more exciting than Sonnet 3.7.
I also wonder if it will have separate rate limits from ChatGPT (app/web) and Codex CLI (which currently has its own rate limits).
19 hours ago [-]
16 hours ago [-]
throwawayaiai 5 hours ago [-]
[dead]
aziya 15 hours ago [-]
[dead]
aiisajoke 19 hours ago [-]
[flagged]
jama211 18 hours ago [-]
I’m sensing just a touch of anger that is making your comment very non-objective.
MangoToupe 19 hours ago [-]
> Apple can’t even ship software that works.
Hey they're doing better than anyone else.
aniforprez 19 hours ago [-]
Big disagree. My Macbook has hard crashed into restarting more times than my PC desktop has in years. The other day I literally just closed my M2 Macbook and it for some reason just completely crashed and shut itself off.
airstrike 18 hours ago [-]
FWIW I abuse my M2 daily and only reboot once a year basically, and only because of some update.
Something could be wrong with your specific Macbook
lanstin 17 minutes ago [-]
Bad memory maybe? I also have much less trouble with my personal Macs than my personal Linux boxes (although I push the edges on beta and cutting edge software a lot more on linux, while I treat the MacOS as something I can ignore the versions of, unless they notify me to upgrade). But haven't had a abrupt reboot outside of software updates in forever (on personal macs - work mac reboots at IT's whims).
MBCook 18 hours ago [-]
I haven’t seen a crash in at least a decade of daily use.
Back in the older Intel days I did sometimes, perhaps due to the GPU switching and flakey Nvidia drivers. But even then it certainly wasn’t common.
jama211 18 hours ago [-]
Something is wrong with your laptop. That is absolutely not normal, and not something you should blame on the software design.
aniforprez 17 hours ago [-]
If you want me to blame "software design" then I can point to the fact that the TV app lags constantly on my M2 especially when I click the "download episode" button which should not ever be a thing on their native hardware, that there is a WindowsServer process that steadily eats up all the RAM that consistently needs to be quit in order to free up memory, that the headphone balance constantly resets to some random value where the audio comes mostly out of my left ear which has apparently been a problem for decades, that Spotlight is less than useless and STILL shows "Disk Utility" as the top result when I search "dis" instead of showing Discord an app that I open on an almost daily basis, that randomly the fingerprint login simply will not work for seconds at a time...
This is just the stuff I remember. This is a 16GB M2 Macbook Pro. Not a single native Apple made app or process should be lagging, let alone using up enough memory that sometimes apps just completely crash. Again, my windows desktop hasn't had a blue screen or a hard crash in the decade that I've used it and it had a worse configuration with 16GB of RAM and an old AMD CPU.
jama211 2 hours ago [-]
Seems pretty a pretty cherry picked list tbh. I could find issues like that on any OS. I suspect you just are frustrated with your particular machine, which is fine, but you were talking about it like it was a universal truth lol
whatevertrevor 18 hours ago [-]
My hot take is all OSes are kinda bad for daily driving.
Apple has no qualms breaking backwards compatibility for core functions like bluetooth connectivity in MacOS. Windows has backwards compatibility, but increasingly worsening UX, throwing ads and subscriptions in your face before you can even log in, and a bad security/process isolation model. Desktop Linux is a case of "how many hours before I find out a critical part of my workflow is unsupported/bad/broken/unconfigurable/pain-to-configure in this particular distro/desktop environment".
jama211 18 hours ago [-]
I think they’re pretty amazing considering how hard a problem it is. Also, we forget how bad os’s used to be. They’re absolutely rock solid compared to the past.
genewitch 1 hours ago [-]
Aside: I bought my first battery backup because the only thing that ruined my uptime on Windows NT 4 was power outages. I would have kept on using NT 4 as my desktop OS, but MS wanted to sell more licenses so newer directX was unsupported on NT4. I moved to Windows 2000, then eventually to XP x64 Edition.
I installed and fixed a lot of 95, 98[se], ME, and XP OSes for other people, though. Thousands. I never bothered with any of those OSes on my own machines, though. The first "consumer" OS i used was win 7 Ultimate Edition (signed by Ballmer, natch). I use 11, now, and i'm fine with it. I think it's because i am "grandfathered" in to win11 without a microsoft login; i just mentioned last night that if i had to reinstall windows on this machine, i probably wouldn't, due to that requirement now.
Anyhow all this is to say, hogwash. Windows has been perfect in the past. Time marches on, fruit flies like a banana, and all that.
whatevertrevor 17 hours ago [-]
I'm not trying to diminish the complexity of a desktop OS of course, but sometimes it's hard not to feel the priorities are all over the place. Don't get me wrong, I'm not nostalgic about Windows XP, I actually remember how many freezes and crashes I used to have back then.
My frustration is more born out of the OS rough edges constantly getting in the way of tasks I actually want to focus on and accomplish, which doesn't play well with my ADHD.
MangoToupe 18 hours ago [-]
Man, agree to disagree. What you're describing is somehow still miles better than any experience I've ever had with a windows or linux latptop.
jama211 18 hours ago [-]
Exactly
Razengan 16 hours ago [-]
I have been trying to make iOS/macOS apps for years, but god, every time I have a go at it, Apple's documentation regime is still hot garbage. Eons ago I gave up Windows development because of Microsoft's inconsistent and uncertain APIs, but MS had great documentation. Apple is the opposite.
The "best" way to get the "latest" details on Apple's APIs is to suffer through mind-numbingly vapid WWDC videos with their reverse uncanny valley presenters (where humans pretend to be robots) and keep your full attention on them to catch a fleeting glimpse of a single method or property that does what you were looking for. Even 1.5x/2x speed is torture. I tried to get AIs to sift through the transcripts of their videos, and may Skynet forgive me for this cruelty.
Then when you go try to use that API, oops it's been changed in the current beta and there's no further documentation on it except auto-generated headers.
They also removed bookmarks from Xcode's built-in documentation browser years ago, and it doesn't retain a memory of previously open tabs, and often seems to be behind the docs on their websites.
I wish they would just provide open-source sample apps of each type (document-based, single-window etc.) for each of their platforms that fully use the latest APIs. At least that would be easier to ask AIs on, since that is what they seem to be going for now anyway.
I pretty much had the same experience recently when I had to deal with their Screen Time APIs. Had to go through the wwdc videos because the documentation was lack lustre.
echelon 19 hours ago [-]
Can't each of these companies with IDE integrations slurp up the network traffic and distill Anthropic's models?
If you can listen to billions of tokens a day, you can basically capture all the magic.
jasonjmcghee 19 hours ago [-]
Terms of service specifically prohibits this.
ceejayoz 18 hours ago [-]
How much of the training set comes from websites with "no automated scraping" in their terms?
echelon 19 hours ago [-]
The companies stole that data from the world, so I don't see why we couldn't take it back.
TimeBearingDown 18 hours ago [-]
It's a nice sentiment. The companies with the integrations are the ones that could take it back, but they don't have the incentive to break legal agreements and share with the world.
Meanwhile the creative output of humanity is distilled into black boxes to benefit those who can scrape it the most and burn the most power, but this impact is distributed amongst everyone, so again there's little incentive among those who could create (likely legal) change.
adastra22 14 hours ago [-]
That is not how training works…
echelon 8 hours ago [-]
That's how model distillation works.
DeepSeek is the most notable case, but it's been used lots.
And the foundation model companies are scraping and exfiltrating each others' data.
AndyKelley 18 hours ago [-]
Apple.com advertising a Mac Mini:
> Built for Apple Intelligence.
> 16-core Neural Engine
These Xcode release notes:
> Claude in Xcode is now available in the Intelligence settings panel, allowing users to seamlessly add their existing paid Claude account to Xcode and start using Claude Sonnet 4
All that dedicated silicon taking up space on their SoC and yet you still have to input your credit card in order to use their IDE. Come on...
geor9e 18 hours ago [-]
To run a model locally, they would need to release the weights to the public and their competitors. Those are flagship models.
They would also need to shrink them way down to even fit. And even then, generating tokens on an apple neural chip would be waaaaaay slower than an HTTP request to a monster GPU in the sky. Local llms in my experience are either painfully dumb or painfully slow.
hu3 17 hours ago [-]
Hence the "come on".
isodev 17 hours ago [-]
"Apple Intelligence", at least the part that's available to developers via the Foundation Models framework is a tiny ~3B model [0] with very limited context window. It's mainly good for simple things like tagging/classification and small snippets of text.
Yes, but the Foundation Model framework can seamlessly use Apple's much larger models via Private Cloud Compute or switch to ChatGPT.
When macOS 26 is officially announced on September 9, I expect Apple to announce support for Anthropic and Google models.
aetherspawn 18 hours ago [-]
I bet Apple are working on it, it’s just not ready yet and they want to see how much people actually use it.
It’s the Apple way to screw the 3rd party and replace with their own thing once the ROI is proven (not a criticism, this is a good approach for any business where the capex is large…)
jdgoesmarching 16 hours ago [-]
Local models and any OpenAI-compatible APIs are available to the Xcode Beta assistant. This is just a dedicated “sign in with x” rather than manual configuration.
18 hours ago [-]
jama211 18 hours ago [-]
Trust me, you wouldn’t want to use a model for agentic code editing that could fit on a Mac mini at this stage.
esafak 16 hours ago [-]
A 128GB Mac Mini M5 would be sweet.
15 hours ago [-]
user214412412 18 hours ago [-]
Wow they're finally getting it. The AI breakthrough will not come from procedural generation of memojis - but rather enabling developers to use your platform. But with the nearly hostile stance of your 30% take, we will see how far this goes.
JustExAWS 16 hours ago [-]
Well seeing that the most popular apps aside from games don’t have in app purchases and another subset of that has means to do payments subscriptions outside of the App Store, the 30% (actually 15% for small developers) is a boogeymen
noobly 18 hours ago [-]
What’s the ‘30% take’?
willio58 17 hours ago [-]
30% of all AppStore sales go right to Apple
sunnybeetroot 17 hours ago [-]
15% if you’re part of the small business program.
pritambaral 17 hours ago [-]
What program do I have to join for 0%?
sunnybeetroot 17 hours ago [-]
The one where you create your own mobile operating system.
latexr 13 hours ago [-]
Being in the EU and releasing in an alternative marketplace.
zer00eyz 15 hours ago [-]
The one where you collect cash directly from users, and magically make handling that have zero overhead.
Credit card processing is hard... Go price out stripe + customer service + dealing with charge backs and tell me if you really want to do processing your self.
17 hours ago [-]
natch 17 hours ago [-]
Why would you limit users to Sonnet and not allow Opus when they are paying for their own account? I mean sure some people say Sonnet is good for coding but it seems needless to limit it in this way. Or they are just really slow to catch up… oh, right.
MattDamonSpace 17 hours ago [-]
Another decade, another claim Apple’s behind and struggling to catch up
suyash 12 hours ago [-]
Still shocked Apple has not created thier own LLM, they have bought so many AI companies and have a rich talent pool and money so what's stopping them ?
vessenes 9 hours ago [-]
Terrible, terrible leadership at the top of the AI org. Plus a fundamental commitment to being a 'product first' company.
This same commitment would be why I wouldn't count them out on the AI side, btw. It's not clear that a private internal foundation model is any kind of required competitive moat. It's also not clear that having one is useless, and all the cool kids do have one or want one, but from a product view it might be that integrating makes sense.
Llama 4 (a terrible release) shows also that making such a model is still really hard. There are not enough ML leads at the pointy tip of the spear to support even 10 high quality foundation model teams globally.
If you have billions of dollars in cash and you are secure in your customer base, and you don't believe AGI liftoff will happen or change your business model, maybe you work out the kinks on product integration now using best in breed providers, not getting locked in on one of them, keep spending 1/10 to 1/100th to stay relevant and on it internally, use your incredibly powerful silicon buying power to get a next gen version of TPUs done, and wait until you know for sure you can spend under $10bn in cash on getting a great proprietary model done, one that you are certain will serve your needs.
Also this will give you time to get better leadership in the AI org.
Upshot - I think it's a mix of reasons, but not fatal, I'm not sure being slow erodes their product customer base, personally I'd like much better and more private AI out of Apple ASAP. We'll see what we get. I predict they'll move internal by 2031.
CodeLikeHell 10 hours ago [-]
They have their Foundational Models [1], so I think they’re focusing more on device-level LLMs than larger ones.
Slowly the AI craziness at Microsoft is taking the similar shape, of going all in at the begining and then losing to the competition, that they also had with Web (IE), mobile (Windows CE/Pocket PC/WP 7/WP 8/UWP), the BUILD sessions that used to be all about UWP with the same vigour as they are all AI nowadays, and then puff, competition took over even if they started later, because Microsoft messed up delivery among everyone trying to meet their KPIs and OKRs.
I also love the C++ security improvements on this release.
What is the irony? Microsoft integrated copilot in Vscode, bing, etc. Apple is integrating claude in Xcode, Jetbrains has their own AI.
Microsoft moved first with putting AI into their products then other companies put other AI into their products. Nothing about this seems ironic or in any way surprising.
[0] - https://en.wikipedia.org/wiki/Irony#Types_of_irony
Apple and Google will never choose to integrate Microsoft's services or products willingly.
It would have been more surprising if they decided to depend on Microsoft.
Bing is irrelevant, VSCode might top in some places, but it is cursor and Claude that people are reaching for, VS is really only used by people like myself that still care about Windows development or console SDKs, otherwise even for .NET people are switching to Rider.
> going all in at the begining and then losing to the competition
Sure, but there are counter examples too. Microsoft went late to the party of cloud computing. Today Azure is their main money printing machine. At some point Visual Studio seemed to be a legacy app only used for Windows-specific app development. Then they released VSCode and boom! It became the most popular editor by a huge margin[0].
[0]: https://survey.stackoverflow.co/2025/technology#most-popular...
They use it because the corporation mandates it.
AWS is the worst of this experience, even IBM Cloud has better tooling in this regard, GCP is somehow in the middle, others like Vercel/Netlify naturally don't offer this kind of setup.
IMO Firebase should be the gold standard of how to do cloud platforms
But you know what's super underrated and I think could really take a hold on the business world? Discord! The video calls are so good! And multiple streams at the same time? Zoom can't do that!
The channels, too, just blow everything teams has out of the water. The video quality is better, its way faster, has more features, and they actually work. The audio filtering stuff actually works.
I really think with the right marketing they could take over the world. Honestly can't believe they haven't tried it yet.
It’s a shame they don’t have an enterprise / business tailored product based on it
Teams took the best bits from Skype and whatever that other service Microsoft had for businesses and their phones and started over basically.
I still have pet peeves about Teams (like why dont the 'Teams' within Teams have proper group chats like Slack would, its ridiculous!) but it could be way worse. After years of screensharing hell I can finally move the stupid top bar out of my way when trying to hit 'Debug' within Visual Studio at least.
If anyone who was an original stakeholder for Keybase is reading this, please bring it back in some way someday. I'm assuming Zoom probably made you guys sign some insane non-compete sadly.
In a sea of garbage chat services all built using Electron and other bloatware, Keybase was a breath of fresh air.
It works decently enough in web, mobile and desktop.
If it weren't for the name, you'd never know it was even a Microsoft product.
Power at OpenAI seems orthogonal to ownership, precedent or even frankly their legal documents.
Sure UWP never caught on, but you know why? Win32, which by the way is also Microsoft, was way to popular and more flexible. Devs weren't going to re-write their apps to UWP in order to support phones.
Before that ex-Microsoft guy was responsible for killing Nokia OS/Meego too in favor of Windows Phone - which got abandoned. What a train-wreck of errors leading to the mobile phone duopoly today.
And Windows 11 was the reboot of Windows 10X,
https://www.youtube.com/watch?v=ztrmrIlgbIc
Google, Apple, FB or AWS would have been suitors for that licensing deal if MS didn’t bite.
I've been getting monthly emails that my free access for GitHub Copilot has been renewed for another month… for years. I've never used it, I thought that all GitHub users got it for free.
These are courtesy of LLVM/Clang (which Xcode ships with), rather than Xcode itself.
Microsoft Copilot (formerly Bing Chat)
Microsoft 365 Copilot
Microsoft Copilot Studio
GitHub Copilot
Microsoft Security Copilot
Copilot for Azure
Copilot for Service
Sales Copilot
Copilot for Data & Analytics (Fabric)
Copilot Pro
Copilot Vision
Maybe if they weren't literally the borg people would open their hearts and wallets to Redmond. They saw that Windows 10 was a privacy nightmare and what did they do? They doubled down in Windows 11. Not that I care but it plays really poorly. Every nerd on the internet spouts off about Recall even though it's not even enabled if you install straight to the latest build.
They bought GitHub and now it's a honeypot. We live in a world where we have to assume GitHub is adversarial.
_NSAKEY???
Fuck you Microsoft.
Makes sense karma catches up to them. Maybe if their mission statement and vision were pure or at least convincing they would win hearts and minds.
Facebook got excoriated for doing that with Onavo but I guess it's Good Actually when it's done in the name of protecting my computer from myself lol
https://appleinsider.com/articles/22/06/06/apple-now-has-ove...
I think that means either:
The real news is when Codex CLI / Claude Code get integrated, or Apple introduces a competitor offering to them.
Until then this is a toy and should not be used for any serious work while these far better tools exist.
Compared to stock Claude Code, this version of Claude knows a lot more about SwiftUI and related technologies. The following is output from Claude in Xcode on an empty project. Claude Code gives a generic response when it looked at the same project:
You are free to point Claude Code to that folder, or make a slash command that loads their contents. Or, start CC with -p where the prompt is the content of all those files.
Claude Code integration in Xcode would be very cool indeed, but I might still stick with VSCode for pure coding.
[1]: https://www.macrumors.com/2025/05/02/apple-anthropic-ai-codi...
I'm sticking with VSCode too, but it's a bit silly to suggest that anyone is using XCode because it's their preferred IDE. It's just the one that's necessary for any non-trivial Apple platform development.
Adding a code generator isn't a marketing ploy to get people to switch editors, it's just a small concession to the many hapless souls stuck dealing with Apple on the professional side, or masochistically building mac SwiftUI apps just to remind themselves what pain feels like.
For example: it uses Haiku as a model to run tools and most likely has automatic translations for when the model signals it wants to search or find something -> either use the built-in search or run find/fd/grep/rg
All that _can_ be done by prompting, but - as always with LLMS - prompts are more like suggestions.
/s
https://news.ycombinator.com/item?id=45062683 (Anthropic reverses privacy stance, will train on Claude chats)
I asked 2 things.
1. Create a boilerplate Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured. It generated garbage devicetree which didn't even compile. When I pointed it out, it apologized and generated another one that didn't compile. It configured also non-existent drivers, and for some reason it enabled monkey test support (but not test support).
2. I asked it to create 7x10 monochromatic pixelmaps, as C integer arrays, for numeric characters, 0-9. I also gave an example. It generated them, but number eight looked like zero. (There was no cross in ether 0 nor 8, so it wasn't that. Both were just a ring)
What am I doing wrong? Or is this really the state of the art?
Your first prompt is testing Claude as an encyclopedia: has it somehow baked into its model weights the exactly correct skeleton for a "Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured"?
Frequent LLM users will not be surprised to see it fail that.
The way to solve this particular problem is to make a correct example available to it. Don't expect it to just know extremely specific facts like that - instead, treat it as a tool that can act on facts presented to it.
For your second example: treat interactions with LLMs as an ongoing conversation, don't expect them to give you exactly what you want first time. Here the thing to do next is a follow-up prompt where you say "number eight looked like zero, fix that".
Personally, I treat those sort of mistakes as "misunderstandings" where I wasn't clear enough with my first prompt, so instead of adding another message (and increasing context further, making the responses worse by each message), I rewrite my first one to be clearer about that thing, and regenerate the assistant message.
Basically, if the LLM cannot one-shot it, you weren't clear enough, and if you go beyond the total of two messages, be prepared for the quality of responses to really sink fast. Even by the second assistant message, you can tell it's having an harder time keeping up with everything. Many models brag about their long contexts, but I still feel like the quality of responses to be a lot worse even once you reach 10% of the "maximum context".
Or it would say to do X it involves very complex math, instead you could (and proceeds with stripped down solution that doesn't meet goals). So you can tell it to ignore the concerns about complexity and assume that I understand all of it and it is easy to me. Then it goes on creating the solution that actually has legs. But you need to refine it further.
It’ll successfully produce _something_ like that, because there’s millions of examples of those technologies online. If you do anything remotely niche, you need to hold its hand far more.
The more complicated your requirements are, the closer you are to having “spicy autocomplete”. If you’re just making a crud react app, you can talk in high level natural language.
I see claude code as pair programming with a junior/mid dev that knows all fields of computer engineering. I still need to nudge it here and there, it will still make noob mistakes that I need to correct and I let it know how to properly do things when it gets them wrong. But coding sessions have been great and productive.
In the end, I use it when working with software that I barely know. Once I'm up and running, I rarely use it.
I did, but I always approached LLM for coding this way and I have never been let down. You need to be as specific as possible, be a part of the whole process. I have no issues with it.
It... sort of worked well? I had to have a few back-and-forth because it tried to use Objective-C features that did not exist back then (e.g. ARC), but all in all it was a success.
So yeah, niche things are harder, but on the other hand I didn't have to read 300 pages of stuff just to do this...
Also, fun names like `makeFunctionNameInCommentLongAndDescriptiveWithNaturalLanguage:(NSLanguage *)language`
In some cases, it just doesn't have the necessary information because the problem is too niche.
In other cases, it does have all the necessary information but fails to connect the dots, i.e. reasoning fails.
It is the latter issue that is affecting all LLMs to such a degree that I'm really becoming very sceptical of the current generation of LLMs for tasks that require reasoning.
They are still incredibly useful of course, but those reasoning claims are just false. There are no reasoning models.
I don't mean to be treading on feet but I'm noticing this more and more in the debates around AI. Imagine if there are developers out there that could have done this task in 30 mins without AI.
The level of performanace of AI solutions is heavily related to the experience level of the developer and of the problem space being tackled - as this thread points out.
Unfortunately the marketing around AI ignores this and makes every developer not using AI for coding seem like a dinosauer, even though they might well be faster in solving their particular problems.
AI is moving problem solving skills from coding to writing the correct prompts and teaching AI to do the right thing - which, again, is subjective, since the "right thing" for one developer isn't the "right thing" for the another developer. "Right thing" being the correct solution, the understandable solution, the fastest solution, etc depending on the needs of the developer using the AI.
Spelling out exactly what you want and checking/fixing what you receive is still faster than typing out the code. Moreover, nobody's job involves nothing but brainiac coding, day after day. You have to clean up and lay foundations, whatever level you are at.
For me, that's too general. Of course, perhaps for this particular, specific problem it might be true. But as this thread points out, anything niche and AI fails to help productively. Of course then comes the marketing: just wait, AI will be able to cover those niche cases also.
> want and checking/fixing what you receive is still faster than typing out the code
Then I do wonder why there are developers at all. After all that's what AI is so good at - if one believes the marketing - being precise and describing exactly what needs to be done. Surely it must be faster having two AIs talking to each and hammering out the code.
And even typing is subjective: ten fingers versus two, versus four .. etc. There are developers that can type faster than they can think - in certain cases.
There is also the developer in flow versus the stop and go using an AI prompts to get it just right. I dunno, if it comes true, then thankfully there won't be any humans to create bugs in code but somehow, I can't see it happening.
The other is to look at the non-working solution you get, read through it, and think "Oh, I didn't know about that framework/system/product/library, that's neat" and then do some combination of further research and more hand-holding to get to something that does work.
This is useful, more or less, no matter what your level.
It's also good for explaining core industry tooling you've maybe never used before. If you're new to Postgres/NoSQL/AWS/Docker/SwiftUI/whatever it can talk you through it and give you an instant bootcamp with entry-level examples and decent solutions.
And for providing fixes for widely known bugs and issues in products that may not be widely known to you (yet.)
IME ChatGPT5 is pretty solid with most science/tech up to undergrad. It gets hallucinatory past that, and it's still flattering, which is annoying, but you can tell it to cut that out.
Generally you can use it as a dumb offshore developer, or as an infinitely patient private tutor.
That latter option is very useful. The first, not always.
You're not necessarily wrong, but I think it's worth noting that very few developers are only ever coding deep in their one domain that they're good at. There's just too many things to be deeply good at everything. For example, it's common that infra and CI tasks are stuff that most developers haven't learned by heart, because you don't tend to touch them very often.
Claude shines here — I've made a lot more useful GitHub Actions jobs recently, because while I could automate something, if I know I'm going to have to look up API docs (especially multiple APIs I'm not super familiar with) then I tend to figure that the automation will lose out the trade-off between doing the task (see https://xkcd.com/1205/). Claude being able to hash out those rapidly, and in a way that's easily verifiable that it's doing the right thing, has changed that arithmetic for me substantially.
1. Find out how to access metadata about the node running my code (assumption: some kind of an environment variable) [1-10 minutes depending on familiarity with AWS]
2. Google "RDS certificates" and find the bundle URL after skimming the page [1] for important info [1-5 minutes]
3. Write code to download the certificate bundle, fallback being "global-bundle.pem" if step 1 failed for some reason? [5-20 minutes depending on all the bells and whistles you need]
Did I miss anything or completely misunderstand the task?
[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Using...
edit: I asked Claude Sonnet 4 to write robust code for a Node.JS application that downloads RDS CA bundle for the AWS region that the code is currently running in and saves it at the supplied filesystem path.
0. It generated about 250 lines of code
1. Fallback was us-east (not global)
2. The download URLs for each region were hardcoded as KV pairs instead of being constructed dynamically
3. Half of the regions were missing
4. It wrote a function that verifies whether the certificate bundle looks valid (i.e. includes a PEM header)... but only calls it on the next application startup, instead of doing so before saving a potentially invalid certificate bundle to disk and proceeding with the application startup.
5. When I complained that half of my instances are downloading global bundles instead of regional ones (because they're not present in the hardcoded list), it:
- incorrectly concluded that not all regions have CA bundles available and hardcoded a duplicate list in 2 places containing regions that are known to offer CA bundles (which is all of them). These lists were even shorter than the last ones.
- wrote a completely unnecessary function that checks whether a regional CA bundle exists with a HEAD request before actually downloading it with a GET request, adding another 50 lines of code
Now I'm having to scrutinize 300 lines of code to make sure it's nothing doing something even more unexpected.
If a business needs the equivalent of a Toyota Corolla, why be upset about the factory workers making the millionth Toyota Corolla?
In my experience, that's not entirely true. Sure, a lot of app are CRUD apps, but they are not the same. The spice lies in the business logic, not in programming the CRUD operations. And then of course, scaling, performance, security, organization, etc etc.
(edit: /s to indicate sarcasm)
---
[0]: https://lovr.org
Trying two things and giving up. It's like opening a REPL for a new language, typing some common commands you're familiar with, getting some syntax errors, then giving up.
You need how to learn to use your tools to get the best out of them!
Start by thinking about what you'd need to tell a new Junior human dev you'd never met before about the task if you could only send a single email to spec it out. There are shortcuts, but that's a good starting place.
In this case, I'd specifically suggest:
1. Write a CLAUDE.md listing the toolchains you want to work with, giving context for your projects, and listing the specific build, test etc. commands you work with on your system (including any helpful scripts/aliases you use). Start simple; you can have claude add to it as you find new things that you need to tell it or that it spends time working out (so that you don't need to do that every time).
2. In your initial command, include a pointer to an example project using similar tech in a directory that claude can read
3. Ask it to come up with a plan and ask for your approval before starting
Imagine the CS 100 class where they ask you to make a PB&J. saying for it to make it, there's a lot of steps, but determine known the steps. implement each step. progress.
I run interviews at my company. We allow/encourage AI.
The number one failure method is people throwing all of the requirements in upfront. They get one good pass then fail.
…this reeeeaaaallllyyyy feels like that
If you just selected a random developer do you think they're going to have any idea why your talking about?
The issue is LLMs will never say, sorry, IDK how to do this. Like a stressed out intern they just make up stuff and hope it passes review.
Providing a woefully inadequate descriptions to others (Claude & us) and still expecting useful responses?
If it doesn't have the underlying base data, it tends to hallucinates. (It's getting a bit difficult to tell when it has underlying data, because some models autonomously search the web). The models are good at transforming data however, so give it access to whatever data it needs.
Also let it work in a feedback loop: tell it to compile and fix the compile errors. You have to monitor it because it will sometimes just silence warnings and use invalid casts.
> What am I doing wrong? Or is this really the state of the art?
It may sound silly, but it's simply not good at 2D
It's not silly at all, it's not very good at layouts either, it can generally make layouts but there is a high chance for subtle errors, element overlaps, text overflows, etc.
Mostly because it's a language model, i.e it doesn't generally see what it makes, you can send screenshots apparently and it will use it's embedded vision model, but I have not tried that.
Ask Opus or Gemini 2.5 Pro to write a plan. Then ask the other to critique it and fix mistakes. Then ask Sonnet to implement
The more esoteric your stack, and the more complex the request, the more information it needs to have. The information can be given either through doing research separately (personally, I haven't had good results when asking Claude itself to do research, but I did have success using the web chat UI to create an implementation plan), or being more specific with your prompt.
As an aside, I have more than 10 years of experience, mostly with backend Python, and I'd have no idea what your prompts mean. I could probably figure it out after some google searches, tho. That's also true of Claude.
Here's an example of a prompt that I used recently when working on a new codebase. The code is not great, the math involved is non trivial (it's research-level code that's been productionized in hurry). This literally saved 4 hours of extremely boring work, digging through the code to find various hardcoded filenames, downloading them, scp'ing them, and using them to do what I want. It one-shotted it.
> The X pipeline is defined in @airflow/dags/x.py, and Y in `airflow/dags/y.py` and the relevant task is `compute_X`, and `compute_Y`, respectively. Your task is to:
> 1. Analyze the X and Y DAGs and and how `compute_X` functions are called in that particular context, including it's arguments. If we're missing any files (we're probably missing at least one), generate a .sh file with aws cli or curl commands necessary for downloading any missing data (I don't have access to S3 from this machine, but I do have in a remote host). Use, say, `~/home` as the remote target folder.
> 2. If we needed to download anything from S3, i.e. from the remote host, output rsync/scp commands I can use to copy them to my local folder, keeping the correct/expected directory structure. Note that direct inputs reside under `data/input`, while auxiliary data resides in other folders under `data`. Do not run them, simply output them. You can use for example `scp user@server.org ...`
> 3. Write another snapshot test for X under `tests/snapshot`, and one for Y. Use a pattern as similar as possible to the other tests there. Do not attempt to run the tests yet, since I'll need to download the data first.
> If you need any information from Airflow, such as logs or output values, just ask and I can provide them. Think hard.
You're treating the tool like it was an oracle. The correct way is to treat it as a somewhat autistic junior dev: give it examples and process to follow, tell it to search the web, read the docs, how to execute tests. Especially important is either directly linking or just copy pasting any and all relevant documentation.
The tool has a lossily compressed knowledge database of the public internet and lots of books. You want to fix the relevant lossy parts in the context. The less popular something is, the more context will be needed to fill the gaps.
Like "Translate this pdf to html using X as a templating language". It shines at stuff like that.
As a dev, I encounter tons of one-off scenarios like this.
Generating a state-of-the-art response to your request involves a back-and-forth with the agent about your requirements, having a agent generate and carry out a deep research plan to collect documentation, then having the agent generate and carry out a development plan to carry it out.
So while Claude is not the best model in terms of raw IQ, the reason why it's considered the best coding model is because of its ability to execute all these steps in one go which, in aggregate, generates a much better result (and is less likely to lose its mind).
Which one is, and by what metric? I always end up back at Claude after trying other models because it is so much better at real world applications.
That will get you a lot better initial solution. I typically use Sonnet for the sub-agents and Opus for the main agent, but sonnet all around should be fine too for the most part.
There are parts in the codebase I'd love some help such as overly complex C++ templates and it almost never works out. Sometimes I get useful pointers (no pun intended) what the problem actually is but even that seems a bit random. I wonder if it's actually faster or slower than traditional reading & thinking myself.
Dump your thoughts in a somewhat arranged manner, tell it about your plan, the current status, the end goal, &c. After that tell it to write 0 code for now but to ask questions and find gaps in your plan. 30% of it will be bullshit but the rest is somewhat useable. Then you can ask for some code but if you care about quality or consistency with you existing code base you probably will have to rewrite half of it, and that's if the code works in the first place
Garbage in garbage out is true for training but it's also true for interactions
After a few iteration i then ask it to implement the design doc to mostly-better results.
A key skill in using an LLM agentic tool is being discerning in which tasks to delegate to it and which to take on yourself. Try develop that skill and maybe you will have better luck.
I wonder if it's because there are maybe millions of MSDN articles, but I don't know if a Java analog to MSDN exists.
When people say things like "I told Claude what I wanted and it did it all on the first try!", that's what they mean. Basic web stuff that that is already present in the model's training data in massive volumes, so it has no issue recreating it.
No matter how much AI fanatics try to convince you otherwise, LLMs are not actually capable of software engineering and never will be. They are largely incapable of performing novel tasks that are not already well represented in their weights, like the ones you tried.
My coding ranges from "exotic" to "boiler plate" on any given day.
> Create a boilerplate Zephyr project skeleton, for Pi Pico
Yea... Asking Claude to help you with a low documentation build root system is going to go about the same way, I know first hand about how this works.
> I asked it to create 7x10 monochromatic pixelmaps
Wrong tool for the job here. I dont think IDE and Pixelmaps have as large of an intersection as you think they do. Claude thinks in tokens not pixels.
Pick a common language (js, python, rust, golang) pick something easy (web page, command line script, data ingestion) and start there. See what it can do and does well, then start pushing into harder things.
One frustration was the code changed so much in ChatGPT so had to be lots of prompts. But I had no idea what the code was anyways. Understood vibe coding. Just used ChatGPT on a whim. Liked the end result.
It's interesting that the highest level of reasoning that GPT-5 in XCode supports is actually the "low" reasoning level. Wonder why.
This is Claude sign in using your account. If you’ve signed up for Claude Pro or Max then you can use it directly. But, they should give access to Opus as well.
Autocomple is also automatically triggered when you place your cursor inside the code.
Don't be naive.
Wont work by default if I'm reading this correctly
If the attacker wants to use AI to assist in looking for valuables on your machine, they won't install AI on your machine, they'll use the remote shell software to pop a shell session, and ask AI they're running on one of their machines to look around in the shell for anything sensitive.
If an attacker has access to your unlocked computer, it is already game over, and LLM tools is quite far down the list of dangerous software they could install.
Maybe we should ban common RAT software first, like `ssh` and `TeamViewer`.
Actually they’ll just the AI you already have on your machine[0]
In this attack, the malware would use Claude Code (with your credentials) to scan your own machine.
Much easier than running the inference themselves!
[0]https://semgrep.dev/blog/2025/security-alert-nx-compromised-...
I guess that's on me for being oblivious enough that it took this obvious of a comment for me to be sure you're intentionally trolling. Nice work.
https://objective-see.org/products/lulu.html
I spent the last 6 months trying to convince them not to block all outbound traffic by default.
For most corporate code (that is highly confidential) you still have proper internet access, but you sure as hell can't just send your code to all AI providers just because you want to, just because it's built into your IDE.
But I guess the user could still get a 3rd party plugin.
I do not think this will be an issue for big companies.
Also, there are plenty of editors and IDEs that don’t.
Let’s stop pretending like you’re being forced into this. You aren’t.
None of these companies are isolated from the internet.
There's simply no way to properly secure network connected developer systems.
you can use Claude via bedrock and benefit from AWS trust
Gemini? Google owns your e-mail. Maybe you're even one of those weirdos who doesn't use Google for e-mail - I bet your recipient does.
so... they have your code, your secrets, etc.
You’re just angry and adding no value to this conversation because of it
Your link: "Grade school math problems from a hardcoded dataset with hardcoded answers" [1]
It really is the same thing.
[1] https://openai.com/index/solving-math-word-problems/
--- start quote ---
GSM8K consists of 8.5K high quality grade school math word problems. Each problem takes between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer.
--- end quote ---
1. OpenAI has been doing verifier-guided training since last year.
2. No SOTA model was trained without verified reward training for math and programming.
I supported the first claim with a document describing what OpenAI was doing last year; the extrapolation should have been straightforward, but it's easy for people who aren't tracking AI progress to underestimate the rate at which it occurs. So, here's some support for my second claim:
https://arxiv.org/abs/2507.06920 https://arxiv.org/abs/2506.11425 https://arxiv.org/abs/2502.06807
Indeed."By late next month you'll have over four dozen husbands" https://xkcd.com/605/
> So, here's some support for my second claim:
I don't think any of these links support the claim that "No SOTA model was trained without verified reward training for math and programming"
https://arxiv.org/abs/2507.06920: "We hope this work contributes to building a scalable foundation for reliable LLM code evaluation"
https://arxiv.org/abs/2506.11425: A custom agent with a custom environment and a custom training dataset on ~800 predetermined problems. Also "Our work is limited to Python"
https://arxiv.org/abs/2502.06807: The only one that somewhat obliquely refers to you claim
Otherwise there's VSCodium which is what I'm using until I can make the jump to Code Edit.
Since the landscape of potentially malicious inputs in plain english is practically infinite, without any particular enforced structure for the queries you make of it, means that those "guardrails" are, in effect, an expert system. An ever growing pile of if-then statements. Didn't work then, won't work now.
In Neovim the choice of language server and the choice of LLM is up to the user, (possibly even the choice of this API, I believe, having only skimmed the PR) while both of those choices are baked in to XCode, so they're not the same thing.
Why wouldn't it?
Obviously no model is going to one-shot something like a full text editor, but there's an ocean of difference between defining vibe coding as prompting "Make me a text editor" versus spending days/weeks going back and forth on architecture and implementation with a model while it's implementing things bottom-up.
Both seem like common definitions of the term, but only one of them will _actually_ work here.
I have used agentic coding tools to solve problems that have literally never been solved before, and it was the AI, not me, that came up with the answer.
If you look under the hood, the multi-layered percqptratrons in the attention heads of the LLM are able to encode quite complex world models, derived from compressing its training set in a which which is formally as powerful as reasoning. These compressed model representations are accessible when prompted correctly, which express as genuinely new and innovative thoughts NOT in the training set.
Would you show us? Genuinely asking
Ask the best available models -- emphasis on models -- for help designing the text editor at a structural rather than functional level first, being specific about what you want and emphasizing component-level test whenever possible, and only then follow up with actual code generation, and you'll get much better results.
https://github.com/JetBrains/intellij-community
Editors are incredibly complex and require domain knowledge to guide agents toward the correct architecture and implementation (and away from the usual naive pitfalls), but in my experience the latest models reason about and implement features/changes just fine.
They certainly do, and I can't really follow the analogy you are building.
> We're at a higher level of abstraction now.
To me, an abstraction higher than a programming language would be natural language or some DSL that approximates it.
At the moment, I don't think most people using LLMs are reading paragraphs to maintain code. And LLMs aren't producing code in natural language.
That isn't abstraction over language, it is an abstraction over your computer use to make the code in language. If anything, you are abstracting yourself away.
Furthermore, if I am following you, you are basically saying, you have to make a call to a (free or paid) model to explain your code every time you want to alter it.
I don't know how insane that sounds to most people, but to me, it sounds bat-shit.
If you don’t want to use LLM coding assistants – or if you can’t, or it’s not a technology suitable for your work – nobody cares. It’s totally fine. You don’t need to get performatively enraged about it.
Headline quite misleading. So not exactly that it will ship in Xcode but will allow connect to paid account.
for whatever reason it ignores my directive that it can from the CLAUDE file at least half the time. one time it even decided it needed to generate a fancy python script to do it. bizarre.
This isn't going to change my workflow at this point.
I always find this article something to get back to: https://www.inc.com/magazine/20110301/making-money-small-bus...
My pet peeve is it will try to autocomplete any string you start typing with just random crap it thinks you might want in a string.
Generating some code is fine, but I now prefer the deterministic autocomplete for my types I have available in my current context.
Under the known issues
I bought a Pro subscription, the send button on their dumb chatbot box is disabled for me (on Safari), and I still get "capacity constraints' limits. Filed a chargeback with my bank just because of the audacity of their post-purchase experience. ChatGPT-5 works good enough for coding too.
From my experience with Claude Opus it seems like it tries to be "too smart" and doesn't seem to keep up with the latest APIs. It suggested some code for a iOS/macOS project that was only valid on tvOS, and other gaffs.
I also wonder if it will have separate rate limits from ChatGPT (app/web) and Codex CLI (which currently has its own rate limits).
Hey they're doing better than anyone else.
Something could be wrong with your specific Macbook
Back in the older Intel days I did sometimes, perhaps due to the GPU switching and flakey Nvidia drivers. But even then it certainly wasn’t common.
This is just the stuff I remember. This is a 16GB M2 Macbook Pro. Not a single native Apple made app or process should be lagging, let alone using up enough memory that sometimes apps just completely crash. Again, my windows desktop hasn't had a blue screen or a hard crash in the decade that I've used it and it had a worse configuration with 16GB of RAM and an old AMD CPU.
Apple has no qualms breaking backwards compatibility for core functions like bluetooth connectivity in MacOS. Windows has backwards compatibility, but increasingly worsening UX, throwing ads and subscriptions in your face before you can even log in, and a bad security/process isolation model. Desktop Linux is a case of "how many hours before I find out a critical part of my workflow is unsupported/bad/broken/unconfigurable/pain-to-configure in this particular distro/desktop environment".
I installed and fixed a lot of 95, 98[se], ME, and XP OSes for other people, though. Thousands. I never bothered with any of those OSes on my own machines, though. The first "consumer" OS i used was win 7 Ultimate Edition (signed by Ballmer, natch). I use 11, now, and i'm fine with it. I think it's because i am "grandfathered" in to win11 without a microsoft login; i just mentioned last night that if i had to reinstall windows on this machine, i probably wouldn't, due to that requirement now.
Anyhow all this is to say, hogwash. Windows has been perfect in the past. Time marches on, fruit flies like a banana, and all that.
My frustration is more born out of the OS rough edges constantly getting in the way of tasks I actually want to focus on and accomplish, which doesn't play well with my ADHD.
The "best" way to get the "latest" details on Apple's APIs is to suffer through mind-numbingly vapid WWDC videos with their reverse uncanny valley presenters (where humans pretend to be robots) and keep your full attention on them to catch a fleeting glimpse of a single method or property that does what you were looking for. Even 1.5x/2x speed is torture. I tried to get AIs to sift through the transcripts of their videos, and may Skynet forgive me for this cruelty.
Then when you go try to use that API, oops it's been changed in the current beta and there's no further documentation on it except auto-generated headers.
They also removed bookmarks from Xcode's built-in documentation browser years ago, and it doesn't retain a memory of previously open tabs, and often seems to be behind the docs on their websites.
I wish they would just provide open-source sample apps of each type (document-based, single-window etc.) for each of their platforms that fully use the latest APIs. At least that would be easier to ask AIs on, since that is what they seem to be going for now anyway.
If you can listen to billions of tokens a day, you can basically capture all the magic.
Meanwhile the creative output of humanity is distilled into black boxes to benefit those who can scrape it the most and burn the most power, but this impact is distributed amongst everyone, so again there's little incentive among those who could create (likely legal) change.
DeepSeek is the most notable case, but it's been used lots.
And the foundation model companies are scraping and exfiltrating each others' data.
> Built for Apple Intelligence.
> 16-core Neural Engine
These Xcode release notes:
> Claude in Xcode is now available in the Intelligence settings panel, allowing users to seamlessly add their existing paid Claude account to Xcode and start using Claude Sonnet 4
All that dedicated silicon taking up space on their SoC and yet you still have to input your credit card in order to use their IDE. Come on...
They would also need to shrink them way down to even fit. And even then, generating tokens on an apple neural chip would be waaaaaay slower than an HTTP request to a monster GPU in the sky. Local llms in my experience are either painfully dumb or painfully slow.
[0] https://github.com/fguzman82/apple-foundation-model-analysis
When macOS 26 is officially announced on September 9, I expect Apple to announce support for Anthropic and Google models.
It’s the Apple way to screw the 3rd party and replace with their own thing once the ROI is proven (not a criticism, this is a good approach for any business where the capex is large…)
Credit card processing is hard... Go price out stripe + customer service + dealing with charge backs and tell me if you really want to do processing your self.
This same commitment would be why I wouldn't count them out on the AI side, btw. It's not clear that a private internal foundation model is any kind of required competitive moat. It's also not clear that having one is useless, and all the cool kids do have one or want one, but from a product view it might be that integrating makes sense.
Llama 4 (a terrible release) shows also that making such a model is still really hard. There are not enough ML leads at the pointy tip of the spear to support even 10 high quality foundation model teams globally.
If you have billions of dollars in cash and you are secure in your customer base, and you don't believe AGI liftoff will happen or change your business model, maybe you work out the kinks on product integration now using best in breed providers, not getting locked in on one of them, keep spending 1/10 to 1/100th to stay relevant and on it internally, use your incredibly powerful silicon buying power to get a next gen version of TPUs done, and wait until you know for sure you can spend under $10bn in cash on getting a great proprietary model done, one that you are certain will serve your needs.
Also this will give you time to get better leadership in the AI org.
Upshot - I think it's a mix of reasons, but not fatal, I'm not sure being slow erodes their product customer base, personally I'd like much better and more private AI out of Apple ASAP. We'll see what we get. I predict they'll move internal by 2031.
[1] https://developer.apple.com/documentation/foundationmodels
[1] https://news.ycombinator.com/item?id=45062683