> My former colleague Rebecca Parsons, has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.
This is an example of my least favorite style of feigned insight: redefining a term into meaninglessness just so you can say something that sounds different while not actually saying anything new.
Yes, if you redefine "hallucination" from "produce output containing detailed information despite that information not being grounded in external reality, in a manner distantly analogous to a human reporting sense data produced by a literal hallucination rather than the external inputs that are presumed normally to ground sense data" to "produce output", its true that all LLMs do is "hallucinate", and that "hallucinating" is not a undesirable behavior.
But you haven't said anything new about the thing that was called "hallucination" by everyone else, or about the thing--LLM output in general--that you have called "hallucination". Everyone already knew that producing output wasn't undesirable. You've just taken the label conventionally attached to a bad behavior, attached it to a broader category that includes all behavior, and used the power of equivocation to make something that sounds novel without saying anything new.
scott_w 15 hours ago [-]
Fowler is not really redefining "hallucination." He's using a form of irony that emphasises how fundamental "hallucinations" are to the operation of the system. One might also say "you can't get rid of collateral damage from bombs. Indeed, collateral damage is the feature, it's just some of that is what we want to blow up."
You're not meant to take it literally.
xyzzy123 15 hours ago [-]
You might as well say it's interpolating or extrapolating. That's what people are usually doing too, even when recalling situations that they were personally involved in.
I think we call it "hallucinating" when the machine does this in an un-human-like way.
dapperdrake 14 hours ago [-]
The longer term for this is "stochastic parrot". See another HN comment here comparing LLMs to theater actors or movie actors.
LLMs just spew words. It just so happens that human beings can decode them into something related, useful, and meaningful surprisingly often.
Might even be a useful case of pareidolia (a term I dislike, because a world without any pattern matching whatsoever would not necessarily be "better").
jychang 13 hours ago [-]
I dislike the term “stochastic parrot”, because there’s plenty of evidence that LLMs do have an understanding of at least some things that they are saying.
We can trace which neurons activate for a face recognition model and see that a certain neuron does light up when it sees a face. The correct features are active for the sentence “the word parrots is plural”.
If you stop assuming the LLMs have no internal representations of the data, then everything makes a lot more sense! The LLM is FORCED to answer questions… just like a high school student filling out the SAT is forced to answer questions.
If a high school student fills out the wrong answer on the SAT, is that a hallucination?
Hallucinations are expected behaviors if you RLHF a high schooler to always guess an answer on the SAT because that’ll get them the highest score. This applies to ML model reward functions as well.
_heimdall 11 hours ago [-]
> We can trace which neurons activate for a face recognition model and see that a certain neuron does light up when it sees a face.
Seeing which parts of a model (they aren't neurons) light up when shown a face doesn't necessarily indicate understanding.
The model is a complex web of numbers representing a massively compressed data space. It could easily be that what you see light up when shown a face only indicates what specific part of the model is housing the compresses data related to recognizing specific facial features.
neonspark 9 hours ago [-]
> Seeing which parts of a model (they aren't neurons)…
I thought models were composed of neural network layers, among other things. Are these data structures called something different?
_heimdall 4 hours ago [-]
That point may not have been relevant for me to include.
I was getting at the idea that a neuron is a very specific feature of a biological brain, regardless of what AI researchers may call their hardware they aren't made of neurons.
jychang 2 hours ago [-]
1. They are neurons, whether you like it or not. A binary tree may not have squirrels living in them, but it's still a tree, even though the word "tree" here is defined differently than from biology. Or are you going to say a binary tree is not a tree?
2. You are about 5 years behind in terms of the research. Look into hierarchical feature representation and how MLP neurons work. (Or even in older CNNs and RNNs etc). And I'm willingly using the word "neuron" instead of "feature" here because while I know "feature" is more correct in general, there are definitely small toy models where you can pinpoint an individual neuron to represent a feature such as a face.
_heimdall 21 minutes ago [-]
What were you getting at with the MLP example? MLPs do a great job with perception abilities and I get that they use the term neuron frequently. I disagree with the use of the name there that's all, similarly I disagree that LLMs are AI but here we are.
Using the term neuron there and meaning it literally is like calling an airplane a bird. I get that the colloquial use exists, but no one thinks they are literal birds.
the_af 10 hours ago [-]
I think this could be seen as a proxy for evidence that there's some degree of reasoning, if we think we can identify specialized features that always become involved in some kinds of outputs. It's not proof, but it's not nothing either. It has some parallels about research on human brains are conducted, right?
_heimdall 4 hours ago [-]
It does have parallels to the human brain, absolutely. We've been studying the human brain in similar ways for much longer though and we still don't know much about it.
We do know what areas of a human brain often light up in response to various conditions. We don't know why that is though, or how it actually works. Maybe more importantly for LLMs, we don't know how human memory works, where it is stored, how to recognize or even define consciousness, etc.
Seeing what areas of a brain or an LLM light up can be interesting, but I'd be very cautious trying to read much into it.
ACCount37 12 hours ago [-]
This matches what we know about LLMs and hallucination-avoidance behavior in LLMs.
"Wrong answers on SAT" is also the leading hypothesis on why o3 was such an outlier - far more prone to hallucinations than either prior or following OpenAI models.
On SAT, giving a random answer is right 20% of the time - more if you ruled at least one obviously wrong answer out. Saying "I don't know" and not answering is right 0% of the time. So if you RLVR on SAT-type tests, where any answer is better than no answer, you encourage hallucinations. Hallucination avoidance in LLMs is a fragile capability, and OpenAI has probably fried its o3 with too much careless RLVR.
But another cause of hallucinations is limited self-awareness of modern LLMs. And I mean "self-awareness" in a very mechanical, no-nonsense fashion: "has information about itself and its own capabilities". LLMs have very little of that.
Humans have some awareness of the limits of their knowledge - not at all perfect, but at least there's something. LLMs get much, much less of that. LLMs learn the bulk of their knowledge from pre-training data, but pre-training doesn't teach them a lot about where the limits of their knowledge lie.
neonspark 9 hours ago [-]
> But another cause of hallucinations is limited self-awareness of modern LLMs… Humans have some awareness of the limits of their knowledge
Until you said that I didn’t realize just how much humans “hallucinate“ in just the same ways that AI does. I have a friend who is fluent in Spanish, a native speaker, but got a pretty weak grammar education when he was in high school. Also, he got no education at all in Critical thinking, at least not formally. So this guy is really, really fluent in his native language, but can often have a very difficult time explaining why he uses whatever grammar he uses. I think the whole world is realizing how little our brains can correctly explain and identify the grammar we use flawlessly.
He helps me to improve my Spanish a lot, he can correct me with 100% accuracy of course, but I’ve noticed on many occasions, including this week, that when I ask a question about why he said something one way or another in Spanish, he will just make up some grammar rule that doesn’t actually exist, and is in fact not true.
He said something like “you say it this way when you really know the person and you’re saying that the other way when it’s more formal“, but I think really it was just a slangy way to mis-stress something and it didn’t have to do with familiar/formal or not. I’ve learned not to challenge him on any of these grammar rules that he makes up, because he will dig his heels in, and I’ve learned just to ignore him because he won’t have remembered this made up grammar rule in a week anyway.
This really feels like a very tight analogy with what my LLM does to me every day, except that when I challenge the LLM it will profusely apologize and declare itself incorrect even if it had been correct after all. Maybe LLMs are a little bit too humble.
I imagine this is a very natural tendency in humans, and I imagine I do it much more than I’m aware of. So how do humans use self-awareness to reduce the odds of this happening?
I think we mostly get trained in higher education to not trust the first thought that comes into our head, even if it feels self consistent and correct. We eventually learn to say “I don’t know” even if it’s about something that we are very, very good at.
thrawa8387336 7 hours ago [-]
Spanish in particular has more connotations per word than English. It's not even the grammar or spelling, those have rules and that's that. But choosing appropriate words, is more like every word has it right place and time and context. Some close examples would be the N-word or the R-word in English, as they are steeped in meanings far beyond the literal.
skydhash 7 hours ago [-]
He said something like “you say > it this way when you really know the person and you’re saying that the other way when it’s more formal“, but I think really it was just a slangy way to mis-stress something and it didn’t have to do with familiar/formal or not.
There’s such a thing in Spanish and in French. Formal and informal settings is reflected in the language. French even distinguishes between three different level of vocabulary (one for very informal settings (close friends), one for business and daily interactions, and one for very formal settings. It’s all cultural.
sillyfluke 11 hours ago [-]
>I dislike the term “stochastic parrot”, because there’s plenty of evidence that LLMs do have an understanding of at least some things that they are saying.
It's bold to use the term "understanding" in this context. You ask it something about a topic, it gives an answer like someone who understands the topic. You change the prompt slightly, where a human who understands the topic would still give the right response trivially, the LLM outputs an answer that is both wrong/irrelevant and unpredicably and non-humanly wrong in a way that no human who exhibited understanding with the first answer could be predicted to answer the second question in the same bizarrw manner as the LLM.
The fact that the LLM can be shown to have some sort of internal representation does not necessarily mean that we should call this "understanding" in any practical sense when discussing these matters. I think it's counterproductive in getting to the heart of the matter.
naasking 8 hours ago [-]
> You change the prompt slightly, where a human who understands the topic would still give the right response trivially, the LLM outputs an answer that is both wrong/irrelevant and unpredicably and non-humanly wrong in a way that no human who exhibited understanding with the first answer could be predicted to answer the second question in the same bizarrw manner as the LLM.
I think this should make you question whether the prompt change was really as trivial as you imply. Providing an example of this would elucidate.
roadside_picnic 6 hours ago [-]
Here's an entire paper [0] showing the impact of extremely minor structural changes on the quality of the results of the model. Things as simple as not using a colon in the prompt can lead to notably degraded (or improved) performance.
But..but...humans do the same thing?
This is input/output with the output being formed from applying the function of our life experiences (training) to the input?
They just don't have the hormonal circuitry to optimise for whatever our bodies are trying to optimise for when we take decisions.
cmiles74 10 hours ago [-]
People actually understand the input and the output. An LLM understand neither, it's generating output that is statistically likely, within some bounds. As Fowler said, it's a pleasant coincidence that some of this output has value to us.
(For sure, arguments can be made that the relationship betweens the terms that the model has encoded could maybe be called "model thinking" or "model understanding" but it's not how people work.)
seunosewa 7 hours ago [-]
We do the same thing. We pick words that are statistically likely to get us what we want. And much of it is unconscious. You don't formally reason about every word you speak. You are focused on your objective, and your brain fills in the gaps.
scott_w 3 hours ago [-]
We absolutely do not "pick words that are statistically likely to get us what we want." We use words to try to articulate (to varying levels of success) a message that we want to communicate. The words, tone, speed, pitch, etc. all convey meaning.
> And much of it is unconscious.
That does not mean we're "picking words statistically likely to get us what we want," it means "our brains do a lot of work subconsciously." Nothing more.
> You are focused on your objective, and your brain fills in the gaps.
This is a total contradiction of what you said at the start. LLMs are not focused on an objective, they are using very complex statistical algorithms to determine output strings. There is no objective to an LLM's output.
naasking 8 hours ago [-]
> LLMs just spew words. It just so happens that human beings can decode them into something related, useful, and meaningful surprisingly often.
This sentence is inherently contradictory. If LLM output is meaningful more than chance, then it's literally not "just spewing words". Therefore whatever model it is using to generate that meaning must contain some semantic content, even if it's not semantic content that's as rich as humans are capable of. The "stochastic parrot" term is thus silly.
thrawa8387336 7 hours ago [-]
It's a sufficiently large N shannonizer. Nothing more
"LLMs produce either truth or they produce hallucinations."
Claims worded like that give the impression that if we can just reduce/eliminate the hallucinations, all that will remain will be the truth.
But that's not the case. What is the case is that all the output is the same thing, hallucination, and that some of those hallucinations just so happen to reflect reality (or expectations) so appears to embody truth.
It's like rolling a die and wanting one to come up, and when it does saying "the die knew what I wanted".
An infinite number of monkeys typing on an infinite number of typewriters will eventually produce the script for Hamlet.
That doesn't mean the monkeys know about Shakespeare, or Hamlet, or even words, or even what they are doing being chained to typewriters.
We've found a way to optimize the infinite typing monkeys to output something that passes for Hamlet much sooner than infinity.
sceptic123 9 hours ago [-]
I actually found that comment interesting. It's pointing towards something I've struggled with around LLMs. They are (currently) incapable of knowing if what they output is correct, so the idea that "it's all hallucinations" acknowledges that point and gives useful context for anyone using LLMs for software development.
Eridrus 8 hours ago [-]
Humans are also incapable of knowing whether their output is correct. We merely convince ourselves that it is and then put our thoughts in contact with the external world and other people to see if we actually are.
dghlsakjg 8 hours ago [-]
Humans are capable of using formal logic to reach a provably correct conclusion from valid premises. That conclusion can then be used as premises for a further conclusion. Follow the logic the other way and using our powers of observation, and sensing, we can gather base level truths and know that what we are saying is correct in an absolute sense.
Computers use formal logic all the time to output truth, LLMs, however, do not.
jimbokun 8 hours ago [-]
So then we are capable of knowing whether our output is correct, by putting it into contact with the external world.
bartread 14 hours ago [-]
I’ve never liked that this behaviour is described using the term “hallucination”.
If a human being talked confidently about something that they were just making up out of thin air by synthesizing based (consciously or unconsciously) on other information they know you wouldn’t call it “hallucination”: you’d call it “bullshit”.
And, honestly, “bullshit” is a much more helpful way of thinking about this behaviour because it somewhat nullifies the arguments people make against the use of LLMs due to this behaviour. Fundamentally, if you don’t want to work with LLMs because they sometimes “bullshit”, are you planning on no longer working with human beings as well?
It doesn’t hold up.
But, more than that, going back to your point: it’s much harder to redefine the term “bullshit” to mean something different to the common understanding.
All of that said, I don’t mind the piece and, honestly, the “I haven’t the foggiest” comment about the future of software development as a career is well made. I guess it’s just a somewhat useful collection of scattered thoughts on LLMs and, as such, an example of a piece where the “thoughts on” title fits well. I don’t think the author is trying to be particularly authoritative.
dragonwriter 14 hours ago [-]
> I’ve never liked that this behaviour is described using the term “hallucination”.
I have a standard canned rant about "confabulation" is a much better metaphor, but it wasn't the point I was focussed on here.
> Fundamentally, if you don’t want to work with LLMs because they sometimes “bullshit”, are you planning on no longer working with human beings as well?
I will very much not voluntarily rely on a human for particular tasks if that human has demonstrated a pattern of bullshitting me when given that kind of task, yes, especially if, on top of the opportunity cost inherent in relying on a person for a particular task, I am also required to compensate them—e.g., financially—for their notional attention to the task.
scott_w 11 hours ago [-]
> If a human being talked confidently about something that they were just making up out of thin air by synthesizing based (consciously or unconsciously) on other information they know you wouldn’t call it “hallucination”: you’d call it “bullshit”.
I'd recommend you watch https://www.youtube.com/watch?v=u9CE6a5t59Y&t=2134s&pp=ygUYc... which covers the topic of bullshit. I don't think we can call LLM output "bullshit" because someone spewing bullshit has to not care about whether what they're saying is true or false. LLMs don't "care" about anything because they're not human. It's better to give it an alternative term to differentiate it from the human behaviour, even if the observed output is recognisable.
I disagree with the article’s thesis completely. Humans are the ones that spread the bullshit, the LLM just outputs text. Humans are the necessary component to turn that text from “output” into “bullshit.” The machine can’t do it alone.
movpasd 11 hours ago [-]
At the risk of sounding woo, I find some parallels in how LLMs work to my experiences with meditation and writing. My subjective experience of it is that there is some unconscious part of my brain that supplies a scattered stream of words as the sentence forms --- without knowing the neuroscience of it, I could speculate it is a "neurological transformer", some statistical model that has memorised a combination of the grammar and contextual semantic meaning of language.
The difference is that the LLM is _only that part_. In producing language as a human, I filter these words, I go back and think of new phrasings, I iterate --- in writing consciously, in speech unconsciously. So rather than a sequence it is a scattered tree filled with rhetorical dead ends, pruned through interaction with my world-model and other intellectual faculties. You can pull on one thread of words as though it were fully-formed already as a kind of Surrealist exercise (like a one-person cadavre exquis), and the result feels similar to an LLM with the temperature turned up too high.
But if nothing else, this highlights to me how easily the process of word generation may be decoupled from meaning. And it serves to explain another kind of common human experience, which feels terribly similar to the phenomenon of LLM hallucination: the "word vomit" of social anxiety. In this process it suddenly becomes less important that the words you produce are anchored to truth, and instead the language-system becomes tuned to produce any socially plausible output at all. That seems to me to be the most apt analogy.
jimbokun 8 hours ago [-]
"Bullshit engine" is the term that best explains to a lay person what it is that LLMs do.
cess11 10 hours ago [-]
The point, though awkwardly stated, is that there is no difference between 'hallucination' output and any other output from these compressed databases, just like there is no such difference between two queries on a typical RDBMS.
It's a good point.
dragonwriter 5 hours ago [-]
> The point, though awkwardly stated, is that there is no difference between 'hallucination' output and any other output from these compressed databases,
But there is, except when you redefine "hallucination" so there isn't. And, when you retain the definition where there is a difference, you find there are techniques by which you can reduce hallucinations, which is important and useful. Changing the definition to eliminate the distinction is actively harmful to understanding and productive use of LLMs, for the benefit of making what superficially seems like an insightful comment.
belter 14 hours ago [-]
> You've just taken the label conventionally attached to a bad behavior, attached it to a broader category that includes all behavior, and used the power of equivocation to make something that sounds novel without saying anything new.
You mean like every evangelist says AI changes everything every 5 min...When in reality what they mean is neural nets statistical code generators are getting pretty good? because that is almost all AI there is at the moment?
Just to make the current AI sound bigger than it is?
ljm 14 hours ago [-]
If everything is a hallucination then nothing is a hallucination.
Thanks for listening to my Ted Talk.
13 hours ago [-]
jeppester 1 days ago [-]
In my company I feel that we getting totally overrun with code that's 90% good, 10% broken and almost exactly what was needed.
We are producing more code, but quality is definitely taking a hit now that no-one is able to keep up.
So instead of slowly inching towards the result we are getting 90% there in no time, and then spending lots and lots of time on getting to know the code and fixing and fine-tuning everything.
Maybe we ARE faster than before, but it wouldn't surprise me if the two approaches are closer than what one might think.
What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.
ekidd 13 hours ago [-]
> In my company I feel that we getting totally overrun with code that's 90% good, 10% broken and almost exactly what was needed.
This is painfully similar to what happens when a team grows from 3 developers to 10 developers. All of sudden, there's a vast pile of coding being written, you've never seen 75% of it, your architectural coherence is down, and you're relying a lot more on policy and CI.
Where LLM's differ is that you can't meaningfully mentor them, and you can't let them go after the 50th time they try turn off the type checker, or delete the unit tests to hide bugs.
Probably, the most effective way to use LLMs is to make the person driving the LLM 100% responsible for the consequences. Which would mean actually knowing the code that gets generated. But that's going to be complicated to ensure.
jimbokun 8 hours ago [-]
Have thorough code reviews and hold the developer using the LLM responsible for everything in the PR before it can be merged.
whstl 16 hours ago [-]
LLMs are amazing at producing boilerplate, which removes the incentive to get rid of it.
Boilerplate sucks to review. You just see a big mass of code and can't fully make sense of it when reviewing. Also, Github sucks for reviewing PRs with too many lines.
So junior/mid devs are just churning boilerplate-rich code and don't really learn.
The only outcome here is code quality is gonna go down very very fast.
jstummbillig 14 hours ago [-]
I envy the people working at mystical places where humans were on average writing code of high quality prior LLMs. I'll never know you now.
jeltz 11 hours ago [-]
I am working at one right now and I have worked at such in the past. One of the main tricks is to treat code reviews very seriously so people are not incentived to write lazy code. You need to build a cultire which cares about quality of both product and code. You also need decent developers, but not necessarily great developers.
jstummbillig 5 hours ago [-]
Oh, I understand what you need to do. It's like losing weight. It's fairly simple.
And at the same time it's borderline impossible proven by the fact that people can't do it, even though everyone understands and roughly everyone agrees on how it works.
So the actual "trick" turns out to be understanding what keeps people from doing the necessary things that they all agree on are important – like treating code reviews very seriously. And getting that part right turns out to be fairly hard.
raziel2p 11 hours ago [-]
It's very easy to go from what you're describing to a place hamstrung by nitpicking, though. The code review becomes more important than the code itself and appearances start mattering more than results.
jimbokun 8 hours ago [-]
Did you make an effort to find those places and get them to hire you?
actionfromafar 12 hours ago [-]
Some of them will get hired to fix the oceans of boilerplate code.
dapperdrake 14 hours ago [-]
Perlis, epigram 7:
7. It is easier to write an incorrect program than understand a correct one.
"but quality is definitely taking a hit now that no-one is able to keep up."
And its going to get worse! So please explain to me how in the net, you are going to be better off? You're not.
I think most people haven't taken a decent economics class and don't deeply understand the notion of trade offs and the fact there is no free lunch.
computerex 18 hours ago [-]
Technology has always helped people. Are you one of the people that say optimizing compilers are bad? Do you not use the intellisense? Or IDEs? Do you not use higher level languages? Why not write in assembly all the time? No free lunch right.
Yes there are trade offs, but at this point if you haven’t found a way to significantly amplify and scale yourself using llms, and your plan is to instead pretend that they are somehow not useful, that uphill battle can only last so long. The genie is out of the bag. Adapt to the times or you will be left behind. That’s just what I think.
johnnienaked 18 hours ago [-]
Technology does not always help people, in fact often it creates new problems that didn't exist before.
Also telling someone to "adapt to the times" is a bit silly. If it helped as much as its claimed, there wouldn't be any need to try and convince people they should be using it.
A LOT of parallels with crypto, which is still trying to find its killer app 16 years later.
brabel 17 hours ago [-]
I don’t think anyone needs to be convinced at this point. Every developer is using LLM and I really can’t believe someone who has made a career out of automating things wouldn’t be immediately drawn to trying them at least. Every single company seems convinced and using it too. The comparison to crypto makes no sense.
bolobo 16 hours ago [-]
> Every developer is using LLM
Citation needed. In my circles, Senior engineer are not using them a lot, or in very specific use cases. My company is blocking LLMs use apart from a few pilots (which I am part of, and while claude code is cool, its effectiveness on a 10-year old distributed codebase is pretty low).
You can't make sweeping statements like this, software engineering is a large field.
And I use claude code for my personal projects, I think it's really cool. But the code quality is still not there.
brabel 13 hours ago [-]
Stack overflow published recently a survey in which something like 80% of developers were using AI and the rest “wants to soon”. By now I have trouble believing a competent developer is still convinced they shouldn’t use it at all , though a few ludites perhaps might hold on for a bit longer.
skydhash 12 hours ago [-]
Stack overflow published a report about text editors and Emacs wasn’t part of the list. So I’m very sceptical about SO surveys.
brabel 8 hours ago [-]
I was also offended by that :D.
sarchertech 12 hours ago [-]
“Using AI” is a very broad term. Using AI to generate lurem ipsum is still “using AI”.
diegolas 11 hours ago [-]
> You can't make sweeping statements like this, software engineering is a large field.
that goes both ways
johnnienaked 17 hours ago [-]
I need to be convinced.
Go ahead, convince me. Please describe clearly and concisely in one or two sentences the clear economic value/advantage of LLMs.
utyop22 15 hours ago [-]
Careful now, you will scare them away!!!!!
People love stuff that makes them feel like they are doing less work. Cognitive biases distort reality and rational thinking, we know this already through behavioural economics.
computerex 17 hours ago [-]
The company I work for uses LLM's for digital marketing, the company has over 100M ARR selling products build on top of LLM's with real life measurable impact as measured by KPIs.
kaosoaksh 9 hours ago [-]
> real life measurable impact as measured by KPIs
This is making me even more skeptical of your claims. Individual metrics are often very poor at tracking reality.
johnnienaked 6 hours ago [-]
Individual metrics are often very good at distorting reality, which is why corporate executives love them so much.
johnnienaked 16 hours ago [-]
Digital marketing is old. What about LLMs gives an advantage to digital marketing?
throwboy2047 16 hours ago [-]
It’s just plain mean to make the Emperor speak of the thread count of his “clothes”.
Difwif 17 hours ago [-]
My parents could have said your first paragraph when I tried to teach them they could Google their questions and find answers.
Technology moves forward and productivity improves for those that move with it.
discreteevent 15 hours ago [-]
A few examples of technology that moved 'forward' but decreased productivity for those who moved with it from my 'lived' experience:
1) CASE tools (and UML driven development)
2) Wizard driven code.
3) Distributed objects
4) Microservices
These all really were the hot thing with massive pressure to adopt them just like now. The Microsoft demos of Access wizards generating a complete solution for your business had that same wow feeling as LLM code. That's not to say that LLM code won't succeed but it is to say that this statement is definitely false:
> Technology moves forward and productivity improves for those that move with it.
lithocarpus 16 hours ago [-]
I would say it as "technology tends to concentrate power to those who wield it."
That's not all it does but I think it's one of the more important fundamentals.
creesch 17 hours ago [-]
> Technology moves forward and productivity improves for those that move with it.
It does not, technology regresses just as often and linear deterministic progress is just a myth to begin with. There is no guarantee for technology to move forward and always make things better.
There are plenty of examples to be made where technology has made certain things worse.
johnnienaked 17 hours ago [-]
Why is productivity so important? When do regular people get to benefit from all this "progress?"
balamatom 13 hours ago [-]
Being permitted to eat - is that not great benefit?
balamatom 13 hours ago [-]
"But with Google is easier!" When you were trying to teach your folks about Google, were you taking into consideration dependence, enshittification, or the surveillance economy? No, you were retelling them the marketing.
Just by having lived longer, they might've had the chance to develop some intuition about the true cost of disruption, and about how whatever Google's doing is not a free lunch. Of course, neither them, nor you (nor I for that matter) had been taught the conceptual tools to analyze some workings of some Ivy League whiz kinds that have been assigned to be "eating the world" this generation.
Instead we've been incentivized to teach ourselves how to be motivated by post-hoc rationalizations. And ones we have to produce at our own expense too. Yummy.
Didn't Saint Google end up enshittifying people's very idea of how much "all of the world's knowledge" is; gatekeeping it in terms of breadth, depth and availability to however much of it makes AdSense. Which is already a whole lot of new useful stuff at your fingertips, sure. But when they said "organizing all of the world's knowledge" were they making any claims to the representativeness of the selection? No, they made the sure bet that it's not something the user would measure.
In fact, with this overwhelming amount of convincing non-experientially-backed knowledge being made available to everyone - not to mention the whole mass surveillance thing lol (smile, their AI will remember you forever) - what happens first and foremost is the individual becomes eminently marketable-to, way more deeply than over Teletext. Thinking they're able to independently make sense of all the available information, but instead falling prey to the most appealing narrative, not unlike a day trader getting a haircut on market day. And then one has to deal with even more people whose life is something someone sold to them, a race to the bottom in the commoditized activity (in the case of AI: language-based meaning-making).
But you didn't warn your parents about any of that or sit down and have a conversation about where it means things are headed. (For that matter, neither did they, even though presumably they've had their lives altered by the technological revolutions of their own day.) Instead, here you find yourself stepping in for that conversation to not happen among the public, either! "B-but it's obvious! G-get with it or get left behind!" So kind of you to advise me. Thankfully it's just what someone's paid for you to think. And that someone probably felt very productive paying big money for making people think the correct things, too, but opinions don't actually produce things do they? Even the ones that don't cost money to hold.
So if it's not about the productivity but about the obtaining of money to live, why not go extract that value from where it is, instead of breathing its informational exhaust? Oh, just because, figuratively speaking, it's always the banks have AIs that don't balk at "how to rob the bank"; and it's always we that don't. Figures, no? But they don't let you in the vault for being part of the firewall.
rolisz 17 hours ago [-]
Paul Krugman (Nobel laureate in economy) said in 1998 that the internet is no biggie. Many companies needed convincing to adopt the internet (heck, some still need convincing).
Would you say the same thing ("If it helped as much as its claimed, there wouldn't be any need to try and convince people they should be using it.") about the internet?
balamatom 13 hours ago [-]
I would, unironically.
Thing's called a self-fulfilling prophecy. Next level to a MLM scheme: total bootstrap. Throwing shit at things in innate primate activity, use money to shift people's attention to a given thing for long enough and eventually they'll throw enough shit at the wall for something to stick. At which point it becomes something able to roll along with the market cycles.
computerex 17 hours ago [-]
"It is difficult to get a man to understand something when his salary depends upon his not understanding it"
johnnienaked 15 hours ago [-]
It's also difficult to get a man to understand something if you stubbornly refuse to explain it.
Instead of this crypto-esque hand waving, maybe you can answer me now?
jimbokun 8 hours ago [-]
The big difference is that all of the other technologies you cite are deterministic making it easy to predict their behavior.
You have to inspect the output of LLMs much more carefully to make sure they are doing what you want.
utyop22 15 hours ago [-]
"Technology has always helped people. Are you one of the people that say optimizing compilers are bad? Do you not use the intellisense? Or IDEs? Do you not use higher level languages? Why not write in assembly all the time? No free lunch right."
Actually I am not a software engineer for a living so I have zero vested interest or bias lmao. That said, I studied Comp Sci at a top institution and really loved defining algorithms and had no interest in actual coding as it didn't give my brain the variety it needs.
If you are employed as a software engineer, I am probably more open to realizing the problems that only become obvious to you later on.
joe_the_user 18 hours ago [-]
Someone already pointed out that we're at the point when it's longer possible to know if comments like the above are satire or not.
threecheese 8 hours ago [-]
Fast feedback is one benefit, given the 90% is releasable - even if only to a segment of users. This might be anathema to good engineering, but a benefit to user experience research and to organizations that want to test their market for demand.
Fast feedback is also great for improving release processes; when you have a feedback loop with Product, UX, Engineering, Security etc, being able to front load some % of a deliverable can help you make better decisions that may end up being a time saver net/net.
naasking 8 hours ago [-]
> And its going to get worse!
That isn't clear given the fact that LLMs and, more importantly, LLM programming environments that manage context better are still improving.
globular-toast 17 hours ago [-]
Yep, my strong feeling is that the net benefit of all of this will be zero. The time you have to spend holding the LLM hand is almost equal to how much time you would have spent writing it yourself. But then you've got yourself a codebase that you didn't write yourself, and we all know hunting bugs in someone else's code is way harder than code you had a part in designing/writing.
People are honestly just drunk on this thing at this point. The sunken cost fallacy has people pushing on (ie. spending more time) when LLMs aren't getting it right. People are happy to trade convenience for everything else, just look at junk food where people trade in flavour and their health. And ultimately we are in a time when nobody is building for the future, it's all get rich quick schemes: squeeze then get out before anyone asks why the river ran dry. LLMs are like the perfect drug for our current society.
Just look at how technology has helped us in the past decades. Instead of launching us towards some kind of Star Trek utopia, most people now just work more for less!
jama211 15 hours ago [-]
Only when purely vibe coding. AI currently saves a LOT of time if you get it to generate boilerplate, diagnose bugs, or assist with sandboxed issues.
The proof is in the pudding. The work I do takes me half as long as it used to and is just as high in quality, even though I manage and carefully curate the output.
sarchertech 11 hours ago [-]
I use AI for most of those things. And I think it probably saves me a bit of time.
But in that study that came out a few weeks ago where they actually looked at time saved, every single developer overestimated their time saved. To the point where even the ones who lost time thought they saved time.
LLMs are very good at making you feel like you’re saving time even when you aren’t. That doesn’t mean they can’t be a net productivity benefit.
But I’d be very very very surprised if you have real hard data to back up your feelings about your work taking you half as long and being equal quality.
Filligree 9 hours ago [-]
That study predates Claude Code though.
I’m not surprised by the contents. I had the same feeling; I made some attempts at using LLMs for coding prior to CC, and with rare exceptions it never saved me any time.
CC changed that situation hugely, at least in my subjective view. It’s of course possible that it’s not as good as I feel it is, but I would at least want a new study.
sarchertech 7 hours ago [-]
I don’t believe that CC is so much better than cursor using Claude models that it moves the needle enough to flip the results of that study.
The key thing to look at is that even the participants that did objectively save time, overestimated time saved by a huge amount.
But also you’re always likely to be at least one model ahead of any studies that come out.
jimbokun 8 hours ago [-]
> That study predates Claude Code though.
Is there a study demonstrating Claude Code improves productivity?
jama211 4 hours ago [-]
I mean, I used to average 2 hours of intense work a day and now it’s 1 hour.
globular-toast 2 hours ago [-]
I don't write much boilerplate anyway. I long ago figured out ways to not do that (I use a computer to do repetitive tasks for me). So when people talk about boilerplate I feel like they're only just catching up to me, not surpassing me.
As for catching bugs, maybe, but I feel like it's pot luck. Sometimes it can find something, sometimes it's just complete rubbish. Sometimes worth giving it a spin but still not convinced it's saving that much. Then again I don't spend much time hunting down bugs in unfamiliar code bases.
utyop22 15 hours ago [-]
My way of looking at this is simple.
What are people doing this quote on quote, time that they have gained back? Working on new projects? Ok can you show me the impact on the financials (show me the money)? And then I usually get dead silence. And before someone mentions the layoffs - lmao get real. Its offshoring 2.0 so that the large firms can increase their internal equity to keep funding this effort.
Most people are terrible at giving true informed opinions - they never dig deep enough to determine if what they are saying is proper.
Cthulhu_ 16 hours ago [-]
I'd argue that this awareness is a good thing; it means you're measuring, analyzing, etc all the code.
Best practices in software development for forever have been to verify everything; CI, code reviews, unit tests, linters, etc. I'd argue that with LLM generated code, a software developer's job and/or that of an organization as a whole has shifted even more towards reviewing and verification.
If quality is taking a hit you need to stop; how important is quality to you? How do you define quality in your organization? And what steps do you take to ensure and improve quality before merging LLM generated code? Remember that you're still the boss and there is no excuse for merging substandard code.
thrawa8387336 7 hours ago [-]
Imagine someones add 10 UTs carefully devised and someone notices they need 1 more during the PR.
Scenario B, you add 40 with an LLM, that look good on paper but only cover 6 of the original ones. Besides, who's going to pay careful attention to a PR with 40.
"Must be so thorough!".
epolanski 1 days ago [-]
As Fowler himself states, there's a need to learn to use these tools properly.
In any case poor work quality is a failure of tech leadership and culture, it's not AI's fault.
FromTheFirstIn 22 hours ago [-]
It’s funny how nothing seems to be AI’s fault.
Cthulhu_ 16 hours ago [-]
That's because it's software / an application. I don't blame my editor for broken code either. You can't put blame on software itself, it just does what it's programmed to do.
But also, blameless culture is IMO important in software development. If a bug ends up in production, whose fault is it? The developer that wrote the code? The LLM that generated it? The reviewer that approved it? The product owner that decided a feature should be built? The tester that missed the bug? The engineering organization that has a gap in their CI?
Blameless culture is important for a lot of reasons, but many of them are human. LLMs are just tools. If one of the issues identified in a post-mortem is "using this particular tool is causing us problems", there's not a blameless culture out there that would say "We can't blame the tool..."; the action item is "Figure out how to improve/replace/remove the tool so it no longer contributes to problems."
Jensson 10 hours ago [-]
> You can't put blame on software itself, it just does what it's programmed to do.
This isn't what AI enthusiasts say about AI though, they only bring that up when they get defensive but then go around and say it will totally replace software engineers and is not just a tool.
epolanski 18 hours ago [-]
If poor work gets merged, the responsibility lies in who wrote it, who merged it, and who allows such a culture.
The tools used do not hold responsibilities, they are tools.
discreteevent 14 hours ago [-]
"I got rid of that machine saw. Every so often it made a cut that was slightly off line but it was hard to see. I might not find out until much later and then have to redo everything."
xtracto 12 hours ago [-]
If your toaster burns your breakfast bread, Do you ultimately blame "it"?
You gdt mad, swear at it, maybe even throw it to the wall on a git of rage but, at the end of the day, deep inside you still know you screwed.
skydhash 12 hours ago [-]
Devices can be faulty and technology can be inappropriate.
sarchertech 11 hours ago [-]
If I bought an AI powered toaster that allows me to select a desired shade of toast, I select light golden brown, and it burns my toast, I certainly do blame “it”.
I wouldn’t throw it against a wall because I’m not a psychopath, but I would demand my money back.
johnnienaked 18 hours ago [-]
No one seems to be able to grasp the possibility that AI is a failure
raducu 17 hours ago [-]
> No one seems to be able to grasp the possibility that AI is a failure.
Do you think by the time GPT-9 comes, we'll say "That's it, AI is a failure, we'll just stop using it!"
Or do you speak in metaphorical/bigger picture/"butlerian jihad" terms?
johnnienaked 15 hours ago [-]
I don't see the use-case now, maybe there will be one by GPT-9
Kiro 14 hours ago [-]
Absence of your need isn't evidence of no need.
johnnienaked 6 hours ago [-]
This is true, but I've never heard of a use case. To which you might reply, "doesn't mean there isn't one," which you would be also right about.
Maybe you know one.
jama211 15 hours ago [-]
“There’s no use for this thing!” - said the farmer about the computer
colordrops 17 hours ago [-]
You've failed to figure out when and how to use it. It's not a binary failed/succeeded thing.
nicce 17 hours ago [-]
None of the copyright issues or suicide cases are handled in the court yet. There are many aspects.
johnnienaked 17 hours ago [-]
Metaverse was...
colordrops 17 hours ago [-]
How could a tool be at fault? If an airplane crashes is the plane at fault or the designers, engineers, and/or pilot?
wk_end 15 hours ago [-]
Designers, engineers, and/or pilots aren't tools, so that's a strange rhetorical question.
At any rate, it depends on the crash. The NTSB will investigate and release findings that very well may assign fault to the design of the plane and/or pilot or even tools the pilot was using, and will make recommendations about how to avoid a similar crash in the future, which could include discontinuing the use of certain tools.
oo0shiny 1 days ago [-]
> My former colleague Rebecca Parsons, has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.
What a great way of framing it. I've been trying to explain this to people, but this is a succinct version of what I was stumbling to convey.
jstrieb 18 hours ago [-]
I have been explaining this to friends and family by comparing LLMs to actors. They deliver a performance in-character, and are only factual if it happens to make the performance better.
The analogy goes down the drain when a criterion for good performance is being objectively right. Like with Reinforcement Learning from Verifiable Rewards.
ACCount37 11 hours ago [-]
RLVR can also encourage hallucinations quite easily. Think of SAT: giving a random answer is right 20% of the time, giving "I don't know" is right 0% of the time. If you only reward for test score, you encourage guesswork. So good RL reward design is as important as ever.
That being said, there are methods to train LLMs against hallucinations, and they do improve hallucination-avoidance. But anti-hallucination capabilities are fragile and do not fully generalize. There's no (known) way to train full awareness of its own capabilities into an LLM.
neonspark 8 hours ago [-]
I think what you say is true, and I think that this is exactly true for humans as well. There is no known way to completely eliminate unintentional bullshit coming from a human’s mouth. We have many techniques for reducing it, including critical thinking, but we are all susceptible to it and I imagine we do it many times a day without too much concern.
We need to make these models much much better, but it’s going to be quite difficult to reduce the levels to even human levels. And the BS will always be there with us. I suppose BS is the natural side effect of any complex system, artificial or biological, that tries to navigate the problem space of reality and speak on it.
These systems, sometimes called “minds”, are going to produce things that sound right but just are not true.
ACCount37 8 hours ago [-]
It's a feeling I can't escape: that by trying to build thinking machines, we glimpse more and more of how the human mind works, and why it works the way it does - imperfections and all.
"Critical thinking" and "scientific method" feel quite similar to the "let's think step by step" prompt for the early LLMs. More elaborate directions, compensating for the more subtle flaws of a more capable mind.
jstrieb 8 hours ago [-]
Nobody that I'd be using this analogy with is currently using LLMs for tasks that are covered by RLVF. They're asking models for factual information about the real world (Google replacement), or to generate text (write a cover letter), not the type of outputs that are verifiable within formal systems—by definition the type of output that RLVF is intended to improve. The actor analogy is still helpful for providing intuition to non-technical people who don't know how to think about LLMs, but do use them.
Also, unless I am mistaken, RLVF changes the training to make LLMs less likely to hallucinate, but in no way does it make hallucination impossible. Under the hood, the models still work the same way (after training), and the analogy still applies, no?
red75prime 6 hours ago [-]
> Under the hood, the models still work the same way (after training), and the analogy still applies, no?
Under the hood we have billions of parameters that defy any simple analogies.
Operations of a network are shaped by human data. But the structure of the network is not like the human brain. So, we have something that is human-like in some ways, but deviates from humans in ways, which are unlikely to be like anything we can observe in humans (and use as a basis for analogy).
jimbokun 8 hours ago [-]
But being "objectively right" is not the goal of an actor.
Thus, why it's a good metaphor for the behavior of LLMs.
lagrange77 15 hours ago [-]
I'll steal that.
bo1024 11 hours ago [-]
This is also related to the philosophical definition of bullshit[1]: speech intended to persuade or influence without any active intention to be either true or false.
A better analogy is of an overconfident 5 year old kid, who never says that they don't know the answer and always has an "answer" for everything.
aitchnyu 15 hours ago [-]
All models are wrong, some are merely useful - 1976/1933/earlier adage.
lagrange77 15 hours ago [-]
Right, all models are inherently wrong. It's up to the user know about its limits / uncertainty.
But i think this 'being wrong' is kind of confusing when talking about LLMs (in contrast to systems/scientific modelling).
In what they model (language), the current LLMs are really good and acurate, except for example the occasional chinese character in the middle of a sentence.
But what we mean by LLMs 'being wrong' most of the time is being factually wrong in answering a question, that is expressed as language. That's a layer on top of what the model is designed to model.
EDITS:
So saying 'the model is wrong' when it's factually wrong above the language level isn't fair.
I guess this is essentially the same thought as 'all they do is hallucinate'.
pjmorris 15 hours ago [-]
Generally attributed to George Box
mohsen1 18 hours ago [-]
Intelligence in a way is the ability to filter out useless information. Be it, thoughts or sensory information
15 hours ago [-]
tugberkk 16 hours ago [-]
Yes, can't remember who said it but LLM's always hallucinate, it is just that they are 90 something percent right.
ljm 14 hours ago [-]
If I was to drop acid and hallucinate an alien invasion, and then suddenly a xenomorph runs loose around the city while I’m tripping balls, does being right in that one instance mean the rest of my reality is also a hallucination?
Because it seems the point being made multiple times that a perceptual error isn’t a key component of hallucinating, the whole thing is instead just a convincing illusion that could theoretically apply to all perception, not just the psychoactively augmented kind.
OtomotO 15 hours ago [-]
Which totally depends on your domain and subdomain.
E.g. Programming in JS or Python: good enough
Programming in Rust: I can scrap over 50% of the code because it will
a) not compile at all (I see this while the "AI" types)
b) not meet the requirements at all
rootusrootus 23 hours ago [-]
For a hot second I thought LLMs were coming for our jobs. Then I realized they were just as likely to end up creating mountains of things for us to fix later. And as things settle down, I find good use cases for Claude Code that augment me but are in no danger of replacing me. It certainly has its moments.
jama211 15 hours ago [-]
Finally, an opinion on here that’s reasonable and isn’t “AI is perfect” or “AI is useless”.
mexicocitinluez 11 hours ago [-]
One of the things that has struck me as odd is just how little self-awareness devs have when talking about "skin in the game" with regard to CEO's hawking AI products.
Like, we have just as much to lose as they have to gain. Of course a part of us doesn't want these tools to be as good as some people say they are because it directly affects our future and livelihood.
No, they can't do everything. Yes, they can do some things. It's that simple.
kaosoaksh 8 hours ago [-]
> doesn’t want these tools to be as good as some people say they are
No, this is because that would mean AGI. And it’s obviously not that.
bko 1 days ago [-]
> Certainly if we ever ask a hallucination engine for a numeric answer, we should ask it at least three times, so we get some sense of the variation.
This works on people as well!
Cops do this when interrogating. You tell the same story three times, sometimes backwards. It's hard to keep track of everything if you're lying or you don't recall clearly so you can get a sense of confidence. Also works on interviews, ask them to explain a subject in three different ways to see if they truly understand.
jama211 15 hours ago [-]
It works to confuse people and make them sound like they’re lying when they’re not, too. Gotta be careful with this.
Terr_ 1 days ago [-]
> This works on people as well!
Only within certain conditions or thresholds that we're still figuring out. There are many cases where the more someone recalls and communicates their memory, the more details get corrupted.
> Cops do this when interrogating.
Sometimes that's not to "get sense of the variation" but to deliberately encourage a contradiction to pounce upon it. Ask me my own birthday enough times in enough ways and formats, and eventually I'll say something incorrect.
Care must also be taken to ensure that the questioner doesn't change the details, such as by encouraging (or sometimes forcing) the witness/suspect to imagine things which didn't happen.
inerte 24 hours ago [-]
Triple modular redundancy. I remember reading that's how Nasa space shuttles calculate things because a processor / memory might have been affected by space radiation https://llis.nasa.gov/lesson/18803
anon7725 16 hours ago [-]
Triple redundancy works because you know that under nominal conditions each computer would produce the correct result independently. If 2 out of 3 computers agree, you have high confidence that they are correct and the 3rd one isn’t.
With LLMs you have no such guarantee or expectation.
chistev 1 days ago [-]
Who remembers that scene on Better Call Saul between Lalo, Saul, and Kim?
daviding 1 days ago [-]
I get a lot of productivity out of LLMs so far, which for me is a simple good sign. I can get a lot done in a shorter time and it's not just using them as autocomplete. There is this nagging doubt that there's some debt to pay one day when it has too loose a leash, but LLMs aren't alone in that problem.
One thing I've done with some success is use a Test Driven Development methodology with Claude Sonnet (or recently GPT-5). Moving forward the feature in discrete steps with initial tests and within the red/green loop. I don't see a lot written or discussed about that approach so far, but then reading Martin's article made me realize that the people most proficient with TDD are not really in the Venn Diagram intersection of those wanting to throw themselves wholeheartedly into using LLMs to agent code. The 'super clippy' autocomplete is not the interesting way to use them, it's with multiple agents and prompt techniques at different abstraction levels - that's where you can really cook with gas. Many TDD experts have great pride in the art of code, communicating like a human and holding the abstractions in their head, so we might not get good guidance from the same set of people who helped us before. I think there's a nice green field of 'how to write software' lessons with these tools coming up, with many caution stories and lessons being learnt right now.
It feels like Tdd/llm connection is implied — “and also generate tests”. Thought it’s not cannonical tdd of course. I wonder if it’ll turn the tide towards tech that’s easier to test automatically, like maybe ssr instead of react.
daviding 1 days ago [-]
Yep, it's great for generating tests and so much of that is boilerplate that it feels great value. As a super lazy developer it's great as the burden of all that mechanical 'stuff' being spat out is nice. Test code being like baggage feels lighter when it's just churned out as part of the process, as in no guilt just to delete it all when what you want to do changes. That in itself is nice. Plus of course MCP things (Playwright etc) for integration things is great.
But like you said, it was meant more TDD as 'test first' - so a sort of 'prompt-as-spec' that then produces the test/spec code first, and then go iterate on that. The code design itself is different as influenced by how it is prompted to be testable. So rather than go 'prompt -> code' it's more an in-between stage of prompting the test initially and then evolve, making sure the agent is part of the game of only writing testable code and automating the 'gate' of passes before expanding something. 'prompt -> spec -> code' repeat loop until shipped.
girvo 1 days ago [-]
The only thing I dislike is what it chooses to test when asked to just "generate tests for X": it often chooses to build those "straitjacket for your code" style tests which aren't actually useful in terms of catching bugs, they just act as "any change now makes this red"
As a simple example, a "buildUrl" style function that put one particular host for prod and a different host for staging (for an "environment" argument) had that argument "tested" by exactly comparing the entire functions return string, encoding all the extra functionality into it (that was tested earlier anyway).
A better output would be to check startsWith(prodHost) or similar, which is what I changed it into, but I'm still trying to work out how to get coding agents to do that in the first or second attempt.
But that's also not surprising: people write those kinds of too-narrow not-useful tests all the time, the codebase I work on is littered with them!
rvz 1 days ago [-]
> It feels like Tdd/llm connection is implied — “and also generate tests”.
That sounds like an anti-pattern and not true TDD to get LLMs to generate tests for you if you don't know what to test for.
It also reduces your confidence in knowing if the generated test does what it says. Thus, you might as well write it yourself.
Otherwise you will get these sort of nasty incidents. [0] Even when 'all tests passed'.
LLMs (Sonnet, Gemini from what I tested) tend to “fix” failing tests by either removing them outright or tweaking the assertions just enough to make them pass. The opposite happens too - sometimes they change the actual logic when what really needs updating is the test.
In short, LLMs often get confused about where the problem lies: the code under test or the test itself. And no amount of context engineering seems to solve that.
Zerot 19 hours ago [-]
I think in part the issue is that the LLM does not have enough context. The difference between a bug in the test or a bug in the implementation is purely based on the requirements which are often not in the source code and stored somewhere else(ticket system, documentation platform).
Without providing the actual feature requirements to the LLM(or the developer) it is impossible to determine which is wrong.
Which is why I think it is also sort of stupid by having the LLM generate tests by just giving it access to the implementation. That is at best testing the implementation as it is, but tests should be based on the requirements.
gck1 18 hours ago [-]
Oh, absolutely, context matters a lot. But the thing is, they still fail even with solid context.
Before I let an agent touch code, I spell out the issue/feature and have it write two markdown files - strategy.md and progress.md (with the execution order of changes) inside a feat_{id} directory. Once I’m happy with those, I wipe the context and start fresh: feed it the original feature definition + the docs, then tell it to implement by pulling in the right source code context. So by the time any code gets touched, there’s already ~80k tokens in play. And yet, the same confusion frequently happens.
Even if I flat out say “the issue is in the test/logic,”, even if I point out _exactly_ what the issue is, it just apologizes and loops.
At that point I stop it, make it record the failure in the markdown doc, reset context, and let it reload the feature plus the previous agent’s failure. Occasionally that works, but usually once it’s in that state, I have to step in and do it myself.
tony2016 5 hours ago [-]
The developer is supposed to check and verify the code that an LLM creates. Ask for smaller requests in the prompts so you don't get too much code.
Unit tests are verified and run by the developer. I don't know what he means by an LLM runs all the tests and gives green. It can fake running tests. I always the tests myself, whether they were created by an LLM or not.
sebnukem2 1 days ago [-]
> hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.
Nice.
nine_k 1 days ago [-]
I'd rather say that LLMs live in a world that consists entirely of stories, nothing but words and their combinations. Thy have no other reality. So they are good at generating more stories that would sit well with the stories they already know. But the stories are often imprecise, and sometimes contradictory, so they have to guess. Also, LLMs don't know how to count, but they know that two usually follows one, and three is usually said to be larger than two, so they can speak in a way that mostly does not contradict this knowledge. They can use tools to count, like a human who knows digits would use a calculator.
But much more than an arithmetic engine, the current crop of AI needs an epistemic engine, something that would help follow logic and avoid contradictions, to determine what is a well-established fact, and what is a shaky conjecture. Then we might start trusting the AI.
chadcmulligan 16 hours ago [-]
One night, I asked it to write me some stories, it did seem happy doing that. I just kept saying do what you want when it asked me for a choice, its a fun little way to spend a couple of hours.
aitchnyu 15 hours ago [-]
I've asked LLMs to modify 5 files at a time and mark a checklist. Also image generators no longer draw 6 fingered hands.
gnerd00 1 days ago [-]
this was true, but then it wasn't... the research world several years ago, had a moment when the machinery could reliably solve multi-step problems.. there had to be intermediary results; and machinery could solve problems in a domain where they were not trained specifically.. this caused a lot of excitement, and several hundred billion dollars in various investments.. Since no one actually knows how all of it works, not even the builders, here we are.
utyop22 23 hours ago [-]
"Since no one actually knows how all of it works, not even the builders, here we are."
To me this is the most bizarre part. Have we ever had a technology deployed at this scale without a true understanding of its inner workings?
My fear is that the general public perception of AI will be damaged since for most LLMs = AI.
SillyUsername 20 hours ago [-]
This is a misconception, we absolutely do know how LLMs work, that's how we can write them and publish research papers.
The idea we don't is tabloid journalism, it's simply because the output is (usually) randomised - taken to mean, by those who lack the technical chops, that programmers "don't know how it works" because the output is indeterministic.
This is not withstanding we absolutely can repeat the output by using not randomisation (temperature 0).
riwsky 21 hours ago [-]
Humanity used fire for like a bazillion years before figuring out thermodynamics
achierius 23 hours ago [-]
Are you sure you're talking about LLMs? These sound more like traditional ML systems like AlphaFold or AlphaProof.
tptacek 1 days ago [-]
In that framing, you can look at an agent as simply a filter on those hallucinations.
armchairhacker 1 days ago [-]
This vaguely relates to a theory about human thought: that our subconscious constantly comes up with random ideas, then filters the unreasonable ones, but in people with delusions (e.g. schizophrenia) the filter is broken.
Salience (https://en.wikipedia.org/wiki/Salience_(neuroscience)), "the property by which some thing stands out", is something LLMs have trouble with. Probably because they're trained on human text, which ranges from accurate descriptions of reality to nonsense.
keeda 1 days ago [-]
More of a error-correcting feedback loop rather than a filter, really. Which is very much what we do as humans, apparently. One recent theory of neuroscience that is becoming influential is Predictive Processing --https://en.wikipedia.org/wiki/Predictive_coding -- this postulates that we also constantly generate a "mental model" of our environment (a literal "prediction") and use sensory inputs to correct and update it.
So the only real difference between "perception" and a "hallucination" is whether it is supported by physical reality.
sethammons 18 hours ago [-]
You ever see something out of the corner of your eye and you were mistaken? It feels like LLM hallucinations. "An orange cat on my counter?!" And an instant later, your brain has reclassified "a basketball on my counter" as that fits the environment model better as several instant-observations gather contexts: not moving, more round, not furry, I don't own a cat, yesterday my kid mentioned something about tryouts, boop insta-reclassification from cat to basketball.
I can recognize my own meta cognition there. My model of reality course corrects the information feed interpretation on the fly. Optical illusions feel very similar whereby the inner reality model clashes with the observed.
For general ai, it needs a world model that can be tested against and surprise is noted and models are updated. Looping llm output with test cases is a crude approximation of that world model.
osullivj 18 hours ago [-]
I leaned heavily on my own meta cognition recently when withdrawing from a bereavment benzo habit recently. The combo of flu symptoms, anxiety and hallucinations are fierce. I knew O was seeing things that were not real; light fittings turning into a rotating stained glass slideshow. So I'm totally on board with the visual model hypothesis. My own speculation is that audio perception is less predictive as audio structure persists deeper into my own DMT sessions than does vision, where perspective quickly collapses and vision becomes kaleidoscopic. Which may be a return to the vision we had as newborns. Maybe normal vision is only attained via socialisation?
FuckButtons 18 hours ago [-]
Sounds like a kalman filter, which suggests to me that it’s too simplistic a perspective.
jonoc 20 hours ago [-]
thats a fascinating way to put it
Lionga 1 days ago [-]
Isn't an "agent" not just hallucinations layered on top of other random hallucinations to create new hallucinations?
tptacek 1 days ago [-]
No, that's exactly what an agent isn't. What makes an agent an agent is all the not-LLM code. When an agent generates Golang code, it runs the Go compiler, which is in the agent's architecture an extension of the agent. The Go compiler does not hallucinate.
Lionga 1 days ago [-]
The most common "agent" is an letting an LLM run a while loop (“multi-step agent”) [1]
That's not how Claude Code works (or Gemini, Cursor, or Codex).
th0ma5 1 days ago [-]
Yes yes, with yet to be discovered holes
keeda 1 days ago [-]
I've prefered to riff off of the other quote:
"All (large language) model outputs are hallucinations, but some are useful."
Some astonishingly large proportion of them, actually. Hence the AI boom.
jama211 15 hours ago [-]
I find it a bit of a reductive way of looking at it personally
anthem2025 1 days ago [-]
Isn’t that why people argue against calling them hallucinations?
It implies that some parts of the output aren’t hallucinations, when the reality is that none of it has any thought behind it.
awesome_dude 1 days ago [-]
I have a very similar (probably unoriginal) thought about some human mental illnesses.
So, we VALUE creativity, we claim that it helps us solve problems, improves our understanding of the universe, etc.
BUT people with some mental illnesses, their brain is so creative that they lose the understanding of where reality is and where their imagination/creativity takes over.
eg. Hearing voices? That's the brain conjuring up a voice - auditory and visual hallucinations are the easy example.
But it goes further, depression is where people's brains create scenarios where there is no hope, and there's no escape. Anxiety too, the brain is conjuring up fears of what's to come
malloryerik 16 hours ago [-]
You may like to check out Iain McGilchrist's take on schizophrenia, which essentially he says is a relative excess of rationality ("if then else" thinking) and a deficit of reasonableness (as in sensible context inhabiting).
awesome_dude 2 hours ago [-]
I shall have a read of that at some point.
ninetyninenine 1 days ago [-]
Nah I don't agree with this characterization. The problem is, the majority of those hallucinations are true. What was said would make more sense if the majority of the responses were, in fact, false, but this is not the case.
xmprt 1 days ago [-]
I think you're both correct but have different definitions of hallucinations. You're judging it as a hallucination based on the veracity of the output. Whereas Fowler is judging it based on the method by which the output is achieved. By that judgement, everything is a hallucination because the user cannot differentiate between when the LLM is telling the truth and isn't.
This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
ants_everywhere 1 days ago [-]
an LLM hallucination is defined by its truth
> In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called confabulation,[1] or delusion)[2] is a response generated by AI that contains false or misleading information presented as fact.[3][4]
You say
> This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
For consistency you might as well say everything the human mind does is hallucination. It's the same sort of claim. This claim at least has the virtue of being taken seriously by people like Descartes.
Even the colloquial term outside of AI is characterized by the veracity of the output.
BlueTemplar 1 days ago [-]
LLMs don't hallucinate : they bullshit (which is not caring about truth).
ninetyninenine 1 days ago [-]
This isn't a good characterization of it either. I don't think LLMs know the difference. Bullshit implies they are lying.
It's possible LLMs are lying but my guess is that they really just can't tell the difference.
hatefulmoron 19 hours ago [-]
They're using the term 'bullshit' as it is understood as a term of art, which doesn't imply lying. It's closer to creating a response without any regard for telling the truth. Bullshitting is often most effective when you happen to be telling the truth, although the bullshitter has no commitment to that.
ninetyninenine 6 hours ago [-]
I disagree. IF something is bullshit nobody means that bullshit is possibly true. Bullshit is colloquially always false and always a lie.
1 days ago [-]
siffland 23 hours ago [-]
The issue is a lot of times it takes a senior level programmer to reason with a LLM to get the results needed. What happens when there are no Juniors to replace the Seniors. I guess by then AI will be able to program efficiently enough.
So far AI seems to be a great augmentation, but not a replacement.
nomilk 1 days ago [-]
> We should ask the LLM the question more than once
For any macOS users, I highly recommend an Alfred workflow so you just press command + space then type 'llm <prompt>' and it opens tabs with the prompt in perplexity, (locally running) deepseek, chatgpt, claude and grok, or whatever other LLMs you want to add.
This approach satisfies Fowler's recommendation of cross referencing LLM responses, but is also very efficient and over time gives you a sense of which LLMs perform better for certain tasks.
_mu 1 days ago [-]
What would such a workflow look like? I have Alfred but mainly just use the clipboard feature. I've tried to get into automation but struggled for inspiration. This one seems good!
Are you just opening a browser tab?
nomilk 23 hours ago [-]
Go to the 'Workflows' tab, make a new one with keyword of your choice (e.g. llm), and map it to open these urls in your default browser:
Modify to your taste. Example: https://github.com/stevecondylios/alfred-workflows/tree/main (you should be able to download the .alfredworkflow file and double click on it to import it straight into alfred, but creating your own shouldn't take long, maybe 5-10 mins if it's your first workflow)
mauflows 8 hours ago [-]
I made something similar that uses alacritty / llm / tmux. The referenced script is also in the repo
One aspect of using LLM's to code that I have not seen mentioned is the "loss of attachment" to code in my projects.
For example, working on my project (https://mudg.fly.dev) I wanted to experiment with a new FE graph library. I asked an llm the fantastic (https://tidewave.ai) and it completely implemented a solution, which turned out to be worse than the current solution.
If I had spent many hours working on a solution I might have been more "attached" to the solution and kept with it.
Have you considered that LLMs are also biased against new languages and libraries, so the code quality will be worse compared to something more established regardless of what you personally think/feel?
abcde-bbq 4 hours ago [-]
More details on tolerance would be nice. Examples:
1: when building games for my kids, the tolerance is high so the LLM can implement a feature in one shot and I will manually test the feature (without bothering to review the code or generating unit tests or using a type-safe language).
2: building a demo at work has lower tolerance for errors, but is still good with LLM with a brief code review.
3: for production code, other processes can be in place to meet the even lower tolerance requirements.
mikewarot 7 hours ago [-]
>Should senior engineers get out of the profession before it’s too late?
If they are actually using the engineering principles that their job description hints at, they'll probably be fine. Software Engineering is a growing field, the need for actual Engineers (you know, the kind that get licensed in other fields) is unlikely to shrink any time soon.
skhameneh 1 days ago [-]
There are many I've worked with that idolize Martin Fowler and have treated his words as gospel. That is not me and I've found it to be a nuisance, sometimes leading me to be overly critical of the actual content. As for now, I'm not working with such people and can appreciate the article shared without clouded bias.
I like this article, I generally agree with it. I think the take is good. However, after spending ridiculous amounts of time with LLMs (prompt engineering, writing tokenizers/samplers, context engineering, and... Yes... Vibe coding) for some periods 10 hour days into weekends, I have come to believe that many are a bit off the mark. This article is refreshing, but I disagree that people talking about the future are talking "from another orifice".
I won't dare say I know what the future looks like, but the present very much appears to be an overall upskilling and rework of collaboration. Just like every attempt before, some things are right and some are simply misguided. e.g. Agile for the sake of agile isn't any more efficient than any other process.
We are headed in a direction where written code is no longer a time sink. Juniors can onboard faster and more independently with LLMs, while seniors can shift their focus to a higher level in application stacks. LLMs have the ability to lighten cognitive loads and increase productivity, but just like any other productivity enhancing tool doing more isn't necessarily always better. LLMs make it very easy to create and if all you do is create [code], you'll create your own personal mess.
When I was using LLMs effectively, I found myself focusing more on higher level goals with code being less of a time sink. In the process I found myself spending more time laying out documentation and context than I did on the actual code itself. I spent some days purely on documentation and health systems to keep all content in check.
I know my comment is a bit sparse on specifics, I'm happy to engage and share details for those with questions.
manmal 1 days ago [-]
> written code is no longer a time sink
It still is, and should be. It’s highly unlikely that you provided all the required info to the agent at first try. The only way to fix that is to read and understand the code thoroughly and suspiciously, and reshaping it until we’re sure it reflects the requirements as we understand them.
skhameneh 1 days ago [-]
Vibe coding is not telling an agent what to do and checking back. It's an active engagement and best results are achieved when everything is planned and laid out in advance — which can also be done via vibe coding.
No, written code is no longer a time sink. Vibe coding is >90% building without writing any code.
The written code and actions are literally presented in diffs as they are applied, if one so chooses.
anskskbs 1 days ago [-]
> It's an active engagement and best results are achieved when everything is planned and laid out in advance
The most efficient way to communicate these plans is in code. English is horrible in comparison.
When you’re using an agent and not reviewing every line of code, you’re offloading thinking to the AI. Which is fine in some scenarios, but often not what people would call high quality software.
Writing code was never the slow part for a competent dev. Agent swarming etc is mostly snake oil by those who profit off LLMs.
x0x0 22 hours ago [-]
ime this is the problem. When I have to deeply understand what an llm created, I don't see much of a speed improvement vs writing it myself.
With an engineer you can hand off work and trust that it works, whereas I find code reviewing llm output something that I have to treat as hostile. It will comment out auth or delete failing tests.
bluefirebrand 18 hours ago [-]
> When I have to deeply understand what an llm created
Which should be always in my opinion
Are people really pushing code to production that they don't understand?
discreteevent 14 hours ago [-]
They are, because in fairness in a lot of cases it just doesn't matter. It's some website to get clicks for ads and as long as you can vibe use it it's good enough to vibe code it.
epolanski 1 days ago [-]
> It's an active engagement and best results are achieved when everything is planned and laid out in advance — which can also be done via vibe coding.
No.
The general assumed definition of vibe coding, hence the vibe word, is that coding becomes an iterative process guided by intuition rather than spec and processes.
What you describe is literally the opposite of vibe coding, it feels the term is being warped into "coding with an LLM".
skhameneh 24 hours ago [-]
I've described an iterative process where one never needs to touch code or documents directly.
Leaving out specs and documentation leads to more slop and hallucinations, especially with smaller models.
mehagar 1 days ago [-]
How could you possibly plan out "everything" in advance? Code itself would be the only way to explicitly specify the "everything".
skhameneh 24 hours ago [-]
Have a documentation system in place and have the LLM plan the high level before having the LLM write any code.
You can always just wing it, but if you do so and there isn't adequate existing context you're going to struggle with slop and hallucinations more frequently.
osullivj 18 hours ago [-]
Written code is an auditable entity in regulated businesses, like banks. Fowler suggests we get more comfortable with uncertainty because other eng domains have developed good risk mgmt. Cart before horse. Other engineering domains strive to mitigate the irreducible uncertainty of the physical world. AI adds uncertainty to any system. And there are many applications where increased uncertainty will lead to an increase in human suffering.
sfink 1 days ago [-]
> We are headed in a direction where written code is no longer a time sink.
Written code has never been a time sink. The actual time that software developers have spent actually writing code has always been a very low percentage of total time.
Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.
> Juniors can onboard faster and more independently with LLMs,
Color me very, very skeptical of this. Juniors previously spent a lot more of their time writing code, and they don't have to do that anymore. On the other hand, that's how they became not-juniors; the feedback loop from writing code and seeing what happened as a result is the point. Skipping part of that breaks the loop. "What the computer wrote didn't work" or "what the computer wrote is too slow" or even to some extent "what the computer wrote was the wrong thing" is so much harder to learn from.
Juniors are screwed.
> LLMs have the ability to lighten cognitive loads and increase productivity,
I'm fascinated to find out where this is true and where it's false. I think it'll be very unevenly distributed. I've seen a lot of silver bullets fired and disintegrate mid-flight, and I'm very doubtful of the latest one in the form of LLMs. I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
skhameneh 1 days ago [-]
> Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.
I found exactly this is what LLMs are great at assisting with.
But, it also requires context to have guiding points for documentation. The starting context has to contain just enough overview with points to expand context as needed. Many projects lack such documentation refinement, which causes major gaps in LLM tooling (thus reducing efficacy and increasing unwanted hallucinations).
> Juniors are screwed.
Mixed, it's like saying "if you start with Python, you're going to miss lower level fundamentals" which is true in some regards. Juniors don't inherently have to know the inner workings, they get to skip a lot of the steps. It won't inherently make them worse off, but it does change the learning process a lot. I'd refute this by saying I somewhat naively wrote a tokenizer, because the >3MB ONNX tokenizer for Gemma written in JS seemed absurd. I went in not knowing what I didn't know and was able to learn what I didn't know through the process of building with an LLM. In other words, I learned hands on, at a faster pace, with less struggle. This is pretty valuable and will create more paths for juniors to learn.
Sure, we may see many lacking fundamentals, but I suppose that isn't so different from the criticism I heard when I wrote most of my first web software in PHP. I do believe we'll see a lot more Python and linguistic influenced development in the future.
> I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
I entirely agree, in fact I think we're seeing it already. There is so much that's hyped and built around rough ideas that's glaringly inefficient. But FWIW inefficiency has less of an impact than adoption and interest. I could complain all day about the horrible design issues of languages and software that I actually like and use. I'd wager this will be no different. Thankfully, such progress in practice creates more opportunities for improvement and involvement.
sfink 1 days ago [-]
> Sure, we may see many lacking fundamentals, but I suppose that isn't so different from the criticism I heard when I wrote most of my first web software in PHP.
It's not just the fundamentals, though you're right that is an easy casualty. I also agree that LLMs can greatly help with some forms of learning -- previously, you kind of had to follow the incremental path, where you couldn't really do anything complex without have the skills that it built on, because 90% of your time and brain would be spent on getting the syntax right or whatever and so you'd lose track of the higher-level thing you were exploring. With an LLM, it's nice to be able to (temporarily) skip that learning and be able to explore different areas at will. Especially when that motivates the desire to now go back and learn the basics.
But my real fear is about the skill acquisition, or simply the thinking. We are human, we don't want to have to go through the learning stage before we start doing, and we won't if we don't have to. It's difficult, it takes effort, it requires making mistakes and being unhappy about them, unhappy enough to be motivated to learn how to not make them in the future. If we don't have to do it, we won't, even if we logically know that we'd be better off.
Especially if the expectations are raised to the point where the pressure to be "productive" makes it feel like you're wasting other people's time and your paycheck to learn anything that the LLM can do for you. We're reaching the point where it feels irresponsible to learn.
(Sometimes this is ok. I'm fairly bad at long division now, but I don't think it's holding me back. But juniors can't know what they need to know before they know it!)
skhameneh 23 hours ago [-]
> But my real fear is about the skill acquisition, or simply the thinking. We are human, we don't want to have to go through the learning stage before we start doing, and we won't if we don't have to. It's difficult, it takes effort, it requires making mistakes and being unhappy about them, unhappy enough to be motivated to learn how to not make them in the future. If we don't have to do it, we won't, even if we logically know that we'd be better off.
I've noticed the effects of this first hand from intense LLM engagement.
I relate it more to the effects of portable calculators, navigation systems, and tools like Wikipedia. I'm under the impression this is valid criticism, but we may be overly concerned because it's new and a powerful tool. There's even surveys/studies showing differences in how LLM are perceived WRT productivity between different generations.
I'm more concerned with potential loss of critical thinking skills, more than anything else. And on a related note, there have been concerns of critical thinking skills before this mass adoption of LLMs. I'm also concerned with the impact of LLMs on the quality of information. We're seeing a huge jump in quantity while some quality lacks. It bothers me when I see an LLM confidently presenting incorrect information that's seemingly trivial to validate. I've had web searches give me more incorrect information from LLM tooling at a much greater frequency than I've ever experienced before. It's even more unsettling when the LLM gives the wrong answer and the correct answer is in the description of the top result.
utyop22 23 hours ago [-]
"I'm also concerned with the impact of LLMs on the quality of information."
You have finally made an astute observation...
I have already made the assumption that use of LLMs is going to add new mounds of BS atop the mass of crap that already exists on the internet, as part of my startup thesis.
These things are not obvious in the here and now, but I try to take the view of - how would the present day look, 50 years out in the future looking backwards?
ChrisMarshallNY 12 hours ago [-]
> One of the consequences of this is that we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers, indeed perhaps ask the LLM to compare answers for us. The difference in the answers can be as useful as the answers themselves.
That’s a useful tip.
jonnycomputer 7 hours ago [-]
>All an LLM does is produce hallucinations, it’s just that we find some of them useful.
Nice callback to "All models are wrong, but some of them are useful."
jedimastert 24 hours ago [-]
Talking to LLMs reminds me of my time as a system administrator for my university's physics department. A bunch of incredibly smart, PhD level people...just not in the subject they were talking to me about, most of the time.
sudhirb 1 days ago [-]
> One of the consequences of this is that we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers, indeed perhaps ask the LLM to compare answers for us. The difference in the answers can be as useful as the answers themselves.
There was once a coding agent which achieved SOTA performance on SWE Bench Verified by "just" running the agent 5 times on each instance, scoring each attempt and picking the attempt with the highest score: https://aide.dev/blog/sota-bitter-lesson
stillsut 1 days ago [-]
> we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers,
Yup, this matches my recommended workflow exactly. Why waste time trying to turn an initially bad answer into a passable one, when you could simply re-generate (possibly with different context)
> One of the consequences of this is that we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers, indeed perhaps ask the LLM to compare answers for us. The difference in the answers can be as useful as the answers themselves.
This is what LLM "reasoning" does. More than "reasoning" in the human sense, it just reduces variance from variations in the prompt and random next token prediction.
gjsman-1000 1 days ago [-]
For my money, I used this analogy at work:
Before AI, we were trying to save money, but through a different technique: Prompting (overseas) humans.
After over a decade of trying that, we learned that had... flaws. So round 2: Prompting (smart) robots.
The job losses? This is just Offshoring 2.0; complete with everyone getting to re-learn the lessons of Offshoring 1.0.
The conclusion I reached was different, though. We learnt how to do outsourcing "properly" pretty quickly after some initial high-profile failures, which is why it has only continued to grow into such a huge industry. This also involved local talent refocusing on higher-value tasks, which is why job losses were limited. Those same lessons and outcomes of outsourcing are very relevant to "bot-sourcing'.
However, I do feel concerned that AI is gaining skill-levels much faster than the rate at which people can upskill themselves.
the_af 1 days ago [-]
> Prompting (overseas) humans [...] After over a decade of trying that, we learned that had... flaws.
I think this is a US-centric point of view, and seems (though I hope it's not!) slightly condescending to those of us not in the US.
Software engineering is more than what happens to US-based businesses and their leadership commanding hundreds or thousands of overseas humans. Offshoring in software is certainly a US concern (and to a lesser extent, other nations suffer it), but is NOT a universal problem of software engineering. Software engineering happens in multiple countries, and while the big money is in the US, that's not all there is to it.
Software engineering exists "natively" in countries other than the US, so any problems with it should probably (also) be framed without exclusive reference to the US.
gjsman-1000 1 days ago [-]
The problem isn't that there aren't high quality offshore developers - far from it. Or even high quality AI models.
The problems are inherent with outsourcing to a 3rd party and having little oversight. Oversight is, in both cases, way harder than it appears.
draw_down 1 days ago [-]
[dead]
CuriouslyC 1 days ago [-]
I'm sure the blacksmiths and weavers will find solace in that take. Their time will return!
siliconc0w 6 hours ago [-]
I started a greenfield project the other day and was excited to see how far AI has gotten for a real internal business use-case - not just a toy project. It was great at setting up the scaffolding and helping with some of the tests but overall it might have been a drain on productivity.. I had to repeatedly correct it and explicitly tell it how to do things (or hunt down examples in the codebase) or it would make up APIs or generate not particularly idiomatic code for our codebase (was writing golang fwiw).
There is really a balance between spoon-feeding it the answers vs just abandoning it - the spikiness is really apparent where it can totally nail some things but then utterly fail what should be pretty simple (i.e there was a point where I wanted to refactor two similar functions into a shared library and it just couldn't do it)
npn 18 hours ago [-]
I will believe his rant when Fowler creates any notable software.
rancar2 1 days ago [-]
My favorite quote to borrow: “Furthermore I think anyone who says they know what this future will be is talking from an inappropriate orifice.”
Towaway69 1 days ago [-]
Predicting the future isn't about being correct tomorrow, rather it’s about selling something to someone today.
An insight I picked up along the way…
koolba 1 days ago [-]
Reminds me of the classic Yogi Berra, “It's tough to make predictions, especially about the future”.
th0ma5 1 days ago [-]
To me, it is more specific to say that many futurists don't take into account the social and economic network effects of changes taking place. Many just act as if the future will continue on completely unchallenged in this current state. But if you look at someone like Kurzweil, you can see the very narrow and specific focus of a prediction, which has proved to be more informative to me as a high bar of futurism.
ares623 1 days ago [-]
> Other forms of engineering have to take into account the variability of the world.
> Maybe LLMs mark the point where we join our engineering peers in a world on non-determinism.
Those other forms of engineering have no choice due to the nature of what they are engineering.
Software engineers already have a way to introduce determinism into the systems they build! We’re going backwards!
didericis 1 days ago [-]
Part of what got me into software was this: no matter how complex or impressive the operation, with enough time and determination, you could trace each step and learn how a tap on a joystick lead to the specific pixels on a screen changing.
There’s a beautiful invitation to learn and contribute baked into a world where each command is fully deterministic and spec-ed out.
Yes, there have always been poorly documented black boxes, but I thought the goal was to minimize those.
People don’t understand how much is going to be lost if that goal is abandoned.
pton_xd 1 days ago [-]
Agreed. The beauty of programming is that you're creating a "mathematical artifact." You can always drill down and figure out exactly what is going on and what is going to happen with a given set of inputs. Now with things like concurrency that's not exactly true, but, I think the sentiment still holds.
The more practical question is though, does that matter? Maybe not.
didericis 1 days ago [-]
> The more practical question is though, does that matter?
I think it matters quite a lot.
Specifically for knowledge preservation and education.
bathtub365 1 days ago [-]
Yep, LLM’s still run on hardware with the same fundamental architecture we’ve had for years, sitting under a standard operating system. The software is written in the same languages we’ve been using for a long time. Unless there’s some seismic shift where all of these layers go away we’ll be maintaining these systems for a while.
beefnugs 23 hours ago [-]
Yes, but : not in all cases really. There are plenty of throw away, one off, experimental, trial and error, low stakes, spammy, interactions with dumb-assery, manager replacing instances that are acceptable as being quickly created and forgotten black boxes of temporary barely working trash. (not that people will limit to the proper uses)
DSingularity 1 days ago [-]
In a way this is also a mathematical artifact — after all tokens are selected through beam searching or some random sampling of likely successor tokens.
whartung 1 days ago [-]
"Computers are deterministic!"
If I wanted to plumb together badly documented black boxes, I'd have become an EE.
_mu 1 days ago [-]
Underrated comment - computers can be decidedly non-deterministic.
> People don’t understand how much is going to be lost if that goal is abandoned.
Ah but you see, imagine the shareholder value we can generate for a quarter or two in the meanwhile!
Editors note: please read that as absolutely dripping with disdain and sarcasm.
sodapopcan 1 days ago [-]
As pertaining to software development, I agree. I've been hearing accounting (online and from coworkers) of using LLMs to do deterministic stuff. And yet, instead of at least prompting once to "write a script to do X," they just keep prompting "do X" over and over again. Seems incredibly wasteful. It feels like there is this thought of "We are not making progress if we aren't getting the LLM to do everything. Having it write a script we can review and tweak is anti-progress." No one has said that outright, but it's a gut feeling (and it wouldn't surprise me if people have said this out loud).
tptacek 1 days ago [-]
This is the 2025 equivalent of the people who once wrote 2000 word blog posts about how bad it was to use "cat" instead of just shell redirection.
sodapopcan 1 days ago [-]
These are hardly equivalent. One is someone preferring one deterministic way over another. The other is more akin to arguing that it's better to ask someone to manually complete a task for you instead of caching the instructions on your computer. Now if the LLM does caching then you have more of a point, I don't have enough experience there.
1 days ago [-]
bongodongobob 1 days ago [-]
I am one of these people. For my scripting needs, it would probably take me longer to find the script I saved rather than just asking it again and getting an answer in 15 seconds. I haven't saved a smallish script in a year.
achierius 23 hours ago [-]
I used to be one of those people, then I started saving these scripts in a folder and realized just how much time it saved me. Especially for nontrivial scripts which require a lot of typing or input specification.
bongodongobob 20 hours ago [-]
This was me before LLMs. Had tons of scripts saved. I just don't see the point anymore "Take this list of users, remove from x group and reset their passwords" takes less time than trying to remember what I named that file and where it's saved. Anything that can be done in under 100 lines isn't worth saving anymore.
sodapopcan 8 hours ago [-]
Usually one would save this type of thing as a build tool task.
r_lee 16 hours ago [-]
Honestly, one of the biggest gains for me with LLMs has been bash scripting. its so great.
especially if you have some saved in the same project and then you just tell it "look at the scripts in x and use that style"
That was an interesting take, but "probabilistic" is to me different from "random". In particular other field get error tolerances, LLMs give us nothing like that.
We're introducing chaos monkeys, not just variability.
tptacek 23 hours ago [-]
Note that he's talking about the same nondeterminism in that post that we're talking about here.
makeitdouble 19 hours ago [-]
From the linked comment
> Process engineers for example have to account for human error rates. [...] Designing systems to detect these errors (which are highly probabilistic!)
> Likewise even for regular mechanical engineers, there are probabilistic variances in manufacturing tolerances.
I read them as relatively confined, thus probabilistic. When a human pushes the wrong button, an elephant isn't raining from the sky. Same way tolerances are bounded.
When requesting a JSON array from an LLM, it could as well decide this time that JSON is a mythological Greek fighter.
tptacek 19 hours ago [-]
I'm just saying, it's a cite to a thread about the implications of AI-based nondeterminism, just like this one. That's all.
ants_everywhere 1 days ago [-]
adding to this, software deals with non-determinism all the time.
For example, web requests are non-deterministic. They depend, among other things, on the state of the network. They also depend on the load of the machine serving the request.
One way to think about this is: how easy is it for you to produce byte-for-byte deterministic builds of the software you're working on? If it's not trivial there's more non-determinism than is obvious.
skydhash 1 days ago [-]
Mostly the engineering part of software is dealing with non-determinism, by avoiding it or enforcing determinism. Take something like TCP, it's all about guaranteeing the determinism that either the message is sent and received or it is not. And we have a lot of algorithms that tries to guarantee consistency of information between the elements of a system.
ares623 1 days ago [-]
But there is an underlying deterministic property in the TCP example. A message is either received within a timeout or not.
How can that be extralopated with LLMs? How does a system independently know that it's arrived at a correct answer within a timeout or not? Has the halting problem been solved?
tptacek 1 days ago [-]
You don't need to solve the halting problem in this situation, because you only need to accept a subset of valid, correct programs.
skydhash 21 hours ago [-]
> How can that be extralopated with LLMs? How does a system independently know that it's arrived at a correct answer within a timeout or not?
That's the catch 22 with LLM. You're supposed to be both the asker and the verifier. Which in practice, it's not that great. LLMs will just find the snippets of code that matches somehow and just act on it (It's the "I'm feeling Lucky" button with extra steps)
In traditional programming, coding is a notation too more than anything. You supposed to have a solution before coding, but because of how the human brain works, it's more like a blackboard, aka an helper for thinking. You write what you think is correct, verify your assumptions, then store and forget about all of it when that's true. Once in a while, you revisit the design and make it more elegant (at least you hope you're allowed to).
LLM programming, when first started, was more about a direct english to finished code translation. Now, hope has scaled down and it's more about precise specs to diff proposal. Which frankly does not improve productivity as you can either have a generator that's faster and more precise (less costly too) or you will need to read the same amount of docs to verify everything as you would need to do to code the stuff in the first place (80% of the time spent coding).
So no determinism with LLMs. The input does not have any formal aspects, and the output is randomly determined. And the domain is very large. It is like trying to find a specific grain of sand on a beach while not fully sure it's there. I suspect most people are doing the equivalent of taking a handful of sand and saying that's what they wanted all along.
tptacek 21 hours ago [-]
No? These kinds of analyses all seem to rely on the notion that the LLM-caller needs to accept whatever output the LLM provides. In practice, they discard all the outputs that don't compile, and then a further subset of the ones that don't --- those outputs that aren't instantly clear to the caller.
My intuition for the problem here is that people are fixated on the nondeterminism of the LLM itself, which is of limited importance to the actual problem domain of code generation. The LLM might spit out ancient Egyptian hieroglyphics! It's true! The LLM is completely nondeterministic. But nothing like that is ever going to get merged into `main`.
It's fine if you want to go on about how bad "vibe coding" is, with LLM-callers that don't bother to read LLM output, because they're not competent. But here we're assuming an otherwise competent developer. You can say the vibe coder is the more important phenomenon, but the viber doesn't implicate the halting problem.
skydhash 21 hours ago [-]
Valid programs are almost infinite. Context free grammars (which describe valid programs) are generative. When you're programming, you are mostly restricting the set of valid program to include only the few that satisfy the specs. Adding an extra 0 to a number is valid, but put that in the context of money transactions, it's a "hell breaks loose" situation.
SO that's why "it compiles" is worthless in a business settings. Of course it should compile. That's the bare minimum of expectations. And even "it passes the tests" is not that great. That just means you have not mess things up. So review and quality (accountability for both) is paramount, so that the proper stuff get shipped (and fixed swiftly if there was a mistake).
ants_everywhere 20 hours ago [-]
I have LLMs generate Haskell. Having the code compile means everything type checks and is not worthless. That's a huge constraint on valid programs.
skydhash 20 hours ago [-]
> Having the code compile means everything type checks and is not worthless. That's a huge constraint on valid programs.
In a business settings, what usually matter is getting something into prod and not have bug reports thrown back. And a maintainable code that is not riddled down with debts.
Compiled code is as basic as steering left and right for a F1 driver. Having the tests pass is like driving the car at slow speed and completing a lap with no other cars around. If you're struggling to do that, then you're still in the beginner phase and not a professional. The real deal is getting a change request from Product and and getting it to Production.
tptacek 21 hours ago [-]
It feels like you stopped reading before "and then a further subset of those".
Again: my claim is simply that whatever else is going on, the halting problem doesn't enter into it, because the user in this scenario isn't obligated to prove arbitrary programs. Here, I can solve the halting problem right now: "only accept branchless programs with finite numbers of instructions". Where's my Field Medal? :)
It always feels like the "LLMs are nondeterministic" people are relying on the claim that it's impossible to tell whether an arbitrary program is branchless and finite. Obviously, no, that's not true.
skydhash 20 hours ago [-]
> It feels like you stopped reading before "and then a further subset of those".
Pretty sure you've just edited to add that part.
tptacek 20 hours ago [-]
No, I'd have indicated that in my comment if I had. Sorry, I think you just missed it.
I did add the last paragraph of the comment you just responded to (the one immediately above this) about 5 seconds after I submitted it, though. Doesn't change the thread.
ants_everywhere 21 hours ago [-]
> LLMs will just find the snippets of code that matches somehow
This suggests a huge gap in your understanding of LLMs if we are to take this literally.
> LLM programming, when first started, was more about a direct english to finished code translation
There is no direct english to finished code translation. A prompt like "write me a todo app" has infinitely many codes it maps to with different tradeoffs and which appeal to different people. Even if LLMs never made any coding mistakes, there is no function that maps a statement like that to specific pieces of code unless you're making completely arbitrary choices like the axiom of choice.
So we're left with the fact that we have to specify what we want. And at that LLMs do exceptionally well.
ants_everywhere 1 days ago [-]
Right, you handle determinism by applying engineering. E.g. having fail safes, redundancies, creating robust processes, etc.
anthem2025 1 days ago [-]
It’s not trivial largely because we didn’t bother to design deterministic builds because it didn’t seem to matter. There is not much about the actual problem that makes it difficult.
ants_everywhere 1 days ago [-]
if you don't have deterministic builds then you can't tell whether the executable you're running comes from the source code you can see.
It is definitely non-trivial and large organizations spend money to try to make it happen.
anthem2025 18 hours ago [-]
Yes, I spent years working on exactly that. I personally was working with a compiler team to validate changes to eliminate non-determinism. That’s why I felt qualified to make the statement I did.
It’s non trivial because you have to back through decades of tools and fix all of them to remove non determinism because they weren’t designed with that in mind.
The hardest part was ensuring build environment uniformity but that’s a lot easier with docker and similar tooling.
1 days ago [-]
delusional 1 days ago [-]
I would rather say it like this: Very good, very hardworking engineers spent years of their lives building the machine that raised us from the non-determinism of messy physical reality. The technology that brought us perfect, replicable, and reliable math from sending electrons through rocks has been deeply underappreciated in the "software revolution".
The engineers at TSMC, Intel, Global Foundries, Samsung, and others have done us an amazing service, and we are throwing all that hard work away.
positron26 21 hours ago [-]
It's uncertainty, not non-determinism. I don't design a rocket with no idea about the chain of events. I have inscrutable uncertainty everywhere, from manufacture to flight.
AaronAPU 1 days ago [-]
It was forward when Newton discovered the beautiful simple determinism of physics.
Was it going backwards when the probabilistic nature of quantum mechanics emerged?
Viliam1234 1 days ago [-]
Two words: many-world interpretation.
More seriously, this is not a fair comparison. Adding LLM output to your source code is not analogical to quantum physics; it is analogical to letting your 5 years old child transcribe the experimentally measured values without checking and accepting that many of them will be transcribed wrong.
BlueTemplar 1 days ago [-]
Hopefully even Newton already had some awareness of "deterministic chaos" (or whatever terms they would have used back then) ?
And on the side hand, no transistors without quantum mechanics.
kookamamie 12 hours ago [-]
> We are still figuring out how to use LLMs, and it will be some time before we have a decent idea of how to use them well
As per typical, Martin is so late to the party that he projects his own lack of understanding of the subject to be the general state of matters.
People are, and have been for a while, using LLMs as very effective accelerators or augmentations of themselves.
Claude Code, is a great example of something that can easily one-shot most trivial-to-average programming tasks.
While some wait for the "bubble to burst", a lot of people are gaining significant benefits from using LLMs to boost their speed in software development.
It's not a good idea to muddy the expectations for AI by mixing AGI and LLMs into the same bucket. LLMs are already useful for many purposes, even if they didn't lead to AGI, whatever the definition is.
dionian 1 days ago [-]
> I’ve often heard, with decent reason, an LLM compared to a junior colleague. But I find LLMs are quite happy to say “all tests green”, yet when I run them, there are failures. If that was a junior engineer’s behavior, how long would it be before H.R. was involved?
A junior engineer can't write code anywhere nearly as fast. It's apples vs oranges. I can have the LLm rewrite the code 10 times until its correct and its much cheaper than hiring an obsequious jr engineer
xmprt 1 days ago [-]
Junior engineers get better and learn from their mistakes. An LLM will happily make then same mistake 10 times if you try asking it to do something similar in 3 months. In the long term, I don't think LLMs actually save you time considering the amount of extra time spent reviewing/verifying its code and fixing tech debt.
swagasaurus-rex 1 days ago [-]
If an AI can’t write the code after two attempts, I’ve never had success trying ten times
catigula 1 days ago [-]
>I’m often asked, “what is the future of programming?” Should people consider entering software development now? Will LLMs eliminate the need for junior engineers? Should senior engineers get out of the profession before it’s too late? My answer to all these questions is “I haven’t the foggiest”
I just want to point out that this answer implicitly means that, at the very least, the profession is at least questionably uncertain which isn't a good sign for people with a long future orientation such as students.
mvieira38 1 days ago [-]
(slightly off-topic)
Students shouldn't ever have become so laser-focused on a single career path anyway, and even worse than that is how colleges have become glorified trade schools in our minds. Students should focus on studying for their classes, getting their electives and extracurriculars in, getting into clubs... Then depending on which circles they end up in, they shape their career that way. The thought that getting a Computer Science major would guarantee students a spot in the tech industry was always ridiculous, because the industry was just never structured that way, it's always been hacker groups, study groups, open source, etc. bringing out the best minds
tomku 1 days ago [-]
It has never been anywhere close to certain, we just had 20 years of wild, unsustainable growth that encouraged people to cover their eyes and pretend the ride would go on forever. 20 years of telling everyone under the age of 30 that of course they should learn to code and that CS was the new medical or legal degree. 20 years of smugly acting like we are the inevitable future when we are, in fact, subject to the same ups and downs as every other career.
muldvarp 19 hours ago [-]
The answers are even kind of contradictory. If the answer to "will we ever need software engineers again in the future?" is "we don't know" then the answer to "should I spend time and money to enter software engineering?" should be "no".
rhubarbtree 16 hours ago [-]
> Software Engineering is unusual in that it works with deterministic machines. Maybe LLMs mark the point where we join our engineering peers in a world on non-determinism.
Recently some people have compared LLMs to compilers and the resulting source code to object code. This is a false analogy, because compilation is (almost always) a semantics preserving transformation. LLMs are given a natural language spec (prompt) that by definition is underspecified. And so they cannot be semantics preserving, as the semantics of their input is ambiguous.
The programmer is left with two options: (1) understanding the resulting code, repairing and rewriting it, or (2) ignoring the code and performing validation by testing.
Both of these approaches are assistive. At least in its current form, AI can only accelerate a programmer, not replace them. Lovable and similar tools rely on very informal testing, which is why they can be used by non-programmers, but they have very little chance of producing robust software of any complexity. I’ve seen people creating working web apps, but I am confident I could find plenty of strange bugs just by testing edge cases or stressing non functional qualities. The bigger issue is the bugs I can’t find because they’re not bugs a human programmer would create.
Option (1) is problematic because LLMs tend not to produce clean code designed to be human readable. A lot of the efforts coders are making is to break down tasks and try to guide the LLMs to produce good code. I have yet to see this work for anything novel and complex. For non trivial systems, reasoning and architecture are required. The hope is that a programmer can write specs well enough that LLMs can “fill in the gaps”. But whether this is a net positive once considering the work involved is still an open question. I’ve yet to see any first hand evidence that there’s a productivity gain here. It’s early days.
Option(2) is also difficult because there is a crucial factor missing in AI coding, the “generality of intent” as a human user. This is a problem, because the non-trivial bugs an AI produces are unlikely to be similar to those from a human. Those bugs are usually a failure of reasoning, but LLMs don’t reason in the same sense that humans do, so testing in the same way may not be possible. Your intuitions for where bugs lie are no longer applicable. The likely result is worse code produced more quickly, and that trade-off needs exploring.
At the moment I think AI is useful for (a) discussions around design, libraries, debugging, (b) autocomplete, (c) agent analysis of existing code where partial answers are ok and false positives acceptable (eg finding some but not all bugs). Agent coding doesn’t seem ready for production to me, not until we have much better tooling to prevent some of these problems, or AI becomes capable of proper reasoning.
ngc248 12 hours ago [-]
Exactly, there is a reason why we don't program in natural languages and use "programming languages" which use a subset of keywords to program.
Natural language is too ambiguous and not of use in serious programming.
paulcole 22 hours ago [-]
Let me guess, they’re not as good as the hype and Real Developers don’t have anything to worry about. Also something about the environment.
dwa3592 22 hours ago [-]
Stop comparing LLMs to Humans for fucks sake. Anthropomorphization is fine as long as it serves a purpose, it doesn't in this case.
insane_dreamer 1 days ago [-]
> I’ve often heard, with decent reason, an LLM compared to a junior colleague. But I find LLMs are quite happy to say “all tests green”, yet when I run them, there are failures. If that was a junior engineer’s behavior, how long would it be before H.R. was involved?
Reminds me of a recent experience when I asked CC to implement a feature. It wrote some code that struck me as potentially problematic. When I said, "why did you do X? couldn't that be problematic?" it responded with "correct; that approach is not recommended because of Y; I'll fix it". So then why did it do it in the first place? A human dev might have made the same mistake, but it wouldn't have made the mistake knowing that it was making a mistake.
epolanski 1 days ago [-]
It did so because his training, weights and context and a billion of matrix calculations led him there.
weakfish 22 hours ago [-]
I think this gets lost on far too many discussions. There is no “why” - it’s statistics, and the closest we have to a “why” is that a lot of code out there sucks
iLoveOncall 1 days ago [-]
> One of the big problems with these surveys is that they aren’t taking into account how people are using the LLMs. From what I can tell the vast majority of LLM usage is fancy auto-complete, often using co-pilot.
This is a completely wrong assumption and negates a bunch of the points of the article...
epolanski 1 days ago [-]
How is it wrong?
Do you have any hard number and data to state that tab/auto complete are less popular than agentic coding?
daveguy 22 hours ago [-]
I would expect it to be almost tautologically true. It's the easiest to implement, and the most deployed. It has to be the most used, even if a lot of people don't know they're using it.
It was the original success of LLMs -- code autocomplete. All of the agentic coding have autocomplete under the hood. Autocomplete is almost literally what LLMs do + post-processing.
23 hours ago [-]
lubujackson 1 days ago [-]
I like the idea of AI usage comes down to a measurement of "tolerances". With enough specificity, LLMs will 100% return what you want. The goal is to find the happy tolerance between "acceptable" and "I did it myself" via prompts.
manmal 1 days ago [-]
> With enough specificity, LLMs will 100% return what you want.
By now I’m sure it won’t. Even if you provide the expected code verbatim, LLMs might go on a side quest to “improve” something.
krainboltgreene 1 days ago [-]
> All major technological advances have come with economic bubbles, from canals and railroads to the internet.
Is this actually correct? I don't see any evidence for a "airflight bubble" or a "car bubble" or a "loom bubble" at the technologies' invention. Also the "canal bubble" wasn't about the technology, it was about the speculation on a series of big canals but we had been making canals for a long time. More importantly, even if it was correct, there are plenty of bubbles (if not significantly more) around things that didn't have value or tech that didn't matter.
sfink 1 days ago [-]
Apologies for arguing from first principles, but for anything that spurs a lot of activity, the only alternative to a bubble is this: people ramp up investment and activity and enthusiasm only as much as the underlying thing can handle, then gradually taper off the increase and gently level off at the equilibrium "carrying capacity" of the new technology.
Does that sound like any human, ever, to you?
(The only time there isn't a bubble is when the thing just isn't that interesting to people and so there's never a big wave of uptake in the first place.)
krainboltgreene 1 days ago [-]
If you were right then we'd have a lot more than a dozen listed bubbles on wikipedia.
That's an absurd framing for a cute quip.
BlueTemplar 1 days ago [-]
Why do you assume that Wikipedia can be trusted to provide an exhaustive answer to this question ?
tptacek 1 days ago [-]
I don't know enough about the early history of the airline industry but there was very definitely a long series of huge bubbles in the railroad industry.
daveguy 23 hours ago [-]
I would be very interested in reading about huge bubbles in railroad, airline, or any other industry. Do you happen to have references (genuinely asking; the original article didn't include any references)?
There was just a long Derek Thompson podcast with a Transcontinental Railroad scholar about this! (That's why I knew about it.)
The whole subtext of that podcast was how eerily similar the Transcontinental Railroad was to AI (as an investment/malinvestment/prediction of future trends).
antithesizer 1 days ago [-]
[dead]
mmmm2 1 days ago [-]
The Intelligent Investor by Graham talks about investors putting so much money into airlines and air freight that it became impossible to make a return. I don't know if you would call that a bubble, maybe just over exuberance.
Marazan 1 days ago [-]
The AI bubble also isn't about the technology.
antithesizer 1 days ago [-]
[dead]
crawshaw 1 days ago [-]
A bubble is asset prices systematically diverging from reasonable expectations of future cash flows. Bubbles are driven by financial speculation.
The claim in the blog post that all technology leads to speculative asset bubbles I find hard to believe. Where was the electricity bubble? The steel bubble? The pre-war aviation bubble? (The aviation bubble appeared decades later due to changes in government regulation.)
Is this an AI bubble? I genuinely don't know! There is a lot of real uncertainty about future cash flows. Uncertainty is not the foundation of a bubble.
I knew dot-com was a bubble because you could find evidence, even before it popped. (A famous case: a company held equity in a bubble asset, and that company had a market cap below the equity it held, because the bubble did not extend to second-order investments.)
cake_robot 1 days ago [-]
Just taking your first example, yes I believe you could characterize a lot of the investments in electrification as speculative bubbles even though there was underlying value that was borne out.
To me a bubble reflects a market disconnect from fundamentals - wherein prices go up steeply, with no help from the fundamentals (expected growth in base year cash flows and risk).
Indeed there is a subtle difference between a bubble and speculation of what could come of a technology. But the two are connected because the effects of technology are reflected by investors in asset prices.
insane_dreamer 1 days ago [-]
Besides electricity (answered above), there were bubbles related to steel production in its early days intertwined with the railroad bubble that drove huge investments in steel production for railroad development; when the railroad companies crashed it brought down steel as well; ultimately both survived in a more subdued form (as with the dot coms).
Dont forget the gas lighting when they make a mistake and try to act like it was you.
Scubabear68 1 days ago [-]
"Hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature".
I used to avidly read all his stuff, and I remember 20ish years ago he decided to rename Inversion of Control to Dependency Injection. In doing so, and his accompany blog, he showed he didn't actually understand it at a deep level (and hence his poor renaming).
This feels similar. I know what he's trying to say, but he's just wrong. He's trying to say the LLM is hallucinating everything, but Fowler is missing is that Hallucination in LLM terms refers to a very specific negative behavior.
ares623 1 days ago [-]
As far as an LLM is concerned, there is no difference between "negative" hallucination and a positive one. It's all just tokens and embeddings to it.
Positive hallucinations are more likely to happen nowadays, thanks to all the effort going into these systems.
Scubabear68 1 days ago [-]
This basically ruins the term “hallucination” and makes it meaningless, when the term actually describes a real phenomenon.
ares623 1 days ago [-]
That's the point. It is meaningless. When it first coined, there were already detractors to the term, that it is an incorrect description of the phenomenon. But it stuck.
anthem2025 1 days ago [-]
No, it’s just an attempt to pretend wrong outputs are some special case when really they aren’t. They aren’t imagine something that doesn’t exist they are just running the same process they do for everything else and it just didn’t work.
If you disagree then I would ask what exactly is the “specific behaviour” you’re talking about?
epolanski 1 days ago [-]
I'll take the bait.
What didn't he understand properly about inversion of control then.
Scubabear68 1 days ago [-]
That it is ultimately about object lifetime management. Dependency injection is a big part of it, but not the only thing.
nicwolff 1 days ago [-]
> I’ve often heard, with decent reason, an LLM compared to a junior colleague.
No, they're like an extremely experienced and knowledgeable senior colleague – who drinks heavily on the job. Overconfident, forgetful, sloppy, easily distracted. But you can hire so many of them, so cheaply, and they don't get mad when you fire them!
furyofantares 1 days ago [-]
These metaphors all suck. Well, ok, yours is funny. But anyway, LLMs are just very different from any human.
They are extremely shallow, even compared to a junior developer. But extremely broad, even compared to the most experienced developer. They type real fuckin fast compared to anyone on earth, but they need to be told what to do much more carefully than anyone on earth.
CuriouslyC 1 days ago [-]
I've gotten Claude Code to make CUDA kernels and all kinds of advanced stuff that there's zero percent chance a junior would pull off.
AI is like a super advanced senior wearing a blindfold. It knows almost everything, it's super fast, and it gets confused pretty quickly about things after you tell it.
skeeter2020 24 hours ago [-]
it's not a senior though, because of the amount of oversight and guidance required. I trust senior-plus human developers to do the right thing and understand why it's the right thing. For mission critical things I get another human senior to verify. There's no way I'd autonomously trust 2, 10 or any number of LLMs to do the same thing.
achierius 23 hours ago [-]
You'd be surprised at what juniors can pull off. I have seen fresh-out-of-college new grads write performant GPU kernels that are used in real world library implementations for particular architectures.
versteegen 17 hours ago [-]
Of course they can. It doesn't take that long to learn CUDA/etc and hardware details, read through some manuals and performance guides, do some small projects to solidify that knowledge. What you need is talent, and some months. That's why at university I saw plenty of students pull off amazing projects, and there's nothing eyebrow-raising in the slightest about starting a PhD a year after getting a bachelor's and writing software more sophisticated than most programmers would write in their career.
I think the programming profession overvalues experience over skill. However, when I was young I had no appreciation for what the benefits of experience are... including not writing terrible code.
CuriouslyC 23 hours ago [-]
Most of the juniors I've worked with would make numerical errors and give up/declare premature victory before getting the implementation to a robust state. I'm sure there are exceptional young folks out there though.
godelski 19 hours ago [-]
> Most of the juniors
Most senior programmers can't write CUDA kernels either. Even fewer can write ones that are any good.
computerex 23 hours ago [-]
What's your sample size?
andoando 20 hours ago [-]
And today I asked Claude code to carefully look at and follow the structure of my tests and write some more, and after 5-10 mins it completely ignored it, wrote one class with all the helper methods and a bunch of compilation errors from fields that arent even in my model.
I figured there's probably a ton more logical issues and deleted it immediately
QuadmasterXLII 23 hours ago [-]
have you ever asked a junior developer to write a cuda kernel?
CuriouslyC 23 hours ago [-]
I've asked juniors to write easier things without success, and I applied the transitive property.
22 hours ago [-]
tjwebbnorfolk 21 hours ago [-]
> but they need to be told what to do much more carefully than anyone on earth.
have you ever managed an offshore team. holy cow
PessimalDecimal 19 hours ago [-]
"Offshore team." "Holy cow." I see what you did there.
cortesoft 19 hours ago [-]
And they are incessantly cheerful.
I asked Claude to design me a UI, and it made a lovely one... but I wanted a web ui. It very happily through away all its work and made a brand new web UI.
I can't imagine any employee being that quick to just move on after something like that.
root_axis 20 hours ago [-]
I think we had the analogy right with "fancy autocomplete". Sometimes, the completions are excellent and do exactly what we want, other times the completions fail to match our expectations and need human attention. It's a tool.
Guthur 1 days ago [-]
I just spent a good 2 hours trying to debug a SM6 Vulkan issue with unreal engine using an LLM, it had got me to good state but UE kept falling to load a project, it transpired that the specific error message would provide a fix as the top Google result, which I found when I eventually decided to look for myself.
LLM did help a lot to get some busy work out of the way, but it's difficult to know when you need to jump out of the LLM loop and go old skool.
jack_pp 1 days ago [-]
Fwiw I think the ratio of times I needed to go to google for a solution instead of an LLM is like 20:1 for me so your mileage may vary. Depends a lot on the specific niche you're working in.
Unrelated to software but recently I wanted to revive an old dumbphone I haven't used since 2014 and apparently I had it password protected and forgot the password and wanted to factory reset it. I found the exact model of the phone and google had only content farm articles that didn't help me at all but Gemini gave me the perfect solution first try. I went to google first because I had no faith in Gemini since to me it seemed like a pretty obscure question but guess I was wrong.
Guthur 1 days ago [-]
In the interest of full disclosure my setup is quite esoteric for unreal dev. Linux and nixos no less. To be honest I'd probably have given up on nixos long ago without LLM support. It's actually really quite handy to be able to share a declarative specification of my environment.
TexanFeller 1 days ago [-]
Google search has been enshittified. Kagi is where you get real search results now.
platevoltage 19 hours ago [-]
Unsure why this is downvoted. "Google search has been enshittified" should be a common sentiment here.
fencepost 1 days ago [-]
No, they're like a completely nontechnical marketing person who has a big library of papers on related subjects and who's been asked to generate a whitepaper by pulling phrases. What comes out will probably have proper grammar and seem perfectly reasonable to another person with no knowledge of the field, but if actually read by someone knowledgeable may be complete gibberish taken as a whole.
Individual sentences and paragraphs may mostly work, but it's an edifice built on sand out of poorly constructed bricks plus mortar with the wrong proportions (or entirely wrong ingredients).
LLM output is "truthy" - it looks like it might be true, and sometimes it will even be accurate (see also, "stopped clock") but depending on it is foolish because what's generating the output doesn't actually understand what it's putting out - it's just generating output that looks like the kind of thing you've requested.
Jcampuzano2 1 days ago [-]
Overconfident, but also easy to sway their opinion.
They follow directions for maybe an hour and then go off and fix random shit because they forgot what their main task was.
They'll tell you to your face how great your ideas were, and that you're absolutely right about something, then go implement it completely incorrectly.
They add comments to literally everything even when you ask them to stop. But they also ignore said comments sometimes.
Maybe they are kinda like us lol.
seadan83 21 hours ago [-]
Like a person that has memorized every single answer in the world to every single question ever asked - but is completely dead behind the eyes.
AnotherGoodName 23 hours ago [-]
My analogy is a cursed monkey paw with unlimited wishes. It's actually really really powerful but you have to be careful of leaving any possible ambiguity in your wishes.
amelius 1 days ago [-]
> who drinks heavily on the job
Yes, that's why you should add "... and answer in the style of a drunkard" to every prompt. Makes it easier to not forget what you are dealing with.
vkou 1 days ago [-]
Experienced and knowledgeable and they also believe in the technical equivalent of flat-Earthism in many, many non-trivial corners.
And if you push back on that insanity, they'll smile and nod and agree with you and in the next sentence, go right back to pushing that nonsense.
sho_hn 1 days ago [-]
One good yardstick, if one has to anthropomorphize, is that LLMs know and believe what's popular. If you ask it to do something that popular developer opinion gets right, it will do fine.
Ask it for things that many people get wrong or just do badly, or can be mistakenly likened to a popular thing in a way that produces a wrong result, and it'll often err.
The trick is having an awareness of what correct solutions are prevalent in training data and what the bulk of accessible code used for training probably doesn't have many examples of. And this experience is hard to substitute for.
Juniors therefore use LLMs in a bumbling fashion and are productive either by sheer luck, or because they're more likely to ask for common things and so stay in a lane with the model.
A senior developer who develops a good intuition for when the tool is worth using and when not can be really efficient. Some senior developers however either overestimate or underestimate the tool based on wrong expectations and become really inefficient with them.
DrewADesign 19 hours ago [-]
“Here’s the code:” looks totally plausible but hallucinated libFakeCall.
“libFakeCall doesn’t exist. Use libRealCall instead of libFakeCall.”
“You’re absolutely correct. I apologize for blah blah blah blah. Here’s the updated code with libRealCall instead. :[…]”
“You just replaced the libFakeCall reference with libRealCall but didn’t update the calls themselves. Re-write it and cite the docs. “
“Sorry about the confusion! I’ve found these calls in the libRealCall docs. Here’s the new code and links to the docs.”
“That’s the same code but with links to the libRealCall docs landing page.”
“You’re absolutely correct.” It appears that these calls belong to another library with that functionality:” looks totally plausible but hallucinated libFakeCall.
vkou 19 hours ago [-]
You forgot to include it's best excuse for why the documentation it cites is hallucinated: "The links may be broken/website is down".
For all the hot air I hear about the user having to give the system the proper context to give you good answers - does anyone claim to have a solution for dealing with such a belligerent approach to bullshit?
They are by no means useless, but once they fall into that hole, there's no further value in interrogating them.
And constantly microdosing, sometimes a bit too much.
DrewADesign 19 hours ago [-]
And sometimes it seems like a macro-dose huffing xylene-based glue.
amelius 1 days ago [-]
> and they don't get mad when you fire them!
No, typically __you__ are mad when you fire them ...
Arubis 22 hours ago [-]
If you’re making hire/fire decisions while emotional, you’re doing it wrong.
kurtis_reed 22 hours ago [-]
Whoosh
FlyingSnake 19 hours ago [-]
“You’re absolutely right!”
worik 1 days ago [-]
Yes. Funny
But it is nothing like a wetware colleague. It is a machine.
revskill 1 days ago [-]
[dead]
austinallegro 1 days ago [-]
[flagged]
anthem2025 1 days ago [-]
It’s funny how people acknowledge the railroads and the similarity to AI, but then jump back to comparing it to the internet when it comes to drawing conclusions.
The internet build out left massive amounts of useful infrastructure.
The railroads left us with lots of railroads that fell into disuse and eventually left us with a complete joke of a railway system. Made a few people so rich we started calling them robber barons and talking the gilded age.
Are we going to continue to use the 60 billion dollar data centers in Louisiana when the bubble bursts? Is it valuable infrastructure or just a waste of money that gets written off?
senko 1 days ago [-]
> Are we going to continue to use the 60 billion dollar data centers in Louisiana when the bubble burst
Why not?
Tracks in the middle of nowhere are useless.
Data centers in the middle of nowhere are useful, as long as energy, cooling and uplink are cost-effective.
(Not saying Louisiana is in the middle of nowhere!)
anthem2025 1 days ago [-]
I don’t disagree, but was Louisiana chosen because it offers cheap energy/cooling/infra or because the government doesn’t bother to ask questions about the impact to thinks like energy and water usage (which is why we are seeing more and more reports of residents upset about pollution, water usage, and energy costs).
tricky_theclown 1 days ago [-]
.
atleastoptimal 1 days ago [-]
There are only 3 things that I have strong empirical evidence for with respect to LLMs
1. Routinely some task or domain of work that some expert claims that LLM’s will able to do, LLM’s start being able to reliably perform that task within 6 months to a year, if they haven’t already
2. Whenever AI gets better, people move the goalposts regarding what “intelligence” counts as
3. Still, LLM’s reveal that there is an element to intelligence that is not orthogonal to the ability to do well on tests or benchmarks
hugedickfounder 1 days ago [-]
[flagged]
archeantus 23 hours ago [-]
If all you’ve done with AI is just use it for autocomplete, you’re missing out big time. I built a slick react app using Lovable yesterday and then created a Node BE using Claude Code today. I told Claude Code to look through the FE code to understand the requirements and purpose of the site, and to build a detailed plan (including proposed DB schemas) for a system that could support that functionality.
It generated a thousand line file with a robust breakdown of everything that needed to be done and at my command it did it. We went module by module and I made sure that each module
Had comprehensive unit test coverage and that the repo built well as we went. After a few hours of back and forth we made 9 modules, 60+ APIs across 10 different tables, and hundreds of unit tests that all are passing.
Does that mean that I’m all done and ready to deploy to prod? Unlikely. But it does mean that I got a ton of boilerplate stuff put into place really quickly and that I’m eight hours into a project that would have taken at least a month before.
Once the BE was done I had it generate extensive documentation for the agent that would handle the FE integration as a sort of instruction guide - in case we need it. As issues and bugs arise during integration (they will!) the model has everything it needs to keep on track and finish the job it set out to do.
What a time to be alive!
computerex 23 hours ago [-]
I feel exactly the same way.
Even this post by Martin Fowler shows he's an aging dinosaur stuck in denial.
> I’ve often heard, with decent reason, an LLM compared to a junior colleague. But I find LLMs are quite happy to say “all tests green”, yet when I run them, there are failures. If that was a junior engineer’s behavior, how long would it be before H.R. was involved?
I don't know what "LLM's" he's using but I just simply don't get hallucinations like that with cursor or claude code.
He ends with this:
> LLMs create a huge increase in the attack surface of software systems. Simon Willison described the The Lethal Trifecta for AI agents: an agent that combines access to your private data, exposure to untrusted content, and a way to externally communicate (“exfiltration”). That “untrusted content” can come in all sorts of ways, ask it to read a web page, and an attacker can easily put instructions on the website in 1pt white-on-white font to trick the gullible LLM to obtain that private data.
Not sure why he is re-iterating well known prompt injection vulnerability, passing it off as a general weakness of LLM's that applies to all LLM use when that's not the reality.
Bridged7756 8 hours ago [-]
60+ APIs? Odd way to word. Did you mean endpoints?
What's the impressive thing here? No one said AI can't do boilerplate. Especially if you're doing run of the mill crud and starting from a clean slate, in fact, that's probably one of its only useful usages in my opinion. You still need to understand your system when maintaining or adding features to it, do a code review, understand the architecture, ensure its architecture allows for pivots in design/functionality, etc.
rester324 14 hours ago [-]
Can you share the code with us? I'd love to see that
daveguy 23 hours ago [-]
Okay, but LLMs will produce a thousand lines of code for Hello World.
Plus, there is no telling how many bugs you have in that code.
This is an example of my least favorite style of feigned insight: redefining a term into meaninglessness just so you can say something that sounds different while not actually saying anything new.
Yes, if you redefine "hallucination" from "produce output containing detailed information despite that information not being grounded in external reality, in a manner distantly analogous to a human reporting sense data produced by a literal hallucination rather than the external inputs that are presumed normally to ground sense data" to "produce output", its true that all LLMs do is "hallucinate", and that "hallucinating" is not a undesirable behavior.
But you haven't said anything new about the thing that was called "hallucination" by everyone else, or about the thing--LLM output in general--that you have called "hallucination". Everyone already knew that producing output wasn't undesirable. You've just taken the label conventionally attached to a bad behavior, attached it to a broader category that includes all behavior, and used the power of equivocation to make something that sounds novel without saying anything new.
You're not meant to take it literally.
I think we call it "hallucinating" when the machine does this in an un-human-like way.
LLMs just spew words. It just so happens that human beings can decode them into something related, useful, and meaningful surprisingly often.
Might even be a useful case of pareidolia (a term I dislike, because a world without any pattern matching whatsoever would not necessarily be "better").
We can trace which neurons activate for a face recognition model and see that a certain neuron does light up when it sees a face. The correct features are active for the sentence “the word parrots is plural”.
If you stop assuming the LLMs have no internal representations of the data, then everything makes a lot more sense! The LLM is FORCED to answer questions… just like a high school student filling out the SAT is forced to answer questions.
If a high school student fills out the wrong answer on the SAT, is that a hallucination?
Hallucinations are expected behaviors if you RLHF a high schooler to always guess an answer on the SAT because that’ll get them the highest score. This applies to ML model reward functions as well.
Seeing which parts of a model (they aren't neurons) light up when shown a face doesn't necessarily indicate understanding.
The model is a complex web of numbers representing a massively compressed data space. It could easily be that what you see light up when shown a face only indicates what specific part of the model is housing the compresses data related to recognizing specific facial features.
I thought models were composed of neural network layers, among other things. Are these data structures called something different?
I was getting at the idea that a neuron is a very specific feature of a biological brain, regardless of what AI researchers may call their hardware they aren't made of neurons.
2. You are about 5 years behind in terms of the research. Look into hierarchical feature representation and how MLP neurons work. (Or even in older CNNs and RNNs etc). And I'm willingly using the word "neuron" instead of "feature" here because while I know "feature" is more correct in general, there are definitely small toy models where you can pinpoint an individual neuron to represent a feature such as a face.
Using the term neuron there and meaning it literally is like calling an airplane a bird. I get that the colloquial use exists, but no one thinks they are literal birds.
We do know what areas of a human brain often light up in response to various conditions. We don't know why that is though, or how it actually works. Maybe more importantly for LLMs, we don't know how human memory works, where it is stored, how to recognize or even define consciousness, etc.
Seeing what areas of a brain or an LLM light up can be interesting, but I'd be very cautious trying to read much into it.
"Wrong answers on SAT" is also the leading hypothesis on why o3 was such an outlier - far more prone to hallucinations than either prior or following OpenAI models.
On SAT, giving a random answer is right 20% of the time - more if you ruled at least one obviously wrong answer out. Saying "I don't know" and not answering is right 0% of the time. So if you RLVR on SAT-type tests, where any answer is better than no answer, you encourage hallucinations. Hallucination avoidance in LLMs is a fragile capability, and OpenAI has probably fried its o3 with too much careless RLVR.
But another cause of hallucinations is limited self-awareness of modern LLMs. And I mean "self-awareness" in a very mechanical, no-nonsense fashion: "has information about itself and its own capabilities". LLMs have very little of that.
Humans have some awareness of the limits of their knowledge - not at all perfect, but at least there's something. LLMs get much, much less of that. LLMs learn the bulk of their knowledge from pre-training data, but pre-training doesn't teach them a lot about where the limits of their knowledge lie.
Until you said that I didn’t realize just how much humans “hallucinate“ in just the same ways that AI does. I have a friend who is fluent in Spanish, a native speaker, but got a pretty weak grammar education when he was in high school. Also, he got no education at all in Critical thinking, at least not formally. So this guy is really, really fluent in his native language, but can often have a very difficult time explaining why he uses whatever grammar he uses. I think the whole world is realizing how little our brains can correctly explain and identify the grammar we use flawlessly.
He helps me to improve my Spanish a lot, he can correct me with 100% accuracy of course, but I’ve noticed on many occasions, including this week, that when I ask a question about why he said something one way or another in Spanish, he will just make up some grammar rule that doesn’t actually exist, and is in fact not true.
He said something like “you say it this way when you really know the person and you’re saying that the other way when it’s more formal“, but I think really it was just a slangy way to mis-stress something and it didn’t have to do with familiar/formal or not. I’ve learned not to challenge him on any of these grammar rules that he makes up, because he will dig his heels in, and I’ve learned just to ignore him because he won’t have remembered this made up grammar rule in a week anyway.
This really feels like a very tight analogy with what my LLM does to me every day, except that when I challenge the LLM it will profusely apologize and declare itself incorrect even if it had been correct after all. Maybe LLMs are a little bit too humble.
I imagine this is a very natural tendency in humans, and I imagine I do it much more than I’m aware of. So how do humans use self-awareness to reduce the odds of this happening?
I think we mostly get trained in higher education to not trust the first thought that comes into our head, even if it feels self consistent and correct. We eventually learn to say “I don’t know” even if it’s about something that we are very, very good at.
There’s such a thing in Spanish and in French. Formal and informal settings is reflected in the language. French even distinguishes between three different level of vocabulary (one for very informal settings (close friends), one for business and daily interactions, and one for very formal settings. It’s all cultural.
It's bold to use the term "understanding" in this context. You ask it something about a topic, it gives an answer like someone who understands the topic. You change the prompt slightly, where a human who understands the topic would still give the right response trivially, the LLM outputs an answer that is both wrong/irrelevant and unpredicably and non-humanly wrong in a way that no human who exhibited understanding with the first answer could be predicted to answer the second question in the same bizarrw manner as the LLM.
The fact that the LLM can be shown to have some sort of internal representation does not necessarily mean that we should call this "understanding" in any practical sense when discussing these matters. I think it's counterproductive in getting to the heart of the matter.
I think this should make you question whether the prompt change was really as trivial as you imply. Providing an example of this would elucidate.
0. https://arxiv.org/pdf/2310.11324
(For sure, arguments can be made that the relationship betweens the terms that the model has encoded could maybe be called "model thinking" or "model understanding" but it's not how people work.)
> And much of it is unconscious.
That does not mean we're "picking words statistically likely to get us what we want," it means "our brains do a lot of work subconsciously." Nothing more.
> You are focused on your objective, and your brain fills in the gaps.
This is a total contradiction of what you said at the start. LLMs are not focused on an objective, they are using very complex statistical algorithms to determine output strings. There is no objective to an LLM's output.
This sentence is inherently contradictory. If LLM output is meaningful more than chance, then it's literally not "just spewing words". Therefore whatever model it is using to generate that meaning must contain some semantic content, even if it's not semantic content that's as rich as humans are capable of. The "stochastic parrot" term is thus silly.
https://en.wikipedia.org/wiki/Predictive_coding
"LLMs produce either truth or they produce hallucinations."
Claims worded like that give the impression that if we can just reduce/eliminate the hallucinations, all that will remain will be the truth.
But that's not the case. What is the case is that all the output is the same thing, hallucination, and that some of those hallucinations just so happen to reflect reality (or expectations) so appears to embody truth.
It's like rolling a die and wanting one to come up, and when it does saying "the die knew what I wanted".
An infinite number of monkeys typing on an infinite number of typewriters will eventually produce the script for Hamlet.
That doesn't mean the monkeys know about Shakespeare, or Hamlet, or even words, or even what they are doing being chained to typewriters.
We've found a way to optimize the infinite typing monkeys to output something that passes for Hamlet much sooner than infinity.
Computers use formal logic all the time to output truth, LLMs, however, do not.
If a human being talked confidently about something that they were just making up out of thin air by synthesizing based (consciously or unconsciously) on other information they know you wouldn’t call it “hallucination”: you’d call it “bullshit”.
And, honestly, “bullshit” is a much more helpful way of thinking about this behaviour because it somewhat nullifies the arguments people make against the use of LLMs due to this behaviour. Fundamentally, if you don’t want to work with LLMs because they sometimes “bullshit”, are you planning on no longer working with human beings as well?
It doesn’t hold up.
But, more than that, going back to your point: it’s much harder to redefine the term “bullshit” to mean something different to the common understanding.
All of that said, I don’t mind the piece and, honestly, the “I haven’t the foggiest” comment about the future of software development as a career is well made. I guess it’s just a somewhat useful collection of scattered thoughts on LLMs and, as such, an example of a piece where the “thoughts on” title fits well. I don’t think the author is trying to be particularly authoritative.
I have a standard canned rant about "confabulation" is a much better metaphor, but it wasn't the point I was focussed on here.
> Fundamentally, if you don’t want to work with LLMs because they sometimes “bullshit”, are you planning on no longer working with human beings as well?
I will very much not voluntarily rely on a human for particular tasks if that human has demonstrated a pattern of bullshitting me when given that kind of task, yes, especially if, on top of the opportunity cost inherent in relying on a person for a particular task, I am also required to compensate them—e.g., financially—for their notional attention to the task.
I'd recommend you watch https://www.youtube.com/watch?v=u9CE6a5t59Y&t=2134s&pp=ygUYc... which covers the topic of bullshit. I don't think we can call LLM output "bullshit" because someone spewing bullshit has to not care about whether what they're saying is true or false. LLMs don't "care" about anything because they're not human. It's better to give it an alternative term to differentiate it from the human behaviour, even if the observed output is recognisable.
The difference is that the LLM is _only that part_. In producing language as a human, I filter these words, I go back and think of new phrasings, I iterate --- in writing consciously, in speech unconsciously. So rather than a sequence it is a scattered tree filled with rhetorical dead ends, pruned through interaction with my world-model and other intellectual faculties. You can pull on one thread of words as though it were fully-formed already as a kind of Surrealist exercise (like a one-person cadavre exquis), and the result feels similar to an LLM with the temperature turned up too high.
But if nothing else, this highlights to me how easily the process of word generation may be decoupled from meaning. And it serves to explain another kind of common human experience, which feels terribly similar to the phenomenon of LLM hallucination: the "word vomit" of social anxiety. In this process it suddenly becomes less important that the words you produce are anchored to truth, and instead the language-system becomes tuned to produce any socially plausible output at all. That seems to me to be the most apt analogy.
It's a good point.
But there is, except when you redefine "hallucination" so there isn't. And, when you retain the definition where there is a difference, you find there are techniques by which you can reduce hallucinations, which is important and useful. Changing the definition to eliminate the distinction is actively harmful to understanding and productive use of LLMs, for the benefit of making what superficially seems like an insightful comment.
You mean like every evangelist says AI changes everything every 5 min...When in reality what they mean is neural nets statistical code generators are getting pretty good? because that is almost all AI there is at the moment?
Just to make the current AI sound bigger than it is?
Thanks for listening to my Ted Talk.
We are producing more code, but quality is definitely taking a hit now that no-one is able to keep up.
So instead of slowly inching towards the result we are getting 90% there in no time, and then spending lots and lots of time on getting to know the code and fixing and fine-tuning everything.
Maybe we ARE faster than before, but it wouldn't surprise me if the two approaches are closer than what one might think.
What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.
This is painfully similar to what happens when a team grows from 3 developers to 10 developers. All of sudden, there's a vast pile of coding being written, you've never seen 75% of it, your architectural coherence is down, and you're relying a lot more on policy and CI.
Where LLM's differ is that you can't meaningfully mentor them, and you can't let them go after the 50th time they try turn off the type checker, or delete the unit tests to hide bugs.
Probably, the most effective way to use LLMs is to make the person driving the LLM 100% responsible for the consequences. Which would mean actually knowing the code that gets generated. But that's going to be complicated to ensure.
Boilerplate sucks to review. You just see a big mass of code and can't fully make sense of it when reviewing. Also, Github sucks for reviewing PRs with too many lines.
So junior/mid devs are just churning boilerplate-rich code and don't really learn.
The only outcome here is code quality is gonna go down very very fast.
And at the same time it's borderline impossible proven by the fact that people can't do it, even though everyone understands and roughly everyone agrees on how it works.
So the actual "trick" turns out to be understanding what keeps people from doing the necessary things that they all agree on are important – like treating code reviews very seriously. And getting that part right turns out to be fairly hard.
7. It is easier to write an incorrect program than understand a correct one.
Link: http://cs.yale.edu/homes/perlis-alan/quotes.html
And its going to get worse! So please explain to me how in the net, you are going to be better off? You're not.
I think most people haven't taken a decent economics class and don't deeply understand the notion of trade offs and the fact there is no free lunch.
Yes there are trade offs, but at this point if you haven’t found a way to significantly amplify and scale yourself using llms, and your plan is to instead pretend that they are somehow not useful, that uphill battle can only last so long. The genie is out of the bag. Adapt to the times or you will be left behind. That’s just what I think.
Also telling someone to "adapt to the times" is a bit silly. If it helped as much as its claimed, there wouldn't be any need to try and convince people they should be using it.
A LOT of parallels with crypto, which is still trying to find its killer app 16 years later.
Citation needed. In my circles, Senior engineer are not using them a lot, or in very specific use cases. My company is blocking LLMs use apart from a few pilots (which I am part of, and while claude code is cool, its effectiveness on a 10-year old distributed codebase is pretty low).
You can't make sweeping statements like this, software engineering is a large field.
And I use claude code for my personal projects, I think it's really cool. But the code quality is still not there.
that goes both ways
Go ahead, convince me. Please describe clearly and concisely in one or two sentences the clear economic value/advantage of LLMs.
People love stuff that makes them feel like they are doing less work. Cognitive biases distort reality and rational thinking, we know this already through behavioural economics.
This is making me even more skeptical of your claims. Individual metrics are often very poor at tracking reality.
Technology moves forward and productivity improves for those that move with it.
1) CASE tools (and UML driven development)
2) Wizard driven code.
3) Distributed objects
4) Microservices
These all really were the hot thing with massive pressure to adopt them just like now. The Microsoft demos of Access wizards generating a complete solution for your business had that same wow feeling as LLM code. That's not to say that LLM code won't succeed but it is to say that this statement is definitely false:
> Technology moves forward and productivity improves for those that move with it.
That's not all it does but I think it's one of the more important fundamentals.
It does not, technology regresses just as often and linear deterministic progress is just a myth to begin with. There is no guarantee for technology to move forward and always make things better.
There are plenty of examples to be made where technology has made certain things worse.
Just by having lived longer, they might've had the chance to develop some intuition about the true cost of disruption, and about how whatever Google's doing is not a free lunch. Of course, neither them, nor you (nor I for that matter) had been taught the conceptual tools to analyze some workings of some Ivy League whiz kinds that have been assigned to be "eating the world" this generation.
Instead we've been incentivized to teach ourselves how to be motivated by post-hoc rationalizations. And ones we have to produce at our own expense too. Yummy.
Didn't Saint Google end up enshittifying people's very idea of how much "all of the world's knowledge" is; gatekeeping it in terms of breadth, depth and availability to however much of it makes AdSense. Which is already a whole lot of new useful stuff at your fingertips, sure. But when they said "organizing all of the world's knowledge" were they making any claims to the representativeness of the selection? No, they made the sure bet that it's not something the user would measure.
In fact, with this overwhelming amount of convincing non-experientially-backed knowledge being made available to everyone - not to mention the whole mass surveillance thing lol (smile, their AI will remember you forever) - what happens first and foremost is the individual becomes eminently marketable-to, way more deeply than over Teletext. Thinking they're able to independently make sense of all the available information, but instead falling prey to the most appealing narrative, not unlike a day trader getting a haircut on market day. And then one has to deal with even more people whose life is something someone sold to them, a race to the bottom in the commoditized activity (in the case of AI: language-based meaning-making).
But you didn't warn your parents about any of that or sit down and have a conversation about where it means things are headed. (For that matter, neither did they, even though presumably they've had their lives altered by the technological revolutions of their own day.) Instead, here you find yourself stepping in for that conversation to not happen among the public, either! "B-but it's obvious! G-get with it or get left behind!" So kind of you to advise me. Thankfully it's just what someone's paid for you to think. And that someone probably felt very productive paying big money for making people think the correct things, too, but opinions don't actually produce things do they? Even the ones that don't cost money to hold.
So if it's not about the productivity but about the obtaining of money to live, why not go extract that value from where it is, instead of breathing its informational exhaust? Oh, just because, figuratively speaking, it's always the banks have AIs that don't balk at "how to rob the bank"; and it's always we that don't. Figures, no? But they don't let you in the vault for being part of the firewall.
Would you say the same thing ("If it helped as much as its claimed, there wouldn't be any need to try and convince people they should be using it.") about the internet?
Thing's called a self-fulfilling prophecy. Next level to a MLM scheme: total bootstrap. Throwing shit at things in innate primate activity, use money to shift people's attention to a given thing for long enough and eventually they'll throw enough shit at the wall for something to stick. At which point it becomes something able to roll along with the market cycles.
Instead of this crypto-esque hand waving, maybe you can answer me now?
You have to inspect the output of LLMs much more carefully to make sure they are doing what you want.
Actually I am not a software engineer for a living so I have zero vested interest or bias lmao. That said, I studied Comp Sci at a top institution and really loved defining algorithms and had no interest in actual coding as it didn't give my brain the variety it needs.
If you are employed as a software engineer, I am probably more open to realizing the problems that only become obvious to you later on.
Fast feedback is also great for improving release processes; when you have a feedback loop with Product, UX, Engineering, Security etc, being able to front load some % of a deliverable can help you make better decisions that may end up being a time saver net/net.
That isn't clear given the fact that LLMs and, more importantly, LLM programming environments that manage context better are still improving.
People are honestly just drunk on this thing at this point. The sunken cost fallacy has people pushing on (ie. spending more time) when LLMs aren't getting it right. People are happy to trade convenience for everything else, just look at junk food where people trade in flavour and their health. And ultimately we are in a time when nobody is building for the future, it's all get rich quick schemes: squeeze then get out before anyone asks why the river ran dry. LLMs are like the perfect drug for our current society.
Just look at how technology has helped us in the past decades. Instead of launching us towards some kind of Star Trek utopia, most people now just work more for less!
The proof is in the pudding. The work I do takes me half as long as it used to and is just as high in quality, even though I manage and carefully curate the output.
But in that study that came out a few weeks ago where they actually looked at time saved, every single developer overestimated their time saved. To the point where even the ones who lost time thought they saved time.
LLMs are very good at making you feel like you’re saving time even when you aren’t. That doesn’t mean they can’t be a net productivity benefit.
But I’d be very very very surprised if you have real hard data to back up your feelings about your work taking you half as long and being equal quality.
I’m not surprised by the contents. I had the same feeling; I made some attempts at using LLMs for coding prior to CC, and with rare exceptions it never saved me any time.
CC changed that situation hugely, at least in my subjective view. It’s of course possible that it’s not as good as I feel it is, but I would at least want a new study.
The key thing to look at is that even the participants that did objectively save time, overestimated time saved by a huge amount.
But also you’re always likely to be at least one model ahead of any studies that come out.
Is there a study demonstrating Claude Code improves productivity?
As for catching bugs, maybe, but I feel like it's pot luck. Sometimes it can find something, sometimes it's just complete rubbish. Sometimes worth giving it a spin but still not convinced it's saving that much. Then again I don't spend much time hunting down bugs in unfamiliar code bases.
What are people doing this quote on quote, time that they have gained back? Working on new projects? Ok can you show me the impact on the financials (show me the money)? And then I usually get dead silence. And before someone mentions the layoffs - lmao get real. Its offshoring 2.0 so that the large firms can increase their internal equity to keep funding this effort.
Most people are terrible at giving true informed opinions - they never dig deep enough to determine if what they are saying is proper.
Best practices in software development for forever have been to verify everything; CI, code reviews, unit tests, linters, etc. I'd argue that with LLM generated code, a software developer's job and/or that of an organization as a whole has shifted even more towards reviewing and verification.
If quality is taking a hit you need to stop; how important is quality to you? How do you define quality in your organization? And what steps do you take to ensure and improve quality before merging LLM generated code? Remember that you're still the boss and there is no excuse for merging substandard code.
Scenario B, you add 40 with an LLM, that look good on paper but only cover 6 of the original ones. Besides, who's going to pay careful attention to a PR with 40.
"Must be so thorough!".
In any case poor work quality is a failure of tech leadership and culture, it's not AI's fault.
But also, blameless culture is IMO important in software development. If a bug ends up in production, whose fault is it? The developer that wrote the code? The LLM that generated it? The reviewer that approved it? The product owner that decided a feature should be built? The tester that missed the bug? The engineering organization that has a gap in their CI?
As with the Therac-25 incident, it's never one cause: https://news.ycombinator.com/item?id=45036294
This isn't what AI enthusiasts say about AI though, they only bring that up when they get defensive but then go around and say it will totally replace software engineers and is not just a tool.
The tools used do not hold responsibilities, they are tools.
You gdt mad, swear at it, maybe even throw it to the wall on a git of rage but, at the end of the day, deep inside you still know you screwed.
I wouldn’t throw it against a wall because I’m not a psychopath, but I would demand my money back.
Do you think by the time GPT-9 comes, we'll say "That's it, AI is a failure, we'll just stop using it!"
Or do you speak in metaphorical/bigger picture/"butlerian jihad" terms?
Maybe you know one.
At any rate, it depends on the crash. The NTSB will investigate and release findings that very well may assign fault to the design of the plane and/or pilot or even tools the pilot was using, and will make recommendations about how to avoid a similar crash in the future, which could include discontinuing the use of certain tools.
What a great way of framing it. I've been trying to explain this to people, but this is a succinct version of what I was stumbling to convey.
https://jstrieb.github.io/posts/llm-thespians/
That being said, there are methods to train LLMs against hallucinations, and they do improve hallucination-avoidance. But anti-hallucination capabilities are fragile and do not fully generalize. There's no (known) way to train full awareness of its own capabilities into an LLM.
We need to make these models much much better, but it’s going to be quite difficult to reduce the levels to even human levels. And the BS will always be there with us. I suppose BS is the natural side effect of any complex system, artificial or biological, that tries to navigate the problem space of reality and speak on it. These systems, sometimes called “minds”, are going to produce things that sound right but just are not true.
"Critical thinking" and "scientific method" feel quite similar to the "let's think step by step" prompt for the early LLMs. More elaborate directions, compensating for the more subtle flaws of a more capable mind.
Also, unless I am mistaken, RLVF changes the training to make LLMs less likely to hallucinate, but in no way does it make hallucination impossible. Under the hood, the models still work the same way (after training), and the analogy still applies, no?
Under the hood we have billions of parameters that defy any simple analogies.
Operations of a network are shaped by human data. But the structure of the network is not like the human brain. So, we have something that is human-like in some ways, but deviates from humans in ways, which are unlikely to be like anything we can observe in humans (and use as a basis for analogy).
Thus, why it's a good metaphor for the behavior of LLMs.
[1] https://en.wikipedia.org/wiki/On_Bullshit
But i think this 'being wrong' is kind of confusing when talking about LLMs (in contrast to systems/scientific modelling). In what they model (language), the current LLMs are really good and acurate, except for example the occasional chinese character in the middle of a sentence.
But what we mean by LLMs 'being wrong' most of the time is being factually wrong in answering a question, that is expressed as language. That's a layer on top of what the model is designed to model.
EDITS:
So saying 'the model is wrong' when it's factually wrong above the language level isn't fair.
I guess this is essentially the same thought as 'all they do is hallucinate'.
Because it seems the point being made multiple times that a perceptual error isn’t a key component of hallucinating, the whole thing is instead just a convincing illusion that could theoretically apply to all perception, not just the psychoactively augmented kind.
E.g. Programming in JS or Python: good enough
Programming in Rust: I can scrap over 50% of the code because it will
a) not compile at all (I see this while the "AI" types)
b) not meet the requirements at all
Like, we have just as much to lose as they have to gain. Of course a part of us doesn't want these tools to be as good as some people say they are because it directly affects our future and livelihood.
No, they can't do everything. Yes, they can do some things. It's that simple.
No, this is because that would mean AGI. And it’s obviously not that.
This works on people as well!
Cops do this when interrogating. You tell the same story three times, sometimes backwards. It's hard to keep track of everything if you're lying or you don't recall clearly so you can get a sense of confidence. Also works on interviews, ask them to explain a subject in three different ways to see if they truly understand.
Only within certain conditions or thresholds that we're still figuring out. There are many cases where the more someone recalls and communicates their memory, the more details get corrupted.
> Cops do this when interrogating.
Sometimes that's not to "get sense of the variation" but to deliberately encourage a contradiction to pounce upon it. Ask me my own birthday enough times in enough ways and formats, and eventually I'll say something incorrect.
Care must also be taken to ensure that the questioner doesn't change the details, such as by encouraging (or sometimes forcing) the witness/suspect to imagine things which didn't happen.
With LLMs you have no such guarantee or expectation.
One thing I've done with some success is use a Test Driven Development methodology with Claude Sonnet (or recently GPT-5). Moving forward the feature in discrete steps with initial tests and within the red/green loop. I don't see a lot written or discussed about that approach so far, but then reading Martin's article made me realize that the people most proficient with TDD are not really in the Venn Diagram intersection of those wanting to throw themselves wholeheartedly into using LLMs to agent code. The 'super clippy' autocomplete is not the interesting way to use them, it's with multiple agents and prompt techniques at different abstraction levels - that's where you can really cook with gas. Many TDD experts have great pride in the art of code, communicating like a human and holding the abstractions in their head, so we might not get good guidance from the same set of people who helped us before. I think there's a nice green field of 'how to write software' lessons with these tools coming up, with many caution stories and lessons being learnt right now.
edit: heh, just saw this now, there you go - https://news.ycombinator.com/item?id=45055439
But like you said, it was meant more TDD as 'test first' - so a sort of 'prompt-as-spec' that then produces the test/spec code first, and then go iterate on that. The code design itself is different as influenced by how it is prompted to be testable. So rather than go 'prompt -> code' it's more an in-between stage of prompting the test initially and then evolve, making sure the agent is part of the game of only writing testable code and automating the 'gate' of passes before expanding something. 'prompt -> spec -> code' repeat loop until shipped.
As a simple example, a "buildUrl" style function that put one particular host for prod and a different host for staging (for an "environment" argument) had that argument "tested" by exactly comparing the entire functions return string, encoding all the extra functionality into it (that was tested earlier anyway).
A better output would be to check startsWith(prodHost) or similar, which is what I changed it into, but I'm still trying to work out how to get coding agents to do that in the first or second attempt.
But that's also not surprising: people write those kinds of too-narrow not-useful tests all the time, the codebase I work on is littered with them!
That sounds like an anti-pattern and not true TDD to get LLMs to generate tests for you if you don't know what to test for.
It also reduces your confidence in knowing if the generated test does what it says. Thus, you might as well write it yourself.
Otherwise you will get these sort of nasty incidents. [0] Even when 'all tests passed'.
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
In short, LLMs often get confused about where the problem lies: the code under test or the test itself. And no amount of context engineering seems to solve that.
Without providing the actual feature requirements to the LLM(or the developer) it is impossible to determine which is wrong.
Which is why I think it is also sort of stupid by having the LLM generate tests by just giving it access to the implementation. That is at best testing the implementation as it is, but tests should be based on the requirements.
Before I let an agent touch code, I spell out the issue/feature and have it write two markdown files - strategy.md and progress.md (with the execution order of changes) inside a feat_{id} directory. Once I’m happy with those, I wipe the context and start fresh: feed it the original feature definition + the docs, then tell it to implement by pulling in the right source code context. So by the time any code gets touched, there’s already ~80k tokens in play. And yet, the same confusion frequently happens.
Even if I flat out say “the issue is in the test/logic,”, even if I point out _exactly_ what the issue is, it just apologizes and loops.
At that point I stop it, make it record the failure in the markdown doc, reset context, and let it reload the feature plus the previous agent’s failure. Occasionally that works, but usually once it’s in that state, I have to step in and do it myself.
Nice.
But much more than an arithmetic engine, the current crop of AI needs an epistemic engine, something that would help follow logic and avoid contradictions, to determine what is a well-established fact, and what is a shaky conjecture. Then we might start trusting the AI.
To me this is the most bizarre part. Have we ever had a technology deployed at this scale without a true understanding of its inner workings?
My fear is that the general public perception of AI will be damaged since for most LLMs = AI.
The idea we don't is tabloid journalism, it's simply because the output is (usually) randomised - taken to mean, by those who lack the technical chops, that programmers "don't know how it works" because the output is indeterministic.
This is not withstanding we absolutely can repeat the output by using not randomisation (temperature 0).
Salience (https://en.wikipedia.org/wiki/Salience_(neuroscience)), "the property by which some thing stands out", is something LLMs have trouble with. Probably because they're trained on human text, which ranges from accurate descriptions of reality to nonsense.
So the only real difference between "perception" and a "hallucination" is whether it is supported by physical reality.
I can recognize my own meta cognition there. My model of reality course corrects the information feed interpretation on the fly. Optical illusions feel very similar whereby the inner reality model clashes with the observed.
For general ai, it needs a world model that can be tested against and surprise is noted and models are updated. Looping llm output with test cases is a crude approximation of that world model.
[1] https://huggingface.co/docs/smolagents/conceptual_guides/int...
"All (large language) model outputs are hallucinations, but some are useful."
Some astonishingly large proportion of them, actually. Hence the AI boom.
It implies that some parts of the output aren’t hallucinations, when the reality is that none of it has any thought behind it.
So, we VALUE creativity, we claim that it helps us solve problems, improves our understanding of the universe, etc.
BUT people with some mental illnesses, their brain is so creative that they lose the understanding of where reality is and where their imagination/creativity takes over.
eg. Hearing voices? That's the brain conjuring up a voice - auditory and visual hallucinations are the easy example.
But it goes further, depression is where people's brains create scenarios where there is no hope, and there's no escape. Anxiety too, the brain is conjuring up fears of what's to come
This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
> In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called confabulation,[1] or delusion)[2] is a response generated by AI that contains false or misleading information presented as fact.[3][4]
You say
> This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
For consistency you might as well say everything the human mind does is hallucination. It's the same sort of claim. This claim at least has the virtue of being taken seriously by people like Descartes.
https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...
It's possible LLMs are lying but my guess is that they really just can't tell the difference.
So far AI seems to be a great augmentation, but not a replacement.
For any macOS users, I highly recommend an Alfred workflow so you just press command + space then type 'llm <prompt>' and it opens tabs with the prompt in perplexity, (locally running) deepseek, chatgpt, claude and grok, or whatever other LLMs you want to add.
This approach satisfies Fowler's recommendation of cross referencing LLM responses, but is also very efficient and over time gives you a sense of which LLMs perform better for certain tasks.
Are you just opening a browser tab?
http://localhost:3005/?q={query}
https://www.perplexity.ai/?q={query}
https://x.com/i/grok?text={query}
https://chatgpt.com/?q={query}&model=gpt-5
https://claude.ai/new?q={query}
Modify to your taste. Example: https://github.com/stevecondylios/alfred-workflows/tree/main (you should be able to download the .alfredworkflow file and double click on it to import it straight into alfred, but creating your own shouldn't take long, maybe 5-10 mins if it's your first workflow)
https://github.com/mjmaurer/infra/blob/main/home-manager/mod...
For example, working on my project (https://mudg.fly.dev) I wanted to experiment with a new FE graph library. I asked an llm the fantastic (https://tidewave.ai) and it completely implemented a solution, which turned out to be worse than the current solution.
If I had spent many hours working on a solution I might have been more "attached" to the solution and kept with it.
Perhaps this is a good thing? (https://en.wikipedia.org/wiki/Nonattachment_(philosophy))
1: when building games for my kids, the tolerance is high so the LLM can implement a feature in one shot and I will manually test the feature (without bothering to review the code or generating unit tests or using a type-safe language).
2: building a demo at work has lower tolerance for errors, but is still good with LLM with a brief code review.
3: for production code, other processes can be in place to meet the even lower tolerance requirements.
If they are actually using the engineering principles that their job description hints at, they'll probably be fine. Software Engineering is a growing field, the need for actual Engineers (you know, the kind that get licensed in other fields) is unlikely to shrink any time soon.
I like this article, I generally agree with it. I think the take is good. However, after spending ridiculous amounts of time with LLMs (prompt engineering, writing tokenizers/samplers, context engineering, and... Yes... Vibe coding) for some periods 10 hour days into weekends, I have come to believe that many are a bit off the mark. This article is refreshing, but I disagree that people talking about the future are talking "from another orifice".
I won't dare say I know what the future looks like, but the present very much appears to be an overall upskilling and rework of collaboration. Just like every attempt before, some things are right and some are simply misguided. e.g. Agile for the sake of agile isn't any more efficient than any other process.
We are headed in a direction where written code is no longer a time sink. Juniors can onboard faster and more independently with LLMs, while seniors can shift their focus to a higher level in application stacks. LLMs have the ability to lighten cognitive loads and increase productivity, but just like any other productivity enhancing tool doing more isn't necessarily always better. LLMs make it very easy to create and if all you do is create [code], you'll create your own personal mess.
When I was using LLMs effectively, I found myself focusing more on higher level goals with code being less of a time sink. In the process I found myself spending more time laying out documentation and context than I did on the actual code itself. I spent some days purely on documentation and health systems to keep all content in check.
I know my comment is a bit sparse on specifics, I'm happy to engage and share details for those with questions.
It still is, and should be. It’s highly unlikely that you provided all the required info to the agent at first try. The only way to fix that is to read and understand the code thoroughly and suspiciously, and reshaping it until we’re sure it reflects the requirements as we understand them.
No, written code is no longer a time sink. Vibe coding is >90% building without writing any code.
The written code and actions are literally presented in diffs as they are applied, if one so chooses.
The most efficient way to communicate these plans is in code. English is horrible in comparison.
When you’re using an agent and not reviewing every line of code, you’re offloading thinking to the AI. Which is fine in some scenarios, but often not what people would call high quality software.
Writing code was never the slow part for a competent dev. Agent swarming etc is mostly snake oil by those who profit off LLMs.
With an engineer you can hand off work and trust that it works, whereas I find code reviewing llm output something that I have to treat as hostile. It will comment out auth or delete failing tests.
Which should be always in my opinion
Are people really pushing code to production that they don't understand?
No.
The general assumed definition of vibe coding, hence the vibe word, is that coding becomes an iterative process guided by intuition rather than spec and processes.
What you describe is literally the opposite of vibe coding, it feels the term is being warped into "coding with an LLM".
Leaving out specs and documentation leads to more slop and hallucinations, especially with smaller models.
You can always just wing it, but if you do so and there isn't adequate existing context you're going to struggle with slop and hallucinations more frequently.
Written code has never been a time sink. The actual time that software developers have spent actually writing code has always been a very low percentage of total time.
Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.
> Juniors can onboard faster and more independently with LLMs,
Color me very, very skeptical of this. Juniors previously spent a lot more of their time writing code, and they don't have to do that anymore. On the other hand, that's how they became not-juniors; the feedback loop from writing code and seeing what happened as a result is the point. Skipping part of that breaks the loop. "What the computer wrote didn't work" or "what the computer wrote is too slow" or even to some extent "what the computer wrote was the wrong thing" is so much harder to learn from.
Juniors are screwed.
> LLMs have the ability to lighten cognitive loads and increase productivity,
I'm fascinated to find out where this is true and where it's false. I think it'll be very unevenly distributed. I've seen a lot of silver bullets fired and disintegrate mid-flight, and I'm very doubtful of the latest one in the form of LLMs. I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
I found exactly this is what LLMs are great at assisting with.
But, it also requires context to have guiding points for documentation. The starting context has to contain just enough overview with points to expand context as needed. Many projects lack such documentation refinement, which causes major gaps in LLM tooling (thus reducing efficacy and increasing unwanted hallucinations).
> Juniors are screwed.
Mixed, it's like saying "if you start with Python, you're going to miss lower level fundamentals" which is true in some regards. Juniors don't inherently have to know the inner workings, they get to skip a lot of the steps. It won't inherently make them worse off, but it does change the learning process a lot. I'd refute this by saying I somewhat naively wrote a tokenizer, because the >3MB ONNX tokenizer for Gemma written in JS seemed absurd. I went in not knowing what I didn't know and was able to learn what I didn't know through the process of building with an LLM. In other words, I learned hands on, at a faster pace, with less struggle. This is pretty valuable and will create more paths for juniors to learn.
Sure, we may see many lacking fundamentals, but I suppose that isn't so different from the criticism I heard when I wrote most of my first web software in PHP. I do believe we'll see a lot more Python and linguistic influenced development in the future.
> I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
I entirely agree, in fact I think we're seeing it already. There is so much that's hyped and built around rough ideas that's glaringly inefficient. But FWIW inefficiency has less of an impact than adoption and interest. I could complain all day about the horrible design issues of languages and software that I actually like and use. I'd wager this will be no different. Thankfully, such progress in practice creates more opportunities for improvement and involvement.
It's not just the fundamentals, though you're right that is an easy casualty. I also agree that LLMs can greatly help with some forms of learning -- previously, you kind of had to follow the incremental path, where you couldn't really do anything complex without have the skills that it built on, because 90% of your time and brain would be spent on getting the syntax right or whatever and so you'd lose track of the higher-level thing you were exploring. With an LLM, it's nice to be able to (temporarily) skip that learning and be able to explore different areas at will. Especially when that motivates the desire to now go back and learn the basics.
But my real fear is about the skill acquisition, or simply the thinking. We are human, we don't want to have to go through the learning stage before we start doing, and we won't if we don't have to. It's difficult, it takes effort, it requires making mistakes and being unhappy about them, unhappy enough to be motivated to learn how to not make them in the future. If we don't have to do it, we won't, even if we logically know that we'd be better off.
Especially if the expectations are raised to the point where the pressure to be "productive" makes it feel like you're wasting other people's time and your paycheck to learn anything that the LLM can do for you. We're reaching the point where it feels irresponsible to learn.
(Sometimes this is ok. I'm fairly bad at long division now, but I don't think it's holding me back. But juniors can't know what they need to know before they know it!)
I've noticed the effects of this first hand from intense LLM engagement.
I relate it more to the effects of portable calculators, navigation systems, and tools like Wikipedia. I'm under the impression this is valid criticism, but we may be overly concerned because it's new and a powerful tool. There's even surveys/studies showing differences in how LLM are perceived WRT productivity between different generations.
I'm more concerned with potential loss of critical thinking skills, more than anything else. And on a related note, there have been concerns of critical thinking skills before this mass adoption of LLMs. I'm also concerned with the impact of LLMs on the quality of information. We're seeing a huge jump in quantity while some quality lacks. It bothers me when I see an LLM confidently presenting incorrect information that's seemingly trivial to validate. I've had web searches give me more incorrect information from LLM tooling at a much greater frequency than I've ever experienced before. It's even more unsettling when the LLM gives the wrong answer and the correct answer is in the description of the top result.
You have finally made an astute observation...
I have already made the assumption that use of LLMs is going to add new mounds of BS atop the mass of crap that already exists on the internet, as part of my startup thesis.
These things are not obvious in the here and now, but I try to take the view of - how would the present day look, 50 years out in the future looking backwards?
That’s a useful tip.
Nice callback to "All models are wrong, but some of them are useful."
There was once a coding agent which achieved SOTA performance on SWE Bench Verified by "just" running the agent 5 times on each instance, scoring each attempt and picking the attempt with the highest score: https://aide.dev/blog/sota-bitter-lesson
Yup, this matches my recommended workflow exactly. Why waste time trying to turn an initially bad answer into a passable one, when you could simply re-generate (possibly with different context)
I wrote up an example of this workflow here: https://github.com/sutt/agro/blob/master/docs/case-studies/a...
This is what LLM "reasoning" does. More than "reasoning" in the human sense, it just reduces variance from variations in the prompt and random next token prediction.
Before AI, we were trying to save money, but through a different technique: Prompting (overseas) humans.
After over a decade of trying that, we learned that had... flaws. So round 2: Prompting (smart) robots.
The job losses? This is just Offshoring 2.0; complete with everyone getting to re-learn the lessons of Offshoring 1.0.
The conclusion I reached was different, though. We learnt how to do outsourcing "properly" pretty quickly after some initial high-profile failures, which is why it has only continued to grow into such a huge industry. This also involved local talent refocusing on higher-value tasks, which is why job losses were limited. Those same lessons and outcomes of outsourcing are very relevant to "bot-sourcing'.
However, I do feel concerned that AI is gaining skill-levels much faster than the rate at which people can upskill themselves.
I think this is a US-centric point of view, and seems (though I hope it's not!) slightly condescending to those of us not in the US.
Software engineering is more than what happens to US-based businesses and their leadership commanding hundreds or thousands of overseas humans. Offshoring in software is certainly a US concern (and to a lesser extent, other nations suffer it), but is NOT a universal problem of software engineering. Software engineering happens in multiple countries, and while the big money is in the US, that's not all there is to it.
Software engineering exists "natively" in countries other than the US, so any problems with it should probably (also) be framed without exclusive reference to the US.
The problems are inherent with outsourcing to a 3rd party and having little oversight. Oversight is, in both cases, way harder than it appears.
There is really a balance between spoon-feeding it the answers vs just abandoning it - the spikiness is really apparent where it can totally nail some things but then utterly fail what should be pretty simple (i.e there was a point where I wanted to refactor two similar functions into a shared library and it just couldn't do it)
An insight I picked up along the way…
> Maybe LLMs mark the point where we join our engineering peers in a world on non-determinism.
Those other forms of engineering have no choice due to the nature of what they are engineering.
Software engineers already have a way to introduce determinism into the systems they build! We’re going backwards!
There’s a beautiful invitation to learn and contribute baked into a world where each command is fully deterministic and spec-ed out.
Yes, there have always been poorly documented black boxes, but I thought the goal was to minimize those.
People don’t understand how much is going to be lost if that goal is abandoned.
The more practical question is though, does that matter? Maybe not.
I think it matters quite a lot.
Specifically for knowledge preservation and education.
If I wanted to plumb together badly documented black boxes, I'd have become an EE.
https://en.wikipedia.org/wiki/Metastability_(electronics)
Ah but you see, imagine the shareholder value we can generate for a quarter or two in the meanwhile!
Editors note: please read that as absolutely dripping with disdain and sarcasm.
especially if you have some saved in the same project and then you just tell it "look at the scripts in x and use that style"
We're introducing chaos monkeys, not just variability.
> Process engineers for example have to account for human error rates. [...] Designing systems to detect these errors (which are highly probabilistic!)
> Likewise even for regular mechanical engineers, there are probabilistic variances in manufacturing tolerances.
I read them as relatively confined, thus probabilistic. When a human pushes the wrong button, an elephant isn't raining from the sky. Same way tolerances are bounded.
When requesting a JSON array from an LLM, it could as well decide this time that JSON is a mythological Greek fighter.
For example, web requests are non-deterministic. They depend, among other things, on the state of the network. They also depend on the load of the machine serving the request.
One way to think about this is: how easy is it for you to produce byte-for-byte deterministic builds of the software you're working on? If it's not trivial there's more non-determinism than is obvious.
How can that be extralopated with LLMs? How does a system independently know that it's arrived at a correct answer within a timeout or not? Has the halting problem been solved?
That's the catch 22 with LLM. You're supposed to be both the asker and the verifier. Which in practice, it's not that great. LLMs will just find the snippets of code that matches somehow and just act on it (It's the "I'm feeling Lucky" button with extra steps)
In traditional programming, coding is a notation too more than anything. You supposed to have a solution before coding, but because of how the human brain works, it's more like a blackboard, aka an helper for thinking. You write what you think is correct, verify your assumptions, then store and forget about all of it when that's true. Once in a while, you revisit the design and make it more elegant (at least you hope you're allowed to).
LLM programming, when first started, was more about a direct english to finished code translation. Now, hope has scaled down and it's more about precise specs to diff proposal. Which frankly does not improve productivity as you can either have a generator that's faster and more precise (less costly too) or you will need to read the same amount of docs to verify everything as you would need to do to code the stuff in the first place (80% of the time spent coding).
So no determinism with LLMs. The input does not have any formal aspects, and the output is randomly determined. And the domain is very large. It is like trying to find a specific grain of sand on a beach while not fully sure it's there. I suspect most people are doing the equivalent of taking a handful of sand and saying that's what they wanted all along.
My intuition for the problem here is that people are fixated on the nondeterminism of the LLM itself, which is of limited importance to the actual problem domain of code generation. The LLM might spit out ancient Egyptian hieroglyphics! It's true! The LLM is completely nondeterministic. But nothing like that is ever going to get merged into `main`.
It's fine if you want to go on about how bad "vibe coding" is, with LLM-callers that don't bother to read LLM output, because they're not competent. But here we're assuming an otherwise competent developer. You can say the vibe coder is the more important phenomenon, but the viber doesn't implicate the halting problem.
SO that's why "it compiles" is worthless in a business settings. Of course it should compile. That's the bare minimum of expectations. And even "it passes the tests" is not that great. That just means you have not mess things up. So review and quality (accountability for both) is paramount, so that the proper stuff get shipped (and fixed swiftly if there was a mistake).
In a business settings, what usually matter is getting something into prod and not have bug reports thrown back. And a maintainable code that is not riddled down with debts.
Compiled code is as basic as steering left and right for a F1 driver. Having the tests pass is like driving the car at slow speed and completing a lap with no other cars around. If you're struggling to do that, then you're still in the beginner phase and not a professional. The real deal is getting a change request from Product and and getting it to Production.
Again: my claim is simply that whatever else is going on, the halting problem doesn't enter into it, because the user in this scenario isn't obligated to prove arbitrary programs. Here, I can solve the halting problem right now: "only accept branchless programs with finite numbers of instructions". Where's my Field Medal? :)
It always feels like the "LLMs are nondeterministic" people are relying on the claim that it's impossible to tell whether an arbitrary program is branchless and finite. Obviously, no, that's not true.
Pretty sure you've just edited to add that part.
I did add the last paragraph of the comment you just responded to (the one immediately above this) about 5 seconds after I submitted it, though. Doesn't change the thread.
This suggests a huge gap in your understanding of LLMs if we are to take this literally.
> LLM programming, when first started, was more about a direct english to finished code translation
There is no direct english to finished code translation. A prompt like "write me a todo app" has infinitely many codes it maps to with different tradeoffs and which appeal to different people. Even if LLMs never made any coding mistakes, there is no function that maps a statement like that to specific pieces of code unless you're making completely arbitrary choices like the axiom of choice.
So we're left with the fact that we have to specify what we want. And at that LLMs do exceptionally well.
It is definitely non-trivial and large organizations spend money to try to make it happen.
It’s non trivial because you have to back through decades of tools and fix all of them to remove non determinism because they weren’t designed with that in mind.
The hardest part was ensuring build environment uniformity but that’s a lot easier with docker and similar tooling.
The engineers at TSMC, Intel, Global Foundries, Samsung, and others have done us an amazing service, and we are throwing all that hard work away.
Was it going backwards when the probabilistic nature of quantum mechanics emerged?
More seriously, this is not a fair comparison. Adding LLM output to your source code is not analogical to quantum physics; it is analogical to letting your 5 years old child transcribe the experimentally measured values without checking and accepting that many of them will be transcribed wrong.
And on the side hand, no transistors without quantum mechanics.
As per typical, Martin is so late to the party that he projects his own lack of understanding of the subject to be the general state of matters.
People are, and have been for a while, using LLMs as very effective accelerators or augmentations of themselves.
Claude Code, is a great example of something that can easily one-shot most trivial-to-average programming tasks.
While some wait for the "bubble to burst", a lot of people are gaining significant benefits from using LLMs to boost their speed in software development.
It's not a good idea to muddy the expectations for AI by mixing AGI and LLMs into the same bucket. LLMs are already useful for many purposes, even if they didn't lead to AGI, whatever the definition is.
A junior engineer can't write code anywhere nearly as fast. It's apples vs oranges. I can have the LLm rewrite the code 10 times until its correct and its much cheaper than hiring an obsequious jr engineer
I just want to point out that this answer implicitly means that, at the very least, the profession is at least questionably uncertain which isn't a good sign for people with a long future orientation such as students.
Students shouldn't ever have become so laser-focused on a single career path anyway, and even worse than that is how colleges have become glorified trade schools in our minds. Students should focus on studying for their classes, getting their electives and extracurriculars in, getting into clubs... Then depending on which circles they end up in, they shape their career that way. The thought that getting a Computer Science major would guarantee students a spot in the tech industry was always ridiculous, because the industry was just never structured that way, it's always been hacker groups, study groups, open source, etc. bringing out the best minds
Recently some people have compared LLMs to compilers and the resulting source code to object code. This is a false analogy, because compilation is (almost always) a semantics preserving transformation. LLMs are given a natural language spec (prompt) that by definition is underspecified. And so they cannot be semantics preserving, as the semantics of their input is ambiguous.
The programmer is left with two options: (1) understanding the resulting code, repairing and rewriting it, or (2) ignoring the code and performing validation by testing.
Both of these approaches are assistive. At least in its current form, AI can only accelerate a programmer, not replace them. Lovable and similar tools rely on very informal testing, which is why they can be used by non-programmers, but they have very little chance of producing robust software of any complexity. I’ve seen people creating working web apps, but I am confident I could find plenty of strange bugs just by testing edge cases or stressing non functional qualities. The bigger issue is the bugs I can’t find because they’re not bugs a human programmer would create.
Option (1) is problematic because LLMs tend not to produce clean code designed to be human readable. A lot of the efforts coders are making is to break down tasks and try to guide the LLMs to produce good code. I have yet to see this work for anything novel and complex. For non trivial systems, reasoning and architecture are required. The hope is that a programmer can write specs well enough that LLMs can “fill in the gaps”. But whether this is a net positive once considering the work involved is still an open question. I’ve yet to see any first hand evidence that there’s a productivity gain here. It’s early days.
Option(2) is also difficult because there is a crucial factor missing in AI coding, the “generality of intent” as a human user. This is a problem, because the non-trivial bugs an AI produces are unlikely to be similar to those from a human. Those bugs are usually a failure of reasoning, but LLMs don’t reason in the same sense that humans do, so testing in the same way may not be possible. Your intuitions for where bugs lie are no longer applicable. The likely result is worse code produced more quickly, and that trade-off needs exploring.
At the moment I think AI is useful for (a) discussions around design, libraries, debugging, (b) autocomplete, (c) agent analysis of existing code where partial answers are ok and false positives acceptable (eg finding some but not all bugs). Agent coding doesn’t seem ready for production to me, not until we have much better tooling to prevent some of these problems, or AI becomes capable of proper reasoning.
Reminds me of a recent experience when I asked CC to implement a feature. It wrote some code that struck me as potentially problematic. When I said, "why did you do X? couldn't that be problematic?" it responded with "correct; that approach is not recommended because of Y; I'll fix it". So then why did it do it in the first place? A human dev might have made the same mistake, but it wouldn't have made the mistake knowing that it was making a mistake.
This is a completely wrong assumption and negates a bunch of the points of the article...
Do you have any hard number and data to state that tab/auto complete are less popular than agentic coding?
It was the original success of LLMs -- code autocomplete. All of the agentic coding have autocomplete under the hood. Autocomplete is almost literally what LLMs do + post-processing.
By now I’m sure it won’t. Even if you provide the expected code verbatim, LLMs might go on a side quest to “improve” something.
Is this actually correct? I don't see any evidence for a "airflight bubble" or a "car bubble" or a "loom bubble" at the technologies' invention. Also the "canal bubble" wasn't about the technology, it was about the speculation on a series of big canals but we had been making canals for a long time. More importantly, even if it was correct, there are plenty of bubbles (if not significantly more) around things that didn't have value or tech that didn't matter.
Does that sound like any human, ever, to you?
(The only time there isn't a bubble is when the thing just isn't that interesting to people and so there's never a big wave of uptake in the first place.)
That's an absurd framing for a cute quip.
Edits--
Found one: https://en.wikipedia.org/wiki/Panic_of_1893
Another good one:https://en.wikipedia.org/wiki/Public_Utility_Holding_Company... (from cake_robot here: https://news.ycombinator.com/item?id=45056621)
For reference, apple and spotify links to the Derek Thompson podcast in reply below (thank you!):
https://podcasts.apple.com/us/podcast/plain-english-with-der...
https://open.spotify.com/show/3fQkNGzE1mBF1VrxVTY0oo
The whole subtext of that podcast was how eerily similar the Transcontinental Railroad was to AI (as an investment/malinvestment/prediction of future trends).
The claim in the blog post that all technology leads to speculative asset bubbles I find hard to believe. Where was the electricity bubble? The steel bubble? The pre-war aviation bubble? (The aviation bubble appeared decades later due to changes in government regulation.)
Is this an AI bubble? I genuinely don't know! There is a lot of real uncertainty about future cash flows. Uncertainty is not the foundation of a bubble.
I knew dot-com was a bubble because you could find evidence, even before it popped. (A famous case: a company held equity in a bubble asset, and that company had a market cap below the equity it held, because the bubble did not extend to second-order investments.)
https://en.wikipedia.org/wiki/Public_Utility_Holding_Company...
To me a bubble reflects a market disconnect from fundamentals - wherein prices go up steeply, with no help from the fundamentals (expected growth in base year cash flows and risk).
Indeed there is a subtle difference between a bubble and speculation of what could come of a technology. But the two are connected because the effects of technology are reflected by investors in asset prices.
avation: likewise there was the "Lindberg Boom" https://en.wikipedia.org/wiki/Lindbergh_Boom which led to overspeculation and the crash of many early aviation companies
I used to avidly read all his stuff, and I remember 20ish years ago he decided to rename Inversion of Control to Dependency Injection. In doing so, and his accompany blog, he showed he didn't actually understand it at a deep level (and hence his poor renaming).
This feels similar. I know what he's trying to say, but he's just wrong. He's trying to say the LLM is hallucinating everything, but Fowler is missing is that Hallucination in LLM terms refers to a very specific negative behavior.
Positive hallucinations are more likely to happen nowadays, thanks to all the effort going into these systems.
If you disagree then I would ask what exactly is the “specific behaviour” you’re talking about?
What didn't he understand properly about inversion of control then.
No, they're like an extremely experienced and knowledgeable senior colleague – who drinks heavily on the job. Overconfident, forgetful, sloppy, easily distracted. But you can hire so many of them, so cheaply, and they don't get mad when you fire them!
They are extremely shallow, even compared to a junior developer. But extremely broad, even compared to the most experienced developer. They type real fuckin fast compared to anyone on earth, but they need to be told what to do much more carefully than anyone on earth.
AI is like a super advanced senior wearing a blindfold. It knows almost everything, it's super fast, and it gets confused pretty quickly about things after you tell it.
I think the programming profession overvalues experience over skill. However, when I was young I had no appreciation for what the benefits of experience are... including not writing terrible code.
I figured there's probably a ton more logical issues and deleted it immediately
have you ever managed an offshore team. holy cow
I asked Claude to design me a UI, and it made a lovely one... but I wanted a web ui. It very happily through away all its work and made a brand new web UI.
I can't imagine any employee being that quick to just move on after something like that.
LLM did help a lot to get some busy work out of the way, but it's difficult to know when you need to jump out of the LLM loop and go old skool.
Unrelated to software but recently I wanted to revive an old dumbphone I haven't used since 2014 and apparently I had it password protected and forgot the password and wanted to factory reset it. I found the exact model of the phone and google had only content farm articles that didn't help me at all but Gemini gave me the perfect solution first try. I went to google first because I had no faith in Gemini since to me it seemed like a pretty obscure question but guess I was wrong.
Individual sentences and paragraphs may mostly work, but it's an edifice built on sand out of poorly constructed bricks plus mortar with the wrong proportions (or entirely wrong ingredients).
LLM output is "truthy" - it looks like it might be true, and sometimes it will even be accurate (see also, "stopped clock") but depending on it is foolish because what's generating the output doesn't actually understand what it's putting out - it's just generating output that looks like the kind of thing you've requested.
They follow directions for maybe an hour and then go off and fix random shit because they forgot what their main task was.
They'll tell you to your face how great your ideas were, and that you're absolutely right about something, then go implement it completely incorrectly.
They add comments to literally everything even when you ask them to stop. But they also ignore said comments sometimes.
Maybe they are kinda like us lol.
Yes, that's why you should add "... and answer in the style of a drunkard" to every prompt. Makes it easier to not forget what you are dealing with.
And if you push back on that insanity, they'll smile and nod and agree with you and in the next sentence, go right back to pushing that nonsense.
Ask it for things that many people get wrong or just do badly, or can be mistakenly likened to a popular thing in a way that produces a wrong result, and it'll often err.
The trick is having an awareness of what correct solutions are prevalent in training data and what the bulk of accessible code used for training probably doesn't have many examples of. And this experience is hard to substitute for.
Juniors therefore use LLMs in a bumbling fashion and are productive either by sheer luck, or because they're more likely to ask for common things and so stay in a lane with the model.
A senior developer who develops a good intuition for when the tool is worth using and when not can be really efficient. Some senior developers however either overestimate or underestimate the tool based on wrong expectations and become really inefficient with them.
“libFakeCall doesn’t exist. Use libRealCall instead of libFakeCall.”
“You’re absolutely correct. I apologize for blah blah blah blah. Here’s the updated code with libRealCall instead. :[…]”
“You just replaced the libFakeCall reference with libRealCall but didn’t update the calls themselves. Re-write it and cite the docs. “
“Sorry about the confusion! I’ve found these calls in the libRealCall docs. Here’s the new code and links to the docs.”
“That’s the same code but with links to the libRealCall docs landing page.”
“You’re absolutely correct.” It appears that these calls belong to another library with that functionality:” looks totally plausible but hallucinated libFakeCall.
For all the hot air I hear about the user having to give the system the proper context to give you good answers - does anyone claim to have a solution for dealing with such a belligerent approach to bullshit?
They are by no means useless, but once they fall into that hole, there's no further value in interrogating them.
And constantly microdosing, sometimes a bit too much.
No, typically __you__ are mad when you fire them ...
But it is nothing like a wetware colleague. It is a machine.
The internet build out left massive amounts of useful infrastructure.
The railroads left us with lots of railroads that fell into disuse and eventually left us with a complete joke of a railway system. Made a few people so rich we started calling them robber barons and talking the gilded age.
Are we going to continue to use the 60 billion dollar data centers in Louisiana when the bubble bursts? Is it valuable infrastructure or just a waste of money that gets written off?
Why not?
Tracks in the middle of nowhere are useless.
Data centers in the middle of nowhere are useful, as long as energy, cooling and uplink are cost-effective.
(Not saying Louisiana is in the middle of nowhere!)
1. Routinely some task or domain of work that some expert claims that LLM’s will able to do, LLM’s start being able to reliably perform that task within 6 months to a year, if they haven’t already
2. Whenever AI gets better, people move the goalposts regarding what “intelligence” counts as
3. Still, LLM’s reveal that there is an element to intelligence that is not orthogonal to the ability to do well on tests or benchmarks
It generated a thousand line file with a robust breakdown of everything that needed to be done and at my command it did it. We went module by module and I made sure that each module Had comprehensive unit test coverage and that the repo built well as we went. After a few hours of back and forth we made 9 modules, 60+ APIs across 10 different tables, and hundreds of unit tests that all are passing.
Does that mean that I’m all done and ready to deploy to prod? Unlikely. But it does mean that I got a ton of boilerplate stuff put into place really quickly and that I’m eight hours into a project that would have taken at least a month before.
Once the BE was done I had it generate extensive documentation for the agent that would handle the FE integration as a sort of instruction guide - in case we need it. As issues and bugs arise during integration (they will!) the model has everything it needs to keep on track and finish the job it set out to do.
What a time to be alive!
Even this post by Martin Fowler shows he's an aging dinosaur stuck in denial.
> I’ve often heard, with decent reason, an LLM compared to a junior colleague. But I find LLMs are quite happy to say “all tests green”, yet when I run them, there are failures. If that was a junior engineer’s behavior, how long would it be before H.R. was involved?
I don't know what "LLM's" he's using but I just simply don't get hallucinations like that with cursor or claude code.
He ends with this: > LLMs create a huge increase in the attack surface of software systems. Simon Willison described the The Lethal Trifecta for AI agents: an agent that combines access to your private data, exposure to untrusted content, and a way to externally communicate (“exfiltration”). That “untrusted content” can come in all sorts of ways, ask it to read a web page, and an attacker can easily put instructions on the website in 1pt white-on-white font to trick the gullible LLM to obtain that private data.
Not sure why he is re-iterating well known prompt injection vulnerability, passing it off as a general weakness of LLM's that applies to all LLM use when that's not the reality.
What's the impressive thing here? No one said AI can't do boilerplate. Especially if you're doing run of the mill crud and starting from a clean slate, in fact, that's probably one of its only useful usages in my opinion. You still need to understand your system when maintaining or adding features to it, do a code review, understand the architecture, ensure its architecture allows for pivots in design/functionality, etc.
Plus, there is no telling how many bugs you have in that code.