

It just emphasizes the importance of tests to me. The example should fail very obviously when you give it even the most basic test data.


If you take data, and effectively do extremely lossy compression on it, there is still a way for that data to theoretically be recovered.
This is extremely wrong, and your entire argument rests on this single sentence’s accuracy, so I’m going to focus on it.
It’s very, very easy to do a lossy compression on some data and wind up with something unrecognizable. Actual lossy compression algorithms are a tight balancing act of trying to get rid of just the right amount of just the right pieces of data so that the result is still satisfactory.
LLMs are designed with no such restriction. And any single entry in a large data set is both theoretically and mathematically unrecoverable. The only way that these large models reproduce anything is due to heavy replication in the data set such that, essentially, enough of the “compressed” data makes it through. There’s a reason why whenever you read about this the examples are very culturally significant.
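If you want to see why that’s true, here’s a minimal sketch (just a toy quantizer in Python, nothing to do with real codecs or model weights) of how a lossy step maps many different inputs onto the same output, which is exactly why a single entry can’t be recovered from the result:

```python
# Minimal illustration of why lossy compression is not invertible.
# "compress" here is just coarse quantization; real codecs and model
# training are far more elaborate, but the many-to-one property is the same.

def compress(values, step=0.5):
    """Quantize each value to the nearest multiple of `step` (lossy)."""
    return tuple(round(v / step) * step for v in values)

originals = [
    (0.11, 0.92, 0.48),
    (0.23, 1.04, 0.51),
    (0.02, 0.97, 0.62),
]

compressed = {compress(v) for v in originals}
print(compressed)  # all three inputs collapse to {(0.0, 1.0, 0.5)}

# Given only (0.0, 1.0, 0.5), there is no way to tell which of the many
# possible originals produced it. That information is simply gone.
```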


I find it best to get the agent into a loop where it can self-verify. Give it a clear set of constraints and requirements, give it the context it needs to understand the space, give it a way to verify that it’s completed its task successfully, and let it go off. Agents may stumble around a bit, but as long as you’ve made the task manageable they’ll self-correct and get there.
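To give a rough idea of what I mean by that loop, here’s a hypothetical sketch; `run_agent_step` is a stand-in for whatever actually drives your agent, and the task text, file paths, and test command are made up for illustration:

```python
import subprocess

# Sketch of a self-verifying agent loop. run_agent_step is whatever drives
# your agent (an IDE agent, a CLI harness, etc.); the task text, file paths,
# and test command below are made-up examples.

TASK = (
    "Constraints: keep the public API of payments.py unchanged.\n"
    "Requirement: add retry-with-backoff to the charge() call.\n"
    "Done when: pytest tests/test_payments.py passes."
)

def run_tests() -> tuple[bool, str]:
    """The deterministic check the agent's work is measured against."""
    proc = subprocess.run(
        ["pytest", "tests/test_payments.py", "-q"],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def solve(run_agent_step, max_attempts: int = 5) -> bool:
    feedback = ""
    for _ in range(max_attempts):
        run_agent_step(TASK, feedback)  # agent proposes and applies changes
        ok, output = run_tests()
        if ok:
            return True                 # verified: the task is actually done
        feedback = output               # failures feed the next attempt
    return False
```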


Are you trying to make a point that agents can’t use MCP based off of a picture of a tweet you saw or something?


Your real problem is that, even with all rights reserved (full copy protection), the law won’t disallow someone from running a statistical analysis on your work.


Again, read and understand the limitations of the study. Just the portion I quoted you alone is enough to show you that you’re leaning way too heavily on conclusions that they don’t even claim to provide evidence for.


Do you think that like nobody has access to AI or something? These guys are the ultimate authorities on AI usage? I won’t claim to be but I am a 15 YOE dev working with AI right now and I’ve found the quality is a lot better with better rules and context.
And, ultimately, I don’t really care if you believe me or not. I’m not here to sell you anything. Don’t use the tools, it doesn’t matter to me. Anybody else who does use them, give my advice a try and see if it helps you.


More to the point, that is exactly what the people in this study were doing.
They don’t really go into a lot of detail about what they were doing. But they have a table on the limitations of the study that would indicate it is not.
We do not provide evidence that: There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.
Back to this:
even if it did it’s not any easier or cheaper than teaching humans to do it.
In my experience, the kinds of information that an AI needs to do its job effectively have a significant overlap with the info humans need when just starting on a project. The biggest problem for onboarding is typically poor or outdated internal documentation. Fix that for your humans and you have it for your LLMs at no extra cost. Use an LLM to convert your docs into rules files and to keep them up to date.
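As a sketch of what that conversion can look like (the prompt, paths, and `call_llm` helper are all hypothetical placeholders, not any particular vendor’s API):

```python
from pathlib import Path

# Hypothetical sketch: distill internal docs into a rules file the agent can
# load. call_llm() is a placeholder; wire it to whatever model client you use.

PROMPT = (
    "Summarize the following internal documentation into a concise rules file "
    "for a coding agent. Keep only architectural constraints, approved "
    "dependencies, and style conventions, as short imperative bullet points."
)

def call_llm(prompt: str, context: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def build_rules(docs_dir: str = "docs", out_file: str = "AGENTS.md") -> None:
    docs = "\n\n".join(
        p.read_text() for p in sorted(Path(docs_dir).glob("**/*.md"))
    )
    Path(out_file).write_text(call_llm(PROMPT, docs))

# Re-run whenever the docs change so the rules file never goes stale.
```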


This lines up with my experience as well, and what you’ve described is very close to how I work with LLM agents. The people bragging about 10x are either blowing smoke or producing garbage. I mean, I guess in some limited contexts I might get 10x out of taking a few seconds to write a prompt vs a couple of minutes of manual hunting and typing. But on the whole, software engineering is about so much more than just coding, and none of those other things have become any less important.
But the people acting like the tech is a useless glorified Markov generator are also out of their mind. There are some real gains to be had by properly using the tech. Especially once you’ve laid the groundwork by properly documenting things like your architecture and dependencies for LLM consumption. I’m not saying this to try to sell anybody on it but I really, truly, can’t imagine that we’re ever going back to the before times. Maybe there’s a bubble burst like the dotcom bubble but, like the internet, agentic coding is here to stay.


This is not really true.
The way you teach an LLM, outside of training your own, is with rules files and MCP tools. Record your architectural constraints, favored dependencies, and style guide information in your rule files and the output you get is going to be vastly improved. Give the agent access to more information with MCP tools and it will make more informed decisions. Update them whenever you run into issues and the vast majority of your repeated problems will be resolved.
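For a concrete sense of what goes in there, here’s a hypothetical excerpt of a rules file; the filename and format depend on your tooling (AGENTS.md, Cursor rules, etc.), and every line below is made up for illustration:

```markdown
# Project rules (illustrative excerpt)

## Architecture
- All database access goes through the repository layer; never query from handlers.
- New services talk over the existing message bus, not ad-hoc HTTP calls.

## Dependencies
- HTTP client: use the in-house wrapper, not raw requests.
- Don't add new third-party dependencies without calling it out in the plan.

## Style
- Follow the existing linter config; don't reformat files you didn't touch.
- Public functions get docstrings; internal helpers only when non-obvious.
```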


It’s worth noting that good IDE integrated agents also have access to these deterministic tools. In my experience, they use them quite often. Even for minor parts of their tasks that I would typically just type out.
The generalized learning is usually just the first step. Coding LLMs typically go through more rounds of specialized training afterwards in order to tune and focus them toward solving those types of problems. Then there’s RAG, MCP, and simulated reasoning, which are technically not training methods but do further improve the relevance of the outputs. There’s a lot of ongoing work in this space still. We haven’t even seen the standard settle yet.
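RAG in particular is simple enough to sketch: embed your documents, retrieve the ones most relevant to the query, and put them in the prompt. No training involved. Here’s a toy version where the “embedding” is just a bag of words standing in for a real embedding model, and the documents are made up:

```python
import math
from collections import Counter

# Toy retrieval-augmented generation: bag-of-words vectors stand in for a
# real embedding model, and the documents are invented examples.

DOCS = [
    "The payments service retries failed charges with exponential backoff.",
    "All database access goes through the repository layer.",
    "The deploy pipeline runs integration tests before promoting a build.",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

query = "How are failed charges handled?"
prompt = "Context:\n" + "\n".join(retrieve(query)) + "\n\nQuestion: " + query
print(prompt)  # the relevant doc rides along in the prompt; no training needed
```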
An AI crawler is both. It extracts useful information from websites using LLMs in order to create higher-quality training data, and crawlers are also used for RAG.
Doesn’t work either
The text you provided translates to:
“But what about typing like this?”. This style of writing involves replacing standard Latin letters with similar-looking characters from other alphabets or adding diacritical marks (accents, tildes, umlauts) available in the Unicode standard.
Yeah, much like with the thorn, LLMs are more than capable of recognizing when they’re being fed Markov gibberish. Try it yourself. I asked one to summarize a bunch of keyboard auto-complete junk.
The provided text appears to be incoherent, resembling a string of predictive text auto-complete suggestions or a corrupted speech-to-text transcription. Because it lacks a logical grammatical structure or a clear narrative, it cannot be summarized in the traditional sense.
I’ve tried the same with posts with the thorn in them and it’ll explain that the person writing the post is being cheeky - and still successfully summarize the information. These aren’t real techniques for LLM poisoning.
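For the diacritics version in particular you don’t even need an LLM to undo it; standard Unicode normalization strips it right out (lookalike letters from other alphabets would need a small substitution table instead, but it’s the same idea):

```python
import unicodedata

# Diacritic "obfuscation" is trivially reversible: decompose each character
# (NFKD) and drop the combining marks. Lookalikes from other alphabets would
# need a small substitution table instead, but it's the same idea.

def strip_diacritics(text: str) -> str:
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_diacritics("Bút whát ábóút týpíng líké thís?"))
# -> But what about typing like this?

# Same spirit for the thorn trick:
print("þis is þe post".replace("þ", "th"))
# -> this is the post
```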


So olo because it’s the middle of color?


Maybe Fallacy is a better word than Paradox? Take a look at any AI-related thread and it’s filled to the brim with people lamenting the coming collapse of software development jobs. You might believe that this is obvious but to many, many people it’s anything but.
This isn’t even a QA-level thing. If you write any tests at all, which is basic software engineering practice, even if you had AI write the tests for you, the error should be very, very obvious. I mean, I guess we could go down the road of “well, what if the engineer doesn’t read the tests?” but at that point the article is less about insidious AI and more about bad engineers. So then just blame bad engineers.