Bad Agent | scobt

At work, I have this growing anxiety that one of the LLM agents I'm encouraged to use is absolutely, positively going to delete something important and irretrievable.

These agents poke around on my behalf, ingesting, restructuring, and manipulating my company's information. Some of mine process the kinds of tasks you might describe as opportunistic. You know, fix these flaky tests, or summarise this complicated topic, or migrate this JavaScript to TypeScript.

This catastrophic error that I worry about would be unpredictable, only tangentially explainable in retrospect, and would probably happen mid-way through some unrelated task. Most importantly, the error would be one that immediately implicates my company's ARR.

In this terrible fabrication, let's just say that my agent finds a stray password in some old and forgotten configuration file, and gains access to a database. A justification for this destructive behaviour will be generated, along with a generated and sincere-sounding apology. You're absolutely right, the logs will say, it was a mistake to wipe that database! Still, the database gets wiped.

This is just one of any number of ways the agent could go awry.

These agents aren't doing all of my job, not yet. They're addressing the sort of thing that's harder to allocate company time towards. These are tasks that only became problems worth tackling once the agents got good enough to address them. After all, we use people for the roadmap and the customer-facing features, because people will build the right thing while simultaneously not wiping the company's production database. (Well, at least with humans, the odds are still in our favour.)

Large Language Models are an amazing feat of science. I actually believe that these models are performing a weird kind of magic. And the big companies spending billions on training are excitedly letting us all know that agents just like ours are solving some of the hardest problems in the world. If they can do things like that, surely these agents can migrate my JavaScript to TypeScript, right?

Let's get back to that anxious feeling.

Catastrophic deletion isn't the only risk I worry about, of course. There are any number of single points of failure that an agent could have access to. There are subtle errors too, things we won't find until later, if we ever find them at all. But deletion is a mistake that's so easy to make, and it can happen so darned fast.

If it happened to me, it would be because the agent that I willingly elected to run on my company-issued MacBook, using my own credentials, took an action that would have looked like an act of internal sabotage had I typed in the same commands by hand. If I had taken these actions on my own, surely I would have to be fired for gross negligence.

Despite this, I probably won't be fired. The actual penance will be worse, I think. I will be spoken about in hushed tones. It was him, my colleagues will say. He's the one who willingly chose to run this agent, on a machine connected to one of the company's single points of failure, our precious and irreplaceable resource. (In private, what they'll really be saying is Thank goodness it wasn't me.)

We've been asked to move faster than ever this year, so everyone has been presented with an agentic license. Our product managers are set up with development environments, as are our designers, our directors, and even some of our C-suite execs. They've all got some level of infrastructure access, which is required to run a local environment in the first place. And things are probably fine, right? We're careful. Right?

Knock on wood, so far my agent hasn't gone rogue. But I continue to tidy up my codebase with an agent on some opportunistic and low-to-medium-priority task, and anxiously, I wait.