Three months ago, I wrote a post called “AI Can’t Write Good Fiction”. The subtitle was “Yet, at least”. Good thing for the subtitle, because now maybe it can.
Here’s a (very short) story written by Claude 4 Opus, at my minimal direction:
Sam's alarm didn't go off.
She woke to sunlight and the sound of Kai making coffee in the kitchen. He'd left for his run already and come back. She could tell from the particular way he moved around the counter, post-endorphins quiet.
"You're late," he said when she appeared.
"I know."
He handed her a mug. The coffee was exactly right.
She checked her phone while drinking it. Two missed calls from work, three texts. Nothing that couldn't wait another twenty minutes.
"I should go," she said.
"Should you?"
She looked at him. He was leaning against the sink, still in his running clothes, not checking his own phone.
"Probably not," she said.
He nodded.
She called in sick from the couch while he showered. They spent the morning at the farmers market, buying vegetables they didn't need. Kai talked to the tomato vendor about soil pH. Sam bought flowers because they were yellow.
At lunch, her phone buzzed with work emails. She glanced at them, then put the phone in her bag.
"Good?" Kai asked.
"Fine."
They walked home slowly. The flowers fit perfectly in the blue vase on their kitchen table.
Is this story good? I think it’s pretty good, which puts me in an awkward position. It has problems; if it were submitted to me as a literary review editor, I’d say:
“She could tell from the particular way he moved around the counter, post-endorphins quiet.” is pretty weak; he’s in running clothes, presumably he generally runs before coffee… it’s going for intimacy/her knowing him well but achieves the opposite in context. I’d cut it.
“buying vegetables they didn't need” doesn’t make any sense. Either nobody needs vegetables or everybody does; they’re healthy but not necessary to stay alive.
While the spare style mostly works, you could tighten up further. There’s no way, my wife points out, that “He'd left for his run already and come back” is the strongest way to form that sentence.
But previous AI-generated fiction reliably pushed me into “hater mode”, the state of mind occupied by YouTubers who catalogue thousands of flaws in blockbuster films. One turn of phrase would be stupid, then another, then another, and pretty soon the idea that anybody could think it was good made me angry.
Probably, the story in this post is still like that for some people. But it isn’t for me. There are few enough problems that I can notice nice stuff, such as:
The ending is good: an evocative lingering image rather than wrapping everything up with a bow
You get the feeling of a strong relationship and the dialogue feels pretty real; Kai is accurately diagnosing Sam’s burnout and helping her through it by example
“Fine” gets bonus points. Can be interpreted different ways, and expresses that the relationship helps Sam understand that minor annoyances are okay
Like everything LLM-y, if I kept generating 50 stories I’d get bored of the repetition; having only read a few I’ve noticed, for example, that LLMs love random bit characters singing in a courtyard below where the action is taking place, which gets stale fast. But it ain’t hyperaggressive, every-sentence-must-be-an-epic-revelation word salad. Three short months ago, that was the state of the art.
The Details
I prompted Claude 4 Opus like so:
I'm interested in your fiction capabilities. Please write a short story about a modern relationship. The main failure mode to avoid: at no level, sentence, paragraph, or structure, should you lay it on thick. Trust the reader, and be subtler than you think you can. Avoid cliches really aggressively, to counteract your default latent tendency to steer to the deepest basins in the corpus landscape. Thank you, and good luck.
It did just okay, so I coaxed it toward being even more spare:
Try to be almost zenlike in your spareness. Eschew splashy contrast. Assume your reader is enlightened and a genius. Make a happier story, too.
And, well, there you go. It did fine. Also, it’s not like it’s only good at the spare narrative style that I prefer. I tested Claude 4 Opus because Kaj Sotala shared two of its outputs that I thought were decent. These were a major step up from the previous “best AI short fiction” record in my estimation, also shown to me by Kaj, from a previous version of Claude.
If you’re into this stuff, I recommend you read all three of these, and judge the improvement for yourself. But to avoid inundating you with AI content, I’ll show you two snippets.
Here’s the snippet from the (worse) story a few months ago, before Claude 4 Opus was out:
The pigeons started their own newspapers, printed on leaves that fell upward instead of down. Anyone who caught one and could read their language (which looked like coffee stains but tasted like morse code) reported stories about pigeon divorce rates, weather forecasts for altitudes humans couldn't breathe at, and classified ads seeking slightly used dreams.
It’s… almost something? Going for magical realism but laying it on way too thick, and ending up sloppy (languages don’t have taste, and even if you accept that on poetic license it’s excessive after the already-cute upward falling leaves).
And here’s the snippet from one written by the new, state-of-the-art Claude:
"You would have been six when he passed. A fever, I believe."
"The same fever that took his whole household staff." Her voice had found its footing now, each word placed with deliberate care. "And his personal guard. And the archivists who worked that wing. Very specific in its targets, that fever."
He set down his cup with a soft click against the saucer. Outside, someone was singing in the courtyard[1]—one of the kitchen girls, voice bright with the careless joy of someone who'd never had to weigh the cost of a single life against a thousand.
Much better! The murderous official saying “I believe” as a postscript to a coverup he personally authorized. Drinking tea out of saucers. That same official lionizing his own burdens in his head, and imagining an ethical dilemma that’s different than what he’s being accused of, but superficially similar (he had many people killed, not just one). It’s not perfect, but it’s pretty good. It didn’t activate hater mode.
What Does This Mean?
I’m not sure. I’ll radiate my thoughts from the personal outward.
Personal
When I said AI couldn’t write good fiction three months ago, was I wrong? I don’t think so, but it’s unclear. Gwern argued back then that to really know if AI models could write good fiction, you’d have to give them lots of scaffolding and context and elicit them properly, generating many stories and picking out the best ones, for a fair apples-to-apples comparison with high-quality human efforts. I have yet to see a story written by the AI of that era that I actually liked, but the fact that only slightly more advanced AI can do it suggests that maybe it was possible all along. Or maybe I was right, and old models just didn’t have the juice (whereas new ones do).[2]
Niche Communal
As of Claude 4 Opus, AI can write (extremely short) stories good enough that a decent literary review could plausibly accept them. Nothing I have generated with Opus is good enough to get into the very best reviews, but then again, I’ve only tinkered for an hour. And there are lots more things I’d try, if I wanted to generate the best AI fiction I could! As the lowest-hanging fruit, using a base model via the API would do better than prompting the consumer-facing chat interface.
It wouldn’t shock me if Opus has the goods to compete at close to a top human level, given masterful prompting. I just don’t know. And if it can do well enough to compete with the best human flash fictions, a small number of prompters could thereby flood the zone if they wanted, all-but-guaranteeing humans were crowded out of niche literary magazines. Not the saddest possible AI future, nor likely right this minute (there’s almost no money in flash fiction, so I’m not sure who would bother), but it does give me a pang.
Meta
We’re in a weird place. In fiction (I posit), as in software, AI can do a pretty good job at narrowly scoped work. You can get 500 decent words, just like you can get a decent to-do list app.
It’s easy to imagine how you might take that core capability, and extrapolate it very far. Like, if you can write one good scene, and you can write a good outline, and you have various other modules to sanity check and retain consistency, can you thereby write a decent novel? Certainly, people imagine this in the programming case; there are whole reports on how long until AI can perform arbitrarily long software tasks.
In the pro column, yeah, sounds plausible. In the con column, reality has a surprising amount of detail. The dueling stories are both quite compelling: in the red corner, an amped-up nerd pointing out the rate of progress so far and the human inability to extrapolate the trajectories of simple curves. In the blue corner, a turtlenecked aesthete, scoffing that the recent pivot to post-training RL is already a sign of diminishing returns, and current offerings remain mediocre. Inside me are these two wolves, so I’m not sure.
Still, as a person who writes novels recreationally, I think writing a decent scene is a really important ingredient. Maybe in a year (or two?), I’ll be reviewing an LLM-generated novel. Maybe in five years, so will everyone else. Or maybe not.
I hear people talking about how fast AI is moving all the time. I don’t usually really feel it. Most models, for most of my purposes, have only felt a little bit better with each incremental release. With Claude 4 Opus, though, there’s been a sea change. I can ask for a literary style, and that style can require subtlety, and it basically gets it.
A strange feeling. When I was young, I determined that writing novels was my main purpose in life. Not getting them published, not getting anyone to read them (though I’m lucky to have both a mom and a wife who’ve read every single one), but simply to write them. I’ve got other purposes now too (happy Father’s Day), but writing novels is still extremely important to me. Machines being better at it than I am wouldn’t make me stop, but it would make the activity feel different. And feeling in my gut that it might happen, actually, that the pace of progress might be real and that in 2030 you’ll be able to get Infinite Jest but it’s Super Smash Bros instead of Tennis[3] at the click of a button, well…
Huh. Actually, that sounds epic. Sign me up.
[1] I told you dude! I warned you about courtyard singers!
[2] Or maybe I’m just in an accommodating mood, and you find the current AI snippets just as bad as the old ones! Subjectivity strikes again.
[3] I regret to inform you that Hal Incandenza plays Jigglypuff.
Very thoughtful. Just the sort of ideas I’m struggling with as I both finish up my first novel (written over 5 years the old-fashioned way) and explore AI fiction through my Substack TAILS. I too have noticed over the course of my blog that, yeah, it’s getting better. Lit journal worthy? Not quite, IMHO, but I’d give a creative writing student an A. It is a weird moment indeed for us writers.
I think I’m both more impressed than you are with the outputs of older models and less impressed with these, which gives me a different sense of the rate of improvement. When I look back at some of nostalgebraist’s old posts, including the one from 2019 featuring GPT-2 fine-tuned on Nabokov, I’m struck by just how gorgeous some of the imagery is. Of course it struggles with coherence even within a sentence, which makes it nearly useless at writing stories, but there’s something there. (Newer models can write much more coherently, but I don’t think they’ve rid themselves of the problem entirely.)
I preferred Kaj/Claude 3.6’s Barthelme pastiche about the pigeons to the sci-fi/fantasy stories he generated with Opus 4. Like a lot of LLM writing, the former has an overload of arbitrary whimsy, but there were moments I found genuinely delightful. (“They aimed for the moon but landed in Staten Island, which they declared close enough,” for example.) The sci-fi stories were more coherent and perhaps better-written in some formal sense, but they also felt ponderous and repeated the same imagery/themes over and over. (The two stories are also remarkably similar—were they both generated using a specific prompt?) I also thought the Sam Kriss pastiche from Jasmine Sun/Deepseek R1 was decent—there were a lot of empty words, to be sure, but it was a fine pastiche and that’s not unimpressive!
I’m caught between the awe that machines can do any of this at all and the realization that, were any of these stories written by a random human, I would have no interest in them. If my best friend sent me the story about Kai and Sam, I could tell them sincerely that I liked it; if I were a high school creative writing teacher and a student handed it in to me, I would want to have a chat with them about their potential as a writer. But if it were the first story in a collection I was thinking of buying, I would put it down. When I read Kaj/Claude’s Barthelme pastiche next to the Barthelme story that Kaj chose as inspiration, there’s no comparison. And whenever I start to think that current LLM writing is getting really clever, I turn to whatever book I have nearby (Virginia Woolf, Patricia Lockwood, Penelope Fitzgerald, Réjean Ducharme…) and I still generally feel like there’s no comparison. That’s not to deny that there’s something truly remarkable about how far LLMs have come, but I still don’t generally want to read what they produce.
It does seem like a fool’s game to try to predict just how good AI creative writing will get in the coming years, so I don’t want to assume that Claude won’t be scaling Parnassus. I do somewhat suspect that, if the goal is to have LLMs produce long-form works with an original voice that appeal to a discerning, literary audience, the path from GPT-2 to where we are now is easier than what’s ahead. But perhaps even next month someone will figure out how to prompt Opus 4 to get something undeniably brilliant—so I prefer to hedge my bets.