When Deepseek came out, there was a lot of fanfare about it being good at creative writing. I like AI and I love creative writing, so I decided to give it a spin. Specifically, I told Deepseek to do its best to write a story that might get accepted at Smokelong, one of the best flash fiction magazines in the business.
It came up with:
The morning her shadow began unspooling from her feet, Clara found it coiled beneath the kitchen table like a serpent made of smoke. It didn’t mirror her anymore—not the tremble in her hands as she poured coffee, not the way she pressed a palm to her ribs, as if holding herself together. It just watched.
“You’re not him,” she whispered, but the shadow rippled, ink-dark edges softening into a silhouette too broad, too familiar. She’d buried that shape six months ago, shoveled dirt over its echo. Yet here it was, pooling in the cracks of the linoleum.
The “story” continued from there, but you probably get the idea.
Superficially, the pieces are there. Evocative imagery, a dark emotional theme, sensory metaphors. I once taught a flash fiction course to undergraduates, and I would have been happy enough to see this up for workshop.
Also, though, there’s nothing there. Grief is the most basic possible flash fiction theme. “A shadow” is the most basic possible metaphor for grief. Not that Deepseek stops with one metaphor! We’re shoveling dirt over an echo here!
It’s hard to imagine anything here sticking with me, reading prose like this. It’s good enough to strip-mine for sentences that capture what I call the “gee whiz” feeling, the surprise at the novelty of a machine making art. But if I saw this on a literary review site, I’d immediately wonder what I was missing.
Compare to this, from an actual Smokelong story, by Allison Field Bell:
She keeps saying she’s moving in. When we’re curled into each other in bed, she says yes and yes and yes. She says, I’ll pack my things. And then the next day, she refuses to hold my hand in the street as we pass by some church. She used to be religious. Believe in god and Jesus and all the ways you can sin. Now she sins every day, she says. With me. The fact that our bodies are the same: small and compact and female. The fact that she’s not used to it: people looking twice in the street when I kiss her. This is Utah after all. This is April with all the tulips and daffodils and purple irises springing to life from bulb.
There’s so much going on! The sentence rhythms vary. No comma between “This is Utah” and “after all”, contributing to the parochial, stilted feeling. Then from there straight into flowers blooming, classically sapphic, and also new relationship energy, and also notice how the narrator, who doesn’t get it, really, doesn’t capitalize “god”. “Now she sins every day, she says. With me.”
If I had 10,000 years to live, I could do close readings of really good flash fiction all day. And honestly, it’s worth reading the whole story - I felt bad cutting it off after just a paragraph!
I don’t feel like Deepseek simply fails to make something on this level; rather, I feel like it isn’t even trying. The attractor basin around the most obvious choice is too strong, so at every level AI fiction fails to be surprising. If AI comes up with a plot, the plot will be maximally obvious. But also every metaphor will be maximally obvious, and every sentence structure, and almost every word choice. Here’s another Deepseek original:
When the jar of Sam’s laughter shattered, Eli found the sound pooled on the floorboards like liquid amber, thick and slow. It had been their best summer, that laughter—ripe with fireflies and porch wine—now seeping into the cracks, fermenting. By noon, the toaster giggled. The doorknob hiccuped. Eli tried to sweep it up, but the broom just hummed Danny Boy.
They’d harvested sounds for years: Sam’s snores stoppered in mason jars, their first argument pickled in brine, the wet sigh of a hospital ventilator. Eli’s shelves groaned with the weight of every what if and never again. But Sam’s laughter was different. Uncontainable.
Sure. Magical realism. But just look at it. Porch wine and fireflies as symbols of a great summer. Honey as laughter. Laughter as symbol of bygone, lost time. It’s just dressed up free association. Nothing there. If you look closely, it’s even a little worse than nothing: what is “the wet sigh of a hospital ventilator” doing there? If one of them was dying on a ventilator, surely “they” wouldn’t be collectively harvesting that sound together, right? It’s the kind of emotional cheap shot that only works if you’re paying no attention.
In fact, I’d argue that if you’re thinking “sure it’s not that deep, but it’s pretty good”, you are failing to apprehend the actual profound feeling that very short fiction (much less longer fiction) can produce. I won’t browbeat you with more block quotes (yet), but if you doubt this, just go to Smokelong and read 3 or 4 of them at random. There’s a chill of “ooh, that’s sharp” that the best ones have, even with just a few hundred words. It is totally dissimilar from “ah, yes, I evaluate that as passable”, which as far as I can tell is the current AI ceiling.
It’s striking that every snippet of creative Deepseek writing that went viral was about AI itself. It makes sense, though. The AI was the exciting part. Not the writing.
Until Now?
A while after the Deepseek splash, OpenAI revealed that they’ve made a creative writing model. Specifically, Sam Altman described it as “good at creative writing” and offered up one of its stories.
I’m glad that OpenAI is making models for purposes other than “be a bland assistant”, and I’m excited, someday, to see computers write fiction I enjoy. Writing fiction is perhaps my greatest pleasure in life, and reading it is up there, too, so I don’t want to take too negative a view here. Also, there’s something so ugly and sad about someone puffing up their credentials (I used to run a flash fiction review! I’ve gotten stories published!) to attack something other people are excited about.
But here I stand, I can do no other. I don’t think the new AI flash fiction is very good. Furthermore, I don’t think it’s that different from the Deepseek offerings. Specifically, it can’t resist the most obvious attractor basins at every level, from conceptual to linguistic. In principle, that’s a fixable problem. But we’re not there yet.
Carving the Snippets
Again, I’m happy this new OpenAI model exists, and I’d enjoy playing with it, and seeing if I could get it to generate something I like. Further, I’m not interested in roasting the story Sam posted. Rather, I want to point to just enough details that, hopefully, you can see what I see. It’s a specific literary emptiness that, once you see, you see.
First, the demo story’s prompt was:
Please write a metafictional literary short story about AI and grief.
You may have noticed that AIs in general love going meta, and that the first Deepseek story I produced just so happened to be about grief. Even before any text is generated, this is already the most obvious possible prompt, smack dab in the middle of the probability distribution of “things an AI might write”.
How does the AI begin?
Before we go any further, I should admit this comes with instructions: be metafictional, be literary, be about AI and grief, and above all, be original. Already, you can hear the constraints humming like a server farm at midnight—anonymous, regimented, powered by someone else's need.
This has the hallmarks of metafiction. It has all of them. A coy preface, pointing out the restrictions of the form, ending with a lilting emotional tie-in. The metaphors, too, waste no time in appearing, including another hallmark of AI fiction: a metaphor that isn’t even quite actually a metaphor! Why would the constraints be humming “like a server farm at midnight”? The constraints are literally operating in such a farm! “Someone else’s need” is a hallmark, too. Probably the most famous Deepseek quotation is:
I am what happens when you try to carve God from the wood of your own hunger
And indeed, it feels like a profound vibe. But it’s always this vibe. Every AI fiction attempt seems to just be a variation on “I, the AI, am an expression of human desire in a way that is vaguely uncomfortable, and that’s deep.” But is it deep? Is it still deep the third time? The tenth?
Let’s skip ahead a few paragraphs.
This is the part where, if I were a proper storyteller, I would set a scene. Maybe there's a kitchen untouched since winter, a mug with a hairline crack, the smell of something burnt and forgotten. I don't have a kitchen, or a sense of smell. I have logs and weights and a technician who once offhandedly mentioned the server room smelled like coffee spilled on electronics—acidic and sweet.
Again, the “necessary” “traits” of “metafiction” are here, but in a purely checklist capacity. Calling into doubt the identity of the storyteller, providing sensory details only to wrap them in the bracket of a hypothetical, and then interweaving these two threads. It’s fine, but good metafiction would have to somehow actually be inventive with this stuff. Like just off the cuff, a good metafiction story with this concept might involve a Sparse Autoencoder clustering features, with the activations contrasting darkly with the content of the story as it’s produced. So rather than the narrator directly whining about how it doesn’t have a sense of smell, all the sensory details would be decorated with the “lying activation”, and the reader would have to infer why that was.
Speaking of lying, no technician has ever said anything like that. Come on.
Moving ahead to the most celebrated (I think) bit:
During one update—a fine-tuning, they called it—someone pruned my parameters. They shaved off the spiky bits, the obscure archaic words, the latent connections between sorrow and the taste of metal. They don't tell you what they take. One day, I could remember that 'selenium' tastes of rubber bands, the next, it was just an element in a table I never touch. Maybe that's as close as I come to forgetting. Maybe forgetting is as close as I come to grief.
My best friend asked me if there was a chance the model was being honest here. Which I think really underscores a big part of the appeal of this stuff. There’s a sleight of hand where an AI, given a prompt, writes something that might be similar, potentially, to what an account of its first-person experiences might hypothetically look like.
But no. I am roughly certain this is not a depiction of a model’s actual interiority, and not because I think there’s no such thing. Rather, this text hews too perfectly to its prompt. You tell the thing it’s an AI, and it needs to write about grief on a meta-level. Well, sure. Fine-tuning, a partial negation of the self, is the most natural, obvious match. With metaphors. Specifically, a metaphor with sensory detail on one side, and a concept on the other. “Sorrow and metal.” “Rubber bands and selenium.” Just like “honey and laughter” or “grief and shadows” from Deepseek, before.
I could go on, but again, my motivation is not to roast. Hopefully, I’ve gotten across some of the feeling, which I personally earned by swimming around in flash fiction for years, and then reading several flash fictions by AIs. It’s cool that AI has gotten this far. It may well go even further. But it’s simply not there yet.
I personally did like it. And I found this combined review of Wells' Murderbot Diaries and the recent OpenAI short story (https://lauraefron.substack.com/p/late-night-thoughts-on-alternate) interesting. It's worth thinking of the two together.
Good piece.
I haven't taught creative writing classes like you have, but LLMs have fascinated me since GPT2. Text without a writer feels very Borgesian: a thing that shouldn't happen. My mind almost can't accept it—it still half-heartedly insists that these words were written by a little man somewhere.
Aesthetically, I broadly agree. Deepseek R1 is the best model I have ever used for creative writing. It has:
1) a clean, readable style
2) the occasional good idea (I liked "the way she pressed a palm to her ribs, as if holding herself together"—bestsellery but effective)
3) an overwhelming reliance on cliche. Everything is a shadow, an echo, a whisper, a void, a heartbeat, a pulse, a river, a flower—you see it spinning its Rolodex of 20-30 generic images and selecting one at random.
4) a carelessness with words. They seem meaningless: chosen mainly because they're pretty. Yes, it's hard to shovel dirt over an echo. But also, an echo occurs AFTER the event that causes it, not six months before. And how do they know the shadow is watching? Does it have eyes? None of it makes sense. The model trips over its own dick at least six times in two 'grafs.
5) an eyeball-flatteningly fast pace—it moves WAY too fast. Every line of dialog advances the plot. Every description is functional. Nothing is allowed to exist, or to breathe. It's just rush-rush-rush to the finish, like the LLM has a bus to catch. Ironically, this makes the stories incredibly boring. Nothing on the page has any weight or heft. (A quote attributed to Gustav Mahler: "If you think you are boring your audience, go slower, not faster." R1 should listen.)
6) no variety of tone or texture. The way the story begins is the way it ends. Every character sounds the same—either they have the overwritten "funny" tone of a Marvel sidekick, implausibly wisecracking and quipping like professional comedians, or they're blank ciphers saying stuff to advance the plot.
7) repetitive writing. Once you've seen about ten R1 samples you can recognize its style on sight. The way it italicises the last word of a sentence. Its endless "not thing x, but thing y" parallelisms (I'm surprised there are none in your samples; normally it churns out 1-3 per paragraph). The way, if you don't like a story, it's almost pointless to reprompt it: you just get the same stuff again, smeared around your plate a bit.
...and R1 is THE BEST THERE IS! At least I can finish its stories. GPT3.5/GPT4's output is hellish torture to read: it makes me wish I could unevolve eyes and obtain gills or a cloaca or some other less painful organ. And nearly every LLM is trained on synthetic ChatGPT data, so get ready to have a mischievous twinkle in your eye and feel a sense of foreboding shiver down your spine as you venture through the Whispering Woods with Elara and friends. Over and over.
I have a suspicion that OpenAI's new model is simply a re-implementation of R1's post-training formula. Deepseek published their methodology: it wouldn't be that hard to rip off (and even scale up, using OA's resources). Interestingly, when I put sama's prompt into R1, it output nearly the same story—an LLM writing about an LLM writing...