Good piece.
I haven't taught creative writing classes like you have, but LLMs have fascinated me since GPT2. Text without a writer feels very Borgesian: a thing that shouldn't happen. My mind almost can't accept it—it still half-heartedly insists that these words were written by a little man somewhere.
Aesthetically, I broadly agree. Deepseek R1 is the best model I have ever used for creative writing. It has:
1) a clean, readable style
2) the occasional good idea (I liked "the way she pressed a palm to her ribs, as if holding herself together"—bestsellery but effective)
3) an overwhelming reliance on cliché. Everything is a shadow, an echo, a whisper, a void, a heartbeat, a pulse, a river, a flower—you see it spinning its Rolodex of 20-30 generic images and selecting one at random.
4) it's careless with words. They seem meaningless: chosen mainly because they're pretty. Yes, it's hard to shovel dirt over an echo. But also, an echo occurs AFTER the event that causes it, not six months before. And how do they know the shadow is watching? Does it have eyes? None of it makes sense. The model trips over its own dick at least six times in two 'grafs.
5) an eyeball-flatteningly fast pace—it moves WAY too fast. Every line of dialog advances the plot. Every description is functional. Nothing is allowed to exist, or to breathe. It's just rush-rush-rush to the finish, like the LLM has a bus to catch. Ironically, this makes the stories incredibly boring. Nothing on the page has any weight or heft. (A quote attributed to Gustav Mahler: "If you think you are boring your audience, go slower, not faster." R1 should listen.)
6) no variety of tone or texture. The way the story begins is the way it ends. Every character sounds the same—either they have the overwritten "funny" tone of a Marvel sidekick, implausibly wisecracking and quipping like professional comedians, or they're blank ciphers saying stuff to advance the plot.
7) repetitive writing. Once you've seen about ten R1 samples you can recognize its style on sight. The way it italicises the last word of a sentence. Its endless "not thing x, but thing y" parallelisms (I'm surprised there are none in your samples, normally it churns out 1-3 per paragraph). The way, if you don't like a story, it's almost pointless reprompting it: you just get the same stuff again, smeared around your plate a bit.
...and R1 is THE BEST THERE IS! At least I can finish its stories. GPT3.5/GPT4's output is hellish torture to read: it makes me wish I could unevolve eyes and obtain gills or a cloaca or some other less painful organ. And nearly every LLM is trained on synthetic ChatGPT data, so get ready to have a mischievous twinkle in your eye and feel a sense of foreboding shiver down your spine as you venture through the Whispering Woods with Elara and friends. Over and over.
I have a suspicion that OpenAI's new model is simply a re-implementation of R1's post-training formula. Deepseek published their methodology: it wouldn't be that hard to rip off (and even scale up, using OA's resources.) Interestingly, when I put sama's prompt into R1, it output nearly the same story—an LLM writing about an LLM writing...
Great comment. I wrote up some related thoughts here, including a sort of mini-taxonomy of "R1-isms" and notes on how the new OpenAI sample exhibits them as well: https://nostalgebraist.tumblr.com/post/778041178124926976/hydrogen-jukeboxes
Thanks, but you kinda made my comment irrelevant: you got way closer to the metal of what's happening than I did. Yes: eyeball kicks. That's a great way to describe it.
I regard writing (mostly) as a way of transmitting the writer's thoughts. Words and sentences are just boxes for ideas: they have little value in themselves. Yeah, I love cool alliterative prose as much as anyone, but mainly as a quality signal for deeper stuff. "This writer has aesthetic taste and technical skill, so maybe it's worth wading through 5000 words or however long they need to make their actual point." Expensive-looking boxes tend to have cool things inside them.
So it's a shock (and a challenge to assumptions) when that isn't true; when a thing possesses strong stylistic skills, but weak/nonexistent ideas.* Try writing out R1's plots in plain language. Without its eyeball kicks, they evaporate to nearly nothing (as does the story written by OA's new model). They are like gold-leafed oleander treasure chests full of dust and packing pellets.
@sama's story: "A woman talks to an LLM, she's sad because her husband died, then she stops talking to the LLM, also this is all a fiction created by the LLM, it acts like this is a shocking reveal even though it told us at the start, the end." Great.
As LLM capabilities increase, it grows harder to tell whether they're actually doing something, or just reward-hacking onto the APPEARANCE of doing it (because most humans can no longer tell the difference, and click the positive feedback button anyway).
R1 is slight progress. R2 will come out soon and might display further progress. But for now, I think it's mainly reward-hacking. It creates phrases ("I am what happens when you try to carve God from the wood of your own hunger") that are so evocative that meaning seems to smoke from them, and you're tricked into ignoring your confusion, and reading them as profound statements. Think about them for a few seconds ("is DeepSeek trying to build God? Is low perplexity at a language modeling task = carving from the wood of your own hunger?") and the illusion shatters. It's just an eyeball kick. It contains no thoughts. Even pre-LLM chatbots like RACTER can math their way to a vivid phrase sometimes.
(* Not as shocking as we might hope: there are humans who master style without substance—they're as creepy as R1, but they do exist. If you heard someone described as "a gifted rhetorician" or "a charismatic speaker" you'd tend to read that as a backhanded compliment.)
Your box analogy is great. To me, this is precisely the unsolvable problem of AI "art".
Stylistic hiccups and the gravity wells of cliché can be iteratively ironed out, perhaps even to the point where there really is no more discernible impression of the AI constantly trying to conceal that it is actually just a Chinese Room. But a piece of writing is more than its content — it is also, crucially, its context: Who is saying this, and to what end? What is a message without a sender's intent? As long as AI's output is prompt-based, the only intent behind its output will ever be to satisfy the prompter.
But art is interesting precisely because of the artist's intent to communicate something for its own sake. AI won't be able to do that until it can act independently of prompts, develop opinions and an intrinsic will to express itself — which may be impossible, simply by virtue of its artificiality: even an AI built to do this would have to be given guiding parameters, and would end up being only a reflection of its creator's intent, not a truly original thinker. And yes, maaayyybe there's an argument that the same goes for humans, but that's a whole 'nother C of Ws.
Anyway, it seems to me AI will only ever be able to express an interpretation of its prompter's intent, which will always be lossy and gravitate toward boilerplate, and thus its output will never be interesting as "art created by non-humans", but only as a sort of curio. However, I do think there is some merit in AI as a tool for writers who might otherwise struggle to put their thoughts into words.
Possible "R1-ism" in the Hemingway story: "[Main clause], [adjective] and [adjective]."
"He drank coffee at a stall, bitter and thick"
"The sun hung low over the sea, white and mean"
"He watched her cross the plaza, her shadow long and narrow"
"They had taken the train down from Bayonne in the dry summer, the fields cracked and ochre"
(The next two feel like part of the same mode-collapse basin, despite being split into two sentences.)
"He paid and walked down to the docks. The water was green and cold."
"He closed his eyes. The room smelled of blood and salt."
Claude advises me that these constructions are called "right-dislocated phrases".
To be fair, I'm not a big Hemingway Knower. Maybe this is legitimately how he wrote. From a quick look at A Farewell to Arms, I found one construction like this ("In the bed of the river there were pebbles and boulders, dry and white in the sun") then nothing for the next dozen pages.
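For what it's worth, the construction is mechanical enough that you could grep for it. A rough sketch (my own heuristic, nothing from Claude or this thread; without part-of-speech tagging it can't confirm the trailing words are actually adjectives, so noun pairs like "docks, ships and cranes" will false-positive):

```python
import re

# Heuristic for "[Main clause], [adjective] and [adjective]." endings:
# a comma followed by exactly two words joined by "and" at the end of a sentence.
PATTERN = re.compile(r',\s+(\w+)\s+and\s+(\w+)[.!?"]*$')

def find_dislocated_pairs(sentences):
    """Return (sentence, word_pair) for sentences matching the construction."""
    hits = []
    for s in sentences:
        m = PATTERN.search(s.strip())
        if m:
            hits.append((s, (m.group(1), m.group(2))))
    return hits

samples = [
    "He drank coffee at a stall, bitter and thick.",
    "The sun hung low over the sea, white and mean.",
    "He paid and walked down to the docks.",  # no comma: should not match
]
for sentence, pair in find_dislocated_pairs(samples):
    print(pair)
```

Run over a few thousand words of genuine Hemingway versus an R1 pastiche, the hit rate per sentence would be a crude but quantifiable way to test the "one per dozen pages" impression above.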
I personally did like it. And I found this combo review of Martha Wells' Murderbot Diaries and the recent OpenAI short story (https://lauraefron.substack.com/p/late-night-thoughts-on-alternate). Interesting to think of the two together.
These are an even shorter example of writing with aboutness: https://x.com/h2ner/status/1860272810167534046
Glib take from a null noob: Higher temp is what you want.
Needless clarification? I'm less than a noob.
I couldn’t agree more. In my experience, LLMs tend to abstract meaning onto observations in a way that’s vapid and reach for profundity that is empty at its core. The writing feels hollow and inanimate. It doesn’t have to be this way (and probably won’t always be this way) but it is striking.
(LLMs also love to overuse participial phrases and right-branching appositives for some reason. It’s bleh.)
Yeah, I feel like a bit of a discourse orphan (though an orphan in good company!) because a lot of people claim categorically that AI art is bankrupt or low quality by its very nature, and a lot of people believe that the tidal wave of automation is mid-takeoff and fiction is already cooked. Whereas I... am cautiously optimistic that we'll get pretty good AI fiction fairly soon? But that we very much don't have it yet!
right — I would agree. AI fiction *can* be something that is quite interesting and maybe even something different from human-crafted writing. (though who knows! maybe it will be good, but bounded) it’s in a weird middle ground right now like you’ve pointed out