
“I get hung up on the word scraping,” author R.O. Kwon says. “It sounds quite violent.” Last September, when Kwon learned that her first novel, The Incendiaries, was part of the Books3 dataset that some generative AI models had been trained on, she felt violated. She and other authors took to social media, lobbing anger, hurt, and frustration at the tech companies that had secretly “scraped” the Internet for data without consent from or compensation for creators. Kwon’s novels and others were poured into machine learning models, teaching them how to make “new” content based on patterns in the ingested text. (It’s this “generating” that makes generative AI distinct from other types of models that may only identify patterns or make calculations.) The years of work on those books added up: 10 years for one novel, 20 for a memoir, multiplied by the nearly 200,000 books found in the dataset.

“It’s potentially the biggest rip-off in creative history,” says Douglas Preston, a best-selling author and one of the plaintiffs in the class-action lawsuit filed after the initial outrage. In September 2023, 17 authors partnered with the Authors Guild, the oldest and largest professional organization for writers, to file a lawsuit alleging that Microsoft and ChatGPT creator OpenAI violated copyright law by using books to feed their generative AI models. OpenAI and Microsoft, for their part, deny allegations that they infringed any copyrights. The tech companies claim that training their models on copyrighted content is equivalent to a person reading books to improve their own writing. The future of books—and perhaps of creative industries as a whole in the United States—may come down to one judge’s definition of “fair use.” Words and who gets to use them are serious business.

But an ecosystem around text-based generative AI had evolved well before The Atlantic revealed the contents of key datasets. Large language models (LLMs) have been in development since 2017, and OpenAI’s GPT-3, the model that introduced generative AI to the mainstream, hit the world back in 2020. Now, tools, workflows, companies, industry standards, and, of course, grifts are in full operation, already shifting the way some books are written, published, and read. The technology has clicked right into the publishing industry’s recent trend toward efficiency, consolidation, and reader service—and seemingly away from sustainability of human labor. But some believe that generative AI could offer a path forward for writers at a time when it’s harder than ever to make a living through books. It all depends on the meaning of a few words.


The publishing industry has always needed a boogeyman to represent whatever new development threatens the good old way of doing things.

“Barnes & Noble was that for a while because it was a chain and because they had centralized bookselling,” says Boris Kachka, author of Hothouse: The Art of Survival and the Survival of Art at America’s Most Celebrated Publishing House. “Then Amazon became the big bad guy, and Barnes & Noble looked old-school all of a sudden.”


Next it was consolidation with the thwarted Simon & Schuster and Penguin Random House merger. Now generative AI is the monster under the bed. Each character brings genuine challenges to the industry—and some benefits—but they’re all variations of the same villain: efficiency at the expense of competitive choice and artistic risk. Chain bookstores cut out local booksellers, Amazon cut out physical booksellers, mergers cut out publishing options, and now AI threatens to cut out the slow, careful processes of creation and curation that drew so many to publishing in the first place. But what is genuinely different about generative AI, and may determine its unique power in the story of publishing, is just how fast it moves—and how hard the law has to work to catch up.

“This kind of AI that can reproduce content similar to what it ingested poses an existential threat to the writing profession and to the publishing industry—if it’s unchecked,” says Mary Rasenberger, CEO of the Authors Guild. “A lot of people who aren’t in the industry don’t realize how precarious the profession is and how uneven the business is.”

A recent survey released by the Authors Guild found that median author income in 2022 was just $20,000, with only half of that derived from book sales. Rasenberger says 2023 looks even worse. Perhaps what’s most concerning about these numbers, though, is that they haven’t shifted meaningfully with inflation or cost-of-living increases. It’s always been tough to make a living as a writer, but today it is measurably much tougher.

Meanwhile, putting a book out into the world is far easier than it’s ever been. Amazon’s Kindle Direct Publishing platform has supported the growth of a thriving self-publishing industry, where individual writers serve as author, editor, publisher, marketer, and publicist all on their own—with KDP as the distribution channel. In 2023, an estimated 2.5 million books were self-published in the United States alone. Traditionally published books are harder to track, but between 500,000 and 1 million were launched last year—a huge number in its own right. It’s also never been faster or easier for a reader to get a book; with so many to choose from across so many media formats, readers can read what they want, when they want, and how they want.

But as with myriad other products sold online, scams are hiding among the infinite rows of options. Fake books and low-quality knock-offs have long been an issue on Amazon; in 2019, the company issued a statement saying, “Amazon strictly prohibits the sale of counterfeit products. We invest heavily in prevention and take proactive steps to drive counterfeits in our stores to zero.” But the problem persists, and now generative AI may be accelerating it at a mind-boggling rate. In the fall of 2023, Amazon instituted a publishing limit for authors on KDP in response to a rising tide of AI-generated content. Whereas before, authors could publish as many titles to the platform as they wanted, they now have a daily (not monthly or yearly) limit of three. In other words, is Amazon saying three books per day is a normal publishing rate for human authors? (Kwon says she wrote her first book in 10 years and her second, Exhibit, in nine. Even Colleen Hoover, a runaway self-publishing success story, took a whole year to put out her first three books.)


“If you start inundating the market with these AI-generated books, it’s going to be that much harder for publishers to invest in authors,” says Rasenberger. “Particularly if you allow AI to generate books in the style of John Grisham or George R.R. Martin or Elin Hilderbrand and actually steal sales from those authors.”

This is exactly the trend in generative AI: versions of a model tuned to a specific voice. For readers, it might bring relief to finally read Martin’s last Game of Thrones installment (even if he didn’t write it himself) or a new fairy smut story featuring their favorite protagonist. For scammers, it’s the killer feature of a lock-picking tool.

Jane Friedman, an author, industry guru, and founder of the popular publishing industry newsletter The Hot Sheet, already knows firsthand what it looks like to be imitated online. She’s written extensively about her experience encountering AI-generated books published under her name with titles, cover designs, and content eerily similar to her own. When she reached out to Amazon to complain, she says, the company initially said there was nothing it could do, as she hadn’t trademarked her own name. After she made enough noise in national publications, Amazon finally took down the titles. Apart from the new publishing limit, the company still doesn’t have a process in place for fighting the flood.

The scale of AI-generated content is so overwhelming in part because it’s so cheap to create. That’s because no one is paying for one of the highest-quality sources of training data: books. Investors and leaders in AI know that shelling out for this content would break their business; some have acknowledged as much. In a response to the U.S. Copyright Office, the venture capital firm Andreessen Horowitz wrote, “Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.” Documents filed in the Authors Guild lawsuit against OpenAI show that two training sets of books (containing some 50 billion words, by the company’s own count) were deleted prior to the lawsuit. The elephant in the boardroom is that generative AI is booming in large part because some critical raw materials have been free—or, by another definition, stolen. “A lot of us would never have consented to such a thing,” Kwon says of contributing her life’s work to AI training. “And now you’re saying you can’t afford it? That means this isn’t a workable model.”


The point of the lawsuit against OpenAI and Microsoft (and a handful of other similar suits) is to secure compensation for the use of the books already ingested as well as for future contributions—essentially forcing the tech companies to include the cost of labor in their business model. For ongoing tracking and payment, which some AI leaders have claimed would be impossible to manage logistically, the Authors Guild proposes a licensing system modeled after the music licensing systems used by the American Society of Composers, Authors, and Publishers (ASCAP) and Broadcast Music Inc. (BMI). But the details of its execution and funding are still fuzzy, and adding a new compensation structure to an industry in which artists have been struggling against their own technological boogeymen is not a smooth sell.

Tech companies have also been striking deals with news outlets to use articles in AI training, but a partnership like that would be with publishers, not writers themselves. “There are some models to follow, but nobody likes those models,” says literary agent Carly Watters. “What do we actually think is the appropriate amount of money authors should be paid, and who gets to decide? If these written words are creating the future of the Internet, [writers] are going to be paid what, two dollars?”

Watters’s concern is based on real events from just last year: It was reported that recordings from internal meetings at Meta included discussions about paying authors just $10 per book for full licensing rights, or simply buying Simon & Schuster outright for the training data. OpenAI has announced its own forthcoming platform to track content going into its models, called Media Manager, which authors and artists could use to manage their contributions to training and tuning. But OpenAI’s proposal offers an opt-out option rather than opt-in—the data would be used by default. And although the company says it’s committed to working in partnership with creators, its statement about Media Manager did not mention compensation. Plus, the tool won’t be available until 2025, leaving out the huge datasets already utilized to train most AI models—which by now have been used to build other tools and businesses.


Indeed, the generative AI genie has been out of the bottle for a while now, and more than a few companies leveraging the technology have emerged in the writing and publishing space. Some founders see generative AI as a tool for writers to get feedback and find opportunities in an industry that has historically been brutally hard to break into, asking: Is the traditional way of doing things really so great? Judging by the numbers, plenty of writers want to roll the dice.

In 2013, Ali Albazaz started Inkitt as a platform for sharing writing with friends. By 2019, when the company launched Galatea, an app featuring the most popular fiction from Inkitt, the platform had amassed more than 100,000 authors and a million users. Albazaz and his team used old-fashioned AI—the nongenerative kind—to examine user behavior on Inkitt and identify the most engaging content. They then published those novels on Galatea for a wider audience, breaking them down into chapters that readers purchase individually. Industry veteran Jane Friedman (a different Jane Friedman, not the founder of The Hot Sheet quoted above), who helped usher in the audiobook era as president and CEO of HarperCollins from 1997 to 2008, joined Inkitt’s board in 2021 after serving as an advisor. Now brand-name venture capital firms, including Kleiner Perkins, NEA, and Khosla Ventures, have invested over a hundred million dollars in the company that Albazaz describes as “the Disney of the 21st century.”


Albazaz tells me that every three to four weeks, Inkitt has a new hit on its hands. (He defines a hit as a novel that generates a million dollars in sales.) “When you talk to old-school publishers, they’re like, ‘Holy shit, how are you guys doing that?’” he says. Galatea is currently focused on romance and fantasy novels, with plans to expand into the other genres that Inkitt supports. (When I asked about literary fiction, Albazaz told me that because it constitutes such a small percentage of the total fiction market—2 percent in the U.S., according to Circana—literary fiction likely won’t be a priority anytime soon. “We’re running a business here,” he says.) Galatea seems to have riffed on the self-publishing boom and turned the dial to hyperspeed.

With generative AI, Galatea can offer audio versions of existing stories in multiple voices, but perhaps the biggest leap is a planned new feature allowing readers to change published fiction to suit their own tastes. “Readers can hyper-personalize stories now,” Albazaz explains. “AI is writing the new text, and the original author is participating in royalties. So if you take one of our top books, you could change the name of the characters, make it simpler or more complex, or add some plotlines—like fan fiction.” For a long time, this kind of reader service has thrived in the gift economy of Internet fandom, where a fan writes a story adaptation or extension based on their preferences or requests from the community—for free. The authors of the original texts aren’t usually compensated, and in fact, some have taken legal action against fan writers for infringement.

But can these platforms offer authors a stable living? If writers choose to charge the Inkitt community to read their work as part of the “author subscription program,” they can keep any profit. (Monthly subscriptions generally range from $1 to $10.) If an author’s book makes it to Galatea, they can then earn royalties: 6 percent on traffic driven by the company and 70 percent on traffic “driven by the authors” (meaning the author does their own marketing and publicity)—attribution that can be difficult to establish clearly. Anonymized documents provided by Albazaz show that “top” authors make variable amounts, including about $36,000 over three months and even $115,000 in one month. “We believe in AI, but we also believe in the power of all these authors,” says Albazaz. “Everything we do has to be aligned with our mission to help authors. It’s a tough thing to do.”

James Yu and Amit Gupta were writers themselves when they began experimenting with GPT-3 in 2020. (They even named their company, Sudowrite, after the writing group they shared.) While many startups were assessing how generative AI could work in corporate contexts, Yu wondered if it could help writers like him get feedback when they needed it—which can be hard to come by at 2:00 A.M. before a morning deadline or after your accountability buddy has read your tenth draft.

Now Sudowrite has about 15,000 paid users who employ the tool to help them brainstorm, draft, and revise their writing—mainly novels, Yu says. Like Inkitt, Sudowrite has found its customers to be most interested in genre fiction, but Yu says all types (including literary fiction) are represented. Right now, the company uses a wide range of models—over two dozen, including foundational ones, like the latest version of GPT, and open-source models, which are less strict about the racy language that romance writers need. The most requested feature, and one that Yu and his team are trialing right now, is a model precisely tuned to a writer’s voice—not Stephen King’s or John Grisham’s, but a customer’s own. Writers can add their own books in Sudowrite to tune the models for more aligned feedback.

Yu says they are very careful with this data. “We have never trained on any of our writers,” he says. “That work is sacrosanct to us, and if we were ever to do that, we would ask for explicit consent first.” But Sudowrite can only incorporate available models into its product, and the majority of those models are based on the original Internet-scraped datasets—although since GPT-4, OpenAI has been far less transparent about how its models are trained and tuned. (We have no verifiable information on the datasets that GPT-4 and beyond use.) “If there was a model out there that had the total consent [of content creators], we’d jump to use it,” Yu says. (There are a few open-source models trained exclusively on publicly available data and data shared with creator consent, but they aren’t as comprehensive or useful as the more popular ones.)


Currently, works created with generative AI—like Inkitt’s choose-your-own-adventure stories and some novels by Sudowrite customers—exist in an odd legal gray area. “AI-generated” work is not copyrightable, but “AI-assisted” work can be copyrighted to an extent, a boundary nudged forward by a writer with disabilities who uses generative AI as an accessibility tool. The distinction between generated and assisted is hardly black-and-white, and it shifts based on who’s defining the words. Since late last year, Amazon has required authors to disclose if their book is “AI-generated”—although that information is not shared on the site with readers. But Amazon doesn’t require disclosure if a book is “AI-assisted,” which the company defines as leveraging AI-based tools after the initial writing. (A Senate bill that would require public labeling of all AI content was introduced last summer but hasn’t progressed through Congress.) It’s also impossible to track sales of AI-assisted or AI-generated books from traditional publishers because those publishers don’t currently include those attributes in their own data. But Yu predicts that AI assistance will soon become for writers what Photoshop is for visual artists: an accepted tool of the trade.

Traditional publishers have made their own public statements about generative AI’s future; lately, the bigger publishing houses appear to be keeping their options open and their definitions loose. Nihar Malaviya, CEO of Penguin Random House (the biggest publisher in the United States), said he hopes the technology will make it easier to publish more books without hiring more workers. The company’s existing workers are none too happy about it; current Penguin Random House employees recently leaked that it’s using AI internally to boost productivity on marketing copy and other written materials. Meanwhile, HarperCollins announced that it’s partnering with an AI studio to produce audiobook translations. (Neither company responded to Esquire’s requests for comment on their plans.) Simon & Schuster, recently acquired by the private equity firm KKR, is also using AI for marketing copy and looking toward AI-generated audio narration while remaining open-minded about other uses. In the UK, Pan Macmillan has conveyed enthusiasm for AI’s potential to help the organization move faster and produce more books. No company has yet expressed an interest in operationalizing generative AI to write books outright.


Kwon and other authors believe that readers won’t be fooled by generative AI and that AI boosters fundamentally misunderstand how art works and how writing happens. But the most pressing danger to writers isn’t that they’ll be replaced by AI. It’s that the already tight squeeze on authors will get even tighter as more content pours into the marketplace and as their creative work is used for training without payment. With the technology moving so quickly, regulation lagging far behind, and tight-lipped publishers acting in their own interests, authors may need to erect guardrails of their own, particularly because as freelancers they aren’t able to collectively bargain for better terms. “I don’t think authors fully understand the ramifications of generative AI,” says Watters, the literary agent. “I think they are confused and upset, but they don’t know what to do about those feelings.”

According to the Association of American Literary Agents, if a right—like the right to use a writer’s work to train or tune a model—is not explicitly licensed to a publisher, then it’s retained by the author. But is this enough protection? Both the Authors Guild and author Jane Friedman recommend that writers include clauses in their contracts that only allow the use of AI designers, AI audiobook actors, and AI translators with the author’s consent. However, there’s only so much an individual can do within publishing-industry power dynamics. Watters has been pushing AI clauses for her clients since last year, but, she says, “it’s met with resistance almost across the board. Publishers don’t want to have to define AI in a legal document.”


Friedman isn’t surprised. “The bottom line for traditionally published authors is that you can ask your publisher all day long for ‘no AI use’ or ‘no AI ingestion,’” she says, “but they will be reluctant to make promises they don’t want to keep, especially when the future is so unknown, nor do they want to lose a potential competitive advantage or hobble earnings in some way. But I’m sure they will also be fighting these AI companies at the same time and trying to protect their IP for whatever it’s worth in future licensing agreements.”

In May, with encouragement from the Authors Guild and agents, Penguin Random House, Hachette, and Macmillan began offering optional boilerplate language around AI in author contracts, protecting work from being licensed by the publisher or through the publisher to third parties for model training. But in most cases, these clauses are only added by request from an author or agent, and in others, they can be removed during negotiations.

Kwon hasn’t considered requesting extra clauses, although she says she will when a new contract comes along. So will a handful of other writers I spoke with, including Vauhini Vara, an author and journalist who’s written about her own creative experiments with AI tools. Preston, who previously served as Authors Guild president, hasn’t changed his contracts yet either, although he predicts that AI clauses will one day become an industry standard.

The other impact that generative AI may have is on the types of books that make it into the world. After all, the environment in which books are bought and sold can change the kind of work that gets produced. When I asked Albazaz what he meant by the “best” books on Inkitt’s platform, he quickly replied, “Best performing.” Indeed, the tech industry and the world of writers have different definitions of what constitutes a “good” book, as anyone interacting with Amazon has learned. When so many books are available so quickly, the more challenging ones may not get the investment they need to survive—from publishers or readers. In this climate, it may be increasingly difficult for authors to afford to keep going on complex projects. Spending ten years on a single novel could become too expensive for a writer—if it’s not already—even if those words are one day compensated by AI companies. “The books that I fear losing are the kinds that make us think and understand each other,” says Rasenberger, the Authors Guild CEO. “With any more downward pressure on what authors earn, we’re just going to see more leave, and every time talented authors stop writing books—or write a lot less—we potentially lose great books.” Less investment in deep and time-consuming work impacts authors directly, but the downstream effect on readers and the world may be more intangible and insidious.

“There is a real risk of a diminishing of our culture, of what is available to us to read, which is sad to me,” says Vara. “There are all kinds of forces that are making the publishing industry more conservative in what it’s publishing, more reluctant to take bets. [AI] feels like part of the same movement.” This tough moment in publishing is not the first; the generative AI under the bed certainly can’t be blamed for all of the industry’s woes. But it can cast a scary shadow.

What happens next is not up to writers, agents, readers, or even publishers. The judges in the current copyright lawsuits will likely determine if AI companies keep moving forward at a breakneck pace or if there will be a reckoning that cracks the very foundation of their business models. Some, like Preston, believe it’ll be a slam dunk for authors, but many experts aren’t so sure. “Fair use” is an old and nuanced piece of legal writing, and its interpretation in these cases comes down to whether these models “transform” the work they’re trained on or just remix and regurgitate it—and whether those efforts are toward the broader “good” of society. These are not simple definitions. And as writers already understand well, the meaning of a few words can change the whole world.
