AI in Publishing: From IP Fear to Growth Plan

Walk into almost any publisher's strategy meeting over the past two years and the first word out of someone's mouth about artificial intelligence has been a legal one. Scraping. Licensing. Litigation. The conversation tends to start defensive and stay there, which is understandable enough. A publishing house's corpus (its archive, its authors, its back catalogue) is the asset, and watching that asset get hoovered into training data without payment feels like a robbery in progress. But there is a cost to spending all your strategic energy on the locks while the building quietly changes shape around you.

The industry that builds and sells AI has already moved past the question publishers remain stuck on. It is not really arguing about whether to use the technology. It is arguing about how much of it to use, and how to keep the bill from running away. That shift, from anxiety to arithmetic, is one publishing has yet to make in full.

What the spending data is starting to show

Consider Databricks, the data analytics company that has become an unlikely barometer for how businesses actually consume AI. The firm told analysts at its summit in San Francisco in June that annualised revenue had climbed more than 80% year over year to $6.9 billion, up from $5.4 billion in its fiscal fourth quarter, according to CNBC. Its private valuation sits at $134 billion, comfortably ahead of the publicly traded Snowflake's roughly $83 billion market cap. By most measures this is a company winning the AI boom.

And yet its margins are shrinking, which is the part worth dwelling on. Databricks charges by consumption. As its customers deploy fleets of AI agents to clean data and answer questions, those agents generate enormous volumes of queries, which forces Databricks to spend more on the underlying models it pays for. Chief executive Ali Ghodsi was candid that gross margin will keep heading down, though he declined to name the current figure. The agents, he said, simply push consumption up across the board.

Why should any of this matter to a magazine group or a trade-book imprint? Because it reveals the real economics underneath the hype, and those economics are not the ones publishers have been preparing for. The fear in publishing has been about ownership: who controls the words. The operating reality at companies already running AI at scale is about metering: who controls the spend. Ghodsi described a behavioural turn among large customers, away from what he called tokenmaxxing, the early enthusiasm for using as many AI tokens as possible, toward what he termed value-maxxing, squeezing efficiency out of every dollar. Databricks even sells a tool that pings users as they approach their budget ceiling. Companies, in other words, are no longer asking whether AI is powerful. They have decided it is. They are now asking whether it is affordable, and rationing accordingly.
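A budget alert of that kind is a simple mechanism to reason about. What follows is a minimal Python sketch, not Databricks' tool or API: a hypothetical `BudgetMonitor` that accumulates spend against a monthly ceiling and raises one warning each time usage crosses an assumed threshold (80% and 100% here, both illustrative).

```python
from dataclasses import dataclass, field


@dataclass
class BudgetMonitor:
    """Hypothetical spend tracker that warns as usage nears a ceiling.

    Not any vendor's product; the default thresholds are illustrative.
    """
    ceiling: float                    # monthly budget in dollars
    thresholds: tuple = (0.8, 1.0)    # fractions of the ceiling that trigger a warning
    spent: float = 0.0
    _fired: set = field(default_factory=set, init=False, repr=False)

    def record(self, cost: float) -> list:
        """Add one charge and return warnings for any newly crossed thresholds."""
        self.spent += cost
        alerts = []
        for t in self.thresholds:
            # Each threshold warns only once, the first time it is crossed.
            if self.spent >= t * self.ceiling and t not in self._fired:
                self._fired.add(t)
                alerts.append(
                    f"{t:.0%} of budget used: ${self.spent:,.2f} of ${self.ceiling:,.2f}"
                )
        return alerts


monitor = BudgetMonitor(ceiling=1000.0)
print(monitor.record(500.0))  # []
print(monitor.record(350.0))  # ['80% of budget used: $850.00 of $1,000.00']
print(monitor.record(200.0))  # ['100% of budget used: $1,050.00 of $1,000.00']
```

Note that the warning does not stop the spending; it only makes the approach to the ceiling visible, which is the rationing behaviour Ghodsi describes.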

That is the conversation publishing needs to import. Not because the IP fight does not matter (it does, and the licensing deals struck between news organisations and model developers prove there is real money in the archive) but because treating AI purely as a thing to be protected against leaves you with no plan for the half of the equation where the value actually accrues. Ghodsi's customers want frontier models for the hard problems and cheap open-source ones for the dull tasks. They want choice. A publisher that has not worked out which of its own internal jobs deserve the expensive model and which deserve the cheap one will learn the hard way, as Databricks' margin line is learning now, that consumption compounds quietly until it doesn't.

The licensing money is real, but it is not a strategy

Give publishers credit. Several have done well out of the IP anxiety by converting it into cash. The major content-licensing agreements signed with AI developers brought in upfront sums and recurring payments, and for archives that were otherwise sitting in a database earning nothing, that is found money. The reasoning is sound: if the material is going to be used to train models regardless, better to be paid and seated at the table than scraped and ignored.

The trouble is that licensing is a sale of the past. It monetises what has already been written. It does very little to change how the next thing gets made, edited, packaged, translated or sold. A house that signs a fat licensing cheque and then carries on running its newsroom or its editorial pipeline exactly as before has banked the proceeds of its past and missed the structural shift. The Databricks story is a reminder that the recurring economics of AI live on the consumption side, in the doing, not in licensing the done.

There is a quieter risk, too, in leaning too hard on licensing revenue. It positions the publisher as a supplier of raw material to someone else's machine. That is a fine business to be in if you are comfortable being a commodity input. Most publishers, when you press them, are not. They believe the value sits in editorial judgement, in curation, in the relationship with a reader who trusts the masthead. None of that gets stronger by selling the archive and standing back.

Where the growth actually hides

So where is the growth case? In the unglamorous middle of the operation, the parts nobody writes press releases about.

Take metadata and discovery, the deeply tedious work of tagging, categorising and describing content so that it can be found, recommended and resold. Done by hand, it is expensive, and so it is routinely skipped, which is why so much archive material is effectively invisible. This is precisely the kind of mundane task Ghodsi said large companies now want to run on cheap, simple models. A book publisher with thirty years of backlist and no proper metadata is sitting on inventory it cannot even locate. Fixing that is not a moonshot. It is plumbing, and the plumbing is now affordable.

Translation and format adaptation are another. The cost of producing a serviceable draft translation, an audio edition, or a regionally adapted version of a text has dropped to a level that makes long-tail rights exploitation viable where it never was before. The frontier model handles the literary novel that needs a human's careful eye. The open-source model handles the technical manual that just needs to be intelligible in nine markets. Choosing correctly between those two, as Ghodsi's customers are learning, is the whole game.
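That choice can be written down as house policy rather than left to habit. The Python sketch below is hypothetical throughout: the task labels, the two tier names and the per-1,000-word prices are placeholders, not real vendor pricing. Only the shape of the decision comes from the argument above: the literary novel goes to the expensive model, and the technical manual bound for nine markets goes to the cheap one.

```python
from dataclasses import dataclass

# Placeholder prices per 1,000 words processed; not real vendor pricing.
PRICE_PER_1K_WORDS = {"frontier": 0.50, "open_source": 0.02}

# Hypothetical house policy: jobs where nuance matters get the expensive model.
FRONTIER_TASKS = {"literary_translation"}


@dataclass
class Job:
    title: str
    task: str          # hypothetical labels, e.g. "literary_translation"
    words: int
    markets: int = 1   # number of target-language editions


def route(job: Job) -> str:
    """Return the model tier for a job under the policy above."""
    return "frontier" if job.task in FRONTIER_TASKS else "open_source"


def estimate_cost(job: Job) -> float:
    """Rough spend: thousands of words x markets x price of the routed tier."""
    return job.words / 1000 * job.markets * PRICE_PER_1K_WORDS[route(job)]


jobs = [
    Job("Literary novel", "literary_translation", words=100_000),
    Job("Technical manual", "manual_translation", words=40_000, markets=9),
]
for job in jobs:
    print(f"{job.title}: {route(job)}, about ${estimate_cost(job):.2f}")
# Literary novel: frontier, about $50.00
# Technical manual: open_source, about $7.20
```

The point of keeping the policy in one named set is that an editor can argue about it in one place, instead of the decision being scattered across individual scripts.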

Then there is the reader-facing layer, the search-and-ask tools that let someone interrogate a body of content directly. Databricks sells exactly this: products that answer business users' questions from their own corporate data and let developers build custom applications on top. The publishing equivalent (let a subscriber query a magazine's entire archive, or a textbook buyer ask a course's material a direct question) turns a static catalogue into something interactive, and interactivity is what people pay subscriptions for. But, and this is the part the Databricks numbers hammer home, every one of those queries costs money on the back end. Build it without a metering plan and you have built a feature that grows more expensive the more your best customers use it. That is not a hypothetical. It is the literal mechanism eating Databricks' margins right now.
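The arithmetic behind that warning is short enough to write out. This Python sketch uses invented numbers (a $10 monthly subscription and a 2-cent serving cost per archive query, held in integer cents to avoid rounding noise) and a hypothetical monthly query cap standing in for a metering plan.

```python
from typing import Optional

PRICE_CENTS = 1000         # assumed $10.00 monthly subscription
COST_PER_QUERY_CENTS = 2   # assumed back-end cost per archive query


def monthly_margin_cents(queries: int, cap: Optional[int] = None) -> int:
    """Subscription revenue minus serving cost for one subscriber.

    With a cap, queries beyond it are assumed to be refused or billed
    separately, so they add no cost to the subscription itself.
    """
    served = queries if cap is None else min(queries, cap)
    return PRICE_CENTS - served * COST_PER_QUERY_CENTS


for label, queries in [("light reader", 50), ("heavy reader", 1000)]:
    uncapped = monthly_margin_cents(queries)
    capped = monthly_margin_cents(queries, cap=300)
    print(f"{label}: uncapped {uncapped / 100:+.2f}, capped {capped / 100:+.2f}")
# light reader: uncapped +9.00, capped +9.00
# heavy reader: uncapped -10.00, capped +4.00
```

The uncapped column is the Databricks mechanism in miniature: revenue stays flat while cost rises with use, so the most engaged subscriber becomes the least profitable one.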

This is the maturity publishing has to reach: holding the defensive and the offensive in the same hand. Protect the IP, certainly, and charge for it where you can. But also treat AI as an operating cost line that needs a budget, a metering discipline, and a clear-eyed view of which tasks deserve the smart expensive model and which deserve the dumb cheap one. The companies furthest along have stopped asking permission of the technology and started managing it like any other input, with all the boring rigour that implies. Publishing, by and large, has not got there. It is still litigating the front door.

What to watch over the next year

The tell will be in how publishers describe AI internally. As long as it lives in the legal department and the conversation is about exposure, the industry is still playing defence. The moment it shows up in the operations budget with a line item, a usage cap and someone accountable for return on the spend, the conversation will have matured into the one Databricks' customers are already having.

Ghodsi made a point in his CNBC interview that lingers: companies want the choice of which model to use for which job, and they are demanding it loudly. Publishers have spent two years insisting on their right to be paid for their words. The harder, more valuable demand to make now is the one about choice and control over their own use of the technology. Watch which houses start making it. The licensing cheques have, mostly, already been written. The next round of winners will be decided by who learned to run the machine cheaply, not by who fought hardest to keep it out of the building.