Copyright and fair use are under attack as AI businesses look to squeeze as much juice as they can from a currently forbidden fruit. It turns out all Adam & Eve needed was a better lawyer or lobbyist.
When Microsoft CEO Satya Nadella mixes his broccoli and guava juice in the morning - which in my mind he does to prime himself for a tough meeting with Microsoft's Chief Licensing Compliance Officer to pore over the latest fines dished out for unlicensed Microsoft product usage - does he regard the smoothie-making process as being as important as the ingredients used, or less important?
Or not important at all? Are the broccoli and guava even relevant to Nadella? Could they instead be anything blended with anything to his own particular recipe? Perhaps it's the decision to conjure a new recipe literally from thin air that kicks him into gear best each day. He is a CEO after all, and decisions are his bread and/or butter.
Or, as an ex-engineer, maybe what really does the job to wake him up is getting to use the maximum wattage Full Power Mode in his blender (at the risk of waking anyone nearby), or the gentler rhythmic pulsing mode - for a less smooth smoothie, but he gets to feel like a DJ for a few seconds?
What if Nadella is making it for someone else? Do those people need to know the ingredients? And did he buy the broccoli and guava, or perhaps grow them himself? Perhaps they were a gift, and he intends to repay it in kind at some point in the future.
I don't for one second believe he helped himself to the produce from the shelves or fields and ran off without paying. That would be awful, especially if he was selling the smoothie.
Transformed beyond recognition
One thing is certain. Whatever he ends up making will not look like broccoli, or guava, any more. They are transformed beyond recognition - but probably not beyond flavour or nutrition, or ownership, despite the best efforts of the machine to destroy their previous form and turn them into a non-vibrant goop.
Why are we musing on the philosophical underpinnings of the Microsoft CEO's morning smoothie? Well, it's because Nadella is the latest boss of an AI firm to question the idea of copyright. In an interview this week with The Times, Nadella took the opportunity to promote the idea of copyright laws being re-weighted away from creators and owners and towards businesses who wish to freely harvest data to train their LLMs. Businesses such as Microsoft, and its AI wunderkind OpenAI.
"What’s copyright?" Nadella asked of The Times. "If everything is just copyright then I shouldn’t be reading textbooks and learning because that would be copyright infringement."
Which is, if you don't mind me saying from the cheap seats, a rather stupid argument. Maybe it sounded good in the canteen to some interns.
What Nadella is holding up as an example relies on the human endeavour of learning, and the laudable idea of progress based on such learning, to make us believe that unless we all labour to generate knowledge for use in AIs and LLMs, then human progress will stall. He did all this without reference to Microsoft's share price, which I believe to be an important factor in his views.
I'm put in mind of the Communism meme popular a few seasons ago, in which some poor fool would declare something theirs, such as a cake, to be told by a giant Soviet bunny that it was in fact "our cake". It's all Microsoft's cake, and it's obtained by Text and Data Mining (TDM).
Copyright isn't theft
Such matters of copyright are of pressing importance right now. This week also saw the CEO of AI-search startup Perplexity say he wants to strike revenue deals with publishers for using their content, after the parent company of the Wall Street Journal and the New York Post, Dow Jones, began legal proceedings against his business for "allegedly misappropriating their content". Staring down the barrel of a well-funded legal action does wonders for clarifying the mind.
Yet many creators of original work have no such legal artillery at their fingertips, and rely more on existing national legislation to protect their endeavours.
In the UK, the previously settled position was that TDM is appropriate only for "non-commercial activities", meaning research. But that is under government scrutiny, as reported by the law firm A&O Shearman, with the current thinking being "that the government is set to consult on a new TDM scheme that will cover commercial activities but will allow content owners 'to opt out' i.e. expressly reserve their rights in certain works".
Opting out puts all the onus on the creators and publishers of original content to marshal anyone "reading" it, and to engage with each one separately if they spot a bot or methodical scanning they wish to reject. FYI, setting up a new bot is not difficult, while discovering the correspondence details of a bot is considerably harder than discovering the address of a minister to send them an invite to an event.
It's a bit like opting out of being burgled by putting a notice on your front door that is addressed to the burglar by name, which is legally ignorable by anyone else.
Given the observed behaviour of companies looking to TDM all they can, making the default position that a publisher needs to opt in to share text and data is a far less dangerous position than a default free-for-all.
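To see how lopsided that onus is in practice: the nearest thing a publisher has to an opt-out notice today is a robots.txt file that must name each crawler's user agent individually - for example OpenAI's GPTBot or Common Crawl's CCBot - and which compliant crawlers honour voluntarily, with nothing compelling anyone else to. A minimal sketch (user-agent names are real as published by those operators, though subject to change):

```text
# robots.txt - asks named AI crawlers not to fetch this site.
# Only effective against bots that identify themselves and choose to comply.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else may crawl as normal.
User-agent: *
Allow: /
```

Which rather proves the point: the publisher must know every bot's name in advance, and a new or renamed bot simply isn't on the list.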
Unfair use
If TDM comes under "fair use" - reminder again that this is a US term, which does not travel well - why have so many AI businesses signed deals for content use?
As US researcher Suchir Balaji outlines in his excellent summary of the current situation, "given the existence of a data licensing market, training on copyrighted data without a similar licensing agreement is also a type of market harm, because it deprives the copyright holder of a source of revenue".
The fact is, this is a commercial situation, not a technological one. If we are to value the output of LLMs, then we want the data they are trained on to be good data. The good data they need is the result of human effort, and that effort must be rewarded.
The Fear of Missing Out card is being played hard by the AI sector, causing some within to ignore a basic commercial rule that they themselves will not forget when it comes to sending out invoices: you don't give anything away unless there's some benefit to you.
What is accepted as the first true copyright law, the Statute of Anne of 1709, started with the words: "Whereas Printers, Booksellers, and other Persons, have of late frequently taken the Liberty of Printing ... Books, and other Writings, without the Consent of the Authors ... to their very great Detriment, and too often to the Ruin of them and their Families."
It wouldn't be that bad if we could all agree that lots of things LLMs had come up with were good and original ideas. I've seen lots of bad ones, but even they weren't new per se.
When that starts, maybe I'll shut up and drink my unspecific transformed ingredient smoothie.
How does Glide Publishing Platform work for you?
No matter where you are on your CMS journey, we're here to help. Want more info or to see Glide Publishing Platform in action? We got you.
Book a demo