AI’s Mounting Legal Woes

While lawyers everywhere are worried about being replaced by Artificial Intelligence (AI), it may be the very existence of the technology that saves their jobs. 

Currently, OpenAI, Microsoft and GitHub are being sued in a class action lawsuit alleging that they violate US copyright law by allowing AI-powered tools like Copilot to use protected content without providing credit or royalties.

Two additional companies responsible for popular AI art generation tools, Midjourney and Stability AI, are also in the crosshairs of a legal case that alleges they infringed on the rights of artists by training their technologies on material, primarily images, from the Internet.

And just a few days before, industry giant and stock image supplier Getty Images took Stability AI to court for reportedly using millions of images from its site without permission to train Stable Diffusion, an art-generating AI.

At issue, mainly, is generative AI’s tendency to replicate images, text and more (including copyrighted content) from the data that is used to train it. In a recent example, an AI tool used by CNET to write explanatory articles was found to have plagiarized articles written by actual humans, articles presumably swept up in its training dataset. Meanwhile, an academic study published last year found that image-generating AI models like OpenAI’s DALL-E 2 and Stable Diffusion can and do replicate aspects of images from their training data. Users also have to pay a fee to use DALL-E 2.

Perhaps the suits materialized in part because of the robust venture funding the new technologies attracted last November, totaling around $1.3 billion. But the legal questions are beginning to affect perceptions, which, in turn, affect business.

Content creators far and wide have complained that the new technologies threaten their income-earning potential. Type in a prompt with the desired elements, like a futuristic space cat or a sweeping landscape from a Dark Ages fantasy, followed by a famous artist’s name, and an image will appear that looks strikingly similar to the works the named artist is famous for.

Legal experts and attorneys agree that the class action suit against Stability AI, Midjourney, and DeviantArt will be challenging to prove in court. Specifically, it seems quite difficult to ascertain which images were used to train the AI systems, given the immense amount of data involved. There is also a fundamental gulf between what constitutes “plagiarism” and what constitutes “inspiration,” since whatever the AI generates is arguably original.

What may tip the scales of justice are the nuanced details of how these state-of-the-art image-generating systems do what they do. For example, Stable Diffusion is what’s known as a “diffusion” model, which learns to create images from human-generated text prompts as it works its way through colossal training datasets. The models are trained to “re-create” images rather than generate them from scratch: they start with incomprehensible noise and refine the image step by step so that it incrementally comes to resemble the text prompt. An absolutely identical creation rarely, if ever, results, and precedent in US courts makes elements like “style” difficult to protect under copyright.

How the ensuing battle is fought may come down to the definitions of two terms: “fair use” and “transformative.”

Fair use should provide well-established protections even if the technology is trained on copyrighted content, which seems impossible to prevent. The doctrine of “fair use” is enshrined in U.S. copyright law and permits limited use of material without first having to obtain permission from the rights holder. See Authors Guild v. Google.

The word “transformative” is a bit more dodgy.  

Bloomberg Law recently outlined in a very informative article that the success of a fair-use defense will depend on whether the works generated by the AI are considered “transformative.” The Supreme Court, in its 2021 Google LLC v. Oracle America, Inc. decision, suggested that using collected data to create new works can be transformative. In that specific case, Google used portions of Java SE code, copyrighted by Oracle, to create its Android operating system, and that use was determined to be fair use due to its “transformative” nature.

Another key aspect of these cases is whether the copyright holders can prove damages.

It would also seem that if AI technology companies move quickly to adopt copyright risk-management frameworks, modeled on something like the AI Risk Management Framework released by the National Institute of Standards and Technology, they would show good faith in attempting to monitor and mitigate copyright issues in the design and use of the AI systems they are creating.

Lawyers are quick to point out that the cost of doing business in the AI sector could become prohibitive for companies other than giants like Google and Microsoft if litigation gets out of hand and stifles the development of these technologies.

Suppressing new technologies to protect archaic profit models is a bit of an American tradition, and these new cases will force that tradition to be refined, for better or worse.