A recent landmark ruling by a California court has delivered legal clarity on a defining question in the era of generative AI: can large language models (LLMs) be trained on copyrighted works without the rights holder’s consent?

In Bartz v Anthropic, a United States federal judge in the Northern District of California held that training an AI model on lawfully acquired copyrighted books constituted “fair use” under US copyright law.
For jurisdictions like South Africa, where the copyright law regime remains under review, the judgment provides timely comparative insight. With the Copyright Amendment Bill (B13-2017) close to finalisation, this case offers a valuable reference point for lawmakers, practitioners and technology businesses navigating the uncertain interface between innovation and intellectual property.
Understanding Bartz
The plaintiffs in Bartz were a group of authors who alleged that Anthropic, the developer of the Claude large language model (LLM), had infringed their copyrights by using their books as part of its AI training data.
According to the complaint, Anthropic had sourced books in two ways: first, by downloading millions of titles from pirate websites; and second, by lawfully acquiring physical copies of copyrighted books, scanning and storing them digitally in a central library for training purposes.
The court drew a key distinction between these two data sources. It held that works obtained through piracy could not qualify for fair use protection, but the use of lawfully acquired books (even if scanned and digitised) could be permissible, provided the use met the standard of “transformative” use under US fair use doctrine.
The court reasoned that:
“The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”
The court also rejected the authors’ argument that AI training would result in a flood of infringing outputs that compete with their works.
It likened the concern to a broader fear of technological progress, stating:
“Authors contend generically that training LLMs will result in an explosion of works competing with their works… but Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works.”
What the court did not decide
The plaintiffs did not allege that the LLM’s outputs infringed their copyrights, so the question of output liability was not before the court. The ruling is therefore limited to the legality of using copyrighted materials as training data — not to what the model ultimately produces.
While the court was willing to accept that the training process, in isolation, was fair use, it left open the possibility that model outputs, depending on their content, could give rise to infringement claims in future litigation.
South African law
Under South Africa’s current Copyright Act No. 98 of 1978, the answer would almost certainly be different. SA law follows a closed-list “fair dealing” approach, which allows limited, purpose-specific exceptions, such as for private study, criticism, review, or news reporting.
Training an AI system on copyrighted materials is unlikely to qualify under any of these exceptions, meaning that such training would, in most cases, infringe the copyright in those materials.
However, the Copyright Amendment Bill, in its current form, proposes the introduction of a US-style fair use clause. The proposed section 12A would allow unauthorised uses of copyrighted works, provided they are fair, based on a four-factor analysis: the purpose and character of the use, the nature of the work, the amount used, and the effect on the market.
The framing of fair use under the Bill is largely aligned with the US fair use definition, providing a far broader and more flexible allowance for justifiable use of copyrighted materials outside of a formal licence.
If adopted, the fair use clause could allow South African AI developers and startups to train models on copyrighted datasets in circumstances similar to Anthropic, particularly where the use is transformative and does not compete directly with the original works.
Implications for tech developers
While the Bartz ruling is not binding in SA, it is highly relevant for local companies exploring generative AI. The increasing availability of powerful open-source LLMs, such as Meta’s LLaMA and Mistral, is levelling the global playing field.
This presents a unique opportunity for SA businesses, including startups, to fine-tune or build upon these models using locally relevant data.
However, doing so raises complex copyright considerations, particularly around the legality of training data.
The Bartz judgment, and the potential adoption of a fair use provision under SA’s Copyright Amendment Bill, could pave the way for local developers to responsibly train and deploy AI systems.
After the Bartz ruling, three key considerations emerge for SA technology companies:
1. Lawful data sourcing
Under the current Copyright Act, even the use of lawfully acquired copyrighted content for AI training may fall outside the scope of permissible “fair dealing” exceptions.
Until local courts confirm a fair use defence in similar circumstances, developers should avoid broad or indiscriminate web scraping, particularly where the copyright status of material is ambiguous or the content may have been uploaded without permission.
As a general rule, AI developers should focus on building training datasets using:
- content in the public domain (e.g. government publications, works where copyright has expired);
- content that is openly licensed (such as via Creative Commons or public datasets);
- non-copyrightable information (like data points, mathematical formulae, or legal citations); and
- proprietary content only where they have secured appropriate licences.
Should the Copyright Amendment Bill be enacted, and SA courts adopt a similar stance to Bartz, there may be scope to use lawfully acquired copyrighted works in training, such as subscription-based academic articles or purchased eBooks, provided that the use is fair and transformative, does not displace the market for the original and, importantly, that the output is not substantially similar to the original work.
A case-by-case assessment will be required. Until then, legal input and clear data governance protocols are essential to mitigate infringement risk.
2. Output risk
Even if training is lawful under a fair use standard, outputs that reproduce protected works could still infringe copyright. Developers should implement guardrails to detect and limit substantial reproductions in model outputs, and legal teams should monitor emerging jurisprudence on this issue.
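As a minimal illustration of what such a guardrail might look like, the sketch below uses Python's standard-library difflib to flag model outputs that reproduce a long verbatim passage from a protected text. The 50-character threshold is an arbitrary assumption for demonstration only: "substantial similarity" is a qualitative legal test, and production systems would use far more sophisticated matching (for example, fuzzy or semantic comparison against an indexed corpus).

```python
import difflib


def longest_verbatim_overlap(output: str, protected: str) -> str:
    """Return the longest contiguous run of text shared by both strings."""
    matcher = difflib.SequenceMatcher(None, output, protected, autojunk=False)
    match = matcher.find_longest_match(0, len(output), 0, len(protected))
    return output[match.a : match.a + match.size]


def flags_substantial_reproduction(output: str, protected: str,
                                   threshold_chars: int = 50) -> bool:
    """Flag an output that copies a long verbatim passage from a protected work.

    The threshold is illustrative only; it is not a legal standard.
    """
    return len(longest_verbatim_overlap(output, protected)) >= threshold_chars
```

In practice a check like this would run against an index of licensed or sensitive source texts before an output is released to the user, with flagged outputs blocked or routed for review.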
3. Contracts
Businesses offering or deploying generative AI tools should revisit their customer contracts and internal policies. This includes clarifying permitted training datasets and IP ownership of outputs, limiting liability for generated content, and addressing third-party rights in both training and deployment phases.
Looking ahead
As the court in Bartz observed, “The technology at issue was among the most transformative many of us will see in our lifetimes.”
The ruling signals a shift in how courts may approach the balance between copyright enforcement and technological advancement.
In this evolving landscape, SA businesses engaging with generative AI should not wait for legal certainty. Now is the time to strengthen data governance, revisit licensing strategies, cater for AI risks in contracts and build internal policies that anticipate both legal risks and commercial opportunities.
As the legislative framework takes shape, those who anticipate its direction, and align their strategies accordingly, will be best positioned to thrive in a market increasingly shaped by AI.