Why We Need a Learnright
Copyright law was never designed to govern learning. As machine learning externalizes learning itself (fast, scalable, transferable), a conceptual gap opens that existing doctrines cannot close. This essay explains why that gap exists and what kind of legal thinking the moment demands.
Copyright was built for copies. Learning is the quiet work the law never saw. Image: Midjourney
Editor's note
The CORPUS White Paper proposes a licensing and compensation framework for AI training that no longer rests on the logic of copying and use. It starts from the observation that existing copyright instruments struggle to account for value creation in generative AI systems.
This article steps further back. It asks why that logic is breaking down at all, and names what copyright never had a category for: learning itself. A similar gap has recently been identified by Frank Pasquale, Thomas W. Malone, and Andrew Ting, who propose a “Learnright”: a legal category for machine learning that copyright cannot absorb.
The Wrong Starting Question
In most current disputes around generative AI, the debate revolves around a single question: Was something copied? Copyright law has well-established tools for reproduction, distribution, and public communication, so the legal debate follows that path. Plaintiffs point to copies; defendants argue that no relevant copies exist.
The pattern repeats across jurisdictions: from the German GEMA v. OpenAI litigation to the UK proceedings between Getty Images and Stability AI, and across the policy frameworks being drafted to regulate AI training.
But this starts in the wrong place. Copying is what copyright was built to govern; machine learning is not, at its core, a copying problem.
Why “Copying” Once Worked
Copyright is in its operative core a law of economic exploitation, not an abstract theory of creativity. It protects works insofar as they are reproduced, distributed, or otherwise placed into markets.
Historically, a copy was a distinct object: detached from its source, independently usable, economically substitutable. Books, records, and later digital files mattered because copying and market participation largely coincided.
Learning never fit this logic. Humans have always learned from books, music, and images without license, because human learning did not constitute a market act. It was slow, embodied, non-transferable, non-scalable. It produced no reproducible work object and no substitute. Learning remained legally invisible because it was economically limited.
What Changes with Machine Learning
In this article, learning means the formation of generalized capabilities through exposure to information, not subjective experience or consciousness.
Machine learning breaks this implicit assumption because learning itself has been externalized, separated from the human carrier, fixed in a transferable system. Learning is no longer bound to individual bodies: it is fast, massively scalable, permanently stored, transferable, economically systemic.
For the first time, learning itself becomes economically consequential at scale — and copyright has no dedicated category for it.
Why Copyright Starts to Misfire
With no vocabulary for learning, legal and policy debates try to translate machine learning into familiar terms. Memorization is treated as fixation, model weights are framed as stored works, outputs are used retroactively to infer unlawful acts in training. It is a category error produced by the absence of an alternative framework.
Training does involve technical copying. Data is stored, loaded into memory, transferred, processed. But these copies are transient, technically necessary, not independently usable, and not economically substitutive. Copyright itself already recognizes this: in many jurisdictions, temporary copies without independent economic significance are excluded from copyright relevance.
The economic value of training lies in the statistical aggregation, transformation, and abstraction of data into a parametric system. The result is a system with behavioral capacities. Copying is a technical precondition, not the locus of value creation.
That does not mean copyright concerns disappear. When generative systems produce outputs substantially identical to protected works (near-verbatim lyrics, recognizable images, reconstructable texts), those concerns are legitimate. But the assessment belongs at the output level. Treating training itself as use shifts copyright into a domain it was never designed to regulate.
What a Learnright Would Do
The Learnright proposal names this rupture: it distinguishes machine learning from human learning and argues that the former can no longer rely on the latter’s historically implicit freedom.
A learnright would complement copyright. It would separate learning from use, training from exploitation, and recognize that value creation in machine learning is collective and non-isolable.
The central question shifts from which work was copied to the conditions under which externalized learning is permissible, and how the resulting value should be allocated.
Traditional compensation models cannot answer that question. They presume identifiable works, traceable uses, and direct exploitation. But generative AI relies on distributed statistical contributions, non-reversible influences, and system-level value creation. CORPUS is one infrastructural attempt to operate in the gap while the legal category it points toward remains to be built.
Conclusion
As long as learning is governed as if it were copying, courts will rely on fragile technical assumptions, developers on fragile legal ones, and copyright will be stretched beyond its conceptual limits.
The CORPUS White Paper proposes an infrastructure for navigating this tension. The notion of a Learnright names the gap itself. This article has aimed to explain why that gap exists at all.
Copyright does not need to be abolished; it needs a category it never had: one for learning itself, now externalized, scalable, and economically decisive.
Follow the CORPUS project as it takes shape.
Or subscribe via RSS feed — works in any reader.