Editor's note

The public debate about AI and music has consolidated around a single question: who gets paid when a machine generates a song. This article argues that the more consequential future of sound is forming in markets where the finished song is not the product at all. What those markets open up is substantial, and still largely outside the current AI music conversation.

Executive Summary

The debate around AI and music overlooks the markets where the song is not the point. In automotive, healthcare, rehabilitation, games, and immersive installations, what is needed is adaptive sound: systems that read a situation and respond continuously, in real time, on-device. These markets are regulated, context-specific, and machine-to-machine. Apps like Suno cannot serve them. The blocker is technical. Meanwhile, the public conversation has narrowed to a single question (who gets paid when a machine generates a song?), and the major-label settlements with Suno and Udio addressed a legal exposure while leaving the verification problem untouched: no one can audit what a trained model was actually trained on. These new markets require a new infrastructure. Key components include a semantic layer, a real-time generative system, and a verifiably licensed training corpus. None of it yet exists at scale. CORPUS is building it. Seen from farther back, this is not the end of music but the return of something older: sound that responds to the situation it is in, contextual and unrepeatable, the condition music had for most of its history before recording froze it into product.

A driver has been on the road for four hours. The car is electric, its cabin uncannily quiet — no combustion note to mark acceleration, no mechanical feedback for a long stretch at speed. Night, rain, a complex junction approaching. The vehicle reads his pulse through the steering wheel, tracks his eye movement, cross-references speed and weather and time of day. A low harmonic field, tuned to the manufacturer’s sonic identity, has been holding the cabin together for the past hour; as he approaches the junction it tightens — brighter, more articulate, pulling his attention forward without asking for it. No one pressed play; the response comes from the situation itself. The same car, the next morning in traffic, sounds nothing like this.

A player is deep in a narrative game, approaching a choice the story will remember. The score reads how she moves: how long she hesitates, where her camera rests, what she has done in the hour before. When she commits, the resolution is composed live, bounded by what the scene demands and the musical world the composer built for it. No two playthroughs sound the same.

A woman on a dementia ward becomes restless in the late afternoon. A small device on her bedside table begins to play something familiar, chosen from what it knows about her: regional songs from the 1950s, the music of her youth. It triggers a memory. She starts to sing along, fragmentary, drifting in and out of the melody. Now the device must follow, adjusting to her tempo, holding the harmonic thread when she loses it, softening when she goes quiet, gently carrying the melody forward until she finds her way back in. It is not playing a recording. It is accompanying her, in real time, through a moment that will not repeat in the same way tomorrow.

None of these scenarios are invented. Each one extends what trained professionals already do, whether sound designers shaping in-car environments, composers writing adaptive scores for games, or music therapists reading a patient and adjusting in real time. The practice is real. What does not yet exist is the technology to do it continuously, at scale.


Where practice cannot reach

A flagship AAA game can afford a full adaptive-audio team; the niche title, however interesting, cannot, and falls back on fixed loops. For a car, the limit is not budget but combinatorics: no studio, however well-funded, can pre-compose a response to every permutation of driver state, traffic, weather, time of day, and biometric reading — any pre-composed system covers a slice of the possible situations and falls silent everywhere else. For a dementia ward, the limit is time.

In Germany, 1.8 million people live with dementia, and projections put the care-worker deficit at around 500,000 unfilled nursing positions by 2030. What actually helps through the difficult hours is often simple: someone present, singing along, holding the thread when the patient loses it. The evidence for music in dementia care is strong enough that clinical guidelines recommend non-pharmacological approaches first for agitation, before psychotropic medication. It does not take a specialist therapist to sit and sing with someone; it takes time. And time is what a short-staffed ward does not have.

[Image: an empty modern chair in a sunlit clinical room, representing the absence of care workers in understaffed facilities. Caption: By 2030, Germany is projected to be short around 500,000 nursing positions. Credit: Midjourney]

What these scenes share, despite their different limits, is the shape of the answer: a system that reads a situation and keeps responding to it, continuously, without waiting for instruction. Together with the adjacent interactive and immersive domains where sound already plays a central role, these markets are already large and still growing. There is something to gain here, not only something to defend. And none of it will be served by anything that currently dominates the AI music conversation.

Sources and numbers behind these claims

Dementia in Germany: 1.8 million people are currently living with dementia, projected to reach 2.8–3.0 million by 2055–2070 (Alzheimer Demenz in Deutschland, PMC 2025). Germany's nursing staff deficit is projected at around 500,000 unfilled positions by 2030 (Bertelsmann Pflegereport 2030); the Federal Statistical Office estimates a gap of 280,000–690,000 by 2049 depending on scenario (Destatis Pflegekräftevorausberechnung). Only 12.3% of music therapists worldwide work in geriatric settings (AMTA Workforce Analysis, 2021).

Clinical evidence for music therapy in dementia: Music therapy has documented clinical benefit in dementia care, including reduced agitation and reduced use of psychotropic medication (BMC Geriatrics, 2024).

Clinical guidelines: Both the UK's NICE guideline on dementia (NG97) and the German S3 Demenz guideline (DGPPN/DGN) recommend non-pharmacological interventions — including music-based approaches — as first-line treatment for behavioural and psychological symptoms such as agitation, before psychotropic medication is considered.

The scale of these adjacent markets

For context, the global music industry was roughly $95 billion in 2025, combining recorded music ($31.7 billion, IFPI Global Music Report 2026), live music (~$38.2 billion, Goldman Sachs 2025 via Music Business Worldwide), music publishing (~$10.7 billion, Goldman Sachs 2025), and merchandise (~$14.2 billion, inferred from MIDiA Research). The adjacent markets where adaptive sound could play a role are each individually large, and most are growing at double-digit annual rates:

Automotive in-car audio and infotainment: $23-32 billion (2025), projected $36-76 billion by 2030-2035 depending on source and scope (Mordor Intelligence; MarketsandMarkets; Precedence Research). Sound is a core brand differentiator. Even if adaptive sound captures only 5-10% of in-car audio budgets, that is roughly $1-3 billion at today's market size.

Digital therapeutics: $10-12 billion (2026), projected $25-40 billion by 2030-2035 (The Business Research Company). Music-based interventions (like MedRhythms' FDA-cleared InTandem) are a growing segment. A conservative 5-10% share yields roughly $0.5-1.2 billion at today's market size, and up to $4 billion against the upper end of the projection.

Music therapy: $3.2-3.6 billion (2026), projected $5.4-6 billion by 2032 (Coherent Market Insights). This is the healthcare segment where adaptive sound has the most direct precedent and evidence base. Note that this is largely a services market of licensed therapists billing into healthcare systems, so adaptive sound intersects with it rather than replacing it directly.

Assistive and social robotics: $7-8 billion for social robots (2025), projected $30-40 billion by 2030-2031 (Mordor Intelligence; Research and Markets). Elder care robotics specifically: $3.4 billion (2025), projected $9.9 billion by 2033 (Grand View Research). Auditory interaction is a component of human-robot communication.

Gaming and interactive experiences: The global video game music market is estimated at $1.8 billion (2025), projected $3.4 billion by 2034 (Proficient Market Insights). The broader spatial audio and immersive sound market is valued at $4-10 billion (2025), projected $18-32 billion by 2033-2034 (MarketIntelo). For context: the global gaming market itself stands at $189 billion (2025, Newzoo), of which audio is a growing but hard-to-isolate component.

Taken together, these sectors are worth tens of billions of dollars today, and most of them are growing at double-digit rates, so even a conservative 5-10% share for adaptive sound would amount to several billion dollars. These are different markets with different buyers, regulatory frameworks, and technical requirements, which is precisely the point of this article.
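As a rough cross-check of the share arithmetic, the sketch below applies a uniform 5-10% share to the 2025/2026 ranges quoted above. The uniform share, the choice of sectors, and the exclusion of elder care robotics (left out here on the assumption that it is a subset of social robotics) are illustrative assumptions, not figures from the cited reports. The sectors also overlap and use different scopes, so the result is an order-of-magnitude indication only.

```python
# Illustrative cross-check. Ranges are the 2025/2026 figures quoted above,
# in billions of US dollars. The uniform 5-10% share is an assumption.
SECTORS = {
    "automotive in-car audio": (23.0, 32.0),
    "digital therapeutics": (10.0, 12.0),
    "music therapy": (3.2, 3.6),
    "social robots": (7.0, 8.0),
    "video game music": (1.8, 1.8),
    "spatial and immersive audio": (4.0, 10.0),
}


def total_range(sectors):
    """Sum the low ends and the high ends of all sector ranges."""
    low = sum(lo for lo, _ in sectors.values())
    high = sum(hi for _, hi in sectors.values())
    return low, high


def share_range(sectors, low_share=0.05, high_share=0.10):
    """Apply the low share to the low total and the high share to the high total."""
    low, high = total_range(sectors)
    return low * low_share, high * high_share


if __name__ == "__main__":
    low, high = total_range(SECTORS)
    s_low, s_high = share_range(SECTORS)
    print(f"combined sectors today: ${low:.1f}-{high:.1f} billion")
    print(f"5-10% share today:      ${s_low:.2f}-{s_high:.2f} billion")
```

Under these assumptions the sectors sum to roughly $49-67 billion today, and a 5-10% share comes to roughly $2.5-6.7 billion.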


The debate everyone is having

The public conversation about AI and music has consolidated around a single axis: who gets paid.

On one side, companies like Suno and Udio have built song generators of remarkable quality. Suno has two million paying subscribers and a $2.45 billion valuation from its late-2025 Series C. Udio generates roughly ten songs per second. The technology works. The demand is real.

On the other side, the music industry has fought back. Lawsuits were filed, negotiations followed, and within months the headlines suggested resolution — Universal settled with Udio, Warner with both Udio and Suno, and licensed training became, at least on paper, the new standard. Crisis managed.

Meanwhile, in studios, the situation looks different. AI tools have become part of everyday production workflows: stem separation, AI-generated samples, demo mockups. Songwriter Michelle Lewis described the atmosphere among peers as “don’t ask, don’t tell.” Producer David Baron noted a real social penalty for being identified as an AI user. Young Guru, Jay-Z’s longtime engineer, estimated that more than half of sample-based hip-hop now uses AI-generated material. The practice has moved far ahead of the debate.

And the debate itself is losing energy. Music lawyer Raffaella De Santis framed 2026 as copyright’s stress test year: the moment when legal frameworks will either adapt or fracture under the weight of what generative AI has made possible. Others have gone further. Music-AI researcher Ryan Page argued that the fixation on payment misses the structural point: “Ethics does not equal who gets paid.”

There is a pattern here that is worth naming. The entire debate (licensed versus unlicensed, paid versus unpaid, fair use versus opt-in) operates within a single framework: music as recorded product, counted and transacted. Music existed for millennia before it could be recorded. Sheet music created a market for compositions. But the industry as we know it, the one negotiating with Suno, was built on the recorded copy. It is roughly a century old. Its logic is the logic of copies, plays, and streams. And it is the same logic that reduced musical value to a single dimension long before AI arrived.

Applying that logic to generative AI does not resolve the tension; it reinforces it. The major-label settlements with Suno and Udio addressed a legal exposure, but not the problem underneath: once a generative model is trained, no external party can reconstruct what it was trained on. A commitment to “train only on licensed music” is a contractual statement. Nothing in the architecture can prove whether the contract was kept, and everyone involved knows this.


Why this is the wrong market

Return to the three scenes at the beginning. None involve a text prompt, none produce a finished song, none need a model trained on the full breadth of Western commercial music. The markets forming around them share three properties that set them apart from the song-generator world.

They are regulated. Automotive OEMs and medical-device companies operate under procurement standards that a contractual “train only on licensed music” assurance cannot satisfy. Provenance must be logged, reproducible, and verifiable: architecture, not attestation.

They are context-specific. Each deployment must be steerable along fine-grained axes: a manufacturer’s sonic brand, a patient’s musical biography, a scene’s narrative state. The response must land on the first try. A song generator is throwaway by design: prompt, listen, discard, re-prompt until something fits. These systems have no re-roll; the driver is already at the junction, the patient is already agitated, the player is already at the turning point. That calls for training material annotated in depth and models that expose parametric control at inference time, not a generalist that outputs a finished song from a prompt. The deployment constraints (on-device, real-time, certifiable) then rule out the cloud-scale generalist model that a product like Suno represents.

[Image: lines of light flowing through a dark gallery space, a visual metaphor for continuous, machine-driven adaptive sound. Caption: Sound that reads the room and keeps reading. Credit: Midjourney]

They are machine-to-machine. No human is in the loop at the moment of generation. A vehicle reads sensors, a therapeutic device reads behavior, a game engine reads player state. The model must respond in real time, continuously, on-device — a fundamentally different architecture from the prompt-generate-listen flow of a cloud-based song generator.
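To make the machine-to-machine loop concrete, here is a minimal sketch of one possible shape for it: sensor context comes in, a target set of sonic parameters is derived, and the output is smoothed so the sound changes continuously instead of jumping. Everything here is hypothetical: the `Context` and `SonicParams` fields, the thresholds, and the hand-written rules stand in for whatever a real system would learn or be tuned to, and the generative model that would consume the parameters is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical sensor snapshot for the driving scene."""
    heart_rate_bpm: float
    junction_distance_m: float


@dataclass(frozen=True)
class SonicParams:
    """Hypothetical control parameters, each in the range 0..1."""
    brightness: float
    density: float


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def target_params(ctx: Context) -> SonicParams:
    """Hand-written rules (an assumption): brighter near a junction,
    sparser when the driver's heart rate is elevated."""
    proximity = clamp01(1.0 - ctx.junction_distance_m / 500.0)
    arousal = clamp01((ctx.heart_rate_bpm - 50.0) / 70.0)
    return SonicParams(
        brightness=clamp01(0.3 + 0.6 * proximity),
        density=clamp01(0.8 - 0.5 * arousal),
    )


def smooth(prev: SonicParams, target: SonicParams, alpha: float = 0.2) -> SonicParams:
    """Move a fraction alpha of the way toward the target on each tick."""
    return SonicParams(
        brightness=prev.brightness + alpha * (target.brightness - prev.brightness),
        density=prev.density + alpha * (target.density - prev.density),
    )


def run(contexts, start: SonicParams = SonicParams(0.3, 0.5)):
    """Process a stream of contexts with no prompt and no human in the loop."""
    params = start
    history = []
    for ctx in contexts:
        params = smooth(params, target_params(ctx))
        history.append(params)
    return history


if __name__ == "__main__":
    approach = [Context(70.0, d) for d in (500.0, 300.0, 100.0, 0.0)]
    for step in run(approach):
        print(f"brightness={step.brightness:.3f} density={step.density:.3f}")
```

The point of the sketch is the control flow, not the rules: every tick yields a usable output, and there is no step at which a person can listen, reject, and re-prompt.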

For these markets, the licensing debate misses the real issue: the song-generator paradigm is architecturally incompatible with them. Too slow, too imprecise, too uncontrollable. The more interesting question is whether that format (human types prompt, machine produces song) is the most consequential application of generative audio at all.


What would be needed instead

Three components that do not yet exist at scale: a semantic layer that translates real-world context such as sensor data, behavioral signals, and physiological states into meaningful sonic parameters; a real-time generative system that runs on-device with fine-grained parametric control; and a training corpus that is licensed, annotated in depth, and auditable by contributors, licensees, and regulators alike — where provenance is architecture, not attestation.
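One way to picture "provenance as architecture" on the corpus side is a tamper-evident manifest: each training item is recorded with a content hash and a licence reference, and each entry is chained to the previous one so that later edits are detectable. The sketch below is a generic illustration under that assumption, not a description of how CORPUS is built. The field names (`license_id`, `rights_holder`) and the helper functions are hypothetical. It also covers only the log itself; proving that a given trained model used exactly the logged items is a separate and harder problem that this sketch does not address.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, audio_bytes: bytes, license_id: str, rights_holder: str) -> list:
    """Add one training item to the manifest, chained to the previous entry."""
    record = {
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "license_id": license_id,        # hypothetical field
        "rights_holder": rights_holder,  # hypothetical field
    }
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": entry_hash(prev, record)})
    return log


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


def contains(log: list, audio_bytes: bytes) -> bool:
    """Let a contributor check whether their recording appears in the manifest."""
    digest = hashlib.sha256(audio_bytes).hexdigest()
    return any(e["record"]["audio_sha256"] == digest for e in log)


if __name__ == "__main__":
    manifest = []
    append(manifest, b"take-1", "LIC-001", "Example Artist")
    append(manifest, b"take-2", "LIC-002", "Example Artist")
    print("chain valid:", verify(manifest))
    manifest[0]["record"]["license_id"] = "LIC-999"  # simulate tampering
    print("chain valid after edit:", verify(manifest))
```

A contributor, licensee, or regulator holding the final hash can rerun `verify` without trusting whoever maintains the log, which is the difference between an attestation and a check.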

CORPUS is building this infrastructure because the markets we are building for cannot be served by any song generator. What these other markets need does not yet have a name. Adaptive sound. Situated music. Sonic behavior. The language is still forming, because the category is still forming.

The argument for CORPUS in these markets is not that we are fairer. It is that the song-generator paradigm cannot do this at all.


The wrong debate, and the older idea beneath it

While the industry argues about who gets paid when a machine generates a song, markets are emerging in which the finished song is not the product. The debate is important because its concerns are legitimate. Fair compensation matters. Licensing matters. But the frame is too narrow: it treats AI music as a better jukebox, another way to produce fixed copies to be played back. What is actually arriving is sound that responds to the situation it is in — contextual, continuous, unrepeatable.

This is not new. For most of human history, that is what music was: played for a room, a moment, a person, never twice the same. The century of recorded copies traded as product is the anomaly, not the baseline. Music is not ending. It is becoming something it already was, now carried by machines that can listen.