The Wrong Debate
The entire AI music debate assumes it is about songs. Both sides, AI companies and the music industry alike, are negotiating within a category that is already being outgrown. The more consequential future of sound lies elsewhere.
A bedside device on a dementia ward — accompanying in the hours no therapist can be there. Image: Midjourney
Executive Summary
The debate around AI and music overlooks the markets where the song is not the point. In automotive, healthcare, rehabilitation, games, and immersive installations, what is needed is adaptive sound — systems that read a situation and respond continuously, in real time, on-device. These markets are regulated, context-specific, and machine-to-machine. Apps like Suno cannot serve them. The blocker is technical. Meanwhile, the public conversation has narrowed to a single question (who gets paid when a machine generates a song?), and the major-label settlements with Suno and Udio addressed a legal exposure while leaving the verification problem untouched: no one can audit what a trained model was actually trained on. The infrastructure these new markets require — a semantic layer, a real-time generative system, and a verifiably licensed training corpus — does not yet exist at scale. CORPUS is building it.
A driver has been on the road for four hours. Night, rain, a complex junction approaching. The vehicle reads speed, biometric signals, weather, time of day. The sound environment shifts, continuously, adapting to support concentration without adding stress. No one pressed play; the response comes from the situation itself.
A woman on a dementia ward becomes restless in the late afternoon. A small device on her bedside table begins to play something familiar, chosen from what it knows about her: regional songs from the 1950s, the music of her youth. It triggers a memory. She starts to sing along, fragmentary, drifting in and out of the melody. Now the device must follow, adjusting to her tempo, holding the harmonic thread when she loses it, softening when she goes quiet, gently carrying the melody forward until she finds her way back in. It is not playing a recording. It is accompanying her, in real time, through a moment that will not repeat in the same way tomorrow.
A stroke patient works to regain motor control. Music sets the rhythm for a hand exercise: steady, structured, something to move to. When he cannot follow, the music notices. It slows subtly, simplifies, meets him where he is. When his strength returns, it builds again. No session resembles the previous one, because recovery is not linear. The system does not select a motivational track. It leads, listens, and adapts.
None of these scenarios are invented. Each one describes what trained professionals already do — music therapists on dementia wards, rehabilitation specialists using rhythmic auditory stimulation, sound designers shaping in-car environments. The practice is real. What does not yet exist is the technology to do it without a human in the room.
Because there are not enough people
And it would be better if humans did it. That is not in question. No system replaces the presence of a music therapist who sits with a restless patient, reads her breathing, and holds the moment. But Germany has 1.8 million people living with dementia, projected to reach nearly three million by mid-century. The country is already short tens of thousands of care workers, with the gap widening every year. Projections put the deficit at around 500,000 unfilled nursing positions by 2030, and the workforce is aging: forty percent of nursing staff in care homes are over fifty. Music therapy, despite strong evidence, is not part of standard care in most facilities. Across the EU, more than nine million people live with dementia. Only around twelve percent of music therapists worldwide work in geriatric settings at all.
In the United States, the numbers are starker still. Roughly 10,000 certified music therapists serve a country with more than six million dementia patients. An individual session costs over a hundred dollars an hour. The facilities run on Medicaid. The attention these patients need, daily, for hours, is simply not available. Not because no one cares. Because there are not enough people.
The same arithmetic applies in rehabilitation. Motor recovery after stroke requires hundreds of repetitions per day. Patients in clinical settings receive a fraction of that. More than thirty percent get no therapy at all in the first month after discharge. Those who do start a home program see their adherence collapse within weeks. The dosage that recovery demands is known. The system simply cannot sustain it.
The driver is alone by definition.
These are scaling problems — the practices work, the professionals exist, there are just not enough of them, not enough hours, not enough funding to meet the need. The question is not whether a machine should replace a therapist. It is what happens in all the hours, all the days, all the rooms where no therapist is present.
That is where adaptive sound becomes relevant: something that can exist in the silence where nothing else does. Sound that reads a situation, responds, and continues responding as the situation changes. Add to that the interactive environments where sound already plays a central role, from games to immersive installations, and these markets will be large. Potentially larger than the recorded music industry. And they will not be served by anything that currently dominates the AI music conversation.
Sources and numbers behind these claims
Dementia in Europe: Germany has 1.8 million people living with dementia (2021), projected to reach 2.8-3.0 million by 2055-2070 (Alzheimer Demenz in Deutschland, PMC 2025). Across the EU27, approximately 9.1 million people are affected, expected to rise 58-64% by 2050 (Alzheimer Europe). Germany's nursing staff deficit is projected at around 500,000 unfilled positions by 2030 (Bertelsmann Pflegereport 2030; Prognos 2015). The Federal Statistical Office projects a gap of 280,000 to 690,000 nursing workers by 2049 depending on scenario (Destatis Pflegekräftevorausberechnung). Around 40% of nursing staff in care homes and ambulatory services are 50 years or older (Destatis, 2021). Only 12.3% of music therapists worldwide work in geriatric settings (AMTA Workforce Analysis, 2021).
Dementia in the U.S.: Roughly 10,000 certified music therapists serve more than six million dementia patients (CBMT Certificant Data). Individual music therapy sessions cost $100-200 per hour. Music therapy is not part of standard care in most nursing homes despite strong evidence for reducing agitation and medication use (BMC Geriatrics, 2024).
Stroke rehabilitation: Motor recovery requires 400-600 repetitions per session to trigger neural adaptation. Patients in clinical settings receive an average of 32 upper-limb repetitions per session (Dose and Timing in Neurorehabilitation, PMC; Observation of Movement Practice, PMC). More than 30% of stroke patients receive no postacute therapy in the first 30 days after discharge (Stroke/AHA Journals, 2022). Adherence drops from 63-82% during hospitalization to 47-54% post-discharge, with 25% of patients dropping out entirely within 6-12 months (PMC, 2025).
Why "potentially larger than the recorded music industry" is not hyperbole
The global recorded music industry generated $29.6 billion in 2024 (IFPI Global Music Report 2025). The adjacent markets where adaptive sound could play a role are each individually large and growing at double-digit rates:
Automotive in-car audio and infotainment: $23-32 billion (2025), projected $36-76 billion by 2030-2035 depending on source and scope (Mordor Intelligence; MarketsandMarkets; Precedence Research). Sound is a core brand differentiator. Even if adaptive sound captures 5-10% of in-car audio budgets, the segment is substantial.
Digital therapeutics: $10-12 billion (2026), projected $25-40 billion by 2030-2035 (The Business Research Company). Music-based interventions (like MedRhythms' FDA-cleared InTandem) are a growing segment. A conservative 5-10% share yields $0.5-4 billion.
Music therapy: $3.2-3.6 billion (2026), projected $5.4-6 billion by 2032 (Coherent Market Insights). This is the healthcare segment where adaptive sound has the most direct precedent and evidence base. Note that this market is largely a services market of licensed therapists billing into healthcare systems, so adaptive sound intersects with it rather than substituting it directly.
Assistive and social robotics: $7-8 billion for social robots (2025), projected $30-40 billion by 2030-2031 (Mordor Intelligence; Research and Markets). Elder care robotics specifically: $3.4 billion (2025), projected $9.9 billion by 2033 (Grand View Research). Auditive interaction is a component of human-robot communication.
Gaming and interactive experiences: The global video game music market is estimated at $1.8 billion (2025), projected $3.4 billion by 2034 (Proficient Market Insights). The broader spatial audio and immersive sound market is valued at $4-10 billion (2025), projected $18-32 billion by 2033-2034 (MarketIntelo). For context: the global gaming market itself stands at $189 billion (2025, Newzoo), of which audio is a growing but hard-to-isolate component.
Even with conservative estimates of adaptive sound's share in each sector, the combined addressable market reaches the low tens of billions today and approaches or exceeds the size of the recorded music industry by the mid-2030s. These are different markets with different buyers, regulatory frameworks, and technical requirements, which is precisely the point of this article.
The debate everyone is having
The public conversation about AI and music has consolidated around a single axis: who gets paid.
On one side, companies like Suno and Udio have built song generators of remarkable quality. Suno has two million paying subscribers and a $2.45 billion valuation from its late-2025 Series C. Udio generates roughly ten songs per second. The technology works. The demand is real.
On the other side, the music industry has fought back. Lawsuits were filed, negotiations followed, and within months the headlines suggested resolution — Universal settled with Udio, Warner with both Udio and Suno, and licensed training became, at least on paper, the new standard. Crisis managed.
Meanwhile, in studios, the situation looks different. AI tools have become part of everyday production workflows: stem separation, AI-generated samples, demo mockups. Songwriter Michelle Lewis described the atmosphere among peers as “don’t ask, don’t tell.” Producer David Baron noted a real social penalty for being identified as an AI user. Young Guru, Jay-Z’s longtime engineer, estimated that more than half of sample-based hip-hop now uses AI-generated material. The practice has moved far ahead of the debate.
And the debate itself is losing energy. Music lawyer Raffaella De Santis framed 2026 as copyright’s stress test year — the moment when legal frameworks will either adapt or fracture under the weight of what generative AI has made possible. Others have gone further. Music-AI researcher Ryan Page argued that the fixation on payment misses the structural point: “Ethics does not equal who gets paid.”
There is a pattern here that is worth naming. The entire debate (licensed versus unlicensed, paid versus unpaid, fair use versus opt-in) operates within a single framework: music as recorded product, counted and transacted. Music existed for millennia before it could be recorded. Sheet music created a market for compositions. But the industry as we know it, the one negotiating with Suno, was built on the recorded copy. It is roughly a century old. Its logic is the logic of copies, plays, and streams. And it is the same logic that reduced musical value to a single dimension long before AI arrived.
Applying that logic to generative AI does not resolve the tension. It reinforces it. The major-label settlements with Suno and Udio addressed a legal exposure, but not the problem underneath: once a generative model is trained, no external party can reconstruct what it was trained on. A commitment to “train only on licensed music” is a contractual statement. Nothing in the architecture can prove whether the contract was kept, and everyone involved knows this.
Why this is the wrong market
Return to the three scenes at the beginning. None of them involve a text prompt, none produce a finished song, none require a model trained on the full breadth of Western commercial music. They ask for something else entirely.
The markets forming around adaptive sound (in vehicles, in healthcare, in rehabilitation, in robotics, in games and immersive installations) share three properties that separate them from the song-generator world.
They are regulated. Automotive OEMs and medical technology companies operate under some of the strictest procurement standards in any industry. Their legal departments do not sign off on products built on datasets with unresolved intellectual property. This is not an ethical preference. It is a compliance gate. And this is where the verification problem becomes concrete: a model whose training data cannot be audited will not pass procurement at a car manufacturer or a medical device company. A contractual assurance is not enough. The provenance must be structural: logged, reproducible, verifiable. Architecture.
They are context-specific. A song generator optimizes for breadth, the ability to produce something convincing in any genre on demand. The applications described here are the opposite: a vehicle’s sound environment is brand-specific and must be consistent across a fleet of millions. A therapeutic intervention varies by patient background, by culture, by the individual day. No system will match the sensitivity of a trained therapist, and this article does not claim otherwise. But a system that knows its domain deeply, that is trained on carefully annotated material rather than the full breadth of recorded music, can offer something where the alternative is silence. Technically, that favors smaller, precisely trained models built on rights-cleared data, running on-device within tight hardware constraints.
They are machine-to-machine. Suno’s interface is human-facing: a person types a prompt, the model generates, the person listens. The systems described above have no human in the loop at the moment of generation. A vehicle reads sensors. A therapeutic device reads behavior. A rehabilitation system reads movement. A game engine reads player state. The model must respond in real time — continuously, on-device, without waiting for a prompt. This is a fundamentally different architecture from cloud-based song generation.
For these markets, Suno is not ethically problematic. It is architecturally incompatible. Too slow, too imprecise, too uncontrollable. The more interesting question is whether the format Suno represents (human types prompt, machine produces song) is the most consequential application of generative audio at all.
What would be needed instead
The infrastructure for adaptive sound requires three components that do not yet exist at scale.
A semantic layer that translates real-world context (sensor data, behavioral signals, physiological states) into meaningful sonic parameters: a structured interpretation of what the situation means and what kind of sonic response it calls for.
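To make that concrete, a minimal sketch of such a mapping might look like the following, using the driving scenario from the opening. Everything in it is illustrative: the context fields, parameter names, and thresholds are assumptions for this article, not a CORPUS specification.

```python
# Hypothetical sketch of a semantic layer: sensor context in, sonic parameters out.
# All field names, scales, and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DrivingContext:
    speed_kmh: float        # vehicle sensor
    heart_rate_bpm: float   # driver biometrics
    rain_intensity: float   # 0.0 (dry) to 1.0 (heavy rain)
    hours_driven: float     # time since the last long stop

@dataclass
class SonicParameters:
    tempo_bpm: float        # pulse of the generated material
    density: float          # 0.0 (sparse texture) to 1.0 (dense)
    brightness: float       # spectral emphasis, 0.0 to 1.0

def interpret(ctx: DrivingContext) -> SonicParameters:
    """Translate a driving situation into target parameters for the generator."""
    fatigue = min(ctx.hours_driven / 6.0, 1.0)
    stress = min(max((ctx.heart_rate_bpm - 70.0) / 50.0, 0.0), 1.0)
    load = min((ctx.speed_kmh / 130.0 + ctx.rain_intensity + stress) / 3.0, 1.0)
    # A tired driver gets a steady, slightly elevated pulse to support alertness.
    tempo = 72.0 + 24.0 * fatigue
    # High cognitive load thins out the texture so sound supports rather than competes.
    density = max(0.2, 0.8 - 0.6 * load)
    brightness = max(0.1, 0.4 + 0.3 * fatigue - 0.2 * load)
    return SonicParameters(tempo_bpm=tempo, density=density, brightness=brightness)
```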
The second is a real-time generative system that runs on-device with fine-grained parametric control, accepts continuous input streams, and produces continuous sonic output within the constraints of automotive or medical hardware.
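Again as a sketch only: the core of such a system is a loop that reads the latest context, updates the parameters, and renders the next short block of audio before its deadline. The generator, sensor reader, and audio sink below are placeholders, not an existing API.

```python
# Hypothetical sketch of the on-device loop. read_sensors, interpret, generator,
# and audio_out are placeholders supplied by the surrounding system.
import time

BLOCK_SECONDS = 0.05  # 50 ms of audio per iteration; also the render deadline

def run(generator, read_sensors, interpret, audio_out):
    while True:
        deadline = time.monotonic() + BLOCK_SECONDS
        ctx = read_sensors()                  # continuous input stream, no prompt
        params = interpret(ctx)               # semantic layer from the previous sketch
        block = generator.render(params, duration=BLOCK_SECONDS)  # next audio block
        audio_out.write(block)                # continuous sonic output
        # Sleep only for whatever remains of the block. Overruns mean audible
        # dropouts, which is why the model must fit the hardware budget.
        time.sleep(max(0.0, deadline - time.monotonic()))
```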
The third is a training corpus that is licensed, annotated in depth, and auditable. Where compliance is a structural property of the system, verifiable by contributors, licensees, and regulators. Where the music never leaves the training infrastructure. Where every training run is logged with its dataset composition. Where provenance is architecture, not attestation.
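A minimal sketch of what "logged with its dataset composition" could mean in practice: every training run writes a manifest that records the content hash and license of each recording it consumed, so the composition of the run can later be verified against the licensing records. The field names and schema here are illustrative assumptions, not the CORPUS design.

```python
# Hypothetical provenance manifest for a training run. Schema is illustrative.
import datetime
import hashlib
import json

def file_hash(path: str) -> str:
    """Content hash of one audio file, so the exact recording is identifiable later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def write_training_manifest(run_id: str, tracks: list[dict], out_path: str) -> None:
    """tracks: [{"path": ..., "license_id": ..., "contributor": ...}, ...]"""
    manifest = {
        "run_id": run_id,
        "started_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "dataset": [
            {
                "content_sha256": file_hash(t["path"]),
                "license_id": t["license_id"],
                "contributor": t["contributor"],
            }
            for t in tracks
        ],
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
```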
CORPUS is building this infrastructure because the markets we are building for cannot be served by any song generator. Suno’s market is real and its users are making real things; the CORPUS argument is orthogonal to that. What these other markets need does not yet have a name. Adaptive sound. Situated music. Sonic behavior. The language is still forming, because the category is still forming.
The argument for CORPUS in these markets is not “we are fairer than Suno.” It is that Suno cannot do this at all.
The wrong debate, and where to look instead
While the industry argues about who should be compensated when a machine generates a song, markets are emerging in which the song is not the point. The debate is not wrong because its concerns are illegitimate. Fair compensation matters. Licensing matters. But the frame is too narrow. It looks at AI music and sees a better jukebox. What is actually arriving are machines that can sense a situation and respond with sound.
The woman on the dementia ward does not need fewer music therapists. She needs one to be there at all. The stroke patient needs his rehabilitation to continue after the system sends him home. If adaptive sound can exist in the hours where no therapist is present, the result will not be fewer therapists. It will be more therapy, reaching more of the people who need it.
The conversation will catch up. It usually does. In the meantime, the infrastructure is being built.
Follow the CORPUS project as it takes shape.