Wednesday, June 18, 2025

Music Theory Is Not a Model of Perception

(Main article: Tonal Constancy)

A recurring problem in psychoacoustics and auditory perception research is the assumed “bridge” between music theory and perceptual science. But this bridge is not a bridge at all; it is a web. Many researchers never stop to examine where one domain ends and the other begins, or which assumptions are silently carried across.

Much of music theory is art. It is a creative, symbolic, and analytical practice. It does not claim to describe perception, yet scientists often treat it as if it does. A researcher attempting to model pitch inference may spend time studying well-formed scales, spiral tuning systems, Grassmannian spaces, and so on. But these belong to music theory as generative or formal systems. They are methods for organizing pitch, not explanations of how the auditory system infers or categorizes its functions.

Music theory often operates in two extremes:

- Pure formalism: abstract symbolic systems with internal mathematical consistency but no intrinsic perceptual grounding.
- Aesthetic prescription: theories that describe what is considered consonant, resolved, beautiful, or complete.

What is missing is the central perceptual layer. The cognitive mechanisms that connect sound to meaning are often left unspecified.


Students learn what a triad is, what a sharp is, what C means. They do not learn which neural systems are engaged when temporal expectations are violated, or how tonal grammar interacts with Bayesian predictive processing under high expectancy conditions.

This gap reinforces a deeper problem: research frequently conflates Western music theory with auditory cognition. Theoretical constructs become dogma, and perceptual science risks becoming a feedback loop that validates pre-existing musical abstractions.

A mathematical construction may be elegant, but elegance does not entail perceptual relevance. All musical systems are parameterizable. The equations themselves do not explain how pitch is inferred, why pitch is categorized cyclically, why octave equivalence appears perceptually privileged, or where the limits of these perceptual definitions lie.


Consider sensory dissonance models such as the one proposed by William Sethares. His approach models vertical dissonance by summing the beating interactions between all frequency pairs within complex tones. The resulting curve predicts minima that correspond to perceptual consonance (in terms of roughness).

For harmonic timbres, these minima align closely with 12-tone equal temperament and simple just intervals. This result has been appealing across philosophical camps, to those who view scales as "natural" and to those who interpret them metaphysically.
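The roughness sum behind such a model can be sketched directly. The snippet below is a minimal illustration in the spirit of Sethares' dissonance curve, using approximate parameter values from his published model (the exact constants and amplitude weighting vary across his formulations, so treat these numbers as assumptions): it sums a Plomp–Levelt-style roughness term over all pairs of partials of a tone and its transposition, and the resulting curve has minima near simple ratios for harmonic timbres.

```python
import numpy as np

def pair_dissonance(f1, f2, a1, a2):
    """Roughness contribution of one pair of partials (Plomp-Levelt-style),
    with approximate constants in the spirit of Sethares' model."""
    b1, b2, xstar, s1, s2 = 3.5, 5.75, 0.24, 0.021, 19.0
    s = xstar / (s1 * min(f1, f2) + s2)   # critical-band scaling
    d = abs(f2 - f1)
    return a1 * a2 * (np.exp(-b1 * s * d) - np.exp(-b2 * s * d))

def total_dissonance(freqs, amps, ratio):
    """Sum roughness over all partial pairs of a tone plus its transposition."""
    fs = list(freqs) + [f * ratio for f in freqs]
    aa = list(amps) + list(amps)
    return sum(pair_dissonance(fs[i], fs[j], aa[i], aa[j])
               for i in range(len(fs)) for j in range(i + 1, len(fs)))

# A harmonic timbre: six partials with geometrically decaying amplitudes.
freqs = [440 * k for k in range(1, 7)]
amps = [0.88 ** k for k in range(6)]

# Sweep the interval ratio; minima of this curve sit near simple ratios
# (3/2, 2/1, ...) because coinciding partials contribute zero beating.
ratios = np.linspace(1.0, 2.3, 1301)
curve = [total_dissonance(freqs, amps, r) for r in ratios]
```

For this harmonic timbre, the octave (2/1) and fifth (3/2) are local minima because the transposed partials land exactly on existing harmonics, eliminating beating pairs.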

But the model is incomplete.

It does not incorporate, for example, combination tones (a physical nonlinear phenomenon), or the missing fundamental (virtual pitch), a central neurological phenomenon.

Because of this, timbre manipulation to reduce roughness (as if minimizing roughness were the goal of music) does not generalize cleanly to edge cases, and edge cases are often where theory breaks down.

The octave is often treated as uniquely privileged. Yet in roughness models it is not necessarily more consonant than the fifth, fourth, or even the tritone under certain constructions.

One can design timbres and tunings where another interval minimizes roughness more than the octave. Yet this does not cause pitch categories to reorganize cyclically around that interval. Notes do not begin repeating at that new distance.
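This construction can be made concrete. The self-contained sketch below (same hedged roughness model and approximate constants as above, repeated here for independence) builds a "stretched" timbre whose partials follow powers of 2.1 rather than integer harmonics; under the roughness sum, the stretched pseudo-octave 2.1 is smoother than the true octave 2.0, yet no listener's pitch categories reorganize around 2.1.

```python
import numpy as np

def roughness(f1, f2, a1, a2):
    """Pairwise roughness term with approximate Sethares-style constants."""
    b1, b2, xstar, s1, s2 = 3.5, 5.75, 0.24, 0.021, 19.0
    s = xstar / (s1 * min(f1, f2) + s2)
    d = abs(f2 - f1)
    return a1 * a2 * (np.exp(-b1 * s * d) - np.exp(-b2 * s * d))

def tone_pair_roughness(partials, amps, ratio):
    """Total roughness of a tone sounded against its transposition."""
    fs = partials + [f * ratio for f in partials]
    aa = amps + amps
    return sum(roughness(fs[i], fs[j], aa[i], aa[j])
               for i in range(len(fs)) for j in range(i + 1, len(fs)))

# "Stretched" timbre: partials at f * 2.1**k instead of harmonic f * (k+1).
stretched = [261.6 * 2.1 ** k for k in range(4)]
amps = [0.9 ** k for k in range(4)]

# The stretched pseudo-octave (2.1) beats less than the true octave (2.0),
# because at 2.1 the transposed partials coincide exactly with the originals.
assert tone_pair_roughness(stretched, amps, 2.1) < \
       tone_pair_roughness(stretched, amps, 2.0)
```

The model's prediction flips, but the perceptual fact it was supposed to explain (cyclic pitch categories at the octave) does not.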

This suggests that octave equivalence is not reducible to bottom-up sensory consonance alone. The pitch helix may be more top-down than bottom-up, closer to the "bootstrap mechanism" argued for by Diana Deutsch.

The model predicts roughness. It does not explain pitch categorization or tonal function.


Research on tonality and music perception relies heavily on rhetorical vocabulary: Tension, Resolution, Stability, Harmonicity, Pleasure, Finishedness, Beauty...

Even when carefully operationalized within individual studies, these terms remain partially defined and context-dependent. They often avoid full ontological grounding.

The foundational atoms of sound perception remain unclear.

One of the first questions in music perception should be: What is a pitch?

A tentative operational definition is usually something like:

"Pitch is sonic information from which a human listener, under ideal conditions, can consistently report and potentially reproduce a single most fundamental frequency equivalent to an f₀."

Yet there is no simple predictive model specifying when a listener will extract such an f₀ from arbitrary sound. Despite this, musical practice assumes pitch as obvious and primary, and research often jumps immediately to higher-level statistical structures built from these assumed pitch units.
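By way of contrast, what does exist is signal-processing machinery for extracting an f₀ from periodic sound. The sketch below is a standard autocorrelation estimator, not a model of perception: it recovers a "missing fundamental" from a tone containing only upper harmonics (illustrating that f₀ lives in temporal structure, not in any single spectral component), but it says nothing about when a human listener will actually hear that pitch.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=50.0, fmax=1000.0):
    """Estimate f0 as the lag of the autocorrelation peak in [fmin, fmax]."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 44100
t = np.arange(int(0.1 * sr)) / sr
# Harmonics 3, 4, and 5 of 200 Hz -- no energy at 200 Hz itself.
tone = sum(np.sin(2 * np.pi * 200 * k * t) for k in (3, 4, 5))

f0 = estimate_f0(tone, sr)
assert abs(f0 - 200) < 2   # recovers the missing fundamental
```

That an algorithm can compute an f₀ here does not settle the perceptual question; the open problem is predicting when, and for which arbitrary sounds, a listener performs the equivalent extraction.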

The perceptual ground remains underdefined.


Music theory is not perception theory; mathematical elegance is not perceptual truth; sensory models are not categorization models.
