Sunday, July 14, 2024

Tonal Constancy

(draft, notes, wip, open research)


Human pitch perception is not a passive registration of acoustic frequencies but an active process of categorical organization. When we listen to melodies or harmonies, the auditory system continuously groups pitches into learned, stable classes: intervals, scale degrees, tonal functions. This process, which I call tonal constancy, is the brain’s mechanism for maintaining coherent pitch categories even when the acoustic input is imprecise, unstable, or atypical.

Tonal constancy combines several interacting principles: learned templates (diatonic hierarchy, common-practice cadences, chord prototypes), psychoacoustic cues (spectral alignments, sensory dissonance minima), Bayesian priors (expectation of tonal centers, step/skip distinctions), and language-like magnet effects (Kuhl, Krumhansl).

Together these allow the brain to stabilize musical pitch spaces, keeping notes “in tune” conceptually even when they are not acoustically aligned. It is categorical perception with a predictive correction mechanism.

This stabilization has an interesting consequence: many pitch systems far beyond 12-tone equal temperament are still heard as if they belonged to it. I refer to this phenomenon as duodecimability: the degree to which arbitrary pitch material is assimilated into 12-tone categories. In the simplest formalization, duodecimability can be imagined as a categorical tolerance window: deviations of roughly ±15 to ±30 cents still fall into a 12-EDO class. If pitch events fall within the categorical boundaries of 12-EDO, we say they are duodecimable. But in practice the tolerance is not fixed; it expands and contracts dynamically in response to musical motion, tonal context, and the listener’s expectations. Every musical event strengthens a hypothesized tonal grammar, increases tolerance for deviations, increases the willingness to “repair” the contour, reinterprets incoming intervals based on context, and updates prediction biases.
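The fixed-window formalization can be sketched in a few lines of Python. This is a toy version only; the tolerance value and function names are illustrative assumptions, and the dynamic expansion of the window described above is exactly what this static sketch omits:

```python
# Toy sketch of fixed-window duodecimability (illustrative only).
# A pitch, expressed in cents above an arbitrary reference, counts as
# "duodecimable" if its pitch class lies within `tol` cents of a 12-EDO degree.

def deviation_from_12edo(cents: float) -> float:
    """Unsigned distance (in cents) to the nearest multiple of 100 cents."""
    d = cents % 100
    return min(d, 100 - d)

def duodecimable(cents: float, tol: float = 30.0) -> bool:
    return deviation_from_12edo(cents) <= tol

print(duodecimable(715))   # 15 cents from 700: inside the window -> True
print(duodecimable(350))   # 50 cents from both 300 and 400 -> False
```

In the behavioral picture developed below, `tol` would itself be a dynamic quantity driven by context and expectation, not a constant.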

This means that duodecimability is behavioral, not merely acoustic. Once a tonal interpretation gathers inertia, the listener becomes increasingly tolerant of deviations. This explains why 12-EDO can absorb large perturbations (±40 cents, sometimes more), and why the same interval can be categorized differently depending on how it is approached, how it is resolved, or which tonal hierarchy is currently active. This functional multivalence explains why certain intervals in scales such as 7-EDO can be reinterpreted in multiple ways.

Evidence for duodecimability appears in controlled experiments with random tunings:
even when pitches are drawn from a uniform distribution across the audible range, listeners still report diatonic structure, tonal centers, modality, and major-minor implications, none of which are present in the data as fixed values. Similarly, empirical cases such as pitch drift and modulation experiments demonstrate the brain’s readiness to preserve the internal contour of melodies by “rotating the helix” of 12 tones rather than abandoning familiar pitch categories. Multivalence is not “in the interval”; it is in the interaction between interval and prediction.

This reveals a deeper asymmetry: it is generally easier for listeners to reinterpret pitch height than to abandon pitch category. The system first tries to preserve the tonal scaffold, only relinquishing it when the input becomes unequivocally incompatible.

Only a few pitch systems escape this gravitational pull. Tunings such as 5-EDO, the Bohlen-Pierce scale, or certain just-intonation domains trigger an early collapse of duodecimability. The brain cannot map them to the 12-tone prior without contradiction, and so tonal constancy reboots, constructing a new internal grammar shaped by the system’s own logic.

What follows from this is a general cognitive picture:
Tonality is not a fixed property of a tuning system but a strategy the brain applies to maintain coherence.
Duodecimability is the measure of how hard the brain tries to impose that strategy before giving up.

In this work, I aim to separate this perceptual mechanism from traditional music-theoretical concepts (modality, tonal hierarchy, functional harmony). By doing so, we can understand why some microtonal systems feel familiar, and others generate their own internal stability independent of the 12-tone world.

---

This study examines the perception of stable tonal functions in musical contexts where those functions are not reliably present in the acoustic signal. Although musical intervals can be described as physical frequency ratios, their experienced roles depend heavily on melodic context, cultural learning, and cognitive pattern-matching. The same pitch height can be interpreted differently depending on the musical environment, revealing that many aspects of tonal structure are perceptually constructed rather than acoustically fixed.

Tonal Constancy is proposed as a unifying principle to explain this behavior. The theory holds that the brain actively maps incoming sound onto a familiar framework, most likely the 12-tone one, projecting learned tonal relationships, such as the tonic-dominant hierarchy or the sense of resolution, onto ambiguous pitch material. Music reveals functional categories that are not acoustically encoded but are inferred from melodic shape, stress patterns, and prior exposure.

This perspective has direct implications for microtonality. It distinguishes between systems that the brain can easily assimilate by fitting them into existing categories, and systems that resist assimilation because their structures are incompatible with the internalized 12-tone model. Counterintuitively, some coarse or evenly spaced tuning systems (macrotonal frameworks) can feel more perceptually “alien” than finely divided microtonal scales precisely because they provide fewer recognizable tonal anchors. (Microtonal Refinement/Optimization vs Microtonal Structuralism)

The chapters that follow define a precise vocabulary for these phenomena, present case studies demonstrating tonal reconstruction in action, and outline a classification system for how different pitch structures interact with the listener’s internalized tonal grid. The work also situates these perceptual mechanisms within broader discussions of acoustics, cognition, and cultural history.

Note on Perceptual Validity: The ultimate evidence for Tonal Constancy is the listener's ear. Therefore, the audio examples provided are not merely aesthetic demonstrations; they are the primary data. Formal statistical validation is outside the scope of this initial exploration.







1: An Introduction to Tonal Constancy


Tonal Constancy refers to the brain’s tendency to impose familiar tonal structures onto acoustically unfamiliar or ambiguous pitch data. It explains why listeners often perceive semitones, tonics, or functional resolutions even when the input contains no reliable cues for those categories. Rather than passively receiving frequencies, the auditory system actively organizes them according to learned schemas, a form of cognitive pattern-imposition similar to pareidolia, but operating on pitch relationships.

The term parallels color constancy in vision, where colors remain identifiable across changes in lighting. In audition, small deviations in interval size or tuning system do not disrupt the perceived identity of a pitch pattern; we still recognize the underlying relational “shape.” Even in more extreme tuning systems, such as those far from 12-tone equal temperament, listeners frequently reconstruct familiar functions through melodic motion and contextual inference. In this sense, pitch categories behave less like fixed points and more like directional cues, vectors rather than precise coordinates.

For such reconstruction to occur, the brain relies on a stable internal framework. This suggests a deeper process: the perceptual categorization of cycles. Just as spatial patterns are grouped into shapes and temporal events into meters, pitch information is grouped into circular structures, with the octave being the dominant organizing cycle. This is supported by the harmonic series and phenomena such as the missing fundamental, which provide early perceptual justification for octave grouping.

Within this framework, three related concepts become important:

-Range: the span of frequencies humans can hear.
-Resolution: the smallest perceivable pitch difference (JND).
-Cycle: the inferred circular structure the brain uses to map pitch into repeating units.

Range and resolution describe physical limits; cycle reflects a cognitive strategy for generating order within continuous acoustic space.
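As a minimal illustration of the cycle as an imposed mapping rather than a physical limit, pitch height can be folded into an octave chroma. The reference frequency (middle C) is an assumed anchor for the sketch; any reference works:

```python
import math

# Range and resolution are physical limits; the cycle is a mapping the
# perceptual system imposes. Folding log-frequency modulo one octave
# yields a circular "chroma" coordinate.

REF_HZ = 261.63  # assumed anchor (middle C); the choice is arbitrary

def cents_above_ref(freq_hz: float) -> float:
    return 1200.0 * math.log2(freq_hz / REF_HZ)

def chroma(freq_hz: float) -> float:
    """Position within the octave cycle, in [0, 1200) cents."""
    return cents_above_ref(freq_hz) % 1200.0

# A tone exactly one octave above the reference folds back onto it:
print(chroma(2 * REF_HZ))  # -> 0.0
```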

This perspective explains several recurring phenomena:

-Why scales such as 7-EDO can still evoke familiar functions like “semitone” or “dominant” when heard in motion.
-Why randomly generated or uniformly spaced tones often suggest tonal relationships.
-Why even experienced musicians may not immediately detect that they have left the 12-tone system when exposed to systems like 13-EDO, provided timbre and pitch motion remain within certain perceptual expectations.

In short, most pitch systems are predisposed to be interpreted tonally unless they are structured to avoid alignment with learned priors.

Tonal Constancy therefore reflects more than musical habit. It may represent a fundamental feature of auditory cognition. This study aims to formalize the concept, propose a classification system for degrees of translatability into tonal space, and explore its implications for theory, composition, and the psychology of hearing.

----

Note on Bayesian Models of Cognition

Music provides a useful case in which statistical and predictive models encounter recurring misunderstandings about the level at which they are meant to apply. A common critique states that “a musician is not computing Bayes’ rule in order to choose the next note.” Similar objections appear across psychology whenever Bayesian approaches are introduced, often stemming from the way the theory is explained historically, namely as an answer to the question “How do people make decisions?” This framing invites the familiar objection that humans are not statisticians, an idea echoed in early critiques such as those of Daniel Kahneman, who remarked: “For anyone who would wish to view man as a reasonable intuitive statistician, such results are discouraging.”

Part of the confusion arises because the “rational agent” is often interpreted as a consciously deliberating subject, the post-perceptual individual who experiences qualia, rather than as an unconscious system performing a computational task. In Bayesian cognitive science, however, the model is usually intended to operate at what David Marr (1982) called the computational level: a description of the problem being solved and the optimal strategy for solving it, not necessarily the algorithm implemented by the brain. At lower levels of description, individual neural networks can be shown analytically to approximate simple forms of Bayesian inference, as discussed by James McClelland (1998).

Within music cognition, Bayesian reasoning appears particularly successful in at least two related domains. One concerns how listeners form probabilistic expectations, what might be framed less as “decision making” and more as how people make perceptual guesses. Another concerns the strategies perceptual systems employ to minimize uncertainty or energetic cost when interpreting ambiguous sensory input. Tonal Constancy draws on predictive frameworks of this kind. Although predictive coding is well established in neuroscience, a conceptual gap still remains between these neural theories and their broader use in cognitive models of music.

In the present context, priors may have a relatively direct origin. Even with minimal learning, the harmonic structure present in complex tones provides a natural distribution of expectations. When auditory input is interpreted as musical pitch, the harmonic components of a tone effectively act as priors over possible continuations. In a loose sense this resembles a belief distribution that amplifies certain predictions about what may occur next. For example, if a harmonic tone at 100 Hz is followed by another at 200 Hz, little genuinely new spectral information has been introduced; the second sound largely reiterates relationships already present in the first. In that sense it represents one of the most predictable continuations available to the system.
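The octave example can be made concrete with a toy spectral-overlap count. The model here is idealized (pure harmonic spectra, no amplitude weighting) and purely illustrative:

```python
# Idealized harmonic spectra: every partial of a 200 Hz tone coincides with
# a partial of a 100 Hz tone, so the octave above introduces little
# genuinely new spectral information.

def partials(f0_hz: float, n: int = 10) -> set:
    """First n harmonics of an idealized complex tone."""
    return {f0_hz * k for k in range(1, n + 1)}

low = partials(100.0)    # 100, 200, ..., 1000 Hz
high = partials(200.0)   # 200, 400, ..., 2000 Hz
shared = high & low      # 200, 400, 600, 800, 1000 Hz

print(f"{len(shared)} of {len(high)} upper-tone partials already occur in the lower tone")
```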

The goal here, however, is not to explain how musicians compose music. Models such as those proposed by David Temperley show that many musical structures can indeed be described using Bayesian-like frameworks, where a piece becomes progressively more predictable as additional observations accumulate. In such models, stylistic regularities function almost as a fingerprint of a composer or tradition. These frameworks also reveal that priors are not limited to timbre: they emerge both from bottom-up constraints on possible frequency relations and from top-down recognition of tonal objects. (See the section on face perception.)

The present use of Bayesian reasoning is narrower. The relevant question is how listeners infer tonal objects and dynamically reallocate perceptual tolerance as context unfolds. How do pitch categories remain stable despite variation in tuning, timbre, and spectral structure? In other words, the focus is not compositional choice but perceptual inference and category stability.

Strict Bayesian updating may not fully capture these processes. In some respects the dynamics resemble ideas closer to those explored in Quantum Bayesianism or even computational analogies such as Grover's algorithm. In such frameworks observations do not merely add probability mass; they can interfere with competing hypotheses. A musical example illustrates the intuition: a leading tone does not simply increase the probability of the tonic; it simultaneously suppresses alternative tonal interpretations. Observations therefore reshape the hypothesis space not only by reinforcement but also by elimination. (See the section on attractor dynamics.)


2: From 'Xenharmonic' to 'Induodecimable': The Need for a Precise Lexicon.


The term xenharmonic was originally coined to describe musical materials that lie outside the framework of Western 12-tone equal temperament (12-EDO). Implicit in its early usage was a sense of perceptual alienness, sounds that could not be reconciled with conventional tonal expectations, or even approximated within the familiar scalar structures of Western music.

Over time, however, the term’s scope has broadened. Today, xenharmonic may be applied to any music using non-standard tunings, alternate instruments, or unfamiliar timbres. As its use has expanded, its precision has diminished. A piece might now be labeled xenharmonic even if it maps closely onto 12-EDO, or if it retains gestures that remain tonally functional within familiar paradigms. In this diluted form, the term no longer guarantees that the music is truly untranslatable into the 12-tone system.

To address this ambiguity, we use a more semantically precise term: induodecimable, from Latin roots meaning not reducible to twelve. It describes musical structures, scales, or timbres that cannot be effectively translated into 12-EDO without a perceptual or functional loss. Unlike xenharmonic, this term emphasizes irreducibility, not just unfamiliarity. Moreover, its morphology is cross-linguistically stable (e.g., induodecimable reads identically in English and Spanish), and it admits extensions for greater specificity, such as indiatonizable, referring to pitch content incompatible with diatonic function.

It is important to note that this property is not binary. Whether a given musical structure is duodecimable (that is, whether it is approximable by 12-EDO) is ultimately a perceptual judgment. Some cases are obvious: inharmonic spectra perceived as noise, or very specific microtonal systems with step sizes that fall far outside typical pitch categories. But others lie in a gray zone where perceptual context, cultural exposure, and learned listening habits strongly shape what we "hear."

This gray zone is precisely where Tonal Constancy becomes critical. Even when a melodic or harmonic structure defies analytical reduction to diatonic scales, listeners often project familiar tonal frameworks onto it, constructing implied functions, modes, or centers through context and inference. The ability to “make tonal sense” of unfamiliar material is not evidence of universality in the structure itself, but of the perceptual elasticity of the listener.

As this study will show, the diatonic scale functions as a tonal attractor, a kind of perceptual sink into which ambiguous or approximate materials are pulled. The 12-tone system serves as its most stable host, offering a resolution of the scale’s unequal steps into evenly spaced units that align (imperfectly, but reliably) with physical redundancies in the harmonic series.


  • Duodecimability: A musical object (interval, scale, passage, piece, or tuning behavior) is duodecimable if there exists a translation into 12-tone equal temperament that preserves the same functional structure for a human listener.
    Note-by-note mapping may or may not be close in cents. What must survive is: direction, relative attraction, cadence identity, interval categories, and modulatory logic.
    When this fails, when a mapping is possible but the resulting structure no longer behaves recognizably, then the original is induodecimable.
    Duodecimability is a structural property of music, not merely a property of the tuning system. A tuning may be capable of producing duodecimable and induodecimable music depending on context.

 

The question then arises: why twelve? The answer is not timbre, nor the Pythagorean algorithm, nor the more compelling harmonic semigroup, which, while seemingly more ontologically robust, does not necessarily relate to or reflect our perception. Why does this specific internal subdivision act as the dominant attractor, rather than systems based on ten, fifteen, or nineteen tones? Why does duodecimability seem to represent a perceptual threshold?

The answer is not solely historical. Nor is it purely acoustic. Instead, we find ourselves in the deeper terrain of categorical emergence: how perceptual systems construct stable reference frames from continuous data. Just as we learn to divide the color spectrum into culturally specific “basic colors,” so too do we divide the pitch continuum into categories that are both learned and constrained, by cognition, biology, and acoustics.

This chapter lays the foundation for a more precise taxonomy of perceptual translatability in music. The aim is not only to explore how Tonal Constancy works, but to examine the deeper question: where do musical categories come from at all? At what point do categories cease to form, or become unstable? And when do they dissolve into pure context-dependence; when Tonal Constancy, in effect, runs out?


3: Tonal Reconstruction in 7-EDO and the Elasticity of Pitch Meaning


The 7-tone equal division of the octave (7-EDO) offers a clear demonstration of tonal constancy. Each step spans ~171 cents, and unlike diatonic 12-EDO, the system contains no internally differentiated intervals, no whole/semitone hierarchy, no embedded modal markers, and no natural centers of gravity. Acoustically, it is a uniformly spaced cycle.

Yet when 7-EDO is used melodically, listeners routinely report hearing tonal centers, modal references, and functional cadences. The system behaves musically intelligibly despite lacking structural cues.

This raises a central question:

How does a scale composed entirely of uniform steps give rise to perceived modes, cadences, and tonal direction?

The answer lies in the organization of the sequence rather than the tuning itself. Trajectory, rhythmic emphasis, contour, and learned tonal archetypes guide the perceptual system toward interpretations that are not encoded in the signal. This is tonal constancy: the imposition of familiar pitch relationships onto acoustically ambiguous material.

The 7-EDO “Chameleon Effect”: Functional Multivalence


One of the clearest signs of tonal constancy in 7-EDO is the instability of pitch identity. A single 7-EDO degree can be re-interpreted as multiple 12-EDO categories depending on context. For example, the 342-cent step—the “neutral third”—frequently shifts between major-like and minor-like functions.

This is Functional Multivalence (a one-to-many mapping):
  • In 12-EDO, a pitch class is relatively stable (“C is C”).
  • In 7-EDO, the same frequency can behave as a major third, a minor third, or something in between, depending on melodic direction, local emphasis, or implied harmony.
The perceptual system prefers to preserve musical grammar—directionality, cadence, and contour—over preserving literal interval size. In effect, the brain “chooses” the pitch identity that best fits the surrounding syntax, even when that identity is not present in the acoustic input.

For reference, the 7-EDO scale in cents is:

[0, 171.4, 342.9, 514.3, 685.7, 857.1, 1028.6]
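A quick computation shows where these degrees sit relative to 12-EDO categories. The nearest-neighbor rounding used here is an assumption of the sketch, not a perceptual claim; perceptually, as argued above, context decides:

```python
# Distance from each 7-EDO degree to its nearest 12-EDO pitch class.
# Degrees 2 and 5 (~342.9c and ~857.1c) land ~43 cents from the nearest
# category, which is exactly where functional multivalence is most visible.

def edo_degrees(n: int) -> list:
    """Degrees of an n-EDO octave, in cents."""
    return [1200.0 * k / n for k in range(n)]

def nearest_12edo(cents: float) -> tuple:
    """(nearest multiple of 100 cents, signed error in cents)."""
    nearest = round(cents / 100.0) * 100.0
    return nearest, cents - nearest

for c in edo_degrees(7):
    target, err = nearest_12edo(c)
    print(f"7-EDO {c:7.1f}c -> 12-EDO {target:6.0f}c (error {err:+6.1f}c)")
```

Note that under pure proximity the ~171.4-cent step is closer to 200 cents than to 100 cents, yet the cadential examples below show it being heard in either role.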

Audio Examples 01/02: Diverging Cadential Mappings


The following examples (7-EDO followed by 12-EDO reinterpretation) illustrate how a single interval can support multiple functional readings.

The tune has a simple A-A-B-B structure. In both phrases, the final step of the bass line is identical in 7-EDO: a single 171-cent ascent. Yet in the 12-EDO reinterpretation, this same interval is mapped differently in each phrase:

In one case, the step resolves as a 100-cent semitone, supplying a cadential “leading tone.”
In the other, it expands to a 200-cent whole tone, producing a “major seventh”-like resolution.

Thus the same 7-EDO motion supports two distinct tonal functions, determined not by its size but by its role in the phrase.

Audio Examples 03/04: Neutral Intervals in Context


Additional examples (sine-wave only) show that the effect persists even without harmonic cues. The 342-cent degree is perceived as “major” or “minor” depending on the melodic frame:

Ascending, it often acquires major-third implications.
Descending, or in a minor-leaning contour, it takes on a minor-third quality.

When rendered in 12-EDO, performers naturally “resolve” these ambiguous steps toward the expected functional pitches to satisfy the implied cadence. This is tonal constancy operating directly on interval interpretation.

Trajectory, Momentum, and Torsor Structure


These examples highlight the role of pitch momentum, the way successive intervals form a directed trajectory through pitch space. Even in a perfectly symmetric tuning, melodic movement generates expectation and prepares closure.

This behavior aligns more closely with a torsor than a vector space: there is no absolute reference point, only relational structure. A pitch derives meaning from: its placement in the trajectory, its rhythmic emphasis, and its relation to culturally learned tonal prototypes.

The ear treats 7-EDO not as a static grid but as a flexible relational field.

This leads to a central set of questions:

When Does Meaning Break Down?
How far can an interval deviate before its expected function collapses?
When does a “minor third” cease to be heard as minor?
At what point does tonal constancy fail to rescue the structure?

These boundaries are not fixed. They shift with experience, familiarity, cultural priors, and attentional state. Much of what we “hear” as categorical pitch identity is constructed, not given.

Later chapters will return to these issues when discussing tonal attractors, learned priors, and the emergence of pitch categories.

Summary

Tonal behavior in 7-EDO shows that pitch meaning is elastic. Even in a scale with no intrinsic hierarchy, listeners reconstruct functional roles through trajectory, rhythm, and expectation. The auditory system is not passively reporting interval sizes; it actively infers tonal structure.

Where the tuning system provides symmetry and ambiguity, perception generates hierarchy and direction. This is tonal constancy: not a property of the tuning, but of the listener.


[data

Context-dependent functional reinterpretation of non-12-EDO pitch material


Pitch categories are assigned dynamically based on trajectory and expectation. 

The study tests whether functional identity emerges from musical context rather than from fixed interval size, such that identical pitch distances may be categorized differently depending on their role within a phrase.

Primary Hypothesis (H1)

Listeners trained in 12-EDO will produce convergent 12-EDO transcriptions of short musical phrases constructed from pitch selections that are mathematically unrelated to 12-EDO, indicating context-dependent functional reinterpretation rather than nearest-neighbor pitch matching.

Secondary Hypothesis (H2)

The same pitch interval may be transcribed as different 12-EDO functions depending on musical context, demonstrating functional multivalence driven by melodic trajectory and tonal expectation.


H3:
Non-musician listeners will judge 7-EDO stimuli and their convergent 12-EDO transcriptions as perceptually equivalent at rates significantly above chance. This would show that duodecimability, and functional multivalence, lives at the level of perceptual meaning, not of technical pitch awareness.



Null Hypothesis (H0)

Transcriptions will primarily reflect nearest mathematical proximity between pitch selections and 12-EDO pitch classes, resulting in divergent or unstable interpretations across participants. 

Design Justification: Why 7-EDO

7-EDO was selected as a test case because its step sizes (~171 cents) are sufficiently distant from 12-EDO semitones to prevent trivial nearest-neighbor mapping, yet dense enough to support stable melodic motion and phrase-level structure. This makes it well suited for testing whether pitch interpretation is governed by mathematical proximity or by contextual functional inference.

Importantly, the choice of 7-EDO is methodological rather than aesthetic; any pitch system that decouples interval size from familiar categorical boundaries could serve the same role.

7-EDO maximizes functional ambiguity under minimal pitch density.

Low pitch count → forces category reuse
Rough correspondence to diatonic functions → enables tonal hallucination
Sufficient internal coherence → supports tonal constancy within itself (it has its own "atonality")

That combination does not hold easily for:

5-EDO → too sparse, rapidly becomes self-consistent → new grammar
High EDOs → too many anchors, difficult to test, requires a subset, introduces unevenness.

7-EDO is where the brain has to choose.




Most music theory, tuning theory, and even cognition implicitly ask:
What tuning is right? What intervals are consonant? What system is natural?

This work reframes this to:
Which structures are perceptually stable under distortion, and why?


It helps music cognition by telling us what doesn’t need to be controlled, fixed, or standardized, because the brain already does it for free. It also isolates a missing variable: a lot of research treats pitch categories, scales, and tonality as static objects.

The path-dependent, not just grid-dependent, dynamics of category reallocation over time are under-theorized:

The symbolic system is thus downstream of perceptual stability, not upstream.

It helps explain cultural convergence without mysticism: some structures, like 12-EDO, are attractors because they tolerate distortion better under predictive cognition.

---
Phase 1: First-Pass Categorization (fast inference): One listen, Forced commitment, No reflection
This captures: default priors, contour-based inference, tonal hallucination, duodecimability “snap”.

Phase 2: Reflective Reconciliation (error management): Unlimited re-listening, Corrections allowed
This captures: stability of the mapping, whether confidence increases or collapses, and whether participants fight their first interpretation or reinforce it.

A. Agreement metrics
Inter-participant agreement (within phase)
Intra-participant change (phase 1 → phase 2)

B. Direction of change
Do corrections move toward 12-EDO categories?
Or do they increase ambiguity?
Or do they collapse into a different stable mapping?
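The agreement metric in (A) could be operationalized as follows. This is a hedged sketch: the data layout, function name, and use of the modal response as the convergence criterion are all hypothetical design choices:

```python
from collections import Counter

# Hypothetical inter-participant agreement score: for each melodic event,
# the share of participants whose 12-EDO transcription matches the modal
# (most common) choice, averaged over events. 1.0 = perfect convergence.

def modal_agreement(transcriptions: list) -> float:
    """transcriptions: one list of 12-EDO interval categories per participant."""
    n = len(transcriptions)
    per_event = []
    for event in zip(*transcriptions):          # align events across listeners
        modal_count = Counter(event).most_common(1)[0][1]
        per_event.append(modal_count / n)
    return sum(per_event) / len(per_event)

# Three listeners, four events; they disagree only on the third event:
print(modal_agreement([[100, 200, 100, 200],
                       [100, 200, 200, 200],
                       [100, 200, 100, 200]]))
```

Intra-participant change between phases would then be a difference of such scores computed per phase.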

prediction:
most 7-EDO music → high convergence, stable reinterpretation
most 5-EDO music → low convergence, grammar rejection or re-learning

this experiment demonstrates:

Tonal cognition is path-dependent
Pitch categories are functional, not acoustic
Some systems invite hallucination, others demand learning
12-EDO is not “natural”, it’s flexible

Scope and Limitations

The study tests perceptual-cognitive mapping, not compositional validity
Results apply to short phrases and may not generalize to long-form music
Findings do not claim universality beyond enculturated 12-EDO listeners


]


Relationship with Shepard tones:


Shepard tones expose a perceptual symmetry in pitch space that relies on overlapping spectral components, octave-wrapped circular pitch cues, and ambiguous vertical positioning on the pitch helix. This creates bistability: the same stimulus can be interpreted as “ascending” or “descending” depending on which branch of the helix the brain commits to. But this requires a very artificial timbre; it is not “natural” in musical terms.

The 7-EDO functional example is a parallel phenomenon, but a "natural" one; this is the difference:

The ambiguity is not caused by timbre or chroma wrapping.
It is caused by cadential expectation and functional reinterpretation.

The 171-cent jump is too small to be a clear “whole step”, too large to be a clear “semitone”, ambiguous in scalar context, and can act in either a 100-cent role or a 200-cent role after functional remapping. This means the listener's tonal model, not the acoustics, decides the interval class. This is a cognitive analog to Shepard’s perceptual ambiguity, but purely musical.

This is exactly what is predicted by categorical perception, key-dependent interval class assignment, top-down functional bias, and tonal constancy mechanisms.

The example is a miniature interval multistability illusion, here the bistability is: major-step function vs minor-step function mapped onto the same absolute interval.

The 171-cent example shows that interval identity is not acoustically fixed, that functional context can warp interval categorization, that listeners can be tricked naturally rather than through artificial timbres, and that pitch space can behave as a bistable perceptual manifold.


Deutsch's Illusions


Deutsch describes a self-reinforcing perceptual loop, the “bootstrapping operation”:

-bottom-up cues: local intervals, sequential grouping, contour, roughness, spectral features
-top-down cues: stored tonal hierarchies, category expectations, Western pitch-class memories

The system settles into a coherent key + a coherent sequence despite ambiguity in the input.
But she is only talking about ambiguities inside 12-EDO. Not about the deformation of the system itself.

The hidden assumption of their entire debate: The underlying pitch lattice is stable, fixed, and accurate.

Here we extend this:

What if the entire pitch framework is warped?
How far can you stretch the lattice before the bootstrapping collapses?
How does the brain “repair” a scale that violates its statistical priors?

But the bootstrapping mechanism should still operate even under distortion.

Their theory quietly implies pitch flexibility; it lurks inside the implications.

Their model says: local intervals provide sequential cues, stored hierarchies provide tonal mapping, and the system iterates until a stable interpretation emerges.

This is mathematically the same pattern as a stable fixed point under perturbation. If you slightly detune the fifth, the local interval cue moves, the hierarchical mapping adapts, and the loop tries to settle into a new stable point.

Neither Deutsch nor Krumhansl explored perturbing the system to see when this perceptual homeostasis breaks, but their mechanism predicts that there must be:

a region of stability (duodecimability)
a region of instability (collapse)
a boundary (“breaking point”)

The unasked question: everything from Krumhansl’s probe-tone curves to Deutsch’s illusions was done on the assumption that the octave = 1200 cents, the fifth ≈ 700 cents, the diatonic steps ≈ 100/200 cents, and pitch classes repeat with perfect diatonic symmetry.

What if we perturb the system?
How flat can a fifth be before tonal hierarchy collapses?
How stretched can an octave be and still be recognized?
How much deviation can a major third tolerate before category flipping?
How stable is the bootstrapping loop under systematic scale deformation?
How does the perceptual system “repair” wrong tunings?

If tonal perception is a dynamic attractor landscape with deformation tolerance, then several topics follow naturally: cycle elasticity, equivoques, duodecimability layers, structural vs. mnemonic constancy, melodic contour vs. pitch topology, and multiple tunings mapping to the same cognitive attractor.

The same bootstrapping mechanism might be responsible for:

octave constancy (why a stretched octave still “feels like” an octave)
scale constancy (why warped scales still produce diatonic functions)
melody recognition under pitch drift (the “Happy Birthday” experiment)
equivoques (same intervals → different structures)
inverse equivoques (different interval sets → same functional pattern)
perceptual inertia (where memory overrides tuning reality)

All of this falls out naturally from their bootstrapping framework.

How does the mind find a key when the tuning system itself is moving?


-----


This study by Lola Cuddy (1983) investigates octave flexibility across different contexts. Experiments show that musicians tend to prefer, on average, a slightly widened octave of about +20 cents, with discrimination increasing as melodic context becomes richer. The study concludes that a flexible choice of tuning is important. However, it does not directly address a key question: a preference for the octave in what role, and serving what function?

As in many studies of pitch perception, the tacit definition appears to follow the framework of Western music theory, which combines mathematical relationships with cultural and perceptual assumptions. Within this framework, the octave is defined as the 1:2 frequency ratio and is treated as the “same” note. Cuddy’s results implicitly invite this interpretation.

Yet the “sameness” model requires several additional perceptual mechanisms to operate coherently: for instance, the pitch helix, bottom-up influences from harmonic timbre, and forms of categorical perception. In this perspective the octave is not merely a consonant interval with low sensory dissonance. Rather, it functions as a container of pitch categories, the repeating unit of pitch space, or a perceptual pole.

The elasticity reported in these early studies, even when using sine tones, demonstrates that pitch functions are not fixed acoustical values in sequential settings. This remains true even in experimental conditions where complex timbre is removed or manipulated, as in work by William Sethares. The requirement for slightly different octaves may therefore reflect predictive tolerance and commitment to a directional interpretation of the pitch helix, rather than revealing the “true size” of the octave. If the octave is defined as the 1:2 ratio, then its physical size is already specified. A different question emerges when we ask what characteristics a sound must possess in order to fulfill the role of becoming a pitch within a cyclic, categorizable structure. At that point perceptual flexibility enters the picture, while symbolic systems such as tuning theory emerge downstream from perceptual stability.

The enlargement of the subjective octave has been addressed in earlier explanatory models. Ernst Terhardt (1978) proposed that it arises as a byproduct of speech perception, resulting from nonlinearities in the auditory system. Similar reasoning has been used to explain the stretched tunings used in pianos. Others, such as Shinji Ohgushi (1983), suggested that it reflects a systematic bias in the temporal firing patterns of auditory neurons.

However, as Cuddy observes, introducing a major triad significantly improves octave accuracy, even when the tones involved are pure sine waves. Interpreted one way, a major triad corresponds closely to the first six components of the harmonic series. In effect, it provides the structure by which the pitch helix becomes established. The configuration of unison, major third, fifth, and octave functions as an ideal blueprint for pitch identity. It is not merely a chord; rather, it concentrates most of the perceptually relevant cues into a single configuration (1:2:3:4:5:6). In this sense it behaves almost like a “virtual timbre.” Under these conditions tolerance decreases, and the subjective octave requires greater physical accuracy. Similar effects appear in stretched diatonic experiments.

From this perspective one might even ask whether the observed shift reflects an octave stretched by +20 cents, or instead a unison compressed by −20 cents. Alternatively, the phenomenon might represent a tolerance window distributed symmetrically between them: for instance, −10 cents at the unison and +10 cents at the octave.

The physical octave is defined by the 1:2 frequency ratio. The subjective octave, by contrast, is the interval musicians locate in different contexts in order to fulfill the functional role of the learned octave. The perceptual or categorical octave, the experience of sameness and return, can be understood as arising when the spectral structure of a sound reproduces itself across frequency scaling. When a harmonic spectrum is shifted by a factor of two, many of its components overlap or align with existing ones, creating a strong sense of structural repetition in pitch space. In the limiting case of the harmonic series this alignment is maximal, since doubling frequency preserves the same pattern of integer relations among partials.

Importantly, this mechanism differs from that of sensory consonance or roughness. Low sensory dissonance does not necessarily produce octave equivalence, nor does roughness prevent it. It is possible to construct timbres with very little beating or roughness in which no interval produces a convincing sense of pitch repetition. Conversely, even highly inharmonic or rough spectra can produce a clear cyclic pitch structure if their spectral patterns repeat across frequency scaling. In this sense, octave equivalence reflects the perception of recurrent spectral organization, rather than merely the absence of sensory dissonance.

------




Ξ Example A - 7edo
Ξ Example A - 12edo

Ξ Example B - 7edo
Ξ Example B - 12edo


(Image.1) This geometric visualization compares 7-EDO with the diatonic scale in 12-tone equal temperament on a logarithmic scale. Transposition of the 7-EDO structure yields identical intervallic relationships, whereas transposition of the diatonic scale reveals the seven familiar modes of 12-tone music.






The Mechanism


4: Layers of Duodecimability


Why start with duodecimability?


Because the 12-fold diatonic categorization, whether its origins are cultural, biological, or both, appears to be the brain’s default prior for extracting tonal meaning. It is the predictive template that absorbs the largest amount of distortion, and it is remarkably difficult to “turn off.”

This is why randomly generated pitches can still form something that passes as 12-tone music, even when the underlying intervals are mathematically far from 12-EDO or just intonation. The familiar examples (pitch drift, shifting, detuning, entire performances slowly rotating away from their starting point yet remaining perfectly recognizable) all point to the same mechanism.
The Bach-in-31-tones reinterpretation demonstrates this vividly: the listener treats the 31 pitches not as new categories but as 12 categories with imperceptible internal modulation. The brain prefers to reinterpret the entire signal as a warped version of 12-tone space rather than adopt a finer contour with more pitch classes.

The layers I describe for induodecimability therefore generalize. They apply to any sufficiently robust pitch structure, whenever tonal constancy locks onto it, whether it is 5-EDO, the Bohlen-Pierce scale, or others. More broadly, these layers are layers of translatability between musical systems, understood not as fixed mathematical grids, but as auditory predictive templates with their own distortion-absorption capacities, precision thresholds, and tolerance for categorical warping. Among these, the 12-tone template appears to be the most flexible. (Later, in the section on cycles and the relationship between JND and JNM (just noticeable meaning), I discuss why a 12-fold partition might offer unusually high predictive utility.)

Likewise, the final layer, substrate dependence, is not specific to duodecimal systems but applies to any pitch model. This layer is included because it becomes relevant for musical practice and analysis. For instance, some music catalogued as “microtonal” is in fact post-pitch music: the “scales” are secondary, and pitch height is no longer the main axis of organization. In such repertoire, the semantics arise from the behavior of inharmonic spectra, noise processes, or timbral trajectories. These works are not merely “induodecimable”, they are unpitchable in the categorical sense. Pitch is not the substrate on which their meaning is built.

Why Pre-Select Pitches at All?


Any tuning system begins with an act of selection: we carve a finite subset out of a continuous pitch continuum. Whether we choose 12-EDO, a just-intonation lattice, or a non-octave structure, this selection presupposes a grid. And the moment a grid is imposed, pitch becomes symbolic, something to be named, navigated, and reasoned with, rather than merely heard.

This raises the foundational question behind duodecimability:
What does pitch selection reveal about the perceptual forces that shape our sense of musical structure?

The Organology of Resistance: Why Frets and Notes Matter


A persistent assumption in microtonal discourse is that “fretless equals freedom”, that removing the grid grants access to an infinite field of pitch possibilities. Under tonal constancy, the opposite is often true.

The Gravitational Pull of the Fretless


On fretless instruments, intonation becomes a closed loop between the ear and the fingers. Because the auditory system continuously seeks harmonic-series alignment, familiar step sizes, and diatonic attractors (the “anti-randomness engine”), players unconsciously micro-correct toward culturally internalized targets. The result:

Fretless improvisation drifts toward just intonation or 12-EDO approximations.

“Microtonal freedom” often collapses back into familiar centers.
Without structural resistance, the instrument’s acoustics and the player's perceptual habits steer the music toward what the ear already knows.

Frets as Cognitive Prosthetics


Frets, keys, and fixed pitches are not restraints, they are tools of resistance. They freeze the geometry of an alternative system long enough for it to be inhabited on its own terms. By shifting navigation from psychoacoustic alignment to spatial/logical constraints (shapes, cycles, finger patterns), frets temporarily disable the brain’s corrective instinct.

They allow for structural alienation: the ability to function within an unfamiliar tuning without immediately reabsorbing it into 12-tone expectations.

Why This Matters for Duodecimability


Without such scaffolding, many “alien” systems are eroded by tonal constancy before they can be meaningfully explored. Fixed geometry protects them from the perceptual gravity of the listener and the performer.

The Need for a Framework

These observations motivate the central question of this chapter:

How strongly does a tuning system gravitate back toward 12-EDO when filtered through human perception, performance practice, and musical habit?

Rather than treating this gravitational pull as an aesthetic defect or a perceptual failure, we can use it as an analytic tool. The concept of duodecimability provides a structured vocabulary for describing how alternative tunings interact with the 12-tone system, not as universal truth, but as our current cultural baseline.

What Duodecimability Measures


Duodecimability is not an evaluation of musical value. It is a practical measure of translatability: how easily a given system can be mapped, approximated, or “rescued” by 12-EDO expectation.

We can identify layers ranging from systems that can be subtly aligned with 12-tone tonality to those that resist assimilation even at the level of their acoustic substrate. This framework allows us to distinguish:

systems that behave like dialects or variations of 12-tone practice, 
systems that partially align but diverge in key functions, 
systems that require structural scaffolding to maintain their identity, and 
systems that collapse entirely when filtered through the perceptual pull of tonal constancy.

This is not an argument about what tuning “should be” or which system is superior. Instead, it provides
a practical tool for microtonal composition, an explanatory model for instrument design, and a conceptual bridge between psychoacoustics and musical structure.

It clarifies why some tunings feel intuitively compatible with tonal expectations while others feel like entirely new musical species.
 
Layers of Duodecimability

A tuning system’s “duodecimability” (translatability) refers to the degree to which its pitches, functions, or perceptual structures can be interpreted (or misinterpreted) through the lens of the 12-tone system.

Each layer below marks a progressively deeper departure from 12-EDO as both perceptual default and theoretical grammar:
 
  • Layer 1: Mathematical Proximity + Melodic-Trajectory Tolerance

Layer 1 covers music and systems that remain close enough to 12-EDO for tonal constancy to maintain a stable 12-tone interpretation, even when the literal pitches deviate. Here lie the just-intonation 12-tone structures, meantone temperaments, and the hyper-diatonic subsets of large EDOs. These systems are essentially colorations or timbral optimizations of the 12-tone framework.

Music that employs microtonal inflection (blues bending, R&B melisma, Flamenco ornaments) still relies on the 12-tone categorical skeleton. The deviations provide flavor, expression, and emotional contour, but the functional grid remains unmistakably diatonic/duodecimal.

Similarly, many quarter-tone traditions (Arabic, Persian, Ottoman, etc.) use extra pitches ornamentally and melodically, enriching the expressive space of 12-tone logic without dissolving it. The added pitches are not themselves duodecimable, but the framework is; these systems exploit the flexibility of categorical perception rather than replacing it.

(An upcoming chapter expands this into the distinction between expression-level deviation and category-level function.)

  • Layer 2: Structural Divergence + Independent Predictive Templates

Layer 2 contains systems whose internal structure cannot be “absorbed” into 12-tone categorization. They require and reliably trigger their own predictive grammar once tonal constancy locks onto them.

5-EDO and the Bohlen-Pierce scale are strongly induodecimable: they can be forced into 12 categories only locally, superficially or momentarily, but their true logic emerges quickly once musical motion reveals their characteristic intervals and cadential behavior. (Conversely, 12-EDO is highly inpentable and in-BP-able.) This leads to the Equivoques.

7-EDO occupies a mixed zone (the system with the latent functional multivalence): some melodic movements are categorically incompatible with 12-tone logic, while others are close enough to feel modally diatonic and therefore fully duodecimable.


This boundary is not binary. It is a continuum of pitch relationships and motion patterns, where different systems reveal their character at different thresholds of contour, function, and expectation. Layer 2 is where divergent musical worlds become perceptually robust.


  • Layer 3: Substrate-Induodecimability (Timbral Dissolution)

A musical system reaches substrate-induodecimability (the loss of any translatability) when its sonic material no longer supports stable pitch perception. Here, the breakdown is not in tuning or tonal function but in the underlying spectral substrate. Inharmonic partials, unstable resonances, or nonlinear tone generators prevent the auditory system from assigning a reliable fundamental.
As a result, neither the 12-tone system nor any other can be inferred, not even approximately.

Gamelan metallophones, hyperstring instruments, ring modulation, and other inharmonic systems exemplify this condition. Their perceptual identity is defined not by pitch geometry but by timbral fingerprints and resonant patterns. While one can attempt to “translate” these sounds onto 12-EDO instruments, any such translation is interpretive rather than structural: it preserves personal associations, not categorical equivalence.

In this layer, pitch-based translatability collapses, not because the tuning diverges, but because pitch ceases to be the primary organizing coordinate of the music.

 ----


These layers don’t represent value judgments. They describe degrees of translatability, not superiority or purity. Each layer tells us more about how listeners (trained and untrained) perceive, categorize, and force-fit sound into symbolic boxes.

They also suggest that duodecimability is not a binary. It is a gradient, and perhaps a contested one: where you place a system may depend not only on its design, but on your listening history, your training, and your linguistic tools.

In the next chapter, we turn from classification to emergence: how tonal categories form in the first place, and what kinds of mental scaffolding make tonal constancy (and its resistance) possible.

The Equivoque Principle: Local Identity, Global Divergence

Induodecimability can be misunderstood as a purely microtonal phenomenon, a matter of "notes between the keys." However, the most profound breaks from the 12-tone system occur not when intervals are unrecognizable, but when familiar intervals build impossible structures. We call these Equivoque Scales.

The Equivoque: A sequence of intervals that appears locally identical to a 12-EDO structure (triggering Tonal Constancy) but which, upon accumulation, arrives at a destination that contradicts 12-tone logic.

Case Study: The 5-EDO Paradox: Consider the 5-tone equal division of the octave (5-EDO). Its second step is 480 cents. To a listener conditioned by 12-EDO, this falls comfortably within the category of a "Perfect Fourth" (500 cents). The 20-cent deviation is perceived merely as a "flat" or "mellow" character, a timbral flavor rather than a categorical change.

However, the grammar of the system depends on what happens when we stack them.

- In 5-EDO: Stacking five steps (5 * 480) yields exactly 2400 cents; a perfect double octave. The stack resolves into stability. It is a closed cycle.
- In 12-EDO: Stacking five Perfect Fourths (5 * 500) yields 2500 cents; a double octave plus a semitone. The stack creates tension and displacement.
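The two stacks are pure arithmetic and can be checked directly:

```python
# 5-EDO "subfourth" vs. 12-EDO perfect fourth, stacked five times.

subfourth = 2 * (1200 / 5)  # second step of 5-EDO: 480 cents
fourth = 500                # 12-EDO perfect fourth

print(5 * subfourth)  # 2400.0 cents: exactly a double octave, a closed cycle
print(5 * fourth)     # 2500 cents: double octave + semitone, displacement
```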

The Failure of Translation

If a musician attempts to "translate" a 5-EDO piece based on local intervals, they will play a stack of fourths. But where the 5-EDO piece resolves to a stable octave, the 12-EDO translation lands on a dissonant minor second. The local translation (Note A -> Note B) was "correct," but the macro-translation (Structure A -> Structure B) collapsed. 

This reveals that Tonal Constancy operates on a “horizon of prediction.” For short segments, the brain assimilates the 480-cent interval as a fourth. But as the segment lengthens, the accumulated error forces the brain to confront a new geometry. The “Equivoque” is the point where the map (12-EDO) no longer matches the territory: Local Similarity ≠ Global Congruence.


The Geometry of the Fretboard

The difference between "tuning deviation" and "structural alienation" is best visualized on the guitar. In 12-EDO, stacking Perfect Fourths (500c) overshoots the double octave (2400c) by a semitone (2500c). To correct this, standard inter-string tuning introduces an asymmetry: the interval between the G and B strings is shortened to a Major Third (400c). The symmetry of the instrument is broken to satisfy the cycle of the octave (and to keep chords ergonomically playable).

In 10-EDO (or 5-EDO), the structural "fourth" is 480 cents. Stacking five of these intervals yields exactly 2400 cents (480 * 5). On a guitar refretted for 10-EDO, the inter-string tuning becomes perfectly symmetrical (4,4,4,4,4 steps) while still locking into the double octave.

This creates an "Equi-Pentatonic" chord on the open strings, a sound that is locally recognizable (stack of near-fourths) but globally "impossible" in 12-EDO logic. It is a system where the geometry of performance becomes fundamentally different.

Induodecimability is not just about "weird notes", it is about different geometries of connection.


The Two Families of "Equivoques" (structural vs mnemonic)

There are two routes by which intervals → structure can happen, and they are not the same phenomenon.

1. Structural Equivoques

Equivoque Duality Principle:
For any perceptual mapping that preserves local interval categories while altering global structure, there exists a complementary mapping that preserves global structure while altering local intervals. (This corresponds to each tonal structure’s flexibility and reinterpretation tolerance.)

(purely perceptual or topological, independent of musical memory)

Equivoques: same small intervals → different global cycle (the 5edo subfourths example).

Inverse Equivoques: different small intervals → same global cycle (the stretched diatonic examples, or local perturbations of the 5-EDO subfourths).

This is the domain of the stretched-diatonic examples: the octave from 1200 → 1150 (or 1250).

Every interval is slightly warped. But the tonal grammar (scale degrees, melodic motion, cadential weight) stays intact. The listener perceives “the same melody” even if they’ve never heard it before.
This type relies on total scale interval flexibility.

The brain stabilizes identity based on internal relational geometry, not raw acoustics, assumes “there is a cycle here” and finds the closest consistent one.

2. Mnemonic Equivoques

(contour-based identification via stored templates)

This is not the same mechanism. The “Happy Birthday drifting in pitch” experiment belongs to a different category.

In the massively distorted tune, chromas land in the “wrong” positions and the interval sizes are inconsistent, but it is still perfectly recognizable, because the brain is now using top-down stored-pattern matching, fitting the melody to a memory template: contour dominance over interval precision.

Up/down motion plus rhythm is enough to trigger recognition: identity-from-template, not identity-from-geometry.

This kind of recognition does not imply a robust internal structure in the new tuning system.
It implies melodic memory, not tuning tolerance. This does not mirror equivoques, it’s formally separate.

Here we are just recognizing the song, not making a "literal" transcription of the incoming pitches.

---


The Optimization Trap

It is a common misconception that "more notes" equals "more alien." Systems with high step counts such as 19, 31, 53, or 72-EDO are often grouped with radical microtonality. However, under the lens of Tonal Constancy, these systems often function not as departures from the 12-tone framework, but as Hyper-Diatonic Optimizations.

The Availability of "Better" Notes

In systems like 31-EDO or 72-EDO, the density of pitches is so high that the system acts as a "super-set." Within this vast array, one can easily select a subset that approximates 12-EDO intervals with greater precision than 12-EDO itself (e.g., finding a "pure" 5:4 Major Third).

The Effect: Instead of forcing the listener to confront new categories (as 8-EDO does), the composer, consciously or not, is tempted to overfit the pitch selection to known templates.

The JND Threshold

This reaches a critical limit in 72-EDO, where the step size (~16.6 cents) approaches the average Just Noticeable Difference (JND) for pitch in melodic contexts.

At this level of granularity, the step is no longer a structural "brick"; it is a nuance.

Tonal Constancy engages effortlessly here. Because the grid is finer than the brain’s categorical error margin, any pitch can be slid perceptually into a standard 12-tone bin.
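The density argument in numbers, using the ~16.6-cent melodic JND figure quoted above (real JNDs vary with listener and context):

```python
# EDO step sizes relative to an assumed 16.6-cent melodic JND.

JND = 16.6  # cents; the working figure from the text, not a universal constant

for n in (12, 19, 31, 53, 72):
    step = 1200 / n
    print(f"{n:2}-EDO step: {step:5.1f} cents = {step / JND:.1f} x JND")
```

Only at 72-EDO does the step shrink to roughly one JND, which is where, on this account, steps stop being structural bricks and become nuance.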

Rational Metaphysics

Consequently, much of modern microtonal theory has been directed not toward escaping the diatonic gravity well, but toward deepening it. The focus often shifts to a quest to justify 12-tone musical habits using the "purity" of Just Intonation ratios. This approach seeks to "fix" the commas and beating of Western music, perfecting the very structure it claims to expand.

Conclusion on Density

Therefore, high-density systems are not inherently induodecimable. Unless the composer rigorously avoids the "diatonic attractors" hidden within the swarm of notes, these systems tend to collapse back into Layer 1. They sound like "better" versions of the familiar, whereas lower-density, structurally incompatible systems (like 10-EDO) sound fundamentally different because they offer no place to hide.


Example of Duodecimability Using All 31-EDO Notes

A transformation of sensory input that preserves perceptual invariants.

The audio/video example below is a reinterpretation of Bach’s Goldberg Variation No. 1.
This variation famously uses all 12 pitch classes while remaining firmly diatonic; an early peak of Bach’s polyphonic chromaticism.

Here, however, the piece is performed on a 31-EDO sampled clavichord, and the adaptation makes use of nearly all 31 available chromas across the octaves.

The obvious question is: why doesn’t the music collapse?

Subtle Modulations, Stable Structure

In 31-EDO the step size is 38.7 cents, so many intervals sit near multiple possible 12-EDO interpretations. For example:

-the semitone can be realized as 77 or 116 cents
-the tritone can appear as 580 or 620 cents
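These figures follow directly from the step size (the 580/620 above are the rounded values):

```python
# Dual 12-EDO-compatible realizations available in 31-EDO.

step = 1200 / 31  # ≈ 38.7 cents

print(f"semitone candidates: {2 * step:.1f} and {3 * step:.1f} cents")
print(f"tritone candidates:  {15 * step:.1f} and {16 * step:.1f} cents")
```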

Because of this, several 12-EDO mappings are always available not only via mathematical proximity, but also via melodic trajectory, voice-leading weight, and contextual expectation. Interval function is more flexible than the grid suggests, as shown earlier in the 7-EDO “functional multivalence” examples.

Thus even though the pitch set is far denser and every chroma is eventually touched, the music remains:

-tonal
-diatonic in function
-fully duodecimable

This is not “microtonal structuralism.” It’s microtonal refinement + auditory illusions, very similar to pitch-drift and Shepard-tone-style ambiguity.

Is 31-EDO Structurally Alien?

Yes and no.

31-EDO has its own harmonic logic and can certainly support non-12-tone structures. But because its pitch density is so high, you can choose to selectively improve consonances, or subtly colorize a key,
…while still remaining within familiar 12-tone perceptual categories.
You don’t leave the “12-EDO flavor”, you simply refine, bend, and tint it.

This is duodecimability in action: many microtonal deviations still funnel back into 12-tone percepts when context supports them.



Video Description

The video shows two circular pitch-class displays:

A 12-EDO clock, marking the original chroma classes of the Bach score.
When each pitch class is used, it remains highlighted with a different color.
(By halfway through the piece, Bach has used all 12.)

A 31-EDO clock, showing all pitches actually played in the 31-EDO adaptation.
As each chroma is used, it remains marked showing how the entire 12-tone structure “moves” within a larger 31-tone space.

The result is a visualization of how the whole wheel of 31 notes gets painted over time, while the music itself stays remarkably stable.

5: The Spectrum of Familiarity: Microtonal Flavor vs. Functional Break


Music doesn’t become "otherworldly" just because it uses strange intervals. In many cases, it is ornamental, expressive, a kind of seasoning, a flavor layered atop an underlying structure that is still resolutely tonal.

The difference between microtonal flavor and functional departure is a spectrum, not a binary. But it’s crucial because it defines whether a piece of music is interpretable, translatable, or cognitively disorienting. And that distinction hinges on tonal constancy: whether the listener can still rely on familiar perceptual anchors, tonic, cadence, resolution, even as the tuning system mutates around them.
 
Historic Flavors: Chopin and Meantone Coloration

A famous example: Chopin referred to D minor as the “saddest” key.

At first glance, this seems metaphysical or poetic. But in fact, there was a physical reason. During his time, many pianos were tuned in meantone temperament, a system optimized for certain intervals using simple integer ratios. While 12-tone equal temperament (12-EDO) was theoretically known and even in use, it was still rare for instruments to be tuned to it precisely. Ear-based tuning methods favored rational approximations. Algorithmically, meantone was simply more practical before electronic tuners.

The result: each key had a unique color, a subtle deviation in interval sizes that made D minor sound distinctly different from, say, B minor. These were microtonal inflections, not fundamental departures. The harmonic framework remained diatonic. What changed was the flavor profile of each key.
 
Modern Examples of Flavor: Bends, Blues, and Maqamat

Today, the idea persists in many styles:

Blues music bends between notes of the pentatonic and chromatic scales, sliding into pitches that don't "exist" in 12-EDO notation. These expressive bends act as stylistic inflections, not harmonic challenges. The tonic remains the tonic. 
 
Arabic Maqamat and Persian Dastgah systems incorporate quarter-tones and nuanced scalar steps, often creating pitches "between the keys." Yet these systems still rely on cadential logic and tonal gravitation. The microtones serve as ornaments, bridges, flavors. They rarely seek to dissolve the entire structure, they aim to enrich it.

In both cases, duodecimability remains possible, even if imperfect. A skilled listener can still find the center of gravity. These are flavored tonalities, not alternative logics.
 
Functional Break: When Tonal Constancy Fails

What happens, though, when the system no longer submits to interpretation?

Below is an example from 8-EDO, a tuning system that divides the octave into eight equal steps (150 cents each). It contains two maximally symmetric diminished scales, and enough pitch density to form chords and melodies. However, the logic of this system is non-diatonic by design.

Try to map its harmonic progressions to 12-EDO, and tonal constancy breaks. No amount of perceptual coercion or melodic expectation can fully translate its motion. The listener doesn’t "mishear" it as tonal; they simply hear it as strange.

Why? Because 8-EDO sits out of phase with 12-EDO. There are no simple ratios shared between their step sizes. Their intervals don’t approximate one another; they contradict each other (except for the diminished scale, i.e., the shared 4-EDO subset). This is the threshold at which duodecimability fails entirely. Translation is not fuzzy; it is impossible.
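The "out of phase" claim is visible in the raw distances (a quick check, not a perceptual model):

```python
# Distance of each 8-EDO degree (150-cent steps) from the nearest 12-EDO class.

for k in range(9):
    cents = k * 150
    nearest = round(cents / 100) * 100
    print(f"{k} steps = {cents:4} cents, {abs(cents - nearest):2} cents off 12-EDO")
```

Every odd degree lands exactly 50 cents from any 12-EDO class, the maximal possible distance, while the even degrees form the shared 4-EDO diminished skeleton.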
 
Octave Retention vs. Structural Alienation

Interestingly, 8-EDO still uses the octave as a repeating unit. This gives it a slight advantage in group performance and instrument design: parts can be transposed, ranges can be shared.

Compare that to Bohlen-Pierce (13-ED3), a system with a deceptively similar step size that replaces the octave (2:1) with the tritave (3:1). While rich in harmonic possibilities (especially with odd harmonics), it loses the universal reference point that the octave provides. The result: true structural alienation, especially in chordal writing. Melodies still function, but harmonies drift into perceptual limbo. An approximate 1.96:1 ratio, close to the octave, exists, but it is harmonically incoherent with traditional instruments.
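The arithmetic behind that pseudo-octave can be checked directly; this sketch assumes only the standard definition of Bohlen-Pierce as 13 equal divisions of the 3:1 tritave:

```python
import math

# Sketch: Bohlen-Pierce as 13 equal divisions of the tritave (3:1), and its
# near-octave at 8 steps. Only the standard 13-ED3 definition is assumed.

TRITAVE_CENTS = 1200 * math.log2(3)   # ~1901.96 cents
bp_step = TRITAVE_CENTS / 13          # ~146.3 cents (vs 8-EDO's 150)

pseudo_octave_cents = 8 * bp_step     # ~1170.4 cents
pseudo_octave_ratio = 3 ** (8 / 13)   # ~1.966, the "approximate 1.96" ratio

print(f"BP step: {bp_step:.1f} cents")
print(f"8 BP steps: {pseudo_octave_cents:.1f} cents, ratio {pseudo_octave_ratio:.3f} "
      f"({1200 - pseudo_octave_cents:.1f} cents flat of a true 2:1 octave)")
```

So BP's closest "octave" is nearly 30 cents flat of 2:1, well toward the edge of any plausible categorical window for octave equivalence, which is the structural alienation described above.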

This is why 8-EDO, though less famous, can feel more playable. Its symmetrical design makes it excellent for exploring alien harmonic functions while maintaining just enough structure for ensemble use.

(So 8-EDO and Bohlen-Pierce share many equivoques: similar local intervals but contradictory global structures; some of their scales are briefly in phase.)
 
The Takeaway: The Diatonic Ghost is Hard to Kill

Even in highly divided systems like 19, 22, or 31-EDO, often used for their greater consonance or intonation precision, diatonic templates resurface. Musicians use them to better approximate known categories, not to invent new ones. In fact, the higher the division, the more tempting it becomes to overfit microtonal pitch space to traditional harmonic roles.

By contrast, systems like 8-EDO or 10-EDO, low-subdivision tunings that avoid rational alignment with 12-EDO, offer fewer handholds. Their symmetry, spacing, and internal logic prevent easy mapping. They don't flavor tonal music; they replace it.

These systems are functionally distinct, and their progressions defy tonal constancy. This is the boundary line: where the mind stops hearing “altered chords” and starts hearing new grammar.
 
Closing Note

The difference between flavor and functional break is not merely theoretical. It defines whether music can still operate within a shared perceptual vocabulary, or whether it demands the invention of a new one.

In the chapters ahead, we’ll explore this boundary more formally: how tonal categories form, and what kinds of cognitive attractors allow or prevent the perception of coherence when pitch structures drift too far.

Or put more provocatively: when does a microtone become a mutiny?







(Video.01 - Color-coded octave equivalence)

Video.01: Octave equivalence is demonstrated through a common chord progression exhibiting a known tension-resolution characteristic: \(\text{V}_7 \to \text{I}\). Within a 12-tone equal temperament (12-EDO) framework, with middle C standardized at 261 Hz, the progression \(\text{G}_7 \to \text{C}\) is employed. An initial sequence, represented in MIDI format, comprises a minimal voicing: approximately one note per pitch class of each chord. Subsequent sequences introduce randomized octave doublings of chord members, illustrating the preservation of harmonic function and tonal meaning. The introduction of other random intervals in further sequences results in the loss of this harmonic function. While the octave's significance may appear self-evident within certain modern consonance models and given the observed perceptual flexibilities, such examples serve to reaffirm its fundamental role. The synthesized sounds in these examples use sine waves, eliminating timbral complexity and ensuring that the observed pitch grouping is independent of partials.

The perceptual flexibility of the octave and its role as a framework for monophonic melodic structure are demonstrated through a series of audio examples. Each example features a 12-EDO diatonic major scale subjected to proportional stretching. The notes of the scale are presented sequentially, followed by a short melody, to illustrate the preservation of tonal meaning and relative intervallic distances despite the stretching. This process results in a relative error of less than 10 cents between adjacent notes. Specifically, audio example 1 features a compression of the octave from 1200 to 1150 cents, while audio example 2 features a stretch from 1200 to 1250 cents.
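The proportional stretching described above can be sketched as follows; the scale degrees and target "octaves" are taken from the text, and the sub-10-cent adjacent-step error falls out of the arithmetic:

```python
# Sketch: proportional stretching of a 12-EDO major scale, as in Audio.01/02.
# Relative proportions are preserved exactly; only absolute step sizes change.

MAJOR = [0, 200, 400, 500, 700, 900, 1100, 1200]  # C major in cents

def stretch(scale_cents, new_octave):
    factor = new_octave / scale_cents[-1]
    return [c * factor for c in scale_cents]

for target in (1150, 1250):
    warped = stretch(MAJOR, target)
    errors = [abs((w2 - w1) - (m2 - m1))
              for m1, m2, w1, w2 in zip(MAJOR, MAJOR[1:], warped, warped[1:])]
    print(f"octave -> {target}: max adjacent-step error {max(errors):.2f} cents")
```

The largest adjacent-step error is 200 * 50/1200 ≈ 8.3 cents (on the whole steps), consistent with the sub-10-cent figure above.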

This is Categorical Perception again. The brain prefers a "closed loop" topology (a circle) over a line, so it will bend the data to close the circle. (These stretched diatonic scales are the inverse of the equivoques: there, similar intervals stack into a different macrostructure; here, different intervals stack and stand as the same macrostructure.)


(Audio.01) 12-EDO diatonic stretched to 1150 cents.


(Audio.02) 12-EDO diatonic stretched to 1250 cents.

(Audio.03) Auditory stimulus used in pitch distance estimation tests. A sequence of randomly generated pitches with constant, randomized step sizes is presented. Participants estimate the overall interval between the first and last pitches. Step sizes and number of notes are withheld from participants to prevent calculation-based responses. 
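One plausible reading of that stimulus design is a monotone staircase with a constant, per-trial randomized step size. The sketch below uses invented parameter ranges; the actual experimental values are not stated in the text:

```python
import random

# Hypothetical sketch of an Audio.03-style stimulus: a monotone staircase with
# a constant, per-trial randomized step size. All parameter ranges are invented;
# the original experiment's values are not given in the text.

def make_stimulus(rng):
    step = rng.uniform(40, 240)          # constant step size in cents
    n_notes = rng.randint(5, 12)         # sequence length, hidden from listeners
    sign = rng.choice([1, -1])           # ascending or descending
    pitches = [i * step * sign for i in range(n_notes)]
    return pitches, abs(pitches[-1] - pitches[0])  # interval to be estimated

pitches, total = make_stimulus(random.Random(1))
print(f"{len(pitches)} notes; total interval {total:.0f} cents (step size withheld)")
```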




Resolution Isn’t Enough: From JND to Musical Meaning


Even with modern tuning systems (19, 22, 31, 72) where we can finely sculpt intervals, not every perceptible difference becomes a category.

This is the core distinction:

  • JND (Just Noticeable Difference): “Can I tell this is different?”
  • JNM (Just Noticeable Meaning), or categorical distinctiveness: “Does this difference mean something?”

This distinguishes Psychophysics (what the ear can do) from Semantics (what the mind acts upon). This is the missing link in microtonal theory.

Tonal constancy plays a key role here. It resists giving category status to subdivisions that don’t contribute to the known structure. It insists on interpreting ambiguous or fine distinctions as versions or “flavors” of known categories, not entirely new ones. This is why microtonal intervals so often get perceived as “bent” versions of 12-tone intervals, unless the tuning is radically unfamiliar or the anchor density is too low to allow reinterpretation.

So we get plateaus: 5, 7, 12. These are systems where the perceptual return on complexity is high—where each added step adds not just a difference, but a function.

Conclusion: The Brain’s Great Synthesis

Our musical categories arise neither from arbitrary cultural evolution nor from fixed laws of physics. They are emergent solutions, stable balances between two incompatible demands:

The brain’s love of symmetry, compression, and predictability.

The world’s asymmetrical offerings, in the form of acoustic resonances and harmonic structure.

The cycle gives us the container. These two forces tell us how to divide it. And Tonal Constancy is the mechanism that enforces these learned partitions—interpreting incoming sound as near or far, familiar or strange, anchored or drifting.

The categories we use are not infinite, not because we can’t tell the difference—but because only some differences rise to the level of meaning. That is the blueprint of pitch. A perceptual system always balancing what it could sense with what it must make sense of.






Neural Mechanisms and Predictive Models of Tonal Constancy

Neuroimaging and electrophysiological studies reveal that specific regions of the auditory cortex selectively respond to structured sound patterns such as speech, melody, and harmonic sequences, but remain relatively inactive during exposure to unstructured noise. Notably, areas such as the planum temporale, located posterior to the primary auditory cortex, appear to engage dynamically when pitch structures exhibit internal regularities, even when those regularities are statistically subtle or culturally learned.

These regions are not merely passively decoding incoming sound; they participate in an active predictive process. The brain constructs internal models of melodic or harmonic progression, and generates expectations for future events. When a pitch contour unfolds predictably, it minimizes error between the expected and actual input, triggering dopaminergic reward responses in associated circuits such as the nucleus accumbens. These reward-linked responses, observed even in anticipation of musical climaxes, suggest that successful pattern prediction is inherently satisfying, reinforcing the learned tonal templates over time.

Such findings align well with a Bayesian perspective: the brain updates internal priors based on the statistical structure of the sound environment, forming what we might call tonal basins, perceptual attractors that stabilize around culturally salient pitch configurations. These basins guide pitch interpretation even when the physical signal is ambiguous, distorted, or derived from non-standard tunings (e.g., 7-EDO or inharmonic timbres). The result is tonal constancy, rooted in statistical expectation, contextual prediction, and hierarchical sensory processing.




draft



Exhibit A: The Anti-Randomness Engine

These musical examples belong to a larger work in which I explored the role of randomness in pitch selection, beginning with the question of what randomness even means in a musical context. For the purposes of that study, I treated randomness as non-intentional design.

The research (see [link]) investigates in depth why apparently random pitch sets remain musically functional, often without producing the sense of “oddity” one might expect. While this section does not include all of those examples (some pitch systems are far less straightforward to analyze than the 7-, 8-, or 10-EDO cases presented here), the broader study incorporates sets derived from planetary data, mathematical functions, noise distributions, and other sources.

What unites these varied systems is that, despite escaping the 12-tone grid in every permutation and defying conventional tuning logic, they still exhibit traditional musical utility. In practice, many of them do not even sound “microtonal.” Instead, musical intent, cognitive expectation, and perceptual organization stabilize them, allowing listeners to hear them as familiar or “normal.”

The key finding is simple:
Uniform subdivisions of the perceptual cycle, even allowing for clustering or irregular spacing, still guarantee tonal functions.

This “anti-randomness engine” illustrates how music perception is less about strict mathematical grids and more about how the mind organizes trajectories into meaningful tonal categories.

This study began by asking if randomness destroys musical coherence. The evidence suggests the opposite: randomness often ensures it.

We find that Randomness and High Density function as a "perceptual mirror." The sheer statistical spread of random systems provides enough "anchors" (familiar intervals approximating 3:2 or 2:1) that the brain's error-correction mechanism (Tonal Constancy) can effectively project a 12-tone grid onto the chaos. The listener does not hear the randomness; they hear the subset of it that resembles what they already know.

The creation of genuine "new notes" does not arise from chaos, but from Specific, Alien Structure. It emerges in systems like 5-EDO, 8-EDO, or non-octave scales (Bohlen-Pierce), where the internal geometry is so rigid and "induodecimable" that the brain is forced to abandon its diatonic priors.

In the end, to escape the gravity of the 12-tone system (learned or not), one cannot simply roll the dice. One must build a new geometry that is robust enough to stand on its own.






Start with 12edo music: how far can you stretch the interval sizes without distorting their meaning? Where do they switch categories?

It seems that if each pitch in a 12-EDO set is randomly varied within ±20 cents, the structure remains perceptually stable: we still hear it as “12-EDO.” These are duodecimable tunings: systems that, despite numerical deviations, are still interpreted as 12-tone.
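The static formalization of duodecimability, a fixed tolerance window around the 12-EDO grid, can be sketched as a one-line check. The ±30-cent window is the illustrative figure from the introduction, not a measured threshold, and the text stresses that the real window is dynamic and contextual:

```python
import random

# Sketch: static duodecimability as a fixed +/-30-cent window around each
# 12-EDO class. Illustrative only; the real tolerance is dynamic.

def is_duodecimable(pitches_cents, tolerance=30):
    return all(abs(p - round(p / 100) * 100) <= tolerance for p in pitches_cents)

rng = random.Random(0)
grid = [k * 100 for k in range(13)]                   # one octave of 12-EDO
jittered = [p + rng.uniform(-20, 20) for p in grid]   # the +/-20-cent variation above

print(is_duodecimable(jittered))             # True: +/-20 stays inside the window
print(is_duodecimable([0, 150, 300, 450]))   # False: an 8-EDO fragment escapes it
```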

This resilience might be due to biological factors (like pitch discrimination thresholds and auditory memory) and/or cultural familiarity. But even that’s not the whole story.

When you introduce pitch trajectory, motion, gesture, melodic phrasing, the categories become even more flexible. A note that might have sounded “too far” statically can function perfectly well within a musical phrase, even shifting its perceived identity based on context.

So, duodecimability is not just a mathematical condition. It’s contextual and perceptual. The brain doesn’t simply match frequencies, it interprets relationships, motion, expectation, and structure.

This has real consequences for microtonal music and composition. Flavor is one thing, the unique color of a tuning, but structural category is another. Tonal constancy means that pitches don’t just sound similar; they mean similarly, and meaning is shaped by use.

In this light, new tonal spaces aren't just a matter of dividing the octave differently. They demand both careful pitch selection and deliberate usage, exploiting or bypassing tonal constancy depending on the desired perceptual effect. Composition becomes an active negotiation between new materials and old cognitive habits.

Meaning-to-Noise Ratio (MNR)

How much of the perceived structure carries interpretable, repeatable form versus how much is background variation or "interpretive entropy"?

Lower MNR → ambience, texture
Higher MNR → form, hierarchy, prediction:

±5 cents noise → likely still high MNR
±25 cents → MNR drops, unless trajectory/symmetry/surface redeems it
Non-periodic microtonal sets → potentially high MNR if duodecimable, low if not
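As a deliberately crude illustration of these bullet points, one could proxy MNR by inverting the mean deviation from the nearest 12-EDO class. The formula and its 10-cent scale constant are invented here to make the idea concrete; this is not a validated measure:

```python
# Deliberately crude MNR proxy: invert the mean deviation from the nearest
# 12-EDO class. The formula and its 10-cent scale constant are invented to
# make the bullet points concrete; this is not a validated measure.

def mnr_proxy(pitches_cents):
    """Returns a value in (0, 1]; higher = closer to 12-EDO categories."""
    mean_err = sum(abs(p - round(p / 100) * 100) for p in pitches_cents) / len(pitches_cents)
    return 1 / (1 + mean_err / 10)

print(f"exact 12-EDO:      {mnr_proxy([0, 100, 700, 1200]):.2f}")   # 1.00
print(f"~+/-5 cent drift:  {mnr_proxy([4, 103, 696, 1205]):.2f}")
print(f"~+/-25 cent drift: {mnr_proxy([24, 122, 677, 1226]):.2f}")
```

Under this proxy, ±5-cent drift barely moves the score while ±25-cent drift drops it sharply, mirroring the bullets above (and, as the list notes, it ignores the trajectory/symmetry factors that can redeem a low score).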



References / Further Reading:
Albert Bregman - Auditory Scene Analysis (1990)
Diana Deutsch - research on auditory illusions (1999)
David Temperley - The Cognition of Basic Musical Structures
Maurice Merleau-Ponty - Phenomenology of Perception
Don Ihde - Listening and Voice: Phenomenologies of Sound
Easley Blackwood - research on EDOs
William Sethares - Tuning, Timbre, Spectrum, Scale
Burns & Ward (1978) - Categorical Perception of Musical Intervals
Plack et al. (2005) - Pitch: Neural Coding and Perception
Zatorre & Halpern (2005) - brain regions for pitch category memory
Tillmann et al. (2000) - Implicit Learning of Tonal Structure
Fred Lerdahl - Tonal Pitch Space (2001)





Tonal Cognition and Quantum Search Analogies


The analogy to quantum amplitude amplification should be understood as a conceptual description of global hypothesis redistribution, closely related to attractor dynamics and winner-take-all competition in neural models of perception.

Grover-like amplification ≈ attractor dynamics in neural systems.

Tonal listening can be modeled as a probabilistic search process over competing tonal hypotheses. Musical cues act as operators that progressively amplify compatible interpretations while suppressing others. The resulting dynamics resemble continuous amplitude amplification, analogous to Grover search, where probability mass rotates toward a target state. In neural terms this corresponds to attractor dynamics within predictive processing systems, where stable tonal centers emerge as low free-energy states of interpretation.

Musical feeling becomes a visible trace of inference dynamics.



An interesting analogy can be drawn between tonal cognition and mechanisms used in quantum amplitude amplification algorithms, particularly those related to Grover's algorithm. Although the comparison is not meant literally, the mathematical intuition behind these algorithms provides a useful conceptual framework for understanding how listeners stabilize tonal interpretations over time.

At any moment during listening, the auditory system can be thought of as maintaining a distribution over possible tonal interpretations. A listener may simultaneously entertain several competing hypotheses: that a passage belongs to a particular key, that a certain pitch functions as a tonic, that an interval acts as a leading tone, or even that the passage is not tonal at all. Before sufficient musical context accumulates, none of these interpretations is fully determined.

In this sense the perceptual state resembles a superposition of tonal hypotheses. Importantly, however, the initial distribution is not uniform. Extensive exposure to tonal systems, such as the twelve-tone equal-tempered framework, creates strong prior expectations. As a result, familiar tonal structures begin with higher probability weight than unfamiliar alternatives. This situation resembles a quantum search initialized with a biased amplitude distribution, where some candidate solutions already carry greater weight because they are more strongly expected by the system.

In quantum search algorithms, the role of the oracle is to mark candidate states and allow subsequent transformations to amplify their probability. Musical events play a somewhat analogous role in perception. Individual chords, melodic motions, and rhythmic articulations act as cues that selectively reinforce or weaken particular tonal hypotheses. For instance, a dominant-seventh chord strongly favors certain key interpretations; a cadential progression sharply increases the likelihood of a specific tonic; conversely, an unexpected harmonic event can reduce confidence in previously favored interpretations.

Within predictive models of perception, these events effectively reshape the error landscape of competing tonal hypotheses. Each new observation alters the relative plausibility of candidate interpretations, not simply by incrementally adding probability mass but by suppressing alternatives as well. In this respect the dynamics resemble amplitude amplification: evidence does not merely accumulate but redistributes probability across the entire hypothesis space.

As a piece unfolds within a coherent tonal grammar, repeated cues gradually concentrate probability on a single tonal interpretation. Cadential events illustrate this particularly clearly. Prior to a cadence, several tonal centers may remain plausible. As the cadential progression unfolds, the probability associated with the intended tonic increases rapidly while competing interpretations lose support. After the cadence, the system effectively commits to a single tonal interpretation.

This description is compatible with Bayesian models of perceptual inference, yet the quantum analogy highlights an additional property of tonal cognition: updates operate globally across the hypothesis space rather than strictly on a note-by-note basis. Predictions interact with incoming information, reinforcing some interpretations while actively suppressing others, producing a winner-take-all stabilization of tonal structure.

Another useful parallel concerns robustness. Quantum amplitude amplification algorithms are designed to tolerate moderate noise or imperfect marking of candidate states while still converging on the correct solution. Tonal perception shows similar resilience. Even when pitches drift, tuning systems are distorted, or melodic lines deviate from strict intonation, listeners typically maintain a stable tonal interpretation until accumulated deviations exceed a perceptual threshold.

From this perspective, learned tuning systems such as twelve-tone equal temperament function partly as cognitive stabilization mechanisms(an "error-correcting code"). Once internalized, they provide discrete categorical anchors toward which continuous pitch variation can be perceptually corrected. Melodic expectation and harmonic context repeatedly steer pitch interpretations back toward these categories, maintaining stable tonal structure even under conditions of acoustic variability.

Although the analogy to quantum search should not be taken as a literal claim about neural implementation, it offers a useful conceptual model for tonal cognition. Musical perception can be understood as an iterative process in which competing tonal hypotheses interact, reinforce, and suppress one another until a coherent interpretation of the tonal environment emerges.

Tonal interpretation may therefore be understood as the emergence of an attractor state in a dynamically competing hypothesis space, where musical events progressively amplify one tonal interpretation while suppressing alternatives.



Tonal tension and resolution behave very much like a continuous search process converging to an attractor, a dynamic closely resembling continuous versions of Grover search in quantum computing.

The standard picture of Grover's algorithm is discrete: superposition of states, oracle marks the target, amplitude amplification, collapse to the solution. But physicists also describe Grover search as a continuous rotation of probability amplitude between states: probability gradually shifts from many possibilities toward the target state through repeated small transformations.
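That rotation picture is easy to simulate. This toy uses the textbook closed form for Grover amplitude growth, sin²((2k+1)θ) with θ = asin(1/√N), and claims nothing about neural implementation:

```python
import math

# Toy illustration of amplitude amplification: with N candidate states, the
# probability of the marked state after k Grover iterations is
# sin^2((2k+1) * theta), where theta = asin(1/sqrt(N)). Purely pedagogical.

N = 8                                  # e.g. eight competing tonal hypotheses
theta = math.asin(1 / math.sqrt(N))

for k in range(4):
    p_target = math.sin((2 * k + 1) * theta) ** 2
    print(f"iteration {k}: P(target) = {p_target:.3f}")
```

Probability rotates smoothly from 1/8 toward roughly 0.95 over two iterations, then overshoots: the "steering" is gradual rather than a single jump, which is the property the tonal analogy borrows.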

Tonal perception behaves similarly: in tonal listening, the brain maintains something like a probability field over tonal interpretations.

possible states: tonic = C, tonic = G, tonic = F, modal / ambiguous, non-tonal interpretation,

At the beginning of a piece, probability is spread across many states. As the music unfolds, dominant chords, leading tones, and cadential motion gradually rotate the probability distribution toward one tonal center. This is not a sudden update; it is a gradual steering process, exactly what happens in continuous Grover dynamics.

Cadences work because they apply multiple reinforcing cues simultaneously.

Example: V → I
Contains: leading tone resolution, dominant function, root motion, harmonic expectation.

All cues point to the same attractor so the system rapidly converges.
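The cadence-as-reinforcing-cues idea can be sketched as a naive Bayesian update over a handful of tonal hypotheses. All likelihood numbers below are invented for illustration; only the qualitative behavior (rapid convergence when cues agree) is the point:

```python
# Naive Bayesian sketch of a V -> I cadence steering belief over tonal
# hypotheses. Likelihoods are invented; only qualitative convergence matters.

hypotheses = ["tonic=C", "tonic=G", "tonic=F", "non-tonal"]
belief = {h: 0.25 for h in hypotheses}            # start undecided

cues = [  # (cue name, hypothetical likelihood of the cue under each hypothesis)
    ("leading tone B->C", {"tonic=C": 0.8, "tonic=G": 0.3, "tonic=F": 0.1, "non-tonal": 0.1}),
    ("dominant G7 chord", {"tonic=C": 0.9, "tonic=G": 0.2, "tonic=F": 0.1, "non-tonal": 0.1}),
    ("root motion G->C",  {"tonic=C": 0.9, "tonic=G": 0.1, "tonic=F": 0.2, "non-tonal": 0.1}),
]

for name, lik in cues:
    belief = {h: belief[h] * lik[h] for h in hypotheses}   # reweight by evidence...
    z = sum(belief.values())
    belief = {h: p / z for h, p in belief.items()}          # ...and renormalize
    print(f"after {name}: P(tonic=C) = {belief['tonic=C']:.3f}")
```

Because every cue favors the same attractor, probability mass concentrates on "tonic=C" within three events; note that the renormalization step suppresses the rivals at the same time, the global redistribution the text emphasizes.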

This connects directly to tonal tension, which arises while probability mass is still distributed. Resolution happens when one attractor dominates.

That explains several musical phenomena:

suspension: probability temporarily ambiguous
deceptive cadence: amplification redirected to an unexpected attractor
modulation: the system slowly steered toward a new attractor basin
tonal ambiguity: multiple attractors competing

The Free Energy Principle says, roughly, that biological systems minimize prediction error (surprise). In tonal listening, the brain tries to maintain a stable generative model of the music.

If the model predicts well: low free energy, stable tonal center.
If not: model updates, new tonal interpretation.

So tonal attractors are essentially low free-energy states of musical interpretation.

12-EDO acts like a cognitive error-correcting system because tonal perception does something like: continuous pitch input -> snap toward discrete tonal categories, just like phonemes in speech. Even if tuning drifts, the brain pulls notes back into stable tonal bins: classic attractor behavior.
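The snapping step can be sketched as a chromatic quantizer. A4 = 440 Hz is an assumption for the example; the residual in cents is the "correction" the category silently absorbs:

```python
import math

# Sketch: 12-EDO as an error-correcting quantizer. Continuous pitch (Hz) is
# snapped to the nearest chromatic category; the residual (in cents) is the
# deviation the category absorbs. A4 = 440 Hz is assumed for the example.

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def snap(freq_hz, a4=440.0):
    semis = 12 * math.log2(freq_hz / a4) + 69   # MIDI-style note number
    nearest = round(semis)
    return NAMES[nearest % 12], 100 * (semis - nearest)

for f in (440.0, 452.0, 466.16):
    name, cents = snap(f)
    print(f"{f:7.2f} Hz -> {name} ({cents:+6.1f} cents absorbed)")
```

Here 452 Hz is still pulled back to A despite being nearly 47 cents sharp; only past the 50-cent boundary does the percept flip to the next bin.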

Tonal harmony starts to resemble navigation through an energy landscape.



When you enter 5-EDO, Bohlen-Pierce, 8-EDO, etc. (see the section on which systems escape 12-tone expectations, and how effectively): predictions fail, categorization becomes impossible, the “oracle” cannot mark the expected tonal states, and the amplitude distribution becomes uniform or chaotic.

This is analogous to quantum decoherence: the superposition no longer evolves according to its internal logic because the environment (the music) is incompatible with the system’s internal basis.

At that point the brain resets and builds a new basis, a new grammar.

This gives a high-level parallel:

High duodecimability: coherent evolution in a stable basis.
Low duodecimability: decoherence and basis reconstruction.




----tonal cognition (draft outdated)

---

Excursus?

Enculturation versus perception in musical priors

A necessary distinction? “Enculturation shapes aesthetic expectations, but core perceptual capacities for pitch, mode, and ‘musical object’ recognition are strongly suggested to rest on deeper priors.”

Where do priors come from? From a braid of biophysical constraints, rapid statistical learning, and supramodal gestalts. Enculturation tunes expectations and aesthetics; it doesn’t originate the perceptual capacity to differentiate modes or infer tonal centers.

Historical modal practices and contemporary perceptual evidence suggest that enculturation shapes style and affect more than the base perceptual architecture. Probabilistic models succeed because they ride on those deep constraints; expectation adapts quickly, but perception has anchors that predate and outlast any single culture’s twelve-tone dominance.

The critique that “humans aren’t statisticians” targets conscious, post-qualia decisions.
In music this is not the domain where key, mode, and tonal anchoring are primarily computed; those are largely unconscious, rapid, and pre-attentive processes (perceptual inference rather than explicit calculation).

Bayesian-like models describe how priors shape unconscious perceptual interpretation (grouping, tension, resolution) without implying literal equation-solving.
Key and meter perception, tonal attraction, and hierarchical structuring in music are well-modeled probabilistically and emerge even with noisy or partial input, consistent with “perception-as-inference.” (Temperley)

The origins of musical priors

  1. Biophysical constraints (innate/early-developing): 
    Mechanism: Harmonicity, periodicity, and fusion/segregation tendencies bias auditory scene analysis.
    Implication: Chordal fusion, consonance/dissonance gradients, and tonal anchoring arise from how spectra organize percepts, not just from exposure.
    Support: Harmonic complex tones fuse; nonharmonic complexes yield diffuse pitch; composers exploit these grouping tendencies. These regularities are robust across listeners and styles.

    These constraints don’t hard-wire a specific musical system; instead, they bias perception toward certain stable organizations, making some tonal grammars more naturally learnable, stable, and cognitively efficient than others. 

  2. Probabilistic structure learning (developmental): 
    Mechanism: Brains learn distributions of events (tones, intervals, meters) via statistical learning (an efficient way to refine priors).
    Implication: Exposure tunes which tonal centers and metric patterns feel “normal,” but on a scaffold already constrained by biophysics.
    Support: Probabilistic/Bayesian models of key and meter capture expectation, ambiguity, and style differences without requiring explicit calculation; more broadly, statistical learning is a well-established cognitive mechanism.

  3. Cross-modal scaffolds and gestalts (supramodal): 
    Mechanism: Shared organizational principles (shape, hierarchy, attraction) span modalities; music is recognized as gestalt objects in time.
    Implication: Tonal “gravity,” metric hierarchy, and object-like recognition of phrases parallel face and word robustness to distortion.
    Support: Tonal-metric correlations and “tonal gravity” style arguments point to hierarchical, attractor-like structures beyond mere enculturation.



Enculturation versus perception

Enculturation amplifies stylistic expectations and moral/aesthetic coding but does not create the underlying capacity to perceive modes or recognize tonal centers. Evidence from ancient cultures (Chinese, Vedic, and Greek modal practice; Mesopotamian lyre tunings) indicates functional differentiation of modes long before mass musical exposure, suggesting the perceptual system could already detect structured tonal organizations, distinct rotations of the diatonic scaffold (enough to attribute character to them), without uniform enculturation and before hyperstandardized exposure.
This convergence does not prove universality, but it supports the idea that perceptual systems readily detect structured tonal scaffolds and treat modal shifts as meaningful reorganizations rather than arbitrary stylistic conventions. (Plato banning modes; Plutarch’s moral characters of the modes; Mesopotamian modal associations, e.g., Saturn, the most restrictive or malefic planet, associated with the scale starting on the tritone, the mode we now call Locrian, the diabolus in musica; etc.)


Model-integrated view:  
Perception is constrained by auditory biophysics (harmonicity, fusion, periodicity) and gestalt grouping.
Expectation is tuned by statistical exposure (which keys, cadences, meters are common), accelerating predictions and stylistic fluency.
Affect/morality is shaped by culture, rhetoric, and pedagogy; layered on top of perceptual distinctions rather than generating them from scratch.

Probabilistic models explaining key and meter (and ambiguity/tension) work precisely because they sit atop constraints that make certain organizations perceptually salient; enculturation modulates distributions but doesn’t conjure the perceptual substrate ex nihilo.

Culture rides perception.

1 Biophysical priors: constrain what can become musically salient
2 Statistical learning: tunes expectations within those constraints
3 Gestalt / supramodal organization: gives music object-like identity
4 Enculturation: shapes meaning, affect, style, moral coding


Musical perception reflects a layered system of priors. Auditory biophysics biases the brain toward harmonic fusion, periodicity, and hierarchical grouping; statistical learning rapidly tunes these biases into stylistic fluency; and culture overlays meaning, affect, and convention. Enculturation shapes what we expect, but deeper priors shape what we can perceive as structured, anchored, and musically “object-like” in the first place. 


----

Configural Processing in Face Perception and Music Perception

An instructive analogy can be drawn between face perception and music perception. In both domains, the brain does not simply process raw features independently; instead, it organizes sensory input into structured perceptual gestalts. When that organization is disrupted, for example through scrambling, inversion, or distortion, distinct neural systems become engaged, revealing the mechanisms normally responsible for coherent perception.

The comparison is not intended to suggest that the same neural machinery processes faces, typography, and music. Rather, the analogy highlights a shared computational principle across perceptual domains: sensory systems rely on probabilistic inference combined with configural organization, allowing them to stabilize meaningful objects while absorbing varying levels of distortion in the signal.

A useful way to describe this organization is through a three-level framework frequently discussed in the literature on face perception.

Face Recognition: Configural and Holistic Processing:

Research summarized by Mike Burton (2024) describes face perception as operating through several interacting levels of organization.

Holistic processing.
Facial features are bound into a single perceptual object. Observers do not experience isolated elements such as “an eye,” “a nose,” or “a mouth”; instead they perceive a unified face.

First-order configural processing.
This level detects the canonical layout that defines the category itself for example, eyes positioned above a nose, which in turn sits above a mouth. Disruptions of this canonical configuration strongly impair recognition. A well-known example is the inversion effect: upside-down faces are disproportionately difficult to recognize compared with inverted objects.

Second-order configural processing.
Once the canonical template has been established, perception becomes sensitive to fine parametric variations within that structure. Small differences in spacing between the eyes, curvature of the jawline, or proportions of facial features allow observers to distinguish individuals.

Neuroimaging studies consistently show specialized networks supporting these processes, particularly regions such as the fusiform face area (FFA), occipital face area (OFA), and superior temporal sulcus (STS).

Music Perception: Tonal Gestalts

A parallel structure can be observed in music perception, where auditory input is similarly organized into coherent tonal objects.

Holistic processing.
Individual notes combine into larger perceptual structures such as chords, tonal centers, and melodic gestures. Listeners typically do not experience music as a sequence of independent frequencies; instead they perceive structured musical events. A C–E–G sonority, for instance, is often perceived directly as a tonal object rather than as three separate pitches.

First-order configural processing.
At this level the perceptual system identifies the basic structural framework of the musical environment: tonal hierarchies, key centers, modal organization, and the general grammar of the tonal system. When this structure is disrupted, as in scrambled sequences that contain the same pitch materials but lack coherent ordering, listeners report a loss of musical coherence. Tonal grammar therefore appears not merely as an analytical abstraction but as an extension of the perceptual organization of sound.

Second-order configural processing.
Within an established tonal framework, perception becomes sensitive to finer relational cues. These include subtle distinctions in tonal coloration, expressive ornamentation, microtonal deviations, and other parametric variations that shape stylistic identity and expressive nuance.

Neuroimaging studies indicate that tonal perception recruits distributed auditory, frontal, and parietal networks associated with hierarchical and predictive processing. Meta-analyses of fMRI data (Rie Asano) show distinct activation patterns when listeners hear intact tonal music compared with scrambled tonal sequences. These results parallel findings from face perception research, where intact and scrambled faces produce markedly different neural responses.

Shared Computational Principles

Despite operating in different sensory modalities, face perception and music perception appear to share several computational characteristics:

-Holistic binding of features into perceptual objects
-Configural templates defining category structure
-Parametric sensitivity within those templates
-Robustness to graded distortion, combined with strong disruption under structural scrambling

In both cases, coherent perception emerges when incoming sensory information aligns with learned structural priors. When those priors are violated, the perceptual system shifts into alternative processing regimes in an attempt to interpret the signal.



| Faces                                          | Music                                                             |
| Holistic: “this is a face”                     | Holistic: “this is music / tonal object”                          |
| First-order: layout (eyes-nose-mouth)          | First-order: tonal hierarchy / key / modal skeleton               |
| Second-order: spacing, jawline, idiosyncrasies | Second-order: tuning color, microvariation, expressive distortion |


Clarifying the “scrambled / inversion” parallel

“Inversion” in faces is precise; in music it is more ambiguous. We must distinguish temporal scrambling (loss of sequence, which destroys grammar) from tonal scrambling (relative pitch relationships preserved vs. not).

To say that tonal grammar is not analytical but embedded in sound: just as face inversion disrupts the configural template, disrupting tonal order and hierarchical cues collapses the music from a structured percept into “just sounds.”


Conceptual implication: music perception is object perception

Not symbolic decoding. Not mere sensation. But object-level inference with priors, category anchoring, and failure thresholds, aligned with current predictive-processing and ecological-hearing models (categorical perception, Bayesian priors / predictive coding, catastrophe / phase transition at the recognition threshold).



But what is music?:

Multiple Predictive Priors in Music Perception


A useful perspective on the structure of musical cognition emerges from the meta-analysis conducted by Rie Asano and colleagues in The Neural Basis of Tonal Processing in Music: An ALE Meta-Analysis (2022). Their analysis suggests that music perception is not governed by a single unified “musicality” mechanism, but rather by multiple partially independent predictive systems operating in parallel.

Under tonal, metrically regular, and syntactically ordered conditions, neural activity reflects what may be described as a low-prediction-error regime. In such contexts, auditory and frontal cortical areas maintain hierarchical predictions about incoming musical events without requiring strong compensatory recruitment from attentional or executive systems. Perception proceeds efficiently because the structure of the input aligns with the brain’s learned predictive models.

When one of the core structural regularities of tonal music is selectively disrupted, however, the brain does not simply register the stimulus as “less musical.” Instead, different neural systems are recruited depending on which predictive prior has been violated. This suggests that musical perception relies on multiple functionally dissociable constraints that can be manipulated independently.

Three such priors appear particularly prominent:

-Tonal hierarchy (relations among pitch classes and tonal centers)
-Temporal regularity (metrical and rhythmic predictability)
-Sequential or syntactic structure (learned patterns of harmonic and melodic progression)

Each of these dimensions contributes to the formation of stable predictive states during musical listening. When they operate together, as in conventional tonal music, they produce a highly efficient perceptual regime in which prediction errors remain low and interpretation stabilizes quickly.

Selective violations reveal the relative independence of these systems. Atonality primarily disrupts tonal hierarchy, while preserving temporal organization. Irregular rhythm challenges temporal prediction but may leave tonal structure intact. Scrambled or random sequences disrupt syntactic expectations while leaving pitch materials or meter unchanged. In each case, the brain engages distinct compensatory processes to interpret the signal.
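These dissociations can be sketched in code. The following is a toy illustration of my own, not a model from Asano et al.: three independent scoring functions, one per prior, applied to a melody given as MIDI pitches and beat onsets. The scale set, step threshold, and scoring rules are illustrative assumptions; the point is only that each manipulation chiefly degrades its own score while leaving the others largely intact.

```python
# Toy sketch: three dissociable priors scored independently.
# All parameters here are illustrative assumptions, not values from the literature.

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # tonal-hierarchy prior: diatonic pitch classes

def tonal_score(pitches):
    """Fraction of notes inside the hypothesized diatonic set."""
    return sum(p % 12 in C_MAJOR for p in pitches) / len(pitches)

def syntactic_score(pitches):
    """Fraction of melodic moves that are stepwise (<= 2 semitones):
    a crude stand-in for sequential grammar."""
    moves = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(m <= 2 for m in moves) / len(moves)

def temporal_score(onsets):
    """Regularity of inter-onset intervals; 1.0 = perfectly isochronous."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    mean = sum(iois) / len(iois)
    var = sum((x - mean) ** 2 for x in iois) / len(iois)
    return 1.0 / (1.0 + var / mean ** 2)

# A tonal, stepwise, isochronous melody: all three scores are high.
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]
beats  = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

# Selective violations:
atonal    = [60, 61, 63, 64, 66, 68, 66, 63, 61]   # chromatic, but still stepwise
scrambled = [67, 60, 65, 62, 64, 60, 67, 62, 64]   # same diatonic set, order destroyed
irregular = [0.0, 0.1, 1.4, 1.5, 1.6, 3.0, 3.05, 3.9, 4.0]  # same notes, broken meter
```

Running the three scores on the three violations reproduces the dissociation in miniature: the atonal variant lowers only the tonal score, the scrambled variant only the syntactic score, and the irregular onsets only the temporal score.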

From a predictive-processing perspective, these results suggest that musical perception is organized around multiple interacting priors rather than a single global model of “music.” Tonal listening therefore reflects the convergence of several predictive subsystems whose interaction produces the stable perceptual regimes characteristic of musical experience.

In this sense, the familiar contrast between tonal and non-tonal music oversimplifies the structure of the problem. Atonal, metrically irregular, and syntactically disordered stimuli do not represent variations along a single continuum of musicality; rather, each selectively perturbs a different component of the brain’s predictive architecture. Put simply:

atonal ≠ scrambled ≠ irregular.

Each reveals a different dimension of the predictive systems that support musical cognition.



In short: music perception is governed by multiple functionally dissociable, partially independent priors, supported by partly overlapping but differentially recruited neural networks, rather than by a single “musicality” factor.


          Basin A
     ┌───────────────┐
     │ tonal-regular │
     │ ordered music │
     └──────┬────────┘
            │
       break PRIOR
            │
 ┌──────────┼──────────┐
 Basin B    Basin C    Basin D
 atonal     scrambled  irregular


Atonal:
reduced long-term tonal prediction,
more reliance on sensory detail,
greater attentional recruitment,
→ “bottom-up + parietal/attention load increase”.

Scrambled tonal:
sequence expectation disruption,
working memory + structural search
→ “frontal / sequence / integration networks”

Irregular meter:
disruption of entrainment,
timing / motor-predictive systems,
→ “basal ganglia + cerebellar contributions”



Predictive-coding:

Tonal + ordered + regular → low prediction error state
Break any one prior → different compensatory network
These are not “features”; they are distinct predictive priors implemented in different circuits.

The system sits in a stable attractor basin, and perturbation along each axis pushes it into a different basin.





Music perception isn’t feature detection; it’s object inference in time, stabilized by priors, with distortion absorption and attractor-collapse thresholds.

Tonal constancy is not a unitary mechanism but an emergent equilibrium state maintained jointly by pitch hierarchy, sequential grammar, and temporal regularity; disruption of any one dimension forces the auditory system into compensatory predictive regimes rather than immediate abandonment of musical interpretation.

---

Music "caricaturization" as the auditory counterpart of face caricatures:

By “caricature” in music I don’t mean comedy, but systematic exaggeration or degradation of structural cues while the percept remains categorially stable.
Distortion that exaggerates or degrades features, but still preserves enough configural anchors for recognition.

Music caricaturization shows that recognition is not about perfection but about anchors. Just as faces remain recognizable under distortion, modes and melodies remain perceptible even when tuning, timing, or execution are degraded. This robustness is evidence of deep perceptual priors: the brain treats music as objects in time, with configural layers that can absorb distortion while preserving identity (the whole point of tonal constancy).


Caricature in Faces

Holistic anchor: Even when features are exaggerated (big nose, wide eyes), the gestalt remains intact.
First-order configural: Layout preserved (eyes above nose above mouth).
Second-order configural: Distortions in spacing/size create the “caricature” effect.
Recognition threshold: As long as holistic anchors remain, recognition persists; beyond a threshold, identity collapses.

"Caricature" in Music

Holistic anchor: The “music gate,” i.e., detection of structured sound as music at all (the analogue of the “face”).
First-order configural: Core interval sizes (third, sixth, octave) define the skeleton of the mode.
Second-order configural: Distortions (tuning shifts, rhythmic irregularities) "caricaturize" the music but don’t erase recognition.
Recognition threshold: Even poorly played, out-of-tune, or out-of-time versions of familiar music remain recognizable, think memes mocking “bad covers.” Recognition collapses only when anchors (first/last notes, cadential cues) are destroyed.
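The anchor-based account above can be made concrete with a toy sketch (an illustrative assumption of mine, not an established model): identity is carried by the up/down contour plus the first and last notes, so moderate detuning is absorbed, while destroying the anchors breaks recognition.

```python
# Toy sketch of "caricaturized" melody recognition.
# The 50-cent contour threshold and 100-cent anchor tolerance are
# illustrative assumptions, not measured perceptual values.

def contour(cents):
    """Signed up/down/flat pattern between successive notes (>= 50c counts as a move)."""
    out = []
    for a, b in zip(cents, cents[1:]):
        d = b - a
        out.append(0 if abs(d) < 50 else (1 if d > 0 else -1))
    return out

def recognizes(reference, performance, anchor_tol=100):
    """Recognition = same contour + first/last notes within anchor_tol cents."""
    return (contour(reference) == contour(performance)
            and abs(reference[0] - performance[0]) <= anchor_tol
            and abs(reference[-1] - performance[-1]) <= anchor_tol)

# Reference melody in cents above the tonic (C-D-E-G-E-C).
ref = [0, 200, 400, 700, 400, 0]

# "Bad cover": every note detuned by up to ~40 cents; contour and anchors survive.
bad_cover = [30, 180, 440, 660, 430, -35]

# Structural break: anchors and contour destroyed; recognition collapses.
wreck = [700, 200, 0, 400, 0, 700]
```

On this sketch, `bad_cover` is still recognized as `ref` despite pervasive detuning, while `wreck` is not, mirroring the claim that collapse requires destroying the anchors rather than merely degrading the surface.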


Distortion Absorption

Meantone & pre-12EDO systems: Each mode carried unique distortions depending on tuning. Yet listeners still recognized the mode holistically. (This is less an example of caricaturization than of mode coloring.)

Flexibility: Melodic trajectory absorbs distortion, just as a caricatured face remains recognizable despite exaggerated features.

Hyper-flexibility: Recognition persists across wide ranges of deformation, showing that perception is robust, probabilistic, and anchored by priors.




Shared Principles Across Domains

Gestalt anchoring:

Faces: holistic configural layout (eyes–nose–mouth).
Fonts/words: first/last letters, symmetry of strokes.
Music: tonal anchors (root, cadence, mode skeleton).

Recognition persists as long as anchors remain intact.

Distortion tolerance:

Faces: caricatures exaggerate features but remain recognizable.
Fonts: distorted letters remain readable if symmetry/layout is preserved.
Music: out-of-tune or off-time performances remain recognizable if tonal/melodic anchors survive.

Threshold of collapse:

Each domain has a point where distortion breaks recognition.
Studying these thresholds reveals the priors and constraints of perception.




-----

The “long half-life” of perception research

Musicology, psychology, neuroscience, acoustics, computational modeling... each of these fields moves at a different speed; a single paper can legitimately cite Mesopotamian tablets (1400s BC), Guido d'Arezzo (1000s), Helmholtz (1860s), Krumhansl (1980s), predictive-coding papers from the 2010s, an fMRI meta-analysis from 2022...


-----

More Discussion: (old notes)

From Perceptual theory to cognitive structure in the wild: On Microtonality, microexpression, macrotonality, duodecimability.

In live, grassroots musical contexts (bars, basements, ad hoc stages), perfect intonation is a fantasy. Guitars drift, fingers bend, amps howl, and every note exists in a haze of imperfection. Yet, remarkably, musical structure does not collapse. Cadences land. Songs resolve. Listeners sway and scream in sync. What survives is the categorical scaffold: a shared tonal grammar robust enough to absorb distortion, fluctuation, and expressive slippage.

This is a testament to the resilience of the 12-fold diatonic prior. Not as a precise physical tuning, but as a conceptual attractor, a perceptual frame that listeners and performers both lean into, even when the tuning wobbles ±50 cents (or more). For example, a Floyd Rose bridge, famous for its instability, can easily raise a note 100 cents via palm muting alone. Yet in context, embedded in phrase, trajectory, and expectation, such inflections go largely unnoticed. The brain resolves them as expressive, not structural.
The performance remains duodecimable: reducible to 12edo structure, even if not 12edo tuning.

This highlights a crucial distinction in music cognition: Error is flavor, as long as structure remains legible.
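In its simplest formalization, duodecimability can be sketched as nearest-12EDO classification with a fixed tolerance window. The fixed window is a deliberate simplification (the argument above holds that the real window expands and contracts with context), but it makes the "error is flavor" distinction mechanical: the class carries the structure, the residual carries the expression.

```python
# Minimal sketch of duodecimability as nearest-12-EDO classification.
# A fixed tolerance window is a simplifying assumption; the post argues
# the real window is dynamic and context-dependent.

def snap_to_12edo(cents, tolerance=50):
    """Return (nearest 12-EDO pitch class in cents, residual in cents),
    or None if the note falls outside the categorical window."""
    nearest = round(cents / 100) * 100
    residual = cents - nearest
    return (nearest % 1200, residual) if abs(residual) <= tolerance else None

# A drifting performance: every note still resolves to a 12-EDO class,
# so the passage is duodecimable; the residuals are the expressive "flavor."
performance = [12, 195, 430, 698, 910, 1188]
classified = [snap_to_12edo(c) for c in performance]
```

Shrinking `tolerance` models a stricter listener: with `tolerance=20`, a 430-cent "major third" is no longer assimilated, and the passage stops being duodecimable under that window.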

Microtonality vs. Microexpression

Much of what is often described as “microtonality” in expressive performance is better understood as microexpression. That is: not a rejection of 12edo, but inflections around it, expressive deviations that live because 12edo is there as a reference frame. This is the core difference between fine-tonalism and macrotonality, or structural tonalism.

Thus, while fine microtonal subdivisions (31EDO, 72EDO) are invaluable in composition, synthesis, and spectral mapping, they often collapse in embodied, performative settings. They demand idealized precision incompatible with the beer-soaked fretboards of live rock and roll.


Enter coarse divisions like 8EDO or 10EDO. These are not attempts to refine pitch, but to reorganize it. They provide alternate but still robust structures: grids that can support real-world performance, function as a grammar, and absorb their own distortion. A 10EDO system:

Has wide enough intervals to tolerate expressive slippage
Maps easily to standard technique (e.g., power chords, scale shapes)
Retains its identity even with ±20 cent tuning drift

Allows ensemble alignment without bespoke instruments
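The drift-tolerance claim is easy to verify arithmetically (the numbers below are my own back-of-envelope check, not from a source): a 10EDO step is 120 cents, so ±20 cents of drift never pushes a note past the halfway point to a neighboring degree.

```python
# Back-of-envelope check of 10EDO's tolerance to tuning drift.

EDO = 10
STEP = 1200 / EDO                     # 120.0 cents per scale step

def nearest_degree(cents, edo=EDO):
    """Classify a pitch (in cents above the tonic) to its nearest scale degree."""
    return round(cents / (1200 / edo)) % edo

scale = [i * STEP for i in range(EDO)]    # 0, 120, ..., 1080 cents
drift = 20                                # worst-case drift in cents

# Every degree keeps its identity under +/-20 cents of drift:
survives = all(nearest_degree(c + d) == i
               for i, c in enumerate(scale)
               for d in (-drift, drift))
```

The same check fails for fine systems: in 72EDO the step is only 16.7 cents, so a ±20-cent drift routinely lands a note in the wrong category, which is the performative fragility described above.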

In fact, this is why 7EDO blues feels so right: its neutrality allows the pentatonic to lean major or minor as needed. It echoes what real guitarists have always done: bend, blur, and break pitches around an implicit tonal backbone.
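The neutrality claim reduces to simple arithmetic (my own check, not a perceptual measurement): the two-step 7EDO “third” falls between the 12EDO minor and major thirds, committing to neither.

```python
# 7EDO's neutral third, in cents.

step = 1200 / 7                    # one 7EDO step, about 171.43 cents
neutral_third = 2 * step           # about 342.86 cents

# Distances to the 12EDO minor third (300c) and major third (400c):
above_minor = neutral_third - 300  # about 42.9 cents
below_major = 400 - neutral_third  # about 57.1 cents
```

At roughly 43 cents above the minor third and 57 cents below the major third, the interval sits inside neither 12EDO category, which is what lets a 7EDO pentatonic lean either way under contextual pressure.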

Final Thought: "No one cares if the fifth sounds better, as long as it is still a fifth."

Microtonality is easy to break, hard to carry. The abstract perfection of a slightly improved fifth in 22EDO is moot if the guitarist can’t find the fret, if the string buzzes out, or if the audience is three beers deep. What matters is that the structure holds, not just mathematically, but performatively.

Thus, this becomes a philosophical principle:

Music theory must always return to the cemetery-side bar.
If it doesn’t survive there, it doesn’t survive anywhere.


-

