Thursday, August 1, 2024

The Average Tuning System: Scala Archive Statistics

This tuning system is a simple descriptive-statistical summary of the Scala archive, a renowned curated database of tunings from around the world. It seeks common ground and practical use among diverse world tunings.

Interval    Traditional Western Name
16/15       minor diatonic semitone
10/9        minor whole tone
7/6         septimal minor third
6/5         minor third
5/4         major third
4/3         perfect fourth
√2
3/2         perfect fifth
8/5         minor sixth
5/3         major sixth
12/7        septimal major sixth
9/5         just minor seventh
15/8        classic major seventh
2/1         octave


Statistics and tuning construction:

Across the 5,176 files, system sizes range from 2 to 579 notes. The average system size is 17, with a median of 12. The mode is also 12, appearing 1,546 times, followed by 7-note tunings with 715 occurrences. The collection is diverse, but with a notable concentration of systems around the 12-note mark.

Top 5 Sizes

Size  Occurrences
12    1546
7     715
5     231
19    218
8     206
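
The summary figures above can be reproduced from a plain list of per-file note counts. Below is a minimal Python sketch, assuming the sizes have already been read from the files. The helper name `summarize_sizes` is hypothetical, and the `sizes` list is a tiny made-up sample, not archive data.

```python
from collections import Counter
from statistics import mean, median

def summarize_sizes(sizes, top=5):
    """Return range, mean, median and the most common sizes of a list of note counts."""
    counts = Counter(sizes)
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
        "top": counts.most_common(top),  # [(size, occurrences), ...], most frequent first
    }

# Made-up sample for illustration only (not taken from the archive).
sizes = [12, 12, 12, 7, 7, 5, 19, 24]
summary = summarize_sizes(sizes)
print(summary)
```

The first entry of `summary["top"]` is the mode together with its number of occurrences, which is how a "Top 5 Sizes" table can be read off directly.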


While some files span multiple octaves or include non-reduced intervals below the unison, these instances are relatively rare. Most are periodic tunings aligned with the octave, the archive's most common interval. (Note: some of these cases are better described as intentionally non-conforming than as rare. The Scala file format specifies that the 1/1 is omitted and that the list concludes with the equave, so implementations may ignore such values entirely.)
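
For readers unfamiliar with the file layout mentioned in the note, here is a minimal Python sketch of a .scl reader. It relies on the basic rules of the format as I understand them: lines starting with "!" are comments, the first data line is a description, the second is the note count, and each following line holds one pitch, read as cents if it contains a period and as a ratio otherwise. The function name `parse_scl` and the sample text are hypothetical, and this is not the parser used for the analysis.

```python
from fractions import Fraction

def parse_scl(text):
    """Parse the text of a .scl file into (description, declared_count, pitches).

    Each pitch is returned as ("cents", float) or ("ratio", Fraction).
    """
    # Lines starting with "!" are comments and carry no data.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("!")]
    description = lines[0].strip()             # first data line: free-text description
    declared_count = int(lines[1].split()[0])  # second data line: number of notes
    pitches = []
    for ln in lines[2:]:
        if not ln.strip():
            continue
        token = ln.split()[0]                  # anything after the value is ignored
        if "." in token:
            pitches.append(("cents", float(token)))
        else:
            pitches.append(("ratio", Fraction(token)))  # "3/2" or a bare integer like "2"
    return description, declared_count, pitches

# Hypothetical sample file: the implicit 1/1 is omitted and the last line is the equave.
SAMPLE = """! example.scl
!
Five-note example
 5
!
 16/15
 6/5
 8/5
 9/5
 2/1
"""

description, declared_count, pitches = parse_scl(SAMPLE)
equave = pitches[-1]
print(description, declared_count, equave)
```

Keeping cents and ratios tagged separately, instead of converting both to floats right away, postpones the precision problems that come with a common decimal representation.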

In a direct analysis of the files, which reads only the first key of each tuning (87,558 notes in total), the octave is the most common interval: it appears in its exact form in 4,481 files and with close variations in practically all tunings.

The perfect fifth emerges as the second most popular interval, succeeded by the perfect fourth and major third.

Distribution of intervals. The two graphics depict identical data. The first graphic displays both vertical and horizontal axes on a linear scale, while the second utilizes a logarithmic scale for the vertical axis. This logarithmic scale highlights intervals that occur only once, significantly beyond the octave, as well as those appearing below a value of 1.

Top 5 Intervals

Interval  Name              Occurrences
2/1       octave            4481
3/2       perfect fifth     2001
4/3       perfect fourth    1743
5/4       major third       1290
9/8       major whole tone  1095



Assuming all tunings are periodic, cyclical pitch sets, the octave is the interval of equivalence (equave) in 4,379 tuning files. The next most common equave is the twelfth (3/1), with only 93 files.

Computing the complete interval matrix (all added tones) for the octave-ending tunings only yields a total of 2,641,310 intervals, and the list of the most frequent intervals remains largely unchanged.


The two graphics present distinct datasets. The first graphic represents the scan of the initial key in each file, while the second illustrates the scan subsequent to computing all matrices. Both graphics showcase the top 17 intervals, which exhibit remarkable similarity. Each graph encompasses a single octave, with both vertical and horizontal axes set to a logarithmic scale.


(Why is it important to calculate the interval matrix and added tones to determine the most common intervals?

Take this periodic tuning, for example: 16/15 6/5 8/5 9/5 2/1.

If you're not very familiar with intervals, simply seeing the initial key doesn't tell you much. However, computing the matrix for this 5-note periodic tuning reveals 14 unique intervals. Among these, the most common are the fifth (3/2), the fourth (4/3), the major whole tone (9/8), and the Pythagorean minor seventh (16/9), none of which appears explicitly in the "first" key.)
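
The example can be checked with a few lines of Python. This is a minimal sketch, not the program used for the analysis: it takes "interval matrix" to mean every interval between two different notes of the tuning, reduced into one octave, and the function names are hypothetical. Exact fractions are used so that no rounding is involved.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def reduce_interval(ratio, equave=Fraction(2)):
    """Fold a ratio into the range [1, equave)."""
    while ratio >= equave:
        ratio /= equave
    while ratio < 1:
        ratio *= equave
    return ratio

def interval_counts(first_key, equave=Fraction(2)):
    """Count every reduced interval between two different notes of a periodic tuning."""
    # Add the implicit 1/1 and drop the equave, which duplicates it one period up.
    pitches = [Fraction(1)] + [Fraction(p) for p in first_key if Fraction(p) != equave]
    return Counter(reduce_interval(hi / lo, equave)
                   for lo, hi in permutations(pitches, 2))

counts = interval_counts(["16/15", "6/5", "8/5", "9/5", "2/1"])
print(len(counts), "unique intervals")
for ratio, n in counts.most_common(4):
    print(ratio, n)
```

Under these assumptions the tuning has 14 unique intervals, with 3/2 and 4/3 occurring three times each and 9/8 and 16/9 twice each, while every interval of the first key occurs only once.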

Precision issues affect interval categorization. They result from converting fractions and cents, the two notations used in Scala files, to a common decimal representation, which inherits the usual machine-number problems. When calculating the complete matrix of an equal-division system, where a system of size n should contain exactly n distinct intervals, small floating-point differences may cause some identical intervals to be counted as different.
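
A minimal Python sketch of the equal-division problem, assuming pitches are stored as floating-point ratios. The function name is hypothetical, and rounding the cents value to 6 decimals is just one possible normalization, not necessarily the one used in the analysis.

```python
import math

def edo_interval_counts(n):
    """Count distinct octave-reduced intervals in n-tone equal temperament,
    first by exact float comparison, then after rounding cents to 6 decimals."""
    pitches = [2.0 ** (k / n) for k in range(n)]
    raw, rounded = set(), set()
    for i, lo in enumerate(pitches):
        for j, hi in enumerate(pitches):
            if i == j:
                continue
            ratio = hi / lo
            if ratio < 1.0:
                ratio *= 2.0  # fold back into the octave
            raw.add(ratio)
            rounded.add(round(1200.0 * math.log2(ratio), 6))
    return len(raw), len(rounded)

raw_count, rounded_count = edo_interval_counts(12)
print(raw_count, rounded_count)
```

For 12 equal divisions the rounded count is 11 (steps 1 to 11, leaving out the unison). The raw count can come out larger, because the same step reached from different starting notes may differ in its last binary digits.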

Another problem arises in categorizing cent tunings. Some files may refer to the same note, but because their definitions use a different number of digits, a program comparing them exactly will not consider them equal. (701.955 != 701.95)

You can attempt to correct this by limiting every value to the same number of digits, which effectively reduces the number of distinct intervals. However, because the truncation is applied to the decimal format, the loss of definition across musical notes is uneven, owing to their original distribution, which is nonlinear (without repetitions).
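
Both effects can be sketched in Python. The first helper truncates a value given as text to a fixed number of digits, which makes 701.955 and 701.95 compare equal at two digits. The second measures how wide one truncation step of a decimal ratio is in cents, which shows why the loss is uneven. The helper names and the choice of 4 digits are assumptions for illustration; this post does not state how many digits were kept.

```python
import math
from decimal import Decimal, ROUND_DOWN

def truncate(value_text, digits):
    """Truncate a decimal written as text to a fixed number of fractional digits."""
    quantum = Decimal(f"1e-{digits}")  # e.g. digits=2 gives a quantum of 0.01
    return Decimal(value_text).quantize(quantum, rounding=ROUND_DOWN)

def step_in_cents(ratio, digits):
    """Width in cents of one truncation step (10**-digits) at a given decimal ratio."""
    step = 10.0 ** -digits
    return 1200.0 * math.log2((ratio + step) / ratio)

same_note = truncate("701.955", 2) == truncate("701.95", 2)
low_end = step_in_cents(1.0, 4)      # just above the unison
high_end = step_in_cents(1.9999, 4)  # just below the octave
print(same_note, round(low_end, 3), round(high_end, 3))
```

With 4 digits, one step is worth about 0.17 cents near the unison but only about 0.09 cents near the octave, so low intervals lose roughly twice as much definition as high ones.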

The graph represents the tuning space horizontally and accumulates identical exact repetitions vertically.


Both graphics portray identical data, but the second one illustrates the data after truncation (with a maximum error of approximately 0.2 cents). Both visuals display the top 17 intervals, which remained consistent even after truncation. This reduction resulted in 242,538 unique intervals being compressed to just 9,997. The logarithmic view in the graphic also highlights the uneven definition loss of musical notes post-truncation, which was executed on the decimal data.


Progressively truncating the notes in this way doesn't significantly alter popularity; even a 2-cent error proved insufficient to dislodge any of the prominent peaks.

Additionally, the graph undergoes an intrinsic truncation of its own, because its fixed resolution is significantly lower than the data range. Consequently, different notes are drawn on the same pixel. This is used to add a third dimension to the graph, highlighting note concentrations, which are always very close to some of the already favored intervals. For example, the perfect fifth has a concentration of notes next to it, hinting at systems like 12-tone equal temperament, where the fifth is 700 cents. However, without altering the graphical scale, these clusters won't even be apparent.
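
The pixel effect can be illustrated with a small Python sketch. It assumes a horizontal axis that is linear in cents over one octave, and the plot widths (300 and 1200 pixels) are hypothetical values chosen for the example, not those of the actual graphics.

```python
import math
from collections import Counter

def pixel_column(cents, width_px, span_cents=1200.0):
    """Map a pitch in cents to a horizontal pixel column of a plot width_px wide."""
    return min(width_px - 1, int(cents * width_px / span_cents))

def column_density(cents_values, width_px):
    """Count how many notes land on each pixel column (the heat-map dimension)."""
    return Counter(pixel_column(c, width_px) for c in cents_values)

JUST_FIFTH = 1200.0 * math.log2(3 / 2)  # about 701.955 cents
ET_FIFTH = 700.0                        # fifth of 12-tone equal temperament

narrow = column_density([JUST_FIFTH, ET_FIFTH], 300)
wide = column_density([JUST_FIFTH, ET_FIFTH], 1200)
print(narrow, wide)
```

At 300 pixels per octave both fifths fall on column 175 and are counted together, while at 1200 pixels they land on separate columns. The per-column count is what a heat map can color.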


Both graphics represent the analysis of the initial keys, displaying the same dataset. However, the first graphic features a vertical logarithmic scale, while the second employs a linear scale. Presented as a heat map, red areas denote note concentrations (which are not visible in the linear view), while blue indicates fewer notes.

The systems generated from the 17 most frequent intervals are symmetric in both cases, mirrored about the square root of 2. Half of their intervals are superparticular and the other half are the reduced inversions of those: the perfect fourth and fifth, major third and minor sixth, minor third and major sixth, etc.

Nonetheless, some of these intervals are very small in practice. This poses little concern for keyboard or synthesizer configurations, but it is a constraint on the guitar, whose limited space, among other factors, makes it less suitable for very precise tunings. Besides, 17 was just the average system size.

The final generated system consists of 13 notes, or 14 when including the square root of 2. This selection covers the tuning space almost completely. Graphically, its common-tone aggregate resembles the added tones for the entire collection, which is interesting. The intervals that were left out of the average 17 because of their proximity to others haven't disappeared entirely: they remain popular, even surpassing those included. For example, although the major whole tone was removed from the main key, it still exists in some of the other keys.
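
The mirror symmetry about the square root of 2 can be verified on the final system itself. Below is a minimal Python sketch using the 13 rational notes from the table at the top of this post; the helper names are hypothetical, and the irrational √2 is left out because it is its own mirror image.

```python
from fractions import Fraction

# The 13 rational notes of the final system, as listed in the table at the top of this post.
SYSTEM = [Fraction(s) for s in
          "16/15 10/9 7/6 6/5 5/4 4/3 3/2 8/5 5/3 12/7 9/5 15/8 2/1".split()]

def is_superparticular(r):
    """True for ratios of the form (n + 1)/n."""
    return r.numerator - r.denominator == 1

def mirror(r, equave=Fraction(2)):
    """Reflect an interval about the square root of the equave: r -> equave / r."""
    return equave / r

inner = [r for r in SYSTEM if r != 2]         # leave out the octave itself
symmetric = all(mirror(r) in inner for r in inner)
lower_half = [r for r in inner if r * r < 2]  # intervals below the square root of 2
upper_half = [r for r in inner if r * r > 2]  # intervals above the square root of 2

print(symmetric)
print([str(r) for r in lower_half], all(is_superparticular(r) for r in lower_half))
print([str(mirror(r)) for r in lower_half])
```

Every interval below √2 is superparticular and has its reduced inversion in the upper half, for example 4/3 with 3/2 and 5/4 with 8/5.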


The first image corresponds to the analysis of the full archive's interval matrix, showcasing the 17 most popular intervals. The second image depicts the same graphic process, computing the interval matrix and accumulating the repetitions vertically, but on the newly generated tuning system. The general contour of both is similar. This type of tuning analysis typically provides a fingerprint for a tuning, which means the 14-note system generates a fingerprint similar to that of the entire database of 2.5 million notes.

The system does not match any of the existing files.

Analysis using random subsets of the archive (half or a third of the files) still yielded the same most frequent intervals. However, a more accurate representation of an average world tuning system would require better-curated data: handpicking the most well-known tunings that are or were actually in use, rather than relying on the full Scala archive, which contains numerous seldom-used modern tunings.


Composition with the average system

Improvisation with the average system


TODO: Additional statistics:

The roughly 500 most frequent intervals are all just (rational, integer-ratio) intervals; only after them do cent-defined intervals, such as the octave written as 1200 cents, appear.

How to:

The program developed for this analysis is open-source and available at [LINK]. Usage is straightforward: load one or more .scl files, and it will compute and display statistics on them. The analysis has two modes: 'direct' examines files as they are, focusing on the first key, while 'full' generates interval matrices for all files. Note that the 'full' analysis uses a fixed equave of 2:1, a setting implemented after discovering that 95% of the database concludes with a 2:1 equave. This equave parameter can be adjusted in the code for further exploration and customization.
