12 Note Scale

Why are there 12 notes per octave?
There are 12 notes in the common musical scale... but why?

Most music is based on the harmonic series. It is a very simple progression of frequencies which sound good together. For example, let's say a note is played at 1000 Hz. From there, the harmonic series goes up:
  • 1000 Hz
  • 2000 Hz
  • 3000 Hz
  • 4000 Hz
  • 5000 Hz
  • ...
Each of these sounds good with the original frequency, because it is an exact multiple. It always lines up, so it sounds... for lack of a better word... harmonic.

We can also go down instead of up...
  • 1000 Hz
  • 500 Hz
  • 333 Hz
  • 250 Hz
  • 200 Hz
  • ...
Each of these is a fraction of the original, like 1/2, 1/3, 1/4, or 1/5.
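
As a quick illustration, both directions of the series can be generated in a few lines of Python. This is only a sketch: the function names `harmonics_up` and `harmonics_down` are made up for this example, and 1000 Hz is the same arbitrary starting note used in the lists here.

```python
from fractions import Fraction

def harmonics_up(base_hz, count):
    """Multiples of the base frequency: f, 2f, 3f, ..."""
    return [base_hz * n for n in range(1, count + 1)]

def harmonics_down(base_hz, count):
    """Fractions of the base frequency: f, f/2, f/3, ..."""
    return [Fraction(base_hz, n) for n in range(1, count + 1)]

print(harmonics_up(1000, 5))
print([round(float(f)) for f in harmonics_down(1000, 5)])
```

Using `Fraction` keeps values like 1000/3 exact until the moment they are rounded for display.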

And that's pretty much all we need to know to build a musical scale.

Here's what happens if we start at an arbitrary note and then apply the harmonic series. The algorithm used is very simple:
  1. Start with one frequency. Make a dot on the keyboard where that frequency is.
  2. For each harmonic we care about, and each frequency we've looked at so far, do the following:
    1. Multiply or divide that frequency by a simple number, like 2 or 3 or 1/2 or 1/3.
    2. Make a dot on the keyboard where that frequency would be.
    3. Add this frequency to the list, then move down one row and go back for another iteration.
After repeating a few times, we should end up with a bunch of dots where perfect ratios are. These are ratios like 1, 2, 3, 4, 3/2, 5/2, 4/3, 5/3, and so on. They all have a simple relationship to the original note, so they all line up nicely when played at the same time.
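
The steps described here can be sketched roughly as follows. This is not the code that produced the images; it is a minimal reconstruction under a few assumptions: frequencies are stored as exact ratios relative to the starting note, each one is folded into a single octave, and a "dot" is counted against whichever of the 12 keys it lands closest to. The names `spread`, `keyboard_position`, and `nearest_keys` are hypothetical.

```python
from fractions import Fraction
from math import log2

def spread(ratios, iterations):
    """Repeatedly multiply every known frequency ratio by every allowed ratio."""
    seen = {Fraction(1)}  # the starting note, expressed as a ratio of itself
    for _ in range(iterations):
        seen |= {f * r for f in seen for r in ratios}
    return seen

def keyboard_position(ratio):
    """Position within one octave, in half-steps (0 = the starting note)."""
    half_steps = 12 * (log2(ratio.numerator) - log2(ratio.denominator))
    return half_steps % 12

def nearest_keys(ratios, iterations):
    """Which of the 12 keys have at least one dot close to them."""
    dots = spread(ratios, iterations)
    return sorted({round(keyboard_position(r)) % 12 for r in dots})

octaves = [Fraction(1, 2), Fraction(2)]
with_three = octaves + [Fraction(1, 3), Fraction(3)]

print(nearest_keys(octaves, 4))      # only the starting key
print(nearest_keys(with_three, 1))   # a few keys after one pass
print(nearest_keys(with_three, 6))   # every key after enough passes
```

Rounding to the nearest key throws away the small misalignments, so this only shows which keys get dots, not how far off each dot is.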

Here's how it looks when we only use "1" as a ratio. There's only one note:
(each dot is one iteration, going from top to bottom)

If we add in another harmonic, the ratios are 1/2, 1, and 2.
This gives us the original frequency plus all the different octaves.
[image: notes-2-12.png]

Add one more harmonic, and the ratios are 1/3, 1/2, 1, 2, and 3.
At first it gives us just the original note, the octaves, and the extra note in a "power chord" like what people play in heavy metal. That's called a perfect fifth, and it sits 7 half-steps up from the original tone. But after enough iterations, these ratios produce 12 different clusters of frequencies.
[image: notes-3-12.png]

Add one more harmonic, and the ratios are 1/4, 1/3, 1/2, 1, 2, 3, and 4.
This converges to the same results as last time, but it takes fewer iterations to get there. At the end, the numbers land in 12 different clusters. This is why the common scale has 12 notes per octave.
[image: notes-4-12.png]
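
One way to see why the clusters settle at 12, offered as a supporting calculation rather than something read off the images: going up by the ratio 3/2 twelve times lands very close to going up by the ratio 2 seven times. After 12 steps the pattern has nearly closed on itself, so further steps fall next to existing dots instead of creating new notes. The variable names below are assumptions made for this sketch.

```python
from fractions import Fraction
from math import log2

twelve_steps = Fraction(3, 2) ** 12    # go up by the ratio 3/2 twelve times
seven_octaves = Fraction(2) ** 7       # go up by the ratio 2 seven times

gap = twelve_steps / seven_octaves     # how far apart the two results are
gap_in_half_steps = 12 * log2(float(gap))

print(gap)
print(round(gap_in_half_steps, 3))
```

The gap works out to a bit less than a quarter of a half-step, which is small enough to read as "the same note, slightly off".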

Add in one more harmonic, and then things turn into a huge mess.
Almost nobody writes music this way, because it's too complicated and usually sounds bad.
[image: notes-5-12.png]

You may have noticed that the notes aren't perfectly aligned. Here's a closer look.

The blue line in each note represents "equal temperament", or what happens when we space 12 frequencies as evenly as possible throughout the octave. This is used as a common tuning for many instruments because, even though it's not perfect, it's really close... and then it doesn't matter what scale or key you play in. They're all more or less equally in tune (or equally out of tune).

The blue dots, of course, represent perfect intervals relative to the original note of "C".
[image: notes-4-12-zoomed.png]
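
To put rough numbers on "really close", the distance between a perfect ratio and the nearest equal-tempered note can be measured in cents (hundredths of a half-step). The sketch below assumes equal temperament places note n at a frequency ratio of 2^(n/12); `half_steps` and `offset_from_equal` are hypothetical helper names.

```python
from math import log2

def half_steps(ratio):
    """Size of a frequency ratio, measured in equal-tempered half-steps."""
    return 12 * log2(ratio)

def offset_from_equal(ratio):
    """Distance from the nearest equal-tempered note, in cents."""
    steps = half_steps(ratio)
    return 100 * (steps - round(steps))

for name, ratio in [("3/2", 3 / 2), ("4/3", 4 / 3), ("9/8", 9 / 8)]:
    print(name, round(offset_from_equal(ratio), 2))
```

For these three ratios the offsets come out to only a few cents in either direction, which is why the evenly spaced scale works as well as it does.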

Sometimes musicians on fretless instruments bend each note slightly up or down to get closer to a perfect interval. Sometimes people tune their instruments to a specific key, to align everything with the dots above... so it sounds better when played in that key, but worse when played in any other key. And sometimes people play microtonal instruments so they can get a perfect and exact pitch every time no matter what key they're playing in. But that requires a really good ear, and it places extra restrictions on composition... and some listeners may think it sounds out of tune because they're accustomed to the evenly-spaced scale.

Zooming in even further, and running the algorithm for more iterations, it becomes clear that most notes have two possible frequencies for perfect intervals... but one is usually more dominant, and the other is only used on rare occasions. Mostly, the choice of which one is better seems to depend on which direction it's approached from, like from above or below, and from how far away.
[image: notes-4-20-zoomed.png]

Or, if we use all 12 equal-temperament notes as a starting point instead of just "C", here's how the result looks. It's like the previous image, but with 12 slightly-offset copies all overlaid onto the same graph:
[image: notes-4-10-allequal-zoomed.png]

Overall though, the slight detuning isn't usually a problem. A very common complaint among musicians is that digital synthesizers lack "character", or that they sound too cold and sterile, because they're too perfect. Meanwhile, almost everyone likes the sound of analog synthesizers -- which are almost never quite in tune. They're always just slightly off, which adds more life and variety to the sounds they make.

Similarly, people often add vibrato to longer notes, because it sounds better than just holding steady at the same pitch. This is also known as an LFO, or low-frequency oscillator, applied to the frequency. Sound designers use LFOs a lot to add movement and expression to their sounds. But it's not always necessary to add LFOs, because a similar slow wobble happens automatically in chords when the notes are slightly out of tune with each other. When done well, it doesn't sound out of tune... it just sounds more expressive.
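
A small calculation shows how slow that automatic wobble can be. The sketch assumes a starting note of 440 Hz (an arbitrary choice) and looks at the two overtones that would line up exactly if the upper note were a perfect 3/2 above the lower one. The wobble rate is taken as the difference between those two overtone frequencies.

```python
BASE_HZ = 440.0  # assumed starting note; any frequency works

equal_upper = BASE_HZ * 2 ** (7 / 12)   # 7 half-steps up in equal temperament
perfect_upper = BASE_HZ * 3 / 2         # the exact 3/2 ratio

# The 3rd harmonic of the lower note and the 2nd harmonic of the upper note
# nearly coincide; the small difference is heard as a slow wobble.
wobble_hz = abs(3 * BASE_HZ - 2 * equal_upper)
perfect_wobble_hz = abs(3 * BASE_HZ - 2 * perfect_upper)

print(round(wobble_hz, 2))
print(perfect_wobble_hz)
```

With the equal-tempered note the wobble is on the order of one or two cycles per second, which is right in typical LFO territory. With the perfect ratio it disappears entirely.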

For people who prefer the too-perfect sound of digital synthesizers though, especially those who love the sound of FM synthesis, perfect intervals may be just what the doctor ordered. Perfect intervals do for chords what FM synthesis does for the sound of individual notes.

FM (frequency modulation) synthesis is all about building notes out of perfect intervals, which gives it a very clean sound. For an example of this, look at old video game systems like the Sega Genesis, or the AdLib sound card found in many older PCs.

FM is known for several iconic sounds...
  • Really clean bell sounds
  • Bell-like instruments like marimbas
  • Clean "organ" sounds
  • Artificial electric guitar sounds (because of how nasty FM can sound when the ratios aren't perfect)
But it's also known for sounding really cheesy when used for almost anything else.

The way FM works is by taking a simple waveform, usually a sine wave, and rapidly changing the pitch. This is the same idea as vibrato, but a lot faster and a lot deeper. Instead of oscillating by a small amount a couple times per second, it'll generally be a large amount a few hundred or a few thousand times per second. This adds overtones, and as long as the vibrato speed is a simple ratio of the original note frequency, the sound ends up very clean with only overtones from the harmonic series. The resulting sound is extremely similar to what people get from additive synthesis techniques, where sounds are made by explicitly adding overtones.
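
Here is a minimal sketch of that idea. It wobbles the phase of the main sine wave with a second sine wave, which is a common way to implement FM in practice; that choice, along with the name `fm_samples` and all of the parameter values, is an assumption made for illustration.

```python
from math import pi, sin

def fm_samples(carrier_hz, ratio, depth, sample_rate, count):
    """Sine wave whose phase is wobbled by a second, faster sine wave."""
    modulator_hz = carrier_hz * ratio   # "vibrato speed" as a ratio of the note
    samples = []
    for n in range(count):
        t = n / sample_rate
        wobble = depth * sin(2 * pi * modulator_hz * t)
        samples.append(sin(2 * pi * carrier_hz * t + wobble))
    return samples

# A whole-number ratio: the result repeats once per cycle of the 200 Hz note,
# which is 40 samples at an 8000 Hz sample rate.
clean = fm_samples(carrier_hz=200, ratio=2, depth=1.5, sample_rate=8000, count=80)
print(all(abs(clean[n] - clean[n + 40]) < 1e-9 for n in range(40)))
```

Because the waveform repeats exactly at the note's own frequency, every overtone it contains has to be a multiple of that frequency. A ratio that is not simple breaks that repetition, and the overtones stop lining up.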

In contrast to this, subtractive synthesis is also very common. The way subtractive synthesis works is by making an absolute bloody mess all across the frequency spectrum, and then filtering out any frequencies which aren't desired. It's like tossing a bunch of rocks in a lake to make a mess of ripples... and then changing the shape of the lake to get rid of the ripples you don't want.
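
A bare-bones version of the subtractive idea is sketched below: start with a harsh square wave, then run it through a very simple low-pass filter to take the edge off. The one-pole filter used here is just about the simplest filter there is, chosen for brevity. Real synthesizers use more elaborate (often resonant) filters, and all names and values are hypothetical.

```python
def square_wave(period, count):
    """A harsh source signal: +1 for half of each cycle, -1 for the other half."""
    return [1.0 if (n % period) < period / 2 else -1.0 for n in range(count)]

def low_pass(samples, smoothing):
    """One-pole low-pass: each output moves only part of the way toward the input."""
    out, level = [], 0.0
    for x in samples:
        level += smoothing * (x - level)
        out.append(level)
    return out

def biggest_jump(samples):
    """Largest change between neighboring samples, a crude measure of harshness."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

raw = square_wave(period=40, count=400)
mellow = low_pass(raw, smoothing=0.1)

print(biggest_jump(raw), biggest_jump(mellow))
```

The filtered signal can no longer jump instantly from one extreme to the other, which is another way of saying its highest overtones have been turned down.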

There are essentially three perfect waveforms which form the basis of most music:
  • Sine wave: The simplest possible waveform, consisting of only a single frequency.
  • Square wave: A harmonically rich waveform, consisting of the sum of the odd-numbered harmonics (1, 3, 5, ...) of the root frequency, each one quieter than the last, going all the way up to infinity.
  • White noise: The sum of all possible frequencies, white noise is completely random. It's not harmonic at all.
The basic unit of FM synthesis (and additive synthesis) is the sine wave. This is because both methods start with something clean and simple, then add overtones. The FM sound was used in some older gaming systems, like the Sega Genesis.
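
As a small additive-synthesis sketch, a square wave can be approximated by summing sine waves at the odd-numbered harmonics, with harmonic n at 1/n of the strength of the first. The function name `additive_square` and the harmonic counts are assumptions for this example.

```python
from math import pi, sin

def additive_square(phase, harmonics):
    """Sum the first `harmonics` odd-numbered sine harmonics, each at 1/n strength."""
    total = 0.0
    for k in range(harmonics):
        n = 2 * k + 1                  # 1, 3, 5, ...
        total += sin(n * phase) / n
    return 4 / pi * total              # scale so the flat parts sit near +1 and -1

# A quarter of the way through the cycle, the wave should be near its top.
for count in (1, 10, 1000):
    print(count, round(additive_square(pi / 2, count), 3))
```

With a single harmonic the result is just a sine wave. As more harmonics are added, the value in the middle of each half-cycle settles toward a flat +1 or -1.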

The basic unit of subtractive synthesis is typically a square wave. This is because subtractive synthesis starts with something messy and complex, and removes overtones and other unwanted frequencies. Subtractive synthesis was used in older systems like the Commodore 64, by way of its SID chip. It's also used in most popular synthesizers today, because it's simple, intuitive, easy to perform with by sculpting sounds in real time, and sounds good. Here's a recent example of a song made using subtractive synthesis for all parts except the drums.

Subtractive synthesis is known for its ability to be mellow or harsh or anywhere in-between, with smooth adjustments the whole way. It is the technique behind iconic sounds such as:
  • "Acid" basslines
  • Some classic video game sounds: C64, Atari, NES
  • Resonance, whether subtle or ear-piercing
  • The sound of wind... like, if you make a "whoosh" sound with your mouth, that's basically subtractive synthesis applied to white noise by using your mouth as a resonant bandpass filter.
  • Most of the mixing/mastering techniques in modern music, because the main method used for that is to resonate (increase) or filter (decrease) frequencies to make different tracks fit together better in the mix. This is the core of what subtractive synthesis is about.
Last modified: May 21, 2020 @ 6:08 MDT
Copyright (C) 1996-2020 Selene ToyKeeper