Musical Synthesis 101

1.0 What is Sound?

Amplitude is the measurement of the height of a wave. As the amplitude of a wave increases, so does its energy. Intensity describes the power a wave delivers over an area, so a sound with a high amplitude also has a high intensity, and the same sound concentrated into a smaller area is more intense. Higher intensity sounds are therefore generally perceived as louder. With this in mind, we can assign a unit to the intensity of a sound: the decibel, shortened to dB. The way dBs are measured can be confusing because we are used to measuring things in a linear fashion. For example, a car moving at 20 km/h is going half the speed of a car going 40 km/h. This is not the case for sound. Instead, sound is measured on a logarithmic scale, where the sound pressure level in dB is equal to 20 times the base-10 logarithm of the sound wave pressure in pascals (P) divided by a reference pressure (Pref), or dB = 20 * log10(P/Pref). Pref is taken to be 0.00002 Pa, the threshold of human hearing. Thankfully, the likelihood of you ever using this equation is very low. The most important thing to remember is that loudness grows far more slowly than the numbers suggest: a sound perceived as roughly twice as loud is about 10 dB higher, which corresponds to about ten times the sound power, not twice.
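
If you want to see the logarithmic behavior for yourself, the equation above is easy to try out. Here is a minimal Python sketch; the function name and the example pressures are our own choices, not anything standardized:

```python
import math

P_REF = 0.00002  # reference pressure in pascals, the threshold of human hearing

def pressure_to_db_spl(pressure_pa: float) -> float:
    """Convert a sound pressure in pascals to sound pressure level in dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF)

# The reference pressure itself sits at the threshold of hearing: 0 dB SPL.
print(pressure_to_db_spl(0.00002))  # 0.0
# Ten times the pressure only adds 20 dB -- the scale is logarithmic.
print(pressure_to_db_spl(0.0002))   # ~20.0
```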

[Interactive demo: Smoother Amplitude-Controlled Animation]

Pitch and frequency are directly related. Pitch describes how high or low a sound is perceived to be, while frequency refers to the number of times a wave completes its cycle per second, measured in hertz (Hz). Humans are able to hear from 20Hz to 20,000Hz, with anything below 20Hz being recognized as individual impulses rather than a pitch. A piano tuned to standard Western concert pitch will have its middle A tuned to 440Hz. Interestingly, the A 12 semitones, or one octave, above middle A is tuned to 880Hz. You may notice a pattern where the frequency doubles for every octave. This is because octaves are a “logarithmic unit for ratios between frequencies, with one octave corresponding to a doubling of frequency… The term is derived from the Western musical scale where an octave is a doubling in frequency.” (Wikipedia, 2023) This means the perceived size of an octave stays the same, while the difference in frequency it spans grows from one octave to the next.
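
Because each of the 12 semitones in an octave multiplies the frequency by the same ratio, 2^(1/12), the frequency of any note can be computed from its distance in semitones from A440. A small Python sketch of that relationship (the function name is ours):

```python
A4_FREQ = 440.0  # concert pitch: the A above middle C

def semitones_to_freq(semitones_from_a4: int) -> float:
    """Frequency of the note a given number of semitones away from A440.
    Each semitone multiplies the frequency by 2**(1/12), so 12 semitones
    (one octave) exactly doubles it."""
    return A4_FREQ * 2 ** (semitones_from_a4 / 12)

print(semitones_to_freq(12))               # 880.0  (A, one octave up)
print(semitones_to_freq(-12))              # 220.0  (A, one octave down)
print(round(semitones_to_freq(3), 2))      # 523.25 (C, three semitones up)
```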

Tone is the way that we perceive different sounds with the same frequency. It is what allows us to differentiate between a voice and a piano both playing the same note. The harmonic composition, or overtones, determine the tone of a sound, so two sounds can share a fundamental frequency yet be differentiated by their harmonic makeup. Each harmonic in the harmonic series is a whole-number multiple of the fundamental frequency, completing one more cycle per period of the fundamental than the harmonic before it. In other words, the first harmonic, also known as the fundamental, is ½ the frequency of the 2nd harmonic; similarly, the fundamental is ⅓ the frequency of the 3rd harmonic, and this rule holds for every harmonic in the series. Interestingly, the fundamental is not required to be the loudest harmonic, nor does it even have to be present. Instruments with brighter tones tend to contain more of the upper harmonics, while instruments that play lower frequencies tend to have more energy in the fundamental and lower harmonics.
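
One way to make this concrete is additive synthesis: summing sine waves at whole-number multiples of a fundamental and varying their relative levels to change the tone. Below is a minimal Python/NumPy sketch; the function name and the two harmonic recipes are our own illustrative choices:

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second

def harmonic_tone(fundamental_hz, amplitudes, seconds=1.0):
    """Sum sine waves at integer multiples of the fundamental.
    amplitudes[0] scales harmonic 1 (the fundamental), amplitudes[1]
    scales harmonic 2, and so on."""
    t = np.linspace(0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    tone = np.zeros_like(t)
    for k, amp in enumerate(amplitudes, start=1):
        tone += amp * np.sin(2 * np.pi * fundamental_hz * k * t)
    # Normalize so the summed wave stays within [-1, 1].
    return tone / max(np.max(np.abs(tone)), 1e-12)

# Same 220 Hz fundamental, two different harmonic recipes = two different tones.
bright = harmonic_tone(220, [1.0, 0.8, 0.9, 0.7, 0.8])  # strong upper harmonics
dark   = harmonic_tone(220, [1.0, 0.4, 0.1])            # energy near the fundamental
```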

Sound is a tricky thing to record. While the first photo was taken in 1826, it wasn't until 1860 that the first audio recording was made. Édouard-Léon Scott de Martinville was the man responsible for inventing the phonautograph, the first device able to record audio. While Scott's phonautograph could record audio, it wasn't able to play it back. It wasn't until 17 years later, when Thomas Edison invented the phonograph, that audio could not only be recorded but played back as well. “On the first audio recording Edison recited, ‘Mary had a little lamb. Its fleece was white as snow. And everywhere that Mary went, the lamb was sure to go.’ Edison recordings were made on tin foil and could sustain replaying only a few times” (National Museum of American History, 2012). Recording on the phonograph worked with a stylus and a groove-based system: a rotating tinfoil cylinder and a needle attached to a membrane recorded pressure waves. The needle left behind a groove representing the amplitude of pressure over time, which could be played back by a needle attached to an amplifier as the cylinder rotated. The same concept holds true for modern-day record players, with the addition of a second channel. Rather than being a groove with only up-and-down movement, vinyl records instead use a groove that is “V-shaped, and each side of the groove ‘wall’ carries one of the stereo signals. The right channel is carried by the side closest to the outside of the record, and the left is carried by the inside wall” (Yamaha, 2021). Another analog recording method is magnetic tape. Invented in 1928 by Dr. S. J. Begun in Germany, the magnetic tape recorder uses “record” magnets to convert electrical signals from a microphone onto a piece of magnetic tape. The tape can then be read by a “play” magnet that converts the magnetic imprints back into an electrical signal, which can then be amplified.

CDs and other forms of digital media use binary data to store information that can later be recalled and converted into audio signals. Digital media differs from analog media in that the stored information isn't continuous: individual values are captured at set intervals, typically 44,100 or more times per second. Each value is referred to as a sample, and the rate at which they are captured is called the sample rate. Some common sample rates are as follows: 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz and 192kHz. An increase in sample rate corresponds to an increase in quality. Generally, CDs store data at the lowest of these rates, 44.1kHz, while audio professionals regularly use sample rates of 48kHz and up. If the sample rate drops below twice the highest frequency in the signal, artifacts emerge that make the audio sound distorted by introducing unwanted harmonics; deliberately reducing the sample rate this way is known as downsampling and is sometimes used intentionally to achieve a low-fidelity sound. Other types of distortion may be caused if the input gain is too high for the device the signal is being passed through. This type of distortion is known as clipping because the peaks and valleys of the audio signal are essentially being clipped off. Ideally, an audio signal should not clip if the goal of the recording is to preserve the original sound, though clipping is often used to increase the perceived loudness.
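
To see how clipping introduces harmonics that were never in the original sound, here is a small Python/NumPy sketch that hard-clips a pure sine wave and inspects its spectrum; the function name and the 0.5 ceiling are our own illustrative choices:

```python
import numpy as np

SAMPLE_RATE = 44100

def hard_clip(signal, ceiling=0.5):
    """Flatten every peak and valley beyond the ceiling, the way an
    overdriven input stage slices the tops off a waveform."""
    return np.clip(signal, -ceiling, ceiling)

t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)
sine = np.sin(2 * np.pi * 440 * t)        # clean 440 Hz tone
distorted = hard_clip(sine, ceiling=0.5)  # same tone with its peaks sliced off

# The clipped wave now contains odd harmonics (1320 Hz, 2200 Hz, ...)
# that the pure sine did not have.
spectrum = np.abs(np.fft.rfft(distorted))
freqs = np.fft.rfftfreq(distorted.size, d=1 / SAMPLE_RATE)
print(freqs[spectrum > spectrum.max() * 0.01][:5])  # odd multiples of 440 Hz
```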

[Interactive demo: Sample Rate Visualization]