What Do I Need to Know about Sound?
Before diving into sound, here are the key terms to refer to as you go through this section:
Sound Waves: A pattern of compressions and rarefactions (decompressions) of a medium—usually air.
Frequency: The rate at which compressions and rarefactions occur in a sound wave; the higher the frequency, the higher the pitch we perceive. Measured in Hertz.
Hertz: 1Hz (one Hertz) represents a single compression/rarefaction cycle per second. 1kHz = 1,000 cycles per second. Named for Heinrich Rudolf Hertz, the scientist who discovered electromagnetic waves.
Frequency Response: The range of frequencies that can be heard or produced by a given device. For the human ear, this is commonly cited as 20Hz-20kHz, though it varies with hearing ability.
Amplitude: The strength of compression/rarefaction. When looking at a graphical representation of a sound wave, this is the height and depth of the curve. Measured in decibels.
Decibel: dB is used to measure many things in audio, but we’re mostly concerned with sound intensity (amplitude/volume/pressure). Decibels compare two levels to each other: a 6dB increase represents a quadrupling of intensity, which is also the change you can expect from halving or doubling your distance from a sound source. Don’t confuse intensity with loudness; four times the intensity doesn’t sound four times as loud. Because of how our ears work, each step up in perceived loudness requires more energy than the last, which is why the scale is logarithmic. It may sound over-complicated, but it works exceptionally well in practice.
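The decibel math above can be sketched in a few lines of Python. This is a minimal illustration, not part of any audio library, and the function name is my own:

```python
import math

def db_change(intensity_ratio):
    """Convert a ratio of two sound intensities to a change in decibels."""
    return 10 * math.log10(intensity_ratio)

# Quadrupling the intensity is about a 6 dB increase:
print(round(db_change(4), 1))    # 6.0
# Doubling your distance from a source quarters the intensity (inverse-square law),
# so the level drops by about 6 dB:
print(round(db_change(1 / 4), 1))  # -6.0
```

Note that the formula always compares two levels; a decibel figure on its own only makes sense relative to some reference.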
The Speed of Sound: The constant rate at which sound waves travel, regardless of amplitude, frequency or wavelength. This speed varies with the medium itself, but is usually given as 1130 feet per second through normal air. Though it’s hardly applicable to us Earthlings, sound travels at about 787 ft/s on Mars.
What is Sound?
Sound is the compression and decompression (rarefaction) of a medium, usually air. Objects like speakers, a tuning fork or a clapping hand create sound by pushing the air, with each push followed by a partial vacuum (rarefaction). One great visualization of a sound wave comes from NPR, which used a technique that records the movement of air by capturing how light is distorted by variations in the air. Even that doesn’t fully capture the 3D nature of sound. Think of how an explosion happens, or Harry Potter’s Patronus charm (before they showed it as an animal): a pulse from the source that spreads in all directions. And since sound is so fast, it appears to instantly fill the space it’s in.
The speed of sound is constant in a given environment; it doesn’t change with amplitude, frequency or wavelength. It would be strange if the sound of a bass guitar arrived at your ears at a different time than the lead singer’s voice. Even though the speed of sound varies with air density, it’s usually given as 1130 feet per second. One thing you may notice in a large space is that light still travels much faster than sound. The delay between seeing a performer and hearing the music is usually compensated for by the sound system.
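To get a feel for that delay, you can work it out directly from the 1130 ft/s figure. A hypothetical sketch in Python (the constant and function names are mine):

```python
SPEED_OF_SOUND_FT_PER_S = 1130  # approximate speed through normal air

def sound_delay_seconds(distance_ft):
    """Time for sound to travel a given distance, in seconds."""
    return distance_ft / SPEED_OF_SOUND_FT_PER_S

# A listener 113 feet from the stage hears the band a tenth of a second late:
print(round(sound_delay_seconds(113), 2))   # 0.1
# At 1130 feet, the delay is a full second:
print(round(sound_delay_seconds(1130), 2))  # 1.0
```

Light covers those same distances in roughly a millionth of the time, which is why you see the drummer's stick hit before you hear it.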
What determines Pitch?
The frequency, or rate of compressions, determines how high or low a sound is perceived to be. If you play an instrument, you may notice that your metronome says A=440 on it. That’s because the current standard is for bands to tune to an A that’s exactly 440Hz (compression/rarefaction cycles per second). Some ensembles tune to a slightly different A, but only in special circumstances.
A doubling of frequency means an octave jump in pitch. So there are A notes at 440Hz, 880Hz, 1.76kHz and so on, as well as at 220Hz, 110Hz, 55Hz and below. If you do the math, you can find the frequency of every note on a standard 88-key piano (or print off the PDF available in the resources area to the right). Another way to visualize the frequency relationships between notes is through the harmonic series.
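“Doing the math” here means using the standard equal-temperament formula: each of the 12 semitones in an octave multiplies the frequency by the 12th root of 2. A short Python sketch (numbering the piano keys 1–88 with A440 as key 49, a common convention; the function name is mine):

```python
def piano_key_frequency(n):
    """Frequency in Hz of key n on an 88-key piano.

    Equal temperament: 12 semitones per octave, A4 (key 49) = 440 Hz.
    """
    return 440 * 2 ** ((n - 49) / 12)

print(round(piano_key_frequency(49)))  # 440 (A4)
print(round(piano_key_frequency(61)))  # 880 (A5, one octave up)
print(round(piano_key_frequency(37)))  # 220 (A3, one octave down)
```

Because an octave is exactly 12 semitones, moving 12 keys in either direction doubles or halves the frequency, matching the A-note series above.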
How is Sound Recorded and Reproduced?
Since this is such a complex question, we’ll need to divide it into sections . . .
Michael Faraday was one of the greatest physicists of all time. He discovered that when magnets and an electrical conductor move past each other, electricity is generated (induced). We now call this Faraday’s Law of Electromagnetic Induction. Faraday’s Law works both ways—electricity creating movement (speakers/electric motors), or movement creating electricity (microphones/power dynamos). Whatever the device, if it changes either electricity or motion into the other, it’s called a transducer.
Even though induction is a simple principle, there is a lot of complexity in getting a mic or speaker to sound just right for its intended use.
Conventional loudspeakers are fed an alternating electric current that causes a voice coil (an electromagnet) to move within the field of a permanent, anchored magnet in sync with the audio. Attached to the coil is a diaphragm, the part that actually moves the air. The most common diaphragm is the traditional cone: it moves forward and backward to create the compressions and rarefactions that make up sound traveling through the air. In general, larger speakers are better at reproducing lower frequencies and smaller speakers work best with higher frequencies. Because speakers generally must reproduce a complex blend of frequencies, many above ten thousand compressions per second, it’s practically impossible to see individual vibrations. They’re easier to see on a large subwoofer that is only reproducing tones below 100Hz, or when a speaker is viewed in slow motion.
A wonderfully visual and effective explanation of how speakers work was done by Jacob O’Neil here.
Microphones are speakers in reverse. You can literally use a speaker as a microphone, and some people do. Speakers turn electric currents into moving air; microphones turn air movement into electric currents.
If you look beneath the pop filter of a microphone, you see the mic’s diaphragm(s). When air pressure waves (sound waves) hit the diaphragm, they induce a current that varies in sync with the compressions/rarefactions. Nowadays, this current is usually digitized into a computer for editing within a digital audio workstation.
There are many types of microphones we’ll go over in a later section.
The audio coming out of a microphone or other audio device is in the form of electrical impulses, but computers only think in zeroes and ones. To reconcile this difference, analog-to-digital converters take snapshots of the electrical impulses and turn them into binary code that a computer can process. In practical terms, each sample value corresponds to the amplitude of the sound wave at that point in time (printable graphic in resources area).
These snapshots are then reconstructed into an extremely precise digital representation of the original sound wave. How many snapshots are taken in a second is stated in Hertz (just like frequency). The standard for CD-quality audio is 44,100 samples per second (Hz), or 44.1k audio; the standard for video is 48k; 96k is also a fairly common sample rate, though 48k is plenty for most applications.
How much data is captured in each snapshot is measured in bits. The standard bit depth (also called word length) for CD audio is 16-bit, though most recording is done at 24-bit or 32-bit floating point. The higher the bit depth, the greater the range from softest to loudest that can be captured. There are many advantages to recording, mixing and mastering at a high bit depth, even if the final product will be 16-bit.
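The two numbers above, sample rate and bit depth, can both be explored with a little Python. This is an illustrative sketch, not how a real converter is built; the names are mine, and the rule of thumb that each bit adds about 6dB of range follows from the decibel math earlier in this section:

```python
import math

SAMPLE_RATE = 44100  # CD quality: 44,100 snapshots per second

def sample_sine(freq_hz, n_samples):
    """Take the first n_samples 'snapshots' of a sine wave at the CD sample rate."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def dynamic_range_db(bits):
    """Approximate softest-to-loudest range of linear PCM at a given bit depth."""
    return 20 * math.log10(2 ** bits)  # ~6.02 dB per bit

samples = sample_sine(440, 100)        # 100 snapshots of an A440 tone
print(round(dynamic_range_db(16)))     # 96  -> 16-bit CD audio spans ~96 dB
print(round(dynamic_range_db(24)))     # 144 -> 24-bit recording spans ~144 dB
```

One second of CD audio is simply 44,100 of those sample values per channel; the extra ~48dB of range at 24-bit is a big part of why recording and mixing are done at higher bit depths than the final 16-bit master.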
Since most computer sound cards are built for simple reproduction of consumer audio, a separate audio interface is usually required for recording and processing professional audio. An interface usually has a USB, FireWire or Thunderbolt bus for connecting to a computer, and both an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC) for sampling and playback of audio.
Interfaces also usually come with audio input jacks. These come in various forms and are used to connect input devices, such as microphones, guitars, keyboards, sound generators, the audio outputs from mixers, and other interfaces. When the output from a device is too weak, like from a microphone, the interface uses a built-in amplifier to strengthen the signal before sampling it into the computer; we call this amplifier a preamp. Here’s an example of a simple interface that still has most of the basic features of a pro interface (click on the image to open it in a new window).
When choosing an interface, engineers consider many factors:
—The type of computer connection; not all computers accept FireWire or Thunderbolt, and you may be looking for one that plugs into a mobile device.
—How many mic preamps there are; this is easy to see, since they usually correspond to XLR jacks.
—How many other inputs there are; these can be 1/4″ phone jacks, analog or digital RCA, optical jacks, etc . . .
—The quality of mic preamps; for educational use, this isn’t as critical, but most pro studios care a lot about the character that a specific preamp imposes on a sound.
—Many other factors: the style of input meters, knob layout, the presence or lack of faders and other controls, operating system and recording program compatibility, favorite brands, etc . . .
For beginners and low-budget educational programs, there are plenty of simple interfaces for around $100. There are also mics that have the interface built-in, and simple converters that turn regular mics into USB mics. Alternatively, mobile interfaces are becoming increasingly popular. This page is simply a concise intro to the technology; click here to go to the page where I recommend specific products to get you started.