What Makes It Sound Like Christmas?

Every year, music theory enthusiasts begin to ask the same question: “what makes it sound like Christmas?”

As you can see, this discussion recurs every year in /r/musictheory.

Vox.com has incurred the wrath of Twitter’s musicologists after posting a video focusing on Mariah Carey’s “All I Want for Christmas Is You” that suggested that iiø7 chords are what make it sound Christmassy. The video begins by stating the research question, “What makes Mariah Carey’s song sound so incredibly Christmassy? Aside from the sleigh bells, of course.” They then proceed to discuss the harmonic content of the song and how the harmonies signify Christmassy-ness.

Vox’s declaration that iiø7 chords sound Christmassy irritated musicologists for many reasons, perhaps best summarized thusly:

In the Vox video and in all those Reddit posts, and indeed in much of beginner music theory, there is an obsession with finding explanations in the harmonies, specifically, of a song. This is a reflection of the overall bias in music theory: we focus on teaching harmony most of the time. Curiosity about how harmony elicits emotions is natural in this context. It only becomes problematic when this discussion leads to the exclusion of other music-analytical domains that are more relevant to the track’s signification—namely, timbre!

“What makes Mariah Carey’s song sound so incredibly Christmassy? Aside from the sleigh bells, of course.” That last sentence in the Vox video is delivered as a throwaway joke—“haha, gotta have sleigh bells in Christmas songs, obviously!” Well, yes! You do! That is actually what makes it sound Christmassy. I would argue the only thing contributing more to its Christmas sound is the lyrical content and all its allusions to Christmas imagery (stockings, Christmas trees, fireplace, snow). Why focus so much on harmony—which is no different in Christmas music than in comparable pop styles—when we could focus on what really distinguishes this music from other genres?

Do We Know It’s Christmas?


“Do They Know It’s Christmas?” is a charity single by the supergroup Band Aid that was released in December of 1984. It was meant to raise funds for famine relief in Ethiopia. This song is also among the worst Christmas songs of all time, not only because of its musical content but also because it spreads some harmful, reductionist representations of Ethiopia. But it’s a Christmas song nonetheless. So what makes it sound so Christmassy?

Harmony-wise, this track is completely unremarkable. The chords of the verse are F–G–C (IV–V–I in C major); in the prechorus, you have Dm–G–C–F (ii–V–I–IV); and in the chorus we’re back to F–G–C (IV–V–I).
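For anyone who wants to check the Roman numerals, here is a minimal Python sketch of the labeling. It is only an illustration: the function names are made up for this post, and it assumes natural-note roots and plain major or minor triads in a major key, which is all these progressions need.

```python
# Pitch classes of the natural notes, in semitones above C.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone offset of each major-scale degree above the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]


def roman_numeral(chord, key="C"):
    """Label a diatonic triad such as 'F' or 'Dm' relative to a major key.

    Only natural-note roots and plain major/minor triads are handled;
    minor triads are written in lowercase, following the usual convention.
    """
    is_minor = chord.endswith("m")
    root = chord[:-1] if is_minor else chord
    offset = (NOTE_TO_PC[root] - NOTE_TO_PC[key]) % 12
    numeral = NUMERALS[MAJOR_SCALE_STEPS.index(offset)]
    return numeral.lower() if is_minor else numeral


verse = [roman_numeral(c) for c in ["F", "G", "C"]]
prechorus = [roman_numeral(c) for c in ["Dm", "G", "C", "F"]]
print(verse)      # ['IV', 'V', 'I']
print(prechorus)  # ['ii', 'V', 'I', 'IV']
```

Nothing in that output is specific to Christmas, which is rather the point: the same labels would come out of countless pop songs.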

I’d contend that, like a lot of Christmas songs (including Mariah Carey’s “All I Want for Christmas is You”), these harmonies don’t sound particularly Christmassy. Instead, Christmas themes are communicated through the lyrics—that is, by repeating the words “Christmas” and “Christmastime” over and over—and also through the heavy use of synthesized tubular bells. 

“Do They Know It’s Christmas?” features that grand old synthesizer, the Yamaha DX7. I reached out on Twitter to Midge Ure of Ultravox fame, one of the song’s writers, and he confirmed that the DX7 preset called TUB BELLS is the source of this infamous bells sound.

TUB BELLS analysis

Here is the TUB BELLS sound isolated, playing an octave C3–C4, the same sound that you hear at the very beginning of “Do They Know It’s Christmas?”.

Today I don’t have time to get into all the details of this timbre, but if you’ve never heard what’s so special about bell timbres before, well, now you can. In general, bell timbres are special because the overtones that resonate when you strike a metal bar are totally different from the regular harmonic series that you get from a vibrating string or column of air. Bell timbres do not follow the harmonic series—bells are inharmonic instruments.

Here’s another spectrogram image, this time for just a single note, C3. (For info on how to read a spectrogram, click here.)

[Image: spectrogram of TUB BELLS playing a single note, C3]

Since most of you probably don’t immediately know how to translate Hertz into pitch names, I’ve made a transcription in traditional notation of what these partials are.


If you’re familiar with the harmonic series, you can see that that series of notes is quite different. If you’re not familiar with the harmonic series, well, here it is:


The harmonic series has intervals that progressively narrow in a predictable fashion. Each frequency is a whole-number multiple of the lowest (fundamental) frequency. But in the series of partials for TUB BELLS, well, it’s not quite so predictable. Not every partial is a multiple of the fundamental, and the intervals are not progressively narrowing.
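If you want to see the harmonic series as numbers rather than notation, here is a small Python sketch. It builds the first eight harmonics above C3, names the nearest equal-tempered pitch for each, and measures the interval between neighbors. The assumptions are mine: A4 = 440 Hz, middle C labeled C4, and made-up function names. It does not reproduce the measured TUB BELLS partials, only the idealized series they deviate from.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(midi_note):
    """Equal-tempered frequency, assuming A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def hz_to_note(freq_hz):
    """Name of the nearest equal-tempered pitch, assuming middle C = C4."""
    midi_note = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"


fundamental = midi_to_hz(48)  # C3, about 130.8 Hz
harmonics = [n * fundamental for n in range(1, 9)]
names = [hz_to_note(f) for f in harmonics]

# Interval in semitones between each harmonic and the next one up.
intervals = [12 * math.log2(harmonics[i + 1] / harmonics[i]) for i in range(7)]

print(names)  # ['C3', 'C4', 'G4', 'C5', 'E5', 'G5', 'A#5', 'C6']
print([round(x, 2) for x in intervals])
```

The intervals shrink steadily, from a full octave (12 semitones) at the bottom to a bit more than 2 semitones at the top. Note that the names are only the nearest equal-tempered pitches; the fifth and seventh harmonics in particular sit noticeably flat of E5 and A#5.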

But what does it mean?

The Yamaha DX7 was released in 1983, and so the technology was still shiny and new by December of 1984. The synthesizing capabilities of the DX7 were especially renowned for being able to faithfully replicate percussive sounds such as tubular bells, glockenspiel, and the like, much better than other contemporary synthesizers.

So the TUB BELLS sound in “Do They Know It’s Christmas?” is actually carrying a lot of semiotic weight! DX7’s TUB BELLS immediately inform the listener that 1) this is a Christmas song and 2) this is an ’80s Christmas song.

In so many cases, when we’re wondering “what makes it sound ____?” where ____ is Christmas, or metal, or Irish, or whatever, the answer lies not so much in the harmonies as in the timbres. Timbre is probably the most immediate aspect of our musical experience. Why shortchange it in our analyses?


Lerdahl’s Timbral Hierarchies

The real reason, I would argue, why timbre has been regarded as a secondary musical dimension is that, unlike pitch and rhythm, it has lacked any substantial hierarchical organization.

–Fred Lerdahl, 1987

Yesterday I read “Timbral Hierarchies” by Fred Lerdahl, originally published in 1987 in Contemporary Music Review Vol. 2. This article is post-GTTM (A Generative Theory of Tonal Music) and represents an attempt to explain how timbre prolongations, or at the very least timbral hierarchies, might be possible, in much the same vein as the metrical or tonal hierarchies of Lerdahl and Jackendoff.

This article is another curious entry in the outpouring of timbre music theory research that occurred in the mid-1980s (see also Cogan 1984, Slawson 1985). Since I wasn’t researching in the 1980s, I’ve wondered myself what the music theory community was like at this time, and what in the culture propelled this sudden interest in timbre. I presumed that this was due to wider access to 1) spectrograms, a useful visualization tool for timbre, and 2) digital synthesizers, which allow for the level of precise control necessary in many perception studies. Lerdahl points out another possible impetus for a sudden rush to theorize timbre: “The issue has sharpened with the recent rise of computer music. There is now such an infinity of timbral possibilities that the need for some kind of selection and organization has become acute” (136).

I’ve found it funny in the past that I study 1980s popular music, and that so many of the existing articles and books on timbre research also date from the 1980s. But this quote in particular helped me realize that the unifying factor in all of this is rapid technological advancement. Advances in computing technology and digital synthesis are what define the sound of 1980s pop music; these same things also end up defining research trends, including research trends about music. It’s no coincidence, in other words, that timbre research flourished during the same time as the music I’ve chosen to study (although I didn’t choose the music for that reason).

Lerdahl’s article, rather like Slawson’s book, spends a good deal of time establishing what a successful “theory of timbre” would look like. I want my own timbre theory to be a theory that theorists like, so these questions interest me. Lerdahl’s theory is grounded in psychological concepts from the 1980s. Lerdahl relies on association to group things together and organize them—certainly a foundational concept. But of course, simply saying “this is like that” is not a very satisfying analysis to read. Enter hierarchy. Lerdahl correctly notes that cognitive psychology has shown that the mind can learn and organize a great deal more information if that information can be organized hierarchically. Therefore, he concludes it would be advantageous to create a timbral hierarchy. In short: “Thus a theory of associations requires a theory of hierarchies” (138).

To get to a timbral hierarchy, Lerdahl has a number of intervening steps. He proposes a system of timbral consonance and dissonance, akin to tonal consonance and dissonance, which can create what he calls stability conditions that will permit the hierarchy. A more stable thing is at a higher level than a less stable thing. A more consonant timbre is more stable than a more dissonant timbre.

But what is a consonant or dissonant timbre? This is where I find Lerdahl begins to tread into the territory of arbitrariness. For practicality’s sake, Lerdahl must focus on only a few timbral attributes. Lerdahl admits that timbre is multidimensional and so creating a continuum from most dissonant to most consonant will not be feasible. So he chooses vibrato and harmonicity (the degree to which partials follow the harmonic series) as his two attributes of focus.

While these steps seem logical enough in isolation, combining these ground rules together to make an analytical theory seems to miss the point.

Lerdahl proposes an “ideal” level of vibrato as the baseline, and says that more or less vibrato is less consonant (i.e., less stable) than this ideal. In other words, a sound that began non-vibrato and then progressed to the ideal level would progress from less stable to more stable. Less arbitrarily, Lerdahl also suggests that a spectrum following the harmonic series is the most consonant and stable, and deviations from this are less stable.

Lerdahl determines his prolongations by saying that unstable things eventually come under the reign of the more stable things. Borrowing terms from counterpoint, Lerdahl articulates the timbral equivalents of arpeggiating, neighboring, and passing functions. He depicts these possibilities using branching graphs, a familiar sight to anyone who knows GTTM. Below are these arpeggiating (Figure 10), neighboring (Figure 11), and passing functions (Figure 12) given certain vowel sounds. The vowel sounds are given in IPA. The increasing subscript numbers indicate an increase in brightness within that vowel sound.

[Image: Lerdahl’s Figures 10, 11, and 12]

If I imagine myself saying these vowels (and this is difficult, since Lerdahl does not further define the qualities of brightness present for each subscript numeral), I start to feel foolish for attempting to think of these as passing, neighboring, or arpeggiating. Perhaps it’s simply the terminological analogy, which as a Schenkerian I find a bit distracting and loaded. But contrapuntal analogies or no, I am skeptical that we can organize timbre hierarchically. Lerdahl writes, “It might be supposed that the pitch-timbre analogues are artifacts of the way the issues have been posed. But it is more interesting, and I believe more true, to argue that the underlying principles channel musical cognition and that the analogues rely on certain of these principles” (157). I’m no cognition expert but I’m skeptical that these trees necessarily represent our cognition of timbre structures.

The preoccupation with hierarchy is probably typical of music theory in the 1980s, and Lerdahl’s work likely simply reflects that. Instead, reflective of our present day, where music theory continues to borrow more and more from the perspective of musicologists and ethnomusicologists, I prefer to enhance timbral associations through cultural context.  That may be a post for another time.

Beat of a Different Drummer?

(Is this title too dorky? Be honest.)

(…Actually, don’t tell me.)

In my dissertation research I’m turning toward drum machines. It’s a natural extension of my ’80s sound inquiries: if the Yamaha DX7 was so important to the ’80s sound, drum machines like the LinnDrum and the Roland TR-808 were at least equally important.

Analyzing the timbre of drum machines using my existing apparatus has revealed how biased toward pitched phenomena theories of timbre really are. For example, so many theories of timbre are completely preoccupied with overtones/partials and their relative loudness. (For more info on spectrogram analysis, check out the first half of this blog post.)

This spectrogram is of a harmonica synth playing a melody. Time is on the x-axis in seconds. Frequency is on the y-axis in Hertz (higher Hz = higher pitch). The bottom line of this spectrogram, at around 500 Hz, is the fundamental. Colloquially we just call this “the pitch.” The parallel lines running above the fundamental are the partials of this sound. You don’t hear them as separate notes, but instead you hear a change in timbre.
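To make those parallel lines less mysterious, here is a toy Python sketch of what a single vertical slice of a spectrogram measures. It synthesizes a tone with a 500 Hz fundamental and two harmonic partials, runs a naive discrete Fourier transform on one short frame, and lists the frequencies that carry energy. Everything here is an assumption for illustration (the sample rate, the frame size, the partial amplitudes, the function names); it is not the harmonica synth from the spectrogram, and real spectrogram software uses a fast Fourier transform with windowing rather than this slow version.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this toy example)
FRAME_SIZE = 800     # 0.1 s of audio, so each DFT bin is 10 Hz wide


def synth_tone(fundamental_hz, partial_amps):
    """Sum sine waves at integer multiples of the fundamental."""
    return [
        sum(amp * math.sin(2 * math.pi * fundamental_hz * (i + 1) * n / SAMPLE_RATE)
            for i, amp in enumerate(partial_amps))
        for n in range(FRAME_SIZE)
    ]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins from 0 Hz up to (not including) Nyquist."""
    size = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                for n, x in enumerate(frame)))
        for k in range(size // 2)
    ]


def peak_frequencies(frame, threshold=10.0):
    """Frequencies (in Hz) of every bin whose magnitude exceeds the threshold."""
    bin_width = SAMPLE_RATE / len(frame)
    return [k * bin_width
            for k, mag in enumerate(magnitude_spectrum(frame))
            if mag > threshold]


tone = synth_tone(500, [1.0, 0.5, 0.25])
print(peak_frequencies(tone))  # [500.0, 1000.0, 1500.0]
```

Repeat that calculation frame after frame, stack the results side by side, and each of those three frequencies becomes one of the horizontal strands in the picture.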

But for many percussion instruments, drums and cymbals and such, you won’t see any partials like that at all. Even drums that are pitched don’t really have partials running in multiple parallel lines above the fundamental.

These are samples from a Roland TR-808: bass drum, low tom, mid tom, high tom, snare, closed hi-hat, open hi-hat, clave, and handclaps. Notice how these are all just thick bars of sound, not at all like the parallel strands in the above example.

So it does us no good at all to talk about partials, how those partials compare to the ideal natural harmonic series, whether there’s vibrato, etc. Yet, that’s the majority of the focus of spectrogram analyses.

Over the next few weeks I’m going to start finessing how we can talk about timbre in non-pitched percussion instruments. For now, back to the grind…

’80s-inspired music

Last Wednesday I was a featured contributor to the podcast Pop Unmuted on an episode about ’80s music—listen here.

We are currently living in a kind of ’80s revival. Google “How do I make my song sound 80s?” and you can see hundreds of posts on online forums from amateur producers looking for an ’80s sound.


The funny thing about this is that of course the ’80s was an entire decade of music, and there were tons of different genres and styles that were going on at this time. Obviously it would be difficult to name even a single characteristic that was represented in every ’80s style. And yet there’s something that persists in the collective memory of people today that can be called an ’80s sound.

How do we make something sound ’80s? When today’s millennials—who were only infants or children in the ’80s—recreate an ’80s sound, how does it compare to an authentically ’80s sound? What elements of the ’70s or the ’90s get misremembered as an ’80s phenomenon?  All these questions are discussed in the episode. Here’s a bit of my conversation with Scott Interrante and Kurt Trowbridge:

Megan: I think that a lot of the people who use, you know, “’80s-ness” in this way are younger people, like a lot of young producers maybe want to make music that “sounds ’80s”. And so they’re kind of creating this memory of the ’80s that they’re then putting into this music. And maybe they’re not so super familiar with what makes something sound ’80s authentically.

Scott: Right. Well, I think it’s also not always coming from the artist. You know, like, I don’t know that M-83 set out and said “We’re gonna make music that sounds like ’80s synth-pop,” or if they made music and then it was labelled as such. At this point, you know, I just mentioned, M-83 who really broke out in 2007? 08? So now we’re almost ten years past that—at what point do we just realize, “well this is what music sounds like now”? But we sort of continually put that ’80s label onto it, maybe against the artists’ wishes, maybe not in every case, but I do wonder where that label comes from.

We also discuss ’80s-style covers like Tronicbox’s remix of Ariana Grande’s “Focus”: what is authentically ’80s about this, and what’s not authentically ’80s?


Learn more about how we relate to the ’80s today by listening to the episode on Pop Unmuted.

I’ve been on the Pop Unmuted podcast a few other times, too—check out this episode on Max Martin, one of the songwriters behind the Backstreet Boys, *NSYNC, and Britney Spears, or this episode dedicated to #FreeKesha, which I also wrote about a bit in another blog post.

header image credit: Igor Fuentes

Reading about embodiment (Heidemann on timbre)

Since I first saw it, I’ve been fascinated by this video of an impressionist singing in the style of many different singers. I love karaoke, and I love doing impressions of quirkier singers myself (Celine Dion, Idina Menzel, and Britney being a few of my favorites)—I’m nowhere near as good as Christina Bianco, but it’s good to have goals.


Actually, watching Christina Bianco convinced me that it must be possible for anyone to sing beautifully. It must all just be muscles and vowel placement and so on, if this one woman can make all these different kinds of voices!

Never having studied the voice seriously, it’s hard for me to describe how I would make these different voices. But it’s probably the first thing you’d try to do in describing this video to someone else. I’m reading Kate Heidemann’s article, “A System for Describing Vocal Timbre in Popular Song,” recently published in Music Theory Online 22/1, which offers what I find to be a completely wonderful way to discuss vocal timbre. This article pinpoints the kinds of distinctions I’m tuning into when I watch that YouTube video above.

Her methodology for timbre analysis is centered on embodiment of vocal timbres, and naturally her writing about it frequently describes what the body does to create these sounds. For example, paragraph 3.19:

By pulling one’s larynx lower, tilting the thyroid cartilage forward, expanding the pharynx, and drawing the velum upwards (but not completely closing off the nasal passages), it is possible to create a vocal sound that listeners often perceive as much darker, and sometimes quieter, than those previously described. Estill refers to this vocal timbre as “sob” since she relates the vocal setting that produces it to “silent, suppressed sobbing” (McDonald Klimek, Obert, and Steinhauer 2005b, 31). In teaching this sound, vocal instructors might encourage a student to breathe deeply through the nose to lower the larynx and expand the sidewalls of the pharynx, or yawn to aid in tilting the thyroid cartilage, opening the pharynx, and raising the velum. This sound can often be heard in operatic singing, and is an important component of the “crooning” vocal style. It is a regular feature of Bing Crosby’s singing (e.g. in “Christmas in Killarney,” Example 12), and characterizes Cher’s singing voice as well. It can require extra energy to maintain an expanded pharynx while singing, but this vocal tract position is typically very easy on the vocal folds—this can make listening to and mimicking this vocal timbre feel rather soothing.

To turn to a more meta perspective and discuss the act of writing: I’m struck by how engrossing this writing is, even while it’s very technical. The explicit verbal descriptions of how one embodies these techniques really draw me in as a reader. It’s very powerful to imagine your own body interacting with these different vocal sounds. I almost think there should be a directive at the beginning of the article to read it in a space where you can confidently sing with these actions!

I saw this paper presented at the SMT national meeting in 2014 (I think). The paper was delivered in the traditional way, as a lecture, but I would love to see this presented again as some kind of workshop, where Heidemann gets audience members on the right track with embodying the sounds and producing them themselves. Or alternatively, this would be an amazing article to teach in a class to students, immediately engaging all of them. It was published too late for me to include it in my pop music seminar this past semester, but I’ll definitely make a point to fit it in at my next opportunity.

A theory of attacks?

Studies have shown that the attack (onset) of a sound plays an important role in a listener’s ability to accurately determine the sound’s source. In Saldanha and Corso 1964, listeners were able to identify the source of a tone with 50% greater accuracy if the attack of the sound was included in the sample, as opposed to a sample that cuts out the attack and plays only the sustain of the sound.

Therefore the attack of a sound must greatly influence our perception of timbre. In order to summarize the most important aspects of a timbre, my methodology must have an adequate way of accounting for the attack of the sound. How to do this? At the moment, my methodology is based on a system of oppositions. My first thought, of course, was an opposition between sounds with a fast attack and a slow attack. But isn’t this oversimplifying? There are probably degrees of variance between “fast” and “slow.” (Now you have a little insight into what I think about when I walk between my apartment and the cafe.)

The critique of binaries as being over-generalizing is leveled at me a lot. But McAdams 1999 shows that perhaps this isn’t actually a damaging oversimplification. McAdams also theorizes timbre but from a perceptual approach. From a study asking participants to rank the similarity between 153 pairs of timbres, McAdams devised a three-dimensional timbre space onto which the 18 timbres could all be mapped. One of these dimensions is attack time, on a scale from short (4) to long (−3). Listeners seem to have conceived of attack times as basically short or long, with little middle ground. This is visible in McAdams’s Figure 2 (below) by the grouping of the sounds into two clusters: there are basically timbres that are up high at around +2 (vibraphone, guitar, harpsichord) and timbres that are down low at around −2 (clarinet, trombone, English horn). 

[Image: McAdams’s Figure 2]
from McAdams 1999, 89.
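A quick aside on the arithmetic: 153 is exactly the number of distinct pairs you can draw from 18 timbres, which is presumably where the study’s number of comparisons comes from. A short Python check, using placeholder labels rather than McAdams’s actual stimuli:

```python
from itertools import combinations

n_timbres = 18
timbre_ids = range(n_timbres)  # placeholder labels, not the actual stimuli

# Every unordered pair of two different timbres.
pairs = list(combinations(timbre_ids, 2))

print(len(pairs))                        # 153
print(n_timbres * (n_timbres - 1) // 2)  # 153
```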

This encourages me, but I admit that binaries are not always going to be appropriate. I have already begun to discard one binary in favor of a number, which measures the distance in octaves between the fundamental and the highest sounding partial. In Figure 2 above, spectral centroid and spectral flux do not neatly fall into two groups. McAdams’s research here confirms that binaries might not adequately capture spectral centroid or spectral flux (which colloquially maybe could be referred to as brightness and hollowness, respectively—I’ll save a more thorough investigation of these ideas for another time). On these axes, all timbres are scattered around the values from 3 to −3. So in these cases, the usefulness of binaries may have to be reassessed.
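Here is a rough Python sketch of two of the measurements mentioned above. The octave distance is just the base-2 logarithm of the ratio between the highest partial and the fundamental. For spectral centroid I am using the common textbook definition, the amplitude-weighted average frequency, reported in Hz; the axis in McAdams’s figure is on a different, normalized scale. The partial frequencies and amplitudes are made up for illustration, not measured from any recording, and the function names are my own.

```python
import math


def octave_span(fundamental_hz, highest_partial_hz):
    """Distance in octaves between the fundamental and the highest partial."""
    return math.log2(highest_partial_hz / fundamental_hz)


def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a set of partials."""
    total = sum(amplitudes)
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total


# Made-up partials for illustration only (not measured from any recording).
freqs = [500.0, 1000.0, 1500.0, 2000.0, 4000.0]
dull = [1.0, 0.5, 0.25, 0.125, 0.125]    # energy concentrated low
bright = [1.0, 1.0, 1.0, 1.0, 1.0]       # energy spread evenly

print(octave_span(freqs[0], freqs[-1]))   # 3.0
print(spectral_centroid(freqs, dull))     # 1062.5
print(spectral_centroid(freqs, bright))   # 1800.0
```

Both sounds have the same partials and the same three-octave span, but the centroid moves up as the upper partials get louder. That is a continuous value, which is part of why a simple bright/dark binary sits uneasily with it.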

Even if binaries are good for assessing slow attack vs. fast attack, I may still not be adequately capturing other ways that attacks contribute to timbre. McAdams 1999 is actually using FM-synthesized sounds, not acoustic sounds, in this study. I haven’t studied this exhaustively but a hypothesis I have is that FM synthesizers like the Yamaha DX7 do not have as complex or as nuanced attack sounds as acoustic instruments do—even though the DX7 has a highly sophisticated envelope generator. Perhaps McAdams’s use of FM synthesis is what led to this binary being a useful generalization. Acoustic instruments might have opened up more subtleties in attack sounds that would not be so easily captured.

How Things Sound

“Any accurate analysis of rock music must therefore ultimately account for its timbre and studio production at least as much as on the traditionally analyzed parameters of tonality, harmony, and meter; in other words, how the song sounds is as important—if not more so—than what is sounding.”


Kevin Holm-Hudson, “The Future Is Now … and Then: Sonic Historiography in Post-1960s Rock”

Trying out a new post format today—posting a quote from a recent reading and reflecting on it a teensy weensy bit.

Holm-Hudson’s idea of sonic historiography, tracing the history of rock music through the sound of that music, is integral to my approach and my (still under construction) thesis statement for my dissertation. Part of what I want to do is define the “’80s sound” through its technology, analyze the timbres of those technologies, and finally raise issues of aesthetics and reception and how they relate to those timbres/technologies.