A theory of attacks?

Studies have shown that the attack (onset) of a sound plays an important role in a listener’s ability to accurately determine the sound’s source. In Saldanha and Corso 1964, listeners were able to identify the source of a tone with 50% greater accuracy if the attack of the sound was included in the sample, as opposed to a sample that cuts out the attack and plays only the sustain of the sound.

Therefore the attack of a sound must greatly influence our perception of timbre. In order to summarize the most important aspects of a timbre, my methodology must have an adequate way of accounting for the attack of the sound. How to do this? At the moment, my methodology is based on a system of oppositions. My first thought, of course, was an opposition between sounds with a fast attack and a slow attack. But isn’t this oversimplifying? There are probably degrees of variance between “fast” and “slow.” (Now you have a little insight into what I think about when I walk between my apartment and the cafe.)

Binaries are often criticized as over-generalizing, and that critique is leveled at me a lot. But McAdams 1999 suggests that, for attack at least, a binary may not be a damaging oversimplification. McAdams also theorizes timbre, but from a perceptual standpoint. From a study asking participants to rate the similarity of 153 pairs of timbres (every possible pairing of 18 timbres), McAdams devised a three-dimensional timbre space onto which all 18 timbres could be mapped. One of these dimensions is attack time, on a scale from short (4) to long (−3). Listeners seem to have conceived of attack times as basically short or long, with little middle ground. This is visible in McAdams’s Figure 2 (below) in the way the sounds fall into two clusters: there are timbres up high at around +2 (vibraphone, guitar, harpsichord) and timbres down low at around −2 (clarinet, trombone, English horn).

[Figure: McAdams’s Figure 2, the three-dimensional timbre space.]
From McAdams 1999, 89.

This encourages me, but I admit that binaries are not always going to be appropriate. I have already begun to discard one binary in favor of a number, which measures the distance in octaves between the fundamental and the highest sounding partial. In Figure 2 above, spectral centroid and spectral flux do not fall neatly into two groups. McAdams’s research here confirms that binaries might not adequately capture spectral centroid or spectral flux (which colloquially could perhaps be called brightness and hollowness, respectively; I’ll save a more thorough investigation of these ideas for another time). On these axes, the timbres are scattered across the whole range of values from 3 to −3. So in these cases, the usefulness of binaries may have to be reassessed.
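To make the octave measurement concrete, here is a minimal sketch of how such a number could be computed. It assumes the measure is simply the base-2 logarithm of the ratio between the two frequencies; the function name `octave_span` and the example frequencies are hypothetical, chosen only for illustration.

```python
import math


def octave_span(fundamental_hz: float, highest_partial_hz: float) -> float:
    """Distance in octaves between the fundamental and the highest sounding partial.

    Assumption: one octave is a doubling of frequency, so the distance
    is log2(highest / fundamental).
    """
    if fundamental_hz <= 0 or highest_partial_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(highest_partial_hz / fundamental_hz)


# Hypothetical example: a 220 Hz fundamental whose highest sounding
# partial sits at 3520 Hz (16 times the fundamental).
print(octave_span(220.0, 3520.0))  # 4.0
```

A number like this keeps the degrees of difference that a two-way opposition would flatten, which is the point of swapping the binary for a measurement.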

Even if a binary works well for slow attack vs. fast attack, I may still not be adequately capturing other ways that attacks contribute to timbre. McAdams 1999 actually uses FM-synthesized sounds, not acoustic sounds, in this study. I haven’t studied this exhaustively, but my hypothesis is that FM synthesizers like the Yamaha DX7 do not have attack sounds as complex or as nuanced as those of acoustic instruments, even though the DX7 has a highly sophisticated envelope generator. So perhaps it was McAdams’s use of FM synthesis that led to this binary being a useful generalization. Acoustic instruments might have opened up more subtleties in attack sounds that would not be so easily captured.