Parsing's a bitch, ain't it? At least, that's the hook for a recent
Newsweek article about a theory of the brain's mechanism for telling time, which involves tracking signal propagation in neurons following perceptual events, or something; it was a bit hard to tell from the article's description.
On the researchers' homepages there are some nice Java animations and sound files, suitable for intriguing undergraduates, illustrating the brain's temporal abilities. One of them shows that timing is important in speech perception. If an [s] sound is followed by a few milliseconds of silence and then by the vowel [i], the listener perceives the [s] as the lone onset consonant in a syllable [si]. If the [s] is separated from the [i] by just a few more milliseconds of silence, on the other hand, the listener perceives the silence as a voiceless stop between the [s] and the [i]. You get the impression of having heard the syllable [sti], instead of [s..i]. The point seems to be that in order to perform this feat, the brain has to have some quite sensitive mechanism for distinguishing small temporal intervals.
The Newsweek article starts off with a famous mondegreen to lure the reader, but then doesn't do anything to explain how that particular misparse is related to the question of timing. Never fear, though, Language Log is here!
So, here's the mondegreen in question:
Acting funny, but I don't know why
Jimi sounds like he's saying, "Scuse me, while I kiss this guy", when in fact the lyric is "Scuse me, while I kiss the sky". That is, the listener mishears the sequence
/kɪsðəˈskaɪ/
as
/kɪsðəsˈgaɪ/
Now, it is a puzzle why this would happen. One doesn't normally mishear /g/ for /k/ or vice versa; you wouldn't mix up 'coat' and 'goat', for example, in the usual case. But in the case of Purple Haze, some particularities of English pronunciation are at work to mislead you.
/k/ and /g/ are usually described as being pronounced exactly alike except for the vibration of the vocal cords: /k/ is voiceless and /g/ is voiced. Otherwise everything about the configuration of the tongue and oral cavity is identical -- they're both velar stops. In fact, however, there's more than one way to skin a /k/ in English, and some /k/s are more like /g/ than others.
English voiceless stops are pronounced in several different ways depending on their position in the syllable. A voiceless stop alone at the beginning of a syllable gets an extra oomph, an extra puff of air, making for a longer, more perceptible period of voicelessness before the vowel sound starts up (a longer "Voice Onset Time"). That extra puff of air is called aspiration, for those of you keeping track at home, and is transcribed with a superscripted 'h' after the consonant: [kʰ]. In coat, e.g., that initial /k/ is aspirated, so it's not pronounced just [kowt], but rather [kʰowt], when you really get down to it. That aspiration really makes the voicelessness of the /k/ stand out and sound quite different from the voiced /g/ at the beginning of 'goat', [gowt]. You'd never get them mixed up.
The trick is, the aspiration doesn't show up everywhere. Voiceless stops are not aspirated when they occur after an /s/ at the beginning of a syllable. (You can feel the difference if you put your hand in front of your face and alternate saying 'pot', [pʰɑt], and 'spot', [spɑt] -- you should feel the puff on the /p/ in the first but not in the second.) In such cases, the absence of the little puff of air means that the voiceless period associated with the stop is shorter and less perceptible -- that is, the /k/ sound following /s/ in complex syllable onsets sounds a lot more like a /g/ than other /k/s do.
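For those who like their phonology executable, here is a minimal Python sketch of the pattern just described. It covers only the two cases discussed here (a lone voiceless stop in the onset, and a voiceless stop after /s/); the function name `realize_onset` and the one-character-per-sound spelling of onsets are assumptions made for illustration, not anybody's official notation.

```python
# A toy version of the aspiration pattern described above.
# Assumptions (not from any library): onsets are spelled with one
# character per sound, and only the two cases in the text are handled.

VOICELESS_STOPS = {"p", "t", "k"}
ASPIRATION = "\u02b0"  # the superscript 'h' used to transcribe aspiration


def realize_onset(onset: str) -> str:
    """Return a rough phonetic spelling of a syllable onset."""
    # A voiceless stop alone at the start of a syllable is aspirated.
    if len(onset) == 1 and onset in VOICELESS_STOPS:
        return onset + ASPIRATION
    # Anything else, including /s/ + stop, is left unaspirated here.
    return onset


for onset in ["p", "sp", "k", "sk", "g"]:
    print(f"/{onset}/ -> [{realize_onset(onset)}]")
```

Real English is messier than this (other clusters and stress also matter), so treat it as a cartoon of the 'pot' versus 'spot' contrast and nothing more.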
This isn't normally a problem, because there aren't any syllables in English that begin with /sg/ -- that's not a legal English onset consonant cluster. But in connected speech, you might get a word that ends in a vowel (like the) in front of a word that starts /sk/ (like sky)...and in the right circumstances, the listener might think that the /s/ belonged with the preceding word that ended in the vowel, for instance if there's a very frequent alternate word, suitable to the syntactic and semantic context, that sounds just like the vowel-ending word but ending in an /s/ (like this).
Imagine the listener got as far as making that mistake, of understanding the /s/ as part of the previous word, hearing this ... rather than the s... Then the next problem they'd have to solve would be to try to identify the next consonant in line in the speech stream. It's definitely a velar stop -- but is it /k/ or /g/? "Well," their perceptual system reasons to itself, "if it were a /k/ at the beginning of a word like this, it would be aspirated -- I'd expect a longer period of voicelessness right here. Given that there's not too much voicelessness, I guess it must be a /g/." And presto! 'kiss the sky' turns into 'kiss this guy'. [1]
So it does all have to do with timing after all -- the brain has to detect the difference between 30ms of voicelessness and 60ms of voicelessness, and the research described in the Newsweek article is about figuring out how it pulls that off.
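The listener's little chain of reasoning can be caricatured in a few lines of Python. This is a sketch under loud assumptions: the 30 ms and 60 ms figures are the ones mentioned above, but the 45 ms boundary is just their midpoint, picked for illustration, and the function `perceive_velar_stop` is hypothetical. Nobody is claiming the brain runs an if-statement.

```python
# A cartoon of the /k/-versus-/g/ decision described above.
# ASSUMPTION: the 45 ms boundary is simply the midpoint between the
# 30 ms and 60 ms figures in the text; it is not a measured value.

BOUNDARY_MS = 45.0


def perceive_velar_stop(voiceless_ms: float, s_heard_on_previous_word: bool) -> str:
    """Guess whether a velar stop is heard as 'k' or 'g'."""
    if not s_heard_on_previous_word:
        # The stop follows /s/ in the same onset. English has no /sg/
        # onsets, so a short voiceless period is still heard as /k/.
        return "k"
    # The /s/ was assigned to the previous word, so the stop now starts
    # its syllable alone, where a /k/ would be aspirated (long voicelessness).
    return "k" if voiceless_ms >= BOUNDARY_MS else "g"


# Same 30 ms of voicelessness, two different parses of the /s/:
print("the sky  ->", perceive_velar_stop(30.0, s_heard_on_previous_word=False))
print("this guy ->", perceive_velar_stop(30.0, s_heard_on_previous_word=True))
```

The point of the toy is that the acoustic input is identical in both calls; only the decision about where the /s/ belongs changes what the stop is heard as.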
Oddly enough, just as Prof. Shuy was posting the Newsweek clipping on the bulletin board next to the water cooler, I was reading a blog post all about this exact same thing over at In A Word: It's all about persbective. How would you spell that word of Mary Poppins'?
[1] Of course it's not really thinking this. It's just trying to settle on a pattern of activation that corresponds to a sensible linguistic representation for that stream of sound, given all the contextual factors involved. 'Kiss the sky' and 'kiss this guy' are two competing representations that are strongly activated given Jimi's sound waves. In fact, '...kiss this guy' probably gets an activation edge from the semantic context. Kissing usually involves an animate direct object, after all. Those who hear 'kiss this guy' are seriously underestimating the degree to which the narrator in the song is acting funny.