Category Archives: Phonetics

Friday, December 2 @ 3:30 PM: Jacob Phillips (UChicago)

Please join LVC this Friday, December 2 at 3:30 PM in Rosenwald 301. It will be our last meeting of the quarter and our speaker is our own Jacob Phillips. Hope you can make it!

“Retraction in Action: Examining phonological and prosodic effects on /s/-retraction in the laboratory”
Jacob Phillips
University of Chicago

An ongoing sound change in American English is /s/-retraction, the process by which /s/ is articulated approaching /ʃ/ in the context of /r/. Speakers vary significantly in the degree of retraction observed, with all individuals exhibiting coarticulatory effects of /r/ in /sCr/ clusters and some individuals displaying an apparent sound change, with /s/ reanalyzed as /ʃ/ in /str/ clusters (Mielke et al., 2010; Baker et al., 2011). The present study uses experimental methods to better understand the actuation of this sound change through a phonological and prosodic lens. College-aged students from across the United States read a series of sentences manipulating the phonological and prosodic environments of these sibilants. The results of this study demonstrate a retracted /s/ in the context of /r/ and phrase-initially. While there was not a significant group-level effect for the interaction of prosodic position and phonological environment, the inclusion of by-subject random slopes for that interaction, which significantly improves model likelihood, suggests that individuals vary with respect to the effects of prosodic conditioning of /s/-retraction in different phonological contexts. These findings suggest a possible role of prosodic position in the actuation of sound change, both in production and, possibly, in perception.


Daniel Chen (Toulouse School of Economics) @ LVC on Friday, April 29th!

“Covering: Mutable Characteristics and Perceptions of (Masculine) Voice in the U.S. Supreme Court”

Daniel Chen
Institute for Advanced Study, Toulouse School of Economics

Using data on all 1,901 U.S. Supreme Court oral arguments between 1999 and 2013, we document that voice-based snap judgments based solely on the introductory sentences of lawyers predict Justices’ votes. The connection between vocal characteristics and court outcomes is specific to perceived masculinity even when judgment of masculinity is based on less than three seconds of exposure to a lawyer’s speech sample. Although previous studies suggest a significant role for vocal characteristics on real world behavior, prior to our work none has identified a definitive connection using identical phrases. Roughly 30% of the association between voice-based masculinity and court outcomes comes from within-male lawyer variation, whereas 70% comes from between-male lawyer variation. Moreover, voice-based first impressions predict both male and female lawyers’ court outcomes: less masculine males and more feminine females are more likely to win. A de-biasing experiment separately identifies statistical discrimination and prejudice by showing that information reduces 40% of the correlation between perceived masculinity and perceived win and incentives reduce another 20% of the correlation. The negative correlations between perceived masculinity and win rates were stronger in private firms and in petitioner classes with more masculine voices. Perceived masculinity explains an additional 10% of variance relative to the best existing prediction models of Supreme Court justice votes. Sincere and strategic voting considerations may explain why liberal justices were more likely to vote against male lawyers perceived as more masculine and conservative justices were more likely to vote for female lawyers perceived as more feminine.

Friday, April 29th at 3:00 PM in Rosenwald 015

Britta Ingebretson @ LVC on Friday, March 11th!

Friday, March 11th @ 3:00 PM in Rosenwald 015

Shepu or Mandarin? Attention and second order indexicality in a Chinese yoga studio

Britta Ingebretson
University of Chicago

In this talk, I will examine how the phonetic qualities of language become mobilized in processes of second-order indexicality in a yoga studio in Huangshan, China. Shepu, a portmanteau of Shexianhua (She county dialect) and Mandarin, is the local term for the dialect of Mandarin spoken in She county, a nonstandard dialect which incorporates many phonetic, prosodic, and tonal qualities from Shexianhua. Second-order indexicality is the process through which an indexical relationship between ways of speaking and certain types of speakers becomes naturalized, such that ways of speaking become seen as iconic of, rather than indexing, certain types of speakers, thus linking linguistic traits to other socially meaningful non-linguistic traits. While much literature has been devoted to showing how listener judgments allow the listener to classify speakers as belonging to certain social categories, in this talk I will show how the process also works in reverse. If listeners have already classified individuals as a certain social type, they are more likely to be attentive to and pick out the qualities of speech which conform to their preconceived perceptions than they are with other speakers, regardless of actual speaker variation. I show how this process works with three speakers of Shepu.

Kathryn Franich @ LVC on Friday, November 6th!

Friday, November 6th @ 3:00PM in Rosenwald 301

Intrinsic and Contextual Cues to Tone Perception In Medʉmba
(or: A How-To Guide for Doing Phonetics Experiments in the Field)

Kathryn Franich
University of Chicago

In this talk, I discuss results of experimental work on tone perception in Medʉmba, a Grassfields Bantu language spoken in Cameroon. The following research questions were investigated:

1) What kinds of acoustic cues are relevant to the perception of tones in this language?
2) Is tone perception sensitive to pitch information from the surrounding context? And if so, is perception sensitive to contextual information from non-speech sounds as well as speech sounds?

Results indicate that both F0 and duration are important cues to tone perception, but that the influence of duration was strongest where target F0 values were low. This finding is in line with previous cross-linguistic work showing interactions between duration perception and tone and is thought to arise through a compensatory mechanism on the part of speakers to normalize for F0-related perceptual or articulatory biases (Yu 2011, Gussenhoven & Zhou 2013).

Results also indicate that perception of tones on target syllables was influenced by the tone of the syllable in the previous trial within the experimental block. Interestingly, preceding non-speech tones did not influence perception, suggesting that the observed contextual effect was specific to linguistic stimuli, rather than attributable to domain-general auditory processing effects, as has been suggested by Huang & Holt (2009; 2011).

In describing the experiment, I provide a play-by-play of its design and execution to highlight ways in which typical laboratory setups can be adapted for a fieldwork setting. In particular, I focus on subject recruitment, stimuli creation and presentation, pilot-testing, and the use of computers for data collection in contexts where subjects are not accustomed to them.

14 April: Tony Woodbury (UT Austin)

Monday, April 14th @ 3:00 PM, Pick 016

The Emergence from Tone of Vowel Register and Graded Nasalization in the Eastern Chatino of San Miguel Panixtlahuaca

(based on joint work with John Kingston, University of Massachusetts, Amherst)

The Chatino languages (Otomanguean; Oaxaca, Mexico) generally retain the conservative Proto-Chatino vowel inventory: */a, e, i, o, u/, with nasalized counterparts */ą, ę, į, ǫ/. Pride & Pride’s 2004 dictionary of San Miguel Panixtlahuaca Eastern Chatino (PAN) indicates the same for that variety. But work by our group (Cruz et al. 2012) tells a quite different story. We find that PAN departed from the system by developing a more elaborate vowel system: /a, ɛ, e, i, ɔ, o, u/ (Cruz et al. 2012), as well as a contrast between ‘light’ and ‘heavy’ nasalized vowel sets: /ą, ę, ǫ/ vs.  /ąŋ, ęŋ, įŋ, ǫŋ/.

We argue that the main trigger for the expansion of this inventory was tonal: A mora-linked low or falling tone followed by a floating tone *L-(T) in Proto Eastern Chatino (pEC). In its (etymological) presence, the historical vowel system was rendered as /a, ɛ, e, ɔ, o/ and /ą, ę, ę, ǫ/ (merging *ę with *į); while in its absence the system was rendered as /ɔ, e, i, o, u/ and /ąŋ, ęŋ, įŋ, ǫŋ/. We call the two renditions the low (and light-nasal) register vs. the high (and heavy-nasal) register, where ‘low’ and ‘high’ refer to the overall effect on Proto-EC vowel quality.


After giving general background on the Chatino languages, we describe the development from pEC of the PAN vowel system, justifying the claim that it is an innovation; we then use comparative evidence from other Eastern Chatino varieties to reconstruct the likely phonological and phonetic content of the *L-(T) tonal trigger (based on Campbell & Woodbury 2010). We then show that the reflexes of the tonal trigger in the modern PAN tonal system are virtually merged with non-*L-(T) tones for some speakers, and entirely merged for others, leaving a system in which the expanded vowel system has phonemic status while the tonal distinctions, if present, are residual.


This set of changes is significant as: (a) a relatively rare case of a relationship between vowel height and tone that is not mediated by voice quality (as discussed by Denning 1989; but cf. Becker & Jurgec 2008, who demonstrate a relationship between vowel height and tone in Slovenian); (b) an (unprecedented?) case of a relationship between nasal grading and tone; (c) a case involving tone where the crucial conditioning factor in a series of historical changes is synchronically barely detectable or undetectable, leaving room for alternative synchronic analyses; and (d) a demonstration of the value of comparative and historically-informed field work as a method for discovery and description, and as a source of insight for phonological and phonetic investigation.

24 February: Carissa Abrego-Collier (UChicago)

Monday, February 24th @ 3:00 PM, Kent 107

Investigating phonetic variation over time in the U.S. Supreme Court

Phonetic research over the past two decades has shown that individual speakers vary their phonetic realizations of words, phonemes, and subphonemic features. What we have found is that speakers show remarkable stability over time, while a small minority exhibit time-dependent variation, which we term change. Prior research has shown that individual-level phonetic change can occur at scales ranging from minutes (as induced in laboratory experiments; Nielsen 2007, Babel 2009, Yu et al. 2013) to years (as observed in speech corpora, e.g., Sankoff 2004, Harrington 2006). Significantly, this research suggests that individual change in both the short and long term may ultimately be a crucial component of sound change in a population.

The SCOTUS speech corpus project is concerned with this kind of individual variation and change. How do different phonetic variables vary over time? How do different speakers vary their pronunciations over time? That is, what time dependence, if any, do different phonetic variables show within individual speakers, and how might individuals’ variation patterns converge with one another?  These are the questions which I seek to address. My research will yield three types of contributions: an extensive speech corpus for studying the link between social interaction and language change; a study of change within individuals and within a group of speakers over time; and an exploration of the relationship between different individuals’ patterns of variation (which may be time-dependent), as mediated by linguistic, social, and environmental factors.  In this talk, I introduce the SCOTUS speech corpus, a digital audio archive of U.S. Supreme Court oral argument recordings transcribed to phoneme level via forced alignment.  I then describe an ongoing longitudinal study of phonetic variation and convergence using the corpus, which will analyze the speech of the justices of the Supreme Court over a period of 7 years. Using data from one term year as a case study, I present preliminary findings on one phonetic variable, vowel formants, and situate the current project within past research on phonetic variation and change over time.

21 October: Jonathan Keane (UChicago)

Monday, October 21st @ 3 PM, Harper 140

Variation in fingerspelling: time, pinky extension, and what it means to be active

This talk will look at two sources of variation in fingerspelling of American Sign Language: overall timing, and one aspect of hand shape.
Reported fingerspelling rates vary considerably, from a lower bound of ~125 msec per letter to an upper bound of ~400 msec (Quinto-Pozos, 2010; Bornstein, 196; Hanson, 1981; Wilcox, 1992; Geer, 2010). Most of these studies did not analyze individual letter-segments, but rather took the length of the word and divided it by the number of letters expected. Some used a segment-based analysis, which showed that word-medial letters are fingerspelled more quickly than initials or finals (Reich, 1977). Emmorey (2011) showed that breaking at a phonological syllable boundary aided fingerspelling perception. Building on these studies, we have collected and analyzed timing data from 4 ASL signers. We replicated many of the previous findings, and additionally found that there are large differences between different letter types, large individual differences, as well as differences in rate based on the type of word being fingerspelled.
We show that not only position, but also type of letter and signer have a large influence on the timing properties of ASL fingerspelling. Also, it is important to look at fingerspelling segment by segment because there are large differences based on the kind of segment being fingerspelled. Finally, there are large individual differences that are obscured by looking at rate simplistically (just holds, just transitions, or letters per minute).
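To make the contrast between the two ways of measuring rate concrete, here is a minimal Python sketch. It assumes a word has already been annotated into letter segments with durations. The segment values and the function names are invented for illustration and are not data or tools from the study.

```python
from statistics import mean

# Hypothetical annotated segments for one fingerspelled word.
# Each entry is (letter, duration in msec); the values are made up.
segments = [("C", 210), ("A", 120), ("T", 270)]


def word_level_rate(word_duration_ms, n_letters):
    """Whole-word method: total duration divided by expected letter count."""
    return word_duration_ms / n_letters


def mean_duration_by_position(segs):
    """Segment-based method: mean duration of initial, medial, and final letters."""
    by_position = {"initial": [], "medial": [], "final": []}
    last = len(segs) - 1
    for i, (_letter, dur) in enumerate(segs):
        if i == 0:
            by_position["initial"].append(dur)
        elif i == last:
            by_position["final"].append(dur)
        else:
            by_position["medial"].append(dur)
    # Drop positions that did not occur (e.g. no medial letter in a 2-letter word).
    return {pos: mean(durs) for pos, durs in by_position.items() if durs}


total = sum(dur for _letter, dur in segments)
print(word_level_rate(total, len(segments)))  # a single number for the whole word
print(mean_duration_by_position(segments))    # separate values by position
```

The whole-word figure collapses everything into one average, while the segment-based summary keeps the positional differences visible; the same grouping could be done by letter type or by signer.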
It is widely assumed in the articulatory phonology literature that when an articulator is not active (unspecified in the gestural score) it assumes a neutral state. One example of this is that the velum, when not active, assumes a closed position; only when it is actively opened does it deviate from that position. This assumption makes predictions about speech that seem to be fairly robust: nasal sounds are more marked than non-nasal, and nasalization spreads from nasal sounds, etc. This neutral position, however, is at odds with the position that the velum assumes naturally when people are at rest (e.g., not speaking), which is open, allowing for air to be drawn into the respiratory system from the nose or mouth. This being the case, there must be some muscular activity on the velum during periods that have previously been described as inactivity in order to keep it closed. One solution to this apparent problem is to specify gestures for periods previously assumed to have no activity, although these gestures would necessarily be weaker than active articulator gestures.
There are two major predictions that come from the fact that the targets associated with nonactive gestures are not a physiologically neutral state, but rather a state that is default for speech. First, it is possible that the targets for nonactive gestures will differ cross-linguistically, with different languages having different default states. This is supported in work on spoken languages looking at default targets of nonactive articulators, or what are described as articulatory settings, which vary from language to language (Wilson, 2006; Wilson, 2006; Gick, 2004). Second, it is possible that the targets for nonactive gestures will vary depending on the targets of the active gestures. This will be used in the development of the articulatory phonology model of handshape proposed here for the configuration of the nonactive (nonselected) fingers.
Since the earliest theories of sign language phonology, handshapes have divided the fingers into selected and non-selected groups (Mandel (1981), ff). Mandel describes the selected fingers as the foreground and the nonselected fingers as the background. This talk presents an articulatory model of handshape which explicitly links this distinction to the distinction of active and inactive articulators used widely in speech (Browman, 1992). This link makes critical, testable predictions about the nature of handshape variation due to coarticulatory pressure: The hand configurations of a letter vary predictably based on surrounding context, constrained by the following tendencies: 1. The nonselected fingers are targets of coarticulatory pressure. 2. The selected fingers are the likeliest sources of coarticulatory pressure. The articulatory model of handshape is based on articulatory phonology (following Browman (1992)) and can explain the phonetic implementation of handshape from phonological specifications. It explains variation due to articulatory effects (e.g., coarticulation) because it uses dynamic articulator gestures. That is, the articulators that make up the hand are not static, sequential configurations (i.e., discrete units), but rather individual articulator gestures overlapping across segments. This ability to model gradient phonetic implementation and contextual variation represents a critical improvement over previous phonological models.
An analysis of coarticulation of pinky extension revealed a puzzling fact: There is less pinky extension coarticulation in handshapes where the pinky is selected and flexed (-A-, -S-, -E-, and -O-) compared to other handshapes where the pinky is nonselected and flexed. Despite having the same phonetic realization in both (flexed), the pinky behaves differently with respect to coarticulation depending on its membership in the selected fingers group. This follows directly from the articulatory model of handshape: in handshapes where the pinky is selected and flexed, there is less pinky extension as a result of coarticulation because the pinky is an active articulator, which suppresses coarticulatory pressure from surrounding articulator gestures because the flexion is associated with an (active) articulatory gesture.
The articulatory model of handshape provides a concrete and principled way to convert the phonological specifications of handshape into phonetic configurations using a model of articulator targets and gestures developed for speech. Additionally, the articulatory model of handshape correctly predicts how the active or inactive status of particular articulators will affect variation in natural production.


7 October: Ed King (Stanford University)

Monday, October 7th @ 3 PM, Harper 140

Voice-specific lexicons: acoustic variation and semantic association

Over the past twenty years, evidence has accumulated that listeners store phonetically rich memories of spoken words (Goldinger 1996, Johnson 1997; Schacter & Church, 1992). These memorized episodes are linked to various speaker characteristics, including gender (Strand & Johnson 1996, Strand 1999), nationality (Hay & Drager 2010), and age (Walker & Hay 2011). Generally, listeners are faster and more accurate at recognizing spoken words when the acoustic patterns match speaker characteristics indexed by acoustic variation. Research has overwhelmingly focused on the match between acoustic patterns and lexical memories, predicting that speaker characteristics are only relevant in the initial lexical access stage of spoken word recognition. We investigate the effect of speaker-specific variation on semantic activation; if acoustic variation influences semantic activation, then effects of indexical variation are more pervasive than typically thought.

We first investigated this issue with a word association task: listeners heard a male or female voice producing words (probes) one at a time. Listeners provided the first word that came to mind for each word. Of 262 probe words, 59 (22%) resulted in different strongest associates across speakers, as determined for each probe-response pair by the frequency of that response to that probe for each voice (e.g., the most frequent response to the prompt ACADEMY_male was “school”, while for ACADEMY_female the strongest associate was “Awards”).
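The tallying step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response rows are invented toy data (only the ACADEMY example comes from the abstract), the function names are hypothetical, and ties are broken by whichever response was seen first, which the abstract does not specify.

```python
from collections import Counter, defaultdict

# Toy responses as (probe, voice, response); invented, not study data.
responses = [
    ("academy", "male", "school"),
    ("academy", "male", "school"),
    ("academy", "male", "awards"),
    ("academy", "female", "awards"),
    ("academy", "female", "awards"),
    ("academy", "female", "school"),
    ("doctor", "male", "nurse"),
    ("doctor", "female", "nurse"),
]


def strongest_associates(rows):
    """Map each probe to {voice: most frequent response for that voice}."""
    counts = defaultdict(Counter)
    for probe, voice, response in rows:
        counts[(probe, voice)][response] += 1
    strongest = defaultdict(dict)
    for (probe, voice), counter in counts.items():
        # most_common(1) returns the first-seen response when counts tie.
        strongest[probe][voice] = counter.most_common(1)[0][0]
    return dict(strongest)


def probes_that_differ(strongest):
    """Probes whose strongest associate is not the same for every voice."""
    return sorted(
        probe
        for probe, by_voice in strongest.items()
        if len(set(by_voice.values())) > 1
    )


strongest = strongest_associates(responses)
print(strongest["academy"])
print(probes_that_differ(strongest))
```

In this toy set only "academy" ends up with different strongest associates for the two voices; the share of such probes out of all probes is the kind of proportion reported above.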

We subsequently tested the effects of these speaker-specific semantic associations in spoken word recognition with a semantic priming experiment, using 30 words whose strongest associates differed between speakers. Listeners heard a word produced by one speaker (the “prime”; e.g., ACADEMY_male or ACADEMY_female), then saw a printed word (the “target”; e.g., “school” or “awards”), and indicated whether the printed word was a real word. We expect faster responses when the speaker matches the semantic association (“awards” should be recognized more quickly when preceded by ACADEMY_female than by ACADEMY_male).

Listeners responded more quickly to semantically-associated words when the semantic association strength was strong and speaker-specific (p = 0.016). These results indicate that speaker-specific acoustic cues mediate spoken word interpretation as well as recognition. We suggest that a speaker’s voice provides semantic context in spoken word recognition.

29 April: Amanda Miller (Ohio State)

Monday, April 29th @ 3 PM, Wieboldt 408

What Can We Do with High Frame Rate Ultrasound: Investigating the Phonetic Basis of the Back Vowel Constraint in Mangetti Dune !Xung

Previously, the main articulatory field method used to investigate place of articulation was static palatography/linguography. This method is invasive, and contact patterns are smeared over an entire syllable. Portable ultrasound can be used to find the place of articulation of consonants in field work settings, and it is safe and non-invasive. Standard ultrasound has made great gains in our understanding of sounds with relatively stable gestures: vowels, fricatives and liquids. High frame rate (FR) ultrasound allows us to view stop shutting and release gestures, the dynamics of diphthongs, clicks, labial-velars, and affricates, and C-V and V-V coarticulation.

I present a case study designed to investigate the phonetic basis of the Back Vowel Constraint (BVC), found in many non-Bantu and non-Cushitic click languages. The BVC is a C-V co-occurrence constraint found between alveolar and lateral clicks and the uvular fricative, with [i]. I present four experiments that investigate the phonetic basis of the BVC, by looking at the production of the four clicks, [k] and [ᵪ], in Mangetti Dune !Xung. The first two experiments investigate the production of the clicks using high FR ultrasound collected using the CHAUSA method (Miller and Finch 2011). TD and TR constriction locations prior to the anterior release are measured. The second experiment investigates the TD and TR locations over the first half of the vowel. The third experiment investigates F1 and F2 patterns in the vowel following the clicks. Regression analyses of the vowel data show that the F2 patterns are statistically related to the TD/TR constriction locations in the alveolar and lateral clicks, while the F2 patterns in the dental and palatal clicks are best predicted by the TT constriction location. I attribute the TRR in the vowel to muscular constraints on click-vowel sequences that are similar to those found in English [r] variants.

11 February: Chris Corcoran (UChicago)

Monday, February 11th @ 12:30 PM, Social Sciences 302

The authentication of Sierra Leonean refugees

Competing ideologies of the acoustic characteristics of voice

During the Sierra Leone civil war, 1991–2002, many European countries granted asylum to Sierra Leonean refugees. Those without documentation were given an opportunity to participate in a language analysis interview. There are many problems with the authentication process employed in these types of interviews (e.g., Eades 2010, Corcoran 2004). However, this paper focuses on the particular issue of competing ideologies associated with voice quality and prosody: relative breathiness, pitch, loudness, and tempo. From 2000–2010, I contributed to assessments or counter-assessments in nearly fifty cases. European interviewers frequently admonished applicants to “speak up” in order to properly represent themselves. Applicants who spoke slowly using a lowered quiet breathy voice were identified as having something to hide or, at best, as rubes who did not understand how recording devices worked. In contrast to these Western assessments, I argue there are pan West African ideologies that associate these features with “good speech” (Obeng 2003: vii; Irvine 1973: 160–64, 1974; Yankah 1995) and, in particular for Sierra Leoneans, with positions of full Sierra Leonean citizenship in opposition to categories such as “stranger” (Dorjahn and Fyfe 1962). Supplementing previous work with current fieldwork with Sierra Leoneans living in the US, this paper presents acoustic analyses and ethnographic observation to contrast Sierra Leonean and Western ideologies concerning these characteristics of speech. Using Silverstein’s (1981) explication of the limits of awareness, I discuss how these ways of speaking have been taken up in naturalizing discourses and confound our ability to identify them as sites for potential misunderstanding.