Text to speech in language education

This post is my response to the first assignment of the open teacher education course identified below.

Pronunciation: text to speech

Open Educational Resources (OER) and Automatic Language Processing for Language Learning

PRONUNCIATION assignment

In up to 500 words, write down your experience with the OERs proposed. Have they been useful? Do you think they will help your students learn more successfully? Why?

The OERs are these:

http://www.naturalreaders.com/index.html
http://text-to-speech.imtranslator.net (available in multiple languages)
http://www.fromtexttospeech.com (available in multiple languages)
https://text-to-speech-demo.mybluemix.net (with expressive SSML)

1. Natural readers https://www.naturalreaders.com/index.html

I put in this text from Google news and tried various accents.

A surprise victory for the government at this late stage seems unlikely and would be met with head-scratching in No 10, which has already conceded that parliament should be consulted at the end of the Brexit process.

Mike (US) and Graham (UK) miss No 10 (“no ten” instead of “number ten”) and intonation on head-scratching is off (head SCRATCHing instead of HEAD scratching).

I’m not sure I understand why we would want to hear it read by French or Italian speakers. How is this engineered? Is it sampled from French speakers reading English text, or does it just apply algorithms for machine reading of French to English text? I suspect the second. I teach English to French speakers so am certainly used to French-accented English, but nothing like “Alain” reading about Brexit (to hear him, paste my inset text above here and choose “Alain”). I defy anyone to understand “Juliette’s” version without a transcript.

I teach mainly French learners of English in higher education contexts in France. Some of them are future secondary school teachers of English facing national teacher entrance examinations which place high value on phonological and morphosyntactic accuracy in planned monologues. I have discussed some of the pronunciation problems I see in another post, “Improving spoken English: intermediate/advanced”. I’m not at all sure how I would exploit text-to-speech tools with these students. They can get better information on phonemes and word stress from online dictionaries, and the suprasegmental information in the samples I’ve heard here doesn’t seem reliable enough to be useful.

2. IMtranslator http://text-to-speech.imtranslator.net/

I thought this was quite impressive. I typed in conversational French and the translation was pretty accurate, intonational contours less so perhaps.


3. Text to speech http://www.fromtexttospeech.com/

Next, another resource, From Text to Speech, using the first paragraph of a CALL article (González-Lloret, 2011):

The potential of CMC for L2 development resides mainly in the possibility that learners have to engage with other speakers of the language, including L1 speakers, which is especially important for the acquisition of not only linguistic resources but also social and pragmatic competence. As Thorne (2006) states “the use of Internet technologies to encourage dialogue between distributed individuals and partner classes proposes a compelling shift in second (L2) and foreign language (FL) education, one that ideally moves learners from simulated classroom-based contexts towards actual interaction with expert speakers of the language they are studying” (p. 3).

This tool creates an mp3 which you can link to (how long is it stored?) or download. My WordPress won’t accept this file type so I put it on SoundCloud for convenience:

Of course you can just visit from text to speech and do your own cut-and-paste with choice of speaker (that was British “Emma”). Or try French “Gabriel” (set to fast; there are 4 speeds) for another surreal experience.

To my ear “Emma’s” is a pretty good rendering – no obvious errors in intonation that would mislead the listener. But I’m struggling to imagine uses in the language classroom. I might use it if I wanted to have an article read to me during a commute, for example, though the time and planning required to convert and save the file to a device might not be worthwhile. If learners wanted more aural input, better to use authentic sources, surely, of which there is no lack.

In our course assignment, we were encouraged to experiment with different versions of sentences “to see how grammar affects voice outputs.” Here we see that the US voice distinguishes between the lexical verb have and the modal have to, pronounced /hæftə/:

  • learners have the possibility to engage with other speakers of the language mp3
  • learners have to engage with other speakers of the language mp3

You don’t seem to be able to retrieve this information from the Collins dictionary (have) so this gives the tool an advantage over a traditional learner dictionary in this case.

(On another note, you need to refresh the page (click the banner icon) for each new query; you can’t just cut and paste new text in the window.)

4. My blue mix https://text-to-speech-demo.mybluemix.net

The tool was developed by IBM presumably for commercial purposes (see description). Here I listened to British and American voices reading English, and a French voice for French. I thought the French sounded better; is French intonation easier to imitate, or is my ear for French less discriminating?

There’s a feature called “expressive SSML” that adjusts the prosody of the output; the demo example is aimed at customer service:

The Apology mode seems to place more emphasis (volume, length, pausing). Uncertainty has more pausing, Good News more pitch variation.
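As a rough illustration of what this kind of markup might look like, here is a minimal Python sketch that wraps a sentence in SSML with one expressive style. The `express-as` element and the style spellings (`Apology`, `Uncertainty`, `GoodNews`) are assumptions based on how the IBM demo appeared to label these modes, and the helper name `express_as` is hypothetical; check the service documentation before relying on any of it.

```python
from xml.sax.saxutils import escape, quoteattr

# Style names follow the three modes mentioned in the post.
# Their exact spelling as attribute values is an assumption.
STYLES = ("Apology", "Uncertainty", "GoodNews")


def express_as(text: str, style: str) -> str:
    """Return an SSML string with `text` wrapped in one expressive span.

    The <express-as type="..."> element is assumed, not confirmed by the post.
    """
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    # escape() protects characters such as & and < inside the spoken text;
    # quoteattr() adds the surrounding quotes for the attribute value.
    return (
        f"<speak><express-as type={quoteattr(style)}>"
        f"{escape(text)}</express-as></speak>"
    )


if __name__ == "__main__":
    sentence = "Your order has been delayed."
    for style in STYLES:
        print(express_as(sentence, style))
```

Pasting the three resulting strings into a tool that accepts SSML would, in principle, let learners hear the same sentence with different prosody and compare emphasis, pausing and pitch variation.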

Another feature, Voice Transformation, shows variation along different parameters: glottal tension, breathiness, strength, pitch range. Eleven of the 13 voices are female, and only two (female) are transformable in this way. It feels a bit Ex Machina.

Developers can use the tools to customise their own voices and specific texts.

Again, it’s not obvious to me how either the demo or the tool could be used for language teaching and learning beyond awareness-raising. I suppose lower proficiency learners could compare intonation in native and target languages, and more advanced ones could record themselves and compare with the synthetic voices. The tools seem to be ranked in order of sophistication, with perhaps the IBM demo the most convincing. It’s certainly interesting to see how these tools have developed in the past decade or so.

References

González-Lloret, M. (2011). Conversation analysis of computer-mediated communication. CALICO Journal, 28(2), 308-325.

Improving spoken English: intermediate/advanced


A new year, some new speaking classes for my students of English at a French university. It’s one thing to give students feedback on their spoken English, but what should they be doing to improve? Here are some ideas for students working with individual feedback in terms of individual sounds (phonemes), connected speech (stress, rhythm, intonation), and more generally.

Phonemes

The main problems involve

  • consonants in English that do not exist in French: h, th
  • vowel contrasts involving vowels not present in French
  • the s sound in plurals (present in French but not pronounced) and the third person singular of the present simple (he walks)

To work on /h/ try

To work on th, try

  • Sounds of American English (online or app) for articulatory information (voiced and voiceless lingua-dental fricatives)
  • shadow reading, paying attention to segments with th.

To work on vowel contrasts, try

You can also look at this interactive IPA chart to contrast, for example, a French uvular /r/ with an English alveolar one.

Connected speech

French and English stress patterns differ in two related ways

  • vowel length
  • sentence stress

In French, we don’t distinguish between short and long vowels – French vowels are generally all the same length. But in English, some vowels are longer and some shorter. In French, each syllable generally has the same weight. In English, there is quite a difference between stressed and unstressed syllables.

This means that French speakers of English sometimes have difficulty with sentence stress: transferring French intonation patterns means all syllables tend to be the same length (too short) and receive the same stress. Teachers might give feedback such as the following:

  • too many stresses: every syllable is the same length and has the same stress
  • clipped delivery: the syllables are all too short, with no long vowels/diphthongs
  • no weak forms: syllables are equally stressed, with no shortened, unstressed syllables

The sound schwa is the weakest unstressed sound, and also the most common vowel in English. Learn about schwa on the BBC Learning English archive from 2008 and also work on connected speech.

Another way of working on this is shadow reading. You need to find good audio with a transcript, then practice shadowing the speaker by reading along with the volume set low, so that you copy the way the speaker produces stressed and unstressed syllables. Read about this activity here.

Going further

You can read more about intonation in the form of nuclear stress or articulatory setting. Some students are uptalking – read about this here if you like.

But listening more will also help. You can listen to short extracts intensively, perhaps working with a transcript to identify particular sounds you have difficulty with, stressed and unstressed syllables, and other aspects of intonation. You can also listen extensively, to audiobooks, lectures and podcasts, with the goal of picking up speech patterns in a more subconscious manner.

 

References

Articulatory setting: an approach to pronunciation teaching

Buried treasure from the BBC (on ELF pronunciation): non-native accents of English.

H-deletion in connected speech

H-sound.

Interactive IPA chart.

Phonetics: the sounds of American English. How to use the site. (see also app)

Poetry Archive http://www.poetryarchive.org

Pronunciation of /h/ in English. ALERT Acquiring language efficiently: Research and teaching. Concordia University.

RP pronunciation. BBC English.

Shadowing and summarizing. YouTube lecture. Murphey, 2001.

Shadow reading. Habilitacioninglesmadrid.

Sounds of speech. University of Iowa app (Apple/Android)

Understanding nuclear stress. English as a lingua franca pronunciation.

Uptalk in the OED. Language Log.