Transcription of sermons fazes people and AI

by Ed Thornton

02 June 2023

RESEARCHERS in the United States have attempted to ascertain which transcribes a sermon more accurately: a human or a machine.

Transcripts of sermons are often published on church websites. While some preachers stick to a script, which can then be published, extempore sermons or sermons from notes require a transcription to be typed from an audio recording.

Technological advances, however, have meant that this task can be carried out by a computer program, and the Pew Research Center in the United States has compared the results.

As part of a study in 2019, followed up in 2020, researchers downloaded 60,000 audio and video files of sermons. The object was to analyse topics discussed in sermons in different denominations. They used Amazon Transcribe, a speech-recognition service, to transcribe the sermons.

The researchers discovered a problem, however. “The Amazon service did not always get specific religious terminology or names right,” they write in an analysis published online. “A few examples included ‘punches pilot’ instead of ‘Pontius Pilate’ and ‘do Toronto me’ in lieu of ‘Deuteronomy.’”

The researchers then asked a “third-party human transcription service to tackle portions of some of the sermons that Amazon Transcribe had already transcribed, and then compared the results between the two”.

They took a “stratified random sample” of 200 sermons from different regions of the US: the Midwest, the South, and “a combined region that merges the Northeast and the West”. Sermons were drawn from four denominations for which they had a sufficient sample size: “mainline Protestant, Evangelical Protestant, historically Black Protestant, and [Roman] Catholic.”

Audio samples of the sermons, lasting between 30 and 210 seconds, were sent to the human transcription service.

They compared the machine and human transcription services using a metric called “Levenshtein distance”, which, they write, “counts the number of discrete edits — insertions, deletions and substitutions — at the character level necessary to transform one text string into another”. For example, if the word “Covid” is transcribed as “cove in”, there is a Levenshtein distance of three (ignoring capitalisation), because three edits are required: one to insert an “e” after the “v”, one to insert a space before the “i”, and one to replace the “d” with an “n”.
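For readers who want to check the arithmetic, the metric can be sketched in a few lines of Python. This is a minimal illustration of the standard dynamic-programming method, not the researchers' own code; the function name `levenshtein` is chosen here for illustration, and lower-casing both strings before comparing them is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the character-level insertions, deletions and substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the part of a processed so far
    # and the first j characters of b. Initially no characters of a have
    # been processed, so the distance is simply j insertions.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning the first i characters of a into an empty string
        # takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute ch_a with ch_b, or match
            ))
        previous = current
    return previous[-1]


# The example from the text, lower-cased first (an assumption).
print(levenshtein("Covid".lower(), "cove in"))  # 3
```

If capitalisation is not ignored, the capital “C” counts as one further substitution.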

The researchers found that, across all the files that they analysed, “the average difference between machine transcriptions and human transcriptions was around 11 characters per 100. That is, for every 100 characters in a transcription text, approximately 11 differed from one transcription method to the other.”
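The “per 100 characters” figure is a raw edit count scaled by text length. The sketch below shows that scaling under stated assumptions: the article does not say which transcript's length served as the denominator, so a single reference length is assumed, and the function name and the sample numbers are hypothetical.

```python
def differences_per_100(edit_distance: int, reference_length: int) -> float:
    """Scale a raw edit count to a rate per 100 characters of text.

    reference_length is assumed to be the character count of one of the
    two transcripts; the source does not specify which.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100 * edit_distance / reference_length


# Hypothetical numbers: 33 edits across a 300-character excerpt.
print(differences_per_100(33, 300))  # 11.0
```

On this scale, two excerpts of different lengths can be compared directly, which a raw edit count would not allow.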

They also detected a “small but statistically significant” difference in Levenshtein distances between denominations. “Text taken from Catholic sermons, for example, had more inconsistency between transcripts than was true of those taken from evangelical Protestant sermons. And sermons from historically Black Protestant churches had significantly more inconsistency in transcriptions when compared with the other religious traditions.”

They continue, however: “While these differences were statistically significant, their magnitude was relatively small. Even for historically Black Protestant sermons — the tradition with the largest mismatch between machines and humans — the differences worked out to around just 15 characters per 100, or four more than the overall average.”

When it came to how accurately humans and machines transcribed regional accents, the researchers were surprised: they had expected machines to struggle most with Southern accents — but, in fact, “transcriptions of sermons from churches in the Midwest had significantly more inconsistency between machine and human transcriptions than those in other regions.”

The difference was not great, however: “Midwestern sermons, despite having the greatest inconsistency across regions, had only two more character differences per 100 characters than the overall average.”

They are not sure why machines found Midwestern sermons more difficult to transcribe; but one factor, they write, might be “worse audio quality than those from other regions”.

The researchers conclude that “issues with transcription quality can be tied to the quality of the audio being transcribed — which presents challenges for humans and computers alike. . .

“By the same token, machine transcription may perform worse or better on certain accents or dialects — but that’s also true for human transcribers. When working with audio that has specialized vocabulary (in our case, religious terms), human transcribers sometimes made errors where machines did not. This is likely because a robust machine transcription service will have a larger dictionary of familiar terms than the average person. Similarly, we found that humans are more likely to make typos, something one will not run into with machine transcription.”

They warn, however, that “the reliability of machine transcription can sometimes backfire. When presented with a segment of tricky audio, for example, humans can determine that the text is ‘unintelligible.’ A machine, on the other hand, will try to match the sounds it hears as closely as possible to a word it knows with little to no regard for grammar or intelligibility. While this might produce a phonetically similar transcription, it may deviate far from what the speaker truly said.”
