Reading Passages for Adult Intelligibility and Dysarthria
J Speech Lang Hear Res. 2016 April; 59(2): 230–238.
Comparison of Intelligibility Measures for Adults With Parkinson's Disease, Adults With Multiple Sclerosis, and Healthy Controls
Kaila L. Stipancic, Kris Tjaden, and Gregory Wilding
University at Buffalo, NY
Received 2015 Aug 3; Revised 2015 Oct 20; Accepted 2015 Oct 26.
Abstract
Purpose
This study obtained judgments of sentence intelligibility using orthographic transcription for comparison with previously reported intelligibility judgments obtained using a visual analog scale (VAS) for individuals with Parkinson's disease and multiple sclerosis and healthy controls (K. Tjaden, J. E. Sussman, & G. E. Wilding, 2014).
Method
Speakers read Harvard sentences in habitual, clear, loud, and slow conditions. Sentence stimuli were equated for peak intensity and mixed with multitalker babble. A total of 50 listeners orthographically transcribed sentences. Procedures were identical to those for a VAS reported in Tjaden, Sussman, and Wilding (2014).
Results
The percent correct scores from transcription were significantly higher in magnitude than the VAS scores. Multivariate linear modeling indicated that the pattern of findings for transcription and VAS was much the same with respect to differences among groups and speaking conditions. Correlation analyses further indicated a moderately strong, positive relationship between the two metrics. The majority of these correlations were significant. Finally, intrajudge and interjudge listener reliability metrics for the two intelligibility tasks were comparable.
Conclusion
Results suggest that there may be instances when the less time-consuming VAS task may be a viable substitute for an orthographic transcription task when documenting intelligibility in mild dysarthria.
Intelligibility refers to the degree or the accuracy with which a listener recovers the acoustic signal or message produced by a speaker (Duffy, 2013). Intelligibility has also been described or defined as how effective one is in his or her communication (Cannito et al., 2012), the ease with which the acoustic speech signal is understood (Kim & Kuo, 2012), or the extent to which the acoustic signal is understood (Tjaden, Sussman, & Wilding, 2014). Intelligibility should be distinguished further from the perceptual construct of comprehensibility (Yorkston, Strand, & Kennedy, 1996). Comprehensibility, as defined by Yorkston, Beukelman, and Tice (1996), refers to how much of the acoustic speech signal a listener understands when gestures, orthographic cues, semantic cues, and other types of contextual information are available (for a discussion of differences among the perceptual constructs of intelligibility, comprehensibility, and comprehension, see the review in Hustad, 2008).
Intelligibility is a common consequence of dysarthria. Therefore, quantifying intelligibility is necessary for determining the overall degree of communication impairment and for demonstrating the efficacy of dysarthria therapy techniques. In addition, by measuring intelligibility over time, treatment progress and disease progression can be quantified. In everyday conversation, speech is typically produced in utterances composed of multiple words rather than single words or phonemes. Therefore, sentence-level metrics of intelligibility are presumed to index the magnitude of an individual's communicative difficulty (Weismer, 2009). As discussed in the following section, transcription and scaling tasks have frequently been used to measure sentence intelligibility in dysarthria.
Methods for Measuring Intelligibility
Transcription
Transcription has been characterized as an objective intelligibility measure (Miller, 2013; Weismer, 2009) and involves the listener writing the speaker's message word for word. The word-for-word transcription is then compared with the target production, and the percent of words correctly transcribed is calculated. Orthographic transcription is time consuming for both the listeners, who must write or type what they think they hear, and the individuals who score the accuracy of transcription. Computerized scoring improves efficiency, but even so, responses must be checked for spelling and other errors. However, transcription is the gold standard for quantifying intelligibility in the dysarthria literature, and the Sentence Intelligibility Test (SIT; Yorkston, Beukelman, & Tice, 1996) is undoubtedly one of the most widely used, published clinical tools for quantifying intelligibility (Duffy, 2013; Yorkston, Beukelman, Strand, & Hakel, 2010). Transcription also has been considered to yield good reliability both within and among listeners (Miller, 2013). Thus, dysarthria studies using transcription do not consistently report listener reliability (see Hustad, 2006a, 2006b; Liss, Spitzer, Caviness, & Adler, 2002; McHenry, 2011), although reliability has been reported in a few studies (see Bunton, Kent, Kent, & Duffy, 2001; Tjaden, Kain, & Lam, 2014; Tjaden & Wilding, 2010).
Scaling Tasks
In comparison to transcription, scaling tasks have been characterized as more subjective measures because listeners are instructed to estimate how much of the speaker's message they understand or to estimate the extent to which the message was understood (Hustad & Weismer, 2007; Miller, 2013). Overall, scaling tasks for quantifying intelligibility have been criticized in the dysarthria literature. Listener reliability for these tasks has been questioned and, in certain cases, has been found to be poorer than is ideal for research purposes (Miller, 2013; Schiavetti, 1992). However, some research suggests that reliability for transcription and a visual analog scale (VAS) may be comparable. Tjaden, Kain, and Lam (2014) reported intrajudge correlation coefficients ranging from .57 to .99 (M = .80, SD = .13) for a scaling task and correlation coefficients ranging from .58 to 1.00 (M = .80, SD = .13) for a transcription task. Although this study included only 40 listeners who judged sentences produced by two speakers with Parkinson's disease, the results suggest that it should not be assumed that listener reliability for transcription is superior to a scaling task. Despite concerns regarding listener reliability, scaling tasks provide some attractive benefits. These tasks are less time consuming and labor intensive than orthographic transcription (Miller, 2013). Scaling tasks may also be easily applied to longer connected speech tasks commonly obtained in clinical practice, such as paragraph reading.
Visual analog scaling is a type of scaling task that shows promise for measuring intelligibility (Kent & Kim, 2011; Van Nuffelen, De Bodt, Vanderwegen, Van de Heyning, & Wuyts, 2010). A VAS involves listeners choosing a point on a continuous line that does not contain any ticks or intervals to represent their judgment of a given speech sample (Kent & Kim, 2011). For example, Tjaden, Sussman, and Wilding (2014) recently used a computerized VAS task in which listeners judged intelligibility. Listeners were presented with a continuous 150-mm vertically oriented scale on a computer monitor, with endpoints of the scale labeled "understand everything" and "cannot understand anything." Listeners were instructed to use a mouse and click on the line to indicate how well a given speaker's sentence was understood.
Comparison of Intelligibility Measures
Several dysarthria studies have used multiple tasks (i.e., direct magnitude estimation vs. transcription) to index intelligibility for different types of speech stimuli (e.g., Metz, Schiavetti, Samar, & Sitler, 1990; Sussman & Tjaden, 2012; Yunusova, Weismer, Kent, & Rusche, 2005). However, limited knowledge is available about how objective and subjective metrics of intelligibility compare for the same stimuli. In fact, Weismer, Barlow, Smith, and Caviness (2008) commented that the "proper work to identify the benefits and problems of different measures has yet to be done" (p. 284).
In one of the few studies that directly compared intelligibility metrics for the same stimuli, Hustad (2006b) found that, for four speakers with dysarthria, on average, transcription scores were higher than scores obtained from listeners estimating the percentage of words understood. However, the magnitude of the difference between transcription scores and percent estimates varied across speakers. Thus, although a few studies have reported different types of intelligibility metrics for the same stimuli, to date, no large-scale study has compared orthographic transcription and a VAS in dysarthria.
Summary and Purpose
Orthographic transcription is the gold standard for measuring intelligibility, but it is labor intensive for the listener and the individual scoring the accuracy of responses. Less time-consuming methods for measuring intelligibility, such as subjective scaling tasks, are attractive. However, few studies have looked at how objective and subjective metrics of intelligibility compare. If transcription and scaling are found to yield equivalent levels of severity or outcomes as well as similar listener reliability, then there may be instances when the more efficient scaling task could be used.
Therefore, the purpose of the present study was to compare the objective intelligibility metric of orthographic transcription with the subjective intelligibility metric of a VAS. Toward this end, sentences read by speakers with multiple sclerosis (MS) and Parkinson's disease (PD) as well as healthy controls were orthographically transcribed for comparison to VAS judgments of sentence intelligibility reported in Tjaden, Sussman, and Wilding (2014). Sentences in the VAS study were produced in a speaker's typical manner of talking (i.e., habitual condition) as well as in speaking conditions used therapeutically to maximize intelligibility in dysarthria, including clear, loud, and slow. A principal focus of the VAS study was to determine the impact of these speaking conditions on intelligibility. Speaking condition effects were of secondary interest in the present study. That is, inclusion of all sentences and speaking conditions from the VAS study was desirable for the purpose of maximizing the size of the data corpus and to permit more straightforward comparison of transcription results to those previously reported for a VAS. The following research questions were addressed:
-
Does orthographic transcription yield similar intelligibility differences among speaker groups and speaking conditions, as previously shown using a VAS? In particular, is transcription intelligibility for the PD group significantly reduced relative to the control group, and do the clear and loud conditions yield significantly improved transcription intelligibility relative to the habitual condition?
-
What is the strength of the relationship between the percent correct scores from orthographic transcription and the scale values from a VAS?
-
Are there significant differences in intralistener and interlistener reliability for orthographic transcription and a VAS?
Method
Speakers
The 78 speakers and sentence stimuli from Tjaden, Sussman, and Wilding (2014) were used. Control speakers (n = 32) included 10 men (25–70 years old, M = 56) and 22 women (27–77 years old, M = 57) who reported the absence of neurological disease. Speakers with PD (n = 16) included eight men (55–78 years, M = 67) and eight women (48–78 years, M = 69) who had a medical diagnosis of idiopathic PD. Speakers with MS (n = 30) included 10 men (29–60 years, M = 51) and 20 women (27–66 years, M = 50) who had a medical diagnosis of MS.
Clinical metrics of single-word intelligibility, sentence intelligibility, and scaled speech severity for the Grandfather Passage were reported in detail in Sussman and Tjaden (2012) and are summarized in Table 1 for the purpose of describing the participants. Word intelligibility was obtained using the single-word test of Kent, Weismer, Kent, and Rosenbek (1989). Sentence intelligibility was obtained using the SIT (Yorkston, Beukelman, & Tice, 1996). To obtain perceptual judgments of speech severity for the Grandfather Passage, listeners used a computerized VAS with scale endpoints of 0 (no impairment) to 1 (severe impairment). These clinical metrics demonstrate that many of the speakers with MS and PD had relatively high intelligibility (e.g., high SIT scores: MS = 93%, PD = 85%) but a noticeable speech impairment, as reflected in the higher scaled speech severity scores relative to control speakers. The combination of the clinical metrics of intelligibility and scaled severity suggests mild dysarthria for many speakers with MS or PD (Yorkston et al., 2010).
Table 1.
Clinical metrics of intelligibility and speech severity for speaker groups.
| Group | Mean % single-word intelligibility a | Mean % sentence intelligibility b | Mean scaled speech severity score c |
|---|---|---|---|
| Control | 97 (.01) | 94 (2.7) | 0.18 (.08) |
| MS | 96 (.03) | 93 (4.5) | 0.44 (.25) |
| PD | 95 (.03) | 85 (10) | 0.46 (.21) |
Experimental Speech Stimuli and Speech Tasks
Speakers read 25 Harvard psychoacoustic sentences (Institute of Electrical and Electronics Engineers, 1969) in habitual, clear, loud, and slow conditions. For each speaker, a subset of 10 sentences produced in each condition was used for intelligibility testing. Judgments of intelligibility for each speaker were obtained for 40 sentences (i.e., 4 conditions × 10 sentences). Each sentence contained between seven and nine words and five key words (e.g., nouns, verbs, adjectives, and adverbs). An in-depth description of recording procedures was presented in Tjaden, Sussman, and Wilding (2014).
As reported in the previous study and summarized in Tables 2 and 3, acoustic measures of sound level and articulatory rate were obtained using TF32 (Milenkovic, 2005) to verify the presence of production differences between the speaking conditions. Table 2 indicates that all speaker groups increased mean sound pressure level (SPL) for the loud and clear conditions relative to the habitual condition. Descriptive statistics in Table 3 further indicate a reduced rate for the slow and clear conditions relative to the habitual condition.
Table 2.
Mean sound pressure level in dB SPL as a function of group and condition.
| Group | Habitual | Clear | Loud | Slow |
|---|---|---|---|---|
| Control | 73 (2.7) | 77 (4.5) | 83 (4.0) | 73 (4.0) |
| MS | 72 (3.0) | 75 (4.4) | 80 (3.6) | 72 (4.7) |
| PD | 72 (3.2) | 75 (4.0) | 79 (4.0) | 72 (4.6) |
Table 3.
Mean articulation rate (syllables per second) as a function of group and condition.
| Group | Habitual | Clear | Loud | Slow |
|---|---|---|---|---|
| Control | 3.7 (0.44) | 2.3 (0.32) | 3.2 (0.46) | 1.9 (0.48) |
| MS | 3.6 (0.60) | 2.7 (0.63) | 3.3 (0.69) | 2.4 (0.60) |
| PD | 4.1 (0.58) | 3.3 (0.75) | 4.0 (0.71) | 2.9 (0.75) |
Listeners
Listener characteristics for the 50 individuals who orthographically transcribed sentences were the same as in the VAS study (Tjaden, Sussman, & Wilding, 2014). All listeners ranged in age from 18 to 30 years and were required to pass a hearing screening at 20 dB HL for 250, 500, 1000, 2000, 4000, and 8000 Hz bilaterally. Listeners were native speakers of standard American English and had at least a high school diploma or equivalent. Listeners were also required to report no history of speech, language, or hearing problems and to have limited to no experience with disordered speech.
Stimuli Preparation and Perceptual Task
Transcription data in the present study were collected using the same methods that were used to collect the VAS data (Tjaden, Sussman, & Wilding, 2014). Sentences were mixed with multitalker babble at a signal-to-noise ratio of −3 dB to induce a more challenging listening environment and to reduce the likelihood of ceiling effects. Stimuli were presented to individual listeners at 75 dB SPL via headphones (MDR V300, Sony) in a double-walled audiometric booth using custom software. The task took between 2 and 3 hours with breaks and was self-paced.
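For illustration, mixing a sentence with babble at a target signal-to-noise ratio amounts to rescaling the noise relative to the speech level. The sketch below assumes an RMS-based SNR definition; the study's actual mixing procedure is not specified at this level of detail.

```python
import math

def rms(samples):
    """Root-mean-square level of a waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, babble, snr_db):
    """Scale `babble` so that 20*log10(rms(speech)/rms(scaled_babble))
    equals `snr_db`, then add it to the speech sample by sample."""
    gain = rms(speech) / (rms(babble) * 10 ** (snr_db / 20.0))
    return [s + gain * b for s, b in zip(speech, babble)]
```

At −3 dB SNR, the scaled babble RMS is about 1.41 times the speech RMS, which is what makes the listening task challenging.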
Sentences for all speakers and conditions were first pooled and divided into 10 lists. Sentence lists contained one sentence produced by each of the 78 talkers in each condition. Furthermore, sentence lists included similar numbers (N = 15 or 16) of each of the 25 Harvard sentences in all conditions. Five listeners were assigned to judge each list. Each listener also judged a random selection of 10% of sentences twice to determine intrajudge reliability. After hearing a sentence once, listeners were instructed to type exactly what they heard. Listeners had no knowledge of speakers' neurological diagnoses or the speaking conditions. Custom software saved typed responses for later scoring.
A key word scoring paradigm was used (see also Hustad, 2006a). This key word scoring paradigm involved scoring the five key informative words, including nouns, verbs, adjectives, and adverbs, in each Harvard sentence for a correct or incorrect match with the target. Following an approach similar to Cannito and colleagues (2012), a liberal scoring approach was taken. Homophones (e.g., gel for jell) and phonetically correct misspellings (e.g., doon for dune) were scored as correct. In addition, the scoring paradigm disregarded word order (e.g., wooden square crate for square wooden crate). Other typing errors (e.g., both for booth) were scored as incorrect, as were incorrect plurals (e.g., cherry for cherries) and tense markers (e.g., dries for dried). An exception to this rule involved obvious spelling errors that did not create other words (e.g., arbupt for abrupt), which were scored as a correct match. For each sentence production, the five listeners' responses were pooled, and the number of key words correctly transcribed was tallied. The percent correct score was tabulated for each speaker in each condition.
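The liberal scoring rules above can be sketched in code. This is only an illustrative automation: the equivalence table standing in for homophone and phonetic-misspelling judgments is hypothetical, whereas in the study those calls were made by human scorers.

```python
# Hypothetical equivalence table standing in for scorer judgments about
# homophones (gel/jell) and phonetically correct misspellings (doon/dune).
EQUIVALENTS = {"jell": "gel", "doon": "dune"}

def normalize(word):
    """Lowercase, strip punctuation, and map known equivalents."""
    w = word.lower().strip(".,;!?")
    return EQUIVALENTS.get(w, w)

def keywords_correct(target_keywords, typed_response):
    """Count target key words appearing anywhere in the typed response:
    word order is disregarded, equivalents are credited, and plural or
    tense mismatches fail the exact match and score as incorrect."""
    response = {normalize(w) for w in typed_response.split()}
    return sum(1 for kw in target_keywords if normalize(kw) in response)
```

For example, a response with reordered words and a phonetic misspelling still earns full credit, while a tense or plural mismatch does not.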
Scoring Reliability
Scoring reliability refers to the consistency or reliability of scoring the transcription responses and was based on a model used by Hustad (2008). Intrascorer reliability was determined by having the original scorer rescore five randomly selected listeners' transcriptions (or 10% of the transcription responses). Unit-by-unit agreement was obtained by dividing the number of agreements by the number of agreements plus disagreements. Pearson product–moment correlation coefficients for the first and second scoring of listener responses ranged from .98 to 1.00, with a mean of .99 (SD = .01). Interscorer reliability was determined by having a second scorer who was not involved in the initial scoring rescore 10% of the listener responses. Pearson product–moment correlation coefficients for the first and second scoring of listener responses ranged from .92 to 1.00, with a mean of .98 (SD = .03). Both intrascorer and interscorer reliabilities are comparable to those from Hustad (2006a, 2006b) and McHenry (2011), and both measures indicate high levels of reliability in the scoring of transcribed responses.
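The unit-by-unit agreement computation described above is simply agreements divided by agreements plus disagreements; a minimal sketch, with one binary entry per scored key word:

```python
def unit_agreement(first_scoring, second_scoring):
    """Unit-by-unit agreement between two scorings of the same key words:
    agreements / (agreements + disagreements). Each argument is a list of
    1 (scored correct) or 0 (scored incorrect), one entry per key word."""
    pairs = list(zip(first_scoring, second_scoring))
    agreements = sum(1 for a, b in pairs if a == b)
    return agreements / len(pairs)
```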
Data Analysis
Dependent measures were characterized using both descriptive and parametric statistics. Analyses are described separately for each of the three research questions.
Research Question 1: Comparing VAS and Transcription Intelligibility
Descriptive statistics (i.e., means, standard deviations) were computed for the percent correct scores for comparison with the descriptive statistics of the VAS from Tjaden, Sussman, and Wilding (2014). This examination served as a descriptive comparison of overall means for transcription versus scaling.
Transcription data were also analyzed using the same parametric statistics applied to the VAS data in Tjaden, Sussman, and Wilding (2014). QQ-plots were generated to evaluate the need for transformations. Inspection of these plots based on the scaled residuals indicated that no transformation of the outcome was needed. A multivariate linear model was fit to the data using SAS version 9.1.3 (SAS Institute, Inc., Cary, NC). The percent correct scores were fit as a function of group (control, MS, PD), condition (habitual, clear, loud, and slow), and a Group × Condition interaction. A covariate representing speaker sex was included in each model to account for differing proportions of male and female speakers among groups. Follow-up contrasts were made in conjunction with a Bonferroni correction for multiple comparisons.
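The model structure described above can be made concrete with a dummy-coded design-matrix row builder. This is an illustrative sketch only (the study fit the model in SAS); the choice of reference levels (control, habitual, female) is an assumption.

```python
GROUPS = ["control", "MS", "PD"]
CONDITIONS = ["habitual", "clear", "loud", "slow"]

def design_row(group, condition, sex):
    """One design-matrix row: intercept, group dummies, condition dummies,
    Group x Condition interaction terms, and a speaker-sex covariate.
    Reference levels (control, habitual, female) are an illustrative choice."""
    g = [1.0 if group == level else 0.0 for level in GROUPS[1:]]
    c = [1.0 if condition == level else 0.0 for level in CONDITIONS[1:]]
    interaction = [gi * ci for gi in g for ci in c]
    return [1.0] + g + c + interaction + [1.0 if sex == "M" else 0.0]
```

With 3 groups and 4 conditions, the model has 1 + 2 + 3 + 6 + 1 = 13 parameters.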
Research Question 2: Strength of Relationship Between Transcription and VAS
Pearson product–moment correlation coefficients were used to examine the strength of the relationship between the percent correct scores and scale values from the VAS. Two correlation analyses were performed. First, a correlation analysis was computed for each of the 78 speakers for data pooled across conditions and sentences. Second, correlations were computed separately for each condition and group. Given four conditions and three groups, a total of 12 correlations were computed for this second analysis. Because the present investigation is the first large-scale study examining these two metrics of intelligibility, it was deemed important to examine not just group trends but also individual speaker trends.
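For reference, the Pearson product–moment correlation used in both analyses follows directly from its definition (covariance divided by the product of standard deviations):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two
    equal-length sequences of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In the per-speaker analysis, `x` would hold a speaker's 40 transcription scores and `y` the matching VAS scale values.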
Research Question 3: Listener Reliability Comparison
For the transcription data, the number of exact word matches was calculated for the 10% of sentences judged twice by each listener. Intralistener reliability was calculated by summing the number of key words that were correctly transcribed in both presentations of the stimuli and dividing by the total number of key words. For a given sentence production, a listener may have transcribed three key words correctly in the first presentation of the stimuli and three key words correctly in the second presentation. However, it was possible that only one of these was the same word transcribed correctly in both presentations. By comparison, Pearson product–moment correlation coefficients for scale values from original and reliability trials were calculated to assess intralistener reliability for the VAS data (Tjaden, Sussman, & Wilding, 2014).
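The exact-word-match computation above can be sketched as follows; the data structures (per-sentence sets of correctly transcribed key words) are hypothetical stand-ins for the study's scoring records:

```python
def intralistener_reliability(repeated_trials):
    """`repeated_trials` is a list of (key_words, first_correct, second_correct)
    tuples, one per twice-presented sentence, where the last two entries are
    the sets of key words transcribed correctly in each presentation.
    Reliability = key words correct in BOTH presentations / total key words."""
    matches = sum(len(first & second) for _, first, second in repeated_trials)
    total = sum(len(key_words) for key_words, _, _ in repeated_trials)
    return matches / total
```

This captures the distinction drawn in the text: three words correct in each presentation can still yield only one exact match if the correct words differ between presentations.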
Following Neel (2009) and Tjaden, Sussman, and Wilding (2014), interlistener reliability was assessed using intraclass correlation coefficients (ICCs). ICCs were calculated separately for each of the 10 sentence sets because the listeners assigned to judge each of these lists heard different sentences. A two-way mixed-effects model was used to determine the overall consistency of ratings among listeners. Aggregate listener performance was of interest; therefore, average ICC metrics were considered the primary measure of agreement among listeners. ICCs for transcription were summarized using descriptive statistics and descriptively compared with the ICC scores for the VAS data (Tjaden, Sussman, & Wilding, 2014).
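Two-way mixed-effects, consistency-type ICCs can be computed from the ANOVA mean squares of a targets-by-raters layout. The sketch below assumes the Shrout and Fleiss formulation, with ICC(3,1) for single measures and ICC(3,k) for average measures (corresponding to the single- and average-measure ICCs reported in the Results):

```python
def icc_consistency(ratings):
    """Consistency-type ICCs from a two-way (targets x raters) layout:
    ICC(3,1) for a single rater and ICC(3,k) for the average of k raters.
    `ratings` is a list of n rows (targets), each with k listener scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((sum(row) / k - grand) ** 2 for row in ratings)
    ss_cols = n * sum((sum(r[j] for r in ratings) / n - grand) ** 2
                      for j in range(k))
    ms_r = ss_rows / (n - 1)                                    # between targets
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1)) # residual
    single = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
    average = (ms_r - ms_e) / ms_r
    return single, average
```

Because the consistency definition ignores constant rater offsets, listeners who rank targets identically but use different parts of the scale still yield an ICC of 1.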
Results
Research Question 1: Pattern of Findings for Intelligibility
Descriptive Statistics
Results for transcription intelligibility as a function of group and condition are shown in Figure 1A in the form of means and standard deviations. Results for VAS intelligibility (Tjaden, Sussman, & Wilding, 2014) are shown in Figure 1B. Scores from the VAS could range from 0 (understand everything) to 1 (cannot understand anything). To allow these scaled values to be more easily compared with the percent correct scores in Figure 1A, the scale was reversed and values were multiplied by 100, so that scale values closer to 100 in Figure 1B represent greater intelligibility.
Descriptive statistics for (Panel A) percent correct transcription scores and (Panel B) visual analog scale intelligibility scores. The colored bars represent mean intelligibility scores for each group in each condition, and vertical bars indicate SD. MS = multiple sclerosis; PD = Parkinson's disease.
Figure 1A indicates that transcription intelligibility in each condition was always highest for the control group, followed by the MS group and the PD group. Figure 1B shows this same pattern for the VAS, as the control group was the most intelligible in each condition, followed by the MS group and the PD group. Examination of the two figures further suggests that the overall percent correct scores from transcription were of greater magnitude than the scores from the VAS task.
Using the guideline that changes in sentence intelligibility of approximately 5% are likely clinically meaningful in the context of an adverse perceptual environment such as multitalker babble (e.g., Tjaden, Sussman, & Wilding, 2014; Van Nuffelen et al., 2010), the pattern of transcription intelligibility in Figure 1A was similar for all speaker groups. That is, for each group, the clear and loud conditions did not differ but increased intelligibility relative to the habitual condition by at least 5%. In addition, the slow and habitual conditions did not differ. For all groups, VAS judgments in Figure 1B show that the clear and loud conditions also did not differ but increased intelligibility relative to the habitual condition. As for transcription intelligibility, the habitual and slow conditions did not differ with the VAS.
Parametric Statistics
As previously noted in the Data Analysis section, a multivariate linear model was fit to the percent correct scores as a function of group (control, MS, PD), condition (habitual, clear, loud, slow), and a Group × Condition interaction. There were significant main effects of group, F(2, 71) = 10.77, p < .0001, and condition, F(3, 71) = 35.75, p < .0001. The Condition × Group interaction was not significant. Follow-up contrast tests indicated that the PD group had poorer intelligibility when compared with both the control (p < .001) and MS (p = .015) groups. Transcription intelligibility for the clear and loud conditions was significantly better than habitual (p < .05). To summarize, for all speaker groups, the clear and loud conditions significantly increased intelligibility relative to the habitual condition, but the clear and loud conditions did not differ. Transcription intelligibility for all groups also was not significantly different for the habitual and slow conditions. As elaborated in the Discussion, these results are virtually identical to those for the VAS (Tjaden, Sussman, & Wilding, 2014).
Research Question 2: Strength of the Relationship Between Transcription and VAS
Correlation Analyses
For each of the 78 speakers, correlations for VAS scores and transcription scores were computed for all sentences pooled across conditions. Across the 78 speakers, correlations ranged from .08 to .87, with an average of .57 (SD = .178). All correlations were significant (p < .05), with four exceptions (control female 14 [CSF14] r = .227, p = .158; MS female 7 [MSF07] r = .165, p = .310; MS female 12 [MSF12] r = .124, p = .444; MS female 16 [MSF16] r = .083, p = .610). When these nonsignificant correlations were excluded from the calculation of descriptive statistics, the mean correlation was .60 (SD = .151) for the remaining 74 speakers. Therefore, for the majority of speakers, there was a moderately strong relationship between the transcription task scores and the VAS task scores (Cohen, 1988).
We completed a second correlation analysis to examine the strength of the relationship between the transcription task scores and the VAS task scores on a per-condition and per-group basis. All 78 speakers were included in these computations. All correlations were significant (p < .05) and ranged from .83 to .99. On average, correlations were strongest for the PD group (M = .96, range = .94–.99), followed by the MS group (M = .95, range = .95–.97) and the control group (M = .87, range = .83–.89).
Figure 2 shows a scatter plot of the data with the percent correct transcription scores on the x-axis and the VAS intelligibility scores on the y-axis. Each symbol on the graph represents a single speaker in a given condition. Condition is designated by symbol color, and group is designated by symbol shape. A visual inspection of Figure 2 suggests a curvilinear relationship between the two intelligibility metrics when data for all groups and conditions are considered. To explore this possibility, we undertook a trend analysis for data pooled across all speakers, groups, and conditions. A linear regression function was statistically significant (p < .001) and accounted for nearly 90% of the variance in the relationship between the two intelligibility metrics (i.e., adjusted R² = .89). A quadratic regression function was also significant (p < .001) but only accounted for an additional 3% of the variance (i.e., adjusted R² = .92) in the relationship between the two intelligibility metrics.
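The trend analysis above compares linear and quadratic fits via adjusted R², which penalizes the extra quadratic term. A self-contained sketch of that comparison (illustrative only; the study's analysis was run in SAS):

```python
def adjusted_r2(x, y, degree):
    """Least-squares polynomial fit of the given degree (normal equations
    solved by Gaussian elimination), returning adjusted R^2:
    1 - (1 - R^2) * (n - 1) / (n - degree - 1)."""
    n, p = len(x), degree + 1
    X = [[xi ** d for d in range(p)] for xi in x]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):                          # forward elimination
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    b = [0.0] * p
    for r in reversed(range(p)):                  # back substitution
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(bc * xi ** d for d, bc in enumerate(b))) ** 2
                 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)
```

A quadratic term is retained only if the adjusted R² gain is worthwhile; here it added roughly 3% of explained variance over the linear fit.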
Percent correct transcription scores versus visual analog scale scores for each speaker in each condition (each point on the graph represents a single speaker in a given condition). Speaker group is designated by symbol shape, and condition is designated by symbol color. Control = circles, multiple sclerosis = squares, and Parkinson's disease = triangles.
Research Question 3: Listener Reliability
Intralistener Reliability
The intralistener reliability analysis examined the proportion of exact matches in transcription responses and yielded Pearson product–moment correlations from .32 to .88 across the 50 listeners, with a mean of .66 (SD = .13). All correlations were significant (p < .05). For comparison, in the VAS task, intralistener reliability correlation coefficients ranged from .60 to .88 across the 50 listeners, with a mean of .71 (SD = .07) (Tjaden, Sussman, & Wilding, 2014).
Interlistener Reliability
Average interlistener reliability ICCs for transcription intelligibility ranged from .78 to .86 (M = .81, SD = .02), and single-measure ICCs ranged from .33 to .54 (M = .45, SD = .06). All ICC measures, both single and aggregate, were significant (p < .05). By comparison, average ICCs across the 50 listeners for scaled intelligibility ranged from .85 to .91 (M = .87, SD = .02), and single-measure ICCs ranged from .54 to .68 (M = .59, SD = .04; Tjaden, Sussman, & Wilding, 2014).
Discussion
Research Question 1: Pattern of Findings for Intelligibility
The pattern of descriptive statistics for the two tasks, as well as the pattern of results for parametric statistics for the two tasks, was similar. For all groups, the clear and loud conditions, but not the slow condition, improved intelligibility relative to the habitual condition. In addition, the PD group was consistently judged to have the poorest intelligibility, followed by the MS and control groups. This result held for both transcription and VAS intelligibility.
The results further suggest that raw scores were lower in magnitude for the VAS than for transcription. Hustad (2006b) also found that subjective intelligibility scores in the form of percent estimates were lower than scores derived from a transcription task for four speakers with dysarthria. The similar pattern and difference in magnitude of intelligibility scores for transcription and a VAS have implications for clinicians and researchers. To the extent that transcription and a VAS are both measuring the construct of intelligibility, clinicians and researchers may be able to choose the less labor-intensive VAS task with the knowledge that VAS scores can be expected to be lower than raw percent correct scores for transcription. In addition, because transcription and a VAS yield raw intelligibility scores of different magnitudes, when the purpose is to compare intelligibility findings either across time or across speakers, either transcription or a VAS should be used exclusively.
Research Question 2: Strength of Relationship Between Transcription and VAS
The correlation analyses indicated a moderately strong relationship between the percent correct scores derived from transcription and judgments of intelligibility from the VAS for each of the three speaker groups as well as the majority of individual speakers (Cohen, 1988). Thus, although the magnitude of the scores may differ, the overall pattern of scores was broadly similar. One implication is that transcription and a VAS task are tapping into the same perceptual phenomenon. However, for four speakers, the two intelligibility metrics were not significantly correlated. This result may be due to the fact that these four speakers received intelligibility scores, both from transcription and the VAS, on the higher end of intelligibility, leading to a very restricted range of intelligibility across sentences and conditions.
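The distinction drawn here, that two metrics can differ systematically in magnitude yet still track the same underlying construct, can be illustrated with a small sketch. The scores below are invented for illustration only; they are not data from the study.

```python
import numpy as np

# Hypothetical scores for one speaker across eight sentences: transcription
# percent correct runs higher than VAS intelligibility, but the two covary.
transcription = np.array([92, 88, 95, 78, 85, 90, 81, 96])
vas = np.array([80, 74, 85, 60, 70, 78, 65, 88])

# Pearson r captures the strength of the linear relationship between the
# two metrics, independent of their difference in magnitude.
r = np.corrcoef(transcription, vas)[0, 1]

# The mean difference captures the systematic offset between the metrics.
offset = (transcription - vas).mean()
```

A high r together with a positive offset corresponds to the pattern reported here: VAS scores sit below transcription scores but rank sentences and speakers in much the same order. A restricted range of scores, as with the four highly intelligible speakers noted above, attenuates r even when the two metrics agree.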
Research Question 3: Listener Reliability Comparison
Miller (2013) stated that because listeners' "internal yardsticks" differ on subjective intelligibility metrics such as the VAS, the end result is poor interrater reliability (p. 603). Both interlistener and intralistener reliabilities were slightly higher for the VAS task (Tjaden, Sussman, & Wilding, 2014) than for transcription. The present study would therefore appear to contradict Miller's (2013) statement, because the VAS was found to be at least as reliable as transcription. Results further suggest that transcription should not be the preferred intelligibility metric solely on the basis of assumptions concerning reliability. Future studies are needed to statistically compare the reliability of the VAS and transcription.
Other Considerations
Several factors should be kept in mind when interpreting the findings from this study. First, listeners heard the stimuli in the presence of multitalker babble, which is thought to produce an ecologically valid environment. However, because intelligibility of dysarthria in background noise has only begun to be investigated (Yorkston, Hakel, Beukelman, & Fager, 2007), drawing parallels to other listening environments should be done with caution. Speakers with MS and PD were also highly intelligible, as indicated by average intelligibility on the SIT in the vicinity of 90%. Thus, caution should be taken when extending the current results to other populations. Intelligibility results, and the difference between metrics of intelligibility, may differ more for less intelligible, more severe speakers.
Last, although listeners from Tjaden, Sussman, and Wilding (2014) were demographically similar to those in the current study and met the same inclusionary criteria as listeners who performed the transcription task, having the same listeners perform both transcription and a VAS may have yielded different results. Future studies may consider having the same listeners perform both the transcription and the scaling tasks, as in Hustad (2006b; see also Tjaden, Kain, & Lam, 2014).
Clinical Implications
The present study both replicates and extends previous research. Hustad's (2006b) study included only four speaker participants, whereas the present study included 46 participants with a diagnosis of MS or PD; although a much larger sample was used in the present study, the results were similar to those of Hustad (2006b), who also found that percent estimates underestimated transcription scores. In the present study and in Hustad's (2006b) study, scores from an objective measure of intelligibility (i.e., transcription) and a subjective measure of intelligibility (i.e., VAS or percent estimates) were highly correlated, and listener reliability tended to be slightly higher in the VAS task than in the transcription task. These results support using a less time-consuming scaling measure as a substitute for orthographic transcription in at least some instances. However, because there was variability among speakers with regard to the pattern of intelligibility and the strength of the relationship between the two metrics, clinicians should be cautious and use the same measure with a single patient over time, or between patients, if the purpose is to compare intelligibility from one measurement to another. Overall, the present results support using a scaling measure to quantify intelligibility in an efficient way in both research and clinical settings, assuming that listener error patterns are not of interest.
The principal purpose of the Tjaden, Sussman, and Wilding (2014) study was to compare the effects of reduced speech rate, increased vocal intensity, and clear speech on intelligibility in an effort to inform therapy decisions. Although examining the effects of these conditions was not an aim of the present research, the transcription intelligibility results for the conditions are worth noting. Results showed that intelligibility improved for speakers with MS or PD in the clear and loud conditions relative to the habitual condition and that intelligibility was not improved in the slow condition relative to the habitual condition. The present findings further support the idea that both clear speech and increased vocal intensity have the potential to improve intelligibility in mild dysarthria and that a slowed speech rate shows less promise for aiding intelligibility, at least for speakers with MS or PD with relatively mild involvement.
Directions for Future Research
Further research is warranted to examine variables that contribute to intelligibility, such as listener error patterns and speech production characteristics, including severity and type of dysarthria, presence of background noise, listener experience, and type of stimuli. Furthermore, future research could examine whether transcription of SIT sentences (Yorkston, Beukelman, & Tice, 1996) and VAS judgments of the same sentences yield a similar pattern of results. This may have implications for future software development and advances in the widely used computerized SIT. Last, other perceptual metrics, such as perception of monopitch and naturalness as well as speech comprehension, are gaining traction (Anand & Stepp, 2015; Fontan, Tardieu, Gaillard, Woisard, & Ruiz, 2015). Their relationship to various intelligibility metrics or tasks is worth further examination.
Acknowledgments
This work was completed as part of the first author's master's thesis. Funding was provided by the Mark Diamond Research Fund of the Graduate Student Association at the University at Buffalo, The State University of New York, Buffalo, NY (PI: Kaila L. Stipancic) and National Institute on Deafness and Other Communication Disorders, Washington, DC, Grant R01DC004689 (PI: Kris Tjaden).
References
- Anand S., & Stepp C. E. (2015). Listener perception of monopitch, naturalness, and intelligibility for speakers with Parkinson's disease. Journal of Speech, Language, and Hearing Research, 58, 1134–1144.
- Bunton K., Kent R. D., Kent J. F., & Duffy J. R. (2001). The effects of flattening fundamental frequency contours on sentence intelligibility in speakers with dysarthria. Clinical Linguistics & Phonetics, 15(3), 181–193.
- Cannito M. P., Suiter D. M., Beverly D., Chorna L., Wolf T., & Pfeiffer R. (2012). Sentence intelligibility before and after voice treatment in speakers with idiopathic Parkinson's disease. Journal of Voice, 26(2), 214–219.
- Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Erlbaum.
- Duffy J. R. (2013). Motor speech disorders: Substrates, differential diagnosis, and management (3rd ed.). St. Louis, MO: Elsevier Mosby.
- Fontan L., Tardieu J., Gaillard P., Woisard V., & Ruiz R. (2015). Relationship between speech intelligibility and speech comprehension in babble noise. Journal of Speech, Language, and Hearing Research, 58, 977–986.
- Hustad K. C. (2006a). A closer look at transcription intelligibility for speakers with dysarthria: Evaluation of scoring paradigms and linguistic errors made by listeners. American Journal of Speech-Language Pathology, 15, 268–277.
- Hustad K. C. (2006b). Estimating the intelligibility of speakers with dysarthria. Folia Phoniatrica et Logopaedica, 58, 217–228.
- Hustad K. C. (2008). The relationship between listener comprehension and intelligibility scores for speakers with dysarthria. Journal of Speech, Language, and Hearing Research, 51, 562–573.
- Hustad K. C., & Weismer G. (2007). A continuum of interventions for individuals with dysarthria: Compensatory and rehabilitative treatment approaches. In Weismer G. (Ed.), Motor speech disorders (pp. 261–303). San Diego, CA: Plural.
- Institute of Electrical and Electronics Engineers. (1969). IEEE recommended practice for speech quality measurements. IEEE Transactions on Audio and Electroacoustics, 17, 225–246.
- Kent R. D., & Kim Y. (2011). The assessment of intelligibility in motor speech disorders. In Lowit A. & Kent R. D. (Eds.), Assessment of motor speech disorders (pp. 21–37). San Diego, CA: Plural.
- Kent R. D., Weismer G., Kent J. F., & Rosenbek J. C. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482–499.
- Kim Y., & Kuo C. (2012). Effect of level of presentation to listeners on scaled speech intelligibility of speakers with dysarthria. Folia Phoniatrica et Logopaedica, 64(1), 26–33.
- Liss J. M., Spitzer S. M., Caviness J. N., & Adler C. (2002). The effects of familiarization on intelligibility and lexical segmentation in hypokinetic and ataxic dysarthria. The Journal of the Acoustical Society of America, 112, 3022–3030.
- McHenry M. (2011). An exploration of listener variability in intelligibility judgments. American Journal of Speech-Language Pathology, 20, 119–123.
- Metz D. E., Schiavetti N., Samar V. J., & Sitler R. W. (1990). Acoustic dimensions of hearing-impaired speakers' intelligibility: Segmental and suprasegmental characteristics. Journal of Speech and Hearing Research, 33, 476–487.
- Milenkovic P. (2005). TF32 [Computer program]. Madison, WI: University of Wisconsin–Madison.
- Miller N. (2013). Review: Measuring up to speech intelligibility. International Journal of Language & Communication Disorders, 48, 601–612.
- Neel A. T. (2009). Effects of loud and amplified speech on sentence and word intelligibility in Parkinson disease. Journal of Speech, Language, and Hearing Research, 52, 1021–1033.
- Schiavetti N. (1992). Scaling procedures for the measurement of speech intelligibility. In Kent R. (Ed.), Intelligibility in speech disorders (pp. 11–34). Philadelphia, PA: John Benjamins.
- Sussman J., & Tjaden K. (2012). Perceptual measures of speech from individuals with Parkinson's disease and multiple sclerosis: Intelligibility and beyond. Journal of Speech, Language, and Hearing Research, 55, 1208–1219.
- Tjaden K., Kain A., & Lam J. (2014). Hybridizing conversational and clear speech to investigate the source of increased intelligibility in Parkinson's disease. Journal of Speech, Language, and Hearing Research, 57, 1191–1205.
- Tjaden K., Sussman J. E., & Wilding G. E. (2014). Impact of clear, loud and slow speech on scaled intelligibility and speech severity in Parkinson's disease and multiple sclerosis. Journal of Speech, Language, and Hearing Research, 57, 779–792.
- Tjaden K., & Wilding G. (2010). Effects of speaking task on intelligibility in Parkinson's disease. Clinical Linguistics & Phonetics, 25, 155–168.
- Van Nuffelen G., De Bodt M., Vanderwegen J., Van de Heyning P., & Wuyts F. (2010). Effect of rate control on speech production and intelligibility in dysarthria. Folia Phoniatrica et Logopaedica, 62, 110–119.
- Weismer G. (2009). Speech intelligibility. In Ball M. J., Perkins M. R., Muller N., & Howard S. (Eds.), The handbook of clinical linguistics (pp. 568–582). Oxford, UK: Blackwell.
- Weismer G., Barlow S., Smith A., & Caviness J. (2008). Driving critical initiatives in motor speech. Journal of Medical Speech Language Pathology, 16(4), 283–294.
- Yorkston K., Beukelman D. R., & Tice R. (1996). Sentence Intelligibility Test [Measurement instrument]. Lincoln, NE: Tice Technologies.
- Yorkston K. M., Beukelman D. R., Strand E. A., & Hakel M. (2010). Management of motor speech disorders in children and adults (3rd ed.). Austin, TX: Pro-Ed.
- Yorkston K. M., Hakel M., Beukelman D. R., & Fager S. (2007). Evidence for effectiveness of treatment of loudness, rate, or prosody in dysarthria: A systematic review. Journal of Medical Speech-Language Pathology, 15(2), 11–36.
- Yorkston K. M., Strand E. A., & Kennedy M. R. T. (1996). Comprehensibility of dysarthric speech: Implications for assessment and treatment planning. American Journal of Speech-Language Pathology, 5, 55–66.
- Yunusova Y., Weismer G., Kent R. D., & Rusche N. M. (2005). Breath-group intelligibility in dysarthria: Characteristics and underlying correlates. Journal of Speech, Language, and Hearing Research, 48, 1294–1310.