Interactive Voice Response and Text-based Self-report Versions of the Electronic Columbia-Suicide Severity Rating Scale Are Equivalent

by Chad Gwaltney, PhD; James C Mundt, PhD; John H. Greist, MD; Jean Paty, PhD; and Brian Tiplady, PhD

Dr. Gwaltney is with Gwaltney Consulting, Westerly, Rhode Island (with ERT Inc. during the time of this study); Dr. Mundt is with ePRO Research Consulting, LLC, Sauk City, Wisconsin; Dr. Greist is Professor Emeritus of Psychiatry, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin; Dr. Paty is with Quintiles Advisory Services at QuintilesIMS, Pittsburgh, Pennsylvania; and Dr. Tiplady is an honorary fellow at Edinburgh University Medical School, Scotland, UK (with ERT Inc. during the time of this study).

Innov Clin Neurosci. 2017;14(3–4):17–23.

Funding: This study was sponsored by ERT, Inc., Philadelphia, Pennsylvania.

Financial disclosures: Dr. Gwaltney was an employee of ERT Inc., Philadelphia, Pennsylvania, at the time of this study. Drs. Mundt and Greist are shareholders in Healthcare Technology Systems, Madison, Wisconsin, which receives royalty payments for licensing of the eC-SSRS; Dr. Tiplady was an employee of eResearch Technology, Ltd., Peterborough, UK, at the time of this study and is a holder of AstraZeneca stock.

Key words: Suicidal ideation and behavior, Columbia-Suicide Severity Rating Scale, C-SSRS, electronic patient-reported outcomes, interactive voice response, tablet computer, equivalence

Abstract: Objectives: Our study objective was to evaluate the equivalence of a new version of the electronic Columbia-Suicide Severity Rating Scale, administered on a tablet device, and the existing interactive voice response version in order to support the prospective monitoring of suicidal ideation and behavior in clinical trials and clinical practice.
Design: This was a randomized, crossover-equivalence study with no treatment intervention.

Setting: The study setting was a psychiatric hospital.
Participants: Fifty-eight recently admitted psychiatric inpatients and 28 employees of the hospital site were included in the study. Mean age was 41.0 years (standard deviation=12.5), and 59 percent were female.

Measurements: Participants completed both tablet and interactive voice response versions in randomized order, with a 25-minute break between administrations. Finally, participants completed a second administration of the first administered version. Intraclass correlation coefficients (ICCs) and Kappa coefficients were used to evaluate agreement across modalities.

Results: High levels of agreement were observed for most severe lifetime (ICC=0.88) and recent (ICC=0.79) ideation, occurrence of actual lifetime (Kappa=0.81) and recent (Kappa=0.73) suicide attempts, and occurrence of lifetime interrupted attempts (Kappa=0.78), aborted attempts (Kappa=0.54), and preparatory behaviors (Kappa=0.77), as well as non-suicidal self-injurious behavior (Kappa=0.73). Scores from both modes significantly differentiated psychiatric patients and hospital employee controls, and the test-retest reliability of both modes was excellent.

Conclusion: These results support the validity and reliability of the new tablet-based electronic Columbia-Suicide Severity Rating Scale. This will allow the inclusion of the electronic Columbia-Suicide Severity Rating Scale in a wider range of clinical studies, particularly where a tablet is also being used to collect other study data.

Introduction

Prospective assessment of suicidal ideation and behavior (SIB) is a critical step in many clinical trials. The United States Food and Drug Administration (FDA) mandates prospective SIB assessment in “all clinical trials involving any drug being developed for any psychiatric indication, as well as for all antiepileptic drugs and other neurologic drugs with central nervous system (CNS) activity, both inpatient and outpatient, including multiple-dose Phase 1 trials involving healthy volunteers.”1 Systematic SIB monitoring not only protects patients who are participating in clinical trials, but also allows for an estimation of the SIB risk associated with new treatments should they receive marketing approval.
There are several methods for assessing SIB. One option involves the use of broad assessments of mood and behavior that include SIB questions or subscales (e.g., Beck Depression Inventory,2 Hamilton Rating Scale3). Alternatively, SIB-specific assessments have been developed for use in clinical trials and practice.

The most widely used SIB-specific measure is the Columbia-Suicide Severity Rating Scale (C-SSRS).4 The original C-SSRS is a paper-and-pencil measure that is completed by a clinician through a semi-structured interview with a patient. An electronic version of the measure, the eC-SSRS, was developed as an interactive voice response (IVR) system alternative that is completed directly by the patient through a structured, standardized, automated interview. In this modality, the subject calls a dedicated phone line and hears a recorded script with instructions and questions. The subject responds by pressing the appropriate keys on the telephone touch pad (e.g., 1 for Yes, 2 for No). The electronic version has several unique strengths, including directly representing the patient’s view, rather than a clinician interpretation of the patient,5 not requiring clinical staff time and effort for all interviews, and consequently allowing staff to focus on patients who are at highest risk and who are most in need of medical care and attention.

The eC-SSRS has been administered over 75,000 times in Phase 2 and 3 clinical trials.6 This primarily includes trials in psychiatric clinical populations (e.g., major depressive disorder [MDD], posttraumatic stress disorder [PTSD], and generalized anxiety disorder [GAD]). However, the eC-SSRS has also been widely used in non-psychiatric trials (e.g., chronic obstructive pulmonary disease, epilepsy, fibromyalgia). The IVR version of the eC-SSRS provides data that are consistent, but not redundant, with data from the clinician-administered C-SSRS. As with other sensitive areas where patients may report more to computers than clinicians, patients report more SIB on the eC-SSRS than on the C-SSRS.7,8

Although the IVR version of the eC-SSRS is widely used, it has some potential limitations. First, as other assessments are rarely completed through IVR, it decreases clinical site efficiency by requiring sites to provide a telephone to patients for the exclusive purpose of completing the eC-SSRS. It would be ideal to have an option to administer the eC-SSRS via an electronic platform that may also be used for other assessment purposes. Second, patients may prefer visual, text-based modalities (e.g., tablet, laptop) to the auditory IVR mode. This may be a simple personal preference for one or another mode. Alternatively, preference could occasionally be driven by a clinically relevant factor, such as hearing problems, that would threaten the validity of the eC-SSRS data. For these reasons, a new text-based version of the eC-SSRS was developed as an alternative to the IVR version.

It would be ideal to provide patients and sites with the option to use either eC-SSRS solution within a single clinical trial. This would require pooling of text-based and IVR-based data within a single trial. The use of both options across trials would also require data pooling, in order to examine wider trends in SIB in a therapeutic area or treatment class. Pooling of data within and across trials requires that the multiple modes of administration produce equivalent data (i.e., that the administration modes do not introduce error or bias into the SIB data).9 Although there is substantial evidence supporting the equivalence of paper and electronic versions of patient-reported outcomes (PRO) instruments,10 the equivalence of different electronic modes is less well understood. In particular, the equivalence of IVR and text-based modes is unclear.11 Therefore, the goal of this study was to empirically test the equivalence of the new text-based and original IVR-based eC-SSRS. The assessments were administered to recently admitted psychiatric inpatients and to normal volunteers (employees of the hospital site) in order to explore the broadest range of SIB. A double-crossover, three-period design was used to allow assessment of both agreement between modalities and test-retest reliability within modalities.

Methods

Study design. The study used a three-period, crossover design, with the first two periods using the two modalities (IVR and tablet) in randomized order, and the third period using the same modality as the first period (Table 1). All three periods took place within a single half-day session. There were two intervals, during which subjects completed a distractor task.
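The three-period sequence described above can be sketched in a few lines of Python. This is an illustration only: the article does not state how the randomization was generated, so simple (unblocked) randomization is assumed here, and the names `MODES` and `assign_sequence` and the seed are hypothetical.

```python
import random

MODES = ("IVR", "tablet")

def assign_sequence(rng):
    """Return the three-period mode sequence for one participant.

    Periods 1 and 2 use the two modes in random order; period 3
    repeats the mode used in period 1 (for test-retest reliability).
    Simple randomization is an assumption, not the study's documented method.
    """
    first = rng.choice(MODES)
    second = MODES[1] if first == MODES[0] else MODES[0]
    return [first, second, first]

rng = random.Random(2017)  # arbitrary seed, for reproducibility of the sketch
sequences = [assign_sequence(rng) for _ in range(6)]
for seq in sequences:
    print(" -> ".join(seq))
```

Each participant therefore contributes one cross-mode comparison (periods 1 and 2) and one same-mode comparison (periods 1 and 3).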

Participants. Adult patients were recruited from a single hospital in the United States to participate in the study. Two groups of participants were enrolled: A) individuals who were recently admitted to an inpatient psychiatric ward and B) hospital staff. None of the enrolled hospital staff had any prior experience with the C-SSRS or eC-SSRS. Participants were required to be at least 18 years old, speak English, and be able and willing to provide written informed consent. Participants with dementia, delirium, or psychosis were excluded. Patients were also excluded if they had hearing or vision impairments that were sufficient to cause problems with either of the assessment modalities or if they had received electroconvulsive therapy in the past 28 days. Patients could only begin the study protocol following the approval of their attending physician. The study protocol was approved by an Institutional Review Board, and all patients provided informed consent prior to participation.

Procedures. Hospital staff were recruited through flyers posted in the hospital. Site staff doing chart reviews recruited inpatients. If a patient appeared to be eligible, staff requested that the attending physician allow the project coordinator to approach the patient regarding participation. The project coordinator then met with the patient to describe the study details and respond to any questions and concerns. All chart reviews and other screening activities were completed in accordance with Health Insurance Portability and Accountability Act of 1996 (HIPAA) guidelines.

At the beginning of the testing session, participants were instructed on the use of the ePRO instruments. Following randomization, the participant completed the first eC-SSRS assessment (lifetime assessment, version 2.0) using either the IVR or a text-based (tablet) solution. Following this, participants completed a lexical decision task and were given a refreshment break. The total time for the lexical decision task and the break was 20 to 25 minutes. Participants then completed the second eC-SSRS assessment using the other modality (IVR or text-based), and this was followed by another 20- to 25-minute lexical decision task and refreshment break. Participants subsequently completed the third eC-SSRS assessment using the same modality as the first period.

Assessments. All assessments were presented in English.

eC-SSRS. The tablet and IVR versions (provided by ERT, Inc., Philadelphia, Pennsylvania) were derived from the baseline/lifetime version of the C-SSRS with recency assessment of six months for suicidal ideation and two years for suicidal behavior.4,6

Lexical decision (word recognition) task. The lexical decision (word recognition) task was presented as a distractor test between eC-SSRS administrations; the data were not included in analysis. A list of pairs of word stimuli was printed on paper. Each pair consisted of a word and a nonword, and the subject identified which one was the real word. Two lists were used, with the words matched for frequency between the lists.

Technology/preference scale. A questionnaire asking about users’ experience with the telephone and tablet modes was completed at the end of the session.

Data management and analysis. Demographic and clinical information were summarized using descriptive statistics.

The eC-SSRS includes ratings of SIB. Suicidal ideation ratings were coded as 1) passive; 2) active: nonspecific, with no method, intent or plan; 3) active: method, but no intent or plan; 4) active: method and intent, but no plan; and 5) active: method, intent, and plan. Suicidal behavior reports were coded as 1) completed suicide (not applicable in this study); 2) suicide attempt; 3) interrupted attempt; 4) aborted attempt; 5) preparatory actions toward imminent suicidal behaviors; and 6) non-suicidal, self-injurious behavior.

The following scores were derived from the SIB reports on the eC-SSRS:

Most severe lifetime ideation (0–5)
Most severe ideation in past six months (0–5)
Lifetime presence of each type of suicidal behavior (Yes/No)
Presence of each type of behavior in past two years (Yes/No)
Number of lifetime suicide attempts (continuous variable).
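As a minimal sketch of how these derived scores could be computed from one completed assessment, consider the Python below. The `Report` structure and its field names are hypothetical and do not reflect the eC-SSRS data format; a score of 0 is assumed to mean that no ideation level was endorsed, which is inferred from the 0–5 range above. Completed suicide is omitted because it was not applicable in this study.

```python
from dataclasses import dataclass, field

# Behavior categories scored in this study (completed suicide not applicable).
BEHAVIORS = (
    "suicide attempt",
    "interrupted attempt",
    "aborted attempt",
    "preparatory actions",
    "non-suicidal self-injury",
)

@dataclass
class Report:
    """Hypothetical container for one participant's responses."""
    lifetime_ideation: set = field(default_factory=set)   # endorsed levels, 1-5
    recent_ideation: set = field(default_factory=set)     # past six months
    lifetime_behaviors: set = field(default_factory=set)
    recent_behaviors: set = field(default_factory=set)    # past two years
    lifetime_attempt_count: int = 0

def derive_scores(r):
    """Derive the five summary scores listed in the text."""
    return {
        # Most severe level endorsed; 0 (assumed) when nothing was endorsed.
        "most_severe_lifetime_ideation": max(r.lifetime_ideation, default=0),
        "most_severe_recent_ideation": max(r.recent_ideation, default=0),
        "lifetime_behavior": {b: b in r.lifetime_behaviors for b in BEHAVIORS},
        "recent_behavior": {b: b in r.recent_behaviors for b in BEHAVIORS},
        "lifetime_attempts": r.lifetime_attempt_count,
    }

example = Report(lifetime_ideation={1, 3}, recent_ideation={1},
                 lifetime_behaviors={"aborted attempt"})
scores = derive_scores(example)
print(scores["most_severe_lifetime_ideation"], scores["most_severe_recent_ideation"])
```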

The objective of the analysis was to evaluate the level of agreement (i.e., equivalence) across the IVR and text-based modes. For the ideation variables, which are on an ordinal scale, this included calculation of Kendall’s Tau-b and intraclass correlation coefficients (ICC; using the ICC[2,1] form from Shrout and Fleiss12). Kappa statistics were calculated for the suicidal behavior binary variables. Equivalence analyses compared the first and second assessments. The target value for “good” agreement is 0.5 or greater for Tau-b (τb), 0.7 or greater for ICC, and 0.6 or greater for kappa.13–15 For comparison purposes, the test-retest reliability of the modes was examined by calculating the ICC for the first and third administrations (ideation only).
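For readers unfamiliar with these agreement statistics, the following Python sketch computes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement) and Cohen's kappa from paired scores. It is not the authors' analysis code; the function names and the example scores are hypothetical, and kappa is undefined when chance agreement equals 1.

```python
def icc_2_1(ratings):
    """ICC(2,1): one row per participant, one column per administration mode."""
    n = len(ratings)        # participants
    k = len(ratings[0])     # modes (or occasions)
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                      # between participants
    ms_cols = ss_cols / (k - 1)                      # between modes
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def cohen_kappa(a, b):
    """Cohen's kappa for two paired lists of categorical (e.g., Yes/No) codes."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical paired scores (not study data).
ivr_ideation = [0, 2, 5, 3, 1, 4]
tablet_ideation = [0, 2, 4, 3, 1, 5]
print(round(icc_2_1([list(p) for p in zip(ivr_ideation, tablet_ideation)]), 2))

ivr_attempt = [1, 1, 0, 0]
tablet_attempt = [1, 0, 0, 0]
print(cohen_kappa(ivr_attempt, tablet_attempt))
```

Because ICC(2,1) penalizes systematic shifts between modes as well as random disagreement, it is a stricter criterion than a simple correlation.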

Known-groups validity was examined by comparing scores on the two modes from the inpatients and hospital workers. It was expected that the inpatients would report significantly greater ideation and behavior than the hospital workers and that the mode of administration would not moderate this effect. Scores from the first administration of the eC-SSRS (either tablet or IVR) were used in analyses. Analysis of variance (ANOVA) with terms for group, mode, and the group × mode interaction was used for ordinal variables. Chi-square tests, including tests for all modes, IVR, and tablet, were used for categorical variables.

Results

Participant characteristics. There were 115 participants enrolled in the study. However, data for 25 patients were excluded from analysis due to violations of the assessment randomization protocol. Another four patients were excluded because they were asked about their SIB “since the last assessment” (a variant that is often used when the eC-SSRS is administered to the same patient multiple times) instead of over their lifetime. This yielded a total sample of 86 evaluable participants. Of these, 58 were hospital patients and 28 were hospital employees.

The average age of the sample was 41.0 (standard deviation [SD]=12.5) years, and age did not differ between the patient (mean [M]=40.1; SD=12.7) and control (M=43.1; SD=12.0) participants [t(84)=1.06, p=0.29]. There were 51 women and 35 men enrolled in the trial. The sex distribution was significantly different in the patient (28 women, 30 men) and control (23 women, 5 men) groups [χ² (1 df)=8.97, p<0.005]. Seventy-three percent of the total sample were Caucasian, 20 percent were African-American, and five percent were Latino. The racial distribution did not differ across groups [χ² (4 df)=8.16, p=0.09].
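The reported chi-square statistic for the sex distribution can be checked directly from the counts given above. The Python sketch below computes a Pearson chi-square for a contingency table, assuming no continuity correction (an assumption, since the article does not say whether one was applied); the function name is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: patients, controls. Columns: women, men (counts from the text).
sex_by_group = [[28, 30], [23, 5]]
chi2, df = pearson_chi_square(sex_by_group)
print(f"chi-square({df} df) = {chi2:.2f}")  # prints chi-square(1 df) = 8.97
```

The uncorrected statistic matches the value reported in the text.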

Diagnoses varied considerably across patients, and most patients had been given multiple diagnoses. The following broad diagnostic areas were represented in the patient sample: anxiety disorders (generalized anxiety; PTSD; obsessive compulsive disorder [OCD]), mood disorders (MDD; bipolar disorder; and mania, unspecified mood disorder), substance abuse, attention deficit hyperactivity disorder (ADHD), and schizoaffective disorder. Forty-four participants completed the IVR version first, and 42 completed the tablet version first.

Equivalence analyses: suicidal ideation. Most severe lifetime ideation. The average score for most severe lifetime ideation reported on IVR was 2.72±1.99, and the average on the tablet was 2.72±1.96. The correlation between the tablet and IVR scores was 0.87, p<0.001; and the ICC was 0.89, p<0.001. Order of administration did not impact the magnitude of the ICC (IVR completed first: ICC=0.89, p<0.001; tablet completed first: ICC=0.89, p<0.001). The relationship between the two modes of administration was similar to the test-retest reliability for the same administration mode (IVR test-retest: ICC=0.87, p<0.001; tablet test-retest: ICC=0.87, p<0.001). Average scores for hospital patients were significantly higher than scores for hospital staff (Figure 1a): main effect for patient group, F(1, 82)=39.9, p<0.001. This was not moderated by mode of administration: patient group by mode of administration interaction term, F(1, 82)=0.1, ns.

Most severe ideation in past six months. This analysis included 67 participants, as the questions regarding “recent” ideation were only administered if lifetime ideation was endorsed. All patients who reported no lifetime ideation on IVR also reported no ideation on the tablet. Therefore, the analysis is more conservative than if zero values had been imputed for the patients who were not administered the assessment. The correlation between the tablet and IVR scores was 0.69, p<0.001; and the ICC was 0.79, p<0.001. There was an order effect, in that participants who completed the IVR version first had a lower ICC (0.70) than participants who completed the tablet first (0.86). This discrepancy appeared to be due, in part, to one participant whose score changed from a 5 to a 0 and another participant whose score changed from 0 to 5 across the two administrations when the IVR version was administered first. This was also reflected in the test-retest reliability estimates: IVR test-retest, ICC=0.72, p<0.001; tablet test-retest, ICC=0.84, p<0.001. Average scores for hospital patients were significantly higher than scores for hospital staff (Figure 1b): main effect for patient group, F(1, 63)=15.1, p<0.001. This was not moderated by mode of administration: patient group by mode of administration interaction term, F(1, 63)=0.8, ns.

Equivalence analyses: suicidal behaviors. The findings for the suicidal behavior scores are listed in Table 2. The results overwhelmingly demonstrated a high degree of concordance across the two modes of administration. The only equivalence metric that was not above the acceptable threshold was the kappa statistic for the lifetime aborted attempts score (κ=0.54). The average kappa score for categorical variables was 0.73. The order of presentation was related to the lifetime aborted attempts and non-suicidal, self-injurious behavior kappas. For lifetime aborted attempts, the kappa when the IVR was administered first was 0.66 and it was 0.42 when the tablet was administered first. For non-suicidal, self-injurious behavior, the kappa was 1.0 when IVR was administered first and 0.48 when tablet was administered first. For the other variables, the difference between the order of administration groups on the equivalence metrics was less than 0.10, and all metrics were above the acceptability threshold.
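The threshold check described here can be illustrated with the kappa values reported in the abstract and the 0.6 target stated in the Methods. This Python sketch is illustrative only: the dictionary holds the six abstract values, which may not cover every row of Table 2, so the mean computed here is not necessarily the calculation behind the reported average. Variable names are hypothetical.

```python
KAPPA_THRESHOLD = 0.6  # target for "good" agreement, from the Methods

# Kappa values as reported in the abstract (not the full Table 2).
abstract_kappas = {
    "lifetime actual attempts": 0.81,
    "recent actual attempts": 0.73,
    "lifetime interrupted attempts": 0.78,
    "lifetime aborted attempts": 0.54,
    "lifetime preparatory behaviors": 0.77,
    "non-suicidal self-injurious behavior": 0.73,
}

# Metrics falling short of the target value.
below = [name for name, k in abstract_kappas.items() if k < KAPPA_THRESHOLD]
mean_kappa = sum(abstract_kappas.values()) / len(abstract_kappas)

print("Below threshold:", below)
print(f"Mean of the six abstract kappas: {mean_kappa:.2f}")
```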

Table 2 also includes the findings from comparisons of the patient and hospital worker groups on the suicidal behavior variables. As expected, patients reported higher suicidal behavior rates across all variables. The only cases where the statistical analyses did not reveal a statistically significant relationship were instances where few or no hospital workers reported recent behaviors; therefore, the statistical test was either underpowered to detect an effect or impossible to complete (see recent actual attempts, Table 2). For the lifetime variables, the group effect was consistent across the IVR and tablet modes (Table 2; not calculated for recent behaviors due to small sample sizes). Also, as expected, the patients and hospital workers did not differ on non-suicidal, self-injurious behavior.

Figure 2 shows the responses to the questionnaire administered at the end of the study. The majority of users found both modes very easy to use (Figure 2A). This number was rather higher for the tablet, and this was reflected in the preference data (Figure 2B), with the majority of patients preferring the tablet.

Discussion

Prospectively assessing SIB is a critical step in protecting patients and evaluating product safety in many therapeutic areas.1 In order to seamlessly incorporate prospective SIB assessments into clinical trials, it is important to have several different administration options and platforms that can be matched to particular patients or sites. This matching can be done based on preference (e.g., patients prefer to use a text-based platform for privacy reasons) or clinical need (e.g., a patient is unable to read, so an IVR administration or clinician assessment is needed). This study examined the equivalence of two eC-SSRS administration modes—an IVR system that has been evaluated in previous studies5,6 and a novel text-based version that is completed by the patient. The data strongly support the equivalence of the two modes.

The eC-SSRS includes measures of lifetime and recent SIB. Across all of these question types, the IVR and text-based modes of administration were remarkably similar. Indeed, the average scores for lifetime suicidal ideation were almost identical across the two modes. One of the most stringent tests of equivalence is to compare the agreement between modes to the test-retest reliability of the original mode.10 The design of the current study allowed for this by administering either the IVR or text-based version twice during the study protocol. The ICCs for the different modes were almost identical to the single-mode test-retest ICCs for the ordinal measures of most severe lifetime and past-six-month ideation. This is compelling evidence for the equivalence of the two modes of administration.

Another novel aspect of the study was the comparison of responses from the patient and hospital staff subgroups. Patients were expected to report significantly higher levels of SIB than staff. This was observed across all lifetime and recent SIB variables. The difference between the groups was non-significant only for the non-suicidal, self-injurious behavior variable, which includes behaviors that were intended to cause harm but not result in suicide (e.g., actions used to relieve stress, feel better, get sympathy, or get something else to happen16). Endorsement rates for this variable were low in both groups.

The patient/hospital staff SIB differences were observed across the IVR and tablet modes. The statistical significance of the group comparison differed across modes in only one instance—for lifetime aborted attempts—and even in this case the non-significant finding was at a statistical trend level (p<0.10). Additionally, the non-significant result was observed with the IVR administration, which has been tested in other studies and found to be comparable to clinician C-SSRS ratings.5 The novel text-based administration discriminated the groups in all tests where there were enough patients to conduct a reliable test (all lifetime variables). These data also support the equivalence of the modes and provide additional known-groups validity data to support the eC-SSRS more generally.

There was some modest evidence for order effects in the data—in two cases the equivalence metrics were higher when the IVR mode was administered first, and in one case the metric was higher when the tablet was administered first. Since there was no consistent advantage for one order of presentation over another and no reason to expect one order to produce superior results, we do not believe that these findings impact the overall conclusions of the study.

Users’ acceptance of both modes was high. While users generally preferred the tablet, both modes were rated very easy to use by the majority of users.

Limitations. The study was limited in some respects. Approximately 25 percent of the enrolled sample was excluded from the analysis because of randomization violations or errors in the administration of the eC-SSRS. This reduced the sample size, and, if anything, made it more difficult to identify differences between the patient and hospital worker groups. It would have also made the equivalence metrics less reliable (e.g., increased the confidence interval around the ICCs). However, differences between the groups were still observed, and the equivalence metrics were robust and statistically significant. Although the exclusion of these patients did not seem to adversely impact the overall results of the study, having additional patients may have increased the ability to test group differences for recent suicidal behaviors; with more patients contributing data, there may have been more reports of recent suicidal behaviors among the hospital workers. Additionally, the original clinician-administered C-SSRS was not included in the study as a comparator. However, the IVR version has already been shown to be equivalent to the clinician version;5 therefore, we can transitively draw the conclusion that all three versions produce equivalent data.

Conclusion

Accurately assessing SIB is a critical part of many clinical trials. The C-SSRS and eC-SSRS are recommended by the FDA as valid and reliable assessments of SIB.1 It is important to provide as many mode-of-administration options as possible, so that all stakeholders (clinical sites, investigators, clinicians, and patients) have access to modes that meet their unique needs and preferences. This study provides empirical support for the use of the patient-reported text-based eC-SSRS in clinical trials. The study further demonstrates that multiple modes could be used in the same study and that data can be pooled across modes. This provides an important new eC-SSRS option that can advance the systematic measurement of SIB.

References

1. FDA Guidance for Industry. Suicidal Ideation and Behavior: Prospective Assessment of Occurrence in Clinical Trials, 2012. US Food and Drug Administration site. http://www.fda.gov/downloads/Drugs/…/Guidances/UCM225130.pdf. Accessed 14 April 2016.
2. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571.
3. Hamilton M. Development of a rating scale for primary depressive illness. Br J Soc Psychol. 1967;6:278.
4. Posner K, Brent D, Lucas C, et al. Columbia-Suicide Severity Rating Scale (C-SSRS), 2009. The Columbia Lighthouse Project site. http://cssrs.columbia.edu/the-columbia-scale-c-ssrs/about-the-scale/. Accessed 14 April 2016.
5. Mundt JC, Greist JH, Gelenberg AJ, et al. Feasibility and validation of a computer-automated Columbia-Suicide Severity Rating Scale using interactive voice response technology. J Psychiatr Res. 2010;44(16):1224–1228.
6. Greist JH, Mundt JC, Gwaltney CJ, et al. Predictive value of baseline electronic Columbia-Suicide Severity Rating Scale (eC-SSRS) assessments for identifying risk of prospective reports of suicidal behavior during research participation. Innov Clin Neurosci. 2014;11(9–10):23–31.
7. Greist JH, Gustafson DH, Stauss FF, et al. A computer interview for suicide-risk prediction. Am J Psychiatry. 1973;130(12):1327–1332.
8. Hesdorffer DC, French JA, Posner K, et al. Suicidal ideation and behavior screening in intractable focal epilepsy eligible for drug trials. Epilepsia. 2013;54(5):879–887.
9. Coons SJ, Gwaltney CJ, Hays RD, et al. Recommendations on evidence needed to support measurement equivalence between electronic and paper-based patient-reported outcome (PRO) measures: ISPOR ePRO Good Research Practices Task Force report. Value in Health. 2009;12(4):419–429.
10. Gwaltney CJ, Shields AL, Shiffman S. Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: a meta-analytic review. Value in Health. 2008;11(2):322–333.
11. Eremenco S, Coons SJ, Paty J, et al. PRO data collection in clinical trials using mixed modes: report of the ISPOR PRO Mixed Modes Good Research Practices Task Force. Value in Health. 2014;17(5):501–516.
12. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychologic Bull. 1979;86:420–428.
13. Fredricks GA, Nelsen RB. On the relationship between Spearman’s rho and Kendall’s tau for pairs of continuous random variables. J Stat Plan Inference. 2007;137(7):2143–2150.
14. Lohr KN, Aaronson NK, Alonso J, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther. 1996;18(5):979–992.
15. Fleiss JL. Statistical Methods for Rates and Proportions. New York: Wiley, 1973.
16. Posner K, Oquendo MA, Gould M, et al. Columbia Classification Algorithm of Suicide Assessment (C-CASA): classification of suicidal events in the FDA’s pediatric suicidal risk analysis of antidepressants. Am J Psychiatry. 2007;164(7):1035–1043.