By Farah Tokmic, PhD; Mirsad Hadzikadic, PhD; James R. Cook, PhD; and Oleg V. Tcheremissine, MD
Drs. Tokmic and Hadzikadic are with the Department of Software & Information Systems at the University of North Carolina, in Charlotte, North Carolina. Dr. Cook is with the Department of Psychology at the University of North Carolina, in Charlotte, North Carolina. Dr. Tcheremissine is with the Department of Psychiatry and Behavioral Sciences at Carolinas HealthCare System in Charlotte, North Carolina.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Disclosures: Dr. Tcheremissine has received research support from Eli Lilly, Forum Pharmaceuticals, Lundbeck, Janssen, and Roche; has received honoraria for speaking activities from Otsuka/Lundbeck and Alkermes; and has served on advisory boards for Otsuka, Takeda, Janssen, and Lundbeck. The other authors have no conflicts of interest relevant to the content of this article.
Innov Clin Neurosci. 2018;15(5–6):34–42
Abstract: Objective: Given the growing public health importance of measuring the change in mental health stigma over time, the goal of this study was to demonstrate the potential for using machine learning as a tool to analyze patterns of social stigma as a complement to traditional research methods. Methods: A total of 1,904 participants were recruited through Sona Systems, Ltd (Tallinn, Estonia), an experiment management system for online research, to complete a self-reported survey. The collected data were used to develop a new measure of mental (behavioral) health stigma. To build a classification predictive model of stigma, a decision tree was used as the data mining tool, wherein a set of classification rules was generated and tested for its ability to examine the prevalence of stigma. Results: A three-factor stigma model was supported and confirmed. Results indicate that the measure is content-valid and internally consistent. Performance evaluation of the machine learning-based classification algorithm revealed a sufficient inter-rater reliability with a predictive accuracy of 92.4 percent. Conclusion: This study illustrates the potential for applying machine learning to derive a data-driven understanding of the extent to which stigma is prevalent in society. It establishes a framework for the development of an index to track stigma over time and to assist healthcare decision-makers with improving the health of populations and the experience of care for patients.
Mental illness constitutes 13 percent of the global burden of disease and is projected to be 15 percent by 2020.1 It affects a significant portion of the United States population. Nearly half of all Americans are estimated to meet the criteria for a mental health disorder sometime in their lifetime.2 One study estimated that 90 percent of individuals who committed suicide suffered from one or more treatable or temporary mental—or behavioral—health disorder(s).3 The economic costs associated with mental health disorders are significantly high. An estimated $201 billion is spent on the treatment of mental illness, compared to $79 billion a decade ago, making mental health disorders the most expensive group of medical conditions to treat.4 In 2015, the total cost of suicide was estimated to be $93.5 billion.3
More than half of American adults who experience a mental illness do not get treatment.5 According to the United States Surgeon General and the World Health Organization, stigma is one of the key barriers to successful treatment engagement, including seeking and sustaining participation in services, employment opportunities, and access to social support activities.6–9 Those with behavioral health issues—termed behavioral health consumers—often tend to be in a state of personal distress, and many of them avoid treatment for fear of being socially disgraced or stigmatized by the general public.2,10
The word stigma can be traced back to Ancient Greece and literally meant “brand.” Use of the word referred to the practice of branding slaves who were caught after attempting to escape.11 Since then, the meaning of stigma has evolved to include any apparent or inferred condition an individual might have that is divergent from social norms. Most explicitly, Erving Goffman defines stigma as “the situation of the individual who is disqualified from full social acceptance” and can be experienced or anticipated by the exclusion, rejection, or devaluation of a person or group.12 It manifests when the power of societies labels a person as deviant from what is closest to the “norm.”
There are two main overarching behavioral health stigma types: public stigma and self-stigma.13 Public stigma refers to the attitudes and beliefs held by the general public toward consumers, while self-stigma occurs when consumers endorse the negative public attitudes assigned to them and perceive themselves as less adequate or inferior to others.8,14–16 The awareness of public stigma (i.e., perceived stigma) initiates the formation of a person’s own discriminatory attitudes (i.e., personal stigma among the general public and self-stigma among consumers) and stereotype endorsement (i.e., endorsed stigma).17 In this study, we chose to examine personal stigma over self-stigma because personal stigma applies to everyone in public, regardless of whether they are behavioral health consumers.
Efficient and systemic measurement of stigma is key to improving behavioral health treatment of populations. The standard practice for gauging people’s perceptions toward behavioral health disorders involves traditional self-reporting techniques, most of which use a Likert-type scale consisting of an average of 20 items.18,19 Such a large number of items is likely to increase respondent fatigue, measurement error, and misclassification.
Machine learning has gained interest in the healthcare domain as a method of making processes more efficient and effective. Building models to predict suicidality is one such example of using artificial intelligence to implement early interventions and help prevent suicides.20 Machine learning algorithms have also been used to establish clear indices for diagnosing and improving the treatment of mental health disorders.21,22 Here, we report the use of machine learning to examine behavioral health stigma.
Study objectives
The two main goals of this study were to 1) develop a relatively short and easy-to-administer psychometric instrument that measures stigma and 2) demonstrate the possibility of complementing traditional self-reporting techniques with machine learning to predict the occurrence of behavioral health stigma. This work serves as a preparatory step for the development of a stigma index to track the levels of stigma over time and across social contexts.
Methods
Survey instrument. Directly measuring stigma by observation would be prohibitively expensive, as it would rely on training a large crew of evaluators to assess stigmatizing behaviors and require a large number of records. A strategy commonly adopted, particularly in healthcare settings, is the use of an inexpensive and easy-to-obtain proxy measure. In this study, a self-reported survey was used that included a five-point Likert scale and a binary proxy question.
The Likert-scaled items were developed as an adaptation of the Explanatory Model Interview Catalogue (EMIC) and the Participation domain of the International Classification of Functioning (ICF) to cover stigma components related to relationships (e.g., collegial, social, friendship, family, marriage), respect, and employment opportunities. EMIC was developed to elicit illness-related perceptions and beliefs. Part of EMIC, the Stigma Scale, assesses community-perceived stigma and discrimination that relate to a condition.23 The ICF provides a standard framework language for the description of health-related states. The Participation domain focuses on the involvement of individuals in life situations across the community, social, and civic aspects of life.24
The five-point Likert scale ranged from 1 to 5 (1=strongly agree, 2=agree, 3=neutral, 4=disagree, and 5=strongly disagree). Personal stigma items began with “I” followed by an example of a discriminatory behavior. Items measuring perceived stigma began with “Most people” or “In my community/family/place of employment.” Items measuring endorsed stigma consisted of general stereotypic statements such as “Having a behavioral health disorder is a problem for a person to get married.” A complete list of the original set of items is shown in Appendix A.
The fear that behavioral health consumers could be dangerous or violent is at the core of stigma. Some behavioral health consumers have been described as “homicidal maniacs” who should be feared. Emory Bogardus, a prominent figure in American sociology, is the creator of the “Bogardus Social Distance Scale” and the first to examine affective social distance as a proxy measure of a stigmatizing behavior.25,26 This type of social distance focuses on an individual’s affectivity—or the experience of a negative emotion as it relates to one’s consciousness. This measure is at the center of the feeling reactions that people have toward others. In our study, the following binary question was used as a proxy measure of a stigmatizing behavior, as it measures people’s affective social distance reaction toward behavioral health consumers and does not measure their actual behavior: “When you find yourself near someone who has a behavioral health disorder, do you fear for your safety?”
The survey instrument was approved by an institutional review board prior to administration.
Sample and administration. The survey instrument was deployed at a public research university located in North Carolina. It was electronically available to participants through SONA Systems, Ltd. (Tallinn, Estonia), an experiment management system for online research study participation. Study recruitment efforts were directed at college students who received research credits upon completing the survey. All participants were presented with a consent form describing the aim of the study and were asked to indicate their level of agreement with each of the items.
The data were collected over two academic semesters (fall and spring). Data collected during the spring semester (Study 1) were used for the purpose of minimizing the number of Likert-scaled items in order to create an easy-to-administer stigma scale. Machine learning techniques were then applied to the data collected during the fall semester (Study 2) for the purpose of developing a classification decision tree to analyze patterns of social stigma. Data collection was completely anonymous—there were no personal records about the participants.
Analysis. All analyses were performed using SPSS version 23 for Windows (IBM Corp., Armonk, New York, USA); StataSE version 14 (Stata Corp LLC, College Station, Texas, USA); and WEKA version 3.8 (The University of Waikato, Hillcrest, New Zealand).
First, an exploratory factor analysis (EFA) was conducted to evaluate the factor structure underlying the set of the Likert-scaled items. EFA is a data-reduction technique most commonly used to identify existent relationships among measured variables and to aid in examining the internal reliability of a measure.27 In this study, EFA assisted with selecting the minimum number of items required to retain the internal reliability of an easy-to-administer scale that measures stigma. Stopping rules were followed to conduct the analysis and interpret the results. Maximum likelihood extraction with an oblique rotation was used to allow for the factors to correlate, as anticipated.
Second, the sensitivity and specificity were computed across threshold values on the three subscales within the psychometric scale developed in Step 1 that measured three domains of stigma (Personal, Perceived, and Endorsed stigma). The receiver operating characteristic (ROC) curve is a measure of diagnostic accuracy that plots the sensitivity (true-positive rate [TPR]) against 1 − specificity (false-positive rate [FPR]) and is mainly applied in healthcare settings.28 ROC curves are used to select an optimal threshold for a classifier that maximizes the true positives while minimizing the false positives. In this study, ROC curves were used to find the threshold score on each of the three subscales in order to predict the likelihood of subjects having a stigmatizing behavior. The proxy measure of a stigmatizing behavior was used as the reference variable.
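As a rough illustration of this threshold evaluation, the following Python sketch computes sensitivity and specificity for each candidate cut-off on a subscale, treating a score at or above the cut-off as a positive prediction. The scores and proxy answers are made up, and the use of Youden's J (sensitivity + specificity − 1) to pick the cut-off is an assumption; the study does not state which selection criterion was applied.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold predicts 'yes'."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_threshold(scores, labels, candidates):
    """Pick the candidate with the largest Youden's J (an assumed criterion)."""
    def youden(t):
        sens, spec = sens_spec(scores, labels, t)
        return sens + spec - 1.0
    return max(candidates, key=youden)


# Hypothetical subscale scores (range 3 to 15) and proxy answers (1 = "yes").
scores = [4, 6, 7, 8, 9, 9, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

# One (threshold, sensitivity, specificity) triple per candidate cut-off;
# plotting sensitivity against 1 - specificity gives the ROC curve.
roc_points = [(t, *sens_spec(scores, labels, t)) for t in range(3, 16)]
chosen = best_threshold(scores, labels, range(3, 16))
```

Each triple in `roc_points` corresponds to one point on the curve, so the same loop yields both the curve and the cut-off selection.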
Third, to build a classification decision tree, a MetaCostSensitive algorithm with the J48 as the classifier was adopted as the learning algorithm to account for any imbalance in the data. The model was built by classification trees because 1) results are relatively easier to interpret with the tree visualization and 2) the model generates a dynamic and automated classification based on participants’ answers to the psychometric scale items. Feature selection was performed to improve the classification accuracy by identifying and removing irrelevant and/or redundant attribute(s) from the data that could decrease the accuracy of the model. The information gain attribute evaluator was used to rank all of the features in the data set and aid in selecting the most influential attributes that have the highest information gain with respect to the class and contribute the least to the overall entropy. To avoid the use of the same data set when testing the performance of the final classifier, the classification accuracy was estimated using a 66-percent split in which 66 percent of the instances were training data and the remaining 34 percent were test data.
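The attribute ranking and the percentage split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of the WEKA procedure: the item names and responses are hypothetical, the shuffling and rounding in `percentage_split` are assumed, and the cost-sensitive wrapper is omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in class entropy after partitioning on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def percentage_split(rows, labels, train_fraction=0.66, seed=1):
    """Shuffle, then keep train_fraction for training and the rest for testing."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = round(train_fraction * len(order))
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical responses: each row maps an item name to a 1-5 answer.
rows = [
    {"item_a": 1, "item_b": 3}, {"item_a": 1, "item_b": 4},
    {"item_a": 5, "item_b": 3}, {"item_a": 5, "item_b": 4},
    {"item_a": 1, "item_b": 3}, {"item_a": 5, "item_b": 4},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

# Rank items from most to least informative about the class.
ranking = sorted(["item_a", "item_b"],
                 key=lambda a: information_gain(rows, labels, a),
                 reverse=True)
train_x, train_y, test_x, test_y = percentage_split(rows, labels)
```

Items at the bottom of `ranking` are the candidates for removal; the held-out portion is used only to estimate accuracy.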
Results
Demographics. Table 1 shows the demographic characteristics of the total number of respondents (interchangeably referred to as participants) who answered the survey. Given that the data were collected in a college institution, it was expected that most of the participants were aged between 18 years and 22 years and were not married. Out of the total sample size, a higher percentage of respondents who completed the survey during Study 1 reported having a family member or a friend who was a behavioral health consumer.
Developing an easy-to-administer stigma scale. Prior to entering the original set of Likert-scaled items into the factor analysis, items were screened for appropriate item endorsement rates (item mean) and variability (standard deviation). Six items were removed from the item pool because their means were lower than 2.2, but the rest of the items had moderate means (2.2–4 on the 5-point Likert response scale). Corrected item correlations were then computed to assess item discrimination for the retained items. A total of five items were shown to have a weak correlation with rIT (item-total correlation) values below 0.2 and therefore were eliminated.
The initial extraction revealed four factors with an eigenvalue above 1.0, which indicates the presence of, at most, four factors within the model. Four factors were required to account for at least 54 percent of the total item variance. The list of eigenvalues showed a clear “elbow,” suggesting the possibility of the existence of three factors in this model. The degree of simple structure of three separate exploratory factor analyses specifying two, three, and four factors was evaluated. Based on this analysis, the three-factor model was selected as the best-fitting option. The four-factor model was rejected because multiple items had significant nonconceptual cross-loadings. Within the three-factor model, one item was removed, as it had a lower factor loading, while another one was removed because it conceptually loaded on the wrong factor. This resulted in a model that had a minimum of four items loading on each of the three factors. Definitions of the stigma measures are summarized in Table 2.
Following factor analysis and in order to minimize the total number of psychometric items and create a reliable and easy-to-administer scale, the three items that loaded the strongest on each of the three factors were selected. The remaining nine items demonstrated acceptable discrimination with rIT values above 0.3.29 The internal consistency for the overall scale was acceptable, with a Cronbach’s alpha value of 0.73.30 All statistics of the retained items are shown in Table 3. The complete wording of the retained nine items is shown in Appendix B.
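For readers unfamiliar with the internal-consistency statistic, the sketch below computes Cronbach's alpha from a respondents-by-items table using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The responses are invented for illustration and are unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k answers."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from four respondents to three 1-5 Likert items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
```

Alpha rises when the items vary together, which is why dropping weakly correlated items can shorten a scale without sacrificing consistency.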
Evaluating stigma scale scores. The sensitivity and specificity of potential threshold scores on each of the three subscales were evaluated to select the ones that generated an optimal classification of participants based on their responses to the scale items. The binary question was held as the reference variable. ROC curves for “personal stigma”, “perceived stigma”, and “endorsed stigma” are shown in Figures 1 to 3. Table 4 illustrates a detailed report of this evaluation; the optimal threshold scores selected for each subscale are highlighted in the blue rows in Table 4. The maximum score that a participant could obtain on each subscale was 15 (3 items in each subscale, each ranging from 1–5 points), with a higher score indicating a more stigmatizing attitude compared to a lower score.
A score of 9 was selected as the optimal threshold on the personal stigma subscale (a sensitivity of 61.41% and a specificity of 63.58%) and the perceived stigma subscale (a sensitivity of 52.17% and a specificity of 63.58%). A score of 11 was selected as the optimal threshold score on the endorsed stigma subscale (a sensitivity of 66.30% and a specificity of 59.81%).
As a result of this analysis, participants who obtained a score of 9 or higher on either the personal or perceived stigma subscale were classified as more likely to have a stigmatizing behavior toward behavioral health consumers than those with lower scores. Similarly, participants who scored 11 or higher on the endorsed stigma subscale were classified as more likely to have a stigmatizing behavior toward behavioral health consumers than those with lower scores.
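The scoring rule can be written out as a short Python sketch. The thresholds (9, 9, and 11) and the three-item, 3-to-15 subscale range come from the text; the function names and the example respondent are hypothetical, and the sketch assumes answers are already coded so that a higher value indicates a more stigmatizing attitude.

```python
# Optimal threshold scores reported for each subscale.
THRESHOLDS = {"personal": 9, "perceived": 9, "endorsed": 11}


def subscale_score(answers):
    """Sum three 1-5 Likert answers; the possible range is 3 to 15."""
    if len(answers) != 3 or any(a not in range(1, 6) for a in answers):
        raise ValueError("expected three answers between 1 and 5")
    return sum(answers)


def flagged_subscales(responses):
    """Names of the subscales on which the score meets the threshold."""
    return [name for name, cutoff in THRESHOLDS.items()
            if subscale_score(responses[name]) >= cutoff]


def likely_stigmatizing(responses):
    """True when at least one subscale score reaches its threshold."""
    return bool(flagged_subscales(responses))


# Hypothetical respondent: personal = 9, perceived = 6, endorsed = 10.
respondent = {
    "personal": [3, 3, 3],
    "perceived": [2, 2, 2],
    "endorsed": [4, 3, 3],
}
flags = flagged_subscales(respondent)
```

Here only the personal subscale reaches its threshold, which is enough for the respondent to be placed in the more-likely group.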
The results of this binary classification were used for the purpose of building a decision tree algorithm model that classifies individuals based on their individual subscale scores as well as their overall scale score. The majority (92%) of respondents who answered “yes” to the binary proxy measure had a higher score than the optimal threshold score on at least one of the three subscales. Consequently, for a participant to be classified as likely to have a stigmatizing behavior, he or she likely scored higher than the optimal threshold on at least one of the three subscales.
Building a decision tree model of stigma. Feature selection was performed to identify the attribute(s) responsible for decreasing the predictive accuracy of the classification algorithm. The results showed four items to be the lowest-ranked contributors to the predictability of the model. Item 3 was removed because it reduced the total accuracy of the model. This lowered the total number of scale items to eight. Table 5 illustrates a detailed report of the classifier output. Based on the results obtained, the predictive accuracy is approximately 92.4 percent, with 7.6 percent of the instances incorrectly classified.
The kappa statistic of 0.81 indicated a sufficient inter-rater reliability.31 This measure compares the observed and expected accuracies. The closer it is to a value of “+1,” the greater the inter-rater agreement between the two. The detailed report of the accuracy by class indicates a high TPR of 0.92 and a low FPR of 0.11. The confusion matrix in Figure 4 describes the performance of the classification model. Seventeen participants were incorrectly classified as not likely to have a stigmatizing behavior and 15 participants were incorrectly classified as likely to have a stigmatizing behavior.
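To make the comparison of observed and expected accuracy concrete, the following sketch computes Cohen's kappa, the TPR, and the FPR from a 2×2 confusion matrix. The cell counts are illustrative only; they are not the study's confusion matrix, whose full counts are not given in the text.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Kappa = (observed - expected agreement) / (1 - expected agreement)."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


def rates(tp, fn, fp, tn):
    """True-positive rate and false-positive rate."""
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical counts chosen for easy arithmetic.
tp, fn, fp, tn = 40, 10, 5, 45
kappa = cohen_kappa(tp, fn, fp, tn)
tpr, fpr = rates(tp, fn, fp, tn)
```

With these counts the observed accuracy is 0.85 and the chance-expected accuracy is 0.50, so kappa credits the classifier only for the agreement beyond chance.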
Figure 4 is a visualization of the J48 decision tree classification. The algorithm finds the normalized information gain ratio from splitting on each of the attributes (scale items). It creates a decision node that splits on the attribute that generates the highest normalized information gain (Item 23 in this model). Then, it recurses on the sublists obtained by splitting on this attribute and adds these nodes as its children. Next, the algorithm looks again for the next attribute that generates the highest normalized gain among its child nodes and iteratively repeats the process until no attribute is left.
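The recursive procedure can be illustrated with a simplified Python sketch. It is not the J48 implementation: it treats each answer as a categorical value, omits pruning and the handling of numeric thresholds, and uses hypothetical item names and data. It shows only the core idea of choosing the attribute with the highest gain ratio and recursing on the resulting subsets.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """Information gain divided by the split information of the attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0
    return (entropy(labels) - remainder) / split_info


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep], remaining)
    return (best, children)


# Hypothetical data: two items with coarse "low"/"high" answers.
rows = [
    {"item_x": "low", "item_y": "low"},
    {"item_x": "low", "item_y": "high"},
    {"item_x": "high", "item_y": "low"},
    {"item_x": "high", "item_y": "high"},
    {"item_x": "low", "item_y": "high"},
]
labels = ["no", "no", "no", "yes", "no"]
tree = build_tree(rows, labels, ["item_x", "item_y"])
```

In this toy data the root splits on `item_x`, and only the "high" branch needs a second split on `item_y`, mirroring how some respondents reach a leaf after fewer questions than others.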
An example of how the resulting decision tree functioned at classifying respondents is illustrated as follows: If a participant’s response to Item 27 was 3 or less (“Neutral”), the algorithm would assess the response to Item 10. If this value was 2 or less (“Agree”), the algorithm would assess the response to Item 28. If this value was greater than 3 (“Neutral”), the algorithm would assess the response to Item 16. If this value was greater than 2 (“Agree”), the algorithm would classify participants as more likely to have a stigmatizing behavior. As illustrated in this example, a participant was classified as likely to have a stigmatizing behavior based on only four out of the eight (total) number of items.
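The branch described in this example can be expressed as nested conditions. The sketch below encodes only that single path (Items 27, 10, 28, and 16 with the cut-offs given in the text); the remaining branches of the tree are not described here, so the function returns `None` for any respondent who leaves the path. The function name and the sample respondent are hypothetical.

```python
def classify_along_example_path(answers):
    """Follow the single branch described in the text.

    answers maps an item number to a 1-5 Likert response. Returns
    (label, items_consulted); label is None when the respondent leaves
    the described branch, because other branches are not specified.
    """
    consulted = [27]
    if answers[27] <= 3:
        consulted.append(10)
        if answers[10] <= 2:
            consulted.append(28)
            if answers[28] > 3:
                consulted.append(16)
                if answers[16] > 2:
                    return "likely stigmatizing", consulted
    return None, consulted


# Hypothetical respondent who satisfies every condition on the path.
respondent = {27: 3, 10: 2, 28: 4, 16: 3}
label, consulted = classify_along_example_path(respondent)
```

The `consulted` list records which items were actually needed, which is the basis for presenting each respondent with only the questions on his or her own path.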
Discussion
The motivation behind this work was to quantify behavioral health stigma efficiently based on the least number of psychometric items possible, to increase the willingness of participants to complete the survey with integrity and improve the accuracy of the data collected. Although machine learning has been used to help with diagnosing behavioral health disorders and predicting suicidality, it has not been used to assess behavioral health stigma levels. The findings of the current study illustrate the potential for using machine learning to address stigma measurement. In particular, the use of a classification decision tree analysis is comprehensive in nature and provides a graphical framework for analyzing the logic flow of the decision process to classify someone as likely or not likely to have a stigmatizing behavior. This model offers a dynamic approach to creating a customized path for each respondent when completing the survey. Consequently, not every participant is required to answer all of the items within the scale and in the same order. This work is a starting point for the development of an index that aggregates the responses from participants to the scale items into a single indicator that detects the overall stigma level within communities. Identifying a measurable total level of stigma within a specific geographical area can potentially assist policymakers with redirecting financial resources toward community-based services that are likely to reduce stigma.
A number of stigma-reduction initiatives have been developed in the United States to increase access to behavioral healthcare.32 Nevertheless, stigma still exists. The level of effectiveness such interventions have had on reducing the stigma associated with mental illness is not clear due to the absence of an index to monitor the change in stigma over time.
Seventy percent of all healthcare visits are related to psychosocial disorders.33 One strategy adopted by healthcare facilities to improve the quality of and access to behavioral healthcare is integrating behavioral health into primary care settings through the training of healthcare providers to use non-discriminatory, evidence-based practices with patients. Nevertheless, some healthcare providers perceive behavioral health consumers to be less deserving of care, annoying, and manipulative with suicidal urges.34 Forecasting the level of stigma in clinical settings could help to improve access to compassionate and respectful care.
Computing an index that identifies the overall level of stigma in individual communities could make it possible to monitor stigma levels over time. Such an indicator could ultimately be used to inform and guide health policy and health program decision-making on investments in stigma-reducing interventions. An example of a similar measure is the Consumer Sentiment Index (CSI), a statistical measure of the attitudes of consumers toward the economy’s overall health in the United States. Based on only five psychometric items, this index condenses people’s perceptions into one indicator and monitors a complex system, such as the economy, over time.35 The value of such an index relies on its ability to help retailers, economists, and investors monitor where the economy is headed in the United States. In fact, its rise and fall has historically helped predict economic expansions and contractions (e.g., the 2001 and 2008 economic recessions).
Limitations. A number of limitations in this study outline room for future work, especially in terms of the representativeness of the sample. The target population mainly consisted of college students, which poses a potential issue as to whether the findings are a good representation of society and can be generalized across the United States population. Additionally, the findings are based on survey questions, which inherently carry a risk of response bias, including self-selection, sample composition, and social desirability.
Going forward, it would be desirable to more broadly administer the survey and reach out to other population samples, including non-student members of the general population and healthcare providers within primary care and psychiatry settings. It would also be worthwhile to analyze the impacts that other potential variables, such as sex and race, might have on the responses of participants.
Conclusion
Applying machine learning decision tree analysis when administering a psychometric measure to gauge people’s perceptions toward mental illness offers possibilities for a new approach to interactively address the measurement of mental health stigma. Monitoring the change in stigma over time can help healthcare and public health organizations improve policy decisions for the implementation of patient-centered care practices and social inclusion of behavioral health consumers.
Acknowledgments
This research was supported by the Department of Psychiatry and Behavioral Sciences at Carolinas HealthCare System and the University of North Carolina at Charlotte.
References
- World Health Organization. The World Health Report 2001 – Mental Health: New Understanding, New Hope. Available at: http://www.who.int/whr/2001/en/whr01_en.pdf?ua=1. Accessed June 19, 2018.
- Kessler R, Mickelson K, Williams D. The prevalence, distribution, and mental health correlates of perceived discrimination in the United States. J Health Soc Behav. 1999;40(3):208–230.
- Shepard D, Gurewich D, Lwin A, et al. Suicide and suicidal attempts in the United States: costs and policy implications. Suicide and Life-Threatening Behav. 2015;46(3):352–362.
- Roehrig C. Mental disorders top the list of the most costly conditions in the United States: $201 billion. Health Aff. 2016;35(6):1130–1135.
- National Alliance on Mental Illness. Mental Health by The Numbers. Available at: https://www.nami.org/Learn-More/Mental-Health-By-the-Numbers. Accessed January 3, 2017.
- Office of the Surgeon General. Mental Health: a report of the Surgeon General. Rockville, MD: US Department of Health and Human Services; 1999.
- Corrigan P, River L, Lundin R, et al. Three strategies for changing attributions about severe mental illness. Schizophr Bull. 2001;27(2):187–195.
- Corrigan P. How stigma interferes with mental health care. Am Psychol. 2004;59:614–625.
- The International Classification of Functioning, Disability and Health. 2003, World Health Organization (WHO).
- Regier D, Farmer M, Rae D, et al. One-month prevalence of mental disorders in the United States and sociodemographic characteristics: the epidemiologic catchment area Study. Acta Psychiat Scand. 1993;88(1):35–47.
- Funk C. Thereby Hangs a Tale—Stories of Curious Word Origins. Redditch, England: Read Books, Ltd.; 2013.
- Goffman E. Stigma. Englewood Cliffs, NJ: Prentice-Hall; 1963.
- Corrigan P. On the Stigma of Mental Illness. Washington DC: American Psychological Association; 2005.
- Corrigan P. The impact of stigma on severe mental illness. Cogn Behav Pract. 1998;5(2):201–222.
- Link B. Understanding labeling effects in the area of mental disorders: an assessment of the effects of expectations of rejection. Am Sociol Rev. 1987;52(1):96.
- Link B, Phelan J. Conceptualizing stigma. Annu Rev Socio. 2001;27:363–385.
- Corrigan P, Watson A, Barr L. The self-stigma of mental illness: implications for self-esteem and self-efficacy. Br J Soc Psychol. 2006;25(8):875–884.
- Yang L, Link B. Measurement of attitudes, beliefs, and behaviors of mental health and mental illness. Washington DC: National Academy of Sciences; 2016.
- Brohan E, Slade M, Clement S, et al. Experiences of mental illness stigma, prejudice and discrimination: a review of measures. BMC Health Serv Res. 2010;10:1–11.
- Walsh C, Ribeiro J, Franklin J. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Scie. 2017;5(3):457–469.
- Iwabuchi S, Liddle P, Palaniyappan L. Clinical utility of machine-learning approaches in schizophrenia: improving diagnostic confidence for translational neuroimaging. Front Psychiatry. 2013;4:95.
- Salvador R, Radua J, Canales-Rodríguez E, et al. Evaluation of machine learning algorithms and structural features for optimal MRI-based diagnostic prediction in psychosis. Plos One. 2017;12(4):e0175683.
- Weiss M, Doongaji D, Siddhartha S, et al. The Explanatory Model Interview Catalogue (EMIC). Contribution to cross-cultural research methods from a study of leprosy and mental health. Br J Psychiatry. 1992;160:819–830.
- The International Classification of Functioning, Disability and Health. Geneva, Switzerland: World Health Organization; 2003.
- Bogardus E. Essentials of Social Psychology. Los Angeles, CA: University of Southern California Press; 1918.
- Penn D, Kohlmaier J, Corrigan P. Interpersonal factors contributing to the stigma of schizophrenia: social skills, perceived attractiveness, and symptoms. Schizophr Res. 2000;45(1–2):37–45.
- Kim J, Mueller C. Introduction to factor analysis: what it is and how to do it. Contemp Sociol. 1980;9:4-562.
- Hanley J, McNeil B. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
- Nunnally J. Psychometric Theory. New York, NY: McGraw-Hill; 1967.
- Streiner D. Starting at the beginning: an introduction to coefficient alpha and internal consistency. J Pers Assess. 2003;80(1):99–103.
- Fleiss J. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley; 1981.
- Corrigan P, Morris S, Michaels P, et al. Challenging the public stigma of mental illness: a meta-analysis of outcome studies. Psychiatr Serv. 2012;63(10):963–973.
- Substance Abuse and Mental Health Services Administration. Prevention and Behavioral Health. Available at: https://www.samhsa.gov/capt/practicing-effective-prevention/prevention-behavioral-health. Accessed August 28, 2016.
- Lewis G, Appleby L. Personality disorder: the patients psychiatrists dislike. Br J Psychiatry. 1988;153:44–49.
- Surveys of Consumers. Available at: https://data.sca.isr.umich.edu. Accessed February 12, 2016.