
PEER REVIEWED, EVIDENCE-BASED INFORMATION FOR CLINICIANS AND RESEARCHERS IN NEUROSCIENCE

Validation of a Pediatric Cognitive Assessment Tool to Advance Knowledge on Children’s Cognitive Development, Health Risk Factors, and Health-promoting Interventions in Sub-Saharan Regions

Innov Clin Neurosci. 2025;22(10–12):33–51.

by Franco Di Cesare, MD; Cristiana Di Carlo, MPhil; and Leonardo Di Cesare, MD

All authors are with Leoben Research Aurora in San Vincenzo Valle Roveto (AQ), Italy.

FUNDING: Leoben Research S.R.L. sponsored and funded this project.

DISCLOSURES: All authors are Leoben Research Aurora ETS (a nonprofit organization) co-founders.

ABSTRACT: BACKGROUND:
Cognitive Assessment Tool for Pediatric Clinical Research (CAT-PCR) is a new brief nonverbal test battery validated in a Zambian school-aged population. CAT-PCR involves the standardized administration of two tests, WAVES and SYMBOLS, and provides measures of visuomotor processing (VMP) and visuographomotor constructional processing (VGCP). METHODS: The psychometric properties of CAT-PCR were evaluated in a field trial complemented with an ancillary test-retest study. RESULTS: Four hundred twenty children, aged 5 to 17 years, 51.2 percent female, speaking Bemba (81.1%), English (72.6%), Nyanja (22.8%), Tonga (1.9%), and Lozi (1.2%), were recruited at multiple Zambian clinical sites and schools. Children able to speak one, two, and three or more languages composed 23.5, 10.0, and 66.5 percent of participants, respectively. CAT-PCR detected differences in cognitive performance between two comparable subgroups of children with or without conditions associated with increased risk for cognitive impairment. The poorer health group reflected a spectrum of conditions, such as chronic neurologic or medical diseases (ie, epilepsy, human immunodeficiency virus/acquired immunodeficiency syndrome [HIV/AIDS], recurrent malaria, sickle cell disease, and other cardiac and metabolic conditions) or prolonged exposure to psychosocial stress and deprivation. Reliability was satisfactory: the intraclass correlation coefficient (95% confidence interval) at 48-hour test-retest (n=86) was 0.85 (0.78, 0.91) for the VGCP Index and 0.84 (0.76, 0.90) for the VMP Index. CONCLUSION: Study findings support the reliability, validity, and utility of CAT-PCR measures in evaluating the effect of health-related risks on the child’s cognitive functioning and development. Further research should address the validity of CAT-PCR in response prediction, monitoring, and evaluation of health-promoting interventions.
Keywords: Pediatric, cognitive assessment, Sub-Saharan, clinical research, child development

Introduction

Cognitive health is a prerequisite for a child’s psychological development and psychosocial functioning in everyday life. The widespread exposure to risk factors for altered cognitive development makes Sub-Saharan children a highly vulnerable health population.1–3 Evidence-based informed health risk mitigation strategies and health-promoting interventions should be adopted to protect and nurture children’s health.

Clinical research is a viable strategy to reduce the knowledge gap between what we know and what we should know better to design and implement effective health interventions for children in Sub-Saharan regions. The wide scope of clinical research embraces categories of studies with different purposes: development of clinical knowledge for detection, diagnosis, and natural history of the disease; evaluation of health-promoting preventive, therapeutic, and rehabilitative interventions; development of vaccines, therapeutics, and diagnostics; behavioral and mental health research; and health services research, epidemiology, and community-based trials.4

The armamentarium of cognitive tests to quantify the impact of adverse risk factors and diseases on children’s cognitive health in Sub-Saharan regions and to evaluate the cognitive outcomes of health-promoting interventions is still grossly inadequate. For example, findings from a recent meta-analytic review highlight the lack of pediatric age-appropriate outcomes sufficiently sensitive to assess anti-malarial treatment effects on cognitive function.5 The availability of valid cognitive outcomes is necessary for achieving definitive evidence on pharmacological preventive treatment effects and promoting informed therapeutic recommendations to protect school-aged children from malaria.5 The inadequacy of measurement methods is an important limiting factor for methodology and informational efficiency (ie, the extent to which information from clinical research translates into new actionable health knowledge) of cognitive pediatric research.

Furthermore, current clinical research practices in Sub-Saharan regions rely on cognitive assessment tools designed for use and standardized in distinct socioeconomic and cultural contexts (ie, North American and European). The use of imported measurement instruments involves the risk of poor cross-cultural validity and fairness, eventually misrepresenting the cognitive abilities of Sub-Saharan children.6–9

Zambia is a paradigmatic example of a Sub-Saharan developing country, where the complex sociocultural landscape is combined with the unmet medical need for more cognitive assessment tools.10 Like other Sub-Saharan countries, Zambia is characterized by a rich diversity of languages. Over thirty major languages are spoken.11 The most widely spoken languages are Bemba (35% of the population), Nyanja (20%), Tonga (12%), and Lozi (6%). The English language is spoken only by about two percent of the population as a first language, but it is the most frequently used second language, as it is taught at schools.11

To date, the Panga Munthu (translated as “Make a Person”) Test is the only instrument that has been developed and standardized on a Zambian pediatric (ages 3–15 years) population.12,13 The ZamCAT, a test battery to evaluate multiple neurocognitive domains for children aged 5 to 6 years,14 and the Object-based Pattern Reasoning Assessment9 have been among the other handful of attempts15–18 to originally develop a cognitive assessment tool for Zambian children.

Zambia relies on a limited number of instruments originally developed in Western countries, then imported to Zambia (ie, Developmental Neuropsychological Assessment NEPSY, Raven’s Progressive Matrices, Maze Test). They have been used in healthcare and research settings.12,13,19,20 However, issues of cross-cultural validity of test measures in use in Zambia are well documented. For example, the testing method is an important source of cultural bias. Test stimuli, test materials, poor familiarity with test demands, as well as response modality differentially affected the performance of Zambian versus non-Zambian children’s groups being evaluated.6,7,9,13,21 Construct bias occurs when the instrument does not measure a construct in the same way in different cultural contexts. The cross-cultural validation of the construct of intelligence from Western countries to Zambia and other African countries is a pertinent example. A summary score of intelligence, such as intelligence quotient (IQ), measures different constructs rather than, or beyond, intelligence.21–23 Measures from an imported Western cultural intelligence test retain the risk of poor fairness, as they could be discriminating against the Zambian pediatric population as a whole and within groups (ie, children living in rural areas or from diverse ethnic or socioeconomic backgrounds with limited access to quality education).22,24

Notably, even though children with diverse cultural backgrounds may perform differently on cognitive tests, test validity can be adequate if population-appropriate norms are used. In this regard, whether norms derived from a different social-cultural context can be applied to a Zambian population is another critical issue in the debate on cross-cultural validation.10

Test adaptation is a strategy to ensure a satisfactory reduction of systematic bias or error in test scores when applying a test in a sociocultural context other than the one in which the test was originally developed. In their work on the Zambian adaptation of the Developmental Neuropsychological Assessment NEPSY,20,25 Mulenga et al20 examined the applicability of norms developed in the United States. They pointed out that the NEPSY battery was of clinical utility and insensitive to language and cultural factors. Nonetheless, they introduced a cautionary note on NEPSY interpretation to take into consideration the child’s cultural, language, and demographic background information. They also concluded that further research on the NEPSY and other new tests should address the development of norms that are indigenous and more culturally relevant.

Test adaptation entails a multimodular and complex process, which spans from the initial validation of test items, materials, and instructions through the final development of local reference norms. Test adaptation is an appealing strategy to introduce more instruments into underresourced Sub-Saharan healthcare environments,26 and it is perceived as a more affordable process than developing original instruments. A recent example of test adaptation is provided by the study of Chernoff et al,27 in which the originally Western-developed Kaufman Assessment Battery for Children (KABC) and the Test of Variables of Attention (TOVA) were investigated in a multicenter observational trial in four different Sub-Saharan African countries (ie, Malawi, South Africa, Uganda, and Zimbabwe) in nine different local languages. They found the KABC and TOVA to be valid and reliable when using standardized scores from high-income countries in a study of 611 children, aged 5 to 11 years, living with or without human immunodeficiency virus (HIV), who were perinatally exposed or unexposed.27 However, cognitive test adaptations have been conducted only occasionally in Zambia.12,14,20 On this matter, there is no evidence to date documenting the successful completion of the entire (or even most of the) test adaptation process for any imported cognitive assessment tool currently in use with Zambian school-aged children.

In summary, the debate on cross-cultural validation of tests not originating from Zambia and to be applied to pediatric clinical research highlights four key consequential issues. Firstly, the development of culture-free cognitive tests is not feasible, and all tests are inherently culturally biased.28,29 Secondly, the current reliance on standardized tests of Western origin carries the risk of misrepresenting the cognitive abilities of Zambian children.7,10 Thirdly, no Zambian-adapted pediatric cognitive assessment tools with extensive cross-cultural validation are available. Lastly, an efficient development strategy for cognitive measures should be based on the design of a culturally appropriate cognitive testing instrument9,10,20 and a streamlined validation process conducted in the specific context of use.

We describe the development and validation of the Cognitive Assessment Tool for Pediatric Clinical Research (CAT-PCR), a new brief nonverbal cognitive test battery, originally designed for and validated on the Zambian school-aged population. CAT-PCR development contributes to addressing the unmet need for new psychometric tools with superior reliability and validity for cognitive health research in Sub-Saharan regions.

Methods

Instrument design. CAT-PCR was designed to measure attentive and visuoconstructional behaviors in a school-aged (5–17 years old) population. Attentive behavior is characterized by a conscious, intentional, and focused effort to maintain an efficient cognitive processing of a specific goal-oriented task.30 Attention enables filtering and prioritizing of available information for more in-depth coding and integration, as the processing of all information would be overwhelming, unnecessary, and counterproductive.30,31

Attention also refers to the collection of cognitive mechanisms that select which of many stimuli to process and act on.32 The alerting, orienting, and executive systems participate in attentional processing, each including a diverse set of processes. For example, selective attention involves the processing of relevant and specific targets of information and the simultaneous neglect or attenuation of distractors and other sources of irrelevant information while attending to a goal-directed task.31,34,35 Visuospatial attention, defined as the ability to orient to salient visual stimuli and to parse the visual world, is part of the orienting network.33 Furthermore, visuospatial attention is functionally combined with the intentional/goal-directed motor systems. As such, it is coupled to eye movements as well as to other motor systems, such as those directing and finely regulating hand movements, in a more complex way.32

Visuoconstructional behavior refers to any type of formulating behavior in which parts are organized to form a single object (ie, drawing or assembling) and entails the ability to see an object as a set of parts and then process them to construct a unitary replica of the original.36 The spatial relations among the parts of the object must be accurately perceived, then processed and integrated into the desired unitary object representation; this is visuoconstructional processing.36,37

Selective attention and visuoconstructional processing are essential functional components in cognitive development and health, as they affect the everyday life of a school-aged child. Both cognitive functions are relevant to problem-solving and adapting to novel situations, such as coping with cognitive task demands in an educational environment. For example, selective attention is associated with the development of a variety of skills, including speech,38 metalinguistic skills,39 and arithmetic.40 Selective attention is proposed to be one of the key foundational skills for academic success in children overall.41

We chose cognitive tasks involving paper-and-pencil graphic line tracing and cancellation as particularly appropriate to measure attention and visuoconstructional processing. The elementary activities of line tracing and cancellation using a pen/pencil and a sheet of paper as a medium are common (ie, more familiar and with wider use than a computerized touch screen) and socially important for a school-aged child’s everyday life in Zambia as well as in the rest of the Sub-Saharan regions.

The structure of the cognitive tests also responds to the necessity of limiting the use of language abilities (ie, verbal content is used only in test instructions) when measuring attention and visuoconstructional processing. This characteristic of test design was introduced to minimize, albeit not fully control, the sociocultural and linguistic effects of the extraordinarily rich Zambian environment on CAT-PCR measures. Cultural variability in the processing of visuospatial materials is a matter of fact, and sources of cultural differences in visuospatial processing can be identified42 and mitigated by the test validation process.

Originally developed Western tests based on paper-and-pencil graphic line tracing and cancellation are already in use in research on pediatric cognitive development in Sub-Saharan settings. Examples of normed assessment tools include tools based on reproducing a bidimensional graphic configuration (ie, the Bender Visual Motor Gestalt Test or the Rey-Osterrieth Complex Figure Test) and cancellation tests, such as the d2-Test of Attention.43

Tests based on graphic reproduction or cancellation are developmentally sensitive and correlate to academic skills. For example, a study by Vakil et al44 found that younger age groups (8–11 years) were more dissociable than older age groups (12–17 years) based on the performance of cancellation tests, indicating that changes in attention are pronounced in late childhood years and stabilize in later years of adolescence.44 The abilities to select among competing stimuli and preferentially process more relevant information are available in young school-aged children, but the speed and efficiency improve as children develop.45,46 In a nonclinical Spanish-speaking pediatric population, aged 8 to 12 years, the d2-Test of Attention scores were correlated to math, reading, and writing skills. The study also confirmed the effects of age on performance.47 In a study with a cohort of 835 South African children, aged 8 to 12 years, from primary schools in socioeconomically disadvantaged neighborhoods, d2-Test of Attention scores correlated to academic achievements.48 The children’s end-of-year results—based on the mean of four subjects: home language (Xhosa or Afrikaans), first additional language (English), mathematics, and life skills—were considered as an indicator of academic achievement.48

Furthermore, specific brain systems and neural mechanisms underlying graphic line tracing and cancellation behaviors have been identified.49,50

CAT-PCR is the latest product of a technology development program aiming to develop original psychometric instruments for pediatric evaluation.15–18 The program was led by an international Working Group on Zambia Paediatric Cognitive Assessment Tools (ZPCATs) Development. A core team of local Zambian (pediatricians, one child neurologist, clinical psychologists, clinical operation roles) and European (clinical research methodologist and cognitive scientists) experts worked together on instrument development, trial design, oversight, and execution between 2017 and 2019.

CAT-PCR is composed of two tests, WAVES and SYMBOLS. Both tests were initially developed as stand-alone tests.16,17 Thereafter, we revised their scoring criteria to increase the efficiency of measurements. We detail the validation process of the new scoring structures of WAVES and SYMBOLS in the following sections.

WAVES task description, administration, and outcomes. The WAVES test consists of the standardized administration of four drawing tasks.16 The base drawing task involves presenting a target to the child for free-hand reproduction using the dominant hand. The target for reproduction is a visual pattern consisting of a centered black periodic waveform displayed on a 14.8cm × 21.0cm white background (Figure 1).

The periodic waveform presents three sequential repetitions of cycles, each of them with a peak and a trough-point as well as two points of flexus, over an axis of 14.4cm in length. The examiner asks the child to reproduce the target by drawing as exactly as possible onto the prespecified area of the data capture sheet using a pencil. There is no time limit for completing a reproduction. The reproduction task involves drawing a continuous line (ie, line tracing) in a succession of six bell-shaped curves; changing trail direction at peak- or trough-points and at flexus-points; and maintaining the spacing of the reproduction over a 14.4cm axis.

WAVES administration generates a gallery of four drawings. Each of the four drawings is characterized by a distinct modality of reproduction: copying design with open (CDO) eyes, immediate reproduction from memory with open (IRMO) eyes, immediate reproduction from memory with closed (IRMC) eyes, and delayed reproduction from memory with closed (DRMC) eyes after a few seconds (ie, three to five) of interference. The sequential order of drawing task administration is standardized by reproduction modality and is CDO, IRMO, IRMC, and DRMC. The child has one attempt when copying/reproducing the drawing for each condition.

Reproduction modalities were designed to set distinct levels of access to visuospatial figure/background cues for reproduction. The four reproduction modalities produce externally-cued (eg, copying from an existing model) and internally-cued (eg, drawing from memory and imagery) line tracings.

The use of different reproduction modalities also offers an approach to evaluate the effect of distinct levels of visuokinesthetic feedback on line tracing. Visuokinesthetic feedback plays a crucial role in the control of line tracing and enables the correct assembly of the elementary graphic components of the waveform (ie, the continuous sequence of bell-shaped curves) during reproduction. The level of visuokinesthetic feedback modulates two types of constraints in line tracing: those related to the accuracy of reproduction, which impact the line trace trajectory (morphocinetic), and those related to the sequencing of the bell-shaped series and to the spatial layout on the paper (topocinetic).51,52

The line trace trajectory, once made well known by previous learning, would still be comfortably mastered and accurate when reproduced from memory. The morphocinetic constraint is minimally dependent on visuokinesthetic feedback once line tracing is (over)learned. Nevertheless, the topocinetic constraint, which concerns the spatial layout of the line trace in the graphic space for reproduction (the “where?”), still largely depends on visuokinesthetic feedback. Even in the case of an accurately learned waveform during the CDO task, reducing the level of visuokinesthetic feedback during reproduction from memory with open or closed eyes is anticipated to reduce the detection and forward signaling of potential reproduction errors. Therefore, the line tracing proceeds at increased risk of being less accurate. This observation implies that the expected line tracing accuracy would be higher in the copy design modality and progressively lower in the other reproduction modalities from memory. In the WAVES task administration procedure, this reproduction modality effect was mitigated by using an overlearned-by-repeated-acquisition target waveform for the reproduction-with-closed-eyes modalities.

Each drawing is assessed by visual inspection and metric assessment according to predefined scoring criteria. Reproduction errors are identified and scored. A reproduction error is a graphic marker signaling an inaccurate reproduction (in whole or in part) of the target waveform. There are four error types: angularity, morphology, perseveration, and spacing. The four error types in the new scoring system were chosen, at the end of an iterative screening process and psychometric property assessment, from a larger pool that included 48 other graphic markers. The four error types were selected as the ones that showed adequate test-retest reliability; the other graphic markers did not fit the measurement model. A summary of the graphic markers that were screened is reported in Supplemental Table 1.
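The per-marker screening logic can be illustrated with a small sketch. The statistic, retention threshold, and data below are hypothetical (the article reports only that retained markers showed adequate test-retest reliability); Cohen's kappa is one common choice for quantifying test-retest agreement of binary (present/absent) markers:

```python
def cohens_kappa(ratings_t1, ratings_t2):
    """Test-retest agreement for one binary graphic marker.

    ratings_t1/ratings_t2: lists of 0/1 scores for the same children
    at test and at retest (hypothetical data, for illustration only).
    """
    n = len(ratings_t1)
    observed = sum(a == b for a, b in zip(ratings_t1, ratings_t2)) / n
    # Chance agreement from the marginal frequencies of each occasion
    p1 = sum(ratings_t1) / n
    p2 = sum(ratings_t2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    if expected == 1.0:  # degenerate case: no variability at all
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical screening rule: retain a marker only if kappa >= 0.6
test = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
retest = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(round(cohens_kappa(test, retest), 2))  # → 0.78
```

A marker scoring below the chosen threshold would be dropped from the candidate pool, which is one plausible reading of how 4 error types were retained from the larger set of screened markers.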

Angularity is an error in changing trail direction during reproduction at peak-, trough-, or flexus points; the presence of a V-shaped graphic marker indicates an angularity error (Figure 1). Morphology refers to an error in reproducing geometrical features of the target waveform, such as periodicity (ie, the number of alternating peaks and troughs), symmetry, constant amplitude (ie, distance between peaks and troughs), line tracing continuity, and direction (Figure 1). Perseveration refers to a sum of peaks and troughs greater than eight (Figure 1). A spacing error is assigned when the measured axis length is 4.8cm or less; the axis length is the metric distance between the two farthest points in the reproduction (Figure 1). A WAVES gallery generates a set of 16 error-type items with 0-to-1 raw scores. The presence of an error type in the reproduction is scored as 1, and its absence as 0. The presence of multiple errors of the same type, such as the occurrence of nine V-shaped markers signaling nine angularity errors in the same reproduction, is scored as 1 (Figure 1).
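As a concrete illustration of the 16-item raw-score structure (4 error types × 4 reproduction modalities), the following sketch scores one hypothetical gallery; the observation field names and values are illustrative assumptions, not the article's actual data-capture format:

```python
MODALITIES = ("CDO", "IRMO", "IRMC", "DRMC")
ERROR_TYPES = ("angularity", "morphology", "perseveration", "spacing")

def score_waves_gallery(gallery):
    """Convert a WAVES gallery into the 16-item 0/1 raw-score set.

    `gallery` maps each modality to hypothetical rater observations:
    the count of V-shaped markers, the count of morphology errors,
    the total of peaks plus troughs, and the measured axis length (cm).
    """
    scores = {}
    for m in MODALITIES:
        obs = gallery[m]
        # Any number of same-type errors in one reproduction scores 1
        scores[(m, "angularity")] = int(obs["v_markers"] > 0)
        scores[(m, "morphology")] = int(obs["morphology_errors"] > 0)
        # Perseveration: more than eight peaks plus troughs in total
        scores[(m, "perseveration")] = int(obs["peaks_plus_troughs"] > 8)
        # Spacing: measured axis length of 4.8cm or less
        scores[(m, "spacing")] = int(obs["axis_length_cm"] <= 4.8)
    return scores

# One hypothetical drawing repeated across all four modalities:
# nine V-shaped markers still yield an angularity item score of 1
drawing = {"v_markers": 9, "morphology_errors": 0,
           "peaks_plus_troughs": 6, "axis_length_cm": 14.2}
gallery = {m: drawing for m in MODALITIES}
scores = score_waves_gallery(gallery)
print(len(scores), scores[("CDO", "angularity")], scores[("CDO", "spacing")])  # → 16 1 0
```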

What WAVES is intended to measure. WAVES was designed to provide measure(s) of the visuographomotor constructional processing (VGCP) underlying the reproduction of a visual image by graphomotor activity (ie, paper-and-pencil line tracing).

We construct VGCP as an integrative function of multiple cognitive processes underlying an accurate reproduction of a regular visual pattern under distinct visually guided and nonvisually guided graphic line tracing modalities. The outcome of VGCP is the resulting graphic line trace. The line trace is the graphic representation that translates a unified, coherent integration of visuospatial properties from the target waveform to its reproduction. As such, the graphic line trace provides a measurable outcome of the underlying functional components. The schematic representation of the functional organization of the VGCP system is outlined in Supplemental Table 2.

A conceptual framework of line drawing. The VGCP system is generative and adaptive. It operates based on the principle of active inference, which assumes the brain uses a generative model to predict the behavior(s) of its internal systems.53–55 Visual processing and graphic production are the two main subsystems,56,57 which are functionally integrated by visuokinesthetic feedback mechanisms. The VGCP system corresponds to specific brain areas and networks.49,55,58 Alterations of functioning in these brain areas and networks may impact the VGCP.49,58

The WAVES test makes cognitive demands on the VGCP system in relation to the execution of a sequence of reproduction tasks. The VGCP system adapts to these demands by: 1) setting the behavioral goal(s); 2) recognizing the target object; 3) mapping the execution of the goal-directed behavior and encoding precision; 4) executing the goal-directed behavior; and 5) developing reinforced learning and process iteration.

Setting the behavioral goals. The reproduction of a simple bidimensional geometrical shape requires the initial appraisal of task demands and the outlining of optimal solutions by the cognitive system. This mapping-forward process sets the behavioral goal to be achieved (ie, the accurate reproduction by line tracing) and the planning of cognitive resources to deliver the execution. The product of this process is a set of assembled visuoconstructional schemas which informs the VGCP system specifications on what and how to deliver (ie, goal-directed policies).

Target object recognition. This process provides an answer to the question, “What should the VGCP system reproduce?” The VGCP system determines the identity of the object targeted for reproduction, which is operationalized in terms of the visuospatial configuration of the parts of the object and of the whole. In other words, the object’s identity informs on the configuration of its features (eg, a periodic waveform with 3 repeated cycles), but these features can themselves inform on configurations of simpler features (eg, the periodic waveform with 3 repeated cycles can be decomposed into or assembled from 12 bell-shaped curves or 3 sinusoidal curves, or maintained as a unitary configuration). A set of assembled visuoperceptual schemas specifying the target object’s defining features is the product of this process.

Likewise, the VGCP system identifies the defining features of the space on paper where the reproduction is to be placed (eg, size, orientation). Once the target object and the space for reproduction are determined, geometric transformations are applied. Examples of spatial transformations include scaling, deforming, normalizing, and constructing a mesh. A mesh specifies the start, peaks and troughs, and endpoints of the target waveform, effectively setting out the expected spatial limits of the reproduction. Furthermore, prior visuokinesthetic and motor representations of the line are complementary to the formulation of representations for target object recognition. The eventual product of these processes is a set of assembled visuomotor constructional schemas mapping forward the activity of reproducing the visual target object in a specified space by line tracing.

Mapping the execution of the goal-directed behavior and encoding precision. Then, the VGCP system faces the challenge, “How to deliver an accurate reproduction?” To solve this, the VGCP system constructs the visuoconstructional schemas that best predict what should happen if the VGCP system looked over there and drew an accurate line trace in the specific space for reproduction. These visuomotor constructional schemas devise a map of the corresponding elements of the target object for reproduction and the characteristics of the line tracing. The map to encode precision constitutes a functional set of reference standards designed to evaluate the accuracy of line tracing as the reproduction task evolves. Therefore, the execution of line tracing is guided by the principle of active inference and based on the already available and continuously updating VGCP system information.

Executing the goal-directed behavior. This process provides an answer to the question, “How accurately is the reproduction unfolding?” The VGCP system solves the problem of executing an accurate graphic reproduction by formulating alternative action sequences (ie, visuographomotor reconstruction decisions on where to look and how and where to draw the line and stop-and-go decisions, eg, where and how to start tracing, where and how to end, how to keep the line as continuous as possible) as a set of operational plans, then selecting the one anticipated to be most efficient.

The VGCP system unpacks the “deliver an accurate reproduction” policy into the set of processes that must be initiated at lower levels of a model to execute the policy.54,55 As visuographomotor reconstruction progresses, line tracing entails visuospatial encoding of visual representations and execution of visuokinesthetic sensory-guided movements as the two fundamental functional components.58 VGCP works by continuously converting visuokinesthetic sensory information into a graphomotor response, and these are optimally tuned to the goal of an accurate reproduction by trading off cognitive resource consumption with accuracy constraints.54,55

Furthermore, as visuographomotor reconstruction evolves, the VGCP system works to prevent the occurrence of a reproduction error. Error prevention is based on the timely detection and forward signaling of a potential error, followed by a consequential adaptive graphomotor response. Detection of a potential error is prompted when visuokinesthetic feedback deviates from the reference standards of line tracing accuracy (ie, schemas of encoded precision). Potential error signaling prompts an adaptive response characterized by immediate remapping of graphomotor activity, resulting in error prevention. On the contrary, untimely (ie, premature or delayed) potential error detection and signaling cause the failure of the adaptive graphomotor response. The process failure results in a reproduction error.

Developing reinforced learning and process iteration. The VGCP system “learns” as it operates a visuographomotor reconstruction. Visuoconstructional schemas are stored in memory and are subject to iterative updates as the VGCP of the reproduction task unfolds and becomes more efficient by successive repetitions.

WAVES graphic markers of inaccuracy. Inaccuracy is the extent to which errors of reproduction inform the efficiency of the underlying functional components. WAVES items represent 16 graphic markers of reproduction error and are estimates of the inaccuracy of reproduction. A higher item score indicates higher inaccuracy of reproduction.

By test design, WAVES items are expected to cluster by reproduction error type (ie, angularity, morphology, perseveration, and spacing) and not by reproduction modality (ie, copy design/reproduction from memory). The VGCP measurement model posits that each dimensional assessment domain, consisting of a specific cluster of items, measures one specific cognitive ability. The VGCP measurement model also specifies that the four dimensions are correlated, as they reflect four core cognitive abilities, each one working complementarily with the others to enable VGCP. The four abilities are operationalized as follows.

Angularity is the ability to effectively deliver trajectory adjustments on time as the visuographomotor reconstruction progresses. The angularity item cluster includes angularity items at CDO, IRMO, IRMC, and DRMC.

Ending is the ability to effectively execute segmentation. Segmentation refers to the use of a step-by-step stop-and-go mode of execution as the visuographomotor reconstruction progresses. Perseveration error is a failure of the stop-and-go mode of execution, resulting in the production of redundant line elements. The ending cluster includes perseveration items at CDO, IRMO, IRMC, and DRMC.

Morphology is the ability to detect, transform, retain, and retrieve the key visual-geometrical characteristics of the target waveform as well as to identify and activate the related sensorimotor pattern(s) for a graphomotor response. Morphology error is interpreted as a failure to deliver a complete forward mapping, resulting in the overall inaccuracy of target reproduction. Defining geometrical characteristics (ie, waveform, continuity, direction, number of peaks and troughs, symmetry, periodicity, amplitude) of the target configuration are not maintained in the reproduction. The morphology item cluster includes morphology items at CDO, IRMO, IRMC, and DRMC.

Spacing is the ability to detect, transform, and retain the key visuospatial characteristics of the target waveform (ie, size, width, length, spatial extension) as well as to identify and activate the related sensorimotor pattern(s) for a graphomotor response. Spacing error is a failure to transfer the visual-spatial scaling transformation of target-defining features (ie, waveform size, width, length, spatial extension) into the reproduction. The spacing item cluster includes axis length decreased items at CDO, IRMO, IRMC, and DRMC.

As WAVES is intended to evaluate cognitive development and health, there are two additional key attributes of the construct of VGCP. The first is that measures of VGCP should be dependent on age: VGCP should become more efficient over time, resulting in a progressive decrease in the inaccuracy of reproduction. Second, VGCP measures should relate to health status and be able to quantify the effect of health/disease status on cognitive development, wherein lower health status should relate to impaired VGCP.

The assessment of WAVES test score dimensionality. Under the conceptual framework of VGCP, WAVES is designed to provide a multidimensional test score. We refer to the dimensionality of a test score as the number of abilities that are measured by the test. The dimensionality of the test score is an important aspect of the validity of an instrument. WAVES test score dimensionality should reflect the functionally related abilities.

Structural equation modeling was applied to validate the VGCP measurement model. Item response theory (IRT) modeling was used in conjunction with confirmatory factor analysis (CFA) for construct validation. CFA and IRT are sophisticated methods for examining the quality of the instrument. Each is understood to provide unique information about item and scale metric properties that build the argument for the reliability and validity of VGCP scores and their use in clinical research settings.

We executed a multistep analytic plan, initially applying IRT models to determine the basic psychometric properties of the 16 items and to identify the most parsimonious set of items to retain. CFAs were then performed for the systematic comparison of alternative a priori factor structures based on systematic fit assessment procedures. The most relevant aspects of the CFA analytic plan are summarized below.

We compared the WAVES Correlated 4-Factor “Error Type” model to three other measurement models: a) Correlated 2-Factor, b) 1-Factor, and c) Higher-order Factor.

The WAVES Correlated 4-Factor “Error Type” model requires VGCP items distributed across four subgroups by reproduction error type and was tested as the preferred model.

The correlated factors model includes two or more latent constructs (ie, VGCP abilities) which are allowed to correlate. Observed variables are grouped by shared characteristics and are indicators for a factor assumed to reflect this commonality. This also explicitly models the multidimensionality of a test. The correlated factors model does not incorporate any general or underlying factor; however, the correlations between each of the latent constructs (ie, VGCP abilities) indicate shared variation across all pairs of latent constructs in the model. Loadings indicate the strength of the relationship between the observed variables and their associated factor. Error terms are estimated against each observed item variable. Each observed item variable in the model is assumed to be only associated with a single factor (ie, the congeneric measurement model).

We compared the model fit of a Correlated 4-Factor “Error Type” versus a Correlated 2-Factor “Copy/Memory.” As the alternative model, the “Copy/Memory” model assumes WAVES measures visuoconstructional processing (ie, evaluated by copy design) and imagery/memory (ie, reproductions from memory at IRMO, IRMC, and DRMC) as two distinct and correlated abilities. The copy design item cluster included the morphology, angularity, perseveration, and spacing items at CDO; the memory cluster included the remaining 12 items.

The 1-Factor unidimensional model posits a single common factor to explain the covariance (or correlation) among all test items, with no differentiation between subgroups of items. The unidimensional model is particularly valuable, as it can be used to model items measuring various aspects of a construct on the same scale and report a single score to represent the VGCP ability of the child. The key question when testing the 1-Factor measurement model was, “Can we rule out that the WAVES test score is unidimensional?” or, in other words, to what extent can a substantial proportion of variance in observed test score be explained with sole reference to the underlying VGCP construct?

We also examined the Higher-order Factor model. The Higher-order Factor model includes a superordinate higher-order factor (ie, VGCP) and four subordinate first-order factors upon which specified subgroups of items load. This higher-order factor VGCP explicitly models the shared variance between subordinate first-order grouping factors. Subordinate first-order grouping factors are conditionally independent of one another, and each one mediates the relationship between the superordinate VGCP factor and the observed item variables.

Model fit indexes with different measurement properties and cut-offs were applied: the χ2 goodness of fit test,59 a χ2/df ratio of 3 or less,60 the comparative fit index (CFI)61 with a cutoff of 0.95 or greater, the Tucker-Lewis index62 with a cutoff of 0.95 or greater, the root mean square error of approximation (RMSEA)63 with a cutoff of 0.06 or less, and the standardized root mean residual64 with a cutoff of 0.08 or less. Model fit indices measure the degree to which a pattern of parameters specified in the model is consistent with the pattern of variances and covariances from a set of observed data. The RMSEA is an absolute fit index that evaluates how far a speculative model is from a perfect model. The CFI and Tucker-Lewis index are incremental fit indices comparing the fit of a speculative model with that of a baseline model (ie, a model with the worst fit).
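As an illustration, the arithmetic behind these cutoffs can be sketched from the model and baseline χ2 values. This minimal Python sketch is ours, not the study software; the N−1 convention for RMSEA is one of several in use:

```python
from math import sqrt

def rmsea(chi2, df, n):
    # Root mean square error of approximation (absolute fit);
    # this common form uses N - 1 in the denominator.
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    # Comparative fit index: improvement of the specified model (m)
    # over the baseline worst-fit model (b), clipped to [0, 1].
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 - num / den

def tli(chi2_m, df_m, chi2_b, df_b):
    # Tucker-Lewis index, penalizing complexity via the chi2/df
    # ratios of the specified and baseline models.
    return ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
```

With hypothetical values such as χ2=110 on 98 df for a well-fitting model (N=420) against a baseline χ2=2000 on 120 df, all three indices clear the cutoffs quoted above.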

The combination of the above model fit indices was intended to minimize the methodological risk of Type I (ie, the probability of rejecting the null hypothesis when it is true) and Type II (ie, the probability of accepting the null hypothesis when it is false) errors under various conditions. In other words, the risk of rejecting a good measurement model or accepting a bad one.

SYMBOLS task description, construct, administration, and outcomes. SYMBOLS is based on the administration of a speeded and continuous target cancellation task.17 The data capture sheet (Figure 2) is directly placed in front of the child showing an organized, monochromatic array with a dimension of 25cm × 17cm, containing 324 familiar black symbols (*, <,  +), 108 targets, and 216 distractors, equally distributed across the four quadrants, with 27 targets and 54 distractors each. The child is asked to indicate the target of the cancellation task by selecting only one of the three symbols. Then the child is instructed to mark every corresponding target with a single slanted line as fast as they can until the task is completed.

SYMBOLS was designed to provide measures of attentive behavior. The structure of the cancellation task satisfies key requirements to test the construct of selective attention. First, the task allows for selective attention to occur (ie, the child can exercise selective attention in full by inhibiting response to distractors and by focusing on targets to hit simultaneously). Second, it provides measurement outcomes of the extent to which a child attends to the targets (ie, frequency of hits). The task involves a conscious, intentional, and continuous effort to maintain the efficient management of visuoperceptual-motor processing of a specific goal-oriented task (ie, signaling of true targets as avoiding distractors). Finally, the time constraint is applied to maximize the cognitive demand for resourcing and the gradient of attentional capabilities during the cancellation task. A time limit of 180 seconds makes SYMBOLS unable to test sustained attention, which requires a continuous performance task lasting several minutes.

SYMBOLS involves a speeded task under a time constraint for completion. The sum of hits and the task processing time in seconds are the noted outcomes. A hit is defined as a correctly signaled target; there are 108 targets (ie, instances of the target symbol chosen by the child) to hit, 27 for each quadrant. A Visuomotor Processing Index is obtained as the ratio of the sum of hits to the task processing time (in seconds). The Visuomotor Processing Index is constructed as a measure of visuographomotor processing speed. Visuomotor Processing Index values range between 0 (ie, the task was not even processed: no hits in 180 seconds) and 108 (ie, the hypothetical case where the cancellation task was perfectly processed, with 108 hits delivered in one second).
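The index arithmetic can be sketched as follows (a hypothetical helper for illustration, not the study's scoring code):

```python
def vmp_index(hits: int, processing_time_s: float) -> float:
    """Visuomotor Processing Index: correctly signaled targets per second.

    hits: number of correctly marked targets (0-108).
    processing_time_s: task completion time in seconds (up to the 180 s limit).
    """
    if not 0 <= hits <= 108:
        raise ValueError("hits must be between 0 and 108")
    return hits / processing_time_s

# Boundary cases described in the text:
# no hits in the full 180 s -> 0; 108 hits in one second -> 108.
```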

A schematic representation of the functional organization of the visuomotor processing system is outlined in Supplemental Table 2.

Field trial, study design, participants, and investigational sites. A cross-sectional field trial including an ancillary test-retest reliability study on a selected subgroup of participants was executed. The study protocol obtained approval from the National Health Research Authority and the University of Zambia Biomedical Research Ethics Committee (reference number 008-08-165, approval on February 27, 2018). The study was conducted following the International Guidelines for Good Clinical Practice and the Declaration of Helsinki65 and Zambian legal and regulatory requirements.

Children aged 5 to 17 years (inclusive) with access to formal education, experience and familiarity with cognitive tasks like those required by the CAT-PCR (ie, cancellation task and drawing), and the ability to communicate effectively with the examiner were eligible for participation. Exclusion criteria included the presence of a serious health condition currently requiring in-patient hospitalization or significant cognitive, sensory, or motor disabilities.

Participants and sites for test administration represented a broad sample of children and settings for which CAT-PCR use is intended. The study population was recruited from multiple clinical and school sites in two diverse regions of Zambia, the Lusaka Metropolitan Area and the Copperbelt Province. Schools and clinical sites were purposely selected to provide a heterogeneous sample representative of the wide socioeconomic, cultural, and linguistic, as well as medical and psychosocial, conditions affecting the health and cognitive diversity landscape of the Zambian pediatric population.

Clinical sites were the outpatient clinics at the University Teaching Hospital Department of Paediatrics and Child Health in Lusaka—the highest referral hospital in Zambia, delivering specialized care in pediatrics and receiving children from all over the country—and tertiary clinics with local referrals in the Copperbelt Province, Ndola area. More details on eligibility criteria, informed consent procedures, investigational sites, and enrolment are reported elsewhere.16,17

Data collection methods and analysis. Sociodemographic and relevant health-related information was collected after obtaining the informed consent and before cognitive assessments by structured interview based on an ad hoc-designed questionnaire (Supplemental Table 3). WAVES and SYMBOLS were administered to all participants, scored, and data entered into the clinical database.

STATA software version 15.1 was used for statistical analyses.66

A multistep approach was applied for data analysis. Initially, descriptive statistics, nonparametric tests, and parametric tests were performed on variables of interest. Second, WAVES item analysis, including differential item functioning, was conducted. Third, the WAVES measurement model was evaluated using structural equation modeling, including IRT and CFA models. WAVES and SYMBOLS score reliability was further evaluated in the test-retest study and their convergent validity was determined. Fourth, the effect of age, sex, health status, and other psychosocial factors on test scores was evaluated. Finally, a percentile rank distribution of test scores adjusted by age was obtained. We adopted the following criteria for the strength of linear correlation: r small equals 0.10, r medium equals 0.30, r large equals 0.50; and for effect size: d small equals 0.20, d medium equals 0.50, d large equals 0.80 or greater.67,68
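For instance, the Cohen's d benchmarks adopted here can be sketched with a small helper (pooled-SD formula; the function names and label bands are ours, for illustration only):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    # Cohen's d for two independent groups using a pooled standard deviation.
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def effect_label(d):
    # Benchmarks cited in the text: 0.20 small, 0.50 medium, 0.80+ large.
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "negligible"
```

Applied to the healthy versus poor-health VGCP Index means reported later in the Results (5.17 vs 6.70, both SD 3.1), this reproduces the d of about 0.49 quoted there.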

Results

Table 1 summarizes sample characteristics. The sample included a group of children in apparent good health condition at the time of assessment (healthy group), although they may have been suffering from a chronic health condition that does not represent an important risk or causative factor for cognitive impairment. The poor health group included children suffering from at least one chronic health condition that is a recognized important risk or causative factor for cognitive impairment and requiring a therapeutic intervention at the time of study assessment.

The poor health group consisted of three subgroups characterized by different adverse risk/causative factor profiles: neurological (NEU), psychosocial stress and deprivation (PSD), and nonneurological internal medicine (MED). The NEU subgroup had 20 children with electroencephalogram (EEG)-confirmed epilepsy, neurological sequelae of sickle cell disease, malaria, or other postinfective conditions. These children may also have suffered from other concurrent chronic medical conditions (ie, controlled and stable diabetes mellitus or stage 3–4 HIV). The PSD subgroup was made up of 46 children who had prolonged experience of exposure to extreme poverty and/or with poor psychosocial support for early or later development. This group included adolescents with a history of behavioral issues, alcohol, and drug substance abuse. They also may have suffered concurrent health conditions with the exclusion of NEU group neurological-type conditions. The MED subgroup included 24 children with a primary diagnosis of at least one chronic disease currently requiring therapeutic intervention resulting in a stable condition (ie, controlled diabetes, controlled anemia, controlled severe renal or cardiac conditions, tuberculosis, stage 1–2 HIV disease) and exclusion of NEU-group or PSD-group type conditions. Children with a history of nonrecurrent malaria were assigned to the healthy group. Diagnostic classification was based on the medical review of the child’s available health information, as made available by medical history, clinical and neurological examination, instrumental diagnosis, and reporting from reliable proxies.

All participants had access to formal education and were regularly attending school. The most widely spoken languages in the regularly school-attending sample of study participants were Bemba (81.1% of the participants), English (72.6%), Nyanja (22.8%), Tonga (1.9%), and Lozi (1.2%). Children able to speak one, two, or three or more (ie, Bemba, Nyanja, English, and French) languages composed 23.5, 10.0, and 66.5 percent of participants, respectively. An urban variety of Nyanja, Chewa (spoken by 0.6% of participants), is the lingua franca of the capital, Lusaka, and is used for communication between speakers of different languages.

All participants completed CAT-PCR evidencing satisfactory feasibility of test use.

WAVES IRT item-level analyses. The main concept in an IRT model is the item characteristic curve. The item characteristic curve describes the probability that a person succeeds on a given item. Even though it may appear counterintuitive, in the WAVES measurement model, a person succeeds when they make a reproduction error. We have the probability of 16 different reproduction errors. Item characteristic curve is determined for every single item and is different for each item. The probability of success on an item is a function of both the level of ability and the properties of the item. The value level of the ability for a given person is defined as the person’s location and is denoted by θ. The item properties, defined as difficulty and discrimination, are estimated in the IRT model. The difficulty parameter, or item location, commonly denoted by b, represents the location of an item on the ability scale. The discrimination parameter, denoted by a, is related to the slope of the item characteristic curve. This item parameter informs how fast the probability of success changes with ability near the item difficulty. An item with a large discrimination value has a high correlation between the ability and probability of success on that item and can differentiate better between low and elevated levels of the ability. Ideally, an instrument intended to differentiate between all levels of the ability should contain items with difficulties spread across the full range of the ability.
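The item characteristic curve described above has a simple closed form under the two-parameter logistic model used later in the analysis. The following sketch (assuming the standard logistic parameterization, without the 1.7 scaling constant) illustrates how a and b shape the probability of success:

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic item characteristic curve.

    theta: person ability (location).
    a: discrimination (slope of the curve at its inflection point).
    b: difficulty (item location on the ability scale).
    Returns the probability of success on the item.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

At theta equal to the item difficulty b, the probability of success is exactly 0.5; larger a values make the curve rise more steeply around b, which is what allows an item to differentiate between nearby ability levels.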

An item characteristic curve for each item was plotted and examined. Estimates of IRT item slope parameters (a-parameters) and location (b-parameters) using marginal maximum likelihood methods are shown in Table 2.

We found the 16 items covering a wide range of the item difficulty spectrum, with morphology DRMC showing the lowest value (–1.24) and axis length decreased CDO showing the highest value (3.23). Item discrimination estimates, ranging between axis length decreased (0.91) and perseveration CDO (3.31), suggest items are discriminating; that is, in the vicinity of a given difficulty estimate, any two children with a distinct level of ability would have different predicted probabilities of successfully responding to an item.

The item information function (ie, the amount of information an item provides at its estimated difficulty parameter for estimating the ability) was calculated for each item. The height of an item information function and therefore the amount of information an item provides around the difficulty parameter is proportional to the item’s estimated discrimination. Table 2 shows that the items perseveration CDO and angular CDO are the most discriminating and have the steepest item information functions. We found confirmation that all items were discriminating.
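Under the 2-PL model, the item information function reduces to a²P(θ)(1−P(θ)), which peaks at θ = b with height proportional to the squared discrimination, consistent with the pattern described above. A minimal sketch (ours, for illustration):

```python
import math

def item_information(theta: float, a: float, b: float) -> float:
    # 2-PL item information: a^2 * P * (1 - P).
    # Peaks at theta == b; height grows with the squared discrimination.
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)
```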

WAVES differential item functioning. Differential item functioning occurs when an item that is intended to measure the ability is unfair, favoring one group of individuals over another. An item has a differential functioning across individuals with the same ability level if these individuals have different probabilities of providing a given response. As a prerequisite for subsequent comparisons of WAVES scores across sex- and health-status groups, an initial assessment of differential item functioning was carried out.

Determining differential item functioning requires assessing whether a test item behaves differently across respondents with the same value of the ability. We calculated Mantel-Haenszel (MH) χ2 and the common odds ratio (OR) for dichotomously scored items. The MH statistics were used to determine whether an item exhibits uniform differential item functioning between two observed groups, that is, whether an item favors one group relative to the other for all values of the ability. The MH test was used to determine whether two dichotomous variables are independent of one another after conditioning on a third variable. In our case, one dichotomous variable represents group membership (ie, the reference group, male children, and the focal group, female children), and the other represents the response to an item, scored as accurate or inaccurate. The conditioning variable is the ability (ie, VGCP) as represented by the observed total score. For items that exhibit uniform differential item functioning, an odds ratio was used to assess the amount and direction (ie, the extent to which one of the two groups is unfairly treated) of differential item functioning. In addition, we used logistic regression to test whether an item exhibits both uniform and nonuniform differential item functioning, that is, whether an item favors one group over the other for all values of the ability or only for some values of the ability.
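The MH common odds ratio pooled over ability strata can be sketched as follows (a pure-Python illustration of the pooling step only; the χ2 significance test would be layered on top of this in practice):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across ability strata.

    strata: iterable of 2x2 tables (a, b, c, d), where rows are the
    reference/focal groups and columns are inaccurate/accurate responses,
    one table per stratum of the conditioning total score.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n  # concordant cell products, weighted by stratum size
        den += b * c / n  # discordant cell products, weighted by stratum size
    return num / den
```

An OR near 1 indicates no uniform DIF; values away from 1 quantify how strongly (and in which direction) one group is favored.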

With regards to uniform differential item functioning based on sex group, the analysis showed that two items favored the male group relative to the female group for all values of the ability: perseveration IRMO (MH OR: 2.52, χ2=3.86, p=0.033) and angular IRMC (MH OR: 1.76, χ2=4.32, p=0.038). Two other items favored the female group relative to the male group for all values of the ability: morphology CDO (MH OR: 0.51, χ2=4.69, p=0.030, logistic regression χ2=5.84, p=0.016) and angular CDO (MH OR: 0.46, χ2=6.19, p=0.013, logistic regression χ2=5.92, p=0.015).

With regards to health status, the MH analysis showed no items with differential functioning (all p-values greater than 0.05); logistic regression showed uniform differential item functioning for perseveration CDO (uniform χ2=5.19, p=0.02), perseveration IRMO (uniform χ2=3.85, p=0.05), and angular CDO (uniform χ2=5.44, p=0.02).

WAVES IRT test level analysis. We determined how each item and the whole group of 16 items relate to VGCP. We ran an IRT analysis on binary items applying 1-parameter logistic (1-PL), 2-parameter logistic (2-PL), and 3-parameter logistic (3-PL) models with postestimation to test the hypothesis of unidimensionality of test score. The assessment of item fit showed an IRT 2-PL model as the best fit of data compared to 1-PL and 3-PL alternative models. Summary results are reported in Table 2.

The test information function, obtained by summing up the item information functions, was used to obtain precise estimates of a person’s ability level at specified intervals. The test information function plot indicates how precisely the instrument can estimate person locations. We found that the set of 16 items provided maximum information for persons located in the range of θ values between approximately –1.0 and 2.0, with the peak of information equaling 8 at the value of θ equal to 0. Moving away from that point in either direction, the standard error of estimation increases, and the instrument provides less and less information about θ. The test information function curve was steep, indicating that the set of items is most precise at the median of the ability range.

Finally, we constructed a test characteristic curve plotting the expected scores against the ability and found these expected scores correspond with the VGCP ability locations at 0.64 and 11.60 for values of θ equal to –1.96 and 1.96, respectively. Higher scores reflect a higher level of inaccuracy of reproduction.

The initial assessment of psychometric properties suggested that all 16 items should be retained, and the instrument showed satisfactory reliability. A WAVES test score could be derived by summing up single-item scores.

WAVES CFAs. The analysis was conducted on 420 cases, with a ratio of sample size to model variables greater than 25, representing a medium-sized sample adequate for CFA.69 WAVES items did not appear to be normally distributed, as the univariate and multivariate normality tests led to rejection (at the 5% level) of the null hypothesis. The assumption of univariate normality was evaluated first using Kolmogorov-Smirnov tests on each of the indicators. Multivariate normality was examined by the multivariate kurtosis test (p=0.000), multivariate skewness test (p=0.000), Henze-Zirkler’s consistent test (p=0.000), and Doornik-Hansen’s omnibus test (p=0.000). No data manipulations, such as transformations like the square-root transformation designed to improve the distribution of measured item variables, were performed.

The variance-covariance matrix was chosen as the input, with maximum likelihood with the Satorra-Bentler correction as the estimation procedure.70 Maximum likelihood carries the assumption of multivariate normality, even though maximum likelihood estimation may perform well with mild departures from multivariate normality. We applied the Satorra-Bentler robust correction because past research found that failure to meet the assumption of multivariate normality can lead to an overestimation of the χ2 statistic, hence an inflated risk for Type I error, downward-biased standard errors, and undermined assumptions relative to ancillary fit measures.71

Model fit statistics are reported in Table 3. The Correlated 4-Factor model successfully met the criteria for an acceptable fit of observed data. The alternative 1-Factor, Correlated 2-Factor, and Higher-order Factor measurement models did not fit the observed data. Model fit indices suggest these three models can confidently be rejected, as their values fall far from the cutoffs for acceptable fit.72

A path diagram with a pictorial representation of the model and reporting of standardized parameter estimates (ie, factor pattern coefficients, error variances, factor correlations) is shown in Figure 3.

All items have standardized loadings higher than 0.5. We found no standardized residuals higher than |4.0|, a threshold above which a potentially unacceptable degree of error in the model would be suggested. The results confirm that all loadings in the model are highly significant, as required for convergent validity. Furthermore, the Correlated 4-Factor measurement model denotes robust discriminant validity because it does not contain any cross-loadings among either the measured variables or the error terms. Taken together, these results support the convergent and discriminant validity of the WAVES measurement model. Thus, we can be confident that WAVES behaves as we would expect in terms of the unidimensionality of the four measures (angularity, ending, morphology, and spacing) and in the way the constructs they reflect relate to each other.

WAVES scale item scoring and construct reliabilities. Upon satisfactory assessments of WAVES reliability, differential item functioning (ie, measurement noninvariance), and test score dimensionality, a VGCP Index was calculated by using unweighted procedures involving summing the 16 raw item scores for each study participant. VGCP Index ranges from 0 to 16 with a higher value indicating higher inaccuracy of reproduction. Cronbach’s α73 was calculated as a point estimate of internal consistency for the VGCP Index and four subscales. We applied a cut-off value of α of 0.80 or greater to determine the threshold for acceptability of reliability estimates for the intended clinical research use of the instrument.74 Cronbach’s α values were 0.82 for VGCP Index, 0.75 for angularity, 0.80 for ending, 0.70 for morphology, and 0.80 for spacing.
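The unweighted summing and the internal-consistency estimate can be illustrated with a small helper (raw-score Cronbach's α; a hypothetical sketch, not the study's code):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha from per-person item-score rows.

    item_scores: list of lists, one row per child and one column per item
    (here, 0/1 error indicators such as the 16 WAVES items).
    """
    n = len(item_scores)
    k = len(item_scores[0])

    def variance(xs):
        # Sample variance with the n - 1 denominator.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])  # unweighted sum score
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

Alpha approaches 1 when items covary strongly (the sum score variance dwarfs the item variances) and falls toward 0 when items are unrelated.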

Reliability of CAT-PCR scores. Table 4 shows the results of the test-retest evaluation. Reliability, as stability of the measure, was good to excellent for the WAVES VGCP Index and scores and for the SYMBOLS VMP Index and task processing time scores. The SYMBOLS hit score showed unsatisfactory reliability point estimates.


Effect of sex, age, and other psychosocial indicators on CAT-PCR scores. The effect of sex was evaluated by comparing two groups of 148 male and 182 female healthy participants using one-way ANOVA and was not significant (VGCP Index [male, mean (M): 5.22, standard deviation (SD): 3.4 vs. female, M: 5.13, SD: 2.9, at the p<0.05 level for the two conditions, F(1,328)=0.08, p=0.781], VMP Index [male, M: 0.78, SD: 0.3 vs. female, M: 0.83, SD: 0.3, at the p<0.05 level for the two conditions, F(1,328)=2.05, p=0.153]).
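The one-way ANOVA F statistic underlying these group comparisons can be sketched in pure Python (illustrative data only, not the study sample):

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for k independent groups (a minimal sketch)."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)
```

When the group means coincide, as they nearly do for the sex comparison above, the between-group sum of squares is close to zero and F is small, yielding a nonsignificant result.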

VGCP and VMP Indexes were associated with age with a medium strength of correlation (Table 5).

The percentile rank of test scores by age group is reported in Table 6. In healthy participants, the ability to speak multiple languages was associated with VGCP Index (single language, M: 6.57, SD: 3.1 vs. multiple languages, M: 4.79, SD: 3.1, at the p<0.05 level for the two conditions, F(1,321)=17.43, p=0.000) and VMP Index (multiple languages, M: 0.86, SD: 0.3 vs. single language, M: 0.55, SD: 0.2, at the p<0.05 level for the two conditions, F(1,321)=51.85, p=0.000). Similarly, being an English speaker was also associated with VGCP Index (no English, M: 6.42, SD: 3.1 vs. English, M: 4.74, SD: 3.1, at the p<0.05 level for the two conditions, F(1,321)=17.60, p=0.000) and VMP Index (English, M: 0.87, SD: 0.3 vs. no English, M: 0.59, SD: 0.2, at the p<0.05 level for the two conditions, F(1,321)=47.70, p=0.000). These findings suggest an association between language, selective attention, and visuoconstructional processing. Healthy children with multilanguage skills showed higher performance on both measures compared to single-language children.

Delayed access to school showed no significant association with VGCP Index (M: 5.11, SD: 3.1, F(1,313)=0.32, p=0.574) or VMP Index (M: 0.80, SD: 0.3, F(1,313)=1.77, p=0.184).

Effect of health status. The effect of health status was evaluated by comparing one group of 90 participants with poor health status and one group of 330 healthy participants using one-way ANOVA. The healthy group showed more favorable indicators of social functioning (ie, the ability to speak multiple languages, the timely access to education) than the poor health group (Table 1). The two groups were comparable by months of age (healthy, M: 134.86, SD: 39.3 vs. poor health, M: 142.48, SD: 35.6, F(1,418)=2.76, p=0.097). Female participants were more frequently represented in the healthy group (Table 1). Health status influences CAT-PCR scores with medium effect sizes (VGCP Index, healthy, M: 5.17, SD: 3.1 vs. poor health, M: 6.70, SD: 3.1, at the p<0.05 level for the two conditions, F(1,418)=16.92, p=0.000, d=0.49; VMP Index, healthy, M: 0.80, SD: 0.3 vs. poor health, M: 0.62, SD: 0.3, F(1,418)=24.0, p=0.000, d=0.58). Healthy children showed higher performance on both measures compared to children with poor health.

Two-way ANOVAs showed that the effect of health status on CAT-PCR scores was maintained once the effects of covariates were controlled for. On VGCP Index, the health status effect was maintained when controlled for sex (F(1,417)=15.7, p=0.001, η2=0.04), the ability to speak multiple languages (F(1,410)=12.2, p=0.0005, η2=0.03), and delayed access to education (F(1,399)=14.8, p=0.0001, η2=0.04). All statistical assumptions relating to the two-way ANOVA were met. On VMP Index, the health status effect was maintained when controlled for sex (F(1,417)=21.5, p=0.000, η2=0.05; Breusch-Pagan/Cook-Weisberg test for heteroskedasticity, χ2=5.1, p=0.024), the ability to speak multiple languages (F(1,410)=15.5, p=0.0001, η2=0.04; test for heteroskedasticity, χ2=14.1, p=0.0002), and delayed access to education (F(1,399)=19.5, p=0.000, η2=0.05; test for heteroskedasticity, χ2=5.4, p=0.020). Notably, the evidence of heteroskedasticity suggests a cautionary approach to the interpretation of these findings. Overall, the effect sizes of health status on VGCP and VMP Indices were small to medium.67,68

Finally, the combined effect of health status and age on the CAT-PCR Indices was evaluated by comparing poor health and healthy subgroups within the childhood (n=116, 5–8 years old), late childhood (n=96, 9–11 years old), and adolescence (n=208, 12–17 years old) age groups. Table 7 shows the summary results.

A two-way ANOVA revealed a statistically significant interaction between the effects of health status and age group on the VGCP Index. In contrast, the interaction between the effects of health status and age group on the VMP Index did not reach statistical significance (p=0.082). Main effects analysis showed that both health status and age group had a statistically significant effect on the VMP Index. A post-hoc multiple comparison test was conducted to determine which age groups differed significantly from each other (Table 8). The post-hoc test indicated that Group 1 had significantly higher mean scores than Group 2 (p<0.05) and Group 3 (p<0.01); Group 2 and Group 3 also differed significantly from each other (p<0.05).

Overall, both health status and age group influenced the VGCP and VMP Indices, though through different modalities. Cognitive functioning, as measured by the VGCP and VMP Indices, improves with age. Younger children made more reproduction errors, and their visuomotor speed was slower compared to older groups, as reflected by higher VGCP Index and lower VMP Index scores. Health status affected cognitive functioning; however, whereas the extent of VGCP impairment increased with age, the same relation was not demonstrated for VMP.

Discussion

The exposure to multiple risk factors for altered cognitive functioning makes children in Sub-Saharan regions a particularly vulnerable health population.1–3,48 Clinical research is an effective knowledge development strategy for characterizing the effects of widespread diseases and adverse psychosocial conditions on pediatric cognitive development and health. As such, clinical research also sets the foundation for evidence-based decision-making on health-promoting interventions. However, the scarcity of valid measurement tools is one of the barriers preventing efficient study design and assessment of cognitive health outcomes in pediatric clinical research. The development and adoption of psychometric instruments that are easy to use, informative, cost-efficient, and validated in the context of their intended use offer a strategic solution for improving cognitive health assessment in clinical research.

CAT-PCR is a new brief paper-and-pencil cognitive test battery with a user-friendly, fast (<15 minutes), and streamlined procedure for administration and scoring by qualified healthcare professionals. CAT-PCR consists of the standardized administration of two nonverbal tests, WAVES and SYMBOLS. CAT-PCR’s intended uses are the assessment of selective attention and visuographomotor constructional processing over time and the detection of their impairment in a school-aged population. CAT-PCR provides indices of VGCP and VMP. The VGCP Index summarizes the accuracy of reproduction of a regular visual pattern under distinct visually guided and nonvisually guided modalities of reproduction by graphomotor activity (ie, graphic line tracing). The VMP Index is an indicator of a child’s ability to exert attentional behavior requiring selective attention, measuring the extent of accurate graphomotor response under a time constraint. Both CAT-PCR Indices have good reliability (ie, internal consistency and stability over repeated assessments) when the purpose of the measurement is to provide an assessment of cognitive status. Reliability point estimates are adequate for the use of CAT-PCR in clinical research settings.
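The stability estimates cited in this article are intraclass correlations. As an illustration of how such a coefficient is computed, the sketch below implements the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss on hypothetical test-retest pairs (the study’s exact ICC model and raw data are not reproduced here):

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of (test, retest) pairs, one per subject.
    """
    n = len(scores)      # subjects
    k = len(scores[0])   # sessions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA decomposition of total variability
    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical test-retest scores for six children (illustrative data only)
pairs = [(3, 4), (5, 5), (7, 8), (2, 3), (6, 6), (8, 9)]
print(round(icc_2_1(pairs), 2))  # 0.94
```

Guidelines such as Koo and Li75 treat values in the 0.75 to 0.90 range as good reliability, consistent with the test-retest coefficients of 0.84 to 0.85 reported for the CAT-PCR Indices.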

The validation of the novel construct of VGCP was an important aspect of the study. Study results provide supporting evidence of the VGCP Index’s construct validity, that is, the extent to which the set of measured items reflects the theoretical latent constructs those items were designed to measure. Typically, there are two sources of uncertainty built into any validation of a test that uses a summary index to reflect multiple dimensions (ie, the underlying abilities). The first is that one cannot know the nature of the different dimensions’ contributions to the index, and hence to correlations of the index with measures of other constructs. The second is that the index is likely to reflect different combinations of constructs for different cases of the observed sample. We applied structural equation modeling methodologies to address the dimensionality of the VGCP Index, determining how many dimensions contribute to it and to what extent. We obtained a robust demonstration of the VGCP Index’s multidimensionality by directly comparing alternative models of relationships among constructs. We found that a Correlated 4-Factor model is the preferred solution when compared with other measurement model solutions (ie, 1-Factor, Correlated 2-Factor, Higher-order Factor). At this stage of the validation process, we believe that limited options exist to improve the measurement model based on currently available empirical data.
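Competing measurement models of this kind are typically ranked by incremental fit indices such as the CFI61 and TLI.62 As a sketch of how these indices are derived, both can be computed from the fitted model’s and the baseline (independence) model’s chi-square statistics; the values below are hypothetical, not the study’s results:

```python
def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index (Bentler, 1990) from model and baseline chi-squares."""
    d_m = max(chi2_m - df_m, 0.0)          # model noncentrality
    d_b = max(chi2_b - df_b, d_m)          # baseline noncentrality
    return 1.0 - d_m / d_b

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index) from the same statistics."""
    return ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)

# Hypothetical fit of a correlated 4-factor model against its baseline model
print(round(cfi(50.0, 29, 900.0, 36), 2))  # 0.98
print(round(tli(50.0, 29, 900.0, 36), 2))  # 0.97
```

Under conventional cutoffs (eg, CFI and TLI near or above 0.95),59 such hypothetical values would indicate acceptable fit; model selection then weighs fit against parsimony across the candidate solutions.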

Study results support the validity and utility of CAT-PCR measures in evaluating a child’s cognitive development. CAT-PCR Indices were not dependent on sex. We also found that CAT-PCR measures change with age, indicating a progressive improvement of functioning from early childhood to middle childhood that then stabilizes through adolescence (12–17 years old). Our findings are consistent with results from other studies with Sub-Saharan and non-Sub-Saharan/Western samples in which performance on selective attention tasks44,46,48,76–78 or the accuracy of reproduction in visuographomotor drawing improved with age and plateaued in adolescence.79–83 Our findings suggest that the development of VGCP abilities corresponds to the progressive emergence of more efficient processes for target object visual recognition, reproduction error detection, and adaptive graphomotor response. By the age of 12 years, this development is complete, and the sequence of sinusoidal curve waveforms is reproduced with a high degree of accuracy.

Study results also demonstrate the validity and utility of the CAT-PCR Indices in evaluating the effect of health-related risks or protective factors on a child’s cognitive functioning and development. CAT-PCR Indices were affected by health status, suggesting that poorer health status is associated with lower cognitive functioning. The study sample included a large group of children without obvious clinical signs of cognitive impairment or significant visuo-sensory-motor disability who regularly attended school. CAT-PCR was able to detect differences in cognitive performance between two comparable subgroups of children with or without conditions associated with increased risk for cognitive impairment. The composition of the poorer health group reflected a wide spectrum of conditions, such as chronic neurologic or medical disease or prolonged exposure to psychosocial stress and deprivation. Furthermore, the effect of health status on CAT-PCR scores was maintained once the effects of covariates such as sex, the ability to speak multiple languages, and timely access to education were controlled for. Further investigations should address whether other socioeconomic factors are significant covariates driving some of the health status effects on CAT-PCR scores.

Findings suggest CAT-PCR may have utility in the diagnosis of pediatric mild cognitive impairment, as milder cognitive dysfunctions are not evident on standard clinical and neurological assessment or through parental and teacher behavioral observation.

Finally, we found an association between language, selective attention (measured as the VMP Index), and visuoconstructional processing (as the VGCP Index) in healthy children. This finding is relevant as it provides further evidence supporting the validity of the CAT-PCR Indices. The ability to speak and switch among multiple languages is common in Zambian children. More than 30 major languages are spoken in Zambia.11 There is a complex situation of language use, multilingualism, and code-switching, in which children employ different languages in different social contexts. For example, a child in Lusaka may use Bemba as the home language but use Nyanja and English as languages of wider communication at school. Using multiple languages is an important aspect of cognitive functioning. Our findings are consistent with results from other studies suggesting that multilanguage use places a major processing demand on the child’s cognitive system: it leads to parallel activation of and competition between languages, requiring the child to selectively prioritize one and inhibit the other.84–86

Limitations. The study protocol was designed to deliver an assessment of the test’s feasibility of use, reliability, and validity. Establishing validity is a comprehensive and iterative process, encompassing all sources of evidence supporting the rationale, adequacy, and appropriateness of specific interpretations of a score from a measure, as well as actions based on such interpretations.87 Although study results are encouraging, the study design allowed only initial considerations on CAT-PCR validity.

Another important limitation of our research relates to the characteristics of the study sample, which is not representative of the entire Zambian school-aged population due to its limited size and composition. For instance, virtually all participants were from an urban area, whereas about two-thirds of the Zambian pediatric population lives in rural areas.11 It should also be noted that all participants had access to formal education and attended school regularly. School-aged children attending school irregularly or not at all were not represented in the study sample. In this regard, robust evidence shows that exposure to formal schooling is associated with improved performance on cognitive testing.88 At this stage of the instrument validation, given the urban origin and education level of the sample and the significant differences in the educational experiences of rural children, CAT-PCR may not be applicable to rural school-aged populations.

Cognitive measurements should be validated across culturally, socioeconomically, and geographically diverse and balanced Sub-Saharan groups. Unfortunately, constraints on the availability of human and economic resources to conduct the field trial limited the recruitment process (ie, number of participants and duration of the enrollment period), site selection, site number, and geographical distribution.

Future Directions

Notwithstanding these limitations, study results enable and inform the next steps of the CAT-PCR validation process and its potential applications in clinical research. One of the major challenges to the validation of CAT-PCR in Zambia (and of other pediatric cognitive assessment tools in similar developing-economy countries) is that its utility and fairness vary according to the characteristics of the test populations. Dynamic and diverse sociocultural factors have an impact on test performance.89 The validation of CAT-PCR scores should ascertain the effects of the degree of urbanization, socioeconomic status, and access to quality education. For example, with specific regard to quality of education, studies have reported significant differences in cognitive performance on neuropsychological testing, including the assessment of attention and visuoconstructional abilities (ie, Symbol Cancellation, Bender Visuomotor Gestalt Test, Rey-Osterrieth Copy Design Test), between South African school-aged populations with disadvantaged versus advantaged quality of education, and against nonlocal normative samples.89–91

Future studies should also provide supporting evidence for the convergent and discriminant validity of CAT-PCR score interpretation. The relationship of the CAT-PCR Indices with other constructs (ie, executive functioning, intelligence, language, mental rotation, social cognition, sustained attention, visual and verbal memory, and visuomotor integration) should be explored. Cognitive tests originally developed in Sub-Saharan regions, rather than tests imported through cross-cultural validation, should be prioritized for use in these validation studies.

CAT-PCR is a versatile assessment tool for cognitive research. CAT-PCR retains the potential for application in hypothesis-driven or exploratory clinical research. CAT-PCR Indices should be validated for use in clinical trials as outcomes to determine the clinical benefits, safety risks, and cost-efficiency of health-promoting interventions (ie, preventive, diagnostic, therapeutic, and rehabilitative). CAT-PCR’s reliability when the purpose of measurement is prediction (ie, risk assessment for treatment-related serious adverse events), study subject eligibility (ie, diagnostic classification), or evaluation of a preventive/therapeutic/rehabilitative treatment (ie, as an outcome measure) should be assessed. Interrater reliability and estimates of clinically meaningful change should also be determined. CAT-PCR’s clinical utility in the diagnostic classification and staging of postinfective neurocognitive impairment (eg, malaria, HIV disease) should be of interest. The extent to which the VGCP and VMP Indices are sensitive to neurobiological integrity, including alterations of brain structure and function, should also be explored.

CAT-PCR is a new brief cognitive nonverbal test battery originally developed for use in a Zambian school-aged population. CAT-PCR showed satisfactory feasibility for use and provides a set of reliable and valid measures for use in Sub-Saharan clinical research settings. Study results warrant further research on the validation of its uses in the prediction of response, monitoring, and evaluation of health-promoting interventions.

References

  1. Fernald LC, Prado EL, Kariger PK, Raikes A. A toolkit for measuring early childhood development in low and middle income countries. 2017. Washington DC: International Bank for Reconstruction and Development/The World Bank. https://openknowledge.worldbank.org/handle/10986/29000
  2. Ford ND, Stein AD. Risk factors affecting child cognitive development: a summary of nutrition, environment, and maternal-child interaction indicators for Sub-Saharan Africa. J Dev Orig Health Dis. 2016;7:197–217.
  3. Idro R, Carter JA, Fegan G, et al. Risk factors for persisting neurological and cognitive impairments following cerebral malaria. Arch Dis Child. 2006;9:142–148.
  4. Institute of Medicine (US) Clinical Research Roundtable. Appendix V, Definitions of Clinical Research and Components of the Enterprise. In: Tunis S, Korn A, Ommaya A, eds. The Role of Purchasers and Payers in the Clinical Research Enterprise: Workshop Summary. National Academies Press. 2002.
  5. Cohee LM, Opondo C, Clarke SE, et al. Preventive malaria treatment among school-aged children in Sub-Saharan Africa: a systematic review and meta-analyses. Lancet Glob Health. 2020;8:e1499-e1511.
  6. Deręgowski JB, Serpell R. Performance on a sorting task: a cross-cultural experiment. Int J Psychol. 1971;6:273–281.
  7. Serpell R. How specific are perceptual skills? A cross-cultural study of pattern reproduction. Br J Psychol. 1979;70(3):365–380.
  8. Stemler SE, Chamvu F, Chart H, et al. Assessing competencies in reading and mathematics in Zambian children. In Grigorienko EL (ed). Multicultural Psychoeducational Assessment. Springer Publishing Company. 2009;157–185.
  9. Zuilkowski SS, McCoy DC, Serpell R, et al. Dimensionality and the development of cognitive assessments for children in sub-Saharan Africa. J Cross Cult Psychol. 2016;47:341–354.
  10. Matafwali B, Serpell R. Design and validation of assessment tests for young children in Zambia. New Dir Child Adolesc Dev. 2014;146:77–96.
  11. Zambia Central Statistical Office. 2010. 2010 Census Population and Housing Summary Report. Government of Zambia. https://www.zamstats.gov.zm. Accessed on 6 March 2022.
  12. Ezeilo B. Validating Panga Munthu Test and Porteus Maze Test (Wooden Form) in Zambia. Int J Psychol. 1978;13(4):333–342.
  13. Kathuria R, Serpell R. Standardisation of the Panga Munthu test: a nonverbal cognitive test developed in Zambia. J Negro Educ. 1998;67(3):228–241.
  14. Fink G, Matafwali B, Moucheraud C, Zuilkowski SS. The Zambian Early Childhood Development Project 2010 Assessment Final Report. Cambridge, MA. Harvard University. 2012. https://developingchild.harvard.edu/activities/global_initiative/zambian_project/
  15. Di Cesare F, Di Cesare L, Di Carlo C. Development of a cognitive ability assessment tool for a pediatric school-aged population. Innov Clin Neurosci. 2021;18(10–12):30–37.
  16. Di Cesare F, Di Carlo C, Di Cesare L. WAVES: a novel test to evaluate visuospatial construction ability in a school-aged population. Innov Clin Neurosci. 2023;20(1–3):39–45.
  17. Di Cesare F, Di Carlo C, Di Cesare L. Development of a symbol cancellation test to evaluate attention in a school-aged Zambian population. Innov Clin Neurosci. 2023;20(1–3):46–52.
  18. Di Cesare F, Piccinini G, Di Carlo C, Di Cesare L. WORDS: a new verbal memory test to evaluate cognitive health in a Zambian school-aged population. Innov Clin Neurosci. 2023;20(7–9):11–17.
  19. Irvine SH. Toward a rationale for testing attainments and abilities in Africa. Br J Educ Psychol. 1966;36(1):24–32.
  20. Mulenga KC, Ahonen T, Aro M. Performance of Zambian children on the NEPSY: a pilot study. Dev Neuropsychol. 2001;20:375–383.
  21. Serpell R. Preference for specific orientation of abstract shapes among Zambian children. J Cross Cult Psychol. 1971;2(3):225–239.
  22. Serpell R, Simatende B. Contextual responsiveness: an enduring challenge for educational assessment in Africa. J Intell. 2016;4(1):3.
  23. Wicherts JM, Dolan CV, Carlson JS, van der Maas HLN. Raven’s test performance of sub-Saharan Africans: average performance, psychometric properties, and the Flynn effect. Learn Individ Differ. 2010;20:135–151.
  24. Serpell R, Haynes B. The cultural practice of intelligence testing: problems of international export. In Sternberg RJ, Grigorenko E (eds). Culture and Competence: Contexts of Life Success. Washington, DC: American Psychological Association. 2004;163–185.
  25. Korkman M, Kirk U, Kemp S. NEPSY: A developmental neuropsychological assessment. Manual. San Antonio, TX. The Psychological Corporation. 1998.
  26. Nampijja M, Apule B, Lule S, et al. Adaptation of Western measures of cognition for assessing 5-year-old semi-urban Ugandan children. Br J Educ Psychol. 2010;80(1):15–30.   
  27. Chernoff MLB, Ratswana M, Familiar I, et al. Validity of neuropsychological testing in young African children affected by HIV. J Pediatr Infect Dis. 2018;13(3):185–201.
  28. Serpell R, Boykin W. Cultural dimensions of cognition: a multiplex, dynamic system of constraints and possibilities. In: Sternberg RJ, ed. Thinking and Problem Solving. 2nd Edition. San Diego. Academic Press. 1994;369–408.
  29. Ardila A. Cultural values underlying psychometric cognitive testing. Neuropsychol Rev. 2005;15:185–195.
  30. James WA. The principles of psychology. Henry Holt and Co. 1890.
  31. Lavie N, Hirst A, de Fockert JW, Viding E. Load theory of selective attention and cognitive control. J Exp Psychol Gen. 2004;133(3):339–354.
  32. Smith SE, Chatterjee A. Visuospatial attention in children. Arch Neurol. 2008;65:1284–1288.
  33. Petersen S, Posner M. The attention system of the human brain: 20 years after. Annu Rev Neurosci. 2012;21:73–89.
  34. Treisman A. Selective attention in man. Br Med Bull. 1964;20:12–16.   
  35. Treisman A. Strategies and models of selective attention. Psychol Rev. 1969;76(3):282–299.   
  36. Benton A, Tranel D. Visuoperceptual, visuospatial, and visuoconstructive disorders. In: Heilman KM, Valenstein E, eds. Clinical Neuropsychology. Oxford University Press. 1993;165–213.
  37. Mervis CB, Robinson BF, Pani JR. Visuospatial construction. Am J Hum Genet. 1999;65(5):1222–1229.
  38. Astheimer LB, Sanders LD. Temporally selective attention supports speech processing in 3- to 5-year-old children. Dev Cogn Neurosci. 2012;2(1):120–128.
  39. Astheimer L, Janus M, Moreno S, Bialystok E. Electrophysiological measures of attention during speech perception predict metalinguistic skills in children. Dev Cogn Neurosci. 2014;7:1–12.
  40. Moll K, Snowling MJ, Göbel SM, Hulme C. Early language and executive skills predict variations in number and arithmetic skills in children at family-risk of dyslexia and typically developing controls. Learn Instr. 2015;38:53–62.
  41. Stevens C, Bavelier D. The role of selective attention on academic foundations: a cognitive neuroscience perspective. Dev Cogn Neurosci. 2012;2(Suppl 1):S30–S48.
  42. Gonthier C. Cross-cultural differences in visuo-spatial processing and the culture-fairness of visuo-spatial intelligence tests: an integrative review and a model for matrices tasks. Cogn Research. 2022;7(11):2–27.
  43. Brickenkamp R, Zillmer E. The d2 test of attention. Seattle, WA. Hogrefe & Huber Publishers. 1998.
  44. Vakil E, Blachstein H, Sheinman M, Greenstein Y. Developmental changes in attention tests norms: implications for the structure of attention. Child Neuropsychol. 2009;15(1):21–39.
  45. Ridderinkhof KR, van der Stelt O. Attention and selection in the growing child: views derived from developmental psychophysiology. Biol Psychol. 2000;54(1–3):55–106.
  46. Wassenberg R, Hendriksen JGM, Hurks PPM, et al. Development of inattention, impulsivity, and processing speed as measured by the d2 Test: results of a large cross-sectional study in children aged 7–13. Child Neuropsychol. 2008;14(3):195–210.
  47. Arán-Filippetti V, Gutierrez M, Krumm G, et al. Convergent validity, academic correlates and age- and SES-based normative data for the d2 Test of attention in children. Appl Neuropsychol Child. 2022;11(4):629–639.
  48. Gall S, Müller I, Walter C, et al. Associations between selective attention and soil-transmitted helminth infections, socioeconomic status, and physical fitness in disadvantaged children in Port Elizabeth, South Africa: an observational study. PLoS Negl Trop Dis. 2017;11(5):e0005573.
  49. Bai S, Liu W, Guan Y. The visuospatial and sensorimotor functions of posterior parietal cortex in drawing tasks: a review. Front Aging Neurosci. 2021;13:717002.
  50. Yantis S. The neural basis of selective attention: cortical sources and targets of attentional modulation. Curr Dir Psychol Sci. 2008;17(2):86–90.
  51. Paillard J. Les bases nerveuses du contrôle visuo–manuel de l’écriture [The neural bases of the visual–manual control of handwriting]. In: Sirat C, Irigoin J, Poulle E, eds. L’écriture: Le cerveau, L’æil Et La Main [Writing: Brain, Eye, and Hand]. Turnhout Brepols. 1990;23–52.
  52. Danna J, Velay JL. Basic and supplementary sensory feedback in handwriting. Front Psychol. 2015;6:169.
  53. Friston K. Hierarchical models in the brain. PLoS Comput Biol. 2008;4(11):e1000211.   
  54. Friston KJ, Parr T, de Vries B. The graphical brain: Belief propagation and active inference. Netw Neurosci (Camb Mass). 2017;1(4):381–414.
  55. Parr T, Sajid N, Da Costa L, et al. Generative models for active vision. Front Neurorobot. 2021;15:651432.
  56. Van Sommers P. A system for drawing and drawing-related neuropsychology. Cogn Neuropsychol. 1989;6:117–164.
  57. Guérin F, Ska B, Belleville S. Cognitive processing of drawing abilities. Brain Cogn. 1999:40(3):464–478.
  58. McCrea S. A neuropsychological model of free-drawing from memory in constructional apraxia: a theoretical review. AJPN. 2014;2(5):60–75.
  59. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6(1):1–55.
  60. Kline RB. Principles and Practice of Structural Equation Modeling. 4th edition. New York. The Guilford Press. 2016.
  61. Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107(2):238–246.
  62. Tucker LR, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38(1):1–10.
  63. Steiger JH. Notes on the Steiger-Lind (1980) handout. Struct Equ Modeling. 2016;23(6):777–781.
  64. Bentler PM. EQS: structural equations program manual. Multivariate Software. 1995.
  65. Barton AG. Handbook for good clinical research practice (GCP): guidance for implementation. J Epidemiol Community Health. 2007;61:559.
  66. StataCorp. Stata: Release 15. Statistical Software. College Station, TX. StataCorp LLC. 2017.
  67. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–159.
  68. Sawilowsky SS. New effect size rules of thumb. J Mod Appl Stat Methods. 2009;8(2):597–599.
  69. Myers ND, Ahn S, Jin Y. Sample size and power estimates for a confirmatory factor analytic model in exercise and sport. Res Q Exerc Sport. 2011;82:412–423.
  70. Satorra A, Bentler PM. Corrections to test statistics and standard errors in covariance structure analysis. In: von Eye A, Clogg CC, eds. Latent Variables Analysis: Applications for Developmental Research. Sage Publications, Inc. 1994;399–419.
  71. Xia Y, Yang Y. The influence of number of categories and threshold values on fit indices in structural equation modeling with ordered categorical data. Multivariate Behav Res. 2018;53(5):731–755.
  72. Shi D, Maydeu-Olivares A, DiStefano C. The relationship between the standardized root mean square residual and model misspecification in factor analysis models. Multivariate Behav Res. 2018;53(5):676–694.
  73. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
  74. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6:284–290.
  75. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–163.
  76. Green DW. Mental control of the bilingual lexico-semantic system. Biling-Lang Cogn. 1998;1(2):67–81.
  77. Ickx G, Bleyenheuft Y, Hatem SM. Development of visuospatial attention in typically developing children. Front Psychol. 2017;8:2064.
  78. Laurent-Vannier A, Chevignard M, Pradat-Diehl P, et al. Assessment of unilateral spatial neglect in children using the Teddy Bear Cancellation Test. Dev Med Child Neurol. 2006;48(2):120–125.
  79. Arango-Lasprilla JC, Rivera D, Ertl MM, et al. Rey-Osterrieth Complex Figure—copy and immediate recall (3 minutes): Normative data for Spanish-speaking pediatric populations. NeuroRehabilitation. 2017;41(3):593–603.
  80. Conson M, Siciliano M, Baiano C, et al. Normative data of the Rey-Osterrieth Complex Figure for Italian-speaking elementary school children. Neurol Sci. 2019;40(10):2045–2050.
  81. Del Giudice E, Grossi D, Angelini R, et al. Spatial cognition in children. I. Development of drawing-related (visuospatial and constructional) abilities in preschool and early school years. Brain Dev. 2000;22(6):362–367.
  82. Senese VP, De Lucia N, Conson M. Cognitive predictors of copying and drawing from memory of the Rey-Osterrieth complex figure in 7- to 10-year-old children. Clin Neuropsychol. 2015;29(1):118–132.
  83. Viljoen G, Levett A, Tredoux CG, Anderson SJ. Using the Bender Gestalt in South Africa: Some normative data for Zulu-speaking children. S Afr J Psychol. 1994;24:145–151.
  84. Bialystok E. Bilingualism and the development of executive function: the role of attention. Child Dev Perspect. 2015;9(2):117–121.
  85. Costa A, Hernández M, Sebastián-Gallés N. Bilingualism aids conflict resolution: evidence from the ANT task. Cognition. 2008;106(1):59–86.
  86. Phelps J, Attaheri A, Bozic M. How bilingualism modulates selective attention in children. Sci Rep. 2022;12(1):6381.
  87. Messick S. Validity. In: Linn RL, ed. Educational Measurement. 3rd Edition. New York, NY. Macmillan Publishing Co, Inc. American Council on Education. 1989;13–103.
  88. Brod G, Bunge SA, Shing YL. Does one year of schooling improve children’s cognitive control and alter associated brain activation? Psychol Sci. 2017;28(7):967–978.
  89. Shuttleworth-Edwards AB. Generally representative is representative of none: Commentary on the pitfalls of IQ test standardization in multicultural settings. Clin Neuropsychol. 2016;30(7):975–998.
  90. Cave J, Grieve K. Quality of education and neuropsychological test performance. New Voices Psychol. 2009;5:29–48.
  91. Shuttleworth-Edwards AB, Van der Merwe AS, Van Tonder P, Radloff SE. WISC–IV test performance in the South African context: a collation of cross-cultural norms. In: Laher S, Cockcroft K, eds. Psychological Assessment in South Africa: Research and Applications. Johannesburg. Wits University Press. 2013;33–37.