Is modality-specific working memory capacity correlated within individuals?

The multi-component model of working memory posits different processes for phonological and visual working memory storage.

Working memory capacity can be measured in a variety of ways (see: How to reliably measure working memory capacity?), and we know that there is reliable individual variation in working memory capacity.

Is there a strong correlation of working memory capacity in different modalities within an individual? Put another way: If I have measured working memory capacity for an individual using a visual working memory task (like a Matrix span task), how reliably can I predict performance on a verbal working memory task for that individual?

How working memory breaks down, or even whether it is a valid construct at all, is still somewhat controversial. The evidence for domain-specific modalities is largely based on the lack of (otherwise expected) interference between them. Another line of evidence that could be used to validate this model is checking the correlations between them: a high correlation would argue for a domain-general working memory, while a low correlation would support domain-specific modalities. In practice, however, the value of examining correlations between working memory measures is limited by several confounds: a general capacity factor (e.g., executive control); interaction with short-term memory; the possibility of any number of additional modalities (for example, visual vs. spatial); and the legacy of many years of assuming a unitary working memory construct, which has produced a variety of measures that were validated against each other (i.e., expected to correlate highly).

Nonetheless, just such a test was done by Alloway, Gathercole & Pickering (2006), who subjected over 700 primary school children to a battery of different working memory capacity measures. As expected, correlations between many of the measures were statistically significant, but importantly, not commensurate with either a simple domain-specific model, or a simple general model with WM/STM split. To answer your question, here is a table from the paper that summarizes some of the results (more detail here):

As you can see, the correlations, while nearly all significant at p<.001, vary considerably. These results are consistent with similar studies that review several independent measures of working memory that nonetheless all tend to correlate well with cognitive factors such as general intelligence.

To add to Arnon's answer, modality-specific working memory capacity (WMC) does correlate within individuals, so in some cases, you could certainly predict one from the other. As long as dumb prediction is all you're interested in, why not?

The problem with making that prediction is that it's very difficult to interpret what it means in causal or practical terms. Recent findings suggest we can't just assume that the domain general contribution is equal across domains, as we've been doing, because the interference effects used to establish WMC subscales in the first place appear to actually be asymmetrical. A verbal WM load does decrease visual WM performance, but not vice versa (Morey et al., 2013). Predicting verbal WMC from visual WMC thus means something completely different from predicting visual WMC from verbal WMC! How to resolve this, or whether it can be resolved, to my knowledge remains an open question for multi-component WM research.

In summation, the relationship between modalities in the multi-component model is currently on shaky theoretical footing. If one measured working memory capacity for an individual using a visual working memory task (like a Matrix span task), one could semi-reliably predict performance on a verbal working memory task for that individual. Good luck figuring out what the result means, though, so maybe one shouldn't.
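To make "semi-reliably predict" concrete, here is a minimal sketch of what a correlation buys you for prediction. The value r = 0.5 is purely hypothetical (it is not taken from any of the studies cited above), and the function names are illustrative only.

```python
def shared_variance(r):
    """Proportion of variance in one measure accounted for by the other."""
    return r ** 2


def predict_z(z_x, r):
    """Best linear prediction of a standardized score from another one."""
    return r * z_x


r = 0.5  # hypothetical visual-verbal correlation, for illustration only
print(shared_variance(r))  # 0.25
print(predict_z(2.0, r))   # 1.0
```

In words: under this assumed correlation, someone two standard deviations above the mean on the visual task would be predicted to sit only one standard deviation above the mean on the verbal task, and three quarters of the variance would remain unexplained.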



Limitations on divided attention across different modalities have been the subject of much controversy. While it is well known that information from multiple senses can be integrated very rapidly (e.g., [1]), it remains equivocal whether attention to one modality comes at a cost for a different modality. Whereas there are several early cognitive studies that have shown a cost for cross-modal divided attention [2], [3], [4], [5], there is also considerable evidence demonstrating substantial independence between visual and auditory attentional resources [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16].

In this paper we focus on evidence from the attentional blink paradigm, which has proven to be particularly useful in indexing the time-course of attention. In hundreds of experiments it has been shown that when two targets are presented within a rapid stream of non-targets (i.e., distractors), most individuals demonstrate a profound difficulty in reporting the second target (T2) when it is presented within 200–500 ms after the first (T1). This interference effect, which is referred to as the attentional blink (AB) in analogy to eye blinks [17], is very robust and can be obtained under a variety of task conditions, using for instance alphanumeric stimuli [18], words [19], pictures [20], and with auditory [16] or tactile stimuli [21]. Consequently, the effect is thought to reflect a very general property of perceptual awareness with broad implications for understanding how the brain perceives a relevant stimulus (for a review, see [22]).

Duncan, Martens, and Ward [16] have shown that the AB occurs in vision as well as in audition when both targets are presented within the same modality. However, when the two targets were presented in different modalities (one in the visual and another in the auditory modality), any temporal restrictions in attentional capacity as reflected in the AB disappeared. Thus, the use of relatively simple, independent visual and auditory stimuli (one-syllable words) that required unspeeded responses, led to an AB within but not between modalities, which strongly suggests the existence of modality-specific limitations rather than an amodal, more central bottleneck.

Although there are a number of studies that have replicated the lack of a cross-modal AB [23], [24], [25], other studies have challenged these findings, reporting significant cross-modal AB effects [26], [27], [28]. A possible explanation for these conflicting results is that in studies finding a cross-modal AB, one of the targets required a speeded response [27], [29], [30] or incorporated a task-switch due to targets differing in task set, target set, target set size, response set, target difficulty, or target-defining features other than modality [26], [27], [28], [31], [32], [33], [34].

Recently, it has been reported that large individual differences exist in AB magnitude [12], [35], [36], [37]. The aim of the current study was to resolve the cross-modal AB controversy described above by taking an individual differences approach combined with the use of equivalent independent targets that differed only in modality.

The primary research question was: if some individuals have large within-modality AB magnitudes, might they also show an AB in the cross-modal case? If cross-modal and within-modal interference arise from the same central amodal bottleneck, individual cross-modal ABs should correlate with individual within-modal AB magnitudes. The lack of such a correlation would suggest that the interference observed in cross-modal conditions is different from that observed in within-modality conditions. A third possibility is that a significant AB is only observed when targets are presented within the same modality, but not when presented in different modalities. Such a finding would provide strong evidence that the AB reflects modality-specific rather than amodal limitations.

The second question that we wanted to address was whether individual differences in AB magnitude within one modality correlate with individual differences in another modality. In other words, does an individual with a large visual AB magnitude also show a large auditory AB magnitude? If not, this would suggest that attentional restrictions within each modality are completely separate.



The dataset analyzed in the present study has been reported in Baker et al. (2018). 254 students enrolled at Louisiana State University were recruited for the present study. Participants were paid $15, volunteered, or participated for course credit. After screening for participant eligibility and data collection error (see Baker et al., 2018), 242 participants met the criteria for inclusion. The eligible participants were between the ages of 17 and 38 years (M = 20.64, SD = 3.23) and included 76 men and 165 women (one person did not identify gender). Participants’ formal years of musical training were between 0 and 21 (M = 4.71, SD = 4.58) and years of learning music theory were between 0 and 21 (M = 2.24, SD = 3.45). The full data file (WmMusicII_OSF.csv) and data dictionary (wmMusicII_datadictionary.csv) are available at However, of those who met criteria, there were additional participants who were missing data on just one task (such as only missing a WMC measure due to computer error) and the full set with complete data on all measures was N = 234 (ParticipantDetails.csv,

Variables of interest

As reported in Baker et al. (2018), participants completed a large battery of tasks, lasting a total of approximately 90 min. For the current study, we analyzed a subset of variables from this battery, including musical training, WMC, preference for musical complexity, and demographics. Operational definitions for each of these variables are described below.

Musical training

Musical Training was measured using the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen, Gingras, Musil, & Stewart, 2014). The Gold-MSI was developed to measure various aspects of musical sophistication, independent of preference for particular musical styles, and includes five facets: active musical engagement, self-reported perceptual abilities, musical training, self-reported singing abilities, and sophisticated emotional engagement with music.

For the musical training subscale of the Gold-MSI, participants self-reported responses on a 7-point Likert scale to items such as “I have had __ years of formal training on a musical instrument (including voice) during my lifetime” and “I have had formal training in music theory for __ years.” For our analysis, musical training was defined using participants’ total score on this subscale of the Gold-MSI, coded according to Müllensiefen et al. (2014).

Working memory capacity (WMC)

To assess working memory, participants completed three complex span tasks adapted from Unsworth, Heitz, Schrock, and Engle (2005). In the Tone Span task, participants completed math judgments and recalled a sequence of tones. In each math judgment, participants saw an arithmetic problem and had to determine whether the solution presented was true or false. After each math operation, a high, medium, or low tone was presented aurally for 1,000 ms. The three tones were played at frequencies outside of the equal tempered system (200 Hz, 375 Hz, and 702 Hz; after Li, Cowan, & Saults, 2013). These frequencies were chosen to avoid familiarity with the pitches of Western tonality. During recall, participants were asked to recall the order of high, medium, and low tones (no time limit). The test procedure included three trials of each list length (three to seven tones), with a maximum score of 75 (Baker et al., 2018). Because Tone Span focuses on recall of auditory stimuli, we used scores on this task as our operational definition for auditory WMC.

In the Symmetry Span task, participants completed symmetry judgments and recalled a sequence of locations of a red square. The symmetry judgment was performed on an 8 × 8 matrix with random squares filled with black; participants were required to judge whether the black square pattern was vertically symmetrical. After each symmetry judgment, a red square was presented on a 4 × 4 matrix for 650 ms. During recall, participants were asked to recall in order the location of each red square (no time limit). The test procedure included three trials of each list length (two to five red squares), with a maximum score of 42 (Baker et al., 2018). Because Symmetry Span does not focus on recall of auditory stimuli, we used scores on this task to test questions regarding domain-specificity of working memory capacity in our mediation model.

In the Operation Span task, participants completed a math judgment and recalled a sequence of letters. In each math judgment, participants saw an arithmetic problem and had to determine whether the solution presented was true or false. After each math operation, a letter was presented visually for 1,000 ms. During recall, participants saw a 4 × 3 matrix of all possible letters, and were asked to recall the order of letters in the sequence. The test procedure included three trials of each list length (three to seven letters), with a maximum score of 75 (Baker et al., 2018). Because Operation Span does not focus on recall of auditory stimuli, we used scores on this task to test questions regarding domain-specificity of WMC in our mediation model.
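The maximum scores quoted for the three span tasks follow from the list lengths if one assumes that each correctly recalled item earns one point. That scoring rule is an assumption made here for illustration (the exact scoring procedure is described in Baker et al., 2018), and the helper name is hypothetical.

```python
def max_span_score(list_lengths, trials_per_length=3):
    """Maximum score if every item on every trial is recalled correctly."""
    return trials_per_length * sum(list_lengths)


tone_span_max = max_span_score(range(3, 8))       # list lengths 3 to 7
symmetry_span_max = max_span_score(range(2, 6))   # list lengths 2 to 5
operation_span_max = max_span_score(range(3, 8))  # list lengths 3 to 7

print(tone_span_max, symmetry_span_max, operation_span_max)  # 75 42 75
```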

Preference for musical complexity

Preference for musical complexity was measured using the Reflective and Complex dimension of the STOMP (Rentfrow & Gosling, 2003). The STOMP is a 14-item scale assessing preferences in music genres across four broad music preference dimensions: Reflective and Complex, Intense and Rebellious, Upbeat and Conventional, and Energetic and Rhythmic. The Reflective and Complex dimension of the STOMP includes participants’ preference for jazz, folk, classical, and blues, each rated on a 7-point Likert scale (1 = strongly dislike, 7 = strongly like). To obtain a measure of Preference for Musical Complexity, we averaged participant scores for the four genres in the Reflective and Complex dimension.
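As a minimal sketch of this scoring step, the average over the four Reflective and Complex genres could be computed as follows. The dictionary layout, the function name, and the example ratings are hypothetical and are not taken from the study's data files.

```python
from statistics import mean

REFLECTIVE_COMPLEX_GENRES = ("jazz", "folk", "classical", "blues")


def preference_for_complexity(ratings):
    """Mean rating (1-7) over the four Reflective and Complex genres."""
    values = [ratings[genre] for genre in REFLECTIVE_COMPLEX_GENRES]
    if any(not 1 <= value <= 7 for value in values):
        raise ValueError("ratings must lie between 1 and 7")
    return mean(values)


# Hypothetical participant; genres outside the dimension are ignored.
example = {"jazz": 6, "folk": 4, "classical": 7, "blues": 5, "pop": 2}
print(preference_for_complexity(example))  # 5.5
```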

Demographic variables

The current study also considered participant demographics that have previously been shown to be associated with engagement in musical activities, including age, gender, and socioeconomic status (Corrigall & Schellenberg, 2015; Lima, Correia, Müllensiefen, & Castro, 2018). Socioeconomic status (SES) was an aggregate measure composed of parental education and family income. Participant education was not included since there was very little variability in the current college student sample. Participants reported education for each parent by selecting one of eight categories, or NA: 1 = some high school, 2 = completed high school, 3 = some associates/vocational program (e.g., AA or AS), 4 = completed associates/vocational program, 5 = some undergraduate degree (BA, BS, BM), 6 = completed undergraduate degree, 7 = some graduate degree (PhD, JD, MD, MA), 8 = completed graduate degree, 9 = NA (Corrigall & Schellenberg, 2015). Participants raised by one parent reported 9 = NA for the absent parent. Participants reported family income by selecting one of nine categories: 1 = less than $25,000, 2 = between $25,000 and $50,000, 3 = between $50,000 and $75,000, 4 = between $75,000 and $100,000, 5 = between $100,000 and $125,000, 6 = between $125,000 and $150,000, 7 = between $150,000 and $175,000, 8 = between $175,000 and $200,000, 9 = greater than $200,000. SES was calculated as the sum of each participant’s family income category and each parent’s education level. For participants who reported only one parent’s level of education, that parent’s education level was counted twice for the calculation of SES.
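The SES composite described above can be sketched as a small function. The function name and argument layout are hypothetical; the handling of a participant with no reported parental education is not specified in the text, so the sketch simply raises an error in that case.

```python
NA = 9  # category code used for an absent parent


def ses_score(income_category, parent1_education, parent2_education):
    """Sum of family income category (1-9) and both parents' education (1-8)."""
    reported = [e for e in (parent1_education, parent2_education) if e != NA]
    if not reported:
        raise ValueError("no parental education reported")
    if len(reported) == 1:
        reported = reported * 2  # single reported parent is counted twice
    return income_category + sum(reported)


print(ses_score(3, 6, 8))   # 17: two parents reported
print(ses_score(3, 6, NA))  # 15: one parent reported, counted twice
```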

Analysis procedure

Our predictions and analysis plan have been pre-registered and can be viewed at, along with our analysis code. All analyses were carried out using R (R Core Team, 2018).

Predicted mediation

As a first step, we confirmed previous research suggesting that there exist significant bivariate associations between all pairwise combinations of Musical Training, auditory WMC, and Preference for Musical Complexity. Next, a mediation analysis was conducted (following Hayes, 2017) to test our hypothesized model [1] in which WMC mediates the relation between Musical Training and Preference for Musical Complexity. Models in this form (Fig. 1) are henceforth referred to as “predicted models.”

Schematic of the predicted mediation model, with coefficients italicized

The total effect of Musical Training on Preference for Musical Complexity, c, can be partitioned into the direct effect of Musical Training on Preference for Musical Complexity (c’) and the indirect effect of Musical Training on Preference for Musical Complexity, through WMC (ab), such that c = ab + c’ (Fig. 1). Of particular interest in our analysis was whether the indirect (i.e., mediating) effect ab was a significant predictor in modeling Preference for Musical Complexity. Significance testing of the mediating effect of Musical Training → WMC → Preference for Musical Complexity (quantified by ab) was carried out via a bootstrapping analysis with 5,000 iterations (Canty & Ripley, 2019; Davison & Hinkley, 1997). We investigated four versions of the predicted mediation, as follows.
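The decomposition c = ab + c’ and the bootstrap test of ab can be illustrated with a short sketch. This is not the study's analysis code (the analyses were run in R): it is a Python illustration using ordinary least squares and a percentile bootstrap interval, the interval type is an assumption, and the data are simulated stand-ins whose effect sizes are invented.

```python
import numpy as np


def ols_coefs(predictors, y):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def mediation_paths(x, m, y):
    """Return a, b, c, c' for the simple mediation x -> m -> y."""
    a = ols_coefs([x], m)[1]              # path from x to the mediator
    c = ols_coefs([x], y)[1]              # total effect of x on y
    _, c_prime, b = ols_coefs([x, m], y)  # direct effect and mediator path
    return a, b, c, c_prime


def bootstrap_indirect(x, m, y, n_boot=5000, seed=0):
    """Percentile 95% interval for ab, resampling participants."""
    rng = np.random.default_rng(seed)
    n = len(x)
    ab = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        a, b, _, _ = mediation_paths(x[idx], m[idx], y[idx])
        ab[i] = a * b
    return np.percentile(ab, [2.5, 97.5])


# Simulated stand-in data (NOT the study's data; coefficients are invented).
rng = np.random.default_rng(1)
n = 234
training = rng.normal(size=n)
wmc = 0.5 * training + rng.normal(size=n)
preference = 0.4 * wmc + 0.2 * training + rng.normal(size=n)

a, b, c, c_prime = mediation_paths(training, wmc, preference)
ci_low, ci_high = bootstrap_indirect(training, wmc, preference)
print(f"ab = {a * b:.3f}, c' = {c_prime:.3f}, c = {c:.3f}")
print(f"95% percentile interval for ab: [{ci_low:.3f}, {ci_high:.3f}]")
```

With least-squares estimates the identity c = ab + c’ holds exactly in the sample, which is why the first printed line adds up. The indirect effect is treated as significant when the bootstrap interval excludes zero.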

Mediation [1] tested a domain-specific model in which auditory WMC mediated the relation between Musical Training and Preference for Musical Complexity.

Mediation [2] explored whether demographic variables (age, gender, SES) affected mediation by calculating three regression models in which each of Musical Training, Auditory WMC, and Preference for Musical Complexity were predicted by age, gender, and SES. From these models, we then extracted residualized variables, Musical Training_res, Auditory WMC_res, and Preference for Musical Complexity_res, which represent the variability in each respective variable that is not predicted by age, gender, and SES. For Mediation [2] these residual variables were then analyzed in the same fashion as the original variables from Mediation [1]. In accordance with our pre-registration, we report in supplementary material on a model that considers fluid intelligence (a composite of standardized scores on Raven’s Progressive Matrices; Raven, Raven, & Court, 1998; and Number Series; Thurstone, 1938), beat perception, and melodic memory in addition to these demographics.
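The residualization step can be sketched as follows. This is a Python illustration rather than the study's R code; the data are simulated, the coefficients are invented, and coding gender as a single 0/1 dummy is an assumption made only for this example.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after least-squares regression on the covariates."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Simulated stand-in data (NOT the study's data).
rng = np.random.default_rng(0)
n = 234
age = rng.normal(20.6, 3.2, size=n)
gender = rng.integers(0, 2, size=n).astype(float)  # hypothetical 0/1 coding
ses = rng.integers(3, 26, size=n).astype(float)    # possible sums run 3..25
training = 0.3 * age + 2.0 * gender + 0.5 * ses + rng.normal(size=n)

training_res = residualize(training, [age, gender, ses])

# By construction the residuals are uncorrelated with every covariate.
for name, cov in (("age", age), ("gender", gender), ("ses", ses)):
    print(name, round(abs(np.corrcoef(training_res, cov)[0, 1]), 6))
```

The same function would be applied to the mediator and the outcome before rerunning the mediation on the three residualized variables.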

Mediations [3] and [4] tested whether the role of working memory capacity in our mediation model was domain-specific or domain-general. Thus, Mediation [3] replaced Auditory WMC with Symmetry Span, and Mediation [4] replaced Auditory WMC with Operation Span.

Order-switched mediation

Following this, we tested the causal order of the mediation by assessing a model in which the positions of Musical Training and WMC were switched [2]. Our prediction was that the indirect effect (ab) would be significant for our predicted mediations (Fig. 1), but not for these order-switched mediations. Models in this form (Fig. 2) will henceforth be referred to as “order-switched models.” We investigated four order-switched mediations. Mediations [5], [6], [7], and [8] were order-switched versions of predicted mediations [1], [2], [3], and [4], respectively.

Schematic of order-switched mediation model, with coefficients italicized

Results and Discussion

When appropriate, Greenhouse-Geisser-corrected p values are reported. Eight participants were rejected from analysis due to high error rates in identifying T1 and T2.

Figure 1 shows T1 identification performance (dotted lines) as a function of the interval between the two targets (lag) within the visual and auditory modalities. Overall mean T1 performance was 74.6%. A repeated measures analysis of variance (ANOVA) of T1 performance with condition (VV, AA, VA, AV) and lag (1, 2, 3, 4, 7, 9) as within-subjects factors revealed a significant effect of lag, F(5, 200) = 3.39, MSE = �.55, p<.01, ηp² = .08, such that performance was lower at lag 1 than at the other lags. No significant main effect of condition (p = .24) was found, but the Condition × Lag interaction was borderline significant, F(15, 600) = 1.83, MSE = �.22, p = .053, ηp² = .04, such that performance at lag 1 was decreased in the within-modality conditions, but not in the between-modality conditions. This suggests that there was more direct competition between two successive targets when presented within the same modality than between modalities.

Mean percentage correct report of T1 as a function of lag when T2 was presented in the same (solid lines) or in a different (dotted lines) sensory modality. Error bars reflect standard error of the mean.

As can be seen in Figures 2 and 3, an AB occurred when targets were presented within either the auditory (Figure 2, solid line) or the visual (Figure 3, solid line) modality. In contrast, there was a lack of an AB when targets were presented in different modalities (dotted lines). An ANOVA on T2 performance given correct T1 report, with condition and lag as within-subjects factors, revealed significant main effects of condition, F(3, 5) = 7.33, MSE = �.00, p<.001, ηp² = .16, and lag, F(5, 200) = �.20, MSE = �.35, p<.001, ηp² = .32, and a significant Condition × Lag interaction, F(15, 600) = �.69, MSE = �.52, p<.001, ηp² = .21, reflecting differences in performance when targets were presented in the same or in different sensory modalities.

Mean percentage correct report of an auditory T2 given correct report of T1 as a function of lag when presented within (solid line) or between modalities (dotted line).

Mean percentage correct report of a visual T2 given correct report of T1 as a function of lag when presented within (solid line) or between modalities (dotted line).

An ANOVA on within-modality T2 performance showed significant main effects of condition, F(1, 40) = �.38, MSE = �.90, p = .002, ηp² = .22, and lag, F(5, 200) = �.96, MSE = �.32, p<.001, ηp² = .41, and a significant Condition × Lag interaction, F(5, 200) = �.95, MSE = �.28, p<.001, ηp² = .27, reflecting the occurrence of an AB that was larger in the visual modality than in the auditory modality. A separate ANOVA on T2 performance in the auditory within-modality (AA) condition revealed a significant main effect of lag, F(5, 235) = 4.43, MSE = �.72, p = .001, ηp² = .09, confirming the presence of an AB within the auditory modality.

An ANOVA on between-modality T2 performance only revealed a significant effect of condition, F(1, 40) = 5.10, MSE = �.50, p = .03, ηp² = .11, such that overall performance of auditory T2s was slightly better (72.8%) than that of visual T2s (69.4%). Importantly though, neither a significant main effect of lag (p = .24), nor a significant Condition × Lag interaction was observed (p = .24).

Intra-individual stability of performance was checked on odd and even number trials for all participants. For T1, the Spearman-Brown prophecy coefficients were .88, .94, .90, and .90 for the AA, VV, AV, and VA condition, respectively. For T2|T1, Spearman-Brown prophecy coefficients were .80, .90, .91, and .88 for the AA, VV, AV, and VA condition, respectively. These values reflect stable within-subject performance, similar to that observed in previous studies [12], [35], [37].
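The odd/even reliability check described above can be sketched as a split-half correlation stepped up with the Spearman-Brown formula. The per-participant accuracy values below are invented for illustration, and the function names are hypothetical.

```python
import numpy as np


def spearman_brown(r_half, factor=2):
    """Predicted reliability when the test is lengthened by `factor`."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(odd_scores, even_scores):
    """Correlate odd- and even-trial scores, then correct to full length."""
    r_half = np.corrcoef(odd_scores, even_scores)[0, 1]
    return r_half, spearman_brown(r_half)


# Invented per-participant accuracies (%) on odd and even trials.
odd = np.array([62.0, 71.0, 80.0, 55.0, 90.0, 68.0, 77.0, 84.0])
even = np.array([60.0, 75.0, 78.0, 58.0, 88.0, 70.0, 72.0, 86.0])

r_half, reliability = split_half_reliability(odd, even)
print(f"split-half r = {r_half:.2f}, Spearman-Brown = {reliability:.2f}")
```

The correction matters because each half contains only half of the trials, so the raw odd/even correlation underestimates the reliability of the full set of trials.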

For each individual and condition, AB magnitude was computed according to the following formula:

That is, the percentage of decrement in T2 performance within the AB period (lags 2, 3, and 4) relative to that outside the AB period (lags 7 and 9) was calculated, and the resulting AB magnitudes are shown in Figure 4 . One-sample t-tests revealed that AB magnitude was significantly different from zero in both within-modality conditions (ps <.001), but not in the between-modalities conditions (ps >.71). When only the 25% of participants with the largest within-modality ABs (mean of AB magnitude in AA and VV conditions) were selected, mean AB magnitudes were 21.5% in the AA, 40.1% in the VV, 3.4% in the AV, and 4.8% in the VA condition, respectively. Again, one-sample t-tests revealed that AB magnitude was significantly different from zero in both within-modality conditions (ps <.001), but not in the between-modalities conditions (ps >.20).
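Based on the verbal description above, AB magnitude can be sketched as the percentage drop in T2|T1 accuracy inside the blink period relative to outside it. Averaging accuracy across lags before taking the ratio is an assumption of this sketch, and the accuracy values are invented.

```python
from statistics import mean


def ab_magnitude(accuracy_by_lag, blink_lags=(2, 3, 4), baseline_lags=(7, 9)):
    """Percentage decrement in T2|T1 accuracy inside vs. outside the AB period."""
    inside = mean(accuracy_by_lag[lag] for lag in blink_lags)
    outside = mean(accuracy_by_lag[lag] for lag in baseline_lags)
    if outside == 0:
        raise ValueError("baseline accuracy is zero; magnitude is undefined")
    return 100 * (outside - inside) / outside


# Invented T2|T1 accuracies (%) for one participant in one condition.
accuracy = {1: 70, 2: 50, 3: 55, 4: 60, 7: 80, 9: 85}
print(round(ab_magnitude(accuracy), 1))  # 33.3
```

A value near zero means performance inside the blink window matches the long-lag baseline, which is the pattern reported for the between-modality conditions.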

AB magnitudes within (AA and VV) and between modalities (AV and VA).

Pearson product-moment correlations were computed and revealed a significant positive correlation between individual AB magnitudes within each modality, r = .37, p<.01 (two-tailed), such that participants with a relatively large visual AB also tended to show a large auditory AB. This may seem to suggest the existence of a common amodal pool of resources. However, no significant relation was found when AB magnitude within modalities was correlated with AB magnitude between modalities, r = .18, p = .23, providing strong evidence against an amodal limited-capacity bottleneck as the underlying cause of the AB. The Spearman-Brown prophecy coefficients were .57 and .28 for AB magnitude within and between sensory modalities, respectively. The relatively low intra-individual stability in cross-modal AB magnitude suggests that the variability in cross-modal AB magnitude merely reflects random noise. In other words, under the current experimental conditions there is no evidence for a cross-modal AB.

General Discussion

An aspect of the AB that is often ignored is that there are large individual differences in the magnitude of the effect (e.g., [36]). In the current study, we exploited these individual differences to address a long-standing question: does attention to a visual target come at a cost for attention to an auditory target (and vice versa)? More specifically, the goal of the current study was to investigate a) whether individuals with a large within-modality AB also show a large cross-modal AB, and b) whether individual differences in AB magnitude within different modalities correlate or are completely separate.

While minimizing differential task difficulty and chances for a task-switch to occur between the targets, using a randomized within-subjects design we observed a significant AB effect when targets were both presented within the auditory or visual modality. A positive correlation was found between an individual's auditory and visual AB magnitude, and at first sight, this may seem to suggest a common amodal source of interference.

Importantly, however, when the two targets were presented in different modalities, no interference that was time-locked to the presentation of the targets occurred, reflecting the absence of a cross-modal AB effect. The commonly observed decreased T1 performance at lag 1 was found in within-modality, but not between-modality conditions, indicating modality-specific interference between the two targets. Moreover, individual cross-modal AB magnitude did not correlate with individual within-modal AB magnitude. Even the 25% of participants with the largest within-modality ABs did not show a significant cross-modal AB effect. Finally, the relatively low intra-individual stability of cross-modal AB magnitude on odd and even trials suggests that the observed cross-modal variability in AB magnitude between individuals probably reflected random noise. Taken together, the results suggest that under the current experimental conditions, a major source of attentional restriction must lie in modality-specific sensory systems.

These findings replicate and extend previous reports of an AB within but not between visual and auditory modalities [16], [23], [24]. Whereas for instance the original study by Duncan and colleagues [16] used different target sets, different target locations, a varying number of stimulus streams, and different groups of participants for each condition, the current study addressed these potential methodological issues by employing a within-subjects design, incorporating both within- and between-modality conditions within a single experiment, and randomly mixing all conditions across trials (rather than blocks of trials). That is, even after participants had received the first target on a given trial, the modality of the upcoming T2 (visual or auditory) remained unpredictable. In addition, none of the targets required a speeded response [27], [29], [30]. Whereas previous findings of time-locked cross-modal interference may have been caused by some sort of task-switch [26], [27], [28], [31], [32], [33], [34], chances for a task-switch to occur were minimal in the current study as there was no change in task set, target set, target set size, response set, target difficulty, or any target-defining feature other than modality. To the best of our knowledge, this is the first study taking an individual differences approach to resolve the cross-modal AB controversy, revealing modality-specific restrictions in temporal attention within but not between sensory modalities in a within-subjects design, while controlling for the above-mentioned confounds.

In a previous study on individual differences in AB magnitude, we found that visual non-blinkers do show an auditory AB, suggesting restrictions within the visual and the auditory modality to be independent [12]. Indeed, the significant but relatively modest correlation between individual within-modality ‘blinks’ suggests that strong auditory blinkers are not always strong visual blinkers (and vice versa). Nevertheless, this modest correspondence in within-modality AB magnitudes, together with the lack of a cross-modal AB, suggests that in most (but apparently, not all) individuals there is a common delay in the modality-specific re-allocation of attention for T2, or alternatively, a similar protection process that inhibits modality-specific sensory input.

It has also been suggested that strong blinkers may be especially committed or focused in their processing of T1 [39], [40], [41], and to some extent that tendency to focus can apparently cross modalities. Importantly though, whatever they are focusing (resource allocation, blocking of processing, etc.) is strictly within-modality. Since participants did not know which T2 would be presented at the moment of receiving T1, it seems plausible to assume that, for strong blinkers, the same focused T1 processing occurs on both types of trials (within-modality and cross-modality), but that it only affects T2 on within-modality trials. The current study shows that individual differences in AB magnitude can provide important information about the modular structure of human cognition.

Chunk capacity limits

The concept of capacity limits was raised several times in the history of cognitive psychology. Miller (1956) famously discussed the “magical number seven plus or minus two” as a constant in short-term processing, including list recall, absolute judgment, and numerical estimation experiments. However, his autobiographical essay (Miller, 1989) indicates that he was never very serious about the number seven; it was a rhetorical device that he used to tie together the otherwise unrelated strands of his research for a talk. Although it is true that memory span is approximately seven items in adults, there is no guarantee that each item is a separate entity. Perhaps the most important point of Miller’s (1956) article was that multiple items can be combined into a larger, meaningful unit. Later studies suggested that the limit in capacity is more typically only three or four units (Broadbent, 1975; Cowan, 2001). That conclusion was based on an attempt to take into account strategies that often increase the efficiency of use of a limited capacity, or that allow the maintenance of additional information separate from that limited capacity. To understand these methods of discussing capacity limits I will again mention three types of contamination. These come from chunking and the use of long-term memory, from rehearsal, and from non-capacity-limited types of storage.

Overcoming contamination from chunking and the use of long-term memory

A participant’s response in an immediate-memory task depends on how the information to be recalled is grouped to form multi-item chunks (Miller, 1956). Because it is not usually clear what chunks have been used in recall, it is not clear how many chunks can be retained and whether the number is truly fixed. Broadbent (1975) proposed some situations in which multi-item chunk formation was not a factor, and suggested on the basis of results from such procedures that the true capacity limit is three items (each serving as a single-item chunk). For example, although memory span is often about seven items, errors are made with seven-item lists and the error-free limit is typically three items. When people must recall items from a category in long-term memory, such as the states of the United States, they do so in spurts of about three items on average. It is as if the bucket of short-term memory is filled from the well of long-term memory and must be emptied before it is refilled. Cowan (2001) noted other such situations in which multi-item chunks cannot be formed. For example, in running memory span, a long list of items is presented with an unpredictable endpoint, making grouping impossible. When the list ends, the participant is to recall a certain number of items from the end of the list. Typically, people can recall three or four items from the end of the list, although the exact number depends on task demands (Bunting et al., 2006). Individuals differ in capacity, which ranges from about two to six items in adults (and fewer in children), and the individual capacity limit is a strong correlate of cognitive aptitude.

Another way to take into account the role of multi-item chunk formation is to set up the task in a manner that allows chunks to be observed. Tulving and Patkau (1962) studied free recall of word lists with various levels of structure, ranging from random words to well-formed English sentences, with several different levels of coherence in between. A chunk was defined as a series of words reproduced by the participant in the same order in which the words had been presented. It was estimated that, in all conditions, participants recalled an average of four to six chunks. Cowan et al. (2004) tried to refine that method by testing serial recall of eight-word lists, which were composed of four pairs of words that previously had been associated with various levels of learning (0, 1, 2, or 4 prior word–word pairings). Each word used in the list was presented an equal number of times (four, except in a non-studied control condition) but what varied was how many of those presentations were as singletons and how many were as a consistent pairing. The number of paired prior exposures was held constant across the four pairs in a list. A mathematical model was used to estimate the proportion of recalled pairs that could be attributed to the learned association (i.e., to a two-word chunk) as opposed to separate recall of the two words in a pair. This model suggested that the capacity limit was about 3.5 chunks in every learning condition, but that the ratio of two-word chunks to one-word chunks increased as a function of the number of prior exposures to the pairs in the list.
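As an illustration of the chunk definition attributed above to Tulving and Patkau (1962), the following minimal sketch counts chunks in a single recall protocol. It rests on several assumptions that are not stated in the text: list words are unique, a chunk requires words that were adjacent at presentation, and an intrusion (a word not on the list) ends the current run without counting as a chunk. The function name count_chunks and the example words are hypothetical.

```python
def count_chunks(presented, recalled):
    """Count runs of recalled words that keep their presented order.

    Assumptions (for illustration only): words in `presented` are unique,
    a run must consist of words that were adjacent at presentation, and
    intrusions break a run but are not counted as chunks.
    """
    position = {word: i for i, word in enumerate(presented)}
    chunks = 0
    previous = None  # presented position of the previously recalled word
    for word in recalled:
        current = position.get(word)
        if current is None:
            previous = None  # intrusion: the next list word starts a new chunk
            continue
        if previous is None or current != previous + 1:
            chunks += 1  # this word does not continue the previous run
        previous = current
    return chunks


presented = ["lake", "stone", "apple", "chair", "river", "cloud"]
recalled = ["apple", "chair", "lake", "cloud", "river"]
n_chunks = count_chunks(presented, recalled)  # "apple chair" forms one chunk
```

Under these assumptions the example protocol contains four chunks: one two-word chunk followed by three single-word chunks.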

Overcoming contamination from rehearsal

The issue of rehearsal is not entirely separate from the issue of chunk formation. In the traditional concept of rehearsal (e.g., Baddeley, 1986), one imagines that the items are covertly articulated in the presented order at an even pace. However, another possibility is that rehearsal involves the use of articulatory processes in order to put the items into groups. In fact, Cowan et al. (2006a) asked participants in a digit span experiment how they carried out the task and by far the most common answer among adults was that they grouped the items; participants rarely mentioned saying the items to themselves. Yet, it is clear that suppressing rehearsal affects performance.

Presumably, the situations in which items cannot be rehearsed are for the most part the same as the situations in which items cannot be grouped. For example, Cowan et al. (2005) relied on a running memory span procedure in which the items were presented at the rapid rate of 4 per second. At that rate, it is impossible to rehearse the items as they are presented. Instead, the task is probably accomplished by retaining a passive store (sensory or phonological memory) and then transferring the last few items from that store into a more attention-related store at the time of recall. In fact, with a fast presentation rate in running span, instructions to rehearse the items are detrimental, not helpful, to performance (Hockey, 1973). Another example is memory for lists that were ignored at the time of their presentation (Cowan et al., 1999). In these cases, the capacity limit is close to the three or four items suggested by Broadbent (1975) and Cowan (2001).

It is still quite possible that there is a speech-based short-term storage mechanism that is by and large independent of the chunk-based mechanism. In terms of the popular model of Baddeley (2000), the former is the phonological loop and the latter, the episodic buffer. In terms of Cowan (1988, 1995, 1999, 2005), the former is part of activated memory, which may have a time limit due to decay, and the latter is the focus of attention, which is assumed to have a chunk capacity limit.

Chen and Cowan (2005) showed that the time limit and chunk capacity limit in short-term memory are separate. They repeated the procedure of Cowan et al. (2004) in which pairs of words sometimes were presented in a training session preceding the list recall test. They combined lists composed of pairs as in that study. Now, however, both free and serial recall tasks were used, and the length of list varied. For long lists and free recall, the chunk capacity limit governed the recall. For example, lists of six well-learned pairs were recalled as well as lists of six unpaired singletons (i.e., were recalled at similar proportions of words correct). For shorter lists and serial recall strictly scored, the time limit instead governed the recall. For example, lists of four well-learned pairs were not recalled nearly as well as lists of four unpaired singletons, but only as well as lists of eight unpaired singletons. For intermediate conditions it appeared as if chunk capacity limits and time limits operate together to govern recall. Perhaps the capacity-limited mechanism holds items and the rehearsal mechanism preserves some serial order memory for those held items. The exact way in which these limits work together is not yet clear.

Overcoming contamination from non-capacity-limited types of storage

It is difficult to demonstrate a true capacity limit that is related to attention if, as I believe, there are other types of short-term memory mechanisms that complicate the results. A general capacity should include chunks of information of all sorts: for example, information derived from both acoustic and visual stimuli, and from both verbal and nonverbal stimuli. If this is the case, there should be cross-interference between one type of memory load and another. However, the literature often has shown that there is much more interference between similar types of memoranda, such as two visual arrays of objects or two acoustically presented word lists, than there is between two dissimilar types, such as one visual array and one verbal list. Cocchini et al. (2002) suggested that there is little or no interference between dissimilar lists. If so, that would appear to provide an argument against the presence of a general, cross-domain, short-term memory store.

Morey and Cowan (2004, 2005) questioned this conclusion. They presented a visual array of colored spots to be compared to a second array that matched the first or differed from it in one spot’s color. Before the first array or just after it, participants sometimes heard a list of digits that was then to be recited between the two arrays. In a low-load condition, the list was their own seven-digit telephone number whereas, in a high-load condition, it was a random seven-digit number. Only the latter condition interfered with array-comparison performance, and then only if the list was to be recited aloud between the arrays. This suggests that retrieving seven random digits in a way that also engages rehearsal processes relies upon some type of short-term memory mechanism that also is needed for the visual arrays. That shared mechanism may be the focus of attention, with its capacity limit. Apparently, though, if the list was maintained silently rather than being recited aloud, this silent maintenance occurred without much use of the common, attention-based storage mechanism, so visual array performance was not much affected.

The types of short-term memory whose contribution to recall may obscure the capacity limit can include any types of activated memory that fall outside of the focus of attention. In the modeling framework depicted in Fig. 1, this can include sensory memory features as well as semantic features. Sperling (1960) famously illustrated the difference between unlimited sensory memory and capacity-limited categorical memory. If an array of characters was followed by a partial report cue shortly after the array, most of the characters in the cued row could be recalled. If the cue was delayed about 1 s, most of the sensory information had decayed and performance was limited to about four characters, regardless of the size of the array. Based on this study, the four-character limit could be seen as either a limit in the capacity of short-term memory or a limit in the rate with which information could be transferred from sensory memory into a categorical form before it decayed. However, Darwin et al. (1972) carried out an analogous auditory experiment and found a limit of about four items even though the observed decay period for sensory memory was about 4 s. Given the striking differences between Sperling and Darwin et al. in the time period available for the transfer of information to a categorical form, the common four-item limit is best viewed as a capacity limit rather than a rate limit.

Saults and Cowan (2007) tested this conceptual framework in a series of experiments in which arrays were presented in two modalities at once or, in another procedure, one after the other. A visual array of colored spots was supplemented by an array of spoken digits occurring in four separate loudspeakers, each one consistently assigned to a different voice to ease perception. On some trials, participants knew that they were responsible for both modalities at once whereas, in other trials, participants knew that they were responsible for only the visual or only the acoustic stimuli. They received a probe array that was the same as the previous array (or the same as one modality in that previous array) or differed from the previous array in the identity of one stimulus. The task was to determine if there was a change. The use of cross-modality, capacity-limited storage predicts a particular pattern of results. It predicts that performance on either modality should be diminished in the dual-modality condition compared to the unimodal conditions, due to strain on the cross-modality store. That is how the results turned out. Moreover, if the cross-modality, capacity-limited store were the only type of storage used, then the sum of visual and auditory capacities in the dual-modality condition should be no greater than the larger of the two unimodal capacities (which happened to be the visual capacity). The reason is that the limited-capacity store would hold the same number of units no matter whether they were all from one modality or were from two modalities combined. That prediction was confirmed, but only if there was a post-perceptual mask in both modalities at once following the array to be remembered. The post-perceptual mask included a multicolored spot at each visual object location and a sound composed of all possible digits overlaid, from each loudspeaker. 
It was presented long enough after the arrays to be recalled that their perception would have been complete (e.g., 1 s afterward; cf. Vogel et al., 2006). Presumably, the mask was capable of overwriting various types of sensory-specific features in activated memory, leaving behind only the more generic, categorical information present in the focus of attention, which presumably is protected from masking interference by the attention process. The limit of the focus of attention was again shown to be between three and four items, for either unimodal visual or bimodal stimuli.

Even without using masking stimuli, it may be possible to find a phase of the short-term memory process that is general across domains. Cowan and Morey (2007) presented two stimulus sets to be recalled (or, in control conditions, only one set). The two stimulus sets could include two spoken lists of digits, two spatial arrays of colored spots, or one of each, in either order. Following this presentation, a cue indicated that the participant would be responsible for only the first array, only the second array, or both arrays. Three seconds followed before a probe. The effect of memory load could be compared in two ways. Performance on those trials in which two sets of stimuli were presented and both were cued for retention could be compared either to trials in which only one set was presented, or it could be compared to trials in which both sets were presented but the cue later indicated that only one set had to be retained. The part of working memory preceding the cue showed modality-specific dual-task effects: encoding a stimulus set of one type was hurt more by also encoding another set if both sets were in the same modality. However, the retention of information following the cue showed dual-task effects that were not modality-specific. When two sets had been presented, retaining both of them was detrimental compared to retaining only one set (as specified by the post-stimulus retention cue to retain one versus both sets), and this dual-task effect was similar in magnitude no matter whether the sets were in the same or different modalities. After the initial encoding, working memory storage across several seconds thus may occur abstractly, in the focus of attention.

Other evidence for a separate short-term storage

Last, there is other evidence that does not directly support either temporal decay or a capacity limit specifically, but implies that one or the other of these limits exists. Bjork and Whitten (1974) and Tzeng (1973) made temporal distinctiveness arguments on the basis of what is called continual distractor list recall, in which a recency effect persists even when the list is followed by a distractor-filled delay before recall. The filled delay should have destroyed short-term memory but the recency effect occurs anyway, provided that the items in the list also are separated by distractor-filled delays to increase their distinctiveness from one another. In favor of short-term storage, though, other studies have shown dissociations between what is found in ordinary immediate recall versus continual distractor recall (e.g., word length effects reversed in continual distractor recall: Cowan et al., 1997b; proactive interference at the most recent list positions in continual distractor recall only: Craik & Birtwistle, 1971; Davelaar et al., 2005).

There is also additional neuroimaging evidence for short-term storage. Talmi et al. (2005) found that recognition of earlier portions of a list, but not the last few items, activated areas within the hippocampal system that is generally associated with long-term memory retrieval. This is consistent with the finding, mentioned earlier, that memory for the last few list items is spared in Korsakoff’s amnesia (Baddeley and Warrington, 1970; Carlesimo et al., 1995). In these studies, the part of the recency effect based on short-term memory could reflect a short amount of time between presentation and recall of the last few items, or it could reflect the absence of interference between presentation and recall of the last few items. Thus, we can say that short-term memory exists, but often without great clarity as to whether the limit is a time limit or a chunk capacity limit.


Measure of Social-Distancing Compliance and Its Relationship with Other Variables of Interest.

Across two studies, two independent groups of mTurk participants reported how closely they had followed a set of practices to keep away from close-distance social interactions in the past week (e.g., whether they had cancelled social gatherings with friends and avoided handshakes, hugs, or kisses when greeting; see Materials and Methods for details). To estimate the validity of this measure, we correlated participants’ total scores for social-distancing compliance with their self-reported number of times that they had left their home and with the frequency of hand washing in the past week, using Spearman rank-order correlations. Our assumption is that participants who are more likely to comply with social-distancing guidelines are less likely to leave their home and are more cognizant about the means to prevent disease transmission. Indeed, we found that participants with higher scores in social-distancing compliance also reported leaving home less (ρ = −0.32 [−0.40, −0.22], P < 0.001, n = 397 in study 1 and ρ = −0.19 [−0.28, −0.10], P < 0.001, n = 453 in study 2), but washing hands more frequently (ρ = 0.54 [0.46, 0.60], P < 0.001 in study 1 and ρ = 0.34 [0.25, 0.42], P < 0.001 in study 2). In contrast, this social-distancing compliance measure was not significantly correlated with education or income levels of the participants, even though female and older participants tended to show more social-distancing compliance (SI Appendix, Tables S2 and S3).
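For readers unfamiliar with the Spearman rank-order correlation used in this validity check, a minimal sketch follows. It computes ρ as the Pearson correlation of average ranks, which handles ties. The data values and the names compliance and times_left_home are made up for illustration, and the sketch does not reproduce the bracketed interval estimates, whose method is not described in this section.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four participants (not data from the study).
compliance = [10, 20, 30, 40]
times_left_home = [9, 7, 8, 2]
rho = spearman(compliance, times_left_home)  # negative: more compliance, fewer trips
```

With these made-up values ρ is −0.8, the same direction as the reported association between compliance and leaving home.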

Of primary interest, we found that social-distancing compliance was significantly correlated with participants’ ability to retain a certain number of color squares in WM, namely WM capacity, measured from an established change localization task (36, 37). In this task, on each trial, participants tried to remember a set of briefly presented color squares over a short delay and reported a changed color in a second set of color squares, by clicking on the changed color (Fig. 1A). Response accuracy across trials was converted to K (38), as the task measure of the total number of remembered items (i.e., WM capacity). We found that higher visual WM capacity was significantly correlated with more social-distancing compliance both in study 1 (r = 0.29 [0.20, 0.38], P < 0.001) and in study 2 (r = 0.25 [0.17, 0.34], P < 0.001). In addition, individuals with higher WM capacity who scored above the median K value in each sample indeed reported more social-distancing compliance (Fig. 1B), as compared with lower WM individuals [study 1: t(395) = 4.72, P < 0.001, Cohen’s d = 0.47 (0.27, 0.67); study 2: t(451) = 3.97, P < 0.001, Cohen’s d = 0.37 (0.19, 0.56)]. We also found that social-distancing compliance and K were significantly correlated with other affective and trait variables, such as depressed mood, anxious feelings, agreeableness, and fluid intelligence (SI Appendix, Tables S2 and S3). The critical issue then is whether the association between WM capacity and social-distancing compliance can be accounted for by these mood-related and trait-related covariates. To address this issue, we adopted a regression approach to investigate the unique variance in social-distancing compliance explained by WM capacity after taking into account other covariates (39–41).
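The median-split comparison reported above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: participants at or below the median K are placed in the lower group, Cohen's d uses the pooled standard deviation, and t is the equal-variance independent-samples statistic. The function name median_split_effect and all data values are hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance


def median_split_effect(k_scores, compliance):
    """Compare compliance for participants above versus at or below median K.

    Returns (cohens_d, t, df). Assumes a pooled-variance (Student) t test;
    the handling of scores exactly at the median is an assumption.
    """
    cut = median(k_scores)
    high = [c for k, c in zip(k_scores, compliance) if k > cut]
    low = [c for k, c in zip(k_scores, compliance) if k <= cut]
    n_high, n_low = len(high), len(low)
    df = n_high + n_low - 2
    pooled_var = ((n_high - 1) * variance(high) + (n_low - 1) * variance(low)) / df
    d = (mean(high) - mean(low)) / sqrt(pooled_var)
    t = d / sqrt(1 / n_high + 1 / n_low)  # t follows from d and the group sizes
    return d, t, df


# Hypothetical K estimates and compliance scores for six participants.
k_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
compliance = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
d, t, df = median_split_effect(k_scores, compliance)
```

The relation t = d / sqrt(1/n1 + 1/n2) also gives a rough consistency check on the reported values: with about 198 participants per group in study 1, d = 0.47 corresponds to t of roughly 4.7.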

WM Capacity Contributes Unique Variance to Social-Distancing Compliance.

In study 1, we first looked at whether WM capacity could predict social-distancing compliance after taking into account several mood-related covariates, such as depressed mood, anxious feelings, and poor sleep quality (see Materials and Methods for details). These variables have all been previously linked to reduced WM capacity (42–44), and our current data replicated these previous observations (SI Appendix, Tables S2 and S3). Nonetheless, we found that WM capacity remained a robust predictor of social-distancing compliance (β = 0.18 [0.09, 0.28], P < 0.001), even after taking into account these mood-related covariates and other demographic variables, including age, gender, education, and income levels. This observation remained robust when WM capacity was entered into the regression model as the last predictor [ΔR² = 0.03, F(1, 388) = 13.57, P < 0.001], suggesting that WM capacity contributed unique and additional variance to individual differences in social-distancing compliance (Table 1).
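The test of WM capacity as the last predictor is an R² change test, which can be sketched as follows. The formula is the standard F test for nested regression models; the function name f_change and the two R² values in the example are hypothetical placeholders, since the model R² values are not given in this section. The predictor count of eight (WM capacity, three mood covariates, and four demographic variables) is inferred from the text and agrees with the reported denominator degrees of freedom, 397 − 8 − 1 = 388.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added=1):
    """F test for the increase in R^2 when k_added predictors enter last.

    n is the sample size and k_full the number of predictors in the full
    model (intercept excluded). Returns (delta_r2, f, df1, df2).
    """
    df1 = k_added
    df2 = n - k_full - 1
    delta_r2 = r2_full - r2_reduced
    f = (delta_r2 / df1) / ((1.0 - r2_full) / df2)
    return delta_r2, f, df1, df2


# Hypothetical R^2 values chosen only to illustrate the arithmetic.
delta_r2, f, df1, df2 = f_change(r2_reduced=0.10, r2_full=0.13, n=397, k_full=8)
```

Because the denominator depends on the full-model R², the same ΔR² can yield different F values in models that explain different amounts of total variance.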

Predicting social-distancing compliance with multiple regression in study 1

In study 2, we further evaluated the robustness of this observation after factoring out some additional covariates, such as the “Big Five” personality and fluid intelligence. Consistent with some previous findings regarding personality and social norms (27), we found that participants with certain personality types showed more social-distancing compliance (e.g., agreeableness, β = 0.18 [0.08, 0.27], P < 0.001). However, individual variations in personality did not take away the unique contribution of WM capacity to social-distancing compliance. Similarly, although fluid intelligence was a significant predictor of social-distancing compliance (β = 0.13 [0.05, 0.22], P = 0.003), its contribution attenuated (β = 0.09 [−0.01, 0.18], P = 0.063) when WM capacity was entered into the model as the last predictor. These observations were supported by a model comparison between the regression models with and without WM capacity as an additional predictor [β = 0.14 (0.05, 0.24), ΔR² = 0.02, F(1, 439) = 8.50, P = 0.004] (Table 2). Altogether, converging findings from studies 1 and 2 indicate the unique and significant contribution of WM capacity to individual variations in social-distancing compliance, which cannot be simply accounted for by mood-related variables, personality, or fluid intelligence of the participants.

Predicting social-distancing compliance with multiple regression in study 2

Weighting Benefits over Costs Mediates the Relationship between WM Capacity and Social-Distancing Compliance.

We next examined how WM capacity might account for unique variance in social-distancing compliance. Our working hypothesis is that higher WM capacity may facilitate one’s ability to perform cost-and-benefit analysis of social-distancing practice, which subsequently facilitates social-distancing compliance. This hypothesis predicts that participants’ understanding about benefits over costs of social distancing mediates the relationship between WM capacity and social-distancing compliance. We tested this prediction in study 2, in which participants evaluated the extent to which they agreed with several statements regarding social distancing during the COVID-19 pandemic (Materials and Methods and SI Appendix, Table S4). Some of these items highlight the needs or benefits to perform social distancing (e.g., “Social distancing may minimize the burden on medical resources, so people in need can use them”), whereas others highlight the potential costs associated with social distancing (e.g., “Small business could not survive if people keep social distancing”). We standardized the sum scores for the benefit- and cost-related items separately and then calculated the difference score as a measure of participants’ understanding of benefits over costs regarding social distancing at the time of testing.
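The benefit-over-cost difference score described above can be sketched in a few lines. The sketch assumes z-standardization with the sample standard deviation; the function names and the item sums are hypothetical and are not data from study 2.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def benefit_over_cost(benefit_sums, cost_sums):
    """Difference of separately standardized benefit and cost sum scores."""
    z_benefit = zscores(benefit_sums)
    z_cost = zscores(cost_sums)
    return [b - c for b, c in zip(z_benefit, z_cost)]


# Hypothetical sum scores for three participants.
benefit_sums = [10.0, 20.0, 30.0]
cost_sums = [30.0, 20.0, 10.0]
scores = benefit_over_cost(benefit_sums, cost_sums)
```

Standardizing each sum before subtracting puts the benefit and cost scales on a common footing, so the difference does not simply reflect whichever scale has more items or a wider range.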

We subsequently performed a formal mediation analysis (45), using WM capacity as a predictor, social-distancing compliance as the outcome variable, and participants’ understanding of benefits over costs about social distancing as a mediator (Fig. 1C). After factoring out other covariates as background confounders (Materials and Methods), we found that participants’ understanding of benefits over costs significantly mediated the relationship between WM capacity and social-distancing compliance (indirect effect: β = 0.03 [0.003, 0.07], P = 0.038). However, it was a partial mediation effect since this mediator did not fully take away the direct contribution of WM capacity to social-distancing compliance (β = 0.12 [0.02, 0.21], P = 0.013).
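The logic of this mediation analysis can be illustrated with a simplified product-of-coefficients sketch. It departs from the analysis reported above in two labeled ways: it omits the background covariates, and it gives point estimates only, without the interval estimate or significance test for the indirect effect. All variables are treated as standardized, so the paths can be written in terms of pairwise correlations. The function names and data values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_mediation(x, m, y):
    """Standardized paths for predictor x, mediator m, and outcome y.

    a: x -> m; b: m -> y controlling for x; direct: x -> y controlling
    for m. No covariates are included (a simplifying assumption).
    """
    r_xm, r_xy, r_my = pearson(x, m), pearson(x, y), pearson(m, y)
    a = r_xm
    b = (r_my - r_xy * r_xm) / (1 - r_xm ** 2)
    direct = (r_xy - r_my * r_xm) / (1 - r_xm ** 2)
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": r_xy}


# Hypothetical scores: WM capacity, benefit-over-cost score, compliance.
wm_capacity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
benefit_cost = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
compliance = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
paths = standardized_mediation(wm_capacity, benefit_cost, compliance)
```

In this formulation the total effect equals the direct effect plus the indirect effect. A partial mediation, as reported above, corresponds to a nonzero indirect effect alongside a direct effect that remains nonzero.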

WM Capacity Also Predicts Fairness Norm Compliance.

To understand the more general role of WM capacity in social-norm compliance, we asked participants in study 2 to perform an additional fairness norm decision-making task (Fig. 2 A and B), which involved staged interactions between two anonymous players with real financial consequences (35, 46). At the beginning of this task, participants were told that they would be randomly paired with another mTurk worker and that they could be arbitrarily assigned into different roles (“Player A” vs. “Player B”). In fact, all participants were assigned to be Player A, whereas Player B was simulated by a preprogramed algorithm (46). The two players began each round with 25 money units (MUs), but Player A (the participant) received an additional 100 MUs and could decide to transfer x amount of MUs deemed fair by himself/herself to Player B (baseline condition). In another condition, Player B could respond to the MU transfer by either accepting it if it was fair or punishing Player A by y amount of MUs if it was deemed unfair (punishment condition). We simulated the amount of punishment based on the probability distribution and magnitude of how a real human Player B would have responded in this task from previous studies (33, 41) (SI Appendix, Table S5).

A modified ultimatum game (A and B) for social-norm compliance (C) and its relationship to WM capacity (D). On each round, both players started with 25 MUs. Player A also received an additional 100 MUs and had to decide whether to transfer x amount of MUs, in steps of 10 MUs from 0 to 100, to another anonymous Player B to be fair. (A) In the baseline condition, Player B could only accept whatever MUs Player A offered, resulting in a reduction of Player A’s earning by x MUs. (B) In the punishment condition, Player B could decide to either accept Player A’s offer or punish Player A by taking y MUs, ranging from 0 to 100 MUs, away from Player A. (C) The ultimate fairness for this game would be an even split of the 100 MUs between the two players (i.e., equality) in Western cultures. Therefore, the closer to equality the x amount of MUs Player A transferred to Player B is, the less likely Player B would punish Player A. Hence, over the course of the experiment, when faced with the punishment threat, Player A tended to transfer more MUs (close to the fairness norm of 50 MUs) to Player B, as compared to the baseline condition. The difference in the amount of MUs transferred from Player A to Player B is indicative of Player A’s sanction-induced fairness norm compliance. (D) We found that this fairness norm compliance measure was significantly correlated with WM capacity in study 2. Error bar areas in C indicate SEM estimates. And the solid line in D represents the linear fit of the data, with the dashed lines indicating its 95% confidence intervals.

According to the fairness norm in Western cultures, the ultimate fairness in this social setting would be an even split of the MUs between the two players (i.e., “split the cake”). However, this would conflict with Player A’s self-interest in obtaining more MUs for a higher monetary reward, and consequently Player A generally tended to transfer a smaller amount of MUs to Player B in the baseline condition. In contrast, when a sanctioning threat was present as a reminder to comply with the fairness norm in the punishment condition, Player A tended to transfer more MUs to Player B (35, 46). We replicated these observations from previous research. Specifically, the amount of MUs the participants as Player A transferred to Player B was statistically not different from the fairness norm (i.e., 50 MUs) in the punishment condition [t(452) = 1.13, P = 0.26, Cohen’s d = 0.05 (−0.04, 0.15), Bayes factor in favor of the null hypothesis = 10.02]. In contrast, participants as Player A transferred significantly fewer MUs to Player B in the baseline relative to the punishment condition [t(452) = 11.59, P < 0.001, Cohen’s d = 0.55 (0.45, 0.64)] (Fig. 2C). Consequently, the difference in the amount of transferred MUs between punishment and baseline conditions could capture individual differences in sanction-induced fairness norm compliance by taking into account individual differences in altruism and response biases (35).

We next evaluated whether WM capacity had a unique contribution to individual differences in sanction-induced fairness norm compliance. We found that WM capacity was significantly correlated with participants’ compliance with the fairness norm (r = 0.26 [0.17, 0.34], P < 0.001) (Fig. 2D). Furthermore, WM capacity’s contribution to individual differences in fairness norm compliance [β = 0.10 (0.003, 0.20), ΔR² = 0.01, F(1, 439) = 4.13, P = 0.043] could not be accounted for by other significant predictors of performance in this task (SI Appendix, Table S6), such as conscientiousness (β = 0.13 [0.02, 0.24], P = 0.018) and fluid intelligence (β = 0.26 [0.16, 0.35], P < 0.001). This may be because individuals with higher WM capacity can better evaluate the consequences of not following the fairness norm, such that they can maximize the total amount of reward in the end. This prediction was supported by a significant correlation between WM capacity and total amount of earned MUs across participants (r = 0.28 [0.20, 0.37], P < 0.001). Furthermore, participants’ compliance with the fairness norm was significantly correlated with social-distancing compliance as well (r = 0.18 [0.09, 0.27], P < 0.001). Altogether, these results suggest that participants who are more inclined to follow one set of social norms may also be more likely to follow another set of social norms, and that both are related to individual differences in WM capacity.


Behavioral Data

WMC and n-back

The mean WMC score, expressed as a probability value, was .59 (SD = 0.14, range .31–.81). n-Back data from two participants were lost because of technical failure. Mean performance (n = 31), expressed as a probability score, was .99 (SD = .01), .99 (SD = .02), and .89 (SD = .06) on the 1-back, 2-back, and 3-back version of the n-back task, respectively. Alpha was set to .05 in all statistical analyses. A one-way repeated-measures analysis of variance confirmed a significant difference between the high load condition (3-back) and the other two conditions, F(2, 58) = 102.81, MSE = 0.001, p < .001, ηp² = .77. Follow-up t tests revealed a significant difference between 3-back and 2-back, t(30) = 11.04, p < .001 and between 3-back and 1-back, t(30) = 9.98, p < .001, but not between 1-back and 2-back, t(30) = 0.78, p = .440.

Counting Deviating Tones during Active Listening

The sound consisted of a sequence of tone blocks (1002.50 msec in duration). A standard tone (1.0 kHz) was presented repeatedly (40 times) in 216 blocks, and a deviating tone (1.2 kHz) was presented repeatedly (40 times) in 24 blocks. In the active listening condition, the participants were requested to count the deviating tone blocks. The participants performed near ceiling on this task. Twenty-nine of 33 participants responded accurately (i.e., 24), and the other four reported 25, 25, 23, and 22 deviating tone blocks, respectively.

Auditory-Evoked Brainstem Response

Data from two participants were lost because of technical failure (not the same as above). The ABR from a representative participant in the four experimental conditions (active listening, low, intermediate, and high visual–verbal cognitive load) as a function of time is reported in Figure 1A and the grand averaged ABR for the four experimental conditions is reported in Figure 1B. The Wave V visible at around 6.8 msec for all traces in Figure 1 is the most prominent and consistent component of the ABR, and its magnitude is therefore used in the current analysis. Also apparent in Figure 1 are the differences in ABR magnitudes at latencies shorter than Wave V (between 3.5 and 6.8 msec). However, as the peaks before Wave V were not identifiable in the individual traces, this part of the ABR trace is not analyzed in the current study.

In what follows, ABR magnitude refers to the magnitude of Wave V of the ABR trace. As can be seen in Figure 2, the normalized magnitude of Wave V (in response to the 1.0 kHz standard tone) was highest in the active listening condition and declined as a function of increasing cognitive load. This conclusion was supported by a repeated-measures ANOVA (n = 31), F(3, 90) = 14.92, MSE = 0.03, p < .001, ηp² = .33. Bonferroni-adjusted follow-up t tests revealed no significant difference between the active listening and 1-back conditions, and no difference between the 2-back and 3-back conditions. All other comparisons were significant at the p < .001 level.
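As a rough consistency check on an effect size like the one reported for the ABR analysis, partial eta squared can be recovered from an F ratio and its degrees of freedom, assuming the standard relation ηp² = (F × df_effect) / (F × df_effect + df_error). The function name below is illustrative and is not taken from the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    # SS_effect / (SS_effect + SS_error), rewritten in terms of F and the dfs.
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the reported test: F(3, 90) = 14.92.
abr_effect = partial_eta_squared(14.92, 3, 90)
print(f"{abr_effect:.2f}")  # 0.33
```

The recovered value agrees with the reported ηp² = .33 to two decimals.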

The figure shows the ABR (normalized Wave V magnitude) to sound when the participants were asked to actively listen to the sound only, and when the sound was played in the background while the participants performed three versions of the n-back task with varying cognitive load (1-back = low load, 2-back = intermediate load, 3-back = high load). Error bars show the SEMs.

The intercorrelations for the normalized magnitudes of Wave V in the four experimental conditions and their relations to WMC are reported in Table 1. Two findings are particularly noteworthy. First, higher WMC was associated with a lower ABR magnitude, but only in the high load condition, as hypothesized. Second, the ABR magnitude in active listening and in the low load condition was negatively related to the ABR magnitude in the conditions of higher visual–verbal cognitive load. This generally indicates that, as the cognitive demands changed, some participants modulated the ABR magnitude to a greater extent than others (those with greater magnitude in the 1-back condition tended to have lower magnitude in the 3-back condition). We used a residual analysis technique in the context of hierarchical regression analysis to analyze whether WMC could account for some of this variance. Specifically, we tested whether WMC is related to the change in ABR magnitude between the 1-back and the 3-back condition, because they represent the largest difference between two cognitive load conditions whereby the sound is task-irrelevant and should reveal how visual–verbal cognitive load modulates perceptual filtering of irrelevant sound. ABR magnitudes in the 3-back condition were selected as the dependent variable, ABR magnitudes in the 1-back condition were selected as the independent variable of the first step of the hierarchical regression analysis, and WMC scores were selected as independent variable in the second step of the analysis. The residual variance left to be explained in the second step of this analysis represents the difference between ABR magnitude in the 1-back condition and ABR magnitude in the 3-back condition. Hence, if WMC is related to this residual variance, the results would indicate that WMC is related to the modulation of the ABR across conditions. 
Indeed, WMC explained a significant and unique part of the variance when added in the second step of the analysis, ΔR² = .19, β = −.44, t(28) = −2.72, p = .011. As correlations are sensitive to outliers and may well occur as a result of a single value that deviates from the rest of the sample, we conducted a control analysis with outliers removed. The correlation between the residual variance and WMC was even stronger when outliers (z > 2.00) were excluded, r(26) = −.52, p < .01, which further reinforces the conclusion that higher WMC is related to a greater modulation of the ABR magnitude (Figure 3).
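The two-step logic described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the variable names, the simulated effect sizes, and the helper functions are all assumptions made for the example. Step 1 regresses the 3-back ABR magnitude on the 1-back magnitude, step 2 adds WMC, and ΔR² is the gain in explained variance.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_delta_r2(y, step1_predictor, step2_predictor):
    """Return (R^2 at step 1, R^2 at step 2, Delta R^2) for a two-step regression."""
    r2_first = r_squared(y, [step1_predictor])
    r2_second = r_squared(y, [step1_predictor, step2_predictor])
    return r2_first, r2_second, r2_second - r2_first


# Synthetic data only (hypothetical effect sizes), with n = 31 as in the study.
rng = np.random.default_rng(0)
n = 31
wmc = rng.normal(size=n)
abr_1back = rng.normal(size=n)
abr_3back = -0.3 * abr_1back - 0.4 * wmc + rng.normal(scale=0.8, size=n)

r2_step1, r2_step2, delta_r2 = hierarchical_delta_r2(abr_3back, abr_1back, wmc)
print(f"step 1 R^2 = {r2_step1:.3f}, step 2 R^2 = {r2_step2:.3f}, "
      f"Delta R^2 = {delta_r2:.3f}")
```

Because the 1-back magnitude enters first, any variance WMC explains in the second step is variance in the 3-back magnitude that the 1-back magnitude could not account for, which is why ΔR² is read as WMC's relation to the change across load conditions.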

The Correlation between ABRs (Wave V) under Four Experimental Conditions, WMC, and Scores of Three Versions of n-back with Various Cognitive Load

Measure                                 1.      2.      3.      4.      5.      6.      7.
1. ABR in active listening condition    1
2. ABR in 1-back condition              −.23    1
3. ABR in 2-back condition              −.47**  −.39*   1
4. ABR in 3-back condition              −.62**  −.31    .11     1
5. WMC index                            .17     .12     .12     −.47**  1
6. 1-Back score                         −.05    −.26    .19     .15     −.08    1
7. 2-Back score                         .04     −.10    .17     −.11    .25     −.21    1
8. 3-Back score                         −.20    −.04    .23     .13     .13     −.06    .49**

The figure shows z values (outliers removed) for the relationship between WMC and the change in ABRs (Wave V) between the low load condition (1-back) and the high load condition (3-back). The change scores (y axis) are the residual variance left to be explained when ABRs in the low load condition were partialed out from ABRs in the high load condition. Note that smaller values on the y axis represent a larger change in the negative direction (a larger decrease of the ABR).

Aims and Predictions for the Present Study

The present study has three interlinked aims. First, we test hypotheses from the three theoretical views about the nature of WMC outlined above. Second, we investigate to what extent different task classes for measuring WMC are interchangeable indicators of the same construct. Third, we explore how WMC is related to SM, cognitive control, and fluid intelligence. The three aims are interlinked because different theories about the nature of WMC lead to different expectations about which kinds of tasks measure the same construct (i.e., WMC), and about how these tasks and constructs relate to other cognitive constructs. To this end, we tested participants on multiple tests of the following categories: (1) complex-span tasks (Cspan), (2) working memory updating tasks (Updating), (3) tests of immediate memory for temporary bindings (Binding), (4) tests of SM for associations (SM), (5) tasks measuring response inhibition (Inhibition), and (6) tests of fluid intelligence (Gf).

The executive-attention theory of WMC motivates the following predictions (see Figure 1, red arrows): Cspan tasks should be highly correlated with Updating and with Inhibition, because the latter two represent aspects of executive functions. The common variance of these three classes of measures, reflecting general executive attention, should be a good predictor of Gf. This theory does not rule out that other constructs, such as SM or Binding, also contribute to predicting Gf.

The dual-component theory of Unsworth and colleagues conceptualizes performance in complex-span tasks as being determined by maintenance in PM and search in SM (see Figure 1, blue arrows). As we argued above, tests of working-memory updating are unlikely to rely much on SM, and therefore reflect maintenance in PM to a larger extent than complex-span tasks. Therefore, we can expect that Cspan is related to SM on the one hand, and to PM (as measured by Updating) on the other hand, whereas SM and Updating are comparatively weakly related to each other. Updating and SM should also contribute independently to predicting Gf.

The binding hypothesis implies that a measure of the maintenance of temporary bindings should share a large proportion of variance with other measures of WMC, including Cspan and Updating. The shared variance of all those measures is assumed to reflect the WMC construct (see Figure 1, green arrows). Together with the binding measures, the updating tasks should have comparatively high loadings on this construct because, as we argued above, these tasks require the rapid updating of bindings. The broad WMC construct, reflecting maintenance and updating of bindings, should be a good predictor of Gf. In contrast, Inhibition is not expected to be closely related to Cspan, Updating, or Gf. SM performance should substantially depend upon the ability to create bindings in working memory. According to the binding hypothesis, temporary bindings in working memory, not more long-term associations in SM, are directly relevant for reasoning. Therefore, we expect that Gf is better predicted by WMC (i.e., the shared variance of Binding, Updating, and Complex Span) than by SM.

Individual differences in lexical learning across two language modalities: Sign learning, word learning, and their relationship in hearing non-signing adults

A considerable amount of research has been devoted to understanding individual differences in lexical learning. However, the majority of this research has been conducted with spoken languages rather than signed languages, and thus we know very little about the cognitive processes involved in sign learning or the extent to which lexical learning processes are specific to word learning. The present study was conducted to address this gap. Two hundred thirty-six non-signing adults completed 25 tasks assessing word learning and sign learning (via associative learning paradigms) as well as modality-specific phonological short-term memory, working memory capacity, crystallized intelligence, and fluid intelligence. Latent variable analyses indicated that, when other variables were held constant, fluid intelligence was predictive of both word and sign learning, whereas modality-specific phonological short-term memory factors were only predictive of lexical learning within modality; none of the other variables made significant independent contributions. It was further observed that sign and word learning were strongly correlated. Exploratory analyses revealed that all lexical learning tasks loaded onto a general factor, but sign learning tasks also loaded onto an additional specific factor. As such, this study provides insight into the cognitive components that are common to associative L2 lexical learning regardless of language modality and those that are unique to either signed or spoken languages. Results are further discussed in light of established and more recent theories of intelligence, short-term memory, and working memory.

Keywords: Associative learning; Lexical learning; Phonological short-term memory; Sign language.

Working Memory Model

Atkinson and Shiffrin’s (1968) multi-store model was extremely successful in terms of the amount of research it generated. However, as a result of this research, it became apparent that there were a number of problems with their ideas concerning the characteristics of short-term memory.

Baddeley and Hitch (1974) argue that the picture of short-term memory (STM) provided by the Multi-Store Model is far too simple.

According to the Multi-Store Model, STM holds limited amounts of information for short periods of time with relatively little processing. It is a unitary system, meaning a single system (or store) without any subsystems. Working memory, by contrast, is a multi-component system (auditory and visual).

Therefore, whereas short-term memory can only hold information, working memory can both retain and process information.

Fig 1. The Working Memory Model (Baddeley and Hitch, 1974)

Working memory is short-term memory. However, instead of all information going into one single store, there are different systems for different types of information.

Central Executive

Visuospatial Sketchpad (inner eye)

Phonological Loop

  1. Phonological Store (inner ear) processes speech perception and stores spoken words we hear for 1-2 seconds.
  2. Articulatory control process (inner voice) processes speech production, and rehearses and stores verbal information from the phonological store.

Fig 2. The Working Memory Model Components (Baddeley and Hitch, 1974)

The labels given to the components of working memory (see Fig 2) reflect their function and the type of information they process and manipulate. The phonological loop is assumed to be responsible for the manipulation of speech-based information, whereas the visuospatial sketchpad is assumed to be responsible for manipulating visual images.

The model proposes that every component of working memory has a limited capacity, and also that the components are relatively independent of each other.
