Brain Training Falls Short in Big Online Experiment
23 April 2010. However far apart the worlds of television and scientific research may seem, they came together for a study published online in Nature on April 20. It began when the producers of a popular television show set out to learn whether brain training improves cognitive performance. The researchers, including Adrian Owen of the MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom, tested whether six weeks of online brain training would improve the cognitive performance of 11,430 healthy adults recruited through the show and its website. Owen and colleagues found that the training mainly benefited performance on the trained tasks; it did little to boost performance on other tasks, even ones closely related to those used in the training. The study’s implications for schizophrenia and other diseases that impair cognition remain unclear.
The benefits touted by certain commercially available brain-training programs include better memory, sharper thinking, faster cognitive processing, and improved concentration. The Brain Age website hints that training can “increase blood flow to the prefrontal cortex,” Posit Science claims to improve the speed and quality of information processing, and HAPPYneuron says that its program “minimizes the natural effects of brain aging by maximizing the brain's capacity to learn and its ability to adapt to new information.” Whether these electronic games cost consumers $20 or $400, however, their ability to improve overall cognition remains unproven. A review of approaches to improving cognitive function through “brain exercise” concluded that they do little or nothing to improve cognition outside of the trained tasks (Green and Bavelier, 2008), although more promising findings also appear in the literature (see SRF related news story).
To examine the effects of brain training on cognitive functioning, Owen and his collaborators in the United Kingdom conducted a massive clinical trial online. At the start, 52,617 subjects, recruited through the British Broadcasting Corporation science show Bang Goes the Theory, signed up to participate. Ranging in age from 18 to 60, they were instructed to perform certain cognitive tasks for at least 10 minutes a day, three times per week, for six weeks. To do so, they logged on to the show’s website, Lab UK.
The study randomly assigned subjects to either one of two brain-training groups or to a control group. The two training groups differed in the kind of training they received: one practiced tasks designed to improve reasoning, planning, and problem-solving, while the other received training more like that involved in many commercial brain-training programs. For instance, they practiced tests assessing a wide range of cognitive functions, including short-term memory, attention, visuospatial processing, and math. A third group consisted of control subjects whose task involved finding answers to “obscure questions” using any online resources they could find.
To detect cognitive gains, the researchers administered a neuropsychological battery at baseline and six weeks later. It consisted of four tests that assessed verbal short-term memory, spatial working memory, paired-associates learning, and reasoning. The change in scores from pre-test to post-test served as the main outcome measure and an indicator of generalized cognitive improvement. Analyses focused on the 11,430 subjects who had completed at least two training sessions during the six-week period, as well as the neuropsychological test battery before and after that time.
Due to concerns that the huge sample size would yield positive results for trivial differences, the researchers reported effect sizes rather than p-values. Not surprisingly, the training improved performance on the trained tasks. It yielded substantial effect sizes that ranged from 0.73 (99 percent CI, 0.68 to 0.79) to 1.63 (99 percent CI, 1.57 to 1.7), depending on the task, in the reasoning training group, and from 0.72 (99 percent CI, 0.67 to 0.78) to 0.97 (99 percent CI, 0.91 to 1.03) in the general training group. These numbers exceed the small effect size, 0.33 (99 percent CI, 0.26 to 0.4), seen in control subjects.
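For readers unfamiliar with the statistic, Cohen's d expresses a difference in means in standard-deviation units; values around 0.2 are conventionally called small and values around 0.8 large. A minimal Python illustration of the calculation, using made-up scores rather than the study's data:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: standardized mean difference between two samples."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across both samples
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical post- vs. pre-training scores on a trained task
post = [14.0, 15.5, 16.0, 13.5, 15.0]
pre = [11.0, 12.5, 12.0, 11.5, 12.0]
print(round(cohens_d(post, pre), 2))  # prints 3.59
```

With real test data the confidence intervals reported above would be obtained by bootstrapping or from the sampling distribution of d, which the sketch omits.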
Of course, the most interesting question was whether the training gains would generalize to other tasks. Owen and colleagues found that from baseline to six weeks, the group that received reasoning training improved on four of the neuropsychological tests, while the broadly trained group improved on three. Even so, the highest effect size, 0.35 (99 percent confidence interval, 0.29 to 0.41), was small. More to the point, the control group also improved on all four tests, with similar effect sizes. When the researchers directly compared the performance of the three groups across all four tests, they found that all improved, but only slightly. Owen and colleagues conclude, “These results provide no evidence for any generalized improvements in cognitive function following brain training in a large sample of healthy adults.”
Methodological land mine or landmark?
Owen and colleagues considered a number of possible reasons for the failure to find generalized training gains. For instance, they wondered whether an insufficient training “dose” could account for the disappointing results. The average subject completed 24 sessions, but individuals varied widely in the number of sessions completed. However, in all three groups, the number of training sessions correlated only slightly with performance gains on the neuropsychological tests (largest Spearman’s rho = 0.059), although it did correlate with performance on the trained tasks. “In all three groups, whether these improvements reflected the simple effects of task repetition (that is, practice), the adoption of new task strategies, or a combination of the two is unclear, but whatever the process effecting change, it did not generalize to the untrained benchmarking tests,” write Owen and associates. However, they note, “the possibility that an even more extensive training regime may have eventually produced an effect cannot be excluded.”
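Spearman's rho, the statistic used for this dose-response check, is the Pearson correlation computed on ranks rather than raw values, so a rho of 0.059 indicates essentially no monotonic relationship between sessions completed and benchmark gains. A small Python sketch with hypothetical, tie-free data (not the study's):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.
    Assumes no tied values (real data would need tie-averaged ranks)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_rx = sum(rx) / n
    mean_ry = sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    sx = sum((a - mean_rx) ** 2 for a in rx) ** 0.5
    sy = sum((b - mean_ry) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: training sessions completed vs. benchmark score gain
sessions = [5, 12, 24, 30, 41, 55]
gains = [0.20, 0.10, 0.40, 0.30, 0.25, 0.50]
print(round(spearman_rho(sessions, gains), 3))  # prints 0.714
```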
The study may add to uncertainty about cognitive training for people with schizophrenia. Philip Harvey of Emory University, in a comment to SRF, raises several crucial methodological points about this study. For instance, he warns that subjects’ ideas about the desired findings of the study could have shaped the outcome. Robert Bilder, of the University of California at Los Angeles, also expresses methodological concerns, among them agreement with Harvey that healthy, high-performing subjects might have less room for improvement than patients with brain disorders. However, he thinks the study highlights the potential usefulness of the Web for conducting large-scale behavioral phenotyping.
Helping the schizophrenic brain
Brain-training programs may interest people with schizophrenia and related disorders that cause cognitive impairments. Apparently, Posit Science thinks so; its website says that “highly promising preliminary results” support its effectiveness for various conditions that cause cognitive impairment, such as schizophrenia and Alzheimer’s disease. Indeed, prior research does support the ability of a neuroscience-based training program, developed by Posit Science, to produce lasting effects on global cognition in schizophrenia, but only after 100 hours of training (see SRF related news story). In addition, a 2007 meta-analysis found that cognitive remediation programs moderately improved overall cognition in schizophrenia (McGurk et al., 2007). On the other hand, a subsequent randomized, controlled trial of computer-assisted cognitive training found that its benefits did not transfer to other neuropsychological or functional outcomes (Dickinson et al., 2009). Clearly, the issue remains unsettled.
While the Nature study looked at healthy, relatively young volunteers, subjects over 60 years old have also been training their brains on the television show’s Lab UK website. Owen and associates plan to follow them for a year to see if a longer training program might help these subjects, some of whom have cognitive deficits.—Victoria L. Wilcox.
Owen AM, Hampshire A, Grahn JA, Stenton R, Dajani S, Burns AS, Howard RJ, Ballard CG. Putting brain training to the test. Nature. 2010 Apr 20. Abstract
Comments on News and Primary Papers
Comment by: Robert Bilder, SRF Advisor
Submitted 27 April 2010
Posted 27 April 2010
It’s wonderful to see this study in Nature, for it draws international attention to extremely important issues, including the degree to which cognitive training may yield generalizable effects, and to the amazing potential power of Web-based technologies to engage tens of thousands of individuals in behavioral research. It seems likely—and unfortunate—that for much of the world, the “take-home” message will be that all this “brain training” is bunk.
For me, the most exciting aspect of the study is that it was done at all. The basic design (engaging a TV audience to register for online experiments) is ingenious and indicates the awesome potential to use media for “good instead of evil.” Are there any investigators out there who would not be happy to recruit 52,617 research participants (presumably within the course of a single TV season)? Of course, this approach yielded only 11,430 people who completed the protocol (still sounds pretty good to me, especially since this reflects roughly 279,692 sessions completed). For those of us who struggle for years to obtain behavioral test data on several thousand research participants in our laboratories, this is a dream come true. Thus, I see the big contribution here as a validation of high-throughput behavioral phenotyping using the Internet, which will be necessary if we are to realize the promise of neuropsychiatric genetics.
The success of this validation experiment is evident from the specific effects (not the generalization effect), which showed effect sizes (Cohen’s d) ranging from 0.72 to 1.63. It would be of high value to see considerably more data on the within-subject variability of each training test score and on the covariance of different scores across sessions, not to mention other quality metrics, such as response time consistency, which could be used to assure “on-task” behavior and the validity of each session. Despite the lack of these details, the positive findings of these large effects mean that the results must be reasonably reliable (i.e., it is not feasible to get large and consistent effects from noise). This is very encouraging for those who want to see more Web-based behavioral phenotyping.
The “negative” results center on the lack of generalization to the “benchmark” tests, and this aspect of the outcome involves many more devils in the details. The authors are sensitive to many possibilities. The argument that there might be a misalignment of training with the benchmark tests is difficult to refute. The authors suggest that their 12 training tasks covered a “broad range of cognitive functions” and “are known to correlate highly with measures of general fluid intelligence or ‘g,’ and were therefore most likely to produce an improvement in the general level of cognitive functioning” [italics added]. But this assertion is not logically sound. It is not unreasonable, but there is no necessary reason to suppose that the best way to improve general cognitive ability is to practice tasks that load on general intellectual ability.
The suitability of the CANTAB (Cambridge Neuropsychological Test Automated Battery) tasks as benchmarks is also not an open-and-shut case. Yes, they are sensitive to brain damage and disruptive effects of various agents (for a comprehensive bibliography, see www.cantab.com/science/bibliography.asp). The data showing that these tasks can detect improvement in healthy people are more limited. One wonders if the guanfacine and clonidine improvement effects observed in the one study that is cited (Jakala et al., 1999a) would be seen in the sample of people who participated in the new study. Incidentally, it could be noted that the same authors also reported impairments on other cognitive tasks using the same agents (Jakala et al., 1999b; Jakala et al., 1999c). The bottom line is that it may be difficult to see big improvements, particularly in general cognitive ability, in people who are to some extent preselected for their healthy cognitive function and interest in “exercising” their brains.
Overall, I believe more research is needed to determine which aspects of cognitive function may be most trainable, in whom, and under what circumstances. I worry that this publication may end up derailing important studies that can shed light on these issues, since it is much easier to conclude that something is “bunk” than to push the envelope and systematically study all the reasons such a study could generate negative generalization results. Let us hope the baby is not thrown out with the bathwater at this early stage of investigation.
Jäkälä P, Sirviö J, Riekkinen M, Koivisto E, Kejonen K, Vanhanen M, Riekkinen P Jr. Guanfacine and clonidine, alpha 2-agonists, improve paired associates learning, but not delayed matching to sample, in humans. Neuropsychopharmacology. 1999a Feb;20(2):119-30. Abstract
Jäkälä P, Riekkinen M, Sirviö J, Koivisto E, Kejonen K, Vanhanen M, Riekkinen P Jr. Guanfacine, but not clonidine, improves planning and working memory performance in humans. Neuropsychopharmacology. 1999b May;20(5):460-70. Abstract
Jäkälä P, Riekkinen M, Sirviö J, Koivisto E, Riekkinen P Jr. Clonidine, but not guanfacine, impairs choice reaction time performance in young healthy volunteers. Neuropsychopharmacology. 1999c Oct;21(4):495-502. Abstract
View all comments by Robert Bilder

Comment by: Philip Harvey
Submitted 27 April 2010
Posted 27 April 2010
The paper from Owen et al. reports that a sample of community dwellers recruited to participate in a cognitive remediation study did not improve their cognitive performance except on the tasks on which they trained. While the results of cognitive remediation studies in schizophrenia have been inconsistent, the results of this study are particularly difficult to interpret, for several reasons:
1. Baseline performance on the "benchmarking" assessment does not appear to be adjusted for age, education, and other demographic predictive factors. As a result, we do not know if the participants even had room to improve from baseline. It is possible that the volunteers in this study were very high performers at baseline and could not improve. Furthermore, if they are, in fact, high performers, their performance and the lack of any improvements with treatment may be irrelevant to poor performers.
2. There is no way to know if the research participants who completed the baseline and endpoint assessments were the same ones who completed the training. Without this control, which is provided in studies that directly observe subjects, there may be reason to be suspicious of the results.
3. Although this is a letter to the editor, methodological details are missing. Recent studies that reported successful cognitive remediation interventions have used dynamic titration to adjust task difficulty according to performance at the time of the training. While Owen et al. do not say, it seems unlikely that their study used dynamic titration. The use of this technique is the major difference between older, unsuccessful cognitive remediation interventions and recent, more successful ones delivered to people with schizophrenia (see McGurk et al., 2007 for a discussion).
4. Even more important is the small effect of the training and participation in general on changes in elements of the benchmarking assessment. These changes on half of the tests administered are smaller than those reported in simple retest assessments without cognitive training in people with schizophrenia (see Goldberg et al., 2007). Although the authors argue that these tests are known to be sensitive, this very small effect is particularly salient for paired associates learning. Thus, the issue of whether some of these tests are not sensitive to changes originating from either treatment or practice requires some consideration, particularly since we do not know how well the participants performed at baseline.
5. Most important, these data are likely to reflect substantial demand characteristics. A study put on by a show called Bang Goes the Theory certainly appears to pull for disconfirmation. It is possible that demand characteristics account for more variance than training, since even successful training effects can be small. We know that environmental factors such as disability compensation account for more variance in real-world outcomes in schizophrenia than ability does (Rosenheck et al., 2006); it would be no surprise if demand characteristics account for more variance than ability as well.
Thus, while these results generate the reasonable suggestion that participation through the Internet in cognitive remediation does not guarantee improved cognitive performance, the current research design does not address many important issues regarding cognitive enhancement in clinical populations.
Goldberg TE, Goldman RS, Burdick KE, Malhotra AK, Lencz T, Patel RC, Woerner MG, Schooler NR, Kane JM, Robinson DG. Cognitive improvement after treatment with second-generation antipsychotic medications in first-episode schizophrenia: Is it a practice effect? Arch Gen Psychiatry. 2007 Oct;64:1115-22. Abstract
McGurk SR, Twamley EW, Sitzer DI, McHugo GJ, Mueser KT. A meta-analysis of cognitive remediation in schizophrenia. Am J Psychiatry. 2007;164:1791-1802. Abstract
Rosenheck R, Leslie D, Keefe R, McEvoy J, Swartz M, Perkins D, Stroup S, Hsiao JK, Lieberman J, CATIE Study Investigators Group. Barriers to employment for people with schizophrenia. Am J Psychiatry. 2006;163:411-7. Abstract
View all comments by Philip Harvey

Comment by: Terry Goldberg
Submitted 7 May 2010
Posted 7 May 2010
This important paper by Owen and colleagues reads like a cautionary tale. In a Web-based study of over 11,000 presumptively healthy individuals, neither of two different types of cognitive training resulted in transfer of improvement to a reasoning task or to several well-validated cognitive tasks from the Cambridge Neuropsychological Test Automated Battery (CANTAB). I would like to point out three issues with the study.
First, the amount of training that individuals received at their own behest differed greatly. While the authors found no correlation between the number of training sessions and performance improvement or lack thereof, it is nevertheless possible that there is some critical threshold, either in number of sessions or, more importantly, time spent in sessions (not noted in the paper), that must be reached before transfer can occur. In other words, the relationship between training and transfer may be nonlinear and perhaps sigmoidal.
Second, it is possible that scores on some of the benchmark/transfer tasks were close to ceiling in this normal population, preventing gain. Perhaps more likely, they could have been close to floor (see Figure 1 in the paper; scores were seemingly quite low), making them insensitive to gain.
Last, as pointed out by Phil Harvey, the nature of the recruitment tool, a debunking TV show called Bang Goes the Theory, may have introduced a bias to disconfirm in the participants. This would be especially pertinent if participants understood the design of the study, which seems likely.
View all comments by Terry Goldberg

Comment by: Angus MacDonald, SRF Advisor
Submitted 11 May 2010
Posted 11 May 2010
Owen and colleagues are to be commended for drawing attention to the great constraint of cognitive training—that is, the potential for improvements on only the restricted set of abilities that were trained.
This has been the bugbear of cognitive training for a long time. Short story with a purpose: In 2001, when I raved about the remarkable results of Klingberg (later published as Olesen et al., 2004) to John Anderson, an esteemed cognitive psychologist at Carnegie Mellon University, he scoffed at the possibility that Klingberg's training might have led to improvements on Raven's Matrices, a measure of generalized intelligence. "People have been looking into this for a century. If working memory training improved intelligence, schools would be filled with memory training courses rather than math and language courses," he said (or something to that effect). This issue of training and generalization is not new, and the results of Owen and colleagues are consistent with a large body of twentieth-century research.
Owen, therefore, reminds us of an important issue in the current generation of excitement about neuroplasticity: behavioral effects are likely to be small for distal generalization. The possibility of striking results is likely going to require something well beyond what is encountered in everyday or casual experience.
One way to improve on casual experience is dynamic titration. It is reasonably well established that when faced with a task of fixed difficulty, people will begin to asymptote on accuracy and then get faster and faster, with no hope of generalization. The online methods are mum about how this concern was addressed. (One certainly hopes that it was.)
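Dynamic titration is usually implemented as a staircase procedure: difficulty rises after a run of correct responses and falls after an error, holding subjects near their performance limit instead of letting them coast on a fixed level. A hypothetical sketch of such a rule (the parameters and step sizes are illustrative, not those of any cited study):

```python
def titrate(responses, start_level=1, step_up=1, step_down=1, up_after=3):
    """Simple staircase: raise difficulty after `up_after` consecutive
    correct responses, lower it after any error (never below level 1).
    `responses` is a sequence of booleans (True = correct).
    Returns the difficulty level in force on each trial."""
    level = start_level
    correct_streak = 0
    history = []
    for correct in responses:
        history.append(level)
        if correct:
            correct_streak += 1
            if correct_streak == up_after:
                level += step_up
                correct_streak = 0
        else:
            level = max(1, level - step_down)
            correct_streak = 0
    return history

# A hypothetical run: three correct answers push the level up,
# and an error immediately brings it back down.
print(titrate([True, True, True, True, False, True]))  # prints [1, 1, 1, 2, 2, 1]
```

Without some rule of this kind, subjects asymptote on accuracy and simply get faster, which is exactly the pattern described above.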
We have recently demonstrated that dynamic titration on an n-back task, in the context of a broader working memory training, can provide local generalization (Haut et al., 2010). In that study, we examined changes in prefrontal cortical activity with cognitive training compared to an active placebo control in schizophrenia patients. We found that training had provided a stimulus-general improvement in the trained task, and that this improvement mapped onto greater frontopolar and dorsolateral prefrontal cortex activity. This result was therefore quite similar to that reported in healthy adults by Olesen (Olesen et al., 2004). I don't think I'll press for working memory training courses in my local school yet, but Owen won't be the reason.
Olesen PJ, Westerberg H, Klingberg T. Increased prefrontal and parietal activity after training of working memory. Nat Neurosci. 2004;7(1):75-9. Epub 2003 Dec 14. Abstract
Haut KM, Lim KO, MacDonald AW III. Prefrontal cortical changes following cognitive training in patients with chronic schizophrenia: effects of practice, generalization and specificity. Neuropsychopharmacology. 2010 Apr 28. Abstract
View all comments by Angus MacDonald
Comments on Related News
Related News: Training Study Questions Fixed Nature of Fluid Intelligence

Comment by: Andrei Szoke
Submitted 7 May 2008
Posted 7 May 2008
The authors suggest that they have found what could be considered the Holy Grail of cognitive research—a means to enhance intelligence. The article offers some hope, as results improved on a task considered to measure fluid intelligence even though the subjects were not trained on that specific task. The “dual n-back” training task, although not a pure working memory task (as the authors acknowledge), is a very interesting experimental paradigm. Unfortunately, the authors fail to convince us of its usefulness in enhancing “fluid intelligence.” When a drug is tested, any effect, to be convincingly supported, must be demonstrated in a double-blind, randomized trial controlled with a placebo or a standard treatment. The same should be true for any means, pharmacological or otherwise, aimed at enhancing cognition.
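For readers unfamiliar with the paradigm: in dual n-back, a visual stream and an auditory stream run simultaneously, and the subject signals whenever the current item in either stream matches the item presented n trials earlier. A hypothetical Python sketch of the target-detection and scoring logic (the data format and function names are invented for illustration):

```python
def nback_targets(stream, n):
    """Indices where the current item matches the item n steps back."""
    return [i for i in range(n, len(stream)) if stream[i] == stream[i - n]]

def score_dual_nback(visual, audio, n, responses):
    """Score one dual n-back block.
    `responses` maps trial index -> set of channels the subject flagged,
    e.g. {2: {"visual"}, 3: {"visual", "audio"}} (hypothetical format).
    Returns (hits, false_alarms)."""
    targets = {("visual", i) for i in nback_targets(visual, n)}
    targets |= {("audio", i) for i in nback_targets(audio, n)}
    hits = false_alarms = 0
    for i, channels in responses.items():
        for ch in channels:
            if (ch, i) in targets:
                hits += 1
            else:
                false_alarms += 1
    return hits, false_alarms

# Hypothetical 2-back block: spatial positions (visual) and letters (audio)
visual = [1, 3, 1, 3, 2, 3]
audio = ["A", "B", "C", "B", "A", "D"]
print(nback_targets(visual, 2))  # prints [2, 3, 5]
print(nback_targets(audio, 2))   # prints [3]
```

Training programs typically raise n as performance improves, which is what makes the task adaptive rather than a fixed drill.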
As for the issue of whether this training will have the same effects in schizophrenic subjects as it had in these normal, motivated controls, that is an entirely different question that is not addressed in the article. I think that future studies have to address all those limitations (randomization of subjects, a similar amount of training with a different task in controls, a double-blind design) before any firm conclusions could be drawn.
View all comments by Andrei Szoke