Crowd-sourcing the tracking and interpretation of replication evidence.

Published scientific findings can only be considered trustworthy -- for theory and applications (e.g., health interventions) -- once independent researchers have successfully replicated and generalized them. No database, however, currently tracks and meta-analytically summarizes independent direct replications to gauge the replicability and generalizability of social science findings over time. Curate Science is a crowd-sourced effort to do just that, accelerating the development of trustworthy knowledge that can soundly inform theory and effective public policy to improve human welfare (see the About section for more details).

Update (October 21, 2016): We will soon be releasing a new framework for the curation of replication evidence of social science findings (version 3.0.0). Details of our previous approaches can be found here (version 2.0.4) and here (version 1.0.5).

Large-Scale Meta-Science
  • Reproducibility Project [100 replications]
  • Social Psych Special Issue [31 replications]
  • Many Labs 1 [12 effects x 36 labs = 432 replications]
  • Many Labs 2 [26 effects, N = ~15,000]
  • Many Labs 3 [10 effects x 21 labs = 210 replications]
  • RRR1 & RRR2: Verbal overshadowing [23 replications]
  • RRR3: Hart & Albarracín (2011) [13 replications]
  • RRR4: Ego depletion [24 replications]
  • RRR5: Facial feedback hypothesis
  • RRR6: Commitment priming on forgiveness [20 replications]
Applied Topics
  • Brain training [10 replications]
  • Effects of violent video games [4 replications]
  • Cognitive benefits of bilingualism [6 replications]
  • Stereotype Threat [5 replications]
  • Reducing prejudice via imagined contact [4 replications]
  • Benefits of growth mindset [4 replications]
  • Reading fiction boosts empathy [3 replications]
  • Mozart effect [3 replications]
  • Subliminal advertising [3 replications]
  • Weather on happiness [3 replications]
Basic Research Areas
Social Priming
  • Elderly priming [4 replications]
  • Intelligence priming [8 replications]
  • Money priming [42 replications]
  • Cleanliness priming [7 replications]
  • Achievement/goal priming [3 replications]
  • Religious priming [4 replications]
  • Color priming [8 replications]
  • Honesty priming [3 replications]
  • Heat priming [2 replications]
  • Distance priming [5 replications]
  • Mate priming [8 replications]
  • US flag priming [36 replications]
Social Embodiment
  • Macbeth effect [11 replications]
  • Power posing [3 replications]
  • Embodiment of weight [6 replications]
  • Embodiment of secrets [5 replications]
  • Embodiment of warmth
    • Bargh & Shalev (2012) [14 replications]
    • Williams & Bargh (2006) [3 replications]
    • Vess (2012) [2 replications]
Classic social psychology
  • Mood on helping [3 replications]
  • Argument strength x NFC effect (ELM) [21 replications]
Evolutionary Psychology
  • Ovulation on voting [2 replications]
  • Ovulation on mate preferences [3 replications]
  • Ovulation on clothing choice [2 replications]
  • Color on physical attraction [5 replications]
Attitudes & Stereotypes
  • Race-erased effect [2 replications]
  • 1/f noise in racial bias [2 replications]
  • Subliminal approach/avoidance effects [3 replications]
  • Status legitimacy effect [3 replications]
  • SES on unethical behaviors [9 replications]
Judgment & Decision-making
  • Unconscious decision-making [12 replications]
  • Framing effects [36 replications]
  • Anchoring effects [36 replications]
  • Protection effect [3 replications]
  • Incidental values on time judgments [8 replications]

Social Priming

Schnall, Benton, & Harvey (2008a) -- Replications (7)   
With a Clean Conscience: Cleanliness Reduces the Severity of Moral Judgments

[Original Abstract]

Theories of moral judgment have long emphasized reasoning and conscious thought while downplaying the role of intuitive and contextual influences. However, recent research has demonstrated that incidental feelings of disgust can influence moral judgments and make them more severe. This study involved two experiments demonstrating that the reverse effect can occur when the notion of physical purity is made salient, thus making moral judgments less severe. After having the cognitive concept of cleanliness activated (Experiment 1) or after physically cleansing themselves after experiencing disgust (Experiment 2), participants found certain moral actions to be less wrong than did participants who had not been exposed to a cleanliness manipulation. The findings support the idea that moral judgment can be driven by intuitive processes, rather than deliberate reasoning. One of those intuitions appears to be physical purity, because it has a strong connection to moral purity.
Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size (d) [95% CI]
Schnall et al. (2008a) Study 1 Study_1.sav 40
Johnson et al. (2014a) Study 1 Exp1_Data.sav OSF folder 208
Johnson et al. (2014b) Online_Rep.sav OSF folder 736
Lee et al. (2013) lee_data.csv 90
Arbesfeld et al. (2014) 60
Besman et al. (2013) 60
Huang (2014) Study 1 study1.sav 189
Current meta-analytic estimate of replications of SBH's Study 1 (random-effects):
Schnall et al. (2008a) Study 2 Study_2.sav 43
Johnson et al. (2014a) Study 2 Exp2_Data.sav OSF folder 126
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary: The main finding that cleanliness priming reduces the severity of moral judgments does not (yet) appear to be replicable (overall meta-analytic effect: r = -.08 [+/-.13]). In a follow-up commentary, Schnall argued that a ceiling effect in Johnson et al.'s (2014a) studies renders their results uninterpretable and that their replication results should hence be dismissed. However, independent re-analyses by Simonsohn, Yarkoni, Schönbrodt, Inbar, Fraley, and Simkovic appear to rule out such a ceiling-effect explanation; hence, Johnson et al.'s (2014a) results should be retained in gauging the replicability of the original cleanliness priming effect. Of course, it is possible that "cleanliness priming" may be replicable under different operationalizations, conditions, and/or experimental designs (e.g., within-subjects). Indeed, Huang (2014) has reported new evidence suggesting cleanliness priming may reduce the severity of moral judgments only under conditions of "low response effort"; however, that research appears to be underpowered (<50% power) to detect the small interaction effect found (r = .12). Regardless, independent corroboration of Huang's interaction effect is required before confidence is placed in such a moderated cleanliness priming effect.
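The pooled estimates quoted in these summaries come from random-effects meta-analysis. For readers unfamiliar with the procedure, here is a minimal sketch of a DerSimonian-Laird random-effects pooled estimate; the effect sizes and variances in the usage example are hypothetical, not values from the table above, and the site's actual R code may differ in detail.

```python
import math

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate.

    effects: per-study effect sizes (e.g., d or r values)
    variances: per-study sampling variances (needs >= 2 studies)
    Returns (pooled effect, 95% CI half-width, tau^2).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    # Cochran's Q heterogeneity statistic and the DL estimate of
    # between-study variance tau^2 (truncated at zero)
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # re-weight each study by 1 / (within-study + between-study variance)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_star, effects)) / sum(w_star)
    half_width = 1.96 * math.sqrt(1.0 / sum(w_star))
    return pooled, half_width, tau2
```

For example, two hypothetical studies with effects 0.0 and 0.4 (sampling variance .04 each) pool to 0.2, with a wider interval than a fixed-effect analysis would give because the disagreement between the studies is absorbed into tau².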

Original authors' and replicators' comments: F. Cheung mentioned that a note should be added that the data for the Besman et al. (2013) replication have been lost (communicated to him by K. Daubman, who has not yet responded to my request for links to the original data of both her Arbesfeld et al. and Besman et al. replications). M. Frank mentioned we should consider including some of Huang's (2014) studies (baseline un-moderated conditions only), which led us to add Huang's Study 1 (the only study with a baseline condition comparable to Schnall et al.'s Study 1 design). S. Schnall has yet to respond (email sent March 11, 2016).

Related Commentary

Money priming -- Replications (42)  
Vohs, Mead, & Goode (2006) 
The psychological consequences of money
Caruso, Vohs, Baxter, & Waytz (2013) 
Mere exposure to money increases endorsement of free-market systems and social inequality

Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size (d) [95% CI]
Vohs et al. (2006) Study 3 39
Grenier et al. (2012) Report appendix 40
Caruso et al. (2013) Study 2 168
Rohrer et al. (2015) Study 2 Article appendix 420
Schuler & Wänke (in press) Study 2 115
Current meta-analytic estimate of replications of CVBW's Study 2 (random-effects):
Caruso et al. (2013) Study 3 80
Rohrer et al. (2015) Study 3 Article appendix 156
Caruso et al. (2013) Study 4 48
Rohrer et al. (2015) Study 4 Article appendix 116
Caruso et al. (2013) Study 1 30
Rohrer et al. (2015) Study 1 Article appendix 136
Morris (2014) ML1.Cleaned.sav ML1.protocol.pdf 98
Schmidt & Nosek (2014) ML1.Cleaned.sav ML1.protocol.pdf 81
Woodzicka (2014) ML1.Cleaned.sav ML1.protocol.pdf 90
Nier (2014) ML1.Cleaned.sav ML1.protocol.pdf 95
Bocian & Frankowska (2014) Study 1 ML1.Cleaned.sav ML1.protocol.pdf 79
Brandt et al. (2014) ML1.Cleaned.sav ML1.protocol.pdf 80
Furrow & Thompson (2014) ML1.Cleaned.sav ML1.protocol.pdf 85
Pilati (2014) ML1.Cleaned.sav ML1.protocol.pdf 120
Davis & Hicks (2014) Study 2 ML1.Cleaned.sav ML1.protocol.pdf 225
Wichman (2014) ML1.Cleaned.sav ML1.protocol.pdf 103
Brumbaugh & Storbeck (2014) Study 2 ML1.Cleaned.sav ML1.protocol.pdf 86
Kurtz (2014) ML1.Cleaned.sav ML1.protocol.pdf 174
Smith (2014) ML1.Cleaned.sav ML1.protocol.pdf 107
Levitan (2014) ML1.Cleaned.sav ML1.protocol.pdf 123
Brumbaugh & Storbeck (2014) Study 1 ML1.Cleaned.sav ML1.protocol.pdf 103
Adams & Nelson (2014) ML1.Cleaned.sav ML1.protocol.pdf 95
Rutchick (2014) ML1.Cleaned.sav ML1.protocol.pdf 96
Vaughn (2014) ML1.Cleaned.sav ML1.protocol.pdf 90
Bernstein (2014) ML1.Cleaned.sav ML1.protocol.pdf 84
Schmidt & Nosek (2014, PI) ML1.Cleaned.sav ML1.protocol.pdf 1329
Vianello & Galliani (2014) ML1.Cleaned.sav ML1.protocol.pdf 144
Hovermale & Joy-Gaba (2014) ML1.Cleaned.sav ML1.protocol.pdf 108
Schmidt & Nosek (2014, MTURK) ML1.Cleaned.sav ML1.protocol.pdf 1000
Huntsinger & Mallett (2014) ML1.Cleaned.sav ML1.protocol.pdf 146
Bocian & Frankowska (2014) Study 2 ML1.Cleaned.sav ML1.protocol.pdf 169
Cemalcilar (2014) ML1.Cleaned.sav ML1.protocol.pdf 113
Packard (2014) ML1.Cleaned.sav ML1.protocol.pdf 112
Vranka (2014) ML1.Cleaned.sav ML1.protocol.pdf 84
Klein et al. (2014) ML1.Cleaned.sav ML1.protocol.pdf 127
Davis & Hicks (2014) Study 1 ML1.Cleaned.sav ML1.protocol.pdf 187
Kappes (2014) ML1.Cleaned.sav ML1.protocol.pdf 277
John & Skorinko (2014) ML1.Cleaned.sav ML1.protocol.pdf 87
Swol (2014) ML1.Cleaned.sav ML1.protocol.pdf 96
Devos (2014) ML1.Cleaned.sav ML1.protocol.pdf 162
Cheong (2014) ML1.Cleaned.sav ML1.protocol.pdf 102
Hunt & Krueger (2014) ML1.Cleaned.sav ML1.protocol.pdf 87
Current meta-analytic estimate of replications of CVBW's Study 1 (random-effects):
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary: The claim that incidental exposure to money influences social behavior and beliefs does not (yet) appear to be replicable (overall meta-analytic effect: d = -.01 [+/-.05]). This appears to be the case whether money exposure is manipulated via instruction background images (Caruso et al., 2013, Studies 1 & 4) or a sentence-descrambling task (Vohs et al., 2006, Study 3), and whether the outcome variable is helping others (Vohs et al., 2006, Study 3), system justification beliefs (Caruso et al., 2013, Study 1), just-world beliefs (Caruso et al., 2013, Study 2), social dominance beliefs (Caruso et al., 2013, Study 3), or fair-market beliefs (Caruso et al., 2013, Study 4). Of course, it is possible that money exposure reliably influences behavior under other (currently unknown) conditions, via other operationalizations, and/or using other experimental designs (e.g., within-subjects).

Original authors' comments: K. Vohs responded and mentioned that Schuler & Wänke's (in press) replication of Caruso et al. (2013) was missing; this led us to add Schuler & Wänke (in press) Study 2 (main effect) as a direct replication of Caruso et al. (2013) Study 2. Vohs pointed out several design differences between Grenier et al. (2012) and Vohs et al.'s (2006) original Study 3, but these deviations are minor (e.g., different priming stimuli, different help target); given that Grenier et al. (2012) used the same general methodology as Vohs et al. (2006) Study 3 for the independent variable (unscrambling priming task) and dependent variable (offering help to code data sheets), the study satisfies the eligibility criteria for a sufficiently similar direct replication according to Curate Science's taxonomy and hence was retained. Vohs also pointed out design differences between Tate (2009) and Vohs et al. (2006) Study 3; given that Tate (2009) employed a different general methodology for the IV (background image on a poster instead of an unscrambling task), the study does *not* satisfy the eligibility criteria for a direct replication and hence was excluded. Finally, Vohs mentioned that "replication studies" for Vohs et al. (2006) are reported in Vohs (2015); however, none of these studies were sufficiently similar methodologically to meet the direct replication eligibility criteria and hence were not added.

Related Commentary

Social Embodiment

Zhong & Liljenquist (2006) -- Replications (11)   
Washing away your sins: Threatened morality and physical cleansing

[Original Abstract]

Physical cleansing has been a focal element in religious ceremonies for thousands of years. The prevalence of this practice suggests a psychological association between bodily purity and moral purity. In three studies, we explored what we call the "Macbeth effect", that is, a threat to one's moral purity induces the need to cleanse oneself. This effect revealed itself through an increased mental accessibility of cleansing-related concepts, a greater desire for cleansing products, and a greater likelihood of taking antiseptic wipes. Furthermore, we showed that physical cleansing alleviates the upsetting consequences of unethical behavior and reduces threats to one's moral self-image. Daily hygiene routines such as washing hands, as simple and benign as they might seem, can deliver a powerful antidote to threatened morality, enabling people to truly wash away their sins.
Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size (r) [95% CI]
Zhong & Liljenquist (2006) Study 2 27
Gamez et al. (2011) Study 2 36
Earp et al. (2014) Study 1 Study-1.csv 153
Earp et al. (2014) Study 2 Study-2-USA.csv 156
Earp et al. (2014) Study 3 Study-3-India.csv 286
Siev (2012) Study 1 335
Siev (2012) Study 2 Spring08protocol.pdf 148
Current meta-analytic estimate of replications of Z&L's Study 2 (random-effects):
Zhong & Liljenquist (2006) Study 3 32
Gamez et al. (2011) Study 3 45
Fayard et al. (2009) Study 1 Study-1.sav 210
Current meta-analytic estimate of replications of Z&L's Study 3 (random-effects):
Zhong & Liljenquist (2006) Study 4 45
Gamez et al. (2011) Study 4 28
Fayard et al. (2009) Study 2 115
Reuven et al. (2013) washing-sins.sav 29
Current meta-analytic estimate of replications of Z&L's Study 4 (random-effects):
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary: The main finding that a threat to one's moral purity induces the need to cleanse oneself (the "Macbeth effect") does not (yet) appear to be replicable (overall meta-analytic effect: r = -.02 [+/-.05]). This appears to be the case whether moral purity threat is manipulated via recalling an unethical vs. ethical deed (Studies 3 and 4) or transcribing text describing an unethical vs. ethical act (Study 2), and whether the need to cleanse oneself is measured via the desirability of cleansing products (Study 2), product choice (Study 3), or reduced volunteerism after cleansing (Study 4). Of course, it is possible the "Macbeth effect" is replicable under different operationalizations and/or experimental designs (e.g., within-subjects).

Original authors' comments: We shared a draft of the curated set of replications with both original authors, and invited them to provide feedback. Chenbo Zhong replied thanking us for the notice and mentioned two published articles that should potentially be considered (i.e., Denke et al., 2014; Reuven et al., 2013). Reuven et al. do indeed report a sufficiently close replication (in their non-OCD control group) of Zhong & Liljenquist's Study 4 and hence the control group replication was added (though we're currently clarifying an issue with their reported t-value).

Related Commentary

Bargh & Shalev (2012) -- Replications (14)   
The Substitutability of Physical and Social Warmth in Daily Life

[Original Abstract]

Classic and contemporary research on person perception has demonstrated the paramount importance of interpersonal warmth. Recent research on embodied cognition has shown these feelings of social warmth or coldness can be induced by experiences of physical warmth or coldness, and vice versa. Here we show that people tend to self-regulate their feelings of social warmth through applications of physical warmth, apparently without explicit awareness of doing so. In Study 1, higher scores on a measure of chronic loneliness (social coldness) were associated with an increased tendency to take warm baths or showers. In Study 2, a physical coldness manipulation significantly increased feelings of loneliness. In Study 3, needs for social affiliation and for emotion regulation, triggered by recall of a past rejection experience, were subsequently eliminated by an interpolated physical warmth experience. Study 4 provided evidence that people are not explicitly aware of the relation between physical and social warmth (coldness), as they do not consider a target person who often bathes to be any lonelier than one who does not, all else being equal. Together, these findings suggest that physical and social warmth are to some extent substitutable in daily life and that this substitution reflects an unconscious self-regulatory mechanism.
Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size (r) [95% CI]
Bargh & Shalev (2012) Study 1a 51
Bargh & Shalev (2012) Study 1b 41
Donnellan et al. (2015a) Study 3 data(studies1-4).csv materials(studies1-4).pdf 210
Donnellan et al. (2015a) Study 5 data(studies5-9).csv materials(studies5-9).pdf 494
Donnellan et al. (2015a) Study 6 data(studies5-9).csv materials(studies5-9).pdf 553
Donnellan & Lucas (2014) 531
Donnellan et al. (2015a) Study 7 data(studies5-9).csv materials(studies5-9).pdf 311
Donnellan et al. (2015a) Study 8 data(studies5-9).csv materials(studies5-9).pdf 365
Donnellan et al. (2015a) Study 2 data(studies1-4).csv materials(studies1-4).pdf 480
McDonald & Donnellan (2015) 356
Ferrell et al. (2013) 365
Donnellan et al.(2015b) 291
Donnellan et al. (2015a) Study 1 data(studies1-4).csv materials(studies1-4).pdf 235
Donnellan et al. (2015a) Study 4 data(studies1-4).csv materials(studies1-4).pdf 228
Donnellan et al. (2015a) Study 9 data(studies5-9).csv materials(studies5-9).pdf 197
Current meta-analytic estimate of replications of B&S' Study 1 (random-effects):
Bargh & Shalev (2012) Study 2 75
Wortman et al. (2014) Appendix 260
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary: The notion that physical warmth influences psychological social warmth does not appear to be well supported by the independent replication evidence (overall meta-analytic effect: r = .007 [+/-.035]), at least via Bargh and Shalev's (2012) Study 1 and 2 operational tests (Study 1: trait loneliness is positively associated with warmer bathing; Study 2: briefly holding a frozen cold-pack boosts reported feelings of chronic loneliness). Regarding the first operational test, the loneliness-shower effect does not appear replicable whether (1) trait loneliness is measured using the complete 20-item UCLA Loneliness Scale (Donnellan et al., 2015 Studies 1-4) or a 10-item modified version of the UCLA Loneliness Scale (Donnellan et al., 2015 Studies 5-9, as in Bargh & Shalev, 2012 Studies 1a and 1b), (2) warm bathing is measured via a "physical warmth index" (all replications, as in Bargh & Shalev, 2012 Studies 1a and 1b) or via the arguably more hypothesis-relevant water temperature item (all replications of Bargh & Shalev Study 1), and (3) participants were sampled from Michigan (Donnellan et al., 2015 Studies 1-9), Texas (Ferrell et al., 2013), or Israel (McDonald & Donnellan, 2015). Of course, different operationalizations of the idea may yield replicable evidence, e.g., in different domains, contexts, or using other experimental designs (e.g., within-subjects). In a response, Shalev & Bargh (2015) point out design differences in Donnellan et al.'s (2015) replications that could have led to discrepant results (e.g., participant awareness not probed) and report three additional studies yielding small positive correlations between loneliness and new bathing and showering items (measured separately; r = .09 [+/-.09, N=491] and r = .14 [+/-.08, N=552]). These new findings, however, await independent corroboration (the additional studies were not included in the meta-analysis because they were not executed by independent researchers; see FAQ for more details).
In a rejoinder, Donnellan et al. (2015b) report an additional study that (1) probed participant awareness and found the effect size unaltered by excluding participants suspected of study awareness (r = -.04, N=291 vs. r = -.05, N=323 total sample) and (2) found no evidence that individual differences in attachment style moderated the loneliness-showering link.

Original authors' comments: I. Shalev responded, stating that they have already publicly responded to these replications, that their response reports three additional studies, and that readers should be referred to that article (Shalev & Bargh, 2015). B. Donnellan responded stating that several open questions remain, including (1) unexplained anomalies in Bargh & Shalev's (2012) Study 1a data (i.e., 46 of the 51 participants (90%) reported taking less than one shower or bath per week) and (2) concerns regarding unclear exclusion criteria for Shalev & Bargh's (2015) new studies. Donnellan further stated that he is unconvinced by Shalev & Bargh's reply and that replication attempts by multiple independent labs would be the most constructive step forward.

Related Commentary


Strength Model of Self-Control -- Replications (32)  
Muraven, Tice, & Baumeister (1998) 
Self-control as limited resource: Regulatory depletion patterns
Baumeister, Bratslavsky, Muraven, & Tice (1998) 
Ego depletion: Is the active self a limited resource?

Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size (d) [95% CI]
Prediction 1: Glucose consumption counteracts ego depletion
Gailliot, Baumeister et al. (2007) Study 7 61
Cesario & Corker (2010) 119
Wang & Dvorak (2010) 61
Lange & Eggert (2014) Study 1 70
Current meta-analytic estimate of Prediction 1 replications (random-effects):
Prediction 2: Self-control impairs further self-control (ego depletion)
Muraven, Tice et al. (1998) Study 2 34
Murtagh & Todd (2004) Study 2 51
Schmeichel, Vohs et al. (2003) Study 1 24
Pond et al. (2011) Study 3 128
Schmeichel (2007) Study 1 79
Healy et al. (2011) Study 1 38
Carter & McCullough (2013) 138
Lurquin et al. (2016) data.s002.XLSX pre-reg.docx 200
Inzlicht & Gutsell (2007) 33
Wang, Yang, & Wang (2014) 31
Sripada, Kessler, & Jonides (2014) 47
Ringos & Carlucci (2016) protocol.pdf 68
Wolff, Muzzi & Brand (2016) protocol.pdf 87
Calvillo & Mills (2016) protocol.pdf 75
Crowell, Finley et al. (2016) protocol.pdf 73
Lynch, vanDellen et al. (2016) protocol.pdf 79
Birt & Muise (2016) protocol.pdf 59
Yusainy, Wimbarti et al. (2016) protocol.pdf 156
Lau & Brewer (2016) protocol.pdf 99
Ullrich, Primoceri et al. (2016) protocol.pdf 103
Elson (2016) protocol.pdf 90
Cheung, Kroese et al. (2016) protocol.pdf 181
Hagger et al. (2016) protocol.pdf 101
Schlinkert, Schrama et al. (2016) protocol.pdf 79
Philipp & Cannon (2016) protocol.pdf 75
Carruth & Miyake (2016) protocol.pdf 126
Brandt (2016) protocol.pdf 102
Stamos, Bruyneel et al. (2016) protocol.pdf 93
Rentzsch, Nalis et al. (2016) protocol.pdf 103
Francis & Inzlicht (2016) protocol.pdf 50
Lange, Heise et al. (2016) protocol.pdf 106
Evans, Fay, & Mosser (2016) protocol.pdf 89
Tinghög & Koppel (2016) protocol.pdf 82
Otgaar, Martijn et al. (2016) protocol.pdf 69
Muller, Zerhouni et al. (2016) protocol.pdf 78
Current meta-analytic estimate of Prediction 2 replications (random-effects):
[Underlying data (CSV)] [R-code]
Original Studies & Replications Independent Variables Dependent Variables Design Differences Active Sample Evidence
Prediction 1: Glucose consumption counteracts ego depletion
Gailliot, Baumeister et al. (2007) Study 7 sugar vs. splenda
video attention task vs. control
Stroop performance -
Cesario & Corker (2010) sugar vs. splenda
video attention task vs. control
Stroop performance No manipulation check Positive correlation between baseline & post-manipulation error rates, r = .36, p < .001
Wang & Dvorak (2010) sugar vs. splenda
future-discounting t1 vs. t2
future-discounting task -
Lange & Eggert (2014) Study 1 sugar vs. splenda
future-discounting t1 vs. t2
future-discounting task different choices in future-discounting task test-retest reliability of r = .80 across t1 and t2 scores
Prediction 2: Self-control impairs further self-control (ego depletion)
Muraven, Tice et al. (1998) Study 2 thought suppression vs. control anagram performance -
Murtagh & Todd (2004) Study 2 thought suppression vs. control anagram performance very difficult solvable anagrams used rather than "unsolvable"
Schmeichel, Vohs et al. (2003) Study 1 video attention task vs. control GRE standardized test -
Pond et al. (2011) Study 3 video attention task vs. control GRE standardized test 10 verbal GRE items used (instead of 13 analytic GRE items)
Schmeichel (2007) Study 1 video attention task vs. control working memory (OSPAN) -
Healy et al. (2011) Study 1 video attention task vs. control working memory (OSPAN) % of target words recalled (rather than total)
Carter & McCullough (2013) video attention task vs. control working memory (OSPAN) Effortful essay task vs. control in between IV and DV (perfectly confounded w/ IV)
Lurquin et al. (2016) video attention task vs. control working memory (OSPAN) 40 target words in OSPAN (rather than 48) Main effect of OSPAN set sizes on performance, F(1, 199) = 4439.81, p < .001
Inzlicht & Gutsell (2007) emotion suppression (video) vs. control EEG ERN during stroop task -
Wang, Yang, & Wang (2014) emotion suppression (video) vs. control EEG ERN during stroop task
Sripada, Kessler, & Jonides (2014) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) -
Ringos & Carlucci (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Wolff, Muzzi & Brand (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Calvillo & Mills (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Crowell, Finley et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lynch, VanDellen et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Birt & Muise (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Yusainy, Wimbarti et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Indonesian language
Lau & Brewer (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Ullrich, Primoceri et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Elson (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Cheung, Kroese et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Hagger et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Schlinkert, Schrama et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Philipp & Cannon (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Carruth & Miyake (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Brandt (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Stamos, Bruyneel et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Rentzsch, Nalis et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Francis & Inzlicht (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lange, Heise et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Evans, Fay & Mosser (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Tinghög & Koppel (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Otgaar, Martijn et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Muller, Zerhouni et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) French language
[Underlying data (CSV)] [R-code]

Summary: There appear to be replication difficulties across 6 different operationalizations of the original studies supporting the two main predictions of the strength model of self-control (Baumeister et al., 2007). Prediction 1: Independent researchers appear unable to replicate the finding that glucose consumption counteracts ego depletion, whether self-control is measured via the Stroop task (Cesario & Corker, 2010, as in Gailliot et al., 2007, Study 7) or a future-discounting task (Lange & Eggert, 2014, Study 1, as in Wang & Dvorak, 2010). Prediction 2: There also appear to be replication difficulties (across 4 distinct operationalizations) for the basic ego depletion effect. This is the case whether the IV is manipulated via thought suppression, a video attention task, emotion suppression during video watching, or an effortful letter-crossing task, and whether the DV is measured via anagram performance, standardized tests, working memory, or a multi-source interference task. Wang et al. (2014) do appear to successfully replicate Inzlicht & Gutsell's (2007) finding that ego depletion led to reduced activity in the anterior cingulate (a region previously associated with conflict monitoring); however, this finding should be interpreted with caution given potential bias due to analytic flexibility in data exclusions and EEG analyses. Of course, ego depletion may reflect a replicable phenomenon under different conditions, contexts, and/or operationalizations; however, the replication difficulties across 6 different operationalizations suggest ego depletion might be much more nuanced than previously thought. Indeed, alternative models have recently been proposed (e.g., motivation/attention-based accounts, Inzlicht et al., 2014; mental fatigue, Inzlicht & Berkman, 2015) and novel intra-individual paradigms to measure ego depletion have also emerged (Francis, 2014; Francis et al., 2015) that offer promising avenues for future research.

Original authors' and replicators' comments: B. Schmeichel pointed out a missing replication (Healy et al., 2011, Study 1) of Schmeichel (2007, Study 1); we have added the study, though we are currently clarifying with K. Healey a potential issue with their reported effect size. F. Lange mentioned that the effect sizes for the RRR ego depletion replications seemed off (also pointed out by B. Schmeichel); indeed, we inadvertently sourced the effect sizes from an RRR dataset that included all exclusions (these have now been corrected and match the values reported in Figure 1 of the Sripada et al. RRR article). M. Inzlicht responded that he is currently developing a pre-registered study of the basic ego depletion effect using a much longer initial depletion task, adapted to be effortful for everyone, in a more powerful pre-post mixed design. R. Dvorak stated that their study was not a replication of ego depletion; we clarified that the Wang & Dvorak (2010) study is used as an original study whose finding is consistent with the glucose claim of Baumeister et al.'s (2007) strength model. J. Lurquin mentioned that their effect size was d = 0.22 (not d = 0.21); .21 is actually correct, however, given that we apply Hedges' g bias correction (though we still label it d because of its greater familiarity to researchers).
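The d-vs-g point above is a small-sample bias issue: Cohen's d slightly overestimates the population effect in small samples, and Hedges' g multiplies d by a correction factor J slightly below 1. A minimal sketch using the common approximation J = 1 - 3/(4*df - 1) (the group sizes in the usage example are hypothetical, not Lurquin et al.'s actual cell sizes):

```python
def hedges_g(d, n1, n2):
    """Convert Cohen's d to Hedges' g via the small-sample bias
    correction J = 1 - 3 / (4*df - 1), where df = n1 + n2 - 2
    for a two-group design. J is slightly below 1, so |g| < |d|."""
    df = n1 + n2 - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    return j * d
```

For two groups of 20, J = 1 - 3/151 ≈ 0.980, so d = 0.50 shrinks to g ≈ 0.49; with samples as large as Lurquin et al.'s (total N = 200) the correction is tiny but can still nudge a rounded second decimal, as in the 0.22 vs. 0.21 discrepancy described above.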

Related Commentary

Classic Social Psychology

Mood on Helping -- Replications (3)  
Isen & Levin (1972) 
Effect of feeling good on helping: Cookies and kindness
Levin & Isen (1975) 
Further studies on the effect of feeling good on helping

Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size (Risk Difference) [95% CI]
Isen & Levin (1972) Study 2 41
Blevins & Murphy (1974) 50
Levin & Isen (1975) Study 1 24
Weyant & Clark (1977) Study 1 32
Weyant & Clark (1977) Study 2 106
Current meta-analytic estimate of L&I Study 1 replications (random-effects):
Current meta-analytic estimate of all replications (random-effects):
[Underlying data & R-code]

Summary: The finding that positive mood boosts helping appears to have replicability problems. Across three replications, individuals presumably in a positive mood (induced via finding a dime in a telephone booth) helped at about the same rate (29.6%) as those not finding a dime (29.8%; meta-analytic risk difference estimate = .03 [+/-.19]); in the original studies, 88.8% of dime-finding Ps helped compared to 13.9% of Ps in the control condition. This was the case whether helping was measured via picking up dropped papers (Blevins & Murphy, 1974, as in Isen & Levin, 1972, Study 2) or via mailing a "forgotten letter" (Weyant & Clark, 1977, Studies 1 & 2, as in Levin & Isen, 1975, Study 1). These negative replication results are insufficient to declare the mood-helping link unreplicable; however, they do warrant concern that additional unmodeled factors may be at play. For instance, it seems plausible that mood may influence helping in different ways for different individuals (e.g., negative, rather than positive, mood may boost helping in some individuals), and may also influence the same person differently on different occasions. Highly-repeated within-person (HRWP) designs (e.g., Whitsett & Shoda, 2014) would be a fruitful avenue to empirically investigate these more plausible links between mood and helping behavior.
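Pooled estimates like the .03 [+/-.19] above come from an inverse-variance random-effects model. As a rough sketch of the computation (assuming the common DerSimonian-Laird estimator; the linked R code may use a different estimator, and the function name here is ours):

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis via the DerSimonian-Laird estimator.

    effects:   per-study effect estimates (e.g., risk differences)
    variances: per-study sampling variances
    Returns (pooled_estimate, 95% CI half-width, tau2).
    """
    # Fixed-effect (inverse-variance) weights and estimate
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q heterogeneity statistic
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance (truncated at 0)
    # Random-effects weights incorporate tau2
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    half_ci = 1.96 * math.sqrt(1 / sum(w_star))
    return pooled, half_ci, tau2
```

When the studies agree perfectly, tau2 is zero and the random-effects estimate reduces to the fixed-effect one; heterogeneous results inflate tau2 and widen the confidence interval.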

Original authors' comments: Report your research and results thoroughly; you may no longer be around when future researchers interpret replication results of your work!

Registered Replication Reports (RRR) @PoPS

RRR1 & RRR2: Alogna et al., ..., Zwaan (2014)  
Schooler & Engstler-Schooler (1990) -- Replications (23)   
Verbal overshadowing of visual memories: Some things are better left unsaid

[Original Abstract]

It is widely believed that verbal processing generally improves memory performance. However, in a series of six experiments, verbalizing the appearance of previously seen visual stimuli impaired subsequent recognition performance. In Experiment 1, subjects viewed a videotape including a salient individual. Later, some subjects described the individual's face. Subjects who verbalized the face performed less well on a subsequent recognition test than control subjects who did not engage in memory verbalization. The results of Experiment 2 replicated those of Experiment 1 and further clarified the effect of memory verbalization by demonstrating that visualization does not impair face recognition. In Experiments 3 and 4 we explored the hypothesis that memory verbalization impairs memory for stimuli that are difficult to put into words. In Experiment 3 memory impairment followed the verbalization of a different visual stimulus: color. In Experiment 4 marginal memory improvement followed the verbalization of a verbal stimulus: a brief spoken statement. In Experiments 5 and 6 the source of verbally induced memory impairment was explored. The results of Experiment 5 suggested that the impairment does not reflect a temporary verbal set, but rather indicates relatively long-lasting memory interference. Finally, Experiment 6 demonstrated that limiting subjects' time to make recognition decisions alleviates the impairment, suggesting that memory verbalization overshadows but does not eradicate the original visual memory. This collection of results is consistent with a recoding interference hypothesis: verbalizing a visual memory may produce a verbally biased memory representation that can interfere with the application of the original visual memory.
Original Studies & Replications Data/Syntax Materials/Pre-reg N Effect size [95% CI]
Schooler & Engstler-Schooler (1990) Study 1 approved_protocol.pdf* 88
Michael et al. (2014, ONLINE MTURK) Study2Protocol.xlsx approved_protocol.pdf* 615
Alogna et al. (2014) lab_data_2.xlsx approved_protocol.pdf* 137
Birch (2014) BirchLab.xlsx approved_protocol.pdf* 156
Birt & Aucoin (2014) Study2.xlsx approved_protocol.pdf* 65
Brandimonte (2014) data2.xlsx approved_protocol.pdf* 100
Carlson et al. (2014) approved_protocol.pdf* 160
Dellapaolera & Bornstein (2014) approved_protocol.pdf* 164
Delvenne et al. (2014) approved_protocol.pdf* 98
Echterhoff & Kopietz (2014) DataStudy2.xls approved_protocol.pdf* 124
Eggleston et al. (2014) Study_2.xlsx approved_protocol.pdf* 93
Greenberg et al. (2014) exp2-data.xls approved_protocol.pdf* 75
Kehn et al. (2014) Study2.xlsx approved_protocol.pdf* 113
Koch et al. (2014) Schooler2.xlsx approved_protocol.pdf* 67
Mammarella et al. (2014) approved_protocol.pdf* 104
McCoy & Rancourt (2014) dataStudy2.xls approved_protocol.pdf* 89
Mitchell & Petro (2014) Replication_II.xlsx approved_protocol.pdf* 109
Musselman & Colarusso (2014) Data_Study2.xlsx approved_protocol.pdf* 78
Poirer et al. (2014) Study_2.xlsx approved_protocol.pdf* 95
Rubinova et al. (2014) Data2.xlsx approved_protocol.pdf* 110
Susa et al. (2014) Study2.xlsx approved_protocol.pdf* 111
Thompson (2014) Data2.xlsx approved_protocol.pdf* 102
Ulatowska & Cislak (2014) StudyII.xlsx approved_protocol.pdf* 106
Wade et al. (2014) Study_2.xlsx approved_protocol.pdf* 121
Current meta-analytic estimate of all lab replications (random-effects):
*RRR2 protocol identical to RRR1 except order of verbal description and filler task was switched. [Underlying data (CSV) & R-code]

Summary: The verbal overshadowing effect appears to be replicable; verbally describing a robber after a 20-minute delay decreased the correct identification rate in a lineup by 16% (from 54% [control] to 38% [verbal]; meta-analytic estimate = -16% [+/-.04], equivalent to r = .17). Still in question, however, is the validity and generalizability of the effect, hence it is still premature for public policy to be informed by verbal overshadowing evidence. Validity-wise, it is unclear whether verbal overshadowing is driven by a more conservative judgmental response bias or by reduced memory discriminability, because no "suspect-absent" lineups were used. This is important to clarify because it directly influences how eyewitness testimony should be treated (e.g., if verbal overshadowing is primarily driven by a more conservative response bias, identifications made after a verbal description should actually be given *more* [rather than less] weight; see Mickes & Wixted, 2015). Generalizability-wise, in a slight variant of RRR2 (i.e., RRR1), a much smaller overall verbal deficit of -4% [+/-.03] emerged when the lineup identification occurred 20 minutes after the verbal description (which occurred immediately after seeing the robbery). Future research needs to determine the size of verbal overshadowing when there is a delay between the crime and the verbal description and before lineup identification, conditions that better reflect the real world.

Original authors' comments: We shared a draft of the curated set of replications with original authors, and invited them to provide feedback. Jonathan Schooler replied stating that the information seemed fine to him.

Related Commentary

Every year, billions of dollars (largely taxpayers' money) are spent funding scientific studies to deepen our understanding of the natural and social world, with the hope that this new knowledge may help us overcome important societal problems (e.g., cancer, racism). However, the findings yielded by these studies can only be considered trustworthy knowledge ready to inform public-policy decisions once they have been successfully reproduced, replicated, and generalized by independent researchers. More generally, scientific findings are only useful (vis-à-vis return on investment) if they can be successfully reproduced, replicated, and generalized.

Curate Science is a community-oriented web application specifically designed to allow and help researchers (but also public-policy analysts and innovators) gauge the (1) reproducibility, (2) replicability, and (3) generalizability of published scientific findings.

1. Reproducibility: Reported findings are reproducible from the raw (or transformed) data.
2. Replicability: Reported findings are replicable in new independent samples.
3. Generalizability: Findings generalize to other methods, languages, cultures, and contexts, which helps ascertain the validity of findings.

Order matters: If a finding's evidence isn't reproducible, then it's likely unwise to spend precious research money attempting to replicate such a finding. Once a finding is deemed replicable, the reproducibility of the original finding becomes a moot point, and it is then justified to spend more resources gauging the generalizability/validity of the finding (e.g., verbal overshadowing). Only once replicability and generalizability/validity have been established are findings "ready" to influence public-policy decisions.

Current focus: Replicability
Our current focus is designing components of the web application that facilitate gauging the replicability of findings, which includes developing a replication taxonomy and a new meta-analytic weighting system.

  • Replication Taxonomy: To meta-analytically analyze and interpret replications, we first need a system to decide what constitutes a replication in the first place. Though there is no universally agreed-upon definition of direct replication, at Curate Science a direct replication is a study employing (at minimum) the same general methodology (as in the original study) for the relevant focal variables (e.g., IV and DV; "general methodology" is defined as the same task or paradigm). A study more similar than this (e.g., a study using the same task and stimuli, instructions, and scale version) also qualifies as a direct replication; a study more dissimilar is considered a "conceptual replication" (see diagram below, click to expand). This (soft) demarcation aims to optimize the inherent tradeoff between replicability and generalizability to maximize cumulative knowledge development in relation to the funding costs of executing replications.

    Elaboration of the tradeoff between replicability and generalizability: Highly similar replications afford high falsifiability to gauge replicability, but at the cost of reduced generalizability, whereas highly dissimilar replications afford high generalizability at the cost of reduced falsifiability (i.e., collectively fooling ourselves if negative results are repeatedly attributed to design differences, as may have happened in the social priming, embodiment, and ego depletion literatures).

    Highly similar replications, however, are very time-consuming and expensive to execute; hence, in terms of efficient use of resources, direct replications should actually not be *too* similar to original studies, to avoid spending precious resources determining that a finding replicates only under very narrow circumstances (which would mean the finding is unlikely to be theoretically or practically important). To maximize research funding value, then, initial replication attempts should only be generally similar to the original study (i.e., execute a "similar direct replication"). If results replicate, increasingly **more dissimilar** designs should be used to determine the generalizability boundaries of the original finding. If results don't replicate, increasingly **more similar** designs should be used to determine whether the original finding at least replicates under narrow circumstances (though this may be deemed an unwise use of resources for replicators and hence better left to original researchers).

    To clarify such nuances, let's consider two concrete examples applied to money priming studies: Grenier et al. (2012) used the same general methodology as Vohs et al. (2006) Study 3 for the independent variable (an unscrambling priming task) and the dependent variable (offering help to code data sheets), hence the study satisfies the eligibility criteria for a sufficiently similar direct replication, even though the study design deviated in several other minor ways (e.g., different priming stimuli, different help target). Tate (2009), a replication originally considered but excluded after feedback from K. Vohs, employed a different general methodology for the IV (a background image on a poster instead of an unscrambling task), hence the study does *not* satisfy the eligibility criteria for a direct replication, even though it used the same general methodology for the DV (offering help to code data sheets).

  • Meta-analytic weighting system: Not all (replication) evidence is created equal (e.g., replications executed more transparently should intuitively be given more weight, given that they are more verifiable and hence more likely correct). Consequently, we're developing a new meta-analytic weighting system that assigns more weight to higher-quality replications based on (1) method similarity and pre-registration status and (2) verifiability and reproducibility status of the replication evidence. Replications that are more methodologically similar to the original and/or pre-registered are given more weight, given the lower possibility of (researcher) bias. Replication evidence that is more verifiable (i.e., open data, syntax, and materials) and/or endorsed as analytically reproducible is assigned more weight, given the higher likelihood that the results are correct. Below is an example weighting system guided by these principles:

    Focusing first on replication method similarity and pre-registration status: as can be seen, replication studies very similar to the original (same IV, DV, stimuli, specific paradigm, and physical setting) are given the most weight, with progressively less weight given to similar (same IV & DV) and dissimilar studies (only one of IV or DV similar); pre-registered studies are assigned much larger weight across the board. (NOTE: very dissimilar studies [everything can be different] are not considered replications, but under this scheme could nonetheless be considered, albeit assigned the lowest weight.) Turning to verifiability and reproducibility status: studies that are more verifiable (open data, open syntax, & open materials) are given more weight than studies that are increasingly less verifiable (only open data & open materials; least verifiable when no study components are openly available), with studies whose evidence is independently endorsed as analytically reproducible given the most weight.
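The weighting principles just described can be sketched in code. The tier labels and numeric weights below are hypothetical illustrations of the principles, not Curate Science's actual values (which appear in the example figure):

```python
# Illustrative only: tier names and numbers are hypothetical,
# not the actual Curate Science weighting values.
SIMILARITY_WEIGHT = {
    "very_similar": 1.0,    # same IV, DV, stimuli, paradigm, setting
    "similar": 0.8,         # same IV & DV (general methodology)
    "dissimilar": 0.5,      # only one of IV or DV similar
    "very_dissimilar": 0.2, # not a replication; lowest weight if included
}

def replication_weight(similarity, preregistered=False,
                       open_data=False, open_syntax=False,
                       open_materials=False, reproducible=False):
    """Combine the two dimensions described above:
    (1) method similarity + pre-registration status,
    (2) verifiability + reproducibility status."""
    w = SIMILARITY_WEIGHT[similarity]
    if preregistered:
        w *= 2.0  # pre-registered studies weighted much more, across the board
    # Each open component makes the evidence more verifiable
    verifiability = 1.0 + 0.1 * sum([open_data, open_syntax, open_materials])
    if reproducible:
        verifiability += 0.2  # independently endorsed as reproducible
    return w * verifiability
```

Under any such scheme, a pre-registered, fully open, reproducibility-endorsed very-similar replication receives the largest weight, and weight decreases monotonically as similarity and verifiability drop.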

Note: These features are from an older version (2.0.4) of Curate Science. We will soon be releasing revamped UI designs and features based on a new curation framework (version 3.0.0).

Lightning-fast Search with Auto-complete

Our homepage will feature a lightning-fast search with auto-complete so that you can quickly find what you're looking for. To browse, you can select from the Most Curated or Recently Updated articles lists.

1 search page

Innovative Search Results Page

Easily find relevant articles via icons that indicate availability of data/syntax, materials, replication studies, reproducibility info, and pre-registration info. Looking for articles that have specific components available? Use custom filters to only display those articles (e.g., only display articles with available data/syntax)!

2 search results

Article Page: Putting it all Together

Our flagship feature is the consolidation and curation of key information about published articles, which all comes together on the article page. The page will feature automatically updating meta-analytic effect size forest plots, in-browser R analyses to verify the reproducibility of results, editable fields to add, modify, or update study information, and element-specific in-line commenting.

3 article page 03

User Profile Dashboard Page

The user dashboard will display a user's recent contributions, a list of their own articles, reading and analyses history, recent activities by other users, and notifications customization.

4 profile dashboard
Main Team
Etienne P. LeBel
Founder & Lead
Alex Kyllo
Technical Advisor
Fred Hasselman
Lead Statistician

Advisory Board
Denny Borsboom
University of Amsterdam
Hal Pashler
University of California - San Diego
Daniel Simons
University of Illinois
Alex Holcombe
University of Sydney
E-J Wagenmakers
University of Amsterdam

Brent Roberts
University of Illinois - Urbana-Champaign
Eric Eich
University of British Columbia
Rogier Kievit
University of Cambridge
Leslie John
Harvard University
Brian Earp
Oxford University

Uli Schimmack
University of Toronto
Simine Vazire
Washington University in St. Louis
Axel Cleeremans
Universite Libre de Bruxelles
Brent Donnellan
Michigan State University
Richard Lucas
Michigan State University

Marco Perugini
University of Milan-Bicocca
Mark Brandt
Tilburg University
Joe Cesario
Michigan State University
Ap Dijksterhuis
Radboud University Nijmegen

Jeffry Simpson
University of Minnesota
Jan De Houwer
Ghent University
Lorne Campbell
Western University

Foundational Members
Christian Battista
Technical Advisor
Ben Coe
Technical Advisor
Stephen Demjanenko
Technical Advisor
Why are only "direct replications" considered on Curate Science?
"Direct replications" involve repeating a study using the same general methodology as the original study (except any required cultural or linguistic modifications). "Conceptual replications", on the other hand, involve repeating a study using a different general methodology, to test whether a finding generalizes to other manipulations and measurements of the focal constructs (we argue such studies should thus more accurately be called "generalizability studies"). Curate Science only considers direct replications because only such studies can falsify original findings. Failed "conceptual replications" are completely ambiguous as to whether negative results are due to (1) the falsity of the original finding or (2) the different methodology employed. Consequently, an over-emphasis on "conceptual replications", in combination with publication bias and unintentional exploitation of design and analytic flexibility, can grossly mischaracterize the evidence base for the reliability of empirical findings (Pashler & Harris, 2012; LeBel & Peters, 2011).
How methodologically close does a direct replication have to be to be added on Curate Science?
As mentioned, a "direct replication" involves repeating a study using the same general methodology as the original study. This means using the same experimental manipulation(s) for independent variables and the same measures for dependent/outcome variables. Minor deviations from the original methodology, however, are acceptable, including using different stimuli, different versions of a questionnaire (e.g., the 18-item short version of the Need for Cognition scale [NFC] instead of the original 34-item version), and any cultural and/or linguistic changes required to execute a direct replication. For example, Reuven et al.'s (2013) replication of Zhong & Liljenquist's (2006) Study 4 used the same outcome measure (volunteer behavior) but a slightly different operationalization (trichotomous rather than dichotomous volunteer choice). The important part is to disclose these design differences so that readers can judge for themselves the extent to which the design differences might be responsible for discrepant results (see Simonsohn, 2016, for more on this). Indeed, in the near future, we will explicitly note any known design differences for curated replications (e.g., an icon, which when clicked, expands the row below revealing design differences for that replication).
What does "independent" in "independent replication" mean?
To prevent bias, replications must be carried out by researchers who are sufficiently independent from the researchers who executed the original studies. We conceptualize "sufficiently independent" following the "arm's length principle" used in law. In our context, this means that replicators have not co-authored articles with any original authors and also do not have any familial or interpersonal ties with any original authors.
What is Curate Science's official policy regarding soliciting feedback from original authors?
Curating replication results involves publicly discussing original research, hence our policy is to contact authors of original studies under discussion *before* publicly posting curated information. Feedback from original authors will be used to improve the posted information (author comments may also be posted directly underneath replication results to further augment interpretation of results).
Who is Curate Science's intended audience?
Our primary audience is the community of academic, government, and industry researchers. However, we are designing the website so that the organized information is also useful to students, educators, journalists, and public policy makers (e.g., a journalist could look up an article to see whether limitations/flaws have been identified, enabling them to write a more balanced news article). Our initial focus involves published articles in the life and social sciences (starting with psychology/neuroscience); however, we may eventually expand to other areas.
What is curation?
Digital curation is the process of selecting, filtering, and extracting information so as to increase its quality. Curate Science organizes, selects, filters, and extracts information from a diverse set of sources with the goal of increasing the quality of fundamental information about scientific articles.
Who can access and consume information about articles on Curate Science?
Anyone, including non-registered users, can look up information on Curate Science.
Who can add, modify, and update article information on Curate Science?
Only registered users will be able to add, modify, and update information about articles. That being said, anyone who is affiliated with a research organization can become a registered user, as long as they provide their real name, email address, affiliation, and title (e.g., post-doc, graduate student, undergraduate student, research assistant).
How will you ensure quality control of the information posted about articles on Curate Science?
We will employ a two-stage verification process for some of the information whereby information initially posted will be labeled as "unverified" until a second user confirms it, at which time it will appear as verified. This will be the case for key statistics, independent replication information, and publication bias indices. Like Wikipedia, we will also have a revision history for each editable/updatable field showing which user changed what information on what date (and any notes regarding the edit left by the user).
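The two-stage verification process with a revision history could be modeled as a small data structure. This is purely an illustrative sketch under our own assumptions (class and field names are hypothetical, not the actual Curate Science schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Revision:
    """One entry in a field's revision history (user, new value, note, date)."""
    user: str
    value: str
    note: str
    timestamp: datetime

@dataclass
class CuratedField:
    """An editable article field under two-stage verification (sketch)."""
    value: str = ""
    verified: bool = False
    added_by: str = ""
    history: list = field(default_factory=list)

    def edit(self, user, value, note=""):
        # Every edit is logged and resets the field to "unverified"
        self.history.append(Revision(user, value, note,
                                     datetime.now(timezone.utc)))
        self.value = value
        self.added_by = user
        self.verified = False

    def confirm(self, user):
        # Only a second, different user can flip the field to "verified"
        if user != self.added_by:
            self.verified = True
        return self.verified
```

The key invariants are that edits always land as unverified and that self-confirmation is rejected, mirroring the Wikipedia-style audit trail described above.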
Is it really feasible to organize and curate information for scientific articles? In other words, why would researchers be willing to spend their precious time curating information on Curate Science?
Our view is that researchers should be highly motivated to add and update information regarding published articles in their own area of research because there is an intrinsic interest in updating the scientific record to more accurately reflect the totality of the evidence. We also expect -- just as happened with Wikipedia -- that more influential and/or controversial articles will be curated first, given that a large number of researchers are interested in these articles. Other articles will likely be curated commensurate with the level of interest commanded by the market, though of course article authors are free to curate their own articles as much as they want (and for good reason, e.g., the available-data citation advantage; see Piwowar & Vision, 2013).
Who is funding Curate Science?
We have received a $10,000 USD seed grant from the Center for Open Science to help with initial development. Wellspring Advisors LLC has confirmed that it has allocated funds to support Curate Science as part of a renewal grant to be given to the Center for Open Science. The Templeton Foundation has accepted our initial grant proposal, and we will soon be submitting a full proposal to them. We are also currently in discussions with the Sloan Foundation.
Is Curate Science associated with the Open Science Framework hosted by the Center for Open Science?
Though Curate Science has formed an informal partnership with the Center for Open Science (with respect to funding, see above), our web application is completely independent from the Open Science Framework (though we will of course be linking to available data, materials, and pre-registration information hosted on the OSF).
How is Curate Science different from,, and Harvard's Dataverse?
,, and Harvard's Dataverse (and many other similar websites) are data repositories where researchers can make their data, syntax, and materials publicly available and get credit for doing so. Curate Science organizes, consolidates, and curates all of this publicly available information from as many different sources as possible at the study level for all published articles with a DOI. Curate Science also provides a platform for the crowd to verify analyses and post comments regarding specific issues such as reproducibility of analyses, problems with posted materials/stimuli, etc.
How is Curate Science different from the Open Science Framework?
The Open Science Framework is a place for researchers to archive, collaborate on, and pre-register their research projects to facilitate researchers' workflow to help increase the alignment between scientific values and scientific practices (see more details on OSF's about page). In contrast, Curate Science, as its name implies, is focused primarily on the curation of scientific information tied to published articles by providing a platform for users to add, modify, update, and comment on published article's replication and reproducibility information (among other things, see features).
How is Curate Science different from is a highly useful website designed to overcome the pernicious file-drawer problem in psychology: researchers can manually upload serious replication attempts whether they succeeded or failed. Curate Science aims to significantly build upon PsychFileDrawer's venerable efforts by automatically identifying as many extant independent replication results as possible (via text mining) and will also provide a simple interface for the crowd to add any missing replication results. Curate Science will also feature an innovative article page that visually depicts the complex inter-relationships between original and replication studies (to facilitate the difficult task of interpreting replication results) and that will also allow the crowd to curate key information about the original and replication studies (in addition to several other features).
Will I be able to post data/materials directly on an article page on Curate Science?
Yes, eventually. Our current focus is to organize and curate the available data/materials already hosted by the many existing public repositories. However, in our quest to radically simplify the fundamental scientific practice of sharing data/materials, we are working on forging a partnership with a major data repository website so that users can easily drag-and-drop data for their articles via our interface using our partner's infrastructure (indeed, we're currently in discussions with and the OSF in this regard).
Please sign up below to receive the Curate Science Newsletter and be automatically notified about news and updates.

*Thanks to Felix Schönbrodt who is currently hosting Curate Science.