This report is based on research conducted by the University of Alberta Evidence-based Practice Center under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No.
We found no association between individual NOS items or overall NOS score and effect estimates. Inter-rater reliability of the NOS varied from substantial for length of followup to poor for selection of non-exposed cohort and demonstration that the outcome was not present at outset of study. No statistically significant differences were found in ES when comparing studies categorized as high, unclear or low risk of bias. Inter-rater variability resulted more often from different interpretation of the tool rather than different information identified in the study reports. Inter-rater variability was influenced by study-level factors including nature of outcome, nature of intervention, study design, trial hypothesis, and funding source.
Inter-rater reliability of consensus assessments across four reviewer pairs was moderate for sequence generation (κ=0.60), fair for allocation concealment and “other sources of bias” (κ=0.37, 0.27), and slight for the remaining domains (κ ranging from 0.05 to 0.09). Inter-rater reliability between two reviewers was considered fair for most domains (κ ranging from 0.24 to 0.37), except for sequence generation (κ=0.79, substantial).