This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Aging, is properly cited. The complete bibliographic information, a link to the original publication on http://aging.jmir.org, as well as this copyright and license information must be included.
The internet is commonly used by older adults to obtain health information, and this use has increased markedly in the past decade. However, studies illustrate that much of the available online health information is not informed by good-quality evidence, developed in a transparent way, or easy to use. Furthermore, studies highlight that the general public lacks the skills necessary to distinguish between online products that are credible and trustworthy and those that are not. A number of tools have been developed to assess the evidence, transparency, and usability of online health information; however, many have not been assessed for reliability or ease of use.
The first objective of this study was to determine if a tool assessing the evidence, transparency, and usability of online health information exists that is easy and quick to use and has good reliability. No such tool was identified, so the second objective was to develop such a tool and assess it for reliability when used to assess online health information on topics relevant to optimal aging.
An electronic database search covering 2002 to 2012 was conducted to identify published papers describing tools that assessed the evidence, transparency, and usability of online health information. Papers were retained if the tool described had been assessed for reliability, assessed the quality of evidence used to create online health information, and was quick and easy to use. When no single tool met expectations, a new instrument was developed and tested for reliability. Reliability between two raters was assessed using the intraclass correlation coefficient (ICC) for each item at two time points. SPSS Statistics 22 software was used for statistical analyses, and a one-way random effects model was used to report the results. The overall ICC was assessed for the instrument as a whole in July 2015. The threshold for retaining items was ICC>0.60 (ie, “good” reliability).
All tools identified that evaluated online health information were either too complex, took a long time to complete, had poor reliability, or had not undergone reliability assessment. A new instrument was developed and assessed for reliability in April 2014. Three items had an ICC<0.60 (ie, below the threshold for “good” reliability). One of these items (“minimal scrolling”) was removed and two were retained but reworded for clarity. Four new items were added that assessed the level of research evidence informing the online health information, and the tool was retested in July 2015. The total ICC score showed excellent agreement for both single measures (ICC=0.988; CI 0.982-0.992) and average measures (ICC=0.994; CI 0.991-0.996).
The results of this study suggest that this new tool is reliable for assessing the evidence, transparency, and usability of online health information that is relevant to optimal aging.
Many people increasingly turn to the internet as a source of information, motivation, and support for healthy living and management of common health conditions [
Furthermore, access to online health information can help people stay up to date with emerging information about their health conditions and can facilitate shared decision-making between patients and health care providers [
As Khazaal et al [
In yet another review by Gagliardi and Jadad published in 2002 [
In 2005 Bernstam et al [
In 2006 Provost et al [
Finally, Breckons et al [
Clearly, a considerable amount of effort has been invested in the development of tools to assess the quality of online health information. However, it is not yet clear whether any one tool is superior to the others with respect to being quick and easy to use while reliably determining the quality of online health information. Furthermore, while quality assessment tools may help older adults more easily identify evidence-based information, a potentially more effective service might be one that compiles available online health information in one place and assesses its quality. In particular, gateways or portals have been deemed particularly useful as they provide access to content that has been prescreened and deemed of high enough quality to be approved by a governing organization [
The McMaster Optimal Aging Portal (the Portal), launched in 2014, is a health information website that serves as such a gateway, providing access to online resources about healthy aging that have been preappraised for quality [
A search for instruments that assessed the quality of online health information was conducted through an electronic search of Medline from 2002 to 2012, a focused internet search, and suggestions made by key informants. The search strategy used is described in Multimedia Appendix 1.
Relevant articles underwent a second relevance assessment to identify instruments within those articles that: (1) had been assessed for reliability, (2) assessed the quality of the evidence used to create online information, (3) had fewer than 15 criteria, and (4) were suitable for use by citizen raters.
Assessments were independently completed by two raters. All raters had achieved (or were in the final year of) an undergraduate degree at McMaster University, had been working with the Portal for 5-10 hours per week for 1-6 months, and received training from the project coordinator (SW).
Instruments retained from the second relevance assessment were then used to assess a sample of online health resources. Raters took note of how long it took to complete assessments for each instrument, as well as how complex items within each instrument were to apply. Agreement between raters was assessed, and the Portal team met to decide which instruments, if any, were appropriate for the purposes of the Portal. Assessments were completed by dyads, with one assessor being a staff member (as described above for relevance assessment) and the second being a Lead of the Portal (MD, BH, JL; each of whom has decades of experience in evidence-based practice and appraisal of evidence) [
No one tool was deemed sufficient for its intended use for the Portal, so development of a new instrument began. Items for the new instrument were either crafted anew by the Portal team or selected from the previously identified instruments. Items were developed and/or selected to meet the following expectations: (1) the answer was dichotomous (Yes or No); (2) the item was suitable for assessing an individual Web resource rather than the website hosting it; (3) the information needed to assess the item would reasonably be included on the webpage of the resource; (4) the item had good reliability; and (5) the item was suitable for use by citizen raters. The items were organized into the following three categories: (1) the quality of the evidence that informed the Web resource, (2) the transparency of the resource development process, and (3) the usability of the resource. A guidance document explaining each item and how it should be rated was created, used to train raters, and made available as a resource while raters completed their assessments.
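To make this structure concrete, the following is a minimal sketch in Python; the class, the example items and guidance text, and the scoring helper are hypothetical illustrations and are not taken from the published tool.

```python
from dataclasses import dataclass

@dataclass
class Item:
    category: str   # "evidence", "transparency", or "usability"
    question: str   # the dichotomous (Yes/No) question posed to the rater
    guidance: str   # explanation drawn from the guidance document

# Illustrative items only; see the published tool for the actual wording.
ITEMS = [
    Item("evidence", "Is the quality of the evidence reported?",
         "Look for an explicit statement about the quality of the evidence."),
    Item("transparency", "Is there a feedback mechanism?",
         "Look for a contact form, email address, or comment feature."),
    Item("usability", "Logical flow: is the information easy to follow?",
         "Judge whether sections follow a sensible order."),
]

def category_counts(answers: dict[str, bool]) -> dict[str, int]:
    """Count Yes answers per category for one rated Web resource."""
    counts = {"evidence": 0, "transparency": 0, "usability": 0}
    for item in ITEMS:
        counts[item.category] += int(answers[item.question])
    return counts
```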
A set of 10 items was formally assessed for reliability in April 2014 using 120 Web resources relevant to healthy aging (2 raters, therefore a total of 240 ratings), with a second reliability assessment conducted in July 2015 using a different set of 107 Web resources (214 ratings). The Portal uses a two-stage process for identifying and selecting Web resources; these tasks were completed by the same staff described above for relevance assessment. In stage 1, internet searches are conducted to identify websites (worldwide) providing information relevant to healthy aging (eg, physical activity, nutrition, social engagement). Websites are assessed against the following criteria: the website is not funded by a company trying to sell products or services, the content of the site is relevant to healthy aging, the website includes content intended for use by citizens, and the website is freely accessible. Websites meeting all of these criteria are deemed relevant and move on to stage 2, the identification and selection of Web resources housed on the website. Potentially relevant resources are uploaded to a content management system. Each Web resource is then assessed against the following criteria: the resource is not funded by a company trying to sell products or services, the resource is relevant to healthy aging, the resource is intended for use by citizens, and the resource is less than 3 years old. Web resources meeting all four criteria then undergo quality assessment.
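As a sketch of the stage 2 screening logic described above: the `WebResource` type and its field names are hypothetical; only the four criteria come from the text.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record for a candidate resource awaiting stage 2 screening.
@dataclass
class WebResource:
    url: str
    commercially_funded: bool       # funded by a company selling products/services
    relevant_to_healthy_aging: bool
    intended_for_citizens: bool
    published: date

def passes_stage_2(resource: WebResource, today: date) -> bool:
    """A resource proceeds to quality assessment only if all four criteria hold."""
    age_years = (today - resource.published).days / 365.25
    return (not resource.commercially_funded
            and resource.relevant_to_healthy_aging
            and resource.intended_for_citizens
            and age_years < 3)

# Example use with an invented resource:
resource = WebResource("https://example.org/healthy-eating", False, True, True,
                       date(2014, 1, 15))
print(passes_stage_2(resource, date(2015, 7, 1)))   # True
```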
For this study, a team of eight raters completed the quality assessments, with each Web resource being rated by two independent raters. Consistent with relevance assessment, all raters had achieved (or were in the final year of) an undergraduate degree at McMaster University and had been rating resources part-time (5-10 hours per week) for 1-6 months. All raters received training on using the instrument. Ratings were conducted independently; disagreements were resolved through discussion, with a third reviewer (MD or SW) resolving any remaining conflicts. Data were exported in bulk from the online rating system into SPSS Statistics 22 software for statistical analyses.
Reliability between two raters for each item included in the instrument was assessed using the intraclass correlation coefficient (ICC). The ICC is defined as the correlation between one measurement on a target (in this case, the Web resource) and another rating on the same target [
ICC values were assessed for each individual item in both April 2014 and July 2015. The overall ICC was assessed for the instrument as a whole in July 2015, once the final set of items was identified. A one-way random effects model was used to report the results; this model assumes that raters are randomly selected from a population of raters and that different pairs of raters rate each product. Both the average and single measures were included in the analysis. Average measures calculate the mean reliability (selection of the same rating for the same criteria) of multiple raters. Single measures calculate the reliability of a single rater, accounting for any potential rater effect (ie, chance and error affecting variance in rater selections) [
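The analyses in this study were run in SPSS; purely as an illustration of the same one-way random effects calculation, the sketch below computes single and average measures ICC from an n-targets x k-raters matrix. The simulated data and the roughly 2% disagreement rate are invented for the example.

```python
import numpy as np

def icc_one_way(ratings: np.ndarray) -> tuple[float, float]:
    """One-way random effects ICC for an (n targets x k raters) matrix.

    Returns (single measures, average measures):
        ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
        ICC(1,k) = (MSB - MSW) / MSB
    """
    n, k = ratings.shape
    target_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # One-way ANOVA mean squares: between targets and within targets.
    ms_between = k * np.sum((target_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - target_means[:, None]) ** 2) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average

# Example: 107 resources rated Yes (1) / No (0) by 2 raters on one item,
# with ~2% of ratings flipped to simulate disagreement.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=(107, 1))
ratings = np.repeat(truth, 2, axis=1)
ratings[rng.random((107, 2)) < 0.02] ^= 1
print(icc_one_way(ratings.astype(float)))
```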
Once duplicates were removed, 585 articles were identified, of which 19 were either an evaluation of an instrument assessing the quality of online information or a literature review of instruments assessing the quality of online information [
The DISCERN instrument comprises 16 items rated on a 5-point Likert scale and was developed by an expert panel to evaluate the reliability and quality of treatment information for a particular health problem [
Reliability assessment of Web Resource Rating criteria measured by intraclass correlation coefficient, April 2014. n=120 resources/240 ratings.
| Criteria | ICC (95% CI), single measures | ICC (95% CI), average measures |
| --- | --- | --- |
| **Evidence** | | |
| 1. Does the product comment on the quality of the evidence? | 0.929 (0.900-0.950) | 0.963 (0.948-0.975) |
| 2. Does the product use language that communicates the strength of recommendation(s)? | 0.548 (0.410-0.662) | 0.708 (0.581-0.796) |
| **Transparency** | | |
| 3. Are sources provided for each claim/recommendation? | 0.728 (0.632-0.802) | 0.843 (0.774-0.890) |
| 4. Authorship disclosure: is the authors’ or editors’ name and affiliation disclosed? | 0.465 (0.313-0.594) | 0.635 (0.476-0.745) |
| 5. Is advertising clearly labelled? | 0.838 (0.776-0.884) | 0.912 (0.874-0.939) |
| 6. Is the date of creation within the last three years? | 0.822 (0.754-0.872) | 0.902 (0.860-0.932) |
| 7. Is there a feedback mechanism? | 0.724 (0.627-0.799) | 0.840 (0.771-0.888) |
| **Usability** | | |
| 8. Minimal scrolling | 0.489 (0.340-0.614) | 0.657 (0.508-0.761) |
| 9. Logical flow | 0.660 (0.547-0.750) | 0.796 (0.707-0.857) |
| 10. Accessibility (for text content: …) | 0.719 (0.620-0.795) | 0.836 (0.765-0.886) |
The results are presented in the table above. Of the three items with ICCs <0.60, one (minimal scrolling) was removed from the instrument and two were retained but reworded for clarity. Four new items assessing the level of research evidence informing the Web resource were added, and the resulting 13-item instrument was retested in July 2015.
The results of this reliability assessment illustrated that 11 of the 13 items had excellent ICC scores and two (criteria 6 and 7) had good ICC scores (single measures ICC=0.660 and 0.740, respectively).
The ICC of the total rating score for the 13 items, calculated with a one-way random model, showed excellent reliability for both single measures (ICC=0.988; CI 0.982-0.992) and average measures (ICC=0.994; CI 0.991-0.996), as depicted in the table below.
Reliability assessment of Web Resource Rating criteria measured by intraclass correlation coefficient, July 2015. n=107 resources/214 ratings.
| Criteria | ICC (95% CI), single measures | ICC (95% CI), average measures |
| --- | --- | --- |
| **Evidence** | | |
| 1. Is the Web resource informed by published single studies? | 0.933 (0.904-0.954) | 0.965 (0.949-0.976) |
| 2. Is the Web resource informed by published randomized controlled trials? | 1 | 1 |
| 3. Is the Web resource informed by published systematic reviews/meta-analyses? | 1 | 1 |
| 4. Is the Web resource informed by best practice guidelines? | 1 | 1 |
| 5. Is the quality of the evidence reported? | 0.945 (0.921-0.962) | 0.972 (0.959-0.981) |
| 6. Is the strength of recommendations provided? | 0.660 (0.538-0.755) | 0.795 (0.700-0.860) |
| **Transparency** | | |
| 7. Are peer-reviewed sources provided for each claim/recommendation? | 0.740 (0.641-0.815) | 0.851 (0.781-0.898) |
| 8. Is the author’s or editor’s name and affiliations disclosed? | 0.942 (0.917-0.960) | 0.970 (0.957-0.980) |
| 9. Is the advertising clearly labelled (or is there no advertising)? | 1 | 1 |
| 10. Has the Web resource been created or updated within the last 3 years? | 0.926 (0.893-0.949) | 0.961 (0.943-0.974) |
| 11. Is there a feedback mechanism? | 1 | 1 |
| **Usability** | | |
| 12. Logical flow: is the information easy to follow? | 1 | 1 |
| 13. Accessibility: does the Web resource offer options to access the information? | 0.944 (0.920-0.962) | 0.971 (0.958-0.980) |
| **Total score** | 0.988 (0.982-0.992) | 0.994 (0.991-0.996) |
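As a consistency check (not part of the original analysis), the average-measures total follows from the single-measures total via the Spearman-Brown relation for $k$ raters; with $k=2$ and the reported single-measures value:

$$\mathrm{ICC}_{\text{avg}} = \frac{k\,\mathrm{ICC}_{\text{single}}}{1 + (k-1)\,\mathrm{ICC}_{\text{single}}} = \frac{2 \times 0.988}{1 + 0.988} \approx 0.994,$$

which matches the reported average-measures total.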
The purpose of this study was to determine whether at least one instrument with proven reliability existed that was quick and easy to use for the assessment of online health information. When no such instrument was identified, the focus shifted to developing a new instrument that was quick and easy to use, and testing it for reliability. Although various quality assessment instruments specific to online resources exist, this study determined that all identified instruments either had poor reliability or had not been assessed for reliability, had too many criteria to be easy to use, or were not suitable for use by citizen raters.
As a result, a new instrument was created that incorporated items from existing instruments as well as newly developed criteria. Formal reliability assessment, undertaken between April 2014 and July 2015, resulted in the identification of the 13 items included in the final version of the new instrument. The ICC assessment showed that, as of July 2015, the final set of 13 items had good-to-excellent reliability (ICC=0.660 to 1.0). Criterion 6 (Is the strength of recommendations provided?) had the lowest ICC of the retained items (single measures ICC=0.660) but still met the threshold for good reliability.
The one criterion eliminated due to low ICC during the reliability assessment was minimal scrolling (April 2014 single measures ICC=0.489).
As a result of this analysis, the new instrument can be recommended as reliable for assessing the quality of online health information, whether rated by one or two raters. It is important to place the results of this analysis within the context of other instruments available to assess the quality of online health information; however, the majority of these instruments have not been assessed for reliability. As a result, our comparison to other instruments is limited to DISCERN [
The new instrument was developed, and assessed for reliability through this analysis, to assess the quality of online health information relevant to optimal aging; its reliability when applied to resources on other health topics has not yet been formally tested.
The data for this analysis came from ratings conducted by an established staff of trained raters. Although the one-way random effects model treats raters as randomly drawn from a larger population of raters, ongoing analyses will be useful to verify reliability with a group of trainees or members of the public (eg, university student trainees contributing to the development of website content, including the rating of online Web resources).
Lastly, it is important to note that the new instrument assesses the process of resource development, not the accuracy of the information or its congruence with the latest high-quality evidence. In the development phase of this instrument, there was discussion about including criteria to rate the accuracy of online health information. However, our aim was to create a quality assessment instrument that was easy for anyone to use; an accuracy check requires subject matter expertise, access to the latest high-quality research, and the ability to search, appraise, and interpret that research, demands deemed inappropriate for citizen raters. The final set of items included in the new instrument values the use of high-quality evidence in resource development as a proxy for the quality of claims and recommendations included in the resource. This approach has been used by others with similar types of instruments [
This analysis not only illustrates that the new instrument is a reliable tool for assessing the quality of the process for developing online health information, but also supports the decision to move to a one-rater system for assessing Web resources. A small staff of 3-4 raters independently rates resources for publication on the McMaster Optimal Aging Portal; this saves considerable time, cost, and human resources in the production of this content. Other practical implications of this analysis include the potential for external raters (eg, health professionals or citizens) to use this instrument to independently assess or design their own high-quality online health information. Future plans include making a copyrighted version of the instrument publicly available and using the instrument and ratings to provide guidance in developing high-quality online health information with health organizations and developers of health information websites. This new quality assessment instrument was designed to have broad application, to be adaptable for assessing the quality of online health information on topics across the health care continuum, and to serve multiple audiences.
The instrument developed and assessed in this study has excellent interrater reliability for overall rating score and good-to-excellent reliability for individual rating criteria. The instrument can be recommended as highly reliable for the assessment of online health information.
Multimedia Appendix 1: Medline search output for web resource rating instruments.
Multimedia Appendix 2: Web resource rating tool.
ICC: intraclass correlation coefficient
Information Quality Tool
Quality Scale
This research was possible through funding provided by the Labarge Optimal Aging Initiative at McMaster University. We acknowledge the contribution of the expert leadership team in the development of the McMaster Optimal Aging Portal: Brian Haynes, MD, PhD, FRCPC, FACMI, MACP; John Lavis, MD, PhD; Anthony Levinson, MSc, MD, FRCPC; Parminder Raina, PhD; and Alfonso Iorio, MD, PhD, FRCPC. The authors would also like to thank the research assistants who performed the Web Resource Ratings included in the analysis.
MD coordinated writing of the manuscript with team members and finalized the manuscript for publication. SW coordinated the writing of the manuscript with team members and contributed to the final draft of paper. KR contributed to the background and discussion sections of the manuscript, conducted statistical analyses using SPSS, contributed to all drafts of the paper, and helped to finalize the manuscript for submission. KG contributed to the writing of the methods and results sections and reviewed manuscript drafts. RYN consulted on the statistical analyses, wrote components of the results and discussion sections, and reviewed all drafts. AJL contributed to the interrater reliability analysis methods, analyses of findings, and reviewed the final draft.
None declared.