A collective approach to diagnosis

Groups of clinicians diagnosed more accurately than individuals, even subspecialists.

When it comes to diagnosis, two (or more) heads might be better than one, a recent study found.

Photo courtesy of Dr. Barnett

Using data from the Human Diagnosis Project, an online platform where clinicians (primarily physicians) and trainees solve user-submitted clinical cases, researchers compared the diagnostic accuracy of individuals with that of independent solutions pooled from groups of two to nine users. The researchers combined individuals' differential diagnoses for each case into a “collective” top-three list of diagnoses. Of 2,069 users working on 1,572 cases, about 59% were residents or fellows, 21% were attendings, and 20% were medical students. Internal medicine was the dominant specialty in the sample, and diagnostic accuracy was defined as having the correct diagnosis in one's top three choices.
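For readers who want to see how independent differentials might be pooled into a collective list, here is a minimal sketch in Python. It is illustrative only: the article does not say how the researchers weighted each diagnosis, so the reciprocal-rank weighting (1 point for a first choice, 1/2 for a second, 1/3 for a third) is an assumption, as are the function names `collective_top_three` and `is_correct` and the made-up example case.

```python
from collections import defaultdict


def collective_top_three(differentials, k=3):
    """Pool several ranked differentials into one top-k list.

    Assumption: each diagnosis earns 1/rank points from every clinician
    who lists it (a reciprocal-rank rule chosen for illustration).
    """
    scores = defaultdict(float)
    for ranked in differentials:
        for rank, dx in enumerate(ranked, start=1):
            # Normalize spelling so "Pneumonia" and "pneumonia" pool together.
            scores[dx.strip().lower()] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for repeatability.
    ordered = sorted(scores, key=lambda dx: (-scores[dx], dx))
    return ordered[:k]


def is_correct(top_list, confirmed_dx):
    """Accuracy as defined in the article: correct diagnosis is in the top three."""
    return confirmed_dx.strip().lower() in top_list


# Hypothetical case: three clinicians submit independent ranked lists.
group = [
    ["pulmonary embolism", "pneumonia", "pericarditis"],
    ["pneumonia", "pulmonary embolism", "heart failure"],
    ["pulmonary embolism", "heart failure", "pneumonia"],
]

pooled = collective_top_three(group)
print(pooled)  # ['pulmonary embolism', 'pneumonia', 'heart failure']
print(is_correct(pooled, "Pulmonary embolism"))  # True
```

Each clinician's list is scored before anyone sees the others' answers, which mirrors the independence of the solutions described above. A different weighting rule could change the pooled ranking.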

The bigger the group of individual solutions pooled for a case, the higher the likelihood of correct diagnosis, from 62.5% for individuals up to 85.6% for groups of nine, according to results published in March 2019 by JAMA Network Open. In addition, groups of two (77.7% accuracy) to nine (85.5% accuracy) randomly chosen nonspecialist physicians outperformed individual subspecialists (66.3% accuracy) on subspecialty cases.

ACP Hospitalist recently spoke with lead author Michael L. Barnett, MD, MS, assistant professor of health policy and management at the Harvard T.H. Chan School of Public Health in Boston.

Q: What led you to study this issue?

A: This is one really innovative way that physicians can actually get input on patient care without having to necessarily refer them to a whole bunch of other specialists before they get an answer. . . . There are all sorts of specific details we have to iron out around how this works in clinical practice, but that's what drew me to this.

Q: Were the results surprising?

A: The fact that basically, once you add two or three people, you get this enormous boost in confidence and it doesn't go away . . . was really surprising. You're still pretty far from having excellent accuracy, but you're a lot closer than you were initially. The other thing is that we were surprised by how low the accuracy was for specialists alone and that we could basically beat specialists with just two or three internists or students.

The other thing that I thought was really interesting and surprising, and actually a little bit funny, is that if you subtract medical students, it doesn't really change the overall collective accuracy in any way, but if you subtract attendings, it does decrease it but only to a very small degree. For me, that was actually a nice little validation that what we're looking at appears to be related to some kind of difficult-to-measure dimension of diagnostic skill.

Q: How could collective diagnosis work in a clinical setting?

A: The hospital might be one of the better places for this to be used, actually, and the reason why is because the stakes are higher, particularly if you don't know what a diagnosis is, and there tends to be a more focused, clear problem to solve, whereas sometimes if you have an outpatient encounter, there can be a more diffuse set of issues that you're trying to manage at the same time. The other way that it is suited well for the hospitalist environment is that this fits very well within a team framework, and in hospitals, many patients, particularly in academic medical centers, are almost always cared for by a team of some composition.

One way you could think of potentially implementing this is: What if, before you had rounds (let's say in an academic team), you actually had everybody submit their top three diagnoses for this patient before they see the note? . . . You may actually have a different priority when people are able to express themselves freely [versus] once you have a discussion and everyone gets anchored. That's kind of a speculative idea, and there's a lot of details you'd have to iron out to make that work well, but I think it does raise the question of, if two heads are better than one, how good are we at really creating a safe psychological environment where people can contribute the amount of information in their heads equally? . . . Your standard student, resident, attending dynamic is that people don't want to challenge [their] superior, even if they're very open and the team dynamic is fine.

Q: Are there downsides to this approach?

A: However this is implemented, it definitely adds another layer of work. It's possible that this approach may not really make sense to use universally for every case, but rather just target particularly challenging ones or ones where it's easy to be misled. But I think that extra complexity has a cost. Every time we add another layer of work to get something done, we lose something, which I think we've learned from electronic health records. . . . We really haven't road tested this yet, and we need to test this in a lot of environments, I think, before we understand exactly where it works best. This is kind of the equivalent of having a promising drug in a mouse study. It's still very early, there's still tons of development and ideas that need to be done to operationalize it, so we're probably still a few years away from this being . . . a well-established approach.

Q: What are your next steps?

A: We're actively exploring how we can do this in a clinical setting. . . . One of the applications that we think could be most promising for this is actually in low-resource settings where you may not even have a doctor but some other health professional. If that person is able to enter in enough clinical information, this approach could be used more from a crowdsource perspective.