Researchers are questioning whether a commonly used gene expression classifier (GEC) test for evaluating indeterminate thyroid nodules lives up to expectations, based on a systematic review and meta-analysis published online July 18 in JAMA Otolaryngology-Head & Neck Surgery. The test is used to determine whether patients should be sent on for thyroid biopsy or surgery.
The researchers believe that their analysis of data from the Afirma GEC test (Veracyte) indicates the test may not live up to the results found in the pivotal trial used for the product's introduction (New England Journal of Medicine, August 23, 2012, Vol. 367:8, pp. 705-715). As a result, clinicians might want to recommend more aggressive follow-up of nodules classified as benign by the test.
A review of 19 studies involving the Afirma GEC test and a total of 2,568 indeterminate thyroid nodules showed that results differed significantly from what was reported in the initial validation study of the assay. In addition, the discrepancies are "not explained simply by the differences in the underlying prevalence of cancer," reported Dr. Pablo Valderrabano, PhD, a consultant endocrinologist in the Thyroid Cancer Unit of the endocrinology and nutrition department at Hospital Universitario Ramón y Cajal in Madrid, and colleagues.
"The findings suggest that the initial validation study cohort was not representative of the populations in whom the GEC has been used, calling into question its reported diagnostic performance, including its negative predictive value," Valderrabano et al wrote.
The authors expressed concern given how widespread GEC testing is -- with some 20,000 tests performed every year in the U.S., they noted. The test indicates whether nodules are benign or suspicious; patients whose GEC results show a benign nodule often avoid surgery.
"As a consequence, patients whose nodules remain unresected on the basis of a GEC-benign result should be monitored more closely and for longer periods than typically would be recommended for a thyroid nodule with a benign cytological diagnosis," the researchers advised.
Veracyte is disputing the results.
Checking into an accepted test
The current research was performed to evaluate the evidence base for the GEC test. In the pivotal study used for the GEC test's regulatory submission, the test's negative predictive value (NPV) ranged from 94% to 95% and its positive predictive value (PPV) was approximately 38%.
The GEC test's negative predictive value was high enough that many patients with GEC-negative nodules could avoid an operation, and its positive predictive value seemed high enough to justify operating on nodules deemed suspicious, Dr. Jennifer Marti, an assistant professor of surgery at Weill Cornell Medicine, and Dr. Ashok Shaha, a surgeon at Memorial Sloan Kettering Cancer Center, noted in an accompanying editorial, also published July 18.
However, after the product's introduction, investigators at various institutions reported positive predictive values that were in the 17% to 30% range, which was lower than expected, Marti and Shaha noted.
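One reason predictive values can shift from one setting to another is that they depend on how common cancer is among the nodules tested, not only on the test itself. The Python sketch below illustrates this with Bayes' rule. It is not drawn from the study or the editorial: the function name `predictive_values` and the sensitivity, specificity, and prevalence inputs are assumptions chosen purely for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only (assumed, not reported in this article).
SENSITIVITY = 0.92
SPECIFICITY = 0.52

for prevalence in (0.10, 0.25, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, so a single figure on a test report cannot hold for every practice setting.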
"It became evident that clinicians who used the test needed to know the prevalence of malignancy in indeterminate nodules in their practice setting, but collecting these numbers may not always be easily achievable," Marti and Shaha wrote.
The sample size for the initial study was "relatively small," with 129 atypia or follicular lesion of undetermined significance (A/FLUS) and 81 follicular neoplasm (FN) specimens, and confidence intervals were wide, Valderrabano and colleagues noted. No truly independent clinical validation study, in which nodules are resected regardless of GEC results, has been done to narrow the confidence intervals and confirm the findings, they wrote.
"The GEC, however, immediately garnered wide acceptance in clinical practice, and patients with GEC-benign results started to be offered follow-up in lieu of a diagnostic surgical procedure," the investigators noted. "Furthermore, guidelines from major professional associations endorsed such an approach."
In addition to the pivotal study, postmarketing studies have been conducted by several groups. But these were not blinded, and the resection rates were low for GEC-benign nodules and high for GEC-suspicious nodules.
"Selective resection makes measuring the false-negative rate and NPV impossible and provides unreliable estimates of sensitivity, specificity, and underlying prevalence of cancer in the study cohort," Valderrabano and colleagues wrote.
"One approach to overcoming this fundamental limitation has been to treat GEC-benign nodules that do not develop evidence of cancer during follow-up as true-negative nodules," the authors acknowledged. "However, given the slow rate of progression of many early-stage thyroid cancers, this approach continues to underestimate the prevalence of cancer and the false-negative rate, artificially inflating the reported sensitivity, specificity, and NPV of the GEC."
To counter this limitation, they conducted a novel statistical analysis based on two measures: the benign call rate -- the proportion of tested nodules that receive a GEC-benign result -- and the positive predictive value. They consider both to be reliably measured in the postmarketing studies. The researchers conducted a PubMed search for studies of the test in indeterminate thyroid nodules through October 2017.
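A rough way to see why the benign call rate and PPV together constrain a test's performance is sketched below in Python. This is not the authors' statistical method, which the article does not detail. The function name `implied_prevalences` and every numeric input are hypothetical. The idea: if a given sensitivity and specificity were correct, the share of nodules that are cancers called suspicious and the share that are benign nodules called suspicious would both point to the same underlying cancer prevalence.

```python
def implied_prevalences(benign_call_rate, ppv, sensitivity, specificity):
    """Return the two prevalence values implied by an observed BCR and PPV.

    If the assumed sensitivity and specificity were right, the two
    values would coincide.
    """
    suspicious_rate = 1.0 - benign_call_rate
    # Cancers called suspicious:
    #   ppv * suspicious_rate = sensitivity * prevalence
    from_cancers = ppv * suspicious_rate / sensitivity
    # Benign nodules called suspicious:
    #   (1 - ppv) * suspicious_rate = (1 - specificity) * (1 - prevalence)
    from_benign = 1.0 - (1.0 - ppv) * suspicious_rate / (1.0 - specificity)
    return from_cancers, from_benign


# Hypothetical inputs for illustration; none are taken from the study.
p_a, p_b = implied_prevalences(
    benign_call_rate=0.45, ppv=0.40, sensitivity=0.92, specificity=0.52
)
print(f"prevalence implied by cancers called suspicious: {p_a:.3f}")
print(f"prevalence implied by benign nodules called suspicious: {p_b:.3f}")
```

When the two implied prevalences disagree, as they do for these made-up inputs, no single prevalence can reconcile the observed figures with the assumed sensitivity and specificity, so at least one of those two assumptions must be off.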
"Because of the lack of resection of GEC-benign nodules, neither the sensitivity or specificity nor the NPV of the GEC could be calculated from postmarketing studies," Valderrabano and colleagues wrote. "Results of this present study, however, indicate that the true sensitivity and/or specificity of the GEC differed significantly from those reported in the initial study, suggesting that the cohort of nodules included in the initial study was not representative of the populations to which the test has subsequently been applied."
The researchers estimated that the positive predictive value in the studies was 45% overall, 46% for A/FLUS, and 41% for FN. They noted that NPV results were inconsistent across studies, with an estimated overall NPV of 88%.
"Additional studies appear to be needed to better understand the true GEC's diagnostic performance," they wrote. "Until such studies are available, the follow-up of unresected GEC-benign nodules should be more intense and prolonged than that recommended for thyroid nodules with benign cytological diagnosis."
Implications for GEC's successor?
The authors acknowledged that the Afirma GEC has been superseded by the newer genomic sequencing classifier (GSC) test, also manufactured by Veracyte. The GSC test was validated using the same patient cohort as the GEC, so similar issues would apply; however, its method is substantially different, and it may offer improved performance over the GEC.
"Future studies will need to appropriately validate the GSC diagnostic performance," Valderrabano and colleagues advised.
Marti and Shaha noted that 15% to 30% of thyroid fine-needle aspiration specimens are indeterminate. These cases are difficult to manage, and in the past, many patients wound up needing surgery to get a diagnosis.
"The development of the Afirma GEC, a gene expression-based test, seemed to be the answer many surgeons and endocrinologists were hoping for," they wrote in their editorial.
The initially reported high NPV was not questioned, but like the PPV, it now looks unreliable, Marti and Shaha wrote.
"Ultimately, this discrepancy illuminates the difference between efficacy (the performance of a diagnostic test under ideal and controlled circumstances) and effectiveness (the performance in real-world conditions)," they wrote. "Often, effectiveness underperforms efficacy because patients and physicians participating in initial prospective trials do not always perfectly represent the patients and physicians who end up using a drug, device, or diagnostic test."
They suggested viewing reports of test results with a critical eye.
"With an understanding that the NPV of a GEC-benign result and the PPV of a GEC-suspicious result may not always match the number cited on the test report, clinicians may want to keep in mind the substantial costs of these assays to health systems, [payors], and patients. These tests should not be sent reflexively," Marti and Shaha wrote. "With increasing standardization of radiographic and cytological classifications, risk stratification based on clinical examination, ultrasonography characteristics, and cytological features can now be more informed than in years past."
The editorial authors also believe the findings have broader implications.
"These findings underscore the value of more rigorous evaluation of new diagnostic tests," they wrote. "Larger, multicenter pivotal studies prior to approval would be helpful, as would more real-world analyses conducted independently of industry sponsorship."
Veracyte flags study flaws
There are significant flaws in the method used to evaluate Afirma GEC's performance in the study, Giulia Kennedy, PhD, Veracyte's chief scientific and medical officer, commented in a statement provided to LabPulse.com. The authors introduced bias by leaving out of their calculations the Afirma GEC-benign nodules that were not operated on, which make up the vast majority of GEC-benign nodules, she said.
"It is likely that the vast majority of such nodules are truly benign, meaning that the majority of true-negative results are excluded," Kennedy said.
Kennedy also took issue with the combining of the benign call rate, which is evaluated using all samples, and the PPV, which is derived only from GEC-suspicious cases that were operated on. At the same time, however, there are positive elements to the dataset, she maintained. Out of 1,071 nodules judged to be benign by the GEC test, only 23 (2.1%) were determined to be cancerous.
"This is an important finding that should reassure clinicians regarding the reliability of an Afirma GEC benign result," she said. "The Afirma GEC's ability to accurately rule out thyroid cancer is further proven in three long clinical utility studies published in peer-reviewed journals in which patients with Afirma GEC-benign nodules were followed for up to three years."
Kennedy also noted that GEC has been replaced with the GSC test, which has the same sensitivity and negative predictive value as its predecessor but with higher specificity.
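The arithmetic behind the 2.1% figure Kennedy cited, and its sensitivity to the study authors' concern about unresected nodules, can be sketched in Python. The two counts come from the article. The function name `false_negative_share` and the numbers of undetected cancers are hypothetical, included only to show how the figure would move if some cancers among unresected GEC-benign nodules had not yet been found.

```python
GEC_BENIGN_TOTAL = 1071  # GEC-benign nodules, as cited in the article
CONFIRMED_CANCERS = 23   # of those, determined to be cancerous


def false_negative_share(confirmed, undetected, total):
    """Fraction of GEC-benign nodules that are cancers."""
    return (confirmed + undetected) / total


# Hypothetical counts of cancers not yet detected among unresected nodules.
for undetected in (0, 25, 50):
    share = false_negative_share(CONFIRMED_CANCERS, undetected, GEC_BENIGN_TOTAL)
    print(
        f"undetected={undetected:>2}  "
        f"cancers among GEC-benign={share:.1%}  "
        f"implied NPV={1 - share:.1%}"
    )
```

The first row, with no undetected cancers, reproduces the 2.1% figure. The other rows are illustrative only; the article gives no estimate of how many cancers, if any, remain undetected.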