The main strength of our study is the use of international consensus guidelines [28,29] to train the CNN to delineate a large number of OARs (16), including the different PCMs and laryngeal sub-volumes. Should the delineation guidelines be modified in the future, the CNN could easily be retrained to adapt to these changes. A possible limitation is that the contours used to train the CNN in this study were delineated by only one RO, possibly introducing some observer bias despite following consensus guidelines. Nevertheless, the use of the automated delineation tool was shown to result in a shorter delineation time for both ROs in this study and in less IOV between them. Another possible limitation is that only two ROs took part in this study. However, a third RO experienced in HNC verified and approved all contours to ensure their clinical validity. This third RO made no modifications to the manual and corrected contours from the other two ROs. We believe that, had delineations from more ROs been used, the IOV of the manual delineations would have been even larger.
Now that we have shown good performance of this network, implementation of this tool in clinical practice could be especially beneficial to ROs in training. It could reduce delineation time and facilitate the recognition of OARs, thus benefitting training and resulting in a steeper learning curve, provided, of course, that the delineation guidelines are consulted and that a supervisor gives the trainees feedback. Moreover, with more efficient OAR delineation, more time could be spent on other aspects of RT, such as delineation of TVs and clinical follow-up of patients. Automated delineation is also very relevant for adaptive RT regimens. When patient anatomy or tumour volume changes during treatment, re-contouring is very labour-intensive. Automated delineation could make this process significantly more efficient, either by providing new contours or by correcting contours propagated by deformable image registration from previous CT scans. Although with photon therapy not all HNC patients are eligible for adaptive RT,
Fig. 3. Network accuracy, corrections needed by radiation oncologist 1 (A) and 2 (B) vs manual interobserver variability. Network accuracy is quantified by the average corrections needed before the automated delineations were clinically acceptable. This was compared to the interobserver variability between manual delineations. Each data point represents an organ at risk from one patient. For all structures in the grey zone, the corrections are smaller than the variability in clinical practice. Abbreviations: ASSD: average symmetric surface distance; mm: millimetres; Acc: accuracy of network; RO: radiation oncologist; PCM: pharyngeal constrictor muscles; PG: parotid gland; SG: submandibular gland; U: upper; S: supra; IOVm: manual interobserver variability.
Evaluation of intra- and inter-observer variability between the manual and the corrected automated OAR delineations. Intra-observer variability (IOV1, IOV2) is assessed by comparing the manual to the corrected delineations for each 3D OAR for either observer separately (RO1, RO2) using DSC and ASSD. Inter-observer variability (IOVm, IOVc) is assessed by comparing the delineations of RO1 to those of RO2 for each 3D OAR for the manual and corrected delineations separately. All values are reported as mean ± STD for all patients (n = 15). Statistically significant differences (p < 0.05) in inter-observer variability for the corrected versus the manual delineations (IOVc vs IOVm) are indicated in bold.
Abbreviations: OARs: organs at risk; DSC: Dice similarity coefficient; ASSD: average symmetric surface distance; STD: standard deviation; RO: radiation oncologist; PCM: pharyngeal constrictor muscles; PG: parotid gland; SG: submandibular gland; U: upper; S: supra.
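To make the two comparison metrics concrete, the following is a minimal Python sketch of the standard definitions of DSC and ASSD for two 3D delineations. It is an illustration under stated assumptions, not the implementation used in this study: each delineation is represented as a set of `(z, y, x)` voxel indices, the surface is taken to be the voxels with at least one face neighbour outside the structure, and distances are found by brute force. The names `dice`, `surface`, `assd` and `box`, and the `spacing` parameter, are hypothetical.

```python
from itertools import product
from math import sqrt

# Face-neighbour (6-connected) offsets in (z, y, x) order; an assumed
# surface definition, other conventions exist.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """DSC = 2|A n B| / (|A| + |B|) for two sets of voxel indices."""
    if not a and not b:
        return 1.0  # two empty delineations agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels of the mask with at least one face neighbour outside the mask."""
    return {
        v for v in mask
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask
               for dz, dy, dx in NEIGHBOURS)
    }


def _distance(p, q, spacing):
    """Euclidean distance between two voxels, scaled by the voxel spacing (mm)."""
    return sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))


def assd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Average symmetric surface distance between two voxel sets, in mm.

    Each surface voxel of A contributes its distance to the nearest surface
    voxel of B, and vice versa; the result is the mean over both directions.
    """
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("ASSD is undefined for an empty delineation")
    d_ab = [min(_distance(p, q, spacing) for q in sb) for p in sa]
    d_ba = [min(_distance(q, p, spacing) for p in sa) for q in sb]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def box(z0, z1, y0, y1, x0, x1):
    """Hypothetical helper: a filled box of voxels, used only for the toy example."""
    return set(product(range(z0, z1), range(y0, y1), range(x0, x1)))


# Toy example: two 4 x 4 x 4 "delineations", the second shifted by one voxel.
ro1 = box(0, 4, 0, 4, 0, 4)
ro2 = box(0, 4, 0, 4, 1, 5)
print(dice(ro1, ro2))  # 0.75
print(assd(ro1, ro2))
```

In this sketch, an inter-observer comparison would call `dice` and `assd` on the two observers' contours of the same OAR, and an intra-observer comparison on the manual and corrected contours of one observer. The brute-force search is quadratic in the number of surface voxels, so a practical pipeline would more likely rely on array libraries and distance transforms.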
the need for adaptive RT will presumably be higher for proton therapy [33,34].
To conclude, we validated a CNN trained for automated delineation of OARs in HNC patients based on international consensus guidelines in a clinical setting, and showed that automated delineation is not only significantly more efficient than manual delineation, but also reduces interobserver variability. The automated delineations mainly require only minor corrections before they are approved for treatment planning. The CNN has therefore been implemented in clinical practice in our centre and corrections made to the generated delineations can be used to further train the CNN in the future.
Declaration of Competing Interest
Siri Willems is supported by a Ph.D. fellowship of the Research Foundation – Flanders (FWO), mandate 1SA6419N. David Robben is supported by an innovation mandate of Flanders Innovation & Entrepreneurship (VLAIO), mandate HBC.2017.0187. Frederic Maes was supported by Internal Funds KU Leuven under grant number C24/18/047.