In this paper, we propose a method that extracts richer multilevel features and integrates the advantages of the CNN and the recurrent neural network (RNN), so that both the short-term and long-term spatial correlations between patches are preserved. We first split the high-resolution pathology images into small patches. Then, the CNN extracts richer multilevel features from each patch. Finally, the RNN fuses the patch features to produce the final image classification. For the 4-class classification task, we obtained an average accuracy of 91.3%, which outperforms the state-of-the-art method.
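The patch-then-fuse pipeline described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the image size, patch size and row-by-row scan order are arbitrary, `patch_features` is a hypothetical stand-in for the CNN feature extractor (simple per-channel statistics), and the fusion step is a plain Elman RNN with random, untrained weights rather than the recurrent architecture used in the paper.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an H x W x C image into non-overlapping square patches, row by row.

    Border pixels that do not fill a whole patch are dropped (an assumption;
    padding or overlapping strides are equally valid choices).
    """
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches


def patch_features(patch):
    """Hypothetical stand-in for the CNN: per-channel mean and std, scaled to [0, 1]."""
    flat = patch.reshape(-1, patch.shape[-1]).astype(np.float64) / 255.0
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])


def rnn_fuse(features, W_x, W_h, b_h, W_out, b_out):
    """Fuse a sequence of patch features with an Elman RNN; classify from the last state."""
    h = np.zeros(W_h.shape[0])
    for x in features:
        # The hidden state carries information from earlier patches to later ones.
        h = np.tanh(W_x @ x + W_h @ h + b_h)
    logits = W_out @ h + b_out
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 768, 3), dtype=np.uint8)  # toy stand-in for a slide image
patches = split_into_patches(image, patch_size=256)                # 2 rows x 3 columns = 6 patches
feats = [patch_features(p) for p in patches]

feat_dim, hidden_dim, n_classes = 6, 16, 4  # normal, benign, in situ, invasive
params = (
    rng.normal(scale=0.1, size=(hidden_dim, feat_dim)),    # W_x
    rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # W_h
    np.zeros(hidden_dim),                                  # b_h
    rng.normal(scale=0.1, size=(n_classes, hidden_dim)),   # W_out
    np.zeros(n_classes),                                   # b_out
)
probs = rnn_fuse(feats, *params)
print(len(patches), probs.round(3))
```

Because the RNN reads the patches in a fixed order, its final state depends on the whole patch sequence rather than on any single patch; in a real system the weights would be trained and the hand-made features replaced by CNN activations.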
Additionally, in cooperation with Peking University International Hospital, we released a dataset of 3771 breast cancer histopathological images, which increases the volume of currently available data by an order of magnitude. Experimental results show that, compared with the results obtained on the Bioimaging2015 dataset using the same method, the average sensitivity for normal, benign, in situ carcinoma and invasive carcinoma improved by 2.9, 16.4, 7.8 and 2.3 percentage points, respectively. It is especially worth emphasizing that the classification sensitivity for benign images improved significantly, from 68.7% to 85.1%, owing to our dataset covering as many different subsets spanning different age groups as possible to ensure sufficient data diversity. This increase indicates that both a high-performance deep learning algorithm and a sufficiently large and diverse dataset are essential for improving histopathological image classification.
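Per-class sensitivity, the metric behind these comparisons, is the recall of each class: the number of correctly predicted samples of class k divided by the number of true samples of class k. The sketch below computes it from a confusion matrix. The labels are made-up toy data, not results from the paper, and the last line only checks that the reported benign gain of 16.4 is the difference between 85.1% and 68.7% in percentage points.

```python
import numpy as np

CLASSES = ["normal", "benign", "in situ", "invasive"]


def per_class_sensitivity(y_true, y_pred, n_classes):
    """Sensitivity (recall) of class k = TP_k / (number of true class-k samples)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true class, columns: predicted class
    support = cm.sum(axis=1)
    # Classes with no true samples get NaN instead of a division-by-zero warning.
    return np.divide(cm.diagonal(), support,
                     out=np.full(n_classes, np.nan), where=support > 0)


# Toy labels for illustration only (not data from the paper).
y_true = [0, 0, 1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 0, 2, 2, 3, 2]
sens = per_class_sensitivity(y_true, y_pred, len(CLASSES))
for name, s in zip(CLASSES, sens):
    print(f"{name}: {s:.2f}")

# The reported benign improvement is a difference in percentage points.
print(round(85.1 - 68.7, 1))  # 16.4
```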
2. Related work
Although many studies have been conducted and a series of important advances have been made in the automatic classification of breast cancer histopathological images, the characteristics of histopathological images, such as inconsistent tissue and cell morphology, overlapping cells, variability in the appearance of stained histological sections and uneven color distribution, make image classification considerably difficult. These problems pose serious challenges for the automatic and precise classification of breast cancer pathological images. It should also be noted that the resolution of pathological images is very high, which prevents some methods that are successful on natural images from being transferred directly to pathological images.
Methods xxx (xxxx) xxx–xxx
Early research on breast cancer pathological image classification mainly focused on the 2-class classification of cancer versus noncancer [9–14] or the more complex 3-class classification of normal tissue, in situ carcinoma and invasive carcinoma [15,16]. Most of these works applied traditional machine learning methods to textural, morphological and architectural features computed from the entire image or from extracted nuclei. Notably, most of these approaches were applied to low-resolution images at different magnifications. In addition, they relied on manual (hand-crafted) feature extraction, which not only requires considerable effort and professional domain knowledge but also struggles to capture discriminative, high-quality features. These limitations seriously restrict the application of traditional machine learning methods to the classification of breast cancer histopathological images.
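As an illustration of the kind of hand-crafted textural feature these early methods relied on, the sketch below computes two classic gray-level co-occurrence matrix (GLCM) statistics, contrast and homogeneity, in NumPy. It is not the feature set of any cited work: the single horizontal offset and the 8-level quantization are assumptions chosen for brevity, and each such choice has to be made by hand, which is the burden described above.

```python
import numpy as np


def glcm_features(gray, levels=8):
    """Contrast and homogeneity from a symmetric, normalized GLCM.

    Uses one horizontal offset (dx=1, dy=0) and quantizes 8-bit intensities
    to `levels` gray levels; both are illustrative assumptions.
    """
    q = (gray.astype(np.int64) * levels) // 256       # quantize 0..255 -> 0..levels-1
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()  # horizontally adjacent pixel pairs
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)                # count co-occurrences
    glcm = glcm + glcm.T                               # make the matrix symmetric
    glcm /= glcm.sum()                                 # normalize to joint probabilities
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    return float(contrast), float(homogeneity)


flat = np.full((32, 32), 128, dtype=np.uint8)                     # uniform region
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (32, 16))  # alternating dark/bright columns
print(glcm_features(flat))     # (0.0, 1.0)
print(glcm_features(stripes))  # (49.0, 0.125)
```

A uniform region has zero contrast and maximal homogeneity, while sharp alternation between extreme gray levels drives contrast up and homogeneity down.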
Later, deep learning methods achieved remarkable results in a wide array of computer vision tasks. The most important deep learning architectures are the CNN and the RNN. CNNs have been widely used in the classification of pathological images. Spanhol et al. released a breast cancer pathological image dataset named BreaKHis. On this dataset, they used the AlexNet network with different integration strategies for classification, achieving a classification accuracy 6% higher than that of traditional machine learning methods. Bayramoglu et al. also applied a magnification-independent deep learning method to the BreaKHis dataset, achieving a classification accuracy of approximately 83%. Araújo et al. were the first to consider 4-class classification of breast cancer pathological images. They extracted features with a CNN similar to AlexNet and then classified the extracted features with an SVM. In contrast, RNNs are rarely used in pathological image classification tasks. Unlike a CNN, an RNN maintains an internal state while processing a sequence of inputs, which gives it long-distance memory.
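The two-stage design attributed to Araújo et al. above, CNN features followed by an SVM, separates representation from classification. The sketch below illustrates only the second stage, under clear assumptions: the "CNN features" are synthetic Gaussian vectors, the classifier is a binary linear SVM trained with the Pegasos stochastic subgradient method, and the kernel, regularization and multiclass scheme of the original work are not reproduced (a 4-class task could, for example, train one such classifier per class in a one-vs-rest fashion).

```python
import numpy as np


def add_bias(X):
    """Append a constant-1 column so the bias is learned as part of w."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic subgradient descent on the primal linear-SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)), with labels y in {-1, +1}.
    For simplicity the bias weight is regularized along with the rest of w.
    """
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y[i] * (Xb[i] @ w)    # evaluated before the update
            w *= 1.0 - eta * lam           # subgradient step for the L2 term
            if margin < 1.0:               # hinge loss is active for this sample
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    return np.where(add_bias(X) @ w >= 0.0, 1, -1)


# Synthetic, linearly separable stand-ins for CNN feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 8)), rng.normal(-2.0, 0.5, size=(50, 8))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w = train_linear_svm(X, y)
print("training accuracy:", np.mean(predict(w, X) == y))
```

Keeping the feature extractor fixed and training only a lightweight classifier on top is what makes this design cheap to train, at the cost of features that are not tuned for the final task.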
Recently, several excellent CNN-based methods for the automatic and precise classification of breast cancer pathological images were developed for the ICIAR2018 challenge. These methods have significantly advanced the state of the art and share the same core idea. The high-resolution histopathological images are first preprocessed and augmented and then divided into equal-sized patches; a CNN then either classifies each patch or extracts its features. Finally, an image-wise classification is made by voting on the patch-wise classification results or by fusing the extracted features. Vesal et al.