HECD: Pipeline for H and E image preprocessing and cell detection
Introduction
- To improve cell detection accuracy, the H&E images are preprocessed first. Stain normalization is applied to each H&E image, then a blue-ratio transformation is computed to make the nuclei stand out. Finally, Otsu thresholding is used to remove the background.
- The key to cell detection is nucleus detection. For normal cells (e.g., lymphocytes), nucleus size is roughly constant from cell to cell. Nuclei can therefore be detected with a sliding-window (moving block) approach that extracts the training/testing samples and their labels: a bounding box of fixed size scans the whole image, and each resulting patch is labeled positive if a nucleus is at its center and negative otherwise.
- The resulting data are usually imbalanced, with fewer positive patches than negative ones. To address this, data augmentation (e.g., flipping, rotation, and added noise) is used to increase the number of positive samples.
- Four neural networks are considered: a deep CNN (VGG16), ResNet, CapsNet, and a convolutional autoencoder.
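The blue-ratio transform and Otsu thresholding described above can be sketched in NumPy. This is a minimal example under stated assumptions, not the repository's implementation: it takes an 8-bit RGB image (already stain-normalized), uses the widely cited blue-ratio formula BR = 100*B/(1+R+G) * 256/(1+R+G+B), and applies a hand-written Otsu threshold to the blue-ratio values. The names blue_ratio, otsu_threshold and preprocess are hypothetical.

```python
import numpy as np


def blue_ratio(rgb):
    """Blue-ratio transform: large where a pixel is strongly blue (hematoxylin-stained nuclei)."""
    rgb = rgb.astype(np.float64)  # avoid uint8 overflow
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))


def otsu_threshold(values, nbins=256):
    """Return the threshold that maximizes the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)               # pixel count in class 0 (bins 0..i)
    w1 = w0[-1] - w0                   # pixel count in class 1 (bins i+1..end)
    sum0 = np.cumsum(hist * centers)   # intensity sum in class 0
    valid = (w0 > 0) & (w1 > 0)
    mu0 = np.where(valid, sum0 / np.maximum(w0, 1), 0.0)
    mu1 = np.where(valid, (sum0[-1] - sum0) / np.maximum(w1, 1), 0.0)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, -1.0)
    i = int(np.argmax(between))
    return edges[i + 1]                # class 1 = values >= upper edge of bin i


def preprocess(rgb):
    """Blue-ratio image with background (below the Otsu threshold) set to zero.

    Zeroing the background is an assumed interpretation of "removing the
    background information"; the repository may handle it differently.
    """
    br = blue_ratio(rgb)
    mask = br >= otsu_threshold(br)
    return np.where(mask, br, 0.0), mask
```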
Pipeline
All the code is in the folder named code.
1. Stain normalization
Go to the folder stain_normalisation_toolbox and run the MATLAB function run-all.m. Before running it, change the data folder path in the function to match your setup.
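Which normalization method run-all.m applies is not stated here, but the basic idea of stain normalization can be illustrated with the simplest colour-statistics approach: shift and scale each channel of an image so that its mean and standard deviation match those of a reference (target) image. The sketch below is a simplified, Reinhard-style normalization; the original Reinhard method performs this matching in the lαβ colour space, whereas this version works directly in RGB for brevity. match_channel_stats is a hypothetical name, and this is not the toolbox's code.

```python
import numpy as np


def match_channel_stats(source, target):
    """Match per-channel mean/std of `source` to those of `target` (simplified, in RGB).

    Both inputs are assumed to be 8-bit images of shape (H, W, C).
    """
    src = source.astype(np.float64)
    tgt = target.astype(np.float64)
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        s_mean, s_std = src[..., c].mean(), src[..., c].std()
        t_mean, t_std = tgt[..., c].mean(), tgt[..., c].std()
        scale = t_std / s_std if s_std > 0 else 1.0  # flat channel: only shift the mean
        out[..., c] = (src[..., c] - s_mean) * scale + t_mean
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```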
2. Deep CNN
Go to the folder nuclei_norm.
First, run the Python script data_norm.py, which preprocesses the normalized images to extract the patch images and their labels, applying data augmentation if necessary. Change the data folder path in the script to match your setup.
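The patch extraction and labeling done at this step can be sketched as a sliding window over the image. This is a minimal illustration under assumptions that are not in the original: nuclei are annotated as (row, col) center coordinates, a patch counts as positive when an annotated center lies within tol pixels (Chebyshev distance) of the patch center, and patch_size, stride and tol are placeholder values rather than the repository's settings. extract_patches is a hypothetical name.

```python
import numpy as np


def extract_patches(image, nucleus_centers, patch_size=32, stride=8, tol=3):
    """Slide a fixed-size window over `image` and label each patch.

    Label 1 if an annotated nucleus center lies within `tol` pixels of the
    patch center (in both row and column), else 0.
    """
    h, w = image.shape[:2]
    centers = np.asarray(nucleus_centers, dtype=np.float64).reshape(-1, 2)  # (row, col)
    half = (patch_size - 1) / 2.0  # offset from top-left pixel to patch center
    patches, labels = [], []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
            dist = np.max(np.abs(centers - (top + half, left + half)), axis=1)
            labels.append(int(np.any(dist <= tol)))  # False when there are no nuclei
    return np.stack(patches), np.array(labels)
```

Because nucleus size is roughly constant, a single patch_size can be used for the whole image; a smaller stride yields more, overlapping patches at the cost of more computation.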
Second, run the Python script train_norm.py, which trains the model and evaluates its performance on the test dataset.
Shell scripts for running on a server are available in the same folder.
3. ResNet
Go to the folder nuclei_resnet.
First, run the Python script data_norm.py, which preprocesses the normalized images to extract the patch images and their labels, applying data augmentation if necessary. Change the data folder path in the script to match your setup.
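The augmentation applied when positive patches are scarce (flipping, rotation and added noise, as listed in the introduction) can be sketched as oversampling of the positive class. This is an illustrative assumption, not the repository's code: augment_positives is a hypothetical helper, noise_std is a placeholder value, square patches are assumed so that 90-degree rotations keep the shape, and positives are added until both classes are the same size.

```python
import numpy as np


def augment_positives(patches, labels, rng=None, noise_std=5.0):
    """Oversample positive patches (label 1) with random flips, 90-degree
    rotations and Gaussian noise until they match the number of negatives."""
    rng = np.random.default_rng(rng)
    pos = patches[labels == 1]
    n_needed = int(np.sum(labels == 0)) - len(pos)
    if len(pos) == 0 or n_needed <= 0:
        return patches, labels  # nothing to augment, or already balanced
    new = []
    for _ in range(n_needed):
        p = pos[rng.integers(len(pos))].astype(np.float64)
        if rng.random() < 0.5:
            p = np.flip(p, axis=1)                               # horizontal flip
        p = np.rot90(p, k=int(rng.integers(4)), axes=(0, 1))     # 0/90/180/270 degrees
        p = p + rng.normal(0.0, noise_std, size=p.shape)         # additive noise
        new.append(np.clip(p, 0, 255).astype(patches.dtype))
    aug_patches = np.concatenate([patches, np.stack(new)])
    aug_labels = np.concatenate([labels, np.ones(n_needed, dtype=labels.dtype)])
    return aug_patches, aug_labels
```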
Second, run the Python script train_resnet.py, which trains the ResNet and evaluates its performance on the test dataset.
Shell scripts for running on a server are available in the same folder.
4. CapsNet
Go to the folder CapsNet-Keras.
First, run the Python script data.py, which preprocesses the normalized images to extract the patch images and their labels, applying data augmentation if necessary. Change the data folder path in the script to match your setup.
Second, run the Python script capsulenet.py, which trains the CapsNet and evaluates its performance on the test dataset.
Shell scripts for running on a server are available in the same folder.
5. Convolutional autoencoder (CAE)
Go to the folder crcnucleus.
Run the Python script test_xcae_crcnucleus.py, which trains the CAE and evaluates its performance on the test dataset. Change the data folder path in the script to match your setup.
Keywords: H&E; CNN.