232 - Semantic Segmentation of BraTS2020 - Part 1 - Getting the data ready
Introduction
This tutorial guides you through the process of preparing data for semantic segmentation using the BraTS2020 dataset. Semantic segmentation is crucial in medical imaging for identifying and delineating different regions within scans. By the end of this tutorial, you will have the necessary data ready for training a model, specifically for brain tumor segmentation.
Step 1: Download and Unzip the Dataset
- Download the Dataset
- Visit Kaggle's BraTS20 Dataset page.
- Download the dataset ZIP file.
- Unzip the Dataset
- Extract the contents of the downloaded ZIP file to your desired local directory.
Step 2: Rename Segmented Files
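- Locate the Segmented File
- Within the unzipped training data, open the case folder numbered 355 and find its segmentation file, whose name does not follow the pattern used by the other cases.
- Rename the File
- Rename this segmentation file to match the naming convention of the other cases, so that every mask can be found with the same file name pattern.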
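The rename can also be scripted. The following is a minimal sketch, not the tutorial's own code. It assumes that every file in a case folder is prefixed with the folder name and that the segmentation file should end in `_seg`. The folder name `BraTS20_Training_355`, the placeholder file names, and both helper functions are illustrative assumptions. The demo runs against a temporary directory, so the real dataset is not touched.

```python
import tempfile
from pathlib import Path


def find_misnamed_files(case_dir: Path) -> list:
    """Return NIfTI files whose names do not start with the case folder name."""
    prefix = case_dir.name + "_"
    return sorted(
        p for p in case_dir.iterdir()
        if p.is_file()
        and p.name.endswith((".nii", ".nii.gz"))
        and not p.name.startswith(prefix)
    )


def rename_segmentation(case_dir: Path) -> Path:
    """Rename the single misnamed file to '<case folder name>_seg.<ext>'."""
    misnamed = find_misnamed_files(case_dir)
    if len(misnamed) != 1:
        raise RuntimeError(f"expected exactly one misnamed file, found {len(misnamed)}")
    old_path = misnamed[0]
    suffix = ".nii.gz" if old_path.name.endswith(".nii.gz") else ".nii"
    new_path = old_path.with_name(f"{case_dir.name}_seg{suffix}")
    old_path.rename(new_path)
    return new_path


# Demo on a throwaway folder that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    case_dir = Path(tmp) / "BraTS20_Training_355"  # assumed folder name
    case_dir.mkdir()
    (case_dir / "BraTS20_Training_355_flair.nii").touch()
    (case_dir / "oddly_named_segmentation.nii").touch()  # placeholder name
    renamed_name = rename_segmentation(case_dir).name
    remaining = sorted(p.name for p in case_dir.iterdir())
```

To use it on the real data, point `case_dir` at the actual folder and check the output of `find_misnamed_files` before renaming anything.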
Step 3: Install Necessary Libraries
- Install nibabel
- To handle NIfTI (.nii or .nii.gz) files, install the nibabel library. You can do this using pip:
pip install nibabel
Step 4: Scale the Volumes
- Use MinMaxScaler
- Scale all volumes using MinMaxScaler from scikit-learn's sklearn.preprocessing module so that the intensities of every volume fall in the range 0 to 1 (the MinMaxScaler default).
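One possible way to apply MinMaxScaler to a 3D volume is sketched below. MinMaxScaler scales each column of a 2D array independently, so this sketch flattens the volume into a single column and scales it with one global minimum and maximum. That choice is an assumption and may differ from how the original tutorial reshapes the data. The helper `scale_volume` and the random demo volume are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_volume(volume: np.ndarray) -> np.ndarray:
    """Rescale one volume to [0, 1] using a single global min and max."""
    scaler = MinMaxScaler()
    # Flatten to shape (n_voxels, 1) so the whole volume is one feature column.
    flat = volume.reshape(-1, 1).astype(np.float32)
    return scaler.fit_transform(flat).reshape(volume.shape)


# Synthetic stand-in for a loaded MRI volume.
rng = np.random.default_rng(0)
demo = rng.integers(0, 2000, size=(8, 8, 4)).astype(np.float32)
scaled = scale_volume(demo)
print(scaled.min(), scaled.max())
```

Each modality is scaled separately here, so call `scale_volume` once per modality before combining them.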
Step 5: Combine Non-Native Volumes
- Combine the Volumes
- Merge the three non-native volumes (T2, T1CE, and FLAIR) into a single multi-channel volume. This can be done using numpy:
import numpy as np
combined_volume = np.stack((T2, T1CE, FLAIR), axis=-1)
Step 6: Reassign Pixel Values
- Reassign Pixels
- Change mask values from 4 to 3 so that the class labels are contiguous (0 to 3); label 3 is not used in the original annotations. Apply this to the segmentation mask rather than to the combined image volume. With the loaded segmentation volume stored in mask:
mask[mask == 4] = 3
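A quick way to confirm the relabeling is to compare the unique label values before and after. The sketch below uses a small synthetic array in place of a real segmentation mask; the variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for a loaded segmentation mask with labels 0, 1, 2 and 4.
mask = np.array([[0, 1, 2], [4, 4, 0]], dtype=np.uint8)

labels_before = np.unique(mask).tolist()
mask[mask == 4] = 3  # relabel in place so the classes run from 0 to 3
labels_after = np.unique(mask).tolist()

print(labels_before, labels_after)
```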
Step 7: Crop Volumes
- Crop to Region of Interest
- Remove the mostly empty border around the brain by cropping every volume from its original 240x240x155 voxels to 128x128x128, using NumPy array slicing. Apply the same crop to the images and the masks so that they stay aligned.
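The crop can be sketched as a slice over the three spatial axes. The version below takes a centered crop, which is an assumption: the tutorial only gives the target size, not the offsets. The helper `center_crop` and the zero-filled demo arrays are illustrative.

```python
import numpy as np


def center_crop(volume: np.ndarray, target=(128, 128, 128)) -> np.ndarray:
    """Crop the first three axes of `volume` around their center.

    Any trailing axis, such as a channel axis, is left untouched.
    """
    starts = [(dim - size) // 2 for dim, size in zip(volume.shape[:3], target)]
    if min(starts) < 0:
        raise ValueError("target size is larger than the volume")
    slices = tuple(slice(start, start + size) for start, size in zip(starts, target))
    return volume[slices]


# Zero-filled stand-ins with BraTS-sized spatial dimensions.
image = np.zeros((240, 240, 155, 3), dtype=np.float32)  # three stacked modalities
mask = np.zeros((240, 240, 155), dtype=np.uint8)

cropped_image = center_crop(image)
cropped_mask = center_crop(mask)
print(cropped_image.shape, cropped_mask.shape)
```

Using one function for both arrays guarantees that the image and its mask are cut at identical offsets.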
Step 8: Filter Volumes
- Drop Insufficiently Annotated Volumes
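- For each cropped mask, compute the fraction of voxels that carry a non-zero label, and drop any volume where this fraction is below a chosen threshold, so that training time is not spent on volumes that are almost entirely background.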
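The filtering rule can be sketched as follows. The 1% threshold is an illustrative assumption, not a value given in this text, and the helper names `labeled_fraction` and `is_useful` are hypothetical.

```python
import numpy as np


def labeled_fraction(mask: np.ndarray) -> float:
    """Fraction of voxels in `mask` with a non-zero (non-background) label."""
    return np.count_nonzero(mask) / mask.size


def is_useful(mask: np.ndarray, min_fraction: float = 0.01) -> bool:
    """Keep a volume only if more than `min_fraction` of its voxels are labeled."""
    return labeled_fraction(mask) > min_fraction


# Two synthetic masks: one nearly empty, one with a sizeable labeled block.
mostly_empty = np.zeros((128, 128, 128), dtype=np.uint8)
mostly_empty[:2, :2, :2] = 1  # 8 labeled voxels

with_tumor = np.zeros((128, 128, 128), dtype=np.uint8)
with_tumor[:32, :32, :32] = 2  # 32768 labeled voxels

print(is_useful(mostly_empty), is_useful(with_tumor))
```

The threshold is a trade-off: a higher value discards more background-heavy volumes but also shrinks the training set.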
Step 9: Save Volumes as Numpy Arrays
- Save Useful Volumes
- Save all useful volumes to your local drive in the .npy format using:
np.save('path/to/save/volume.npy', combined_volume)
Step 10: Split Data into Train and Validation Sets
- Create Train and Validation Datasets
- Split the saved image and mask volumes into training and validation datasets, keeping each image together with its mask, to prepare for model training.
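One simple way to keep images and masks paired is to split by case identifier and then pick both files for each identifier. The sketch below uses only the standard library. The 75/25 ratio, the fixed seed, the helper `split_cases`, and the made-up identifiers are all assumptions for illustration, not the tutorial's own method.

```python
import random


def split_cases(case_ids, val_fraction=0.25, seed=42):
    """Shuffle case IDs reproducibly and split them into (train, validation)."""
    ids = sorted(case_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    n_val = int(round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


# Made-up identifiers standing in for the saved volumes.
case_ids = [f"{i:03d}" for i in range(1, 21)]
train_ids, val_ids = split_cases(case_ids)
print(len(train_ids), len(val_ids))
```

Each identifier can then be mapped to its image file and mask file, so that a case never ends up with its image in one set and its mask in the other.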
Conclusion
You have successfully prepared the BraTS2020 dataset for semantic segmentation. The steps outlined include downloading and organizing the data, processing it for compatibility, and saving it in a usable format for training. In the next part of this series, you will learn how to define a custom data generator for your model.