232 - Semantic Segmentation of BraTS2020 - Part 1 - Getting the data ready

Published on Aug 16, 2024. This response is partially generated with the help of AI. It may contain inaccuracies.


Introduction

This tutorial guides you through preparing the BraTS2020 dataset for semantic segmentation. Semantic segmentation assigns a class label to every pixel (or voxel, in 3D). In medical imaging this makes it possible to identify and delineate different regions within a scan, such as the sub-regions of a brain tumor. By the end of this tutorial, the data will be ready for training a brain tumor segmentation model.

Step 1: Download and Unzip the Dataset

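  1. Download the Dataset

    • Download the BraTS2020 dataset. For each case it provides four MRI volumes (T1, T1CE, T2, and FLAIR) and a segmentation mask.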

  2. Unzip the Dataset

    • Extract the contents of the downloaded ZIP file to your desired local directory.

Step 2: Rename Segmented Files

  1. Locate the Segmented File

    • Within the unzipped training data, open the folder for case 355 (BraTS20_Training_355). Its segmentation (mask) file does not follow the naming convention used by the other cases.
  2. Rename the File

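    • Rename this segmentation file to BraTS20_Training_355_seg, keeping its NIfTI extension. It then matches the <case>_seg naming used by every other case, and the same loading code works for all cases.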
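
The rename can be scripted in Python. This is a minimal sketch, and it rests on two assumptions. First, the mask is the only NIfTI file in the case folder whose name contains "seg" (case-insensitive). Second, the folder name is the case ID. The helper names `nifti_ext` and `normalize_seg_filename` are hypothetical.

```python
from pathlib import Path

# Check the longer extension first, so that "x.nii.gz" is not mistaken for ".nii".
NIFTI_EXTS = (".nii.gz", ".nii")


def nifti_ext(name):
    """Return the NIfTI extension of a file name, or None if it is not NIfTI."""
    for ext in NIFTI_EXTS:
        if name.endswith(ext):
            return ext
    return None


def normalize_seg_filename(case_dir):
    """Rename the case's segmentation file to <case>_seg<ext> and return its path.

    Assumes (hypothetically) that the folder name is the case ID, e.g.
    BraTS20_Training_355, and that exactly one NIfTI file in the folder has
    "seg" in its name (case-insensitive).
    """
    case_dir = Path(case_dir)
    candidates = [
        p for p in sorted(case_dir.iterdir())
        if p.is_file() and nifti_ext(p.name) and "seg" in p.name.lower()
    ]
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected one segmentation file in {case_dir}, found {len(candidates)}"
        )
    src = candidates[0]
    dst = case_dir / f"{case_dir.name}_seg{nifti_ext(src.name)}"
    if src != dst:  # a file that already conforms is left untouched
        src.rename(dst)
    return dst
```

For example, `normalize_seg_filename("path/to/BraTS20_Training_355")` returns the new path; the folder path is a placeholder. Running it on a folder that already follows the convention changes nothing. It raises an error for folders that have no mask.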

Step 3: Install Necessary Libraries

  1. Install nibabel
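    • To read NIfTI files (.nii or .nii.gz), install the nibabel library. You can do this using pip:
      pip install nibabel
    • The later steps also use numpy and scikit-learn (for MinMaxScaler). Install them the same way:
      pip install numpy scikit-learn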
      

Step 4: Scale the Volumes

  1. Use MinMaxScaler
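    • Scale each image volume with MinMaxScaler from sklearn.preprocessing so that its intensity values are normalized to the range [0, 1]. MinMaxScaler expects 2D (samples x features) input. Reshape each 3D volume to a single column, scale it, and reshape it back. Scale only the image modalities; the segmentation mask must keep its integer labels.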
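
The reshape-scale-reshape pattern looks like this. The sketch fits a fresh scaler for each volume, so every volume is scaled by its own minimum and maximum; that choice is an assumption. `scale_volume` is a hypothetical helper name.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_volume(volume):
    """Min-max scale a 3D image volume to [0, 1] using its own min and max."""
    scaler = MinMaxScaler()  # default feature_range is (0, 1)
    # MinMaxScaler expects 2D (n_samples, n_features) input: treat every voxel
    # as one sample of a single feature, then restore the original shape.
    flat = volume.reshape(-1, 1)
    scaled = scaler.fit_transform(flat)
    return scaled.reshape(volume.shape)
```

Apply it to the FLAIR, T1CE, and T2 volumes, but not to the mask.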

Step 5: Combine Non-Native Volumes

  1. Combine the Volumes
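    • Merge the three non-native volumes (T2, T1CE, and FLAIR) into a single multi-channel volume. The native (pre-contrast) T1 volume is not used. All three volumes must have the same shape (240x240x155 in BraTS2020). This can be done using numpy:
      import numpy as np
      # T2, T1CE, FLAIR: scaled 3D arrays of identical shape
      combined_volume = np.stack((T2, T1CE, FLAIR), axis=-1)  # shape (240, 240, 155, 3)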
      

Step 6: Reassign Pixel Values

  1. Reassign Pixels
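    • In the segmentation mask, change voxel values from 4 to 3. BraTS uses these labels:
      - 0: background
      - 1: necrotic and non-enhancing tumor core
      - 2: peritumoral edema
      - 4: GD-enhancing tumor
    • Label 3 is unused, so remapping 4 to 3 gives four consecutive classes, 0 to 3. Apply this to the mask, not to the combined image volume:
      mask = mask.astype(np.uint8)
      mask[mask == 4] = 3  # labels are now 0, 1, 2, 3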
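
nibabel's `get_fdata()` returns floating-point values. It can therefore help to round and cast the mask before remapping, then confirm that only the expected labels remain. Here is a sketch with a hypothetical `prepare_mask` helper:

```python
import numpy as np


def prepare_mask(mask):
    """Return a uint8 copy of a BraTS mask with label 4 remapped to 3."""
    # Masks read as floats are rounded before casting; astype makes a copy.
    mask = np.rint(mask).astype(np.uint8)
    mask[mask == 4] = 3
    # After remapping, only labels 0, 1, 2 and 3 should remain.
    unexpected = set(np.unique(mask).tolist()) - {0, 1, 2, 3}
    if unexpected:
        raise ValueError(f"unexpected labels in mask: {sorted(unexpected)}")
    return mask
```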
      

Step 7: Crop Volumes

  1. Crop to Region of Interest
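    • Much of each volume is blank background. Crop every volume, images and masks alike, from 240x240x155 down to 128x128x128. Use the same index ranges for every volume so that images stay aligned with their masks. This can be done with numpy slicing.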
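
A centered crop is one simple way to pick the index ranges; it is an assumption here, not a prescribed choice. The sketch crops the three spatial axes around the volume center and keeps any trailing channel axis. For a 240x240x155 input it selects indices 56:184, 56:184, and 13:141. `center_crop` is a hypothetical helper.

```python
import numpy as np


def center_crop(volume, size=(128, 128, 128)):
    """Crop the first three (spatial) axes of `volume` to `size`, centered.

    Trailing axes, such as the channel axis of a stacked image, are kept.
    """
    slices = []
    for dim, target in zip(volume.shape[:3], size):
        if target > dim:
            raise ValueError(f"cannot crop an axis of length {dim} to {target}")
        start = (dim - target) // 2
        slices.append(slice(start, start + target))
    return volume[tuple(slices)]
```

Because the offsets depend only on the shape, images and masks of the same case receive identical crops.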

Step 8: Filter Volumes

  1. Drop Insufficiently Annotated Volumes
    • For each mask, compute the fraction of voxels that carry a non-zero (tumor) label. Drop volumes where this fraction falls below a chosen threshold, so that training time is not spent on volumes with little or no annotated tumor.
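
The check can be written as follows. The 1% threshold (`min_fraction=0.01`) is an illustrative assumption; choose a value that suits your data. The helper names are hypothetical.

```python
import numpy as np


def annotated_fraction(mask):
    """Fraction of voxels carrying a non-zero (tumor) label."""
    return np.count_nonzero(mask) / mask.size


def is_useful(mask, min_fraction=0.01):
    """Keep a volume only if at least `min_fraction` of its voxels are labeled.

    The 1% default is an illustrative assumption, not a prescribed value.
    """
    return annotated_fraction(mask) >= min_fraction
```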

Step 9: Save Volumes as Numpy Arrays

  1. Save Useful Volumes
    • Save each retained image volume and its corresponding mask to your local drive in .npy format using:
      np.save('path/to/save/volume.npy', combined_volume)
      np.save('path/to/save/mask.npy', mask)
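
To keep each image matched with its mask, one option is to save both under the same case ID in parallel folders. This is a sketch; the `images/` and `masks/` layout, the file names, the dtypes, and the `save_pair` name are all hypothetical choices.

```python
import os

import numpy as np


def save_pair(image, mask, out_dir, case_id):
    """Save an image/mask pair as .npy files that share the same case ID."""
    img_dir = os.path.join(out_dir, "images")
    mask_dir = os.path.join(out_dir, "masks")
    os.makedirs(img_dir, exist_ok=True)
    os.makedirs(mask_dir, exist_ok=True)
    img_path = os.path.join(img_dir, f"image_{case_id}.npy")
    mask_path = os.path.join(mask_dir, f"mask_{case_id}.npy")
    np.save(img_path, image.astype(np.float32))  # scaled intensities
    np.save(mask_path, mask.astype(np.uint8))    # integer labels 0 to 3
    return img_path, mask_path
```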
      

Step 10: Split Data into Train and Validation Sets

  1. Create Train and Validation Datasets
    • Split the image and mask volumes into training and validation datasets, keeping each image together with its mask, to prepare for model training.
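
Splitting by case ID, rather than by individual file, keeps every image with its mask. Here is a sketch that uses only the standard library; the 75/25 ratio, the seed, and the `split_ids` name are assumptions.

```python
import random


def split_ids(case_ids, val_fraction=0.25, seed=42):
    """Randomly split case IDs into (train_ids, val_ids).

    Sorting first makes the result independent of the input order, and the
    fixed seed makes the split reproducible.
    """
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]
```

Use the returned IDs to copy each case's image and mask files into separate train and validation folders.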

Conclusion

You have successfully prepared the BraTS2020 dataset for semantic segmentation. The steps covered:
  • downloading and organizing the data;
  • scaling, combining, relabeling, cropping, and filtering the volumes;
  • saving them as .npy files split into training and validation sets.
In the next part of this series, you will learn how to define a custom data generator for your model.