Note

This is a Hugging Face dataset. Learn how to load datasets from the Hub in the Hugging Face integration docs.

DeepLesion Benchmark Subset (Balanced 2K)

This dataset is a curated subset of the DeepLesion dataset, prepared for demonstration and benchmarking purposes. It consists of 2,000 CT lesion samples, balanced across 8 coarse lesion types and filtered to include only lesions with a short diameter greater than 10 mm.
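The filter-then-balance selection described above can be sketched in a few lines. This is a minimal illustration, not the script used to build the subset: the field names `lesion_type` and `short_diameter_mm`, the helper `balanced_subset`, and the toy records are all hypothetical. If the balance is exact, 2,000 samples over 8 types works out to 250 per type.

```python
import random
from collections import defaultdict


def balanced_subset(records, per_type, min_short_mm=10.0, seed=0):
    """Keep lesions whose short diameter exceeds min_short_mm, then draw
    up to per_type records at random from each lesion type."""
    by_type = defaultdict(list)
    for rec in records:
        # Hypothetical field names; adapt to the real metadata columns.
        if rec["short_diameter_mm"] > min_short_mm:
            by_type[rec["lesion_type"]].append(rec)

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    subset = []
    for lesion_type in sorted(by_type):
        pool = by_type[lesion_type]
        subset.extend(rng.sample(pool, min(per_type, len(pool))))
    return subset


# Toy records with made-up values, two lesion types only.
toy = [
    {"lesion_type": t, "short_diameter_mm": d}
    for t in ("lung", "liver")
    for d in (4.0, 10.0, 12.5, 18.0, 25.0)
]
picked = balanced_subset(toy, per_type=2)
```

Grouping by type before sampling keeps the per-type cap independent of how skewed the source distribution is; a type with fewer eligible lesions than `per_type` simply contributes all of them.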

Dataset Details

  • Source: DeepLesion

  • Institution: National Institutes of Health (NIH) Clinical Center

  • Subset size: 2,000 images

  • Lesion types: lung, abdomen, mediastinum, liver, pelvis, soft tissue, kidney, bone

  • Selection criteria:

    • Short diameter > 10 mm

    • Balanced sampling across all types

  • Windowing: All slices were windowed using DICOM parameters and converted to 8-bit PNG format

License

This dataset is shared under the CC BY-NC-SA 4.0 License, as specified by the NIH DeepLesion dataset creators.

This dataset is intended only for non-commercial research and educational use.
You must credit the original authors and the NIH Clinical Center when using this data.

Citation

If you use this data, please cite:

@article{yan2018deeplesion,
title={DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning},
author={Yan, Ke and Wang, Xiaosong and Lu, Le and Summers, Ronald M},
journal={Journal of Medical Imaging},
volume={5},
number={3},
pages={036501},
year={2018},
publisher={SPIE}
}

Curation was done with FiftyOne:
@article{moore2020fiftyone,
  title={FiftyOne},
  author={Moore, B. E. and Corso, J. J.},
  journal={GitHub. Note: https://github.com/voxel51/fiftyone},
  year={2020}
}

Intended Uses

  • Embedding demos

  • Lesion similarity and retrieval

  • Benchmarking medical image models

  • Few-shot learning on lesion types

Limitations

  • This is a small subset of the full DeepLesion dataset

  • Not suitable for training full detection models

  • Labels are coarse and may contain inconsistencies

Contact

Created by Paula Ramos for demo purposes using FiftyOne and the DeepLesion public metadata.