VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography.
Hieu T Nguyen, Ha Q Nguyen, Huy Hieu Pham, Khanh Lam, Linh T Le, Minh Dao, Van Vu
Published in: Scientific Data (2023)
Mammography, or breast X-ray imaging, is the most widely used imaging modality for detecting cancer and other breast diseases. Recent studies have shown that deep learning-based computer-assisted detection and diagnosis (CADe/x) tools can support physicians and improve the accuracy of mammography interpretation. A number of large-scale mammography datasets from different populations, with various associated annotations and clinical data, have been introduced to study the potential of learning-based methods in breast radiology. With the aim of developing more robust and more interpretable support systems in breast imaging, we introduce VinDr-Mammo, a Vietnamese dataset of digital mammography with breast-level assessment and extensive lesion-level annotations, which enhances the diversity of publicly available mammography data. The dataset consists of 5,000 mammography exams, each of which has four standard views and is double read, with any disagreement resolved by arbitration. The dataset is intended for assessing Breast Imaging Reporting and Data System (BI-RADS) categories and breast density at the individual breast level. In addition, it provides the category, location, and BI-RADS assessment of non-benign findings. We make VinDr-Mammo publicly available as a new imaging resource to promote advances in developing CADe/x tools for mammography interpretation.
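The dataset structure described above (5,000 exams, four standard views each, double reading with arbitration) can be illustrated with a minimal Python sketch. The view labels, function names, and the arbitrate callable are hypothetical and not taken from the dataset's files or tooling; the abstract does not state how arbitration is carried out, so it is modeled here as an arbitrary function standing in for the arbitrating reader.

```python
from typing import Callable

# Assumed labels for the four standard views: craniocaudal (CC) and
# mediolateral oblique (MLO) for each breast. The naming is illustrative.
VIEWS = ("L-CC", "L-MLO", "R-CC", "R-MLO")


def total_images(num_exams: int, views_per_exam: int = len(VIEWS)) -> int:
    """Number of images implied by a fixed number of views per exam."""
    return num_exams * views_per_exam


def resolve_reading(
    reader_a: str,
    reader_b: str,
    arbitrate: Callable[[str, str], str],
) -> str:
    """Return the agreed label, or defer to arbitration on disagreement.

    `arbitrate` is a hypothetical stand-in for the arbitration step.
    """
    if reader_a == reader_b:
        return reader_a
    return arbitrate(reader_a, reader_b)


if __name__ == "__main__":
    print(total_images(5000))
    # Example arbitration rule (an assumption): keep the second reader's label.
    print(resolve_reading("BI-RADS 2", "BI-RADS 4", lambda a, b: b))
```

Under these assumptions, 5,000 exams with four views each correspond to 20,000 images, and the arbitration step is invoked only when the two readers disagree.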
Keyphrases
- contrast enhanced
- high resolution
- image quality
- deep learning
- magnetic resonance imaging
- electronic health record
- primary care
- big data
- artificial intelligence
- computed tomography
- climate change
- emergency department
- squamous cell carcinoma
- machine learning
- sensitive detection
- papillary thyroid
- mass spectrometry
- single molecule