Lightweight Visual Transformers Outperform Convolutional Neural Networks for Gram-Stained Image Classification: An Empirical Study

Kim, Hee E.
ORCID: 0000-0002-2826-2796
Affiliation: Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany

Maros, Mate E.
ORCID: 0000-0002-1589-8699
Affiliation: Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany

Miethke, Thomas
ORCID: 0000-0001-7451-5857
Affiliation: Institute of Medical Microbiology and Hygiene, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany

Kittel, Maximilian
ORCID: 0000-0002-2580-5038
Affiliation: Institute for Clinical Chemistry, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany

Siegel, Fabian
ORCID: 0000-0002-9673-5030
Affiliation: Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany

Ganslandt, Thomas
ORCID: 0000-0001-6864-8936
Affiliation: Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany

We aimed to automate Gram-stain analysis to speed up the detection of bacterial strains in patients suffering from infections. We performed comparative analyses of visual transformers (VTs) under various configurations, including model size (small vs. large), training epochs (1 vs. 100), and quantization schemes (tensor- or channel-wise) using float32 or int8, on a publicly available dataset (DIBaS, n = 660) and a locally compiled dataset (n = 8500). Six VT models (BEiT, DeiT, MobileViT, PoolFormer, Swin and ViT) were evaluated and compared to two convolutional neural networks (CNNs), ResNet and ConvNeXT. An overall comparison of performance, covering accuracy, inference time and model size, was also visualized. Small models consistently achieved 1-2× the frames per second (FPS) of their large counterparts. DeiT small was the fastest VT in the int8 configuration (6.0 FPS). In conclusion, VTs consistently outperformed CNNs for Gram-stain classification in most settings, even on smaller datasets.
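The int8 setting described above can be sketched with post-training dynamic quantization in PyTorch, which converts float32 Linear weights to int8 (per-tensor by default) and typically shrinks model size and speeds up CPU inference. The tiny model below is a hypothetical stand-in for a transformer backbone such as DeiT; the paper's actual pipeline, datasets and measurement protocol are not reproduced here.

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-in for a transformer backbone (e.g. DeiT small).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 192),
    nn.GELU(),
    nn.Linear(192, 2),  # e.g. Gram-positive vs. Gram-negative
)
model.eval()

# Post-training dynamic quantization: Linear weights float32 -> int8.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Rough FPS measurement: images processed per second on CPU.
x = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    start = time.perf_counter()
    logits = quantized(x)
    elapsed = time.perf_counter() - start
fps = x.shape[0] / elapsed
print(logits.shape, round(fps, 1))
```

In practice the backbone would be a pretrained model (e.g. loaded via the timm library) and FPS would be averaged over many batches after a warm-up phase; the single-batch timing above is only illustrative.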

Rights

License Holder: © 2023 by the authors.