| Arch. | Model | Accuracy QO | Accuracy QS | Accuracy QT | C.-Dec. Scd | C.-Dec. Rcd | C.-Conf. Shape |
|---|---|---|---|---|---|---|---|
| CNN | ConvNeXt L | 0.996 | 0.838 | 0.969 | 0.563 | 0.907 | 0.348 |
| | RegNetY | 0.996 | 0.797 | 0.986 | 0.546 | 0.895 | 0.456 |
| | ResNeXt101 | 0.995 | 0.678 | 0.869 | 0.537 | 0.777 | 0.391 |
| | DPN92 | 0.992 | 0.408 | 0.754 | 0.446 | 0.585 | 0.263 |
| | ResNet101 | 0.989 | 0.326 | 0.848 | 0.363 | 0.593 | 0.181 |
| | VGG19 | 0.983 | 0.228 | 0.774 | 0.304 | 0.510 | 0.124 |
| Vision Transf. | EVA02 L | 0.997 | 0.921 | 0.988 | 0.581 | 0.957 | 0.542 |
| | BEiT | 0.996 | 0.782 | 0.974 | 0.544 | 0.881 | 0.486 |
| | ViT B16 | 0.994 | 0.699 | 0.931 | 0.528 | 0.820 | 0.455 |
| | Swin B | 0.994 | 0.683 | 0.911 | 0.527 | 0.802 | 0.321 |
| | Inception v3 | 0.990 | 0.488 | 0.668 | 0.521 | 0.584 | 0.285 |
| VLM | FLAVA-full | 0.955 | 0.752 | 0.605 | 0.649 | 0.711 | 0.530 |
| | Align-base | 0.969 | 0.677 | 0.621 | 0.618 | 0.670 | 0.489 |
| | CLIP ViT-B32 | 0.972 | 0.799 | 0.758 | 0.611 | 0.801 | 0.564 |
| | SigLIP-base | 0.975 | 0.829 | 0.816 | 0.602 | 0.843 | 0.478 |
| | CLIP RN101 | 0.967 | 0.605 | 0.776 | 0.537 | 0.714 | 0.246 |
| Hybrid | SEResNeXt | 0.996 | 0.747 | 0.955 | 0.538 | 0.855 | 0.436 |
| | CAFormer b36 | 0.997 | 0.758 | 0.983 | 0.534 | 0.873 | 0.415 |
| | SENet154 | 0.992 | 0.427 | 0.812 | 0.439 | 0.625 | 0.261 |
| | ConvFormer b36 | 0.994 | 0.477 | 0.955 | 0.426 | 0.720 | 0.277 |