Background
Type: Article

Machine Learning Evaluation Metric Discrepancies Across Programming Languages and Their Components in Medical Imaging Domains: Need for Standardization

Journal: IEEE Access (21693536)
Year: 2025
Volume:
Issue:
Pages: 47217 - 47229
Authors: Salmanpour M.R., Alizadeh M., Mousavi G., Sadeghi S., Amiri S., Oveisi M., Rahmim A., Hacihaliloglu I., Moradzadehdehkordi A., Faghihi A., Alagöz Y., Mohammadi Hassanabadi A.
DOI: 10.1109/ACCESS.2025.3549702
Language: English

Abstract

This study evaluates metrics for tasks such as classification, regression, clustering, correlation analysis, statistical tests, segmentation, and image-to-image (I2I) translation in medical imaging domains. Metrics were compared across Python libraries, R packages, and Matlab functions to assess their consistency and highlight discrepancies. The findings underscore the need for a unified roadmap to standardize metrics, ensuring reliable and reproducible ML evaluations across platforms. Of the wide range of evaluation metrics examined, only some were found to be consistent across platforms: Accuracy, Balanced Accuracy, Cohen's Kappa, F-beta Score, MCC, Geometric Mean, AUC, and Log Loss in binary classification; Accuracy, Cohen's Kappa, and F-beta Score in multi-class classification; MAE, MSE, RMSE, MAPE, Explained Variance, Median AE, MSLE, and Huber in regression; Davies-Bouldin Index and Calinski-Harabasz Index in clustering; Pearson, Spearman, Kendall's Tau, Mutual Information, Distance Correlation, Bicor, Percbend, Shepherd, and Partial Correlation in correlation analysis; Paired t-test, Chi-Square Test, ANOVA, Kruskal-Wallis Test, Shapiro-Wilk Test, Welch's t-test, and Bartlett's test in statistical tests; Accuracy, Precision, and Recall in 2D segmentation; Accuracy in 3D segmentation; MAE, MSE, RMSE, and R-Squared in 2D-I2I translation; and MAE, MSE, and RMSE in 3D-I2I translation. Given the discrepancies observed in a number of metrics (e.g., precision, recall, and F1 score in binary classification, WCSS in clustering, and multiple statistical tests, among others), this study concludes that ML evaluation metrics require standardization and recommends that future research use consistent metrics for different tasks to effectively compare ML techniques and solutions.
© 2013 IEEE.
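
The abstract reports that precision, recall, and F1 score disagreed across platforms in binary classification but does not say why. As a purely illustrative sketch, the Python snippet below shows one plausible mechanism (an assumption, not a finding taken from the paper): two implementations that differ only in which label they treat as the positive class. The helper `precision_recall_f1` and the toy labels are hypothetical and are not drawn from any of the libraries compared in the study.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels; not data from the study.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# Same predictions, two different positive-class conventions.
as_one = precision_recall_f1(y_true, y_pred, positive=1)
as_zero = precision_recall_f1(y_true, y_pred, positive=0)

print("positive=1:", as_one)
print("positive=0:", as_zero)
```

On this toy input, the two conventions give different precision, recall, and F1 values for identical predictions, while accuracy (4 of 6 correct) is unaffected by the choice. This fits the abstract's report that Accuracy was consistent while these three metrics were not, though the actual causes identified in the paper may differ.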


Author Keywords

2D/3D medical images; consistency of evaluation metrics in multi-framework; evaluation metric roadmap; ML evaluation metrics

Other Keywords

Problem oriented languages; 2d/3d medical image; 3D medical image; Clusterings; Consistency of evaluation metric in multi-framework; Evaluation metric roadmap; Evaluation metrics; ML evaluation metric; Roadmap; Regression analysis