You are reviewing four papers submitted to a conference on machine learning for medical expert systems. All four papers demonstrate their superiority on a standard benchmark cancer dataset in which only $5 \%$ of the cases are positive. Which of the following experimental settings would you accept?
paper i) We evaluated the performance of our model through a $5$-fold cross-validation process and report an accuracy of $93 \%$.
paper ii) On a single held-out test set, our model achieves an area under the $\text{ROC}$ curve of around $0.8$, the highest among all the approaches compared.
paper iii) We computed the average area under the $\text{ROC}$ curve over $5$-fold cross-validation and found it to be around $0.75$, the highest among all the approaches.
paper iv) On a single held-out test set, our model achieves an accuracy of $95 \%$, the highest among all the approaches compared.
- $\text{paper i}$
- $\text{paper i and paper iv}$
- $\text{paper ii and paper iv}$
- $\text{paper iii}$
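With only $5 \%$ positive cases, accuracy is a weak yardstick: a classifier that labels every case negative already reaches $95 \%$ accuracy, while its ROC AUC is $0.5$ (chance level). The following minimal sketch, using only the Python standard library, illustrates this on hypothetical toy labels and then averages AUC over 5 folds. The function names, the toy data, the fixed scorer and the use of stratified splitting are all illustrative assumptions and are not taken from any of the papers.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """ROC AUC computed as the probability that a randomly chosen positive
    gets a higher score than a randomly chosen negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with roughly the same positive rate in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Hypothetical toy labels: 1000 cases, 5% positive, matching the ratio in the question.
y = [1] * 50 + [0] * 950

# A useless "model" that always predicts negative with the same score.
always_negative = [0] * len(y)
print(accuracy(y, always_negative))  # 0.95
print(roc_auc(y, always_negative))   # 0.5

# A hypothetical scorer whose scores are shifted upward for positive cases.
rng = random.Random(1)
noisy_scores = [rng.random() + 0.5 * t for t in y]

# Average AUC over 5 stratified folds. A real study would retrain the model on
# the other folds each time; here the scorer is fixed, so only evaluation is shown.
folds = stratified_folds(y, k=5)
fold_aucs = [roc_auc([y[i] for i in f], [noisy_scores[i] for i in f]) for f in folds]
print(sum(fold_aucs) / len(fold_aucs))
```

Stratification keeps 10 positives in every fold here, so each fold contains both classes and AUC is always defined; averaging over folds also gives a less split-dependent estimate than a single held-out set.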