I want to show that my proposed machine learning algorithm (prop_ml) outperforms several baseline algorithms (ml_1, ml_2, ml_3) when only a small amount of training data is available. I split a dataset into train and test sets, then randomly selected k samples (k = 10, 20, 30, …, 100) from the train set, trained each classifier on them, and evaluated each one on the test set. I repeated this 5 times to get more reliable results.
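To make the setup concrete, here is a rough sketch of the loop described above. It is only an illustration under some assumptions: the helper `run_experiment`, the toy learners `fit_majority` and `fit_1nn`, and the `(x, y)` data layout are all hypothetical stand-ins, not my actual code. I also assume the 5 replications are done for each k and that every classifier is trained on the same random subset in a given replication.

```python
import random


def fit_majority(samples):
    """Hypothetical baseline: always predict the most common training label."""
    labels = [y for _, y in samples]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_1nn(samples):
    """Hypothetical stand-in learner: 1-nearest-neighbour on a single feature."""
    def predict(x):
        return min(samples, key=lambda s: abs(s[0] - x))[1]
    return predict


def run_experiment(train, test, classifiers,
                   sizes=range(10, 101, 10), n_reps=5, seed=0):
    """Return results[name][k] = list of n_reps test accuracies.

    `classifiers` maps a name to a function that takes training samples
    and returns a predict(x) function (an assumed interface).
    """
    rng = random.Random(seed)
    results = {name: {k: [] for k in sizes} for name in classifiers}
    for k in sizes:
        for _ in range(n_reps):
            # Assumption: one random subset per replication, shared by all
            # classifiers, so their scores are matched on the same data.
            subset = rng.sample(train, k)
            for name, fit in classifiers.items():
                predict = fit(subset)
                correct = sum(predict(x) == y for x, y in test)
                results[name][k].append(correct / len(test))
    return results
```

With this layout, each classifier ends up with 5 accuracy values for every k, and the values at the same position come from the same training subset.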
Now I want to evaluate the results. Can anyone suggest a statistical test I can use to determine whether the proposed algorithm is significantly better than the baselines? Thanks.