Hi Schmidt-Lebuhn, Alexander (NCMI, Black Mountain)
Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!
It is standard procedure in machine learning to split the dataset into training, validation, and test sets. The evaluation metrics, including F1 score, precision, and accuracy, are derived by comparing the model's predictions against the validation data, and they are displayed on the summary page once the model has been trained. You can also supply an external set of images to test the new model and check its accuracy, ideally images that were not used in training.
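To make the comparison concrete, here is a minimal sketch of how metrics like these can be computed from predictions on held-out images. It is only an illustration in plain Python for a two-class case, not the service's internal implementation; the function name `evaluate` and the label lists are made up for this example.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compare predictions against held-out labels for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Count the four outcomes of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels for eight held-out images (1 = target class present).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

The important point is that `y_true` should come from images the model did not see during training; otherwise the scores will look better than the model really is.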
Please refer to the link below: evaluate-the-trained-model
Thank you.