Knowledge check

1.

You have a specific set of questions that you want to make sure your chat application answers correctly. What is the best type of evaluation to verify this?

2.

Which model benchmark quantifies the semantic similarity between a ground-truth answer and the generated response?

3.

You want to evaluate how well the generated text adheres to grammatical rules. Which type of evaluation would be best to use?