You have a specific set of questions that you want to ensure your chat application answers correctly. Which type of evaluation is best suited to verify that?
Model benchmarks
Manual evaluations
Machine learning metrics
Which model benchmark quantifies the semantic similarity between a ground truth response and the generated response?
GPT Similarity
Coherence
Accuracy
You want to evaluate how well the generated text adheres to grammatical rules. Which type of evaluation would be best to use?
Automated evaluations
Risk and safety metrics