What does it mean to have an imbalanced dataset?
The number of samples is smaller or greater than is required
The feature columns contain many missing values
There are many more training examples that correspond to some outputs (categories) than others
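As a rough illustration of what checking a label column for balance might look like, the sketch below counts how often each category appears. The helper name `class_proportions` and the label values are hypothetical and chosen only for this example.

```python
from collections import Counter


def class_proportions(labels):
    """Return the fraction of examples that carry each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels: 95 examples of one category and 5 of the other.
labels = ["no_avalanche"] * 95 + ["avalanche"] * 5
proportions = class_proportions(labels)

for label, fraction in proportions.items():
    print(f"{label}: {fraction:.0%}")
```

In this made-up data, one category accounts for nearly all of the examples, so a model could score well simply by always predicting it.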
What information can we extract from a single confusion matrix?
Log loss and/or mean squared error
Whether the model has overfitted the training set
What kind of mistakes the model is making
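For reference, a binary confusion matrix can be tallied by hand as in the minimal sketch below. The function name `confusion_counts` and the example labels are assumptions made for illustration, with 1 treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Tally the four cells of a binary confusion matrix (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1  # correctly flagged positive
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1  # false alarm
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1  # correctly rejected negative
        else:
            counts["fn"] += 1  # missed positive
    return counts


# Hypothetical ground truth and predictions for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
cells = confusion_counts(y_true, y_pred)
print(cells)
```

The four counts are computed from a single set of predictions on a single dataset and contain only hard labels, not predicted probabilities.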
Why are measures like True Positives or Accuracy not used to train our models directly?
There are mathematical barriers that prevent these being used for some training regimens
Subtle model improvements often don't affect these metrics
Both of the above
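To compare how a count-based metric and a continuous cost function respond to the same change in a model, the sketch below scores two hypothetical sets of predicted probabilities. The helper names, the 0.5 threshold, and the probability values are all assumptions made for this example.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(a == b) for a, b in zip(y_true, preds)) / len(y_true)


def log_loss(y_true, probs):
    """Mean binary cross-entropy; assumes every probability is strictly in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0, 1, 0]
probs_before = [0.6, 0.4, 0.6, 0.4]  # hypothetical model output, less confident
probs_after = [0.9, 0.1, 0.9, 0.1]   # hypothetical model output, more confident

print("accuracy:", accuracy(y_true, probs_before), accuracy(y_true, probs_after))
print("log loss:", log_loss(y_true, probs_before), log_loss(y_true, probs_after))
```

With these made-up values, both sets of probabilities produce the same thresholded labels, while the log loss differs between them.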