Why do we clean our data before training?
Removing rows of data makes our model more powerful.
Cleaning data helps us select features that help the performance of the model.
Removing rows that have errors prevents these rows from misleading the training process.
What kind of data are best encoded with one-hot vectors?
Ordinal data
Categorical data with two possible values
Categorical data with three or more values
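As a reminder of what one-hot encoding does, here is a minimal sketch in plain Python. The `one_hot_encode` helper and the port names are hypothetical, chosen only for illustration; a real project would more likely use a library routine. Each distinct label gets its own column, and every row has a 1 in exactly one of those columns.

```python
def one_hot_encode(values):
    """Return the sorted category labels and one vector per input value.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    index = {label: i for i, label in enumerate(categories)}
    vectors = []
    for label in values:
        vec = [0] * len(categories)  # one slot per distinct category
        vec[index[label]] = 1        # mark the slot for this row's label
        vectors.append(vec)
    return categories, vectors


# Hypothetical column of labels with no natural ordering.
ports = ["Cherbourg", "Queenstown", "Southampton", "Cherbourg"]
categories, encoded = one_hot_encode(ports)

for label, vec in zip(ports, encoded):
    print(label, vec)
```

Note that the vector length grows with the number of distinct labels, and no label is treated as larger or smaller than another.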
What is a data sample? What is a population?
A sample is all possible data we care about. A population is the subset of that data which we actually have on hand.
Both population and sample refer to data we use to train our model.
A population is all possible data we care about. A sample is the subset of that data which we actually have on hand.
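To make the two terms concrete, the sketch below simulates a large set of values and then draws a small subset from it. All numbers here (the mean of 170, the spread of 10, the sizes 10,000 and 100) are made-up assumptions for illustration, not data from this module.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical full set of values we care about (for example, heights in cm).
population = [random.gauss(170, 10) for _ in range(10_000)]

# The subset we actually have on hand, drawn without replacement.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means are usually close but not identical, which is why the way a subset is collected matters when a model is trained on it.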
You have a model that doesn't perform well. Which of these options would definitely not help improve its performance?
Adding more samples (rows).
Adding a few features (columns) that you know relate to what the model is trying to predict.
Adding a large number of features that you know have no relation to what the model is trying to predict.