Considerations for public use datasets

Public datasets are convenient because the data has already been collected and is free to use. However, you should check the quality, reliability, and availability of the data before relying on it. Ask yourself the following questions before choosing a publicly available dataset.

  • What is the quality of the data in the dataset?
  • How was the data probably collected?
  • What was the original purpose of the data?
  • Who is the author or publishing organization, and what are their credentials?
  • Is a data codebook or manual available?
  • What kind of data does the dataset contain?
  • How many times has the dataset been used in the past by different people?
  • What are the variables available in the dataset and how are they defined?
  • What kind of analysis does the measurement of the variables allow?
  • What is the actual or estimated date of the dataset? (How recent is the data?)
  • Is any data missing?
  • Where did the data probably originate?
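
A few of these questions (which variables exist, whether values are missing, how recent the data is) can be checked mechanically before any deeper review. The following is a minimal sketch using only the Python standard library. The sample data, the column names, the `profile_dataset` helper, and the set of missing-value markers are all hypothetical and chosen for illustration; a real dataset's codebook should tell you how missing values are actually encoded.

```python
import csv
import io
from datetime import date

# Hypothetical sample standing in for a downloaded public dataset.
SAMPLE_CSV = """station_id,recorded_on,temperature_c,humidity_pct
A1,2021-03-01,12.5,80
A1,2021-03-02,,78
B7,2021-03-02,14.1,
B7,2021-03-03,13.8,75
"""

# Assumed conventions for "missing"; confirm against the dataset's codebook.
MISSING_MARKERS = {"", "NA", "N/A", "null"}


def profile_dataset(csv_text, date_column):
    """Return the variables, per-variable missing counts, and the date range."""
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = list(reader.fieldnames or [])
    missing = {name: 0 for name in variables}
    dates = []
    row_count = 0

    for row in reader:
        row_count += 1
        for name in variables:
            # Short rows yield None for absent fields, so normalize to "".
            value = (row[name] or "").strip()
            if value in MISSING_MARKERS:
                missing[name] += 1
        raw_date = (row[date_column] or "").strip()
        if raw_date not in MISSING_MARKERS:
            # Assumes ISO 8601 dates (YYYY-MM-DD) in the date column.
            dates.append(date.fromisoformat(raw_date))

    return {
        "variables": variables,
        "rows": row_count,
        "missing": missing,
        "oldest": min(dates) if dates else None,
        "newest": max(dates) if dates else None,
    }


report = profile_dataset(SAMPLE_CSV, "recorded_on")
print("Variables:", report["variables"])
print("Rows:", report["rows"])
print("Missing values per variable:", report["missing"])
print("Date range:", report["oldest"], "to", report["newest"])
```

A check like this only covers the mechanical questions. Collection methods, original purpose, and author credentials still have to be judged from the dataset's documentation.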

This video, "How to Prepare Data for Machine Learning and AI", goes deeper into the topic.