Data Types - Statistics
Knowing the data types is very important for exploratory data analysis (EDA), which is one of the most underestimated parts of a machine learning project.
Figure 1 - Data Types
Categorical Data
Categorical data represents characteristics, such as a person’s gender or language. It can also take on numerical values (for example, 1 for female and 0 for male), but those numbers are only labels and have no mathematical meaning.
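To make the point about numeric codes concrete, here is a minimal sketch using only the Python standard library. The sample values and the names gender_codes and labels are made up for illustration; only the 1/0 coding comes from the text above.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers, coded as in the text: 1 = female, 0 = male.
gender_codes = [1, 0, 0, 1, 1, 0, 1, 1]
labels = {1: "female", 0: "male"}

# Counting how often each category occurs is a meaningful summary.
counts = Counter(labels[code] for code in gender_codes)
print(counts)

# The mean runs without error, but here it is only the share of rows coded 1.
# With a different arbitrary coding (say 2 and 7) it would describe nothing.
print(mean(gender_codes))
```

The codes are stand-ins for names, so frequency counts are the natural summary, while sums and averages depend entirely on which numbers happened to be chosen.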
Nominal Data
Nominal data consists of “labels” or strings that have no inherent order. If you change the order of its values, the meaning does not change.
Figure 1 - Nominal Data
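As a small illustration of why order does not matter for nominal data, the sketch below counts some invented language labels before and after reordering them. The list contents are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical nominal values: language labels with no inherent order.
languages = ["German", "English", "French", "English", "German", "English"]

counts = Counter(languages)
reordered_counts = Counter(sorted(languages, reverse=True))

# Reordering the values changes nothing about the summary.
print(counts == reordered_counts)

# The mode (the most frequent label) is a valid summary for nominal data.
most_common_label, frequency = counts.most_common(1)[0]
print(most_common_label, frequency)
```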
Ordinal Data
Ordinal values represent discrete, ordered units. Ordinal data is therefore nearly the same as nominal data, except that its ordering matters.
Figure 1 - Ordinal Data
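One way to work with ordinal data in code is to attach an explicit rank to each label. The following is a sketch under assumptions: the education scale, the LEVELS list and the responses are invented for illustration and do not come from the figure.

```python
# Hypothetical ordinal scale: education levels, listed from lowest to highest.
LEVELS = ["primary", "secondary", "bachelor", "master"]
rank = {level: position for position, level in enumerate(LEVELS)}

responses = ["bachelor", "primary", "master", "secondary", "bachelor"]

# Sorting by rank uses the order of the scale, not alphabetical order.
ordered = sorted(responses, key=rank.__getitem__)
print(ordered)

# With an odd number of responses, the median is the middle element.
median_level = ordered[len(ordered) // 2]
print(median_level)
```

Because the labels are ordered, the median is meaningful here. The distance between two levels is still undefined, so an arithmetic mean of the ranks should be treated with caution.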
Numerical Data
Discrete Data
We speak of discrete data if the data can only take on certain distinct values. This type of data can’t be measured, but it can be counted. Each observation is one of a set of separate values, with nothing in between. An example is the number of heads in 100 coin flips.
To check whether you are dealing with discrete data, ask two questions: Can you count it? Can it be divided up into smaller and smaller parts? If you can count it and cannot keep dividing it, the data is discrete.
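The coin-flip example can be simulated in a few lines. This is a sketch, not a statistical experiment: the fixed seed is an arbitrary choice made so that repeated runs give the same flips.

```python
import random

random.seed(0)  # arbitrary seed, used only to make the run repeatable

# Simulate 100 fair coin flips and count the heads.
flips = [random.choice(["heads", "tails"]) for _ in range(100)]
heads = flips.count("heads")

# The result is always a whole number between 0 and 100.
print(heads)
print(isinstance(heads, int))
```

The count can be 47 or 48, but never 47.5, which is what makes it discrete: you can count it, and you cannot divide it into smaller parts.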
Continuous Data
Continuous data represents measurements, so its values can’t be counted, but they can be measured. An example is the height of a person, which you can describe using intervals on the real number line.
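Describing heights by intervals can be sketched as a simple binning step. The heights, the 10 cm bin width and the helper name interval_for are assumptions made for this example.

```python
from collections import Counter

# Hypothetical heights in centimetres; any value in between is also possible.
heights_cm = [158.2, 171.5, 164.9, 180.3, 176.0, 169.4, 162.7]

BIN_WIDTH = 10  # assumed interval width in cm


def interval_for(height, width=BIN_WIDTH):
    """Return the half-open interval [low, high) that contains height."""
    low = int(height // width) * width
    return (low, low + width)


# Count how many measurements fall into each interval.
histogram = Counter(interval_for(h) for h in heights_cm)
for (low, high), count in sorted(histogram.items()):
    print(f"[{low}, {high}) cm: {count}")
```

Grouping into intervals is what a histogram does. The individual values stay continuous, and only the summary is counted.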