Data Types - Statistics

Knowing the data types is very important for exploratory data analysis (EDA), which is one of the most underestimated parts of a machine learning project. The type of a variable determines which statistical measures and visualizations make sense for it.

[Figure 1 - Data Types]

Categorical Data

Categorical data represents characteristics, such as a person’s gender or language. Categorical data can also take on numerical values (for example, 1 for female and 0 for male), but those numbers are only codes and have no mathematical meaning.
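To see why such numeric codes carry no mathematical meaning, here is a minimal Python sketch. The language labels and both code mappings are made up for illustration. Relabeling the categories changes the “average code”, whereas the most frequent category (the mode) stays the same.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample of a categorical variable (a person's language).
languages = ["English", "German", "English", "French", "English", "German"]

# Two equally arbitrary numeric encodings of the same categories.
coding_a = {"English": 1, "German": 2, "French": 3}
coding_b = {"English": 3, "German": 1, "French": 2}

codes_a = [coding_a[lang] for lang in languages]
codes_b = [coding_b[lang] for lang in languages]

# The mean of the codes depends on which arbitrary encoding was chosen...
mean_a = mean(codes_a)
mean_b = mean(codes_b)

# ...while the most frequent category does not depend on any encoding.
mode_label = Counter(languages).most_common(1)[0][0]

print(mean_a, mean_b, mode_label)
```

The two means differ only because the codes were assigned differently, which is exactly why arithmetic on categorical codes should be avoided.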

Nominal Data

Nominal data is represented as “labels”, often stored as strings, such as eye color or a person’s language. Nominal data has no order, so changing the order of its values does not change their meaning.
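Because nominal values have no order, a common preprocessing step in machine learning (not covered in the text above) is one-hot encoding, which gives every label its own 0/1 column instead of an ordered number. The sketch below is plain Python; the function name `one_hot` and the eye-color labels are hypothetical. Sorting the categories only fixes a stable column layout and implies no ranking.

```python
def one_hot(values):
    """Encode nominal labels as 0/1 indicator vectors.

    The categories are sorted only to get a stable column order;
    that order carries no meaning for nominal data.
    """
    categories = sorted(set(values))
    vectors = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, vectors


eye_colors = ["brown", "blue", "green", "brown"]
categories, encoded = one_hot(eye_colors)
print(categories)  # ['blue', 'brown', 'green']
print(encoded)     # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Unlike codes such as 1, 2, 3, every pair of distinct one-hot vectors is the same distance apart, so no label ends up “between” two others.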

[Figure 2 - Nominal Data]

Ordinal Data

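Ordinal values represent discrete and ordered units. Ordinal data is therefore nearly the same as nominal data, except that its order matters. An example is a rating scale such as “unsatisfied”, “neutral”, “satisfied”: the order is meaningful, but the distances between the levels are not necessarily equal.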
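For ordinal data the order is part of the information, so it has to be stated explicitly rather than inferred from the labels. A minimal Python sketch, assuming a hypothetical five-level satisfaction scale: mapping each label to its rank makes sorting and the median meaningful, while the rank differences still should not be read as equal distances.

```python
from statistics import median_low

# Hypothetical ordinal scale, listed from lowest to highest.
SCALE = ["very unsatisfied", "unsatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: position for position, label in enumerate(SCALE)}

responses = ["satisfied", "neutral", "very satisfied", "satisfied", "unsatisfied"]

# Sort by rank; an alphabetical sort would put "neutral" first, which is wrong here.
ordered = sorted(responses, key=lambda label: RANK[label])

# median_low always returns an observed rank, so it maps back to a real label.
median_label = SCALE[median_low(RANK[label] for label in responses)]

print(ordered)       # ['unsatisfied', 'neutral', 'satisfied', 'satisfied', 'very satisfied']
print(median_label)  # satisfied
```

`median_low` is used instead of `median` because averaging two middle ranks (for an even number of responses) could produce a value such as 2.5 that corresponds to no label on the scale.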

[Figure 3 - Ordinal Data]

Numerical Data

Discrete Data

We speak of discrete data if the data can only take on certain separate values. This type of data can’t be measured, but it can be counted. Its values are distinct and separate, so every observation falls into one of a countable set of possible values. An example is the number of heads in 100 coin flips, which can be 0, 1, 2, …, 100 but never 42.5.
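The coin-flip example can be simulated in a few lines of Python. This sketch uses the standard `random` module with a fixed seed (an arbitrary choice, only for reproducibility); the helper name `count_heads` is hypothetical. Whatever the outcome, the number of heads is always a whole number between 0 and 100.

```python
import random


def count_heads(n_flips, rng):
    """Flip a fair coin n_flips times and count how many heads come up."""
    return sum(1 for _ in range(n_flips) if rng.random() < 0.5)


rng = random.Random(42)  # fixed seed so the run is reproducible
heads = count_heads(100, rng)

# The result is a count: an integer in 0..100, never a value like 42.5.
print(heads, isinstance(heads, int), 0 <= heads <= 100)
```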

To check whether you are dealing with discrete data, ask two questions: Can you count it? Can it be divided into smaller and smaller parts? If you can count it and it can’t be meaningfully subdivided, the data is discrete.
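During EDA you often have to make this call for many columns at once. A rough heuristic, sketched below with a hypothetical helper `looks_discrete`, is to check whether every value in a numeric column is a whole number. It is only a first hint and does not replace the two questions: a height rounded to whole centimetres would pass the check even though height is continuous.

```python
def looks_discrete(values):
    """Heuristic: True if every numeric value is a whole number.

    Only a hint, not a proof: rounded measurements also pass this check.
    """
    return all(float(v).is_integer() for v in values)


children_per_family = [0, 2, 1, 3, 2]        # counted
body_temperature = [36.6, 37.1, 36.9, 38.2]  # measured

print(looks_discrete(children_per_family))  # True
print(looks_discrete(body_temperature))     # False
```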

Continuous Data

Continuous data represents measurements, so its values can be measured but not counted. An example is a person’s height, which can take any value within an interval of the real number line.
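Because continuous values can’t be counted one by one, they are usually summarized by grouping them into intervals, as a histogram does. A minimal Python sketch, assuming made-up heights in centimetres and an arbitrary bin width of 10 cm; the helper name `bin_counts` is hypothetical.

```python
import math
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall into each half-open interval [start, start + width)."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


heights_cm = [162.3, 175.8, 168.0, 181.4, 159.9, 170.2]
print(bin_counts(heights_cm, 10))  # {150: 1, 160: 2, 170: 2, 180: 1}
```

The bin width is a design choice: narrower bins show more detail, while wider bins smooth out noise.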

                            The Squid