The Four Common Types of Variables in Machine Learning

2024-04-08 | By Maker.io Staff

Understanding the types of variables you can encounter when training an ML model is essential for choosing the correct preprocessing measures and algorithms for the task at hand. This article introduces the four types of variables commonly found in machine learning datasets.

Nominal Variables

Nominal variables describe categorical values: each observation takes exactly one value from a finite set of mutually exclusive and usually exhaustive categories. These categories represent distinct groups without any inherent ordering or ranking. Examples of nominal variables include colors, cities, and animal types. Binary values, which are common in questionnaires, are also nominal: Yes/No, True/False, Agree/Disagree, etc.

Because most machine learning models can’t operate directly on nominal values, these values must be converted to numeric values during preprocessing, for example, with one-hot encoding. It’s crucial to remember that nominal variables don’t have a natural ordering, so the resulting numeric encoding must not imply one, either.
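As a minimal sketch of the idea, the following pure-Python snippet one-hot encodes a small list of colors. The helper name `one_hot` and the sample data are assumptions made for illustration; in practice, a data-processing library would typically handle this step.

```python
def one_hot(values):
    """Return (categories, rows), where each row is a list of 0/1 indicators."""
    # Sorting only fixes a reproducible column layout; it does not rank the categories.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is "hot" per value
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]  # hypothetical sample data
categories, encoded = one_hot(colors)

print(categories)  # ['blue', 'green', 'red']
for color, row in zip(colors, encoded):
    print(color, row)
```

Each category gets its own column, so no category ends up numerically "larger" than another, which is the property the encoding is meant to preserve.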

Ordinal Attributes

Like nominal variables, ordinal variables are categorical, but they additionally impose an order on the individual elements. However, there's still no notion of distance between the elements. Typical examples include liking scales (e.g., strongly dislike, dislike, neutral, like, love) and education levels (e.g., primary, high school, bachelor's, master's).

Ordinal values, just like nominal values, can be represented using numbers. Yet this time, the numbers can be chosen in a way that preserves the ordinal elements' natural order.
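One possible way to build such an order-preserving encoding is a simple lookup table. The sketch below uses the education levels from the example above; the names `EDUCATION_ORDER`, `EDUCATION_RANK`, and `encode_ordinal` are hypothetical and chosen only for illustration.

```python
# Levels listed from lowest to highest; the list position becomes the code.
EDUCATION_ORDER = ["primary", "high school", "bachelor's", "master's"]
EDUCATION_RANK = {level: rank for rank, level in enumerate(EDUCATION_ORDER)}


def encode_ordinal(values, rank=EDUCATION_RANK):
    """Map each ordinal label to an integer that preserves the natural order."""
    return [rank[value] for value in values]


samples = ["bachelor's", "primary", "master's", "high school"]  # hypothetical data
codes = encode_ordinal(samples)
print(codes)  # [2, 0, 3, 1]

# Comparisons on the codes agree with the order of the labels.
print(EDUCATION_RANK["primary"] < EDUCATION_RANK["master's"])  # True
```

Note that the integer codes only carry the order. The gap between 0 and 1 is not claimed to equal the gap between 2 and 3, since ordinal values have no notion of distance.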

Interval Variables

Interval variables are numeric variables with a consistent scale but without a true zero point. Differences between values are meaningful, but ratios are not. Years and temperatures in degrees Celsius are prime examples of interval values. The distance between consecutive degrees Celsius is always exactly 1, but it doesn't make sense to say that 20°C is twice as hot as 10°C. Therefore, multiplication and division are not meaningful for interval values.
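A short sketch can make the temperature example concrete: converting the same temperatures to Fahrenheit changes their ratio, while equal steps remain equal. The helper `celsius_to_fahrenheit` and the sample values are assumptions made for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


a_c, b_c, c_c = 10.0, 20.0, 30.0
a_f, b_f, c_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c))

# The "ratio" depends on the arbitrary zero point of the chosen unit.
ratio_c = b_c / a_c  # 2.0 in Celsius
ratio_f = b_f / a_f  # 68 / 50 = 1.36 in Fahrenheit
print(ratio_c, ratio_f)

# Differences behave consistently: equal steps stay equal in both units.
equal_steps_c = (b_c - a_c) == (c_c - b_c)
equal_steps_f = (b_f - a_f) == (c_f - b_f)
print(equal_steps_c, equal_steps_f)  # True True
```

Because the ratio changes with the unit, it says nothing about the temperatures themselves, whereas the equality of the differences holds in both units.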

Ratio Values

Ratio variables have an actual, meaningful zero point, which makes ratios between values meaningful. Height and weight are typical examples. In addition, these variables have a consistent and uniform scale, and the distance of a value to itself is always zero. Due to these properties, all arithmetic operations are permitted, including division and multiplication.

These variables are often used in regression tasks and usually require little to no preprocessing. However, outlier detection and removal, as well as normalization, may still be necessary, depending on the method.
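The following sketch shows one common way to do both steps with the Python standard library: it drops outliers with the 1.5 × IQR rule and then applies min-max normalization. The weights, the helper names, and the choice of the IQR rule are assumptions made for illustration; other methods may suit a given dataset better.

```python
import statistics


def remove_outliers_iqr(values, factor=1.5):
    """Keep only values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lower <= v <= upper]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical body weights in kg; 250.0 is an implausible entry.
weights = [61.0, 64.5, 70.2, 72.8, 68.0, 250.0]

kept = remove_outliers_iqr(weights)
scaled = min_max_scale(kept)

print(kept)    # the 250.0 entry is dropped
print(scaled)  # smallest weight maps to 0.0, largest to 1.0
```

Removing the outlier first matters here: if 250.0 stayed in the data, min-max scaling would squeeze all plausible weights into a narrow band near 0.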

Summary

Understanding variable types is crucial for choosing appropriate preprocessing measures and algorithms. You will typically encounter four types of variables in machine learning: nominal, ordinal, interval, and ratio.

Nominal variables represent distinct categories without inherent ordering, such as colors or animal types. They can be represented using one-hot encoding. Ordinal variables additionally have an order, like liking scales or education levels, but lack a notion of distance. A numeric representation can preserve the order of ordinal variables. Interval variables, such as years or temperatures, are numeric but lack a true zero point, making ratios and some arithmetic operations undefined. Ratio variables like height or weight have an actual zero point, allowing ratios and all arithmetic operations.

You must consider these variable types during preprocessing to select suitable techniques like numeric encoding or order preservation. Understanding each type's characteristics helps handle data effectively for ML tasks.
