Choosing an ML estimator

Soumendra's Blog
7 min read · Mar 30, 2019

Approaching the problem

Problem

Do you want to predict a category (classification), predict a quantity (regression), detect an anomaly (anomaly detection), find relationships between variables in different databases (association rules or recommendation), or discover structure in unexplored data (clustering)?

Available resources

The quantity, quality, and variety of the available data play an important role. A common rule of thumb is that with more than 100,000 data points you can apply almost any algorithm (SAP Conversational AI, n.d.). The number of classes and the amount of labeled data also matter.

According to scikit-learn's estimator-selection flowchart, you need at least 50 samples before starting with an ML approach. [1]
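As a rough sketch, the first branches of that scikit-learn flowchart can be written as a small function. The function name `suggest_starting_estimator` and the task labels are illustrative and not part of scikit-learn. The thresholds (50, 10,000 and 100,000 samples) and estimator names follow the flowchart's early branches, and the result is only a first thing to try, not a final choice.

```python
def suggest_starting_estimator(task: str, n_samples: int) -> str:
    """Return a first estimator to try for a given task and dataset size.

    `task` is a hypothetical label: "classification", "regression"
    or "clustering" (clustering assumes the number of clusters is known).
    """
    if n_samples < 50:
        # Too little data for an ML approach: collect more first.
        return "get more data"
    if task == "classification":
        return "LinearSVC" if n_samples < 100_000 else "SGDClassifier"
    if task == "regression":
        return "Ridge or Lasso" if n_samples < 100_000 else "SGDRegressor"
    if task == "clustering":
        return "KMeans" if n_samples < 10_000 else "MiniBatchKMeans"
    raise ValueError(f"unknown task: {task!r}")


print(suggest_starting_estimator("classification", 5_000))
print(suggest_starting_estimator("clustering", 250_000))
```

The later branches of the flowchart (text data, number of important features, and so on) are left out here on purpose.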

Constraints

What is your data storage capacity? Depending on the storage capacity of your system, you might not be able to store gigabytes of classification or regression models, or gigabytes of data to cluster. This is the case, for instance, on embedded systems.
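One simple way to check the storage constraint is to measure how large a trained model is once serialized. The sketch below uses Python's standard `pickle` module. The helper names and the `toy_model` dictionary are hypothetical stand-ins for a real trained estimator, and the pickled size is only a proxy for what your deployment format would actually occupy.

```python
import pickle


def serialized_size_bytes(model) -> int:
    """Size of the model once pickled, a rough proxy for its storage cost."""
    return len(pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL))


def fits_storage_budget(model, budget_bytes: int) -> bool:
    """True if the pickled model fits within the given byte budget."""
    return serialized_size_bytes(model) <= budget_bytes


# Hypothetical stand-in for a trained model: 10,000 float weights and a bias.
toy_model = {"weights": [0.5] * 10_000, "bias": 0.1}

size = serialized_size_bytes(toy_model)
print(f"{size} bytes; fits in 1 MB: {fits_storage_budget(toy_model, 1_000_000)}")
```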

Does the prediction have to be fast? In real-time applications, predictions must come back as quickly as possible. For instance, in autonomous driving, road signs must be classified fast enough to avoid accidents.
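To compare candidate models on this constraint, you can time a single prediction. The following is a minimal sketch using only the standard library. `median_latency_ms` and `toy_predict` are illustrative names, and `toy_predict` stands in for whatever prediction call your real model exposes.

```python
import statistics
import time


def median_latency_ms(predict, sample, n_runs: int = 100) -> float:
    """Median wall-clock time of predict(sample), in milliseconds."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def toy_predict(features):
    # Hypothetical stand-in for a model's predict call: a thresholded weighted sum.
    return sum(0.1 * x for x in features) > 1.0


latency = median_latency_ms(toy_predict, [1.0] * 1_000)
print(f"median latency: {latency:.4f} ms")
```

The median is used rather than the mean so that an occasional slow run does not dominate the figure. For hard real-time systems the worst case matters more than the median.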

Does learning have to be fast? In some circumstances, training models quickly is necessary: sometimes you need to rapidly update your model on the fly.
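Updating a model on the fly usually means incremental (online) learning: the model is adjusted one example at a time instead of being retrained from scratch. The toy class below illustrates the idea with a one-feature linear model trained by stochastic gradient descent. The class and its learning rate are assumptions made for illustration. In scikit-learn, estimators such as `SGDClassifier` and `SGDRegressor` offer a `partial_fit` method for the same purpose.

```python
class OnlineLinearRegressor:
    """Toy one-feature linear model updated one example at a time."""

    def __init__(self, learning_rate: float = 0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, x: float, y: float) -> None:
        # One gradient step on the squared error of a single example.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearRegressor()

# Stream noiseless examples of y = 2x + 1; each update is cheap and immediate.
for _ in range(400):
    for x in range(5):
        model.partial_fit(float(x), 2.0 * x + 1.0)

print(model.predict(10.0))
```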

Written by Soumendra's Blog

AI Architect and Sustain Lead at PepsiCo