Does Auto-Machine Learning (AutoML) really exist?

26 Nov 2023

Automated machine learning (AutoML) has existed since 1990 and has been considered a silent revolution in the Artificial Intelligence (AI) field. When we analyze the term AutoML, we see that it combines two ideas: Automated and Machine Learning.

Machine Learning comes with different types of learning:

Supervised (Labeled data)
Unsupervised (Unlabeled data)
Semi-supervised (A mixture between labeled and unlabeled data)
Reinforcement (Learning from mistakes)

AutoML aims to optimize and accelerate human tasks and, in doing so, to improve everyday life. The list of examples could be very long, but I will mention a few: automatic waste classification, optimized maintenance of water-filtering membranes, and improved cybersecurity protocols for detecting attacks.

The “Auto” part refers to automating ML algorithms by using Machine Learning algorithms themselves. In other words, we are taking AI to another level, and that is what has made AutoML a hot topic in both industry and academia. However, the main question remains whether it is a real process or not.

AutoML consists of optimizing the whole pipeline of a data science project. By this, we are referring to the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology, with its main phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment. This methodology defines the step-by-step guide for the project. Apart from the “business understanding” phase, AutoML aims to automate the whole pipeline in order to make the task easier for non-experts in the field (for instance, Cloud AutoML by Google for vision).

Some of the advantages of AutoML:

1. A good background for data preparation

Cleaning (filtering out noise) and formatting (encoding values such as categorical variables) data requires a good background in data preparation. With AutoML we can accelerate this phase through a process that tries different ways of formatting the data and detecting the noise in it.
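To make the two preparation steps above more concrete, here is a minimal sketch in plain Python. The helper names (filter_noise, one_hot), the median-based rule and the threshold of 3.5 are assumptions I chose for the illustration; they are not taken from any particular AutoML tool.

```python
from statistics import median

def filter_noise(values, threshold=3.5):
    """Drop values that sit far from the median, measured in units of the
    median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against, keep everything
    return [v for v in values if abs(v - med) / mad <= threshold]

def one_hot(categories):
    """Encode each category as a list of 0/1 flags, one flag per distinct level."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows

readings = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9]  # 55.0 plays the noisy reading
clean = filter_noise(readings)
levels, encoded = one_hot(["glass", "paper", "glass", "metal"])
```

An AutoML tool would typically try several such cleaning and encoding strategies and keep the one that gives the best downstream model, instead of asking you to pick one by hand.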

2. Avoiding the default parameters of the models

Searching for the best parameters requires knowledge of the Grid Search and Random Search methods (tuning techniques that attempt to find the optimal values of hyperparameters): you provide a list of candidate settings and then choose the best ones. This whole process can be time-consuming, and that is why AutoML is useful here.
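The difference between the two tuning techniques can be sketched in a few lines of plain Python. Everything here is illustrative: validation_error is a hypothetical stand-in for "train the model with these settings and measure its error", and the parameter names learning_rate and depth are assumptions made for the example.

```python
import itertools
import random

def validation_error(learning_rate, depth):
    """Hypothetical stand-in for training a model and measuring its validation error."""
    return (learning_rate - 0.1) ** 2 + (depth - 4) ** 2 / 100

def grid_search(grid):
    """Grid Search: try every combination of the listed values, keep the best."""
    names = list(grid)
    candidates = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]
    return min(candidates, key=lambda params: validation_error(**params))

def random_search(ranges, n_trials, seed=0):
    """Random Search: sample n_trials settings from the ranges, keep the best."""
    rng = random.Random(seed)
    candidates = [
        {"learning_rate": rng.uniform(*ranges["learning_rate"]),
         "depth": rng.randint(*ranges["depth"])}
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda params: validation_error(**params))

best_grid = grid_search({"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]})
best_random = random_search({"learning_rate": (0.01, 1.0), "depth": (2, 8)}, n_trials=50)
```

Grid Search evaluates all 9 combinations here, and its cost grows multiplicatively with every parameter you add. Random Search keeps the number of trials fixed, which is one reason it is often preferred when there are many hyperparameters.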

3. Simpler creation and management of models

Usually, the data scientist makes a list of candidate models according to the context and the problem. This requires deep knowledge and business expertise in the field of data. AutoML makes this step easier because it provides a pipeline that already includes a range of models suitable for most problems.
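As a rough picture of what automated model selection does, the sketch below scores a few candidate models with leave-one-out validation and keeps the one with the lowest error. The three toy models, the data and all function names are assumptions for the illustration; real AutoML libraries search over far richer model families.

```python
def fit_mean(xs, ys):
    """Baseline: always predict the average target."""
    avg = sum(ys) / len(ys)
    return lambda x: avg

def fit_nearest(xs, ys):
    """1-nearest neighbour: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def fit_line(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def loo_error(fit, xs, ys):
    """Leave-one-out mean squared error of one candidate model."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (model(xs[i]) - ys[i]) ** 2
    return total / len(xs)

candidates = {"mean": fit_mean, "nearest neighbour": fit_nearest, "line": fit_line}
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]  # toy data that follows a straight line
scores = {name: loo_error(fit, xs, ys) for name, fit in candidates.items()}
best_model = min(scores, key=scores.get)
```

On this toy data the straight line wins, as expected. The point is only that the choice is made by measured error on held-out points, not by the data scientist's intuition.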

4. Deep Learning (DL) Optimization

Deep Learning is a family of methods, loosely inspired by the human brain, for processing data and creating patterns to be used in the decision-making process. To apply it, we have to look for the best neural network architecture for the specific problem. For example, with Keras, an open-source library for Deep Learning, we need a lot of lines of code to build and tune an architecture by hand. However, thanks to Auto-Keras (an AutoML library for DL), we can now often obtain a better result with far fewer lines.
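The search for an architecture can be pictured with a small sketch in plain Python. It is not how Auto-Keras works internally: the search space (stacks of fully connected layers), the parameter budget and the toy_score function are all assumptions for the illustration. A real tool would train each candidate network and use its validation accuracy as the score.

```python
import itertools

def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Number of weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

def candidate_architectures(widths, max_depth):
    """Yield every stack of 1..max_depth hidden layers built from the given widths."""
    for depth in range(1, max_depth + 1):
        yield from itertools.product(widths, repeat=depth)

def search(n_inputs, n_outputs, widths, max_depth, budget, evaluate):
    """Return the best-scoring architecture whose parameter count fits the budget."""
    feasible = [arch for arch in candidate_architectures(widths, max_depth)
                if count_parameters(n_inputs, arch, n_outputs) <= budget]
    return max(feasible, key=evaluate)

def toy_score(arch):
    """Hypothetical scorer standing in for 'train the network and validate it'."""
    return sum(arch) - 5 * len(arch)

best = search(n_inputs=20, n_outputs=3, widths=[8, 16, 32],
              max_depth=2, budget=1000, evaluate=toy_score)
```

Even this tiny space contains 12 candidate architectures, and each one would need a full training run to evaluate. That cost is what architecture-search libraries try to manage on our behalf.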

Automated Machine Learning Libraries:

To explore the advantages mentioned above in more depth, here are some AutoML libraries, grouped by the phases they automate:

The following phases

Data Cleaning
Hyperparameter selection
Model selection

Use and source: The Machine Learning Box (MLBox)

The following phases

Model selection
Hyperparameter tuning
Feature engineering

Use and source: H2O Auto-ML and Auto-sklearn

The following phases

Feature Selection
Feature Preprocessing
Feature Construction
Model Selection
Parameter Optimization

Use and source: TPOT (Tree-based Pipeline Optimization Tool)

The following phases

Automated DL architecture

Use and source: Auto-Keras and Ludwig


Data, analytics and automated personalization

Now, how can you successfully leverage data to optimize your digital strategy? Knowing what products, brands, and pricing fall into the “convenient-and-valuable-alternative” category is one thing. Predicting what your customer will like is another.

Analytics solutions, combined with next best offer solutions, automate the analysis of products that are eligible for a customer, and use a combination of business rules, machine learning algorithms and accurate customer 360° data to suggest the best matching offer for that customer.

In an online world, it is becoming more and more important to make this happen in real time. As a marketer, you want to seize the moment your customer shows interest. You want to be able to quickly present an alternative product for something that is out of stock, or one that the customer perceives as more valuable. In addition, you absolutely need to respect data privacy and contact preferences. Automating this entire process then becomes crucial to achieving relevance at scale and ROI.

Besides supporting relevant personalization, data & analytics can also provide interesting findings. In the same study, Ipsos not only discovered that consumers prefer value over premium, they also confirmed that many consumers, after having bought a more valuable alternative, were very satisfied, and even kept buying these products after the recession. In this case, the crisis permanently changed their behavior.

Smart data analysts will be able to keep track of these consumer behaviors and closely follow the signals that indicate which ones stick. Sharing and acting on these insights will allow marketers to anticipate and prepare new personalization strategies. Measuring the outcomes of these strategies over time will confirm whether behavior has changed, creating valuable feedback in the search for new approaches.

AutoML won’t replace data scientists

AutoML won’t replace data scientists, so there is no need to worry, colleagues (at least for now). However, we can see it as a support for data scientists and a great way to open up this complex field to non-experts, so they can benefit from the Machine Learning experience.

In addition, for a better illustration, we may consider Kaggle (a Machine Learning competition community). There, humans have so far always won with models not generated by AutoML tools. As far as I know at least, AutoML has not won any data science contest.

So, is there going to be a day when the pipeline generated by AutoML wins such competitions?

To conclude, I hope this was easy to understand for all of you. I am at your disposal to discuss further and answer your questions in the comments section 😉.