Real-world Machine Learning applied at enterprise data involves complex feature engineering steps. Flat tables are seldom sufficient to represent all the information needed to produce insights or predictions. Although, flat tables are good at capturing the static features within the training data, certain dynamic features are difficult to represent with flat tables. For instance, if you are trying to predict the probability of a user to churn, the flat table can represent the static attributes of the user such as age, geography, device, etc., however, dynamic features such as browsing activity, payment history, etc., cannot be captured easily using flat tables. Although you can manually engineer features for flat tables, but it is time consuming, error prone, and often does not allow ML algorithms to capture maximum information. This is where Abacus.AI's Nested Features comes into play. At the training stage, our AutoML and deep learning models are designed to extract maximum information from Feature Groups and Nested Feature Groups.
Let's say that you have two datasets, "user_info" and "interaction_data". The dataset "user_info" has several static features such as age, city, memberid, etc., and the dataset "user_interaction_data" has features like memberid, itemid, item_kept, category, etc. When you upload/attach these two datasets to your project, corresponding feature groups will be created for them with their respective table names as entered by you. Now, to take advantage of the user data as well as all the interaction data that the user has with the items in the catalog, you would nest the user_info feature group with an additional feature such that the "interaction_data" feature group will be encapsulated within your new feature. This is performed with the help of a simple "Using Clause". You can also specify a "Where Clause" and an "Order by Clause" for the join. Thus, interaction_data will be treated as Nested Feature Group for the user_info feature group.
The first step is to click on the "Feature Groups" tab at the left navigation and select the "Features" sub-tab. Make sure that you have selected the user_info feature group at the top. Next, Click on the "Add Nested Feature" button. Enter the name of the feature group, the table to be referenced for creating the nested feature group (interaction_data for the current example), and the column/feature to be used under "Using Clause" as shown below:
Click on the "Add Nested Feature" button. You will find your nested feature as the bottom-most feature:
You can visualize the data within this feature group by materializing the feature group. Click on the "Materialize Latest Version" button and wait for the process to finish. Once it finishes, click on the "View" button under the "Data" column of the feature group version to visualize the data:
Notice the downward pointing arrow on the rows. Click on any of them to see the nested feature group data for that row:
Scroll right to see other columns:
This is how the nested feature group helps make it effective and efficient to represent the dynamic nature of the ML features and makes it possible for the ML algorithms to get the most out of the complex real-world data.