Feature Group Materialization

Learning Objectives

The learning objectives of this article are:

Understanding Materialization

When you create a feature group on the Abacus.AI platform, you will often see the “Materialize Latest Version” warning. It means that the SQL or Python definition of the feature group has changed since the last materialized version. To see those changes reflected in the data, click Materialize Latest Version.

Materialize Latest Version Warning

By default, the Abacus.AI platform performs lazy execution for materializations, meaning materialization occurs only when necessary.

For instance, consider a simple model that relies on a feature group for training. Even if the feature group hasn't been materialized, it will be automatically materialized when the model retrains, ensuring the model always uses the most up-to-date version.

This applies to:

Therefore, when Abacus.AI requires the latest version of a feature group to carry out a specific task, it will automatically materialize it. You don't need to manually materialize the feature group unless you wish to inspect the data yourself.

Even when using Python to load a feature group as a pandas DataFrame, the platform will automatically re-materialize it, ensuring you always have access to the latest version without any additional effort.

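# Example code to load a feature group as a pandas DataFrame
from abacusai import ApiClient
client = ApiClient("YOUR_API_KEY")  # placeholder API key
df = client.describe_feature_group("ID").load_as_pandas()  # "ID" is a placeholder feature group ID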
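
The lazy behavior described above can be pictured with a small toy model. This is only a sketch for intuition, not Abacus.AI's implementation: the class `LazyFeatureGroup` and its attributes are hypothetical names, and the "definition" is a plain Python callable standing in for a SQL or Python definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LazyFeatureGroup:
    """Toy stand-in for a feature group (hypothetical, not an Abacus.AI class)."""
    name: str
    definition: Callable[[], List[dict]]  # stands in for the SQL/Python definition
    definition_version: int = 1
    materialized_version: Optional[int] = None
    _data: Optional[List[dict]] = None

    @property
    def is_stale(self) -> bool:
        # True corresponds to the state in which the warning would be shown.
        return self.materialized_version != self.definition_version

    def update_definition(self, definition: Callable[[], List[dict]]) -> None:
        # Changing the definition does NOT recompute anything yet.
        self.definition = definition
        self.definition_version += 1

    def load(self) -> List[dict]:
        # Materialize only when a consumer actually needs the data.
        if self.is_stale:
            self._data = self.definition()
            self.materialized_version = self.definition_version
        return self._data


fg = LazyFeatureGroup("users", lambda: [{"id": 1}])
stale_before = fg.is_stale          # nothing has been materialized yet
rows = fg.load()                    # first load triggers materialization
fg.update_definition(lambda: [{"id": 1}, {"id": 2}])
stale_after_edit = fg.is_stale      # definition changed, data not yet refreshed
rows_after = fg.load()              # loading again picks up the new definition
```

In this sketch, editing the definition only marks the group as stale. The recomputation is deferred until `load()` is called, which mirrors the idea that a consumer such as model training or a DataFrame load triggers materialization.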

Referencing Feature Groups

Let's explore lazy execution in more detail by examining some advanced examples.

Below, we have a graph illustrating how feature groups reference each other. This can be implemented using either SQL or Python; the choice of language is not crucial. What is important is the dependency structure: Feature Group D relies on Feature Group C, which in turn depends on a combination of Feature Groups A and B.

Feature Group Dependency Graph

Consider the following scenario:

When the refresh schedule triggers, the datasets are updated with the latest data, and new versions of Feature Groups A and B are automatically materialized.

When the batch prediction process begins, Feature Group D is also automatically materialized. Feature Group C, however, is not explicitly materialized. Even so, Feature Group D will contain the same data as if Feature Group C had been materialized first, because Abacus.AI tracks all dependencies between feature groups. There is no need to materialize Feature Group C yourself; Abacus.AI handles this behind the scenes.
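
One way to picture this dependency tracking is the toy resolver below. It is a sketch under stated assumptions, not a description of how Abacus.AI works internally: the `DEFINITIONS` table, the `evaluate` and `materialize` helpers, and the list-based "data" are all hypothetical. A and B are treated as roots whose data was stored by the refresh, while C and D are defined as transforms over their upstream groups.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical definitions matching the graph in the text: D <- C <- (A, B).
# Each entry maps a feature group to (upstream names, transform over their data).
DEFINITIONS: Dict[str, Tuple[List[str], Callable[..., List[int]]]] = {
    "C": (["A", "B"], lambda a, b: a + b),
    "D": (["C"], lambda c: [x * 10 for x in c]),
}


def evaluate(name: str, stored: Dict[str, List[int]]) -> List[int]:
    """Compute `name` from the latest root data without storing intermediates."""
    if name not in DEFINITIONS:
        # Root groups (A, B) were already materialized by the refresh.
        return stored[name]
    parents, transform = DEFINITIONS[name]
    # Derived groups are always resolved through their definitions, so a
    # missing or outdated stored copy of an intermediate group is irrelevant.
    return transform(*(evaluate(p, stored) for p in parents))


def materialize(name: str, stored: Dict[str, List[int]]) -> List[int]:
    """Store a version only for the group that was actually requested."""
    stored[name] = evaluate(name, stored)
    return stored[name]


stored = {"A": [1, 2], "B": [3]}        # refresh materialized A and B
d_rows = materialize("D", stored)       # batch prediction requests D
c_was_stored = "C" in stored            # C was resolved, but never stored

stored["A"] = [1, 2, 4]                 # a later refresh updates A
d_rows_after_refresh = materialize("D", stored)
```

Here D ends up with data derived from the latest A and B even though no version of C is ever stored, which is the situation the scenario describes.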

In summary, you generally don't need to materialize feature groups manually unless you have a specific reason to, such as inspecting the data yourself. Abacus.AI manages materialization as needed. Don’t be alarmed if Feature Group C shows as not materialized: Feature Group D will still have the correct data, which is what matters.