Feature Groups

Learning Objectives

Feature Groups Overview: Understanding the concept of feature groups.
Data Preprocessing: Techniques for preprocessing data using feature groups.

What is a Feature Group?

A feature group is a versioned and managed data abstraction in Abacus.AI that is automatically created when you:

Upload data to the platform
Apply data transformations
Generate predictions using a model

Feature groups provide a structured way to manage and version your data throughout the machine learning lifecycle.

How Feature Groups are Created

Automatic Creation from Datasets

When you manually upload a dataset (e.g., "bike_sharing"), the platform automatically creates a feature group with an identical name. This feature group serves as the foundation for further data transformations.

Steps:

1. Upload your dataset to Abacus.AI
2. The platform automatically creates a corresponding feature group
3. The feature group inherits the name and structure of your dataset

Creating New Feature Groups

You can create new feature groups using either SQL or Python transformations on existing feature groups.

Using SQL

Steps:

1. Navigate to the Feature Groups screen
2. Click "Add Feature Group" in the top right corner
3. Select "SQL" as your transformation method
4. Provide a name for your new feature group
5. Write your SQL query against an existing feature group, referenced by its table name (e.g., SELECT * FROM bike_sharing_new)
6. Save and materialize the feature group
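
As a rough illustration of what a SQL transformation does, the sketch below runs a query against a table named bike_sharing_new and stores the result as a new table. It uses Python's built-in sqlite3 module as a stand-in for the platform, so the SQL dialect may differ from what Abacus.AI accepts. The column names (ride_date, temp_c, rentals), the sample rows, and the output name bike_sharing_enriched are hypothetical.

```python
import sqlite3

# Stand-in for the platform: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Hypothetical source table; columns and rows are illustrative only.
conn.execute(
    "CREATE TABLE bike_sharing_new (ride_date TEXT, temp_c REAL, rentals INTEGER)"
)
conn.executemany(
    "INSERT INTO bike_sharing_new VALUES (?, ?, ?)",
    [("2024-01-01", 4.5, 120), ("2024-01-02", 6.0, 150), ("2024-01-03", -1.0, 80)],
)

# The transformation query: the part you would write in the SQL editor.
transformation_sql = """
SELECT ride_date, temp_c, rentals, rentals >= 100 AS busy_day
FROM bike_sharing_new
WHERE temp_c IS NOT NULL
"""

# "Materializing" in this sketch just means running the query and
# storing its result as a new table.
conn.execute(f"CREATE TABLE bike_sharing_enriched AS {transformation_sql}")
rows = conn.execute(
    "SELECT * FROM bike_sharing_enriched ORDER BY ride_date"
).fetchall()

for row in rows:
    print(row)
```

The new table keeps every source column and adds one derived column, which is the typical shape of a simple transformation.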

Using Python

Steps:

1. Navigate to the Feature Groups screen
2. Click "Add Feature Group" in the top right corner
3. Select "Python" as your transformation method
4. Write your Python transformation code
5. Save and materialize the feature group
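
A Python transformation is, conceptually, a function that takes existing data and returns a new table. The sketch below shows that idea with plain lists of dictionaries so it runs with the standard library alone. The function name add_weekend_flag, the column names, and the sample rows are hypothetical, and the exact function signature and data types the platform expects are not covered here.

```python
from datetime import date


def add_weekend_flag(rows):
    """Return new rows with a derived is_weekend column; input is not modified."""
    transformed = []
    for row in rows:
        day = date.fromisoformat(row["ride_date"])
        # weekday() is 0 for Monday through 6 for Sunday.
        transformed.append({**row, "is_weekend": day.weekday() >= 5})
    return transformed


# Hypothetical rows standing in for the bike_sharing feature group.
bike_sharing = [
    {"ride_date": "2024-01-05", "rentals": 120},  # a Friday
    {"ride_date": "2024-01-06", "rentals": 310},  # a Saturday
]

result = add_weekend_flag(bike_sharing)
for row in result:
    print(row)
```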

Considerations:

SQL queries can be as complex as needed
A single transformation can reference multiple feature groups at once
Feature groups can depend on one another, with one feature group's code building on the output of another
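
The last two considerations can be sketched together: one query joins two sources, and a second query builds on the first result. This again uses sqlite3 as a stand-in for the platform. The stations table, all column names, and the output names rentals_with_city and rentals_by_city are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical source tables standing in for two feature groups.
conn.execute("CREATE TABLE bike_sharing (station_id INTEGER, rentals INTEGER)")
conn.execute("CREATE TABLE stations (station_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO bike_sharing VALUES (?, ?)", [(1, 120), (1, 80), (2, 45)]
)
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)", [(1, "Oslo"), (2, "Bergen")]
)

# First transformation: references two sources in one query.
conn.execute(
    """
    CREATE TABLE rentals_with_city AS
    SELECT b.station_id, s.city, b.rentals
    FROM bike_sharing AS b
    JOIN stations AS s ON s.station_id = b.station_id
    """
)

# Second transformation: works on top of the first one's output.
conn.execute(
    """
    CREATE TABLE rentals_by_city AS
    SELECT city, SUM(rentals) AS total_rentals
    FROM rentals_with_city
    GROUP BY city
    """
)

totals = dict(conn.execute("SELECT city, total_rentals FROM rentals_by_city"))
print(totals)
```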

Understanding Feature Group Versions

Feature groups maintain multiple versions for tracking changes over time. You can view all versions by scrolling to the bottom of the feature group details page.

Why Multiple Versions Exist

There are two main reasons for version changes:

Code Changes: When users modify the SQL or Python transformation code
Upstream Data Changes: When the source data that feeds into the feature group is updated

How Versioning Works:

Abacus.AI automatically detects changes in code or upstream data
The platform prompts you to rematerialize the feature group
Rematerialization uses the latest version of both your code and data
You don't need to manage versions manually; just keep your code and data up to date
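
One way to picture the two triggers is a check that compares the latest materialized version against the current code and the current upstream data. The sketch below is a conceptual model only, not a description of how Abacus.AI detects changes. The Version class, the fingerprint helper, and the integer upstream version are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """What a materialized version was built from (hypothetical model)."""
    code_hash: str
    upstream_version: int


def fingerprint(code: str) -> str:
    """Hash the transformation code so edits can be detected."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def needs_rematerialization(latest: Version, code: str, upstream_version: int) -> bool:
    """True if either the code or the upstream data changed since `latest`."""
    code_changed = latest.code_hash != fingerprint(code)
    data_changed = latest.upstream_version != upstream_version
    return code_changed or data_changed


code_v1 = "SELECT * FROM bike_sharing"
latest = Version(fingerprint(code_v1), upstream_version=1)

unchanged = needs_rematerialization(latest, code_v1, 1)
after_code_edit = needs_rematerialization(latest, "SELECT ride_date FROM bike_sharing", 1)
after_data_update = needs_rematerialization(latest, code_v1, 2)

print(unchanged, after_code_edit, after_data_update)
```

Either kind of change alone is enough to make the existing version stale, which matches the two reasons listed above.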

Feature Group Pages and Functionality

Features Tab

The Features tab displays all columns in your feature group table.

What You'll See:

Column names
Feature types (numerical, categorical, text, etc.)
Feature mappings (discussed in separate documentation)

Purpose:

Understand the structure of your data
Verify column types are correct
Review feature mappings for model training

Explore Tab

The Explore tab provides aggregate statistics for each feature in your feature group.

What You'll See:

Statistical summaries for each column
Distribution information
Data quality metrics

Use Cases:

Find outliers in your data
Verify data loaded correctly
Perform basic sanity checks
Understand data distributions
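
To make the outlier use case concrete, the sketch below computes the kind of summary statistics such a view might show and flags values outside 1.5 times the interquartile range. The rentals values are hypothetical, and the IQR rule is a common convention chosen for illustration, not necessarily what the Explore tab uses.

```python
import statistics

# Hypothetical values for one numerical column; the last one looks suspicious.
rentals = [120, 135, 128, 142, 131, 125, 900]

mean = statistics.mean(rentals)
stdev = statistics.stdev(rentals)

# quantiles(n=4) returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(rentals, n=4)
iqr = q3 - q1

# Common rule of thumb: flag values more than 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [value for value in rentals if value < lower or value > upper]

print(f"mean={mean:.1f} stdev={stdev:.1f} median={median}")
print("outliers:", outliers)
```

A single extreme value pulls the mean well above the median, which is often the first hint that a column needs a closer look.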

Materialized Data Tab

The Materialized Data tab allows you to view and query the actual data in your feature group.

Capabilities:

View data row by row
Run custom SQL queries
Use the text interface for query assistance

How to Query:

1. Navigate to the Materialized Data tab
2. Type your SQL query in the query editor, or use the text interface for help constructing one
3. Execute the query and view the results
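
The queries you run here are usually short checks on the materialized rows. The sketch below shows two examples, counting missing values in a column and listing the highest values, using sqlite3 as a stand-in for the platform's query editor. The table contents and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical materialized data; one row has a missing temperature.
conn.execute("CREATE TABLE bike_sharing (ride_date TEXT, temp_c REAL, rentals INTEGER)")
conn.executemany(
    "INSERT INTO bike_sharing VALUES (?, ?, ?)",
    [("2024-01-01", 4.5, 120), ("2024-01-02", None, 150), ("2024-01-03", -1.0, 80)],
)

# Check 1: how many rows are missing a temperature reading?
missing_temp = conn.execute(
    "SELECT COUNT(*) FROM bike_sharing WHERE temp_c IS NULL"
).fetchone()[0]

# Check 2: which two days had the most rentals?
busiest = conn.execute(
    "SELECT ride_date, rentals FROM bike_sharing ORDER BY rentals DESC LIMIT 2"
).fetchall()

print("rows missing temp_c:", missing_temp)
print("busiest days:", busiest)
```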

Feature Group Lineage

Feature groups can have complex dependencies where one feature group builds upon another. The platform visualizes these relationships through feature group lineage.

Example Dependency Chain:

Original dataset → Feature Group A
Feature Group A → Feature Group B (with transformations)
Feature Group B → Feature Group C (with additional transformations)

Benefits:

Track data flow through your pipeline
Understand dependencies between feature groups
Debug issues by tracing back through the lineage
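
The example chain above can be modeled as a small dependency graph. The sketch below, using Python's standard graphlib module, derives a valid build order and traces everything upstream of a given feature group, which is the "trace back" step when debugging. The node names and the upstream_of helper are hypothetical, and this is not how the platform stores lineage.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of nodes it depends on directly.
lineage = {
    "feature_group_a": {"original_dataset"},
    "feature_group_b": {"feature_group_a"},
    "feature_group_c": {"feature_group_b"},
}


def upstream_of(name, graph):
    """Return every node that `name` depends on, nearest first."""
    found = []
    stack = list(graph.get(name, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(graph.get(node, ()))
    return found


# Dependencies come before the nodes that depend on them.
build_order = list(TopologicalSorter(lineage).static_order())
ancestors = upstream_of("feature_group_c", lineage)

print("build order:", build_order)
print("upstream of feature_group_c:", ancestors)
```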
