Feature Groups

Learning Objectives

  • Feature Groups Overview: Understand what feature groups are and how they are created.
  • Data Preprocessing: Learn techniques for preprocessing data using feature groups.

What is a Feature Group?

A feature group is a versioned and managed data abstraction in Abacus.AI that is automatically created when you:

  • Upload data to the platform
  • Apply data transformations
  • Generate predictions using a model

Feature groups provide a structured way to manage and version your data throughout the machine learning lifecycle.


How Feature Groups are Created

Automatic Creation from Datasets

When you manually upload a dataset (e.g., "bike_sharing"), the platform automatically creates a feature group with an identical name. This feature group serves as the foundation for further data transformations.

Steps:

  1. Upload your dataset to Abacus.AI
  2. The platform automatically creates a corresponding feature group
  3. The feature group inherits the name and structure of your dataset

Creating New Feature Groups

You can create new feature groups using either SQL or Python transformations on existing feature groups.

Using SQL

Steps:

  1. Navigate to the Feature Groups screen
  2. Click "Add Feature Group" in the top right corner
  3. Select "SQL" as your transformation method
  4. Provide a name for your new feature group
  5. Write your SQL query (e.g., SELECT * FROM bike_sharing_new)
  6. Save and materialize the feature group
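
As a rough local illustration of step 5, the sketch below runs a SQL transformation against an in-memory SQLite table that stands in for a source feature group. SQLite, the helper `run_transformation`, the column names (`day`, `rentals`), and the sample rows are all assumptions made for this example; they are not part of Abacus.AI, whose SQL dialect may differ.

```python
import sqlite3

# Hypothetical transformation: keep days with at least one rental and
# derive a simple "is_busy" feature. Column names are assumed.
TRANSFORM_SQL = """
SELECT day,
       rentals,
       CASE WHEN rentals >= 500 THEN 1 ELSE 0 END AS is_busy
FROM bike_sharing
WHERE rentals > 0
ORDER BY day
"""


def run_transformation(rows):
    """Run TRANSFORM_SQL on an in-memory stand-in for the source table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE bike_sharing (day TEXT, rentals INTEGER)")
        conn.executemany("INSERT INTO bike_sharing VALUES (?, ?)", rows)
        return conn.execute(TRANSFORM_SQL).fetchall()
    finally:
        conn.close()


# Made-up sample rows, used only to exercise the query.
sample_rows = [("2024-01-01", 620), ("2024-01-02", 0), ("2024-01-03", 310)]
transformed = run_transformation(sample_rows)
```

Trying a query on a small sample like this can catch syntax and logic mistakes before the feature group is saved and materialized.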

Using Python

Steps:

  1. Navigate to the Feature Groups screen
  2. Click "Add Feature Group" in the top right corner
  3. Select "Python" as your transformation method
  4. Write your Python transformation code
  5. Save and materialize the feature group
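
To make step 4 more concrete, here is a minimal sketch of what a transformation might look like: a function that takes the rows of an existing feature group and returns new rows with an added feature. The function signature the platform expects is not described here, so this sketch uses plain lists of dictionaries; the function name `add_weekend_flag`, the column names, and the sample rows are hypothetical.

```python
from datetime import date


def add_weekend_flag(rows):
    """Return copies of the input rows with an added `is_weekend` feature."""
    transformed = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows stay untouched
        # weekday() is 0 for Monday through 6 for Sunday
        new_row["is_weekend"] = date.fromisoformat(row["day"]).weekday() >= 5
        transformed.append(new_row)
    return transformed


# Made-up sample rows standing in for an existing feature group.
source_rows = [
    {"day": "2024-01-05", "rentals": 620},  # a Friday
    {"day": "2024-01-06", "rentals": 310},  # a Saturday
]
result_rows = add_weekend_flag(source_rows)
```

Keeping the transformation a pure function of its inputs makes it easy to test outside the platform.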

Considerations:

  • A SQL transformation can be as complex as the task requires
  • A single transformation can reference multiple feature groups at once
  • Feature groups can depend on one another: the code for one feature group can build on the output of another
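
The last two considerations can be sketched locally with SQLite views standing in for feature groups. This is only an analogy: the tables, views, column names, and sample values below are invented for illustration, and nothing here uses the Abacus.AI API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bike_sharing (day TEXT, rentals INTEGER);
CREATE TABLE weather (day TEXT, temp_c REAL);
INSERT INTO bike_sharing VALUES ('2024-01-01', 620), ('2024-01-02', 310);
INSERT INTO weather VALUES ('2024-01-01', 12.5), ('2024-01-02', 3.0);

-- References two sources at once.
CREATE VIEW bike_weather AS
SELECT b.day, b.rentals, w.temp_c
FROM bike_sharing AS b
JOIN weather AS w ON w.day = b.day;

-- Builds on bike_weather, so it depends on both tables indirectly.
CREATE VIEW warm_days AS
SELECT day, rentals
FROM bike_weather
WHERE temp_c >= 10;
""")

warm_days = conn.execute(
    "SELECT day, rentals FROM warm_days ORDER BY day"
).fetchall()
conn.close()
```

Here `warm_days` plays the role of a feature group whose code works on top of another one (`bike_weather`), which in turn combines two sources.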

Understanding Feature Group Versions

Feature groups maintain multiple versions for tracking changes over time. You can view all versions by scrolling to the bottom of the feature group details page.

Why Multiple Versions Exist

There are two main reasons for version changes:

  1. Code Changes: When users modify the SQL or Python transformation code
  2. Upstream Data Changes: When the source data that feeds into the feature group is updated

How Versioning Works:

  • Abacus.AI automatically detects changes in code or upstream data
  • The platform prompts you to rematerialize the feature group
  • Rematerialization uses the latest version of both your code and data
  • You don't need to manage versioning manually; just keep your code and data up to date
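
How the platform detects changes internally is not described here. Purely as a mental model, the sketch below shows one hypothetical way to decide that a feature group is stale: compare a hash of the current transformation code and the current upstream version against what was recorded at the last materialization. `Snapshot`, `fingerprint`, and `stale_reasons` are names invented for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """What was recorded when the feature group was last materialized."""
    code_hash: str
    upstream_version: str


def fingerprint(code: str) -> str:
    """Hash the transformation code so any edit changes the fingerprint."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def stale_reasons(last: Snapshot, code: str, upstream_version: str) -> List[str]:
    """Return the reasons a rematerialization would be needed (empty if none)."""
    reasons = []
    if fingerprint(code) != last.code_hash:
        reasons.append("code changed")
    if upstream_version != last.upstream_version:
        reasons.append("upstream data changed")
    return reasons


# Hypothetical state: materialized once with this query and upstream version "v1".
last_run = Snapshot(fingerprint("SELECT * FROM bike_sharing"), "v1")
unchanged = stale_reasons(last_run, "SELECT * FROM bike_sharing", "v1")
changed = stale_reasons(last_run, "SELECT day FROM bike_sharing", "v2")
```

The two entries in `changed` correspond to the two reasons for new versions listed above: a code change and an upstream data change.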

Feature Group Pages and Functionality

Features Tab

The Features tab displays all columns in your feature group table.

What You'll See:

  • Column names
  • Feature types (numerical, categorical, text, etc.)
  • Feature mappings (discussed in separate documentation)

Purpose:

  • Understand the structure of your data
  • Verify column types are correct
  • Review feature mappings for model training

Explore Tab

The Explore tab provides aggregate statistics for each feature in your feature group.

What You'll See:

  • Statistical summaries for each column
  • Distribution information
  • Data quality metrics

Use Cases:

  • Find outliers in your data
  • Verify data loaded correctly
  • Perform basic sanity checks
  • Understand data distributions
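
To give a feel for the kind of checks these statistics support, the sketch below computes a few summary values for one numeric column and flags likely outliers. It is a local illustration, not the platform's implementation: the helper names, the sample values, and the outlier rule (more than 5 median absolute deviations from the median) are all assumptions chosen for this example.

```python
import statistics


def summarize(values):
    """Basic per-column summary; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def flag_outliers(values, threshold=5.0):
    """Flag values far from the median, measured in median absolute deviations."""
    present = [v for v in values if v is not None]
    center = statistics.median(present)
    mad = statistics.median([abs(v - center) for v in present])
    # Note: if mad is 0, every value that differs from the median is flagged.
    return [v for v in present if abs(v - center) > threshold * mad]


# Made-up column with one missing value and one suspicious spike.
rentals = [310, 620, 450, 500, 9000, None]
summary = summarize(rentals)
outliers = flag_outliers(rentals)
```

A large gap between the mean and the median, as in this sample, is often the first hint that a column contains outliers or a loading error.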

Materialized Data Tab

The Materialized Data tab allows you to view and query the actual data in your feature group.

Capabilities:

  • View data row by row
  • Run custom SQL queries
  • Use the text interface for query assistance

How to Query:

  1. Navigate to the Materialized Data tab
  2. Type your SQL query in the query editor
  3. Alternatively, use the text interface for help constructing queries
  4. Execute and view results

Feature Group Lineage

Feature groups can have complex dependencies where one feature group builds upon another. The platform visualizes these relationships through feature group lineage.

Example Dependency Chain:

  • Original dataset → Feature Group A
  • Feature Group A → Feature Group B (with transformations)
  • Feature Group B → Feature Group C (with additional transformations)

Benefits:

  • Track data flow through your pipeline
  • Understand dependencies between feature groups
  • Debug issues by tracing back through the lineage
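
The dependency chain above can be modeled as a small graph, which is a convenient way to reason about tracing issues upstream and about the order in which feature groups would need to be refreshed. The sketch below is a local illustration using Python's standard `graphlib` module (Python 3.9+); the node names and the helpers `upstream_of` and `materialization_order` are hypothetical and unrelated to the platform's API.

```python
from graphlib import TopologicalSorter

# Maps each node to the set of nodes it depends on (its direct upstream sources).
LINEAGE = {
    "feature_group_a": {"original_dataset"},
    "feature_group_b": {"feature_group_a"},
    "feature_group_c": {"feature_group_b"},
}


def upstream_of(node, lineage):
    """Return every ancestor of `node` by walking the dependency edges."""
    ancestors = []
    seen = set()
    stack = list(lineage.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        ancestors.append(current)
        stack.extend(lineage.get(current, ()))
    return ancestors


def materialization_order(lineage):
    """Order nodes so that every node comes after everything it depends on."""
    return list(TopologicalSorter(lineage).static_order())


trace = upstream_of("feature_group_c", LINEAGE)
order = materialization_order(LINEAGE)
```

When a problem shows up in Feature Group C, `trace` lists the places to inspect, and `order` shows the sequence in which a fix would need to propagate.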