Connect Your Athena to Abacus.AI
To integrate Athena with Abacus.AI, you need to set up the connector and provide the necessary IAM permissions so Abacus.AI can assume a role in your AWS account to query your data.
- AWS Prerequisites Setup
- Setup Instructions
- User Connector Flow (RBAC)
- How to Use the Athena Connector
- Troubleshooting and FAQ
AWS Prerequisites Setup​
Before configuring the Athena connector in Abacus.AI, ensure the following AWS resources exist in your account. If you already have these set up, skip to Setup Instructions.
1. Create S3 Buckets​
You need two S3 buckets (or prefixes within the same bucket):
- Data bucket: Stores your source data files (CSV, Parquet, JSON, etc.) that Athena will query.
- Results bucket: Stores Athena query results. Athena writes output files here after each query execution.
Create these via the S3 Console or AWS CLI:
aws s3 mb s3://my-athena-data-bucket --region us-east-2
aws s3 mb s3://my-athena-results-bucket --region us-east-2
2. Create a Glue Database​
Athena uses the AWS Glue Data Catalog as its metastore. Create a Glue database to hold your table definitions:
- Go to the AWS Glue Console → Databases → Add database.
- Enter a database name (e.g. my_analytics_db) and click Create database.
Or via CLI:
aws glue create-database --database-input '{"Name": "my_analytics_db"}' --region us-east-2
Then create tables pointing to your S3 data using Glue crawlers, the Glue Console, or DDL statements in the Athena query editor.
Glue tables are typically EXTERNAL_TABLE type — Athena does not own the underlying S3 data. Dropping a Glue table removes only the metadata definition; it does not delete the S3 files.
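As an illustration of the DDL route, the sketch below assembles a CREATE EXTERNAL TABLE statement that could be pasted into the Athena query editor. The table name `events`, its columns, and the `events/` prefix are hypothetical; only the database and bucket names come from the examples above.

```python
def build_external_table_ddl(database, table, columns, location, fmt="PARQUET"):
    """Return a CREATE EXTERNAL TABLE statement as a string.

    columns is a list of (name, athena_type) pairs; location is the S3 folder
    holding the data files. Nothing is sent to AWS here.
    """
    if not location.startswith("s3://"):
        raise ValueError("location must be an s3:// path")
    # Athena DDL quotes identifiers with backticks.
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        f")\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{location}'"
    )


ddl = build_external_table_ddl(
    "my_analytics_db",
    "events",  # hypothetical table name
    [("event_id", "string"), ("event_time", "timestamp"), ("amount", "double")],
    "s3://my-athena-data-bucket/events/",  # hypothetical prefix in the data bucket
)
print(ddl)
```

Because the table is external, running the generated statement only registers metadata in Glue; the files under the S3 location are left untouched.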
3. Create an Athena Workgroup (Optional but Recommended)​
Workgroups let you separate query execution settings, control costs, and track usage:
- Go to the Athena Console → Workgroups → Create workgroup.
- Configure the workgroup:
  - Name: e.g. abacus-workgroup
  - Query result location: s3://my-athena-results-bucket/query-results/
  - Enforce workgroup configuration: Enable this to prevent individual users or roles from overriding the output location. This ensures all query results go to the designated bucket.
  - Bytes scanned cutoff per query (optional): Set a maximum byte limit per query (e.g. 10 GB) as a cost guard. Queries exceeding this limit are automatically cancelled.
  - Publish CloudWatch metrics: Enable this for monitoring query performance and costs.
Enforcing workgroup configuration (EnforceWorkGroupConfiguration: true) is a best practice — it ensures consistent output routing and prevents accidental writes to unintended S3 locations.
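The console settings above map onto the request shape of the Athena CreateWorkGroup API. The following is a minimal sketch that only builds that request as a dictionary; the helper name is hypothetical, and treating "10 GB" as 10 GiB is an assumption.

```python
import json

GIB = 1024 ** 3  # assumption: the 10 GB example is interpreted as 10 GiB


def build_workgroup_request(name, output_location, cutoff_bytes=None):
    """Build a CreateWorkGroup-style request dict (no AWS call is made)."""
    configuration = {
        "ResultConfiguration": {"OutputLocation": output_location},
        "EnforceWorkGroupConfiguration": True,    # block per-query overrides
        "PublishCloudWatchMetricsEnabled": True,  # emit query metrics
    }
    if cutoff_bytes is not None:
        # Per-query scan limit in bytes; queries above it are cancelled.
        configuration["BytesScannedCutoffPerQuery"] = cutoff_bytes
    return {"Name": name, "Configuration": configuration}


request = build_workgroup_request(
    "abacus-workgroup",
    "s3://my-athena-results-bucket/query-results/",
    cutoff_bytes=10 * GIB,
)
print(json.dumps(request, indent=2))
```

If you use boto3, a dictionary of this shape could be passed as keyword arguments to the Athena client's `create_work_group` call; that step is not shown here.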
Setup Instructions
1. Create the IAM Role in AWS:
Before configuring the connector in Abacus.AI, create an IAM role in your AWS account that Abacus.AI will assume.
a. Create the IAM Role with a Trust Policy:
Go to the AWS IAM Console, navigate to Roles, and click Create role. Select Custom trust policy and paste the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::448970459817:root"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
b. Name and Create the Role:
Click Next, skip adding permissions for now (you will attach the permissions policy after creating the connector), give the role a name (e.g. AbacusAthenaRole), and click Create role. Copy the Role ARN; you will need it when you fill in the connector form.
2. Access Abacus.AI Connected Services Dashboard:
- Go to the Abacus.AI Connected Services Dashboard. This is where you manage all your connected services.
- At the top-right of the page, click Add New Connector and select Athena.
3. Enter Athena Connector Details:
- Fill in the connector configuration form:
  - AWS Region (required): The AWS region where your Athena and Glue resources are configured (e.g. us-east-2).
  - Glue Database Name (required): The name of the AWS Glue database containing the tables you want to query.
  - IAM Role ARN (required): The ARN of the IAM role you created in step 1.
  - S3 Data Bucket (required): The S3 bucket where your Athena source data resides (e.g. my-data-bucket).
  - Athena Workgroup (optional): The Athena workgroup to use for queries. Defaults to primary if not specified.
  - S3 Output Location (optional): An S3 path where Athena query results will be stored (e.g. s3://my-results-bucket/query-results/). If not specified, the workgroup's default output location is used.
- Click Create.
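Before submitting the form, it can help to sanity-check the values locally. The sketch below is not part of Abacus.AI; the class, field names, and regular expressions are assumptions made for illustration (for example, it only accepts ARNs in the standard `aws` partition and expects the data bucket as a bare bucket name, as in the example above).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed formats: a 12-digit account ID and the characters IAM allows in role names.
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


@dataclass
class AthenaConnectorForm:
    aws_region: str
    glue_database: str
    iam_role_arn: str
    s3_data_bucket: str
    athena_workgroup: str = "primary"         # default named in the form description
    s3_output_location: Optional[str] = None  # falls back to the workgroup default

    def problems(self):
        """Return a list of human-readable issues; empty means the values look plausible."""
        issues = []
        if not REGION_RE.match(self.aws_region):
            issues.append(f"AWS Region looks malformed: {self.aws_region!r}")
        if not self.glue_database:
            issues.append("Glue Database Name is required")
        if not ROLE_ARN_RE.match(self.iam_role_arn):
            issues.append("IAM Role ARN should look like arn:aws:iam::<account_id>:role/<name>")
        if not self.s3_data_bucket or self.s3_data_bucket.startswith("s3://"):
            issues.append("S3 Data Bucket should be a bare bucket name")
        if self.s3_output_location and not self.s3_output_location.startswith("s3://"):
            issues.append("S3 Output Location should start with s3://")
        return issues


form = AthenaConnectorForm(
    aws_region="us-east-2",
    glue_database="my_analytics_db",
    iam_role_arn="arn:aws:iam::123456789012:role/AbacusAthenaRole",  # placeholder account ID
    s3_data_bucket="my-data-bucket",
)
print(form.problems())
```

These checks only catch formatting slips; whether the role can actually be assumed is confirmed later by the Verify Now step.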
4. Update the Trust Policy and Attach the Permissions Policy:
After creating the connector, you will see a pop-up with the trust policy and permissions policy pre-populated with the values from your connector form.
Go back to the AWS IAM Console and find the role you created in step 1.
a. Update the Trust Policy:
Click the Trust relationships tab, then Edit trust policy. Replace the existing trust policy with the one shown in the Abacus.AI instructions pop-up. It will include the node role ARN specific to your account:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::448970459817:root",
          "<node_role_arn>"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Note: The <node_role_arn> value will be automatically filled in the instructions pop-up. Copy the trust policy directly from there.
b. Attach the Permissions Policy:
Click Add permissions → Create inline policy → JSON, and paste the permissions policy from the instructions pop-up. The policy grants the minimum permissions needed for Abacus.AI to query your Athena data:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaQueryExecution",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup",
        "athena:ListWorkGroups"
      ],
      "Resource": "arn:aws:athena:<region>:<account_id>:workgroup/*"
    },
    {
      "Sid": "AthenaMetadataBrowsing",
      "Effect": "Allow",
      "Action": [
        "athena:ListDataCatalogs",
        "athena:ListDatabases",
        "athena:ListTableMetadata",
        "athena:GetTableMetadata"
      ],
      "Resource": "arn:aws:athena:<region>:<account_id>:datacatalog/*"
    },
    {
      "Sid": "GlueCatalogRead",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "arn:aws:glue:<region>:<account_id>:catalog",
        "arn:aws:glue:<region>:<account_id>:database/<database_name>",
        "arn:aws:glue:<region>:<account_id>:table/<database_name>/*"
      ]
    },
    {
      "Sid": "S3DataRead",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<data_bucket>",
        "arn:aws:s3:::<data_bucket>/*"
      ]
    },
    {
      "Sid": "S3ResultsReadWrite",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::<output_bucket>",
        "arn:aws:s3:::<output_bucket>/*"
      ]
    },
    {
      "Sid": "LakeFormationDataAccess",
      "Effect": "Allow",
      "Action": [
        "lakeformation:GetDataAccess"
      ],
      "Resource": "*"
    }
  ]
}
Tip: Both the trust policy and permissions policy displayed in the Abacus.AI instructions pop-up will have all placeholders automatically replaced with the values from your connector form. You can copy and use them directly.
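If you instead start from the template shown in this guide (for example, to keep the policy in version control), the placeholders have to be filled in by hand. The sketch below is one hypothetical way to do that and to fail loudly when a placeholder is missed; the function name and the shortened Glue statement are illustrative only.

```python
import json
import re

PLACEHOLDER_RE = re.compile(r"<[a-z_]+>")  # matches tokens such as <region>

# Shortened excerpt of the GlueCatalogRead statement, used here as sample input.
GLUE_STATEMENT_TEMPLATE = """
{
  "Sid": "GlueCatalogRead",
  "Effect": "Allow",
  "Action": ["glue:GetDatabase", "glue:GetTable"],
  "Resource": [
    "arn:aws:glue:<region>:<account_id>:catalog",
    "arn:aws:glue:<region>:<account_id>:database/<database_name>",
    "arn:aws:glue:<region>:<account_id>:table/<database_name>/*"
  ]
}
"""


def fill_placeholders(policy_text, values):
    """Substitute <name> tokens and parse the result as JSON.

    Raises ValueError if any placeholder is left unfilled.
    """
    for key, value in values.items():
        policy_text = policy_text.replace(f"<{key}>", value)
    leftover = sorted(set(PLACEHOLDER_RE.findall(policy_text)))
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return json.loads(policy_text)


statement = fill_placeholders(
    GLUE_STATEMENT_TEMPLATE,
    {
        "region": "us-east-2",
        "account_id": "123456789012",  # placeholder account ID
        "database_name": "my_analytics_db",
    },
)
print(json.dumps(statement, indent=2))
```

Parsing the result with `json.loads` also catches copy-paste damage such as a missing brace before the policy reaches the IAM console.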
5. Verify Connector Setup:
- After attaching the permissions policy, return to the Abacus.AI connector configuration and click Verify Now.
- You should see a confirmation that your connector has been verified. Your Athena connector is now set up and ready to use.
User Connector Flow (RBAC)​
The Athena connector supports user-level permissions via Amazon Cognito. When RBAC is enabled, each user authenticates individually through Cognito OAuth, and their queries run with their own AWS credentials obtained via a Cognito Identity Pool. This ensures that data access is governed by each user's specific permissions.
Prerequisites for RBAC​
Before enabling RBAC, you need to set up the following AWS Cognito resources:
- Amazon Cognito User Pool: A user directory for managing user identities.
- Amazon Cognito App Client: An app client within the User Pool configured for OAuth 2.0.
- Amazon Cognito Identity Pool: A federated identity pool that maps Cognito users to IAM roles for obtaining temporary AWS credentials.
Enable RBAC​
To create an Athena connector with RBAC, toggle Enable RBAC to On in the connector configuration form. This reveals additional fields:
- Cognito Domain (required): The Cognito hosted UI domain prefix (e.g. my-app-auth).
- Cognito App Client ID (required): The App Client ID from your Cognito User Pool.
- Cognito User Pool ID (required): The User Pool ID (e.g. us-east-2_aBcDeFgHi).
- Cognito Identity Pool ID (required): The Identity Pool ID for obtaining AWS credentials (e.g. us-east-2:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
Fill in these fields along with the standard Athena configuration fields (AWS Region, Glue Database Name, IAM Role ARN, etc.) and click Create.
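The Cognito identifiers follow recognizable shapes, so a quick local check can catch a swapped User Pool ID and Identity Pool ID before you click Create. This is a minimal sketch with hypothetical helper names; the patterns reflect the general ID formats (`<region>_<id>` and `<region>:<guid>`), and the sample Identity Pool ID is made up.

```python
import re

USER_POOL_ID_RE = re.compile(r"^[\w-]+_[0-9a-zA-Z]+$")    # e.g. us-east-2_aBcDeFgHi
IDENTITY_POOL_ID_RE = re.compile(r"^[\w-]+:[0-9a-f-]+$")  # e.g. us-east-2:<guid>


def hosted_ui_base_url(domain_prefix, region):
    """Base URL of a Cognito hosted UI that uses a prefix domain."""
    return f"https://{domain_prefix}.auth.{region}.amazoncognito.com"


def check_cognito_ids(user_pool_id, identity_pool_id):
    """Return a list of format problems; empty means both IDs look plausible."""
    issues = []
    if not USER_POOL_ID_RE.match(user_pool_id):
        issues.append("User Pool ID should look like <region>_<id>")
    if not IDENTITY_POOL_ID_RE.match(identity_pool_id):
        issues.append("Identity Pool ID should look like <region>:<guid>")
    return issues


print(hosted_ui_base_url("my-app-auth", "us-east-2"))
print(check_cognito_ids(
    "us-east-2_aBcDeFgHi",
    "us-east-2:12345678-90ab-cdef-1234-567890abcdef",  # made-up GUID
))
```

Note that the connector form asks only for the domain prefix; the full URL is shown here to make clear which part of the hosted UI address that prefix is.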
Navigate to the Athena User Connector documentation for detailed instructions on setting up user-level authentication.
How to Use the Athena Connector​
Once the Athena connector is set up, you can fetch data to train models in Abacus.AI.
1. Create a New Project:
- Create a new project and select the use case, then go to the "Datasets" tab and click "Create Dataset".
2. Create New Dataset:
- Click on "Create New".
3. Name the Dataset:
- Name the dataset, select the data type, and click "Continue".
4. Read from External Service:
- Choose "Read from External Service" and select your Athena connector under "Connected Application Connectors".
5. Enter Dataset Details:
- Enter the SQL query or table name for the Athena data you want to use. Tables are referenced as <table_name> within the configured Glue database.
6. Configure Schema Mapping:
- After the dataset is uploaded, configure the schema mapping and proceed to train models with the data.
Troubleshooting and FAQ for the Athena Connector​
What if verification fails with an "Access Denied" or "Failed to assume IAM role" error?​
- Double-check that the Trust Policy on the IAM role exactly matches the one provided in the Abacus.AI instructions. Ensure the Abacus.AI account ID and node role ARN are correct.
- Confirm that the IAM role has the correct Permissions Policy attached with access to the required Athena, Glue, and S3 resources.
What if queries fail with "Access Denied" on S3?​
- Ensure the IAM role has s3:GetObject and s3:ListBucket permissions on both:
  - The data bucket where your source tables reside.
  - The output bucket where Athena writes query results.
- If your data is managed by AWS Lake Formation, ensure the IAM role has lakeformation:GetDataAccess permission and that Lake Formation grants are configured for the relevant databases and tables.
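One way to run the first check offline is to inspect the permissions policy JSON itself. The sketch below is a simplified, hypothetical helper: it matches resource ARNs by exact string only and ignores wildcards, Deny statements, and conditions, so it is a quick lint and not a substitute for IAM's own policy evaluation.

```python
def allowed_actions(policy, resource_arn):
    """Collect actions from Allow statements that list resource_arn verbatim."""
    actions = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if resource_arn in resources:
            stmt_actions = statement.get("Action", [])
            if isinstance(stmt_actions, str):
                stmt_actions = [stmt_actions]
            actions.update(stmt_actions)
    return actions


def missing_s3_permissions(policy, bucket):
    """Report which of s3:ListBucket / s3:GetObject are not granted for bucket."""
    missing = []
    # ListBucket applies to the bucket ARN, GetObject to the objects inside it.
    if "s3:ListBucket" not in allowed_actions(policy, f"arn:aws:s3:::{bucket}"):
        missing.append("s3:ListBucket")
    if "s3:GetObject" not in allowed_actions(policy, f"arn:aws:s3:::{bucket}/*"):
        missing.append("s3:GetObject")
    return missing


sample_policy = {  # trimmed-down example with only the data bucket covered
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3DataRead",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [
                "arn:aws:s3:::my-athena-data-bucket",
                "arn:aws:s3:::my-athena-data-bucket/*",
            ],
        }
    ],
}

for bucket in ("my-athena-data-bucket", "my-athena-results-bucket"):
    print(bucket, missing_s3_permissions(sample_policy, bucket))
```

In this sample the results bucket is reported as missing both actions, which is the situation that typically produces "Access Denied" when Athena tries to write query output.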
What if I need to query tables across multiple Glue databases?​
- You can reference tables in other databases using the fully-qualified format <database_name>.<table_name> in your SQL queries.
- Ensure the IAM role has Glue and S3 permissions for all databases you intend to query.
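As a small illustration of the fully-qualified format, the sketch below builds a query string for a table in another database. The database `sales_db` and table `orders` are hypothetical, and wrapping identifiers in double quotes is an optional precaution (Athena SELECT queries accept double-quoted identifiers), not something the connector requires.

```python
def qualified_table(database, table):
    """Return "database"."table" with double-quoted identifiers."""
    def quote(identifier):
        # Double any embedded quote character so the identifier stays intact.
        return '"' + identifier.replace('"', '""') + '"'
    return f"{quote(database)}.{quote(table)}"


query = f"SELECT * FROM {qualified_table('sales_db', 'orders')} LIMIT 10"
print(query)
```

The resulting string can be entered as the SQL query when creating the dataset, provided the IAM role has Glue and S3 access for that database.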
What if the Athena workgroup is not found?​
- Verify that the workgroup name is spelled correctly and exists in the specified AWS region.
- If no workgroup is specified, the primary workgroup is used by default. Ensure the IAM role has access to the workgroup.
How do I allow Abacus.AI to write results back?​
- The S3 Results Read/Write policy statement (S3ResultsReadWrite) in the permissions policy already allows Abacus.AI to write query results to the output S3 bucket. Ensure the output location is correctly configured either in the connector or in the Athena workgroup settings.
Who should I contact for further help?​
- Reach out to connectors@abacus.ai for assistance with connector setup or troubleshooting.