
Setting Up an S3 Bucket for Ingestion

Written by Robby Dunigan
Updated over 2 weeks ago

Bucket Setup

To let MaestroQA access data in S3, you'll need to create a bucket and a new IAM role that MaestroQA can assume through AWS STS (AssumeRole).

1) Open your AWS console and complete the following steps.

Create an S3 bucket that is used only for MaestroQA, with a descriptive name such as "maestroqa-data-import-{your company name}". Record this bucket name; you will need it in the steps below and for MaestroQA's integration information.

Set S3 compression and encryption, error logging, and tags, if desired.

2) Next, we'll create the role and grant it the required permissions:

In the IAM console, click "Roles"

Click "Create Role"

For “Trusted entity type”, select “AWS account”

Under "An AWS account", select "Another AWS account" and enter the MaestroQA AWS Account ID (ask your MaestroQA contact for this value).

(Recommended) Under "Options", select “Require External ID” and enter any valid value for “External ID”. Note that MFA is not currently supported. Record this External ID so you can provide it to MaestroQA.

We'll now create the S3 policy for this role, which grants the s3:GetObject and s3:ListBucket actions. Click "Create policy", then "JSON", and paste the policy below, replacing each [YOUR_BUCKET_NAME_HERE] placeholder (including the square brackets) with the name of the bucket you created earlier:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::[YOUR_BUCKET_NAME_HERE]",
        "arn:aws:s3:::[YOUR_BUCKET_NAME_HERE]/*"
      ]
    }
  ]
}

Once completed, give the following details to your MaestroQA point of contact:

  • Bucket Name

  • Region Code

  • Role ARN

  • External ID (if created)
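Before sending these details, a quick format check can catch copy-and-paste mistakes. The sketch below is only a rough sanity check under assumed patterns for bucket names, region codes, and role ARNs; it does not contact AWS, and the function name and sample values are hypothetical.

```python
import re

# Rough patterns (assumptions), not a full implementation of AWS naming rules.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-\d")
ROLE_ARN_RE = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[A-Za-z0-9_+=,.@/-]+")


def check_handoff_details(bucket_name, region_code, role_arn, external_id=None):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not BUCKET_RE.fullmatch(bucket_name):
        problems.append(f"Bucket name looks invalid: {bucket_name!r}")
    if not REGION_RE.fullmatch(region_code):
        problems.append(f"Region code looks invalid: {region_code!r}")
    if not ROLE_ARN_RE.fullmatch(role_arn):
        problems.append(f"Role ARN looks invalid: {role_arn!r}")
    if external_id is not None and not external_id.strip():
        problems.append("External ID was provided but is empty")
    return problems


if __name__ == "__main__":
    # Sample values are placeholders, not real identifiers.
    issues = check_handoff_details(
        "maestroqa-data-import-examplecorp",
        "us-east-1",
        "arn:aws:iam::111122223333:role/maestroqa-s3-read",
        "example-external-id",
    )
    print(issues or "No obvious formatting problems found")
```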

Logic Required to Delta Load

The bucket must use one of the following two configurations so that MaestroQA can load only new or changed files/records (a delta load):

  • Files/records are placed in folders whose prefix is a date in YYYYMMDDHH format

  • The bucket is set up with an inventory (catalog file) configuration with the following parameters:

    • Add "Last modified" to "Additional metadata fields"

    • Set frequency to "daily"

    • Specify a prefix under "Inventory scope" (ex: maestroqa-inventory)

    • Set the output format to Parquet
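For the first option, the date-prefix layout can be sketched as follows. This is an illustration only: it assumes timestamps are expressed in UTC and that each file sits directly under its YYYYMMDDHH folder, neither of which is stated in this article, so confirm both with your MaestroQA contact. The function and file names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def hourly_prefix(ts: datetime) -> str:
    """Format a timezone-aware timestamp as a YYYYMMDDHH prefix (UTC assumed)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H")


def object_key(ts: datetime, filename: str) -> str:
    """Build an object key that places the file in its hourly folder."""
    return f"{hourly_prefix(ts)}/{filename}"


def prefixes_between(start: datetime, end: datetime) -> list[str]:
    """List every hourly prefix from start to end, inclusive."""
    current = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end_utc = end.astimezone(timezone.utc)
    prefixes = []
    while current <= end_utc:
        prefixes.append(hourly_prefix(current))
        current += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(object_key(now, "tickets.csv"))
    print(prefixes_between(now - timedelta(hours=3), now))
```

With this layout, a delta load only needs to list the prefixes for the hours since the previous run rather than scanning the whole bucket.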
