Hands-On Approach to S3 Replication

Jonathan Duran · Published in AWS Tip · Feb 22, 2022 · 4 min read


Replicating objects between S3 buckets is a very common ask in the world of data engineering. Let’s take a look at how we could copy data across buckets.


Prerequisites

  • 🌏 Knowledge of Terraform.
  • ☁️ An active AWS account.
  • ☕️ Coffee.

Setting the Stage

Say we have a colleague who needs access to data that currently lives in an S3 bucket. For security reasons we don’t want to grant access to that bucket directly, so we decide to create a dedicated bucket for our teammate’s use case.

Bucket-to-Bucket Replication

The simplest solution is creating a destination bucket with the appropriate replication and permission configuration. After setting up the replication policy between the two buckets, any object uploaded to the source bucket is automatically copied to the destination bucket.

❗️Noteworthy

  • Both source and destination buckets must have versioning enabled.
  • Objects encrypted with AWS KMS keys (SSE-KMS) are not replicated by default; replicating them requires additional configuration.
  • Objects that existed before the replication policy are not replicated.

Terraform Code

Let’s take a look at each configuration file.

providers.tf

  • If you have the AWS CLI properly installed and configured, you can find the name of your profile in ~/.aws/credentials. Alternatively, your AWS access and secret keys can be passed directly in the provider block instead of the profile name.
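
A minimal providers.tf might look something like the sketch below; the region, profile name, and provider version pin are placeholders for your own setup.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region  = "us-east-1" # region where the buckets will live
  profile = "default"   # profile name from ~/.aws/credentials
}
```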

account.tf

  • The aws_caller_identity data source gives us access to account-level information. In this case, we will use our AWS account ID to build unique names for the source and destination S3 buckets in the next step.
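
In its simplest form, account.tf needs nothing more than the data source itself:

```hcl
# Exposes details about the account the provider is authenticated against,
# including the account_id we use to build unique bucket names.
data "aws_caller_identity" "current" {}
```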

s3.tf

  • The source and destination buckets are created with versioning enabled.
  • The replication policy is kept simple here. All objects in the source bucket are replicated.
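
One way to express this is sketched below; the bucket names are illustrative, with the account ID appended so they stay globally unique.

```hcl
resource "aws_s3_bucket" "source" {
  bucket = "replication-source-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket" "destination" {
  bucket = "replication-destination-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket_versioning" "destination" {
  bucket = aws_s3_bucket.destination.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_replication_configuration" "replication" {
  # Replication can only be configured once versioning is active on the source.
  depends_on = [aws_s3_bucket_versioning.source]

  role   = aws_iam_role.replication.arn
  bucket = aws_s3_bucket.source.id

  rule {
    id     = "replicate-everything"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.destination.arn
    }
  }
}
```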

iam.tf

  • An IAM role is created that gives S3 permission to read and replicate objects from source to destination.
  • This role is referenced in the s3.tf configuration within the replication policy resource.
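
A sketch of the role and its inline policy, following the permissions AWS documents for replication (the role and policy names here are arbitrary):

```hcl
# Role that the S3 service assumes when replicating objects on our behalf.
resource "aws_iam_role" "replication" {
  name = "s3-bucket-replication-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "s3.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "replication" {
  name = "s3-bucket-replication-policy"
  role = aws_iam_role.replication.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Read the replication configuration and list the source bucket.
        Effect   = "Allow"
        Action   = ["s3:GetReplicationConfiguration", "s3:ListBucket"]
        Resource = [aws_s3_bucket.source.arn]
      },
      {
        # Read object versions (plus their ACLs and tags) from the source bucket.
        Effect = "Allow"
        Action = [
          "s3:GetObjectVersionForReplication",
          "s3:GetObjectVersionAcl",
          "s3:GetObjectVersionTagging"
        ]
        Resource = ["${aws_s3_bucket.source.arn}/*"]
      },
      {
        # Write the replicated objects into the destination bucket.
        Effect   = "Allow"
        Action   = ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"]
        Resource = ["${aws_s3_bucket.destination.arn}/*"]
      }
    ]
  })
}
```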

Terraform Init, Plan, & Apply 🚀🚀🚀
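
Deploying is the standard three-command cycle, run from the directory containing the .tf files:

```sh
terraform init    # download the AWS provider
terraform plan    # review the resources to be created
terraform apply   # create the buckets, IAM role, and replication configuration
```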

After using Terraform to deploy the resources, navigate to the AWS console to test out the replication.

  1. Upload a file to the source bucket.
  2. Navigate to the destination bucket and confirm the replicated object.

🪞Replication!🪞

What Else?

Replicating all objects in a bucket may pose a security problem. Instead, we can target a subset of objects to copy by specifying a prefix.

Let’s alter the aws_s3_bucket_replication_configuration resource by adding a prefix parameter to the rule block, as sketched below.
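
Assuming the resource sketched in s3.tf above, the updated rule might look like this (the rule id is arbitrary):

```hcl
resource "aws_s3_bucket_replication_configuration" "replication" {
  depends_on = [aws_s3_bucket_versioning.source]

  role   = aws_iam_role.replication.arn
  bucket = aws_s3_bucket.source.id

  rule {
    id     = "replicate-prefixed-objects"
    status = "Enabled"

    # Only object keys that start with this prefix are replicated.
    prefix = "replicate"

    destination {
      bucket = aws_s3_bucket.destination.arn
    }
  }
}
```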

Now, only objects whose keys begin with the prefix replicate will be replicated. Let’s test this out by uploading two files: world.txt and replicate/hello.txt.

According to our configuration, the only object that should have been replicated is replicate/hello.txt. Switch over to the destination bucket to confirm this.

Perfect! We are now able to target specific objects to replicate.

🧠 Additional Thoughts…

  • To replicate objects that existed before the replication policy, you can set up an S3 batch operation via the console.
  • Replication is not suitable for ETL-like jobs that involve data transformations. Instead, use a service like Glue.

Thanks for reading. If you found this article useful, please give a clap 👏🏼 and a follow! Also, follow me on Twitter.
