Hands-On Approach to S3 Replication

Jonathan Duran · Published in AWS Tip · Feb 22, 2022 · 4 min read


Replicating objects between S3 buckets is a very common ask in the world of data engineering. Let’s take a look at how we could copy data across buckets.


Prerequisites

  • 🌏 Knowledge of Terraform.
  • ☁️ An active AWS account.
  • ☕️ Coffee.

Setting the Stage

Say we have a colleague who needs access to data that currently lives in an S3 bucket. For security reasons we don’t want to grant access to that bucket directly, so we decide to create a dedicated bucket for our teammate’s use case.

Bucket-to-Bucket Replication

The simplest solution is creating a destination bucket with the appropriate replication and permission configuration. After setting up the replication policy between the two buckets, any object uploaded to the source bucket is automatically copied to the destination bucket.

❗️Noteworthy

  • Both source and destination buckets must have versioning enabled.
  • Objects encrypted with AWS KMS keys (SSE-KMS) are not replicated by default; replicating them requires additional configuration.
  • Objects that existed before the replication policy are not replicated.

Terraform Code

Let’s take a look at each configuration file.

providers.tf

  • If you have the AWS CLI properly installed and configured, you can find the name of your profile in ~/.aws/credentials. Alternatively, your AWS access and secret keys can be passed directly in the provider block instead of the profile name.
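
A minimal providers.tf might look something like the sketch below; the region, profile name, and provider version pin are placeholders for your own setup.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region  = "us-east-1" # region where the buckets will live
  profile = "default"   # profile name from ~/.aws/credentials
}
```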

account.tf

  • The aws_caller_identity data source gives us access to account-level information. In this case, we will use our AWS account ID to build unique names for the source and destination S3 buckets in the next step.
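
In its simplest form, account.tf needs nothing more than the data source itself:

```hcl
# Exposes details about the account the provider is authenticated against,
# including the account_id we use to build unique bucket names.
data "aws_caller_identity" "current" {}
```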

s3.tf

  • The source and destination buckets are created with versioning enabled.
  • The replication policy is kept simple here. All objects in the source bucket are replicated.
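
One way to express this is sketched below; the bucket names are illustrative, with the account ID appended so they stay globally unique.

```hcl
resource "aws_s3_bucket" "source" {
  bucket = "replication-source-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket" "destination" {
  bucket = "replication-destination-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket_versioning" "destination" {
  bucket = aws_s3_bucket.destination.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_replication_configuration" "replication" {
  # Replication can only be configured once versioning is active on the source.
  depends_on = [aws_s3_bucket_versioning.source]

  role   = aws_iam_role.replication.arn
  bucket = aws_s3_bucket.source.id

  rule {
    id     = "replicate-everything"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.destination.arn
    }
  }
}
```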

iam.tf

  • An IAM role is created that gives S3 permission to read and replicate objects from source to destination.
  • This role is referenced in the s3.tf configuration within the replication policy resource.
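
A sketch of the role and its inline policy, following the permissions AWS documents for replication (the role and policy names here are arbitrary):

```hcl
# Role that the S3 service assumes when replicating objects on our behalf.
resource "aws_iam_role" "replication" {
  name = "s3-bucket-replication-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "s3.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "replication" {
  name = "s3-bucket-replication-policy"
  role = aws_iam_role.replication.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Read the replication configuration and list the source bucket.
        Effect   = "Allow"
        Action   = ["s3:GetReplicationConfiguration", "s3:ListBucket"]
        Resource = [aws_s3_bucket.source.arn]
      },
      {
        # Read object versions (plus their ACLs and tags) from the source bucket.
        Effect = "Allow"
        Action = [
          "s3:GetObjectVersionForReplication",
          "s3:GetObjectVersionAcl",
          "s3:GetObjectVersionTagging"
        ]
        Resource = ["${aws_s3_bucket.source.arn}/*"]
      },
      {
        # Write the replicated objects into the destination bucket.
        Effect   = "Allow"
        Action   = ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"]
        Resource = ["${aws_s3_bucket.destination.arn}/*"]
      }
    ]
  })
}
```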

Terraform Init, Plan, & Apply 🚀🚀🚀
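
Deploying is the standard three-command cycle, run from the directory containing the .tf files:

```sh
terraform init    # download the AWS provider
terraform plan    # review the resources to be created
terraform apply   # create the buckets, IAM role, and replication configuration
```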

After using Terraform to deploy the resources, navigate to the AWS console to test out the replication.

  1. Upload a file to the source bucket.
  2. Navigate to the destination bucket and confirm the replicated object.

🪞Replication!🪞

What Else?

Replicating all objects in a bucket may pose a security problem. Instead, we can target a subset of objects to copy by specifying a prefix.

Let’s alter the aws_s3_bucket_replication_configuration resource by adding a prefix parameter to the rule block, as sketched below.
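
Assuming the resource sketched in s3.tf above, the updated rule might look like this (the rule id is arbitrary):

```hcl
resource "aws_s3_bucket_replication_configuration" "replication" {
  depends_on = [aws_s3_bucket_versioning.source]

  role   = aws_iam_role.replication.arn
  bucket = aws_s3_bucket.source.id

  rule {
    id     = "replicate-prefixed-objects"
    status = "Enabled"

    # Only object keys that start with this prefix are replicated.
    prefix = "replicate"

    destination {
      bucket = aws_s3_bucket.destination.arn
    }
  }
}
```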

Now, only objects whose keys begin with the prefix replicate will be replicated. Let’s test this out by uploading two files: world.txt and replicate/hello.txt.

According to our configuration, the only object that should have been replicated is replicate/hello.txt. Switch over to the destination bucket to confirm this.

Perfect! We are now able to target specific objects to replicate.

🧠 Additional Thoughts…

  • To replicate objects that existed before the replication policy, you can set up an S3 batch operation via the console.
  • Replication is not suitable for ETL-like jobs that involve data transformations. Instead, use a service like Glue.

Thanks for reading. If you found this article useful, please give a clap 👏🏼 and a follow! Also, follow me on Twitter.
