
AWS DataSync is a managed data transfer service that makes it easy to move data between on-premises storage and AWS storage services, as well as between AWS storage services themselves. In this guide, I’ll walk through the steps to move data from one S3 bucket to another S3 bucket using AWS DataSync, with the infrastructure defined in Terraform.

What is AWS DataSync?

AWS DataSync is a fully managed data transfer service that simplifies moving large amounts of data between on-premises storage and AWS services, or between AWS storage services such as Amazon S3, Amazon EFS, and Amazon FSx.

Architecture Overview

The following architecture is used to transfer data between two S3 buckets using AWS DataSync.

Figure: Move CSV File from S3 to S3 using DataSync

Prerequisites

Before starting, make sure you have:

  • Names for two S3 buckets (source and destination); the Terraform configuration below creates the buckets
  • IAM permissions to create S3, IAM, and DataSync resources
  • Terraform installed (v1.x or later)
  • AWS CLI configured

Terraform Implementation

Two S3 Buckets (Source & Destination)

# provider configuration
provider "aws" {
  region  = "ap-southeast-1"
  profile = "default"
  default_tags {
    tags = {
      Owner   = "Nugroho-L1"
      Project = "Testing"
    }
  }
}

# create source and destination s3 bucket
resource "aws_s3_bucket" "s3_source" {
  bucket        = "nugrohosource-testing-bucket"
  force_destroy = true
}

resource "aws_s3_bucket" "s3_destination" {
  bucket        = "nugrohodestination-testing-bucket"
  force_destroy = true
}

# disable ACLs on the destination bucket (the bucket owner owns every object written to it)
resource "aws_s3_bucket_ownership_controls" "s3_destination_disable_acls" {
  bucket = aws_s3_bucket.s3_destination.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}
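S3 bucket names are globally unique across all AWS accounts, so if you copy the names above verbatim, terraform apply will most likely fail with a BucketAlreadyExists error. A minimal alternative, assuming Terraform-generated names are acceptable, is the bucket_prefix argument: Terraform appends a unique suffix to the prefix you give it. The blocks below reuse the same resource names, so they replace the two aws_s3_bucket resources above rather than sitting alongside them; the two outputs are illustrative additions that expose the generated names.

```hcl
# Alternative to the two aws_s3_bucket resources above (use one version, not both).
# bucket_prefix lets Terraform append a unique suffix, avoiding global name collisions.
resource "aws_s3_bucket" "s3_source" {
  bucket_prefix = "nugrohosource-testing-"
  force_destroy = true
}

resource "aws_s3_bucket" "s3_destination" {
  bucket_prefix = "nugrohodestination-testing-"
  force_destroy = true
}

# Expose the generated bucket names for use with the AWS CLI later on.
output "source_bucket_name" {
  value = aws_s3_bucket.s3_source.bucket
}

output "destination_bucket_name" {
  value = aws_s3_bucket.s3_destination.bucket
}
```

With this variant, substitute the generated names wherever the guide uses nugrohosource-testing-bucket or nugrohodestination-testing-bucket, for example s3://$(terraform output -raw source_bucket_name)/ in the CLI commands.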

IAM Role and S3 Bucket Policies


# create the IAM role that DataSync assumes to access both the source and destination buckets
resource "aws_iam_role" "datasync_role" {
  name = "source-datasync-role" # the name of the role should be hardcoded

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "datasync.amazonaws.com"
        }
      }
    ]
  })
}

# inline policy for the iam role
resource "aws_iam_role_policy" "datasync_policy" {
  name = "DataSyncS3Policy"
  role = aws_iam_role.datasync_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      # Permissions for the source bucket
      {
        Action = [
          "s3:GetBucketLocation",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads"
        ]
        Effect   = "Allow"
        Resource = aws_s3_bucket.s3_source.arn
      },
      {
        Action = [
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "s3:GetObject",
          "s3:ListMultipartUploadParts",
          "s3:PutObject",
          "s3:GetObjectTagging",
          "s3:PutObjectTagging"
        ]
        Effect   = "Allow"
        Resource = "${aws_s3_bucket.s3_source.arn}/*"
      },
      # Permissions for the destination bucket
      {
        Action = [
          "s3:GetBucketLocation",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads"
        ]
        Effect   = "Allow"
        Resource = aws_s3_bucket.s3_destination.arn
      },
      {
        Action = [
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "s3:GetObject",
          "s3:ListMultipartUploadParts",
          "s3:PutObject",
          "s3:GetObjectTagging",
          "s3:PutObjectTagging"
        ]
        Effect   = "Allow"
        Resource = "${aws_s3_bucket.s3_destination.arn}/*"
      }
    ]
  })
}


# update the source s3 bucket policy
resource "aws_s3_bucket_policy" "source_bucket_policy" {
  bucket = aws_s3_bucket.s3_source.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "DataSyncCreateS3LocationAndTaskAccess"
        Effect = "Allow"
        Principal = {
          AWS = aws_iam_role.datasync_role.arn
        }
        Action = [
          "s3:GetBucketLocation",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "s3:GetObject",
          "s3:ListMultipartUploadParts",
          "s3:PutObject",
          "s3:GetObjectTagging",
          "s3:PutObjectTagging"
        ]
        Resource = [
          aws_s3_bucket.s3_source.arn,
          "${aws_s3_bucket.s3_source.arn}/*"
        ]
      }
    ]
  })
}

# update the destination s3 bucket policy
resource "aws_s3_bucket_policy" "destination_bucket_policy" {
  bucket = aws_s3_bucket.s3_destination.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "DataSyncCreateS3LocationAndTaskAccess"
        Effect = "Allow"
        Principal = {
          AWS = aws_iam_role.datasync_role.arn
        }
        Action = [
          "s3:GetBucketLocation",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "s3:GetObject",
          "s3:ListMultipartUploadParts",
          "s3:PutObject",
          "s3:GetObjectTagging",
          "s3:PutObjectTagging"
        ]
        Resource = [
          aws_s3_bucket.s3_destination.arn,
          "${aws_s3_bucket.s3_destination.arn}/*"
        ]
      }
    ]
  })
}
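The two bucket policies above differ only in which bucket they target. A hedged sketch of how both could be generated from one map with for_each is shown here; the local names and the aws_s3_bucket_policy "datasync" resource name are illustrative. It replaces the two aws_s3_bucket_policy resources above rather than adding to them, because each bucket holds a single bucket policy and two resources managing it would keep overwriting each other. Note that aws_iam_policy_document emits the "2012-10-17" policy version.

```hcl
# Alternative to the two aws_s3_bucket_policy resources above (do not keep both).
locals {
  # Logical name => bucket resource; the keys are static, so for_each can plan them.
  datasync_buckets = {
    source      = aws_s3_bucket.s3_source
    destination = aws_s3_bucket.s3_destination
  }

  # Same action list as the hand-written bucket policies.
  datasync_bucket_actions = [
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
    "s3:GetObjectTagging",
    "s3:PutObjectTagging",
  ]
}

# One policy document per bucket, granting the DataSync role access to the
# bucket itself and to every object in it.
data "aws_iam_policy_document" "datasync_bucket" {
  for_each = local.datasync_buckets

  statement {
    sid     = "DataSyncCreateS3LocationAndTaskAccess"
    effect  = "Allow"
    actions = local.datasync_bucket_actions

    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.datasync_role.arn]
    }

    resources = [
      each.value.arn,
      "${each.value.arn}/*",
    ]
  }
}

# Attach each generated document to its bucket.
resource "aws_s3_bucket_policy" "datasync" {
  for_each = local.datasync_buckets

  bucket = each.value.id
  policy = data.aws_iam_policy_document.datasync_bucket[each.key].json
}
```

Adding a third bucket later then only means adding one entry to local.datasync_buckets.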


DataSync Task and Locations


# create the DataSync source and destination locations (the root "/" of each bucket)
resource "aws_datasync_location_s3" "source" {
  s3_bucket_arn = aws_s3_bucket.s3_source.arn
  subdirectory  = "/"
  s3_config {
    bucket_access_role_arn = aws_iam_role.datasync_role.arn
  }
}

resource "aws_datasync_location_s3" "destination" {
  s3_bucket_arn = aws_s3_bucket.s3_destination.arn
  subdirectory  = "/"
  s3_config {
    bucket_access_role_arn = aws_iam_role.datasync_role.arn
  }
}

# create datasync task
resource "aws_datasync_task" "s3_same_region" {
  name                     = "s3_same_region"
  destination_location_arn = aws_datasync_location_s3.destination.arn
  source_location_arn      = aws_datasync_location_s3.source.arn

}
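The task above runs with DataSync’s default options and only when started by hand. A hedged sketch of the same task with explicit options and a schedule follows; the specific values (copy only changed data, verify only the files that were transferred, keep files deleted at the source, run daily at 12:00 UTC) are illustrative assumptions rather than requirements. It keeps the resource name s3_same_region, so it replaces the aws_datasync_task above instead of creating a second task.

```hcl
resource "aws_datasync_task" "s3_same_region" {
  name                     = "s3_same_region"
  source_location_arn      = aws_datasync_location_s3.source.arn
  destination_location_arn = aws_datasync_location_s3.destination.arn

  options {
    # Copy only data that differs between source and destination.
    transfer_mode = "CHANGED"
    # After the transfer, verify only the files that were actually copied.
    verify_mode = "ONLY_FILES_TRANSFERRED"
    # Keep destination files even if they were deleted from the source.
    preserve_deleted_files = "PRESERVE"
  }

  # Optional: start the task automatically (AWS cron syntax, UTC) instead of by hand.
  schedule {
    schedule_expression = "cron(0 12 * * ? *)"
  }
}
```

Dropping the schedule block keeps the task manual-only, which matches the walkthrough below.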

Running the Data Transfer

After applying the Terraform configuration, you can start the DataSync task manually from the AWS Console or with the AWS CLI. For this demo, I’ll upload a CSV file to the source S3 bucket using the AWS CLI. You can refer to this documentation for how to set up the AWS CLI.


# Upload a sample CSV file to the source S3 bucket
aws s3 cp currency.csv s3://nugrohosource-testing-bucket/ --profile default
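If you would rather keep the test file in Terraform as well, a hedged alternative to the CLI upload is an aws_s3_object resource. This sketch assumes currency.csv sits in the same directory as your .tf files and that you use AWS provider v4 or later (where aws_s3_object replaced aws_s3_bucket_object); the resource name currency_csv is illustrative.

```hcl
# Upload currency.csv from the Terraform module directory to the source bucket.
resource "aws_s3_object" "currency_csv" {
  bucket = aws_s3_bucket.s3_source.id
  key    = "currency.csv"
  source = "${path.module}/currency.csv"

  # Re-upload whenever the local file's contents change.
  etag = filemd5("${path.module}/currency.csv")
}
```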

Once the file is uploaded, start the DataSync task from the AWS Console or with the AWS CLI. After the task execution completes successfully, you’ll see the CSV file in the destination bucket.
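Starting the task from the CLI requires the task ARN. A small addition, assuming you keep the aws_datasync_task resource name s3_same_region used in this guide, is a Terraform output that exposes it; the output name datasync_task_arn is illustrative.

```hcl
# Expose the DataSync task ARN so it can be passed to the AWS CLI.
output "datasync_task_arn" {
  value = aws_datasync_task.s3_same_region.arn
}
```

After terraform apply, the task can then be started with aws datasync start-task-execution --task-arn "$(terraform output -raw datasync_task_arn)", and the TaskExecutionArn it returns can be checked with aws datasync describe-task-execution --task-execution-arn <execution-arn>.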

# Verify the file in the destination S3 bucket
aws s3 ls s3://nugrohodestination-testing-bucket/ --profile default


Monitoring and Logging

AWS DataSync integrates with Amazon CloudWatch to provide:

  • Task execution status
  • Bytes transferred
  • Error logs
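Detailed per-file logs are only written if the task has a CloudWatch log group and a log level configured. A hedged sketch follows: the log group name, the 14-day retention, and the policy name are illustrative assumptions, and the resource policy is the piece that lets the DataSync service write into the log group.

```hcl
# Log group that receives DataSync task logs (name and retention are illustrative).
resource "aws_cloudwatch_log_group" "datasync" {
  name              = "/aws/datasync/s3_same_region"
  retention_in_days = 14
}

# Allow the DataSync service to create log streams and write events in this log group.
data "aws_iam_policy_document" "datasync_logs" {
  statement {
    sid     = "DataSyncLogsToCloudWatchLogs"
    effect  = "Allow"
    actions = ["logs:PutLogEvents", "logs:CreateLogStream"]

    principals {
      type        = "Service"
      identifiers = ["datasync.amazonaws.com"]
    }

    # The log group's arn attribute has no ":*" suffix, so add one to cover its streams.
    resources = ["${aws_cloudwatch_log_group.datasync.arn}:*"]
  }
}

resource "aws_cloudwatch_log_resource_policy" "datasync" {
  policy_name     = "datasync-logs"
  policy_document = data.aws_iam_policy_document.datasync_logs.json
}

# Then, inside the aws_datasync_task "s3_same_region" resource
# (merge log_level into its options block if it already has one):
#
#   cloudwatch_log_group_arn = aws_cloudwatch_log_group.datasync.arn
#
#   options {
#     log_level = "TRANSFER" # or "BASIC" for less verbose logging
#   }
```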

Common Errors and Troubleshooting

Access denied. Ensure bucket access role has s3:ListBucket permission.

Errors:

  • Access denied. Ensure bucket access role has s3:ListBucket permission.

Solution:

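  • Ensure the IAM role’s trust policy allows the DataSync service (datasync.amazonaws.com) to perform “Action”: “sts:AssumeRole”, as in the aws_iam_role resource above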
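For reference, a trust policy that satisfies this check can also be written with the aws_iam_policy_document data source. The two condition blocks are optional hardening that AWS documents for DataSync against the confused deputy problem; this sketch assumes the ap-southeast-1 region from the provider block, and the data source names are illustrative. If the account or region in the conditions does not match where the task runs, DataSync will not be able to assume the role, so remove the conditions first when debugging this error.

```hcl
# Current account ID, used in the conditions below.
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "datasync_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    # Only the DataSync service may assume the role.
    principals {
      type        = "Service"
      identifiers = ["datasync.amazonaws.com"]
    }

    # Optional: only when DataSync acts on behalf of this account...
    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [data.aws_caller_identity.current.account_id]
    }

    # ...and for DataSync resources in this region (assumed ap-southeast-1).
    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values   = ["arn:aws:datasync:ap-southeast-1:${data.aws_caller_identity.current.account_id}:*"]
    }
  }
}
```

To use it, set assume_role_policy = data.aws_iam_policy_document.datasync_assume_role.json in the aws_iam_role resource instead of the inline jsonencode(...) document.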

References:

  • https://repost.aws/questions/QUPD3ZX7p3T3OQkk19QN-iEw/datasync-between-s3-buckets-failing-ensure-bucket-access-role-has-s3-listbucket-permission

Cross-Account S3 to S3

To transfer data between S3 buckets in different AWS accounts, refer to the following documentation:

References:

  • https://docs.aws.amazon.com/datasync/latest/userguide/tutorial_s3-s3-cross-account-transfer.html
  • https://medium.com/@nayanarora55/cross-account-s3-migration-aws-datasync-does-it-so-you-dont-have-to-9b85a23d1464