As you might have guessed from last week’s post, I have been working a lot in Jira and Confluence. This week, I have been working on the process for copying production data into our staging environment so we can do plugin and upgrade testing. Part of that process involves copying about 300 GB of data from our production EFS to a staging EFS that happens to reside in a separate AWS account.

I have used AWS DataSync in the past to move EFS data around within a single account, so I decided to give it a try for moving data between accounts. While I found information saying this was possible, I couldn’t find any step-by-step instructions, so I thought I would share what I learned here.

Setting up the Agent

The first thing to do is launch an EC2 instance from the DataSync agent AMI in the VPC you are transferring the data from (production, in my case). I learned that even though I am transferring data between two different accounts, I can still launch the DataSync agent into a private subnet. This is good, since port 80 on the instance has to be reachable to get the activation key, and I don’t want it open to the world. While you can do all of this through the AWS console, I have included the CLI commands I used here.

First, look up the AMI ID of the DataSync agent with the aws ec2 describe-images command. The query sorts the matching images by creation date and returns the newest one.

export DATASYNC_AMI=$(aws ec2 describe-images \
  --owners 633936118553 \
  --filters 'Name=name,Values=aws-datasync-*' \
  --query 'sort_by(Images,&CreationDate)[-1].ImageId' \
  --output text)
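If you would rather not depend on the image owner ID and name pattern, AWS also publishes the current agent AMI ID as a public Systems Manager parameter, /aws/service/datasync/ami. Here is a minimal sketch, assuming the CLI is configured for the source account. The function name datasync_ami_from_ssm is a hypothetical helper, not part of any AWS tooling.

```shell
# Hypothetical helper: print the current DataSync agent AMI ID for a region,
# read from the public SSM parameter AWS maintains for the agent image.
datasync_ami_from_ssm() {
  local region="$1"
  aws ssm get-parameter \
    --name /aws/service/datasync/ami \
    --region "$region" \
    --query 'Parameter.Value' \
    --output text
}

# Usage (assumes AWS_REGION is the source VPC's region):
# export DATASYNC_AMI=$(datasync_ami_from_ssm "$AWS_REGION")
```

Either approach should give you an AMI ID to pass to run-instances. The SSM parameter simply moves the job of picking the latest image to AWS.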

Once you have the AMI ID, you can launch the instance with the aws ec2 run-instances command.

aws ec2 run-instances \
  --image-id "${DATASYNC_AMI}" \
  --count 1 \
  --instance-type m5.2xlarge \
  --key-name aws-main \
  --security-group-ids sg-017411dbf0f06e4bf \
  --subnet-id subnet-0c55c8f419c7828b7 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=atlassian-datasync}]' 'ResourceType=volume,Tags=[{Key=Name,Value=atlassian-datasync}]'
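The security group on the agent has to let you reach port 80 for activation. One way to keep that port closed to the world is a single ingress rule scoped to the network you will run curl from. This is a sketch. The helper name is hypothetical, and the CIDR in the usage line (10.17.0.0/16) is an assumed example, not a value from my setup.

```shell
# Hypothetical helper: open TCP port 80 on the agent's security group,
# but only to one CIDR (for example the VPC range or a bastion subnet).
allow_activation_http() {
  local sg_id="$1"       # agent security group, e.g. sg-017411dbf0f06e4bf
  local admin_cidr="$2"  # assumption: the network you will run curl from
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 80 \
    --cidr "$admin_cidr"
}

# Usage:
# allow_activation_http sg-017411dbf0f06e4bf 10.17.0.0/16
```

Port 80 is only needed during activation, so you can remove the rule afterwards with aws ec2 revoke-security-group-ingress and the same arguments.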

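Once the instance is running, get the activation key from the agent with curl, using the agent’s private IP address. Run this from a machine that can reach that address on port 80 (such as a bastion host in the same VPC), and set AWS_REGION to the region where you will use DataSync.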

curl "http://10.17.37.232/?gatewayType=SYNC&activationRegion=${AWS_REGION}&no_redirect"
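The IP address in that URL is the agent’s private IP. Since the instance was tagged Name=atlassian-datasync above, a sketch like the following can look it up and build the same URL. Both helper names are hypothetical, and the lookup assumes exactly one running instance carries that tag (the query prints None if there is no match).

```shell
# Hypothetical helper: private IP of the running instance tagged Name=<name>.
agent_private_ip() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].PrivateIpAddress' \
    --output text
}

# Build the activation URL used with curl above. The no_redirect parameter
# makes the agent return the key in the response body instead of redirecting.
activation_url() {
  local ip="$1" region="$2"
  printf 'http://%s/?gatewayType=SYNC&activationRegion=%s&no_redirect\n' "$ip" "$region"
}

# Usage (from a host that can reach the agent on port 80):
# ACTIVATION_KEY=$(curl -s "$(activation_url "$(agent_private_ip atlassian-datasync)" "$AWS_REGION")")
```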

Setting up DataSync

Once you have the activation key, switch to the destination AWS account and configure DataSync there. The first step is to register (activate) the agent with that key.

aws datasync create-agent \
	--agent-name atlassian-datasync-agent \
	--activation-key 2TRPR-SOHQ9-2HHN1-DNMD5-KQ3TD
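The next step needs the agent’s ARN. create-agent returns it as AgentArn, so you can capture it directly and then confirm with describe-agent that the agent reports ONLINE before you continue. This is a sketch. The helper names and the retry and delay defaults are my own assumptions.

```shell
# Hypothetical helper: activate the agent and print only its ARN.
create_agent_arn() {
  aws datasync create-agent \
    --agent-name "$1" \
    --activation-key "$2" \
    --query 'AgentArn' \
    --output text
}

# Poll describe-agent until Status is ONLINE; give up after <tries> attempts.
wait_for_agent_online() {
  local arn="$1" tries="${2:-30}" delay="${3:-10}" status
  while [ "$tries" -gt 0 ]; do
    status=$(aws datasync describe-agent --agent-arn "$arn" --query 'Status' --output text)
    [ "$status" = "ONLINE" ] && return 0
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
# AGENT_ARN=$(create_agent_arn atlassian-datasync-agent "$ACTIVATION_KEY")
# wait_for_agent_online "$AGENT_ARN" || echo "agent is not online" >&2
```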

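Next, create the source location. You will need the DNS name of the source EFS file system, the subdirectory to copy, and the ARN of the agent you just created. The agent mounts the production file system over NFS, so the source is created as an NFS location. (Note: make sure the agent can reach the EFS file system. The security group on its mount targets must allow NFS traffic, TCP port 2049, from the agent.)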

aws datasync create-location-nfs \
	--server-hostname fs-c604e9be.efs.us-east-2.amazonaws.com \
	--on-prem-config AgentArns=arn:aws:datasync:us-west-2:111111111111:agent/agent-041644af8d1409984 \
	--subdirectory "/"
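In practice, giving the agent access to the source EFS mostly means network access. A sketch of the rule, run in the source account, assuming the EFS mount targets have their own security group; efs_sg is a placeholder you would fill in, and allow_agent_nfs is a hypothetical helper name.

```shell
# Hypothetical helper: allow NFS (TCP 2049) into the EFS mount targets'
# security group from the security group attached to the agent instance.
allow_agent_nfs() {
  local efs_sg="$1"    # placeholder: security group on the EFS mount targets
  local agent_sg="$2"  # agent's group, e.g. sg-017411dbf0f06e4bf
  aws ec2 authorize-security-group-ingress \
    --group-id "$efs_sg" \
    --protocol tcp \
    --port 2049 \
    --source-group "$agent_sg"
}

# Usage:
# allow_agent_nfs <efs-security-group-id> sg-017411dbf0f06e4bf
```

Referencing the agent’s security group rather than its IP keeps the rule valid if the agent is ever relaunched with a new address.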

Once the source location is created, you can create the destination location. You will need the ARN of the staging EFS file system, the ARN of a security group that has access to it, and the ARN of a subnet that can reach it.

aws datasync create-location-efs \
	--subdirectory "/" \
	--efs-filesystem-arn 'arn:aws:elasticfilesystem:us-west-2:111111111111:file-system/fs-0255ab07' \
	--ec2-config SecurityGroupArns='arn:aws:ec2:us-west-2:111111111111:security-group/sg-0ef2c6e6311ae3bfa',SubnetArn='arn:aws:ec2:us-west-2:111111111111:subnet/subnet-0fc0878c32a09c58d'
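The EC2 console usually shows plain IDs (sg-..., subnet-..., fs-...) rather than ARNs. A small sketch for assembling the ARNs from IDs follows. It assumes the standard aws partition (not GovCloud or China), and the helper names are hypothetical.

```shell
# Hypothetical helpers: build the ARNs create-location-efs expects from IDs.
# Layout: arn:aws:<service>:<region>:<account>:<resource-type>/<id>
account_id() {
  aws sts get-caller-identity --query 'Account' --output text
}

ec2_arn() {  # ec2_arn <region> <account> <security-group|subnet> <id>
  printf 'arn:aws:ec2:%s:%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

efs_arn() {  # efs_arn <region> <account> <fs-id>
  printf 'arn:aws:elasticfilesystem:%s:%s:file-system/%s\n' "$1" "$2" "$3"
}

# Usage (run in the destination account):
# ACCOUNT=$(account_id)
# aws datasync create-location-efs --subdirectory "/" \
#   --efs-filesystem-arn "$(efs_arn us-west-2 "$ACCOUNT" fs-0255ab07)" \
#   --ec2-config "SecurityGroupArns=$(ec2_arn us-west-2 "$ACCOUNT" security-group sg-0ef2c6e6311ae3bfa),SubnetArn=$(ec2_arn us-west-2 "$ACCOUNT" subnet subnet-0fc0878c32a09c58d)"
```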

If you want logging, DataSync can send task logs to CloudWatch Logs. Start by creating a log group.

aws logs create-log-group --log-group-name jira-prod-to-staging

Create a file called policy.json and add the following.

{
    "Statement": [
        {
            "Sid": "DataSyncLogsToCloudWatchLogs",
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:CreateLogStream"
            ],
            "Principal": {
                "Service": "datasync.amazonaws.com"
            },
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
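The policy above uses "Resource": "*". If you would rather limit it to the one log group, a sketch like this writes the same policy.json with the Resource narrowed to that group and its streams. The region, account, and group name are inputs you supply, and write_datasync_log_policy is a hypothetical helper.

```shell
# Hypothetical helper: write policy.json allowing DataSync to create streams
# and put events only in the named log group (":*" covers its log streams).
write_datasync_log_policy() {
  local region="$1" account="$2" group="$3"
  cat > policy.json <<EOF
{
    "Statement": [
        {
            "Sid": "DataSyncLogsToCloudWatchLogs",
            "Effect": "Allow",
            "Action": ["logs:PutLogEvents", "logs:CreateLogStream"],
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Resource": "arn:aws:logs:${region}:${account}:log-group:${group}:*"
        }
    ],
    "Version": "2012-10-17"
}
EOF
}

# Usage:
# write_datasync_log_policy us-west-2 111111111111 jira-prod-to-staging
```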

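Finally, register the policy with CloudWatch Logs. Note that put-resource-policy creates a resource policy for the account and region rather than attaching it to a single log group; the Resource element decides which log groups it covers.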

aws logs put-resource-policy \
	--policy-name trustDataSync \
	--policy-document file://policy.json

Now you can create the DataSync task.

aws datasync create-task \
	--source-location-arn 'arn:aws:datasync:us-west-2:111111111111:location/loc-00a98231f73b0414f' \
	--destination-location-arn 'arn:aws:datasync:us-west-2:111111111111:location/loc-0c509091cb5eb929e' \
	--cloud-watch-log-group-arn 'arn:aws:logs:us-west-2:111111111111:log-group:jira-prod-to-staging' \
	--name jira-prod-to-staging \
	--options LogLevel=BASIC

Running the Task

To start the task, run the aws datasync start-task-execution command.

aws datasync start-task-execution \
	--task-arn 'arn:aws:datasync:us-west-2:111111111111:task/task-09ff54b4023a4910f'

You can check on the task with the aws datasync describe-task-execution command, passing it the task execution ARN that start-task-execution returns.

aws datasync describe-task-execution \
	--task-execution-arn 'arn:aws:datasync:us-west-2:111111111111:task/task-09ff54b4023a4910f/execution/exec-0b434e049d2d9dbbb'
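Rather than rerunning that command by hand, you can poll the Status field. An execution moves through states such as QUEUED, LAUNCHING, PREPARING, TRANSFERRING, and VERIFYING, and finishes as SUCCESS or ERROR. A sketch; the helper name and the 60-second default interval are assumptions.

```shell
# Hypothetical helper: poll a task execution until it reaches SUCCESS or ERROR.
# Returns 0 on SUCCESS, 1 on ERROR, 2 if the status could not be read.
wait_for_execution() {
  local exec_arn="$1" delay="${2:-60}" status
  while :; do
    status=$(aws datasync describe-task-execution \
      --task-execution-arn "$exec_arn" --query 'Status' --output text)
    [ -z "$status" ] && return 2
    echo "$(date '+%H:%M:%S') $status"
    case "$status" in
      SUCCESS) return 0 ;;
      ERROR) return 1 ;;
    esac
    sleep "$delay"
  done
}

# Usage (start-task-execution returns the ARN as TaskExecutionArn):
# EXEC_ARN=$(aws datasync start-task-execution --task-arn "$TASK_ARN" \
#   --query 'TaskExecutionArn' --output text)
# wait_for_execution "$EXEC_ARN"
```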

It took just over two hours to sync my 300 GB between accounts, which is a whole lot faster than copying the data by normal means would have been. I was able to get a new development Jira environment up and running in about three hours. Going forward, I can rerun the task to sync only the differences, which takes far less time.
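For those follow-up runs, you start the same task again. As I understand it, a task's default TransferMode (CHANGED) already copies only files that differ. The sketch below also overrides VerifyMode for that one run so verification covers only the transferred files; whether that trade-off suits you is a judgment call, and the helper name is hypothetical.

```shell
# Hypothetical helper: start another run of an existing task and print the
# new execution ARN. VerifyMode is overridden for this execution only.
start_incremental_run() {
  aws datasync start-task-execution \
    --task-arn "$1" \
    --override-options VerifyMode=ONLY_FILES_TRANSFERRED \
    --query 'TaskExecutionArn' \
    --output text
}

# Usage:
# start_incremental_run 'arn:aws:datasync:us-west-2:111111111111:task/task-09ff54b4023a4910f'
```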