Depending on how fast data is changing on the disks of the source machine, a t2.small could easily be too small to keep up. For the t2 family, the exact network and EBS throughput aren't documented publicly, but for the newer generation instance of equivalent size, t3.small, the sustained network throughput is 128 Mbit/s (16 MB/s) and the sustained EBS throughput is 174 Mbit/s (21.75 MB/s).
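To put those figures in context, here is a rough back-of-the-envelope sketch in Python. It only restates the arithmetic above (8 bits per byte) and estimates how long a given amount of changed data would take to move at a sustained rate. The function names and the example change rates are hypothetical, and the calculation ignores burst credits, protocol overhead, and compression, so treat the result as an order-of-magnitude check only.

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert a rate in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


def hours_to_transfer(size_gb, mbit_per_s):
    """Hours needed to move size_gb (decimal GB) at a sustained rate in Mbit/s."""
    seconds = (size_gb * 1000) / mbit_to_mbyte_per_s(mbit_per_s)
    return seconds / 3600


def can_keep_up(daily_change_gb, mbit_per_s):
    """True if one day's worth of changed data fits into 24 hours at this rate."""
    return hours_to_transfer(daily_change_gb, mbit_per_s) < 24


# t3.small figures quoted above: 128 Mbit/s network, 174 Mbit/s EBS.
print(mbit_to_mbyte_per_s(128))   # network rate in MB/s
print(mbit_to_mbyte_per_s(174))   # EBS rate in MB/s
# Hypothetical example: 500 GB of changed data over the network link.
print(round(hours_to_transfer(500, 128), 2), "hours")
```

If the source disks change faster than the slower of the two rates can absorb, the replication backlog will keep growing no matter how long you wait.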
You could try switching the replication instance to a larger instance in the t3 family, for example, or a fixed-capacity instance, such as an m6a family instance. You can find the network and EBS throughput specifications for general purpose instances (like t2, t3, and m6a) here: https://docs.aws.amazon.com/ec2/latest/instancetypes/gp.html
For the snapshot appearing to get stuck, open the EC2 console and the snapshots view, sort the snapshots in descending order by start time, and check if any are in progress. If they are, that means that permissions are properly configured and creating the snapshot is just taking time. Creating an initial snapshot can take quite a while, particularly if using an inexpensive EBS volume type or if the volume is simply large. The snapshots view in the console will also allow you to track the progress.
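If you prefer to check this from a script rather than the console, the following is a minimal sketch using boto3. It assumes credentials are already configured and that the account owns the snapshots; the function name and the region in the usage comment are placeholders, not anything specific to EDR.

```python
def pending_snapshots(ec2_client):
    """Return snapshots owned by this account that are still in progress,
    newest first (same ordering as sorting by start time in the console)."""
    paginator = ec2_client.get_paginator("describe_snapshots")
    pages = paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "status", "Values": ["pending"]}],
    )
    snapshots = [snap for page in pages for snap in page["Snapshots"]]
    return sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)


# Usage (requires boto3 and AWS credentials; the region is an assumption):
#
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   for snap in pending_snapshots(ec2):
#       print(snap["SnapshotId"], snap["VolumeId"],
#             snap["StartTime"], snap["Progress"])
```

An empty result means no snapshot is currently being created in that region, which matches the "nothing in progress" case described next.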
If no snapshots are in progress and the EDR console still shows the snapshot as pending, check CloudTrail logs in the region EDR is running in for the event names CreateSnapshot and CreateSnapshots. In the CloudTrail event history console view, you can modify the column selections to include "Error code" to help you to find API calls that failed. Check this view for any events that are showing an error code, particularly for insufficient permissions or API throttling.
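The same CloudTrail check can be scripted. The sketch below uses the boto3 CloudTrail `lookup_events` call, which accepts one lookup attribute per request, so it queries each event name separately and keeps only the events whose record contains an `errorCode` field. The function name is hypothetical, and the client is assumed to be created in the region where EDR is running.

```python
import json

SNAPSHOT_EVENTS = ("CreateSnapshot", "CreateSnapshots")


def failed_snapshot_calls(cloudtrail_client, event_names=SNAPSHOT_EVENTS):
    """Return (event name, error code, error message) for every matching
    CloudTrail event that recorded an error."""
    failures = []
    paginator = cloudtrail_client.get_paginator("lookup_events")
    for name in event_names:
        pages = paginator.paginate(
            LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": name}]
        )
        for page in pages:
            for event in page["Events"]:
                # The full event record is delivered as a JSON string.
                detail = json.loads(event["CloudTrailEvent"])
                if "errorCode" in detail:
                    failures.append(
                        (name, detail["errorCode"], detail.get("errorMessage", ""))
                    )
    return failures


# Usage (requires boto3 and AWS credentials; the region is an assumption):
#
#   import boto3
#   ct = boto3.client("cloudtrail", region_name="us-east-1")
#   for name, code, message in failed_snapshot_calls(ct):
#       print(name, code, message)
```

If this returns nothing and no successful calls show up either, the snapshot API is not being called at all, which points away from permissions and toward something earlier in the pipeline, such as connectivity.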
Leo, thank you for the continued help.
Unfortunately, the EC2 snapshots view shows nothing pending, and CloudTrail shows no snapshot calls in error, or even the API call being made.
The last entry for the source server in the CloudTrail event history is DescribeSourceServers with no error code.
Are you seeing snapshots of the disks attached to the replication instance having completed successfully? If the snapshots are completing okay, the EDR console is just not accurately reflecting the stage where it's getting stuck or taking time. If the issue persists, the quickest way might be to contact AWS support. They can see the details of what is happening in your account.
Problem resolved. The issue ended up being VPC endpoints that were added after the initial deployment to access a private subnet. It was a network issue after all. However, I do believe the t2.small was still an issue.
I have removed the VPC endpoints for now for testing and will re-configure them.
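For anyone reviewing their own setup, here is a minimal boto3 sketch for listing the endpoints in a VPC before removing or re-adding them. It uses the EC2 `describe_vpc_endpoints` call filtered by `vpc-id`; the function name and the VPC ID in the usage comment are placeholders, and the output does not by itself say which endpoint was at fault.

```python
def endpoints_in_vpc(ec2_client, vpc_id):
    """Summarize the VPC endpoints defined in the given VPC."""
    paginator = ec2_client.get_paginator("describe_vpc_endpoints")
    pages = paginator.paginate(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])
    return [
        {
            "id": ep["VpcEndpointId"],
            "service": ep["ServiceName"],
            "type": ep.get("VpcEndpointType"),
            "state": ep.get("State"),
            "private_dns": ep.get("PrivateDnsEnabled"),
        }
        for page in pages
        for ep in page["VpcEndpoints"]
    ]


# Usage (requires boto3 and AWS credentials; region and VPC ID are placeholders):
#
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   for ep in endpoints_in_vpc(ec2, "vpc-0123456789abcdef0"):
#       print(ep)
```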
Thank you for the input. The steps I have taken:
Now the issue seems to be that the initial sync completes to 100% but then gets stuck on creating a snapshot. Again, there are no errors to indicate why it can't move past the Initial Sync phase completely. I see the new volumes attached to the replication instance. I am not sure where I have gone wrong. I have triple-checked the documentation. Everything seems to be in place.