AWS Backup Receiving Multiple S3_BACKUP_OBJECT_FAILED Errors

I have set up an AWS Backup plan to back up all of my S3 buckets. Every bucket's backup job completes successfully except one, which finishes with the status "Completed with issues". After setting up SNS Email-JSON notifications, I can see multiple messages with the EventType S3_BACKUP_OBJECT_FAILED. The document "Troubleshoot errors for Amazon S3 backups that fail" directs me to attach the AWSBackupServiceRolePolicyForS3Backup and AWSBackupServiceRolePolicyForS3Restore policies to the backup role being used. I am using the default role and had already attached both policies. I confirmed this with the command aws iam list-attached-role-policies --role-name AWSBackupDefaultServiceRole. The output of that command is:

{
    "AttachedPolicies": [
        {
            "PolicyName": "AWSBackupServiceRolePolicyForRestores",
            "PolicyArn": "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForRestores"
        },
        {
            "PolicyName": "AWSBackupServiceRolePolicyForBackup",
            "PolicyArn": "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
        },
        {
            "PolicyName": "AWSBackupGatewayServiceRolePolicyForVirtualMachineMetadataSync",
            "PolicyArn": "arn:aws:iam::aws:policy/service-role/AWSBackupGatewayServiceRolePolicyForVirtualMachineMetadataSync"
        },
        {
            "PolicyName": "AWSBackupRestoreAccessForSAPHANA",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupRestoreAccessForSAPHANA"
        },
        {
            "PolicyName": "AWSBackupAuditAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupAuditAccess"
        },
        {
            "PolicyName": "AWSBackupDataTransferAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupDataTransferAccess"
        },
        {
            "PolicyName": "AWSBackupFullAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupFullAccess"
        },
        {
            "PolicyName": "AWSBackupServiceRolePolicyForS3Restore",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupServiceRolePolicyForS3Restore"
        },
        {
            "PolicyName": "AWSBackupServiceRolePolicyForS3Backup",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupServiceRolePolicyForS3Backup"
        },
        {
            "PolicyName": "AWSBackupOrganizationAdminAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupOrganizationAdminAccess"
        },
        {
            "PolicyName": "AWSBackupOperatorAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/AWSBackupOperatorAccess"
        }
    ]
}

I do not know why the backup continues to have failed objects. How can I further diagnose this issue? There is no additional information provided in the notifications and I can find no further documentation on troubleshooting S3 backup failures.

William
asked a month ago, 200 views
2 Answers
Accepted Answer

I believe I may have inadvertently discovered the cause of the failures. When dumping details of the bucket and its objects with the aws s3api CLI tool, I noticed some objects in the DEEP_ARCHIVE and GLACIER storage classes. A random spot check shows that all of the objects in those storage classes appear in the failure messages. I've initiated a restore of all of these objects and will watch to see if the errors clear.
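
For anyone wanting to repeat this check, here is a minimal Python sketch of the same idea. It is an illustration rather than the exact commands used above: the function names are hypothetical, and list_archived assumes boto3 is installed and credentials are configured. The filter itself works on plain dictionaries shaped like the "Contents" entries of an S3 ListObjectsV2 response.

```python
# Storage classes named in this thread whose objects need a restore
# before their contents can be read.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def archived_keys(contents):
    """Return (key, storage_class) pairs for objects in an archive class.

    `contents` is a list of dicts shaped like the "Contents" entries
    of an S3 ListObjectsV2 response.
    """
    return [
        (obj["Key"], obj["StorageClass"])
        for obj in contents
        if obj.get("StorageClass", "STANDARD") in ARCHIVE_CLASSES
    ]


def list_archived(bucket):
    """Scan one bucket page by page (assumes boto3 and credentials)."""
    import boto3  # imported lazily so archived_keys() works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    found = []
    for page in paginator.paginate(Bucket=bucket):
        found.extend(archived_keys(page.get("Contents", [])))
    return found
```

Comparing the returned keys against the keys in the S3_BACKUP_OBJECT_FAILED notifications would confirm or rule out the storage class as the cause. Note that ListObjectsV2 only reports current versions, so noncurrent versions in a versioned bucket may need a separate check.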

William
answered a month ago
  • Alright, those are indeed offline storage classes that require a separate restore step before the objects can be accessed, so your conclusion is almost certainly correct.

  • For completeness, when you restore objects from the offline storage classes, the original object remains in the archive class, while the restore operation creates a temporary copy of its contents in the Standard storage class. If you want to move the object back to an online storage class, you'll have to copy the temporary restored object into a new object that you'll retain in an online class. More details: https://docs.aws.amazon.com/AmazonS3/latest/userguide/restoring-objects.html

  • Thank you. I didn't realize that, and after waiting the 48 hours for the Deep Archive restore I was wondering why the objects were still in the offline class. I'm now replacing the offline objects with copies of the restored objects and putting them in the GLACIER_IR class. I'm pretty confident this will clear up the very last of the errors.

  • After modifying the storage class on all of the failed objects the errors have cleared.

    Amazon, please include better error messaging. All of this could have been avoided with a simple "invalid storage class" error.
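
The restore-then-copy sequence described in these comments could be sketched roughly as follows. This is a hedged illustration, not the commands actually run in this thread: the helper names, the 7-day restore window, and the Bulk tier are assumptions, and move_online assumes boto3 is installed and configured. The boto3 calls used (head_object, restore_object, copy_object) are standard S3 client operations.

```python
def restore_request(days=7, tier="Bulk"):
    """Build the RestoreRequest argument for S3 RestoreObject.

    The 7-day window and Bulk tier are illustrative defaults.
    """
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def restore_finished(head):
    """True once the temporary restored copy is readable.

    `head` is a HeadObject response. Its "Restore" field contains
    'ongoing-request="false"' when the restore has completed.
    """
    return 'ongoing-request="false"' in head.get("Restore", "")


def copy_args(bucket, key, storage_class="GLACIER_IR"):
    """Build CopyObject arguments that rewrite an object over itself
    in an online storage class."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "StorageClass": storage_class,
        "MetadataDirective": "COPY",
    }


def move_online(bucket, key):
    """Request a restore, or finish the move if the restore is ready."""
    import boto3  # imported lazily; the helpers above do not need it

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket=bucket, Key=key)
    if restore_finished(head):
        s3.copy_object(**copy_args(bucket, key))
        return "copied"
    if "Restore" not in head:
        s3.restore_object(
            Bucket=bucket, Key=key, RestoreRequest=restore_request()
        )
        return "restore requested"
    return "restore in progress"
```

Calling move_online once starts the restore; calling it again after the restore completes performs the copy. A single CopyObject call is limited to objects of up to 5 GB, so larger objects would need a multipart copy instead.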

The first simple thing to check is whether the S3 bucket uses SSE-KMS (instead of SSE-S3) for its default encryption. If so, check which KMS key is used and whether it's an AWS-managed or customer-managed key. If it's customer-managed, you'll need to grant the IAM role that AWS Backup is using the kms:Decrypt and kms:GenerateDataKey permissions on the KMS key in its key policy.

EXPERT
Leo K
answered a month ago
  • Thanks for the response. I think the troubleshooting guide mentioned encryption somewhere, so I did check which keys were being used. The bucket is using SSE-S3, so there should be no need for the additional KMS permissions.
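
For anyone repeating the encryption check from this answer, a minimal sketch might look like the following. The function names are hypothetical and check_bucket assumes boto3 is installed and configured. The parsing function works on a dictionary shaped like an S3 GetBucketEncryption response, where SSE-S3 is reported as "AES256" and SSE-KMS as "aws:kms".

```python
def default_encryption(response):
    """Summarise a GetBucketEncryption response.

    Returns (algorithm, kms_key_id); kms_key_id is None when the
    default-encryption rule names no KMS key.
    """
    rules = response["ServerSideEncryptionConfiguration"]["Rules"]
    default = rules[0].get("ApplyServerSideEncryptionByDefault", {})
    return default.get("SSEAlgorithm"), default.get("KMSMasterKeyID")


def uses_kms(response):
    """True when the bucket default is a KMS-based algorithm."""
    algorithm, _ = default_encryption(response)
    return bool(algorithm) and algorithm.startswith("aws:kms")


def check_bucket(bucket):
    """Return (algorithm, key_manager) for a bucket (assumes boto3).

    key_manager is "AWS" or "CUSTOMER" as reported by KMS DescribeKey,
    or None when no explicit KMS key is configured.
    """
    import boto3  # imported lazily; the helpers above do not need it

    resp = boto3.client("s3").get_bucket_encryption(Bucket=bucket)
    algorithm, key_id = default_encryption(resp)
    if not uses_kms(resp) or key_id is None:
        return algorithm, None
    meta = boto3.client("kms").describe_key(KeyId=key_id)["KeyMetadata"]
    return algorithm, meta["KeyManager"]
```

A result of ("AES256", None) corresponds to the SSE-S3 case reported in the comment above, where no extra key policy grants are needed.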