How do I use the AWS CLI to upload a large file in multiple parts to Amazon S3?

I want to copy a large file to an Amazon Simple Storage Service (Amazon S3) bucket in multiple parts, or use multipart upload. I want to use the AWS Command Line Interface (AWS CLI) to upload the file.

Short description

To upload large files to Amazon S3, use the AWS CLI with either high-level aws s3 commands or low-level aws s3api commands. For more information about these two command tiers, see Use Amazon S3 with the AWS CLI.

Important: It's a best practice to use aws s3 commands, such as aws s3 cp, for multipart uploads and downloads, because these commands automatically perform multipart uploading and downloading based on the file size. Use aws s3api commands, such as aws s3api create-multipart-upload, only when aws s3 commands don't support a specific upload. For example, use aws s3api commands when the multipart upload involves multiple servers, when you manually stop a multipart upload and resume it later, or when the aws s3 command doesn't support a required request parameter.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you use the most recent AWS CLI version.

Before you upload the file, calculate the file's MD5 checksum value as a reference for integrity checks after the upload.
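
For example, on a Linux or macOS host with OpenSSL installed (an assumption about your environment), the following commands print the hexadecimal MD5 checksum and the base64-encoded form that S3 expects in the Content-MD5 header:

$ openssl md5 large_test_file
$ openssl md5 -binary large_test_file | base64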

Use high-level aws s3 commands

To use a high-level aws s3 command for your multipart upload, run the following command:

$ aws s3 cp large_test_file s3://DOC-EXAMPLE-BUCKET/

This example uses the aws s3 cp command to automatically perform a multipart upload when the object is large. You can also use other aws s3 commands that upload objects to an S3 bucket, such as aws s3 sync or aws s3 mv.
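
For example, the following commands also perform multipart uploads automatically for large objects. The local directory name local-dir is an example value, and note that aws s3 mv removes the local file after the upload completes:

$ aws s3 sync ./local-dir s3://DOC-EXAMPLE-BUCKET/
$ aws s3 mv large_test_file s3://DOC-EXAMPLE-BUCKET/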

Objects that you upload as multiple parts to Amazon S3 have a different ETag format than objects that you use the traditional PUT request to upload. To store the MD5 checksum value of the source file as a reference, upload the file that has the checksum value as custom metadata. To add the MD5 checksum value as custom metadata, include the optional parameter --metadata md5="examplemd5value1234/4Q" in the upload command:

$ aws s3 cp large_test_file s3://DOC-EXAMPLE-BUCKET/ --metadata md5="examplemd5value1234/4Q"

To use more of your host's bandwidth and resources, increase the maximum number of concurrent requests set in your AWS CLI configuration. By default, the AWS CLI uses 10 maximum concurrent requests. The following command sets the maximum number of concurrent requests to 20:

$ aws configure set default.s3.max_concurrent_requests 20
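
You can also tune the part size and the object size at which the AWS CLI switches to multipart uploads. For example, the following commands set a 64 MB multipart threshold and a 16 MB part size. These values are examples only; choose values that fit your files and bandwidth:

$ aws configure set default.s3.multipart_threshold 64MB
$ aws configure set default.s3.multipart_chunksize 16MB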

Use low-level aws s3api commands

  1. Split the file that you want to upload into multiple parts.
    Tip: If you use a Linux operating system, then use the split command. For a scripted example that combines steps 1 through 10, see the sketch after this list.

  2. To initiate a multipart upload and to retrieve the associated upload ID, run the following command:

    aws s3api create-multipart-upload --bucket DOC-EXAMPLE-BUCKET --key large_test_file

    The command returns a response that contains an UploadId value.

  3. Copy the UploadId value as a reference for later steps.

  4. To upload the first part of the file, run the following command:

    aws s3api upload-part --bucket DOC-EXAMPLE-BUCKET --key large_test_file --part-number 1 --body large_test_file.001 --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 exampleaAmjr+4sRXUwf0w==
    

    Replace the example values with the values for your bucket, file, and multipart upload. The command returns a response that contains an ETag value for the part of the file that you uploaded. For more information on each parameter, see upload-part.

  5. Copy the ETag value as a reference for later steps.

  6. Repeat steps 4 and 5 for each part of the file. Make sure to increment the part number with each new part that you upload.

  7. After you upload all the file parts, run the following command to list the uploaded parts and confirm that the list is complete:

    aws s3api list-parts --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
    
  8. Compile the ETag values for each file part that you uploaded into a JSON-formatted file.
    Example JSON file:

    {
        "Parts": [{
            "ETag": "example8be9a0268ebfb8b115d4c1fd3",
            "PartNumber":1
        },
    
        ....
    
        {
            "ETag": "example246e31ab807da6f62802c1ae8",
            "PartNumber":4
        }]
    }
  9. Name the file fileparts.json.

  10. To complete the multipart upload, run the following command:

    aws s3api complete-multipart-upload --multipart-upload file://fileparts.json --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
    

    Replace the value for --multipart-upload with the path to the JSON-formatted file that contains the ETags that you created.

  11. If the previous command is successful, then you receive a response similar to the following one:

    {
        "ETag": "\"exampleae01633ff0af167d925cad279-2\"",
        "Bucket": "DOC-EXAMPLE-BUCKET",
        "Location": "https://DOC-EXAMPLE-BUCKET.s3.amazonaws.com/large_test_file",
        "Key": "large_test_file"
    }
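
The following is a minimal bash sketch that combines the preceding steps for a file named large_test_file and a bucket named DOC-EXAMPLE-BUCKET. The 100 MB part size, the file names, and the use of GNU coreutils split and OpenSSL are assumptions; adjust them for your environment. The sketch uses the AWS CLI --query option to build fileparts.json from the list-parts output:

    #!/bin/bash
    # Sketch only: assumes GNU coreutils split, OpenSSL, and bash on Linux.
    set -euo pipefail

    bucket=DOC-EXAMPLE-BUCKET
    key=large_test_file
    file=large_test_file

    # Step 1: split the file into 100 MB parts (large_test_file.000, .001, ...).
    split -b 100M -d -a 3 "$file" "$file".

    # Steps 2-3: initiate the multipart upload and capture the upload ID.
    upload_id=$(aws s3api create-multipart-upload --bucket "$bucket" --key "$key" \
        --query UploadId --output text)

    # Steps 4-6: upload each part with its base64-encoded MD5 checksum.
    part=1
    for chunk in "$file".*; do
        md5=$(openssl md5 -binary "$chunk" | base64)
        aws s3api upload-part --bucket "$bucket" --key "$key" \
            --part-number "$part" --body "$chunk" \
            --upload-id "$upload_id" --content-md5 "$md5"
        part=$((part + 1))
    done

    # Steps 7-9: list the uploaded parts and compile their ETags into fileparts.json.
    aws s3api list-parts --bucket "$bucket" --key "$key" --upload-id "$upload_id" \
        --query '{Parts: Parts[].{ETag: ETag, PartNumber: PartNumber}}' \
        --output json > fileparts.json

    # Step 10: complete the multipart upload.
    aws s3api complete-multipart-upload --multipart-upload file://fileparts.json \
        --bucket "$bucket" --key "$key" --upload-id "$upload_id"

If an upload-part call fails, you can rerun it with the same part number before you complete the upload.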

Resolve upload failures

If you use the high-level aws s3 commands for a multipart upload and the upload fails, then you must start a new multipart upload. Multipart upload failures occur due to either a timeout or manual cancellation. In most cases, the AWS CLI automatically cancels the multipart upload and then removes any multipart files that you created. This process can take several minutes. If you use aws s3api commands and the process is interrupted, then remove incomplete parts of the upload, and then re-upload the parts.

To remove the incomplete parts, use the AbortIncompleteMultipartUpload lifecycle action. Or, use aws s3api commands to remove the incomplete parts.
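
The following is an example lifecycle configuration, saved as lifecycle.json, that removes incomplete multipart uploads seven days after initiation. The rule ID and the number of days are example values:

    {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {
                    "Prefix": ""
                },
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": 7
                }
            }
        ]
    }

To apply the configuration, run the following command. Note that this command replaces any existing lifecycle configuration on the bucket:

$ aws s3api put-bucket-lifecycle-configuration --bucket DOC-EXAMPLE-BUCKET --lifecycle-configuration file://lifecycle.json

To instead remove the incomplete parts with aws s3api commands, complete the following steps: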

  1. To list incomplete multipart file uploads, run the following command:

    aws s3api list-multipart-uploads --bucket DOC-EXAMPLE-BUCKET
    

    Replace the value for --bucket with the name of your bucket.

  2. The command returns a message similar to the following one that lists any file parts that weren't processed:

    {
        "Uploads": [
            {
                "Initiator": {
                    "DisplayName": "multipartmessage",
                    "ID": "290xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                },
                "Initiated": "2016-03-31T06:13:15.000Z",
                "UploadId": "examplevQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB_A-",
                "StorageClass": "STANDARD",
                "Key": "",
                "Owner": {
                    "DisplayName": "multipartmessage",
                    "ID": "290xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                }
            }
        ]
    }
  3. To remove the incomplete parts, run the following command, or use the scripted sketch after this step to remove all incomplete uploads at one time:

    aws s3api abort-multipart-upload --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id examplevQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB
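
If a bucket has many incomplete uploads, then you can combine the list-multipart-uploads and abort-multipart-upload commands. The following is a sketch that assumes a bash shell, object keys without whitespace, and at least one incomplete upload in the bucket:

    aws s3api list-multipart-uploads --bucket DOC-EXAMPLE-BUCKET \
        --query 'Uploads[].[Key,UploadId]' --output text |
    while read -r key upload_id; do
        # Abort each incomplete upload that list-multipart-uploads returned.
        aws s3api abort-multipart-upload --bucket DOC-EXAMPLE-BUCKET \
            --key "$key" --upload-id "$upload_id"
    done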

Related information

Uploading and copying objects using multipart upload
