How do I resolve the "Could not read remote logs for the logs_group" error in Amazon MWAA?


I want to resolve the "Could not read remote logs for the logs_group" error in Amazon Managed Workflows for Apache Airflow.

Short description

When you run tasks or DAGs in your Amazon MWAA environment, you might receive the following or similar error:

"Error: Reading remote log from Cloudwatch log_group : Could not read remote logs from log_group *** exception from MWAA task log group"

To troubleshoot the preceding error, check the following:

  • AWS Identity and Access Management (IAM) permissions
  • Logging configuration settings
  • Network connectivity
  • Log groups
  • Requirements installation
  • Log delays
  • Auto scaling
  • CPU utilization and memory contention

Resolution

To troubleshoot the "Could not read remote logs from log_group" error, check each of the following areas:

IAM permissions

To check your IAM permissions, complete the following steps:

  1. Make sure that task logs are set to the INFO/WARN/ERROR level logging and celery.sync_parallelism is set to 1 in your Amazon MWAA environment. For more information, see Using Apache Airflow configuration options on Amazon MWAA.

  2. Check the associated IAM role's permissions policies. Make sure that the policies grant permission to invoke the resources of other AWS services. For more information, see Amazon MWAA execution role.

  3. Check whether the associated Amazon CloudWatch log group has a customer managed key. If the log group has a customer managed key, then make sure that access is granted to the AWS Key Management Service (AWS KMS) key:

    Note: Replace example-account-id with your account ID.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:GenerateDataKey*",
            "kms:Encrypt"
          ],
          "Resource": "arn:aws:kms:*:example-account-id:key/*",
          "Effect": "Allow"
        }
      ]
    }

  4. Make sure that the key policy of the customer managed key has the following permissions:

    Note: Replace example-account-id with your account ID and us-west-2 with your AWS Region.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::example-account-id:root"
          },
          "Action": [
            "kms:Create*",
            "kms:Describe*",
            "kms:Enable*",
            "kms:List*",
            "kms:Put*",
            "kms:Update*",
            "kms:Revoke*",
            "kms:Disable*",
            "kms:Get*",
            "kms:Delete*",
            "kms:ScheduleKeyDeletion",
            "kms:CancelKeyDeletion",
            "kms:GenerateDataKey",
            "kms:TagResource",
            "kms:UntagResource"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "kms:CallerAccount": "example-account-id"
            }
          }
        },
        {
          "Sid": "Allow logs access",
          "Effect": "Allow",
          "Principal": {
            "Service": "logs.us-west-2.amazonaws.com"
          },
          "Action": [
            "kms:Encrypt*",
            "kms:Decrypt*",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:Describe*"
          ],
          "Resource": "*",
          "Condition": {
            "ArnLike": {
              "kms:EncryptionContext:aws:logs:arn": "arn:aws:logs:us-west-2:example-account-id:*"
            }
          }
        }
      ]
    }
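
The key policy check in step 4 can be partly automated. The following Python sketch is illustrative only: it takes a key policy that's already parsed into a dictionary and reports which actions the CloudWatch Logs service principal is missing. The function names and the list of required actions are assumptions based on the policy above. The sketch ignores Condition blocks and Deny statements, so treat an empty result as a hint and not as proof of access.

```python
from fnmatch import fnmatchcase

# Assumption: concrete actions covered by the wildcard patterns in the
# "Allow logs access" statement of the key policy shown above.
REQUIRED_ACTIONS = [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncryptFrom",
    "kms:ReEncryptTo",
    "kms:GenerateDataKey",
    "kms:DescribeKey",
]


def as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def missing_logs_actions(key_policy, region):
    """Return the required actions that no Allow statement grants to the
    CloudWatch Logs service principal for the given Region."""
    service = f"logs.{region}.amazonaws.com"
    granted = []
    for statement in as_list(key_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example "Principal": "*"; not handled in this sketch
        if service not in as_list(principal.get("Service", [])):
            continue
        granted.extend(a.lower() for a in as_list(statement.get("Action", [])))
    # Action names are compared case-insensitively, with * and ? as wildcards.
    return [
        action
        for action in REQUIRED_ACTIONS
        if not any(fnmatchcase(action.lower(), pattern) for pattern in granted)
    ]
```

To use the sketch, load the key policy JSON with json.loads and pass the resulting dictionary together with the Region of your environment. An empty list means that every required action matched at least one pattern.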

Logging configuration settings

Check the logging configuration settings of your Amazon MWAA environment and make sure that they're correct. Incorrect configuration settings, or logging settings that a DAG overrides, can block access to the remote logs in the designated log group. For example, if the logging configuration in the DAG is set to write logs to a different log group, then a read failure occurs.
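
As a rough way to review these settings in bulk, the following Python sketch lists the log modules that are turned off or that have no log group ARN. It assumes that the input is the Environment dictionary that the boto3 MWAA client returns from get_environment. The function name is hypothetical. For this error, the TaskLogs entry is the most relevant one.

```python
# Log modules that appear in an Amazon MWAA LoggingConfiguration.
LOG_MODULES = [
    "DagProcessingLogs",
    "SchedulerLogs",
    "TaskLogs",
    "WebserverLogs",
    "WorkerLogs",
]


def unavailable_log_modules(environment):
    """Return (module, reason) pairs for modules whose logs can't be read.

    `environment` is assumed to be response["Environment"] from
    boto3.client("mwaa").get_environment(Name=...).
    """
    config = environment.get("LoggingConfiguration", {})
    findings = []
    for module in LOG_MODULES:
        settings = config.get(module, {})
        if not settings.get("Enabled", False):
            findings.append((module, "logging is not enabled"))
        elif not settings.get("CloudWatchLogGroupArn"):
            findings.append((module, "no CloudWatch log group ARN"))
    return findings


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   env = boto3.client("mwaa").get_environment(Name="MyMWAAenvironment")["Environment"]
#   print(unavailable_log_modules(env))
```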

Network connectivity

Network issues or a deleted Amazon Virtual Private Cloud (Amazon VPC) endpoint for CloudWatch Logs can cause remote log read failures because your Amazon MWAA environment can't communicate with CloudWatch Logs. Make sure that the Amazon VPC endpoint for CloudWatch Logs is present. For more information, see Creating the required VPC service endpoints in an Amazon VPC with private routing.
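
To confirm that the endpoint exists, you can query the VPC endpoints for the CloudWatch Logs service name. The following Python sketch is a minimal example under stated assumptions: the caller passes in a boto3 EC2 client, the VPC ID, and the Region. The function name is hypothetical, and the sketch doesn't follow pagination.

```python
def logs_endpoint_states(ec2_client, vpc_id, region):
    """Return (endpoint ID, state) pairs for CloudWatch Logs endpoints in a VPC.

    An empty list suggests that the endpoint is missing or was deleted.
    `ec2_client` is assumed to be boto3.client("ec2") or a compatible object.
    """
    response = ec2_client.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [f"com.amazonaws.{region}.logs"]},
        ]
    )
    return [
        (endpoint["VpcEndpointId"], endpoint["State"])
        for endpoint in response.get("VpcEndpoints", [])
    ]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(logs_endpoint_states(ec2, "vpc-0123456789abcdef0", "us-west-2"))
```

This check applies only to environments that use private routing and reach CloudWatch Logs through a VPC endpoint.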

Log groups

To check your log groups, follow these steps:

  • Check whether the CloudWatch log group associated with your Amazon MWAA environment is deleted or renamed. If log groups are deleted or renamed, then attempts to access those log groups fail. It's not a best practice to manually rename or delete log groups.
  • Check whether the ARN that's specified for the CloudWatch log group in Amazon MWAA is incorrect or outdated. If the ARN is incorrect or outdated, then MWAA can't locate and access the log group. This error occurs when the ARN is mistyped or when the log group is moved to a different AWS account.
  • Check the retention policy of the CloudWatch log group. If you attempt to read log events that expired under the retention policy, then read failures occur. Modify your retention setting to never expire log events or to keep them for a longer time period.

Note: By default, logs are kept indefinitely and never expire.
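
The existence and retention checks above can be sketched in Python as follows. The example assumes that you pass in a boto3 CloudWatch Logs client and the log group ARN that your environment reports. Both function names are hypothetical. A retention value of None means that the log group has no retention setting, so its log events never expire.

```python
def log_group_name_from_arn(arn):
    """Extract the log group name from an ARN such as
    arn:aws:logs:us-west-2:111122223333:log-group:NAME or ...:NAME:*"""
    name = arn.split(":log-group:", 1)[1]
    return name[:-2] if name.endswith(":*") else name


def check_log_group(logs_client, log_group_arn):
    """Report whether the log group exists and what its retention setting is.

    `logs_client` is assumed to be boto3.client("logs") or a compatible
    object. Only the first page of results is inspected.
    """
    name = log_group_name_from_arn(log_group_arn)
    response = logs_client.describe_log_groups(logGroupNamePrefix=name)
    for group in response.get("logGroups", []):
        if group["logGroupName"] == name:
            # retentionInDays is absent when log events never expire.
            return {"exists": True, "retention_days": group.get("retentionInDays")}
    return {"exists": False, "retention_days": None}
```

If the result shows that the log group doesn't exist, compare the name in the ARN with the log groups in the same account and Region as the environment.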

Requirements installation

Check your requirements installations. Make sure that there are no failures because of version incompatibility. To check for requirements installation failures, navigate to your worker log group in your Amazon MWAA environment. To test custom plugins and Python dependencies, use aws-mwaa-local-runner. For more information, see aws-mwaa-local-runner on the GitHub website.

To check for dependency mismatches, review the following requirement installations under the CloudWatch log path:

  • Log groups > airflow-MyMWAAenvironment-WebServer > requirements_install_xxxx.ec2.internal_xxxx.log
  • Log groups > airflow-MyMWAAenvironment-Scheduler > requirements_install_xxxx.ec2.internal_xxxx.log
  • Log groups > airflow-MyMWAAenvironment-Worker > requirements_install_xxxx.ec2.internal_xxxx.log
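
Instead of opening each log stream in the console, you can search the requirements_install streams for error lines. The following Python sketch is one possible approach. It assumes a boto3 CloudWatch Logs client and uses the simple filter pattern ERROR, which might miss failures that are logged with different wording. The function name and the max_pages limit are hypothetical.

```python
def requirements_install_errors(logs_client, log_group_name, max_pages=5):
    """Collect (log stream name, message) pairs that contain the term ERROR
    from the requirements_install streams of one log group.

    `logs_client` is assumed to be boto3.client("logs") or a compatible
    object. At most `max_pages` result pages are read.
    """
    request = {
        "logGroupName": log_group_name,
        "logStreamNamePrefix": "requirements_install",
        "filterPattern": "ERROR",
    }
    errors = []
    for _ in range(max_pages):
        response = logs_client.filter_log_events(**request)
        errors.extend(
            (event["logStreamName"], event["message"])
            for event in response.get("events", [])
        )
        token = response.get("nextToken")
        if not token:
            break
        request["nextToken"] = token
    return errors


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   logs = boto3.client("logs")
#   for stream, message in requirements_install_errors(logs, "airflow-MyMWAAenvironment-Worker"):
#       print(stream, message)
```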

Log delays

If you have long delays for tasks that run hourly or occasionally, then check the celery.worker_autoscale setting. To reduce the error rate, set the second value of the celery.worker_autoscale setting to 0. For example, set large environments to 20,0, medium environments to 10,0, and small environments to 5,0.
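
The values above can be expressed as a small lookup. The following Python sketch builds the Apache Airflow configuration options for an environment and keeps any options that are already set. It assumes that the input is the Environment dictionary from the boto3 MWAA get_environment call and that the environment class names are mw1.small, mw1.medium, and mw1.large. Other classes aren't covered by the guidance above and raise a KeyError in this sketch.

```python
# Assumption: environment class names mapped to the values given above.
AUTOSCALE_BY_CLASS = {
    "mw1.small": "5,0",
    "mw1.medium": "10,0",
    "mw1.large": "20,0",
}


def with_worker_autoscale(environment):
    """Return the environment's Airflow configuration options with
    celery.worker_autoscale set for its environment class.

    Existing options are copied so that an update doesn't drop them.
    """
    options = dict(environment.get("AirflowConfigurationOptions", {}))
    options["celery.worker_autoscale"] = AUTOSCALE_BY_CLASS[environment["EnvironmentClass"]]
    return options


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   mwaa = boto3.client("mwaa")
#   env = mwaa.get_environment(Name="MyMWAAenvironment")["Environment"]
#   mwaa.update_environment(
#       Name=env["Name"],
#       AirflowConfigurationOptions=with_worker_autoscale(env),
#   )
```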

Auto scaling

Auto scaling that increases or decreases the number of worker nodes might cause the "Could not read remote logs from log_group" error. If auto scaling causes this error, then turn off Amazon MWAA auto scaling. To turn off auto scaling, set your min-workers setting to the same value as your max-workers setting. Alternatively, increase the max-workers setting value. For more information, see Configuring Amazon MWAA worker automatic scaling.
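
As an illustration of the first option, the following Python sketch builds the arguments for an update call that sets the minimum worker count to the current maximum. It assumes that the input is the Environment dictionary from the boto3 MWAA get_environment call. The function name is hypothetical.

```python
def pin_worker_count(environment):
    """Return update arguments that set MinWorkers equal to MaxWorkers,
    which keeps the number of workers fixed."""
    max_workers = environment["MaxWorkers"]
    return {
        "Name": environment["Name"],
        "MinWorkers": max_workers,
        "MaxWorkers": max_workers,
    }


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   mwaa = boto3.client("mwaa")
#   env = mwaa.get_environment(Name="MyMWAAenvironment")["Environment"]
#   mwaa.update_environment(**pin_worker_count(env))
```

A fixed worker count means that the environment runs the maximum number of workers at all times, so review the cost impact before you apply the change.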

CPU utilization and memory contention

High CPU utilization and memory contention within an MWAA environment might affect the ability to read remote logs. To troubleshoot issues from high CPU utilization or memory contention, follow these steps:

AWS OFFICIAL
Updated 24 days ago