CodeBuild GitHub runners are randomly failing


This is about the self-hosted runner integration in CodeBuild: https://docs.aws.amazon.com/codebuild/latest/userguide/action-runner.html

This worked well for a month or so.

For the past several days, most of the workflow runs have been getting stuck, and webhooks are not being picked up. Sometimes a CodeBuild (CB) execution does get triggered and runs, but no logs appear in GitHub (GH), and the CB build runs for a long time before failing in the end, wasting CI minutes.

This only started a few days ago, and we did not change anything on our end.

The only thing I can think of is a bad release that introduced a bug, which seems likely given the recent announcement of support for org webhook events. The problem started on the same day as that announcement, or the day before.

https://aws.amazon.com/about-aws/whats-new/2024/06/aws-codebuild-organization-global-github-webhooks/

Is anyone else experiencing this?

m0ltar
asked 2 months ago, 117 views
3 Answers
Accepted Answer

Ok, thanks for confirming, AWS. I know when AWS says nothing, that means the bug is there, and they are working to fix it. 👍

m0ltar
answered 2 months ago
reviewed by an EXPERT 2 months ago

Can someone from the AWS CodeBuild service team please look into this?

I can reproduce this bug consistently.

Here's a repro case:

  • Set up a GitHub Actions workflow that triggers two jobs at the same time via a matrix
  • Use CodeBuild-based runners
  • Trigger a run, for example by pushing a commit

What happens then:

  • One of the jobs gets picked up and runs
  • The second job gets stuck at "Waiting for a runner to pick up this job..." and never completes
  • CodeBuild UI meanwhile shows no running jobs

My hunch is that there's a queue that receives these webhook events, and it probably deduplicates the events by a key that is not actually unique per job, something like the project name or a run ID.
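
To make that hunch concrete, here is a toy Python sketch of how deduplicating on a key that is shared between matrix jobs would silently drop the second job. It is purely illustrative: the event fields (`run_id`, `job_id`) and the `dedupe` helper are hypothetical names, and nothing here is based on knowledge of how CodeBuild actually handles webhook events.

```python
from typing import Hashable, Iterable


def dedupe(events: Iterable[dict], key: str) -> list[dict]:
    """Keep only the first event seen for each value of `key`."""
    seen: set[Hashable] = set()
    kept = []
    for event in events:
        if event[key] in seen:
            continue  # treated as a duplicate and silently dropped
        seen.add(event[key])
        kept.append(event)
    return kept


# Two matrix jobs from the same workflow run (hypothetical field names).
events = [
    {"run_id": 100, "job_id": 1},
    {"run_id": 100, "job_id": 2},
]

by_run = dedupe(events, "run_id")  # key shared by both jobs: second job is lost
by_job = dedupe(events, "job_id")  # key unique per job: both jobs survive
```

If something like the first case were happening, it would match the symptom above: one job runs, and the other never reaches a runner.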

m0ltar
answered a month ago

I think I found the issue.

In the Webhook request history in GitHub I see:

{"message":"Cannot have more than 1 builds in queue for the account"}

This is really weird, given that I did not make any changes to my account, and I have had more than 2 jobs running in parallel before.

And all of the quotas are set to 15.
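
For anyone who wants to check whether they are hitting the same thing, here is a small Python sketch that scans webhook response bodies for this error. It assumes you have already copied the response bodies out of the webhook delivery history into a list of strings. The function name `find_queue_limit_errors` and the sample bodies other than the one quoted above are made up for illustration, and the match is only on the message text I saw.

```python
import json

# Substring of the error message seen in the webhook response.
QUEUE_LIMIT_TEXT = "builds in queue for the account"


def find_queue_limit_errors(bodies):
    """Return the messages of response bodies that report the queue limit."""
    hits = []
    for body in bodies:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            continue  # not JSON, so not the error we are looking for
        if not isinstance(payload, dict):
            continue
        message = payload.get("message", "")
        if isinstance(message, str) and QUEUE_LIMIT_TEXT in message:
            hits.append(message)
    return hits


bodies = [
    '{"message":"Cannot have more than 1 builds in queue for the account"}',
    '{"message":"ok"}',  # hypothetical non-error body
    "not json",          # hypothetical malformed body
]
hits = find_queue_limit_errors(bodies)
```

Any hit means the webhook was delivered but the build was rejected at queue time, which would explain a job waiting forever for a runner.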

m0ltar
answered a month ago