Reclaim Hosting (BitNinja) blocking Google Cloud IP addresses #885

Closed
willtm opened this issue Sep 1, 2019 · 20 comments

Comments

@willtm

willtm commented Sep 1, 2019

Hello,

I've recently attempted to use Bridgy with my WordPress website to post to Twitter. My procedure is pretty simple: I write a short post (see below) and select the option to Syndicate to Twitter via Bridgy. A few days ago this succeeded, but since then I've had a series of errors.

The post can be found here: https://willtmonroe.com/uncategorized/104/

The error I can see on Brid.gy is below.

138.197.169.52 - - [01/Sep/2019:12:43:47 -0700] "POST /publish/webmention HTTP/1.1" 400 477 "https://brid.gy/publish/webmention" "WordPress/5.2.2; https://willtmonroe.com; sending Webmention"

2019-09-01 19:43:16.921900 I Params: [('source', u'https://willtmonroe.com/uncategorized/104/'), ('target', u'https://brid.gy/publish/twitter'), ('bridgy_omit_link', u'maybe')]
2019-09-01 19:43:16.924536 I requests.head https://willtmonroe.com/uncategorized/104/ {'headers': {'User-Agent': u'...'}}
2019-09-01 19:43:31.916378 W Couldn't resolve URL https://willtmonroe.com/uncategorized/104/ : (<requests.packages.urllib3.contrib.appengine.AppEngineManager object at 0x150cdb6bc1d0>, DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://willtmonroe.com/uncategorized/104/',))
2019-09-01 19:43:31.950439 I Source: https://brid.gy/twitter/willtmonroe , features [u'publish', u'listen'], status enabled, poll status polling
2019-09-01 19:43:32.149888 D Publish entity: 'aglzfmJy...'
2019-09-01 19:43:32.202242 I requests.get https://willtmonroe.com/uncategorized/104/ {'headers': {'User-Agent': u'...'}}
2019-09-01 19:43:47.192272 I Connection failure: (<requests.packages.urllib3.contrib.appengine.AppEngineManager object at 0x150cdb2c3210>, DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://willtmonroe.com/uncategorized/104/',))
Traceback (most recent call last):
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/publish.py", line 216, in _run
    resp = self.fetch_mf2(url, raise_errors=True)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/webmention.py", line 64, in fetch_mf2
    fetched = util.requests_get(url)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/util.py", line 209, in requests_get
    resp = util.requests_get(url, stream=True, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/oauth_dropins/webutil/util.py", line 1327, in call
    return getattr(requests, fn)(url, *args, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests_toolbelt/adapters/appengine.py", line 172, in urlopen
    **response_kw)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/urllib3/contrib/appengine.py", line 152, in urlopen
    raise TimeoutError(self, e)
TimeoutError: (<requests.packages.urllib3.contrib.appengine.AppEngineManager object at 0x150cdb2c3210>, DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://willtmonroe.com/uncategorized/104/',))
2019-09-01 19:43:47.201390 I Converting code ... to 504
2019-09-01 19:43:47.201531 I Could not fetch source URL https://willtmonroe.com/uncategorized/104/
Traceback (most recent call last):
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/publish.py", line 216, in _run
    resp = self.fetch_mf2(url, raise_errors=True)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/webmention.py", line 64, in fetch_mf2
    fetched = util.requests_get(url)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/util.py", line 209, in requests_get
    resp = util.requests_get(url, stream=True, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/oauth_dropins/webutil/util.py", line 1327, in call
    return getattr(requests, fn)(url, *args, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/requests_toolbelt/adapters/appengine.py", line 172, in urlopen
    **response_kw)
  File "/base/data/home/apps/s~brid-gy/8.420691788766043574/local/lib/python2.7/site-packages/urllib3/contrib/appengine.py", line 152, in urlopen
    raise TimeoutError(self, e)
TimeoutError: (<requests.packages.urllib3.contrib.appengine.AppEngineManager object at 0x150cdb2c3210>, DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://willtmonroe.com/uncategorized/104/',))

Do you have any advice on how I could troubleshoot this?

@snarfed
Owner

snarfed commented Sep 2, 2019

hey, sorry for the trouble! and thanks for pulling the log.

bridgy's HEAD and GET requests to https://willtmonroe.com/uncategorized/104/ timed out after 15s. key parts of the log below. normal requests from my laptop in browsers and curl work fine, in well under 15s, so either your site is blocking bridgy's requests for some reason, or maybe your site had some trouble around 2019-09-01 19:43:47 UTC?

19:43:16 I requests.head https://willtmonroe.com/uncategorized/104/
19:43:31 W Couldn't resolve URL https://willtmonroe.com/uncategorized/104/ : ... DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://willtmonroe.com/uncategorized/104/',))
...
19:43:32 I requests.get https://willtmonroe.com/uncategorized/104/ {'headers': {'User-Agent': u'...'}}
19:43:47 I Connection failure: ... DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://willtmonroe.com/uncategorized/104/',))
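
To reproduce the timeout outside Bridgy, a minimal sketch along these lines might help. It assumes Python 3 and only the standard library rather than the `requests` stack shown in the traceback. The helper names `probe` and `describe_failure` are hypothetical and not part of Bridgy; the 15-second timeout mirrors the log, and the default User-Agent is the value quoted later in this thread.

```python
import socket
import urllib.error
import urllib.request

# User-Agent string quoted later in this thread; the log itself elides it.
BRIDGY_UA = "Bridgy (https://brid.gy/about)"


def describe_failure(exc):
    """Map an exception raised by urlopen() to a short label."""
    if isinstance(exc, urllib.error.HTTPError):
        return "status %d" % exc.code
    if isinstance(exc, socket.timeout):
        # Timeout while waiting for the response.
        return "timeout"
    if isinstance(exc, urllib.error.URLError):
        # Timeout while connecting is wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "connection error"
    raise exc


def probe(url, method="GET", user_agent=BRIDGY_UA, timeout=15):
    """Send one HEAD or GET request and report the status or failure type."""
    req = urllib.request.Request(
        url, method=method, headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return "status %d" % resp.status
    except (urllib.error.URLError, socket.timeout) as exc:
        return describe_failure(exc)
```

Calling `probe(url, method="HEAD")` and `probe(url)` for the post URL from a network where the site loads normally, and then from a Google Cloud host, would show whether only the second location reports `timeout`. Passing a browser-like `user_agent` from the same host would help separate IP-based blocking from User-Agent-based blocking.
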
@willtm
Author

willtm commented Sep 2, 2019

Thanks so much for that reply. Given that I have encountered this problem before (a few days ago), I'm going to guess that my site is blocking bridgy requests.

I'm not quite sure where to go from here though. Do you have any advice?

@snarfed
Owner

snarfed commented Sep 2, 2019

i assume you've already tried disabling plugins, etc.? are there any similar support forum threads on wordpress.org? also, check with your hosting provider. maybe a firewall rule or similar.

@willtm
Author

willtm commented Sep 2, 2019

I haven't tried any of those yet. But I will investigate them. Thank you!

Reclaim Hosting is my provider and they've been very helpful in the past. I may check to see if the recent change I needed to make to my .htaccess file might be making a difference.

@willtm
Author

willtm commented Sep 4, 2019

Hello!

I contacted my host, Reclaim Hosting, and they had this to say:

Hello there,

Thanks for reaching out! It looks like the pastiebin log is showing request timeouts coming from willtmonroe.com, it would be something we'd need more information about from Brid.gy to see if there are any adjustments that need to be made to mitigate that.

If you hear back and let us know if there's anything we'd be able to do, we'll be more than happy to assist.

Is there anything you'd like me to relate to Reclaim Support?

@snarfed
Owner

snarfed commented Sep 4, 2019

hey, sure! short answer is, point them to my comment above, or here. specifically, HTTPS GETs to willtmonroe.com by users in browsers and curl work fine, in well under 15s, but the same GETs (and HEADs) from Bridgy's IP addresses, with User-Agent: Bridgy (https://brid.gy/about), always hang and never return. seems like the TCP connections are probably being dropped.

@willtm
Author

willtm commented Sep 4, 2019

Hello again,

Just heard from Jim Groom at Reclaim and he asked this question:

Also, do you have any IP address for the Brid.gy service so I can make sure they are not being firewalled?

Is there anything I could share? Also, if you wanted to send me a private message, I could copy you on the emails that Reclaim Hosting is sending me. Just an idea.

@snarfed
Owner

snarfed commented Sep 4, 2019

sure! google has a large number of outgoing IP address blocks, so it would take a bit of work to enumerate them all. the "Bridgy's IP addresses" link above has details.

(also, these are IPv4 connections, since willtmonroe.com's DNS doesn't currently advertise any IPv6 addresses.)
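
One way to check which address families a hostname advertises is to group the results of `socket.getaddrinfo`. This is only a sketch under the assumption of Python 3; the function names `split_by_family` and `resolve` are hypothetical.

```python
import socket


def split_by_family(addrinfo):
    """Group getaddrinfo() results into sorted IPv4 and IPv6 address lists."""
    v4 = sorted({sockaddr[0] for family, _, _, _, sockaddr in addrinfo
                 if family == socket.AF_INET})
    v6 = sorted({sockaddr[0] for family, _, _, _, sockaddr in addrinfo
                 if family == socket.AF_INET6})
    return v4, v6


def resolve(host, port=443):
    """Look up host and return (ipv4_addresses, ipv6_addresses)."""
    return split_by_family(
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    )
```

If `resolve("willtmonroe.com")` returns an empty second list, the resolver found no IPv6 addresses for the host, so any block would have to be acting on IPv4 source addresses.
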

@timmmmyboy

Hey guys, just to add to this conversation we're looking into what our options are, however whitelisting such a massive range for all Google Cloud customers would expose us pretty heavily (cloud services like that are often used for malicious intent since they can be fired up and taken down easily). We may be able to flag particular user agents but I'm not sure. Are there no options to get a dedicated IP from them or smaller range for your project @snarfed? If it helps the service we are using that is likely flagging this is https://bitninja.io which sends notices to the IP owners when their IPs are flagged. We can whitelist on a case by case basis but I'd rather do it for a particular project than whitelist all users on Google Cloud by default.

@snarfed
Owner

snarfed commented Sep 5, 2019

hi tim! honestly, if BitNinja is indiscriminately blacklisting huge swathes of IP blocks like Google Cloud's, that seems extremely aggressive to me. i expect it's also blacklisting many others too, and probably affecting many legitimate users, both automated (like Bridgy) and human. i'd maybe take a look at whether they're really necessary, and if you truly think they are preventing some concrete harm, whether there's another way to avoid it.

IndieWeb in particular expects HTTP requests from other web servers to work (for webmentions, etc) as well as from humans. if BitNinja is aggressively blocking "bots," that often translates to any server, period, which will materially harm IndieWeb interop.

as concrete examples besides Bridgy, a number of other IndieWeb community members run sites and services on Google Cloud, including @aaronpk (telegraph, watchtower, ownyourgram), @kevinmarks (mention.tech, others), and more.

@snarfed snarfed changed the title from "WordPress to Twitter: Cannot resolve URL error" to "Reclaim Hosting (BitNinja) blocking Google Cloud IP addresses" Sep 5, 2019
@snarfed
Owner

snarfed commented Sep 5, 2019

btw @timmmmyboy have you confirmed that BitNinja is blocking Google Cloud IP addresses specifically, and that's the root cause here? if not, might be worth double checking before we dive too deep. could also be User-Agent, or something other than BitNinja, etc.

@timmmmyboy

Certainly not implying that Bitninja is blocking all Google Cloud IPs. I think they are blocking yours based on a false positive in their web application firewall. However when we requested the IP to check on that we were given only the broad ranges. It sounds like Google doesn't offer a dedicated IP. The issue you're describing is one with many firewalls, right now for better or worse IP addresses are the way hosts identify themselves on the web. Even if we find a way to detect user agent and whitelist that (which I'm not sure is a current feature) a user agent is easily spoofed. Even though Google says no IP is permanent, is it possible that the IP is the same or a smaller range right now for the purposes of further investigation (like one of the 4 A records brid.gy resolves to)? If it really is as big as those ranges I have no way of even narrowing down to find the activity.

@aaronpk
Contributor

aaronpk commented Sep 5, 2019

I just don't think IP address blocking is a good idea in general, since as we're seeing here, it's too easy to get false positives.

Google provides 4 A records for incoming HTTP requests to bridgy, but there is no guarantee that the requests that bridgy makes outgoing will come from those IPs. This is a similar pattern when deploying things on AWS, which I've done in the past. Inbound HTTP server kicks jobs to a background queue which are processed by ephemeral machines that get their own short-lived IP addresses, and those background processes are the ones making outbound requests.

@snarfed
Owner

snarfed commented Sep 5, 2019

@timmmmyboy Google Cloud has many different products. some do include dedicated IPs, but Bridgy runs on App Engine, a serverless platform, so it doesn't have dedicated IPs per se. more importantly, outbound HTTP requests from App Engine (and many other Google Cloud services) go through a separate, large scale HTTP fetching service (linked earlier), which has many outbound IP addresses. that's why enumerating a few Bridgy-specific IPs isn't possible. from that doc:

Outbound services, such as the URL Fetch, Sockets and Mail APIs, make use of a large pool of IP addresses. The IP address ranges in this pool are subject to routine changes. In fact, two sequential API calls from the same application may appear to originate from two different IP addresses.

@aaronpk has the right idea. if the root cause here is indeed that BitNinja has blacklisted massive swathes of IPs, including some of the largest cloud hosting providers in the industry, which then also breaks legitimate individual users, and requires in depth debugging and discussion and special cases...that seems like a bad approach in general.

(fwiw, Bridgy has been running since 2013, has over 5k accounts right now, and has successfully sent webmentions to over 2k different domains and web sites. i've only ever heard a few instances of hosts blocking it like this. if the root cause here is indeed BitNinja's aggressive IP blocking, it's in a small minority, not the common case.)
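
Because the outbound pool is published only as broad ranges, one practical step on the hosting side might be to test whether a blocked source address from the firewall log falls inside any of them. The sketch below assumes Python 3 and the standard `ipaddress` module. The function name `matching_ranges` is hypothetical, and the CIDR blocks in the usage note are documentation-reserved placeholders (RFC 5737), not Google's actual ranges.

```python
import ipaddress


def matching_ranges(ip, cidrs):
    """Return the CIDR blocks from cidrs that contain the address ip."""
    addr = ipaddress.ip_address(ip)
    return [cidr for cidr in cidrs if addr in ipaddress.ip_network(cidr)]
```

For example, with placeholder ranges `["192.0.2.0/24", "198.51.100.0/24"]`, the address `192.0.2.77` matches the first block and `203.0.113.5` matches none. Substituting the ranges Google publishes and the addresses BitNinja has flagged would show whether the two overlap.
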

@Lewiscowles1986

Hey guys, just to add to this conversation we're looking into what our options are, however whitelisting such a massive range for all Google Cloud customers would expose us pretty heavily (cloud services like that are often used for malicious intent since they can be fired up and taken down easily).

I realize I'm late to the party, but @timmmmyboy I think what you're saying is that malicious actors may impact your service, so you blocked them, or you use a whitelist which does not include them? These are two separate concepts, which would produce odd results if used together. The result would be that only an intersection of visitors could reach your servers, so I have to believe this is a communication issue.

As an alternative strategy, have you considered a web application firewall? It's a few layers above IP address filtering so this may be a matter of commercial viability and cost, but it seems odd that having hosted sites with millions of hits, I've never once had to block CIDR or IP on 80 or 443. Perhaps a premature optimization is hurting customers?

RE: Bridgy, one thing I did notice (although it shouldn't cause a block) is the user-agent. I've encountered systems which whitelist known user-agents and UA patterns. It's as shaky as the IP blocking in that it harms users without much benefit to hosts of private individual content (not a jab, more recognition that it's a different ballpark to hosting corporate or government sites).

@timmmmyboy

You can read more about how the service, Bitninja, that we use works at https://bitninja.io. The theoretical discussion of whether hosting companies should employ firewalls is frankly not up for debate here so if the answer for us is to drop our firewall this issue can be closed without resolution. I'm not going to spend time debating that with folks that aren't responsible for protecting thousands of customers. I've mentioned earlier that I don't think a massive range is being blocked, but rather a massive range is the only one being provided thus far and it sounds like Google can't provide further specifics due to how their infrastructure works so we're at a bit of a crossroads here. We remove and whitelist false-positives when they come up (I just had one where Jetpack for WordPress was seen as an XML-RPC brute force for example). But if we can't even identify the traffic because it has no distinguishing markers other than user agents then I can't really resolve this on our end.

@snarfed
Owner

snarfed commented Sep 11, 2019

definitely understood @timmmmyboy! no one's proposing that you drop your firewall. we are suggesting that BitNinja may be harmful, though, so you might want to consider alternatives.

@Lewiscowles1986's mention of WAF was one good, practical suggestion. i can add a wide range of distinguishing markers to Bridgy's traffic, including HTTP headers, which WAFs can easily handle. if BitNinja can only work at the IP level, though, then the only distinguishing marker it can use is IP address/CIDR range. if you know of another, let me know! i'm happy to try. otherwise, we're probably out of luck here.
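
As a rough illustration of matching on a header-level marker, the sketch below checks whether a request's User-Agent starts with the `Bridgy` string quoted earlier in this thread. It assumes Python 3; the function name `has_bridgy_marker` is hypothetical, and this is not how BitNinja or any particular WAF is configured. As noted above, a User-Agent is easily spoofed, so a rule like this identifies traffic rather than authenticating it.

```python
def has_bridgy_marker(headers, prefix="Bridgy"):
    """Return True if the User-Agent header value starts with prefix.

    Header names are compared case-insensitively, as HTTP requires.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("user-agent", "").startswith(prefix)
```

A WAF rule built on the same idea could log or allow matching requests so that the affected traffic can at least be found among the broader IP ranges.
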


(and apologies for continuing the broader discussion, but i do need to reiterate: if BitNinja has indeed indiscriminately blocked even just some of Google Cloud, that will cause many more problems for visitors to your sites, beyond just IndieWeb/Bridgy. Google Cloud is one of the largest cloud hosting providers, so a ton of other tools hosted there will also fail - CDNs, VPNs, intermediate caches, link previews in messaging and social media sites, monitoring tools, content filters, client-side malware detectors, web accelerators (ie prefetchers) - which will cause user-visible breakages for a wide range of your end users and use cases.)

@snarfed
Owner

snarfed commented Sep 20, 2019

seems like we've figured out the root cause and possible paths forward here, so i'm going to tentatively close this. apologies for the inconvenience, and thanks for the contributions and feedback, all!

@snarfed snarfed closed this as completed Sep 20, 2019
@willtm
Author

willtm commented Sep 21, 2019

@snarfed I will look forward to seeing if it's possible to use Brid.gy to post to Twitter from my WordPress blog on Reclaim. Fingers crossed!

5 participants