Traffic routing to AWS resources via VPN using TGW

I am having a bit of a challenge getting traffic routed from on-prem to AWS services. First, the VPN router can only keep one tunnel active at a time. I troubleshot this with AWS Support for 2 hours, and we determined that my earlier tunnel issues were caused by some traffic going out on tunnel 1 and trying to come back on tunnel 2, where it was lost. That is resolved and the tunnel is up. Second, when connected to the Client VPN, I can reach on-prem hosts by their local IPs from my remote machine, so I know traffic can traverse the VPN/TGW in both directions in that case. Third, the site-to-site VPN uses IKEv2 with static routing (yes, I tried to get dynamic routing working but could not figure it out). Fourth, the security group rules allow all necessary traffic to the required ports, and all traffic within the security group is completely open.

What I cannot get working is traffic from on-prem to the NLB or to the SES SMTP endpoint. The on-prem network has a device that at minimum needs to send emails over SMTP and connect to the AWS Transfer Family FTP server via the NLB. (Yes, I know: FTP with no encryption, blah blah blah. If you want to discuss the why of that, please PM me and I will be happy to explain.)

I have tried pointing the on-prem device at both the private DNS name and the private IP of the endpoints, but I cannot get it to send an email or get the FTP connection to work, and I have no idea why.

Here is a manual that has the same OS as the on prem router - https://www.antaira.com/core/media/media.nl?id=1524556&c=685553&h=b6ac7f58c81ec6ba671f&_xt=.pdf

I am not sure whether the issue is on the AWS side (my setup, the routing tables) or something else. I am also not sure whether I need to set a static route on the on-prem router to direct that traffic.

Edit: I should note that I have tried setting static routes on the on-prem router, but it didn't fix the issue, so I am not sure I did it correctly.

5 Answers

I see no issue with your routing configuration or the combination of the site-to-site VPN, TGW, TGW route table, and VPC route table.

I know this is an obvious question, but do the security groups of the VPC endpoint for SES, the NLB (if you've attached a security group to the NLB), and the Transfer server endpoint behind the NLB permit the on-prem traffic? Do you have VPC flow logs for the target VPC, where you could easily check whether the traffic is arriving in your VPC through the site-to-site VPN?

Which IP range is routed from on-prem to the site-to-site VPN? Is it the whole 172.31.0.0/16 CIDR of the VPC?

EXPERT
Leo K
answered a month ago
EXPERT
reviewed a month ago
  • There is only one VPC and a single subnet used by all of these, and the security group rules allow all traffic within the security group to all ports. I would think that is sufficient to allow traffic within the VPC. I have also previously set specific rules allowing the 0.0.0.0/0 CIDR, which changed nothing, and for obvious reasons I then restricted them to only the Client VPN CIDR. I have flow logs for the tunnels, but not for the VPC; let me turn on VPC flow logs too.

    So if you look at the router manual linked above and find the static routing section, you'll see that the interface is a bit clunky. I THINK the destination can only be a single IP, so I set the destinations to the VPC endpoint private IPs and set the "Next Hop" gateway for those IPs to the WAN connection.

  • For example, I tried to send an email from the router itself and got: Jul 10 13:05:46 mailtool: cannot connect 172.31.1.21

    The problem is that the flow logs for the ENI with that IP only show a lot of entries like this:

    2 632257070288 eni-05d92145317a08599 - - - - - - - 1720641908 1720641977 - NODATA

  • Okay, that means that the VPC flow log is only reporting that logging is working but that there was no traffic during the reporting period. If the VPC flow log is set to record all traffic, as I'm assuming it is, that means that it's almost certainly the on-prem VPN box that isn't sending the packets through the VPN tunnel. I haven't read the manual you posted yet.

  • Here is a TGW attachment flow log record:

    6 TransitGateway 632257070288 tgw-08d4f3c6e05fc0cfd tgw-attach-035207b6f7ad62dd6 632257070288 632257070288 - vpc-0e76a8dac196d006c - subnet-0cb5441dffb9e6f60 - eni-0bb424eb6675b7f8d usw1-az1 usw1-az3 tgw-attach-025bd083a6c4c4538 172.20.104.4 172.31.1.13 50027 21 6 3 144 1720643589 1720643634 OK IPv4 0 0 0 0 2 us-west-1 egress - -

    If I am reading this correctly, this is the on-prem device successfully connecting to the NLB private IP for FTP on port 21. However, no data at all is landing in S3, and I am not sure what is happening there.

  • Yes, surprisingly, it does show tcp/21 traffic from on-prem. And for the same timestamp, the VPC flow log doesn't show the attempt?
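
Reading these space-separated records by hand is error-prone, so here is a minimal Python sketch that maps them onto field names. It assumes both logs use the AWS default formats (version 2 for VPC flow logs, version 6 for Transit Gateway flow logs); a custom log format would need different field lists, and the helper names are hypothetical. One detail it makes visible: the tcp-flags value in the TGW record is 2, which decodes to SYN only, so in that aggregation window only connection-opening SYN packets were seen for the flow. That may be worth keeping in mind before reading the record as a completed connection.

```python
# Field order of the AWS default flow log formats (assumed; custom formats differ).
VPC_V2_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

TGW_V6_FIELDS = [
    "version", "resource-type", "account-id", "tgw-id", "tgw-attachment-id",
    "tgw-src-vpc-account-id", "tgw-dst-vpc-account-id", "tgw-src-vpc-id",
    "tgw-dst-vpc-id", "tgw-src-subnet-id", "tgw-dst-subnet-id",
    "tgw-src-eni", "tgw-dst-eni", "tgw-src-az-id", "tgw-dst-az-id",
    "tgw-pair-attachment-id", "srcaddr", "dstaddr", "srcport", "dstport",
    "protocol", "packets", "bytes", "start", "end", "log-status", "type",
    "packets-lost-no-route", "packets-lost-blackhole",
    "packets-lost-mtu-exceeded", "packets-lost-ttl-expired", "tcp-flags",
    "region", "flow-direction", "pkt-src-aws-service", "pkt-dst-aws-service",
]

# Standard TCP header flag bits; tcp-flags is the bitwise OR over the window.
TCP_FLAG_BITS = {1: "FIN", 2: "SYN", 4: "RST", 8: "PSH", 16: "ACK", 32: "URG"}


def parse_record(line: str) -> dict:
    """Map one space-separated flow log record onto field names."""
    values = line.split()
    if len(values) == len(VPC_V2_FIELDS):
        fields = VPC_V2_FIELDS
    elif len(values) == len(TGW_V6_FIELDS):
        fields = TGW_V6_FIELDS
    else:
        raise ValueError(f"unexpected field count: {len(values)}")
    return dict(zip(fields, values))


def decode_tcp_flags(value: str) -> list:
    """Turn a tcp-flags bitmask such as '2' or '18' into flag names."""
    if value == "-":
        return []
    mask = int(value)
    return [name for bit, name in TCP_FLAG_BITS.items() if mask & bit]


def summarize(line: str) -> str:
    """Produce a one-line human-readable summary of a record."""
    rec = parse_record(line)
    if rec["log-status"] == "NODATA":
        return "NODATA: no traffic recorded during this window"
    flags = decode_tcp_flags(rec.get("tcp-flags", "-"))
    return (f"{rec['srcaddr']}:{rec['srcport']} -> {rec['dstaddr']}:{rec['dstport']} "
            f"packets={rec['packets']} flags={'+'.join(flags) or 'n/a'}")
```

Fed the NODATA line from the ENI, `summarize()` reports that no traffic was recorded; fed the TGW line, it returns `172.20.104.4:50027 -> 172.31.1.13:21 packets=3 flags=SYN`.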

Is the TGW attachment in the destination VPC in the same subnet with the endpoint and NLB, or in a dedicated TGW attachment subnet?

EXPERT
Leo K
answered a month ago
  • Yes, they are all in the same subnet at the moment. I was trying to keep it simple (KISS) so I didn't have to deal with NAT between subnets. Here is more information: on a whim, I tried connecting from on-prem directly to the AWS Transfer Family private IP in the same subnet (172.31.1.22), and it connected:

    CONNECTED SourceIP=172.20.104.4 User=ClientFTPUser HomeDir=/ECT-data/Black_Water_Systems/client Client=UNKNOWN Role=arn:aws:iam::632257070288:role/service-role/EpicFTPDataRetrieve-role-lh22tofx Protocol=FTP IdentityProviderRequestId=a100de4c-5d3f-46dc-8e4c-0f271798d587

    However, I am still not seeing data land in S3, and I am at a bit of a loss on that. Emails are also still not being sent, so that might be related to whatever issue prompted your question. In Transfer Family I see the connection, but never a file or any data.

    The current subnet CIDR is 172.31.1.0/27; I have another subnet, 172.31.1.64/27.

    To be fair, I am working on this for a start-up, and they are slowly but surely coming up with more requirements, so I may need to re-architect the whole thing so that all their sites (which are very similar to the one that prompted this question) use a VPN. It's mainly a matter of cost: since they can connect to the FTP server without the load balancer, redoing the FTP setup without it would be a good cost saving and far less complicated. That is, provided I can get the email sending and the data to actually flow.
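
When the router only accepts single-IP destinations for static routes, it is easy to miss one endpoint. Here is a minimal sketch using Python's standard `ipaddress` module that lists which endpoint IPs from this thread fall outside a given set of routed prefixes. The role label attached to each IP is my reading of the thread and is an assumption, as is the helper name `uncovered`.

```python
import ipaddress

# Endpoint IPs taken from this thread; the role labels are assumptions.
ENDPOINTS = {
    "SES endpoint": "172.31.1.21",
    "NLB": "172.31.1.13",
    "Transfer endpoint": "172.31.1.22",
}


def uncovered(routes, endpoints=ENDPOINTS):
    """Return the names of endpoints whose IP is outside every routed prefix."""
    networks = [ipaddress.ip_network(r) for r in routes]
    return [
        name
        for name, ip in endpoints.items()
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]
```

With only a host route such as `172.31.1.21/32`, the NLB and Transfer endpoint are reported as uncovered. A single route for 172.31.1.0/27, or for the whole 172.31.0.0/16, covers all three, assuming the router accepts a prefix as the destination.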

If the Transfer server is only used to provide an FTP protocol adapter for on-prem, then you certainly don't need the NLB, and it's probably not worthwhile to troubleshoot. An NLB would typically be needed to provide access to a Transfer server with private IPs via public IPs. The NLB would provide the frontend layer with public IPs in that case. Otherwise, an NLB is just added complexity and cost.

Did the flow logs for the VPC (not the TGW) show the successful connection you made directly to the Transfer server endpoint? And do the VPC flow logs still show no connections to the VPC interface endpoint for SES while the TGW flow log shows them? If so, just to double-check that we're not getting tripped up by the default limitations for tcp/25, would you test connecting to tcp/465 or tcp/587 instead, and also tcp/443 (where it would provide the REST API) on the SES endpoint?
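
To run that test from the on-prem side without involving the device's mail client, a plain TCP connect is enough to show whether the handshake completes. Here is a minimal Python sketch, assuming some host on the on-prem network can run Python; the function name is hypothetical. It only checks TCP reachability, not SMTP authentication or TLS.

```python
import socket


def probe(host: str, ports, timeout: float = 3.0) -> dict:
    """Try a plain TCP connect to each port; True means the handshake completed."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

For example, `probe("172.31.1.21", [465, 587, 443])` run from the on-prem network would show which ports on that endpoint IP accept connections. A False on every port, together with nothing in the VPC flow log, points back at the path rather than at SES.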

For the file upload problem, is ECT-data the name of the bucket or a prefix inside your bucket?
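
For context on that question: Transfer Family S3 home directories take the form /bucket/prefix, so the first path segment of HomeDir=/ECT-data/Black_Water_Systems/client is read as the bucket name. Under the current S3 naming rules, bucket names must be lowercase, so if ECT-data is meant to be the bucket, that is worth double-checking. Here is a minimal sketch with simplified naming checks (it skips some rules, such as no consecutive dots and no IP-address-shaped names); the function names are hypothetical.

```python
import re

# Simplified S3 bucket naming check: 3-63 chars of lowercase letters, digits,
# dots and hyphens, beginning and ending with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def split_home_directory(home_dir: str):
    """Split an S3 home directory '/bucket/prefix...' into (bucket, key prefix)."""
    parts = home_dir.strip("/").split("/", 1)
    bucket = parts[0]
    prefix = parts[1] + "/" if len(parts) > 1 and parts[1] else ""
    return bucket, prefix


def looks_like_valid_bucket(name: str) -> bool:
    """Rough check against the simplified naming rules above."""
    return BUCKET_RE.match(name) is not None
```

Applied to this server's HomeDir, the split gives bucket `ECT-data` and prefix `Black_Water_Systems/client/`, and the bucket check fails on the uppercase letters.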

EXPERT
Leo K
answered a month ago
  • There are currently 5 other sites connecting to the FTP server via the NLB (all with the same devices as the site I am working on now), but I am discussing with the client bringing them all onto a VPN through the same TGW and getting rid of the NLB. The SMTP settings use port 587 for the SES SMTP server. It took me a while, but I cannot see any traffic for port 587 reaching the VPC, and I am still not seeing data arrive in S3. I can see the login Lambda firing, but the workflow Lambda on the AWS Transfer Family server is not firing, so the data is getting lost somewhere in there.

Would you be able to run a Wireshark packet trace on the on-prem source machine attempting to upload the files? It'd be easy to see from a cleartext FTP packet capture exactly how far it's getting with the file upload attempt.

Is the tcp/587 traffic showing in TGW flow logs but missing from VPC flow logs?
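
That comparison can be scripted: collect the source, destination and destination port of every flow in the TGW attachment log and subtract the ones that also appear in the VPC flow log. Here is a minimal sketch, assuming both logs use the default formats (the field positions are hard-coded for those) and that the records have already been exported as text lines, for example from CloudWatch Logs or S3. The function names are hypothetical.

```python
# 0-based field positions in the AWS default formats (assumed).
TGW_POS = (16, 17, 19)  # srcaddr, dstaddr, dstport in TGW flow logs (version 6)
VPC_POS = (3, 4, 6)     # srcaddr, dstaddr, dstport in VPC flow logs (version 2)


def flow_keys(lines, positions):
    """Collect (srcaddr, dstaddr, dstport) tuples, skipping rows with no address."""
    src, dst, dport = positions
    keys = set()
    for line in lines:
        fields = line.split()
        # NODATA/SKIPDATA rows carry "-" instead of addresses.
        if len(fields) > dport and fields[src] != "-":
            keys.add((fields[src], fields[dst], fields[dport]))
    return keys


def seen_at_tgw_only(tgw_lines, vpc_lines, port=None):
    """Flows the TGW attachment logged that never show up in the VPC flow log."""
    gap = flow_keys(tgw_lines, TGW_POS) - flow_keys(vpc_lines, VPC_POS)
    if port is not None:
        gap = {k for k in gap if k[2] == str(port)}
    return sorted(gap)
```

Calling `seen_at_tgw_only(tgw_lines, vpc_lines, port=587)` on an export that covers the same time window gives a direct answer to the question above.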

EXPERT
Leo K
answered a month ago

FTP needs two sets of ports for inbound connections: port 21 for the control connection, and ports 8192-8200, which AWS Transfer Family uses for FTP data connections. When a data connection is negotiated, the FTP control connection returns a PASV address and port combination where the data connection must be made. By default, the service does not provide an address, only a port, unless PassiveIp is configured, which is a requirement when an NLB is used in front of the FTP endpoint. Additionally, when an NLB is used, the number of concurrent inbound FTP data transfers is limited to 8! This is because of the small range of ports provided for data connections, and it is an unfortunate side effect of the NLB, which places its own PPv2 header IP address on the inbound connections.
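
In a cleartext packet capture (or an FTP client's debug output), the reply to look for is the 227 response to PASV. Per RFC 959 it encodes the data-connection target as six comma-separated numbers: four for the IPv4 address and two for the port, where port = p1 * 256 + p2. Here is a minimal Python sketch that decodes it and checks the result against the address the client can actually reach and the 8192-8200 range above. The `expected_ip` parameter and the sample reply in the note are illustrative assumptions, not values observed in this setup.

```python
import re

# RFC 959 PASV reply: "227 <text> (h1,h2,h3,h4,p1,p2)"
PASV_RE = re.compile(r"^227 .*\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")

# Passive data port range cited above for AWS Transfer Family FTP.
DATA_PORTS = range(8192, 8201)


def parse_pasv(reply: str):
    """Return (ip, port) from a 227 reply; port = p1 * 256 + p2."""
    match = PASV_RE.match(reply.strip())
    if match is None:
        raise ValueError(f"not a 227 PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def check_pasv(reply: str, expected_ip: str) -> list:
    """List problems a client would hit when opening the data connection."""
    ip, port = parse_pasv(reply)
    problems = []
    if ip != expected_ip:
        problems.append(f"data address {ip} differs from expected {expected_ip}")
    if port not in DATA_PORTS:
        problems.append(f"data port {port} is outside 8192-8200")
    return problems
```

For a hypothetical reply `227 Entering Passive Mode (172,31,1,22,32,5).`, `parse_pasv` returns `("172.31.1.22", 8197)`. If the client connected through the NLB but is told to open the data connection to a different address, that mismatch would explain a login that succeeds while no file ever arrives.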

Nothing in the descriptions provided shows the configuration needed for the NLB to work properly with the FTP protocol. If you can eliminate the NLB from this setup, your problems will be reduced. For all of this to work, the security groups must also allow inbound traffic on the port range 8192-8200.
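
Here is a sketch of those inbound rules that builds the `IpPermissions` structure expected by the EC2 AuthorizeSecurityGroupIngress API (boto3's `authorize_security_group_ingress`), covering tcp/21 and tcp/8192-8200. The on-prem CIDR and security group ID are placeholders you would supply. `172.20.104.0/24` in the note below is an assumption based on the source IP seen in the flow logs, not a confirmed network, and the function names are hypothetical.

```python
def ftp_ingress_permissions(cidr: str) -> list:
    """EC2 IpPermissions for FTP control (21) and passive data (8192-8200)."""
    port_ranges = [(21, 21, "FTP control"), (8192, 8200, "FTP passive data")]
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": low,
            "ToPort": high,
            "IpRanges": [{"CidrIp": cidr, "Description": desc}],
        }
        for low, high, desc in port_ranges
    ]


def apply_ftp_rules(group_id: str, cidr: str) -> None:
    """Add the rules to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=ftp_ingress_permissions(cidr)
    )
```

Usage would look like `apply_ftp_rules("sg-xxxxxxxx", "172.20.104.0/24")`, applied to the security group on the Transfer endpoint (and on the NLB, if it has one attached).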

AWS
Alex
answered 25 days ago