Why pay for NAT solutions in AWS?

TL;DR: We understand the NAT options and how they work in depth, and the team is technically strong. We’re challenging our conventional wisdom and assumptions to see whether we can justify spending extra $$$ on NAT solutions in AWS. We are not constrained by cost, but we do not spend money purely for religious reasons (as in, because that’s how we’ve always done it).

We are currently considering three mechanisms to reach EC2 instances:

  • IGW -> Subnet(s) | 1-1 NAT (when a public IP (PIP) or EIP is attached to the instance)
  • NGW -> Subnet(s) | 1-Many NAT
  • NAT instances (or roll your own) | Can accommodate both of the above scenarios

We started with a NAT instance years ago, before NGWs existed. We rolled our own. It was easy, but NGWs were easier, so we switched to them some years later.

Over the past half decade our traffic has increased, and the bi-directional billing on our NGWs has grown with it. We are addressing that now. We had an open discussion on our team about our architecture and our assumptions, and we are having a hard time justifying either the NGW or the NAT instance solution. Some of our concerns:

  • Bi-Directional billing on the NGW
  • Maintenance / Support of NAT Instances
  • Limits on bandwidth for both NGW / Instances
  • Additional Complexity; No functional gain (over IGW)
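
To make the billing concern concrete, here is a rough back-of-the-envelope sketch. Every rate and workload figure in it is a placeholder assumption, not a quoted price, and the function names are made up for illustration. Internet data transfer charges apply on either path, so they are left out of the comparison.

```python
# Rough monthly cost comparison: NAT gateway vs. public IPs behind an IGW.
# All rates and volumes are placeholder assumptions; substitute current
# pricing for your region before drawing any conclusion.

HOURS_PER_MONTH = 730


def ngw_monthly_cost(gateways, gb_processed, hourly_rate, per_gb_rate):
    """NGW path: an hourly charge per gateway plus a per-GB processing
    charge that applies to traffic in both directions."""
    return gateways * hourly_rate * HOURS_PER_MONTH + gb_processed * per_gb_rate


def public_ip_monthly_cost(instances, ip_hourly_rate):
    """IGW path: no per-GB processing charge, only a per-address hourly
    charge for each instance that holds a public IPv4 address."""
    return instances * ip_hourly_rate * HOURS_PER_MONTH


# Hypothetical workload: 3 gateways (one per AZ) handling 50 TB/month,
# versus 20 instances that each hold a public IP.
ngw = ngw_monthly_cost(gateways=3, gb_processed=50_000,
                       hourly_rate=0.045, per_gb_rate=0.045)
igw = public_ip_monthly_cost(instances=20, ip_hourly_rate=0.005)

print(f"NGW: ${ngw:,.2f}/month, IGW + public IPs: ${igw:,.2f}/month")
```

The point of the sketch is only that the NGW term scales with traffic volume while the IGW term scales with instance count.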

We are considering various other designs; loosely:

  • All IGW 1-1 NAT (on instances that need internet)
  • Instances like DBs / internal machines - No PIP / EIP
  • VPCE -> S3/DynamoDB for that traffic
  • Still leveraging ALBs the same way
  • This list is not exhaustive
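
As a sketch of the VPCE item above, the following builds the request parameters for gateway endpoints to S3 and DynamoDB. The VPC ID, route table ID, and region are hypothetical placeholders, and the helper name is made up. The dictionary keys follow the parameters of the EC2 CreateVpcEndpoint call as exposed by boto3's `create_vpc_endpoint`; the call itself is left as a comment so the snippet runs without AWS credentials.

```python
def gateway_endpoint_params(vpc_id, region, service, route_table_ids):
    """Build keyword arguments for an EC2 CreateVpcEndpoint request
    (gateway type). Gateway endpoints exist only for S3 and DynamoDB."""
    if service not in ("s3", "dynamodb"):
        raise ValueError("gateway endpoints exist only for S3 and DynamoDB")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        # Routes to the service are added to these route tables.
        "RouteTableIds": list(route_table_ids),
    }


# Hypothetical identifiers, for illustration only.
requests = [
    gateway_endpoint_params("vpc-0example", "us-east-1", svc, ["rtb-0example"])
    for svc in ("s3", "dynamodb")
]

for params in requests:
    print(params["ServiceName"])
    # With boto3 installed and credentials configured, this could become:
    #   boto3.client("ec2").create_vpc_endpoint(**params)
```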

(I would like to acknowledge that a user in your Slack brought up a valid point: IGWs (and NAT instances) can permit inbound traffic at all, whereas NGWs cannot, which may be a differentiator between NGWs and the other two options.)

Why would an IGW 1-1 NAT be flagged in an audit when a NAT instance would not? Have you guys seen that in practice?

Our few machines that need access to the internet would still keep a tight security group that denies everything inbound except our internal VPC traffic. None of our instances would effectively be reachable from the internet. This all feels functionally the same, but I confess it challenges my 20 years of conventional wisdom, and I’m very much interested in any real-world compliance tests that this would fail, and why.
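
One way to back that claim with evidence for an auditor is a small check over the security group rules. Security groups are allow-lists, so "deny all" here means that no ingress rule admits a source outside the VPC. The sketch below is a minimal illustration under assumptions: the rule structure mirrors the `IpPermissions` shape returned by EC2 DescribeSecurityGroups, the sample rules and VPC CIDR are made up, and the VPC is assumed to be IPv4-only. Rules that reference other security groups instead of CIDRs are not examined.

```python
import ipaddress


def ingress_outside_vpc(ip_permissions, vpc_cidr):
    """Return the source CIDRs of ingress rules that are not fully
    contained in the VPC's CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    offenders = []
    for perm in ip_permissions:
        for rng in perm.get("IpRanges", []):
            net = ipaddress.ip_network(rng["CidrIp"])
            if not net.subnet_of(vpc):
                offenders.append(rng["CidrIp"])
        # Assumption: the VPC is IPv4-only, so any IPv6 source is external.
        for rng in perm.get("Ipv6Ranges", []):
            offenders.append(rng["CidrIpv6"])
    return offenders


# Hypothetical rules: one internal-only rule and one that is open to the world.
sample_rules = [
    {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]

offenders = ingress_outside_vpc(sample_rules, "10.0.0.0/16")
print(offenders)
```

An empty result for every security group attached to a public-IP instance is the condition described above; anything else is a rule worth reviewing.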

Thanks for giving this some thought!

We obviously don’t know your specific use case or deployment, but speaking in generalities, it definitely makes sense to route what you can through a VPC endpoint to avoid some of the data transfer charges. That much we can confirm from a cost standpoint.

With regard to the security audit, we haven’t seen that yet with any of our customers. However, we haven’t really seen any of our customers use that approach to internet access either. The only common-sense issues I could raise are the following: (1) a catastrophic failure of your security protocol would potentially cause more damage than it would with the NGW setup, meaning that if you don’t follow your own procedures, your risk of exposure might be higher; and (2) semantically, which may have some bearing on the audit, any subnet with a route to an IGW is called a public subnet. Not sure if that matters, but it might.

Also, I know it’s trite, but I would check your compression strategies for what you are transferring. Sometimes companies get sloppy.
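
As a quick way to check that last point, here is a minimal sketch that measures how much a payload shrinks under gzip. The sample payload is made up; real ratios depend entirely on what you actually transfer, and already-compressed data (images, video, archives) will not shrink much.

```python
import gzip
import json


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by raw size (lower is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Hypothetical payload: repetitive JSON records, typical of API or log traffic.
records = [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(1000)]
payload = json.dumps(records).encode("utf-8")

ratio = compression_ratio(payload)
print(f"{len(payload)} bytes raw, compressed to {ratio:.0%} of original size")
```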