TL;DR: We understand the available NAT options and how they work, and the team is technically strong. We’re challenging our conventional wisdom and assumptions to see whether we can justify spending extra $$$ on NAT solutions in AWS. We are not restricted by cost, but we certainly do not spend money purely for religious reasons (as in, because that’s how we’ve always done it).
We are currently considering three mechanisms for connecting EC2 instances to the internet:
- IGW -> Subnet(s) | 1-1 NAT (When PIP or EIP attached to instance)
- NGW -> Subnet(s) | 1-Many NAT
- NAT instances (or roll your own) | Can accommodate both scenarios above
We started with a NAT instance years ago, before NGWs existed. We rolled our own. It was easy; NGWs were easier, so we switched to NGWs some years later.
Over the past half decade our traffic has increased and the bi-directional billing on our NGWs has grown with it. We are now revisiting the design. We had an open discussion on our team about our architecture and our assumptions, and we are having a hard time justifying either the NGW or the NAT instance solution. Some of our concerns:
- Bi-Directional billing on the NGW
- Maintenance / Support of NAT Instances
- Limits on bandwidth for both NGW / Instances
- Additional Complexity; No functional gain (over IGW)
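To put rough numbers on the billing concern, here is a minimal sketch of the comparison we have in mind. It assumes the NGW bill is a fixed hourly charge per gateway plus a per-GB processing charge applied to traffic in both directions, and that the IGW option costs only a per-address hourly charge for each public IPv4 address. All names and rates are hypothetical inputs to be filled in from the current AWS price list for your region, not quoted prices. Internet data transfer charges are left out because they apply to both paths.

```python
from dataclasses import dataclass

# Assumption: average number of billable hours in a month.
HOURS_PER_MONTH = 730


@dataclass
class NatGatewayPricing:
    """Hypothetical container for NGW rates; fill in from the AWS price list."""
    hourly_usd: float            # fixed charge per gateway per hour
    per_gb_processed_usd: float  # charge per GB processed, in either direction


def ngw_monthly_cost(pricing: NatGatewayPricing, gateways: int,
                     gb_out: float, gb_in: float) -> float:
    """Estimate monthly NGW cost: fixed hourly charge plus bi-directional processing."""
    fixed = pricing.hourly_usd * HOURS_PER_MONTH * gateways
    # Both outbound requests and inbound responses pass through the gateway,
    # so both directions count toward the processing charge.
    processing = pricing.per_gb_processed_usd * (gb_out + gb_in)
    return fixed + processing


def public_ip_monthly_cost(per_ip_hourly_usd: float, instances: int) -> float:
    """Estimate monthly cost of giving each internet-facing instance its own public IPv4."""
    return per_ip_hourly_usd * HOURS_PER_MONTH * instances


if __name__ == "__main__":
    # Placeholder rates and volumes for illustration only.
    rates = NatGatewayPricing(hourly_usd=0.045, per_gb_processed_usd=0.045)
    print("NGW:", round(ngw_monthly_cost(rates, gateways=3, gb_out=10_000, gb_in=40_000), 2))
    print("IGW + public IPs:", round(public_ip_monthly_cost(0.005, instances=20), 2))
```

The point of the sketch is that the processing term scales with traffic volume while the per-address term scales only with instance count, which is why the gap widens as traffic grows.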
We are considering various other designs; loosely:
- All IGW 1-1 NAT (on instances that need internet)
- Instances like DBs / internal machines - No PIP / EIP
- VPCE -> S3/DynamoDB for that traffic
- Still leveraging ALBs the same way
- This list is not exhaustive
(I would like to acknowledge that a user in your Slack brought up the valid point that IGWs (and NAT instances) can permit inbound traffic, while NGWs cannot, which may be a differentiator between NGWs and the other two options.)
Why would an IGW 1-1 NAT get flagged in an audit when a NAT instance would not? Have you guys seen that in practice?
Our few machines that need access to the internet would still keep a tight security group that allows inbound traffic only from inside our VPC; everything else is denied by default. None of our instances will effectively be reachable from the internet. This all feels functionally the same, but I confess it challenges my 20 years of conventional wisdom, and I’m very much interested in any real-world compliance tests that this would fail, and why.
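As a concrete version of the "tight security group" claim, here is a minimal sketch of the kind of check we would expect an auditor (or our own tooling) to run. It assumes ingress rules in the dictionary shape that boto3's `describe_security_groups` returns under `IpPermissions`, and it only inspects IPv4 CIDR sources; IPv6 ranges, prefix lists, and security-group-referenced sources are not handled. The function name and the sample data are hypothetical.

```python
import ipaddress


def rules_exposed_beyond_vpc(ingress_rules, vpc_cidr):
    """Return ingress entries whose IPv4 source range is not contained in the VPC CIDR.

    ingress_rules: list of dicts shaped like boto3 IpPermissions entries.
    vpc_cidr: the VPC's IPv4 CIDR block as a string, e.g. "10.0.0.0/16".
    """
    vpc_net = ipaddress.ip_network(vpc_cidr)
    exposed = []
    for rule in ingress_rules:
        for ip_range in rule.get("IpRanges", []):
            source = ipaddress.ip_network(ip_range["CidrIp"])
            # A source wholly inside the VPC range is internal traffic;
            # anything else is reachable from outside the VPC.
            if not source.subnet_of(vpc_net):
                exposed.append((
                    rule.get("IpProtocol"),
                    rule.get("FromPort"),
                    rule.get("ToPort"),
                    ip_range["CidrIp"],
                ))
    return exposed


if __name__ == "__main__":
    # Hypothetical rules: one internal-only, one open to the world.
    sample_rules = [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.1.0/24"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]
    for finding in rules_exposed_beyond_vpc(sample_rules, "10.0.0.0/16"):
        print("exposed:", finding)
```

If a check like this comes back empty for every instance holding a PIP or EIP, the instance accepts no unsolicited inbound connections from the internet, which is the property we are arguing is equivalent to sitting behind a NAT.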
Thanks for giving this some thought!