Networking on AWS (2018)

petercooper | 533 points

There are quite a few significant things you've missed that should have been included, maybe for a part two:

* Network ACLs, which describe the ruleset (consider it like a stateless firewall) for subnets and their respective routes. Whilst they are optional, having a default set straightens out a lot of duplication that may otherwise end up in Security Groups (which are stateful).

* Elastic (public) IPs. NAT instances/gateways require their use, and there is a dance to be done around allocating them in the account and attaching them to instance interfaces.

* IPv6 components. Egress-only Internet Gateways operate differently to IGWs: as there is no NAT, they need a route applied across all subnets, both public and private. The IPv6 CIDR allocation gives the VPC a /56 (and thus each subnet gets a /64, and each instance's interface gets a /128, which is bananas, but IPv6 is a second class citizen on AWS). Finally, the subnets need updating so that automatic IPv6 address assignment happens.

* VPC Endpoints - these are broken into two types. The older gateway type supports S3/DynamoDB and effectively allows traffic in a public/private subnet to bypass NAT. Enabling these can bring significant advantages in access and throughput. The newer "PrivateLink" interface endpoints are different and have pricing costs associated with them.

* DNS and DHCP: It's a rule in the VPC that the delegated resolver lives on ".2" of the VPC's CIDR (the base address plus two), and it operates in split-horizon fashion: EC2 public hostnames resolved by instances inside the VPC return the private VPC CIDR address, not any Elastic IP.
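
The ".2" rule in the last point is just address arithmetic on the VPC's IPv4 range. A minimal sketch with Python's standard `ipaddress` module; the function name is made up for illustration and the two CIDRs are only example values:

```python
import ipaddress


def vpc_resolver_address(vpc_cidr: str):
    """Return the base address of the VPC's IPv4 range plus two."""
    network = ipaddress.ip_network(vpc_cidr)
    return network.network_address + 2


# Example CIDRs only; substitute your own VPC range.
print(vpc_resolver_address("10.0.0.0/16"))
print(vpc_resolver_address("172.31.0.0/16"))
```

This only computes where the resolver should be; it does not query it.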

becauseiam | 5 years ago

Off topic, but as a network guy by heart I've always been fairly happy with how AWS implements the network side of things, especially in comparison to something like Azure.

In AWS you have the same basic concepts of a network, and the terminology aligns enough that you can make sense of it fairly quickly if you're in the network realm. Azure, however, takes all of that 'network' stuff and turns it into an abstraction where you have to carefully follow one of their guides, only to realize it's out of date, or the UI doesn't show the appropriate information, etc. Also, you have Azure network portions that block ICMP because of 'security'.

This is all anecdotal from my experience of course, but it's why I keep referring to Azure as the "Excel spreadsheet of the cloud": the entire design of it is in your face and non-intuitive.

For instance, if I wanted to make a direct connection like Direct Connect to multiple VPCs in AWS, I'd use the Transit Gateway, connect to it from on-prem, add the VPC and the route, and be done.

In Azure, I'd use ExpressRoute, add the ExpressRoute circuit to a Subscription, add a gateway for that and then an additional gateway for each VPC equivalent, create an authorization key for each 'VPC' equivalent and sync them, and then define routing per gateway. Then when you go in to trace the network path, ICMP is blocked.

I know AWS is more mature than Azure, so it's not entirely fair to criticize them, but every time I touch Azure I miss AWS, or even GCP. Perhaps it's just me not being familiar enough with Azure. ¯\_(ツ)_/¯

llama052 | 5 years ago

NAT gateways are one of the things that blindsided me on the whole "serverless" idea for hobby projects. To have a Lambda function with access to both the outside world and your private network resources, your $0.01/month function becomes a $35+/month expense if you don't want to manage your own t2 NAT instance (and the required patches, upgrades, scaling, monitoring, etc).

See https://forums.aws.amazon.com/thread.jspa?threadID=234959
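
For anyone wondering where a figure like that comes from, here is a rough back-of-the-envelope sketch. The rates below are assumptions (roughly the us-east-1 list prices I remember, $0.045 per hour and $0.045 per GB processed), so check the current pricing page before relying on them; the function name is just for illustration.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def nat_gateway_monthly_cost(hourly_rate=0.045, gb_processed=0.0, per_gb_rate=0.045):
    """Estimate a NAT gateway's monthly bill: fixed hourly charge plus data processing.

    All rates are assumed example values, not authoritative prices.
    """
    return hourly_rate * HOURS_PER_MONTH + gb_processed * per_gb_rate


print(f"Idle NAT gateway: ${nat_gateway_monthly_cost():.2f}/month")
print(f"With 50 GB processed: ${nat_gateway_monthly_cost(gb_processed=50):.2f}/month")
```

The hourly charge dominates for a hobby project, which is why the bill barely depends on how little traffic the function sends.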

benmanns | 5 years ago

I taught some courses on AWS for a year and a half. The networking piece is something that is trivial for any network engineer, but for any developer (which is my background) working through the network piece is crucial. It takes a while and this looks like a good reference. However, it's best to also check out the AWS docs https://docs.aws.amazon.com/vpc/latest/userguide/what-is-ama... . They are not always the easiest read, but I find them to be pretty authoritative.

I also like this video https://www.oreilly.com/library/view/amazon-web-services/978... (part of http://shop.oreilly.com/product/0636920040415.do ). Full disclosure, I used to work with Jon.

mooreds | 5 years ago

I created a collection of terraform modules that gets a minimal AWS network set up for a single-region webapp: https://github.com/lopopolo/hyperbola/tree/master/terraform/...

p4lindromica | 5 years ago

I want to add a few notes useful for packet crafting. AWS, Google Cloud, and Azure don't work at layer 2 (Ethernet) as expected since they provide services at layer 3 and up.

For example, if you modify the destination MAC address, it will not work in AWS. To be able to do that you should disable source/destination checks, as specified in [1].

The last time I checked you cannot do that in Google Cloud or Microsoft Azure.
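
On the AWS side, disabling the source/destination check can be scripted. A minimal sketch, assuming you pass in a boto3 EC2 client; the helper name and the instance ID in the usage comment are placeholders, not anything from the linked docs:

```python
def disable_source_dest_check(ec2_client, instance_id):
    """Turn off the source/destination check for an instance.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    return ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        SourceDestCheck={"Value": False},
    )


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   disable_source_dest_check(boto3.client("ec2"), "i-0123456789abcdef0")
```

Taking the client as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.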

When we experienced this issue, Reddit was the best resource for answers. I'm listing the Reddit threads here as they can help others working on projects that require packet crafting:

https://www.reddit.com/r/sysadmin/comments/51xypj/vpc_amazon...

https://www.reddit.com/r/networking/comments/51y52n/aws_vpc_...

https://www.reddit.com/r/sysadmin/comments/533e14/google_com...

[1] https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Ins...

wslh | 5 years ago

Nice, I found this explanation a bit more in depth and super helpful as well

https://start.jcolemorrison.com/aws-vpc-core-concepts-analog...

ninetax | 5 years ago

As long as IPv6 is a second-class citizen, things are going to continue to be painful in AWS.

I did the whole "here is your /56, now segment it yourself" thing. It's crude. It should not be necessary: if v6 were central to the model, you'd be assigned a /64 from your covering prefix automatically as you deploy regional nodes.
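
For reference, the manual segmentation amounts to something like the sketch below, using Python's standard `ipaddress` module. The /56 is taken from the IPv6 documentation range (2001:db8::/32) purely as a stand-in for whatever prefix AWS hands you, and the helper name is made up:

```python
import ipaddress
from itertools import islice


def carve_subnets(vpc_prefix: str, count: int):
    """Return the first `count` /64 subnets of a VPC's IPv6 block."""
    block = ipaddress.ip_network(vpc_prefix)
    return list(islice(block.subnets(new_prefix=64), count))


# Stand-in prefix from the documentation range, not a real allocation.
vpc_block = "2001:db8:1234:1a00::/56"
for subnet in carve_subnets(vpc_block, 3):
    print(subnet)
```

A /56 holds 256 such /64s, so the bookkeeping is simple, it is just left to you.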

ggm | 5 years ago

In my opinion, the most annoying thing about AWS networking - and some other services - is that they often use IDs and do not show labels, which forces me to partly memorize them, go back and forth, or keep multiple windows open. The AWS console is not the best UX piece on the web, but this part is especially error-prone.

cygned | 5 years ago

I’ve been banging my head against the wall for a week trying to set up a site-to-site VPN in AWS with a Cisco ASA. The auto-generated config file has a lot of missing info.

If anyone knows of a good resource on the subject it would be greatly appreciated.

tambourine_man | 5 years ago

Something that's missing from this (otherwise great!) guide, that has puzzled me for a while - what's the point? What does this configuration actually gain you/AWS? My best guess is that private subnets are for DDoS protection, but that seems like something that would be better handled by throttling. Given the number of complaints I've heard about how difficult VPC/Subnet setup is, why bother with it at all? Staving off IP address exhaustion?

Or, to ask it another way - what would be the downside of all your resources being in 1 single-Subnet VPC, spread evenly across AZs?

scubbo | 5 years ago

Tangential question: This guy's blog has fantastic content but I don't see an RSS feed or any other way of subscribing (apart from a much broader Twitter feed). What's the best way to keep up?

abalone | 5 years ago

I've recently been putting together a homelab, and it's done wonders to make some of the more abstract things mentioned here, like routing tables and CIDR, a lot more concrete.

vvanders | 5 years ago

Would be really good to go into managed VPNs and VPC peering. These are some of the amazing things that the VPCs provide you that took me a while to figure out.

gravypod | 5 years ago

The part that irks me is that if you’re doing any VPC design that’s going to even potentially include peering, you need to carefully understand the limitations first. For example, within the same region you can refer to security groups in rules as if they’re in the same VPC, but you can’t do that if you’re peering across regions. Then add in some DNS restrictions (like not being able to directly resolve a peer VPC’s entries, somewhat solvable by using VPC private zones to serve DNS across regions) and it can get really awkward. Then there are overlapping VPC CIDR issues (VPC Transit Gateways can only sorta help with this).
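
The overlapping CIDR problem is cheap to check for before you create anything. A minimal sketch using Python's standard `ipaddress` module; the VPC names and ranges are hypothetical:

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical VPCs; two of them reuse the same range.
vpcs = {
    "prod": "10.0.0.0/16",
    "staging": "10.0.0.0/16",
    "tools": "10.1.0.0/16",
}
print(overlapping_pairs(vpcs))
```

Any pair this reports cannot be peered directly, which is the situation you want to catch at planning time rather than after launch.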

The primary caveats beyond basic networks that impact designs are that multicasting is not enabled by the network layers but at the network interface (ENI) layer, and that you need to look carefully at how security groups really work (they’re fundamentally attached to an ENI, which is how you can route between networks with a single instance as long as it’s within the same AZ).

All of this, I’ve found, was completely disregarded / unknown by almost every company outside the F500 or high-end tech start-ups when they first started with AWS, and I’ve spent a lot of my career having to migrate production environments between VPCs so that we can get enough room to grow adequately. Making subnets as small as possible is not what you should be doing in AWS, folks. In fact, making them really small means you put in a fair bit of effort without stopping to read the documentation in earnest for a couple of hours. And using a default VPC CIDR repeatedly from the console is a pretty grand way to make sure you can never let two VPCs communicate with each other via anything other than a third intermediate VPC that you’ll have to migrate to eventually.
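
On subnet sizing: AWS reserves five addresses in every subnet (the first four and the last one), so tiny subnets lose a large share of their space. A quick sketch of the arithmetic, with example CIDRs and an illustrative helper name:

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def usable_addresses(subnet_cidr):
    """Addresses left for your own resources after AWS's reserved ones."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


# Example sizes only, from the smallest allowed subnet upward.
for cidr in ("10.0.0.0/28", "10.0.0.0/24", "10.0.0.0/20"):
    print(cidr, usable_addresses(cidr))
```

A /28 leaves only 11 usable addresses, and ENIs for load balancers, Lambda functions, and endpoints eat into that quickly.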

Some of the overly-cautious networking approaches I’ve seen include making a VPC for every single application / service, using a NACL for every application (multiplied by every AZ used to isolate each subnet and cutting off cross-AZ routing thereby, of course), creating your own NAT instance that doesn’t do anything better than a NAT gateway, NAT gateways in every AZ (for a whole $1 of traffic / mo each). The story of problems in AWS infrastructure is the same - trying to plan too far ahead for the wrong things and not realizing the limitations of the right things that are not flexible anymore. This is much more common when companies hire traditionally experienced network engineers that have just a little too much confidence.

devonkim | 5 years ago

For the public subnets where the NAT gateways are, you can use 1 route table for all public subnets together.
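
To illustrate why one table is enough: every public subnet needs the same two routes (the local VPC range and a default route to the internet gateway), and the most specific matching route wins. A toy model in Python, where the IDs, CIDRs, and function name are all made up for the example:

```python
import ipaddress

# One shared route table; "rtb-public" and "igw-EXAMPLE" are placeholder IDs.
ROUTE_TABLES = {
    "rtb-public": {
        "10.0.0.0/16": "local",
        "0.0.0.0/0": "igw-EXAMPLE",
    },
}

# Every public subnet is associated with the same table.
ASSOCIATIONS = {
    "10.0.0.0/24": "rtb-public",
    "10.0.1.0/24": "rtb-public",
    "10.0.2.0/24": "rtb-public",
}


def next_hop(subnet_cidr, destination):
    """Pick the target of the most specific route that matches `destination`."""
    table = ROUTE_TABLES[ASSOCIATIONS[subnet_cidr]]
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("10.0.1.0/24", "10.0.2.25"))      # stays inside the VPC
print(next_hop("10.0.1.0/24", "93.184.216.34"))  # leaves via the gateway
```

Nothing in the lookup depends on which public subnet the traffic starts from, which is why the table can be shared.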

Besides that: nice article

mvanbaak | 5 years ago

The title says "everything you need to know about networking on AWS". I wish it were this simple.

The article is well written, but it simply represents maybe 1% of what you need to know here. I would have called it "A simple introduction to networking on AWS".

simonebrunozzi | 5 years ago

Does anyone know of any similar resources for Google Cloud?

rkangel | 5 years ago

AWS security groups and ACLs are the most worthless things. You can't treat them like a real firewall. You end up just allowing anything outbound or inbound. They don't let you be detailed enough.

Gelob | 5 years ago

Ugh! That's the most complex explanation of AWS I've ever seen.

He just described a NETWORK, not AWS.

AWS has renamed lots of things, but all the scary text configs that used to be the domain of wizened sysadmins have been replaced with very simple single-page-app GUI controls. Like routers and gateways: those terms are largely gone from the AWS vocabulary.

No need to get into subnets and route tables I think.

The majority of clients I've worked with use AWS for web hosting with an ELB load balancer (the most important part), an EC2 instance policy & image (for handling traffic fluctuations), an RDS (database), an S3 bucket and Route 53 (external DNS entries).

Point the load balancer to the outside world and then let it spin up instances. That's the most common model I've encountered.

It's almost cartoonishly simple compared to what the OP wrote here. Almost. Having an understanding of network architecture helps, but not THAT much.

iheartpotatoes | 5 years ago