
AWS VPC Interview Questions for Senior and Staff SRE Roles

Shivan Bhimireddy·April 28, 2025·10 min read

Why VPC Questions Are a Staple of Staff SRE Interviews

VPC is the network foundation of every AWS architecture. Staff SRE interviews probe VPC knowledge because it reveals whether a candidate understands how AWS services actually communicate — not just which services exist.

A senior engineer knows: "put the database in a private subnet." A Staff engineer knows: why private subnets, which route tables, what the security group rules need to be, how DNS resolves across the VPC, and what breaks when the NAT Gateway goes down.


Fundamentals

1. What makes a subnet "public" in AWS?

Two things must both be true: (1) the subnet's route table has a route for 0.0.0.0/0 pointing to an Internet Gateway, and (2) instances in the subnet have public IP addresses (either auto-assigned or Elastic IPs). The subnet itself has no "public" property; the route table and the IP assignment are what make it public.
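The route-table half of that definition can be checked mechanically. The sketch below is a minimal illustration, assuming route tables shaped like the `Routes` list returned by the EC2 DescribeRouteTables API; the IDs are made-up sample data, and the public-IP half of the definition is not checked here.

```python
def has_igw_default_route(route_table: dict) -> bool:
    """Return True if the table sends 0.0.0.0/0 to an Internet Gateway."""
    for route in route_table.get("Routes", []):
        if (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                and route.get("GatewayId", "").startswith("igw-")):
            return True
    return False


# Sample data only: the gateway IDs below are invented for illustration.
public_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc1234"},
]}
private_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0def5678"},
]}

print(has_igw_default_route(public_rt))   # True
print(has_igw_default_route(private_rt))  # False
```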

2. Can an instance in a private subnet initiate outbound internet connections?

Yes, via a NAT Gateway (managed) or NAT instance (self-managed) in a public subnet. The private subnet's route table sends 0.0.0.0/0 to the NAT Gateway. The NAT Gateway has an Elastic IP and translates the private IP to its public IP for outbound traffic. Inbound connections from the internet are not possible — NAT is one-way.

3. Why deploy one NAT Gateway per AZ instead of one shared NAT Gateway?

A single NAT Gateway is a single point of failure. If the AZ it is in goes down, all outbound internet traffic from other AZs fails — even if those AZs are healthy. With one NAT Gateway per AZ, each AZ's private subnets route through their own NAT Gateway. Cost is higher (you pay per NAT Gateway plus data processing), but availability matches your multi-AZ architecture.
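To make the failure mode concrete, here is a small sketch that models which healthy AZs lose outbound access when one AZ fails. The AZ names and NAT Gateway labels are hypothetical, and the model ignores everything except the route from each AZ's private subnets to a NAT Gateway.

```python
def azs_without_egress(nat_for_az: dict[str, str],
                       nat_home_az: dict[str, str],
                       failed_az: str) -> list[str]:
    """Healthy AZs whose private subnets route to a NAT Gateway in the failed AZ."""
    return sorted(
        az for az, nat in nat_for_az.items()
        if az != failed_az and nat_home_az[nat] == failed_az
    )


# Where each (hypothetical) NAT Gateway lives.
nat_home_az = {"nat-a": "us-east-1a", "nat-b": "us-east-1b", "nat-c": "us-east-1c"}

# Shared design: every AZ routes through the NAT Gateway in us-east-1a.
shared = {"us-east-1a": "nat-a", "us-east-1b": "nat-a", "us-east-1c": "nat-a"}
# Per-AZ design: each AZ routes through its own NAT Gateway.
per_az = {"us-east-1a": "nat-a", "us-east-1b": "nat-b", "us-east-1c": "nat-c"}

print(azs_without_egress(shared, nat_home_az, "us-east-1a"))  # ['us-east-1b', 'us-east-1c']
print(azs_without_egress(per_az, nat_home_az, "us-east-1a"))  # []
```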

4. What is an Internet Gateway and what does it do?

An Internet Gateway (IGW) is an HA, horizontally scaled VPC component that enables communication between your VPC and the internet. It performs NAT for instances with public IPs (translates private IP ↔ public IP). One IGW per VPC. The IGW itself is not a bottleneck — bandwidth is only limited by the instance's network interface speed.

5. What is the difference between a security group and a NACL?

Security groups are stateful (return traffic is automatically allowed), applied at the instance level (strictly, to each network interface), and allow-only (no deny rules). NACLs are stateless (you must explicitly allow traffic in both directions, including the return traffic on ephemeral ports), subnet-level, support both allow and deny rules, and are evaluated in rule number order (lowest first), with the first matching rule deciding. Security groups are the primary access control mechanism. NACLs are used for subnet-level blocking, e.g., blocking a known malicious IP range across an entire subnet.
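The NACL evaluation order is easy to get wrong under interview pressure, so a tiny model may help. This is a sketch of the first-match-wins logic only, not of how AWS implements it; the rule tuples `(number, cidr, (low_port, high_port), action)` are a simplified, made-up format, and protocol and direction are ignored.

```python
import ipaddress


def nacl_decision(rules, src_ip: str, port: int) -> str:
    """Evaluate rules in ascending rule number; the first match decides."""
    addr = ipaddress.ip_address(src_ip)
    for _number, cidr, (low, high), action in sorted(rules, key=lambda r: r[0]):
        if addr in ipaddress.ip_network(cidr) and low <= port <= high:
            return action
    return "deny"  # stands in for the implicit catch-all deny rule


# Inbound rules: block one range first, then allow HTTPS from anywhere.
rules = [
    (100, "0.0.0.0/0", (443, 443), "allow"),
    (90, "203.0.113.0/24", (0, 65535), "deny"),
]

print(nacl_decision(rules, "203.0.113.7", 443))    # deny  (rule 90 matches first)
print(nacl_decision(rules, "198.51.100.10", 443))  # allow (rule 100)
print(nacl_decision(rules, "198.51.100.10", 22))   # deny  (no rule matches)
```

If the deny rule were numbered 110 instead of 90, the allow at 100 would match first and the blocked range would get through, which is why deny rules usually take the lower numbers.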


Routing and Connectivity

6. What is VPC Peering and what are its limitations?

VPC Peering connects two VPCs (same or different account/region) with a private network connection. Traffic does not traverse the internet. Limitations: (1) no transitive routing: if VPC-A peers with VPC-B and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through VPC-B; (2) VPCs with overlapping CIDR ranges cannot be peered; (3) you must update route tables in both VPCs manually. For more than a handful of VPCs, Transit Gateway is better.
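The non-transitive rule amounts to saying that reachability is a direct-edge lookup, not a graph traversal. A minimal sketch, using hypothetical VPC names:

```python
def can_reach_via_peering(peerings: set[frozenset[str]], src: str, dst: str) -> bool:
    """Peering is non-transitive: only a direct peering connection counts."""
    return frozenset({src, dst}) in peerings


# VPC-A <-> VPC-B and VPC-B <-> VPC-C are peered; A and C are not.
peerings = {
    frozenset({"vpc-a", "vpc-b"}),
    frozenset({"vpc-b", "vpc-c"}),
}

print(can_reach_via_peering(peerings, "vpc-a", "vpc-b"))  # True
print(can_reach_via_peering(peerings, "vpc-a", "vpc-c"))  # False, no hop through vpc-b
```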

7. What is AWS Transit Gateway and when do you use it instead of VPC Peering?

Transit Gateway is a regional network hub that connects VPCs and on-premises networks. It supports transitive routing — any attached network can reach any other. Use TGW when: you have more than 3–4 VPCs that need to communicate, you need a hub-and-spoke topology, you need to connect multiple VPCs to on-premises over Direct Connect or VPN without a mesh of connections. TGW costs more per hour than peering but reduces complexity significantly at scale.
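The "complexity at scale" argument is just arithmetic: a full mesh of n VPCs needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. A quick sketch of the comparison (connection counts only, no pricing):

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed so every VPC can reach every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count: int) -> int:
    """One Transit Gateway attachment per VPC in a hub-and-spoke design."""
    return vpc_count


for n in (3, 5, 10, 20):
    print(n, full_mesh_peerings(n), tgw_attachments(n))
# 3 3 3
# 5 10 5
# 10 45 10
# 20 190 20
```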

8. What is AWS PrivateLink and how is it different from VPC Peering?

PrivateLink (Interface VPC Endpoints) exposes a specific service from one VPC to consumers in another VPC, without exposing the entire VPC. Traffic flows via an ENI in the consumer's VPC to an endpoint service backed by an NLB in the provider's VPC. The provider's VPC CIDR is never accessible; only the specific service endpoint is. Use PrivateLink when you want to share a service (not a full VPC network) with other teams or customers. AWS services (ECR, S3, SSM, etc.) also offer PrivateLink endpoints so you can access them privately without a NAT Gateway.

9. What is an AWS Gateway Endpoint (for S3 and DynamoDB)?

Gateway endpoints are a different type of VPC endpoint, only for S3 and DynamoDB. Instead of creating an ENI, they add an entry to your route table pointing to the endpoint. Traffic to S3/DynamoDB goes directly from your VPC to the service over the AWS backbone — no NAT Gateway, no internet, no per-GB data processing charge. Always use Gateway endpoints for S3 and DynamoDB — they are free and eliminate NAT costs.
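The route-table behaviour can be pictured as most-specific-match routing: the endpoint's route is narrower than 0.0.0.0/0, so it wins for S3 traffic. The sketch below is a simplified model with assumptions worth calling out: real route tables reference the service by a managed prefix list ID rather than a literal CIDR, and the 198.51.100.0/24 range and target names here are placeholders, not real S3 addresses or resource IDs.

```python
import ipaddress


def pick_target(routes, dest_ip: str):
    """Return the target of the most specific route matching dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Placeholder routes for a private subnet with an S3 gateway endpoint.
routes = [
    ("10.0.0.0/16", "local"),
    ("198.51.100.0/24", "vpce-s3-example"),  # stand-in for the S3 prefix list
    ("0.0.0.0/0", "nat-example"),
]

print(pick_target(routes, "198.51.100.20"))  # vpce-s3-example
print(pick_target(routes, "93.184.216.34"))  # nat-example
print(pick_target(routes, "10.0.1.5"))       # local
```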


EKS-Specific VPC Questions

10. What tags must subnets have for EKS to use them for load balancers?

For the AWS Load Balancer Controller to discover subnets: public subnets need tag kubernetes.io/role/elb=1, private subnets need tag kubernetes.io/role/internal-elb=1. The kubernetes.io/cluster/<cluster-name>=owned tag (or shared if multiple clusters use the subnet) was required by older controller versions (2.1.1 and earlier) and is optional in later ones, but it is still worth setting when several clusters share a VPC. Missing role tags are a common cause of ALB creation failures with the Load Balancer Controller.
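A pre-flight check for these tags is simple to script. The sketch below works on a plain dict of subnet tags (sample data, not an API response); the `cluster_name` parameter is an assumption added for illustration so the cluster tag is only checked when you ask for it.

```python
from typing import Optional


def missing_lb_tags(subnet_tags: dict[str, str], public: bool,
                    cluster_name: Optional[str] = None) -> list[str]:
    """List the load balancer discovery tags a subnet is missing."""
    missing = []
    role_key = "kubernetes.io/role/elb" if public else "kubernetes.io/role/internal-elb"
    if subnet_tags.get(role_key) != "1":
        missing.append(f"{role_key}=1")
    if cluster_name is not None:
        cluster_key = f"kubernetes.io/cluster/{cluster_name}"
        if subnet_tags.get(cluster_key) not in ("owned", "shared"):
            missing.append(f"{cluster_key}=owned|shared")
    return missing


# Sample tag sets; "demo" is a hypothetical cluster name.
tagged = {"kubernetes.io/role/elb": "1", "kubernetes.io/cluster/demo": "shared"}
untagged = {"Name": "private-app-a"}

print(missing_lb_tags(tagged, public=True, cluster_name="demo"))  # []
print(missing_lb_tags(untagged, public=False))  # ['kubernetes.io/role/internal-elb=1']
```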

11. Why do EKS nodes need VPC endpoints for ECR?

EKS nodes pull container images from ECR. Without VPC endpoints, image pulls go: node → NAT Gateway → internet → ECR (which is a public endpoint). This incurs NAT Gateway data processing charges and adds latency. With VPC endpoints for com.amazonaws.<region>.ecr.dkr, com.amazonaws.<region>.ecr.api, and com.amazonaws.<region>.s3 (for image layers), traffic stays on the AWS backbone. In a private cluster with no NAT route, the endpoints are mandatory: if one is missing, if the S3 gateway endpoint is not associated with the node subnet's route table, or if the interface endpoint's security group does not allow 443 from the nodes, image pulls fail. This is a common cause of ImagePullBackOff in EKS.
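The three service names follow one pattern, so they can be generated per region. A small sketch, assuming the standard commercial-region naming shown above (other partitions may use a different prefix); the endpoint types reflect the common setup of interface endpoints for ECR and a gateway endpoint for S3.

```python
def ecr_pull_endpoints(region: str) -> dict[str, str]:
    """Map each endpoint service name needed for private ECR pulls to its type."""
    return {
        f"com.amazonaws.{region}.ecr.api": "Interface",
        f"com.amazonaws.{region}.ecr.dkr": "Interface",
        f"com.amazonaws.{region}.s3": "Gateway",
    }


for service, kind in ecr_pull_endpoints("us-east-1").items():
    print(f"{kind:<9} {service}")
```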

12. An EKS pod cannot reach the internet. Walk through your diagnosis.

(1) Check the node's subnet — is its route table routing 0.0.0.0/0 to a NAT Gateway? (2) Check the NAT Gateway — is it in a public subnet with an Elastic IP? Is its status available? (3) Check the security group on the node — does it allow outbound traffic? (4) Check NetworkPolicy — is there a default-deny-egress policy blocking the pod? (5) Check if the destination is an AWS service — if so, use a VPC endpoint instead of internet routing. (6) Check if the NAT Gateway is in the same AZ as the node (recommended — cross-AZ NAT adds cost and latency).
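The first four checks above form an ordered checklist where the first failure is the most likely culprit. One way to sketch that ordering is shown below; the snapshot keys are entirely hypothetical (you would populate them from the AWS and Kubernetes APIs yourself), and steps (5) and (6) are left out because they are optimizations rather than pass/fail checks.

```python
# Each check: (name, predicate over a snapshot dict, hint). All keys are hypothetical.
CHECKS = [
    ("route table",
     lambda s: s["default_route_target"].startswith("nat-"),
     "node subnet has no 0.0.0.0/0 route to a NAT Gateway"),
    ("nat gateway",
     lambda s: s["nat_state"] == "available" and s["nat_has_eip"],
     "NAT Gateway is not available or has no Elastic IP"),
    ("security group",
     lambda s: s["sg_allows_egress"],
     "node security group blocks outbound traffic"),
    ("network policy",
     lambda s: not s["default_deny_egress"],
     "a default-deny-egress NetworkPolicy applies to the pod"),
]


def first_failure(snapshot: dict):
    """Run the checks in order and return (name, hint) for the first one that fails."""
    for name, ok, hint in CHECKS:
        if not ok(snapshot):
            return name, hint
    return None


healthy = {
    "default_route_target": "nat-0def5678",
    "nat_state": "available",
    "nat_has_eip": True,
    "sg_allows_egress": True,
    "default_deny_egress": False,
}
blocked_by_policy = {**healthy, "default_deny_egress": True}

print(first_failure(healthy))               # None
print(first_failure(blocked_by_policy)[0])  # network policy
```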


Design Questions

13. Design the VPC architecture for a production three-tier web application.

Three AZs. Subnets per layer per AZ (9 subnets total): public (ALB), private-app (EKS/EC2), private-data (RDS, ElastiCache). Route tables: public subnets → IGW for 0.0.0.0/0; private-app subnets → NAT Gateway in the same AZ; private-data subnets → no internet route (truly isolated). VPC endpoints: S3 (Gateway), ECR, STS, CloudWatch Logs (Interface). Security groups: ALB SG allows 443 from 0.0.0.0/0; app SG allows 8080 from ALB SG only; data SG allows 5432 from app SG only. Nothing in the public tier can reach the data tier directly; the security group chain enforces this.
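The 3 x 3 subnet layout can be carved out of a VPC CIDR with Python's standard `ipaddress` module. This is a sketch under assumptions the answer does not specify: a 10.0.0.0/16 VPC, /20 subnets, and us-east-1 AZ names are all illustrative choices.

```python
import ipaddress
import itertools


def plan_subnets(vpc_cidr: str, azs: list[str], tiers: list[str],
                 new_prefix: int = 20) -> dict[tuple[str, str], str]:
    """Assign one subnet CIDR to every (tier, AZ) pair, in order."""
    keys = list(itertools.product(tiers, azs))
    candidates = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    blocks = list(itertools.islice(candidates, len(keys)))
    if len(blocks) < len(keys):
        raise ValueError("VPC CIDR is too small for the requested layout")
    return dict(zip(keys, map(str, blocks)))


plan = plan_subnets(
    "10.0.0.0/16",
    azs=["us-east-1a", "us-east-1b", "us-east-1c"],
    tiers=["public", "private-app", "private-data"],
)

print(len(plan))                             # 9
print(plan[("public", "us-east-1a")])        # 10.0.0.0/20
print(plan[("private-data", "us-east-1c")])  # 10.0.128.0/20
```

With /20 subnets a /16 VPC still has seven blocks unallocated, which leaves room for a later tier or a fourth AZ.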

14. What breaks if you accidentally set overlapping CIDRs when creating a new VPC?

A VPC's primary CIDR block cannot be changed after creation; you can only add secondary CIDR blocks, so fixing an overlap usually means rebuilding the VPC. If two VPCs you want to peer have overlapping CIDRs (e.g., both use 10.0.0.0/16), peering is impossible, and routing between them through a Transit Gateway is ambiguous as well. Planning: use non-overlapping CIDRs from the start, for example a /8 block divided into /16s per environment (/16 per region per account is a common allocation). Also check that your VPC CIDR does not overlap with on-premises networks if you plan a Direct Connect or VPN connection.
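Both the overlap check and the /8-into-/16 allocation can be done with the standard `ipaddress` module before any VPC is created. A minimal sketch; the network names and CIDRs are made-up examples.

```python
import ipaddress
import itertools


def overlapping_pairs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in itertools.combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical address plan with two mistakes in it.
cidrs = {
    "prod": "10.0.0.0/16",
    "staging": "10.0.0.0/16",     # same block as prod
    "dev": "10.2.0.0/16",
    "on-prem": "10.0.128.0/17",   # falls inside prod and staging
}
print(overlapping_pairs(cidrs))
# [('on-prem', 'prod'), ('on-prem', 'staging'), ('prod', 'staging')]

# Carving non-overlapping /16 blocks out of a /8 from the start.
first_three = itertools.islice(ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=16), 3)
print([str(block) for block in first_three])
# ['10.0.0.0/16', '10.1.0.0/16', '10.2.0.0/16']
```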


All 25 questions from this topic area are covered in Hone's Day 15 lesson — including the full AWS architecture framework that connects VPC, IAM, EKS, RDS, and CloudWatch into a single mental model.
