Navigating Cloud Network Complexity
Jan 02, 2024
Cloud networking differs from traditional networking and brings its own complexity. Typical challenges include applying network fundamentals to cloud environments, understanding cloud-native network services, and avoiding issues already known from traditional network environments (such as accidentally exposing unauthenticated services).
Before we dive into some more examples, we want to emphasize that a strong understanding of networking concepts and principles is still required to design effective cloud environments. It never hurts to get the basics right.
A common problem from on-premises networking is still relevant in cloud environments: determining whether a network path is available between two given assets. While AWS offers a built-in Reachability Analyzer, it only determines whether there is a direct path between two assets, e.g. an internet gateway and a certain instance.
However, it does not perform a full analysis of all potential network paths. For example, it will not show you a load balancer forwarding packets from the internet to a target system, which may be relevant if your leading question (perhaps from a security perspective) was whether a system is internet-exposed in any way.
Performing this analysis manually requires you to review a variety of networking functions and their relationships to each other while juggling many IDs or, if your cloud maturity allows it, at least consistent names. If we add modern overlay networking or service mesh solutions like Calico or Istio to the mix, a platform-based analysis becomes even harder, and we need to juggle even more dashboards for manual analysis.
Basic network elements for typical connectivity comprise VPCs, subnets, route tables, NAT/internet gateways, network ACLs, and security groups. All of those are managed as part of the VPC service. However, to comprehensively determine whether a system might be exposed to a certain source, you also have to look at advanced network elements such as transit gateways, firewalls, peering, endpoints, load balancers, API gateways, or even application-level constructs like data streams or message queues.
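To make one of these building blocks concrete, here is a minimal sketch of how a single security group's ingress rules could be checked against a source address and port. The `IngressRule` class, the `allows` function, and the sample rules are hypothetical names and data chosen for illustration. The sketch deliberately ignores protocols, network ACLs, routing, and every other element listed above, so a `True` result here says nothing about end-to-end reachability.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class IngressRule:
    """Simplified, hypothetical model of one security group ingress rule."""
    cidr: str       # allowed source range, e.g. "0.0.0.0/0"
    from_port: int  # first allowed port (inclusive)
    to_port: int    # last allowed port (inclusive)


def allows(rules, source_ip, port):
    """Return True if any rule permits traffic from source_ip to port."""
    src = ip_address(source_ip)
    return any(
        src in ip_network(rule.cidr) and rule.from_port <= port <= rule.to_port
        for rule in rules
    )


# Assumed example: HTTPS open to everyone, SSH only from an internal range.
web_rules = [
    IngressRule(cidr="0.0.0.0/0", from_port=443, to_port=443),
    IngressRule(cidr="10.0.0.0/16", from_port=22, to_port=22),
]
```

With these assumed rules, `allows(web_rules, "203.0.113.10", 443)` is true, while the same source on port 22 is rejected. A real exposure assessment would have to combine checks like this across all of the elements mentioned above.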
Modeling all of those elements and performing path analysis is a (directed) graph problem that grows in complexity with the size of your cloud environment. There are a few great open tools available (such as CloudMapper and Cartography) that may support your effort by visualizing the existing infrastructure in your account. You will also need to differentiate between the graph of cloud elements (which would show a security group associated with an instance) and a network graph (which may be used to determine whether two instances can reach each other based on their associated security groups).
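As a rough illustration of path analysis on a directed graph, the following sketch runs a breadth-first search over a small, hand-written network graph. The node names, the `network_graph` dictionary, and the `find_path` function are assumptions made for this example. In practice, the edges would have to be derived from the graph of cloud elements (routes, security groups, load balancer targets, and so on), which is the hard part and is not shown here.

```python
from collections import deque

# Hypothetical network graph: an edge A -> B means "A can send traffic to B".
network_graph = {
    "internet": ["internet-gateway"],
    "internet-gateway": ["load-balancer"],
    "load-balancer": ["app-instance"],
    "app-instance": ["db-instance"],
    "batch-instance": ["db-instance"],
    "db-instance": [],
}


def find_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Is the application instance exposed to the internet via any path?
exposure_path = find_path(network_graph, "internet", "app-instance")
```

In this assumed graph, the search finds the indirect path through the load balancer, which a check limited to direct paths would miss, and it returns `None` for `batch-instance` because no edge leads to it. Because the graph is directed, reachability in one direction does not imply reachability in the other.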
With the awareness of network complexity you gained from this post, we can summarize the relevant areas for your cloud network (security) management:
- Ensure that your teams have both a solid understanding of network fundamentals and cloud networking skills, and that you support their continuous learning in both areas!
- Understand graphs and why they are important for cloud environments – or even any reasonably complex environment.
- Evaluate whether your existing tool stacks cover all of your cloud assets and whether they can sufficiently support your analysis needs.