
"Running a global observability platform means one thing above all: your infrastructure must never go down. When you're responsible for monitoring thousands of customers' applications 24/7, network failures aren't just inconvenient, they're existential threats. At New Relic, hundreds of clusters run on multiple clouds and regions. These clusters depend on a complex web of network connections: regional transit gateways, inter-regional hubs, and cross-cloud links."
"While we have resiliency to avoid single points of failure, if too many connections fail, our ability to provide real-time observability is compromised. The challenge? We needed to know instantly if connectivity fails at any layer: within availability zones, between regions, or across cloud providers. So we did what we do best: we built a solution using New Relic's own platform to monitor our entire network."
"In our large, complex environment, the challenge wasn't detecting complete network outages. Those were obvious. The challenge was answering specific diagnostic questions quickly: Can clusters in the same region communicate through the regional transit gateway? Is the hub gateway properly routing traffic between regions? Is the cross-cloud connection between clouds operational? Are we experiencing packet loss during peak traffic hours?"
New Relic operates hundreds of clusters across multiple clouds and regions that depend on regional transit gateways, inter-regional hubs, and cross-cloud links. Too many failed connections compromise real-time observability. Weather Station performs continuous validation of critical network paths, executing over 100,000 connectivity checks per hour across the multi-cloud infrastructure. Weather Station detects failures within availability zones, between regions, and across cloud providers, and identifies packet loss and routing issues. Continuous automated checks eliminate excessive manual SSH, ping, traceroute, and route-table inspection, and provide instant diagnostic answers for network connectivity and routing at scale.
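To make the idea of a repeated connectivity check concrete, here is a minimal Python sketch. It does not describe how Weather Station is actually built, which the excerpt does not cover. It assumes a plain TCP connect is an acceptable probe, and the names `tcp_probe`, `check_path`, and `ProbeResult`, as well as the scope labels, are hypothetical. Each path is probed several times so that partial loss shows up as a failure rate rather than a single pass/fail bit.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    scope: str                       # hypothetical label, e.g. "zone", "region", "cross-cloud"
    target: str                      # "host:port" that was probed
    attempts: int                    # number of connect attempts made
    failures: int                    # attempts that timed out or were refused
    avg_latency_ms: Optional[float]  # mean connect time of successful attempts

    @property
    def loss_rate(self) -> float:
        """Fraction of attempts that failed (0.0 when no attempts were made)."""
        return self.failures / self.attempts if self.attempts else 0.0


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Optional[float]:
    """Return the TCP connect latency in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def check_path(scope: str, host: str, port: int,
               attempts: int = 5, timeout: float = 1.0) -> ProbeResult:
    """Probe one network path several times and summarize the outcome."""
    latencies = []
    failures = 0
    for _ in range(attempts):
        latency = tcp_probe(host, port, timeout)
        if latency is None:
            failures += 1
        else:
            latencies.append(latency)
    avg = sum(latencies) / len(latencies) if latencies else None
    return ProbeResult(scope, f"{host}:{port}", attempts, failures, avg)
```

A scheduler could call `check_path` for one representative endpoint per layer (same zone, another region, another cloud) and report each `ProbeResult` as a metric, so that an alert on `loss_rate` points at the layer that is failing.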
Read at New Relic