Sometimes your API call takes a few seconds longer than expected. Or users start reporting slow page loads. One of the most common reasons? Network latency.
What is network latency?
Network latency is the time it takes for data to travel from one point in a network to another. In practice, it’s the delay between sending a request and receiving the first byte of the response. It doesn’t include the time it takes to download the full response—just the initial wait.
In this guide, we’ll cover:
- What causes latency in distributed systems
- How to measure and monitor latency in practice
- A few ways to reduce latency and improve responsiveness
Network Latency vs. Bandwidth vs. Throughput
Network latency is often lumped in with other performance metrics, but they measure different things.
- Latency is the delay between sending a request and receiving the first byte of the response. It’s measured in milliseconds and represents how long it takes for data to make a round trip between two points.
- Bandwidth is the maximum amount of data that can be transmitted over a network in a given time. Think of it as the capacity of your connection, measured in Mbps or Gbps.
- Throughput is the actual amount of data successfully transferred over the network in real time. It’s affected by both latency and bandwidth, as well as packet loss, jitter, and other network conditions.
A simple way to understand the difference:
You're downloading a 100MB file:
- Latency is the delay before the download even begins.
- Bandwidth controls how fast the file transfers once it starts.
- Throughput reflects what you get, factoring in real-world inefficiencies.
You can have high bandwidth and still experience slow performance if latency is high or if there’s network congestion reducing throughput.
Types of Network Latency Developers Run Into
Not all latency is the same. Different layers in the network stack introduce their own delays, and understanding each one helps you pinpoint where things slow down.
Propagation Latency
This is about distance. It’s the time it takes for a signal to physically travel from source to destination.
- Light in fiber travels at ~200,000 km/s.
- So a round trip from New York to London takes at least 56ms, even under ideal conditions.
You can’t eliminate this kind of latency — it’s limited by the speed of light. But you can reduce its impact by:
- Hosting services closer to users
- Using CDNs or edge locations
Transmission Latency
This is the time required to push bits onto the wire. It depends on:
- Packet size
- Link speed
For example:
- A 1500-byte packet on a 10 Mbps link = ~1.2ms
- Same packet on a 1 Gbps link = ~0.012ms
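If you want to sanity-check those numbers yourself, the arithmetic is just packet size divided by link rate. A quick Python snippet:

```python
# Transmission delay = packet size (bits) / link rate (bits per second)
packet_bits = 1500 * 8  # a 1500-byte packet

for rate_bps, label in [(10e6, "10 Mbps"), (1e9, "1 Gbps")]:
    delay_ms = packet_bits / rate_bps * 1000
    print(f"{label}: {delay_ms:.3f} ms")  # ~1.200 ms and ~0.012 ms
```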
Switching between mediums (fiber, copper, wireless) can also add small delays, which stack up in longer paths.
Processing Latency
Each device along the route — routers, switches, servers — has to inspect and act on the data.
- Routing table lookups
- Firewall rules
- App-level logic
The more hops, the more processing delay, especially if the devices along the way are underpowered or misconfigured.
Queuing Latency
If a router or switch gets overwhelmed, packets wait in line.
This is one of the biggest contributors to sudden latency spikes during:
- Peak traffic
- Congested links
- DDoS or large data transfers
Storage Delays
Some network devices temporarily store packets before forwarding them — switches, bridges, load balancers, etc. These delays are usually small, but over multiple hops, they can add up.
What Causes High Network Latency?
A fast app can feel slow when latency creeps in. Here are some of the most common reasons why:
- Geographic Distance: The farther the data has to travel, the longer it takes. A request from San Francisco to Tokyo will always have more latency than one to a server in the same region.
- Network Congestion: Too much traffic on a shared path leads to delays. Just like cars backing up on a highway, packets start queuing when the route gets overloaded.
- Router Hops: Every router or switch along the way adds a bit of processing time. The more hops in your network path, the more latency adds up.
- DNS Resolution: Before a connection even starts, DNS has to map a domain name to an IP address. Slow DNS lookups can delay every new request.
- Protocol Overhead: Protocols like TCP add extra steps, for example, the three-way handshake. It’s reliable, but slower than UDP, which skips those setup steps.
- Server Response Time: Not all latency is network-related. If your server takes too long to respond — due to slow database queries, heavy computation, or blocking I/O — it shows up as latency on the client side.
How to Measure Network Latency
You can’t optimize what you can’t see. Measuring network latency helps you understand where delays are happening, whether in the network path, at the edge, or deep inside your app stack.
Here are a few reliable ways to measure it:
1. Ping
One of the simplest tools, ping sends ICMP echo requests to a target host and measures the round-trip time (RTT):
ping google.com
What it tells you:
- The average time (in milliseconds) it takes for a packet to travel to the destination and back.
- Packet loss (if any)
Keep in mind:
- Ping uses ICMP, which some servers deprioritize or block entirely.
- It doesn’t reflect the behavior of higher-level protocols like HTTP or gRPC.
Still, it’s a quick way to check basic connectivity and baseline latency.
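If ICMP is blocked or deprioritized, one workaround is to time a TCP handshake yourself. Here’s a minimal Python sketch; it measures connect time only, not a full HTTP request:

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Time how long a TCP handshake to host:port takes, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.perf_counter() - start) * 1000

print(f"{tcp_connect_ms('google.com'):.1f} ms")
```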
2. Traceroute
While ping gives you an overall RTT, traceroute shows each step (hop) a packet takes to reach its destination:
traceroute google.com # macOS/Linux
tracert google.com # Windows
What it tells you:
- The number of hops between you and the target
- The latency to each hop along the route
Why it matters:
- If latency spikes after a certain hop, the issue is likely at that point or beyond.
- Helps identify whether the problem is within your network, your ISP, or further upstream.
3. MTR (My Traceroute)
MTR is like ping and traceroute combined, with continuous measurement and live updates:
mtr google.com # macOS/Linux (install may be required)
winmtr # Windows GUI version
What makes it useful:
- Real-time view of packet loss and latency per hop
- Aggregates data over time, so you can catch intermittent issues
- Easier to spot patterns during network instability
If you’re diagnosing flaky network behavior or performance drops that only happen under load, MTR is a great tool.
Key Latency Metrics to Track
When things slow down, the hard part isn’t knowing that something’s wrong; it’s figuring out where. Is it the network? The server? The app itself?
That’s why it helps to break latency into specific, measurable chunks. Each metric tells a different part of the story.
Time to First Byte (TTFB)
TTFB measures the time from when a request is made to when the first byte of the response arrives. That includes:
- DNS resolution
- TCP and TLS setup
- Server processing
- Network transit time (up to the first byte)
It’s a good general indicator of backend and network health. A high TTFB might mean your server is overloaded, your database queries are slow, or there’s simply too much distance between the client and server.
TTFB is especially useful for web apps and APIs because it's easy to measure across different browsers and monitoring tools.
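If you want a rough TTFB number from code rather than the browser, here’s a small Python sketch. It assumes the requests library is available and simply times how long it takes for the first body byte to arrive:

```python
import time
import requests  # assumes the requests package is installed

def measure_ttfb_ms(url: str) -> float:
    """Rough TTFB: time from sending the request until the first
    byte of the response body is available (includes DNS, TCP/TLS
    setup, and server processing)."""
    start = time.perf_counter()
    # stream=True returns once headers arrive, without downloading the body
    with requests.get(url, stream=True, timeout=10) as resp:
        next(resp.iter_content(chunk_size=1))  # pull the first body byte
        return (time.perf_counter() - start) * 1000

print(f"TTFB: {measure_ttfb_ms('https://example.com'):.1f} ms")
```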
Round Trip Time (RTT)
RTT is the total time it takes for a request to go from the client to the server and back. Tools like ping or mtr give you a quick view of this.
Keep in mind: RTT includes both network latency and, depending on what you’re testing, a bit of server processing time. It’s more of a raw network measurement, while TTFB gives you a more application-aware view.
Stick to One Metric
If you’re tracking performance improvements, don’t mix metrics. If you start with TTFB as your baseline, keep using TTFB throughout your testing. Jumping between RTT, TTFB, and other numbers can get confusing fast, and it’s easy to misinterpret the results.
What is Application-Level Latency?
For web applications and HTTP APIs, it's useful to break latency into smaller steps. Modern browsers and monitoring tools make this easier than ever.
Here’s what you should be looking at:
- DNS Lookup Time: The time it takes to resolve a domain name (like api.example.com) to an IP address. If you’re relying heavily on third-party APIs or CDNs, this can add unexpected delays.
- TCP Connection Time: How long it takes to establish the TCP connection between the client and server. If this is slow, it could be due to network congestion or problems at the edge.
- TLS Handshake Time: For HTTPS connections, this includes the time needed to establish a secure session. Usually small, but on slower devices or poor networks, it can become noticeable.
- TTFB (again): Yep, it shows up here too, because it includes the time spent on the server generating a response. If your backend is under load or running inefficient queries, this will spike.
- Content Download Time: This is the time it takes to download the full response once it starts arriving. For large payloads or slow connections, this can add up quickly.
You can view all these breakdowns in your browser’s Network tab — open DevTools, make a request, and you’ll see exactly where time is being spent. For API calls, tools like Postman or curl with -w flags can give you similar timing data.
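If Python is your tool of choice, pycurl exposes the same per-phase timings that curl reports. A quick sketch, assuming pycurl is installed:

```python
import pycurl  # assumes the pycurl package is installed
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com")
c.setopt(pycurl.WRITEDATA, buf)
c.perform()

# Cumulative timings (in seconds) for each phase of the request
print("DNS lookup:   ", c.getinfo(pycurl.NAMELOOKUP_TIME))
print("TCP connect:  ", c.getinfo(pycurl.CONNECT_TIME))
print("TLS handshake:", c.getinfo(pycurl.APPCONNECT_TIME))
print("TTFB:         ", c.getinfo(pycurl.STARTTRANSFER_TIME))
print("Total:        ", c.getinfo(pycurl.TOTAL_TIME))
c.close()
```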
How to Optimize Network Latency
Once you’ve measured where latency is creeping in, it’s time to do something about it. Some optimizations are low-hanging fruit; others take a bit more planning. Here are a few approaches that consistently make a difference:
Use a CDN
Content Delivery Networks (CDNs) help by bringing your static assets physically closer to your users.
Instead of every request traveling across continents to hit your origin server, a CDN serves cached content from an edge node nearby, which could mean shaving off hundreds of milliseconds.
Common use cases:
- Static files (images, CSS, JavaScript)
- Public API responses (if cacheable)
- Video or large media delivery
Most CDN providers also compress files and optimize delivery under the hood, giving you even more speed for free.
Optimize Database Performance
Even if your network is fast, a slow query can drag the whole experience down.
Some proven fixes:
- Add indexes for frequently queried fields
- Rewrite inefficient joins or nested queries
- Use connection pooling to reduce database handshake overhead
- For global apps, consider read replicas in multiple regions so reads don’t have to cross oceans
Database latency often hides behind “network latency” symptoms — a request might take 500ms, but only 20ms of that is network travel. The rest is your DB working too hard.
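Connection pooling in particular is usually a one-line config change in whatever client you already use. Here’s a sketch with SQLAlchemy; the connection string and pool sizes are placeholders, not recommendations:

```python
from sqlalchemy import create_engine, text  # assumes SQLAlchemy is installed

# A pooled engine keeps a handful of connections open and reuses them,
# so each query skips the TCP + auth handshake with the database.
engine = create_engine(
    "postgresql://user:pass@db.example.com/app",  # hypothetical connection string
    pool_size=10,        # connections kept open
    max_overflow=5,      # extra connections allowed under burst load
    pool_pre_ping=True,  # drop dead connections before handing them out
)

with engine.connect() as conn:
    rows = conn.execute(text("SELECT id, email FROM users WHERE id = :id"), {"id": 1})
```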
Cache More Aggressively
Caching is one of the most effective (and underused) ways to reduce latency, and it works at multiple levels:
- Browser caching: Let clients store static content locally, so they don’t hit your server every time.
- Application-level caching: Cache the results of expensive computations or DB queries. Even a 60-second TTL can take pressure off.
- Reverse proxy caching: Use tools like Nginx or Varnish to serve repeat responses directly, skipping the app altogether.
Small changes here can have a huge impact, especially on read-heavy or high-traffic services.
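As a concrete example of application-level caching, here’s a minimal in-process TTL cache in Python. It’s a sketch, not a production cache (no size limit, no locking), and the function it wraps is hypothetical:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds: float = 60):
    """Cache a function's results in memory for a short TTL."""
    def decorator(fn):
        store = {}  # key -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            entry = store.get(args)
            if entry and entry[0] > now:
                return entry[1]  # cache hit: skip the expensive call
            value = fn(*args)
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=60)
def get_user_profile(user_id: int):
    # stands in for a slow DB query or downstream API call
    time.sleep(0.2)
    return {"id": user_id, "name": "example"}
```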
Upgrade Your Connections
Modern protocols are built for speed. If you’re still on HTTP/1.1 by default, it might be time for an upgrade.
- HTTP/2 allows multiplexing — multiple requests over a single connection
- HTTP/3 uses UDP under the hood (via QUIC), which can avoid TCP’s handshake overhead on lossy networks
- Connection keep-alive avoids creating a new TCP connection for every request
- Connection pooling in your backend code lets you reuse established connections when talking to databases, APIs, or other services
Each of these changes chips away at those milliseconds of handshake and setup time.
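For HTTP clients, connection reuse often just means creating one session and keeping it around. A small sketch with Python’s requests library (the API host is a made-up example):

```python
import requests  # assumes the requests package is installed

# A shared Session reuses TCP (and TLS) connections across requests,
# so only the first call pays the connection-setup latency.
session = requests.Session()

def fetch(path: str):
    return session.get(f"https://api.example.com{path}", timeout=5)

# Both calls can ride the same pooled connection when the server allows keep-alive.
fetch("/users/1")
fetch("/users/2")
```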
Go Global
If your users are global, your servers should be too. Having everything sit in one region, even a fast one, means long-distance trips for users elsewhere.
Setting up multi-region deployments or using edge computing for critical services can cut down latency dramatically. Even replicating just your read-heavy services to a closer location helps.
Troubleshooting High Latency Issues
Latency can creep in from multiple layers — your app, the network, the client’s location, or even something as basic as DNS resolution. When things start feeling slow, resist the urge to guess. Instead, take a methodical path to identify what’s going wrong.
Here’s how to approach it:
1. Start with the Basics
Before diving into logs or dashboards, fire up your terminal.
Run a simple ping:
ping google.com
This checks whether the server is reachable and how long the round trip takes. If latency here is high, that’s a red flag right away.
Then try traceroute (or tracert on Windows):
traceroute google.com
This shows every hop your packet takes. If one hop suddenly jumps from 20ms to 300ms, that’s probably where the slowdown is happening.
These tools give you a rough map of the journey your request is taking — and where it’s stalling.
2. Check Server Health
Next, make sure the server you're connecting to isn’t struggling on its own.
- Look at CPU and memory usage: High usage can cause slow response times.
- Check disk I/O: A busy or failing disk can stall processes and inflate response latency.
- Review active connections and process queues: Are requests backing up?
A server that’s under pressure won’t respond quickly, and from the client side, that looks exactly like a “network” issue.
3. Monitor Your App Internals
Sometimes the network gets blamed for what’s an app-level bottleneck. Dig into your telemetry data to find:
- Slow DB queries: Long-running SQL or missing indexes can add hundreds of milliseconds.
- Third-party API calls: External services can cause delays you don’t control.
- Heavy processing: Large payloads, data transformations, or blocking code can all add latency.
Observability tools like Last9 (or even simple logs and timers) can help you pinpoint what part of your request pipeline is slow.
4. Watch Traffic Patterns
Is latency only high at specific times of day? Does it spike during product launches or traffic surges?
- Look for a correlation between latency and traffic volume
- Check autoscaling policies: Sometimes instances don’t scale fast enough
- Monitor rate limits on APIs or databases
A flood of traffic can overwhelm services, slow things down, and even cause packet queues to build up in routers or load balancers.
5. Test from Multiple Locations
Just because things are slow for you doesn't mean they’re slow for everyone. Try:
- Pingdom or WebPageTest to test from different cities
- curl with --resolve to pin a hostname to a specific IP and bypass DNS
- Checking Cloudflare or CDN analytics if you’re using one
If latency is only high from one region (e.g., Asia to US-East), it’s likely geographic distance, not a broken app.
Common Latency Scenarios (and What to Do)
Here’s a breakdown of issues you can commonly run into:
DNS Latency
- What it looks like: Slow initial page loads, then everything feels faster after that
- Why it happens: Slow DNS resolvers or long TTLs
- What helps: Use faster DNS providers like Cloudflare (1.1.1.1) or Google (8.8.8.8). Cache DNS where possible.
Geographic Latency
- What it looks like: Consistently slow access for users in one region
- Why it happens: Your servers are too far from those users
- What helps: Deploy regional servers or use a CDN to serve static content closer to users
Network Congestion
- What it looks like: Latency spikes randomly, especially during busy hours
- Why it happens: Shared infrastructure (like your ISP or cloud provider) is overloaded
- What helps: Use CDNs, optimize bandwidth usage, or talk to your provider about traffic shaping
Server Response Delays
- What it looks like: High Time to First Byte (TTFB), no matter where users are connecting from
- Why it happens: Slow backend code, inefficient queries, or CPU-bound workloads
- What helps: Profile your code, optimize queries, offload background tasks, and scale out under heavy load
How to Monitor Network Latency in Production
Once your app is in production, latency isn’t just some number on a dashboard. It shows up as a spinning loader, a stalled checkout, or a user bouncing before your page even loads. By the time someone complains, the damage is already done. That's why keeping an eye on latency, all the time, matters.
Set alerts, but make them meaningful
It’s tempting to set an alert every time latency spikes. But that just leads to alert fatigue. Instead:
- Pick thresholds that impact your users (e.g., >300ms TTFB for critical endpoints).
- Separate alerts for internal services vs. public APIs — they may have different tolerances.
Percentiles > Averages
Averages lie. A bulk of fast requests can make the numbers look healthy even when a meaningful chunk of users is seeing 2-second delays.
- Focus on the 95th or 99th percentile latency — that shows how the slower requests behave.
- This helps catch tail latencies that hit users on flaky networks, mobile devices, or overloaded servers.
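Here’s a quick Python illustration of why percentiles matter; the latency samples are made up:

```python
import statistics

# Made-up latency samples (in milliseconds) for one endpoint
latencies_ms = [42, 45, 47, 50, 51, 55, 60, 300, 1800, 2100]

avg = statistics.mean(latencies_ms)
cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p95, p99 = cuts[94], cuts[98]

print(f"avg={avg:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
# The average understates the 2-second tail that p95/p99 expose.
```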
Monitor from the user’s side
You’ll get very different numbers measuring latency from inside your data center vs. from a user’s browser or phone.
- Real User Monitoring (RUM) captures actual performance as users experience it, across geos and devices.
- Synthetic Monitoring (like running scheduled checks from test nodes) gives a controlled, predictable baseline.
Use both if you can. Synthetic helps you track infrastructure health. RUM tells you if that health translates to fast pages.
Track against latency budgets
If you’ve broken your app into services, give each one a latency budget: how much time it’s allowed to take per request. For example:
- Frontend → 100ms max for fetching metadata
- Auth service → 50ms max to respond
- DB read → no more than 200ms, even under load
When a service consistently blows through its budget, treat it like a warning sign. You may need to optimize it or re-architect the dependency tree to reduce its impact.
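Budgets only help if something checks them. A tiny sketch of what that check might look like; the service names and numbers are illustrative:

```python
# Per-service latency budgets in milliseconds
LATENCY_BUDGETS_MS = {"frontend_metadata": 100, "auth": 50, "db_read": 200}

# Observed p95 latencies, e.g. pulled from your monitoring system
observed_p95_ms = {"frontend_metadata": 85, "auth": 120, "db_read": 150}

for service, budget in LATENCY_BUDGETS_MS.items():
    seen = observed_p95_ms[service]
    status = "OK" if seen <= budget else "OVER BUDGET"
    print(f"{service}: p95={seen}ms budget={budget}ms -> {status}")
```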
What Counts as “Good” or “Bad” Latency?
Latency isn’t one-size-fits-all. What’s acceptable depends on what your app does and what your users expect. Here's how it typically breaks down:
Web Apps
- <100ms: Feels instant. Ideal for clicks, taps, or fast UI interactions.
- 100–300ms: Still fine for most things — users likely won’t notice.
- >500ms: Starts to feel laggy, especially for things like form submissions or search.
- For anything interactive, like search-as-you-type or payment flows, aim for under 200ms end-to-end.
Real-Time Apps (video, gaming, trading)
- These are in a different league.
- <50ms: Smooth, real-time experience.
- 150ms+: Users will feel it — video gets choppy, controls feel delayed, and trades might fail.
APIs and Backend Services
- Internal microservice calls: Keep it tight — 10–50ms is a solid range for things that talk over a local network.
- External APIs: These can afford a little more slack, especially if they’re outside your control. 200–500ms is generally acceptable, but treat anything above that as technical debt.
Mobile Apps
- Mobile users are stuck with 4G, spotty Wi-Fi, and shared networks, so latency expectations are different.
- Design with 200–800ms in mind, and use caching aggressively to hide delays wherever you can.
Also worth noting: latency isn’t the same everywhere. A user in Tokyo and another in São Paulo might have wildly different experiences, even if they’re using the same app. That’s why latency optimization, especially for global products, can’t just be a one-region job.
Network Latency Best Practices for Developers
Latency isn’t just a backend concern; it leaks into every layer of your stack.
Here’s what matters when you’re building systems that operate over a network.
1. Design for latency, not just for correctness
It’s one thing to make your API call “work.” It’s another to make it fast across flaky networks, mobile connections, or when your user is halfway across the globe. Assume some parts of the network will be slow or unreliable, and design accordingly.
2. Minimize round trips
The fewer network calls you make, the faster things feel. This doesn’t mean batching everything into a bloated API response — it means being deliberate:
- Combine dependent calls where you can
- Avoid “chatty” APIs that require multiple sequential requests
- If your frontend needs five things to render a view, think about an aggregate endpoint or edge cache
3. Set timeouts. Always.
Letting a request hang indefinitely is a rookie mistake. Timeouts are your safety net.
But also: don’t just slap a retry on every failure. Be specific (there’s a short sketch after this list):
- Set timeouts based on the expected response time of that service
- Retry with exponential backoff and jitter
- Fail fast if the downstream service is known to be slow or unhealthy
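Putting those three together, here’s a small Python sketch using the requests library; the timeout and backoff values are illustrative, not recommendations:

```python
import random
import time
import requests  # assumes the requests package is installed

def fetch_with_retries(url: str, attempts: int = 3, timeout_s: float = 2.0):
    """Call a downstream service with a per-request timeout and
    exponential backoff plus jitter between retries."""
    for attempt in range(attempts):
        try:
            return requests.get(url, timeout=timeout_s)
        except (requests.Timeout, requests.ConnectionError):
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            # exponential backoff (0.2s, 0.4s, 0.8s, ...) plus random jitter
            time.sleep((0.2 * 2 ** attempt) + random.uniform(0, 0.1))
```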
4. Test under bad network conditions
Latency bugs hide under good Wi-Fi. Use throttling tools to simulate:
- High latency (like mobile networks or cross-continent requests)
- Low bandwidth
- Packet loss
It’s the only way to know how your app behaves when things aren’t ideal, which is often.
5. Cache with a purpose
Caching can mask latency or amplify bugs. Use it well:
- Client-side caching for static assets and data that doesn’t change often
- Reverse proxies (Nginx, Varnish) to reduce pressure on origin servers
- Edge caching via CDNs for globally distributed apps
But make cache invalidation part of your plan, not an afterthought.
Monitoring Latency with Last9
Once you’ve got a handle on where latency comes from, the next step is knowing when it’s hurting your app — and catching that before users feel it.
Last9 helps you do just that. It works well with Prometheus and OpenTelemetry, so you don’t need to rethink your stack. You get a clean view of latency across services, endpoints, and geographies — even when you’re dealing with messy, high-cardinality data.
Teams like Probo, CleverTap, and Replit rely on Last9 for observability that doesn’t become a cost or complexity problem. Get started with us today!
FAQs
What is network latency in simple terms?
Network latency is the delay between sending a request and starting to receive a response. It's like the pause between asking someone a question and hearing them start to answer, not including how long their full answer takes.
What's considered good network latency?
For web applications, under 100ms is excellent, 100-300ms is acceptable, and over 500ms starts feeling slow to users. Real-time applications like gaming need under 50ms for optimal experience.
How do I check my network latency?
Use the ping command: ping google.com shows your round-trip latency to Google's servers. Browser developer tools also show detailed timing breakdowns for web requests.
Can high latency be fixed?
Yes, through various strategies: using CDNs, optimizing your code, caching responses, placing servers closer to users, and reducing the number of network requests your application makes.
What's the difference between latency and speed?
Latency is delay (how long you wait), while speed (bandwidth) is capacity (how much data can flow). You can have high-speed internet but still experience high latency, especially on satellite connections.
Why is my latency higher at certain times?
Network congestion during peak usage hours can increase latency. More traffic competing for the same network resources creates delays, similar to rush hour traffic jams.
How does network latency affect mobile users differently?
Mobile networks typically have higher latency than wired connections. 3G networks can add 200-800ms of latency, while 4G/5G networks usually provide 20-100ms. Mobile applications need to account for these higher latencies through caching and optimized API design.
What's the difference between TTFB and RTT?
Time to First Byte (TTFB) measures server response time – how long until the first byte reaches your browser. Round Trip Time (RTT) measures the complete network journey, including the time for your request to reach the server and return. TTFB includes server processing time, while RTT focuses purely on network travel time.