How to track packet loss
How Does it Work?
Every IP packet carries a hop limit (the time-to-live, or TTL, field) that specifies how many hops it can pass through before it is no longer forwarded. When a router stops forwarding a packet, it simply discards it, but it will usually also send a message to the source host saying, "Hey, sorry, but your packet died here." Traceroute cleverly manipulates this value: the first round of packets it sends toward the designated host is set up so that the packets can only go through one hop before dying. The first hop receives those packets, sees that it is not supposed to forward them any further, drops them, and sends a message back to the source host saying that the packets died. When traceroute receives the "your packets died here" message from that router, it knows that router is the first hop. It then sends a second round of packets that can only go through two hops, and the cycle continues. It finishes when it gets a response from the final destination. For each hop, traceroute displays the RTT (round-trip time) of each packet: the difference between the time the probe was sent and the time the response arrived.
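The probing cycle described above can be sketched as a small simulation. This is only an illustration under stated assumptions: it does not send real packets, and the names `send_probe`, `trace`, and the router names in the example path are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    sender: str  # which machine answered the probe
    kind: str    # "time-exceeded" or "destination-reached"


def send_probe(path, hop_limit):
    """Simulate one probe along `path` (routers first, destination last)."""
    for router in path[:-1]:
        hop_limit -= 1  # every router uses up one hop
        if hop_limit == 0:
            # The packet dies here; the router reports back to the source.
            return Reply(router, "time-exceeded")
    return Reply(path[-1], "destination-reached")


def trace(path, max_hops=30):
    """Raise the hop limit one step at a time until the destination answers."""
    hops = []
    for hop_limit in range(1, max_hops + 1):
        reply = send_probe(path, hop_limit)
        hops.append((hop_limit, reply.sender))
        if reply.kind == "destination-reached":
            break
    return hops


# Hypothetical three-hop path, for illustration only.
example_path = ["router-a", "router-b", "destination"]
for hop_number, sender in trace(example_path):
    print(hop_number, sender)
```

A real traceroute also times each probe and sends several probes per hop; the sketch leaves both out to keep the hop-limit logic visible.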
Let's take a look at an example traceroute:
From this traceroute, you can see that it took 17 hops to go from onyx.training.verio.net to www.neo.com and that the round trip time was roughly 72-98 ms (based on the 3 numbers on the last line). Keep in mind that the RTTs reported are the round trip times from the source host to that router hop. Each one is an independent measurement, not a figure to be added to the times of the previous hops. Each hop adds some time to the path, so you'd expect each hop to take a little more time to reach than the last. Looking at this example, you can see that this is pretty much the case, except for slight fluctuations on the order of milliseconds due to network traffic.
Now an important thing to know when using traceroute is what the asterisks/stars mean. If you see traceroute print out a star instead of a round trip time, that means that either your probe packet got dropped, or the reply back to you for that probe got lost along the way. This is usually referred to as "packet loss," and we will discuss this later.
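To make the reading of a single output line concrete, here is a minimal sketch that pulls the round trip times and the stars out of one line. The line format (hop number, host, address, then one entry per probe) and the sample line itself are assumptions made up for illustration; they are not taken from the traceroutes in this article, and `parse_hop_line` is a hypothetical helper.

```python
def parse_hop_line(line):
    """Return (hop_number, rtts_in_ms, lost_probes) for one traceroute-style line."""
    fields = line.split()
    hop_number = int(fields[0])
    # Each successful probe is printed as "<number> ms".
    rtts = [float(fields[i - 1]) for i, field in enumerate(fields) if field == "ms"]
    # Each probe with no reply is printed as a star.
    lost = fields.count("*")
    return hop_number, rtts, lost


# Made-up sample line using a documentation address (192.0.2.0/24).
sample = " 7  router.example.net (192.0.2.1)  72.1 ms  *  98.4 ms"
hop_number, rtts, lost = parse_hop_line(sample)
print(hop_number, min(rtts), max(rtts), lost)
```

Real traceroute implementations vary in their output details, so treat the parsing as a starting point rather than a complete parser.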
Understanding the Route
To understand how to interpret a route, you will need to know a little bit about interpreting reverse DNS. Whenever a traceroute is done, the program looks up the reverse DNS name of each host as it goes and prints that information as part of each line. This can give you clues about each network that a packet goes through as it travels from you to the final destination. Let's go through an example and show how to interpret it.
In this example, we have tracerouted to www.idsoftware.com from a host within Verio's network. We can now analyze each hop along the way.
- border1 is the router for this network.
- host-c-129.vcso.verio.net is the router that provides connectivity for the entire 8700 location.
- int1-s1-0.dlls.tx.verio.net is Serial 1/0 on the router dlls.tx.verio.net, the other side of the T3 between the 8700 location and the Infomart.
- ge-5-0-0.a10.dllstx01.us.ra.verio.net is the Gigabit Ethernet Interface 5/0/0 on a10.dllstx01.us.ra.verio.net.
- ge-6-0.r00.dllstx01.us.ra.verio.net is another Gigabit Ethernet Interface on a different router within the same location, the Infomart. This router handles our peering for Verio in Dallas.
- ATM3-0.BRI.DFW9.ALTER.NET is Alter.net's router that peers directly with Verio in Dallas.
- 140.at-6-0-0.XR2.DFW9.ALTER.NET is an ATM connection on another router within the Dallas/Fort Worth area.
- 284.ATM7-0.XR2.DFW4.ALTER.NET is another ATM connection in the Dallas/Fort Worth area.
- 194.ATM9-0-0.GW1.DFW1.ALTER.NET is yet another ATM connection on a different router within Alter.net's network in Dallas/Fort Worth.
- savvis-dfw-gw.customer.ALTER.NET is the serial IP of savvis.net's router that connects to Alter.net's network.
- idsoft-1.CR-1.usdlls.savvis.net is id Software's serial IP for their router that connects to savvis.net's network. It appears that id Software is a customer of savvis.net.
- charon.idsoftware.com is the actual name of the machine where www.idsoftware.com's website is hosted.
From this traceroute, we can tell that www.idsoftware.com is hosted by id Software itself in the Dallas/Fort Worth metroplex. We also know that id Software is a customer of savvis.net, which is in turn a customer of Alter.net.
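The hop-by-hop reading above boils down to watching where the domain at the end of each hostname changes. The sketch below applies that idea to the hostnames listed for this traceroute (border1 is left out because its full name is not given). It assumes that the last two labels of a name identify the network operator, which holds for these .net and .com names but not for every domain; `network_of` and `network_path` are hypothetical helper names.

```python
def network_of(hostname):
    """Return the last two DNS labels, lowercased, e.g. 'verio.net'."""
    labels = hostname.lower().split(".")
    return ".".join(labels[-2:])


def network_path(hostnames):
    """Collapse consecutive hops in the same network into one entry."""
    path = []
    for name in hostnames:
        network = network_of(name)
        if not path or path[-1] != network:
            path.append(network)
    return path


hops = [
    "host-c-129.vcso.verio.net",
    "int1-s1-0.dlls.tx.verio.net",
    "ge-5-0-0.a10.dllstx01.us.ra.verio.net",
    "ge-6-0.r00.dllstx01.us.ra.verio.net",
    "ATM3-0.BRI.DFW9.ALTER.NET",
    "140.at-6-0-0.XR2.DFW9.ALTER.NET",
    "284.ATM7-0.XR2.DFW4.ALTER.NET",
    "194.ATM9-0-0.GW1.DFW1.ALTER.NET",
    "savvis-dfw-gw.customer.ALTER.NET",
    "idsoft-1.CR-1.usdlls.savvis.net",
    "charon.idsoftware.com",
]
print(network_path(hops))
# ['verio.net', 'alter.net', 'savvis.net', 'idsoftware.com']
```

Note that a customer-facing interface is often named in the provider's domain, as with savvis-dfw-gw.customer.ALTER.NET, so the point where the name changes can sit one hop later than the point where the router's owner changes.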
Let's look at another traceroute:
This traceroute follows much the same path as the last one up to hop 5. Hop 6 is the Verio router in Orem, Utah that Iserver uses. The router on hop 7, pvu1.vwhpvu1.verio.net, is the border router for Iserver (the router that connects Iserver to the Verio backbone). Finally, we can see that www.7up.com is hosted on an Iserver platform.
We will look at one more traceroute that shows another example of what you might see.
This last traceroute is to www.hellers.com. This website is hosted on the Verio network. However, it is hosted by a customer of Verio's, so apart from maintaining connectivity it is not Verio's responsibility.
Caveats and Quirks
Before we continue, there are a couple of caveats to using traceroute that you should be aware of, so you don't accidentally misinterpret the results.
The first caveat is that sometimes it will look like the last hop on a traceroute dropped a packet when it really didn't. This happens because that host is the actual final destination of your traceroute probes, and because of how certain operating systems handle ICMP. (ICMP, the Internet Control Message Protocol, is one protocol that machines on the Internet use to send messages to each other, and the "Your packet died here" message that traceroute relies on is an ICMP message.) Since the last hop is your destination, instead of sending you back an ICMP message saying "Sorry, your packet died here," that host will send back a different ICMP message saying "Hi, your packet made it here, but this port is unreachable." This is because traceroute purposely sets the probe packet's destination port to some large port number that will most likely be unreachable at the destination host, precisely because it wants to receive that "port unreachable" message back. The caveat is that some operating systems, such as IOS (which Cisco routers run) and Sun Solaris, purposely drop ICMP responses like "port unreachable" if they have to generate too many of them in a short period of time. They presumably do this as a security precaution. So, if you were to add more delay between probes, you wouldn't see this erroneous packet loss.
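To see why spacing out the probes makes this apparent loss disappear, consider a toy model of a destination that sends at most one ICMP reply per fixed interval. Both the fixed-interval rule and the 500 ms figure are assumptions chosen for illustration; they are not a description of how IOS, Solaris, or any other particular system limits its replies.

```python
def answered_probes(send_times_ms, min_interval_ms):
    """Return one True/False per probe: did the rate-limited host reply?"""
    answered = []
    last_reply_ms = None
    for sent_at in send_times_ms:
        if last_reply_ms is None or sent_at - last_reply_ms >= min_interval_ms:
            answered.append(True)
            last_reply_ms = sent_at
        else:
            # Too soon after the previous reply: the host stays silent,
            # and traceroute would print a star for this probe.
            answered.append(False)
    return answered


# Three probes sent 100 ms apart against a hypothetical 500 ms limit.
print(answered_probes([0, 100, 200], 500))   # [True, False, False]
# The same three probes sent 600 ms apart.
print(answered_probes([0, 600, 1200], 500))  # [True, True, True]
```

In the first run the host is perfectly reachable, yet two of the three probes would show up as stars.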
Another caveat of traceroute is that ICMP, which is the protocol traceroute relies on to get responses from each hop, is usually the lowest priority protocol. So if one router is really busy it might decide to drop ICMP messages, and you will see lots of packet loss, but that router might be forwarding on more common, higher priority traffic just fine.
Also, some sites filter ICMP for various reasons, so a traceroute might make a site appear unreachable when in fact it is reachable.
Tracking Down Network Problems
So now that you have a basic understanding of traceroute, it's time to learn how to use traceroute to track down network problems. The first kind of network problem that traceroute can help you debug is a loss or lack of connectivity to a site. If you appear to be having problems reaching a remote site, like a web site, do a traceroute to that site. If the traceroute reaches that site fine, then chances are that you have connectivity to the host, but that the web server on that host crashed. But if the packets start to die somewhere along the path, it's likely that some router along the way, or the host itself, is down. Here is an example traceroute:
Just remember that such a traceroute can also be an example of a firewall that is filtering packets, or a router that throws away the kinds of packets that traceroute depends on when it gets overloaded.
Debugging Network Slowdowns
Using traceroute's results to see which hops IP packets take from you to a remote host is straightforward. However, using traceroute's results to find where "slowness" occurs on a path is fairly tricky, for a number of reasons. The first is that traceroute only shows you the hops from you to a remote host, not the hops from the remote host back to you. So, the best way to determine where network slowness is occurring is to do a traceroute from host A to host B, and then another traceroute from host B back to host A. By looking at both, a trained eye can usually get a pretty good idea of where the network slowness is occurring. Both directions matter because pretty much every Tier 1 ISP on the Internet uses closest-exit routing, which often results in asymmetric routes (completely different routes from host A to B than from host B to A).
For instance, host A might be on the west coast using ISP X, and host B might be on the east coast using ISP Y. The path from host A to host B will then probably exit ISP X as soon as it can, most likely at some peering point on the west coast and enter ISP Y's network from there onto host B. Conversely, the path from host B to host A will most likely exit ISP Y's network as soon as it can on the east coast, and enter ISP X's network and continue on to host A.
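The west coast and east coast scenario can be reduced to a very small model: each ISP hands traffic to the other at the peering point nearest the sender. In the sketch below, positions are numbers on a made-up west-to-east scale, and the peering point names, positions, and the `closest_exit_route` helper are all hypothetical. Real routing decisions involve far more than distance.

```python
def closest_exit_route(src, dst, peering_points):
    """Pick the hand-off point under closest-exit routing.

    src and dst are (isp_name, position) pairs; peering_points maps a
    peering-point name to its position on the same west-to-east scale.
    """
    src_isp, src_pos = src
    dst_isp, _dst_pos = dst
    # The source ISP hands traffic off at the peering point nearest the source.
    handoff = min(peering_points, key=lambda name: abs(peering_points[name] - src_pos))
    return [src_isp, handoff, dst_isp]


# Hypothetical layout: position 0 is the west coast, 100 is the east coast.
peering_points = {"west-coast-exchange": 5, "east-coast-exchange": 95}
host_a = ("ISP X", 0)     # west coast
host_b = ("ISP Y", 100)   # east coast

print(closest_exit_route(host_a, host_b, peering_points))
print(closest_exit_route(host_b, host_a, peering_points))
```

The two printed routes cross between the ISPs at different exchanges, which is exactly the asymmetry that makes a one-way traceroute hard to interpret.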
Here's an example:
Note the vastly different paths that these two traceroutes take from host A to host B and from host B to host A, each with a different number of hops. The first traceroute shows the path from MIT to geo.net goes through Sprint Nap, an exchange point in New Jersey. This makes sense, since MIT is on the east coast and BBN is using closest exit routing. The second traceroute shows that the path from geo.net in San Francisco back to MIT goes through MAE West, an exchange point in the San Francisco Bay Area, the closest exit point for geo.net.
Now, to make the issue more confusing, the second reason why tracking down network "slowness" is tricky is that in networking there is no "slow" or "fast". Instead there are bandwidth and latency, two different concepts that both determine how "fast" a network is. (If you are unclear on the difference between bandwidth and latency, check out a cool paper written by Stuart Cheshire called "It's the Latency, Stupid".)
Tracking Down Packet Loss
So now we know that bandwidth is how many packets you can stuff in your pipe and that latency is the delay, and that packet loss can adversely affect both. So, in general, when trying to track down network "slowness", you should be looking for packet loss. But this can get kind of tricky because packet loss is random. So, you might actually be getting packet loss at hop #2, but with the default 3 probes per hop, maybe all 3 will get back OK. Then at later hops you will start noticing the packet loss that really occurs at hop #2, but it might look like it's occurring at hop #3. So, it's usually better to do more than 3 probes per hop.
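A little arithmetic shows why three probes can easily miss a lossy link. The sketch below assumes that each probe is lost independently with the same probability, which is a simplification, and uses a made-up 10% loss rate; `chance_loss_goes_unseen` is a hypothetical helper name.

```python
def chance_loss_goes_unseen(loss_rate, probes):
    """Probability that all `probes` independent probes survive a lossy link."""
    return (1 - loss_rate) ** probes


# Hypothetical link that drops 10% of packets.
print(round(chance_loss_goes_unseen(0.10, 3), 3))   # 0.729
print(round(chance_loss_goes_unseen(0.10, 10), 3))  # 0.349
```

Under these assumptions, three probes show a clean hop about 73% of the time even though the link is dropping packets, while ten probes bring that down to about 35%. Many traceroute implementations accept a `-q` option to set the number of probes per hop; check your local manual page for the exact usage.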
Let's try to debug a bad traceroute and see what might be causing the problem. To avoid making any specific ISP look bad, some hostnames and IP addresses have been changed to protect the innocent. Let's say you're connected to GeoNet via a T1, and you have another office in Chicago that is connected via a different ISP. One day you notice some definite slowness in transferring files and/or logging into machines at the remote site, and you want to see where the problem lies. So you decide to do some traceroutes. A traceroute from your GeoNet-connected office shows you:
So looking at this traceroute, you can see that there is some packet loss, but it's hard to tell exactly where it starts. It could be the link between hops 6 and 7, but it's hard to know for sure. So, being an educated tracerouter, you decide to do a traceroute from Chicago back to your office in San Francisco. You get:
So now you have more to go on. First of all, you see that this route is an asymmetric one. The first route is 11 hops and the route back is 9 hops. The number of hops doesn't make any significant difference in how fast your connection is, but an asymmetric route can make packet loss and latency increases appear to occur between two hops when the problem really isn't there. This is because the packet loss or increase in latency might be between two hops you don't even see, because the route back to you is completely different.
So now you can make an educated guess as to where the packet loss might be occurring. Based on the first traceroute, it looked like the bad link might be between core2.SanFrancisco.other-isp.net and core1.Denver.other-isp.net, and by looking at the route back in the other direction, it appears that this assumption might be correct. At this point, your best bet is to copy and paste your traceroutes and send them to the appropriate NOC (Network Operations Center). With this type of information, you have a much better chance of tracking down the problem than if you just sent an e-mail saying "my connection to my Chicago office is slow." It also gives you a better understanding of how traffic is exchanged on the Internet.
In summary, traceroute is a network diagnostic tool that will show you the hops your Internet traffic takes from your host to a remote location. It will also tell you how long it takes for packets to get from your host to each hop as well as if packets get lost along the way, which can be useful in tracking network problems. Since routes on the Internet are often asymmetric, it's usually a good idea to do traceroutes in both directions if possible when trying to debug network slowness. In doing so, you can provide your ISP with crucial information that can help them to fix the network problem.
Here are some exercises that you can do to practice your traceroute skills and learn to interpret the output better.
- The easiest exercise is to run a traceroute to www.google.com. Go to the Online Traceroute Tool, enter www.google.com, and press enter.
How many hops did it take to get there?
Where did the traceroute originate from?
Where do you think www.google.com is physically located?
Does the traceroute finish or does it stall out somewhere?
Does Disney host their own webserver?
What ISP provides the connectivity for this website?
How many hops did it take for you to reach www.sega.com from the Traceroute tool on this site?
How many hops did it take to get there from MAE-East?
Does a larger number of hops automatically indicate that there will be more delay in loading from the site?
Which link actually takes longer (in milliseconds) to reach the destination?
If you have read through this article and gone through the exercises, then you should be ready to take a quiz to see how well you retained this knowledge.
Source: networking.ringofsaturn.com