Tuesday, November 16, 2010

Tracking Down Real-life Problems with Wireshark

This summer, our only Internet access was through Google WiFi. If you aren't familiar with it, Google WiFi is a network of wireless routers mounted on streetlight poles throughout the city. It's a great idea, and it has the potential to be a great service, but we had terrible experiences. Occasionally we could browse the web successfully for about 15 minutes, but more often a page took about 30 seconds to load. It was not uncommon for connections to time out repeatedly, and there were times when I spent more than 60 minutes trying to use a single website. All in all, it was a pretty awful system.

A few times, I tried to figure out why Google WiFi was so bad. Unfortunately, a user of a complex system like this faces too many possible sources of problems and has too little information to work with. However, I had some success in tracking down one particular problem.

We noticed that the connection was particularly awful when Google WiFi required us to reauthenticate by redirecting us to a login screen. It was sometimes hard to tell that this was happening, because the system tried to transparently redirect to the login page, check our cookie, and then transparently redirect back to the page we had requested. I think this was only supposed to happen once a day, but there were times when I noticed it 3 or 4 times within an hour. And although the process was supposed to be instantaneous, it usually took about 10 minutes to authenticate because pages would time out.

I decided to use Wireshark to try to figure out what was happening, and I found a horrible configuration error on Google's part. When my browser was redirected to Google's login page, it sent a bunch of DNS requests for hosts like "ocsp.thawte.com". The browser would then connect to these hosts and receive an HTTP response redirecting it to Google's authentication page. Looking up "OCSP", I learned that it is the Online Certificate Status Protocol, which browsers use to check whether an SSL certificate has been revoked. I realized that my browser was trying to verify the certificate of Google's login page, but these OCSP requests were being intercepted and redirected by Google's firewall because we weren't authenticated yet. And the browser couldn't authenticate, because it couldn't finish verifying the certificate while the OCSP requests were being redirected. This dance could continue for a long time.

Anyway, I reported this problem on Google's WiFi forum, and who knows if they ever dealt with it. As a user, I got the feeling that free WiFi was probably the lowest priority project in the company.
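The check I did by eye in Wireshark can be sketched in a few lines of Python. This is only an illustration, not anything Google or Wireshark provides: the `HttpReply` record, the `looks_intercepted` function, and the portal URL are all made up for the example, and the replies are typed in by hand rather than read from a capture. The one fact it relies on is that a genuine OCSP responder answers with the content type `application/ocsp-response`, so a redirect or an HTML page coming back from an OCSP host suggests that something in between answered instead.

```python
from dataclasses import dataclass
from typing import Optional

# Content type a real OCSP responder uses for its answers.
OCSP_CONTENT_TYPE = "application/ocsp-response"


@dataclass
class HttpReply:
    """One HTTP reply seen in a capture (hypothetical record for this sketch)."""
    host: str                           # host the browser asked, e.g. "ocsp.thawte.com"
    status: int                         # HTTP status code of the reply
    content_type: Optional[str] = None  # Content-Type header, if any
    location: Optional[str] = None      # Location header, if any


def looks_intercepted(reply: HttpReply) -> bool:
    """Guess whether a reply to an OCSP request came from something
    other than the OCSP responder, such as a login portal."""
    # A redirect is never a valid answer to an OCSP query.
    if 300 <= reply.status < 400:
        return True
    # A "successful" reply that is not an OCSP response (for example an
    # HTML login page) is just as suspicious.
    if reply.status == 200:
        ctype = (reply.content_type or "").split(";")[0].strip().lower()
        return ctype != OCSP_CONTENT_TYPE
    # Other errors (timeouts, 5xx) tell us nothing either way.
    return False


# Hand-written replies resembling what I saw; the portal URL is invented.
replies = [
    HttpReply("ocsp.thawte.com", 302, "text/html", "https://portal.example/login"),
    HttpReply("ocsp.thawte.com", 200, "application/ocsp-response"),
]
suspicious = [r for r in replies if looks_intercepted(r)]
for r in suspicious:
    print(f"{r.host}: status {r.status}, redirected to {r.location}")
```

A heuristic like this only flags the symptom. It cannot say who did the intercepting or why, which is the part that still took some reading of the capture.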

Wireshark helped me track down one problem, but I'm not aware of any great tools for diagnosing other instances of dropped traffic. I couldn't tell whether packets were getting dropped in the air between our computer and the closest router, or during wireless transmission to routers further upstream, or at some higher protocol level (as with the OCSP problem). In the end, I came to the conclusion that we would have been better off without Google WiFi, as it wasted many hours of our lives; we could have avoided this if we had known in advance how bad it would be. Unfortunately, if even Google can't make wireless mesh networks work, I have my doubts that the technology is ready yet. For all of the exciting promises of wireless, I can't say that it "just works" the way wired networks usually seem to.
