Tuesday, September 28, 2010

Slow Start

I've always wondered how a system with an exponential rate of packet transmission could be called "slow start". Then I saw Figure 3 ("Startup behavior of TCP without Slow-start") from Congestion Avoidance and Control. This congestion-control-free TCP sends as many packets as possible, resulting in huge amounts of packet loss. The graph shows a sequence of sharp saw teeth. In the trace, one sequence of packets was retransmitted four times!

With TCP Tahoe, on the other hand, the trace showed a smooth increase and a steady rate of delivery with few retransmissions. Overall, the TCP with congestion control transmitted data at more than twice the rate because it didn't wastefully retransmit packets. Compared to that near-vertical, loss-riddled startup, an exponential curve really does look slow. I am now at peace with the name "slow start".
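
To convince myself, I sketched the growth pattern in Python. This is a toy model, not real TCP: it ignores loss, timeouts, and delayed ACKs, and the ssthresh value is arbitrary. It only shows the window doubling each round trip during slow start and then growing linearly in congestion avoidance.

```python
# Toy model of TCP slow start followed by congestion avoidance.
# Simplified sketch: no loss, no timeouts, no delayed ACKs.

def cwnd_over_time(ssthresh=16, rtts=10, initial_cwnd=1):
    """Return the congestion window (in segments) at the start of each RTT."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2   # slow start: exponential growth per RTT
        else:
            cwnd += 1   # congestion avoidance: linear growth per RTT
    return history

if __name__ == "__main__":
    # Starts at one segment and ramps up over a few round trips, which is
    # "slow" only in comparison with blasting a full window immediately.
    print(cwnd_over_time())   # [1, 2, 4, 8, 16, 17, 18, 19, 20, 21]
```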

Congestion Control in User Space

I ran into iTCP, which stands for "interactive TCP". The idea is that applications can interact with the TCP system. A better name might be "layer violating TCP", but I can't blame the authors for picking "iTCP" instead (it's much catchier). On the one hand, this model seems to add unnecessary complication and opportunities for abuse. But on the other hand, there is a perverse sensibility to this approach in the spirit of end-to-end architectures. For example, iTCP makes it easy for a video streaming application to recognize that packets are being dropped and to lower the video quality as a result.

Unfortunately, iTCP is inherently platform-dependent, and it doesn't seem likely that every OS would incorporate this feature. Rather than building this into the TCP implementation of an OS (which would probably slow down network processing for all applications and complicate the APIs), I think it would make more sense to achieve this functionality in userspace for those few applications that need it. A userspace "TCP" library could send and receive packets over UDP and provide all of the hooks and features in iTCP. This implementation would be platform-independent, and although it might be a bit slower than the in-kernel implementation, it would be much more flexible, and it wouldn't affect other applications.
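
To make that concrete, here is a rough sketch of what such a userspace library might look like. Everything in it is hypothetical (the HookedTransport class, the callback, the four-byte sequence-number header), and it omits the hard parts: retransmission, ordering, and actual rate control. The point is only to show where an application-visible loss hook could live.

```python
# Hypothetical sketch of a userspace transport over UDP that exposes a
# loss-event hook to the application, in the spirit of iTCP.  The names
# are invented for illustration; this is not iTCP's API, and reliability,
# ordering, and congestion control are left out.
import socket

class HookedTransport:
    def __init__(self, remote_addr, on_loss=None):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.remote_addr = remote_addr
        self.on_loss = on_loss          # application callback for loss events
        self.next_seq = 0
        self.unacked = {}               # seq -> payload awaiting acknowledgment

    def send(self, payload: bytes):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        self.sock.sendto(seq.to_bytes(4, "big") + payload, self.remote_addr)

    def handle_timeout(self, seq):
        # A real library would retransmit and adjust its sending rate here.
        if self.on_loss is not None:
            self.on_loss(seq)           # let the app react, e.g. lower video quality

# Usage sketch: a streaming app drops to a lower bitrate when losses occur.
def lower_quality(seq):
    print(f"segment {seq} appears lost; switching to a lower bitrate")

transport = HookedTransport(("198.51.100.7", 9000), on_loss=lower_quality)
transport.send(b"video chunk 0")
```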

Friday, September 24, 2010

In the Clouds: Networks and Society

We are studying computer networking, and one thing I find interesting is its relationship to society. I would say a computer network is highly constrained and simplified compared to a network of people. Yet the similarity may be strong enough that much can be learned from it. In particular, we might be able to learn more about ourselves.

Because we are in the business of designing computers, we gain insight into why we organize things in certain ways. Because we are not in the business of designing people, we are not in the same position to gain that kind of insight about ourselves directly. Following this argument, we may learn more about people through computer science, and more about society through the study of computer networking.

One of the things that inspired this thought was Andrew's description in class of roles and responsibilities in relation to coordination in distributed systems. He used terminology traditionally applied to society, which is now applied to computer science.

People who study society are typically not computer scientists. Can they learn enough about computer science to make connections to their discipline?

Within-Flow Measurement

The paper "TCP Revisited: A Fresh Look at TCP in the wild", describes an approach to internet measurement which does scale to very large numbers of flows. This was my complaint about the first measurement paper we studied together, that it doesn't scale because it requires end-to-end measurements. This paper recognizes the need to make within-flow measurements and describes new algorithms for doing so, using statistical techniques. A few end-to-end measurements were made to validate the new algorithms.

Some might argue that end-to-end measurements are needed in order to get accurate results. While this is probably true for any individual measurement, accuracy per measurement is not the whole picture.

For example, let's suppose I can get very accurate measurements using end-to-end methods. Yet, because the measurements are end-to-end, the number of measurements is necessarily limited. If the number of measurements is limited, then we have less data with which to make inferences. It may be better to have a large number of less accurate measurements than to have a small number of highly accurate measurements.
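
A back-of-the-envelope calculation (with invented numbers) makes the point for the simple case of estimating a mean, since the standard error of a sample mean shrinks with the square root of the sample size:

```python
# With made-up numbers: when estimating a mean quantity (say, the average
# RTT on a path), the standard error of the mean is sigma / sqrt(n), so
# many noisy samples can beat a few precise ones.
import math

def standard_error(sigma_ms, n_samples):
    return sigma_ms / math.sqrt(n_samples)

# A handful of very accurate end-to-end measurements...
print(standard_error(sigma_ms=2.0, n_samples=25))        # 0.4 ms
# ...versus a large number of cheap, noisy within-flow measurements.
print(standard_error(sigma_ms=20.0, n_samples=10_000))   # 0.2 ms
```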

Thursday, September 23, 2010

Coordination in Distributed Systems

I have recently read several papers on services that provide coordination for distributed systems. This turns out to be a fascinating area, and as I read each paper, I found myself getting sucked into more. I group these papers into two different categories: one type involves services that provide mechanisms for distributed coordination (including ZooKeeper, Chubby, and Sinfonia); the other involves protocols that guarantee certain properties despite various types of failures (such as Paxos, PBFT, Aardvark, and Zab).

The high-level papers describe specific services for distributed coordination. A group of servers (five seems to be a popular number) communicate with each other and agree on state. Clients communicate with one or more of these servers to read or modify this global state. The main property is that if one or two of the five servers fail, the others can keep the service running, even if the "leader" fails. Various guarantees about consistency may be provided--not only is consistency difficult to achieve in the event of different types of failures, but being able to tolerate failures usually requires sacrificing performance. I was very impressed by the work that has been done in this area.
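
The arithmetic behind "one or two of the five can fail" is the standard majority-quorum rule: n replicas can tolerate f crash failures as long as n >= 2f + 1. A quick sketch:

```python
# Majority quorums: a group of n replicas tolerates f crash failures
# as long as a majority stays up, i.e. n >= 2f + 1, so f = (n - 1) // 2.

def crash_faults_tolerated(n_replicas: int) -> int:
    return (n_replicas - 1) // 2

for n in (3, 5, 7):
    print(n, "replicas tolerate", crash_faults_tolerated(n), "crash failures")
# Prints 1, 2, and 3 respectively; five replicas are a common sweet spot.
```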

The low-level papers were also fun. Protocols are designed to withstand Byzantine faults, which seem to encompass just about anything that can go wrong in a distributed system, including crashed servers, lost or repeated messages, corrupted data, and inconsistencies. The Practical Byzantine Fault Tolerance (PBFT) algorithm, introduced in 1999, seems to have launched a whole range of fascinating research. It reminds me of security in that cynical thinking is critical.
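
Tolerating Byzantine faults is strictly more expensive: PBFT-style protocols need n >= 3f + 1 replicas to survive f arbitrarily faulty ones, compared with n >= 2f + 1 for simple crashes. A companion sketch to the one above:

```python
# Byzantine fault tolerance: PBFT-style protocols require n >= 3f + 1
# replicas to tolerate f Byzantine (arbitrarily misbehaving) replicas.

def byzantine_faults_tolerated(n_replicas: int) -> int:
    return (n_replicas - 1) // 3

for n in (4, 7, 10):
    print(n, "replicas tolerate", byzantine_faults_tolerated(n), "Byzantine faults")
# Prints 1, 2, and 3: the same fault budget costs more replicas than
# in the crash-only case.
```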

Friday, September 17, 2010

Lost in a Maze of DHTs

I enjoyed reading about Chord, and I'm impressed by the ideas behind Distributed Hash Tables. This area of research seems to be less than 10 years old, but as far as I can tell, there are dozens of different DHT designs and systems. In trying to make sense of all of this, I came across a recent 24-page survey that covers both design and applications, appropriately named Distributed Hash Tables: Design and Applications.
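
The core idea that all of these designs share is simple enough to sketch. Below is a toy consistent-hashing ring in Python: node identifiers and keys are hashed onto the same circular space, and each key belongs to its successor node. Real Chord adds finger tables for O(log N) lookups, plus join, departure, and replication machinery, which is where the designs actually differ; this sketch just does a linear scan.

```python
# A stripped-down version of the idea behind Chord-style DHTs: node names
# and keys are hashed onto the same circular identifier space, and each key
# is stored at its "successor" (the first node clockwise from the key).
import hashlib

RING_BITS = 16
RING_SIZE = 2 ** RING_BITS

def ring_position(name: str) -> int:
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def successor(key: str, node_names: list[str]) -> str:
    key_pos = ring_position(key)
    nodes = sorted(node_names, key=ring_position)
    for node in nodes:
        if ring_position(node) >= key_pos:
            return node
    return nodes[0]   # wrap around the ring

nodes = ["node-a", "node-b", "node-c", "node-d"]
print(successor("some-file.txt", nodes))   # which node is responsible for this key
```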

It seems like a great intro, but I feel like I'm missing something. As far as I can tell, CAN, Chord, Pastry, and Tapestry were all introduced at about the same time in 2001, and Kademlia came out a year later. I still haven't read enough to know whether one is much better than the others. If one had been introduced a year earlier than the others, would there still be as many designs, or would the later ones just have built on the work of the predecessor?

Great, Except That it Doesn't Scale

The paper, "End-to-End Internet Packet Dynamics", seems to make a great contribution to the study of networks. It applies a new method for analysing packet dynamics, which proves very effective. The new method is to install a service on selected 'ends' of the network and then to pass TCP traffic between each pair of 'ends'. Obviously much can be discovered using this approach. So, is there an even better way to study packet dynamics? In particular, is there a method which scales better than this one, which is quadratic in the number of 'ends'? Because this measurement method scales poorly, it necessarily limits the amount of measuring which can be done. Is it possible to make similar measurements in such a way as to support large-scale, real-time measurements? For example, How effectively can a single router be used to deduce or infer the dynamics of packets passing through it?

Since I study Natural Language Processing (NLP), I have been looking for connections between NLP and the internet. The questions asked above suggest a possible connection. In NLP, we are typically trying to infer things by observing only the traffic that flows between two or more people. The traffic I refer to is language: speech or text, for example. We typically do not have direct access to the thought processes and intents of the people who either send or receive the traffic. These people are like the 'ends' of the network. In NLP, rather than making end-to-end measurements, as done in this paper, we do our measuring from the middle.