Saturday, December 18, 2010

The End

One of the things I am taking away from this class is what I have learned about writing a good research paper. We have read a lot of papers, reported on them to the class, and asked and answered questions about them. Throughout the semester Dr. Zappala has pointed out aspects of good writing, which has been very helpful. He has especially commented on what makes an introduction useful.

He has taught us how to find new papers, how to break into new research areas on our own. He has taught us, through the activities of this class, how to separate a research area into parts, how to begin to comprehend the parts and their relationship to the whole. I have learned a lot about networking, but perhaps more importantly I have learned a lot about writing and research.

The interactive format of the class has been very productive and fun. Having the consistent experience of presenting papers to each other was, I thought, a great use of class time. The instructor jumped in to help us a lot, but that was much better than being left out of the teaching experience. We really do learn by doing.

Think by Writing

I am the polar opposite of a blogger. So why am I blogging? Because it is required for this class. I must admit it may have been a good experience. I have used this blog as an opportunity to think about a lot of different things relating to computer networking. Writing about my thoughts has helped me to think more clearly. I know that this is a well understood principle, but it really has helped me. Blogging throughout the semester helped me to formulate the ideas leading to my research proposal, just submitted at the end of the semester.

Writing isn't my favorite activity; talking is much easier for me. All this writing is probably making writing easier, and perhaps more enjoyable. See, look at me, I'm blabbering on and on to an invisible audience. Save yourself. Please stop reading now while you can. If you are still reading, one of two things might be true. Either I have become really good at blogging, or you are putting together my grade for this class.

Tuesday, December 14, 2010

Deployment, Deployment, Deployment

A major recurring theme in my study of computer networking is overcoming barriers to deployment. It seems that just about every proposal must address this issue.

By design, the Internet is something that everyone can use. It is the network of networks. The primary objective which motivated the original design of the Internet was to tie all types of different networks together into one network. Since the Internet is designed for such universal use, it is a special challenge to deploy any sort of substantive change.

Any change which might break compatibility of systems currently communicating over the Internet faces a steep uphill battle, no matter how green the grass might be on the other side. It seems that improvements do come, but require some sort of non-disruptive deployment path.

Deployment tends to be the domain of engineers, rather than researchers. However, it seems to me that deployment is central to research in this field.

Additional Header Information

Headers are used to convey certain information about a packet. There are situations where additional information might be helpful, beyond that originally provided in the header.

One example of providing additional header information is explicit congestion notification. Routers mark headers with congestion information which can then be used by senders to avoid congesting the network.

Another example is NetFence. In this proposed solution to the problem of DoS attacks, a special NetFence header is inserted between the IP and TCP headers. The purpose of this header is to facilitate communication between routers and hosts, aimed at minimizing damage from malicious hosts.

In these examples, problems related to congestion and security are addressed by adding information to packet headers as they travel through the network. This observation is interesting to me. I am starting to see headers not just as a place to store static information, but rather as a means for routers and hosts to communicate with each other regarding the packets they are transporting.
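
To make this concrete, here is a minimal sketch of the pattern both ECN and NetFence follow: a router annotates a header field in passing, the receiver echoes it back, and the sender changes its behavior accordingly. This is a toy packet model of my own invention for illustration, not any real protocol implementation or API.

    # Toy model of in-network header annotation (illustrative only).

    def router_forward(packet, queue_len, threshold=50):
        """A congested router marks the packet instead of (or before) dropping it."""
        if queue_len > threshold:
            packet["congestion_mark"] = True
        return packet

    def receiver_ack(packet):
        """The receiver echoes the mark back to the sender in its acknowledgment."""
        return {"ack": packet["seq"], "echo_mark": packet.get("congestion_mark", False)}

    def sender_on_ack(ack, cwnd):
        """The sender treats an echoed mark like a loss signal and slows down."""
        return max(1, cwnd // 2) if ack["echo_mark"] else cwnd + 1

    # Example: one packet crosses a congested router and halves the sender's window.
    pkt = router_forward({"seq": 1}, queue_len=80)
    print(sender_on_ack(receiver_ack(pkt), cwnd=10))   # prints 5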

Smart 'Middles'

NetFence, discussed in class, is a proposed solution to the problem of DoS attacks. It departs from previous work in that it places the 'middle' in the first line of defense against these types of attacks, rather than the 'ends'.

We have already learned that the 'middle' has access to important information that can be very difficult for the 'ends' to infer, for example the total number of flows sharing a bottleneck link and the capacity of that link. This type of information is readily available at the bottleneck router, but nearly invisible to the affected senders, and it is exactly what senders need in order to avoid congestion.
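
A toy example of why this matters: the fair share of a bottleneck is one line of arithmetic where the information lives, and essentially unknowable anywhere else. The numbers below are made up for illustration; this is not any particular protocol.

    # At the bottleneck router, fair share is trivial to compute.
    def fair_share(link_capacity_mbps, num_flows):
        return link_capacity_mbps / num_flows

    print(fair_share(100.0, 40))   # each flow "should" get 2.5 Mbps

    # At the sender, neither input is directly visible; TCP effectively has to
    # discover its share by probing (increase until loss, then back off).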

Security is yet another concern which demonstrates the need for smart 'middles'. This is all interesting in light of the end-to-end principle. This principle might be interpreted as stating that the 'middle' should not be replicating work which is better done at the 'ends'. However, there is significant work that the 'ends' are ill-equipped to do.

If, as discussed in NetFence, a sender and receiver collude to overwhelm a link, the 'ends' are both malicious and the target of the attack is the 'middle' itself. Certainly in such a case, initiative is needed in the 'middle'.

Thursday, December 9, 2010

Bufferbloat and TCP Performance

I ran into two articles by Jim Gettys, entitled The criminal mastermind: bufferbloat and Whose house is of glasse, must not throw stones at another. In a nutshell, the point is that operating systems, home routers, cable modems, etc. are buffering too much data, which hurts TCP performance because senders don't see congestion signals (drops or rising delay) until the oversized buffers have already filled. His results show that the performance effects of poorly tuned buffer sizes are dramatic and widespread. It's an interesting read.
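
The arithmetic behind the complaint is simple. Here is a rough back-of-the-envelope sketch, with numbers I chose only for illustration, of how much delay a full, oversized buffer can add in front of a slow residential uplink.

    # Worst-case queueing delay added by a full buffer: buffer size / link rate.
    def queue_delay_seconds(buffer_bytes, link_bits_per_second):
        return (buffer_bytes * 8) / link_bits_per_second

    # A 256 KB buffer in front of a 1 Mbps uplink can hold over two seconds of
    # data, so TCP sees no loss (hence no congestion signal) until RTTs are huge.
    print(queue_delay_seconds(256 * 1024, 1000000))   # about 2.1 seconds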

Security and Laziness

I read that Morris had suggested the theoretical possibility of an Initial Sequence Number attack in Bell Labs Computer Science Technical Report #117, February 25, 1985. In 1995, Kevin Mitnick carried out this attack (described in Tsutomu Shimomura's book Takedown and in many other resources online). I find it very interesting that security problems so often go unfixed until after they're exploited.

I had this experience once as a system administrator in the CS department. I had noticed that a course's submission system was insecure and that students could steal from and/or overwrite other students' submissions. I emailed the professor about this in November 2003 and again in February 2004 when the problem still had not been fixed. In March 2005, a student was caught cheating, and it turned out that the student had exploited the security problem that I had reported more than a year earlier. If the professor had taken half an hour to carry out the single-command fix that I had proposed, then this student may not have cheated.

Another recent example of this is website encryption. For years, it has been common knowledge that unencrypted HTTP sessions can be hijacked. However, almost no major websites used SSL by default. Finally, Firesheep made people realize that this was a real problem (even though it had already been a real problem for years). Somehow, people feel justified in ignoring security problems if they think the exploit sounds hard or unlikely, even if it is neither.

Security for Internet Routing

So far, I haven't heard much about secure routing protocols. As I mentioned in an earlier post, BGP is vulnerable to both mistakes and attacks. I recently came across a statement on RPKI from the Internet Architecture Board (IAB). The statement seemed fairly vague technically, but they seemed to be saying that using public key cryptography to secure routing protocols should be a high priority. Of course, I don't know whether statements from the IAB are common, or whether they carry much weight.

Wikipedia has a very short article on RPKI, which says only that it is at some stage of the standardization process. The most detailed information I have found is a brief RPKI summary on the APNIC site. It would be interesting to learn more about what security systems have been proposed for routing and which of them seem to be moving forward.

Tuesday, November 30, 2010

Last-Mile Monopolies

Ars Technica recently reported on how Comcast is indirectly charging Netflix while giving preferential treatment to its own IPTV service. The report has the colorful title How Comcast became a toll-collecting, nuke-wielding hydra.

In a nutshell (as far as I understand it), Comcast peers with L3 Communications, a top-tier network. Since Comcast has so many customers, it has not traditionally had to pay L3 for this connection. In the meantime, L3 has been building a CDN business, and Netflix (the number one source of bandwidth on the Internet) is now using this CDN service. Comcast, whose customers are paying $50 per month for their Internet connections, is now charging L3 to fulfill these customers' requests to Netflix.

In other words, these people are not just paying Comcast for their Internet connection and Netflix for their movies, but they are now also paying Netflix to pay L3 to pay Comcast for their Internet connection again. Since Comcast is a last-mile monopoly, neither their customers nor service providers have any practical recourse.

Of course, customers don't currently see any of this. My suggestion is for Netflix to charge more per month for Comcast customers than for anyone else. Not only would this keep other Netflix customers from having to foot the bill, but it would also help Comcast's customers understand what is going on. Those few who have the option of switching to a different ISP can do so, and the others can become proponents of network neutrality legislation.

Monday, November 29, 2010

Research is Cheating?

I recently read a blog post entitled Google and Microsoft Cheat on Slow-Start. Should You?. The article points out that Google has an initial congestion window of 9 packets, and Microsoft's value is even larger, while most websites have a value between 2 and 4. All of this is interesting, but the article goes on to accuse Google and Microsoft of "cheating," citing a "violation" of RFC 3390. Although the tone of RFC 3390 does seem to encourage this sort of reaction, I think that this is an unfortunate attitude.

First, we should be encouraging people to develop protocols, not to keep them stagnant. Holding strictly to decade-old values (4 KB initial windows were proposed in 1998) does not necessarily do anyone any favors. RFC 3390 mentions that this particular value of 4 KB was tested and found to not cause additional congestion on a "28.8 bps dialup channel." Twelve years later, when most Americans have links that are orders of magnitude faster, shouldn't this be reconsidered?

Second, I am bothered by the use of the word "cheating," which implies that a larger initial congestion window would help the perpetrator to the detriment of all other users. Although this may be true over some specific link, in general web sites are motivated to pick a good value. If the value is unnecessarily low, then web pages load too slowly, and if it is too high, then web pages also load too slowly (due to dropped packets). If web sites are trying to pick the optimal value, should this be considered cheating?
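
To see why the initial window matters so much for short web transfers, here is a small sketch counting the round trips needed to deliver a small response. This is an idealized toy model of slow start (no losses, no receive window limits), not an exact model of any TCP implementation.

    def round_trips(segments_to_send, initial_window):
        """Idealized slow start: the window doubles every RTT until the data is gone."""
        window, rtts, sent = initial_window, 0, 0
        while sent < segments_to_send:
            sent += window
            window *= 2
            rtts += 1
        return rtts

    # A 20-segment response (roughly 30 KB) with different initial windows:
    for iw in (2, 4, 9):
        print(iw, round_trips(20, iw))   # 2 -> 4 RTTs, 4 -> 3 RTTs, 9 -> 2 RTTs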

I think we should try to foster an attitude that is positive toward experimenting with improvements to Internet Protocols as long as they retain backwards compatibility and don't risk causing catastrophic problems.

Tuesday, November 23, 2010

Minimum Loss

The paper "Routing Metrics and Protocols for Wireless Mesh Networks" explores the utility of various routing metrics. I was particularly interested in minimum loss (ML). It is interesting because of its simplicity, its performance, and its relationship to probability theory.

It is a simple metric, very much like expected transmission count (ETX). In the paper, the performance of ETX, ML, and two other metrics is compared. Performance was measured in four ways: number of hops, loss rate, RTT, and throughput. ML consistently led to the highest number of hops, yet the lowest loss.

Throughput was measured from a starting node to each of the other mesh nodes in the network. For all metrics, there was a sharp drop off in throughput for all nodes which were more than one hop away from the starting node. It was interesting to me that this drop was much less pronounced for ML. The drop off curve for ML was much smoother. Throughput using ML for nodes two hops away from the starting node was about twice that of throughput using all other metrics.

The key difference between ETX and ML is multiplication as opposed to addition. When calculating ETX over multiple hops, the total ETX is the sum of the ETX for each hop. When calculating ML, the end-to-end delivery probability is the product of the per-hop delivery probabilities, so the path loss is one minus that product. It is interesting to me that the multiplication approach mirrors probability theory: the probability that a set of independent events all occur is the product of their individual probabilities.
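
Here is a small sketch of the two path metrics as I understand them. It is simplified: the real definitions in the paper account for forward and reverse delivery ratios per link, and the numbers below are just illustrative.

    # Per-link delivery probabilities along a path (made-up numbers).
    path = [0.9, 0.8, 0.95]

    # ETX: expected transmissions per link is 1/p; the path metric is the sum.
    etx = sum(1.0 / p for p in path)            # 1.11 + 1.25 + 1.05, about 3.41

    # ML: end-to-end delivery probability is the product of per-link
    # probabilities (as with independent events); pick the path that
    # maximizes it, i.e. minimizes end-to-end loss.
    delivery = 1.0
    for p in path:
        delivery *= p                           # 0.9 * 0.8 * 0.95 = 0.684
    end_to_end_loss = 1.0 - delivery            # about 0.316

    print(round(etx, 2), round(end_to_end_loss, 3))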

The Central Role of Routing

It seems to me that routing plays a central role in almost every area of computer networking. It is such a central part of networking that I suppose it is fair to say that routing is the core of networking. Certainly routers are at the physical core of networks.

Transport is distinct from routing. Our study of transport centered on congestion control. Congestion happens at routers. Congestion is essentially a clogged route. The most effective congestion control schemes make use of explicit congestion feedback from routers.

Much of what we studied in the application space had to do with locating distributed content, which can also be seen as a routing problem. Perhaps this is a stretch, but the big picture is that applications are being used to find a better path for content delivery.

Our study of Internet architecture focused on naming and addressing. Names and addresses are for routing. Our study of wireless has been dominated by routing challenges specific to the wireless environment.

We started this class with the end-to-end principle. What is the middle? It is the network. What is a network? Isn't it essentially routing?

Friday, November 19, 2010

More on the Economics of Net Neutrality

I ran into a blog today by a telco analyst. His posts give a different perspective on the net neutrality debate. One of the most interesting posts I saw was A Grave of Their Own Making which uses a back-of-the-envelope calculation to estimate that Google probably makes about $1 per month per customer. His point is that it's not actually worth it to the ISPs to try to charge Google money to access their customers because even if Google paid it wouldn't be much money. Of course, if Google decided not to pay and an ISP cut off the service, the ISP would lose more money for each customer that left than Google would lose for 35 customers who stayed with the ISP and got cut off.

Another post, The Slow Suicide of Net Discrimination, summed up this point and a few other arguments by the author to show that ISPs really shouldn't waste their time worrying about net neutrality. The author made some interesting points that I hadn't thought about before.

Tuesday, November 16, 2010

Tracking Down Real-life Problems with Wireshark

This summer, our only Internet access was using Google WiFi. If you aren't familiar with it, Google WiFi is a network of wireless routers on streetlight poles scattered throughout the city. It's a great idea, and it has the opportunity to be a great service, but we had terrible experiences. Occasionally we could successfully browse the web for about 15 minutes, but more often, our average load time for a page was about 30 seconds. It was not uncommon for connections to time out repeatedly, and there were times when I spent more than 60 minutes trying to use a single website. All in all, it was a pretty awful system.

A few times, I tried to figure out why Google WiFi was so bad. Unfortunately, as the user of a complex system like this, there are too many possible sources of problems, and there isn't enough information available. However, I had some success in tracking down one particular problem.

We noticed that the connection was particularly awful when Google WiFi required us to reauthenticate by redirecting us to a login screen. It was occasionally difficult to realize that this was happening because the system attempted to transparently redirect to the login page, check our cookie, and then transparently redirect back to the page we requested. I think this was only supposed to happen once a day, but there were times when I noticed it 3 or 4 times within an hour. And although this process was supposed to be instantaneous, it usually took about 10 minutes to authenticate because pages would time out.

I decided to use Wireshark to try to figure out what was happening, and I found a horrible configuration error on Google's part. I noticed that when my browser was redirected to Google's login page, it would issue a bunch of DNS requests for hosts like "ocsp.thawte.com". The browser would then connect to these hosts and receive an HTTP response redirecting to Google's authentication page. Looking up "OCSP", I learned that this was an SSL certificate revocation protocol, and I realized that my browser was trying to verify Google's certificate, but that these OCSP requests were getting intercepted and redirected by Google's firewall because we weren't authenticated yet. But the browser couldn't authenticate because the OCSP requests were getting redirected. This dance could continue for a long time.

Anyway, I reported this problem on Google's WiFi forum, and who knows if they ever dealt with it. As a user, I got the feeling that free WiFi was probably the lowest priority project in the company.

Wireshark was able to help me track down one problem, but I'm not aware of any great tools for diagnosing other instances of dropped traffic. I couldn't tell whether packets were getting dropped in the air between our computer and the closest router, or during wireless transmission to further upstream routers, or at some higher protocol level (as with the OCSP problem). In the end, I came to the conclusion that we would have been better off without Google WiFi, as it wasted many hours of our life; we could have avoided this in advance if we had known how bad it would be. Unfortunately, if even Google can't make wireless mesh networks work, I have my doubts that the technology is ready yet. For all of the exciting promises of wireless, I can't say that it "just works" like wired networks usually seem to.

Reconsidering Assumptions

We recently read a paper about network coding in wireless networks. Not only was this an amazingly clever idea, but it also serves as a reminder of the importance of reconsidering assumptions (or conventional wisdom). Great effort has gone into building protocols on top of antenna-based communications to try to make this awkward, noisy broadcast medium as much like wired networks as possible. The wireless coding paper is particularly significant because the authors stepped back and considered whether there might be any advantages to using a broadcast medium.
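
The core trick in COPE can be shown in a few lines. Suppose Alice and Bob each want to send a packet to the other through a shared relay. Because the medium is broadcast, the relay can XOR the two packets and transmit once, and each side recovers the other's packet using the one it already has. This is just the textbook two-node example, sketched in Python with made-up payloads.

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    alice_pkt = b"ALICE->BOB"   # packet Alice sends toward Bob, via the relay
    bob_pkt   = b"BOB->ALICE"   # packet Bob sends toward Alice, via the relay

    # The relay broadcasts one coded packet instead of making two transmissions.
    coded = xor_bytes(alice_pkt, bob_pkt)

    # Each endpoint XORs the broadcast with the packet it already knows (its own).
    print(xor_bytes(coded, alice_pkt))   # b'BOB->ALICE': Alice recovers Bob's packet
    print(xor_bytes(coded, bob_pkt))     # b'ALICE->BOB': Bob recovers Alice's packet

Three transmissions do the work of four, and the saving only exists because every transmission is overheard by everyone in range.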

I've always been something of a wireless skeptic. I appreciate the convenience and flexibility of wireless communications, but I've been frustrated by their slowness and unreliability. Because of this attitude, I'm a person who would never have come up with the network coding idea. Anyway, it's important to have an open mind and to consider whether a problem might have hidden strengths in addition to the obvious challenges.

Routing Security

It's been a few weeks since our routing section, and in a few more weeks we'll get to security. Since we're right in the middle, it seems like a good time to mention security in the context of routing.

We all know that BGP has problems with scalability and reliability, but I usually haven't focused on the security implications. It turns out that a malicious telecommunications company could cause some pretty serious problems. On April 8, 2010, BGPmon reported an incident where China Telecom "originated about ~37,000 unique prefixes that are not assigned to them" for about 15 minutes. Such incidents are fairly common in that a few times a year, some ISP causes disruption to large portions of the Internet. However, this situation was different because traffic moved through China Telecom's routers without being dropped. If an event like this were carried out intentionally, it could result in tremendous damage to individual, corporate, or national security.

Monday, November 15, 2010

You Gotta Love It

Several papers we have studied apply theoretical results from related fields to the solution of practical problems in computer networking. Examples of this are the application of consistent hashing in Chord, the application of network coding in COPE, and the application of cooperative diversity in ExOR.

Generally, I think this sort of thing can be very productive, because it leverages work which has already been done. In order to do this sort of thing it is necessary for researchers in one field to be aware of work done in other fields. Particularly, it is important for researchers in applied fields to be abreast of work in theoretical fields.

Staying abreast of theoretical work going on in related disciplines can be challenging; it's challenging enough just to stay abreast of your own field. I suppose that is why it is important for researchers to love to read and learn about other people's work, because there is so much of that to do.

Friday, November 12, 2010

An Ever Darkening World

One thing that we haven't studied is the effect of malicious users and perverse content on the Internet as a whole. From the beginning the Internet has been plagued with such problems. Now that we have spam filtering for our email, things seem better for me.

In Book of Mormon terms, I wonder how hard we are laboring to support iniquity. President Monson spoke once about prostituting presses. This was in relation to printing pornographic material. Certainly computer networks are being prostituted in a significant way. I wonder to what extent this is happening. What are the trends, compared to legitimate traffic? What are the costs and how are they distributed? What is the effect on society as a whole? Are we winning or losing?

The words of the prophets speak of an ever darkening world. As the world collectively descends into the pit of sinful living, I suppose our networks are becoming more corrupt over time. I also suppose that this negative force will eventually threaten the benefits associated with the power of the Internet.

Network Simulation and Inspection

OMNeT++, INET, and Wireshark are tools I used in our most recent lab. They enabled me to see how a network works at a very fine level of detail. The level of detail even went down to interactions between various OSI layers on a host, and the length of delay on a wire. I found it fascinating. It helped me to better understand how things fit together.

Wireshark is very easy to use. I had a much harder time making sense of the other two tools.

I think it would be interesting to get a hold of some real network traces and analyze them. Real data, rather than simulated, probably has a lot more to say. I would be interested to see what research has been done in the area of inferring characteristics of network traffic from data. Doing so would involve ways to analyze very large amounts of data.

The Great Divide

I was very surprised to learn how different wireless networking is from wired networking. What I expected was to study extensions to what we had already learned which were applicable to wireless. What I learned is that the constraints of wireless motivate drastic changes from the way networking is normally done.

For example, in opportunistic routing for wireless networks an ACK to the sender and new data to the receiver are effectively relayed by some intermediate party in the same message. This is totally different. Another example of the extent of the difference is that routing may be handled to a significant extent below the IP layer, near the MAC.

Another indication of the divide between wired and wireless is that 2 of the 4 papers we have studied so far came from a conference dedicated to mobile computing and wireless.

I expect to see other big differences in the future. I also suspect that much of what I have learned in the context of wired networking may not be very applicable in the context of wireless.

Friday, October 29, 2010

Conventional Wisdom of Routing

System design involves tradeoffs, and there is a natural tendency for improvements to be incremental instead of revolutionary. Although it's true that radical changes are often more difficult to implement, it's important to have people who challenge the conventional wisdom. I recently read ROFL: Routing on Flat Labels, which proposes a peer-to-peer-inspired routing architecture. Instead of merely separating location from identity, the paper explores a routing algorithm where location is completely unnecessary. The authors modestly write of ROFL, "While its scaling and efficiency properties are far from ideal, our results suggest that the idea of routing on flat labels cannot be immediately dismissed." In fact, the authors' preliminary algorithm has an attractive scalable design, and its performance seems acceptable given its benefits.

ROFL basically casts the Internet as a big hierarchical DHT. Since there is no lower network layer on which to overlay the DHT, each system keeps a cache of source routes which are sufficient to ensure that every host is reachable. Each packet contains a source route to a host whose ID does not exceed the ID of the destination. Each router, if possible, replaces this route with a new source route to a host whose ID is closer to the destination ID. In the spirit of DHTs, routers keep "fingers" (in this case, memorized source routes) to routers that are physically nearby but logically distant on the ring.
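
The forwarding rule is essentially the greedy successor routing familiar from DHTs. Here is a rough sketch of that one step, heavily simplified by me (no source routes, no hierarchy, toy ID space): from the entries a router happens to have cached, pick the one whose ID gets closest to the destination ID without passing it on the ring.

    RING = 2 ** 16   # toy identifier space

    def clockwise_gap(a, b):
        """Distance from a to b moving clockwise around the ring."""
        return (b - a) % RING

    def next_hop(current_id, dest_id, cached_ids):
        """Among cached IDs, choose the one closest to dest without overshooting it."""
        candidates = [i for i in cached_ids
                      if clockwise_gap(current_id, i) <= clockwise_gap(current_id, dest_id)]
        if not candidates:
            return current_id               # nothing better cached at this step
        return min(candidates, key=lambda i: clockwise_gap(i, dest_id))

    # A bigger cache means more candidates, hence shorter detours (lower stretch).
    print(next_hop(100, 9000, cached_ids=[400, 3000, 8500, 12000]))   # prints 8500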

At first glance, this sounds inefficient: routers blindly send packets based on the structure of the ring, which is completely unrelated to the structure of the network. However, as the cache sizes increase, the stretch approaches 1 (stretch is the worst case ratio of actual path length to optimal path length). Of course, if ROFL needs huge caches to get good performance, then it doesn't seem to be much of an improvement over BGP's huge routing tables, but there is an important difference: BGP needs huge tables to work, whereas ROFL uses huge caches only as an optimization. In other words, some routers can have large caches while others have small caches, and the algorithm will still work.

The one thing ROFL needs is a really good animation that shows how it works and the effect of caching on performance. In any case, after reading the paper, thinking about the algorithm, and trying to understand the results section, I am convinced that ROFL is worth exploring. It might be difficult to implement in an IP world, but it is a refreshingly novel approach.

Internet Protocol Research

It wasn't long ago that the Internet was first opened up to public use (around 1994, I believe). Since then, it has seen explosive growth. Long ago, the need for an upgrade to the Internet Protocol (IP) was foreseen. A new version of IP (IPv6) has been designed and standardized. In spite of an increasingly glaring need for the new protocol, it still has yet to be adopted.

Now we are counting the months until our current system for handling IP addresses will be broken. The interesting thing to me is that research in this area seems so dormant. I get the feeling that we feel that the problem has long since been solved (IPv6). But is that really true? I have to wonder. If that were the case, why would adoption be lagging to the extent that it is today?

Is it the case that IPv6 doesn't really solve the problem? IPv6 is not backwards compatible with IPv4. Is that why it may indeed not be a very good solution?

Perhaps backwards compatibility is far more important than we had supposed. Evidence for this lies in the fact that current efforts to implement IPv6 involve dual stacks, which means running IPv4 and IPv6 stacks concurrently. In other words, we are making systems which are backwards compatible, even though IPv6 itself isn't.
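
Dual-stack behavior is visible from a single client-side call: ask the resolver for both address families and try IPv6 first, falling back to IPv4. This is a minimal sketch using the standard Python socket API; the hostname is a placeholder, and this is nowhere near a full "happy eyeballs" implementation.

    import socket

    def connect_dual_stack(host, port):
        """Try each returned address in order, preferring IPv6 over IPv4."""
        addrs = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
        addrs.sort(key=lambda a: 0 if a[0] == socket.AF_INET6 else 1)
        for family, socktype, proto, _, sockaddr in addrs:
            s = socket.socket(family, socktype, proto)
            try:
                s.connect(sockaddr)
                return s                 # the first stack that works wins
            except OSError:
                s.close()
        raise OSError("no usable IPv6 or IPv4 path to %s" % host)

    # connect_dual_stack("www.example.com", 80)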

Compact Routing

I thought the compact routing paper was very interesting in light of all of the attention we have given to the scalability and stability problems associated with inter-domain routing.

It seems to make a lot of sense to find short, rather than shortest paths. It is amazing to me how small routing tables can become by relaxing the shortest path requirement.

I wonder if this paper doesn't understate the significance of compact routing. The paper points out that even with compact routing, we are still stuck with linear growth in routing update messages. Certainly there is good reason to be concerned about the growth of update messages. But we are stuck with that problem anyway.

Is compact routing really an awesome idea? It may be. You can enforce an upper bound on stretch, which means that you can limit stretch as much as you want to. In other words, if you are concerned about stretch (no stretch means shortest paths), just increase routing table size accordingly. If you get concerned about routing table size, just increase stretch accordingly. The nice thing is that very small amounts of stretch result in substantial routing table size savings.
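
For a sense of the scale of the savings: as I understand the classic compact routing results, the tradeoff is roughly worst-case stretch 2k-1 in exchange for about n^(1/k) table entries per node, ignoring polylogarithmic factors. The sketch below is purely illustrative arithmetic with a ballpark node count, not a statement about any specific scheme's constants.

    # Rough scaling of per-node table size vs. stretch (illustrative only).
    n = 30000   # ballpark number of ASes visible in BGP around 2010

    for k in (1, 2, 3):
        stretch = 2 * k - 1
        entries = round(n ** (1.0 / k))
        print("stretch <= %d -> roughly %d table entries" % (stretch, entries))

    # k=1 is shortest-path routing (one entry per destination); already at k=2
    # the table shrinks from ~30000 entries to a couple hundred for stretch <= 3.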

Thursday, October 28, 2010

Open Standards

Our patent system is getting out of control. Once our mobile industry finishes suing itself out of existence (and our country becomes the only place in the world without smart phones), I wonder whether we'll eventually get reform. Obviously, mutually assured destruction is inadequate prevention because it just results in lots of destruction. In the latest news, Oracle has claimed that method names and signatures in public APIs are copyrighted.

In my opinion, it doesn't make sense to call a standard "open" if it is encumbered with patents. Encumbered protocols and file formats have stifled innovation on the Internet, and this will continue to get worse in the future. A few controversial technologies have included GIF, H.264, Flash, VoIP, hyperlinks, plugins, Java, and OOXML. Some of these were encumbered with proprietary baggage before achieving status as de facto standards, while others developed these problems later. In both cases, lawsuits and rumors of lawsuits have stifled innovation.

I'm still undecided whether I think patents should be banned entirely. At the very least, the lifespan of a patent (or copyright) should be limited, it should be easier to get rid of trivial ideas without expensive lawsuits, and companies proposing protocols for standardization should not be able to threaten patent litigation over those protocols.

A Market for IPv4 Addresses?

Now that IPv4 addresses are about to run out, what will people do after the address pool is completely empty? I recently read an interesting article speculating about what sort of IP address market might emerge.

The article made a number of interesting points that I mostly agree with. For example, most IP addresses go to consumer ISPs, not content providers. ARIN might stop giving out addresses to end-user ISPs before they stop giving them to data centers, simply because this would defer catastrophe. In any case, this explains why there is a serious 3-phase Comcast IPv6 trial, while most of the U.S. seems to be ignoring IPv6.

The author argues that a black market would be unlikely. The main argument is that big users like Comcast, which are responsible for most of the demand for addresses, would never pay very much per address. It just wouldn't be economical compared to IPv6 or even evil NAT.

The most interesting thought was, "I wouldn't be surprised if, when the IPv4 address supplies have run out, people will simply usurp address space that appears to be unused." I had never thought of this before, but it seems likely to me that someone might at least try this, especially in areas of the world that are particularly stressed for addresses.

In a year or two, I suppose we'll see what happens.

Tuesday, October 26, 2010

Straining at a [purpose for] NAT

With Halloween approaching, it seems like an appropriate time to write about NAT. When I first learned about Network Address Translation (NAT), it seemed cool because even though lame ISPs would only give one address per customer, we could still set up a whole network of computers behind a router. It was a great hack.

To my horror, I later learned that some people view NAT as a security feature. These misguided souls fall into two categories: a) friendly but confused people who aren't aware that firewalls can have deny-by-default policies, b) dangerously naive people who believe that NAT is a security panacea even though they recognize that it merely provides security-by-obscurity. The University of Michigan has produced a document, Security Considerations of NAT, that criticizes the use of NAT for security in a much more friendly tone than I would be willing to take. An adequate summary is that NAT doesn't provide nearly as much obscurity as it is usually given credit for.

BYU spent tremendous amounts of money a few years ago to roll out NAT across campus, when they should have spent that money to configure firewalls and implement IPv6 (in my opinion, of course). Most people at BYU are nice, so I assume that those responsible fall under group (a), but I'm disappointed in the results.

I hope that as sites eventually start making the move to IPv6, they will consider dropping NAT instead of keeping the "conventional wisdom" of IPv4 and repeating the same mistakes. If we can finally get rid of NAT, I think this would open up a huge amount of innovation for peer-to-peer applications that we can't even imagine yet, in addition to the great applications we already have which are being stunted by the prevalence of NAT. As a user whose home network currently sits behind two layers of NAT, I'm really looking forward to change, although I'm still scared that we might get stuck with the status quo.

Happy Halloween.

A Culture of Sharing

It seems to me that the advent of the Internet marks the dawn of a new era of sharing. I believe the Internet enabled the development and proliferation of open source, freely shared software. I believe it has also enabled an explosion of freely available content of all sorts. Interestingly enough, a requirement in this class is to share our thoughts in a blog, which is made freely available throughout the world.

At the core of Internet infrastructure are peering agreements, which are essentially agreements to share freely. The protocols which allow the Internet to function efficiently are for the most part followed voluntarily. Customer-provider relationships, which are not about sharing but rather buying and selling, rely on the sharing that exists at the core.

One might argue that the world's academic institutions together with the contributions of scholars throughout recorded history lie at the center of modern civilization. Modern democracies are also at the core of modern society and also are largely based on voluntarism and sharing. Commercial activity in society relies upon the sharing which exists at the core.

As we study the technical innards of the Internet, it is apparent to me how much voluntary cooperation is relied upon for everything to work well together. It is a little surprising, and very interesting for me to think about. It helps me to appreciate others more, and the contributions they have made and are making to my quality of life.

Politics, Law, Business, and Technology

Our brief focus on net neutrality was a broadening experience. This was probably the only time in my academic career I will be asked to read a paper outside the field of computer science. I was surprised that I could follow an article in a law journal at all.

Residents of communities can and ought to work together on problems of community interest, like utilities and internet access. If you are unhappy with things, you could work to change them. Moving to a different community is not the only option.

Friday, October 22, 2010

Net Neutrality Debate

Our debate in class over network neutrality didn't end up being nearly as heated and confrontational as I had imagined, but the fun we sacrificed was replaced by a useful and insightful discussion. The issues of network neutrality are a mix of economics, politics, and technology. It's hard enough to agree on what network neutrality is, much less to balance the desires of numerous conflicting stakeholders. Although the members of the class had vastly differing opinions, I was surprised that there was an issue that everyone in the class could agree on: transparency. Even the most libertarian among us agreed that ISPs need to be open about their traffic shaping and discrimination practices.

Beyond transparency, there is very little we could agree on. Part of the problem is coming to an agreement about whether there is a healthy level of competition in the last-mile ISP market. Having recently shopped for service multiple times in Utah and Salt Lake counties, I have some definite opinions on the matter. In each county, I am aware of only four serious companies: Comcast (cable), Qwest (DSL), Digis (wifi), and Utopia/iProvo (municipal fiber).

I'll address each of these individually. Comcast consistently gets some of the worst results in the American Customer Satisfaction Index; granted, there are signs that things are slowly getting better, but that's only because they're losing customers who are able to find reasonable alternatives. Qwest is only able to provide decent speeds if you live in certain areas, and I have not yet lived in a place where they are a viable alternative. Digis seems to be a growing competitor, and they give me some hope that competition will continue to improve in the future, but their customers must have line-of-sight to their antennas, which our previous apartment did not have. Utopia seems to be great in those cities that participate, but I have never had the fortune to live in one of them. The iProvo system is a disaster; one reason of many is that they required all participating ISPs to provide voice and IPTV service, thus barring decent ISPs from joining.

My point is that many places are worse than Utah, but even here I've generally only had one or two choices at any given residence. Some people in class argued that if you want to change ISPs, you can move to a different home; I thank them for so perfectly illustrating the high cost of switching providers.

I was surprised at how many members of the class were supportive of common carrier policies (for example, requiring Comcast to allow other ISPs to run signals over its wires). This would certainly increase competition (and I believe it currently adds competition within the DSL market), but I expected those with libertarian leanings to object. Perhaps they were persuaded by my comments about how cable and phone companies get plenty of taxpayer aid due to subsidies and municipal franchise agreements, but it's more likely that they just see it as the lesser of two evils (the other being increased net neutrality regulation across the board). Although the class disagreed on whether the current level of competition was healthy, I think that most of us feel that if competition were healthy, that transparency measures would probably be adequate to "solve" the net neutrality problem (at least for now).

Given that we don't have healthy competition (in my opinion) and that there are not any common carrier policies on the table, I am tentatively in favor of network neutrality regulations. Granted, regulation always comes with side effects, but I don't think the market is healthy enough to solve the problem on its own, though I might be willing to entertain an approach of increasing transparency for now and readdressing the issue in another two years.

With respect to policy, I would prefer an approach that allows ISPs to perform protocol-agnostic shaping but not to discriminate against competitors or to double dip (charge other ISPs' customers even though their own customers are already paying). If I were shopping for an ISP in a healthy market, I think it would be reasonable for them to throttle traffic based on the time of day and the overall bandwidth usage of the customer. Perhaps there could be tiers: for $10 a month, your traffic is always low priority, for $20 a month you're low priority if you've used more than 2 GB, and for $30 you're high priority. Or something like that. I can imagine a number of healthy protocol-agnostic models. The only model I would hate is the cell phone model (once you use more than 2 GB, we charge you $10 per MB).

To summarize my opinion, I think that transparency alone would be adequate given a healthy market, but given the lack of competition, it might also be necessary to enact legislation to require ISPs to only discriminate based on the usage of the customer, in a manner irrespective of protocols and remote addresses.

Tuesday, October 19, 2010

Multicast and the End-to-end Principle

Traditionally, multicast protocols operated at the network layer. Unfortunately, this made them almost impossible to deploy. Convincing people to use a product or feature is hard enough; getting protocol support from equipment and software companies is much more difficult; and getting service companies (like ISPs) to put them into practice is almost impossible. Trying to get all ISPs to explicitly support a new protocol sounds foolish. Of course, this is all spoken with plenty of hindsight. Back in the day, the Internet was much smaller, and global changes to network protocols must have been much easier to implement. Over time, the growing size and increasingly commercial nature of the Internet have made casual changes less common. Although the network researchers of the 1990s may appear to have been naive, this is only because we are now familiar with the history of multicast, IPv6, etc.

The 1990s have taught us an important lesson, a reinterpretation of the end-to-end principle: if you want to do something cool on a network and actually get it used, design it to run on one of the ends. Peer-to-peer and cloud applications can actually get adopted. This lesson isn't as depressing as it sounds. Most innovative network-layer protocols can be (and perhaps have been) revamped as application-layer protocols. Although we may never see IP multicast, we will likely use plenty of peer-to-peer and CDN technologies based on multicast research from the 1990s.

Saturday, October 16, 2010

Single Point of Failure?

One of the Internet's key design objectives was to be robust against single points of failure. So much about its design and implementation is redundant. It is interesting to me to learn that inter AS routing is described as brittle. Inter AS routing is at the very core of what we call the Internet, the network of networks. It seems that there is no redundancy for inter AS protocols, and that this is potentially a single point of failure for the Internet. Would it be possible to have multiple competing inter AS protocols?

I suppose that the answer is probably yes, and that it may someday happen. It seems that the inefficiency associated with redundancy and competition is necessary. I suppose that the same holds true for the Internet protocol as well. Maybe instead of slowly transitioning from IPv4 to IPv6, what is really happening is that we are transitioning from a single Internet protocol to multiple Internet protocols.

BGP - Big Gateway Problems?

BGP, a.k.a. Border Gateway Protocol, is the protocol for inter AS routing on the Internet today. It has well-known problems, and many solutions to those problems have been proposed. However, the proposals are largely still just proposals; the problems persist and continue to grow.

When we studied transport, we saw a similar pattern. In time, problems with the status quo became apparent and many solutions were proposed. Actual implementation of proposals seems to come very slowly, if ever.

Evidently, we will run out of IPv4 addresses shortly, and everyone has known about it for a long time. IPv6 was not only proposed as a solution, but has begun to be implemented. But implementation has been very very slow.

It seems that things which come to be used by a very large number of people become very resistant to change, even if change is sorely needed. I think about those people whose life's work is proposing solutions which, no matter how good, are highly likely never to be implemented. It could definitely be discouraging. I suppose that academics need to find satisfaction in simply illuminating an important point which may end up being only one small piece in a large and complex puzzle.

Is there any way we could make substantial progress faster?

Tuesday, October 12, 2010

Incentives on the Internet

The paper "Promoting the Use of End-to-End Congestion Control in the Internet" proposed a few possible approaches for avoiding congestion collapse. The bulk of the paper considers router-based incentives to encourage applications to use congestion control. Unfortunately, these incentives are not strong enough to dissuade users who are actively trying to game the system.

In the end, the only thing keeping the Internet from imploding is the combined good intentions of network architects, application developers, and users. I am reminded of the anecdote described in Freakonomics: a daycare started charging a fee for late pickups, so the number of late pickups increased. Sometimes adding specific incentives can backfire by taking away the guilt that stops people from abusing the system. The paper focused on technological solutions, but I think it's important not to neglect the social issues.

Friday, October 8, 2010

This Class Format

I wanted to say something about the format of this class. I really like the fact that we are getting so much exposure to current research. I am getting a lot out of my BYU graduate study experience generally. However, I have felt a need for more exposure to what is going on right now in other places, in labs other than my own.

I can see that professors have a lot of demands on their time and are not able, or at least find it difficult, to stay abreast of all important developments. What Dr. Zappala is doing now seems very effective: have the students go and find new stuff, and then help them to understand it and put it in perspective, as they help you to see what else is going on.

Not only am I getting so much more exposure, but I am gaining what I think is valuable experience in finding relevant work. We are surely very blessed to have such effective modern tools to help us find such work. I feel I am getting better at using those tools.

I feel I am also getting better at picking up a new paper and quickly digesting its primary contributions. That is a valuable skill for the budding scientist.

Why Not Three More Bits?

The paper, One More Bit is Enough, got me thinking. If the use of a SINGLE bit in the IP header can prove so dramatically useful, then those IP header bits must be very precious, all of them.
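
For context on just how scarce those bits are: as I understand it, the bits VCP works with are the two-bit ECN field at the bottom of the old IPv4 TOS byte (RFC 3168), reused to carry a coarse load factor instead of a single congested/not-congested mark. Here is a small sketch of packing and unpacking that field; the LOW/MODERATE/OVERLOAD labels are my own shorthand, not the paper's exact codepoints.

    # The low two bits of the IPv4 TOS/DSCP byte form the ECN field (RFC 3168).
    def get_ecn(tos_byte):
        return tos_byte & 0b11

    def set_ecn(tos_byte, code):
        return (tos_byte & 0b11111100) | (code & 0b11)

    # VCP-style reuse (my labels): a coarse load level squeezed into two bits.
    LOW, MODERATE, OVERLOAD = 0b01, 0b10, 0b11

    tos = set_ecn(0x00, MODERATE)
    print(bin(tos), get_ecn(tos) == MODERATE)   # 0b10 True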

The volume of traffic on the internet is exploding. The amount of data we can store per dollar is exponentially increasing at a much faster rate than processing speed per dollar. The volume of data we store and transport just grows and grows. We seem to find ever more uses for data. Today, a gigabyte is no big deal. In 1988, I remember, there was still a huge 1 gig hard drive on the fourth floor of the Clyde building, about the size of two large refrigerators.

So today, when a gig is no big deal, why would it be too much to ask for three more bits, or 100 more? If we are effectively stuck with a fixed-size IP header, in spite of IPv6, it seems that this is a very big problem and potentially a fruitful area for future research.

Informed End to End

Our study of the transport layer leads me to firmly believe that the end-to-end principle works so much better when the ends are informed about what is going on in the middle.

It seems that we spent almost all of our time discussing TCP congestion. Congestion is something that happens in the middle. The response to congestion is for the ends to do something about it. Many, many approaches have been proposed, yet all follow a common pattern: figure out what is going on in the middle and change your sending rate accordingly.

The advent of XCP and VCP shows me that the most effective way to figure out what is going on in the middle is for the middle to give explicit feedback to the ends. If this is true, then this may have implications which extend beyond just the transport layer. Explicit feedback from the middle to the ends may have broad applicability.
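
A sketch of what explicit feedback buys the ends, loosely in the spirit of VCP (my simplification; the exact factors below are illustrative, not the paper's parameters): the load level reported by the bottleneck router tells the sender whether to ramp up aggressively, probe gently, or back off.

    def adjust_cwnd(cwnd, load_level):
        """Sender reaction to the load level echoed back from the bottleneck."""
        if load_level == "low":          # plenty of headroom: multiplicative increase
            return cwnd * 1.0625
        elif load_level == "moderate":   # near capacity: cautious additive increase
            return cwnd + 1
        else:                            # overload: multiplicative decrease
            return max(1, cwnd * 0.875)

    # Without the router's hint, the sender's only signal is loss, long after the fact.
    cwnd = 10.0
    for level in ["low", "low", "moderate", "overload", "moderate"]:
        cwnd = adjust_cwnd(cwnd, level)
        print(level, round(cwnd, 2))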

Network Simulations: Good Enough?

I have read a few papers recently which relied on network simulations to evaluate some new approach to solving a problem. At first I had a bit of a negative attitude about that practice, simply because a simulation isn't the real thing.

On second thought, I am starting to warm up to the idea. One nice thing about simulations is of course that they tend to be more practical. But I don't believe that is the only reason to use them. I believe that simulations can be used to test extreme conditions which might rarely occur naturally. Simulations can be used to test a much broader set of conditions than those which most commonly occur.

It seems to me that few people actually complain that some new protocol was evaluated using simulations. I'm used to people addressing what they perceive to be the weaknesses in some new proposal. But those weaknesses seem to be revealed through simulations just as well as by observing real traffic.

I don't remember ever reading that some proposal evaluated in simulation turned out not to hold in the wild. In fact, I am wondering now if the simulation environment isn't potentially a better place to validate a proposal.

Thursday, October 7, 2010

More on Distributed Coordination

I finally finished writing my summary of Distributed Coordination for the class wiki. In the process, I had a lot of fun reading 24 different papers. Of course, this showed me that there are about 20 additional papers that I would need to read to really understand the area. And I would definitely have to reread 5 or 10 papers more carefully. And by that point I would find another 20 essential papers to read. :) In the end, the more you learn about something, the more you realize is still left.

Wednesday, October 6, 2010

Ping Experiment on PlanetLab

Kevin and I recently completed our PlanetLab project. This was basically a "Hello World" sort of task: we set up a slice of 130 nodes and had each node ping all of the others. Some prior familiarity with pssh made it fairly easy to set up the experiment. We generated a script that sequentially pinged each of the other nodes in the slice with "ping -c 10 -i .5 hostname" and then piped this script to pssh with the options "pssh -o output -e error -t 0 -h nodes.txt -l byu_cs660_1 -P -v -I -p 100 -O StrictHostKeyChecking=no". That looks like a lot of options, but it's not so bad when you consider all of the information it needed (where to store output and error files, which nodes to connect to, which user name to use, etc.). Anyway, pssh conveniently gave us one output file per node, which made the results easy to parse.
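
For anyone curious, the parsing step amounted to pulling ping's summary lines out of each per-node output file. Something along these lines would do it; this is a simplified sketch, and the directory layout here is an assumption rather than exactly what we used.

    import glob
    import re

    # ping prints one summary line per target, e.g.:
    # "10 packets transmitted, 7 received, 30% packet loss, time 4507ms"
    SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")

    sent = received = 0
    for path in glob.glob("output/*"):          # one pssh output file per node
        with open(path) as f:
            for m in SUMMARY.finditer(f.read()):
                sent += int(m.group(1))
                received += int(m.group(2))

    if sent:
        print("overall packet loss: %.1f%%" % (100.0 * (sent - received) / sent))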

As usual, most of our time was spent analyzing and interpreting results. Availability on PlanetLab was surprisingly low. Five machines (4%) were completely down and never responded to even a single ping attempt. Nine additional machines (7%) responded to pings but never allowed us to log in. Even among the more cooperative 89% of nodes, packet loss was 38.3%. Additionally, about 5% of host pairs exhibited high RTT variance. Since ICMP traffic is lowest priority, I presume that UDP datagrams would have experienced less loss, but 38.3% is still significant.

I don't suspect that PlanetLab is particularly unreliable. Rather, any experiment on a large number of machines across a best-effort network is bound to run into problems. The takeaway message is that failures are inevitable, and systems should always be designed to tolerate such failures.

Tuesday, September 28, 2010

Slow Start

I've always wondered how a system with an exponential rate of packet transmission could be called "slow start". Then I saw Figure 3 ("Startup behavior of TCP without Slow-start") from Congestion Avoidance and Control. This congestion-control-free TCP sends as many packets as possible, resulting in huge amounts of packet loss. The graph shows a sequence of sharp saw teeth. In the trace, one sequence of packets was resent four times!

With TCP Tahoe, on the other hand, the trace showed a smooth increase and a steady rate of delivery with few retransmissions. Overall, the TCP with congestion control transmitted data at more than twice the rate because it didn't wastefully retransmit packets. Compared to a near-vertical line, an exponential curve really does look slow. I am now at peace with the name "slow start".
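
The name makes even more sense if you just print the two startup behaviors side by side. A toy sketch, ignoring losses and receive windows, with a made-up flight size for the uncontrolled sender:

    # Segments put on the wire per RTT during startup (toy numbers, no losses).
    flight_size = 100   # what a congestion-control-free TCP might dump immediately

    cwnd = 1
    for rtt in range(1, 8):
        print("RTT %d: slow start sends %3d, 'no control' sends %3d"
              % (rtt, cwnd, flight_size))
        cwnd *= 2   # exponential, yet gradual compared to sending everything at once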

Congestion Control in User Space

I ran into iTCP, which stands for "interactive TCP". The idea is that applications can interact with the TCP system. A better name might be "layer violating TCP", but I can't blame the authors for picking "iTCP" instead (it's much catchier). On the one hand, this model seems to add unnecessary complication and opportunities for abuse. But on the other hand, there is a perverse sensibility to this approach in the spirit of end-to-end architectures. For example, iTCP makes it easy for a video streaming application to recognize that packets are being dropped and to lower the video quality as a result.

Unfortunately, iTCP is inherently platform-dependent, and it doesn't seem likely that every OS would incorporate this feature. Rather than building this into the TCP implementation of an OS (which would probably slow down network processing for all applications and complicate the APIs), I think it would make more sense to achieve this functionality in userspace for those few applications that need it. A userspace "TCP" library could send and receive packets over UDP and provide all of the hooks and features in iTCP. This implementation would be platform-independent, and although it might be a bit slower than the in-kernel implementation, it would be much more flexible, and it wouldn't affect other applications.
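
To make the userspace idea concrete, here is the kind of interface I have in mind. Everything below is hypothetical: the class, its methods, and the callback are inventions for illustration, not iTCP's API or any existing library.

    import socket

    class UserspaceTransport:
        """Hypothetical reliable-transport-over-UDP with an application hook."""

        def __init__(self, on_congestion=None):
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.on_congestion = on_congestion   # application-supplied callback
            self.cwnd = 4

        def _handle_timeout(self):
            self.cwnd = max(1, self.cwnd // 2)
            if self.on_congestion:
                # The deliberate layer "violation": tell the application, so a
                # video streamer can drop to a lower bitrate instead of stalling.
                self.on_congestion(self.cwnd)

    # Usage sketch (encoder is hypothetical):
    # t = UserspaceTransport(on_congestion=lambda cwnd: encoder.set_quality("low"))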

Friday, September 24, 2010

In the Clouds: Networks and Society

We are studying computer networking. One thing that is interesting about it is the relationship between computer networking and society. I would say a computer network is highly constrained and simplified compared to a network of people. Yet the similarity may be strong enough that much can be learned. Particularly, we might be able to learn more about ourselves.

Since we are in the business of designing computers, we get insights into why we organize things certain ways. Since we are not in the business of designing people, we are not in the same position to get those types of insights. Following this argument, we may learn more about people through computer science, or we may learn more about society through the study of computer networking.

One of the things that inspired this thought was Andrew's description in class of roles and responsibilities in relation to coordination in distributed systems. He used terminology traditionally applied to society, which is now applied to computer science.

People who study society are typically not computer scientists. Can they learn enough about computer science to make connections to their discipline?

Within-Flow Measurement

The paper "TCP Revisited: A Fresh Look at TCP in the wild", describes an approach to internet measurement which does scale to very large numbers of flows. This was my complaint about the first measurement paper we studied together, that it doesn't scale because it requires end-to-end measurements. This paper recognizes the need to make within-flow measurements and describes new algorithms for doing so, using statistical techniques. A few end-to-end measurements were made to validate the new algorithms.

Some might argue that end-to-end measurements are needed in order to get accurate results. While this is probably true, the advantage may shrink once you look at the big picture.

For example, suppose I can get very accurate measurements end-to-end. Because the measurements are end-to-end, the number of measurements is necessarily limited, and if the number of measurements is limited, then we have less data with which to make inferences. It may be better to have a large number of less accurate measurements than a small number of highly accurate ones.
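As a toy example of what a within-flow measurement might look like, here is a sketch of passive RTT estimation from a single observation point. This is not the algorithm from the paper; the trace format is invented for illustration, and it ignores complications like retransmission ambiguity.

```python
def passive_rtt_samples(trace):
    """trace: list of (time, kind, value) tuples observed at one point, where kind is
    'data' (value = sequence number just past the segment) or 'ack' (cumulative ack).
    Returns one RTT estimate per data segment, in seconds."""
    pending = []   # (send_time, end_seq) of data segments not yet acknowledged
    samples = []
    for t, kind, value in trace:
        if kind == "data":
            pending.append((t, value))
        elif kind == "ack":
            still_waiting = []
            for send_time, end_seq in pending:
                if value >= end_seq:                 # first ACK covering the segment
                    samples.append(t - send_time)    # gives one RTT sample
                else:
                    still_waiting.append((send_time, end_seq))
            pending = still_waiting
    return samples

# Synthetic example: two segments and their ACKs seen at the measurement point.
trace = [
    (0.000, "data", 1461),
    (0.010, "data", 2921),
    (0.085, "ack", 1461),
    (0.095, "ack", 2921),
]
print(passive_rtt_samples(trace))   # approximately [0.085, 0.085]
```

The accuracy of any single sample is limited, but a vantage point inside the network can collect millions of them without coordinating endpoints.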

Thursday, September 23, 2010

Coordination in Distributed Systems

I have recently read several papers on services that provide coordination for distributed systems. This turns out to be a fascinating area, and as I read each paper, I found myself getting sucked into more. I group these papers into two different categories: one type involves services that provide mechanisms for distributed coordination (including ZooKeeper, Chubby, and Sinfonia); the other involves protocols that guarantee certain properties despite various types of failures (such as Paxos, PBFT, Aardvark, and Zab).

The high-level papers describe specific services for distributed coordination. A group of servers (five seems to be a popular number) communicate with each other and agree on state. Clients communicate with one or more of these servers to read or modify this global state. The main property is that if one or two of the five servers fail, the others can keep the service running, even if the "leader" fails. Various guarantees about consistency may be provided--not only is consistency difficult to achieve in the event of different types of failures, but being able to tolerate failures usually requires sacrificing performance. I was very impressed by the work that has been done in this area.

The low-level papers were also fun. Protocols are designed to withstand Byzantine faults, which seem to encompass just about anything that can go wrong in a distributed system, including crashed servers, lost or repeated messages, corrupted data, and inconsistencies. The Practical Byzantine Fault Tolerance (PBFT) algorithm, introduced in 1999, seems to have launched a whole range of fascinating research. It reminds me of security in that cynical thinking is critical.
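One detail that helped me keep the two categories straight is the replica math: crash-fault protocols in the Paxos/Zab family need a majority, or 2f+1 replicas to tolerate f failures, while PBFT-style Byzantine tolerance needs 3f+1. A couple of lines make the popular five-server ensemble less mysterious:

```python
def crash_replicas(f):
    return 2 * f + 1        # majority quorums: tolerate f crashed replicas

def byzantine_replicas(f):
    return 3 * f + 1        # PBFT-style: tolerate f arbitrarily faulty replicas

for f in (1, 2):
    print(f"f={f}: crash-fault needs {crash_replicas(f)}, Byzantine needs {byzantine_replicas(f)}")
# A 5-server ensemble survives 2 crashed servers (5 >= 2*2+1),
# but only 1 Byzantine server (5 >= 3*1+1, and 5 < 3*2+1).
```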

Friday, September 17, 2010

Lost in a Maze of DHTs

I enjoyed reading about Chord, and I'm impressed by the ideas behind Distributed Hash Tables. This area of research seems to be less than 10 years old, but as far as I can tell, there are dozens of different DHT designs and systems. In trying to make sense of all of this, I came across a recent 24-page survey that covers both design and applications, named appropriately Distributed Hash Tables: Design and Applications.

It seems like a great intro, but I feel like I'm missing something. As far as I can tell, CAN, Chord, Pastry, and Tapestry were all introduced at about the same time in 2001, and Kademlia came out a year later. I still haven't read enough to know whether one is much better than the others. If one had been introduced a year earlier than the others, would there still be as many designs, or would the others simply have built on the work of the predecessor?

Great, Except That it Doesn't Scale

The paper, "End-to-End Internet Packet Dynamics", seems to make a great contribution to the study of networks. It applies a new method for analysing packet dynamics, which proves very effective. The new method is to install a service on selected 'ends' of the network and then to pass TCP traffic between each pair of 'ends'. Obviously much can be discovered using this approach. So, is there an even better way to study packet dynamics? In particular, is there a method which scales better than this one, which is quadratic in the number of 'ends'? Because this measurement method scales poorly, it necessarily limits the amount of measuring which can be done. Is it possible to make similar measurements in such a way as to support large-scale, real-time measurements? For example, How effectively can a single router be used to deduce or infer the dynamics of packets passing through it?

Since I study Natural Language Processing (NLP), I have been looking for connections between NLP and the internet. The questions asked above suggest a possible connection. In NLP, we are typically trying to infer things by observing only the traffic that flows between two or more people. The traffic I refer to is language: speech or text, for example. We typically do not have direct access to the thought processes and intents of the people who send or receive the traffic. These people are like the 'ends' of the network. In NLP, rather than making end-to-end measurements, as done in this paper, we do our measuring from the middle.

Does it Scale?

The paper, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", clearly answers what is perhaps one of the most important questions in computer science, "Does it scale?" I like the whole idea of distributed scalable systems. The paper offered a strong theoretical foundation for the proposed method. The method scales very well and is robust to node failures. I have been trying to think of some constructive criticism, but have not been successful.

One question I have is: how does cloud computing work today? Is it based on something like Chord? What advantages could a more centralized hash table have over Chord, given that the centralized solution scales just as well?

Tuesday, September 14, 2010

Making Sense of Packet Dumps

I recently read a paper called End-to-End Internet Packet Dynamics by Vern Paxson. This paper tried to make sense of packet dumps from 20,000 TCP connections of 100 KB each. It's a little old (1999), but I think it does a great job. The task is extremely difficult because of the complexity of the system being measured. Any particular effect could be caused by the packet sniffer, the TCP implementation on the sender or the receiver, or any of the network links or routers in between. And as a passive observer, the analysis program can only guess at the internal state of each component of the system.

I was particularly interested by some of the unexpected effects described in the paper, such as non-FIFO queuing, non-independent loss events, and route fluttering. In the face of such idiosyncratic behavior, I wonder what other bizarre effects have continued unnoticed for years. Occasionally, I'm just amazed that such a complex system as the Internet works at all.

I'm impressed both with the author's insightful observations and with his acknowledgement that some of the conclusions might be wrong. Unfortunately, as he acknowledges, many of the measured quantities exhibit extremely high variance, and some of the observations only apply to particular links or operating systems. This analysis is difficult to perform, but it really needs to happen again and again as the Internet continues to evolve as a system.

Monday, September 13, 2010

Tussles on the Internet

David Clark et al. wrote a paper in 2002 called Tussle in Cyberspace: Defining Tomorrow's Internet. This paper, which is related to the idea of an invariant that I mentioned in an earlier post, has even more relevance today than when it was first published. Beginning with, "The Internet was created in simpler times", the paper reminds us that our networks will reflect the conflicts from our society. Competing interests will always result in some sort of conflict, and technology cannot dictate the end results. The paper recommends that architectures be designed with enough flexibility to avoid breaking under social tension.

For example, "conservative governments and corporations put their users behind firewalls, and the users route and tunnel around them. ISPs give their users a single IP address, and users attach a network of computers using address translation." The thought of firewalls and tunnels may make designers cringe with images of trenches in wartime, but the more they try to protect their protocols from being used for evil purposes, the more these designs will be defiled. As stated by the authors: "Do not design so as to dictate the outcome. Rigid designs will be broken."

The authors try (rather pitifully) to keep a neutral stance about the many ongoing struggles on the Internet. However, they eventually break out of character and devote a section to how to keep the Internet innovative and reliable in spite of these tussles. Their recommendation is to "bias the tussle" with open architectures, fault-tolerant designs, and encryption.

Although the outcome cannot be dictated by the designers, I agree that open designs may occasionally succeed at motivating reluctant parties to allow openness. One example that I thought of during my reading is open source software (particularly under the GPL), which is often more expensive to fork than to contribute to. This effect increases with the activity and usefulness of the project. Biasing outcomes is extremely difficult to pull off, but sometimes it works.

Evaluating Architecture Design with Invariants

I recently read Invariants: A New Design Methodology for Network Architectures. This paper defines an "invariant" as a property of a design that limits backwards compatibility, and it contrasts "explicit" invariants, which are designed intentionally, with "implicit" invariants, which are unintentional.

The idea of an implicit invariant is especially interesting. When a design fails to address a need from its users, they will use the system in ways that the designers did not intend. For example, port numbers were intended for the simple purpose of multiplexing connections; however, well-known port numbers are now built in to the logic of firewalls and routers.
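As a small illustration of how an implicit invariant hardens, here is a hypothetical firewall rule of the kind I have in mind. Nothing in TCP requires port 80 to mean "web traffic", but once enough deployed boxes encode that assumption, the convention becomes very hard to change.

```python
# Hypothetical middlebox logic: well-known ports treated as application semantics.
WELL_KNOWN = {22: "ssh", 53: "dns", 80: "http", 443: "https"}

def firewall_allows(dst_port: int) -> bool:
    # Allow only traffic whose destination port maps to a recognized service.
    return dst_port in WELL_KNOWN

print(firewall_allows(80), firewall_allows(8080))   # True False
```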

As the authors acknowledged, the approach is fairly early, but it definitely seems like an interesting way to think about architecture.

Friday, September 10, 2010

What is Allowed? What is the Internet?

Today, Travis asked an interesting question: should middleboxes be allowed? It interests me because it touches on the whole question of what the internet is. I thought the internet was a network of autonomous networks, not the network to rule all networks. My understanding is that middleboxes, generally, are tools employed in specific networks, not in the internet itself.

If there are middleboxes in the internet itself, then I suppose they provide some service that is necessary for the interconnection of all networks to function properly. At that level, the interconnection of all networks, the policy must be to place as few requirements as possible on connecting networks. Whether or not a network employs middleboxes obviously cannot be one of those requirements. The more requirements there are, the less possible it becomes to connect all networks.

I think the real question is: Do I want my network, or the networks I use, to have middleboxes?

Grasping

I am trying to grasp a new area of study, network architecture. I appreciate Dr. Zappala's approach to helping us do that. So, what have I actually learned? Also, what avenues do I think might be promising for future research? It is interesting to me that as a graduate student I am asked to begin to comprehend an entirely new area of computer science, and at the same time throw out some ideas for possibly contributing to the area. I like the challenge. Maybe that is why I am a graduate student.

I tend to like network architecture ideas which emphasize small interchangeable parts, rather than large-scale integrated solutions. In my mind, large integrated solutions belong to the networks which connect to the internet. The internet itself should be dedicated to providing basic communications between disparate networks.

It seems that many ideas we have studied are an attempt to solve problems of typical users. In my mind, the problems of typical users should be solved by specific networks which cater to typical users. The internet should be treated separately from the networks which connect to it. Otherwise, we treat the internet as a single inflexible behemoth.

Friday, September 3, 2010

Branding

The paper, A Data-Oriented (and Beyond) Network Architecture, proposes to replace well-known internet names, like www.google.com, with long names that are not human readable. As a newbie to this field, I appreciate the paper for the exposure it provides to related work. Routing by name and anycast seem to be important.

I'm sure there are a lot of things to learn from this work. However, my gut reaction is that the idea of getting rid of human readable names will never ever work. Those names are too valuable. An internet name is essentially a brand. Brands can have a lot of value. This issue of internet naming is a financial issue, as well as a networking issue. Naming is also an important human computer interaction issue.

To me, naming is what abstracts the data from the specific host from which it may be obtained. When I type in a name, I expect a service, and I don't care which specific host provides it. I do expect the name to be persistent. And it generally is.

The new approach to naming proposed in this paper is intended to improve persistence of names for data or services. It seems to me that the proposed changes to naming may actually make names less persistent than they are today. The new names are associated with public-private key pairs. This means that if the key changes, then the name is no longer valid.
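A few lines make the concern concrete. DONA names have the form P:L, where P is derived from a hash of the principal's public key and L is a label the principal chooses; the sketch below is my own simplification (fake keys, arbitrary truncation), not the paper's exact encoding.

```python
import hashlib

def dona_style_name(public_key: bytes, label: str) -> str:
    principal = hashlib.sha256(public_key).hexdigest()[:40]   # P: hash of the key
    return f"{principal}:{label}"                             # L: principal-chosen label

old_key = b"-----FAKE PUBLIC KEY v1-----"
new_key = b"-----FAKE PUBLIC KEY v2-----"

print(dona_style_name(old_key, "homepage"))
print(dona_style_name(new_key, "homepage"))   # rotating the key changes the entire name
```

With DNS, a site can rotate its keys behind a stable name like www.google.com; here the name itself moves.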

Wednesday, September 1, 2010

First Impressions on a Data-Oriented Network Architecture

The paper assigned for tomorrow is A Data-Oriented (and Beyond) Network Architecture. I've decided to share some of my first impressions, though my opinions are subject to change after our discussion in class tomorrow.

My first thought is that the paper doesn't seem to propose as much of a "clean-slate" as it promises in the introduction. Is this architecture really intended to completely replace DNS? It seems to say that I will need to have a bookmark to get to any site (p.3). How would I get to a search engine to find other sites? How would I get to a site that isn't indexed by search engines? How would I follow up on, say, a radio advertisement? Additionally, if the object names are based on a hash of the principal's public key, what happens when the key expires (typically keys are valid for about a year)? If users consider DONA names less usable than DNS names, then this would limit the usefulness of the new system.

On a lower level, the paper implies that HTTP would be rebuilt on top of DONA, but there aren't enough specifics for me to know what that would look like. At one point, it mentions that the URL would not be needed because the name is taken care of by DONA (p.5), but at another point it states that a principal may choose to name just her web site, or her web site and each page within it (p.3). It's hard to tell just how revolutionary the design intends to be. Would it work alongside DNS and HTTP, or would it really be a complete replacement?

My biggest concern is that the design is data-oriented instead of service-oriented. The paper was published in 2007, but even by then the "Web 2.0" phenomenon was in full swing. The feasibility section estimates the number of public web pages on the order of 10^10, but considering that personalization could multiply this by the number of users, I think the estimates in the feasibility analysis (p.9) are orders of magnitude too small. I could imagine a company like Google or Facebook easily producing 10^12 to 10^16 unique data objects per day. Since there isn't any discussion of cookies or AJAX or even latency, I just don't know what to think about feasibility. Delivering static content may indeed be better in the new architecture, but what if a video producer chooses to create a separate stream for each user for watermarking purposes? The design seems focused on completely static content.

Of course, if HTTP and DNS are intended to stay in their current forms, then many of my concerns may be irrelevant: I could see DONA being the basis of a great worldwide CDN. However, the promise of "a clean-slate redesign of Internet naming and name resolution" leaves me with a lot of big questions. Hopefully some of these will be answered tomorrow.

Worthwhile Objectives

The paper, 'The Design Philosophy of the DARPA Internet Protocols', by David Clark, is interesting to me in part because it wasn't written years earlier. DARPA began developing what we now recognize as the internet 15 years before this paper was written. The purpose of the paper is to explain the goals of DARPA's 'internet' research project, or rather the author's view of those goals.

How is it possible that the goals of a research project of such magnitude were not recorded earlier? The author doesn't address that question, and it is a bit hard for me to believe. Today, about 22 years later, we are starting a graduate course by studying this paper. That is evidence to me of the importance of clearly understanding goals, or objectives.

I suppose that to push research forward, we will need to identify worthwhile objectives. Which objectives are most worthwhile? Thinking about the internet, the top level objective was to interconnect existing networks. Such an objective certainly was worthwhile, or at least has had great impact on the world. As part of this class I would like to understand which objectives are being pursued and which might prove most worthwhile.

Net Neutrality

Net Neutrality seems to be one of the biggest current issues in Internet architecture. Telecommunication companies seem to think of themselves as providing a unique service, but I think of them as a simple utility.

A recent article on the topic reports that AT&T believes that net neutrality rules should allow "paid prioritization". I'm fine with a corporate network using QoS to shape its internal traffic, but I'm uncomfortable with ISPs doing this with other people's traffic. I have had too many experiences where "cleverness" with traffic shaping caused inexplicable problems. For example, on BYU's network I have seen SSH connections terminated or stalled for no obvious reason, while other types of traffic worked fine. Even the network engineers can't necessarily figure out what is going on.

Rather than ISPs implementing complicated traffic shaping, why not keep the network simple? If a customer is generating enough traffic to cause congestion, the ISP should throttle that customer's traffic irrespective of which protocol is being used. It should charge the customer for the bandwidth being used without trying to extort extra fees.
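Here is the kind of simple, protocol-blind throttling I have in mind, sketched as a per-customer token bucket. The numbers and the class are made up for illustration, not any ISP's actual implementation.

```python
import time

class TokenBucket:
    """Per-customer rate limiter applied to total bytes, regardless of protocol."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0      # refill rate in bytes per second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False                    # drop or queue; never inspect the protocol

# A customer paying for 15 Mbps gets 15 Mbps of whatever they like.
customer = TokenBucket(rate_bps=15_000_000, burst_bytes=1_500_000)
print(customer.allow(1500))   # True until the customer exceeds their paid-for rate
```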

Above all, ISPs should fairly advertise what service is being provided: if the network is congested enough that there is a consistent need for paid prioritization, then the ISP is not providing the advertised bandwidth to its customers. Instead of "Download speeds up to 15 Mbps with PowerBoost", the fair advertisement might read "Peak bandwidth 15 Mbps, minimum guaranteed bandwidth 20 kbps". Another ISP might advertise "Peak bandwidth 15 Mbps, minimum guaranteed bandwidth 500 kbps" by better throttling users that are "hogging" the network. I just don't see how extorting content providers does anything to solve the problem. How could you fairly advertise bandwidth to reflect complicated traffic shaping methods? "Bandwidth of 1 Mbps for google.com; 800 kbps for yahoo.com; 100 kbps for Skype; SSH and Bittorrent traffic have lowest priority; forged reset packets may be whimsically sent..."

NAT already makes it hard enough to try interesting new protocols on the Internet. Do the ISPs really need to make things even worse?