Live-blogging HotNets 2012, Day Two

This is Day Two. Day One is here.

Mobile and Wireless

Calum Harrison presented work on making rateless codes more power-efficient. Although rateless codes do a great job of approaching the Shannon capacity of the wireless channel, they're computationally expensive, and this can be a problem on mobile devices. This paper also optimizes for the cost of decoding, measured in CPU operations, and gets 10-70% fewer operations at a competitive rate. [Calum Harrison, Kyle Jamieson: Power-Aware Rateless Codes in Mobile Wireless Communication]

Shailendra Singh showed that there isn't one single wireless transmission strategy that is always best. For each of DAS, Reuse, and Net-MIMO, there exists a user profile (are they moving, how much interference is there, etc.) for which that scheme is better than the others, which this paper experimentally verified. TRINITY is a system they're building to automatically get the best of each scheme in a heterogeneous world. [Shailendra Singh, Karthikeyan Sundaresan, Amir Khojastepour, Sampath Rangarajan, Srikanth Krishnamurthy: One Strategy Does Not Serve All: Tailoring Wireless Transmission Strategies to User Profiles]

Narseo Vallina-Rodriguez argued for something that may be slightly radical: "onloading" traffic from a wired DSL network onto wireless networks. We sometimes think of wireless bandwidth as a scarce resource, but actually your wireless throughput could easily be twice your DSL throughput in some situations. If there is spare wireless capacity, why not use it? 40% of users use less than 10% of their allocated wireless data volume. They tested this idea in a variety of locations at different times and can get order-of-magnitude improvements in video streaming buffering. Apparently the reviewers noted that wireless providers wouldn't be big fans of this, but Narseo noted that his coauthors are all from Telefonica. Interesting question from Brad Karp: How did we get here? Telefonica owns the DSL and wireless; if you need additional capacity, is it cheaper to build out wireless capacity or wired? The answer seems to be that wired is way cheaper, but we need to have wireless anyway. Another commenter: this is promising because measurements show congestion on wireless and DSL peaks at different times. Open question: Is this benefit going to be true long term? [Narseo Vallina-Rodriguez, Vijay Erramilli, Yan Grunenberger, Laszlo Gyarmati, Nikolaos Laoutaris, Rade Stanojevic, Konstantina Papagiannaki: When David helps Goliath: The Case for 3G OnLoading.]

Data Center Networks

Mosharaf Chowdhury's work dealt with the fact that the multiple recent projects improving data center flow scheduling deal with just that: flows, each considered in isolation. Applications, on the other hand, create dependencies among flows: for example, a partition-aggregate workload may need all of its flows to finish, so one flow finishing early is useless on its own. The goal of Coflow is to expose that information to the network to improve scheduling. One question that was asked: what is the tradeoff with the complexity of the API? [Mosharaf Chowdhury, Ion Stoica: Coflow: An Application Layer Abstraction for Cluster Networking]
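To make the dependency concrete, here is a minimal Python sketch. It is not the Coflow API from the paper; the `Coflow` class and the finish times are made up for illustration. It only shows why a per-flow metric can improve while the application sees no benefit: the group is done only when its slowest flow is done.

```python
from dataclasses import dataclass


@dataclass
class Coflow:
    """Hypothetical container: a group of flows that only matters as a whole."""
    name: str
    flow_finish_times: list  # finish time of each flow, in seconds (made-up values)

    def completion_time(self):
        # The application can proceed only when the last flow has finished.
        return max(self.flow_finish_times)

    def mean_flow_completion_time(self):
        # The per-flow metric that flow-level schedulers typically optimize.
        return sum(self.flow_finish_times) / len(self.flow_finish_times)


# Same aggregation step under two schedules (illustrative numbers only).
fair_share = Coflow("aggregate", [4.0, 4.0, 4.0])
two_flows_sped_up = Coflow("aggregate", [1.0, 1.0, 4.0])

print(fair_share.mean_flow_completion_time(), fair_share.completion_time())
print(two_flows_sped_up.mean_flow_completion_time(), two_flows_sped_up.completion_time())
```

In the second schedule the average flow finishes twice as fast, but the aggregation step still waits just as long.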

Nathan Farrington presented a new approach to building hybrid data center networks, with both a traditional packet-switched network and a circuit-switched (e.g., optical) network. An optical switch provides much higher point-to-point bandwidth, but switching is slow, far too slow for packet-level switching. Prior work used hotspot scheduling, where the circuit switch is configured to help the elephant flows. But the performance of hotspot scheduling depends on the traffic matrix. Here, Nathan introduced Traffic Matrix Scheduling: the idea is to cycle repeatedly through a series of switch configurations (input-output assignments), such that the collection of all assignments fulfills the entire traffic matrix. Q: Once you reach 100% traffic over optical, is there anything stopping you from eliminating the packet-switched network entirely? A: There is still latency on the order of 1 ms to complete one round of assignments, and 1 ms is much higher than electrical DC network RTTs. Q: Where does the traffic matrix come from? Do you have to predict, or wait until you've buffered some traffic? Either way, there's a tradeoff. [Nathan Farrington, George Porter, Yeshaiahu Fainman, George Papen, Amin Vahdat: Hunting Mice with Microsecond Circuit Switches]
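One way to picture "a series of input-output assignments that together fulfill the traffic matrix" is a Birkhoff-von Neumann style decomposition. The Python sketch below is an illustration under stated assumptions, not necessarily the algorithm from the talk: it assumes an integer traffic matrix already scaled so every row and column has the same sum, and the function names and time units are hypothetical.

```python
def perfect_matching(matrix):
    """Find an input->output assignment using only positive entries.

    Uses simple augmenting paths; returns a list where assignment[row] = col,
    or None if no perfect matching exists.
    """
    n = len(matrix)
    col_owner = [-1] * n  # col_owner[col] = row currently assigned to that column

    def try_assign(row, seen):
        for col in range(n):
            if matrix[row][col] > 0 and not seen[col]:
                seen[col] = True
                if col_owner[col] == -1 or try_assign(col_owner[col], seen):
                    col_owner[col] = row
                    return True
        return False

    for row in range(n):
        if not try_assign(row, [False] * n):
            return None
    assignment = [0] * n
    for col, row in enumerate(col_owner):
        assignment[row] = col
    return assignment


def schedule(traffic):
    """Split a traffic matrix into (duration, assignment) switch configurations."""
    remaining = [row[:] for row in traffic]
    plan = []
    while any(value > 0 for row in remaining for value in row):
        assignment = perfect_matching(remaining)
        if assignment is None:
            raise ValueError("all row and column sums must be equal")
        # Hold this configuration until one of its circuits has no demand left.
        duration = min(remaining[r][c] for r, c in enumerate(assignment))
        for r, c in enumerate(assignment):
            remaining[r][c] -= duration
        plan.append((duration, assignment))
    return plan


# Made-up demand between 3 inputs and 3 outputs (every row and column sums to 3).
demand = [[2, 1, 0],
          [0, 2, 1],
          [1, 0, 2]]
for duration, assignment in schedule(demand):
    print(duration, assignment)
```

Each step drives at least one matrix entry to zero, so the loop terminates, and the durations add up to the common row sum, which is the length of one full round.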

Mohammad Alizadeh took another look at finishing flows quickly in data centers. A number of recent protocols for this are relatively complex. The design presented here is beautifully simple: each packet has a priority, and routers simply forward high-priority packets first. Routers can have extremely small queues, since the dropped packets are likely low priority anyway. End-hosts can set each packet's priority based on flow size, and perform very simple congestion collapse avoidance. Performance is very good, though with some more work to do for elephant flows in high-utilization regimes. [Mohammad Alizadeh, Shuang Yang, Sachin Katti, Nick McKeown, Balaji Prabhakar, Scott Shenker: Deconstructing Datacenter Packet Transport]
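The router behavior described above fits in a few lines. This Python sketch is a toy model, not the paper's implementation: the `PriorityPort` class is hypothetical, and it assumes a lower priority number means a more urgent packet (for example, the flow's size in bytes).

```python
class PriorityPort:
    """Toy output port with a tiny buffer (hypothetical, for illustration).

    Lower priority number = more urgent. When the buffer overflows, the least
    urgent packet is dropped; on transmit, the most urgent packet goes first.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []   # entries are (priority, arrival_seq, packet)
        self.arrivals = 0  # tie-breaker so packets themselves are never compared

    def enqueue(self, priority, packet):
        """Add a packet; return the packet that was dropped, or None."""
        self.buffer.append((priority, self.arrivals, packet))
        self.arrivals += 1
        if len(self.buffer) <= self.capacity:
            return None
        victim = max(self.buffer)  # least urgent; latest arrival on ties
        self.buffer.remove(victim)
        return victim[2]

    def dequeue(self):
        """Transmit the most urgent buffered packet, or None if empty."""
        if not self.buffer:
            return None
        winner = min(self.buffer)
        self.buffer.remove(winner)
        return winner[2]


port = PriorityPort(capacity=2)
port.enqueue(1000, "elephant-1")
port.enqueue(10, "mouse-1")
dropped = port.enqueue(500, "medium-1")  # buffer is full, so something must go
print(dropped, port.dequeue(), port.dequeue())
```

A linear scan over the buffer is deliberate here: with queues this small, there is little to gain from a heap.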

Lunch!

Routing and Forwarding

Gábor Rétvári tackled a compelling question: How much information is actually contained in a forwarding table? Can we compress the FIB down to a smaller size, making router hardware simpler and longer-lasting? Turns out, there's not so much information in a FIB: with some new techniques, a realistic DFZ FIB compresses down to 60-400 Kbytes, or 2-6 bits per prefix! A 4 million prefix FIB can fit in just 2.1 Mbyte of memory. Now the interesting thing is that this compression can support reasonably fast lookup directly on the compressed FIB, at least asymptotically speaking, based on an interesting new line of theory research on string self-indexing. One problem: They really need more realistic FIBs. The problem is that widely-available looking glass servers obscure the next-hops, which affect compression. "We are inclined to commit crimes to get your FIBs." Before they turn to a life of crime, why not send them FIBs? They have a demo! Question for the future: Can we use compressed forwarding tables at line speed? [Gábor Rétvári, Zoltán Csernátony, Attila Körösi, János Tapolcai, András Császár, Gábor Enyedi, Gergely Pongrácz: Compressing IP Forwarding Tables for Fun and Profit]

Nikola Gvozdiev wins the award for best visualizations with some nice animation of update propagation among iBGP routers. Their work is developing the algorithms and systems necessary to propagate state changes in iBGP, without causing any transient black holes or forwarding loops. [Nikola Gvozdiev, Brad Karp, Mark Handley: LOUP: Who's Afraid of the Big Bad Loop?]

Vasileios Kotronis's work takes SDN-based routing a step further: Don't just centralize within a domain, outsource your routing control to a contractor! One cool thing here, besides reduced management costs, is that you can go beyond what an individual domain can otherwise do — for example, the contractor has interdomain visibility and can perform cross-domain optimization, debug policy conflicts, etc. [Vasileios Kotronis, Bernhard Ager, Xenofontas Dimitropoulos: Outsourcing The Routing Control Logic: Better Internet Routing Based on SDN Principles]

User Behavior and Experience

Rade Stanojevic presented results from a large data set of mobile service plans (roughly a billion each of calls, SMS/MMS messages, and data sessions). The question: Are economic models of how users select bandwidth and service plans realistic? What choices do real people make? In fact, only 20% of customers choose the optimal tariff. The mean overpayment is 37%, and the median is 26%. Another interesting result: use of service peaks immediately after purchase, and then decays steadily over at least a month, even with unlimited service (so it's not just because people are conservative as they near their service limits). Several questions: Do these results really demonstrate irrationality? Users may buy more service than they need, so they don't need to worry about (and pay) comparatively pricey overage fees. Comment from an audience member: One has to imagine the marketing department of Telefonica has that exact same CDF of "irrationality" as their metric of success. [Rade Stanojevic, Vijay Erramilli, Konstantina Papagiannaki: Understanding Rationality: Cognitive Bias in Network Services]
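For a sense of what "optimal tariff" and "overpayment" could mean, here is a small Python sketch. Everything in it is an assumption for illustration: the two tariffs are invented, and overpayment is taken to be the extra cost relative to the cheapest plan for the same usage, which may not match the paper's exact definition.

```python
def monthly_cost(tariff, usage_mb):
    """Cost in cents of one month on a tariff of (fee, included MB, overage per MB)."""
    fee_cents, included_mb, overage_cents_per_mb = tariff
    extra_mb = max(0, usage_mb - included_mb)
    return fee_cents + extra_mb * overage_cents_per_mb


def overpayment(tariffs, chosen, usage_mb):
    """Fraction paid above the cheapest available tariff for this usage."""
    paid = monthly_cost(tariffs[chosen], usage_mb)
    best = min(monthly_cost(t, usage_mb) for t in tariffs.values())
    return (paid - best) / best


# Invented tariffs: (monthly fee in cents, included MB, overage in cents per MB).
tariffs = {
    "small": (1000, 100, 10),
    "large": (2500, 1000, 5),
}

# A user who picked the large plan but only used 150 MB this month.
print(overpayment(tariffs, "large", 150))
print(overpayment(tariffs, "small", 150))
```

The same user in a 500 MB month would have been better off on the large plan, which is one version of the audience's point that buying headroom is not obviously irrational.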

Athula Balachandran presented a study working towards a quantitative metric to score user experience of video delivery (in particular, how long users end up watching the video). The problem here is that predicting user experience based on quantitative observables is hard: it's a complex function of initial startup delay, how often the player buffers, buffering time, bit rate, the type of video, and more. The paper analyzes how well user experience can be predicted using several techniques, based on data from Conviva. [Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica, Hui Zhang: A Quest for an Internet Video Quality-of-Experience Metric]

Vijay Erramilli presented a measurement study of how web sites act on information that they know about you. In particular, do sites use price discrimination based on information they collect about your browsing behavior? Starting with clean machines and having them visit sites based on certain high- or low-value browsing profiles, they could subsequently measure how a set of search engines and shopping sites present results and prices to those different user profiles. They uncovered evidence of differences in search results, and some price differences on aggregators such as a mean 15% difference in hotel prices on Cheaptickets. Interestingly, there were also significant price differences based on the client's physical location. Q from Saikat Guha: How can you differentiate the vendor's intentional discrimination from unintentional? For example, in ad listings, having browsed a certain site can cause a Rolex ad to display, which bumps off an ad for a lower priced product. [Jakub Mikians, László Gyarmati, Vijay Erramilli, Nikolaos Laoutaris: Detecting Price and Search Discrimination in the Internet]

That's it! See you all next year...

Live-blogging HotNets 2012

Note: This blogging might be rather bursty. If you want something more deterministic, here's the HotNets program.

This is Day One. Day Two is here.

Session 1: Architecture and Future Directions

Teemu Koponen spoke about how combining the ideas of edge-core separation (from MPLS), separating control logic from the data plane (from SDN), and general-purpose computation on packets (from software routers) can lead to a more evolvable software defined Internet architecture. [Barath Raghavan, Teemu Koponen, Ali Ghodsi, Martin Casado, Sylvia Ratnasamy, Scott Shenker: Software-Defined Internet Architecture]

Sandeep Gupta discussed rather scary hardware trends, including increasing error rates in memory, and how this may affect networks (potentially increasing loss rates). [Bin Liu, Hsunwei Hsiung, Da Cheng, Ramesh Govindan, Sandeep Gupta: Towards Systematic Roadmaps for Networked Systems]

Raymond Cheng talked about how upcoming capabilities which will be widely deployed in web browsers will enable P2P applications among browsers, so free services can really be free. Imagine databases in browsers, or every browser acting as an onion router. [Raymond Cheng, Will Scott, Arvind Krishnamurthy, Tom Anderson: FreeDOM: a new Baseline for the Web]

Session 2: Security and Privacy

Scott Shenker examined how to build inter-domain routing with secure multi-party computation (SMPC), to preserve privacy of policies. The idea is that interdomain routing really is a multi-party computation of global routes, and participants want it to be secure. The benefits of using SMPC: autonomy, privacy, simple convergence behavior, and a policy model not tied to the computational model. The last item should be emphasized: there's a lot more potential policy flexibility here with a much easier deployment story, just changing software at the set of servers running the computation. For example, do other classes of policies have different or better oscillation properties? Part of this (convergence) seems to connect with Consensus Routing. Jeff Mogul mentioned an interesting point: By adding the layer of privacy it may be very hard to figure out what's going on inside the algorithm and debug why it arrived at a particular result. [Debayan Gupta, Aaron Segal, Gil Segev, Aurojit Panda, Michael Schapira, Joan Feigenbaum, Jennifer Rexford, Scott Shenker: A New Approach to Interdomain Routing Based on Secure Multi-Party Computation]

Katerina Argyraki spoke about how we can change the basic assumption of secure communication: creating a shared secret not based on computational difficulty, but on physical location. The idea is to use different wireless interference across locations. Security is more robust than you might think, in that you just need a lower bound on how much information Eve misses, rather than which pieces of the message Eve missed. An implementation generated 38 secret Kbps between 8 nodes. However, in a few corner cases Eve learned a substantial amount about the secret. There is some hope to improve this. [Iris Safaka, Christina Fragouli, Katerina Argyraki, Suhas Diggavi: Creating Shared Secrets out of Thin Air]

Saikat Guha linked the problem of data breaches to money and proposed data breach insurance ("Obamacare for data"). In a survey, 77% of users said they would pay, with a median of $20. (Saikat thought this may be optimistic.) They're working to develop a browser-based app to monitor user behavior, offer individuals incentives to change to more secure behavior, and see if people actually change. [Saikat Guha, Srikanth Kandula: Act for Affordable Data Care.]

Lunch!

Session 3: Software-Defined Networking

Aaron Gember spoke about designing an architecture for software defined middleboxes, taking the idea of SDN to more complex processing. Distributed state management is one challenge. [Aaron Gember, Prathmesh Prabhu, Zainab Ghadiyali, Aditya Akella: Toward Software-Defined Middlebox Networking]

Monia Ghobadi has rethought end-to-end congestion control in software-defined networks. The work observes that TCP has numerous parameters that operators might want to tune — initial congestion window size, TCP variant, even AIMD parameters, and more — that can have a dramatic effect on performance. But the effects they have depend on current network conditions. The idea of the system they're building, OpenTCP, is to provide an automatic and dynamic network-wide tuning of these parameters to achieve performance goals of the network. This is done in an SDN framework with a central controller that gathers information about the network and makes an educated decision about how end-hosts should react. Experiments show some very nice improvements in flow completion time. Questions: Did you see cases when switching dynamically offered an improvement? And in general, how often do you need to switch to get near the best performance? Some of that remains to be characterized in experiments. [Monia Ghobadi, Soheil Hassas Yeganeh, Yashar Ganjali: Rethinking End-to-End Congestion Control in Software-Defined Networks]

Eric Keller, now at the University of Colorado, spoke about network migration: Moving your virtual enterprise network between cloud providers, or moving within a provider to be able to save power on underutilized servers, for example. Now, doing this while keeping the live network running reliably is not trivial. The solution here involves cloning the network and using tunnels from old to new, and then migrating VMs. But then, you need to update switch state in a consistent way to ensure reliable packet delivery. Some questions: How do you deal with SLAs, how do you deal with networks that span multiple controllers? [Eric Keller, Soudeh Ghorbani, Matthew Caesar, Jennifer Rexford: Live Migration of an Entire Network (and its Hosts)]

Session 4: Performance

Ashish Vulimiri presented our paper on making the Internet faster. The problem: Getting consistent low latency is extremely hard, because it requires eliminating all exceptional conditions. On the other hand, we know how to scale up throughput capacity. We can convert some extra capacity into a way to achieve consistent low latency: execute latency-sensitive operations twice, and use the first answer that finishes. The argument, through a cost-benefit analysis and several experiments, is that this redundancy technique should be used much more pervasively than it is today. For example, speeding up DNS queries by more than 2x is easy. [Ashish Vulimiri, Oliver Michel, P. Brighten Godfrey, Scott Shenker: More is Less: Reducing Latency via Redundancy]
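The core trick is easy to sketch. The Python below is a minimal illustration, not the code from the paper: `first_answer` and the two stand-in servers are hypothetical, and a real deployment would send the query to actual replicas (for example, several DNS resolvers) rather than to local functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_answer(servers, query):
    """Send `query` to every server at once and return whichever answer arrives first."""
    pool = ThreadPoolExecutor(max_workers=len(servers))
    try:
        futures = [pool.submit(server, query) for server in servers]
        for future in as_completed(futures):
            # The first copy to finish wins; if it failed, its exception is re-raised.
            return future.result()
    finally:
        pool.shutdown(wait=False)  # do not block on the slower copies


# Hypothetical stand-ins for two replicas with different response times.
def slow_server(query):
    time.sleep(0.3)
    return ("slow", query)


def fast_server(query):
    time.sleep(0.01)
    return ("fast", query)


print(first_answer([slow_server, fast_server], "example.com"))
```

The caller pays for two requests but waits only for the quicker one, which is the capacity-for-latency trade described above. The slower copy still runs to completion in the background, so the extra load is real.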

The questions are getting interesting. Where is Martha Raddatz?

Udi Weinsberg went in the other direction: redundancy elimination. This is an interesting scenario where a kind of content-centric networking may be a big help: in a disaster which cuts off high-throughput communication, a DTN can provide a way for emergency response personnel to learn what response is most effective, through delivery of photos taken by people in the disaster area. But in this scenario, as they have verified using real-world data sets, people tend to take many redundant photos. Since the throughput of the network is limited, smart content-aware redundancy elimination can more quickly get the most informative photos into the hands of emergency personnel. [Udi Weinsberg, Qingxi Li, Nina Taft, Athula Balachandran, Gianluca Iannaccone, Vyas Sekar, Srinivasan Seshan: CARE: Content Aware Redundancy Elimination for Disaster Communications on Damaged Networks]

Onward to Day Two...

Notes on ACM, Open Access, and Copyright

My last post listed the comments on open access and copyright of the candidates in the 2012 ACM Council Election. Since I first posted, several more responses came in, so you might be interested to check it out. Vicki Hanson's note, in particular, provided a concise summary of the rationale for ACM's current policies.

So what did the candidates think? There are at least two important issues:

  1. Not preventing access to papers: This is a question of the copyright or licensing policy. Does it inhibit researchers from distributing their own work?
  2. Actively facilitating greater access to papers: This implies that ACM itself would somehow openly distribute papers.

Not preventing access to papers

The candidates' statements differed fairly significantly on this point — so you have a meaningful choice in your vote!

Many candidates noted that ACM already allows authors many rights. However, it still prevents uses such as posting on arXiv and commercial distribution.

The co-chairs of the ACM Publications Board explained ACM's copyright policy in the October 2011 CACM. Regarding copyright transfer, they write:

One might wonder, given the generous rights retained by authors, why ACM requires authors to transfer copyright to ACM at all. In fact, the transfer of copyright to ACM provides substantial benefit to the computing research community and to authors. By owning exclusive publication rights to articles, ACM is able to develop salable publication products that sustain its top-quality publishing programs and services; ensure access to organized collections by current and future generations of readers; and invest continuously in new titles and in services like referrer-linking, profiling, and metrics, which serve the community. Furthermore, it allows ACM to efficiently clear rights for the creation, dissemination, and translation of collections of articles that benefit the computing community that would be impossible if individual authors or their heirs had to be contacted for permission. Ownership of copyright allows ACM to pursue cases of plagiarism. The number of these handled has been steadily growing; some 20 cases were handled by ACM in the last year. Having ACM investigate and take action removes this burden from our authors, and ACM is more likely to obtain a satisfactory outcome (for example, having the offending material removed from a repository) than an individual.

My summary of this is that ACM gets the following from holding the copyright:

  • More revenue. Question: how much more?
  • Easier dissemination without contacting individuals. Question: wouldn't this be fixed with a non-exclusive perpetual license to distribute the work?
  • Ability to pursue plagiarism. Point of comparison: 20 cases represent a fraction of 0.000065 of the 307,000 articles in the Digital Library, i.e., one in every 15,350.

Actively facilitating greater access to papers

Exactly zero of the candidates fully endorsed open access in the sense of ACM providing all publications freely online, though Radia Perlman came closest.

Open access does not necessarily mean that all the Digital Library's services would be free, only that papers would be distributed freely somehow (for example, many ACM conferences already distribute their proceedings freely online). Still, full open access certainly could impact revenue, perhaps significantly. Here are some interesting numbers. In 2011, the ACM DL grew by over 31,000 full-text articles, or 11%, to a total of 307,000 (up from 21,000 new articles in 2010). In 2011, from publications, ACM earned $18,275,000 in revenue (28% of its total) and incurred $11,750,000 in expenses. Thus, for each new publication last year, ACM took in $590 and spent $379, leaving about $211 to support numerous other activities beneficial to the community.

I assume those numbers include not only digital but also print distribution of some papers and articles. It would be interesting to have ACM's digital-only costs as a comparison to the arXiv. In 2010 arXiv wrote,

The annual budget for arXiv is $400,000. With over 60,000 new submissions per year one may think of this as an effective cost of <$7 per submission. Alternatively, with over 30,000,000 full-text downloads per year this is an effective cost of <1.4 cents per download.
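For anyone who wants to check the arithmetic, the per-article figures in this post can be reproduced from the quoted totals. This Python snippet only redoes the divisions; the variable names are mine, and it treats "over 31,000" new articles as exactly 31,000.

```python
# ACM publications, 2011 (totals quoted in this post).
acm_revenue = 18_275_000
acm_expenses = 11_750_000
acm_new_articles = 31_000  # "over 31,000", so per-article values are slight overestimates

acm_revenue_per_article = acm_revenue / acm_new_articles
acm_expense_per_article = acm_expenses / acm_new_articles
acm_surplus_per_article = acm_revenue_per_article - acm_expense_per_article

# arXiv, 2010 (totals from the quote above).
arxiv_budget = 400_000
arxiv_cost_per_submission = arxiv_budget / 60_000
arxiv_cents_per_download = 100 * arxiv_budget / 30_000_000

# Plagiarism cases relative to the size of the Digital Library.
articles_per_plagiarism_case = 307_000 / 20

print(f"ACM revenue per new article:  ${acm_revenue_per_article:.2f}")
print(f"ACM expense per new article:  ${acm_expense_per_article:.2f}")
print(f"ACM surplus per new article:  ${acm_surplus_per_article:.2f}")
print(f"arXiv cost per submission:    ${arxiv_cost_per_submission:.2f}")
print(f"arXiv cost per download:      {arxiv_cents_per_download:.2f} cents")
print(f"Articles per plagiarism case: {articles_per_plagiarism_case:.0f}")
```

The ACM and arXiv figures are not directly comparable, since the ACM totals cover more than archiving and distribution.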

The one-time cost of $7 per submission is as much as three orders of magnitude lower than some other estimates of the cost of providing open access per paper. In 2009 Michel Beaudouin-Lafon wrote in CACM:

But how much are authors ready to pay to publish an article? A few hundred dollars? The most prominent Open Access publisher, the Public Library of Science (PLOS), is a nonprofit organization that has received several million dollars in donations. Yet it charges between $1,350 and $2,900 per paper, depending on the journal. In fact, many in the profession estimate that to be sustainable, the author-pay model will need to charge up to $5,000–$8,000 per publication.

Some of these numbers might include additional services such as editing, but the arXiv numbers and similar numbers from JMLR imply that the cost of archiving and distribution is far lower than the thousand-dollar estimates. Indeed, PLOS ONE publisher Peter Binfield left to found PeerJ, which will apparently charge authors a $99 lifetime membership fee to publish open access papers starting fall 2012.

Remember the ACM election runs just a few more days, till May 22.

Statements of ACM candidates on open access and copyright policy

In the May issue of CACM, Moshe Vardi argues that the interests of authors and commercial publishers have irreconcilably diverged. But "in the case of publishing by a professional association, such as ACM, the authors, as ACM members, are essentially also the publishers", so when choosing a publishing model, "the decision is up to us: ACM members, authors, publishers."

Good point! With the 2012 ACM Council Election happening now through May 22, what are the candidates' positions on progressive copyright policies? I asked the candidates the following:

Do you have a position on the appropriate copyright policy for ACM's publications? Specifically, should the copyright on published research papers be assigned to ACM, or should the authors retain the copyright with ACM holding a non-exclusive license to distribute the work, similar to USENIX's policy? What is your position on moving ACM's publications to an open-access model?

Here are the candidates and their responses, filled in as they come in.

Update May 14 2012: Additional notes and thoughts over here.

President

Barbara G. Ryder, Virginia Tech: "The ACM Digital Library (DL) has been designed and constructed by ACM, led by the vision of computing researchers in the SIGs. It now has become THE repository to go to for computing publications, having listings for many more than only ACM publications. This effort was undertaken for and supported by the computing research community; more recently, ACM has enhanced the DL with author metrics, additional search capabilities, the Authorizer tool, etc -- all in support of the research community. So the ACM DL is an important resource for computing. But ACM is a membership organization, not a for-profit company which can choose to invest in services for the community, funded by other revenue streams. At this time, the ACM DL generates a significant income stream for ACM and its SIGs, which, in part, supports further DL development as well as other activities. Any discussion of Open Access publication and ACM has to consider the financial consequences of the choices to be made. It is not just a philosophical discussion. ps These comments have already been posted on the Web, after answering similar questions from Matt Welsh: link [Updated: Regarding question about copyright policy:] please look at [ACM copyright policy] ... This allows non-commercial personal use by an author of her/his paper after the copyright has been signed away to ACM [and] the right to post a unique link using the Author-Izer ACM Linking Service on either the author’s homepage or Institutional Repository (wherever the author’s bibliography is maintained) which enables free access from that location to the definitive version of the work permanently maintained in the ACM Digital Library."

Vinton G. Cerf, Google: "I much prefer a kind of creative commons method or licensing method that leaves the authors with copyright and ACM with sufficient privilege to carry out its work."

Vice President

Mathai Joseph, Tata Consultancy Services (excerpt of longer response): "... I am quite happy with the ACM copyright policy because it represents a sensible balance between the rights of the author and the rights of a publisher who has invested time and effort in making the publication available to the community. ACM is competing with commercial publishers with far more restrictive policies and has to protect its rights in a fairly predatory market. ... Thank you for raising this important question. Some time back I talked to people in ACM HQ about it, thought of alternatives and then decided that the ACM policy is actually fairly sane."

Mathai's full response:

Thank you for your message and question about my stand on a copyright policy for ACM.

First, you refer to the USENIX policy as having 'a non-exclusive license'; in fact USENIX asks for exclusive rights for a specified period (12 months) and rights to continue to maintain its copies with public access after that period.

More broadly, I think the important question is the expected period of interest in a publication. I may be wrong, but I would guess that material published by USENIX has immediate interest for a specific community that diminishes over time as the important ideas of the content become part of a more permanent repository for long-term reference. In that context, 12 months is probably the period when there is most interest and it is covered by exclusive rights.

In contrast, journals provide a long term repository for material that has been carefully selected, refereed by the community and published as part of the accepted knowledge of a field (accepting of course that errors may be found at a later time). A paper like the one by Fischer, Lynch and Paterson on 'Impossibility of Distributed Consensus with One Faulty Process' which appeared in J. ACM in April 1985, has now had over 3000 citations, many of which have appeared in the last decade, or over 15 years after original publication. So rights have to be preserved over a very long time.

I am quite happy with the ACM copyright policy because it represents a sensible balance between the rights of the author and the rights of a publisher who has invested time and effort in making the publication available to the community. ACM is competing with commercial publishers with far more restrictive policies and has to protect its rights in a fairly predatory market.

The ACM Digital Library took a very large investment from ACM members to create. It not only holds the final version of a publication, it allows it to be seen along with other similar or related publications by the same author, or on similar topics. So the value of the DL should not be seen in the context of a single publication but over a range of publications that may be of interest at the same time. If the DL did not exist, it would have to be created in order to give us all the facilities that are needed for research. ACM does have consortium agreements for access to the DL and this brings down the cost for access (to zero, in most cases, since it is the institution and not the individual who has to pay the consortium charges).

I would like to turn the question around and ask you what the ACM policy prevents an author from doing: in what important way is the author unable to make use of his or her publication because of ACM's policy?

Thank you for raising this important question. Some time back I talked to people in ACM HQ about it, thought of alternatives and then decided that the ACM policy is actually fairly sane.

Alexander L. Wolf, Imperial College London: "Obviously, this is a very important and timely issue. ... I can tell you that the USENIX licensing model, the IEEE Security and Privacy licensing experiment, and related ideas are all under active study by various ACM volunteer groups. One thing I've learned from these discussions is that open access is a deceptively and desperately complex issue ... Personally, I subscribe to the general principle that outcomes of activities supported through public funds ... should be available for use by all citizens. ... ACM provides a staggeringly rich set of services (not just the management of professional conferences within a restricted intellectual domain, which is the predominant role of USENIX) to its members and to the larger (non-member) community. Those services cost money. ... How do we compensate for the loss of DL revenue, the funds that effectively subsidize many of ACM's other activities? Should we raise member dues? Should we raise conference fees? (BTW: dues and fees would have to be raised substantially, to the point that we would seriously risk the viability of both our organization and our conferences. Have you looked at the fees being charged for NSDI 2012 this week? And that's just to cover the conference costs, a bit of USENIX staff time, and a small share of maintenance cost for the USENIX content servers.) ... For instance, ACM is able to provide substantial financial and organizational aid to CSTA [... which supports CS K-12 school teachers. Alex also mentioned ACM's role in policy, developing nations, inclusion of women, curricula guidance, the 35 ACM SIGs, and student participation.] The overall point here is that we face a difficult trade off. ... 
The aim is to find a balance between the potentially conflicting goals of giving individuals easy access to the information generated by the community at the same time as helping guarantee a revenue stream for an organization that, frankly, plays a key role in sustaining the community. ... I urge you to take a look at several articles that have appeared in CACM related to the open access issue if you haven't already done so: [1, 2, 3]. I largely agree with them, and as such they also represent my position on the topic. Of course, the environment is dynamic, and new ideas are likely to emerge. I think the important thing for an officer of our association to do is maintain an understanding of and appreciation for the full context of the situation."

Alex's full response:

Thanks for getting in touch. Obviously, this is a very important and timely issue. It is one that is discussed and debated regularly by the ACM Council and ACM Publications Board. I take that as a healthy sign: the serious thought and effort that ACM volunteers are putting into consideration of the issue. I can tell you that the USENIX licensing model, the IEEE Security and Privacy licensing experiment, and related ideas are all under active study by various ACM volunteer groups.

One thing I've learned from these discussions is that open access is a deceptively and desperately complex issue, and one for which there is a lot of mis-information floating about. For example, the notion that one needs to "mov[e] ACM's publications to an open-access model". We should begin with the question: by what definition of "open access"? ACM publications are already considered "Green Open Access" as defined by various leading advocates of open access. So, we need to understand in what way GOA might not be sufficient or appropriate for ACM publications. Consider, too, ACM's new Author-izer service, which gives authors a mechanism for granting non-DL subscribers cost-free access to their publications. Access can be granted from a personal web page or from an organizational corpus (e.g., a university's publication repository). And, of course, the standard ACM copyright agreement already permits various forms of free dissemination.

Personally, I subscribe to the general principle that outcomes of activities supported through public funds (whether directly through government research grants, or indirectly through the education, training, and employment of people who carry out research at public institutions no matter the sponsor of that research) should be available for use by all citizens. (As a general principle it leaves aside many thorny issues, of course, such as what about partial support, what about certain specific and potentially harmful dual-use outcomes, how do we best promote industrial innovation, are not-for-profit organizations such as MIT and ACM "public" institutions, etc. Let's accept that we don't have answers to those questions for the moment.)

Now, how does that principle relate to your questions? It could be that this principle is exactly what you had in mind. Or it could be that you believe authors should have exclusive rights to what they produce, which could very well be in conflict with the principle outlined above. (Consider, for example, that if one follows the principle above, then by accepting public funds one has already given up certain rights.) And, then, which perspective is supported by the notion of licensing to which you alluded? I would suggest the latter (exclusive author rights), in which case we may well disagree. You see, some people may think that licensing, as opposed to copyright transfer, better supports public access, when in fact it may instead simply support exclusive author rights, at which point we must then trust each individual author (or the organization that employs the author) to make the works publicly available, and on a continuing basis. So perhaps it is actually the detail of the agreement that is put in place that is important, not so much the vehicle (license or copyright transfer) that is used to carry it. See, for example, this commentary on the IEEE Security and Privacy license experiment:

https://freedom-to-tinker.com/blog/dwallach/ieee-blows-it-security-privacy-copyright-agreement/

There are many, many other issues to consider. Here is a sampling:

  • ACM is a not-for-profit, volunteer, member organization. The decisions that ACM takes are decisions made by you and me, the members of the organization, not the headquarters staff.
  • Why is it that libraries and library consortia are willing to pay ACM for DL access? Two simple answers: (1) because ACM content is not only of the highest quality, it is far, far less expensive than the fees charged by commercial publishers -- value for money in an extremely tight economy; and (2) because it is a managed-access corpus supported by a professional organization. We must be very careful to consider this value model.
  • ACM provides a staggeringly rich set of services (not just the management of professional conferences within a restricted intellectual domain, which is the predominant role of USENIX) to its members and to the larger (non-member) community. Those services cost money. Do we believe that these services are valuable? Then we must find ways to generate the money to fund them. Should we shut down the ACM DL and let authors take full responsibility for making their papers publicly accessible? Should authors be charged a fee for ACM to provide the DL service? How do we compensate for the loss of DL revenue, the funds that effectively subsidize many of ACM's other activities? Should we raise member dues? Should we raise conference fees? (BTW: dues and fees would have to be raised substantially, to the point that we would seriously risk the viability of both our organization and our conferences. Have you looked at the fees being charged for NSDI 2012 this week? And that's just to cover the conference costs, a bit of USENIX staff time, and a small share of maintenance cost for the USENIX content servers.)
  • We need to consider that there are multiple constituencies involved in this issue. Authors, yes, but also readers, other ACM members, research sponsors, practitioners, governments, companies, teachers, students, the public at large, and libraries and library consortia. Of particular concern to me, I must admit, are those benefiting from the other services made possible in part by the revenue generated by the ACM DL. For instance, ACM is able to provide substantial financial and organizational aid to CSTA, the Computer Science Teachers Association, which is an activity (started by the ACM) to support K-12 teachers ("school teachers" in the UK) around the world. ACM operates USACM, which provides informed technical opinions to US policy agencies and law makers, whose decisions, like it or not, have huge impact around the world. ACM is helping developing nations, such as India, organize their computer science education and research communities. ACM is promoting the inclusion of women in the profession through ACM-W and related activities. ACM provides curricula guidance used in establishing educational programs and accreditation criteria. The 35 ACM SIGs and their members receive substantial support from the ACM DL revenue, again effectively subsidizing their operations, such as to promote student conference attendance. There are many other examples.
  • Should we allow this issue to be resolved on a case-by-case basis by individual authors? By that I mean, should authors decide for themselves what rights to assign or not? My feeling is that such an approach is not viable, much in the same way that (health or car) insurance as a concept only works if the society as a whole is compelled to participate. We are in a society of sorts, a computer-professionals society, and as such we must also consider what is required of the individual to maintain the viability of the society. Of course, this is the essence of the debate, and we must resolve opposing viewpoints on that question.

The overall point here is that we face a difficult trade off. Any action we take in one direction with respect to this issue must certainly be taken in consideration of its impact on the others. Facile solutions and proposals must be considered suspect.

The trade off, and the ACM response to it, are well represented by the emerging notion of "fair access", which is obviously an allusion to the related DRM notion of "fair use". The aim is to find a balance between the potentially conflicting goals of giving individuals easy access to the information generated by the community at the same time as helping guarantee a revenue stream for an organization that, frankly, plays a key role in sustaining the community. As ACM volunteers, let's be careful not to let our not-for-profit, professional association get caught up in the swirl surrounding the for-profit, commercial publication companies, such as Elsevier. Yes, the ACM volunteers want to maintain a revenue stream, but to support and sustain good works for the community, not to generate a "profit".

I hope I've answered your questions. I urge you to take a look at several articles that have appeared in CACM related to the open access issue if you haven't already done so:

http://cacm.acm.org/magazines/2009/7/32075-open-closed-or-clopen-access/fulltext

http://cacm.acm.org/magazines/2010/2/69353-open-access-to-scientific-publications/fulltext

http://cacm.acm.org/magazines/2012/5/148564-fair-access/fulltext

I largely agree with them, and as such they also represent my position on the topic. Of course, the environment is dynamic, and new ideas are likely to emerge. I think the important thing for an officer of our association to do is maintain an understanding of and appreciation for the full context of the situation.

Secretary/Treasurer

George V. Neville-Neil, Neville-Neil Consulting: "At the moment this entire question is being gone over by the Publications Board of ACM. They are meeting this June to talk about this issue as well as others. This has not been an area of ACM policy that I have been involved with in the past, but I agree that it's extremely important, not only to authors, but to the organization as a whole. I remain open minded about what the policy ought to be in the future, and am interested in seeing what the publications board comes up with as a recommendation. Having published several articles, and a monthly column, with ACM I have to say that I do not find the current system to impose unnecessary strictures on my ability to share my work or for others to gain access to it. [After a short exchange concerning arguments for open access:] Thanks for the pointers, I've looked them over and they're certainly food for thought. I'll keep these in mind as and when I get to see what the publications committee comes up with. I suspect that if ACM does move to a similar model to USENIX that this will take time as there are actual financial questions to deal with in this area. While the cost of publishing has diminished, there remain costs other than printing and shipping paper that ACM has to deal with. Figuring out a path from the current model to a more open one is certainly something I'd be involved with if I were elected as Secretary/Treasurer."

Vicki L. Hanson, University of Dundee: "I appreciate your thoughtful questions put to ACM candidates for election. The issues you raise have been, and continue to be, extensively discussed within ACM. ACM’s Publications Board regularly considers questions of licensing and open access and strives to continue with its high quality service while providing authors rights to their published work. As you are likely aware, the Pubs Board Chairs published an editorial in the October, 2011 issue of CACM about ACM’s copyright policy. Since that editorial, ACM has made available the Author-Izer service that allows authors to put a link on their personal or institutional web page that will enable anyone to download the definitive version of published papers from ACM’s Digital Library (DL) at no charge. This service also makes available the display on these personal and institutional pages of ACM's up-to-date download and citation statistics for the publications. ACM is exploring the implications of allowing authors to retain copyright, transferring a license to ACM for archiving, indexing, and electronic distribution. It is worth noting that such a change, according to my understanding, would make it somewhat easier for authors to distribute their work but would preclude ACM from protecting those works from plagiarism and unauthorized distribution by other entities including for-profit ones. The current policy must be reviewed, weighing the importance of such protections and other author needs. The fully open access issue is more difficult still and requires a careful consideration of business practices and organizational sustainability. There are substantial costs involved in publishing and maintaining the high quality archival collection of materials provided by ACM’s DL. I agree with the Pubs Board’s resistance to the author pays model of open access in that this does not allow poorly-funded authors to have the same access to publishing as well-funded ones. 
An economic model that places the financial burden on conferences for proceedings publications similarly tends to place financial roadblocks to publication for those less able to pay. This latter model also does not address larger questions of how the DL would be funded to support journals, educational materials, and other non-conference content. The current ACM business model attempts to give authors flexibility and rights to make their work available to the community while, at the same time, being able to provide the DL service for aggregating articles, collecting bibliometrics, and investing in further development of the DL as a resource for the computing community. I realize that the above answers are not the definitive answers you might have sought in your questions to me. At this point in time, the issues you raise are critical ones for the future of ACM and continuing dialog is needed to consider the best way forward in terms of meeting the needs of authors and readers of DL materials as well as determining a sustainable business model that will allow authors and readers continued access to the DL, an important resource for ACM’s community of researchers and practitioners."

Members at Large

Radia Perlman, Intel: "I'd like to hear arguments on all sides before having a cast-in-stone position. Some companies have worked out an agreement with IEEE and ACM for something like what you said...that ACM has non-exclusive right, but the authors also get to post and distribute. So that implies, I think, that it wouldn't be totally detrimental to ACM to do that for everyone. Some conferences post the papers online, freely accessible. That seems like the right thing to do. Going beyond the rights of authors (and/or the company they worked for at the time they wrote the paper) having the right to post and distribute, I think the model of only letting people see the title and abstract of papers, and then having to pay to download the article, is really bad for facilitating research. When one is doing research, and browses on the web, and finds a 15 year old paper that looks like it might be relevant, but you have to pay $25 to download the paper, only to find it really is not relevant... A lot of companies and most universities have a blanket access to ACM and IEEE publications, so people at those companies probably don't notice the issue. I wonder how much ACM depends on revenue from people downloading papers. Especially really old ones. Perhaps a compromise might be to say that after, say, 3 years, the articles should be free. Anyway, my heart is in having everything easily accessible on the web, for free. I wouldn't care, as an author, whether I could distribute the paper or just a link to the paper, as long as the link allows the person to see the whole paper. For facilitating research, my inclination would also be for anyone to access all the published papers, without having to get a link from the author. [...] But as I said, I'd like to hear other points of view and legal/economic issues that I may not fully appreciate, before getting too entrenched in a position."

Ricardo Baeza-Yates, Yahoo! Research, Barcelona/Santiago: "In general I am in favor of open access models and giving the author more control of their copyrights. On the other hand we need to do this without jeopardizing the financial stability of ACM."

Feng Zhao, Microsoft Research Asia, Beijing: "My platform is primarily around building a sustained and quality engagement between ACM and the regional computing community in China and the rest of Asia, building on the tremendous momentum of the Council's China and India initiatives. As part of that, I felt it is important to lower the cost of access for people from the developing regions. I have not really thought through the copyright issue at any depth. But one thing is clear. The old model of publication, dissemination, and monetization is broken in the online world today. If elected, I will work with the Council to study and innovate on ways that can expand the ACM reach and at the same time ensure the financial sustainability of the society."

Eric Allman, Sendmail: "This is neither my area of expertise nor do I have all the information (particularly about finances), so I do not (as yet) have a strongly held position on this. However, I don't understand why it is necessary for ACM to actually hold copyright as long as it retains the rights to use the materials in the ways that it already does. In particular, as I read the copyright policy, the authors retain the right to privately publish the materials on non-ACM web sites, so the usual financial argument about the Digital Library doesn't seem to fit here. It also seems clear to me that research that was funded with public money should be available to that public with no more than a cost recovery fee. Obviously not all authors are funded by government grants, and the ACM audience transcends any particular government, but trying to sort articles on this basis seems excessively complex. I'm also a supporter of the concept of replication to maintain long-term integrity and retention of archival material, which is antithetical to centralized administration. Note that I'm not saying that the DL is superfluous or needs to be free. The DL provides value through indexing, providing a stable reference copy (URLs are notoriously unstable), and assisting ease of access. Maintaining the DL is not without costs which need to be recovered, and any reductions in revenue resulting from changing the copyright policy must be balanced in some way. Fiscal responsibility is important."

Mary Lou Soffa, University of Virginia: [not yet responded]

PJ Narayanan, IIIT-Hyderabad: "I personally believe the authors should have all rights to distribute their work and hence should hold the copyright. ACM as the publisher and maintainer of the electronics library should have non-exclusive rights to distribute the content."

Eugene H. Spafford, Purdue University: "Well, I'm not expert in ACM's policies, so I am not sure I am the best person to ask right now. However, I'll try to answer. My understanding is that there is a publications board that considers ACM policy for copyright. It is regularly reviewed. I know there have been many changes during the time I've been a member, in response to changing times, needs, and user requests. I haven't heard of any problems with the current policy, and things seem to be working okay. So, I'll assume that the current policy is appropriate until presented with evidence indicating otherwise. [In response to whether authors should retain the copyright:] ACM is not like USENIX -- I know, as I was a member of Usenix for 25 years. ACM publishes journals and maintains a curated digital library that must be supported over a long time to be of real value. The Usenix model is okay for some conferences, and for authors to maintain for a limited period of time, but that is not the same as immutable copies maintained in a curated collection, indefinitely. The current model seems to work fine, so, that gives a proof by example. [In response to whether publications should be open access:] Please define 'open access' and what it provides that the current model does not. Does it provide the necessary support and resources to maintain and enhance the ACM digital library in a global environment for an indefinite time? I'd then want to see a response from someone in ACM about the current model. I'm open to considering changes, but I need complete information to understand the issues and potential effects."



Update: Candidates' positions on open access from two years ago. A couple of candidates are running this year as well.



Notes: You can read more recent discussion of open access here and here. Thanks to all the candidates for taking the time to reply. Thanks to George Porter for suggesting this.

Jellyfish: Networking Data Centers Randomly

People have been designing communication network topologies for more than 150 years. But usually their structure is quite constrained. Building a wide area network, one has to follow the locations of cities or railroads. Building a supercomputer, a regular structure enables simple, deadlock-free routing. Building a traditional data center network, one might use a tree-like design amenable to the Spanning Tree routing protocol. Even state-of-the-art high-capacity data center networks might use a multi-rooted tree like a fat-tree.

3-level fat-tree · 432 servers, 180 switches, degree 12
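The numbers in the caption are what you get from the standard k-ary fat-tree construction with k = 12. Here is a small sketch of that arithmetic, assuming the usual three-level layout (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches); the function name is just for illustration.

```python
def fat_tree_size(k):
    """Return (servers, switches) for a 3-level fat-tree of k-port switches."""
    if k % 2 != 0:
        raise ValueError("k must be even")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches in each
    aggregation = k * half   # k pods, k/2 aggregation switches in each
    core = half * half       # (k/2)^2 core switches
    servers = edge * half    # every edge switch hosts k/2 servers
    return servers, edge + aggregation + core


print(fat_tree_size(12))  # (432, 180)
```

Note how coarse the design points are: the next even port count, k = 14, jumps straight to 686 servers.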

In our Jellyfish paper appearing this week in NSDI 2012, we're proposing a slightly radical alternative: a completely random network.

Jellyfish random graph · 432 servers, 180 switches, degree 12

This project, the work of Ph.D. students Ankit Singla and Chi-Yao Hong, along with Lucian Popa of HP Labs and myself, has two goals. First, we want high bandwidth, which helps servers avoid bottlenecks while streaming big data across the network, and gives cloud operators the agility to place virtual machines on any physical host without worrying about bandwidth constraints between hosts.

Second, we want a network that is incrementally expandable. Cloud service providers continually expand and modify their networks to meet increasing demand and to reduce up-front capital expenditure. However, existing high-capacity network interconnects have relatively rigid structure that interferes with incremental modification. The fat-tree, hypercube, butterfly, and other proposed networks can only be built from a limited menu of fixed sizes and are difficult to expand by, for example, adding a rack of servers at a time. Of course, there are some workarounds: one can replace some switches with ones of larger port-count or oversubscribe them, but this can make network bandwidth constrained or uneven across the servers. One could leave ports free for future network connections but this wastes investment while they sit idle.

Our solution is to simply give up on structure, and build a random network among the network routers. This sloppiness yields significantly more flexibility than past designs: a few random link swaps are all it takes to incorporate additional components, making a new random network. It can naturally support varying port-counts, and scales to arbitrary sizes rather than limiting the network to coarse design points. In fact, we show in the paper that Jellyfish reduces the cost of incremental expansion quite substantially over a past expansion heuristic for fat-tree-like (Clos) networks. Intuitively, Jellyfish makes network capacity less like a structured solid and more like a fluid. Coincidentally, it also looks like a jellyfish.
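To make the "random link swaps" idea concrete, here is a rough sketch in Python. It is an illustration under simplifying assumptions, not the code from the paper: every switch has the same number of network-facing ports, links are stored as sorted pairs of switch ids, and the function names are made up for this example. A new switch is spliced in by repeatedly removing a random existing link (u, v) and replacing it with links (u, new) and (v, new), which leaves the degree of every existing switch unchanged.

```python
import random


def build_random_network(num_switches, net_ports, rng):
    """Join random pairs of non-adjacent switches that both have free ports."""
    links = set()
    free = {s: net_ports for s in range(num_switches)}
    while True:
        candidates = [(u, v) for u in free for v in free
                      if u < v and free[u] > 0 and free[v] > 0
                      and (u, v) not in links]
        if not candidates:
            return links, free  # a few ports may be left unmatched
        u, v = rng.choice(candidates)
        links.add((u, v))
        free[u] -= 1
        free[v] -= 1


def add_switch(links, new_switch, net_ports, rng):
    """Splice in a new switch by swapping out random existing links."""
    neighbors = set()
    ports_left = net_ports
    while ports_left >= 2:
        candidates = sorted((u, v) for (u, v) in links
                            if new_switch not in (u, v)
                            and u not in neighbors and v not in neighbors)
        if not candidates:
            break
        u, v = rng.choice(candidates)
        links.remove((u, v))                         # free one port on u and on v
        links.add(tuple(sorted((u, new_switch))))    # then reuse both ports
        links.add(tuple(sorted((v, new_switch))))
        neighbors.update((u, v))
        ports_left -= 2
    return ports_left


def degree(links, switch):
    """Number of links attached to a switch."""
    return sum(switch in link for link in links)


rng = random.Random(1)
links, free = build_random_network(20, 4, rng)
unused = add_switch(links, 20, 4, rng)
print(len(links), "links; new switch degree", degree(links, 20), "unused ports", unused)
```

The same swap step works regardless of how many switches are already in the network, which is why expansion does not need to wait for the next "design point".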

Arctapodema jellyfish · Bill Curtsinger, National Geographic

At this point, one natural reaction is that a completely random network must be the product of a half-deranged intellect, somewhere between 'perpetual motion machine' and 'deep-fried butter on a stick'. Won't network capacity decrease, due to the sloppy interconnections? How does one route packets through a completely unstructured network? Isn't it possible to randomly pick a bad network, or for failures to cause problems? How do you physically cable up a network that bears more than a passing resemblance to a bowl of spaghetti? How could it possibly work?

The first surprise is that rather than sacrificing bandwidth, Jellyfish supports roughly 25% higher capacity than a fat-tree of equal cost. That is, a completely random network makes more efficient use of resources than a carefully-structured one. The intuition is that Jellyfish's diverse connections — in theoretical terms, it is a good expander graph — give it low average path length, which in turn means that sending each packet across the network takes less work. In fact, there is reason to believe that the random graph is pretty close to the best possible network for maximizing throughput, perhaps within 10%.
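The path-length intuition is easy to poke at with a toy experiment. The sketch below is not the paper's evaluation: it compares a structured degree-4 ring lattice (only a stand-in for "a structured topology", not a fat-tree) against a random graph with exactly the same degrees, produced by random degree-preserving link swaps, and measures average shortest-path length with breadth-first search. The sizes and names are arbitrary choices for illustration.

```python
import random
from collections import deque


def ring_lattice(n):
    """Structured degree-4 graph: node i links to i±1 and i±2 (mod n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in (1, 2):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def randomize(adj, swaps, rng):
    """Return a rewired copy: swap link endpoints at random, keeping degrees."""
    adj = {u: set(vs) for u, vs in adj.items()}
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue  # would create a self-loop or a duplicate link
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
    return adj


def average_path_length(adj):
    """Mean hop count over all reachable ordered pairs, via BFS from each node."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


structured = ring_lattice(60)
rewired = randomize(structured, swaps=2000, rng=random.Random(0))
print("structured:", round(average_path_length(structured), 2))
print("random:    ", round(average_path_length(rewired), 2))
```

Same switches, same number of links, but the random wiring needs far fewer hops per packet on average, so each unit of traffic consumes less total link capacity.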

Routing is another interesting question. Chi-Yao and Ankit found that load-balanced routing works well as long as the routing protocol provides sufficiently high path diversity, as can be obtained with OpenFlow, MPLS, or other recent proposals that go beyond STP or simple shortest path routing. In addition, it turns out that the questions of consistent performance, resilience, and cabling have favorable, and we believe reasonably practical, answers. There are some very interesting theoretical questions that come up as well, which we're now looking into.

Jellyfish is sufficiently unlike past designs that implementation challenges will certainly arise. But so far it seems like an unstructured network just might work. And happily, rather than running into a tricky tradeoff, the two design goals of high bandwidth and incremental expansion appear to be satisfied by one network.

Congratulations to Ankit and Chi-Yao who have done (and are continuing to do) great work on this project. Don't miss Chi-Yao's talk in the Thursday afternoon data center networking session! Finally, thanks to NSF for supporting this research.

Like a spider's web

“It is anticipated that the whole of the populous parts of the United States will, within two or three years, be covered with net-work like a spider's web.”

— Illustration and quote from The London Anecdotes, 1848
Quoted in The Victorian Internet

Google+ vs. Facebook engagement

Here's a little statistically-insignificant self-experimentation, based on 51 near-simultaneous posts to both Google+ and Facebook, from the beginning of September 2011 to the present.

"Engagement" is the number of unique people (excluding me) who responded, either by commenting, liking or +1'ing the post, or liking or +1'ing a comment on the post. (The scatterplot points are perturbed slightly from their true integral values so they don't completely overlap.) What's remarkable here is how coincidentally similar the engagement is on the two networks — the difference is under 2 percent (!) despite the fact that my social network on Facebook is currently 2.25 times as large as on Google+.

Google+ has very slightly higher engagement on STEM-related posts (science, technology, engineering, and mathematics), while Facebook is slightly higher for other posts, but the differences are well within 95% confidence intervals.
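For anyone who wants to repeat this on their own posts, the engagement measure defined above boils down to a set union. This is a sketch with a made-up record layout (the field names and the example people are hypothetical, not an export format from either network).

```python
def engagement(post, author):
    """Count unique people other than the author who responded to a post."""
    responders = set(post["commenters"])
    responders.update(post["post_likers"])      # likes or +1s on the post itself
    responders.update(post["comment_likers"])   # likes or +1s on any comment
    responders.discard(author)                  # the author doesn't count
    return len(responders)


example = {
    "commenters": ["ana", "me", "raj"],
    "post_likers": ["raj", "li"],
    "comment_likers": ["ana"],
}
print(engagement(example, author="me"))  # 3
```

Someone who both comments and likes is counted once, which is what makes the measure "unique people" rather than total interactions.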

It's possible my social network has somewhat shifted to Google+. Here is the post set split into five chronological partitions with 10 or 11 posts in each.
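The chronological split is simple to reproduce: sort by date and cut into nearly equal chunks. A minimal sketch, assuming each post is a record with a sortable `date` field (a hypothetical layout), and putting any leftover posts in the earliest partitions (which partition actually got the extra post here is an arbitrary choice in this sketch):

```python
def chronological_partitions(posts, parts):
    """Split posts, oldest first, into `parts` chunks whose sizes differ by at most one."""
    ordered = sorted(posts, key=lambda p: p["date"])
    base, extra = divmod(len(ordered), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


posts = [{"date": day} for day in range(51)]  # placeholder dates, one per post
print([len(chunk) for chunk in chronological_partitions(posts, 5)])  # [11, 10, 10, 10, 10]
```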

Engagement as defined above excludes re-sharing posts, because I wasn't confident the two social networks report these in the same way (e.g., do they both report recursive shares?). But there is an interesting and significant difference in sharing behavior: on Google+, STEM posts saw nearly seven times as much sharing as non-STEM posts in this very small data set, an effect which didn't appear on Facebook.

Of course, all of this is specific to my social network, and really, the sample size is too small to draw any conclusions at all. Now, if someone were to compare posts for a large number of people that cross-post publicly to Facebook and Google+, that could start to get interesting...