What Bandwidth RSS Uses

I ran the calculations on how much bandwidth RSS aggregators are sucking from my Web server by scanning the logs for retrievals of files named index.xml, index.rdf, rss.xml, atom.xml, and scriptingnews2.xml. I counted only the HTTP 200 transactions, not the 304 (Not Modified) responses, which are just a handful of bytes each. The chart is below:
[Chart: rss_bandwidth_041113]
Most of this RSS traffic is for Wi-Fi Networking News; a tiny fraction is for blog.glennf.com and a few other blogs. You can see the growth and the weekend dips pretty clearly. The dips make sense: I'm least likely to post updates on weekends, so well-behaved RSS aggregators are least likely to get changed files, while ill-behaved ones are more likely to be on computers that are turned off for the weekend. In early October, the weekday average was about 275 MB; in mid-November, we're up to 375 MB. (BoingBoing linked to this post and shared their own stats: they serve 50 GB per month of news aggregator feeds, which is more than they ship in HTML!)
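For anyone who wants to repeat the scan on their own logs, here is a minimal Python sketch of the tally described above. It assumes Apache's "combined" log format, which this post does not confirm, and the function name feed_bytes_per_day is made up for illustration. It counts only 200 responses for the five feed filenames and sums the bytes per day.

```python
import re
from collections import defaultdict

# Feed filenames named in the post.
FEED_NAMES = {"index.xml", "index.rdf", "rss.xml", "atom.xml", "scriptingnews2.xml"}

# ASSUMPTION: Apache "combined" log layout; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def feed_bytes_per_day(lines):
    """Return {day: bytes} for HTTP 200 responses that served a feed file."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "200":
            continue  # skip 304s and anything else that is not a full transfer
        # Drop any query string, then compare the last path segment.
        path = match.group("path").split("?", 1)[0]
        if path.rsplit("/", 1)[-1] not in FEED_NAMES:
            continue
        size = match.group("size")
        if size != "-":  # "-" means no body was sent
            totals[match.group("day")] += int(size)
    return dict(totals)
```

Calling feed_bytes_per_day(open("access_log")) with a real log path would give the per-day totals that feed a chart like the one above.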

Now my co-location host, digital.forest, has great bandwidth pricing: a buck a gig over the 80 GB per machine that I have co-located. I transfer about 2 GB per day in Web site traffic from the machine that's now pushing out nearly half a gig in RSS traffic. I may have to build a custom RSS Apache doohickey that will force a 304 (Not Modified) to an RSS aggregator if it doesn't send an If-Modified-Since header in its request.
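The decision such a doohickey would make can be sketched in a few lines of Python. This is only an illustration of the policy floated above, not an Apache module: the function name feed_status and the lower-cased header dictionary are assumptions, and a real version would sit inside whatever handler serves the feed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def feed_status(request_headers, feed_last_modified):
    """Pick the HTTP status code for a feed request.

    request_headers: dict keyed by lower-cased header name (an assumption).
    feed_last_modified: timezone-aware datetime of the feed's last change.
    """
    raw = request_headers.get("if-modified-since")
    if raw is None:
        # The policy from the post: no conditional header, no feed body.
        return 304
    try:
        since = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to sending the feed
    if since.tzinfo is None:
        # HTTP dates are GMT; treat a zone-less value as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # Ordinary conditional GET: send the body only if the feed is newer.
    return 200 if feed_last_modified > since else 304
```

One thing to keep in mind: a 304 carries no body, so an aggregator that has never cached the feed gets nothing to display. That is the point of the policy, but it is a blunt one.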

I took a quick look at which aggregators account for the most traffic, and a very small number of users employing lwp-trivial, a Perl-based HTTP query system, appear to be using over 10 percent of my RSS bandwidth! Time to fix their wagons, to be sure. It makes sense that various Mozilla browsers that have RSS support are using about 15 percent. NetNewsWire makes a very strong showing at 10 percent of usage lately. (Click image to see the full-sized chart; I dropped out days in which an aggregator retrieved less than 7 MB, which is why you see some gaps. You can also see NetNewsWire's beta 6 adoption curve. If you have a better way to graph this, I'm open to it: click here to download the Excel file which generated this chart.)
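The per-aggregator percentages are just each client's bytes divided by the total. A rough Python sketch, assuming you have already pulled (user agent, bytes) pairs out of the log; the function name agent_shares and the rule of grouping by everything before the first slash are illustrative choices, not how the chart was actually built:

```python
from collections import Counter


def agent_shares(records):
    """Return {agent_family: percent of total bytes}, largest first.

    records: iterable of (user_agent_string, bytes_sent) pairs.
    """
    totals = Counter()
    for user_agent, size in records:
        # Crude grouping: "NetNewsWire/2.0b6 (...)" becomes "NetNewsWire".
        family = user_agent.split("/", 1)[0].strip() or "unknown"
        totals[family] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        family: 100.0 * size / grand_total
        for family, size in totals.most_common()
    }
```

Grouping on the text before the slash lumps every Mozilla-style browser together, which matches the rough "various Mozilla browsers" bucket above but would need refining to separate Firefox from its relatives.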

I can tell that NetNewsWire and Mozilla derivatives like Firefox are well behaved because their traffic drops by about 80 percent on weekends, while the bandwidth-abusing aggregators don't drop their usage much at all. This could also indicate that the poorly behaved ones are more likely to be running on servers than on personal computers.
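The weekend test is easy to compute per aggregator: compare the average weekend day against the average weekday. A minimal Python sketch, assuming per-day byte totals keyed by date; the function name weekend_drop_percent is made up for illustration:

```python
from datetime import date


def weekend_drop_percent(bytes_by_day):
    """Percent by which the average weekend day falls below the average weekday.

    bytes_by_day: dict mapping datetime.date to bytes served that day.
    Returns None when there is not enough data to compare.
    """
    weekday, weekend = [], []
    for day, size in bytes_by_day.items():
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        (weekend if day.weekday() >= 5 else weekday).append(size)
    if not weekday or not weekend:
        return None
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    if weekday_avg == 0:
        return None
    return 100.0 * (weekday_avg - weekend_avg) / weekday_avg
```

By this measure a well-behaved aggregator would score near 80, and one that keeps hammering the feed all weekend would score near zero.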