Today I Lift from the Book

Okay, so I'm not so bright. After posting the previous item on Amazon's search feature, I went and used it more extensively.
I didn't quite realize that it was presenting full book pages. The Authors Guild has sent out a note to its members, which includes me, warning that the system actually allows not just contextual results -- my first thought at seeing the search results -- but also entire pages. Many pages. In fact, with a little poking, you can retrieve basically entire books. For reference books -- cooking titles, computer books, travel books, etc. -- this could devastate sales. I mean, if you can read the five pages you need, why buy the book? Of course, this points out the flip side: many books, including most (but not all) of the ones I write, have marginal utility for the reader and maximum utility for the bookstore, but only marginal return for the publisher and for the author. That is, the folks who make the most money with the least capital are the folks selling books. The other steps in the chain have more marginal returns, requiring higher volumes of sales to be viable. This isn't saying that booksellers are ripping us off or have it easy; rather, that their part of the value chain has the highest return on capital where capital is being expended. (Authors' ROI is harder to measure: are we trying to make a living, buy a house, earn a specific dollar wage? My return on capital is pretty vast, but that doesn't equate to making a great living from it.) One way I've tried to get out of this loop has been through discussions over the last few years about launching a publishing company that would have its primary focus on short, niche titles, sold electronically in small volumes at a low price. Adam and Tonya Engst, publishers of TidBITS, have launched such a venture: the Take Control series.
I've known the Engsts for more than a decade, and have had many talks on this subject with Adam, with whom I've co-authored two editions of The Wireless Networking Starter Kit. The Take Control series has a few unique aspects. First, the Engsts run a weekly newsletter which has tens of thousands of subscribers. Second, Adam is one of the best-known Mac people, just below a couple of Apple employees, like Steve Jobs. Third, the Engsts are trustworthy and have assembled a bunch of writers who sell lots of books and who already have plenty of outlets for promoting what they're doing. The first Take Control book was on installing and upgrading to Panther (Mac OS X 10.3). It cost $5. Nearly 2,000 have been sold in under 72 hours -- and that's not the end of the sales of this book by any means. There's no digital rights management on the PDF at all: we're relying on the price and the general utility to make piracy a pointless or at least irrelevant activity. I think we might have a model here.

Today, I Live in the Book

Amazon launched its book-searching feature today; we were talking about this idea all the way back in late 1996 when I worked there as catalog manager. It's so cool to see it come to fruition.
My friend, old boss, current officemate, and colleague Steve Roth had this idea way back in the mid-90s: why not have a site at which you could search the full text, see a little context, and then buy the book? It took a long time for rights, technology, and integration to make it happen. I've been using O'Reilly's Safari Bookshelf for a few months, and it's a similar idea taken a step further. For a fee per month, you're licensing the rights to search and read any page in a book on their site, up to a certain number of books at one time. You can search for free, actually, and the results are useful because they show context. All of this is to the good for authors: it allows our work to be seen as useful in context, and to increase sales based on utility. What's the deal with the title of this post? When I worked at Amazon, we got an email from Japan asking something that I can't recall. But it opened in bad translation as something like "Today, I live in the book." The rest escapes me but was equally beautiful and senseless.

How to Beat Bayes

Spam is an adaptive virus: we only see the successes, as more and more filtering wipes out the less adaptive versions. Lately, I've been seeing an increasing amount of spam that's passed through three layers of filtering, two of them involving Bayesian notions of word frequency. This new spam has a bunch of randomly created word-length text strings. The subject lines have punctuation introduced in strange places so that the words are legible, but they don't "read" as words. (Of course, an easy parsing solution is to normalize words and then run filters against them.)
Obviously, this is the latest end run around the latest anti-spam innovation. It shows that Bayesian filtering, while a wonderful idea, has its limits because of spammers' cleverness and adaptability. Ultimately, these exercises show that no matter what algorithm we use, spam will still filter through. (I'm still seeing Nigerian variants, which amazes me.) The next approach is going to be digital certificate-based: you can't forge those, and you prevent non-trusted sources from connecting. If you put certificates on the mail servers -- and make sure that VeriSign isn't the only company controlling the issuing of these certificates, but that non-profits and other organizations can be root certificate authorities -- then only mail servers configured with them will be able to exchange email with other servers. It'll be tricky, but I believe the next change in the net will come that way. Technology and legislation aren't stopping spam. Digital certificates could dramatically reduce it because of the ability to revoke certificates, eliminating an entire mail server from a system without requiring a blacklist. (Yeah, and then who decides to revoke certificates? And on and on.)
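
For the curious, the "normalize words, then run filters" idea mentioned in the parenthetical above could look roughly like this in Python. This is only a sketch under my own assumptions: the function names and the toy word list are made up for illustration, and a real Bayesian filter would score word frequencies rather than check a fixed set.

```python
import re

# Punctuation flanked by letters on both sides, e.g. the dots in "m.o.r.t.g.a.g.e".
# Assumption: stripping these is acceptable even though it also turns
# "don't" into "dont" and "example.com" into "examplecom".
_INTRA_WORD_PUNCT = re.compile(r"(?<=[A-Za-z])[^\w\s](?=[A-Za-z])")


def normalize_tokens(text):
    """Collapse punctuation-split words, then return lowercase word tokens."""
    collapsed = _INTRA_WORD_PUNCT.sub("", text)
    return re.findall(r"[a-z]+", collapsed.lower())


def looks_spammy(text, spam_words):
    """Hypothetical stand-in for a real filter: flag text containing any listed word."""
    return any(token in spam_words for token in normalize_tokens(text))


if __name__ == "__main__":
    subject = "Lo.w m.o.r.t.g.a.g.e ra-tes!"
    print(normalize_tokens(subject))
    print(looks_spammy(subject, {"mortgage"}))
```

The point is only that the filter sees the word the human eye sees, not the punctuated string the spammer typed.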

The Latest Kind of Comment Spam

I discovered this very reasonable-sounding comment just now on my Wi-Fi news site: "We live in strange times, but someday I think we will look back on all of this and marvel at how crazy it was. God, I hope so. I sure wouldn't want this insanity to become the norm." Unfortunately, it was totally off-topic. The URL of the poster was a scum site, trying to get Google Whuffie.

Killing Movable Type Comment Spam

Jay Allen gets my vote for supreme arbiter of goodness for this page, which documents installing a variety of plug-ins and templates inside Movable Type to block the display of comments which contain URLs repugnant to you. It's not a complete solution, but it does mean that the idiots and scum who have started to spam the comments section of Movable Type (and other) blogs can be suppressed.
It's based on looking for domain names and URLs in the posts and author info in comments, which means that you get the spammers where they live. They can circumvent all kinds of content restrictions, but in comment spam, they have to link you somewhere. This shows off the power of Movable Type's extensible architecture. Hooking in Jay's mods took a few minutes. I just had to install a few simple plug-ins, copy his template, and add a bit of If..Then code into the comment templates, and voila!
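
To make the idea concrete, here's a rough Python sketch of that kind of check: pull the host names out of the comment body and the author's URL, and suppress the comment if any of them is on a blocked list. This is not Jay's code or anything from Movable Type; the function names and the example domains are hypothetical, and the host-matching pattern is deliberately simple.

```python
import re

# Hypothetical blocklist; a real one would be maintained and much longer.
BLOCKED_DOMAINS = {"spam-pills.example", "casino.example"}

# Matches a host name, with or without a leading http:// or https://.
_HOST = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})", re.IGNORECASE)


def extract_hosts(text):
    """Return the set of lowercase host names found in the text."""
    return {host.lower() for host in _HOST.findall(text)}


def is_blocked(host, blocked):
    """True if the host is a blocked domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in blocked)


def should_suppress(comment_body, author_url, blocked=BLOCKED_DOMAINS):
    """Check both the comment text and the author's URL against the blocklist."""
    hosts = extract_hosts(comment_body) | extract_hosts(author_url)
    return any(is_blocked(host, blocked) for host in hosts)


if __name__ == "__main__":
    print(should_suppress("We live in strange times.", "http://www.casino.example/win"))
    print(should_suppress("Great article, thanks!", "http://blog.example.org"))
```

Note that the perfectly reasonable-sounding comment text doesn't matter here: the author URL alone is enough to catch it.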