Reflections on the 25th Anniversary of Spam

While many people first encountered spam (junk e-mail or junk newsgroup postings) in the mid-1990s, my research has found that it goes back much further.

In fact, the earliest documented junk e-mailing I've uncovered was sent May 3, 1978 -- 25 years ago this month. (It was written May 1 but sent on May 3.) And in a surprising coincidence (*), just a month ago we marked the 10th anniversary of March 31, 1993, the first time a USENET posting was called a spam.

I learned of that first spam through a report from Einar Stefferud, who had read a history I prepared of the term "spam" and how the name of the canned ham became our name for junk e-mail. I had originally set out to research the history of the term, but it became impossible not to research a bit of the history of the act.

That first spam (*) was sent by a marketer for DEC, the Digital Equipment Corporation. Today, you may not know DEC, since it was bought by Compaq and is now a unit of HP, but in those days it was the leading minicomputer maker, and its computers provided the platform for the development of Unix, C and much of the internet, to cite just a few minor events.

By 1978 the Arpanet (as the internet was then known) had already provided network E-mail to a large number of folks at universities, government institutions and tech companies for over 6 years. E-mail was the biggest source of traffic on the Arpanet. A few years prior, Dave Farber had created "MsgGroup," the first network mailing list. (Though Plato and other timesharing systems had laid the foundations for online community and conferencing some years before that.)

The DEC marketer, Gary Thuerk, identified only as "THUERK at DEC-MARLBORO" (there were no dots or dot-coms in those days, and the at-sign was often spelled out), decided to send a notice to everybody on the Arpanet on the West Coast. In those days there was a printed directory of everybody on the Arpanet, which they used as the source for the list. The message trumpeted an open house to show off new models of the DEC-20 computer, a foray into larger, almost mainframe-sized systems.

This was a spam, though the term would not be used to refer to it for another 15 years. After several edits, Thuerk had his technical associate, early DEC employee Carl Gartley, send the message from his account. Alas, at first he didn't do it right. The TOPS-20 mail program would only take 320 addresses, so all the other addresses overflowed into the body of the message. When they found that some customers hadn't received it, they re-sent it to the rest.

As you can guess, there was quite a response, with (as is typical) far more volume of debate than actual spam. It's amusing to see that one future celebrity -- a young Richard Stallman, the future free software guru -- at first wondered why people were so upset about the message. He later said the mistaken placement of all the addresses into the body did bother him, but he gets the dubious honour of being perhaps the first spam defender. Of course, like all of us, he was 30 years younger, and the problem was brand new.

In those days the Arpanet had an official "acceptable use policy" which limited it to use in support of research and education. So this message was a pretty clear violation, and the DCA (Defense Communications Agency), which ran the Arpanet, made a very stern call to Thuerk's boss about the matter. The policy became well enough known that we would not see significant spam again for many years.

More detailed history

You can read my history of the term spam and how it came to mean abuse of the net.

You can also just go directly to the spam and the reaction to it, as well as more from my recent conversation with Gary Thuerk. You can also go directly to Stallman's defence of spam.

This site contains a number of essays on the spam problem, which I have been studying for many years, trying to find solutions that don't destroy the core values embodied in the mail system. In spite of what some may feel, we wanted an extremely cheap e-mail system where anybody could mail anybody, which protected anonymous communication and fostered values like free speech and the ability to send unsolicited communication. Those are not bugs, so fighting spam while keeping those values, along with other core social goals, is a delicate task.

You can read my current best plan to end spam if your interests lie that way. Other essays can be found at my spam essay page.

Escalation of the battle

Spam fascinates me because it sits at the intersection of three important rights -- free speech, private property and privacy. It was also the first major internet governance issue (possibly in tandem with DNS) to stir deep concern among the members of the internet community.

The reaction to it has been remarkable. By attacking something we hold dear, and goading us by using our own tools and resources to do it, spam generates emotion far beyond its actual harm, even though that actual harm is quite considerable.

Spam pushes people who would proudly (and correctly) trumpet that we shouldn't blame ISPs for offensive web sites, copyright violations or MP3 trading done by downstream customers to suddenly call for blacklisting all the innocent users at an ISP if a spammer is found among them. People who would defend the end-to-end principle of internet design eagerly hunt for mechanisms of centralized control to stop it. Those who would never agree with punishing the innocent to catch the guilty in any other field happily advocate it to stop spam. Some conclude that even entire nations must be blacklisted from sending E-mail. Onetime defenders of an open net with anonymous participation call for authentication certificates on every E-mail. Former champions of flat-fee unlimited net access who railed against proposals for per-packet internet pricing propose per-message usage fees on E-mail. On USENET, where the idea of canceling another's article to retroactively moderate a group was highly reviled, people now find they can't use the net without it. Those who bristled at any government attempt to regulate internet traffic loudly petition their legislators for some law, any law it almost seems, against spam. Software engineers who would be fired for building a system that drops traffic on the floor without reporting the error change their mail systems to silently discard mail after mail.

It's amazing.

Dozens of anti-spam companies have sprung up in the past few years, offering a range of solutions including content-based filtering, blacklists, collaborative filtering, spamtrap detection and removal, e-stamps and some bulk detection. Remarkably, one new company called Habeas (trying an old idea of mine) is selling not a spam-blocking service, but a magic trademarked term that will let legitimate mailing list owners get past the collateral damage caused by existing spam-blocking tools. Their product exists to get you past the spam filters.

Attempts to nail down a definition of spam always seem to end in a quagmire. Each party to the debate seems determined to make sure that everything they think is spam is included in the definition, lest one spam slip through, but is of course also keen that nothing they don't think is spam gets blocked. Reconciliation seems nearly impossible.

The solutions

Here's a brief summary of some of the current active methods and proposals and how effective they are.

Content-based filters

Filters have a big advantage because they only need to be installed at the receiver. Some of the latest filtering tools, like SpamAssassin and the latest Bayesian algorithms, are doing quite well in terms of the amount of spam they stop. However, they all have "false positives," which means they misidentify real mail as spam and block it. Most filters have no way to tell that mail was sent in bulk (the key trait that marks a message as spam) and thus must rely on finding patterns common to spammers.

The hand-tuned filters need regular updating by people. The learning filters adjust automatically, but only by first letting some spam through.

In terms of effectiveness, these are second only to challenge/response tools.
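To make the learning-filter idea concrete, here is a minimal naive Bayes scorer in Python. It is a sketch under simplifying assumptions, not SpamAssassin's or any particular product's algorithm: the class and method names are hypothetical, tokens are just lower-cased words, and only tokens that appear in the message are scored. It also shows the false-positive problem in miniature, since a legitimate message that happens to use spammy words will score as spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-cased word tokens; real filters also look at headers, URLs, etc.
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical toy spam scorer; illustration only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # token -> docs containing it
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label is "spam" or "ham"; the filter learns only from mail it is shown.
        self.counts[label].update(set(tokenize(text)))
        self.docs[label] += 1

    def spam_probability(self, text):
        # Start from the prior odds, then add each token's log-likelihood ratio,
        # with add-one (Laplace) smoothing so unseen tokens don't zero things out.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap viagra buy now", "spam")
    f.train("meeting agenda for tuesday", "ham")
    print(round(f.spam_probability("buy cheap viagra"), 3))  # 0.889
    print(round(f.spam_probability("tuesday meeting"), 3))   # 0.2
```

Note that the filter knows nothing about volume: it judges each message only by its words, which is exactly why it must lean on patterns rather than on whether mail was sent in bulk.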

Blacklists

There are many competing blacklists, some run with strong ethics, others more dubious. All of them, however, rely on blocking mail from accused or confirmed spammers, with debate over the standards of proof and the definitions of spam. Some have gone so far as to blacklist entire ISPs or even nations. Almost everybody who runs a mail server, it seems, has a story about getting on a blacklist and having to figure out how to get off it, if they could get off at all.

Blacklists certainly do scare ISPs, and the blacklisting of open relay servers has, over the course of many years, done a lot to get people to close up their relays (at the cost of making it harder for roaming users to send E-mail).
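The check a blacklist performs is simple; the hard questions are who gets listed and how broadly. This Python sketch keeps a hypothetical local list of networks (using reserved documentation address ranges) and shows how listing a whole provider's block catches every customer in it, guilty or not. Many real blacklists are instead published and queried over DNS; that lookup step is left out of this sketch.

```python
import ipaddress

# Hypothetical entries, using reserved documentation ranges (RFC 5737).
BLACKLIST = [
    ipaddress.ip_network("192.0.2.25/32"),    # a single accused spam host
    ipaddress.ip_network("203.0.113.0/24"),   # an entire provider's block
]


def is_blacklisted(sender_ip, blacklist=BLACKLIST):
    # True if the connecting address falls inside any listed network.
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in net for net in blacklist)
```

With these entries, `is_blacklisted("203.0.113.77")` is true for any customer of that provider, which is the collateral damage described above.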

Collaborative filters

These filters, such as Vipul's Razor (now via CloudMark), rely on the first unlucky recipients of a spam reporting it to a central server. As the reports come in, the spam can be identified and rules can be written to block it. These are reasonably effective, and they go after bulk, which is good. They have fewer false positives if done well. They are very similar to...

Spamtrap filters

These are primarily used by Brightmail Inc., which is probably the largest commercial anti-spam operation. Brightmail maintains huge numbers of addresses seeded onto spammer lists. Messages arriving at those addresses are almost surely spam, so human beings look at them to derive rules that filter out the messages and retroactively delete them. There are very few false positives, but the approach is reportedly only about 60-70% effective.
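Since collaborative and spamtrap filters are so similar, one Python sketch can show both: a hypothetical central registry that marks a message's fingerprint as spam once enough distinct users report it, or as soon as a copy lands at a trap address. This is not Razor's or Brightmail's actual method; the names and the report threshold are assumptions, and an exact hash like this one is easily defeated by small random changes to each copy.

```python
import hashlib
import re


def fingerprint(body):
    # Normalize case and whitespace so trivially reformatted copies hash alike.
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintRegistry:
    """Hypothetical central server shared by many recipients."""

    def __init__(self, report_threshold=3):
        self.report_threshold = report_threshold
        self.reporters = {}   # fingerprint -> set of users who reported it
        self.trapped = set()  # fingerprints of mail that hit a spamtrap address

    def report(self, body, reporter):
        # Collaborative filtering: a recipient flags a message as spam.
        self.reporters.setdefault(fingerprint(body), set()).add(reporter)

    def spamtrap_hit(self, body):
        # Spamtrap filtering: mail to a seeded trap address is almost surely spam.
        self.trapped.add(fingerprint(body))

    def is_known_spam(self, body):
        fp = fingerprint(body)
        reports = len(self.reporters.get(fp, ()))
        return fp in self.trapped or reports >= self.report_threshold
```

Counting distinct reporters rather than raw reports keeps one angry user from blacklisting a message alone, which is one way such a system can keep false positives down.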

Challenge/Response

These are dear to my heart because, to the best of my knowledge, I wrote the first of them, a never-productized program called Viking-12. These tools know all your existing contacts, and when they receive mail from a new correspondent, they send out a "challenge" E-mail that asks the sender to do something to confirm they are a real human being and not a spammer. When the sender does, the held mail is automatically delivered, and the sender is on the good list from then on.

These tools are extremely effective; only a few spammers ever respond to the challenges. However, for various reasons some legitimate correspondents also don't respond, so it is necessary to skim the messages whose challenges went unanswered to find the real ones. Fortunately, those are few, and when challenge/response is combined with filtering tools, they usually stand out by their low spam scores. This can get the false positive rate extremely low.

Challenge/response without scanning the non-respondents blocks anonymous mail.

Today several companies offer them, and there are free software projects like TMDA that perform this function. A number of research projects have developed what could be termed "Turing tests" for the challenges, to ensure that the respondent is a human being.
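A minimal Python sketch of the challenge/response flow: known contacts go straight to the inbox, a stranger's mail is held under a random token that would go out in the challenge, and answering the challenge releases the mail and whitelists the sender. The class and method names are hypothetical, actually sending the challenge E-mail (and any Turing-test puzzle) is left out, and this is not how Viking-12 or TMDA are implemented.

```python
import secrets


class ChallengeResponseFilter:
    """Hypothetical challenge/response mailbox: whitelist, hold queue, tokens."""

    def __init__(self, known_contacts):
        self.whitelist = {a.lower() for a in known_contacts}
        self.held = {}    # token -> (sender, message) awaiting a response
        self.inbox = []

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            self.inbox.append(message)
            return None
        # Unknown sender: hold the mail; the token would go out in a challenge.
        token = secrets.token_hex(8)
        self.held[token] = (sender, message)
        return token

    def respond(self, token):
        # The sender answered the challenge: deliver and whitelist them.
        entry = self.held.pop(token, None)
        if entry is None:
            return False
        sender, message = entry
        self.whitelist.add(sender)
        self.inbox.append(message)
        return True

    def unanswered(self):
        # Held mail to skim by hand for legitimate non-respondents.
        return list(self.held.values())
```

The `unanswered()` list is the queue that must still be skimmed; dropping it unread is what makes challenge/response block anonymous mail.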

ISP bulk detection

A number of large ISPs, AOL in particular, have their own spam detectors, which rely on the fact that, due to their size, they have so many addresses that any spam attack is sure to arrive multiple times. They can thus detect these attacks and get rid of them. It's a good approach, but history shows some nasty false positives, with ordinary mailing lists being blocked for their volume. One notorious case involved AOL blocking acceptance letters from Harvard, which were sent out as a highly desired mass mailing.

This is a worthwhile technique but needs to be done with more care. Today's collateral damage is too high.
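The core of ISP-side bulk detection can be sketched in a few lines of Python: count how many distinct mailboxes receive an identical body, and flag it once the count crosses a threshold. The class name and the threshold are hypothetical, and real detectors presumably use fuzzier matching. The sketch makes the collateral-damage problem plain: it detects bulk, not unwantedness, so a wanted mass mailing trips it just as spam does.

```python
import hashlib
from collections import defaultdict


class BulkDetector:
    """Hypothetical ISP-side detector: counts identical bodies across mailboxes."""

    def __init__(self, bulk_threshold=100):
        self.bulk_threshold = bulk_threshold
        self.recipients = defaultdict(set)  # body digest -> distinct recipients

    def observe(self, body, recipient):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.recipients[digest].add(recipient)
        # Flags bulk, not spam: a desired mass mailing trips this too.
        return len(self.recipients[digest]) >= self.bulk_threshold
```

With `bulk_threshold=3`, the third copy of an identical acceptance letter is flagged, exactly like the third copy of a spam.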

Spam-banning laws

There have been many proposed anti-spam laws, and indeed around half of all U.S. states have such laws -- California has two! While most of these state laws will eventually be declared unconstitutional, since it is important that states not have the power to regulate something as geography-independent as E-mail, what can't be disputed is that they are having essentially no effect. There have been very few prosecutions under them, and spam levels continue to increase tremendously. Some hold out more hope for a U.S. federal law; however, an increasing percentage of spam comes from overseas. Advocates hope that even overseas spam can be stopped by a federal law if a U.S. connection can be found. Fellow EFF board member Larry Lessig argues that a law paying a bounty to those who hunt down the U.S. connection behind any spam lacking a mandatory label could do the trick.

Torts

There's been a fair bit of success against big institutional spammers in tort law. AOL and other companies have sued spamming companies under a variety of torts to shut them down. So far, alas, as in whack-a-mole, other spammers keep popping up. However, there have also been disturbing trends in the tort area. For example, Intel has used a legal doctrine called "trespass to chattels" to sue an ex-employee who spammed Intel's entire employee base with his grievances against the company. Unfortunately, the consequences of declaring E-mail to be trespass are even nastier than spam.

A large number of spams are already illegal, of course, amounting to confidence tricks or illegal selling of prescription drugs. Some of those laws are being used against the spammers too.

Opt-out lists

Recently, a federal do-not-call list was instituted in the USA to stop phone spam. Unfortunately, doing the same for E-mail is difficult and faces the same problems all laws would.

Hiding your address

The most common technique today seems to be hiding your E-mail address so that spammers can't harvest it. Unfortunately, by using dictionary attacks, spammers are managing to reach people who have never exposed their E-mail address in public. I consider this reluctance to ever reveal your E-mail address one of the greatest harms done by spammers, so I don't view hiding as a great solution to the problem.

Vigilante attack

Some anti-spammers have resorted to harassment and even extra-legal efforts against spammers. They make a great tale to tell, but so far they do not seem to be stemming the tide. And they bring all sorts of nasty backlash, since they amount to sinking to or below the level of the enemy.

Up and coming solutions

E-Stamps

This idea is regularly reinvented. I first proposed it myself back in 1995 and later came to reject it. The idea is to put a small cost (possibly one that is routinely not collected) on sending an E-mail, one that does not bother ordinary senders but keeps spam from being cost-effective. It has many advocates, and might work if it could be universally adopted. However, it suffers from a "you can't get there from here" problem. Until people are offering stamps with their E-mail, you can't demand them, and senders have little incentive to offer them if nobody is demanding them. This technique could only be built by piggybacking on other techniques, such as doing challenge/response and offering stamps as a means to bypass the challenge.
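The economics, and the piggybacking idea, fit in a short Python sketch. The prices and volumes are hypothetical (a 1-cent stamp, 50 messages a day for an ordinary sender versus 10 million for a spammer), and `has_valid_stamp` is simply assumed to be a flag; how stamps would be sold or verified is outside the sketch.

```python
def daily_postage_cents(messages_per_day, stamp_cents):
    # Total cost per day, in cents, at a flat per-message stamp price.
    return messages_per_day * stamp_cents


def handle_incoming(sender, whitelist, has_valid_stamp):
    # Piggybacking on challenge/response: known contacts are delivered,
    # a valid stamp buys a way past the challenge, everyone else is challenged.
    if sender in whitelist or has_valid_stamp:
        return "deliver"
    return "challenge"


if __name__ == "__main__":
    # Hypothetical figures: a 1-cent stamp, 50 messages/day vs. 10 million/day.
    print(daily_postage_cents(50, 1) / 100)          # 0.5 (dollars)
    print(daily_postage_cents(10_000_000, 1) / 100)  # 100000.0 (dollars)
```

Under these assumed figures the ordinary sender pays 50 cents a day while the spammer pays $100,000, and because stamps only bypass a challenge, nobody is forced to adopt them before others do.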

Throttling bulk volume from unaccountable addresses

My current favoured proposal, detailed here.
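The proposal itself is detailed elsewhere and not reproduced here, so the following Python sketch shows only the generic building block the heading names: a per-sender sliding-window limit on volume, with an exemption set standing in for "accountable" senders. Both the limiter design and the exemption mechanism are assumptions for illustration, not the proposal's actual rules.

```python
from collections import defaultdict, deque


class SenderThrottle:
    """Hypothetical sliding-window volume limit for unaccountable senders."""

    def __init__(self, max_messages, window_seconds, accountable=()):
        self.max_messages = max_messages
        self.window = window_seconds
        self.accountable = set(accountable)  # senders exempt from the limit
        self.sent = defaultdict(deque)       # sender -> recent send times

    def allow(self, sender, now):
        if sender in self.accountable:
            return True
        times = self.sent[sender]
        # Drop timestamps that have slid out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_messages:
            return False  # defer or reject: too much volume from this sender
        times.append(now)
        return True
```

Ordinary correspondents never come near such a limit, while an unaccountable sender trying to push out bulk volume is slowed to a trickle. Any per-sender limit like this one is, however, exactly what botnets sidestep by spreading the volume across many machines.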

Authentication

A number of people on anti-spam lists propose putting an authentication regime into E-mail, to the extent that one could refuse mail that was not digitally signed or otherwise verifiable. This would stop the forging of return addresses and the use of non-existent ones. A number of laws also address this.
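The "refuse anything unverifiable" policy can be sketched in Python, with one loud caveat: the standard library has no public-key signatures, so this uses an HMAC over a shared secret as a stand-in. Real proposals would use public-key signatures with keys vouched for by some authority, which is precisely the central management objected to below. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    # HMAC-SHA256 tag over the message; a stand-in for a real digital signature.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes) -> bool:
    # Constant-time comparison; altered mail produces a different tag.
    return hmac.compare_digest(sign(message, key), tag)


def accept_mail(message: bytes, tag, key: bytes) -> bool:
    # The policy in the text: refuse any mail that cannot be verified.
    return tag is not None and verify(message, tag, key)
```

Note what the policy costs: unsigned mail is rejected outright, so a correspondent with no key, anonymous or otherwise, simply cannot reach you.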

Such schemes unfortunately abandon the longtime goal of an open E-mail system without central management (such as a certificate authority) which allows anonymous speech.

The Future

The spam problem will get worse before it gets better. Spammers will try new and nastier techniques to get around the blockers, and the blockers will try new and improved technologies. Some spammers are already using worm programs, or exploiting security holes in widely deployed software, to take over other people's machines and make them do the spamming. It is rumoured that some spammers drive up to buildings with open wireless networks, of which there are many, and spam using the network inside. Such tactics can't be countered with blacklists, for example, though fortunately they are highly illegal.

However the spam problem is solved, or partially solved, it will remain fascinating as the internet community grapples with its first serious abuse issue from within. Most other abuse issues have involved outsiders trying to regulate the net, ranging from religious conservatives trying to ban smut to the RIAA trying to stop file-sharing. Spam has caused the network insiders themselves to seek to regulate it.

This is important because it will, of course, not be the last such issue. How we manage ourselves here will be an indicator of things to come.

I hope that as we do this we will remember the principles that make free societies free, and the principles upon which the internet was built. End-to-end, open designs. The ability for anybody to communicate with anybody, even without an invitation. Ubiquitous, deliberately low-cost communications that are not accounted for on a packet-by-packet basis.

In addition, we must realize that though all internet traffic flows over private property, this does not mean we should forget constitutional principles like the U.S. First Amendment. As I view it, the First Amendment isn't just the law, it's a good idea. We owe a duty to preserve the values it contains -- and the long history of how to protect them that is embodied in First Amendment jurisprudence -- as we architect the communications systems of the future. For in building and regulating the internet, we are doing no less than creating the primary platform for speech and the press of the new century.

That is not a task to be taken lightly.

30 Years Later

Here's an update from 2008, 30 years after that spam. Gary Thuerk has moved on, though he's not quite in hiding.

The spam war has continued as an arms race. Indeed, a controversial topic in the anti-spam community is whether the success of anti-spam efforts has simply caused the spammers to try harder, thus doing more damage. However, today's filters are indeed much better and many people have relatively spam-free mailboxes.

Spam laws (and the many other laws all big spammers break) have convicted some spammers, but much spamming effort has moved offshore, and into botnets. Most spam laws, for all the bluster about them, have done little to dent the problem.

Botnets

One sinister new development is the botnet. Many people believe that most spam is now being sent by zombie computers. These are ordinary PCs infected with code that puts them under the control of a criminal agent who sells their services to spammers and DDoSers. Some estimates suggest that around 30% of ordinary PCs are now botted. If it's not the guy to your left or the guy to your right, it's probably you.

Most zombies run Windows, both because of its security holes and its monoculture popularity. Botnets can be used for far more dangerous things than spam, so this is an area of genuine concern. Botnets also make it much harder to use volume-based techniques to prevent spam. You can no longer notice that one system is sending a million messages; instead, a million systems are each sending one message. This, in turn, has led to more use of content-based filtering -- sophisticated ways of blocking messages that mention Viagra -- which also means more false positives.

It also means that many ISPs have blocked individual PCs from being able to send mail, breaking the neutral internet we otherwise love.

Botnets are a symptom of something far more dangerous than spam. They are becoming major criminal enterprises, and the level of intrusion is staggering.

Leaving E-mail

Spam is one of the reasons people cite for leaving regular E-mail. I've heard teenagers say that "E-mail is what I use to communicate with my parents." They use other things, such as SMS and Facebook messaging, to communicate with peers. A departure from open E-mail would be sad and dangerous.

We don't want a system like Facebook to become the e-mail platform for people, for two big reasons. One, a single company would control the system. E-mail needs to be open and interconnecting, not proprietary. Two, what people like about social networking systems is that they separate their friends from the outside world. We could end up with systems where people accept e-mail only from their circle of contacts, and not from outside. It would be a shame if the internet's gift to communications turned out to be mail between closed circles of friends.

There were other bulk messages sent before this one that remained within a single computer's e-mail system. It was not unusual for such systems to include a "mail to all" function (usually privileged), which was sometimes abused. In 1971, an MIT user sent a message to everyone on CTSS that read "There is no way to peace. Peace is the way."