Saturday, March 04, 2006

the many faces of data mining

Data mining is a term that covers many different techniques for finding out what you want to know from a big vat of data. Anyone who has glanced over their own bank statement, looking for inconsistencies or unfamiliar entries, has had a taste of data mining; computers just expand how big and how complicated a statement you can search through.

There are whole scientific conferences about data mining, and some of its uses are quite good, for example identifying buying patterns that indicate your credit card may have been stolen. A close relation of mine did indeed have his credit card stolen (in a fancy-shmancy suburban mall), and before he even knew it was missing, the company was calling him to ask whether he had really gone on the suspicious spree that had just begun. The mess would have been much worse if that pattern-matching had not detected the theft within half an hour.

On the other hand, like any other useful technology, data mining can be used for not-so-good purposes. Take this family who got some of their assets frozen for--horrors!--paying off their credit card balance. How is that a terrorist activity? Did anyone call them first, as was done for the fraud alert? Most Americans put up with the typical commercial uses of data mining, such as having credit card companies produce neighborhood-based information on who is more likely to buy this or that magazine. But having such information gathered, monitored for terrorist activity, and used to suspend basic freedoms without a warrant or a clear method of redress? That's going too far.

I watched some of the current Attorney General's testimony before Senators Specter and Leahy. It was bad for my digestion, but revealing in spite of his insistent refrain that he couldn't say anything about what is actually being done. What did the AG reveal? Excuses, excuses, excuses, mostly based on technological mystification and fear-mongering.

The technological excuse is pathetic. The dredging up of our 9/11 tragedy to excuse poor governance is worse.

Let me explain:

On the technology side, first of all, how hard is it to get warrants up to three days after your system has detected signs of suspicious activity, from a court which almost never denies them?

Building those systems isn't easy, which is why Poindexter was shopping around for research teams from top American universities and commercial labs. Any computational smarts you put inside, from neural nets to probabilistic decision algorithms, would be hard to explain to a lawmaker, because not many lawmakers were computer science majors. To build such a complicated system, though, you first have to lay out the criteria that define whether it works. Say the designers want to cast the telephone net wider than a rule like "snoop on anyone who has called a suspicious number." Instead, they propose "snoop on anyone who has called anyone who has called one of those numbers." Logical, right? That part is not hard to discuss with a lawmaker or enforcement officer. Those officers might even supply some of the new rules, based on cases they themselves tracked down.
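To make the "called anyone who called" rule concrete, here is a minimal Python sketch of contact chaining over a call log. Everything in it is an assumption for illustration. The (caller, callee) log format, the function names, and the fictional 555 numbers are hypothetical, not anything the NSA or Poindexter's program is known to have used. The sketch starts at the flagged numbers and works backward through the log, one hop at a time.

```python
from collections import defaultdict

def callers_of(call_log):
    """Map each number to the set of numbers that called it."""
    incoming = defaultdict(set)
    for caller, callee in call_log:
        incoming[callee].add(caller)
    return incoming

def contact_chain(call_log, flagged, hops=2):
    """Return every number within `hops` calls of a flagged number.

    hops=1: anyone who called a flagged number.
    hops=2: also anyone who called one of those callers.
    """
    incoming = callers_of(call_log)
    flagged = set(flagged)
    found = set()
    frontier = flagged
    for _ in range(hops):
        next_frontier = set()
        for number in frontier:
            next_frontier |= incoming.get(number, set())
        # Skip numbers already flagged or already swept in at an earlier hop.
        next_frontier -= flagged | found
        found |= next_frontier
        frontier = next_frontier
    return found

# Hypothetical call log of (caller, callee) pairs, using fictional 555 numbers.
call_log = [
    ("555-0101", "555-0199"),  # 0101 called the flagged number
    ("555-0102", "555-0101"),  # 0102 called 0101
    ("555-0103", "555-0104"),  # unrelated call
]
flagged = {"555-0199"}
print(sorted(contact_chain(call_log, flagged, hops=1)))
print(sorted(contact_chain(call_log, flagged, hops=2)))
```

Raising `hops` from 1 to 2 adds 555-0102, who never called the flagged number at all. Each extra hop can sweep in many more people than the last, which is exactly the kind of criterion a lawmaker can and should weigh in on.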

Apologists might claim, oh gee, these new techniques give us so many new leads to pursue that we might lose them all if we wait to file the paperwork! But this is not 1920, okay? We're not talking manual typewriters. Who in this country has not received a form letter? The sender writes some text, the computer fills in the name, boom, there you go. That kind of technology is very easy to create. So the NSA would not need an army of junior scribes to handle due process on computer-discovered (data-mined) leads. It would just need to work with the FISA court along the way to establish what kinds of evidence are credibly "fishy." Then, if two hundred people fit fishy pattern A3 and 75 fit fishy pattern C2, warrant requests on those bases could be composed, printed, faxed, you name it, automatically. Each request could include as much detail as the case requires, or be streamlined according to the court's guidelines.
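The form-letter idea can be sketched in a few lines of Python using the standard library's `string.Template`. The template wording, the pattern descriptions, and the subject identifiers below are all hypothetical placeholders. A real application would follow whatever format the court actually requires.

```python
from string import Template

# Hypothetical request template; a real one would follow the court's required format.
REQUEST = Template(
    "Request for warrant: subject $subject\n"
    "Basis: matched pattern $pattern ($description)\n"
)

# Hypothetical pattern descriptions, agreed on in advance with the court.
PATTERNS = {
    "A3": "placeholder description of fishy pattern A3",
    "C2": "placeholder description of fishy pattern C2",
}

def draft_requests(matches, patterns):
    """Fill in one request per (subject, pattern_id) match found by the data-mining system."""
    return [
        REQUEST.substitute(subject=subject, pattern=pattern_id,
                           description=patterns[pattern_id])
        for subject, pattern_id in matches
    ]

# Hypothetical matches: two subjects flagged by two different patterns.
matches = [("subject-001", "A3"), ("subject-002", "C2")]
for letter in draft_requests(matches, PATTERNS):
    print(letter)
```

Scaling from two matches to two hundred only means a longer `matches` list. The paperwork costs the machine essentially nothing extra.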

So don't let some non-email-reading appointees snow you with their tales of advanced technology requiring a break with the rule of law, because they're just showing how little they understand the technology they're trying to use.

Television was once a new technology, and after it was widely adopted, some adjustments had to be made to integrate it into our system of legal protections against abuse of power. Didn't we have a recent case where the Bush administration wanted to bend those rules, too, and got caught paying to disguise policy advertisements as independent news? New computational discoveries can and absolutely should be used to help fight crime and defuse threats. These technologies are powerful and flexible, so they can and should be implemented in a way that respects the Constitution and the rule of law.

Finally, any time the Attorney General or any other appointee claims that the 9/11 attacks broke down our Constitution and showed it to be somehow out of date, remember: Senator Russ Feingold had it right. Bush is living in a pre-1776 world. No new search algorithm will make the President read the alerts his experts deliver to him, no data processing breakthrough will get him to leave his vacation and take the helm of a country under threat, unless he chooses to.

The warrantless wiretapping debate isn't just about new technology. It's about an elected leader scorning our civil liberties and our Constitution. This President cares even less about the rule of law than he does about doing his job.