Stay Free! magazine






Privacy, Shmivacy.
Corporations don't want to know US,
they want to know OUR DATA

[ by Bret Dawson ]

Chuck Petrakis is excited. "Let me give you an example," he says. "Here's Mrs. Smith, who's just been in to see her doctor for a checkup. Say you're her insurance company. You run her records, and the software tells you that she's likely to develop diabetes. Well, that gives you an opportunity to be really proactive."

The software Petrakis is talking about is a new package from Orlando's MedAI. "Chronic Disease Identification" (CDI), as it's called, represents the cutting edge of healthcare data systems; it crunches through huge volumes of medical records and, using artificial-intelligence algorithms, actually predicts life-threatening diseases. For American health insurers, this is big news.

"As an insurer," Petrakis continues, "you can say, 'Well, we need Mrs. Smith to do ten or fifteen things right now. We need her to do something about her weight, we need her to watch her intake of certain foods, we need her to make appointments for blood work,' and so on.

"Most insurance companies already have call centers set up. So you could have a nurse or a clinician phone up Mrs. Smith--to find out what she's been eating, to find out whether she's sticking to her program, to set up appointments for her, to monitor her progress," Petrakis says. "You don't have to wait for her to call; you can get moving right away. And with this software, you can take those steps before she develops diabetes. That's a much more cost-effective way to handle things."

As MedAI's sales director, Petrakis unveiled the software this spring at a medical industry conference. He won't talk numbers, but slyly assures me that CDI caught the undivided attention of several big insurance industry players. He's betting on a big sales year.

"Insurance companies," he says, "are already very proactive. Not just in terms of getting the best possible care but in doing that in the most cost-effective manner possible. And this software lets them take that to the next level."

This, I think, hanging up the phone, is an obscene understatement. Health care in the U.S. is an unabashedly moneymaking undertaking, and HMOs and insurance companies make no bones about their focus on the bottom line. Thus those call centers: When you're diagnosed with an expensive illness in the States, you can expect to be harassed about your lifestyle, lest you cost your insurer unbudgeted-for dollars. In HMO marketing-speak, this is variously called the "wellness" or "managed care" approach.

But CDI isn't just about managing the care of people with serious illnesses. It's about computers deciding who's going to get sick: not by examining patients but by playing statistical games with data. None of this is unique to MedAI; the direct-marketing industry has used similar techniques--massaging personality profiles out of large databases--for some time now.

The real issue here, though, is not that an insurance company can fool around with Mrs. Smith's medical records, or target her for junk mailings. It's about the way our society is undergoing a fundamental shift. It's about how institutions--banks, hospitals, governments--would now rather deal with our data than with us. Our digital profiles are taking on traits that may have nothing to do with our real-world selves, and these profiles are now beginning to live our lives for us.

Back in October, Undercurrents (the Canadian media and technology program I used to work for) sent me to Ottawa for a day to attend Privacy International's "Advanced Surveillance Technologies II" conference. In a plain hotel meeting room, I listened to lectures from and rubbed shoulders with some of the rising stars of privacy activism: Phil Agre, Simon Davies, David Banisar, Ann Cavoukian, that Garfinkel guy who writes for Wired. There was a lot of frightening talk, but also a lot of backpatting; people casually talked about privacy issues as "the new environmentalism."

It's true: Those people are certainly succeeding in raising public awareness, if only in the ozone-layer-good-styrofoam-cups-bad way that the environmental lobby has succeeded. So, in true Earth Day fashion, we're facing a rising tide of media scaremongering. Articles in Wired, The New York Times, Details, and a parade of blue-ribbon-bedecked websites all approach the issue in similar ways--they rattle off long lists of ways people can find out stuff about you. Your grocery-store loyalty card tracks where and when you bought what. Surveillance cameras follow you almost everywhere you go. Your bank pays attention to where you use ATMs. Your credit file is widely distributed. And, uh, that sucks.

What's largely escaped the mass-media take on privacy is something called "database enrichment," or "merge-and-purge." This is a big oversight because merge-and-purge turns a person's dataset into a fully formed electronic character sketch.

Here's how it works: Take two or more databases and combine them. (For this example, let's use U.S. News and World Report's subscriber list and a "wealthy householders" list available from any number of national brokers.) Keep only the names that appear on both lists. Now combine the new database with another--say, a list of households with no children--and, again, keep only the names that appear on both lists. As the process goes on, and the list is enriched with more and more information, a sort of superrecord emerges. It's now a list of profiles, a set of personalities, if you will. (This is no idle example, by the way. U.S. News is currently engaged in a multitrial legal battle with a Virginia man named Ram Avrahami over precisely this practice.)
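The intersection step just described is simple enough to sketch. What follows is a minimal Python illustration, not any broker's actual process: the three lists and their contents are hypothetical, and the `normalize` and `merge_purge` helpers are invented for this example. Real match keys are far more elaborate than lowercasing names and collapsing whitespace.

```python
# Hypothetical sketch of "keep only the names that appear on both lists,"
# repeated across several lists. Not any real broker's matching logic.

def normalize(name: str, address: str) -> tuple[str, str]:
    """Build a crude match key: lowercase and collapse runs of whitespace."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))

def merge_purge(*lists: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Intersect several (name, address) lists, keeping records found on all of them."""
    keyed = [{normalize(n, a) for n, a in lst} for lst in lists]
    result = keyed[0]
    for keys in keyed[1:]:
        result &= keys  # each pass "purges" anyone missing from the next list
    return result

subscribers = [("Jane Doe", "12 Elm St"), ("John Roe", "4 Oak Ave"), ("Ann Poe", "9 Pine Rd")]
wealthy = [("jane doe", "12 Elm  St"), ("Ann Poe", "9 Pine Rd")]
no_children = [("Jane Doe", "12 elm st"), ("Bob Moe", "1 Ash Ct")]

print(merge_purge(subscribers, wealthy, no_children))
```

Each added list shrinks the result while telling you one more thing about everyone who survives it; here, only Jane Doe is a subscriber, wealthy, and childless.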

This business--the creation and sale of electronic profiles--has turned firms such as Chicago's Metromail into multimillion-dollar operations. Metromail got its start back in 1948 as a printer for mass mailings--a "lettershop." Over the following decades, the company branched into direct mail and database marketing, selling mailing lists and list-processing services. Today, it's a multifaceted data-and-marketing powerhouse, with $281 million in annual sales (U.S.), and more than 3,000 employees. It collects and purchases personal data from a variety of sources--public records, surveys, warranty-card registrations, the U.S. Postal Service's change-of-address files, and so on--and sells the data in a variety of forms: mailing lists, reference services, and, of course, merge-and-purge processing. The company's central database now contains records on nearly every household in the United States. If you have a list of names, and you want to attach income, marital status, home ownership, or nearly any other kind of data to those names--in short, you want to buy a set of profiles--you go to a company like Metromail.
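Attaching outside data to a list of names amounts to a lookup against a reference file keyed by household. The sketch below assumes a hypothetical `REFERENCE` table and made-up field names (`income_band`, `married`, `owns_home`); it says nothing about how Metromail actually stores or matches its records.

```python
# Hypothetical reference file keyed by (name, address); fields are illustrative.
REFERENCE = {
    ("jane doe", "12 elm st"): {"income_band": "75-100k", "married": True, "owns_home": True},
    ("john roe", "4 oak ave"): {"income_band": "25-50k", "married": False, "owns_home": False},
}

def append_data(customers, reference, fields):
    """Attach the requested fields to each customer; unmatched fields come back as None."""
    profiles = []
    for name, address in customers:
        key = (name.strip().lower(), address.strip().lower())
        match = reference.get(key, {})
        profile = {"name": name, "address": address}
        for f in fields:
            profile[f] = match.get(f)  # None when the reference file has no record
        profiles.append(profile)
    return profiles

customers = [("Jane Doe", "12 Elm St"), ("Sam Zoe", "7 Birch Ln")]
for p in append_data(customers, REFERENCE, ["income_band", "owns_home"]):
    print(p)
```

Note that a customer the reference file doesn't recognize still comes back as a profile, only with empty (`None`) fields.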

Tim Fitzpatrick, the firm's VP of corporate communications, is a little defensive on the phone. He's heard the privacy lobby's concerns, and he's got answers.

"It's very important," he says, "to keep one thing in mind. This is about finding out what your customers have in common. I mean, marketers may care about knowing me, Tim Fitzpatrick, but they care a lot more about knowing about a group of Tim Fitzpatricks. This is about learning what your customers' needs are, so that you can do a better job of serving them. And this is the fundamental truth: customers' needs are being satisfied. People are voting with their wallets."

If I understand him correctly, Fitzpatrick is saying this: yes, many businesses have detailed files about their customers. Yes, it's easy to enhance those files with outside data sources. No, it's not the Big-Brother threat the privacy lobby would have us believe. Nobody at your grocery store is looking at your individual purchase history and saying, "Uh oh. That's the third time Bret's bought Preparation H this month." Customer data is important--and useful--only in the aggregate. The data traders of the direct-marketing industry (and now, the healthcare industry) aren't interested in knowing you at all. They're happy just deciding which aggregate your profile belongs in, and what it says about your future behavior.

Roger Clarke runs an information-systems consulting business and is a visiting fellow at the Australian National University in Canberra. In 1994, he wrote an article for The Information Society entitled "The Digital Persona and its Application to Data Surveillance." The piece introduced one of the most interesting--and most widely ignored--concepts in all of privacy activism. Here's the basic idea:

Think of the digital persona as the shadow you cast into cyberspace. It's a profile of you that grows more detailed as databases are merged and as you interact with ever more systems. In time, this persona develops its own personality; it makes certain kinds of purchases at certain times on certain days of the week, and it has an employment history. It's been preapproved for a new credit card, and it subscribes to three or four magazines. It owes a bit of money on its Visa, and a lot on its student loans. It uses Sprint for long distance.

It's a chilling thought. A profile with this kind of detail gives away a lot about what sort of person you are. But there's more to it. Your digital persona doesn't just describe you, it is you.

Public life doesn't happen in streets and offices and shops anymore because the arena of public life--birth, school, work, death--has moved. Public life is data. Making a purchase, applying for a job, voting, placing a phone call, buying insurance--these are activities our digital personae now do for us by proxy.

"You can trace it back quite some distance," Clarke says, "through two trends. The first is the increasing intensity of data exchange between people and institutions. The second, which doesn't necessarily involve technology, is a trend toward centralization of authority. In the past, my bank manager had to know me before deciding whether I was worthy of a loan. Well, in practice, that authority is no longer in that manager's hands."

Decision-making processes at financial institutions have become so automatic, so data-centric, so uninterested in the personal details of their customers' transactions, that there's no longer any need for physical branches. Our digital persona is now so detailed that machines are in a position to make decisions about our creditworthiness.

"We sometimes say that traditional database analysts sit in smart air," chuckles Rick Makos, the VP of sales and marketing for the Toronto-based Angoss Knowledge Engineering. "They have to build a model, then test it, over and over again. If you come up with the model that works, you must be sitting in smarter air. But it's kind of a backward process. You shoot, then you aim. Our approach is more data-driven. It's more based on the reality that's in the data."

Makos is arguably at the cutting edge of "data mining": a new kind of information analysis that makes plain old merge-and-purge look positively timid by comparison. Data mining uses artificial intelligence software to hunt for patterns (in marketing-speak, "actionable characteristics") in large databases. The basic theory is simple: any large set of data holds patterns, some of which are obvious, and some of which may not be. The goal is to have a computer find those nonobvious connections and then exploit them to your financial advantage. (For example, early data miners at a grocery chain found that people who buy diapers also tend to buy a lot of beer. The result was "Parties for Parents.")
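The diapers-and-beer finding is the classic case for association rules, which score pairs of items by support (how often they appear together across all baskets) and confidence (how often buying one comes with buying the other). Here is a minimal sketch over six hypothetical shopping baskets; the `min_support` and `min_confidence` thresholds are arbitrary assumptions, and nothing here claims to be the grocery chain's actual method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"diapers", "beer", "wipes"},
    {"milk", "bread"},
]

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (if_bought, also_bought, support, confidence) for item pairs
    that clear both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)

for rule in pair_rules(baskets):
    print(rule)
```

On these six baskets the program reports diapers-to-beer in both directions, and it also turns up chips-to-beer and wipes-to-diapers, rules nobody asked it to look for.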

What really distinguishes data mining from ordinary database analysis is that data mining systems don't need hypotheses. They don't need to be asked, "Is it true that people who buy diapers buy more beer?" They're designed to answer tougher, more open-ended questions like, "Who buys a lot of beer?"
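A crude way to see what "no hypothesis" means in practice: hand the program a table and an outcome, and let it try every attribute to find the one that best separates heavy beer buyers from everyone else. This is the single step that decision-tree generators repeat over and over. The sketch uses Gini impurity as its scoring rule, which is one common choice and an assumption here, not a description of KnowledgeSeeker; the customer records are hypothetical.

```python
from collections import defaultdict

# Hypothetical customer records; "heavy_beer" is the outcome being asked about.
customers = [
    {"has_infant": True,  "age_band": "25-34", "region": "north", "heavy_beer": True},
    {"has_infant": True,  "age_band": "35-44", "region": "south", "heavy_beer": True},
    {"has_infant": False, "age_band": "25-34", "region": "north", "heavy_beer": False},
    {"has_infant": True,  "age_band": "25-34", "region": "south", "heavy_beer": True},
    {"has_infant": False, "age_band": "45-54", "region": "north", "heavy_beer": False},
    {"has_infant": False, "age_band": "45-54", "region": "north", "heavy_beer": True},
]

def gini(rows, target):
    """Gini impurity of a yes/no outcome: 0 means everyone in the group agrees."""
    if not rows:
        return 0.0
    p = sum(row[target] for row in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows, target):
    """Try every attribute (no hypothesis supplied) and return the one whose
    groups are most homogeneous in the target, with its impurity reduction."""
    base = gini(rows, target)
    best_attr, best_gain = None, float("-inf")
    for attr in rows[0]:
        if attr == target:
            continue
        groups = defaultdict(list)
        for row in rows:
            groups[row[attr]].append(row)
        weighted = sum(len(g) / len(rows) * gini(g, target) for g in groups.values())
        if base - weighted > best_gain:
            best_attr, best_gain = attr, base - weighted
    return best_attr, best_gain

print(best_split(customers, "heavy_beer"))
```

On this toy data the program settles on `has_infant` without ever being told to look there. That is the appeal, and also the weakness: it will just as happily latch onto a pattern that six rows produced by chance.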

Practical data mining is only a few years old. It grew out of the wave of academic research into artificial intelligence that started in the early 1980s. At the time, algorithms for machine learning--decision-tree generators and so on--existed only as theoretical concepts. But as high-test processors got cheaper, data-analysis firms began to write AI into their custom software. Then, approximately five years ago, some marketing genius slapped the name "data mining" on the process, and a new industry was born. Today, with the proliferation of Sun workstations and Pentium Pro-based PCs, it has begun to show up in everyday business.

Makos has been in the trenches since the early days, doing database-query demonstrations for hardware companies, and then data mining for banks as the owner of his own consulting firm. He joined Angoss in October of last year, working on the company's pride and joy--a data-mining package called KnowledgeSeeker.

Among his biggest clients is the Canadian Imperial Bank of Commerce's "risk management" division. The CIBC uses the package to track the bank's mortgage customers, finding, in Makos's words, "which buckets of behavior yield what results."

Risk management being what it is, data mining at CIBC came to revolve around predicting which types of mortgage accounts were most likely to slip into delinquency. Surprisingly, perhaps, the bank found that people with a history of late mortgage payments were not those who tended to default. It was those who'd always paid on time but were suddenly late with a single check who tended to fall into the deep end of the financial pool. For Jim Carswell, the bank's managing director of credit scoring, the result meant a big shift of priorities.

"When we first discovered this, we thought it was an error," he says. "But then it dawned on us: Almost everyone falls behind once in a while, if you're talking about credit cards or phone bills. But people who take their mortgages seriously take them really seriously. Someone like that isn't going to miss a payment unless they're in some difficulty."

What does this mean for individual mortgage holders? Well, he notes, if you've got a spotless record, and, for once, you're two days late with a payment, you can expect the risk-management division to target you much more aggressively than ever before. "It might make the difference between a form letter and a phone call. Or it might mean that we'd call you today, rather than three days from now."
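Carswell's description boils down to a rule that ranks a first slip from a spotless account above yet another late check from a habitual laggard. A sketch of such a rule follows; the `MortgageAccount` fields, the 20% `habitual_cutoff`, and the three contact tiers are hypothetical, drawn loosely from his "form letter versus phone call" and "today rather than three days from now" examples, and are not CIBC's actual model.

```python
from dataclasses import dataclass, field

@dataclass
class MortgageAccount:
    account_id: str
    days_late_now: int
    # One flag per past payment, oldest first; True means that payment was late.
    late_history: list = field(default_factory=list)

def contact_action(account: MortgageAccount, habitual_cutoff: float = 0.2) -> str:
    """Hypothetical escalation rule: the cleaner the record, the faster the call."""
    if account.days_late_now <= 0:
        return "none"
    history = account.late_history
    late_rate = sum(history) / len(history) if history else 0.0
    if late_rate == 0.0:
        return "phone call today"          # first slip from a spotless record
    if late_rate < habitual_cutoff:
        return "phone call in three days"  # occasionally late
    return "form letter"                   # habitually late

accounts = [
    MortgageAccount("A-1", 2, [False] * 24),
    MortgageAccount("B-2", 2, [True] + [False] * 23),
    MortgageAccount("C-3", 2, [True, False] * 12),
]
for acct in accounts:
    print(acct.account_id, contact_action(acct))
```

The rule does exactly what grates on the author: the customer with the best record gets the most urgent attention.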

For reasons I don't quite understand, this grates on me. If my payment record has been flawless, I figure I'm owed the benefit of the doubt when my check's a day or two late. And I certainly don't want the laggard who's always late to have an easier time of it than me, no matter what KnowledgeSeeker thinks I'm going to do. "I understand your point," Carswell says, "and I can see why some people might be uncomfortable with that. But the reality is this: even in this higher-risk group, people still pay you. My position is that, if we talk to people, work things out, maybe spot a problem early, that's better for everyone."

In early March of 1997, a Boston-area woman named Wendy Eldredge found a mysterious envelope in her mailbox. There was no return address on the envelope. Just a typewritten address, a California postmark, and a standard-issue 32-cent U.S. stamp. Inside was a full-page ad, torn from a newspaper, for a weight-loss pill called "Berry Trim Plus." At the top of the page, someone had written, "Wendy, try it. It works!" in blue pen.

"I was just crushed," she says. "Crushed. I mean, I've had two kids. I could lose twenty pounds. But that, oh, man. I was crying leaving the post office, and my four-year-old was asking, 'Mommy, what's wrong?' and I didn't even know what to say to her.

"So then I thought, 'Okay. Some wacko's bought himself a mailing list.' So I got on the phone to Health Labs of North America [the company selling Berry Trim Plus] to ask them if they knew that someone was using their ad like that. And the woman I spoke to said, 'Oh, I know. This is one of our advertising campaigns.' Well, I just blew up. I said, 'How dare you insult me like that?' And she said, 'We're trying to help you.'"

This didn't sit well either, and Eldredge launched a private campaign against Health Labs. By the time it was over, the Boston Globe had run two separate pieces about her situation, and she'd become something of a local celebrity.

I've never met anyone who actually liked junk mail, so it's hardly surprising that Eldredge reacted so badly to the mailing. But it's telling, I think, to look at why she reacted the way she did. Eldredge is a self-described weight-loss candidate, and Health Labs was selling a weight-loss product, so it wasn't unreasonable for her name to appear on the mailing list. It had, no doubt, been cross-referenced across myriad other databases to make sure that her income, housing, marital status, and occupation fit the ideal demographic for Berry Trim Plus. In short, the data was correct.

The problem, I would argue, is that data is not a very good tool for describing real people. The digital persona is a complex thing; as it is merged-and-purged, mined and manipulated, it acquires character traits that may have nothing to do with its real-world namesake. In Eldredge's case, the error was only humiliating, but it's not tough to imagine a situation with much uglier consequences.

Before CDI, Florida's MedAI cut its data-mining teeth with something called the "Myocardial Infarction Predictor." (A myocardial infarction is a heart attack.) This is a software system designed to be used by emergency room doctors to diagnose patients suffering from severe chest pain.

According to the company's president, Steve Epstein, the package grew out of a genuine shortcoming in ER procedures.

"This is the thing," he says. "People tend to be over-conservative with chest pain. They'll spend huge amounts of money running really expensive tests trying to rule out heart attacks." In practical terms, he says, this means that a majority of the people admitted from emergency rooms into coronary-care units are not actually having heart attacks. The MI Predictor's job is to spot those people before they're admitted to intensive care, before those costly tests are performed.

MedAI isn't actively marketing the MI Predictor. The system was developed and tested, and is now being used, at a single hospital in Florida. Regulatory difficulties will probably keep it there for the near future. But it was really intended as a pilot, one that set the stage for the nationwide rollout of CDI. That system's goal, Epstein promises, is nothing short of revolutionary change.

"The old techniques are just not good enough. Before artificial intelligence, you only had actuarials--you might only know that five people out of a given population are going to develop an illness. Now, you can know which five it's going to be."
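Epstein's contrast between knowing "five people out of a given population" and knowing "which five" can be made concrete with a few lines of arithmetic. The sketch below assumes a hypothetical model that gives each of ten people a risk score; the names and numbers are invented, and treating the scores as independent probabilities is a simplifying assumption.

```python
import math

# Hypothetical risk scores from an unspecified predictive model.
risk = {"Ada": 0.72, "Ben": 0.15, "Cal": 0.58, "Dee": 0.40, "Eve": 0.81,
        "Fay": 0.22, "Gus": 0.65, "Hal": 0.19, "Ivy": 0.50, "Joe": 0.78}

# Actuarial view: an expected number of cases for the group as a whole.
expected_cases = sum(risk.values())

# "Which five" view: rank individuals and flag the five highest scores.
flagged = sorted(risk, key=risk.get, reverse=True)[:5]

# How many of the flagged five are expected to fall ill, and the chance that
# all five do (assuming, for simplicity, that the scores are independent).
expected_hits = sum(risk[name] for name in flagged)
p_all_flagged_sick = math.prod(risk[name] for name in flagged)

print(round(expected_cases, 2), flagged, round(expected_hits, 2), round(p_all_flagged_sick, 3))
```

Under these made-up numbers, the group expects five cases in all, the five flagged people account for an expected 3.54 of them, and the chance that all five flagged people actually fall ill is only about 17%. A ranking tells you who looks most likely, not who will.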

This is pure medicine-by-statistics, a frightening parallel universe where the digital persona's health determines the real person's treatment.

"The digital persona is a bit like a voodoo doll," Roger Clarke says. "A kind of crude model of you that can be used, from a distance, to put a curse on you."

As comforting as it might be to think so, the dangers of the digital persona--its arbitrariness, its inaccuracy--are not just by-products of well-meaning data manipulation. In fact, there's an entire industry--something called "segmentation"--whose sole job is to tack arbitrary personality types onto individuals' datasets.

The idea is to cut the population up into a few dozen categories, and people inside each will have similar incomes, tastes, residences, and behaviors. Find out which segments like your products and you'll know which people to chase with your new marketing campaign.
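One common way to "cut the population up" is clustering, and k-means is the textbook example: each household is assigned to its nearest segment center, the centers are moved to the average of their members, and the process repeats. This sketch is an assumption about technique, not a claim about how any vendor builds its segments; it uses eight hypothetical households described by just two features and asks for three segments rather than a few dozen.

```python
import math

# Hypothetical households: (median income in $10,000s, median age in decades).
households = [(3.0, 2.5), (3.2, 2.7), (2.8, 2.4),
              (9.5, 4.8), (10.0, 5.1), (9.8, 4.9),
              (5.5, 7.0), (5.0, 7.2)]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain k-means from given starting centroids; returns (centroids, labels)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [tuple(sum(axis) / len(members) for axis in zip(*members)) if members
                     else centroids[i]
                     for i, members in enumerate(clusters)]
    return centroids, [nearest(p, centroids) for p in points]

# Seed with one household from each visible group so the toy run is deterministic.
centroids, labels = kmeans(households, [households[0], households[3], households[6]])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

The algorithm always returns tidy segments, whether or not the households inside them have anything meaningful in common.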

One of the biggest players in this field is "MicroVision," a product of Equifax National Decision Systems. (Yes, the same Equifax that maintains your credit rating.) The system fits every household in the U.S. into one of fifty demographic categories, each with a nickname like "A Good Step Forward" or "Metro Mix."

If you can give MicroVision a zipcode, the system will tell you which category it fits, and will happily provide lists of other zipcodes where people behave in the same way. And if you want names for mailing lists, well, you can have those, too. By selling these profiles, MicroVision does more than $40 million worth of business each year.
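The zipcode service described above is, at bottom, a table lookup run in both directions. The sketch uses placeholder zipcodes and a made-up `SEGMENT_OF` table; only the segment numbers and names come from the MicroVision descriptions quoted in this article, and, following the "Anomalies" description, that segment returns no lookalikes.

```python
# Placeholder zipcodes with made-up segment assignments; only the segment
# numbers and names are taken from the MicroVision descriptions in this article.
SEGMENT_OF = {"11111": 9, "22222": 14, "33333": 9, "44444": 14, "55555": 9, "66666": 49}
SEGMENT_NAMES = {9: "Building a Home Life", 14: "Middle Years", 49: "Anomalies"}
ANOMALIES = 49

def lookalike_zipcodes(zipcode: str) -> tuple[str, list[str]]:
    """Return the segment name for a zipcode plus other zipcodes in the same segment."""
    segment = SEGMENT_OF[zipcode]
    if segment == ANOMALIES:
        # Segment #49 is described as not homogeneous, so it yields no lookalikes.
        return SEGMENT_NAMES[segment], []
    peers = sorted(z for z, s in SEGMENT_OF.items() if s == segment and z != zipcode)
    return SEGMENT_NAMES[segment], peers

print(lookalike_zipcodes("11111"))
```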

MicroVision assembles its segments by merging hundreds of data sources, among them the 1990 U.S. census (which gives average ages, incomes, and home ownership), consumer and financial data from the Equifax credit-rating databases, and a lifestyle survey of 20,000 people that asks about matters as specific as restaurants, oil changes, long-distance companies, and TV shows.

The research is unquestionably thorough, but the results are an absolute howl to read. Here's a taste:

People in category #9, "Building a Home Life," are supposedly do-it-yourselfers who spend a good deal of money on home improvement and car repair projects. "They also tend to eat dinner at upscale restaurants and watch college football bowl games on television."

Those in segment #14, "Middle Years," "are the most likely to be a member of a frequent-flyer program, maintain a municipal bond fund, own a hot tub and have a gold Mastercard. They also like to read travel magazines and listen to all-news radio." "Stars and Stripes" eat at Taco Bell and play lots of Nintendo.

The system is positively elegant in its self-assured completeness. But as Eldredge discovered, people who look identical in data often won't behave anything like each other in real life. These are the great thorns in the side of the marketing industry: subtlety, unpredictability, fickleness. They're also the qualities that make us human.

Consider the official description of MicroVision segment #49, "Anomalies":

"Functionally, these zipcodes represent a small number of unusual areas which should not be included in a marketing plan. While data exists for the zipcodes in this segment, by definition, they are not homogeneous and cannot be expected to behave in a consistent manner."

In the midst of all this, the privacy lobby remains woefully focused on soft-headed scare stories: We're being tracked. We're under surveillance. Big Brother is watching us. There oughta be a law.

The harpings of the digerati notwithstanding, it's more complex than that. We're seeing a profound shift in who represents "us" in public space, and the consequences could be utterly devastating. Mrs. Smith, Wendy Eldredge, the mortgage customers of CIBC, these are people who've lost control of their virtual personae, who've really had their privacy violated.

In the September 25, 1996, issue of his email journal, NetFuture, Stephen Talbott--a prominent critic of technology--described privacy as a fundamental respect for the sovereignty of others over their own affairs: as "a certain willingness to lower one's eyes and hold sacred what one knows about the other person." I like this definition. But it flies in the face of just about everything our economy holds sacred. As Talbott wrote in his FAQ on Computerized Technology and Human Responsibility:

"It is possible--although it will be a tremendous stretch--for us to extend our gestures of human respect to the abstract, placeless, and timeless data representations of other people. But it isn't conceivable that we will succeed in this greater challenge while failing the lesser and more familiar one. We cannot--as programmers, application users, corporate employees, consumers--enlarge our respect for persons to embrace data when we are forgetting what respect for persons means in the first place."

Big Brother isn't watching us at all. He's playing with our voodoo dolls.