Proposals for Implementing the Terrorism Information Awareness System

August 7, 2003

Paul Rosenzweig

Former Visiting Fellow, The Heritage Foundation (2009-2017)

The Terrorism Information Awareness (TIA) program under development by the Defense Advanced Research Projects Agency (DARPA) at the Department of Defense has generated substantial controversy. That controversy has led the Secretary of Defense to convene an advisory committee, the Technology and Privacy Advisory Committee (TAPAC), to provide him with advice on how, if at all, the TIA program should proceed. No doubt Congress will (and, indeed, ought to) weigh in as well.

It is therefore appropriate to begin asking a practical, concrete question: Can TIA be developed, deployed, implemented, and operated in a manner that allows it to be used as an effective anti-terrorism tool while ensuring that there is minimal risk that use of the TIA tool-set will infringe upon American civil liberties?

Some believe it is not possible. Critics of the TIA program believe that it is a "Big Brother" project that ought to be abandoned. They begin with the truism that no technology is foolproof: TIA may generate errors and mistakes will be made. And, as with the development of any new technology, risks exist for the misuse and abuse of the new tools being developed. From this, critics conclude that the risks of potential error or abuse are so great that all development of TIA should be abandoned. To buttress their claim that TIA should be abandoned, these critics parade a host of unanswered questions. Among them: Who will operate the system? What will the oversight be? What will be the collateral consequences for individuals identified as terrorist suspects?1

These questions are posed as if they have no answers when all that is true is that for a system under development, they have no answer yet. The same is true of any new government program; thus, we know that these implementation issues are generally capable of being resolved.1

In fact, there are a number of analogous oversight and implementation structures already in existence that can be borrowed and suitably modified to the new technology. Thus, TIA can and should be developed if the technology proves usable. It can be done in a manner that renders it effective, while posing minimal risks to American liberties, if the system is crafted carefully with built-in safeguards to check the possibilities of error or abuse. This paper is an effort to sketch out precisely what those safeguards ought to be. In summary, they are:

Congressional authorization should be required before data mining technology (also known as Knowledge Discovery (KD) technology) is deployed;
KD technology should be used to examine individual subjects only in compliance with internal guidelines and only with a system that "builds in" existing legal limitations on access to third-party data;
KD technology should be used to examine terrorist patterns only if each pattern query is authorized by a Senate-confirmed official using a system that: a) allows only for the initial examination of government databases, and b) disaggregates individual identifying information from the pattern analysis;
Protection of individual anonymity by ensuring that individual identities are not disclosed without the approval of a federal judge;
A statutory or regulatory requirement that the only consequence of identification by pattern analysis is additional investigation;
Provision of a robust legal mechanism for the correction of false positive identifications;
Heightened accountability and oversight, including internal policy controls and training, executive branch administrative oversight, enhanced congressional oversight, and civil and criminal penalties for abuse; and
Finally, absolute statutory prohibition on the use of KD technology for non-terrorism investigations.

In short, TIA can be safely implemented. Failing to make the effort poses grave risks and is an irresponsible abdication of responsibility.2 As six former top-ranking professionals in America's security services recently observed, we face two problems--both a need for better analysis and, more critically, "improved espionage, to provide the essential missing intelligence." In their view, while there was "certainly a lack of dot-connecting before September 11," the more critical failure was that "[t]here were too few useful dots."3 TIA technology can help to answer both of these needs.

Indeed, resistance to new technology poses practical dangers. As the Congressional Joint Inquiry into the events of September 11 pointed out in noting systemic failures that played a role in the inability to prevent the terrorist attacks:

4. Finding: While technology remains one of this nation's greatest advantages, it has not been fully and most effectively applied in support of U.S. counterterrorism efforts. Persistent problems in this area included a lack of collaboration between Intelligence Community agencies [and] a reluctance to develop and implement new technical capabilities aggressively . . . .4

It is important not to repeat that mistake.

Understanding the Question: What TIA Technology Really Means

Any implementation structure must be based on an accurate conception of precisely what systems are being put into operation. Unfortunately, the popular understanding of the TIA program is wildly at odds with reality.5 Few people, including many of TIA's critics, seem to understand what the TIA program entails or how it would work.

TIA Is a Broad Research Program

TIA is, in fact, a broad research program with several dozen different components--programs ranging, for example, from efforts to develop machine language capabilities to translate Arabic directly into English, to ones dedicated to the development of information technologies that will permit creation of a secure Virtual Private Network where classified information can be exchanged without threat of compromise.6 A number of these programs--which no public observer has critiqued as unacceptable--have nonetheless been delayed or questioned because they fit within the broad TIA umbrella.7 Thus, the first goal of any statutory or regulatory structure implementing TIA is to clearly define its various components and focus the systematic controls and limitations on the components that are most problematic and most in need of oversight.

Another aspect of the TIA program that often gets obscured is that it is a research program, not a development or implementation program. DARPA specializes in "outside the box," speculative technological research and development. When its research pays off (as in the development of the Internet), the benefits can be spectacular. But with far greater frequency its research programs produce much more modest (and sometimes non-existent) results.8 While the prospect that some of the more advanced technologies will be developed is sufficiently great that it should be taken seriously, it bears repeating that the research project very well might not succeed. The success of the endeavor has yet to be determined--but at this stage the significant step is to recognize the research nature of the program, and thus to avoid strangling nascent technology in its crib by imposing unreasonable and unrealistic "proving requirements" long before the technology has had a chance to be explored.9

To be sure, the ultimate efficacy of the technology developed is a vital antecedent question. If the technology proves not to work--if, for example, it produces 95 percent false positives in a test environment--then all questions of implementation may be moot. For no one favors deploying a new technology--especially one that threatens liberty--if it is ineffective. There may, however, be potentially divergent definitions of "effectiveness." Such a definition requires both an evaluation of the consequences of a false positive and an evaluation of the consequences of failing to implement the technology. If the consequences of a false positive are relatively modest (as this paper suggests), and if the mechanisms to correct false positives are robust (again, as recommended in this paper), then we might accept a higher false positive rate precisely because the consequences of failing to use TIA technology (if it proves effective) could be so catastrophic. In other words, we might accept 1,000 false positives if the only consequence is heightened surveillance and the benefit gained is a 50 percent chance of preventing the next smallpox pandemic attack. The vital research question, as yet unanswered, is the actual utility of the system and the precise probabilities of its error rates.
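
The weighing described in this paragraph can be made concrete with a small arithmetic sketch. It is illustrative only: the function names, the cost units, and every number passed to them are hypothetical assumptions, and assigning real values to them is exactly the policy judgment the paragraph leaves open.

```python
# Illustrative arithmetic only. All names and numbers are hypothetical
# assumptions, not measured properties of TIA or of any real system.

def share_of_flags_that_are_false(false_positives: int, true_positives: int) -> float:
    """Fraction of all flagged patterns that turn out to be innocent."""
    flagged = false_positives + true_positives
    if flagged == 0:
        raise ValueError("no patterns were flagged")
    return false_positives / flagged


def expected_net_benefit(
    false_positives: int,
    cost_per_false_positive: float,
    prevention_probability: float,
    harm_averted: float,
) -> float:
    """Expected harm averted minus the total burden placed on innocent people.

    Both quantities must be expressed in the same (arbitrary) unit.
    """
    if not 0.0 <= prevention_probability <= 1.0:
        raise ValueError("prevention_probability must lie between 0 and 1")
    total_burden = false_positives * cost_per_false_positive
    return prevention_probability * harm_averted - total_burden


if __name__ == "__main__":
    # A test run in which 95 of 100 flags are wrong.
    print(share_of_flags_that_are_false(95, 5))
    # 1,000 false positives, each with a modest consequence, weighed against
    # a 50 percent chance of averting a very large harm.
    print(expected_net_benefit(1000, 1.0, 0.5, 1_000_000.0))
```

The sketch shows why the two definitions of "effectiveness" can diverge: a query can have a very high share of false flags and still show a positive expected net benefit, provided the consequence of each false positive is small and correctable.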

All of which is merely another way of saying that the implementation of any new technology must be cautious and carefully weigh costs and benefits: "Any new intrusion must be justified by a demonstration of its effectiveness in diminishing the threat. If the new system works poorly . . . it is suspect. Conversely, if there is a close 'fit' between the technology and the threat (that is, for example, if it is accurate and useful in predicting or thwarting terror), the technology should be more willingly embraced."10 For purposes of the implementation questions discussed in this paper, it is assumed that TIA--and in particular pattern-recognition technology--has demonstrated its general utility in rigorous testing. But it must be recognized that this assumption is no more than an assumption--it has yet to be borne out by practice.11

Understanding Knowledge Discovery Technology

After making the important definitional distinctions already noted, the legal control structure developed should thereafter focus on the aspects of TIA technology that have generated the most controversy. Most emblematic of those technologies is the program known as Evidence Extraction and Link Detection--the technology that, if developed and deployed, will enable the use of hardware/software mechanisms to access disparate databases. Closely related to this technology are projects such as Proximity and the Group Detection Algorithm,12 which might develop means of analyzing data to create a graphic depiction of social connections between various individuals who might be linked in committing, for example, terrorist acts. The characterization of these technologies as "data mining" is clearly a misnomer, as the name leads to a factually inaccurate inference: that data is removed from a source (in the same way that "gold mining" implies the removal of gold). For simplicity's sake and because those with the greatest expertise in the field use a different nomenclature, this paper refers to this grouping of technologies as Knowledge Discovery ("KD") technology.13

Within the TIA data collection research program there are three further preliminary distinctions that must be understood before the proper legal structure for implementing KD technology can be developed:

First, KD technology can be used to access data that resides either in government databases containing information already lawfully collected by the government, or in private databases containing information collected by non-governmental (either commercial or non-commercial) entities. Though there will be difficult issues at the margin--e.g., databases merging private and government information or information requests that require access to both government and private databases--the utility of the distinction is clear. Inquiries that access only government databases pose far fewer civil liberties concerns than those that seek access to private databases.14

Second, KD technology can be used to access both foreign and domestic databases. Those databases can contain information about both "foreign persons"15 and American citizens. Again, the distinctions drawn (which may also suffer from marginal definition concerns relating to merged databases) have powerful utility. Using KD technology, for example, to secure information about al-Qaeda terrorists from the foreign database of a terrorist enemy is substantially less problematic than using the same technology to access information about an American citizen in a private, commercial American credit card database.

Third, and most significant, KD technology may be used in two distinct ways--means that have been described as "subject-oriented" and "pattern-based" data inquiries. Here, the need for a precise distinction is vital.

Through a subject-oriented query of databases containing information, KD technology might, by focusing on a specific individual (identified by name or other unique identifying characteristic), attempt to gain a more complete understanding of a suspect, his activities, and his relationships with others. In other words, KD technology would be used to access databases with information about a particular individual (e.g., government driver's license records, telephone toll records, or airplane flight records) to develop an understanding of his conduct. From this, one could develop a greater understanding of those he associated with--who shares his apartment? who does he call long distance? where do the funds in his bank come from?--that would produce additional leads and could, potentially, allow the development of a complete picture of an individual suspect, his associates, and their potential connection to terrorist activity.

By contrast, a pattern-based query is not focused on a specific uniquely identifiable individual or individuals. Rather, using existing intelligence data, intelligence analysts will develop detailed models of potential terrorist activities. The models will be developed in an iterative process: A group of analysts intended to replicate the conduct of potential terrorists (called "Red Teams" because American enemies traditionally are colored red on charts and maps) will conduct operations in a virtual (i.e., fictitious and artificial) world of cyberspace, creating data transactions (by securing driver's licenses, boarding airplanes, purchasing goods, etc.). They will repeat these operations for as many different terrorist scenarios as their imaginations will support. Then a separate team of analysts (the "Blue Team"), using the same intelligence data and their own imaginations, will try to develop database search inquiries that are capable of identifying the terrorist operation patterns created in virtual data space with a high degree of accuracy and distinguishing them from patterns of innocent transactional activity.

Thus, the utility of the effort will turn on its ability to accurately sift terrorist patterns from innocent patterns of activity. This accuracy may be usefully defined as the degree to which a database search inquiry identifies false positives (i.e., denotes as a potential terrorist pattern a pattern that is wholly innocent of terrorist connection) and false negatives (i.e., fails to identify an actual terrorist pattern). When (and if) a pattern-based data query proves successful in virtual, artificial cyberspace it might then (with the controls identified below) be used to examine real-world data.
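
A minimal sketch of how such accuracy might be scored in the artificial test environment follows. The record layout and names (`SyntheticPattern`, `planted_by_red_team`, `flagged_by_query`) are hypothetical and are not drawn from any DARPA specification; the sketch assumes only that the test data records which patterns the Red Team planted.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SyntheticPattern:
    """One pattern of transactions in the artificial test data (hypothetical layout)."""
    pattern_id: str
    planted_by_red_team: bool  # ground truth, known because the data is synthetic
    flagged_by_query: bool     # what the Blue Team's search inquiry reported


def error_rates(patterns: Iterable[SyntheticPattern]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for one search inquiry.

    False positive rate: share of innocent patterns that were flagged.
    False negative rate: share of planted patterns that were missed.
    """
    patterns = list(patterns)
    innocent = [p for p in patterns if not p.planted_by_red_team]
    planted = [p for p in patterns if p.planted_by_red_team]
    false_positives = sum(1 for p in innocent if p.flagged_by_query)
    false_negatives = sum(1 for p in planted if not p.flagged_by_query)
    fpr = false_positives / len(innocent) if innocent else 0.0
    fnr = false_negatives / len(planted) if planted else 0.0
    return fpr, fnr


if __name__ == "__main__":
    sample = [
        SyntheticPattern("a", planted_by_red_team=False, flagged_by_query=False),
        SyntheticPattern("b", planted_by_red_team=False, flagged_by_query=True),
        SyntheticPattern("c", planted_by_red_team=True, flagged_by_query=True),
        SyntheticPattern("d", planted_by_red_team=True, flagged_by_query=False),
    ]
    print(error_rates(sample))
```

Scoring of this kind is possible only in the virtual setting, where the ground truth is known by construction; in real-world data the false negative rate in particular cannot be observed directly.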

Here, too, the distinction drawn is one that is likely to have legal and policy significance. A subject-oriented inquiry using KD technology raises questions about enhancing government efficiency, but it is, fundamentally, little different from existing law enforcement or intelligence practice. This sort of inquiry is most akin to a classic law enforcement "lead" inquiry. Some predication for suspicion about a particular individual (or individuals) exists and that predication leads law enforcement or intelligence officials to develop a more thorough understanding of the individual's actions. In a murder case, for example, a detective starts with a list of known associates of the victim. Working outward from that list the detective develops information about each individual (and other individuals whose identity is revealed through association to those under initial inquiry) in an effort to determine who the killer is. KD subject-oriented technology does little more than render this process of expanding inquiry more effective (so that the "detective" misses fewer promising leads) and more efficient (so that the "detective" can conduct the inquiry more quickly).

Pattern-based inquiries, by contrast, are fundamentally a new conception--they focus not on predication about an individual but on predication about a pattern of conduct based upon analysis of the pattern, not the individual. This new conception of how law enforcement and/or intelligence agencies may develop investigative leads raises far more questions: Use of KD technology in this way is not merely an efficiency enhancement; it has the potential for substantially new and different intrusions into American privacy.16

The Scope of the Problem: What Controls Are Necessary?

Any structure of control and implementation will need to appreciate and accommodate these significant distinctions. Within the context of KD technology these distinctions give us the following general outline of principles that should guide any structure.

For foreign uses:

Uses of KD technology against foreign targets are not generally subject to the same concerns as domestic uses. As a general rule such foreign uses should follow existing regulations and limitations on the collection of foreign intelligence.17

For domestic uses:

Subject-based queries of government databases are the least problematic, as they involve queries of a form traditional in law enforcement or intelligence gathering on pre-existing databases to which the government already has almost unlimited access.
Subject-based queries of private, non-government databases, though posing somewhat greater risks of privacy intrusion, are, if suitably constrained, likely to pose no significant risk to civil liberty. Such queries, again, mirror traditional methods of inquiry and a wide-ranging consensus already exists that they are well supported in law and policy.18
Because of the new and unique nature of pattern-based queries, the use of such inquiries on government-only databases poses special concerns (though less than those posed by inquiries of private databases) and should be subject to particular limitations.
Pattern-based queries in private databases are the most problematic and pose the greatest potential for misuse, mistake, and abuse. Thus, the greatest limitations and most careful checks need to be implemented before such technology is deployed.
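
The four domestic contexts just listed can be sketched as a simple lookup. The type names and the numeric ranks are hypothetical assumptions added for illustration; the text supplies only the ordering, from least to most problematic.

```python
from enum import Enum


class QueryKind(Enum):
    SUBJECT = "subject-based"
    PATTERN = "pattern-based"


class DataSource(Enum):
    GOVERNMENT = "government database"
    PRIVATE = "private database"


# Rank 1 is least problematic and rank 4 most problematic. The numbers are an
# illustrative assumption; only the relative order comes from the list above.
CONCERN_RANK = {
    (QueryKind.SUBJECT, DataSource.GOVERNMENT): 1,
    (QueryKind.SUBJECT, DataSource.PRIVATE): 2,
    (QueryKind.PATTERN, DataSource.GOVERNMENT): 3,
    (QueryKind.PATTERN, DataSource.PRIVATE): 4,
}


def concern_rank(kind: QueryKind, source: DataSource) -> int:
    """Look up how much scrutiny a proposed domestic query should attract."""
    return CONCERN_RANK[(kind, source)]


if __name__ == "__main__":
    for (kind, source), rank in sorted(CONCERN_RANK.items(), key=lambda item: item[1]):
        print(rank, kind.value, "query of a", source.value)
```

A control structure built on this ordering would attach progressively stricter authorization requirements as the rank rises.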

The next question is: What are the mechanisms and structures that can be put in place that will protect civil liberties concerns regarding the use of KD technology in these four domestic contexts, while allowing the use of KD in narrow areas where it will do significant good in combating terrorism?19

As the prospect of new and different information technologies has grown, it has become apparent that existing laws do not adequately address limits on their use or establish suitable control structures for their deployment. Neither the Fourth Amendment nor privacy laws appear to have any general applicability to KD technology used on American databases or American citizens.20 An appropriate legislatively or regulatorily created structure would have components addressing the following areas:

Pre-use restrictions
What limits should be imposed before a search of domestic non-governmental databases is conducted using KD technology?
Identifying the subject
What limits should be imposed before the results of a pattern-based search query, and in particular, uniquely individual identification data derived from a pattern-based query, are disclosed to government authorities?
Data Use and Error Correction
What consequences should arise from a positive identification? What programs should be implemented to ensure that false positives that mistakenly identify individuals as potential terror suspects are quickly and permanently corrected?
Accountability and Oversight
What structures and systems need to be put in place to ensure that any KD-based system is used accountably, for appropriate purposes, and that mistakes are corrected and misuse or abuse suitably punished?
Mission Creep
How might a KD-based system be best structured to ensure that it is used only to fight terrorism and that it does not become a general law enforcement tool for whatever "war" on crime is currently in fashion?21

Mechanisms and Structures for Controlling the Domestic Use of KD Technology

Any suitable structure for controlling and limiting the use of a new investigative tool must answer two questions: what standard should be applied? and who decides? In other words, in considering how best to effectively use a new technology such as KD while ensuring that it is used within bounds that respect American liberties, we must ask the fundamental questions of when it is appropriate to use the technology (the standards question) and who is in charge (the authorization question). The control mechanisms envisioned in this paper propose answers to these two questions based upon: a) when the authorization is sought (before or after the data query), and b) which sort of query is proposed (a subject-oriented or a pattern-based inquiry).22

Pre-Use Authorization

As already noted, the general effectiveness of KD technology--and particularly pattern-based recognition technology--will need to be demonstrated before KD technology is authorized and implemented. But a general demonstration of effectiveness is different from an assessment of the utility of any particular search query. Both aspects of effectiveness will need to be addressed before KD technology is deployed.

With respect to the broader question of the general utility of KD technology, in light of the underlying concerns over the extent of government power, the best answer is that formal congressional consideration and authorization of the use of KD technology, following a full public debate, should be required before the system is deployed.23 Before any KD technology--with both great potential utility and significant potential for abuse--is implemented, it ought to be affirmatively approved by the American people's representatives.

In making this decision, Congress should ensure that it bases its judgment on a sound technical understanding of the program. It might, for example, have the technology reviewed by an independent board consisting of non-governmental experts. Staffing for such an oversight board could come from experts identified by the National Academy of Sciences or some other respected scientific body, with additional representatives from the law enforcement and intelligence communities.

The remainder of this section addresses principally the second question: What pre-use authorization should be required before KD queries are deployed in specific cases? More specifically, what standards should be applied to assess the appropriateness of a particular proposed use, and who will decide whether or not the standard has been met?

Subject-Oriented Inquiries
In general, a subject-oriented inquiry is closely akin to the "normal" operation of routine law enforcement or intelligence activities--based upon some predication, government agents have a factual basis (albeit one insufficient to be conclusive) for believing that a particular, identifiable individual or individuals are engaging or will be engaging in criminal/terrorist activity.

In the law enforcement context, the standard for the initiation of an investigation and for conducting further examination of a particular individual is minimal. No judicial authorization is needed for a government agent to initiate, for example, surveillance of a suspected drug dealer. All that is generally required is some executive determination of the general reliability of the source of the predication and, within the context of a particular agency, approval for initiation of an investigation from some executive authority. For example, the initiation of various anti-terror investigations is governed by guidelines promulgated by the Attorney General (in the case of domestic investigations) and an Executive Order (in the case of foreign intelligence investigations).24 By contrast, routine drug or robbery investigations are often authorized at the local level by a unit supervisor.

To the extent that any such subject-oriented investigation requires access to data regarding the subject of the investigation, the same executive authorization suffices to permit a search of existing government databases for information regarding the individual. Such searches might include, for example, a check of the National Crime Information Center for data on prior criminal convictions and a search of a state driver's license database for information on residence. By contrast, when the information is being collected from third-party data holders (such as telephone records or credit-card information) the government inquiry must proceed by way of subpoena--a method that affords the data holder the opportunity to object to production of the data if it is unduly burdensome or if the government seeks irrelevant information.25

With respect to subject-oriented queries of this sort, KD technology is best understood as enhancing the efficiency of the information gathering process. When an individual subject is identified, the use of KD systems simply accelerates the speed at which the information may be collected. Instead of taking hundreds of FBI agents several months to track the activity of 19 terrorists, the same information might (if KD works) be gathered in a few minutes, thereby affording the government the opportunity to short-circuit a terrorist strike proactively, rather than gather information reactively after the attack has already occurred. The best example of the potential success of this process is private technology that has already been used to identify non-obvious relationships among individuals. Using that technology and publicly available information, one could have readily "linked" the 19 terrorists.26 Notably, the same analysis using non-public, government-held data identified the same 19 terrorists and 11 additional suspects.27

The only conceivable objection to the use of KD technology in this form is an objection against enhanced government efficiency. This is not an objection to be slighted as negligible--the entire conception of separation of powers and checks and balances that inheres in our constitutional system is premised on the notion that barriers to effective government action enhance individual liberty.28 It cannot, however, be gainsaid that such principles stem from a time when the potential consequences of government inaction were less significant--in the current environment, the consequences might, literally, be the destruction of a city. Thus, in the unique context of terrorist threats that hold the potential for mass destruction, it appears advisable to relax our disapproval of an efficient government if suitable controls can be implemented.29

In principle, then, the subject-based inquiry using KD technology appears to be a welcome prospect. But the technology should not be deployed without limit, even in this traditional context. Rather, KD should be implemented only in a manner that mirrors existing legal restrictions on the government's ability to access data about private individuals--nothing more and nothing less. In other words, the implementing regulations and/or legislation should require that:

Authorization for a subject-oriented use of KD technology be based only upon adequate predication to believe that criminal or terrorist activity is occurring or is likely to occur;
Any authorization be issued only on the basis of internal guidelines;
Any authorization be embodied in writing, the records of which will be available for oversight and review;
Authorization may be granted only by a supervisory level officer within the investigating agency;
KD technology that is developed for the use of accessing data held by non-government third parties must incorporate hard-wired programming that prevents access to data in the possession of non-government data holders without notice to and the consent of the data holder; and
In the event the non-government data holder interposes an objection to access to the data, KD technology must provide a mechanism (whether electronically or, if necessary, through more conventional means) for the objection to be referred to an Article III judge (that is, one appointed by the President, confirmed by the Senate, and holding life tenure) for resolution using the same standards and principles that would govern such a data request today if it were made by means of a paper subpoena rather than, in effect, an electronic KD subpoena.30
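
One way to picture how these requirements might be "built in" to the system is as a gate that every subject-oriented request must pass before any data is touched. The sketch is hypothetical throughout: the field names, the `Decision` values, and the order of the checks are assumptions made for illustration, not a description of any actual or proposed system.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REFER_TO_JUDGE = "refer objection to an Article III judge"


@dataclass(frozen=True)
class SubjectQueryRequest:
    """A proposed subject-oriented KD query (hypothetical record layout)."""
    has_adequate_predication: bool
    follows_internal_guidelines: bool
    written_authorization_on_file: bool
    approved_by_supervisor: bool
    touches_third_party_data: bool
    holder_notified: bool = False
    holder_consented: bool = False
    holder_objected: bool = False


def review(request: SubjectQueryRequest) -> Decision:
    """Apply the listed requirements in order and return the outcome."""
    internal_requirements_met = all([
        request.has_adequate_predication,
        request.follows_internal_guidelines,
        request.written_authorization_on_file,
        request.approved_by_supervisor,
    ])
    if not internal_requirements_met:
        return Decision.DENY
    if not request.touches_third_party_data:
        # Government-held data: executive authorization suffices.
        return Decision.ALLOW
    if not request.holder_notified:
        # Hard-wired rule: no access to third-party data without notice.
        return Decision.DENY
    if request.holder_objected:
        # The system does not resolve the dispute itself; a judge does.
        return Decision.REFER_TO_JUDGE
    return Decision.ALLOW if request.holder_consented else Decision.DENY
```

The point of expressing the rules this way is that the default outcome is denial: access to third-party data proceeds only when every condition is affirmatively satisfied, and a data holder's objection routes the request out of the system and into court.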

Pattern-Based Queries
Unlike subject-oriented queries, pattern-based KD queries have no ready analog in contemporary law enforcement practice. Like the investigation of "tips" about a particular subject, pattern analysis is predictive in nature. Both may look to collect information about possible future criminal or terrorist conduct. But unlike subject-oriented analysis, which takes an identity and works outward to examine whether a criminal pattern exists, pattern-based queries presuppose the ability to successfully identify criminal patterns and propose, as it were, to work inward from the pattern to individual identities.

As has already been noted, the general utility and prospects for success of this sort of analysis are by no means certain. Many critics have suggested that it will routinely generate far too many false positives and miss too many terrorists.31 Researchers in the field believe, however, that because of the high correlation of data variables that are indicative of terrorist activity, a sufficient number of variables can be used in any model to create relational inferences and substantially reduce the incidence of false positives.32

But even assuming the utility of the practice generically, KD pattern-based technology will also need to be vetted and approved on a more particularized basis. In other words, just because testing has established that some pattern-based inquiries may generate useful information about potential terrorist threats, it does not necessarily follow that any particular pattern-based query is well-constructed and ought to be deployed.

Plainly, the definition of a well-constructed query is subject to substantial debate. It reflects a policy judgment as to the degree to which false positives are tolerable in a new investigative system and an assessment of the consequences arising from tightening the inquiry to avoid false positives with a concomitant increase in the number of false negatives or in the failure to detect potential terrorist activity because the technology has not been deployed at all. Defining that balance point is difficult and it would be hubristic for this paper to offer a resolution. It would, however, be appropriate for our elected representatives to make that value judgment by providing guidance on the characteristics of an effective technology that would identify KD pattern search methodology that ought to be deployed.33

The other question--and perhaps the more important one--is identifying who must approve the use of a particular pattern inquiry before it is deployed. Who, in other words, must make the decision that a pattern query is likely to be effective given the guidance provided in law or regulation? Here, many potential decision makers can be identified and the choice among them is, essentially, one of preference and practicality.

Operational constraints limit the ability for particularized oversight from outside the executive branch. Yet unfettered discretion is unacceptable. Particular pattern-based queries must be deemed effective prior to use--but requiring approval by either the legislative or judicial branch prior to their use would be cumbersome (and judges may be particularly ill-suited to the scientific judgments that will be required). Given that post-use structures (described below) can provide significant protection of individual liberty, the best analogous model would appear to be the requirement for "high-level" approval before a particular pattern-based query is deployed. As with, for example, the use of a subpoena to seek records from a press organization,34 no action should be taken without the authorization of a Senate-confirmed officer of the Department of Justice, Homeland Security, FBI, or CIA (such as an Assistant Attorney General or the FBI Director).35

If this official deemed a particular pattern-based query sufficiently well-designed that its use was warranted, that determination should suffice to allow its deployment in the real-world data stream. The question then would arise as to how the response to the query should be structured in the real world and how those results should be handled.

Identifying the Subject: Breaking the Anonymity Barrier After a Data Query

One can envision any number of ways in which the response to a data query would be structured. It could, for example, be a comprehensive response, with uniquely identifying individual data about each of the individuals whose activities fit the pattern tested in the real-world data space. Equally plausibly, the response could be simply a number, representing the number of positive matches. The former would be overly intrusive; the latter, meaningless. Indeed, the response structure might vary with different applications and uses for the technology. The general question, however, will remain the same: What response structure is best at enhancing liberty values while providing useful information?

Privacy and Anonymity
To answer that question one must have in mind the nature of the liberty value at risk. For some, the liberty in question is an absolute prohibition on government scrutiny--a pure privacy model, if you will. The law, however, has long recognized that the Constitution does not protect this conception of privacy against government intrusion when the object of the intrusion is the disclosure of criminal conduct.

Rather, since 1967, the Supreme Court has recognized that the Fourth Amendment protects only those things in which someone has a "reasonable expectation of privacy" and, concurrently, that anything one exposes to the public (i.e., places in public view or gives to others outside of his own personal domain) is not something in which he has a "reasonable" expectation of privacy--that is, a legally enforceable right to prohibit others from accessing or using what one has exposed.36 Thus, an individual's banking activity, credit card purchases, flight itineraries, and charitable donations are information that the government may access because the individual has voluntarily provided it to a third party. According to the Supreme Court, no one has any constitutionally based enforceable expectation of privacy in them. The individual who is the original source of this information cannot complain when another entity gives it to the government. Some thoughtful scholars have criticized this line of cases, but it has been fairly well settled for decades.

Congress, of course, may augment the protections that the Constitution provides, and it has with respect to certain information. There are privacy laws restricting the dissemination of data held by banks, credit companies, and the like.37 But in almost all of these laws (the Census being a notable exception),38 the privacy protections are good only as against other private parties; they yield to criminal, national security, and foreign intelligence investigations.

That balance should change, at least somewhat, when the question becomes the implementation of KD pattern-based technology. Most Americans readily understand that their individual privacy is subject to limits when there is predication to believe that they have committed criminal or terrorist acts. The existence of such predication is taken as a justification for breaching the wall of privacy. Where KD technology differs, however, is in the prospect that the search engine seeking patterns in data space will necessarily be obliged to scan and, if there is no match, discard data relating to the conduct of many innocent individuals as to whom there is no predication at all. The difference is between asking for the bank records of Al Capone and asking for all the records of all the customers at Capone's bank. If KD pattern-search technology is to be used, it can only be used on the understanding that data about innocents will be examined--how then to square that reality with the liberty and privacy expectations of Americans?

The answer lies in the concept of anonymity.39 American understanding of liberty interests necessarily acknowledges that the personal data of those who have not committed any criminal offense can be collected for legitimate governmental purposes. Typically, outside the criminal context, such collection is done in the aggregate and under a general promise that uniquely identifying individual information will not be disclosed. Think, for example, of the Census data collected in the aggregate and never disclosed, or of the IRS tax data collected on an individual basis, reported publicly in the aggregate, and only disclosed outside of the IRS with the approval of a federal judge based upon a showing of need.40

What these examples demonstrate is not so much that our conception of liberty is based upon absolute privacy expectations,41 but rather that government impingement on our liberty will occur only with good cause. In the context of a criminal or terror investigation, we expect that the spotlight of scrutiny will not turn upon us individually without some very good reason. Sampling of data that is discarded poses substantially less threat to individual liberty than sampling data and retaining the sample or using unverified samples to subject an individual to heightened scrutiny.

This idea of preserving anonymity unless and until a good reason for breaching the anonymity barrier arises can and must be hardwired into any KD technology. The protections should take three forms:

First, the system must be built to ensure that no data scanned for match to a pattern inquiry is retained if the data does not fit the pattern. The algorithms must scan the requisite databases for information matching the pattern query but absolutely no data that fails to match the pattern query should be retained in any independent government database.
Second, the system must also ensure that data scanned that does not match the pattern query is never presented to an analyst for examination. The initial scanning must be automated and structured to prevent unauthorized access of this sort.
Third, and more important, the implementing legislation or regulations should mandate that any KD technology developed should structure the pattern query response in a manner that:
uses the search query in successive iterations, looking first only in government or other public databases (or perhaps subsets of government or public databases) and permitting the query to be used on private, non-government databases only after pattern-based inquiries on government and public databases provide the basis for expanding the universe of data examined to private databases; and
initially disaggregates the pattern-based inquiry results from the identity of the individual(s) who match the pattern. In other words the response provided must include only information about the activities identified that form the alleged pattern of terrorist activity--absolutely no uniquely identifying individual information should, at the outset, be provided.

These latter requirements provide a dual structural mechanism that will have the effect of preserving civil liberty and privacy. The successive iteration requirement (sometimes called the use of primary and secondary databases)42 will ensure that those databases containing information that individuals consider the most private will not be examined absent a suitable showing of cause. The disaggregation requirement will have the effect of absolutely preserving anonymity at the initial stage of any pattern-based inquiry. Thus, those developing KD should be required to construct a system that initially searches only government and public databases and disaggregates individual identifiers from pattern-based information. Only after the pattern is independently deemed to warrant further investigation should the individual identity be disclosed and more sensitive non-government databases examined. At the first iteration, those using KD technology will be made aware only of a potentially suspicious pattern, but not of the identity of the individual whose pattern has been identified.43
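The three protections described above can be illustrated with a minimal sketch. It is not drawn from the TIA program or from Genisys; every name in it (`Record`, `AnonymousMatch`, `scan`, `successive_query`, the keyed-hash pseudonym) is a hypothetical chosen for illustration, and a "pattern" is assumed, for simplicity, to be a set of activity labels that must all appear for one individual.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    person_id: str  # uniquely identifying information (hypothetical field)
    activity: str   # a single observed activity label (hypothetical field)


@dataclass(frozen=True)
class AnonymousMatch:
    token: str         # opaque pseudonym; carries no readable identity
    activities: tuple  # only the activities that form the alleged pattern


def pseudonym(person_id, key):
    # Keyed hash so the token cannot be recomputed without the key,
    # which is assumed to be held by a trusted third party.
    return hmac.new(key, person_id.encode(), hashlib.sha256).hexdigest()[:16]


def scan(records, pattern, key):
    """Automated scan: returns disaggregated matches and nothing else."""
    pattern = frozenset(pattern)
    grouped = {}
    for record in records:
        grouped.setdefault(record.person_id, set()).add(record.activity)
    matches = []
    for person_id, activities in grouped.items():
        if pattern <= activities:
            matches.append(
                AnonymousMatch(pseudonym(person_id, key), tuple(sorted(pattern)))
            )
    # 'grouped' is a local working set; non-matching data is never
    # returned, stored, or shown to an analyst.
    return matches


def successive_query(public, private, pattern, key, expansion_approved=False):
    """First pass on public data only; private data needs a separate approval."""
    first_pass = scan(public, pattern, key)
    if not first_pass or not expansion_approved:
        return first_pass
    return scan(list(public) + list(private), pattern, key)
```

In this sketch the analyst sees only `AnonymousMatch` objects. The private data sources are touched only when the first pass has produced a match and an expansion has been separately approved; how that approval is granted is a policy choice the code does not settle.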

Thus, one aspect of the TIA program, the Genisys Privacy Protection program, is to be welcomed by everyone on both sides of the discussion. The Genisys program is developing filters and other protections to keep a person's identity separate from the data being evaluated for potential terrorist threats. In authorizing KD technology, Congress could mandate that a trusted third-party governmental agency (as suggested below in the related context of accountability, perhaps the Inspector General of the agency where KD technology is housed) rather than the organization's database administrator control these protections. This methodology would ensure that the privacy protections are not being circumvented.44

We must also realize that the use of enhanced technology does not uniformly impose costs on liberty--there are potential benefits as well. One could, conceivably, adopt a purely preventative mode in responding to terrorist threats, enhancing security at airports, government buildings, and the like and relying on increased physical intrusions and identity cards as a means of forestalling the next attack. But if we are not to condemn ourselves to the "citadelization" of America, we must also consider a different tack--the use of predictive technologies to attempt to anticipate and thwart terrorist attacks before they occur. These technologies come at some potential costs to liberty, but with the very real prospects of gains in other forms of liberty. Absolute protection for electronic privacy necessarily leads to even less physical privacy.

Judicial Review
The final step--and the critical one for striking a suitable balance between the use of technology to enhance the ability to predict terrorist attacks while adequately protecting liberty interests--lies in the interposition of a judicial officer before the barrier of anonymity is broken. In other words, once a pattern of potential terror activity is identified, the user of a KD pattern search ought to be obliged to present that information to a court--in effect, the equivalent of the court currently used to implement the Foreign Intelligence Surveillance Act (FISA), if not that court itself.45 Only after the judge determines that a basis exists for concluding that the pattern identified is, in fact, a pattern of potential terrorist activity and not merely a coincidental pattern of innocent activity ought the identity of the actor whose pattern is in question be provided to law enforcement or intelligence officials. This mechanism--sometimes called selective or progressive revelation--should be built into the KD technology from the beginning, not added as a supplement after the structure of the technology has been substantially developed.46
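One way to picture selective revelation built into the technology from the beginning is a lookup that simply cannot return an identity without a matching judicial authorization. The sketch below is illustrative only; `CourtOrder` and `IdentityVault` are hypothetical names, and a real system would need authenticated orders and audit logging that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CourtOrder:
    token: str      # the pseudonymous match the order applies to
    approved: bool  # the judge's determination


class IdentityVault:
    """Holds the token-to-identity mapping apart from the analysts."""

    def __init__(self):
        self._identities = {}

    def register(self, token, identity):
        self._identities[token] = identity

    def reveal(self, token, order):
        # Identity is released only under an approved order for this token.
        if order is None or not order.approved or order.token != token:
            raise PermissionError("no judicial authorization for this token")
        return self._identities[token]
```

Because the check lives inside `reveal`, there is no code path that returns a name without an order, which is the sense in which the protection is structural rather than a supplement added later.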

It is absolutely vital to both the success of the KD technology and the protection of civil liberty that this judgment be made by a neutral third party. And because the nature of the inquiry--determining the basis for official scrutiny and intrusion by law enforcement or intelligence officials--is one that has historically been conducted by judicial officers, it is fully appropriate to rely on that model for control of the system. It has proven quite successful in the FISA context, and there is no reason that it could not be directly adapted to the use of KD pattern technology.

What standard, then, should the "KD judge" apply in determining whether or not to disclose the identity of the pattern creator? What showing ought to be required of the government? The answer to those questions turns directly on the answer to another question--what are the consequences of a positive identification? For the more severe the consequences, the higher the standard necessary, and conversely, the less significant the consequences, the lower the standard acceptable.

The law has dealt with this in a variety of traditional ways: Before one may be arrested, for example, the government must have probable cause to believe that a specific individual has committed an offense.47 Similarly, before the police may intrude in an individual's home, they must secure a search warrant, supported by probable cause.48 By contrast, if the nature of the intrusion is less--if, for example, the police are making a brief investigative stop for the purposes of questioning--then all that law enforcement requires is some "reasonable suspicion" or "articulable suspicion" that will form the basis for investigation.49

Using this paradigm as a model, the most reasonable answer relies on the use of KD pattern identification solely as a predicate for further investigation. In other words, pattern-based identification is an investigative, not an evidentiary, tool.50 In this case, the "KD judge" should be required to determine whether the pattern presented raises a reasonable suspicion of potential terrorist activity. If the KD pattern identification is used for something more--for example, to place a named individual on a watch list for flights or to deny a named individual employment in a secure facility--then a higher standard of probable cause ought to be required.
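The sliding relationship between consequence and standard can be written down as a small decision table. This is a sketch of the rule as stated in the text, not of any existing legal or technical system; the enum names and the `may_disclose` helper are hypothetical.

```python
from enum import Enum


class Standard(Enum):
    REASONABLE_SUSPICION = 1
    PROBABLE_CAUSE = 2


class Use(Enum):
    FURTHER_INVESTIGATION = "further investigation"
    FLIGHT_WATCH_LIST = "flight watch list"
    DENY_SECURE_EMPLOYMENT = "deny secure-facility employment"


# The more severe the consequence, the higher the required showing.
REQUIRED = {
    Use.FURTHER_INVESTIGATION: Standard.REASONABLE_SUSPICION,
    Use.FLIGHT_WATCH_LIST: Standard.PROBABLE_CAUSE,
    Use.DENY_SECURE_EMPLOYMENT: Standard.PROBABLE_CAUSE,
}


def may_disclose(use, showing):
    """True if the government's showing meets the standard for this use."""
    return showing.value >= REQUIRED[use].value
```

Under this table a showing of reasonable suspicion supports disclosure for further investigation but not for watch-listing, while a showing of probable cause supports either.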

The Consequences of Identification

KD technology will operate, if it ever successfully operates at all, on the basis of predictive analysis. As such, it will never provide more than a basis for suspicion from a pattern of activity. And, qualitatively, there can be little doubt that the predictive power of KD identification is of substantially less probative value than is direct information about the conduct of a particular subject. Indeed, KD advocates have never contended otherwise--to the contrary, they have often characterized pattern-recognition analysis as akin to the very difficult task of picking out a barely audible signal in a sea of noise.51 Thus, all recognize that positive pattern identification provides one of the weaker forms of inference about a suspect's potential activity.