MARK A. FORMAN
ASSOCIATE DIRECTOR FOR
E-GOVERNMENT AND INFORMATION TECHNOLOGY
OFFICE OF MANAGEMENT AND BUDGET
BEFORE THE
SUBCOMMITTEE ON TECHNOLOGY, INFORMATION POLICY, INTERGOVERNMENTAL RELATIONS,
AND THE CENSUS
COMMITTEE ON GOVERNMENT REFORM
UNITED STATES HOUSE OF REPRESENTATIVES
MARCH 25, 2003
Mr. Chairman and Members of the Subcommittee,
Thank you for the opportunity to appear before the Subcommittee
to discuss the Administration's views on data mining.
This committee has defined "data mining" as
a "technology that facilitates the ability to sort through masses
of information through database exploration, extract specific information
in accordance with defined criteria, and then identify patterns of interest
to its user." While there are many definitions of "data mining",
the Committee's definition is generally accepted and helpful in defining
the issue and its challenges. Additionally, data warehouses are being
used as the source of data for many data mining applications. A data warehouse
is a managed data repository of integrated, cleansed data whose source
is mainly transactional data. Data is aggregated from various sources
and structured for analysis and reporting.
Commercial Types and Uses of Data Mining
The private sector uses data mining to make sense of the
wide breadth of data that companies and industries have available. Some
examples of these uses:
- Customer Relationship Management/Segmentation Analysis: Applied to
customer relationship management (CRM), data mining
is used to analyze disparate customer data and provide insight into
customer needs and wants. Data mining is used to analyze and segment
customer buying patterns and to identify potential goods and services
that are in demand. Companies that use data mining shorten response
time to market changes, which allows for better alignment of their products
with their customers' needs. They do this to increase revenue
performance and allocate investment to products that meet consumer demand
effectively.
- Fraud Detection: Companies use software that
provides comprehensive, transaction-level financial reporting and analysis
to support automatic fraud detection and proactive alerting. Software
packages can also be used to detect anomalies, variances, and patterns
in databases. For example, BlueCross/BlueShield and other health care
payers use data mining tools to catch and prevent fraudulent and abusive
billing practices. BlueCross/BlueShields solution can quickly
search through millions of medical claims and detect inappropriate billing
practices with a high degree of reliability.
- Retail Analysis and Supply Chain Analysis: Companies
such as Wal-Mart are broadly recognized for analyzing sales trends.
Retail analysis and supply chain analysis can be used to predict the
effectiveness of promotions, decide which products to stock in each
store, and help managers understand cost and revenue trends in order
to adjust pricing and promotions in anticipation of changes in marketplace
conditions. Data mining also allows supply chain tools to monitor and
analyze inventory trends, forecast product demand for replenishment,
track vendor performance and identify problems, analyze distribution
network efficiency, and understand supply chain costs and inefficiencies.
- Medical Analysis/Diagnostics: The health care
industry uses analysis to predict the effectiveness of surgical procedures,
medical tests, and medications. High-risk segments of the population
can be identified and targeted for proactive treatment. For example,
American Healthways relies on predictive modeling to identify patient
types who trend toward high-risk conditions, giving care coordinators
a proactive approach to healing. The result is improved quality of life
for the patients and reduced stress on hospitals and insurance providers.
- Document Analysis (Text Mining): Documents can
be searched for information and insights in a fraction of the time an
individual would spend locating one document. Document analysis involves
analysis of text and of structured and unstructured data, organized by
categories, to determine trends, patterns, and relationships.
This can be highly effective in survey analysis. Content
management systems and software packages perform analyses on an organization's
information products to help companies control information flows and
work products. For example, Autonomy at BAE Systems aggregates content
from many sources in many different formats, structured or unstructured,
including their intranet and 10,000 news feeds per day. The goal is
to personalize the delivery of that information to each user, and to
eliminate work duplication and time-consuming searches. Autonomy automatically
alerts BAE Systems employees to documents in the system that relate
to what they're doing, or to other employees in the company whose interests
and expertise match their own.
- Use of Decision Support Systems (DSS):
Decision Support Systems may use data mining to identify trends
and present the information in intuitively useful ways -- supporting
more informed and effective decisions for business and organizational
activities. For example, one DSS solution for HR management is now providing
essential insights into The Bank of Scotland Group's HR activities worldwide,
giving managers personnel and staffing information needed to make hiring
and placement decisions. Managers can determine if job turnover in a
particular area or occupation classification is higher than expected
and investigate influences on loyalty such as the physical working environment.
- Financial Analysis: The insurance industry uses
data mining algorithms to conduct risk analysis, such as evaluating
actuarial experience studies for mortality, withdrawal and disability,
dynamically calculating exposures and expectations for period ranges.
For example, Canada Life performs timely and accurate actuarial studies
using a data warehouse and advanced data analysis methods; the Generali
Group uses data mining tools to manage financial market risk and customer
credit risk via a common analytical framework for rapid and flexible
analysis and reporting of risk exposure.
Government Applications of Data Mining
The Federal government analyzes data that has been collected from the public
for several purposes, including determining the eligibility of applicants
for Federal benefits, detecting potential instances of fraud, waste, and
abuse in Federal programs, and for law enforcement activities. Some of
this analysis is facilitated by data mining. Here are a few examples of
agency uses of data analysis techniques and software:
- Financial Management:
Poor management practices have created opportunities for a wide
range of fraud and abuse in the use of government travel and purchase
cards. Several agency inspector general (IG) investigations have used
statistical sampling processes to document inappropriate purchases and
misuse of these cards. OMB is taking and will continue to take substantive,
affirmative steps to ensure agencies improve their internal control
systems to monitor expenditures properly.
- Human Resources Management: One of the 24 E-Government
initiatives, the Enterprise HR Integration under the Office of Personnel
Management, is leading the effort to provide a government-wide data
warehouse of HR information to minimize the workload as employees move
from one department to another. A key component of this is the E-Clearance
project. OPM and its partner agencies on the E-Clearance project
are using data mining to access information more quickly, which speeds
up the overall security clearance investigation process. Given the backlog
in clearances, this use of data mining is critical to our ability to
get staff more effectively and rapidly through the human resources management
processes.
- Reducing Erroneous Payments and Fraud Detection:
Data analysis accomplished via the matching of electronic databases
between government agencies has been an important and successful tool
for identifying improper payments under federal benefit and loan programs,
as well as detecting potential instances of fraud, waste, and abuse
in Federal programs. As highlighted in the FY 2004 President's
Budget, agencies are now required to report the extent of erroneous
payments made in their major benefit programs. In addition, the last
decade has shown an increased reliance and increased spending on non-discretionary
social services, such as Medicare and Medicaid. These expenditures --
and therefore the potential for improper payments -- are likely to increase
unless appropriate steps are taken to protect against errors and fraud.
Through the President's Management Agenda initiative for improving financial
performance, we are getting a handle on the problem of erroneous payments.
For example, Medicare's erroneous payment rate has fallen from 6.8 percent
to 6.3 percent and the Food Stamp program reduced its national error
rate from 8.9 percent to 8.7 percent. Just these small rate reductions
prevented the waste of almost $1 billion. Furthermore, the Administration
has proposed several pieces of legislation regarding the Administration's
authority to share data that will greatly improve efforts to reduce
erroneous payments.
- Policy Analysis: The quality of policy decisions
is a function of our ability to correctly analyze enormous amounts of
data that describe a problem faced by modern society. For example, the
Department of Education mines data from a variety of its student financial
aid systems, including the Central Processing System, Pell Grant Payment
System and National Student Loan Data System, permitting professionals
to analyze Federal education programs quickly and easily, without the
time, expense, and burden on citizens of paper-driven surveys.
- Law Enforcement and Homeland Security: Federal
agencies have found data mining techniques to be an important tool for
assisting law enforcement in combating terrorism. For example, systems such
as the Automated Commercial Environment (ACE), operated by the Department
of Homeland Security's Bureau of Customs and Border Protection,
can utilize a series of data mining tools to strengthen border security
efforts. ACE will provide the IT mechanisms for making quick evaluations
on whether particular people or goods should be deemed high-risk or
low-risk. Also, ACE will enable the Department of Homeland Security
and other Federal agencies to more precisely target for inspection or
investigation the highest risk people and cargo crossing the border.
Through tools such as ACE, agencies have the ability to instantaneously
analyze vast amounts of data and intelligence to see links among businesses
and people, thus revealing security threats that might otherwise have
gone unnoticed.
- Citizen Access to Government Data: Search sites
such as the one available at the FirstGov website provide a facility
for searching vast amounts of unstructured data across the Federal government
by using publicly available search engines. In addition, the Federal
government conducts its own data analyses for statistical purposes and
facilitates data user access to statistical data. For example, the Census
Bureau's American FactFinder System (Advanced Query) uses
a data mining tool to allow users to query Census 2000 detailed data
files. The tool provides simplified access to and extraction of data.
Benefits and Pitfalls
As outlined above, the government has found a number of
ways to use collected information to improve program effectiveness and
to reduce misuse of taxpayer dollars. While the use of data mining techniques
to access useful, timely data and to identify relationships that were
previously unknown is a powerful tool for identifying errors, fraud, threats,
etc., the application of such techniques to personal information raises
serious questions about privacy and how it should be protected. To realize
these benefits while protecting privacy, the government must continue to act in several
areas:
1. Federal data analyses must be consistent with law
In the federal arena, data mining activities must be implemented
consistent with the protections of the Privacy Act of 1974, as amended
by the Computer Matching and Privacy Protection Act of 1988, and other
privacy statutes. These statutes do not address data mining per se, but
they outline privacy principles the government must follow in data collection,
including: notice and reasonable disclosure; use and purpose limitations;
choice; access to government-held information; information security; redress;
and oversight. Agencies are well-versed in the legal, policy, and technical
requirements governing access to and sharing of personal data. Agencies
may aggregate information by analyzing data across databases, a concept
known as virtual data warehousing; however, when information
can be accessed or exchanged at numerous locations by many users, a potential
exists for inadvertent disclosure of personal information or misuse of
personal information, by alteration or for unauthorized purposes. Agencies
that adhere to the existing legal and policy structure, including OMB and
NIST policy guidance, can protect personal information in their possession
even as they participate in data mining activities. Furthermore, the E-Government
Act of 2002 requires that an agency conduct a Privacy Impact Assessment
(PIA) when it develops or procures information technology to initiate
a new online collection of information that involves personally identifiable
information changing hands, such as in the case of matching.
2. Ensuring the Security of Federal IT Systems
The Federal Information Security Management Act (FISMA)
provides a comprehensive framework for ensuring the effectiveness of information
security controls over federal information resources, including resources
that result from data mining. FISMA requires the head of each agency to
periodically assess the risk and magnitude of harm that could result from
unauthorized access, use, modification, or disclosure of information.
The agency must then provide information security protections that are
commensurate with the stated risk. Agencies are required to periodically
test their information security controls and techniques to ensure that
they are effectively implemented. The results of this testing are reported
to OMB on an annual basis.
Conclusion
Data mining can have many uses. The Administration
is strongly committed to using available technologies like data mining
to serve citizens and protect them from threats, and it
is equally committed to protecting the privacy of citizens when
such tools are used. Through data analysis and data mining, the private
sector has improved customer service, better met customer needs, and has been
able to help customers take proactive approaches to health care. The federal
government has reduced the number of erroneous payments, and has been
able to determine patterns in databases that help predict both weather
patterns and the spread of deadly viruses.
We need to use modern analytic tools, such as data mining,
to improve government performance, from policy analysis to fraud detection to homeland
security. We can maintain privacy and security while improving government
productivity, but we must employ tools like data mining appropriately.
We hope to work with this Committee to ensure that the benefits of data
analysis continue to help Federal agencies to perform their missions,
while protecting against the problems that aggressive and abusive data
mining can cause.