Welcome to the UCI Knowledge Discovery in Databases Archive

Librarian's note [July 25, 2009]: We are no longer maintaining this web page, as we have merged the KDD Archive with the UCI Machine Learning Archive. If you have any questions, please contact us at ml-repository '@' ics.uci.edu.

This is an online repository of large data sets which encompasses a wide variety of data types, analysis tasks, and application areas. The primary role of this repository is to enable researchers in knowledge discovery and data mining to scale existing and future data analysis algorithms to very large and complex data sets.

Creation of this archive was supported by a grant from the Information and Data Management Program at the National Science Foundation. The archive is intended to serve as a permanent repository of publicly accessible data sets for research in KDD and data mining. It complements the original UCI Machine Learning Archive, which typically focuses on smaller classification-oriented data sets.

In addition to storing data and description files, we also archive task files that describe a specific analysis, such as clustering or regression, for the data sets stored. The call for data sets lists typical data types and tasks of interest.


     Data Sets                               Task Files

Citation Information

If you publish material based on databases obtained from this repository, then, in your acknowledgments, please note the assistance you received by using this repository. This will help others to obtain the same data sets and replicate your experiments. We suggest the following pseudo-APA reference format for referring to this repository:

Hettich, S. and Bay, S. D. (1999). The UCI KDD Archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science.

We also request that you send the citation information for your article to kdd '@' ics.uci.edu. If your article is available online and you provide us with a URL, we will link the data set's documentation to your article.

How to Donate Data and Task Files

We are always looking for additional data sets and task files. Note that you may submit: (1) data and a description file, (2) a task file describing a particular analysis for a data set, or (3) both. There may be multiple task files for the same data set and the author of a task file may be different from the data donor.

If you are in doubt as to whether a data set or task file would be of interest, please contact the librarian. Donations may be made with anonymous ftp as follows:

  1. ftp kdd.ics.uci.edu
  2. user name: anonymous
  3. password: your complete email address
  4. cd incoming
  5. put filename (note: you will not be able to list the placed files)
  6. bye
  7. send e-mail to kdd '@' ics.uci.edu specifying the donated file(s).
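
For donors who would rather script the transfer, steps 1 through 6 above can be sketched with Python's standard ftplib module. This is only an illustrative sketch: the function name donate_file and the ftp_factory parameter are hypothetical and not part of any archive tooling, and since this page is no longer maintained, the server may no longer accept uploads.

```python
from ftplib import FTP
from pathlib import Path


def donate_file(path, email, host="kdd.ics.uci.edu", ftp_factory=FTP):
    """Upload one file to the archive's incoming directory (steps 1-6).

    `ftp_factory` is a hypothetical hook so the function can be
    exercised without a live server; it defaults to ftplib.FTP.
    """
    path = Path(path)
    ftp = ftp_factory(host)                # step 1: ftp kdd.ics.uci.edu
    try:
        ftp.login("anonymous", email)      # steps 2-3: anonymous + e-mail address
        ftp.cwd("incoming")                # step 4: cd incoming
        with path.open("rb") as fh:
            # step 5: put filename (the placed file cannot be listed afterwards)
            ftp.storbinary(f"STOR {path.name}", fh)
        ftp.quit()                         # step 6: bye
    finally:
        ftp.close()                        # safe to call even after quit()
    return path.name
```

Step 7 is not automated here: after the upload, you would still send e-mail to kdd '@' ics.uci.edu naming the donated file(s).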

Alternatively, you may provide us with a web URL and we will download the data. If neither of these methods is suitable, please contact the librarian and we will arrange the transfer of data in the manner most convenient for you.

As many researchers use this archive, please carefully fill out a data documentation form when you submit data. If you are submitting an analysis of data, please fill out a task documentation form.

There are several sample files, which may help you fill out the documentation.

We prefer that the data have a standard format. For multivariate data sets that can be represented by a table, please format the data with one instance (example) per line, no spaces, comma-separated attribute values, and missing values denoted by "?". For other types of data, use your best judgment.
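
As an illustration of the preferred tabular format, here is a minimal Python sketch that writes and reads one instance per line. The helper names format_instance and parse_instance are hypothetical, and representing a missing value as None in memory is an assumption of this sketch, not a requirement of the archive.

```python
MISSING = "?"  # marker for missing values, as requested above


def format_instance(values):
    """Return one instance as a comma-separated line with no spaces.

    None is written as "?". Values that contain a comma or whitespace
    are rejected, because they could not be read back unambiguously.
    """
    fields = []
    for value in values:
        if value is None:
            fields.append(MISSING)
            continue
        text = str(value)
        if "," in text or any(ch.isspace() for ch in text):
            raise ValueError(f"value not representable in this format: {text!r}")
        fields.append(text)
    return ",".join(fields)


def parse_instance(line):
    """Split one line back into attribute values, mapping "?" to None."""
    return [None if field == MISSING else field
            for field in line.rstrip("\n").split(",")]
```

For example, format_instance([5.1, None, "setosa"]) yields the line 5.1,?,setosa, and parse_instance turns that line back into the strings "5.1" and "setosa" with None in the missing position. Values come back as strings; converting them to numeric types is left to the reader of the file.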

Thank you for your donations.

David Newman (librarian)

kdd '@' ics.uci.edu
Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425
Last modified: Sept 9, 2005