Guidelines for Documenting Data Sets: TASK INFORMATION

The purpose of this page is to summarize the methods and results reported by
the author and by others in the literature for a specific task (e.g.
classification, regression, clustering) on the indicated data set. For example,
a task page might summarize the relevant work published to date on predicting
the Dow Jones index at a daily level.

When filling out this form, simply place your answer after the point indicated
by '>'. We will then process the form to ensure that all documentation files
follow a common format.


1. Data Set Used
   -- Indicate the corresponding data set for this task.
>

2. Task Type
   -- Indicate the task: (association rules, classification, clustering, 
      control, density estimation, exploratory data analysis, image/spatial
      modelling, regression, retrieval, time series prediction)
   -- If the task is not listed above, please describe it.
>

3. Source
   (a) Donor of task information (name/postal address/phone/email/homepage)
>

4. Problem Description
   -- Provide a detailed description of the data analysis problem. The 
      description should answer the following questions:
      (a) What is the data analysis task?
>
      (b) What are the criteria and constraints for judging the quality
          of solutions (e.g. minimize a loss function, maximize 
          comprehensibility, minimize response time)?
>

5. Preprocessing and Modifications
   -- Describe any additional preprocessing or modifications of the original 
      data (i.e. data already in the archive) carried out for this analysis. 
>

6. Other Relevant Information
   -- Include any additional information that the researcher may find useful.
      For example:

     (a) Suggested Experimental Procedure 
         -- Is there a suggested experimental procedure to evaluate algorithms?
         -- Are there recommended train/tune/test sets?
         -- Are there variables (features, attributes) that should not be 
            used for prediction and are for information purposes only?
 
     (b) Cost information (if applicable/available)
         -- e.g. loss matrix for misclassification errors

     (c) Other miscellaneous information
         -- e.g. Are there well known physical or theoretical models for the
            process or for individual variables?
>

7. Results
   -- Include references and a brief summary of key papers that report 
      results on this data set. Each entry should include:
      (a) The complete reference of the article where it was described/used 
          (with a link to an online version if possible) 
      (b) The study's purpose: for example, whether the paper introduced a
          new algorithm or presented a comparison of several approaches.
          -- Briefly describe the algorithms used. Indicate the types of
             model structures used, as well as the fitting procedure. For
             example, the model structure could be a neural network with one
             hidden layer, trained with backpropagation.
          -- Indicate whether any special data structures were used to
             organize the data (e.g. B*-trees).
      (c) The major findings: for example, which algorithms worked well or
          poorly.
>

8. References & Further Information
   -- Include here references to additional information that focuses on the
      analysis of the data. (Note that references describing the data itself
      belong in a separate document.)
   (a) pointers to tutorial/background information
   (b) other useful web sites (parent archives, domain specific sites)
   (c) online documentation or papers
   (d) other relevant publications
   (e) any additional comments on this data set
>