Guidelines for Documenting Data Sets: TASK INFORMATION

The purpose of this page is to summarize the methods and results by the author and others in the literature for the specific task (e.g. classification, regression, clustering, etc.) on the indicated data set. For example, a task page might summarize the relevant work that has been published to date on predicting the Dow Jones index at a daily level.

When filling out this form, simply place your answer after the point indicated by '>'. We will then process the form to ensure that all documentation files follow a common format.

1. Data Set Used
   -- Indicate the corresponding data set for this task.
>

2. Task Type
   -- Indicate the task: (association rules, classification, clustering, control, density estimation, exploratory data analysis, image/spatial modelling, regression, retrieval, time series prediction)
   -- If the task is not listed above, please describe it.
>

3. Source
   (a) Donor of task information (name/snail address/phone/email/homepage)
>

4. Problem Description
   -- Provide a detailed description of the data analysis problem. The description should answer the following questions:
   (a) What is the data analysis task?
>
   (b) What are the criteria and constraints for judging the quality of solutions (e.g. minimize a loss function, maximize comprehensibility, minimize response time, etc.)?
>

5. Preprocessing and Modifications
   -- Describe any additional preprocessing or modifications of the original data (i.e. data already in the archive) carried out for this analysis.
>

6. Other Relevant Information
   -- Include any additional information that the researcher may find useful. For example:
   (a) Suggested Experimental Procedure
       -- is there a suggested experimental procedure to evaluate algorithms?
       -- are there recommended train/tune/test sets?
       -- are there variables (features, attributes) that should not be used for prediction and are for information purposes only?
   (b) Cost information (if applicable/available)
       -- e.g.
          loss matrix for misclassification errors
   (c) Other miscellaneous information
       -- e.g. Are there well-known physical or theoretical models for the process or for individual variables?
>

7. Results
   -- Include references and a brief summary of key papers that report results on this dataset. Each entry should include:
   (a) The complete reference of the article where it was described/used (with a link to an online version if possible)
   (b) The study's purpose: for example, did the paper introduce a new algorithm, or present a comparison of several approaches?
       -- Briefly describe the algorithms used. Indicate the types of model structures used, as well as the fitting procedure. For example, the model structure could be a one-hidden-layer neural network trained with backpropagation.
       -- Indicate if any special data structures were used to organize the data (e.g., B*-trees, etc.).
   (c) The major findings: for example, which algorithms worked well or poorly.
>

8. References & Further Information
   -- Include here references to additional information that focuses on the analysis of the data. (Note that there is another document for references that describe the data itself.)
   (a) pointers to tutorial/background information
   (b) other useful web sites (parent archives, domain specific sites)
   (c) online documentation or papers
   (d) other relevant publications
   (e) any additional comments on this dataset
>