Guidelines for Documenting Databases: Task Information

The purpose of this page is to summarize the methods and results reported by
the author and others in the literature for the specific task on the indicated
data set.
For example, a task page might summarize the relevant work that has been 
published to date on predicting the Dow Jones index at a daily level.

When filling out this form, simply place your answer after the point indicated
by '>'. We will then process the form to ensure that all documentation files
follow a common format.


1. Database Used
   -- Indicate the corresponding database for this task.
>
Anonymous web data from www.microsoft.com

2. Task Type
   -- Indicate the task: (association rules, classification, clustering, 
      control, density estimation, exploratory data analysis, image/spatial
      modelling, regression, retrieval, time series prediction)
   -- If the task is not listed above, please describe it.
>
Classification, Collaborative Filtering

3. Source
   (a) Donor of task information (name/snail address/phone/email/homepage)
>
Jack S. Breese, David Heckerman, Carl M. Kadie
Microsoft Research, Redmond WA, 98052-6399, USA
breese@microsoft.com, heckerma@microsoft.com, carlk@microsoft.com


4. Problem Description
   -- Provide a detailed description of the data analysis problem. The 
      description should answer the following questions:
      (a) What is the data analysis task?
>
Predicting which areas of www.microsoft.com a user visited, given data
on the other areas that he or she visited.

      (b) What are the criteria and constraints for judging the quality
          of solutions (e.g. minimize loss, comprehensibility, response
          time, etc.)?
>
Predictive accuracy
Learning time
Speed of predictions


5. Preprocessing and Modifications
   -- Describe any additional preprocessing or modifications of the original 
      data (i.e. data already in the archive) for this analysis. 
>
None

6. Other Relevant Information
   -- Include any additional information that the researcher may find useful.
      For example:

     (a) Suggested Experimental Procedure 
         -- is there a suggested experimental procedure to evaluate algorithms?
         -- are there recommended train/tune/test sets?
         -- are there variables (features, attributes) that should not be 
            used for prediction and are for information purposes only?
 
     (b) Cost information (if applicable/available)
         -- e.g. loss matrix for misclassification errors

     (c) Other miscellaneous information
         -- e.g. Are there well known physical or theoretical models for the
            process or for individual variables?
>
Experimental procedures are described in:
   J. Breese, D. Heckerman, C. Kadie. _Empirical Analysis of
   Predictive Algorithms for Collaborative Filtering_. Proceedings
   of the Fourteenth Conference on Uncertainty in Artificial Intelligence,
   Madison, WI, July 1998.

The training and test sets used in this paper are provided as
'anonymous-mswebtrain.dst' and 'anonymous-mswebtest.dst'.


7. Results
   -- Include references and a brief summary of key papers that report 
      results on this dataset. Each entry should include:
      (a) The complete reference of the article where it was described/used 
          (with a link to an online version if possible) 
      (b) The study's purpose: for example, did the paper introduce a new 
          algorithm, or present a comparison of several approaches?
          -- Briefly describe the algorithms used. Indicate the types of
             model structures used, as well as the fitting procedure. For
             example, the model structure could be a 1-hidden layer neural
             network trained with backpropagation.
          -- Indicate if any special data structures were used to organize
             the data (e.g., B*-trees, etc).
      (c) The major findings: for example, which algorithms worked well or
          poorly.
>
Results for this dataset are reported in:

J. Breese, D. Heckerman, C. Kadie. _Empirical Analysis of
Predictive Algorithms for Collaborative Filtering_. Proceedings
of the Fourteenth Conference on Uncertainty in Artificial Intelligence,
Madison, WI, July 1998.

This paper compares a number of memory-based methods (correlation and
vector similarity techniques) with model-based methods (cluster models
and Bayesian networks). In terms of predictive accuracy, the results
indicate that the authors' Bayesian network approach to collaborative
filtering is the best-performing approach on this dataset.
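
To make the memory-based family concrete, the following is a minimal
sketch of vector-similarity collaborative filtering on binary visit data.
It is an illustration only and not the authors' implementation: the
function names, the area names, and the toy data are hypothetical, and
refinements discussed in the paper are omitted. Each user is represented
as the set of areas visited; an unvisited area is scored by the
similarity-weighted share of training users who visited it.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two users, each given as a set of visited areas."""
    if not a or not b:
        return 0.0
    # For binary vectors the dot product is the size of the intersection.
    return len(a & b) / math.sqrt(len(a) * len(b))


def score_unseen_areas(active, training_users):
    """Score areas the active user has not visited (hypothetical helper).

    Each training user votes for the areas they visited, weighted by
    their similarity to the active user; scores are normalized by the
    total similarity weight.
    """
    scores = defaultdict(float)
    total_weight = 0.0
    for other in training_users:
        weight = cosine_similarity(active, other)
        if weight == 0.0:
            continue
        total_weight += weight
        for area in other - active:
            scores[area] += weight
    if total_weight == 0.0:
        return {}
    return {area: s / total_weight for area, s in scores.items()}


# Toy data, invented for illustration (not taken from the dataset).
training_users = [
    {"Products", "Support", "Downloads"},
    {"Products", "Support"},
    {"Games", "Downloads"},
]
active = {"Products", "Support"}

scores = score_unseen_areas(active, training_users)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

In this toy example only 'Downloads' receives a score, because the single
user who visited 'Games' shares no areas with the active user. A ranked
list of this kind is one way to produce the predictions that the task
asks for.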


8. References & Further Information
   -- Include here references to additional information that focuses on the
      analysis of the data. (Note there is another document for references 
      that describe the data itself).
   (a) pointers to tutorial/background information
   (b) other useful web sites (parent archives, domain specific sites)
   (c) online documentation or papers
   (d) other relevant publications
   (e) any additional comments on this dataset
>

An expanded version of the results on this dataset appears as Microsoft
Research Technical Report MSR-TR-98-12. The papers are available online at:
http://research.microsoft.com/users/breese/cfalgs.html