This dataset records which areas (Vroots) of www.microsoft.com each user visited in a one-week timeframe in February 1998.
Jack S. Breese, David Heckerman, Carl M. Kadie
Microsoft Research, Redmond WA, 98052-6399, USA
breese@microsoft.com, heckerma@microsoft.com, carlk@microsoft.com
Date Donated: November 30, 1998
The data was created by sampling and processing the www.microsoft.com logs. It records the use of www.microsoft.com by 38000 anonymous, randomly selected users. For each user, the data lists all the areas of the web site (Vroots) that the user visited in a one-week timeframe.
Users are identified only by a sequential number, for example, User #14988, User #14989, etc. The file contains no personally identifiable information. The 294 Vroots are identified by their title (e.g. "NetShow for PowerPoint") and URL (e.g. "/stream"). The data comes from one week in February, 1998.
Each instance represents an anonymous, randomly selected user of the web site. Each attribute is an area ("vroot") of the www.microsoft.com web site.
Missing Attribute Values: The data is very sparse, so vroot visits are explicit and nonvisits are implicit (missing).
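Because nonvisits are implicit, a user's record has to be expanded against the full attribute list before it can be used as a fixed-length row. The following is a minimal sketch of that expansion, not part of the original distribution; the helper name `to_indicator_row` and the four-ID attribute list are hypothetical and chosen only for illustration (the real data has 294 attributes).

```python
def to_indicator_row(visited_ids, all_attr_ids):
    """Expand a sparse list of visited Vroot IDs into a 0/1 row.

    Any attribute ID that is absent from visited_ids is treated
    as a nonvisit (0), since nonvisits are implicit in the data.
    """
    visited = set(visited_ids)
    return [1 if attr_id in visited else 0 for attr_id in all_attr_ids]


# Hypothetical, shortened attribute list used only for this illustration.
all_attr_ids = [1009, 1052, 1123, 1277]

# A user who visited three of the four areas.
row = to_indicator_row([1123, 1009, 1052], all_attr_ids)
```

With 294 attributes and a mean of about 3 visits per case, most entries of such a row are 0, so keeping the sparse form is usually preferable unless an algorithm needs dense input.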
Training Instances | 32711 |
Testing Instances | 5000 |
Attributes | 294 |
Mean vroot visits per case | 3.0 |
Attribute lines: For example,
    A,1277,1,"NetShow for PowerPoint","/stream"
Where:
    'A' marks this as an attribute line,
    '1277' is the attribute ID number for an area of the website (called a Vroot),
    '1' may be ignored,
    '"NetShow for PowerPoint"' is the title of the Vroot,
    '"/stream"' is the URL relative to "http://www.microsoft.com"
Case and Vote Lines: For each user, there is a case line followed by zero or more vote lines. For example:
    C,"10164",10164
    V,1123,1
    V,1009,1
    V,1052,1
Where:
    'C' marks this as a case line,
    '10164' is the case ID number of a user,
    'V' marks the vote lines for this case,
    '1123', '1009', '1052' are the attribute IDs of Vroots that the user visited,
    '1' may be ignored.
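The line format described above can be read with a standard CSV parser. The following is a minimal sketch, assuming the file contains one record per line and that any line type other than 'A', 'C', and 'V' can be skipped; the function name `parse_msweb` and the returned structures are hypothetical, not part of the original distribution.

```python
import csv
import io


def parse_msweb(lines):
    """Parse attribute (A), case (C), and vote (V) lines.

    Returns a pair (attributes, cases):
      attributes maps attribute ID -> (title, url)
      cases maps case ID -> list of visited attribute IDs
    """
    attributes = {}
    cases = {}
    current_case = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        kind = row[0]
        if kind == "A":
            # A,<attribute ID>,<ignored>,<title>,<url>
            attributes[int(row[1])] = (row[3], row[4])
        elif kind == "C":
            # C,"<case ID>",<case ID>
            current_case = int(row[2])
            cases[current_case] = []
        elif kind == "V" and current_case is not None:
            # V,<attribute ID>,<ignored>; belongs to the preceding case line
            cases[current_case].append(int(row[1]))
        # Assumption: any other line type is ignored.
    return attributes, cases


# The example lines from the format description.
sample = '''A,1277,1,"NetShow for PowerPoint","/stream"
C,"10164",10164
V,1123,1
V,1009,1
V,1052,1
'''
attributes, cases = parse_msweb(io.StringIO(sample))
mean_visits = sum(len(v) for v in cases.values()) / len(cases)
```

Passing an open file object instead of `io.StringIO` should work the same way, since `csv.reader` accepts any iterable of lines. A case with zero vote lines ends up with an empty list, which matches the "zero or more vote lines" rule.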
J. Breese, D. Heckerman, C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, July 1998.