************
* OVERVIEW *
************

JARtool was a pioneering effort to develop an automatic system for cataloging small volcanoes in the large set of Venus images returned by the Magellan spacecraft. This package contains a variety of data to enable researchers to evaluate algorithms over the same images as used for the JARtool experiments reported in [1].

***********
* HISTORY *
***********

Modified: 07/09/99 by M.C. Burl
  - Added UCI's dataset form to the package, uci_form.txt.

Modified: 07/07/99 by M.C. Burl
  - Tried to clean things up a bit in preparation for donation to Padhraic's KDD archive. Added performance ROC curves for the baseline JARtool system.

Modified: 09/25/98 by M.C. Burl
  - Changed the Experiments_Scoring_Table to include a column indicating the number of subexperiments for each major experiment in addition to the total area and total detection opportunities.

Modified: 09/24/98 by M.C. Burl
  - Added a Chips subdirectory that contains view format files holding the training and test chips and labels for each major/minor experiment.

Modified: 03/30/98 by M.C. Burl
  - Changed the ground truth files for the images used in the HET5 experiment to be consistent with those used in the MLJ98 paper. Note that the number of volcanoes published in Table 2 of the MLJ98 paper *should not be used*. Instead, use the numbers in the Experiments_Scoring_Table.

File Origin: 02/20/98 by M.C. Burl

*********
* USAGE *
*********

One difficulty with comparing different learning/vision algorithms is the absence of standard data sets on which tests are performed. It is our intent that this data set become "a standard"; hence, we are asking that anyone seeking to publish results on this data perform the *FULL SUITE* of experiments defined in the Experiments_Images_Table and compare performance to that of the baseline JARtool system. Results on a small subset of images (e.g., HOM4) are not of interest and *SHOULD NOT BE PUBLISHED*.
Send e-mail to jartool@aig.jpl.nasa.gov if you believe you have a compelling reason to deviate from this policy.

There are two communities of researchers who we believe will be interested in the data: (1) computer vision/image processing and (2) machine learning/data mining. Hence, we have set things up to accommodate both groups. Computer vision people will probably be interested in developing or testing algorithms that start from the raw images and end up with a hypothesized list of volcano locations. Machine learning people, on the other hand, will probably be more interested in starting from a set of candidate regions and deciding whether a candidate region is a volcano or not.

In the JARtool system, the process of going from images to candidate regions is called "focus of attention" or FOA. For each experiment, lists of candidate volcano locations generated by the JARtool FOA algorithm are provided. Machine learning types can use these lists to extract regions or chips around each candidate location and then continue with their analysis (deriving features from pixels, applying neural nets, etc.). Note, however, that a small percentage of true volcanoes (10-15%) are missed by the FOA, which limits the maximum achievable performance. For convenience, a set of pre-extracted chips that are separated into training and testing has recently been added to the dataset.

We would appreciate it if you would send your results on the *FULL SUITE* of experiments (whether POSITIVE OR NEGATIVE), along with a brief description of your algorithm, to jartool@aig.jpl.nasa.gov.

************
* CONTENTS *
************

The specific types of data that are included in this package are as follows:

1. Images - 134 binary files. Each image is 1024 X 1024 X (unsigned) 8-bit. For convenience, images follow the naming convention img#.sdt, where # is in the range [1,134]. There is also an img#.spr file, which contains header information for the images.
Images can be read into Matlab with the vread.m code from the Programs subdirectory:

      A = vread('img1');

Note that we have included only a small subset of the full Magellan dataset (which contains 30,000 images). The full set is available as a ~150 CD-ROM "boxed set".

2. "GroundTruths" - 134 ascii files with extension .jtri (one-to-one correspondence with the images). These files are stored in the format used by the JARtool programs. Ground truth files in a simpler format, .lxyr, are also provided. Each of the .lxyr files contains a number of rows of the form:

      label x y radius

The quotes around "ground truth" are intended as a reminder that there is no absolute ground truth for this data set. No one has been to Venus and the image quality does not permit 100%, unambiguous identification of the volcanoes, even by human experts. The labels provide some measure of subjective uncertainty (1 = definitely a volcano, 2 = probably, 3 = possibly, 4 = only a pit is visible). See references [2,3,4] for more information on the labeling uncertainty problem.

3. Tables - This directory contains five tables.

   Experiments_Images_Table - an ascii table which for each experiment/subexperiment identifies the images that were used for training and testing. Experiments are designated with letters and subexperiments within an experiment (e.g., a particular cross-validation fold) are designated with a number. Ranges of images are specified using a notation similar to MATLAB, e.g.,

      TRN = [1:3,7,10:12];

   means that images 1,2,3,7,10,11,12 were used for training. This table is for researchers using the data since it details the exact set of experiments published in [1].

   Images_GroundTruths_Table - an ascii table showing the "true" names of the images on the JPL system and the corresponding ground truth names.
   This table is for use by a JPL script that was used to copy the data to the proper directories and rename according to the img# convention; however, the image names contain information about the location of the image on the planet and hence may be of interest, e.g., f30n332_images/ff20 refers to a full-resolution Magellan data product (F-MIDR) at latitude 30 degrees north, longitude 332 degrees. Each product is subdivided into a 7 X 8 grid of "framelets". The ff20 refers to the 20th framelet. Framelet numbering goes from left to right, top to bottom. This table also contains a Labeler_ID to indicate who provided each ground truth file.

   Experiments_Scoring_Table - This table shows the total image area and number of usable volcano detection opportunities for each major experiment. Refer to the section on Scoring below for more information.

   Images_Area_Volcanoes_Table - This table shows the usable area of each image and the number of usable (ground-truth) volcanoes in each image. A 15 pixel border around the edge of the images has been excluded as "non-usable". In addition, some images have large blank regions (zeroes), where there was a gap in the Magellan acquisition or communication processes. The blank areas are not included in the area computation. (Thin image slivers are not included either.) For convenience, subtotals over image area and number of volcanoes have been taken over the appropriate images and stored in the Experiments_Scoring_Table.

   Labeler_Table - This table associates Labeler_ID with a string identifying the labeler. Whenever available, consensus labeling data provided by two scientists working together is used as ground truth.

4. FOA - For each experiment/subexperiment, focus of attention (FOA) files are provided for all images used in the training and testing sets. The FOA files appear in .jtri format as well as a simpler .lxyv format.
The .lxyv files have the following info:

      label x y value

where label is 1, 2, 3, or 4 if the region matches an entry in the corresponding ground truth file and label = 0 if the region does not appear in the ground truth. Value is the correlation value that is output from the focus of attention matched filter.

The matched filter generated from each training set appears in the trn subdirectory under each experiment/subexperiment. These files can be read in Matlab as follows:

      mf = vread('matched_filter');

Note, however, that since we have implemented the FOA convolutions using the separable kernel method [5], the filter read from the file is not exactly the same as the one used in the FOA (95% approximation). Also, the matched filter is applied to a "spoiled" image that has been reduced by a factor of 2 in each dimension through block averaging. Thus, the 15 X 15 pixel filter corresponds to a 30 X 30 pixel area in the raw images.

5. Chips - For each experiment/subexperiment we have extracted the chips used for training and testing. The chips are stored in files with names like exp_A1_Ctrn and exp_A1_Ctst. There are also labels files with names exp_A1_Ltrn and exp_A1_Ltst that have the ground truth label for each chip. A third set of files exp_A1_Ntrn and exp_A1_Ntst have the image number from which each chip was extracted. This set of files probably provides the most convenient form for ML researchers interested in looking at the data. Use the vread.m utility provided in the Programs directory to read the data into Matlab.

6. Programs - There is a Matlab program provided that will enable you to read the .sdt/.spr (view format) files included in this data package. If you want to use something other than Matlab, you are on your own, but the format is fairly simple and can be understood by looking at the Matlab code.

7.
Performance - For each experiment we provide an ROC curve showing the probability of detection versus the number of false alarms per square kilometer obtained with the baseline JARtool system. Each plot shows two curves - one for the focus of attention algorithm and one for the overall system. See the next section for more information on scoring.

***********
* SCORING *
***********

In order for results to be compared across systems, it is important that scoring be performed as described in this section (obviously computational improvements that yield the same output are allowed). Assume that there are Nd hypothesized detections whose x and y coordinates are stored in Matlab arrays Dx and Dy respectively, and Ng ground truths whose x and y coordinates are stored in arrays Gx and Gy. Matlab pseudocode for scoring is as follows:

      s_thr = 13;                  % distance threshold for a match
      d_matched = zeros(Nd,1);     % 1 if detection d matched any ground truth
      g_matched = zeros(Ng,1);     % 1 if ground truth g matched any detection
      for g = 1 : Ng
        for d = 1 : Nd
          if (sqrt( (Dx(d)-Gx(g))^2 + (Dy(d)-Gy(g))^2 ) <= s_thr)
            % Found a match (do NOT break here; see the note below)
            d_matched(d) = 1;
            g_matched(g) = 1;
          end
        end
      end
      n_false_alarms = sum(d_matched == 0);   % unmatched detections
      n_true_detects = sum(g_matched == 1);   % matched ground truths

Note that this algorithm allows (1) a single detection to match multiple ground truths (this rarely happens because the volcanoes are usually adequately separated from each other) and (2) a single ground truth to match multiple detections. We mention this point because it is tempting to place a break statement at the end of the "found a match" block (for speed-up), but you will obtain slightly different (worse) results by doing so.

An important aspect of scoring is that before the matching process is performed, the detection list and ground truth list are "filtered" to remove any entries that are too close (<= 15 pixels) to the image border. The Experiments_Scoring_Table shows the number of volcano detection opportunities and the image area *after* border exclusion.
This explains why the number of volcano detection opportunities does not precisely match the number of ground truth volcanoes.

To generate performance numbers for an experiment, n_false_alarms and n_true_detects are each summed up over the test images used in each subexperiment. The total number of true detects is converted to a detection probability by dividing by the number of detection opportunities (from Experiments_Scoring_Table). Similarly, the total number of false alarms is converted to false alarms per square kilometer by dividing by the total image area (also from Experiments_Scoring_Table).

For algorithms that make graduated decisions (e.g., algorithms that generate posterior probability estimates), it is possible to generate an ROC curve showing the trade-off between detection probability and false alarm rate. However, for algorithms that make only a binary decision (e.g., a standard nearest-neighbor algorithm), only a single point can be generated.

For researchers who start from the JARtool FOA candidate regions or equivalently from the Chips data, please report only unconditional performance numbers, i.e., relative to the area and number of volcanoes given in the Experiments_Scoring_Table, rather than conditional numbers based on the number of non-volcanoes and volcanoes in the FOA lists or _Ctst files.

**************
* REFERENCES *
**************

[1] M.C. Burl, L. Asker, P. Smyth, U. Fayyad, P. Perona, L. Crumpler, and J. Aubele, "Learning to Recognize Volcanoes on Venus", Machine Learning, (March 1998).

[2] P. Smyth, M.C. Burl, U.M. Fayyad, P. Perona, and P. Baldi, Chapter: "Inferring Ground Truth from Subjectively Labeled Images of Venus", In Advances in Neural Information Processing Systems 7, Morgan Kaufman, (1995).

[3] P. Smyth, M.C. Burl, U.M. Fayyad, and P.
Perona, Chapter: "Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth", In Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Menlo Park, CA, (1995).

[4] M.C. Burl, U.M. Fayyad, P. Perona, and P. Smyth, "Automated Analysis of Radar Imagery of Venus: Handling Lack of Ground Truth", In IEEE Intl. Conf. on Image Proc., Vol. III, pp. 236-240, (1994).

[5] S. Treitel and J. Shanks, "The Design of Multistage Separable Planar Filters", IEEE Trans on Geoscience and Electronics, 9(1):10-27, (1971).
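
******************
* SCORING SKETCH *
******************

For readers not using Matlab, the procedure in the SCORING section (border filtering, matching, and conversion to rates) might be sketched in Python as follows. This is only an illustrative sketch, not part of the JARtool code. The function names (keep_interior, score_image, rates) are hypothetical. The sketch assumes 1-based pixel coordinates on a 1024 X 1024 image and assumes the area taken from the Experiments_Scoring_Table is in square kilometers; please check both against the tables before relying on it.

```python
import math

S_THR = 13       # match distance threshold from the SCORING section
BORDER = 15      # entries <= 15 pixels from the image border are dropped
IMG_SIZE = 1024  # images are 1024 x 1024


def keep_interior(points, size=IMG_SIZE, border=BORDER):
    """Drop (x, y) points inside the excluded border.

    Assumption: coordinates are 1-based, so pixels 1..border and
    size-border+1..size are treated as non-usable on each axis.
    """
    return [(x, y) for (x, y) in points
            if border < x <= size - border and border < y <= size - border]


def score_image(dets, gts, s_thr=S_THR):
    """Return (n_false_alarms, n_true_detects) for one image."""
    d_matched = [False] * len(dets)
    g_matched = [False] * len(gts)
    for g, (gx, gy) in enumerate(gts):
        for d, (dx, dy) in enumerate(dets):
            if math.hypot(dx - gx, dy - gy) <= s_thr:
                # Deliberately no break: one ground truth may match
                # several detections, and vice versa.
                d_matched[d] = True
                g_matched[g] = True
    return d_matched.count(False), g_matched.count(True)


def rates(per_image_counts, n_opportunities, total_area_km2):
    """Sum per-image counts over the test images, then normalize.

    n_opportunities and total_area_km2 are meant to come from the
    Experiments_Scoring_Table (area units assumed to be km^2).
    Returns (detection_probability, false_alarms_per_km2).
    """
    n_fa = sum(c[0] for c in per_image_counts)
    n_td = sum(c[1] for c in per_image_counts)
    return n_td / n_opportunities, n_fa / total_area_km2


if __name__ == "__main__":
    # Made-up coordinates and totals, for illustration only.
    dets = keep_interior([(100, 100), (105, 100), (500, 500), (5, 5)])
    gts = keep_interior([(102, 101), (800, 800), (1020, 10)])
    counts = score_image(dets, gts)
    print(counts, rates([counts], n_opportunities=2, total_area_km2=4.0))
```

Filtering is applied to both lists before matching, as the SCORING section requires. The normalizing totals are passed in rather than recomputed so that results stay unconditional, i.e., relative to the Experiments_Scoring_Table rather than to the contents of the FOA lists.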