The data was collected using an on-line evaluation strategy. The on-line interface was developed and is maintained by Nikhil Shirahatti (nvs@CS.arizona.edu). The data is the result of the collective effort of 32 students, and we are thankful for their help.

DATA DESCRIPTION
----------------

The data consists of human evaluation scores for query-result image pairs. A pre-processing step selected the query-result pairs so that they span the full range of choices, from a good match to a poor match. We cannot choose query/result images at random, because most random pairs would be judged a poor match, which would yield poor data. The main strategy is to use existing image retrieval systems to select serviceable image pairs. Unfortunately, image retrieval systems do not work well, so we propose using an exponential function as an auxiliary to the retrieval system to obtain a roughly uniform distribution over the choices. We also emphasize collecting more data, as this allows us to be only approximately uniform while still having enough examples across the range of human responses.

The data consists of query-result pairs from four content-based image retrieval (CBIR) systems. This setup was chosen to avoid introducing abnormalities into the data through the irregularities of any one image retrieval system.

The experimental routine was as follows. First, the query image and the result images from the four CBIR systems were displayed in random order. Then the user rated each match on a scale of 1 to 5, with 1 being a poor match and 5 a good match. The first 100 queries are common to all users so as to reduce the variance among evaluators. The remaining images evaluated by each user are unique.

To get a common domain of scores, we mapped the computer scores to the adjusted human scores using three mapping methods. The mapping method that yielded the best correlation was retained.
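
As an illustration of selecting a mapping by correlation, here is a minimal sketch in Python. The three candidate mappings (linear rescaling, logarithmic rescaling, rank-based rescaling) and all function names are assumptions made for this example; this description does not say which three mapping methods were actually used. The sketch also assumes non-negative computer scores and uses Pearson correlation as the agreement measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rescale(values, lo=1.0, hi=5.0):
    """Linearly rescale values onto the 1-to-5 rating scale."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(lo + hi) / 2 for _ in values]
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


# Hypothetical candidate mappings (assumptions, not the original methods).
def linear_map(scores):
    return rescale(scores)


def log_map(scores):
    # Assumes non-negative scores so that log1p is defined.
    return rescale([math.log1p(s) for s in scores])


def rank_map(scores):
    # Replaces each score by its rank; ties are broken by position.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, index in enumerate(order):
        ranks[index] = float(rank)
    return rescale(ranks)


def best_mapping(computer_scores, human_scores, candidates):
    """Return the name of the best-correlated mapping and all correlations."""
    results = {
        name: pearson(mapping(computer_scores), human_scores)
        for name, mapping in candidates.items()
    }
    best = max(results, key=results.get)
    return best, results


CANDIDATES = {"linear": linear_map, "log": log_map, "rank": rank_map}
```

Calling `best_mapping(computer_scores, human_scores, CANDIDATES)` with one computer score and one adjusted human score per image pair returns the name of the retained mapping together with the correlation obtained for each candidate.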
The agreement between the mapped scores and the adjusted human scores gave an indication of performance. The data also makes it possible to measure precision-recall and normalized rank.

We present the image retrieval community with a data set of image pairs marked with a relevance score. This score is adjusted to reduce the variance among the participants who evaluated the set.

The other data available for download is the annotation ground-truth data. This data is obtained from the annotation engine of Kobus' system [Words and Pictures]. It consists of an annotation score for the same query-image pairs.

To allow researchers to use our data, we have made the ground-truth data freely downloadable. For the choice of performance indices, you can refer to our paper for the ones we have used, or you can come up with your own.

The database we use in our research was provided by Corel. This database has copyright restrictions and may not be freely distributed. However, for those researchers who have access to the Corel database, our ground-truth data should simplify things because we use Corel's indexing scheme. We hope to develop a copyright-free database in the future.

For any information regarding benchmarking, contact: Nikhil Shirahatti (shirahatti@gmail.com)