QBE-Dataset: A Query-By-Example Dataset for Research Paper Retrieval

Introduction

Research paper retrieval plays a crucial role in academic and scientific endeavors. However, existing evaluation approaches for information retrieval systems in this context often lack realism and diversity. The QBE-Dataset aims to address these limitations by providing a realistic and challenging testbed for evaluating retrieval models and algorithms.

Dataset Description

The QBE-Dataset consists of 50 abstract-query pairs, covering a diverse range of topics and query facets. Each abstract-query pair is annotated with a binary relevance judgment indicating whether the abstract is relevant to the given query facet. The dataset's inclusion of multiple query facets allows retrieval systems to be evaluated across different aspects of a research paper.
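A minimal sketch of how such abstract-query pairs might be represented in code. The field names and example values below are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass

# Hypothetical record layout for one abstract-query pair; field names
# are illustrative, not the dataset's actual schema.
@dataclass
class QBEPair:
    abstract: str   # abstract of the candidate paper
    query: str      # example (query-by-example) text
    facet: str      # query facet, e.g. a method or task aspect
    relevant: bool  # binary relevance judgment

pairs = [
    QBEPair("We propose a transformer-based retrieval model ...",
            "retrieval with transformers", "method", True),
    QBEPair("A survey of graph database systems ...",
            "retrieval with transformers", "method", False),
]

# Select the pairs judged relevant for a given facet.
relevant = [p for p in pairs if p.facet == "method" and p.relevant]
```

Keeping one record per abstract-query-facet combination makes per-facet evaluation a simple filter over the collection.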

Benefits of the QBE-Dataset

The QBE-Dataset offers several benefits for evaluating information retrieval systems in research paper retrieval:

  1. Realism and Diversity: The dataset spans a diverse range of topics and query facets, making it representative of real-world scenarios. This supports realistic evaluation and allows retrieval systems to be assessed in varied contexts.

  2. Quality and Reliability: The relevance judgments provided by human judges ensure the dataset's quality and reliability. Multiple judges independently assess the relevance of each abstract-query pair, minimizing subjectivity and bias in the judgments.

  3. Simplicity and Comparability: The binary nature of the relevance judgments simplifies the evaluation process and enables easy comparison of different retrieval models. This aligns with common practices in information retrieval evaluation.

  4. Accessibility and Reproducibility: The dataset is publicly accessible, enabling researchers and practitioners to easily access and use it for their own experiments and evaluations. This promotes transparency and reproducibility in the field of information retrieval.
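To illustrate point 3, binary relevance judgments plug directly into standard rank-based metrics. The sketch below computes average precision from a ranked list of binary labels; the example ranking is hypothetical, not drawn from the dataset:

```python
def average_precision(ranked_relevance):
    """Average precision over a ranked list of binary relevance labels
    (1 = relevant, 0 = not relevant), highest-ranked result first."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

# Ranked output of a hypothetical retrieval model, labelled with
# binary judgments: relevant results at ranks 1 and 3.
ap = average_precision([1, 0, 1, 0, 0])
```

Because each judgment is simply relevant or not, two models can be compared by averaging this score over all queries, with no need to calibrate graded relevance scales.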

Conclusion

The QBE-Dataset provides a valuable resource for evaluating information retrieval systems in the context of research paper retrieval. Its accessibility, diversity, and reliability make it a pragmatic alternative to existing evaluation approaches. Researchers and practitioners can leverage this dataset to develop and compare different retrieval models, driving advancements in the field of information retrieval.

For more information and access to the QBE-Dataset, please visit qbe-dataset.org.



Publication source

See the PDF from which this article has been generated:

PDF source url: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/20f07591c6fcb220ffe637cda29bb3f6-Paper-round2.pdf