Deep Passage Retrieval Reranking based on Semantic Enhancement

Session

Information Systems and Security

Description

In information retrieval, passages relevant to a query are usually easy to obtain, whereas irrelevant passages must be drawn from a large collection and contain more noise. Obtaining high-quality irrelevant passages is therefore vital. Based on this observation, we propose a similarity-measure method, called “doc-ad-doc”, which selects the irrelevant passage that is least relevant to the query and most similar to the relevant passage. Experiments show that a 0.8M training set constructed by our method achieves the same effect as an 8M training set constructed by random sampling. Secondly, traditional information retrieval methods usually adopt a sparse vector space model, such as BM25, and many mainstream deep relevance retrieval methods train classification models over the query and passage. In our work, we instead adopt a dense representation of the interacting query and passage, where the embeddings are produced by a BERT-base encoder model. We test the approach on the MS MARCO data set and show that its retrieval accuracy outperforms the classification model and BM25-based baselines.
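
The negative-selection idea in the description can be illustrated with a minimal sketch. The abstract does not give the actual “doc-ad-doc” formula, so everything below is an assumption made for illustration: the function names `cosine` and `select_negative` are hypothetical, cosine similarity stands in for the unspecified similarity measure, and the two criteria (far from the query, close to the relevant passage) are combined as a simple difference of two similarities. In practice the vectors would come from an encoder; toy three-dimensional vectors are used here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_negative(query_vec, positive_vec, candidates):
    """Pick the candidate that is similar to the relevant passage but not to the query.

    candidates maps a passage id to its vector; all candidates are assumed
    to be passages that are not labelled relevant to the query.
    The scoring rule (difference of two cosines) is an illustrative
    assumption, not the formula from the paper.
    """
    def score(vec):
        return cosine(vec, positive_vec) - cosine(vec, query_vec)

    return max(candidates, key=lambda pid: score(candidates[pid]))


# Toy vectors, invented for this example only.
query = [1.0, 0.0, 0.0]
positive = [1.0, 1.0, 0.0]
candidates = {
    "p1": [0.0, 1.0, 0.0],  # overlaps with the positive passage, not with the query
    "p2": [0.0, 0.0, 1.0],  # unrelated to both
    "p3": [1.0, 0.0, 0.0],  # identical to the query direction
}

chosen = select_negative(query, positive, candidates)
print(chosen)
```

With these toy vectors the sketch picks `p1`, the candidate that shares content with the relevant passage while having no overlap with the query. A weighted combination of the two similarities, or a two-stage filter, would be equally plausible readings of the description.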

Session Chair

Dashmir Istrefi

Session Co-Chair

Agon Mehmeti

Proceedings Editor

Edmond Hajrizi

ISBN

978-9951-437-96-7

Location

Lipjan, Kosovo

Start Date

31-10-2020 10:45 AM

End Date

31-10-2020 12:15 PM

DOI

10.33107/ubt-ic.2020.208
