Deep Passage Retrieval Reranking based on Semantic Enhancement
Session
Information Systems and Security
Description
In information retrieval, passages relevant to a query are usually easy to obtain, whereas irrelevant passages must be drawn from a large collection and contain more noise. Obtaining high-quality irrelevant passages is therefore vital. Based on this observation, we propose a similarity-measure method, called “doc-ad-doc”, which selects the irrelevant passages that are least relevant to the query and most similar to the relevant passage. Experiments show that a 0.8M training set constructed by our method achieves the effect of an 8M training set constructed by random sampling. Second, traditional information retrieval methods usually adopt a sparse vector space model, such as BM25, while many mainstream deep relevance retrieval methods train classification models over the query and passage. In our work, we adopt a dense representation of the interacting query and passage, where the embeddings are produced by a BERT-base encoder model. We evaluate on the MS MARCO data set and show that the retrieval accuracy of our method outperforms both the classification model and BM25-based baselines.
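The negative-selection idea in the description can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption: cosine similarity stands in for the similarity measure, the score is taken as similarity to the relevant passage minus similarity to the query, and the function names and the toy 2-D vectors are hypothetical. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_hard_negatives(query_vec, positive_vec, candidates, k=1):
    """Pick k non-relevant passages under an ASSUMED score:
    sim(candidate, positive) - sim(candidate, query).

    `candidates` maps a passage id to its embedding vector.
    A high score means "close to the relevant passage, far from the query".
    """
    scored = []
    for pid, vec in candidates.items():
        score = cosine(vec, positive_vec) - cosine(vec, query_vec)
        scored.append((score, pid))
    # Highest score first; ties broken by passage id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:k]]


# Toy 2-D embeddings, invented for illustration only.
query = [1.0, 0.0]
positive = [1.0, 1.0]
candidates = {
    "near_query": [1.0, 0.0],
    "near_positive_far_from_query": [0.0, 1.0],
    "unrelated": [-1.0, -1.0],
}
hard_negatives = select_hard_negatives(query, positive, candidates, k=1)
```

In practice the vectors would come from an encoder such as the BERT-base model mentioned in the description, and the candidate pool would contain only passages not labeled relevant to the query.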
Session Chair
Dashmir Istrefi
Session Co-Chair
Agon Mehmeti
Proceedings Editor
Edmond Hajrizi
ISBN
978-9951-437-96-7
Location
Lipjan, Kosovo
Start Date
31-10-2020 10:45 AM
End Date
31-10-2020 12:15 PM
DOI
10.33107/ubt-ic.2020.208
Recommended Citation
Chen, Liping and Ren, Junchao, "Deep Passage Retrieval Reranking based on Semantic Enhancement" (2020). UBT International Conference. 66.
https://knowledgecenter.ubt-uni.net/conference/2020/all_events/66