In recent years, machine learning approaches, and in particular deep neural networks, have yielded significant improvements on several natural language processing and computer vision tasks; however, such breakthroughs have not yet been observed in information retrieval. Besides the inherent complexity of IR tasks, such as understanding the user's information need, a main reason is the lack of high-quality and/or large-scale training data for many of these tasks. This makes it necessary to study how to design and train machine learning models when no large-scale or high-quality training data is available. Given the rapid progress in machine learning, this is an ideal time for a workshop focused on learning in this important and challenging setting for IR.
The goal of this workshop is to bring together researchers from industry, where data is plentiful but noisy, with researchers from academia, where data is sparse but clean, to discuss solutions to these related problems.
9:00 - 9:10 | Opening
9:10 - 10:00 | Keynote by Marc Najork
10:00 - 10:15 | Coffee break
10:15 - 11:30 | Paper presentations [Accepted Papers]
11:30 - 12:00 | Discussion panel and closing
Recent years have seen great advances in using machine-learned ranking functions for relevance prediction. Any learning-to-rank framework requires abundant labeled training examples. In web search, labels may either be assigned explicitly (say, through crowd-sourced assessors) or based on implicit user feedback (say, result clicks). In personal (e.g. email) search, obtaining labels is more difficult: document-query pairs cannot be given to assessors due to privacy constraints, and clicks on query-document pairs are extremely sparse (since each user has a separate corpus), noisy and biased. Over the past several years, we have worked on techniques for training ranking functions on result clicks in an unbiased and scalable fashion. Our techniques are used in many Google products, such as Gmail, Inbox, Drive and Calendar. In this talk, I will present an overview of this line of research.
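The abstract stays at the level of an overview, but for readers new to the topic, the sketch below illustrates the general idea of counteracting position bias in click data with inverse propensity scoring (IPS), a standard ingredient of unbiased learning to rank. The synthetic data, the 1/rank examination model, and the linear scoring function are illustrative assumptions, not details of the talk or of any Google system.

```python
# Minimal sketch of IPS-weighted pairwise learning to rank from clicks.
# All names and values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy click log: each clicked result is paired with a non-clicked result from
# the same query; a position-based model gives the examination propensity
# (probability that the rank where the click occurred was examined at all).
n_clicks, n_features = 1000, 16
x_clicked = rng.normal(size=(n_clicks, n_features))   # features of clicked documents
x_other = rng.normal(size=(n_clicks, n_features))     # features of non-clicked documents
ranks = rng.integers(1, 11, size=n_clicks)            # rank at which each click happened
propensity = 1.0 / ranks                              # assumed examination model: P(examined) ~ 1/rank

w = np.zeros(n_features)   # linear scoring function s(x) = w.x
lr = 0.1

for _ in range(200):
    # Pairwise logistic loss: the clicked document should outscore the
    # non-clicked one. Each pair is reweighted by 1/propensity so that clicks
    # at rarely examined (low) ranks count more, correcting position bias.
    margin = x_clicked @ w - x_other @ w
    weights = 1.0 / propensity
    # Gradient of log(1 + exp(-margin)) w.r.t. w, scaled by the IPS weight.
    grad = -(weights * (1.0 / (1.0 + np.exp(margin))))[:, None] * (x_clicked - x_other)
    w -= lr * grad.mean(axis=0)
```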
Marc Najork is a Research Engineering Director at Google, where he manages a team working on a portfolio of machine learning problems. Before joining Google in 2014, Marc spent 12 years at Microsoft Research Silicon Valley and 8 years at Digital Equipment Corporation's Systems Research Center in Palo Alto. He received a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. Marc has published about 60 papers and holds 26 issued patents. Much of his past research has focused on improving web search and on understanding the evolving nature of the web. He served as ACM TWEB editor-in-chief, CACM news board co-chair, WWW 2004 program co-chair, WSDM 2008 conference chair, and in numerous senior PC member roles.
University of Massachusetts Amherst
University of Amsterdam
Spotify
Toutiao
Microsoft
We invite two kinds of contributions: research papers (up to 6 pages) and position papers (up to 2 pages). Submissions must be in English, in PDF format, and must not exceed the appropriate page limit in the current ACM two-column conference format (including references and figures). Suitable LaTeX and Word templates are available from the ACM website. Papers may report original research, preliminary research results, or proposals for new work. The review process is single-blind. Papers will be evaluated according to their significance, originality, technical content, style, clarity, relevance to the workshop, and likelihood of generating discussion. Authors should note that changes to the author list after the submission deadline are not allowed without permission from the PC chairs. At least one author of each accepted paper is required to register for, attend, and present the work at the workshop. All papers must be submitted via EasyChair at https://easychair.org/conferences/?conf=lnd4ir.
Papers presented at the workshop must be uploaded to arXiv.org but are considered non-archival and may be submitted elsewhere (modified or not); the workshop site will maintain links to the arXiv versions. This makes the workshop a forum for presenting and discussing current work without preventing the work from being published elsewhere.