In recent years, machine learning approaches, and in particular deep neural networks, have yielded significant improvements on several natural language processing and computer vision tasks; however, such breakthroughs have not yet been observed in information retrieval. Besides the inherent complexity of IR tasks, such as understanding the user's information need, a main reason is the lack of high-quality and/or large-scale training data for many of these tasks. This makes it necessary to study how to design and train machine learning models when no large-scale or high-quality training data is available. Given the rapid progress in machine learning, this is an ideal time for a workshop focused on learning in this important and challenging setting for IR.
The goal of this workshop is to bring together researchers from industry, where data is plentiful but noisy, with researchers from academia, where data is sparse but clean, to discuss solutions to these related problems.
9:00 - 9:10 | Opening
9:10 - 10:00 | Keynote by Marc Najork
10:00 - 10:15 | Coffee break
10:15 - 11:30 | Paper presentations [Accepted Papers]
11:30 - 12:00 | Discussion panel and closing
Recent years have seen great advances in using machine-learned ranking functions for relevance prediction. Any learning-to-rank framework requires abundant labeled training examples. In web search, labels may either be assigned explicitly (say, through crowd-sourced assessors) or based on implicit user feedback (say, result clicks). In personal (e.g. email) search, obtaining labels is more difficult: document-query pairs cannot be given to assessors due to privacy constraints, and clicks on query-document pairs are extremely sparse (since each user has a separate corpus), noisy and biased. Over the past several years, we have worked on techniques for training ranking functions on result clicks in an unbiased and scalable fashion. Our techniques are used in many Google products, such as Gmail, Inbox, Drive and Calendar. In this talk, I will present an overview of this line of research.
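The abstract stays at the level of an overview, but for readers new to the topic, the sketch below illustrates the general idea of counteracting position bias in click data with inverse propensity scoring (IPS), a standard ingredient of unbiased learning to rank. The synthetic data, the 1/rank examination model, and the linear scoring function are illustrative assumptions, not details of the talk or of any Google system.

```python
# Minimal sketch of IPS-weighted pairwise learning to rank from clicks.
# All names and values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy click log: each clicked result is paired with a non-clicked result from
# the same query; a position-based model gives the examination propensity
# (probability that the rank where the click occurred was examined at all).
n_clicks, n_features = 1000, 16
x_clicked = rng.normal(size=(n_clicks, n_features))   # features of clicked documents
x_other = rng.normal(size=(n_clicks, n_features))     # features of non-clicked documents
ranks = rng.integers(1, 11, size=n_clicks)            # rank at which each click happened
propensity = 1.0 / ranks                              # assumed examination model: P(examined) ~ 1/rank

w = np.zeros(n_features)   # linear scoring function s(x) = w.x
lr = 0.1

for _ in range(200):
    # Pairwise logistic loss: the clicked document should outscore the
    # non-clicked one. Each pair is reweighted by 1/propensity so that clicks
    # at rarely examined (low) ranks count more, correcting position bias.
    margin = x_clicked @ w - x_other @ w
    weights = 1.0 / propensity
    # Gradient of log(1 + exp(-margin)) w.r.t. w, scaled by the IPS weight.
    grad = -(weights * (1.0 / (1.0 + np.exp(margin))))[:, None] * (x_clicked - x_other)
    w -= lr * grad.mean(axis=0)
```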
Marc Najork is a Research Engineering Director at Google, where he manages a team working on a portfolio of machine learning problems. Before joining Google in 2014, Marc spent 12 years at Microsoft Research Silicon Valley and 8 years at Digital Equipment Corporation's Systems Research Center in Palo Alto. He received a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. Marc has published about 60 papers and holds 26 issued patents. Much of his past research has focused on improving web search and on understanding the evolving nature of the web. He served as ACM TWEB editor-in-chief, CACM news board co-chair, WWW 2004 program co-chair, WSDM 2008 conference chair, and in numerous senior PC member roles.
University of Massachusetts Amherst
University of Amsterdam
Spotify
Toutiao
Microsoft
We invite two kinds of contributions: research papers (up to 6 pages) and position papers (up to 2 pages). Submissions must be in English, in PDF format, and must not exceed the appropriate page limit in the current ACM two-column conference format (including references and figures). Suitable LaTeX and Word templates are available from the ACM website. Papers may report original research, preliminary research results, or proposals for new work. The review process is single-blind. Papers will be evaluated according to their significance, originality, technical content, style, clarity, relevance to the workshop, and likelihood of generating discussion. Authors should note that changes to the author list after the submission deadline are not allowed without permission from the PC chairs. At least one author of each accepted paper is required to register for, attend, and present the work at the workshop. All papers must be submitted via EasyChair at https://easychair.org/conferences/?conf=lnd4ir.
Papers presented at the workshop must be uploaded to arXiv.org but are considered non-archival and may be submitted elsewhere (modified or not); the workshop site will maintain links to the arXiv versions. This makes the workshop a forum for presenting and discussing current work without preventing the work from being published elsewhere.