Recommender systems are a key technology for online providers such as Google, Amazon, Netflix, Booking.com and Spotify. Researchers are therefore working intensively, and with a certain urgency, on ever more accurate predictions of the products and services users might want to consume next. However, in a recent paper, Maurizio Ferrari Dacrema, Paolo Cremonesi and Dietmar Jannach showed that several critical issues in research methodology are hindering progress in the development of recommender systems. In recognition of their work, they received the Best Full Paper Award at the renowned ACM Conference on Recommender Systems (RecSys) in Copenhagen in September.
Research on recommender systems has followed an established pattern for several decades now. Researchers develop new algorithms that predict users' future choices on the basis of their earlier consumption decisions. To do this, they draw on historical data, for instance the sequence of tracks a Spotify user has clicked on. Some of the recorded choices are hidden, and the researchers use their new algorithms to predict these withheld choices, aiming to match what users actually chose as accurately as possible. This process is referred to as an “offline experiment”.
“Researchers have many choices here,” according to Dietmar Jannach, professor at the Department of Applied Informatics. The data sets can be freely selected, as can the evaluation procedures and the metrics used to quantify prediction accuracy. “In the global race for the most accurate predictive algorithm, it is possible, in the worst case, to design the configurations in such a way that the result beats the previous best by a margin that is small but nonetheless significant,” Jannach explains.
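To make the idea of an offline experiment concrete, here is a minimal, hypothetical sketch in Python. It is not the protocol used in the paper. It assumes three things: a "leave-last-out" split, in which each user's most recent item is hidden; a simple most-popular baseline as the predictor; and hit rate at cutoff k as the accuracy metric. The function names and toy data are invented for illustration. Each of these assumptions is one of the free choices Jannach describes, and changing any of them can change which algorithm comes out ahead.

```python
from collections import Counter


def leave_last_out(histories):
    """Hide each user's most recent item; keep the rest for training (assumed split)."""
    train, test = {}, {}
    for user, items in histories.items():
        if len(items) < 2:
            continue  # need at least one item to learn from and one to hide
        train[user] = items[:-1]
        test[user] = items[-1]
    return train, test


def popularity_recommender(train, k):
    """Hypothetical baseline: the k most consumed items the user has not yet seen."""
    counts = Counter(item for items in train.values() for item in items)
    ranked = [item for item, _ in counts.most_common()]

    def recommend(user):
        seen = set(train[user])
        return [item for item in ranked if item not in seen][:k]

    return recommend


def hit_rate_at_k(recommend, test):
    """Share of users whose hidden item appears in their top-k recommendations."""
    if not test:
        return 0.0
    hits = sum(1 for user, hidden in test.items() if hidden in recommend(user))
    return hits / len(test)


# Toy listening histories, oldest item first (made-up data).
histories = {
    "alice": ["song_a", "song_b", "song_c"],
    "bob": ["song_b", "song_c", "song_a"],
    "carol": ["song_a", "song_c", "song_d"],
    "dave": ["song_b", "song_a"],
}

train, test = leave_last_out(histories)
score = hit_rate_at_k(popularity_recommender(train, k=1), test)
print(f"HitRate@1 = {score:.2f}")  # prints "HitRate@1 = 0.75"
```

A proposed new algorithm would be evaluated by swapping in its own `recommend` function and comparing scores on the same split. Because the data set, split, metric and cutoff are all chosen by the researcher, a fair comparison requires fixing them in advance and tuning the baselines as carefully as the new method.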
Together with his two Italian colleagues, Jannach analysed 18 scientific articles published between 2015 and 2018 that had proposed new, supposedly better prediction algorithms for recommender systems. The results are sobering: “Many new methods are no better than older, basic methods, and work could be done on the latter to improve them further.”
The problem lies not only with the research methodology but also with a lack of reproducibility in the published studies. Many do not make their source code publicly available. In addition, there are too few common standards against which results can be measured consistently. A further problem is that the investigations are not adequately underpinned by theory.
“In the long run, the industry that is set to use our technology needs different insights that can help advance recommender systems. We must also be fully aware that better predictions do not necessarily mean better recommendations,” Dietmar Jannach concludes.
Ferrari Dacrema, M., Cremonesi, P. & Jannach, D. (2019). Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. Proceedings of the 13th ACM Conference on Recommender Systems (RecSys 2019). https://arxiv.org/abs/1907.06902