Invited Talk: “Towards Large Neural Networks that can Reason”

The Italian Association for Artificial Intelligence is pleased to announce the next seminar of its Spotlight Seminars on AI initiative:

April 20 – 5:00 PM (CEST)

Title: Towards Large Neural Networks that can Reason

Speaker: YOSHUA BENGIO, Université de Montréal

The aim of the seminar series is to illustrate, explore and discuss current scientific challenges, trends, and possibilities in all branches of our multifaceted research field. The seminars are held monthly and virtually on the YouTube channel of the Association (https://www.youtube.com/c/AIxIAit), where they remain permanently available, and are given by leading Italian researchers as well as by top international scientists.

The seminars are mainly aimed at a broad audience interested in AI research and are also included in the Italian PhD programme in Artificial Intelligence; AIxIA warmly encourages young scientists and PhD students to attend.

Bio: Yoshua Bengio is recognized worldwide as one of the leading experts in artificial intelligence, known for his conceptual and engineering breakthroughs in artificial neural networks and deep learning. He is a Full Professor in the Department of Computer Science and Operations Research at Université de Montréal and the Founder and Scientific Director of Mila – Quebec Artificial Intelligence Institute, one of the largest academic institutes in deep learning and one of the three federally funded centers of excellence in AI research and innovation in Canada.

He began his studies in Montreal, where he obtained his Ph.D. in Computer Science from McGill University in 1992. After a postdoctoral fellowship at the Massachusetts Institute of Technology (MIT) on statistical learning and sequential data, he completed a second postdoc at AT&T Bell Laboratories in Holmdel, NJ, on learning and vision algorithms in 1993. That same year, he returned to Montreal and joined UdeM as a faculty member. In 2016, he became the Scientific Director of IVADO. He is Co-Director of the CIFAR Learning in Machines & Brains program, which funded the initial breakthroughs in deep learning; since 2019, he has held a Canada CIFAR AI Chair and served as Co-Chair of Canada’s Advisory Council on AI. In 2022, Yoshua Bengio became the most cited computer scientist in the world (by h-index).

Concerned about the social impact of AI, he actively took part in the conception of the Montreal Declaration for the Responsible Development of Artificial Intelligence. His goal is to contribute to uncovering the principles that give rise to intelligence through learning, while favouring the development of AI for the benefit of all.

Yoshua Bengio was made an Officer of the Order of Canada and a Fellow of the Royal Society of Canada in 2017, and in 2020 he became a Fellow of the Royal Society of London. From 2000 to 2019, he held the Canada Research Chair in Statistical Learning Algorithms. He is a member of the NeurIPS Foundation advisory board and Co-Founder of the ICLR conference.

His scientific contributions have earned him numerous awards, including the 2019 Killam Prize for Natural Sciences, the 2017 Government of Québec Marie-Victorin Award, the 2018 Lifetime Achievement Award from the Canadian AI Association, the Prix d’excellence FRQNT (2019), the Medal of the 50th Anniversary of the Ministry of International Relations and Francophonie (2018), the 2019 IEEE CIS Neural Networks Pioneer Award, and Acfas’s Urgel-Archambault Prize (2009); in 2017, he was named Radio-Canada’s Scientist of the Year. He is the 2018 laureate of the A.M. Turing Award, “the Nobel Prize of Computing,” alongside Geoffrey Hinton and Yann LeCun for their important contributions and advances in deep learning. In 2022, he was appointed Knight of the Legion of Honor by France and named co-laureate of Spain’s Princess of Asturias Award for technical and scientific research.

Abstract: Current neural networks, such as large language models and models trained on images or on paired images and text, are trained to fit their training data, with very little in their architecture to force them to produce answers that are coherent with respect to individual pieces of knowledge. In that sense, they seem to be missing some of the reasoning abilities and causal understanding that humans benefit from, and this can result in incoherent outputs and mistakes that humans would typically not make, especially out-of-distribution. This raises the larger question of how higher-level cognitive abilities could be incorporated into deep learning. Neuroscience and cognitive science tell us a lot about these abilities, and that knowledge can be used to design new architectures and training frameworks with the corresponding inductive biases.

This has motivated a novel form of deep learning called generative flow networks, or GFlowNets, which borrows from reinforcement learning, generative models and amortized variational inference. GFlowNets can sequentially generate compositional data structures whose content may be analogous to our thoughts, and they can be trained to sample these structures with probability proportional to a given or learned reward function that reflects how coherent the context and the generated answer are with a structured world model. A GFlowNet can thus be trained to perform amortized probabilistic inference that is consistent with the pieces of knowledge in the world model, including in the sense of generating samples from a Bayesian posterior over world models. As with amortized variational methods, this can be used to learn the world model itself.

That arrangement is similar to model-based reinforcement learning (where the policy is separated from the world model), but it concerns learning a policy that chooses what internal computation (i.e., reasoning) to perform, rather than how to act in the world. Unlike state-of-the-art deep learning and reinforcement learning, this makes it easy to incorporate inductive biases about high-level cognition and causality into the world model itself, such as sparse causal dependencies and reusable modular pieces of knowledge. It also means that the GFlowNet probabilistic inference machine can be trained by querying the world model, without having to directly interact with the real world, and can be as large and trained with as many queries as our computational capabilities allow: unlike current deep nets, its effective capacity is not limited by the size of the externally observed data. This is convenient because probabilistic inference is generally intractable and may therefore require high capacity to be approximated with a fast neural net. The mathematical foundations of GFlowNets, and how they constitute an interesting ML-based alternative to MCMC inference, will be briefly explained, and recent work on GFlowNets will be highlighted.
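To make the sampling objective concrete, below is a minimal, hypothetical sketch (not taken from the talk) of a GFlowNet trained with the trajectory-balance objective from the GFlowNet literature, on a toy task: building a fixed-length bit string one bit at a time, so that after training each complete string x is sampled with probability approximately R(x)/Z. Because construction here is append-only, every state has a unique parent and the backward-policy term vanishes. The network, encoding, and toy reward are all illustrative assumptions.

import torch
import torch.nn as nn

N = 6  # length of the bit string (the toy "compositional object")

def reward(bits):
    # Illustrative reward: strings with more 1s are more desirable.
    return float(sum(bits)) + 0.1

class GFlowNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Forward policy P_F: maps a partial state to logits over the next bit.
        self.policy = nn.Sequential(nn.Linear(N, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 2))
        # Learned log-partition function log Z of the reward.
        self.log_Z = nn.Parameter(torch.zeros(()))

def encode(bits):
    # Placed bits become -1/+1; positions not yet filled stay 0.
    s = torch.zeros(N)
    for i, b in enumerate(bits):
        s[i] = 1.0 if b else -1.0
    return s

model = GFlowNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    bits, log_pf = [], torch.zeros(())
    for _ in range(N):  # sample one trajectory from the forward policy
        dist = torch.distributions.Categorical(logits=model.policy(encode(bits)))
        a = dist.sample()
        log_pf = log_pf + dist.log_prob(a)
        bits.append(int(a))
    # Trajectory balance: (log Z + log P_F(tau) - log R(x))^2.
    # log P_B is 0 here because each state has exactly one parent.
    loss = (model.log_Z + log_pf - torch.log(torch.tensor(reward(bits)))) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

Driving this loss to zero over all trajectories forces the sampler to draw terminal strings with probability R(x)/Z, which is the proportional-sampling property described in the abstract; in the setting of the talk, the reward would instead come from querying a structured world model rather than from a hand-written function.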