With my colleagues of the BISCUIT team, who have strong experience in recurrent and self-organizing connectionism (Kohonen's self-organizing maps, continuous neural fields), we want to endow agents with an artificial physiology that could be described as neuromorphic: a large number of distributed computational units obeying simple rules and coupled together. In addition to their robustness and scaling properties, these architectures can be seen as dynamical systems with an extremely rich range of behaviors, especially when placed at the edge of chaos (Langton 1990). Fueled by the sensorimotor flow, the activity of such an architecture can give rise to a wide variety of motor behaviors.
If one varies the parameters of the neuromorphic system, including the structure and strength of the couplings but also the parameters intrinsic to the computational units (e.g., gains, activation thresholds, refractory periods), one modifies the dynamics of the system and thus the behavior of the agent in its environment. As in the reinforcement learning framework, it is possible to evaluate how appropriate the motor behavior is with respect to the agent's motivations or task. Our goal is to use this assessment, which may be a simple scalar signal, to guide and direct variations in the parameters of the neuromorphic system, as sketched below.
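As a toy illustration of this idea (our own sketch, not an implemented system from the team), the fragment below couples a small recurrent network of rate units, whose coupling weights, gains, and thresholds are the parameters, to a simple random search driven only by a scalar score of the resulting activity. The behavioral score used here is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20  # number of coupled computational units

def rollout(params, steps=200):
    """Run the coupled dynamics and return a scalar behavioral score.
    The score (sustained, non-saturated activity) is a placeholder;
    a real agent would be scored on its motor behavior."""
    W, gain, theta = params
    x = np.linspace(-1.0, 1.0, N)            # fixed non-zero initial state
    activity = []
    for _ in range(steps):
        x = np.tanh(gain * (W @ x - theta))  # simple coupled-unit update
        activity.append(x)
    a = np.asarray(activity)
    return a.std() - abs(a.mean())

# parameters: coupling matrix W, per-unit gains and activation thresholds
params = (rng.normal(0.0, 1.0 / np.sqrt(N), (N, N)), np.ones(N), np.zeros(N))
best = rollout(params)
for _ in range(100):
    # perturb every parameter and keep the change if the score improves
    candidate = tuple(p + 0.05 * rng.normal(size=p.shape) for p in params)
    score = rollout(candidate)
    if score >= best:
        params, best = candidate, score
```

The point of the sketch is that the evaluation is a single scalar, not a gradient: any parameter that shapes the dynamics can be varied this way.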
Many plasticity and adaptation rules have already been studied and experimented with in neuromorphic systems. We want to focus on rules that are compatible with neuromorphic architectures, i.e., local, distributed, and decentralized, but that, while remaining largely unsupervised, can take into account a more global evaluation of the agent's behavior. Schematically, we seek to propose adaptation rules that can "guide" the emergence of an artificial agent's behaviors.
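One standard way to reconcile these two requirements, given here only as an assumed illustration and not as the rule we will necessarily adopt, is a three-factor update: each synapse maintains a purely local Hebbian eligibility trace, and a global scalar evaluation, broadcast identically to all units, gates when and how strongly that trace is consolidated into the weights.

```python
import numpy as np

def three_factor_update(w, pre, post, trace, R, eta=0.01, decay=0.9):
    """One local update: `pre` and `post` are the activity vectors on
    either side of the weight matrix `w`; `R` is the global scalar
    evaluation broadcast to every synapse."""
    trace = decay * trace + np.outer(post, pre)  # local Hebbian eligibility
    w = w + eta * R * trace                      # consolidation gated by R
    return w, trace
```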
This research theme is highly exploratory and experimental. In particular, we seek to better identify the properties and characteristics of the different architectures and adaptation mechanisms that we study. Our questions focus on their capacities in terms of representation, structuring, and generalization. We would thus like to determine which sets of mechanisms and "innate" architectures are sufficient for an agent to autonomously develop relevant behaviors, but also representations grounded in its interaction with its environment.
More concretely, my current work progresses along four lines guided by this general theme.
Because of their properties, self-organizing maps provide an adaptive and relevant vector quantization of a continuous space (Kohonen 2013). With J. FIX, we have explored the advantages and limitations of recurrent or dynamic maps for estimating the value function in continuous-state reinforcement learning (Dutech, Fix, and Frezza-Buet 2018; Calba, Dutech, and Fix 2021). The logical next step is to guide the self-organization of the maps by explicitly using the reinforcement signal, an approach that, to our knowledge, has not yet been considered in the literature. This could furthermore be done with maps of varying size (drawing inspiration from (Montazeri, Moradi, and Safabakhsh 2011)) and in the context of knowledge transfer (see (George Karimpanal and Bouffanais 2019)).
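A minimal sketch of the underlying idea (illustrative only; it omits the recurrence and much of the neighborhood cooperation of the maps studied in the cited papers): the map quantizes the continuous state space, and each unit stores the Q-values of the discrete actions for the region it represents.

```python
import numpy as np

class SomQ:
    """A map of prototypes quantizes the state space; each unit carries
    the Q-values of the discrete actions for the states it represents."""

    def __init__(self, n_units, state_dim, n_actions, rng):
        self.proto = rng.uniform(size=(n_units, state_dim))  # unit prototypes
        self.q = np.zeros((n_units, n_actions))              # per-unit Q-values

    def bmu(self, s):
        # best-matching unit for state s
        return int(np.argmin(((self.proto - s) ** 2).sum(axis=1)))

    def update(self, s, a, r, s_next, alpha=0.1, gamma=0.95, lr=0.05):
        i, j = self.bmu(s), self.bmu(s_next)
        td = r + gamma * self.q[j].max() - self.q[i, a]
        self.q[i, a] += alpha * td                 # TD update at the BMU
        self.proto[i] += lr * (s - self.proto[i])  # move the BMU prototype
```

A full implementation would also adapt the BMU's neighbors; that neighborhood update is precisely where an explicit reinforcement signal could intervene to guide the self-organization.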
The adaptation mechanisms we are trying to develop cannot rely on a central clock that discretizes time into relevant decision instants, as is the case in classical reinforcement learning. On the one hand, I am working on an established approach in which the temporal difference error links the value function to its derivative (Doya 2000), with an emphasis on the decentralized aspects of the architectures. In connection with the work of Frémaux (Frémaux, Sprekeler, and Gerstner 2013), we are developing a neuromorphic implementation of these algorithms in continuous time. On the other hand, with H. FREZZA-BUET, we are looking for a more original solution to this problem by focusing on learning "decision events" by combining continuous neural fields (Sandamirskaya 2014). This research resonates with a fundamental problem in the study of cognition: how spatial and temporal conceptual representations are formed (Gallistel 1989).
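For reference, (Doya 2000) defines the value function as an exponentially discounted integral of future rewards; differentiating this definition yields the continuous-time temporal difference error that such a decentralized architecture must estimate:

$$V(t) = \int_t^{\infty} e^{-(s-t)/\tau}\, r(s)\, \mathrm{d}s \qquad\Longrightarrow\qquad \delta(t) = r(t) - \frac{1}{\tau}\,V(t) + \dot{V}(t),$$

where $\tau$ is the discount time constant and $\delta(t)$ vanishes when the estimate of $V$ is self-consistent.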
Habituation and sensitization are two non-associative learning mechanisms present in very simple, and sometimes single-cell, organisms. These mechanisms are termed fundamental because they provide basic adaptive capacities that allow an organism to move beyond purely reflex behaviors (Rankin et al. 2009). Our current work draws on these mechanisms, well described but little modeled by biologists, to propose new unsupervised learning methods that may be instrumental in the self-organization of an agent's behaviors (Kelso 1995).
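The toy model below (our illustration, not a model taken from the cited literature) captures the behavioral signatures listed by (Rankin et al. 2009): response decrement under repeated stimulation, spontaneous recovery during pauses, and transient sensitization after strong stimuli.

```python
def habituation_step(h, s, stim, tau_h=0.9, rec=0.02, tau_s=0.8, k=2.0):
    """One time step; h in [0, 1] is the habituated gain, s >= 0 the
    sensitization level, stim >= 0 the stimulus intensity."""
    if stim > 0:
        h *= tau_h                    # repeated stimulation habituates
        if stim > 1.0:
            s += k * (stim - 1.0)     # strong stimuli sensitize
    h = min(1.0, h + rec)             # spontaneous recovery
    s *= tau_s                        # sensitization fades quickly
    return h, s, (h + s) * stim       # effective response to the stimulus
```

Starting from h = 1.0 and s = 0.0 and feeding a train of identical stimuli, the response decays geometrically, recovers during a blank interval, and transiently overshoots after a strong stimulus.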
In collaboration with B. GIRAU, who leads the BISCUIT team and is a member of the Intel Neuromorphic Research Community, we are working directly on reinforcement learning algorithms for Loihi neuromorphic processors. To go beyond the many works that use the reinforcement signal only to learn in a supervised manner by modulating Spike-Timing-Dependent Plasticity (STDP), we are experimenting with algorithms that allow true sequential decision making, where the reward is received only at the end of an action sequence. The main difficulty lies in adapting these algorithms to hardware constraints, an adaptation that in turn raises questions about the convergence and guarantees of the resulting approximate methods.
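The principle can be stated independently of any hardware-specific API (the sketch below is plain Python with assumed parameter values, not Loihi code): spike pairings build an STDP-shaped eligibility trace at each synapse, and the weight is modified only when the delayed reward finally arrives.

```python
def eligibility(pre_spikes, post_spikes, dt=1.0, tau_pre=20.0,
                tau_post=20.0, tau_e=200.0, a_plus=0.01, a_minus=0.012):
    """Accumulate an STDP-shaped eligibility trace over one episode;
    the weight itself stays untouched until the reward arrives."""
    x_pre = x_post = elig = 0.0
    for pre, post in zip(pre_spikes, post_spikes):   # 0/1 spike trains
        x_pre += -dt / tau_pre * x_pre + pre         # presynaptic trace
        x_post += -dt / tau_post * x_post + post     # postsynaptic trace
        elig += (-dt / tau_e * elig
                 + a_plus * x_pre * post - a_minus * x_post * pre)
    return elig

def end_of_episode_update(w, elig, R, eta=0.1):
    return w + eta * R * elig   # the delayed reward gates the whole trace
```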
Beyond the previous axes, which are more directly related to computer science in the broad sense, I continue to work within the multidisciplinary framework of the Psyphine group. In particular, we seek to understand and document how a true inter-action can arise between a human being and an artificial device (a motorized lamp) whose behaviors adapt in a self-organized way.
Concretely, we have a robotic lamp equipped with a camera. The saliency points detected in the image feed a Kohonen map that self-organizes to categorize perceived situations into different classes. Each class is associated with a family of movements, so that the lamp "adapts" in an unsupervised way to the behavior of its interlocutor. The question then becomes: how, and why, does the human being manage, or not, to attribute an inner life to this device?
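Schematically, the lamp's perception-action loop looks like the following sketch, where the feature extraction, the movement families, and their mapping to map units are all hypothetical placeholders:

```python
import numpy as np

class LampController:
    """Saliency features -> 1-D Kohonen map -> movement family."""

    def __init__(self, n_units, feat_dim, rng, sigma=1.5, lr=0.1):
        self.proto = rng.uniform(size=(n_units, feat_dim))  # map prototypes
        self.sigma, self.lr = sigma, lr
        # hypothetical movement families, one label per group of units
        self.families = ["approach", "withdraw", "nod", "scan"]

    def react(self, features):
        d = ((self.proto - features) ** 2).sum(axis=1)
        bmu = int(np.argmin(d))
        # Gaussian neighborhood on the 1-D map: units near the winner
        # also move their prototypes, which is what makes this a SOM
        h = np.exp(-((np.arange(len(d)) - bmu) ** 2) / (2 * self.sigma ** 2))
        self.proto += self.lr * h[:, None] * (features - self.proto)
        return self.families[bmu % len(self.families)]
```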
Calba, Antonin, Alain Dutech, and Jérémy Fix. 2021. “Density Independant Self-Organized Support for Q-Value Function Interpolation in Reinforcement Learning.” In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2021). Bruges, Belgium.
Doya, Kenji. 2000. “Reinforcement Learning in Continuous Time and Space.” Neural Computation 12 (1): 219–45.
Dutech, Alain, Jérémy Fix, and Hervé Frezza-Buet. 2018. “Reconstruction d’état caché avec cartes auto-organisatrices récurrentes.” In JFPDA 2018 - Journées Francophones sur la Planification, la Décision et l’Apprentissage pour la conduite de systèmes, 1–3. Nancy, France. https://hal.inria.fr/hal-01840627.
Frémaux, Nicolas, Henning Sprekeler, and Wulfram Gerstner. 2013. “Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons.” PLOS Computational Biology 9 (4). Public Library of Science: e1003024. https://doi.org/10.1371/journal.pcbi.1003024.
Gallistel, Charles R. 1989. “Animal Cognition: The Representation of Space, Time and Number.” Annual Review of Psychology 40 (1). Annual Reviews: 155–89.
George Karimpanal, Thommen, and Roland Bouffanais. 2019. “Self-Organizing Maps for Storage and Transfer of Knowledge in Reinforcement Learning.” Adaptive Behavior 27 (2). SAGE Publications: 111–26.
Kelso, J. A. Scott. 1995. Dynamic Patterns: The Self-Organization of Brain and Behavior. MIT Press.
Kohonen, Teuvo. 2013. “Essentials of the Self-Organizing Map.” Neural Networks 37. Elsevier: 52–65.
Langton, Chris G. 1990. “Computation at the Edge of Chaos: Phase Transitions and Emergent Computation.” Physica D: Nonlinear Phenomena 42 (1-3). Elsevier: 12–37.
Montazeri, Hesam, Sajjad Moradi, and Reza Safabakhsh. 2011. “Continuous State/Action Reinforcement Learning: A Growing Self-Organizing Map Approach.” Neurocomputing 74 (7). Elsevier: 1069–82.
Rankin, Catharine H., Thomas Abrams, Robert J. Barry, Seema Bhatnagar, David F. Clayton, John Colombo, Gianluca Coppola, et al. 2009. “Habituation Revisited: An Updated and Revised Description of the Behavioral Characteristics of Habituation.” Neurobiology of Learning and Memory 92 (2). Elsevier: 135–38.
Sandamirskaya, Yulia. 2014. “Dynamic Neural Fields as a Step Toward Cognitive Neuromorphic Architectures.” Frontiers in Neuroscience 7. Frontiers: 276.