neuroscience-ai-reading-course

Neighbourhood Cognition Consistent Multi-Agent Reinforcement Learning

Overview :

Introduces Neighbourhood Cognitive Consistency (NCC) for multi-agent reinforcement learning (MARL) and applies it to deep Q-learning (NCC-Q) and actor-critic (NCC-AC) algorithms.

Method :

NCC-Q :

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Arch-1.png

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Equation.jpg

Approach to Neighbourhood Cognitive Consistency :

  1. Assumption 1 : The global state S is *decomposed into several hidden cognitive variables* C. Each neighbourhood has one hidden cognitive variable, and an agent derives its observations from the cognitive variables of all the neighbourhoods it belongs to.
  2. Assumption 2 : If neighbouring agents are able to learn the hidden cognitive variable of their neighbourhood, they form a consistent neighbourhood cognition.

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Ass1.png
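To make Assumption 1 concrete, here is a toy sketch of the generative story it describes: each neighbourhood holds one hidden cognitive variable, and an agent's observation is built only from the variables of the neighbourhoods it belongs to. The scalar cognitive variables, the Gaussian noise model, and the names `sample_observations`, `membership` and `cognition` are assumptions made for illustration and do not come from the paper.

```python
import random


def sample_observations(membership, cognition, noise=0.1, seed=0):
    """Toy version of Assumption 1 (illustrative only, not the paper's model).

    membership: agent name -> list of neighbourhood ids the agent belongs to
    cognition:  neighbourhood id -> hidden cognitive variable (a float here)
    """
    rng = random.Random(seed)
    observations = {}
    for agent, hoods in membership.items():
        # The observation depends only on the cognitive variables of the
        # agent's own neighbourhoods, each seen through Gaussian noise.
        observations[agent] = [cognition[h] + rng.gauss(0.0, noise) for h in hoods]
    return observations


# Agent a2 sits in both neighbourhoods, so it observes both hidden variables.
membership = {"a1": ["n1"], "a2": ["n1", "n2"], "a3": ["n2"]}
cognition = {"n1": 1.0, "n2": -2.0}
observations = sample_observations(membership, cognition)
```

Under this toy model, agents `a1` and `a2` both receive noisy views of the same variable for `n1`, which is the shared information that Assumption 2 says they should agree on once learned.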

Training the NCC-Q :

The loss functions used are :

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Loss1.png

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Loss2.png
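The exact loss terms are only given in the figures above, so the following is a hedged sketch of one plausible reading: a temporal-difference (TD) term for the Q-values combined with a KL-divergence term that pulls an agent's cognition distribution towards those of its neighbours. The scalar Gaussian cognition, the averaging over neighbours, the weight `alpha`, and all function names are assumptions for illustration and should be checked against the paper.

```python
import math


def td_loss(q_value, reward, next_q_values, gamma=0.99):
    """Squared one-step TD error against a greedy bootstrap target."""
    target = reward + gamma * max(next_q_values)
    return (target - q_value) ** 2


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for scalar Gaussians."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def cognition_loss(agent, neighbours):
    """Mean KL from the agent's (mu, sigma) to each neighbour's (mu, sigma).

    Assumption: consistency is encouraged by matching neighbours' cognition
    distributions; this is an illustrative stand-in, not the paper's formula.
    """
    return sum(gaussian_kl(*agent, *n) for n in neighbours) / len(neighbours)


def total_loss(td, cd, alpha=0.1):
    """Weighted sum of the TD term and the consistency term (alpha is assumed)."""
    return td + alpha * cd


td = td_loss(q_value=1.0, reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
cd = cognition_loss((0.0, 1.0), [(0.0, 1.0), (1.0, 1.0)])
loss = total_loss(td, cd, alpha=0.1)
```

The consistency term is zero when an agent and all its neighbours hold identical cognition distributions, so `alpha` trades off value accuracy against agreement within the neighbourhood.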

NCC-AC :

Each agent $i$ adopts an independent actor $\mu_{\theta_i}(o_i)$.

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Arch-2.png

The GCN module and the cognition module are part of the critic; together they are designed to achieve neighbourhood cognitive consistency and good cooperation between agents.
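As a rough illustration of what a GCN module does with neighbouring agents' features, the sketch below performs one simplified graph-convolution step: each agent averages its own hidden vector with those of its neighbours, applies a linear map, and passes the result through a ReLU. The mean aggregation, the single shared `weight` matrix, and the name `gcn_aggregate` are assumptions for illustration; the architecture in the figure above may differ.

```python
def gcn_aggregate(hidden, adjacency, weight):
    """One simplified graph-convolution step (illustrative only).

    hidden:    agent name -> hidden feature vector (list of floats)
    adjacency: agent name -> list of neighbouring agent names
    weight:    matrix (list of rows) applied to the aggregated vector
    """
    out = {}
    for agent, h in hidden.items():
        # Gather the agent's own features together with its neighbours'.
        group = [h] + [hidden[j] for j in adjacency[agent]]
        # Element-wise mean over the group.
        mean = [sum(vals) / len(group) for vals in zip(*group)]
        # Linear projection followed by ReLU.
        projected = [sum(w * x for w, x in zip(row, mean)) for row in weight]
        out[agent] = [max(0.0, v) for v in projected]
    return out


hidden = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [2.0, 2.0]}
adjacency = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
aggregated = gcn_aggregate(hidden, adjacency, identity)
```

With the identity matrix as `weight`, the output is just the neighbourhood mean, which makes it easy to see that each agent's representation now mixes in information from its neighbours.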

Training NCC-AC :

Training is similar to that of NCC-Q.

Experiments :

Results :

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Result1.png

Neighbourhood%20Cognition%20Consistent%20Multi%20Agent%20Rei%209717a64f4e9e45f99d2a80aa8206e20d/Result2.png

Conclusion :

This paper introduces two neighbourhood cognition consistent RL methods, NCC-Q and NCC-AC, which not only outperform state-of-the-art (SOTA) methods but also scale well in routing tasks.