Thanks @HumbleTraveller for clarifying that. While some researchers have drawn connections between transformers and the hippocampal complex, to my mind the closest analogue of self-attention in Monty is voting, discussed in this video on transformers vs. Monty.
Voting and self-attention are similar in that both are potentially all-to-all operations over a set of representations, with, as you say, O(n^2) complexity in the number of representations. However, we don't expect Monty to actually have or need all-to-all connectivity (nor would it be found in the brain); rather, these lateral, voting-type connections would be much sparser, significantly reducing the complexity.
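To make the complexity difference concrete, here's a toy sketch. This is purely illustrative, not Monty's actual voting mechanism; names like `n_modules` and `k_neighbors` are my own assumptions. It just counts pairwise message exchanges under dense vs. sparse lateral connectivity:

```python
def dense_vote_pairs(n_modules: int) -> int:
    """Exchanges if every module votes with every other module: O(n^2)."""
    return n_modules * (n_modules - 1)

def sparse_vote_pairs(n_modules: int, k_neighbors: int) -> int:
    """Exchanges if each module votes with only k lateral neighbors: O(n*k),
    i.e. linear in n for a fixed neighborhood size."""
    return n_modules * k_neighbors

n = 1000
print(dense_vote_pairs(n))       # 999000 exchanges
print(sparse_vote_pairs(n, 10))  # 10000 exchanges
```

With 1,000 modules, restricting each module to 10 lateral partners cuts the exchange count by roughly two orders of magnitude, which is the kind of saving sparse voting connectivity buys you.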
More generally, we're not too worried about Monty's computational complexity at the moment. The slow wall-clock time primarily comes down to Monty's implementation not being highly optimized compared to, e.g., deep neural networks on GPUs; when we look at the actual number of floating-point operations (i.e. the amount of computation) that Monty needs, it compares very favorably to deep neural networks. There are various things, from neuromorphic hardware to low-precision, bit-based representations, that should help when we eventually make an effort at scaling Monty up.
Hope that answers your concerns about self-attention, but let me know if I can elaborate.