Hi!
Although I’ve (re)read the docs about voting (and the posts related to the subject), it is still not clear to me why (in a configuration with multiple LMs) we cannot accumulate the votes in just one “master” LM, since, in the end, (approximately) the same model will be recognized in all LMs. Instead, all LMs send votes to all other LMs, in a combinatorial fashion. Why not concentrate all the votes in one LM?
I suspect this is related to the idea of lateral connections among cortical columns, but I was not able to figure out the relation.
Thanks!
If I understand the purpose of the voting mechanism correctly, the whole point is that no one LM (column) is in charge of deciding which model is the correct one. Each LM (column) must decide for itself which of its own models is consistent with its own internal state and the states of its peers that it is able to observe.
That being said, in practice, there probably will not be all-to-all communication. There will likely be a finite set of:
- local proximal connections (tied to sensor input or next level down in the hierarchy),
- distal connections (tied to other modules observing other inputs that are topologically nearby),
- and apical connections (tied to the next level up in the hierarchy).
The proximal input is the ground truth (i.e. dynamically accumulating evidence whose behavior the LM is attempting to accurately predict).
The distal input is the local context for what features and poses other nearby sensor patches are observing.
The apical input is a bias from another region that has made a prediction about the identity of the feature and/or its behavior from other contexts (sensor fusion, temporal pooling, etc.). This is a hint about the nature of the current object and/or its behaviors based on evidence that cannot be inferred solely from the locally sensed features.
For small systems, you may be able to use all-to-all connections. But for larger systems, you will probably want to constrain the number of connections/communications to the modules that have the most salient information to share.
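To make that concrete, here is a rough sketch of what constraining the connections might look like. None of these names come from the actual codebase; it just illustrates limiting each module’s votes to its topologically nearest peers instead of all-to-all.

```python
import numpy as np

# Hypothetical sketch: rather than all-to-all voting, each LM only
# exchanges votes with the k modules whose sensor patches are
# topologically closest to its own.
def build_vote_connections(lm_positions, k=4):
    """Return, for each LM, the indices of its k nearest peers."""
    positions = np.asarray(lm_positions, dtype=float)  # (num_lms, 2) patch centers
    connections = []
    for i, pos in enumerate(positions):
        dists = np.linalg.norm(positions - pos, axis=1)
        dists[i] = np.inf                          # exclude self
        connections.append(np.argsort(dists)[:k])  # k nearest neighbors
    return connections

# Example: 9 LMs over a 3x3 grid of sensor patches, each voting with 4 neighbors.
grid = [(x, y) for x in range(3) for y in range(3)]
print(build_vote_connections(grid, k=4))
```

This keeps the communication cost linear in the number of modules rather than quadratic.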
Thanks @CollinsEM for the clear explanation!
It helps to think of voting as a process by which a column receives (and uses) information from other columns, not a process for a column to propagate the information outwards. While these may seem equivalent on the surface, they are not. Propagating the information outwards can simply be accomplished with a “master” column that aggregates the votes, as you mentioned, but this does not benefit the individual LMs or help them reach individual terminal states faster.
These LMs are independent processing modules. Each LM sees a different sequence of observations and forms its own set of hypotheses based on these (possibly) unique observations. Voting is a method by which one LM receives a summary (in the form of hypotheses on objects and poses) of the observations processed from other LMs. The voting information received by one column summarizes what the other column thinks about the object given its sensory observations.
Consider two LMs (LM1 and LM2) looking at a coffee mug. LM1 moves around the cup and observes the cylinder but not the handle, while LM2 observes the cylinder and also the handle. Here LM1 would think it could be a cylinder object or a coffee mug object, but LM2 knows it’s a mug because it has seen the handle. If LM1 is allowed to receive votes that influence its evidence scores, it would reach a terminal state of coffee mug much quicker without having to see the handle.
Note that the voting happens at each time step as a way for columns to share their intermediate hypotheses with each other and build on this shared information towards individual final decisions (terminal state). It does not happen at the end of the episode for the purpose of aggregating decisions from LMs.
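To put some toy numbers on the mug example, here is a minimal sketch of the “receiving” view of voting. The names and the update rule are illustrative only, not Monty’s actual API:

```python
# LM1 has seen only the cylinder body; LM2 has also seen the handle.
lm1_evidence = {"cylinder": 0.9, "mug": 0.9}   # LM1 is still ambiguous
lm2_evidence = {"cylinder": 0.3, "mug": 0.95}  # LM2 is confident it's a mug

def receive_votes(own_evidence, incoming_votes, weight=0.5):
    """Each LM updates its OWN evidence from what peers report;
    no central aggregator is ever formed."""
    updated = dict(own_evidence)
    for obj, vote in incoming_votes.items():
        if obj in updated:
            updated[obj] += weight * vote
    return updated

# LM1 receives LM2's hypotheses and disambiguates without seeing the handle.
print(receive_votes(lm1_evidence, lm2_evidence))
# -> {'cylinder': 1.05, 'mug': 1.375}: mug pulls ahead for LM1
```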
Hope this made sense…
Thanks @rmounir! This makes a lot of sense, especially the difference between “receive” and “propagate” that you pointed out! Together with @CollinsEM’s explanation, the situation is much clearer now!
Well…one more question
As “the voting happens at each time step”, are all the hypotheses shared again (at each time step), or just the hypotheses updated after the last voting process?
At every matching step, we send out the hypotheses regardless of whether they had been sent before in the previous matching step. We do, however, often only send the most likely hypotheses.
During voting, all the hypotheses with evidence scores above `vote_evidence_threshold` are sent out from a column and received by all the other columns. Note that during voting, we scale the evidence values to the range [-1, 1], so the default value of `vote_evidence_threshold=0.8` is applied to the scaled evidence scores to send only the interesting hypotheses (i.e., those with high evidence). Currently, this is the only parameter that controls which hypotheses are sent out and received by the LMs. Additionally, LMs that did not receive sensory input, and therefore didn’t update their hypotheses, will not send out a vote.
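As a rough sketch of this thresholding step (a simplification of mine, not the actual Monty code):

```python
import numpy as np

def select_votes(evidence, vote_evidence_threshold=0.8):
    """Scale evidence to [-1, 1] and keep only high-evidence hypotheses.
    Assumes the scores are not all identical."""
    values = np.array(list(evidence.values()))
    scaled = 2 * (values - values.min()) / (values.max() - values.min()) - 1
    return {
        hyp: s
        for hyp, s in zip(evidence.keys(), scaled)
        if s > vote_evidence_threshold
    }

hypotheses = {"mug": 12.0, "cylinder": 11.5, "bowl": 2.0}
print(select_votes(hypotheses))  # only mug and cylinder are sent out
```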
Hope this answers your question…
I think there is something missing here in the voting mechanism. The basic assumption that an object is recognized by an “ensemble” of neurons is, I think, not correct. The population-rate coding dilemma suggests that we need a “sequence” of ensembles to identify an object, not a single set of cells firing at once. Then perhaps the voting mechanism is something much simpler, such as an inhibition mechanism (the first LM to fire inhibits the neighboring LMs, according to the receptive field, as in the retina): since there will be a sequence of values, a different LM can win at each step, and then all contribute somehow to the final sequence (or identification). Multiple LMs may also be critical for reliability (not only against permanent failures, but also against sporadic issues due to the low efficiency of synaptic mechanisms).
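To make that concrete, something like this (purely a sketch of my idea, not TBT code):

```python
import numpy as np

# At each step, the LM with the strongest response fires first and
# suppresses its neighbors, so a SEQUENCE of winners, rather than one
# simultaneous ensemble, ends up tagging the object.
def winner_take_all_sequence(responses, inhibition=0.5, steps=3):
    responses = np.array(responses, dtype=float)
    winners = []
    for _ in range(steps):
        w = int(np.argmax(responses))
        winners.append(w)
        responses[w] = -np.inf          # winner has fired; refractory
        responses *= 1 - inhibition     # neighbors are suppressed
    return winners

print(winner_take_all_sequence([0.9, 0.7, 0.4, 0.8]))  # -> [0, 3, 1]
```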
The idea proposed in TBT does not seem biologically plausible.
Hey there @John_Doe!
Welcome! Are you suggesting that TBT assumes static population encoding? I’d been under the impression it supported temporal dynamics via evidence matching and SDRs?
I’d also be interested if you’d expand on your closing comment. What about TBT seems biologically implausible?
Most likely I’m wrong, but TBT seems to decide across time, receiving each “evidence” at each time step. Therefore, such evidence seems to be critical for disambiguating the so-called “object” in the identification process. I couldn’t find a concise and precise description of the voting mechanism… my idea could be very wrong.
This seems rather implausible given the stochasticity of the biological substrate. I think that each mini-column can generate a “reduced” number of different evidences (+noise), and it is the compounded activity of very close mini-columns over time (i.e., some code + a lot of rate) that is used to decide. The current voting seems very unreliable in the presence of noise. And given the variability in synaptic efficacy…
Ah, where you’re coming from makes a lot more sense to me now. Thank you for that.
While TBT leans pretty heavily on some of the higher-level principles of the brain’s functioning, it doesn’t try to constrain itself in the same way something like an HTM neuron would. If you’re looking for a more biologically aligned approach, I would honestly look there. Here’s a link to their ML guide on it: found here, as well as a short FAQ on the differences between TBT and HTM: found here.
If you’re interested in learning more on the voting mechanisms, I had dug into it a bit for another one of @ElyMatos’s posts. You can find info on that here: About compositionality and heterarchy - #6 by HumbleTraveller
I believe the response is pretty accurate to what’s going on under the hood, though I agree with you, it would be helpful to have a dedicated video or an authoritative post on it from the TBT team themselves.
Re. Tolerance to noise…
I’m not 100% sure how the team is handling this in code, but they’d spent quite a bit of time researching the sparse activation patterns in columnar layers 4 and 5b. They’d determined something along the lines of 3×10^211 possible neuronal sequencing patterns per time step within a given column’s search space (with something like a 2% pattern overlap). At the time, this struck me as incredibly noise/fault tolerant.
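As a toy illustration of that noise tolerance (my own sketch with round numbers, not the team’s analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

n, w = 2048, 40  # 2048 cells, 40 active (~2% sparsity)
a = rng.choice(n, size=w, replace=False)
b = rng.choice(n, size=w, replace=False)

# Two random sparse patterns almost never collide...
print("overlap of two random patterns:", len(set(a) & set(b)))  # ~0-2 bits

# ...so even after corrupting 25% of the active bits, the pattern is
# still far more similar to its original than to any random pattern.
noisy = a.copy()
noisy[: w // 4] = rng.choice(n, size=w // 4, replace=False)
print("overlap after 25% noise:", len(set(a) & set(noisy)))  # ~30 bits
```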
I’m not sure if they’ve worked this into the TBT framework yet, but I’d be surprised if it isn’t something they’re at least actively working towards. Perhaps @nleadholm or @vclay could provide more insight here?
Clearly I didn’t get it right
My previous message was motivated after reading this:
Still, I don’t get it. The heterarchy and compositionality thread explanations… are even harder for me.
I’ll keep trying.
No worries, we’re all here to learn. If I may ask, what’s your background in? If I know that, I might be able to explain this stuff in a way that makes more sense.
Hi @John_Doe
thanks for asking those questions! It’s definitely a concept that takes some time to grasp, and there are still aspects that we are actively working on. @HumbleTraveller already gave some great answers (thanks!). In terms of further resources on voting, I can recommend watching the voting part of this video: https://youtu.be/bkwY4ru1xCg?si=MvHxfd8hVLL4zzn2&t=4212 We also have a recording from an earlier meeting where we go into more depth on this mechanism and discuss the best ways to implement it: https://youtu.be/0Gcw1itpbWM?si=TxDtUhE3a74cHhS8&t=4144 (this video is older than the first one).
You have probably already seen this (and it is not particularly detailed), but our whitepaper talks about voting in section 10.2.
For the current implementation, you could have a look at the `send_out_vote` and `receive_vote` functions in the code.
In our first implementation, we implemented voting just for object ID (code is here), which is maybe a bit closer to what you were thinking? However, this doesn’t allow us to accumulate different amounts of evidence for different hypotheses and is therefore very brittle with noise. It also didn’t include voting on pose.
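As a very simplified contrast (not the actual code) of why the ID-only scheme is brittle while evidence voting is not:

```python
# ID-only voting: intersect the sets of possible objects. One noisy LM
# that drops the true object removes it for everyone.
lm_possible = [{"mug", "cylinder"}, {"mug", "bowl"}, {"cylinder"}]  # 3rd LM is noisy
print(set.intersection(*lm_possible))  # -> set(), recognition fails

# Evidence voting: sum graded support instead, so the noisy LM only
# weakens the correct hypothesis rather than eliminating it.
lm_evidence = [
    {"mug": 0.9, "cylinder": 0.8},
    {"mug": 0.95, "bowl": 0.3},
    {"cylinder": 0.7},  # same noisy LM
]
totals = {}
for ev in lm_evidence:
    for obj, score in ev.items():
        totals[obj] = totals.get(obj, 0.0) + score
print(max(totals, key=totals.get))  # -> 'mug' still wins
```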
@HumbleTraveller you are right that Numenta looked a lot into the representational capacity and fault tolerance of SDRs and showed that it is quite large. Currently, we are not using SDRs to represent object IDs in the default setup (we do have some code for it though). We are regularly thinking about when to incorporate more neural elements and past ideas developed at Numenta into Monty. You may be interested in this video about where these things are on our roadmap: https://www.youtube.com/watch?v=lewMhwzNtEo
Best wishes,
Viviane