It helps to think of voting as a process by which a column receives (and uses) information from other columns, not a process by which a column propagates information outwards. While these may seem equivalent on the surface, they are not. Propagating information outwards could be accomplished simply with a “master” column that aggregates the votes, as you mentioned, but this would not benefit the individual learning modules (LMs) or help them reach their individual terminal states faster.
These LMs are independent processing modules. Each LM sees a different sequence of observations and forms its own set of hypotheses based on these (possibly) unique observations. Voting is a method by which one LM receives a summary (in the form of hypotheses on objects and poses) of the observations processed by other LMs. In other words, the votes a column receives summarize what each other column thinks about the object given its own sensory observations.
Consider two LMs (LM1 and LM2) looking at a coffee mug. LM1 moves around the mug and observes the cylindrical body but not the handle, while LM2 observes both the cylindrical body and the handle. Here LM1 would think the object could be either a cylinder or a coffee mug, but LM2 knows it’s a mug because it has seen the handle. If LM1 is allowed to receive votes that influence its evidence scores, it would reach a terminal state of coffee mug much more quickly, without having to see the handle itself.
Note that voting happens at each time step, as a way for columns to share their intermediate hypotheses with each other and build on this shared information towards their individual final decisions (terminal states). It does not happen at the end of the episode for the purpose of aggregating decisions from the LMs.
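To make the mug example concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration (the `ToyLM` class, the feature names, the evidence values, `VOTE_WEIGHT`, and `THRESHOLD`); it is not the actual implementation, and it ignores poses entirely. It only shows the idea that each LM folds the other LM's evidence into its own scores at every step, rather than a master column aggregating decisions at the end.

```python
# Toy illustration only: all names and numbers here are hypothetical.
OBJECTS = ("cylinder", "mug")

# Made-up evidence contributed by each observed feature.
FEATURE_EVIDENCE = {
    "curved_surface": {"cylinder": 1.0, "mug": 1.0},  # consistent with both
    "handle": {"cylinder": -1.0, "mug": 2.0},         # only a mug has one
}
VOTE_WEIGHT = 0.5  # how much a received vote counts relative to own evidence
THRESHOLD = 1.0    # lead over the runner-up needed for a terminal state


class ToyLM:
    def __init__(self):
        self.sensory = {obj: 0.0 for obj in OBJECTS}  # from own observations
        self.combined = dict(self.sensory)            # own evidence plus votes

    def observe(self, feature):
        for obj in OBJECTS:
            self.sensory[obj] += FEATURE_EVIDENCE[feature][obj]
        self.combined = dict(self.sensory)

    def send_vote(self):
        # A vote is a summary of this LM's current hypotheses.
        return dict(self.sensory)

    def receive_votes(self, votes):
        for vote in votes:
            for obj in OBJECTS:
                self.combined[obj] += VOTE_WEIGHT * vote[obj]

    def terminal_state(self):
        best, runner_up = sorted(OBJECTS, key=self.combined.get, reverse=True)[:2]
        if self.combined[best] - self.combined[runner_up] >= THRESHOLD:
            return best
        return None


def run_episode(observations, use_voting):
    """Return {lm_name: (step_of_terminal_state, object)} for LMs that finish."""
    lms = {name: ToyLM() for name in observations}
    n_steps = len(next(iter(observations.values())))
    finished = {}
    for step in range(n_steps):
        for name, lm in lms.items():
            lm.observe(observations[name][step])
        if use_voting:
            # Votes are exchanged at every step, not at the end of the episode.
            votes = {name: lm.send_vote() for name, lm in lms.items()}
            for name, lm in lms.items():
                lm.receive_votes([v for sender, v in votes.items() if sender != name])
        for name, lm in lms.items():
            decision = lm.terminal_state()
            if decision is not None and name not in finished:
                finished[name] = (step + 1, decision)
    return finished


observations = {
    "LM1": ["curved_surface"] * 4,  # never sees the handle
    "LM2": ["curved_surface", "handle", "curved_surface", "curved_surface"],
}
print(run_episode(observations, use_voting=False))
print(run_episode(observations, use_voting=True))
```

With these made-up numbers, LM1 never reaches a terminal state on its own, because the cylinder and mug hypotheses stay tied. With voting enabled, it settles on the mug at the same step LM2 sees the handle.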
Hope this made sense…