@vclay leads the main discussion on how to start the implementation of behavioral models in Monty.
Then @nleadholm presents on Asymmetric Connections and Behavior Columns and @jhawkins talks about the problem of how columns can help each other learn in the absence of feed-forward inputs.
Watch a two minute summary of this meeting here:
00:00 Introduction
00:08 Object Behaviors in Monty
01:34 Overview of the Capabilities we Want to Add
26:13 Recognize Object Behaviors
45:07 Environment to Test Learning and Recognizing
01:00:28 Open Questions - Learning and Recognizing
01:24:04 Learning Associations Between Behavior and Morphology Models
01:34:42 Open Questions - Learning Associations
01:50:44 Asymmetric Connections and Behavior Columns
02:06:26 Jeff Talks About the Problem of Inter-Column Teaching
+1 for the list of clickable section links! I don’t know how practical this might be, but I’d love to have each video session accompanied by a clean (i.e., human-edited) transcript. (I often find the speakers’ voices quite difficult to follow.)
Cool; from a quick look, it seems like a very clean and useful summary. It might be interesting to feed this into an LLM as a way to enable chat sessions on the overall content, specific sections of interest, etc.
Hmmmm. How hard would it be to set up an automated service to perform this on both the back catalog and upcoming sessions? For extra credit, the results could be linked from the YT videos’ initial comment and the announcement posts in this forum. (And a pony… )
On a related note, I’d like to suggest that the TBP have an AV expert (e.g., podcasting audio support consultant) check out the crew members’ setups and practices, including software, hardware, etc. There’s some great content being presented on these videos; I’d like to see it recorded well for current and future audiences.
FWIW, I watch a lot of interviews, podcasts, and other presentations (usually on YT). Most of them have clear audio even when lots of speakers are involved, so we have an existence proof that it can be done. IMHO, a lot of it comes down to microphones: there are some very good mics out there; much better than the built-in mics on laptops and such.
@Rich_Morin Coincidentally, that’s exactly Recall’s core feature. It’s a ChatGPT-enabled workspace where you can put videos, articles, PDFs, etc. and have multiple GPT conversations about anything in the workspace. The subtitle digesting is just a side feature.
I could definitely go thru TBP’s whole catalog and publish the digests.
Great meeting! Yeah, I agree on the sound quality. For YouTube I think it can turn some viewers off. There is a principle in YouTube: “People will forgive bad video, but not bad audio.”
Anyway, great talk!
There is an idea that might help. As discussed in the video, the sensory module will track movement (velocity)… but I think we can improve on that. Velocity is the first-order derivative; what if we have the sensor module also track the second-order derivative? That would be the acceleration of the movement.
For the stapler example, acceleration scales with velocity: a point at the end of the stapler will have 2x the velocity and 2x the acceleration compared to a point halfway along.
But that relationship is not the same for all movement types; in different forms of movement, velocity does not scale with acceleration in the same way. Circular movements are one example. Another is throwing a ball up in the air: while the ball rises it slows down, so its acceleration vector points in the opposite direction to its velocity.
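To make this concrete, here is a rough numpy sketch (illustrative only, not Monty code; the function and variable names are mine). It estimates velocity and acceleration from sampled positions with finite differences and compares the two cases above: rotation about a stapler-like hinge, where both speed and acceleration scale with distance from the hinge, and a ball thrown upward, where acceleration points against the velocity while the ball rises.

```python
import numpy as np

def velocity_and_acceleration(positions, dt):
    """First and second discrete derivatives of a (T, 2) position trace."""
    vel = np.gradient(positions, dt, axis=0)  # first-order derivative (velocity)
    acc = np.gradient(vel, dt, axis=0)        # second-order derivative (acceleration)
    return vel, acc

dt = 0.01
t = np.arange(0.0, 1.0, dt)

# 1) Stapler-like rotation about a hinge at constant angular velocity:
#    a point twice as far from the hinge has 2x the speed AND 2x the acceleration.
omega = 2.0  # rad/s
def hinge_point(radius):
    return np.stack([radius * np.cos(omega * t), radius * np.sin(omega * t)], axis=1)

v_mid, a_mid = velocity_and_acceleration(hinge_point(0.5), dt)
v_end, a_end = velocity_and_acceleration(hinge_point(1.0), dt)
print(np.linalg.norm(v_end, axis=1).mean() / np.linalg.norm(v_mid, axis=1).mean())  # -> 2.0
print(np.linalg.norm(a_end, axis=1).mean() / np.linalg.norm(a_mid, axis=1).mean())  # -> 2.0

# 2) Ball thrown upward: while it rises, acceleration (gravity) points opposite
#    to the velocity, so the velocity/acceleration relationship looks very different.
ball = np.stack([np.zeros_like(t), 5.0 * t - 0.5 * 9.81 * t**2], axis=1)
v_ball, a_ball = velocity_and_acceleration(ball, dt)
rising = v_ball[:, 1] > 0
print(np.sign(v_ball[rising, 1][5]), np.sign(a_ball[rising, 1][5]))  # +1.0 vs -1.0
```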
I believe this second-derivative idea can be applied to features as well. For example, a change in luminosity is the first derivative; but that rate of change can itself vary, which is the second derivative.
Second-order derivatives can be modeled in traditional deep networks, but it’s harder to do because the second derivative of the common ReLU activation is 0. So if this gets implemented, other activation functions might give better results.
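As a quick numerical sanity check of that claim (again just a sketch, using plain finite differences rather than any particular framework’s autograd): ReLU’s second derivative is zero away from the kink at 0, while a smooth activation such as softplus has a strictly positive second derivative.

```python
import numpy as np

def second_derivative(f, x, h=1e-3):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

relu = lambda x: np.maximum(x, 0.0)
softplus = lambda x: np.log1p(np.exp(x))  # smooth approximation of ReLU

x = np.array([-2.0, -0.5, 0.5, 2.0])   # points away from the kink at 0
print(second_derivative(relu, x))       # ~[0, 0, 0, 0]
print(second_derivative(softplus, x))   # sigmoid(x) * (1 - sigmoid(x)), all > 0
```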
Nice idea! I think this is definitely worth exploring once we get to implementing the movement/change-detecting sensor module (or you could, if you are interested in testing that idea). It looks like the brain also encodes acceleration (for instance in a population code in area MT; see “Visual motion analysis for pursuit eye movements in area MT of macaque monkeys” on PubMed), which makes sense, as this is useful information to extract in order to interact with the world intelligently.
@AgentRev it’s an interesting idea. We have tried the automatic meeting summaries that Zoom provides, but found that they were not accurate enough. They work decently well for summarizing basic organizational meetings and the action items from them, but fail completely when trying to summarize the deeper theoretical discussions (far outside their training distribution) that we have during the research meetings.
They usually sound reasonable and grammatically correct, but their content is not accurate. This also seems to be an issue with the Recall summary you shared. For example, the first “Introduction” section notes seem completely hallucinated and do not describe topics we even mentioned tangentially.

I didn’t read through all of the notes, but they also seem to suffer from those issues of not comprehending the topics being discussed and hallucinating (e.g. “The implementation would also involve considering the analogy between the barcellular and magnosia pathways, and how in primates, the cortex receives center surround fields without edge detection or movement extraction”).

This is not to discourage the use of LLMs to try to understand these topics, but just a reminder to be wary of their outputs and to double-check everything they produce. For us, this is currently too much effort, so we decided not to post AI-generated summaries but instead summarize the meetings ourselves in these short-form videos: https://youtube.com/shorts/8lfXLIW4C1g
Thanks for the feedback on the audio! We will look into that
Yeah I tried with other videos afterwards and I noticed it too. This is caused by a few factors: audio quality, subtitles accuracy, the LLM’s lack of context regarding TBP as a whole, and in this case, the instruction I gave of producing a very detailed outline (causing it to generate vague filler text).
These discussion videos are very nice and contain lots of interesting stuff. I think there’s value in trying to textually catalog their content, both for easier consumption and search indexing, even if the AI summaries are imperfect. Videos in general are opaque “time capsules” that demand a greater time investment to absorb than reading.
I kinda have this problem at work, where senior staff have infodumped a lot of tribal knowledge in video meetings over the years, inadvertently growing a sort of “learning debt tumor”. So, I’ve been experimenting a bit with retrieval-augmented generation / internal chatbots. The main bottleneck for videos is definitely audio quality; the rest of the equation is just AI limitations.
I agree with Viviane that the concepts from TBP are outside the normal training distribution, so they’re not a good candidate for summarization (and are terrible for extrapolation) by LLMs.
We do spend human time correcting the transcripts for the videos in the core playlist - these are videos where we’re confident in the approach and explanations, so this is more valuable data. But again, probably not for most use cases involving LLMs.
We don’t fix the transcripts for brainstorming videos as this is where we are exploring ideas that could be incorrect or rapidly outdated and we don’t want to spend our limited time there.
We have an internal spreadsheet with human-written descriptions that might be useful for indexing and search, and I’ll think about making this list public somehow now that we are mostly caught up on the backlog of videos we want to publish.