Hi!
I managed to run the examples from the tutorial.
Now I want to test with some custom data. I guess the dataset must be a PyTorch dataset, and that DataSet/DataLoader classes must be created (possibly extending Monty classes). But I have no idea how to take the first steps in this direction!
Any help or tips are welcome!
Thanks!
Hi @ElyMatos
To get started, it may help to have an overview of how the Dataset and DataLoader classes currently work.
While there are setup and teardown steps, the main loop for how they are used is captured in the `MontyExperiment.run_episode()` method:
```python
def run_episode(self):
    """Run one episode until model.is_done."""
    self.pre_episode()
    for step, observation in enumerate(self.dataloader):
        self.pre_step(step, observation)
        self.model.step(observation)
        self.post_step(step, observation)
        if self.model.is_done or step >= self.max_steps:
            break
    self.post_episode(step)
```
In the above, you will notice the `for step, observation in enumerate(self.dataloader)` loop. The `enumerate(self.dataloader)` portion is what invokes the DataLoader's `__iter__` and `__next__` methods.
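To make that concrete, here is a minimal, hypothetical sketch of a DataLoader driving an environment through a motor policy. The class and attribute names are illustrative assumptions, not Monty's actual implementation:

```python
# Hypothetical sketch; names are illustrative, not Monty's actual implementation.
class SimpleEnvironmentDataLoader:
    def __init__(self, dataset, motor_policy):
        self.dataset = dataset            # wraps the environment (see __getitem__ below)
        self.motor_policy = motor_policy  # callable that proposes the next action

    def __iter__(self):
        # enumerate(self.dataloader) calls this once at the start of the loop.
        return self

    def __next__(self):
        # Each iteration asks the policy for an action and indexes the
        # dataset with it, which steps the environment forward.
        action = self.motor_policy()
        observation, state = self.dataset[action]
        return observation
```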
For a high-level overview of how those work, see the `__iter__` and `__next__` portions of the sequence diagram in one of my RFC drafts, and also take a look at what goes on inside `__getitem__` in a Dataset:
```python
def __getitem__(self, action: Action):
    observation = self.env.step(action)
    state = self.env.get_state()
    if self.transform is not None:
        observation = self.apply_transform(self.transform, observation, state)
    return observation, state
```
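For illustration, a minimal environment exposing the `step`/`get_state` interface that `__getitem__` relies on could look like this (hypothetical names, not a class from the framework):

```python
# Hypothetical minimal environment exposing the step/get_state interface
# that __getitem__ above relies on (illustrative only).
class CountingEnvironment:
    def __init__(self):
        self._position = 0

    def step(self, action):
        # Apply the action and return the resulting observation.
        self._position += 1
        return {"value": self._position}

    def get_state(self):
        # Proprioceptive state, e.g. where the sensor currently is.
        return {"position": self._position}
```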
I gotta run for now, but I hope this gives you a high-level overview.
Hi Ely,
I’m happy to hear that you went through the tutorials and now want to test with some custom data! We want to write some more detailed instructions for this, including a new tutorial and a monty-for-robotics-starter-kit, but until I get around to this, here are some notes:
- I linked a few resources on customizing the `EnvironmentDataLoader` and `EnvironmentDataSet` in this post: I'd hoped to be able to end the non-sense - #2 by vclay
- You don't necessarily need to use an existing PyTorch dataset (in fact, the terminology is confusing; in general you should not use a static dataset, since this is a sensorimotor learning approach). You will need to customize the `EnvironmentDataLoader` and/or `EnvironmentDataSet` (which subclass the PyTorch versions) to specify how your data should be loaded and how actions determine the next observation.
- For an example, I would recommend having a look at this file: tbp.monty/src/tbp/monty/frameworks/environments/two_d_data.py at main · thousandbrainsproject/tbp.monty · GitHub. Here, we specify a custom `OmniglotEnvironment`, which takes the Omniglot dataset, lets a small patch move over the handwritten symbols following the strokes, and returns those observations to Monty. There is also a custom environment, `SaccadeOnImageEnvironment`, where we take an RGBD image and move a small patch over that image (a rough sketch of this pattern follows below). The corresponding DataLoaders can be found in this script: tbp.monty/src/tbp/monty/frameworks/environments/embodied_data.py at main · thousandbrainsproject/tbp.monty · GitHub
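As a hedged illustration of that patch-over-image idea (the class name and details here are my own assumptions, not framework code), an environment that moves a patch over a 2D array in response to displacement actions might look like:

```python
import numpy as np

# Hypothetical sketch, loosely inspired by SaccadeOnImageEnvironment;
# illustrative only, not the framework's implementation.
class PatchOnArrayEnvironment:
    def __init__(self, image, patch_size=5):
        self.image = image  # 2D (or HxWxC) array
        self.half = patch_size // 2
        # Start the patch center away from the borders.
        self.loc = np.array([patch_size, patch_size])

    def step(self, action):
        # Interpret the action as a (row, col) displacement of the patch,
        # clipped so the patch stays inside the image.
        bounds_min = self.half
        bounds_max = np.array(self.image.shape[:2]) - self.half - 1
        self.loc = np.clip(self.loc + np.asarray(action), bounds_min, bounds_max)
        r, c = self.loc
        patch = self.image[r - self.half : r + self.half + 1,
                           c - self.half : c + self.half + 1]
        return {"patch": patch}

    def get_state(self):
        # Location of the sensor patch within the image.
        return {"location": self.loc.copy()}
```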
I hope this helps! Let me know if you have more questions.
Thanks @tslominski and @vclay for the directions! I'll check them out!
What I have in mind is a “dataset” to model some (abstract) objects. The idea is that this dataset can be used to create the object models for the LMs, so that afterwards I can use these models for a recognition task - something like the tutorials.
So, I’m thinking about having one dataset for “modeling” and another dataset for “testing” (similar to common Deep Learning). I’m imagining that the “test dataset” is a simulation of “observations” - that is, it will contain data to be passed to the sensors.
BUT of course I’m not sure if this mindset is correct or feasible… Actually, I want to learn Monty’s intricacies with data I know well…
Is this possible, or does the very idea need adjustments?
Thanks!
A consequence of having a predetermined simulation of observations is that Monty cannot generate its own actions but has to follow predetermined ones. The simulation of observations would need to be in sync with the action file being read by Monty. See `read_action_file(file: str) → List[Action]` and where it is used to see how you can specify predetermined actions. There’s also a test file example in tests/unit/resources/fixed_test_actions.jsonl.
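For a feel of the mechanics, here is an illustrative sketch of reading a JSON-lines file of actions. The actual schema Monty expects may differ, so treat `read_action_file` and the fixture above as the authoritative reference:

```python
import json

# Illustrative only: the real schema of Monty's serialized actions may
# differ; read_action_file and fixed_test_actions.jsonl are the reference.
def read_actions(path):
    """Read one JSON object per line and return them as a list."""
    actions = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                actions.append(json.loads(line))
    return actions
```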
Hi Ely,
From the rough description, I don’t see any issue with what you are planning. I think it makes sense to test with some basic artificial data to get a general understanding of the algorithm. It’s hard to give more feedback without knowing what your data actually looks like. Like what are the observations, and what are the actions that take you through the observations?
If the series of observations is already predetermined and actions have no effect on them, you can also just write an environment similar to `OmniglotEnvironment` where the `step` function does not take the action into account and simply returns the next observation (see the sketch below). This way, you don’t have to worry about whatever actions Monty outputs or specify a predefined sequence of actions. However, if actions do have an effect on which observations should be retrieved, but you want to control the action sequence (like doing controlled tests of different paths through your environment), you should use the action file like @tslominski mentioned.
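A minimal, hypothetical version of such an action-ignoring environment (names are illustrative, not from the framework) might be:

```python
# Hypothetical sketch of an environment that ignores actions and plays back
# a fixed sequence of observations (illustrative names only).
class PlaybackEnvironment:
    def __init__(self, observations):
        self.observations = observations
        self._idx = -1

    def step(self, action):
        # The action is ignored; observations arrive in a fixed order.
        self._idx += 1
        return self.observations[self._idx]

    def get_state(self):
        return {"index": self._idx}
```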
I hope this helps!