Custom Dataset/DataLoader

Hi!
I managed to run the examples from the tutorials 🙂
Now I want to test with some custom data. I guess the dataset must be a PyTorch dataset, and that Dataset/DataLoader classes must be created (possibly extending Monty classes). But I have no idea how to take the first steps in this direction!
Any help or tips are welcome!
Thanks!


Hi @ElyMatos

It may help to get an overview of how Dataset and DataLoaders currently work to get started.

While there are setup and teardown steps, the main loop for how they are used is captured in the MontyExperiment.run_episode() method:

    def run_episode(self):
        """Run one episode until model.is_done."""
        self.pre_episode()
        for step, observation in enumerate(self.dataloader):
            self.pre_step(step, observation)
            self.model.step(observation)
            self.post_step(step, observation)
            if self.model.is_done or step >= self.max_steps:
                break
        self.post_episode(step)

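In the above, notice the loop header for step, observation in enumerate(self.dataloader). The enumerate(self.dataloader) portion is what invokes the DataLoader's __iter__ and __next__ methods: __iter__ is called once when the loop starts, and each pass through the loop then calls __next__ to fetch the next observation.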
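To make the iteration protocol concrete, here is a minimal, self-contained sketch. ToyDataLoader and its fixed list of observations are hypothetical and are not Monty classes; the sketch only shows that enumerate() triggers __iter__ once and then __next__ on every pass, and that the loop ends either when __next__ raises StopIteration or when the caller breaks out early (as run_episode does via model.is_done or max_steps).

```python
from typing import Iterator, List


class ToyDataLoader:
    """Hypothetical loader that replays a fixed list of observations.

    Not a Monty class; it only mimics the __iter__/__next__ protocol
    that a run_episode-style loop relies on.
    """

    def __init__(self, observations: List[dict]) -> None:
        self.observations = observations
        self.calls: List[str] = []  # records which protocol methods ran
        self._index = 0

    def __iter__(self) -> Iterator[dict]:
        self.calls.append("__iter__")
        self._index = 0  # restart from the beginning of each episode
        return self

    def __next__(self) -> dict:
        self.calls.append("__next__")
        if self._index >= len(self.observations):
            raise StopIteration  # ends the for loop naturally
        observation = self.observations[self._index]
        self._index += 1
        return observation


loader = ToyDataLoader([{"patch": 0}, {"patch": 1}, {"patch": 2}])
max_steps = 1
seen = []
for step, observation in enumerate(loader):
    seen.append((step, observation["patch"]))
    if step >= max_steps:  # same early-exit style as run_episode
        break
print(seen)          # [(0, 0), (1, 1)]
print(loader.calls)  # ['__iter__', '__next__', '__next__']
```

Because __iter__ returns self and resets the index, the same loader object can be iterated again for the next episode.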

For a high-level overview of how those work, see the __iter__ and __next__ portions of this sequence diagram in one of my RFC drafts. Also take a look at what goes on inside __getitem__ in a Dataset:

    def __getitem__(self, action: Action):
        observation = self.env.step(action)
        state = self.env.get_state()
        if self.transform is not None:
            observation = self.apply_transform(self.transform, observation, state)
        return observation, state
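As a starting point for custom data, the sketch below mirrors the shape of that __getitem__ with a toy environment. Everything in it (MoveTo, StripEnvironment, StripDataset, the feature strings) is hypothetical and not part of Monty; it also calls the transform directly instead of going through the apply_transform helper used above. The idea it illustrates is that the dataset is indexed by an action rather than by an integer: indexing steps the environment with that action, reads back the state, and optionally transforms the observation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Obs = Dict[str, Any]
State = Dict[str, Any]
Transform = Callable[[Obs, State], Obs]


@dataclass
class MoveTo:
    """Hypothetical action: jump to a position on a 1-D strip of features."""
    position: int


class StripEnvironment:
    """Hypothetical environment: an abstract 'object' as a list of features."""

    def __init__(self, features: List[str]) -> None:
        self.features = features
        self.position = 0

    def step(self, action: MoveTo) -> Obs:
        # Clamp so out-of-range actions stay on the object.
        self.position = max(0, min(action.position, len(self.features) - 1))
        return {"feature": self.features[self.position]}

    def get_state(self) -> State:
        return {"agent_position": self.position}


class StripDataset:
    """Hypothetical dataset whose __getitem__ mirrors the excerpt above."""

    def __init__(self, env: StripEnvironment,
                 transform: Optional[Transform] = None) -> None:
        self.env = env
        self.transform = transform

    def __getitem__(self, action: MoveTo) -> Tuple[Obs, State]:
        observation = self.env.step(action)
        state = self.env.get_state()
        if self.transform is not None:
            observation = self.transform(observation, state)
        return observation, state


def add_position(obs: Obs, state: State) -> Obs:
    # Example transform: copy the agent position into the observation.
    return {**obs, "position": state["agent_position"]}


dataset = StripDataset(StripEnvironment(["edge", "corner", "edge"]),
                       transform=add_position)
obs, state = dataset[MoveTo(1)]
print(obs)    # {'feature': 'corner', 'position': 1}
print(state)  # {'agent_position': 1}
```

In Monty, the DataLoader is presumably what supplies the action and indexes the dataset on each __next__; the sequence diagram mentioned above is the place to confirm that flow.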

I gotta run for now, but I hope this gives you a high-level overview.


Hi Ely,

I’m happy to hear that you went through the tutorials and now want to test with some custom data! We want to write some more detailed instructions for this, including a new tutorial and a monty-for-robotics-starter-kit, but until I get around to this, here are some notes:

I hope this helps! Let me know if you have more questions 🙂


Thanks @tslominski and @vclay for the directions! I’ll look into it!
What I have in mind is a “dataset” that models some (abstract) objects. The idea is to use this dataset to build the object models for the LMs, and afterwards use those models for a recognition task, similar to the tutorials.
So I’m thinking of having one dataset for learning the “models” and another for “testing” (as is common in deep learning). I imagine the “test dataset” as a simulation of “observations”, that is, it will contain the data to be passed to the sensors.
BUT 🙂 of course I’m not sure whether this mindset is correct or feasible… Really, I want to learn Monty’s intricacies using data I already understand…
Is this possible or does the very idea need adjustments?
Thanks!


A consequence of having a predetermined simulation of observations is that Monty cannot generate its own actions and instead has to follow predetermined ones. The simulated observations would need to stay in sync with the action file being read by Monty. See read_action_file(file: str) → List[Action], and where it is used, for how to specify predetermined actions. There is also an example test file in tests/unit/resources/fixed_test_actions.jsonl.
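The sync requirement can be sketched in plain Python. This is not Monty's reader: the JSON fields below ("name", "distance", "degrees") are a made-up schema, and read_hypothetical_action_file and replay are illustrative helpers. For the real format, look at read_action_file and fixed_test_actions.jsonl. The sketch parses one JSON object per line and pairs each action with a pre-recorded observation, refusing to run if the two sequences have different lengths.

```python
import json
from typing import Dict, Iterator, List, Tuple


def read_hypothetical_action_file(lines: List[str]) -> List[Dict]:
    """Parse one JSON object per non-empty line (JSON Lines).

    The field names are a hypothetical schema, not Monty's action format.
    """
    return [json.loads(line) for line in lines if line.strip()]


def replay(actions: List[Dict],
           observations: List[Dict]) -> Iterator[Tuple[Dict, Dict]]:
    """Yield (action, observation) pairs, refusing to run out of sync."""
    if len(actions) != len(observations):
        raise ValueError(
            f"{len(actions)} actions but {len(observations)} recorded observations"
        )
    for action, observation in zip(actions, observations):
        # A real setup would also check that this observation is what
        # the action would actually produce in the simulated world.
        yield action, observation


action_lines = [
    '{"name": "move_forward", "distance": 1}',
    '{"name": "turn_left", "degrees": 90}',
]
observations = [{"feature": "edge"}, {"feature": "corner"}]
for action, observation in replay(read_hypothetical_action_file(action_lines),
                                  observations):
    print(action["name"], "->", observation["feature"])
```

Since replay is a generator, the length check runs as soon as iteration starts, so a mismatched pair of files fails before any step is taken.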


Hi Ely,

From your rough description, I don’t see any issue with what you are planning. I think it makes sense to test with some basic artificial data to get a general understanding of the algorithm. It’s hard to give more feedback without knowing what your data actually looks like: what are the observations, and what actions take you through them?

If the series of observations is already predetermined and actions have no effect on it, you can also just write an environment similar to OmniglotEnvironment, whose step function ignores the action and simply returns the next observation. This way, you don’t have to worry about which actions Monty outputs, nor specify a predefined sequence of actions. However, if actions do affect which observations are retrieved, but you want to control the action sequence (for example, to run controlled tests of different paths through your environment), you should use the action file as @tslominski mentioned.
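Here is a minimal sketch of the first option, an environment whose step ignores the action. ReplayEnvironment is hypothetical and standalone; a real one would subclass Monty's environment base class and implement whatever methods that class requires (check the Monty source for the exact interface). Repeating the last observation once the sequence is exhausted is just one possible design choice; the experiment's max_steps or the model's is_done would normally end the episode first.

```python
from typing import Any, Dict, List, Optional


class ReplayEnvironment:
    """Hypothetical environment that replays a fixed observation sequence.

    step() deliberately ignores the action, in the spirit of the
    OmniglotEnvironment behaviour described above.
    """

    def __init__(self, observations: List[Dict[str, Any]]) -> None:
        self.observations = observations
        self.step_count = 0

    def reset(self) -> None:
        # Start the sequence over for a new episode.
        self.step_count = 0

    def step(self, action: Optional[Any] = None) -> Dict[str, Any]:
        # The action is ignored; hand out the next observation in order,
        # repeating the last one once the sequence is exhausted.
        index = min(self.step_count, len(self.observations) - 1)
        self.step_count += 1
        return self.observations[index]

    def get_state(self) -> Dict[str, Any]:
        return {"step": self.step_count}


env = ReplayEnvironment([{"f": "a"}, {"f": "b"}])
print(env.step("move_left"))   # {'f': 'a'}
print(env.step("move_right"))  # {'f': 'b'}
print(env.step(None))          # {'f': 'b'}
```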

I hope this helps!
