Combine Dataloader and Dataset class into EnvironmentInterface

Hi @tslominski and @vclay , bringing the discussion from my intro thread over here (hope that this is the right place!).

As I understand it, the __getitem__ method doesn’t do a typical data lookup as is usually expected for DataLoader to sample data from a dataset. Instead, it steps the environment forward, making it a sequential data generator, and reflecting your point Viviane about how Monty learns in an environment, not from a static dataset. My instinct with this task would be to remove the usage of torch.utils.data.Dataset & __getitem__ entirely, and instead rename that method once it’s inside of EnvironmentDataLoader to something like step (or step_and_transform?), as otherwise we have to bring along the Dataset class into the EnvironmentDataLoader, which seems to defeat the purpose of this task.

In my initial research, I only see the dataset being indexed (and therefore that __getitem__ method being called) from within EnvironmentDataLoader and its subclasses, but as there are a fair number of references across the codebase, and I’m still new in my understanding, I could be missing important details and context.

I wanted to ensure my approach is consistent with what you’d like for this work, before I get too far along the implementation path.

Thanks!

Hi @annark

Yes, that is exactly the same way I was thinking of it. The dataset and indexing convention has some historical reasons (at the very beginning of this project we thought this would make it easier to plug alternative AI solutions in and compare) but, as you point out, it really isn’t appropriate for sequential data generation by an environment. We instead want to move more towards conventions used in reinforcement learning. So environments would have obs = step(action) function calls instead of the current obs = dataset[action] setup. Since this is already the way we interact with self.env in the EnvironmentDataset class (observation = self.env.step(action)) I agree that we can just strip it down to that call (+ transform application & state retrieval) and strip away any of the __getitem__ and torch.utils.data.Dataset related code.

As far as I remember, EnvrionmentDataset calls are isolated to the EnvironmentDataLoader like you mentioned and there are no other places to watch out for. There are also no other custom classes of `EnvrionmentDatasetso focusing on lines 58-123 here and the calls to it in EnvironmentDataLoader should be sufficient.

The biggest amount of references to the DatasetClass that will need updating are actually in the documentation but that is a problem for another day :smiley:

Sounds like you have a good grasp on the problem and approach and I don’t see anything you are missing. Thanks so much for looking into this!

  • Viviane
1 Like

Wonderful, thank you Viviane! And I’m happy to tackle the documentation updates as well, as part of this work.

1 Like

With the proposed changes, I’m moving a random number generator (rng) around from the EnvironmentDataset constructor to EnvironmentDataLoader (which I’ll rename to EnvironmentInterface per Tristan’s initial outline). I wanted to ask if you foresee this causing any problems? In my experience moving number generators around can cause unforeseen downstream effects.

Hi Anna,

good catch, that is definitely important to note as it means that the benchmark results may not be exactly replicable after the refactor (the 4th bullet point Tristan mentioned here). From what I can see from a brief look at the use of self.rng in the EnvironmentDataLoader it looks like it is only used to be passed forward to specific transforms (which have needs_rng==True). The transforms that need a random number generator are those that add random noise to the observations for testing, so if you use a benchmark experiment that does not add noise (e.g., base_config_10distinctobj_dist_agent or base_77obj_surf_agent), the before and after results should still be identical.

Actually, even the experiments that add noise after sensor processing in the SM should still be the same (like randrot_noise_77obj_surf_agent) as those don’t add the noise in the transform. So thinking about it a bit more, only the randomrot_rawnoise_10distinctobj_surf_agent experiment might be affected by this.

Best wishes,

Viviane

1 Like

Thank you @vclay for this response, I’ll be sure to run the benchmarks you suggest once the changes are complete.

Hello Viviane and Tristan,

I’ve hit a bit of a roadblock, plus have two questions about how I’m approaching this task as I work through it. Here’s the draft PR (still a WIP - I’m on step two of Tristan’s outlined steps here).

Roadblock: I’m encountering segfaults when running unit tests due to the way the environment dataloader is now managing deallocation of the simulator. It seems multiple places are attempting to close the environment dataloader, so we end up with the following log output:

DEBUG:tbp.monty:close:623:Removing and closing python log handler: <FileHandler /var/folders/7p/p342_4c902x8py77cp4k46640000gn/T/tmpuyxd2nol/log.txt (NOTSET)>
[01:28:02:137462]:[Physics] BulletPhysicsManager.cpp(35)::~BulletPhysicsManager : 
Deconstructing BulletPhysicsManager
GL::Context::current(): no current contextFatal Python error: Aborted

Current thread 0x00000002028cb200 (most recent call first):
File “miniconda3/envs/tbp.monty/lib/python3.8/site-packages/habitat_sim/simulator.py”, line 152 in close
File “src/tbp/monty/simulators/habitat/simulator.py”, line 631 in close
File “src/tbp/monty/simulators/habitat/environment.py”, line 178 in close
File “src/tbp/monty/frameworks/environments/embodied_environment.py”, line 119 in _del_
File “/Users/annarussokennedy/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py”, line 633 in _callTestMethod

I’ve tried removing all but one of the close() / __exit__() / __del__() calls up that call chain (including the one in EnvironmentDataloader itself), and tried stepping through with python’s debugger (as well as lldb, given the issue is in the underlying C++ code), but am missing too much context on the interplay of the classes in the environment/simulator/embodied_environment and habitat sim files, so haven’t yet gotten to the bottom of it. I thought I should pause and check in here to verify my approach before going further.

If you have any suggestions for getting to the bottom of this, or directions for how you’d ideally like to see these resources allocated/deallocated centrally, I’d be grateful for tips.

Questions

Given the following class hierarchy, defined in embodied_data.py:

class EnvironmentDataLoader:…
class EnvironmentDataLoaderPerObject(EnvironmentDataLoader):…
class InformedEnvironmentDataLoader(EnvironmentDataLoaderPerObject):…
class OmniglotDataLoader(EnvironmentDataLoaderPerObject):…
class SaccadeOnImageDataLoader(EnvironmentDataLoaderPerObject):…
class SaccadeOnImageFromStreamDataLoader(SaccadeOnImageDataLoader):…

I’ve refactored the child classes to call their super class __init__ methods, since the EnvironmentDataset that was previously required as an initialization param for those child classes is now integrated into EnvironmentDataLoader, the super class. However there are a few parts that aren’t straightforward with the refactoring.

Question 1: OmniglotDataLoader’s super class EnvironmentDataLoaderPerObject expects object_names and object_init_sampler as init params, and calls create_semantic_mapping based on those. OmniglotDataLoader needs the environment to be initialized in order to define its object_names list -

object_names = [
    str(self.env.alphabet_names[alphabets[i]])
    + “_”
    + str(self.characters[i])
    for i in range(n_objects)
]

but the environment isn’t initiated until its super class runs. I handled this by letting object_init_sampler and object_names be optional params to EnvironmentDataLoaderPerObject, so that the child class can decide what to do with them. This also means create_semantic_mapping is not called if object_names is not provided. Does this approach align with how you’d like this handled?

Question 2: In order to update SaccadeOnImageFromStreamDataLoader, child class of SaccadeOnImageDataLoader, with these changes, I had to do something about the scenes and versions params that SaccadeOnImageDataLoader expects on __init__. I chose to make them optional on the super class, but again wanted to check whether this approach of using optional parameters in this way fits the conventions of this codebase?

If it’s easier to jump on a call to discuss any of the above, I’m happy to do that.

Thank you!

1 Like

Hi @annark

Roadblock: I’m encountering segfaults when running unit tests due to the way the environment dataloader is now managing deallocation of the simulator.

I did some work a while back to turn MontyExperiment into a context manager, but I don’t think that would affect this. If you have a reproducible test case, I can take a look at it and see if I can figure anything out. If not, what I would try to do is set a breakpoint in the close method, probably the one in HabitatSimulator, and see where it’s getting called from because it definitely looks like it’s getting called twice.

The stacktrace you shared looks like it’s part of the teardown of the unit test. The timing of __del__ calls is very much up to the interpreter, assuming it gets called at all before the program ends. It seems like the __del__ call to close is the second one that’s causing the problem, so figuring out where it’s getting called the first time, and why the close implementation isn’t guarding against a second call would probably lead to a solution.

Looking at the implementation of the close methods, they should be idempotent, but maybe there’s some odd bug that I’m not seeing that’s leading to calling close on the simulator twice.

As for the rest of your questions, @vclay or @tslominski should be able to answer better than I can, as I’m not too familiar with that part of the codebase yet.

Hi @jshoemaker, thank you for your response and suggestions!

My investigation did include setting breakpoints at the different close/__del__/exit methods, including the one in HabitatSim (simulator.py). It’s still unclear why this is happening.

This was my observation as well.

This test reliably reproduces the issue:

pytest -s tests/unit/profile_experiment_mixin_test.py::ProfileExperimentMixinTest::test_run_train_epoch_is_profiled -n 0

This is the backtrace from a breakpoint() set at
tbp.monty/src/tbp/monty/simulators/habitat/simulator.py(631)close()

Note that the test has completed when the error occurs, so it seems it’s the clean up logic from the test handler that’s calling with the missing context. My guess is that the segfault is due to a double-free error or an invalid memory access in the underlying C++ libraries, triggered by an incorrect resource management order. The fact that it happens outside the python scope makes it tricky to debug (and why I tried debugging with lldb):

> /tbp/tbp.monty/src/tbp/monty/simulators/habitat/simulator.py(631)close()
-> if sim is not None:

(Pdb) bt
  /miniconda3/envs/tbp.monty/bin/pytest(8)<module>()
-> sys.exit(console_main())
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py(187)console_main()
-> code = main()
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py(164)main()
-> ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()
-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()
-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()
-> res = hook_impl.function(*args)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(315)pytest_cmdline_main()
-> return wrap_session(config, _main)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(268)wrap_session()
-> session.exitstatus = doit(config, session) or 0
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(322)_main()
-> config.hook.pytest_runtestloop(session=session)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()
-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()
-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()
-> res = hook_impl.function(*args)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(347)pytest_runtestloop()
-> item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()
-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()
-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()
-> res = hook_impl.function(*args)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(111)pytest_runtest_protocol()
-> runtestprotocol(item, nextitem=nextitem)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(130)runtestprotocol()
-> reports.append(call_and_report(item, "call", log))
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(219)call_and_report()
-> call = call_runtest_hook(item, when, **kwds)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(258)call_runtest_hook()
-> return CallInfo.from_call(
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(338)from_call()
-> result: Optional[TResult] = func()
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(259)<lambda>()
-> lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()
-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()
-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()
-> res = hook_impl.function(*args)
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(166)pytest_runtest_call()
-> item.runtest()
  /miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/unittest.py(327)runtest()
-> self._testcase(result=self)  # type: ignore[arg-type]
  /miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(736)__call__()
-> return self.run(*args, **kwds)
  /miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(676)run()
-> self._callTestMethod(testMethod)
  /miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(633)_callTestMethod()
-> method()
  /tbp/tbp.monty/src/tbp/monty/frameworks/environments/embodied_environment.py(119)__del__()
-> self.close()
  /tbp/tbp.monty/src/tbp/monty/simulators/habitat/environment.py(178)close()
-> _env.close()
> /tbp/tbp.monty/src/tbp/monty/simulators/habitat/simulator.py(631)close()
-> if sim is not None:
(Pdb) c

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB continue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[14:34:15:270904]:[Physics] BulletPhysicsManager.cpp(35)::~BulletPhysicsManager : Deconstructing BulletPhysicsManager
GL::Context::current(): no current context
Fatal Python error: Aborted

Current thread 0x000000020361c200 (most recent call first):
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/habitat_sim/simulator.py", line 152 in close
  File "/tbp/tbp.monty/src/tbp/monty/simulators/habitat/simulator.py", line 632 in close
  File "/tbp/tbp.monty/src/tbp/monty/simulators/habitat/environment.py", line 178 in close
  File "/tbp/tbp.monty/src/tbp/monty/frameworks/environments/embodied_environment.py", line 119 in __del__
  File "/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py", line 633 in _callTestMethod
  File "/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py", line 676 in run
  File "/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py", line 736 in __call__
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/unittest.py", line 327 in runtest
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 166 in pytest_runtest_call
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 259 in <lambda>
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 338 in from_call
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 258 in call_runtest_hook
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 219 in call_and_report
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 130 in runtestprotocol
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py", line 322 in _main
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/miniconda3/envs/tbp.monty/bin/pytest", line 8 in <module>

That’s interesting. I’m not getting a crash when I run that test. It passes for me. It does look like it calls close twice, first as a result of the MontyExperiment.__exit__ method, and the second as part of the __del__ call on the EmbodiedEnvironment.

When I stop in the EmbodiedEnvironment.__del__ call, self._env is set to None, like I expect it to be. Are you seeing the same thing when you run it?

What happens if you remove the call to close in the __del__ method implementation?

That is interesting! Worth noting that before I posted the original question here, I confirmed that all tests pass on my machine on my fork’s main branch. The errors only happen on my branch (change-dataloader-dataset-naming-setup). Do you see any segfaults when you run the full test suite on my branch? I still see it happening, even when I remove the tests in profile_experiment_mixin_test.py.

Yes, I’m also seeing self._env is set to None when I stop in the EmbodiedEnvironment.__del__ call on my branch. I put a breakpoint in EmbodiedEnvironment.__del__ on my fork’s main, and there I only see it called once.

I still hit the segfault, but interestingly it seems to happen at a slightly later point in execution, as now the test does pass:


============================================================= 1 passed in 70.61s (0:01:10) =============================================================

[16:21:48:290880]:[Physics] BulletPhysicsManager.cpp(35)::~BulletPhysicsManager : Deconstructing BulletPhysicsManager

GL::Context::current(): no current context

Backtrace from the first time the breakpoint is hit in my branch

-> self.close()

(Pdb) p self._env

None

(Pdb) p self.__class__

<class 'tbp.monty.simulators.habitat.environment.HabitatEnvironment'>

(Pdb) p self.__class__.__bases__

(<class 'tbp.monty.frameworks.environments.embodied_environment.EmbodiedEnvironment'>,)

(Pdb) bt

/miniconda3/envs/tbp.monty/bin/pytest(8)<module>()

-> sys.exit(console_main())

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py(187)console_main()

-> code = main()

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py(164)main()

-> ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(315)pytest_cmdline_main()

-> return wrap_session(config, _main)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(268)wrap_session()

-> session.exitstatus = doit(config, session) or 0

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(322)_main()

-> config.hook.pytest_runtestloop(session=session)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(347)pytest_runtestloop()

-> item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(111)pytest_runtest_protocol()

-> runtestprotocol(item, nextitem=nextitem)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(130)runtestprotocol()

-> reports.append(call_and_report(item, "call", log))

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(219)call_and_report()

-> call = call_runtest_hook(item, when, **kwds)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(258)call_runtest_hook()

-> return CallInfo.from_call(

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(338)from_call()

-> result: Optional[TResult] = func()

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(259)<lambda>()

-> lambda: ihook(item=item, **kwds), when=when, reraise=reraise

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(166)pytest_runtest_call()

-> item.runtest()

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/unittest.py(327)runtest()

-> self._testcase(result=self) # type: ignore[arg-type]

/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(736)__call__()

-> return self.run(*args, **kwds)

/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(676)run()

-> self._callTestMethod(testMethod)

/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(633)_callTestMethod()

-> method()

/tbp/tbp.monty/tests/unit/profile_experiment_mixin_test.py(159)test_run_train_epoch_is_profiled()

-> exp.run_epoch()

/tbp/tbp.monty/src/tbp/monty/frameworks/experiments/monty_experiment.py(644)__exit__()

-> self.close()

/tbp/tbp.monty/src/tbp/monty/frameworks/experiments/profile.py(135)close()

-> super().close()

/tbp/tbp.monty/src/tbp/monty/frameworks/experiments/monty_experiment.py(615)close()

-> self.dataloader.close()

> /tbp/tbp.monty/src/tbp/monty/frameworks/environments/embodied_environment.py(120)__del__()

-> self.close()

Backtrace from the second time the breakpoint is hit

-> self.close()

(Pdb) bt

/miniconda3/envs/tbp.monty/bin/pytest(8)<module>()

-> sys.exit(console_main())

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py(187)console_main()

-> code = main()

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/config/__init__.py(164)main()

-> ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(315)pytest_cmdline_main()

-> return wrap_session(config, _main)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(268)wrap_session()

-> session.exitstatus = doit(config, session) or 0

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(322)_main()

-> config.hook.pytest_runtestloop(session=session)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/main.py(347)pytest_runtestloop()

-> item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(111)pytest_runtest_protocol()

-> runtestprotocol(item, nextitem=nextitem)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(130)runtestprotocol()

-> reports.append(call_and_report(item, "call", log))

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(219)call_and_report()

-> call = call_runtest_hook(item, when, **kwds)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(258)call_runtest_hook()

-> return CallInfo.from_call(

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(338)from_call()

-> result: Optional[TResult] = func()

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(259)<lambda>()

-> lambda: ihook(item=item, **kwds), when=when, reraise=reraise

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_hooks.py(513)__call__()

-> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_manager.py(120)_hookexec()

-> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/pluggy/_callers.py(103)_multicall()

-> res = hook_impl.function(*args)

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/runner.py(166)pytest_runtest_call()

-> item.runtest()

/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/_pytest/unittest.py(327)runtest()

-> self._testcase(result=self) # type: ignore[arg-type]

/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(736)__call__()

-> return self.run(*args, **kwds)

/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(676)run()

-> self._callTestMethod(testMethod)

/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py(633)_callTestMethod()

-> method()

> /tbp/tbp.monty/src/tbp/monty/frameworks/environments/embodied_environment.py(120)__del__()

-> self.close()

Then when I continue execution, the error happens

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB continue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[16:09:02:453709]:[Physics] BulletPhysicsManager.cpp(35)::~BulletPhysicsManager : Deconstructing BulletPhysicsManager

GL::Context::current(): no current context

Fatal Python error: Aborted

Current thread 0x0000000202a5d200 (most recent call first):

File "/miniconda3/envs/tbp.monty/lib/python3.8/site-packages/habitat_sim/simulator.py", line 152 in close

File "/tbp/tbp.monty/src/tbp/monty/simulators/habitat/simulator.py", line 631 in close

File "/tbp/tbp.monty/src/tbp/monty/simulators/habitat/environment.py", line 178 in close

File "/tbp/tbp.monty/src/tbp/monty/frameworks/environments/embodied_environment.py", line 120 in __del__

File "/miniconda3/envs/tbp.monty/lib/python3.8/unittest/case.py", line 633 in _callTestMethod

Ah, I didn’t notice you were working on a branch. On your branch, I can reproduce the issue. Interestingly, it looks like there are two separate instances of HabitatSim getting close called on them. When I step through that test with the debugger, I get the first call to close from the context manager __exit__ call chain, and then a call to __del__ on the EmbodiedEnvironment, where _env is already None, so it becomes a no-op. Then I see a second call to __del__ on a different instance of EmbodiedEnvironment, which has a separate HabitatSim instance, and calling close on that is what’s causing the second close that crashes the interpreter.

I think that’s where the problem is coming from, though I haven’t looked at your changes to figure out why we might be getting more than one. Ideally, the HabitatSim class would be a singleton to prevent this sort of problem, but it hasn’t been an issue before.

I set two separate break points to find this, one on HabitatSim.close and the other on EmbodiedEnvironment.__del__.

I hope that helps.

1 Like

Thank you, that did help! I put a breakpoint in the HabitatSim __init__ method to trace where the duplicate creation stems from, and I see what’s going on now and why: before my changes, the code created a single EnvironmentDataset instance, which handled creating the HabitatSim env variable. Then the code created two EnvironmentDataloader instances: one for the eval dataloader, one for the train dataloader, passing the single EnvironmentDataset instance holding the single HabitatSim instance to the constructor.

main:

    def create_data_loader(self, dataloader_class, dataloader_args):
       dataloader = dataloader_class(
            **dataloader_args,
            dataset=self.dataset,
            motor_system=self.model.motor_system,
            rng=self.rng,
        )

my branch:

    def load_dataloaders(self, config):
        …
           train_dataloader_args = dict(
                env_init_func=dataset_args["env_init_func"],
                env_init_args=dataset_args["env_init_args"],
                transform=dataset_args["transform"],
                **config["train_dataloader_args"],
            )

            self.train_dataloader = self.create_data_loader(
                train_dataloader_class, train_dataloader_args
            )
    …
           eval_dataloader_args = dict(
                env_init_func=dataset_args["env_init_func"],
                env_init_args=dataset_args["env_init_args"],
                transform=dataset_args["transform"],
                **config["eval_dataloader_args"],
            )

            self.eval_dataloader = self.create_data_loader(
                eval_dataloader_class, eval_dataloader_args
            )

…
def create_data_loader(self, dataloader_class, dataloader_args):
       …
       dataloader = dataloader_class(
            **dataloader_args, # ← NOTE: no dataset arg
            motor_system=self.model.motor_system,
            rng=self.rng,
        )
    …

The problem in my code stems from the fact that each dataloader holds a pointer to an env_init_func to initialize the env, so two get initialized.

Agreed! I’ll look into reworking my changes to support that approach or something similar.

Thanks again for your assistance.

That worked! The test now passes, no more segfault. There are still failing tests in the wider test suite that I haven’t yet gotten to addressing, but this is great progress.

Does this approach to implementing the singleton align with the codebase’s conventions? I didn’t see existing methods.

Thanks,
Anna

Great debugging y’all :grinning_face:.

Before committing to a singleton… we fully control how the configuration and all the components are created.

Is something preventing creation of only one HabitatSim without using a singleton pattern in the first place?

1 Like

I misspoke when I suggested that HabitatSim should be a singleton. I was mixing up the classes, since that’s our wrapper around the Habitat library simulator.

It’s really the underlying simulator from the Habitat library that needs to be changed to better handle resource management internally. That library has been a source of pain because it’s not a well behaved Python extension. It crashes the interpreter on a regular basis, sometimes simply because the library is misconfigured or missing a dependency and that’s not the way an extension should be written.

I don’t know that we need to go so far as to make our HabitatSim class an actual singleton though. That tends to cause more problems than it solves because it breaks expectations.

1 Like

No, I think I can propose an approach that doesn’t use it.

Thanks to you both for your suggestions and added context.

2 Likes

Hello again,

I considered a few different approaches to creating a single HabitatSim (without resorting to singleton pattern), and have landed on the following as the simplest approach:

  1. Move environment instantiation into MontyExperiment.rb:
class MontyExperiment:
    # ...
    def init_env(self, env_init_func, env_init_args):
        env = env_init_func(**env_init_args)
        assert isinstance(env, EmbodiedEnvironment)

        return env
  1. Pass the environment instance into the EnvironmentDataloader (soon to be renamed to EnvironmentInterface) as an init param. That way we ensure only a single instance of HabitatSim (or any environment) when there are multiple EnvironmentDataloader classes being created (eg both eval and train stages).

A potential con with this approach is the introduction of the EmbodiedEnvironment concept into the MontyExperiment class, and I wondered if that raised any concerns.

The other approach I considered was to maintain a separate class for the initialization of the environment and for data generation steps, which would be passed into EnvironmentDataloader. However that was essentially just renaming EnvironmentDataset to something like EnvironmentHandler, and passing that into EnvironmentDataloader (which will be renamed to EnvironmentInterface), resulting in less clear a separation of concerns than the first approach.

Here’s the work to do that first approach. As before, this is still a WIP, and there are more test failures I need to address, but I wanted to ensure alignment with the team before going any further.

Thanks!
Anna

Hi @annark,

I think that the environment concept belongs in the experiment, so it does not raise concerns for me. In a future setup where Monty system, Experiment, and Environment are separated (once we address the undesirable effect (106 Monty system, Experiment, and Environment are all coupled together.), Experiment is the one coordinating between Monty system and the Environment and needs to know about both.

1 Like

Great! I’ll continue with this approach. Thanks for the feedback.

1 Like