Hi all,
I’ve been looking into the test errors reported in issue #117 because I ran into them while trying to work on some other test-related things that @tslominski suggested would be good places to jump in and help with.
It’s been interesting, to say the least, and I wanted to take a moment to share my findings and discuss some possible improvements, since I don’t want to just drop a pull request on y’all without talking about it first, being an outsider and all.
The issue
To start with, what the reporter of that issue and I are both seeing (them on Ubuntu, me on NixOS, both Linux distributions) is that when running the full test suite with pytest, two tests fail, both in `graph_learning_test.py`:
- The `test_can_run_disp_experiment` test crashes the test worker. @tslominski says this is because HabitatSim is most likely crashing the interpreter.
- The `test_5lm_feature_experiment` test fails with a KeyError. @vclay mentioned in the issue that this has something to do with a learning module not seeing a vote it’s expecting to see.
Now, here’s where the story gets a little weird.
- If I try running just that test file using `pytest tests/unit/graph_learning_test.py`, the crash disappears and we’re just left with the KeyError.
- If I mark the test with the KeyError using `pytest.mark.skip` to skip it (sketched below), then the whole suite passes without any crashes.
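For reference, this is roughly how I skipped it locally. It’s only a sketch: the test class name and the reason string are mine, and the real test body lives in `tests/unit/graph_learning_test.py`.

```python
import unittest

import pytest


class GraphLearningTest(unittest.TestCase):  # hypothetical class name
    # Skipping the KeyError test to see whether the rest of the suite passes.
    @pytest.mark.skip(reason="KeyError during voting, see issue #117")
    def test_5lm_feature_experiment(self):
        ...
```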
I’m still not entirely sure what is interacting with what here, but I did find that if I make some of the changes I suggested in the issue (which I don’t think are the right changes really) to make the KeyError go away, then the whole suite passes.
Correctly closing open resources
My suspicion is that when the KeyError is thrown, we never call `self.exp.close()`, which in turn closes the dataset and the HabitatSim environment underneath it, so something gets left behind that interferes with other tests. I don’t know enough about how that works to say what that something could be, or whether there’s a way to isolate things more.
But, one thing that jumps out at me looking at how the tests are written is this pattern:
- create an experiment
- call `setup_experiment`, passing in the config, which loads the dataset and environment as well
- do things with the experiment
- call `close` on the experiment
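In test code, that pattern looks roughly like this. It’s a sketch, not the literal tests: I’m assuming the experiment is constructed without arguments, and the import of `MontyExperiment` and the config construction are omitted.

```python
# Sketch of the pattern the unit tests follow today.
def run_like_the_current_tests(config):
    exp = MontyExperiment()       # 1. create an experiment
    exp.setup_experiment(config)  # 2. loads the dataset and HabitatSim environment
    exp.run_episode()             # 3. do things with the experiment
    exp.close()                   # 4. close the dataset/environment (and loggers)
    # If step 3 raises (like the KeyError), step 4 never runs and the
    # environment is left open for the rest of the test session.
```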
To me this suggests turning `MontyExperiment` into a context manager, though I’m worried that methods like `setup_experiment` and `close` are doing a lot of things instead of one or two things. For example, `setup_experiment` takes the config but also loads the dataset, and `close` closes the dataset but also closes the loggers for some reason (which also seems weird to me coming from working on Django apps, but I guess that’s because we didn’t use logging handlers there, or Django handled that for us).
Initial proposed changes
In discussions with @tslominski I’ve proposed adding simple `__enter__` and `__exit__` methods to `MontyExperiment` to turn it into a basic context manager that, at the very least, calls `close` on exit, and then updating the tests to use `with` statements for the experiments.
```python
def __enter__(self):
    return self

def __exit__(self, exc_type, exc_value, traceback):
    self.close()
    return False  # don't silence exceptions
```
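In a test, this basic version would be used like this (a sketch; `config` stands in for however the existing test builds its config):

```python
# __enter__ just returns self, so setup still happens explicitly inside the
# block, but close() is guaranteed to run via __exit__ even if the body raises.
with MontyExperiment() as exp:
    exp.setup_experiment(config)
    exp.run_episode()
```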
Ideal proposed changes
Ideally, I think `MontyExperiment` could be improved in a couple of other ways while making it into a context manager:
- move the config setup code from `setup_experiment` into the `__init__` method so it happens on construction
- move the dataset loading code into `__enter__`, or into a separate method that `__enter__` can call, so we’re initializing things there
- move the dataset closing logic into `__exit__`, or figure out a better way to close the loggers outside of `close`
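Roughly, that would look like the sketch below. `load_dataset` and `close_dataset` are placeholder names I made up, not existing methods, and the real `__init__` and `close` obviously do more than this.

```python
class MontyExperiment:
    def __init__(self, config):
        # Config handling only: the part of setup_experiment that deals with config.
        self.config = config

    def __enter__(self):
        # Acquire resources: load the dataset / HabitatSim environment here
        # (directly or via a method called from here). Placeholder name.
        self.load_dataset()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release resources: close the dataset/environment; logger shutdown
        # would live somewhere else. Placeholder name.
        self.close_dataset()
        return False  # don't silence exceptions
```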
Then, using it would change things to look like this:
```python
# could also be a subclass of MontyExperiment, same concept
with MontyExperiment(config) as exp:
    # do things with `exp`
    exp.run_episode()
# `exp.close` logic automatically called when the scope exits
```
That hides a lot of the machinery: `__enter__` gets called on the experiment when the `with` block is entered, and `__exit__` is called on exit (even in exceptional cases). That would prevent a test that throws an exception (even an expected one) from leaving unclosed resources behind while unwinding the stack.
Unanswered questions
But that leads me to some of the unit tests that reach inside the experiment object and close the dataset directly, e.g. `self.exp.dataset.close()`.
- Are we doing that because we want to close the dataset but not close the loggers? Is it an older pattern that’s not used anymore?
- We’re calling `close` inside of the `MontyExperiment` `train` and `evaluate` methods. If we move to a context manager model, would removing those calls break something I’m not aware of? That seems to make for an inconsistent interface, i.e. sometimes you have to call `exp.close()` and sometimes you don’t.
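To illustrate the inconsistency I mean (a sketch, not actual code paths from the repo):

```python
# As I read it today: train() calls close() itself, so no explicit close here...
exp.train()

# ...but code that only runs episodes has to remember to close explicitly.
exp.run_episode()
exp.close()
```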
First understand
I’m a big believer in Chesterton’s fence…
Don’t take down a fence until you know why it was put up in the first place.
…so I don’t want to go changing things without understanding why they are the way they are currently.
I’m gonna go ahead and see about making the simple changes I mentioned above, leaving the more complex refactor until I have answers to some of these questions.
That was a lot of text, and maybe there’s a better place to ask these questions or propose these changes (maybe an RFC?), so apologies if this isn’t the right venue.
I’d love to know what people who work on this more than me think and whether they have any answers to the questions I posed above.
Cheers,
Jeremy