Hi all,
I’ve been looking into the test errors reported in issue #117 because I ran into them while working on some other test-related things that @tslominski suggested would be good places to jump in and help with.
It’s been interesting, to say the least, and I wanted to take a moment to share my findings and discuss some possible improvements, since I don’t want to just drop a pull request on y’all without talking about it first, being an outsider and all.
The issue
To start with, what the reporter of that issue and I are seeing (them on Ubuntu and me on NixOS, both Linux distributions) is that when running the full test suite with `pytest`, two tests fail, both in `graph_learning_test.py`:
- The `test_can_run_disp_experiment` test crashes the test worker. @tslominski says this is because HabitatSim is most likely crashing the interpreter.
- The `test_5lm_feature_experiment` test fails with a `KeyError`. @vclay mentioned in the issue that this has something to do with a learning module not seeing a vote it’s expecting to see.
Now, here’s where the story gets a little weird.
- If I run just that test file using `pytest tests/unit/graph_learning_test.py`, the crash disappears and we’re left with only the `KeyError`.
- If I mark the `KeyError` test with `pytest.mark.skip` to skip it, then the whole suite passes without any crashes.
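For reference, the skip is just a temporary decorator I put on the failing test locally, something like this (the reason string is mine, and I’m assuming the test lives in a unittest-style test class, hence the self argument):

import pytest

@pytest.mark.skip(reason="KeyError from a missing vote, see issue #117")
def test_5lm_feature_experiment(self):
    ...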
I’m still not entirely sure what is interacting with what here, but I did find that if I make some of the changes I suggested in the issue to make the `KeyError` go away (which I don’t think are the right changes, really), then the whole suite passes.
Correctly closing open resources
My suspicion is that when the `KeyError` is thrown, we never call `self.exp.close()`, which is what closes the dataset and the HabitatSim environment underneath it, so the failing test leaves behind something that interferes with other tests. I don’t know enough about how that works to say what that something is, or whether there’s anything we could do to isolate things more.
But one thing that jumps out at me looking at how the tests are written is this pattern (sketched below):
- create an experiment
- call `setup_experiment`, passing in the config, which also loads the dataset and environment
- do things with the experiment
- call `close` on the experiment
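In other words, something roughly like this, where an exception anywhere in the middle means `close` never runs (only `MontyExperiment`, `setup_experiment`, `run_episode`, and `close` are real names here; the rest are stand-ins for whatever a given test actually does):

# rough sketch of the current test pattern
exp = MontyExperiment()        # or however the test constructs its experiment
exp.setup_experiment(config)   # also loads the dataset and environment

exp.run_episode()              # if this raises (e.g. the KeyError)...
assert_something_about(exp)    # ...or an assertion fails...

exp.close()                    # ...we never get here, so HabitatSim is left open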
To me this suggests turning `MontyExperiment` into a context manager, though I’m worried that methods like `setup_experiment` and `close` are doing a lot of things instead of one or two things. For example, `setup_experiment` takes the config but also loads the dataset, and `close` closes the dataset but also closes the loggers for some reason (which also seems weird to me coming from working on Django apps, but I guess that’s because we didn’t use logging handlers there, or Django handled that for us).
Initial proposed changes
In discussions with @tslominski I’ve proposed just adding basic `__enter__` and `__exit__` methods to `MontyExperiment`, turning it into a basic context manager that at the very least calls `close` on exit, and then updating the tests to use `with` statements for the experiments:
def __enter__(self):
    return self

def __exit__(self, exc_type, exc_value, traceback):
    self.close()
    return False  # don't silence exceptions
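In the tests, that would look roughly like this (a sketch only; however a given test currently constructs and sets up its experiment stays the same, only the close call moves into the with):

# sketch of the minimal version: construction and setup unchanged
exp = MontyExperiment()            # or whatever subclass the test uses
exp.setup_experiment(config)
with exp:                          # __enter__ just returns exp
    exp.run_episode()              # whatever the test already does
# __exit__ has called exp.close() by here, even if the body raised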
Ideal proposed changes
Ideally, I think `MontyExperiment` could be improved in a couple of other ways while making it into a context manager:
- move the config setup code from `setup_experiment` into an `__init__` method so it happens on construction
- move the dataset loading code into `__enter__`, or a separate method that can be called by `__enter__`, so we’re initializing things there
- move the dataset closing logic into `__exit__`, or figure out a better way to close the loggers outside of `close`
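Put together, that shape would look something like this (a sketch only; `load_dataset_from` is a placeholder, and the real split would follow whatever `setup_experiment` and `close` actually do today):

class MontyExperiment:
    def __init__(self, config):
        # config parsing/validation only; nothing is opened yet
        self.config = config
        self.dataset = None

    def __enter__(self):
        # the dataset/environment-loading half of today's setup_experiment
        # would move here (placeholder call, not an existing function)
        self.dataset = load_dataset_from(self.config)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # the dataset-closing half of today's close would move here;
        # the logger shutdown would still need its own home
        if self.dataset is not None:
            self.dataset.close()
        return False  # don't silence exceptions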
Then, to use it, the tests would change to look like this:
# could also be a subclass of MontyExperiment, same concept
with MontyExperiment(config) as exp:
    # do things with `exp`
    exp.run_episode()
# `exp.close` logic is automatically called when the block exits
That’s hiding a lot of things: `__enter__` gets called on the experiment when the `with` block is entered, and `__exit__` is called when it exits (even in exceptional cases). That would prevent a test from throwing an exception (even an expected one) and leaving unclosed resources behind while unwinding the stack.
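Put another way, the `with` block above is roughly equivalent to the following, and the `finally` is exactly the guarantee the current tests are missing:

# roughly what the `with` statement does for us (simplified; the real
# protocol also passes any exception info into __exit__)
exp = MontyExperiment(config).__enter__()
try:
    exp.run_episode()
finally:
    exp.__exit__(None, None, None)   # i.e. exp.close(), even if the body raised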
Unanswered questions
But that leads me to things like the unit tests that reach inside the experiment object and close the dataset directly, e.g. `self.exp.dataset.close()`.
- Are we doing that because we want to close the dataset but not close the loggers? Is it an older pattern that’s not used anymore?
- We’re calling `close` inside of the `MontyExperiment` `train` and `evaluate` methods. If we move to a context-manager model, would removing those calls break something I’m not aware of? That seems to make for an inconsistent interface, i.e. sometimes you have to call `exp.close()` and sometimes you don’t.
First understand
I’m a big believer in Chesterton’s fence…
Don’t take down a fence until you know why it was put up in the first place.
…so I don’t want to go changing things without understanding why they are the way they are currently.
I’m gonna go ahead and see about making the simple changes I mentioned above, leaving the more complex refactor until I have answers to some of these questions.
That was a lot of text, and maybe there’s a better place to ask these questions or propose these changes (maybe an RFC?), so apologies if this isn’t the right venue.
I’d love to know what people who work on this more than me think and whether they have any answers to the questions I posed above.
Cheers,
Jeremy