Investigation into Linux test failures and possible improvements

Hi all,

I’ve been looking into the test errors reported in issue #117 because I ran into them while trying to work on some other test-related things that @tslominski suggested would be good places to jump in and help with.

It’s been interesting, to say the least, and I wanted to take a moment to share my findings and discuss some possible improvements, since I don’t want to just drop a pull request on y’all without talking about it first, being an outsider and all.

The issue

To start with, what the reporter of that issue (on Ubuntu) and I (on NixOS, both Linux distributions) are seeing is that when running the full test suite with pytest, two tests fail, both in graph_learning_test.py:

  1. The test_can_run_disp_experiment test crashes the test worker. @tslominski says this is because HabitatSim is most likely crashing the interpreter.
  2. The test_5lm_feature_experiment test fails with a KeyError. @vclay mentioned in the issue this has something to do with a learning module not seeing a vote it’s expecting to see.

Now, here’s where the story gets a little weird.

  • If I try running just that test file using pytest tests/unit/graph_learning_test.py, the crash disappears and we’re just left with the KeyError.
  • If I mark the test with the KeyError using pytest.mark.skip, then the whole suite passes without any crashes.

I’m still not entirely sure what is interacting with what here, but I did find that if I make some of the changes I suggested in the issue (which I don’t think are the right changes really) to make the KeyError go away, then the whole suite passes.

Correctly closing open resources

My suspicion is that when the KeyError is thrown, we never call self.exp.close(), which in turn closes the dataset and the HabitatSim environment underneath it, so something gets left behind that interferes with other tests. I don’t know enough about how that works to say what that something is, or whether there’s a way to isolate things more.

But one thing that jumps out at me, looking at how the tests are written, is this pattern:

  • create an experiment
  • call setup_experiment passing in the config, which loads the dataset and environment as well
  • do things with the experiment
  • call close on the experiment

To me this suggests turning MontyExperiment into a context manager, though I’m a little worried that methods like setup_experiment and close each do a lot of things instead of one or two: setup_experiment takes the config but also loads the dataset, and close closes the dataset but also closes the loggers for some reason (which also seems weird to me coming from working on Django apps, but I guess it’s because we didn’t use logging handlers there, or Django handled that for us).
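For reference, here’s a rough sketch of what that pattern looks like in a test (imports and fixtures omitted, and the call signatures are approximate; config stands in for whatever the real test passes in):

exp = MontyExperiment()
exp.setup_experiment(config)   # also loads the dataset and HabitatSim environment
exp.run_episode()              # do things with the experiment
exp.close()                    # skipped entirely if anything above raises

If anything in the middle raises, even an expected test failure, close() never runs, which is exactly the kind of leftover state I suspect is interfering with the other tests.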

Initial proposed changes

In discussions with @tslominski I’ve proposed just adding basic __enter__ and __exit__ methods to MontyExperiment in order to turn it into a basic context manager that at the very least calls close in __exit__, and then updating the tests to use with statements for the experiments.

def __enter__(self):
    return self

def __exit__(self, exc_type, exc_value, traceback):
    self.close()
    return False  # don't silence exceptions
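
With just that in place, a test could wrap its experiment in a with statement and close would run even when the test body raises (a sketch, assuming setup still happens through setup_experiment as it does today):

exp = MontyExperiment()
exp.setup_experiment(config)
with exp:
    exp.run_episode()
# exp.close() has run at this point, whether or not run_episode() raised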

Ideal proposed changes

Ideally, I think MontyExperiment could be improved in a couple of other ways while making it into a context manager (roughly sketched after the list below):

  • move the config setup code from setup_experiment into an __init__ method so it happens on construction
  • move the dataset loading code into __enter__ or a separate method that can be called by __enter__ so we’re initializing things there.
  • move the dataset closing logic into __exit__ or figure out a better way to close the loggers outside of close.
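
Put together, the skeleton could look something like this (just a sketch; the real setup_experiment and close do more than I’m showing here, and load_dataset/close_dataset are made-up names for wherever that logic ends up living):

class MontyExperiment:
    def __init__(self, config):
        self.config = config    # config handling moves here from setup_experiment

    def __enter__(self):
        self.load_dataset()     # dataset/environment loading moves here (made-up name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close_dataset()    # dataset/environment closing moves here (made-up name)
        return False            # don't silence exceptions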

Then, using it would look like this:

# could also be a subclass of MontyExperiment, same concept
with MontyExperiment(config) as exp:
    # do things with `exp`
    exp.run_episode()
# `exp.close` logic automatically called when the with block exits

That’s hiding a lot of machinery: __enter__ gets called on exp when the with block is entered, and __exit__ is called on exit (even in exceptional cases). That would prevent a test that throws an exception (even an expected one) from leaving unclosed resources behind while unwinding the stack.
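
Put another way, the with form is roughly shorthand for a try/finally that every test would otherwise have to write out by hand:

exp = MontyExperiment(config)
try:
    exp.run_episode()
finally:
    exp.close()  # runs whether run_episode() returned normally or raised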

Unanswered questions

But that leads me to some of the unit tests, which reach inside the experiment object and close the dataset directly, e.g. self.exp.dataset.close().

  • Are we doing that because we want to close the dataset but not close the loggers? Is it an older pattern that’s not used anymore?
  • We’re calling close inside the MontyExperiment train and evaluate methods. If we move to a context manager model, would removing those calls break something I’m not aware of? As it stands, it seems to make for an inconsistent interface, i.e. sometimes you have to call exp.close() and sometimes you don’t.

First understand

I’m a big believer in Chesterton’s fence…

Don’t take down a fence until you know why it was put up in the first place.

…so I don’t want to go changing things without understanding why they are the way they are currently.

I’m gonna go ahead and see about making the simple changes I mentioned above, leaving the more complex refactor until I have answers to some of these questions.

That was a lot of text, and maybe there’s a better place to ask these questions or propose these changes (maybe an RFC?), so apologies if this isn’t the right venue.

I’d love to know what people who work on this more than me think and whether they have any answers to the questions I posed above.

Cheers,
Jeremy


Hi @codingkoi

This is great to see! Thank you for taking such an in-depth look at these issues and our code, and for proposing those nice changes. I just reviewed your PR and think it’s a nice idea to turn MontyExperiment into a context manager.

Regarding your unanswered questions:

Are we doing that because we want to close the dataset but not close the loggers? Is it an older pattern that’s not used anymore?

As far as I can tell, we call self.exp.dataset.close() explicitly in the unit tests whenever we don’t call the train() or evaluate() functions in a test. Those two functions are the ones that call .close() on the MontyExperiment and in some unit tests we don’t want to run an entire training or evaluation cycle but instead just test a small method of MontyExperiment like setup_experiment().
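
So the two situations look roughly like this (a made-up illustration, not actual test code; cfg stands in for the experiment config):

def test_full_cycle(self):
    self.exp.train()                # train() calls close() on the experiment internally

def test_setup_only(self):
    self.exp.setup_experiment(cfg)  # no train()/evaluate() here, so nothing calls close() for us
    # ... assertions about the experiment ...
    self.exp.dataset.close()        # hence the explicit close on the dataset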

We’re calling close inside of the MontyExperiment train, and evaluate methods. If we move to a context manager model, would removing those calls break something I’m not aware of? That seems to make for an inconsistent interface, i.e. sometimes you have to call exp.close() and sometimes you don’t.

I commented on this in the PR already, but just for completeness here: as far as I can tell, the main thing that would break is code outside the tbp.monty repository that currently doesn’t use the context manager approach when starting an experiment. For example, our monty_lab repository contains several run.py and run_parallel.py scripts, and I am not sure if any people outside of our internal team have other scripts that rely on this interface. This is not to say that we can’t change it. Better to change the interface now than later.

I hope this helps! Thanks again for the great contribution and welcome to the TBP community :blush:

-Viviane

Thanks for looking into this, suggesting specific changes, etc!

Assuming that it doesn’t hamper the team’s efforts, I applaud the idea of refactoring Monty’s interfaces to centralize common functionality. Aside from reducing the incidence of cut-and-paste programming (and related problems), this can provide a good place to extend capabilities, ease cross-platform development, etc.

Speaking of which, I’ve commented before on the idea of centralizing all of the CMP I/O:

What are the prospects for doing this? Also, could you comment on the amount of OO-fu you’ve seen in the code base? Here’s a related post:

Hi Rich,

I don’t know a whole lot about the code base just yet. I only started looking at it a few weeks ago, having been introduced to the project by @tslominski around the same time.

This work grew out of me going through the getting started guide and running into problems while running the test suite on my Linux machine. Then I started looking into why there are differences between running the tests on Linux vs macOS vs in GitHub Actions (which should be the same as Linux), and found this issue with tests interfering with each other because of resources not being closed correctly.

Also, could you comment on the amount of OO-fu you’ve seen in the code base?

From what I’ve seen, which isn’t a whole lot, it looks to be similar to other Python code bases I’ve worked on in the past, mostly object oriented, with some procedural things mixed in here and there.

I’m not too concerned about it being tied to object-oriented concepts. Most languages have their own solutions for code reuse and the expression problem, and in my experience moving from one to another isn’t that complicated as long as the people doing it are well versed in both languages’ approaches to the problem.

You specifically mentioned the use of inheritance, and in general that’s one feature of object-oriented programming that a lot of people try to avoid these days, choosing composition of objects for shared functionality rather than hierarchical inheritance trees.

I did read a little bit of your post where you suggested Elixir as an alternative, and that was also one of the first things I thought of when I was watching the videos about the architecture of this project. The BEAM’s process model seems like a perfect fit for modelling parallel executing cortical columns that communicate with each other.

But, I’m still learning the details of the project and as an outside contributor, I don’t know what the team has planned. I definitely don’t want to be like one of those annoying people who comes into an open source project and suggests “Why don’t you rewrite it in Rust?” (or some other language).

I’m still very much in the “understand why things are the way they are” phase and don’t want to suggest any huge changes until I understand better.
