Monty, pytest: 10 failed and 148 passed tests

Hi!

I’m getting up to speed with the Monty project. It’s really quite incredible what you’ve managed to achieve since November 2021!

Also, thank you so much for sharing the project videos, research talks and roadmap discussions. It’s been a blast to follow along.

Viviane, you have a great way of presenting things that can break down even difficult concepts in a very clear and also gleeful way. I am in awe.

Regarding the Monty code: I tried following along with the getting-started guide, and it seems 10 tests fail while the other 148 pass.

========================================================== FAILURES ===========================================================
_______________________________________________ tests/unit/base_config_test.py ________________________________________________
[gw1] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw1' crashed while running 'tests/unit/base_config_test.py::BaseConfigTest::test_can_run_eval_epoch'
_______________________________________________ tests/unit/base_config_test.py ________________________________________________
[gw0] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw0' crashed while running 'tests/unit/base_config_test.py::BaseConfigTest::test_can_run_episode'
_______________________________________________ tests/unit/base_config_test.py ________________________________________________
[gw3] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw3' crashed while running 'tests/unit/base_config_test.py::BaseConfigTest::test_can_save_and_load'
_______________________________________________ tests/unit/evidence_lm_test.py ________________________________________________
[gw2] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw2' crashed while running 'tests/unit/evidence_lm_test.py::EvidenceLMTest::test_evidence_time_out'
______________________________________________ tests/unit/graph_learning_test.py ______________________________________________
[gw4] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw4' crashed while running 'tests/unit/graph_learning_test.py::GraphLearningTest::test_reproduce_multiple_episodes'
__________________________________________________ tests/unit/policy_test.py __________________________________________________
[gw5] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw5' crashed while running 'tests/unit/policy_test.py::PolicyTest::test_can_run_curv_informed_policy'
_______________________________________________ tests/unit/base_config_test.py ________________________________________________
[gw6] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw6' crashed while running 'tests/unit/base_config_test.py::BaseConfigTest::test_logging_info_level'
______________________________________________ tests/unit/graph_learning_test.py ______________________________________________
[gw7] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw7' crashed while running 'tests/unit/graph_learning_test.py::GraphLearningTest::test_can_run_eval_episode_with_surface_agent'
__________________________________________________ tests/unit/policy_test.py __________________________________________________
[gw8] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw8' crashed while running 'tests/unit/policy_test.py::PolicyTest::test_surface_policy_moves_back_to_object'
_______________________________________________ tests/unit/evidence_lm_test.py ________________________________________________
[gw9] linux -- Python 3.8.20 /home/u/.conda/envs/tbp.monty/bin/python
worker 'gw9' crashed while running 'tests/unit/evidence_lm_test.py::EvidenceLMTest::test_moving_off_object_5lms'
========================================== xdist: maximum crashed workers reached: 8 ==========================================
=================================================== short test summary info ===================================================
FAILED tests/unit/base_config_test.py::BaseConfigTest::test_can_run_eval_epoch
FAILED tests/unit/base_config_test.py::BaseConfigTest::test_can_run_episode
FAILED tests/unit/base_config_test.py::BaseConfigTest::test_can_save_and_load
FAILED tests/unit/evidence_lm_test.py::EvidenceLMTest::test_evidence_time_out
FAILED tests/unit/graph_learning_test.py::GraphLearningTest::test_reproduce_multiple_episodes
FAILED tests/unit/policy_test.py::PolicyTest::test_can_run_curv_informed_policy
FAILED tests/unit/base_config_test.py::BaseConfigTest::test_logging_info_level
FAILED tests/unit/graph_learning_test.py::GraphLearningTest::test_can_run_eval_episode_with_surface_agent
FAILED tests/unit/policy_test.py::PolicyTest::test_surface_policy_moves_back_to_object
FAILED tests/unit/evidence_lm_test.py::EvidenceLMTest::test_moving_off_object_5lms
=============================================== 10 failed, 148 passed in 15.90s ===============================================

What is the best way to debug this and provide more detailed logs for figuring out what the underlying issue may be?

Can’t wait to get more involved with the project!

Cheerful regards,
Robin


Hi @mewmew :wave:

When I encounter this test-worker-crashing error, the first place I check is HabitatSim. The failure you’re observing may be caused by HabitatSim crashing/segfaulting, which takes down the Python process and, with it, the test worker, leaving no logs or other debugging information.

Running a single HabitatSim test should pinpoint whether this is the case:

pytest tests/unit/habitat_sim_test.py -k 'test_create_environment'

If the above fails, then to get at the logs, it may help to comment out all the other tests in tests/unit/habitat_sim_test.py and run:

python tests/unit/habitat_sim_test.py

At that point, you should see some Habitat-specific logs that may point towards what’s going on.
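If the worker crashes turn out to be native segfaults, the standard-library faulthandler module can also dump the Python-level traceback at the moment of the crash. As a sketch (the trailing import of the simulator is illustrative, not a specific Monty entry point):

```python
# Sketch: dump a Python traceback if the interpreter dies in native code
# (e.g. a HabitatSim segfault). faulthandler is in the standard library.
import faulthandler
import sys

# On a fatal signal (SIGSEGV, SIGABRT, SIGFPE, ...), write the traceback of
# all threads to stderr, so you can see where Python was when it crashed.
faulthandler.enable(file=sys.stderr)

assert faulthandler.is_enabled()
# ...now import habitat_sim / run the failing test code...
```

The same effect can be had without editing code by running `python -X faulthandler tests/unit/habitat_sim_test.py`.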

Hi @tslominski,

Thanks for the suggestions.

I tried running only the test_create_environment HabitatSim test, and it does fail.

So, I (temporarily) removed the other tests from tests/unit/habitat_sim_test.py, and ran python tests/unit/habitat_sim_test.py as suggested.

This gave the following output, the main concern of which seems to be the error Platform::WindowlessGlxContext: no supported framebuffer configuration found.

[12:14:48:212562]:[Metadata] AttributesManagerBase.h(380)::createFromJsonOrDefaultInternal : <Dataset>: Proposing JSON name : default.scene_dataset_config.json from original name : default| This file does not exist.
[12:14:48:212639]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (capsule3DSolid:capsule3DSolid_hemiRings_4_cylRings_1_segments_12_halfLen_0.75_useTexCoords_false_useTangents_false) created and registered.
[12:14:48:212672]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (capsule3DWireframe:capsule3DWireframe_hemiRings_8_cylRings_1_segments_16_halfLen_1) created and registered.
[12:14:48:212698]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (coneSolid:coneSolid_segments_12_halfLen_1.25_rings_1_useTexCoords_false_useTangents_false_capEnd_true) created and registered.
[12:14:48:212722]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (coneWireframe:coneWireframe_segments_32_halfLen_1.25) created and registered.
[12:14:48:212742]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (cubeSolid:cubeSolid) created and registered.
[12:14:48:212761]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (cubeWireframe:cubeWireframe) created and registered.
[12:14:48:212791]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (cylinderSolid:cylinderSolid_rings_1_segments_12_halfLen_1_useTexCoords_false_useTangents_false_capEnds_true) created and registered.
[12:14:48:212822]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (cylinderWireframe:cylinderWireframe_rings_1_segments_32_halfLen_1) created and registered.
[12:14:48:212851]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (icosphereSolid:icosphereSolid_subdivs_1) created and registered.
[12:14:48:212880]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (icosphereWireframe:icosphereWireframe_subdivs_1) created and registered.
[12:14:48:212911]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (uvSphereSolid:uvSphereSolid_rings_8_segments_16_useTexCoords_false_useTangents_false) created and registered.
[12:14:48:212940]:[Metadata] AssetAttributesManager.cpp(121)::createObject : Asset attributes (uvSphereWireframe:uvSphereWireframe_rings_16_segments_32) created and registered.
[12:14:48:212959]:[Metadata] AssetAttributesManager.cpp(110)::AssetAttributesManager : Built default primitive asset templates : 12
[12:14:48:213330]:[Metadata] SceneDatasetAttributesManager.cpp(35)::createObject : File (default) not found, so new default dataset attributes created  and registered.
[12:14:48:213345]:[Metadata] MetadataMediator.cpp(120)::createSceneDataset : Dataset default successfully created.
[12:14:48:213366]:[Metadata] AttributesManagerBase.h(380)::createFromJsonOrDefaultInternal : <Physics Manager>: Proposing JSON name : ./data/default.physics_config.json from original name : ./data/default.physics_config.json| This file does not exist.
[12:14:48:213397]:[Metadata] PhysicsAttributesManager.cpp(26)::createObject : File (./data/default.physics_config.json) not found, so new default physics manager attributes created and registered.
[12:14:48:213412]:[Metadata] MetadataMediator.cpp(203)::setActiveSceneDatasetName : Previous active dataset  changed to default successfully.
[12:14:48:213431]:[Metadata] AttributesManagerBase.h(380)::createFromJsonOrDefaultInternal : <Physics Manager>: Proposing JSON name : /home/u/Desktop/tbp/tbp.monty/src/tbp/monty/simulators/resources/default.physics_config.json from original name : /home/u/Desktop/tbp/tbp.monty/src/tbp/monty/simulators/resources/default.physics_config.json| This file exists.
[12:14:48:213500]:[Metadata] PhysicsAttributesManager.cpp(26)::createObject : JSON Configuration File (/home/u/Desktop/tbp/tbp.monty/src/tbp/monty/simulators/resources/default.physics_config.json) based physics manager attributes created and registered.
[12:14:48:213518]:[Metadata] MetadataMediator.cpp(66)::setSimulatorConfiguration : Set new simulator config for scene/stage : NONE and dataset : default which is currently active dataset.
Platform::WindowlessGlxContext: no supported framebuffer configuration found
WindowlessContext: Unable to create windowless context

I tried running the command under both X11 and Wayland on Linux; both gave the same error (Platform::WindowlessGlxContext: no supported framebuffer configuration found / WindowlessContext: Unable to create windowless context).

Any suggestions how I may continue to troubleshoot?

Is there some information about my system that could help in the process?

$ uname -a
Linux x1 6.11.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 17 Nov 2024 16:06:17 +0000 x86_64 GNU/Linux
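For reference, commands along these lines can surface the GPU and GLX configuration (assuming lspci and glxinfo are available; glxinfo ships in mesa-utils on Debian/Ubuntu or mesa-demos on Arch — the fallback echoes are just to keep this runnable on machines missing the tools):

```shell
# Show which GPUs the machine has, and which GLX vendor/renderer is active.
lspci 2>/dev/null | grep -iE 'vga|3d' || echo "lspci unavailable"
if command -v glxinfo >/dev/null 2>&1; then
  glxinfo -B | grep -iE 'vendor|renderer|version' || echo "glxinfo failed (no display?)"
else
  echo "glxinfo not installed (mesa-utils / mesa-demos)"
fi
```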

Cheers,
Robin


Great, it looks like you can see the relevant Habitat logs.

It looks like the error may be coming from magnum/src/Magnum/Platform/WindowlessGlxApplication.cpp at 1d2a1c1b3e979ff3b8bd5eacadc780e3c1359945 · mosra/magnum · GitHub.

Unfortunately, I don’t have much guidance in debugging Habitat compatibility as I am not yet very familiar with Habitat.

One workaround we have in place for some issues, when running on machines without a display attached, is to use xvfb. However, this workaround depends on the specific versions involved. You can see it in our GitHub Actions workflow specification: tbp.monty/.github/workflows/monty.yml at 97750b5e706abe4f5e564fafe23fddeb992471db · thousandbrainsproject/tbp.monty · GitHub.
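As a sketch (assuming the xvfb package — xorg-server-xvfb on Arch — is installed), the workaround amounts to wrapping pytest in a virtual X server:

```shell
# Run the test suite under Xvfb, a virtual framebuffer X server, so a GL
# context can be created without a physical display. The guard and fallback
# echoes just keep this runnable where xvfb-run or the repo is missing.
if command -v xvfb-run >/dev/null 2>&1; then
  # -a: pick a free display number automatically
  xvfb-run -a pytest tests/unit || echo "tests failed under Xvfb"
else
  echo "xvfb-run not installed (package: xvfb / xorg-server-xvfb)"
fi
```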


Hey, I don’t know if people are still having issues with Habitat, but I just wanted to share my two cents. My two test setups are a desktop and a laptop, both running Linux with Nvidia GPUs.

On my laptop, I have the (mis)fortune of a combined Intel + Nvidia GPU setup. My problem there was that anything GLX-related used the Intel GPU, which led to Magnum saying it couldn’t create a context.

On my desktop, which I access over SSH with X forwarding (other GL-based windows work), Magnum similarly complained about context creation; it seems software rendering has some issues on that machine.

The solution (for me) in both cases was to force the tests to use the Nvidia GPU to render.

__GLX_VENDOR_LIBRARY_NAME=nvidia pytest

I hope this helps anyone who may be facing the same issues.
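For hybrid Intel + Nvidia laptops specifically, the driver’s PRIME render offload variables (documented in the NVIDIA driver README) may also be needed to route rendering to the discrete GPU — a sketch:

```shell
# Route GLX through the discrete NVIDIA GPU via PRIME render offload.
# __NV_PRIME_RENDER_OFFLOAD and __GLX_VENDOR_LIBRARY_NAME are documented
# NVIDIA driver environment variables; they have no effect without the driver.
export __NV_PRIME_RENDER_OFFLOAD=1
export __GLX_VENDOR_LIBRARY_NAME=nvidia
env | grep -E '^__(NV_PRIME_RENDER_OFFLOAD|GLX_VENDOR_LIBRARY_NAME)='  # confirm
# then: pytest tests/unit
```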
