Documentation PDF export

Hello everyone,

Would it be possible to export the documentation to a single PDF file? This would allow the PDF to be used as context for an LLM.

2 Likes

This seems like a useful request. Along the same lines, exporting forum threads might also be worthwhile. (Although I’ve had pretty good luck asking ChatGPT about Monty, I have no clue how it’s retrieving its base info.)

Using Python, I merged all the Markdown files from the TBP docs into a single file: https://gist.github.com/AgentRev/ea303c6a86ca430a0d119a308881f868

To download it, right-click this link, then choose “Save link as”.
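For anyone who’d rather regenerate the file themselves, here’s a rough sketch of how such a merge can be done (the docs/ path is a placeholder for wherever your local clone keeps the Markdown):

```python
from pathlib import Path

# Walk a local clone's docs tree and concatenate every Markdown file
# into one big file, with an HTML comment marking each source file.
# "docs" is a placeholder path; point it at your actual clone.
docs_root = Path("docs")
merged_path = Path("tbp_docs_combined.md")

with merged_path.open("w", encoding="utf-8") as merged:
    for md_file in sorted(docs_root.rglob("*.md")):
        merged.write(f"\n\n<!-- {md_file.relative_to(docs_root)} -->\n\n")
        merged.write(md_file.read_text(encoding="utf-8"))
```

From there, something like pandoc can turn the merged file into the requested PDF, e.g. `pandoc tbp_docs_combined.md -o tbp_docs.pdf` (assuming a LaTeX engine is installed).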

As for the forums, it’s a bit trickier, since Discourse doesn’t offer a “download everything” option. I found a third-party tool that might help, although I haven’t tested it myself, and I dunno if the end result is LLM-friendly or not.
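That said, Discourse does expose each topic as JSON if you append “.json” to the topic URL, so a basic scraper isn’t hard to sketch (the forum base URL and topic ID below are placeholders):

```python
import json
import urllib.request

# Discourse serves a topic as JSON when ".json" is appended to its
# URL. The base URL and topic ID are placeholders; note this only
# grabs the first page of posts the endpoint returns by default.
url = "https://FORUM.example.com/t/TOPIC_ID.json"
with urllib.request.urlopen(url) as resp:
    topic = json.load(resp)

# Each post's rendered HTML lives under post_stream -> posts -> "cooked".
for post in topic["post_stream"]["posts"]:
    print(post["username"], "-", post["cooked"][:80])
```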

3 Likes

Hi @Maggus,

I’d echo what @AgentRev said; you can also just zip up the /docs directory after you git clone our repo and upload it to something like Google NotebookLM.
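A rough sketch of that workflow, in case it helps (the repo URL is an assumption; substitute the actual one):

```python
import shutil
import subprocess

# Clone the repo, then zip just its docs directory for upload to a
# tool like NotebookLM. The repo URL is an assumption; substitute
# the real one if it differs.
subprocess.run(
    ["git", "clone", "https://github.com/thousandbrainsproject/tbp.monty.git"],
    check=True,
)
shutil.make_archive("tbp_docs", "zip", root_dir="tbp.monty/docs")
```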

Fair warning: our content is considerably outside the distribution of what LLMs understand, and therefore they hallucinate a fair amount about sensorimotor AI. That might improve over time.

@Rich_Morin I suspect we’re getting a lot of LLM crawlers.

1 Like

As @brainwaves opined:

I suspect we’re getting a lot of LLM crawlers.

Indeed. And, if current trends are any indication, we’re likely to get far more crawlers over time. I’d also note that this thread is motivated by attempts to capture TBP content and use it to create and/or supplement LLM (etc.) context. In short, the general level of interest seems to be growing.

Taking this as a starting point, here are some follow-up questions and some (biased and probably incomplete) answers:

Q: What published and/or archival materials are available?

  • Current and historic versions of the project’s source code are available via GitHub.

  • The project web site contains the current documentation, along with navigation and overview pages, etc. Both current and historic versions of this material are available on GitHub.

  • The Discourse group contains a wealth of commentary, explanations, notions, etc. Dunno whether this is being archived elsewhere (e.g., on GitHub). IMNSHO, it should be…

  • A large number of TBP (and TBP-adjacent) videos have been published on YouTube.

  • Assorted books and papers have been published.

Q: How accessible are these materials to humans and/or LLMs?

The GitHub copies of the web site content have not been rendered into HTML, etc. This has both benefits and drawbacks:

  • An LLM might be able to interpret the markup “source code” (e.g., Markdown, Mermaid) much more easily (and effectively) than the rendered (e.g., HTML, image) versions.

  • Digging through GitHub repos and versions can be pretty painful for humans, but an LLM might have less trouble… (see the sketch below)
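As a sketch of that last point, GitHub serves raw (unrendered) file contents at raw.githubusercontent.com, and the ref can be a branch, tag, or commit SHA, which makes historic versions easy to fetch programmatically (the repo name and file path below are assumptions):

```python
import urllib.request

# Fetch the raw Markdown source of one doc file at a specific ref.
# URL scheme: raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
# The repo name and file path are assumptions; <ref> can be a
# branch, tag, or commit SHA.
owner_repo = "thousandbrainsproject/tbp.monty"  # assumed repo name
ref = "main"                                    # or a commit SHA
path = "docs/overview.md"                       # placeholder path

url = f"https://raw.githubusercontent.com/{owner_repo}/{ref}/{path}"
with urllib.request.urlopen(url) as resp:
    markdown_source = resp.read().decode("utf-8")
print(markdown_source[:200])
```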

The YouTube-hosted videos (IMHO) aren’t all that accessible to either humans or LLMs:

  • The audio quality is very uneven.
  • The videos may not be divided into chapters.
  • Clean, machine-readable transcripts are not available.

I dunno how many of the books and papers are freely accessible (let alone in LLM-friendly formats), indexed, etc.

1 Like

Hello.

I happen to be doing some work on LLM-based tools and decided to play with the TBP docs as a dataset. I decided to use the videos too, so I ran Whisper and some LLM cleanup on them, only to find out that transcripts already existed in the YouTube videos themselves, and were much higher quality…

I downloaded those and post-processed them a bit (very basically; heavier post-processing, e.g. disfluency removal and summarization, lands somewhere else). You can find all the resulting Markdown here: https://github.com/nunoatgithub/monty-video-transcription/tree/main/youtube/transcripts
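For reference, the download step can be sketched roughly like this, assuming the third-party youtube-transcript-api package (its API has shifted between versions; the static get_transcript call below is from the pre-1.0 releases, and the video ID is a placeholder):

```python
from youtube_transcript_api import YouTubeTranscriptApi

# Pull the transcript YouTube already has for a video, rather than
# re-running Whisper. Assumes `pip install youtube-transcript-api`
# (pre-1.0 API); the video ID is a placeholder.
video_id = "VIDEO_ID"
segments = YouTubeTranscriptApi.get_transcript(video_id)

# Flatten the timed segments into plain text; disfluency removal,
# summarization, etc. happen in a later post-processing pass.
text = " ".join(seg["text"] for seg in segments)
with open(f"{video_id}.md", "w", encoding="utf-8") as f:
    f.write(text)
```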

Nuno

4 Likes

FYI, there is now a cleaned-up copy of the transcripts here.

4 Likes