There are common needs across open source and academic projects that continually hit us in the face. They become such regular annoyances, in fact, that we stop seeing them. One example is the simple task of checking static links in any kind of documentation or code. You know, given that you have umpteen pages of docs, how could you easily check that the links aren’t broken?
I can give an example. For the usrse website, one of the original creators had set it up to use html-proofer, which used an underlying library called typhoeus to do the checking. For a general request it worked most of the time, but as you can see from the link, the library has no implementation for a retry, so a failing link was common. It was so common, in fact, that I started to look into how to go about addressing it. Since I was always one to quickly respond to CI failures, the burden of re-running these failed checks on merges to master (after the same commit had passed for a pull request) was repeatedly on me.
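To make the retry idea concrete, here is a minimal sketch using only the Python standard library. It is not how html-proofer, typhoeus, or any other tool mentioned here implements checking; the names `fetch_status` and `check_with_retry` are hypothetical, and the retry count and the doubling delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5):
    """Return the HTTP status code for a URL, or None if the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (404, 500, ...).
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_with_retry(url, fetch=fetch_status, retries=3, delay=1.0):
    """Check a URL up to `retries` times, doubling the wait between attempts."""
    for attempt in range(retries):
        status = fetch(url)
        if status is not None and status < 400:
            return True
        if attempt < retries - 1:
            time.sleep(delay * (2 ** attempt))
    return False
```

The `fetch` argument is there so the retry loop can be exercised with a fake fetcher instead of real network calls. A link that fails once because of a transient hiccup then gets another chance before the whole CI run is marked as failed.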
At this point I started to keep my eyes open for other tools in the ecosystem that might be able to provide such an easy service, checking urls in static files. I was also on the lookout for a tool that would best serve the scientific community, likely meaning something in Python. That’s not to say that other languages aren’t equally good, but rather that if something breaks or the code needs a look, a familiar language makes it easier for the community to adopt (e.g., html-proofer is in ruby, which isn’t common amongst scientific programmers). It was after a few months that I stumbled on the urlstechie organization, which was created by SuperKogito for this exact purpose.
I was pumped! I forged ahead to open up issues for the features that I saw as important, and wound up doing a few major pull requests:
- Adding retry parameter
- Prettify-ing the interface
- Optimizing for GitHub actions (local checkout)
- White listing files
- Refactoring action to use urlchecker-python
The last one was hugely fun, and I did it yesterday. What exactly is urlchecker-python? Let’s talk about that next.
Here’s the thing - although the original repository
urlstechie/URLs-checker was a GitHub action,
it was sort of a Python module and GitHub action squished into one. This happens sometimes when we
create small snippets of code intended to run as actions, but then realize we want to extend them
beyond that. Being able to reproduce an action locally using the same exact underlying tooling
is hugely important for developers to be able to do - if I’m going to run the urlchecker for a GitHub workflow test,
I want to be able to run it locally to reproduce the same tests. We thus
decided to embark on gutting out the core of the GitHub Action (at that time)
and creating a separate Python library. And tada, here it is!
You can install it with pip:
pip install urlchecker
And you can now easily check a repository (documentation and code) locally, using the same parameters that are plugged into the GitHub action:
urlchecker check .
Here is an example run, the same check that is run for the awesome-rseng repository. This command says to check all markdown files in the present working directory (.), but skip the files in docs.
urlchecker check --white-listed-files docs --file-types .md .
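To give a rough idea of what a check like this involves before any request is made, here is a small sketch of the file selection and URL extraction steps. This is not urlchecker's actual code: `collect_urls` and `URL_PATTERN` are hypothetical names, the regular expression is a deliberately simple approximation, and treating a white-listed entry as a substring match on the relative path is an assumption.

```python
import os
import re

# Simple approximation of a URL: stops at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def collect_urls(root, file_types=(".md",), white_listed_files=()):
    """Map each selected file under `root` to the sorted URLs found in it."""
    found = {}
    for folder, _, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(folder, filename)
            relative = os.path.relpath(path, root)
            # Keep only files with a requested extension.
            if not filename.endswith(tuple(file_types)):
                continue
            # Skip files whose relative path matches a white-listed pattern.
            if any(pattern in relative for pattern in white_listed_files):
                continue
            with open(path, encoding="utf-8", errors="ignore") as handle:
                matches = URL_PATTERN.findall(handle.read())
            # Drop trailing sentence punctuation and duplicates.
            urls = sorted({url.rstrip(".,;:!?") for url in matches})
            if urls:
                found[path] = urls
    return found
```

Calling `collect_urls(".", file_types=[".md"], white_listed_files=["docs"])` mirrors the spirit of the command above: markdown files only, with anything under docs left out.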
So if you have struggled with checking static links in the past, look no further! Here is a quick example GitHub workflow to get you started:
name: URLChecker
on: [pull_request]

jobs:
  check-urls:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout Actions Repository
      uses: actions/checkout@v2

    - name: Test GitHub Action
      uses: firstname.lastname@example.org
      with:
        file_types: .md,.py
        white_listed_files: docs
I’m having a lot of fun developing these tools, so please open an issue if you have any questions, feature requests, or just want to say hello! We have a fabulous old school style website with information about urlstechie.
If you are interested in contributing, please send us a note!