Sanitize HTML using bleach
Created on Tuesday 30 June 2020, 16:54
Project: Metabolism of Cities
Assigned to: No one yet
Programmatically speaking, we have two levels of users:
- Users we trust (barely anyone)
- Users we don't trust (default mode)
We allow users to contribute content using Markdown in various places, and this is sanitized using bleach (see models.py): whenever a Record is saved, a converted and sanitized copy of the description field is stored in description_html. This is what we display on the page.
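A minimal sketch of that convert-then-sanitize step, not the actual code in models.py. It assumes the Python-Markdown package (markdown) does the conversion, and the function name and the tag and attribute lists are hypothetical:

```python
import bleach
import markdown

# Hypothetical allow-list for user-contributed descriptions;
# the real list in models.py may differ.
DESCRIPTION_TAGS = {
    "p", "br", "a", "strong", "em", "ul", "ol", "li",
    "blockquote", "code", "pre",
}
DESCRIPTION_ATTRIBUTES = {"a": ["href", "title"]}


def render_description(description):
    """Convert Markdown to HTML, then sanitize the result with bleach."""
    raw_html = markdown.markdown(description or "")
    # By default bleach escapes disallowed tags, so they are shown
    # as plain text instead of being interpreted by the browser.
    return bleach.clean(
        raw_html,
        tags=DESCRIPTION_TAGS,
        attributes=DESCRIPTION_ATTRIBUTES,
    )


# In a Django model this would typically be called from save(), e.g.:
#     self.description_html = render_description(self.description)
html = render_description("**Hello** <script>alert(1)</script>")
```

Sanitizing after the Markdown conversion matters here, because Python-Markdown passes raw HTML in the input through to its output.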
Now, in addition to Markdown-based contributions we also have regular pages (Webpage in our models), which can use either Markdown or HTML. Markdown content gets sanitized accordingly, but when HTML content is used, it gets printed unsanitized using Django's |safe filter. This works, but it would make sense to sanitize this content a bit as well. There is no reason for things like <script> tags to be entered, even by our trusted admin users. If more complex code is needed, we can hard-code it on the page.
So this task is about putting that user input validation in place. I'm thinking of simply applying bleach with whitelisted tags and attributes. This page seems to be a nice starting point to figure out which tags to allow, but they mention that certain attributes should be filtered further (e.g. for img tags), so we should look into that. If someone is keen to work on this, give me a shout and I can quickly point you to the lines of code where this should be set up (it may change between now and the time this gets worked on).
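As a starting point for discussion, here is a rough sketch of what applying bleach with whitelisted tags and attributes could look like for HTML pages. The tag list, the function names and the rule for img attributes are all assumptions for illustration, not a decided policy:

```python
from urllib.parse import urlparse

import bleach

# Hypothetical allow-list for Webpage HTML; to be agreed on.
WEBPAGE_TAGS = {
    "p", "br", "a", "strong", "em", "ul", "ol", "li",
    "h2", "h3", "h4", "blockquote", "img",
    "table", "thead", "tbody", "tr", "th", "td",
}


def allow_img_attribute(tag, name, value):
    """Allow a few harmless img attributes, and only http(s) URLs in src."""
    if name in ("alt", "title", "width", "height"):
        return True
    if name == "src":
        return urlparse(value).scheme in ("http", "https")
    return False


WEBPAGE_ATTRIBUTES = {
    "a": ["href", "title"],
    # bleach accepts a callable here and calls it per attribute.
    "img": allow_img_attribute,
}


def sanitize_webpage_html(html):
    """Sanitize admin-entered HTML before it is rendered with |safe."""
    # strip=True drops the markup of disallowed tags instead of escaping it.
    return bleach.clean(
        html,
        tags=WEBPAGE_TAGS,
        attributes=WEBPAGE_ATTRIBUTES,
        strip=True,
    )


cleaned = sanitize_webpage_html(
    '<p>Hi</p><script>alert(1)</script>'
    '<img src="javascript:alert(1)" onerror="x()" alt="logo">'
)
```

Note that this sketch rejects relative image URLs (such as /media/logo.png) because they have no scheme. If pages rely on those, the src check would need to be loosened.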