Sanitize HTML using bleach
Created on Tuesday 30 June 2020, 16:54
ID: 32105
Project: Metabolism of Cities
Status: Open
Priority: Medium
Type: Programming work
Assigned to: No one yet
Description
Programmatically speaking, we have two levels of users:
- Users we trust (barely anyone)
- Users we don't trust (default mode)
We allow users to contribute content using Markdown in various places, and this content is sanitized using bleach (see `models.py`: whenever a `Record` is saved, a converted and sanitized copy of its `description` field is stored in `description_html`). The `description_html` field is what we display on the page.
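The save-time pattern described above can be sketched as follows. This is a minimal illustration, not the code in `models.py`: it assumes the Python-Markdown package (`markdown.markdown`) for the conversion step, and the `Record` dataclass, the `render_description` helper and the tag/attribute lists are hypothetical stand-ins for the Django model and whatever whitelist the project actually uses.

```python
from dataclasses import dataclass

import bleach
import markdown

# Hypothetical whitelist for Markdown output; the project's real lists may differ.
MARKDOWN_TAGS = {
    "p", "br", "a", "em", "strong", "code", "pre",
    "ul", "ol", "li", "blockquote", "h2", "h3",
}
MARKDOWN_ATTRIBUTES = {"a": ["href", "title"]}


def render_description(text: str) -> str:
    """Convert Markdown to HTML, then drop every tag/attribute not whitelisted."""
    raw_html = markdown.markdown(text)  # Python-Markdown passes raw HTML through
    return bleach.clean(
        raw_html,
        tags=MARKDOWN_TAGS,
        attributes=MARKDOWN_ATTRIBUTES,
        strip=True,  # remove disallowed tags instead of escaping them
    )


@dataclass
class Record:
    """Plain-Python stand-in for the Django Record model (hypothetical)."""

    description: str
    description_html: str = ""

    def save(self) -> None:
        # Cache the converted + sanitized copy once, at save time, so the
        # template can output description_html without re-processing it.
        self.description_html = render_description(self.description)
```

Because the sanitized copy is computed once per save, rendering stays cheap and the template only ever outputs the cleaned field.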
Now, in addition to Markdown-based contributions, we also have regular pages (`Webpage` in our models), which can use either Markdown or HTML. Markdown content gets sanitized as described above, but HTML content is printed unsanitized using Django's `|safe` filter. That works, but it would make sense to sanitize this content to some degree as well. There is simply no reason for anyone, even our trusted admin users, to enter things like `<script>` tags. If more complex code is needed, we can hard-code it on the page.
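For `Webpage` HTML, the same idea could apply: pass the content through bleach before it reaches `|safe`. The sketch below assumes a broader, hypothetical whitelist for admin-authored pages (`PAGE_TAGS`, `PAGE_ATTRIBUTES` and `sanitize_page_html` are illustrative names, not existing project code). It shows two things bleach does: tags outside the whitelist are removed (or escaped, if `strip=False`), and attributes not listed for a tag, such as `onclick`, are dropped even when the tag itself is allowed.

```python
import bleach

# Hypothetical whitelist for trusted-admin Webpage HTML (broader than for Markdown).
PAGE_TAGS = {
    "p", "br", "a", "em", "strong", "ul", "ol", "li",
    "h2", "h3", "h4", "blockquote", "img",
    "table", "thead", "tbody", "tr", "th", "td",
}
PAGE_ATTRIBUTES = {
    "a": ["href", "title"],
    "img": ["src", "alt"],
    "th": ["colspan", "rowspan"],
    "td": ["colspan", "rowspan"],
}


def sanitize_page_html(content: str) -> str:
    """Clean Webpage HTML so it is safer to output with Django's |safe filter."""
    return bleach.clean(content, tags=PAGE_TAGS, attributes=PAGE_ATTRIBUTES, strip=True)


page = '<p onclick="steal()">Hello</p><script>alert("hi")</script>'
# The <script> tags and the onclick attribute are removed from the output.
print(sanitize_page_html(page))
```

With `strip=True`, text that sat inside a removed element may be kept as inert, escaped text; leaving `strip` at its default of `False` escapes the disallowed tags instead, which keeps stray markup visible to whoever edits the page.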
So this task is about applying that user input validation model to HTML content as well. I'm thinking of simply applying bleach with whitelisted tags and attributes. This page seems to be a nice starting point for figuring out which tags can go, but it mentions that certain attributes (e.g. those on `img` tags) should be filtered further, so we should look into that. If someone is keen to work on this, give me a shout and I can quickly point you to the lines of code where this should be set up (they may change between now and the time this gets worked on).
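bleach also accepts a callable per tag in the `attributes` dict, called as `(tag, name, value)` and returning whether to keep the attribute, which is one way to filter `img` attributes further. The rules below (only `https` or site-relative image URLs, purely numeric `width`/`height`) are a hypothetical policy chosen to illustrate the mechanism, not a recommendation taken from the page referenced above; `allow_img_attribute` and `sanitize` are illustrative names. bleach's own protocol check on `src` still runs in addition to the callable.

```python
from urllib.parse import urlparse

import bleach

ALLOWED_TAGS = {"p", "a", "em", "strong", "img"}


def allow_img_attribute(tag: str, name: str, value: str) -> bool:
    """Hypothetical img policy: keep alt, numeric sizes, and safe src URLs."""
    if name == "alt":
        return True
    if name in ("width", "height"):
        return value.isdigit()  # rejects values like "100px"
    if name == "src":
        scheme = urlparse(value).scheme
        if scheme:
            return scheme == "https"  # drops http:, data:, javascript:, ...
        # Site-relative paths only; "//host/..." would load from another domain.
        return value.startswith("/") and not value.startswith("//")
    return False  # onerror, style and every other attribute


ALLOWED_ATTRIBUTES = {
    "a": ["href", "title"],
    "img": allow_img_attribute,  # callable instead of a list of names
}


def sanitize(content: str) -> str:
    return bleach.clean(
        content, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, strip=True
    )
```

Keeping the checks in one function makes the `img` policy easy to test on its own, separately from the tag list.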