Sanitize HTML using bleach

Created on Tuesday 30 June 2020, 16:54

  • ID
  • Project
    Metabolism of Cities
  • Status
  • Priority
  • Type
    Programming work
  • Assigned to
    No one yet

Programmatically speaking, we have two levels of users:

  1. Users we trust (barely anyone)

  2. Users we don't trust (default mode)

We allow users to contribute content using Markdown in various places, and this content is sanitized using bleach. Every time a Record is saved, the description field is converted to HTML, sanitized, and stored in description_html. This sanitized copy is what we display on the page.
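
A minimal sketch of the Markdown-to-sanitized-HTML step, assuming the Python-Markdown package (`markdown`) and bleach are installed. The tag and attribute whitelists and the function name `render_description` are illustrative assumptions, not the project's actual settings.

```python
import bleach
import markdown  # Python-Markdown (pip install markdown)

# Assumed whitelist for HTML produced from Markdown; adjust as needed.
MARKDOWN_TAGS = {
    "p", "br", "strong", "em", "a", "ul", "ol", "li",
    "blockquote", "code", "pre", "h1", "h2", "h3", "h4",
}
# Only links may carry attributes, and only these two.
MARKDOWN_ATTRIBUTES = {"a": ["href", "title"]}


def render_description(description: str) -> str:
    """Convert Markdown to HTML, then sanitize it with bleach.

    The return value is what would be stored in description_html.
    """
    raw_html = markdown.markdown(description)
    # Disallowed tags (e.g. raw <script> passed through by Markdown) are
    # escaped; links using schemes outside the list lose their href.
    return bleach.clean(
        raw_html,
        tags=MARKDOWN_TAGS,
        attributes=MARKDOWN_ATTRIBUTES,
        protocols={"http", "https", "mailto"},
    )
```

In a Django model this could be called from an overridden `save()`, for example `self.description_html = render_description(self.description)` before `super().save(*args, **kwargs)`; that wiring is hypothetical.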

In addition to Markdown-based contributions, we also have regular pages (Webpage in our models), which can use either Markdown or HTML. Markdown content is sanitized as described above, but HTML content is printed unsanitized using Django's |safe filter. This works, but it would make sense to sanitize this content as well, at least lightly: there is no reason for things like <script> tags to be entered, even by our trusted admin users. If more complex code is needed, we can hard-code it on the page.
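
One way to sanitize Webpage HTML before it reaches |safe is a reusable bleach `Cleaner` with a broader whitelist than the Markdown one. The sketch below assumes bleach is installed; the whitelist and the helper name `clean_webpage_html` are hypothetical. With `strip=True`, disallowed tags are removed rather than escaped, which suits admin-authored pages.

```python
from bleach.sanitizer import Cleaner

# Assumed whitelist for admin-authored Webpage HTML: broader than for
# Markdown, but no <script>, <style>, <iframe> or event-handler attributes.
WEBPAGE_TAGS = {
    "p", "br", "hr", "div", "span", "strong", "em", "b", "i", "u",
    "a", "ul", "ol", "li", "blockquote", "code", "pre",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "table", "thead", "tbody", "tr", "th", "td", "img",
}
WEBPAGE_ATTRIBUTES = {
    "a": ["href", "title"],
    "img": ["src", "alt", "title", "width", "height"],
    "*": ["class"],  # allowed on every whitelisted tag
}

# Building the Cleaner once and reusing it avoids re-creating it per call.
webpage_cleaner = Cleaner(
    tags=WEBPAGE_TAGS,
    attributes=WEBPAGE_ATTRIBUTES,
    protocols={"http", "https", "mailto"},
    strip=True,           # drop disallowed tags instead of escaping them
    strip_comments=True,  # remove HTML comments
)


def clean_webpage_html(html: str) -> str:
    """Return a sanitized copy of Webpage HTML content."""
    return webpage_cleaner.clean(html)
```

This could run when a Webpage is saved, or inside a custom template filter, so that the existing |safe output only ever sees cleaned HTML. The disallowed tags themselves are removed; any text that sat between them may remain as plain, inert text.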

So this task is about adding that sanitization step for HTML content. I'm thinking of simply applying bleach with whitelisted tags and attributes. This page seems to be a good starting point for deciding which tags to allow, but it also mentions that certain attributes should be filtered further (e.g. the attributes of img tags), so we should look into that. If someone is keen to work on this, give me a shout and I can quickly point you to the lines of code where this should be set up (they may change between now and the time this gets worked on).

Discussion and updates

New task was created