Implement queries so all data from a dataset is converted into a single unit

Created on Thursday 1 September 2022, 13:05

  • ID
    1011522
  • Project
    Metabolism of Cities Data Hub
  • Status
    Open
  • Priority
    Medium
  • Type
    Programming work
  • Tags
    MFA integration
  • Assigned to
    No one yet
  • Subscribers
    Carolin Bellstedt
    Paul Hoekman
    joe

Description

Objective: Our data often comes in many different units. (See list of units here.) It would save users time if they could upload data with varying units of the same type (e.g. volume or weight), for example mass data in kg, tons, and short tons. The system should then either convert the secondary units to the primary unit, using the factors that exist for them, or convert all units to one common unit, using the same factors. This is needed so that all values in a visualization are in a single unit. It is also a useful addition to the site in general, and most handy when combining data from multiple datasets into a single visualization.

Audience: This function is for someone who uploads a lot of data or who submits processed data that is ready to be read by the system. To save them from doing manual conversions in Excel/LibreOffice etc., they only need to ensure that the units are of the same type and don't have to bother with factors.

How it works: Let's see how this should work, using an example dataset with three different weight units, which is linked here and attached for testing the processing. (The original file is here. In the end, the first three materials in the table should look like the visualisation.)

  • The user opens the file from the processing queue for flows or stocks and starts the process.
  • In the "review of file content" stage (stage 2 out of 3), the material and reference spaces are classified, see link. So far, there is no visible "unit check" where the system confirms what the unit is. The file processing can be finalised regardless of whether the units are the same or not.
  • I imagine the unit check could already take place at this point, and not just when doing the data visualisation. The intervention here could be that the system reads the unit column and then lists all units that were identified, alongside the other two classification boxes, see screenshot. Unlike the material check, it should analyse all records, not only the first 10.
  • The user should confirm that these units are all of the same type. There could be a link to the units (https://staf.metabolismofcities.org/units/) and a dropdown or list, where the user selects that e.g. the unit type is WEIGHT (in our example).
  • The user should then be asked to choose the unit with which they would like to proceed, with the primary unit (tonnes, in our case) as the suggestion.
  • The system should convert the values that are not in the target unit, using the respective factors. We already have a system to convert units, but the query must be changed so that this conversion is done when extracting the data from the db.
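
The conversion step in the list above could be sketched roughly as follows. This is only an illustration, not the existing conversion system: the `Unit` class, the `UNITS` table, and the functions `convert` and `convert_rows` are hypothetical names, and the sketch assumes each unit stores a factor relative to the primary unit of its type (tonnes for weight).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    symbol: str
    unit_type: str   # e.g. "weight" or "volume"
    factor: float    # multiply a value by this to get the primary unit


# Hypothetical unit table; the real factors live in the site's unit list.
UNITS = {
    "t": Unit("t", "weight", 1.0),                  # primary weight unit
    "kg": Unit("kg", "weight", 0.001),
    "short ton": Unit("short ton", "weight", 0.90718474),
    "l": Unit("l", "volume", 1.0),
}


def convert(value, from_symbol, to_symbol):
    """Convert a value between two units of the same type."""
    source = UNITS[from_symbol]
    target = UNITS[to_symbol]
    if source.unit_type != target.unit_type:
        raise ValueError(
            f"Cannot convert {source.unit_type} ({from_symbol}) "
            f"to {target.unit_type} ({to_symbol})"
        )
    # Go via the primary unit: source -> primary -> target.
    return value * source.factor / target.factor


def convert_rows(rows, target_symbol):
    """Return (material, value, unit) rows with every value in the target unit."""
    return [
        (material, convert(value, unit, target_symbol), target_symbol)
        for material, value, unit in rows
    ]
```

Going through the primary unit means only one factor per unit is needed, whichever target unit the user picks. The type check mirrors the confirmation step in which the user states that all units are of the same type.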

Discussion and updates


New task was created


Screenshot of classification stage, where unit check could take place.


Quick question about this: is there a specific reason you want to convert data at the moment of storing it? Isn't it better practice to leave data in their original unit, and only convert it at the time the user extracts it from the db (e.g. when creating a visualization)? The benefits of doing that would be:

  • That you keep recorded data as close to the original data as possible, making validation and review easier
  • More flexibility because you can have one user displaying data in one unit, and another user displaying it in another
  • You are not "locked in" to the unit chosen by the uploader
  • Lastly it allows you to combine data from different sources when you generate a report or visualization and have the system display them in a single unit

I had thought about this too while writing the task, and I figured that it doesn't have to happen at this moment; I do see the benefits you list. BUT, I do think that we should have a unit check there that confirms whether there is one unit or several. This might even help people who may have overlooked that there are different units.
Moreover, displaying different units in one dataset is misleading, since we only have a primary axis for all units and no secondary one. How can we even display 1 kg vs. 1 tonne on the same scale? See screenshot of the test data. The "raices" are in kg and distort the results.
And, if there are several units, like in this example, then it only says "measured in various units", but not which ones.

By all means, we can also store it in different units and then convert it in the chart editor. But they should be displayed in one single unit, no?


Yes I agree that it should be displayed in one unit. I would say let's simply add a UNIT box to the data viz editor where you select the unit of choice. When data are being processed the system can look at what the most common unit is in the dataset and make that the default unit. But data are not modified when they are being uploaded. And having a unit check at the time of uploading does sound good, just for checking as you say. Note that scans of the ENTIRE spreadsheets are sometimes too time consuming to do live, because there can be hundreds of thousands of rows, so that is a thing to look into.
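
As a rough sketch of the idea above (pick the most common unit as the default, and list every unit found), a single pass over the rows is enough. The function name `summarise_units` and the column position are assumptions for illustration only; the real spreadsheet layout may differ.

```python
from collections import Counter


def summarise_units(rows, unit_index=2):
    """Count the units in one pass and return (default_unit, counts).

    `rows` can be any iterable of sequences (a list, or a csv.reader),
    so the file never has to be held in memory as a whole.
    """
    counts = Counter()
    for row in rows:
        unit = row[unit_index].strip()
        if unit:  # skip rows without a unit
            counts[unit] += 1
    # The most frequent unit becomes the suggested default.
    default_unit = counts.most_common(1)[0][0] if counts else None
    return default_unit, dict(counts)
```

For spreadsheets with hundreds of thousands of rows, the same function could be run as part of the background processing rather than live, with only the resulting counts shown to the user.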


I understand. This sounds all good to me.


Hi Paul, in which URL/view should I start adding this feature? Or is this going to be a new upload file view?


Hey Joe, the page where you should add the setting for which unit should be used is staf/dataset-editor/chart.html. The relevant view is in staf/views.py: chart_editor. Once the unit of choice has been recorded, head over to library/views.py and look for the data_json view. This is the view where you need to change the query so that it converts the data into a single unit.

Carolin has the final say in how she wants this implemented, but from what I understood from the discussion this is NOT going to be done at the point of uploading data but only at the point of visualizing data. So the original instructions are not what needs to be done; be sure to read the comments in the thread. Check with Carolin if in doubt.
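
To illustrate converting at the point of extraction rather than at upload, here is a minimal, self-contained sketch using SQLite. It is not the actual data_json query: the `unit` and `data` tables, their columns, and the function names are all hypothetical, and the real project uses its own models.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical unit and data tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE unit (
            id INTEGER PRIMARY KEY, symbol TEXT, unit_type TEXT, factor REAL
        );
        CREATE TABLE data (
            id INTEGER PRIMARY KEY, material TEXT, value REAL,
            unit_id INTEGER REFERENCES unit(id)
        );
        INSERT INTO unit VALUES (1, 't', 'weight', 1.0), (2, 'kg', 'weight', 0.001);
        INSERT INTO data VALUES (1, 'material A', 3.0, 1), (2, 'material B', 500.0, 2);
    """)
    return conn


def fetch_in_unit(conn, target_symbol):
    """Read all rows with values converted to the target unit.

    The stored values are left untouched; the conversion happens in the
    SELECT, using each row's own factor and the target unit's factor.
    """
    query = """
        SELECT d.material, d.value * u.factor / t.factor AS value
        FROM data AS d
        JOIN unit AS u ON u.id = d.unit_id
        JOIN unit AS t ON t.symbol = ? AND t.unit_type = u.unit_type
        ORDER BY d.id
    """
    return conn.execute(query, (target_symbol,)).fetchall()
```

Calling `fetch_in_unit(build_demo_db(), "kg")` and `fetch_in_unit(build_demo_db(), "t")` returns the same records in different units, which is the flexibility argued for in the thread: each chart can choose its own unit while the database keeps the original values.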


Hi @paul, I successfully set up the project on my machine, but I ran into these issues:
1. project not found on http://localhost:8000 => I created one and it works
2. http://localhost:8000/staf/datasets/1/editor/chart/: screenshot attached.
3. other urls have issues with other project ids too.

Please export your database, so I don't need to populate data manually as this is really time-consuming and not needed. Lets get in a call if you have time mate, my phone number is +84 967458354 and my skype is zozovp.


I got it now.


I am able to open the link http://localhost:8000/staf/datasets/36799/editor/chart/ now.


Hi Joe, as I understand it you got the database loaded successfully? All good?
Please do let me know what the problem was so we can look into fixing that.


  1. add unit dropdown
  2. show the unit name in the chart in the library
  3. able to generate preview.
  4. able to save settings

github branch

I will create a pull request later on.

Last edited: 2022-09-13 12:32:22.887283+00:00