Restructure Data Layer Tags

Created on Wednesday 18 November 2020, 09:25

  • ID
    513434
  • Project
    Metabolism of Cities Data Hub
  • Status
    In Progress
  • Priority
    Medium
  • Type
    Programming work
  • Tags
    Data Hub Priority Plan 2021, General data hub improvements, MFA integration
  • Assigned to
    Paul Hoekman
  • Subscribers
    Aristide Athanassiadis
    Carolin Bellstedt
    Paul Hoekman

Description

Based on the experience of the first data courses, and the technical needs arising in the second set of courses, it makes sense to restructure the library tags. Key changes that I would like to apply:

  • No duplication (no AGRICULTURE under both flows and stocks)
  • Allowing people to turn tags on or off for their city, thus allowing the creation of a true completion percentage (well, at least better)
  • Creation of sub-tags for certain infrastructure so that we can programmatically differentiate between AIRPORTS and CYCLE LANES, which would currently both fall under TRANSPORT INFRASTRUCTURE.
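
To make the second and third points concrete, here is a minimal sketch of how a per-city completion percentage could be computed once tags can be switched off. It is not taken from the data hub code: the `Tag` and `CityProfile` classes and the `completion_percentage` function are hypothetical names, and the sketch assumes a tag counts as complete as soon as it has at least one document.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A data layer tag; `parent` is None for a top-level tag (hypothetical model)."""
    name: str
    parent: "Tag | None" = None


@dataclass
class CityProfile:
    """Hypothetical per-city settings: tags switched off, and tags that have data."""
    disabled: set[str] = field(default_factory=set)
    with_documents: set[str] = field(default_factory=set)


def completion_percentage(tags: list[Tag], city: CityProfile) -> float:
    """Share of the city's enabled tags that have at least one document."""
    enabled = [t.name for t in tags if t.name not in city.disabled]
    if not enabled:
        return 0.0
    done = sum(1 for name in enabled if name in city.with_documents)
    return 100.0 * done / len(enabled)


# Sub-tags let AIRPORTS and CYCLE LANES be told apart under the same parent.
transport = Tag("TRANSPORT INFRASTRUCTURE")
tags = [
    Tag("AIRPORTS", parent=transport),
    Tag("CYCLE LANES", parent=transport),
    Tag("AGRICULTURE"),
]

# A city without an airport switches that tag off, so it no longer counts against it.
city = CityProfile(disabled={"AIRPORTS"}, with_documents={"CYCLE LANES"})
percentage = completion_percentage(tags, city)
```

With AIRPORTS switched off, only CYCLE LANES and AGRICULTURE are counted, so one document out of two enabled tags gives 50%. Without the toggle the same city would be measured against three tags.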

Discussion and updates


New task was created


Task was assigned to Paul Hoekman


Status change: Open → In Progress


Hi Aris and Carolin, I know that you have some specific needs with CityLoops. How about I first outline a new structure here, and you can then decide if that kind of structure may possibly work for you. If it does, great, and we'll roll it out site-wide. If not, then I'll set up a system in which the CL site can have its own tagging structure. How does that sound?


Sounds fantastic! Thanks for taking this on.
Not sure if you remember, but I had already done some work around tags in this task (https://metabolismofcities.org/hub/work/18990/), but perhaps that is already superseded?


There are also some "can be deleted" or something parts in some MFA method tags (in case that is part of this operation) - if you could leave those alone that would be good, because they haven't been retagged properly yet - a goal to be initiated by the future "library TF leader / captain".


Ahhhh wait, sorry for the confusion, my idea is to focus primarily on the "data layer tags" in this task. Of course the other tags should also be cleaned up (I did a bit of that recently btw), but that's indeed best left within the scope of that other task. Let me rephrase the title of this one to be clearer....


Ah, haha, I see. Will you make the draft in a doc first or change it right away on the site? I think that there were also some notes in our handbook doc and Aris who has been focusing on the CL layers might also have some general insights.


I'll make separate notes first, so that we can discuss it a bit, and then depending on your CL requirements we can see if this can be integrated as a site-wide solution or not. So I'll keep you posted - will check the notes in the previous docs as well when I start working on this.


Hi Aris,


Okay so I've had a decent think about all this. Trying to see about the merger of the CL layers with the future layers I envision for the system as a whole. Below are my thoughts on things:


Points to clarify the CL structure/approach



  1. What exactly is the difference between Layer 2 & Layer 3? They look very similar and in fact have the same header (Life cycle / Value chain stages). The only difference seems to be that Layer 3 is more detailed...?

  2. At some point you guys showed me some document/notes around the approach to these "sectors", and I recall seeing a list of individual materials in each sector. However, in this particular document the two individual sectors are again the main header. Is this material-based approach still relevant?

  3. If it is (from some BO comments I get the idea that indeed this is the case), and I look at all the layers, am I to understand that the method basically consists of undertaking a number of city-level SFAs? After all it seems you want to track individual materials throughout their entire life cycle in the entire city.


Potential difficulties


In an ideal world, we would have CL use the same "master list" as the MOC data hub. This would make things fully compatible, it would be clearer to visitors, and it would technically speaking be cleaner. And you can take advantage of whatever I build for MOC. So I am trying to explore options to have one single structure used by both (all) subsites. Looking at your list, and comparing it to my attempts to structure a general master list, these are the challenges I see:



  1. I don't want to embed NUTS in the master list given that they are Euro-centric. Furthermore, I would like a CITY masterlist to only include data on that same city level (not on any higher level). There could somehow be LINKS to this higher level and we could make those connections more obvious and at a later stage pull in the "higher level" data, but they would not be part of the city-specific tagging catalog.

  2. My current thinking is that the new layer structure will be very much sector-based. These sectors will include primary/secondary/tertiary sectors, with a separate category for "foundational systems" which describe the transport system, waste system, energy system, etc. In each of these categories, flows, stocks and infrastructure will be uploaded (this will no longer be divided over several different main layers). I am happy to ALSO have a materials-based layering system, which can be an alternate structure for those cities/projects/people that want to take a materials approach. As I mentioned before, I see your approach more as a bunch of SFAs, and this would thus make more sense. But the question is whether you see this potentially working for your situation.

  3. I can already see that the CL structure deviates quite a bit from the general structure. I think you should prepare for getting some help in structuring this according to your needs. Two options to consider: a) focusing on cosmetic changes so that it LOOKS the way you want it to, which is likely to be mostly a set of front-end tweaks, but in the background it's going to be messy and incompatible with the main MOC site (which may or may not be a problem for you), b) focusing on developing a system that is structured in alignment with the main MOC structure, which means data uploaded will appear on both sites and there will be minimal duplication of work/data. However, you need advanced back-end support for this. If you want to pursue this, please make sure you prepare for it asap. I am not able to commit to this myself, but I may be able to help create generic instruction videos on database structure that may allow others to do this work. But it's going to take time to do this well. I'd need to know soon what your timelines look like to see if this is even feasible.


OK let me know what you think and we'll take it from there!


Hey Paul, just subscribed.


Let me try to respond to some bits.


Points to clarify



  1. Layer 2 are economic activities and Layer 3 are the associated flows/stocks. In a sense, Layer 2 are the nodes of a Sankey diagram whereas Layer 3 are the flows of the Sankey diagram. Layer 2 and 3 are per material and can be added up to provide sector totals. In Layer 2 you would normally find info on infrastructures (for that sector)

  2. See above (it is still relevant)

  3. Indeed although technically it is not substances but materials.


Potential difficulties




  1. Sure, the name could be changed or adapted in the instructions. However, I think it is very helpful to have higher-level boundaries and/or data to downscale data and ideally make the link within each dashboard. This was already somehow in OMAT and I think it provides a great contextual overview. We can discuss this further of course.




  2. Which new layer structure? The one we propose within CL or the Master one for the Data Hub? For the foundational systems I would say foundational services, a bit like Baccini and Brunner (to reside, to clean, to transport, etc.)? I think there is this famous link to be made between economic activity (or sector)/material/service. So I'm not sure what the proposal is here. Should we develop this link already? I guess we need to plan this further?




  3. Hmm, yes you are right. I think this is a larger question for us as well. I think that each classification of layers fits one purpose and there is no one size fits all unless we force it. In any case, we always have to choose between the comprehensive side (covers all/one size fits all) OR the accuracy side (or relevancy side). I think the best way forward is to have different layer lists and have correspondence tables. What do you think?




Great stuff Aris, thanks! Hereby my replies:


Points to clarify



  1. Hmmm okay. I'm not yet "getting" it completely. Layer 3 I can envision. But can you show me some sample documents that you would expect people to upload into layer 2?

  2. Okay but if the multiple materials are still relevant, do you plan to differentiate at all within your layering structure? In other words, if someone were to upload a single document concerning a single material into a certain layer, would the system mark it as DONE?

  3. OK noted.


Potential difficulties



  1. Yeah I do also see the benefits of having this info, it's mostly just a matter of how this would be uploaded and presented to the user. My preference: NUTS 3 data (for example) would be uploaded into the "NUTS 3" reference space, not into the city reference space. Then, the city is linked to the NUTS 3 and you can still use/link the data in future (analysis) steps, but we don't mix them when presenting a "progress" matrix, so as not to mix these different levels. If you really want you can do it, but the more mixing you want / the more custom "progress page" views you want, either the more intricate your back-end code is going to be, or the more messy your document repository is going to be.

  2. Ahh you likely haven't seen this. Ever since I started this task, I have been fiddling with a new overall structure. You can find a work-in-progress document here. That is the structure I am referring to, which I envision would, once done, replace the current main MOC data hub layer system, which I have found lacking in some respects (further explained in that document itself). So if possible it would be great to have CL use this very same layering system. But as I said, I fear there is too much deviation, which in all likelihood means it will not work out.

  3. Yeah we can do different lists, it seems that it will be the way to go. Correspondence tables could work. However do note that you'll need to get some back-end programming capacity for that. It won't be a "fun project" that I'll dive into so do take that into account. It also means data uploaded so far for e.g. Porto and Sevilla won't be available until correspondence tables are developed and appropriate code is written.
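
To give a rough idea of what the correspondence-table route involves, here is a minimal sketch. The layer and tag names are invented placeholders rather than entries from either catalog, and `CORRESPONDENCE`, `to_master_tags` and `documents_by_master_tag` are hypothetical names, not existing code.

```python
# Hypothetical correspondence table: CityLoops layer name -> master list tag names.
# None of these mappings come from the actual catalogs; they only show the shape.
CORRESPONDENCE = {
    "CL 3.1 Example flow layer": ["Example sector A"],
    "CL 3.2 Example stock layer": ["Example sector A", "Example sector B"],
}


def to_master_tags(cl_layer: str) -> list[str]:
    """Translate a CL layer into master tags; unmapped layers yield an empty list."""
    return CORRESPONDENCE.get(cl_layer, [])


def documents_by_master_tag(documents: dict[str, str]) -> dict[str, list[str]]:
    """Group document titles (title -> CL layer) under the corresponding master tags."""
    grouped: dict[str, list[str]] = {}
    for title, cl_layer in documents.items():
        for tag in to_master_tags(cl_layer):
            grouped.setdefault(tag, []).append(title)
    return grouped


docs = {
    "Doc 1": "CL 3.1 Example flow layer",
    "Doc 2": "CL 3.2 Example stock layer",
    "Doc 3": "CL 9.9 Unmapped layer",
}
grouped = documents_by_master_tag(docs)
```

In this sketch "Doc 3" does not show up under any master tag, because its layer has no entry in the table yet. That is the same effect as described above: data stays unavailable on the other site until the table covers its layer.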




Okay in order to get going with the implementation for you, given that we lean towards separate tables, can I ask you to please structure these tables in the tagging tree? You can manage this here. Please go to CITYLOOPS > DATA LAYERS and add all the data layers there. You can still change them later, but if at least the baseline is there it will help me build the framework around it. Please only add new layers and sublayers (so 2 levels deep, e.g. "Layer 1 Context", and "1.1 Context details abc"), if you need a "fake" third level just use the numbering in front but add them on that same second level (so "1.1.1 Context sub details", whose parent layer is "Layer 1 Context").


Feel free to try out with a few and run it past me for confirmation if in doubt. Also please note that when you add new layers (click the + sign next to the layer it should go under), the only field you have to fill out is the name field, the other fields you don't have to worry about.
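
As an illustration of the "fake" third level, here is a minimal sketch of how a front end could recover the visual nesting from the numbering alone while everything stays two levels deep in storage. The `stored` dictionary, `display_depth` and `render` are hypothetical names and do not reflect how the site actually renders the tree.

```python
import re

# Sublayers as stored: every entry sits directly under its top-level layer,
# even when the numbering suggests a third level.
stored = {
    "Layer 1 Context": [
        "1.1 Context details abc",
        "1.1.1 Context sub details",
    ],
}

# Matches a leading dotted number such as "1.1" or "1.1.1" followed by whitespace.
NUMBER_PREFIX = re.compile(r"^(\d+(?:\.\d+)*)\s+")


def display_depth(name: str) -> int:
    """Infer the visual nesting depth from the numeric prefix (1.1 -> 2, 1.1.1 -> 3)."""
    match = NUMBER_PREFIX.match(name)
    if match is None:
        return 1  # no numeric prefix: treat as a top-level layer
    return len(match.group(1).split("."))


def render(tree: dict[str, list[str]]) -> list[str]:
    """Return indented lines, one per layer, using the inferred depth."""
    lines = []
    for layer, sublayers in tree.items():
        lines.append(layer)
        for name in sublayers:
            lines.append("  " * (display_depth(name) - 1) + name)
    return lines


rendered = render(stored)
```

Here "1.1.1 Context sub details" is indented one step further than "1.1 Context details abc", even though both have "Layer 1 Context" as their parent layer.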


Hey Aris,


I had to move ahead with this in order to get this done on time. I have now loaded your catalog into the system. Layer 1/2/part of 3 are loaded. For layer 3 it's up to row 21 in the spreadsheet (you should continue with 3.3). Note that your numbering seems to have some repetition but I don't know if that was intentional or not so I left it. You can change the exact phrasing and numbers yourself - this won't impact the system.


I am now moving ahead with this and will start to implement this catalog in the CL website so that you can have this ready by the time you need to show it to the partners. I guess it would still be useful for me to understand what Layer 2 is exactly, but at this point we're going the route of having multiple different uploading and data management systems, so in the end you'll be able to manage this as you see fit and it won't impact the rest of the site, so it's not that urgent anymore.


Heh, I was just about to reply to you.


I gave all this a thought and I have to admit that it is a bit unfortunate to go the route of separate layers systems. I didn't remember that you had worked on the new relayering scheme :)


Especially if we can't manage this, I find it hard to convince other people to stick to our classifications.
The thing is that we're a bit stuck in terms of time and I would like to have Carolin on board for this. Ideally I would really want to converge and unify all our efforts (and eventually make some layers optional depending on whether we want to be circular or whatever else).
For layer 2, could you perhaps read from p.52 to p.55 of our 4.3 deliverable? Perhaps that would make things clearer?
Do you still want me to add the layers as instructed?


Hey Aris,


Yeah I'd also like to unify things but if it should be done by Monday then this won't be feasible. OK I'll read those pages in the deliverable. But as I said for now I'm planning to have a separate layering system for CL as it is significantly different from where we're headed with the data hub.


I have already added the data layers, so the only ones pending are layer 3.3 onwards - please add those.


But please confirm whether or not I should continue with this separate system. It's either a separate system now, or potentially an integrated system by the end of Jan...


Hey Paul,
Should we aim to have a call tomorrow (morning) to think about that together before splitting ways?


I'm in between lots of things, bit difficult to pin down a time. Perhaps in the afternoon? 2.30 PM my time / 1.30 PM your time?? Not 100% guaranteed I can make it but I can try, let me know if that works.


Hey, that time will be difficult as I'll be out to enjoy the last hours of daylight. I can be available after 4.30PM my time if that suits you?


OK let's do 4.30 PM your time then, but I gotta see then about planning as there's really only one day left then... maybe I will have to start a bit now, let me see.


Ok cool see you in a bit on uber


Hey Paul, I finished adding the layers and sublayers. Let me know if that is ok.
Thanks!


Great Aris, thanks, that looks good!


Hi all,


The new data tag system has now been implemented. Please see this video for more details.


Guus, to define the different types of documents shown for each category, see hub_harvesting_tag in staf/views.py (line 1300, thereabouts).


Very cool, thanks a bunch Paul.
I hope you enjoy your holidays and we'll catch up when you're back!
cheers


Just to pick this up again, I would like to finalize the general restructuring of the tags. We have done the CL restructuring, but there is still a need to do a restructuring for the main data hub. Now that recording of the African Cities MOOC is starting, we should settle on this format. The Grand Relayering document has an overview of my proposal, but I am open to input from your side. Does anyone have any comments/suggestions/thoughts on this?