Stocks and Flows Database Schema (STAFdbs)

Description

Background

For many years, the open source urban metabolism web platform developed by Metabolism of Cities (available at https://www.metabolismofcities.org) has been storing and sharing urban metabolism data in order to better understand the metabolism of urban systems. Over the years, the way data is uploaded and used has changed through a number of iterations to cater to different purposes and users. This section details the rationale behind these iterations.

The first step, initiated in 2014, was an online tool to administer a material flow analysis (MFA), called OMAT (Online Material Flow Analysis Tool), which allowed users to record and manage material flow data for their own project(s). The system generated (and still generates) tables, indicators, and charts based on the entered data. OMAT can be used for an economy-wide MFA or for an MFA of a specific sector. It has been used to allow students to jointly contribute data to the same MFA project (Villalba and Hoekman 2018), and it was one of the first web-based, open source tools to manage MFA datasets. Among the similar tools that existed at the time was the offline software STAN, which was an inspiration for OMAT.

In 2017, the Global Urban Metabolism Database (GUMDB) was set up as an initial experiment to centralise data points and indicators obtained from academic work (Hoekman et al. 2019). Both GUMDB and OMAT have export functions that enable users to download data (either the entire project or a specific part of it) in CSV format.

After running these two projects for a number of years, Metabolism of Cities started working on a new system to capture material stocks and flows data from a greater variety of sources and with a larger degree of heterogeneity. This project, dubbed MultipliCity, was set up to allow for much more fine-grained data capture and data visualisation. MultipliCity makes it fairly easy for users to upload data, and it is built around the idea of crowdsourcing the collection and curation of urban stocks and flows data. Data can be recorded on a city-wide scale, but also on a suburb or neighbourhood level, and can even be linked to individual infrastructure (e.g. a train station or wastewater treatment plant). Uploaded datasets are stored in a single database, and data can be aggregated or disaggregated according to user needs.
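As a minimal sketch of the aggregation described above, the following Python snippet rolls quantities recorded at different spatial levels up to their parent areas. The place names, the parent mapping, and the quantities are invented for illustration and do not come from MultipliCity; the actual system stores and aggregates its data in its own database.

```python
from collections import defaultdict

# Hypothetical spatial hierarchy: each space maps to its parent
# (None marks the top level). Not taken from MultipliCity.
PARENT = {
    "City": None,
    "Suburb A": "City",
    "Suburb B": "City",
    "Treatment plant": "Suburb A",
}

# Hypothetical records: (space where the flow was recorded, tonnes).
RECORDS = [
    ("Suburb A", 40.0),
    ("Suburb B", 25.0),
    ("Treatment plant", 10.0),
]


def aggregate(records, parent):
    """Add each record to its own space and to every ancestor space."""
    totals = defaultdict(float)
    for space, quantity in records:
        node = space
        while node is not None:
            totals[node] += quantity
            node = parent[node]
    return dict(totals)


totals = aggregate(RECORDS, PARENT)
for space, quantity in totals.items():
    print(f"{space}: {quantity} t")
```

Because every record keeps a reference to the most specific space it was measured in, the same data can be reported at the infrastructure, suburb, or city level without storing separate copies.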

OMAT and GUMDB used two different MySQL database schemas, each made specifically for its associated application. MultipliCity, however, was set up with more widespread use in mind. This system was built on the Unified Materials Information System (UMIS). UMIS was developed at Yale University (Myers et al. 2019), and was put to use in a database subsequently created to store material flows data obtained from decades of material systems research at Yale. This database, called the Yale Stocks and Flows Database (YSTAFDB), was one of the first functional databases in which a theoretical framework (like UMIS) was applied to a real-life scenario. This also meant that a database schema had to be developed alongside the theoretical framework. Both the YSTAFDB database schema and the data points are published as open source works (Myers, Reck, and Graedel 2019).

Other material stocks and flows research groups have also developed databases or worked on consolidating the often incompatible formats. The industrial ecology data commons project (Pauliuk et al. 2019) provides a prototype database structure that aims to integrate other databases developed within a variety of disciplines. Other interesting work includes a database containing data on material intensity for buildings (Heeren and Fishman 2019), and a general system structure for socioeconomic metabolism information (Pauliuk, Majeau‐Bettez, and Müller 2015).

STAFdbs

YSTAFDB provided the most suitable starting point for the MultipliCity system. It was one of the most applied database structures (rather than a more theoretical framework), and its goals were well-aligned with Metabolism of Cities’ data storage goals. However, from a technical perspective this database structure lacked some features. A principal shortcoming was the lack of database normalisation, which may result in data redundancy and a lack of data integrity. When the same data is stored in multiple places, a change made in one place leads to a discrepancy if the same data point is not also changed in the other places, and this becomes more likely if the database is not normalised. The initial implementation of an adjusted YSTAFDB in MultipliCity, called the Stocks and Flows Database Schema (STAFdbs), primarily consisted of applying database normalisation practices to the existing structure. It was in this form that it was implemented within the Metabolism of Cities website.
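The integrity problem described above can be illustrated with a small, self-contained sketch using Python's built-in sqlite3 module. The table and column names (flow_flat, material, flow, quantity_t) are hypothetical and chosen for illustration only; they are not the YSTAFDB or STAFdbs schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Unnormalised (hypothetical) table: the material name is repeated per flow.
conn.execute(
    "CREATE TABLE flow_flat ("
    "id INTEGER PRIMARY KEY, material_name TEXT, quantity_t REAL)"
)
conn.executemany(
    "INSERT INTO flow_flat (material_name, quantity_t) VALUES (?, ?)",
    [("Aluminum", 120.0), ("Aluminum", 80.0)],
)
# Correcting the spelling in one row only leaves two names for one material.
conn.execute("UPDATE flow_flat SET material_name = 'Aluminium' WHERE id = 1")
flat_names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT material_name FROM flow_flat ORDER BY material_name"
    )
]

# Normalised (hypothetical) tables: the name is stored once and referenced.
conn.execute(
    "CREATE TABLE material (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE flow ("
    "id INTEGER PRIMARY KEY, "
    "material_id INTEGER NOT NULL REFERENCES material(id), "
    "quantity_t REAL)"
)
conn.execute("INSERT INTO material (id, name) VALUES (1, 'Aluminum')")
conn.executemany(
    "INSERT INTO flow (material_id, quantity_t) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0)],
)
# A single update now changes the name for every flow that references it.
conn.execute("UPDATE material SET name = 'Aluminium' WHERE id = 1")
normalised_names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT m.name FROM flow f JOIN material m ON m.id = f.material_id"
    )
]

print(flat_names)        # two spellings for the same material
print(normalised_names)  # one spelling
```

In the flat table, the partial update leaves two spellings of the same material; in the normalised version, the name lives in one row, so all flows that reference it stay consistent.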

As of August 2020, STAFdbs has gone through various iterations and further refinement, and the latest version has been adopted in the new Metabolism of Cities Data Hub. Work is underway to write up the database considerations and specifications in an academic publication.

The main structure, which has been adjusted within the scope of the CityLoops project, has been reported on in Deliverable 4.2 for this project (Development of an Urban Material Flow and Stock Database Structure).

Team members

The following people from Metabolism of Cities are involved in this project.

Paul Hoekman

Rupert J. Myers