Clearinghouse for the environment – the scaffolding (I)

In our work with development cooperation GIS, we have reached a point where we find it necessary to establish an overall publication system for environmental information. We will do this together with some of our partners. The system, a clearinghouse, should enable our partners to present project-related information to the general public.

A draft system was set up and documented in an earlier posting on this website. In the article Environmental Spatial Data Infrastructure – technology, I described the system and some of the challenges. In this article I take it a bit further, hoping to stimulate discussion about how such a system could be implemented.

This posting is about designing a clearinghouse intended predominantly for environmental data. It describes a work in progress. We are working on a requirements document, and this posting is meant to inform interested parties about the work. Input to our work is both welcome and necessary.

What is a clearinghouse?

Clearinghouses come in many shapes and colors. Some are gigantic structures slowly crumbling under their own weight, too complex and expensive to sustain over time. Others are featherweight systems that fail to do the job assigned to them. Others again are perfectly sized constructions which integrate well with other systems and are flexible and versatile. We are of course aiming to build the latter. There are no standards, and we're starting from scratch on this one.

The objective of the clearinghouse is to provide in-depth information about the state and development of the environment. It shall present environmental topics from projects in a simple and easy-to-follow way, providing access to more detailed scientific presentations where such have been supplied by the contact organizations.

The following concept figure places a clearinghouse in context. As a concept figure, it is not drawn with any particular country in mind.

Concept figure of a clearinghouse in an institutional and societal context.

Structure overview

Let's start with the main structure of our clearinghouse. Over the last three years, our partner has completed two versions of a sensitivity atlas for a relatively large area. The next project is to establish a monitoring plan for the same area.

The structure for the monitoring program looks more or less like this:

Monitoring project structure

We have noted that the structure is general enough to be used with other projects. It is our ambition that the clearinghouse should be able to shoulder more than one project. If it does, it will probably also be used more, and one would avoid a rather confusing family of specialized sub-systems on the hosting organization's web pages. It would be easier for developers, partners, administrators and users.

Clearinghouse with the capacity to support many projects.

Now, what if we could also use this system with other partners? And what if others could use it as well? That depends on how well we design the system. Can it be made general enough without rendering it useless for our purpose?

Our project context

We have taken great care to make the system as general as possible. Let's look at the structure with more, and different, projects. The figure below shows how the general structure (dark gray boxes) can be reflected in different projects.

Clearinghouse with several projects structured around projects, themes and data groups.

The figure above gives us an indication of how our data model fits several of the prospective projects it will serve.

Functionality

Within the overall structure we also have to consider which functionality the system should provide. The following is a roundup of the system's core functionality:

  • Structured/hierarchical fact pages according to the presented projects. The pages will have text, tables, graphics and maps.
  • The system facilitates input of data from stakeholders. The system administrator imports the data into the system.
  • The system holds both spatial and non-spatial data. In addition, there is descriptive data (metadata) giving users the information they need to understand the data provided.
  • The spatial data is provided to the user through maps. The maps consist of base layers (imagery to provide context) and the actual data. The actual data are “clickable” and will point to non-spatial data.
  • A standard library of geographical objects is used. This means that the user will only have to upload references to the geographical objects and associated values.
  • Non-spatial data are files and online texts. The online texts provide context for files stored in the system and let the user move through the system systematically, following the hierarchies.
  • An interactive request and comments box, which helps to collect feedback from stakeholders.
  • Map layers are available through a content management system, as well as through OGC standards like Web Map Service (WMS), Web Feature Service (WFS) and others.

It’s a tall order, but it can be done.
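As an illustration of the OGC access listed above, here is a minimal sketch that builds a WMS 1.1.1 GetMap request for one clearinghouse layer. The function name, server URL, workspace and layer name are hypothetical placeholders; only the request parameters themselves (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT, TRANSPARENT) follow the WMS 1.1.1 specification.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600,
                   srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap URL for a single layer.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    minx, miny, maxx, maxy = bbox
    if minx >= maxx or miny >= maxy:
        raise ValueError("bbox must be (minx, miny, maxx, maxy) with min < max")
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty string = the layer's default style
        "SRS": srs,
        "BBOX": f"{minx},{miny},{maxx},{maxy}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
        "TRANSPARENT": "TRUE",
    }
    # urlencode percent-encodes ':' ',' and '/' in the values
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name, for illustration only.
url = wms_getmap_url("https://maps.example.org/geoserver/wms",
                     "clearinghouse:district_values",
                     (30.0, -20.0, 36.0, -10.0))
print(url)
```

The same request works from a browser or any WMS client, which is what makes the layers reusable outside the clearinghouse itself.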

A challenge with this functionality is that the system will evidently need two parallel data-supporting structures: one for the spatial data and one for the non-spatial data.

We’re thinking of keeping the spatial data in a separate database. The data is formatted and made available as embeddable maps, or as tables giving an overview of the spatial data.

The non-spatial data provides a framework around the data. This means that we are using a content management system to establish a structure parallel to our custom-made database. From within the content management system we then call for the relevant tabular presentations and maps. It is not as neat as one could wish for, but drawing on the resources of a professional content management system is far better than building your own.

Our main work will therefore be building components which provide the content management system with “consumables”.
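One kind of “consumable” could be a small HTML fragment that the content management system embeds on a fact page. A minimal sketch, assuming the rows have already been read from the spatial database (they are hard-coded here) and that the page simply includes the returned markup; the function name, CSS class and columns are hypothetical.

```python
from html import escape


def render_table(columns, rows, caption=None):
    """Render rows as an HTML <table> fragment for embedding in a CMS page.

    Every header and cell is HTML-escaped, so contributed text cannot
    inject markup into the page.
    """
    parts = ['<table class="clearinghouse-table">']
    if caption:
        parts.append(f"  <caption>{escape(caption)}</caption>")
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    parts.append(f"  <tr>{header}</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        parts.append(f"  <tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical rows standing in for a query against the spatial database.
fragment = render_table(
    ["Reference", "District", "Value"],
    [("D001", "North <A>", 3), ("D002", "South & East", 5)],
    caption="Sensitivity index per district",
)
print(fragment)
```

Keeping the fragment free of page layout leaves styling to the CMS theme, so the same component can serve any project's fact pages.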

Spatial data

Including maps in a clearinghouse adds to the complexity. There are two ways of dealing with spatially related user data:

  • The user uploads the geographical objects with attributes
  • The user interacts with predefined geographical objects

Uploading geographical objects would mean uploading shapefiles. Unfortunately, uploading shapefiles without a good quality assurance process could lead to several problems. These are some of them:

  • Messed up or missing projections
  • Corrupted files
  • Inconsistent naming of files, objects and column names
  • Duplicate objects
  • Objects covering the same area but with different origins and quality
  • Questionable legal status on the geographical objects
  • It might be necessary to establish a user-role model within the admin module

We think the best thing in this case is to let one administrator handle the geographical objects. He or she should discuss with the users/partners what geographical objects are necessary and a proper process should then lead to the correct objects being imported.
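Part of the administrator's quality assurance can be automated. The sketch below inspects the file names in one shapefile upload and reports missing mandatory parts (.shp, .shx, .dbf), a missing .prj projection file and inconsistent base names. It is only a first filter under that narrow assumption: corrupted geometry, duplicates, overlapping objects and legal status still need a human, or tools such as GDAL/OGR and PostGIS. The function name is hypothetical.

```python
from pathlib import PurePath

REQUIRED = {".shp", ".shx", ".dbf"}   # mandatory parts of an ESRI shapefile
RECOMMENDED = {".prj"}                # projection definition, often missing


def check_shapefile_upload(filenames):
    """Return a list of human-readable problems found in one shapefile upload.

    Only file names are inspected; file contents are not opened.
    """
    problems = []
    paths = [PurePath(name) for name in filenames]
    stems = {p.stem for p in paths}
    suffixes = {p.suffix.lower() for p in paths}

    if len(stems) > 1:
        problems.append(f"inconsistent base names: {sorted(stems)}")
    for ext in sorted(REQUIRED - suffixes):
        problems.append(f"missing mandatory file: {ext}")
    for ext in sorted(RECOMMENDED - suffixes):
        problems.append(f"missing projection file: {ext}")
    return problems


# Mixed-case base names and no .prj: both are reported.
print(check_shapefile_upload(["districts.shp", "districts.shx", "Districts.dbf"]))
```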

The geographical objects in the system could be points, lines, polygons and multi-polygons. Basically whatever you can throw into a PostGIS table and attach a value to. Examples could be waterbodies, districts, rivers, measuring points or even Quarter Degree Grid Cells. The constraint is that the table of geographical objects must have a unique reference known to all potential providers of tabular data to be connected with the geographical objects.
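In PostGIS the uniqueness requirement maps naturally onto a UNIQUE or PRIMARY KEY constraint on the reference column. The same rule is worth checking before an import, so contributors get readable error messages. A minimal sketch, assuming references are plain strings and uploaded rows are (reference, value) pairs; the function names and the example references are hypothetical.

```python
from collections import Counter


def find_duplicates(references):
    """Return references occurring more than once (they cannot be unique keys)."""
    return sorted(ref for ref, n in Counter(references).items() if n > 1)


def validate_values(known_refs, rows):
    """Check uploaded (reference, value) rows against the geographical objects.

    Returns (accepted, errors): accepted maps reference -> float value;
    errors describes rows that cannot be linked to exactly one object.
    """
    known = set(known_refs)
    accepted, errors = {}, []
    for ref, raw_value in rows:
        if ref not in known:
            errors.append(f"{ref}: unknown geographical reference")
        elif ref in accepted:
            errors.append(f"{ref}: value supplied more than once")
        else:
            try:
                accepted[ref] = float(raw_value)
            except (TypeError, ValueError):
                errors.append(f"{ref}: value {raw_value!r} is not a number")
    return accepted, errors


# Hypothetical district references and an uploaded value list.
districts = ["D001", "D002", "D003"]
upload = [("D001", "3"), ("D002", 5), ("D009", 1), ("D001", 4), ("D003", "n/a")]
print(find_duplicates(districts))
print(validate_values(districts, upload))
```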

How will this pan out? We start by keeping quality-assured geographical objects in the database. Next, we let users provide files with values and references to the geographical objects. Excel files would probably be easiest. Using SQL views within GeoServer will make it easy for us to pull out the correct maps, and SQL view parameters could let us limit the number of layer definitions in GeoServer.

In the end this will give us a map pretty much like this:

Districts drawn according to a value assigned to them. The data are random numbers from 0 to 6.
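The SQL view parameters mentioned above can be sketched as follows. In GeoServer, an SQL view may contain placeholders written as %name%, each with a default value and a validation regular expression, and the values are supplied at request time through the `viewparams` vendor parameter (`name:value` pairs separated by semicolons, with commas and semicolons inside values escaped by a backslash). The SQL, the table `tbl_value` and its columns, and the function names are hypothetical; the Python only builds and checks the parameter string on the client side.

```python
import re

# Hypothetical SQL view definition as it might be entered in GeoServer.
# %datagroup% is a view parameter replaced with the value from the request.
# tbl_value and all column names are made up for illustration.
SQL_VIEW = """
SELECT g.reference, g.geom, v.value
FROM tbl_geography g
JOIN tbl_value v ON v.geography_id = g.id
WHERE v.datagroup_id = %datagroup%
"""

# GeoServer's default validation expression for view parameters.
# Rejecting other characters keeps a parameter from injecting SQL.
DEFAULT_VALIDATOR = re.compile(r"^[\w\d\s]+$")


def escape_viewparam(value):
    """Backslash-escape commas and semicolons, which act as separators."""
    return str(value).replace(",", r"\,").replace(";", r"\;")


def build_viewparams(params, validators=None):
    """Encode {name: value} as a viewparams string of name:value pairs.

    Each value is checked against a per-parameter pattern, or the default
    one, mirroring the kind of check the server applies.
    """
    validators = validators or {}
    pairs = []
    for name, value in params.items():
        pattern = validators.get(name, DEFAULT_VALIDATOR)
        if not pattern.match(str(value)):
            raise ValueError(f"value {value!r} for {name!r} fails validation")
        pairs.append(f"{name}:{escape_viewparam(value)}")
    return ";".join(pairs)


# One layer definition can serve every data group; append as &viewparams=...
print(build_viewparams({"datagroup": 12}))   # datagroup:12
```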

The users need to know which geographical objects are available and which unique reference each of them answers to. The system should have a report engine able to provide them with a default list.
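The default list could be as simple as a CSV file, which opens directly in Excel. A minimal sketch, assuming the objects are available as (category, reference, name) tuples, for example from tbl_geography joined with tbl_cat_geography; the function name, column layout and sample rows are made up.

```python
import csv
import io


def reference_report(objects):
    """Return a CSV report of available geographical objects.

    `objects` is an iterable of (category, reference, name) tuples;
    rows are sorted by category, then reference.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "reference", "name"])
    for row in sorted(objects):
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical objects, for illustration only.
objects = [
    ("District", "D002", "South"),
    ("Waterbody", "W001", "Lake A"),
    ("District", "D001", "North"),
]
print(reference_report(objects))
```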

The picture is of course somewhat more complex. We also have to restrict the value sets available to the users, because we will most probably have to keep a limited number of styles available. Many styles would add to the complexity.

By restricting the data contributors, we are hopefully helping the users of the system get more readable maps.
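One way to restrict the value sets is to tie each standard style to the discrete values it has symbols for, and to reject contributions that fall outside. A minimal sketch; the domain names and style names are hypothetical, and the 0 to 6 classes simply mirror the example district map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    """A value set that one standard map style knows how to draw."""
    style: str          # name of the style in the map server (hypothetical)
    allowed: frozenset  # the discrete values the style has symbols for


# Hypothetical standard domains; the 0-6 classes match the district example.
DOMAINS = {
    "class_0_6": ValueDomain("classes_0_6", frozenset(range(7))),
    "presence": ValueDomain("presence_absence", frozenset({0, 1})),
}


def check_domain(domain_name, values):
    """Return the values that the chosen standard style cannot display."""
    domain = DOMAINS[domain_name]
    return sorted({v for v in values if v not in domain.allowed})


print(check_domain("class_0_6", [0, 3, 6, 7, 2.5]))
```

Keeping the domains in one small registry means adding a style is a deliberate decision rather than a side effect of whatever a contributor uploads.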

Styling the maps

Map styling is handled by keeping default styles in the database. We are considering three styling standards, and may have to add more depending on how many layers a single map will have.

Styling sketches. How should we handle the needs and wants of styling?

We might also be able to use the new transformation functions for styling documented in GeoServer 2.2, which might ease our workload somewhat.
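The transformation functions documented for GeoServer 2.2 include Interpolate, which can be used inside SLD. As a sketch of the idea, the code below generates an SLD Fill element whose color is interpolated between value stops, the kind of dynamic style one could keep alongside fixed standard styles. The function name, the attribute name `value` and the color ramp are assumptions; the output is a fragment for a PolygonSymbolizer, not a complete SLD document.

```python
import xml.etree.ElementTree as ET

SLD_NS = "http://www.opengis.net/sld"
OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("sld", SLD_NS)
ET.register_namespace("ogc", OGC_NS)


def interpolate_fill(property_name, stops):
    """Build an SLD <Fill> whose color is interpolated between (value, color) stops.

    The Interpolate function takes the lookup value, then alternating
    data/color literals, then the interpolation mode ('color').
    """
    values = [value for value, _ in stops]
    if len(stops) < 2 or values != sorted(values):
        raise ValueError("need at least two stops in increasing order")
    fill = ET.Element(f"{{{SLD_NS}}}Fill")
    param = ET.SubElement(fill, f"{{{SLD_NS}}}CssParameter", name="fill")
    func = ET.SubElement(param, f"{{{OGC_NS}}}Function", name="Interpolate")
    ET.SubElement(func, f"{{{OGC_NS}}}PropertyName").text = property_name
    for value, color in stops:
        ET.SubElement(func, f"{{{OGC_NS}}}Literal").text = str(value)
        ET.SubElement(func, f"{{{OGC_NS}}}Literal").text = color
    ET.SubElement(func, f"{{{OGC_NS}}}Literal").text = "color"
    return fill


# Hypothetical attribute name and color ramp for values 0-6.
fill = interpolate_fill("value", [(0, "#ffffcc"), (3, "#fd8d3c"), (6, "#800026")])
print(ET.tostring(fill, encoding="unicode"))
```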

Data model

A data model has been sketched using MS Access. The clearinghouse will however be running on top of a PostGIS database. The database will not integrate directly with the one supporting the content management system.

As one can see from the model the hierarchy is supported by the following tables:

  • tbl_h_project
  • tbl_h_theme
  • tbl_h_datagroup
The geographical objects will be in tbl_geography categorized by a reference to tbl_cat_geography.

Draft database model designed using MS Access. The final database will be safely set up in PostGIS.

We leave it to the interested reader to study the data model in more detail.
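To make the hierarchy more concrete, here is a minimal sketch of the project, theme and data group chain plus the categorized geography table. It uses Python's built-in sqlite3 as a stand-in for PostGIS, so there is no geometry column; only the table names come from the draft model, and every column name is an assumption.

```python
import sqlite3

# Table names follow the draft model; all column names are assumptions.
SCHEMA = """
CREATE TABLE tbl_h_project   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tbl_h_theme     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              project_id INTEGER NOT NULL REFERENCES tbl_h_project(id));
CREATE TABLE tbl_h_datagroup (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              theme_id INTEGER NOT NULL REFERENCES tbl_h_theme(id));
CREATE TABLE tbl_cat_geography (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- In PostGIS this table would also carry a geometry column.
CREATE TABLE tbl_geography   (id INTEGER PRIMARY KEY,
                              reference TEXT NOT NULL UNIQUE,
                              cat_id INTEGER NOT NULL REFERENCES tbl_cat_geography(id));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tbl_h_project VALUES (1, 'Monitoring program')")
conn.execute("INSERT INTO tbl_h_theme VALUES (1, 'Water quality', 1)")
conn.executemany("INSERT INTO tbl_h_datagroup VALUES (?, ?, ?)",
                 [(1, "Nutrients", 1), (2, "Turbidity", 1)])

# Walk the hierarchy: every data group with its theme and project.
rows = conn.execute("""
    SELECT p.name, t.name, d.name
    FROM tbl_h_datagroup d
    JOIN tbl_h_theme t   ON t.id = d.theme_id
    JOIN tbl_h_project p ON p.id = t.project_id
    ORDER BY d.id
""").fetchall()
for project, theme, datagroup in rows:
    print(f"{project} / {theme} / {datagroup}")
```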

Pulling it all together

As explained earlier, the system will rely on many modules, some custom-made and some standard systems. Luckily, many good standard systems are now available, many of them made by OpenGeo.

The following are the standard modules and tools which will be used:

Standard modules to be used in the clearinghouse.

The main effort for the clearinghouse project will lie in the custom made modules:

Custom made modules for the clearinghouse system.

The figure below indicates the relations between the different modules. The custom-made modules are in light gray and will have to be specified and developed.

Shows relation between data input/output (white boxes), custom modules (light gray) and standard modules (dark gray).

The custom modules will leave a lot of structural work to be done within WordPress. This means the administrator will have to maintain a structure for the non-spatial data that mirrors the one in the clearinghouse database. A properly designed lists module will be helpful.
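Keeping the WordPress page tree in step with the database hierarchy is the kind of chore a lists module could help with. A minimal sketch, assuming both structures can be exported as slash-separated paths such as project/theme/datagroup; how the paths are read from WordPress and from the database is left out, and the function names and example paths are made up.

```python
def compare_structures(database_paths, cms_paths):
    """Compare the clearinghouse hierarchy with the CMS page tree.

    Returns (missing_pages, orphan_pages): hierarchy nodes with no CMS page,
    and CMS pages that no longer correspond to a hierarchy node.
    """
    db, cms = set(database_paths), set(cms_paths)
    return sorted(db - cms), sorted(cms - db)


def expand(path):
    """List a path and all its ancestors, since each level needs its own page."""
    parts = path.strip("/").split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical hierarchy leaves and existing WordPress pages.
db_nodes = {p for leaf in ["monitoring/water/nutrients", "monitoring/water/turbidity"]
            for p in expand(leaf)}
cms_pages = {"monitoring", "monitoring/water", "monitoring/water/nutrients",
             "monitoring/air"}
missing, orphans = compare_structures(db_nodes, cms_pages)
print("missing:", missing)
print("orphans:", orphans)
```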

Challenges and questions

Challenges to a system like this are likely to be many. Here are some of them:

  • How do you handle the many styles necessary to keep several layers in one map?
    • The styling standard used in GeoServer (SLD) is flexible. We will probably set up some standard styles and also keep a couple of dynamic styles using the Interpolate function in SLD.
  • Why did we choose OpenGeoSuite over GeoNode?
    • GeoNode is an excellent tool for managing and presenting spatial data. But currently the project stops there. In a clearinghouse it is necessary to integrate many pieces of information, both spatial and non-spatial. Bluntly speaking, the data model and the other tools needed to integrate the information are too complex for GeoNode. That is not to say that GeoNode cannot be used to play around with the data we present. It can be, and it probably also will be. But at this stage not by us.
  • How do we deal with ownership?
    • Since we build on open source modules, it makes sense not to break with that model. In an initial phase, up until the first version is ready, we will work our way forward together with our developers. The custom-made modules will then be released into the wild on a suitable collaborative platform for open source coding.
  • What about metadata?
    • Metadata is an important part of web-based services. At one point we were contemplating including GeoNetwork in the system. For now we have to focus on preparing a basic, functional clearinghouse. GeoNetwork could be added later.

Concluding remarks

This article describes my initial thoughts on such a system. We are working on a specification of the system, and we do have funding for some development. We expect to have a draft specification document ready for distribution sometime in September, and we expect our developer to have a beta ready by the end of the year.

At this point we are looking for feedback on the above text. Are there issues we should cover better? Do similar systems exist? Where should we host the publicly available code?

2 thoughts on “Clearinghouse for the environment – the scaffolding (I)”

  1. ragnvald Post author

    Chris Nicholas gave some comments to this article on the SDI-Africa mailing list. This is my response to his comments:

    Mapping plugins
    Although Mapping Kit, by the looks of it, integrates a lot of functionality, it is not unique. Similar solutions exist within WordPress. The plugin does look a bit restrictive, and as you can see from the data model, more is needed than I suspect Mapping Kit will be able to deliver. It is also based on MapServer.

    Map server
    MapServer is a very good map server platform, but in my experience it lacks a good user interface. AsplanViak in Norway has made a web portal (web user interface) called Adaptive on top of MapServer. Unfortunately, it is closed source software. The Dutch company GeoCat is working on an ArcGIS plugin to support direct styling of MapServer (the .map files).
    GeoServer has an excellent user interface. Its integration with PostGIS through GeoNode or OpenGeoSuite shows that it is driven by a dedicated company with a sound and dedicated user base. For integration in a system like this it is of course crucial that its license allows redistribution and integration with other tools. I have earlier written a short text about GeoServer.
    ArcGIS Server is a versatile map server with strong commercial support. The business model relies on rather expensive licenses, which come on top of the costs of integrating it with other solutions. Redistributing solutions that integrate the server does not lend itself to try-before-you-buy. As I have also written elsewhere, the money trail problem is one of the challenges in development cooperation GIS.

    Hosting institution
    A national-level entity, as in this case, would prefer to host a solution like this itself if possible. The solution could be established on a physical machine and later kept running in a virtual environment, pretty much as I described in the article Environmental Spatial Data Infrastructure – technology.

    Languages
    WordPress, indicated as the content management system serving the content, gives access to several plugins which facilitate multiple languages. The data structure as indicated in the article does not. So information in maps and tables would be in one language. Adding such support is possible, but would complicate the data structure. With good context information in the WordPress pages this is bearable, albeit subject to discussion.

    User dialogue
    Handling user dialogue is not easy, not in Norway and not elsewhere. It requires an information strategy and also time to handle it, both with regard to technical issues and content. WordPress and other content management systems facilitate this in many ways: Twitter, forums, mailing lists, Facebook integration and so on. National/institutional policy for the users should be decisive on this issue.

    Mobile integrations
    It’s a very good idea! There should probably be a mini browser for the data content suitable for use on handheld devices. WordPress has plugins which make its content readily available on handhelds, but the maps and tables would need special care. Some of it might be handled through RSS and GeoRSS. Letting the user provide positional information as a parameter for retrieving relevant geographical information is also something to bring into the equation.

    Data lifecycle
    I have been around long enough to see my share of abandoned websites and clearinghouses. A good process sets aside funding to update and refine both the content and the structures around it. The best way to achieve this is to have long-term projects with dedicated partners. In other words, this is not a technical issue. Good projects must take into account changes in technology, methods for collecting and distributing data, politics, partner relations and a lot more. Technology is sometimes just a suitable answer at a fixed point in time. Good solutions are dynamic. And sometimes the best solution can even be to pull the plug on a system.
