In a recent project at work I did an analysis of the spread of an alien species in Norway using ESRI ArcGIS 10.1 SP1. In this particular analysis we assumed that the species could swim a certain number of meters in open sea. How would it spread, and to what extent would current protected areas be invaded by this overseas stranger to our environment? The density of islands in the Norwegian archipelago is massive, so the possibility for the alien species to spread is rather overwhelming.
As part of the analysis I ended up creating buffers around islands in the Norwegian archipelago, after which it was necessary to merge and dissolve the objects. This turned out to be problematic: for some of the shapefiles I was working with, ArcGIS (arcpy and Python) simply failed to complete the dissolve operation.
I contacted our local ESRI representative, Geodata AS in Norway, who concluded that this was related to the following error in ArcGIS 10.1: NIM079373: Running a large number of features through the Dissolve or Buffer with dissolve option, hangs during process. I have not found any publicly available information with this reference.
One could say that 7283 polygons is a tall order. One could perhaps also say that working with polygons in a task like this rather than with raster is asking for problems. Given enough time I will look into it – later – in that quiet week when nothing else is going on at work, sometime.
This blog post is about how I came to understand more about the limitations and possibilities of the ESRI arcpy Dissolve_management tool. It also explains how I found a rather surprising way to make it faster.
Buffering around n thousand islands in different regions and then dissolving them to one object works fine most of the time. But for two of the regions the Dissolve_management just stopped processing. The same thing happened if I tried doing the same thing from ArcGIS desktop.
Since I really had to make this work, I tried different ways to fix it. Googling gave me some answers, but my dissolve still hung. So I divided the input file into two halves and tried the dissolve operation again on each half. Both dissolve operations succeeded. I then merged the two resulting files and ran a dissolve operation on the merged file. It worked. So somewhere in that fuzzy code within ArcGIS (arcpy and desktop) there is a tripwire stopping the operation.
To handle this in general I wrote a function which splits the input file into smaller files, each holding a given number of objects. The objects in each of these files are dissolved. The resulting files are then merged into one file, which is dissolved in turn. And just to remind the reader: I was still using the arcpy.Dissolve_management function. The figure below explains the procedure:
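In code, the procedure could look roughly like the sketch below. This is not the script I actually used, just a minimal illustration under a few assumptions: the input is a shapefile whose FID values run from 0 without gaps, the names group_ranges, grouped_dissolve and scratch_dir are made up for this example, and the intermediate files are written as shapefiles to a scratch folder.

```python
import os


def group_ranges(total, group_size):
    """Yield half-open (start, stop) FID ranges that cover 0..total-1."""
    for start in range(0, total, group_size):
        yield start, min(start + group_size, total)


def grouped_dissolve(in_fc, out_fc, scratch_dir, group_size=80):
    """Dissolve in_fc group by group, then merge and dissolve the parts."""
    import arcpy  # imported here so group_ranges can be used without ArcGIS

    total = int(arcpy.GetCount_management(in_fc).getOutput(0))
    parts = []
    for i, (start, stop) in enumerate(group_ranges(total, group_size)):
        subset = os.path.join(scratch_dir, "subset_{0}.shp".format(i))
        part = os.path.join(scratch_dir, "part_{0}.shp".format(i))
        # Assumption: shapefile input, so FID starts at 0 and has no gaps.
        where = '"FID" >= {0} AND "FID" < {1}'.format(start, stop)
        arcpy.Select_analysis(in_fc, subset, where)  # copy one group out
        arcpy.Dissolve_management(subset, part)      # dissolve that group
        arcpy.Delete_management(subset)
        parts.append(part)

    # Merge the dissolved groups and dissolve once more to get one result.
    merged = os.path.join(scratch_dir, "merged.shp")
    arcpy.Merge_management(parts, merged)
    arcpy.Dissolve_management(merged, out_fc)
    for path in parts + [merged]:
        arcpy.Delete_management(path)
```

Only the ordinary Select, Dissolve, Merge and Delete tools are involved; the grouping itself is plain Python, which is why it is kept in a separate helper.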
Running the “new” dissolve function I noted that the time the whole process took varied. I initially expected the whole process to take longer than it would using the ordinary functionality.
It turned out that varying the group size had a rather big impact on the time the process took, and in a positive way. I got curious and added timers around all dissolve operations in the script. My expectations of lower performance were not met. The new procedure was faster. I also prepared a batch script doing the operation on the same input file with a group size varying from 10 to 600.
The result based on nordland_buffer.shp was interesting, and to make sure this was not only about that one file I did an additional analysis on a similar file (agder_buffer.shp). The figure below gives you the general idea of what happened:
So what’s at the bottom of the curve? It looked like there was a minimum time for the job when the objects were grouped at a certain size. I was right…
That’s it… the minimum time for these jobs is when the objects are in groups of around 80. The dissolve operations only take from 25 to 28 seconds for the tested files. The required padding (merge and delete operations) around the essential dissolve processes adds only marginally to the time used.
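The timing part can be sketched without any ArcGIS at all. The helper names below (time_call, sweep_group_sizes, fastest) are my own inventions for this illustration, and run stands for any callable that performs the grouped dissolve for a given group size. The step of 10 between group sizes in the usage note is an assumption too.

```python
from timeit import default_timer


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = default_timer()
    result = func(*args, **kwargs)
    return result, default_timer() - start


def sweep_group_sizes(run, sizes):
    """Call run(size) for every group size; return [(size, seconds), ...]."""
    timings = []
    for size in sizes:
        _, seconds = time_call(run, size)
        timings.append((size, seconds))
    return timings


def fastest(timings):
    """Return the (size, seconds) pair with the shortest time."""
    return min(timings, key=lambda pair: pair[1])
```

A batch run over group sizes from 10 to 600 would then be something like sweep_group_sizes(run, range(10, 601, 10)), with the resulting list written to a spreadsheet for plotting.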
How does this compare to using the dissolve function directly, without the above-mentioned grouping function? The Nordland dataset hangs, but fortunately the Agder dataset runs through. The total time used for dissolving the dataset is 149.5 seconds! That is roughly five to six times the 25 to 28 seconds the grouped dissolve operations took.
Could I be missing something here? Or is the dissolve function from ESRI a rather sub-optimal one? And could it be made more efficient and stable simply by splitting the input file into groups of around 80 objects?
There are a host of reasons which could confuse the above picture. Here are some of them:
- The number of overlapping polygons in the input files can make a big impact. So the issue with a sub-optimal dissolve function might not be relevant for an input file with fewer overlapping polygons. The example is extreme, with more than 50 overlapping polygons.
- The remainder of the total polygons divided by group size will vary. This might have implications on the total time to perform the dissolve operation.
- The files I have used might be very unorthodox.
- The computer was in use while the calculations were made. Other activity might have influenced the time used.
Since working through this problem in December last year I have happily concluded my alien species project, and as such this issue is no longer my concern. If someone finds the above of interest I would be curious to have some feedback. I am always eager to make arcpy scripts go faster. I am of course also interested in other approaches using, for example, open source libraries.
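The point about the remainder is easy to illustrate with a little arithmetic. The sketch below uses a made-up helper, group_layout, and takes the 7283 polygons mentioned earlier purely as an illustrative total.

```python
def group_layout(total, group_size):
    """Return (number of groups, size of the last group)."""
    if total == 0:
        return 0, 0
    full, remainder = divmod(total, group_size)
    if remainder == 0:
        return full, group_size
    return full + 1, remainder


for size in (10, 80, 100, 600):
    groups, last = group_layout(7283, size)
    print("group size {0}: {1} groups, last group holds {2}".format(
        size, groups, last))
```

With groups of 80 this gives 92 groups where the last one holds only 3 polygons, while groups of 100 give 73 groups with 83 polygons in the last one. So both the final group and the number of files going into the final merge change with the group size.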
If my assumptions hold water I suggest that the ESRI guys and girls sit down and remake their dissolve function. It is basically sound, but something is amiss. And when the original function hangs they should give the user some feedback about this. How difficult could it be to implement a failsafe?
To allow for further testing and experimenting I have included the files used in this little experiment:
|Agder buffer||7.5 MiB||91|
|Agder Group 100 Dissolve Result||116.9 KiB||9|
|Agder Original Dissolve Result||116.5 KiB||11|
|Dissolve Calculations (xlsx)||25.7 KiB||50|
|Nordland buffer||6.7 MiB||26|
|Testing Group Dissolve Pythonscript||941.0 B||23|
I would also like to point you to the following discussion on http://gis.stackexchange.com/:
Finally, I would like to thank my colleagues Johan Danielsen and Martin Bartnes for their contributions and help in the process of understanding the shortcomings of the dissolve functionality in ArcGIS.