Why ESRI (as is) can’t be part of the open government movement

open access — kpw @ February 3, 2012

This post has been floating around in email conversations with colleagues for the past month, but last week a semi-related Twitter discussion prompted me to finally document my thoughts in a public form. Those interested in that conversation should also read Andrew Hoppin’s excellent post on the CivicCommons blog. I’m a big fan of what CivicCommons is doing, but I think the situation with ESRI demonstrates some of the challenges in evaluating civic software options, as well as opportunities for further refinement of the CC Marketplace.

 

I recently made a frustrating–and failed–attempt to access a government data set encoded in ESRI’s proprietary and closed File Geodatabase format.

This led to an email exchange with Jack Dangermond at ESRI asking about the difference between ESRI’s stated desire to help open up government data and my experience where ESRI’s commitment to closed formats prevented me from accessing data–at least without first buying a copy of ArcGIS.

Jack wrote a lengthy and thoughtful response that can be summarized as follows:

 

1) ESRI’s closed formats shouldn’t prevent me from accessing the data–there are many options for extracting what I need without buying their software. While true in concept, given the particularities of the data I needed this turned out not to be the case. There was absolutely no way for me to read the data I needed without buying ESRI software.

Sure, the government could intervene and reformat their data to better serve me, but they were unable or unwilling. And from their perspective they had already met the spirit of the Open Government Directive by releasing a bulk download of what I needed. Unfortunately for me, that download was in ArcGIS’ default format, a format that can only be opened with ESRI tools.

 

2) ESRI isn’t interested in repeating the experience with Shapefile where any developer could read or write to the files. In Jack’s words, this created “lots of work for users ‘fixing’ these corrupt files because some software developers did not implement the specification correctly.”

That is pretty different from my experience where the open Shapefile spec ushered in an era of collaboration and led to a proliferation of new tools. Never once in my fifteen years in the industry have I encountered a corrupted file caused by a bad tool.

Jack presents this as a technical consideration. However, anyone familiar with geospatial software should also notice a strong business case for this perspective: ESRI doesn’t benefit from others participating in the software marketplace or building these “bad tools.”

Closed formats result in closed data. Despite the availability of interchange formats, no one is using them, and it’s not clear that they are a complete solution. In fact, I couldn’t find any open source tools that read or write ESRI’s XML-based interchange format, nor could I find anything on the web that had been released in that format. ESRI may think they’ve checked the data portability box but no one else seems to agree. The same is true for proprietary drivers for file access: ESRI showed up several years late with a binary driver solution that is still incomplete (it couldn’t open my data).

There’s simply no technical excuse for closed formats. If ESRI doesn’t want to maintain its own format spec it should use someone else’s: SQLite or many other formats like it could offer a great foundation for an open file-based spatial data store.
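To make the SQLite suggestion a little more concrete, here is a minimal sketch, assuming a hypothetical `streams` table that stores each geometry as WKT text alongside its bounding box. The table layout and the function names (`build_store`, `add_stream`, `streams_in_bbox`) are my own illustrations, not an existing spec and not anything ESRI ships. A real format would want a proper spatial index and a binary geometry encoding.

```python
import sqlite3


def build_store(path=":memory:"):
    """Create a SQLite file (or in-memory DB) with a hypothetical streams table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE streams ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " geom_wkt TEXT NOT NULL,"  # geometry as plain WKT text (an assumption)
        " min_x REAL, min_y REAL, max_x REAL, max_y REAL)"
    )
    return conn


def add_stream(conn, name, coords):
    """Store a line as WKT plus its bounding box; returns the WKT string."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    wkt = "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in coords) + ")"
    conn.execute(
        "INSERT INTO streams (name, geom_wkt, min_x, min_y, max_x, max_y)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (name, wkt, min(xs), min(ys), max(xs), max(ys)),
    )
    return wkt


def streams_in_bbox(conn, min_x, min_y, max_x, max_y):
    """Return (name, wkt) rows whose bounding box overlaps the query box."""
    rows = conn.execute(
        "SELECT name, geom_wkt FROM streams"
        " WHERE max_x >= ? AND min_x <= ? AND max_y >= ? AND min_y <= ?"
        " ORDER BY id",
        (min_x, max_x, min_y, max_y),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_store()
    add_stream(conn, "Example Creek", [(-122.7, 45.5), (-122.6, 45.6)])
    print(streams_in_bbox(conn, -123.0, 45.0, -122.0, 46.0))
```

The point is not this particular schema. Anyone with a stock SQLite library can open a file like this, and that is exactly the property the File Geodatabase lacks.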

 

3) ESRI is a private company and has every right to create proprietary formats.

True.

But I have a problem when ESRI also claims to be an enabler of open government, facilitating data sharing and collaboration. I have an even bigger problem when it takes taxpayer money to manage public data, and then requires taxpayers to buy ESRI software in order to access that data.

One of the fundamental tenets of the OGD is providing data in bulk, platform-independent formats. Instead of supporting this outcome, Jack explained his preferred approach: APIs exposed via ESRI software that allow users to access what they need from data stored within closed ESRI file formats.

This is a great strategy if you’re in the business of selling GIS software, but it’s a bad idea if you’re interested in sharing data. APIs aren’t the same as bulk access (bulk access is critical for many GIS applications). And APIs with EULAs != open data.

 

ESRI has chosen to vertically integrate their GIS platform and has no interest in allowing others in. This is a fundamental incompatibility with the current conception of open government. They have every right to follow this course, but there’s no reason why those of us who have invested years specifying the terms for dissemination of public data should celebrate their participation until they follow the guidelines we’ve set.

It doesn’t have to be this way. Their software is incredibly useful and in many cases has no counterpart, open or closed. ESRI could become a valuable partner in opening government data–if they embrace open formats and let others in.

 

For those actually interested in the gory details, here’s the TL;DR’er version of my recent encounter:

Several weeks back I needed to make a map for a big chunk of the Pacific Northwest. I leveraged all kinds of useful open data (OSM for streets, Lidar from local governments, etc.) but above all else I needed really good stream and river data. Lucky for me the USGS maintains a detailed data set that maps every stream and pond in the entire U.S., even the tiny intermittent ones!

I surfed over to the “Data” page, and a few minutes and several hundred megabytes later had the data I needed on my hard drive. Then I unzipped the files and encountered something unexpected: a constellation of files with new and strange file extensions.

I’ve been working with GIS tools and data in a professional capacity for going on fifteen years and I consider myself pretty savvy. However, over the last decade all of my work has come to depend on open source GIS tools—my ArcGIS license and the parallel port dongle it required stayed behind when I left university. So while I can tell you all about spatial indexes and encoding formats for transmitting geometric primitives, I missed the memo on ESRI’s new File Geodatabase format; the format now being used to manage and disseminate data at the USGS.

Never fear, there are all kinds of cool open source tools designed just for these situations. I opened up my terminal and started trying out various translation utilities, thinking I’d just convert this data into some more familiar and readable (by me) format.

Several hours later I was starting to get frustrated.

All my normal go-to tools were letting me down and my Google searches were coming back empty. Clearly this new Geodatabase format mattered: it is ESRI’s official replacement for the now deprecated Shapefile format. I learned about the new features it offered (useful things I take for granted now that I primarily use spatial databases like PostGIS). I was also getting a sneaking suspicion I wouldn’t be able to open the file.

ESRI’s new file format, while exceedingly useful, is also intentionally closed. And not just in the “we don’t support others opening the file” sense; this format is closed in that no one else knows how to open it, period.

Given the complexity and lack of a roadmap from ESRI, I learned that reverse engineering was a futile undertaking. But after calling in a lifeline to a colleague I discovered that ESRI had recently developed a set of binary drivers that offer API level access. These drivers were several years late to the game and were still incomplete, but at least they might let me get access to the data I needed.

Again, no dice.

After creating an ESRI.com account, signing a EULA, installing the drivers, and recompiling GDAL with the ESRI FileDB plugin turned on, I get back a cryptic error message saying that my particular files are in an unsupported format. I then discover that the USGS maintains its data in version 9.2 of the file format. ESRI’s drivers only support version 10. Further investigation reveals I can forward port the data. But only if I own a copy of ArcGIS. Mine for a mere $1,500.

Having invested a full day already, I regroup and attempt to use the USGS Data Portal (an ESRI product, of course). It turns out that the portal offers a Shapefile export for the layers I need–Shapefiles offered a lower-fidelity version of the data, fortunately still useful for my application. After a few failed attempts navigating its clunky interface I manage to select most of Oregon and Washington, the area I need. Submit. Nothing.

The next day I receive a polite email from a USGS staffer letting me know my request for data was too large and could not be processed by ESRI’s data portal. If I wanted this data I’d need to submit dozens of smaller requests and recombine the data once downloaded. After a bit of testing I discovered that this process would take several days to complete given the sluggishness of the portal’s processing pipeline.
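The splitting step the staffer described could be sketched roughly as follows. This only illustrates dividing one large bounding box into smaller request areas. The function name `split_bbox`, the grid dimensions, and the approximate coordinates are assumptions of mine, and nothing here reflects the portal’s actual interface.

```python
def split_bbox(min_x, min_y, max_x, max_y, cols, rows):
    """Divide a bounding box into cols x rows tiles, listed row by row."""
    width = (max_x - min_x) / cols
    height = (max_y - min_y) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x0 = min_x + c * width
            y0 = min_y + r * height
            # Snap the outer edges to the original bounds to avoid float drift.
            x1 = max_x if c == cols - 1 else min_x + (c + 1) * width
            y1 = max_y if r == rows - 1 else min_y + (r + 1) * height
            tiles.append((x0, y0, x1, y1))
    return tiles


if __name__ == "__main__":
    # Very rough extent of Oregon and Washington in lon/lat (approximate).
    for tile in split_bbox(-124.8, 42.0, -116.5, 49.0, cols=6, rows=6):
        print(tile)  # each tile would become one separate export request
```

Each tile would then have to be requested, downloaded, and merged back together by hand, which is where the days of waiting come from.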

I emailed the USGS staffer back asking if I could download the data I needed in a bulk (but readable by me) format. I could not. I asked if others had encountered similar problems. They had. I asked if there was a solution in the works. There was not.

However, they offered kindly to mail me a series of DVDs containing an older Shapefile export covering the entire country. This was not ideal but under the circumstances it was as good as it would get.

Two weeks and a hundred gigabytes of data later I’d managed to filter down an outdated and incomplete copy of the data I downloaded in those first naive moments of this journey.

At least I had my map.

 

 

Full disclosure: I work for an organization that, among other things, builds open source GIS software. The views expressed here are my own.

19 Comments »

  1. Kevin – Thanks for sharing this most recent proprietary horror story and, more importantly, fighting the good fight!

    Comment by Lev — February 7, 2012 @ 11:45 am
  2. [...] sometimes their own agencies) a unique competitive advantage? Also see e.g. these pieces on ESRI or Ordnance [...]

  3. Thank you for sharing your experience.
    In general I agree, although I also tend to distribute data in ESRI formats.
    The problem starts earlier. Many GIS ‘professionals’ are not capable or willing to use anything other than what they are used to.
    Since ESRI is very clever in distributing its software at universities, we end up with geographers and the like having low affinity with software, and if they have to, then with the product they know.
    This marketing strategy works on both ends. For a government organisation it is difficult to find personnel knowing other tools but ESRI and its main ‘users’ want ESRI.
    If we want to start to change this, we should start in the geography course in the University in which ‘programming’ as such should be a severe demand and tested rigorously.

    With the commercialization of university courses it will be quite a challenge (..) to assess the quality of diplomas and to get enough bright young men and women willing to understand the computer instead of just using it.

    Comment by Alfred de Jager — February 8, 2012 @ 10:14 am
  4. I’m not taking sides in the open/closed software argument.
    It seems to me that your gripe should be with USGS for not making the data ‘available.’
    If you want what’s in the safety deposit box, see the banker, not the guy who made the lock.

    Comment by Bill Hardiman — February 11, 2012 @ 10:30 am
  5. Dear Kevin,

    I appreciate that you share your bad experience with us. However, I quite agree with Bill Hardiman’s comment.

    I am not sure what you exactly mean by “Open Government Movement”. Data sharing is only one aspect of data management in a “Whole of Government” approach. ESRI products are officially designated software for the US Army (I was told this, but did not have the time to verify it). So you will see the picture. Also, ESRI is still making different data formats available: other than shapefile and file geodatabase, there are KML/GML, personal Geodatabase, and SQL native spatial data tables. As the data you requested are big, what alternative would you suggest as a free format that can handle data consistency and integrity (such as domain, subtype, topology and embedded business rules)?

    Open source means free to me, however there is no ‘free meal’ in the aggressive capitalised world. Freeware might not have the technical support services that back your business operation when problems occur. Then you might need an in-house technician, which might cost an arm and a leg.
    In my personal view, there is always a trade-off between cost and performance. If you try to save $1,500 to do the consulting job, your two weeks’ time probably cost more than that. This is the reality we all have to face everyday.

    As the Bible pointed out that none of the governments on the earth can solve human problems. Believe or not, this is the reality we all have to face everyday……

    Please let me know, if I said something wrong.
    Thank you for your time!

    Kind regards!

    Tom Zeng

    Comment by Thomas Zeng — February 13, 2012 @ 12:31 am
  6. Tom, Thanks, just to be clear, when I say “Open Government Movement”, I’m referring primarily to what’s codified here:

    http://www.whitehouse.gov/open/documents/open-government-directive

    The open government directive is the result of several years of activism and engagement across all levels of government.

    Further, what I’ve written about has nothing to do with software being “free.” No software, open or closed, is free (as in “meal”). What I’m concerned about is that when a public entity pays for software, as they do by the billions of dollars with ESRI, there should be an expectation that the public retains the rights to its data and retains recourse in that relationship.

    Unfortunately, that appears to be decreasingly the case with ESRI. ESRI’s adoption of a “default closed” format is not accidental. There’s no technical argument for keeping the format closed–it’s a business decision and one that runs counter to the interests of its customers, locking data into ESRI’s GIS stack.

    In the case of geospatial software, ESRI’s monopoly in the market doesn’t allow buyers many alternatives (in fact, the closed format strategy further reinforces that monopoly). This makes it hard for folks to vote with their feet and choose a better solution. But as ESRI’s largest customer the US government has the ability–and the right–to articulate the requirement for open formats as part of the procurement process.

    Keep in mind, the free market cuts both ways: while ESRI has every right to create a proprietary format, they have no right to taxpayers’ money unless they provide the features required by the government. And the OGD (linked above) makes the expectation for open formats quite clear.

    Comment by kpw — February 13, 2012 @ 2:07 pm
  7. The problem mentioned in the blog is not actually about ESRI, but USGS. USGS have a wide range of data formats they could release their data in. They chose a closed, proprietary one. ESRI did not make that choice, USGS did.

    That to me is the main issue. Berating the Microsoft of the GIS world for acting like Microsoft is fair enough :-) , but I would say that in this case, the Govt agency should be the real target.

    Comment by Brent Wood — February 13, 2012 @ 5:27 pm
  8. Note that by making the binary geodatabase API free, even though closed, ESRI can solve most of the concerns Jack expressed. They control the code & format. No problem. ESRI chooses not to do this, so control is not the issue, the only difference is profit. ESRI don’t want to make data in “their” format accessible to non-ESRI users. Full stop.

    Stop bleating about supporting Open Source, Open Standards & Open Data for the benefit of the community, and admit you’re only in it for the $$.

    Comment by Brent Wood — February 13, 2012 @ 5:43 pm
  9. Brent, Sorry, I don’t agree. It’s not clear that USGS actually has any real alternatives here.

    The options I know of are:

    1) Release a lower-fidelity copy using Shapefile (which they did via DVD) but that isn’t even remotely the same as the data stored in the File Geodatabase.

    2) As Jack pointed out they could have used ESRI’s XML file interchange format. However, it’s not clear that would have actually worked (or at least done me any good). I was unable to find any tools or public data sets that make use of that format. As far as I can tell, it’s a dead end but it lets ESRI check the data portability box. (I’m also quite interested to know how big the 100+ GB data set would have been in XML form.)

    3) Forward port the data to ArcGIS v10 (i.e. buy more ESRI software). This would have helped me out of a jam but it wouldn’t have made the data open. I still would have needed the binary driver which is protected by a EULA, and thus does not meet the requirement for platform independence.

    4) Use a Personal Geodatabase, which trades ESRI’s proprietary format for Microsoft’s proprietary Access Database format. Awesome, now I need to buy a Windows computer and a copy of MS Office!

    All of these are non-starters. If you use ArcGIS you now use a closed format. Yes you can get your data out (if you own the right version of ArcGIS) but you don’t really have any clear options for open file formats. That’s not accidental.

    Comment by kpw — February 13, 2012 @ 5:43 pm
  10. Ultimately the onus is on government agencies to provide a decisive signal to software vendors that they require data formats which ensure that there is a minimal barrier to accessing data that is a public good and is funded by the taxpayer. The fact that government agencies do not make that an absolute requirement is the problem that needs fixing. It’s simply unacceptable, and our government agencies must be more discerning, and willing to use their massive commercial power to set these vendor expectations accordingly.

    Of course, until governments insist that Microsoft (and the various “Microsofts of other software domains”) provide open file formats (just to pre-empt the rebuttal, no OOXML does not qualify) then they’re just digging the hole deeper. Sadly, it’s the tail wagging the dog, and the vendors are calling the shots. Time for that to change.

    Comment by Dave Lane — February 13, 2012 @ 5:49 pm
  11. Also, FWIW, I’m definitely not in it for the profit. Others in my org (a non-profit!) work on open source GIS–I don’t and I really don’t care about that aspect of this issue, I just use these tools for accomplishing my work.

    I care about this issue because I’ve spent several years trying to get governments at all levels to use open formats. Before joining OpenPlans I worked at the Sunlight Foundation, which helped define much of what’s currently embodied in the Open Government Directive. I ended up there because I worked in a startup where closed government data created a huge impediment to building useful (and, yes, profitable) public services.

    I, and the others I work with, couldn’t care less about ESRI’s binary format. It’s actually not that good from what I understand. I’d take PostGIS over FileDb any day. The problem is that no one can read from the files, so it shuts non-ESRI tools out of the market. That’s the problem.

    Comment by kpw — February 13, 2012 @ 5:51 pm
  12. [...] – a cycle with negative equity.  This was recently covered in a well-written piece found here, but the most salient part for me in this blog post is as [...]

  13. So, what format -should- USGS have provided the data in?
    Shapefile would be too low of fidelity. Personal geodatabase is proprietary. V9 file geodatabase is proprietary. V10 file geodatabase is proprietary.
    GML? GeoJSON? KML?
    If the USGS has no clear solid options, then perhaps the blame should fall more on Esri.

    Comment by Brett — February 21, 2012 @ 4:59 pm
  14. At a technical level, the best available answer is SpatiaLite (a derivative of SQLite). I wish that format would gain some traction in the community, but I have a feeling that dark forces are at work keeping it at bay.

    Comment by Adam — February 22, 2012 @ 2:31 pm
  15. ESRI does not have a “monopoly” on the market, they simply have a product which has become the standard. There are alternatives as listed above, and if you want something else or these are not working, then there is a market to develop something, why don’t you go develop it. oh, but wait then I would have to pay to use that too right? Free has no value!!!

    kpw, if you think “non-profits” are not in business for a profit, you are drinking major Kool-Aid.

    I like what Dave Lane and Brent Wood say, however, sadly our gov. has been moving more and more towards a centralized federal dictatorship for decades, and they don’t like competition. Too much pocket padding and one size fits all mentality in this country already. ESRI is great example of this manifested real.

    BTW, not all gov gis portals are clunky and work as poorly as the USGS. there are alternatives.

    Ah gee, remember paper and mylar?

    Comment by billy bob — February 22, 2012 @ 6:16 pm
  16. Billy Bob, interesting comment. A couple of thoughts:

    1) As has already been stated, this conversation is not about “free” software, it’s about open data. The market for other solutions (closed or open) ceases to exist when the data standard is owned and controlled by a single vendor.

    On this topic I highly recommend reading this post about contemporary software monopolies:

    http://georeferenced.wordpress.com/2012/02/07/google-the-real-questions-we-should-be-asking/

    2) The organizations I’ve worked with aren’t interested in profiting from open data–they’re interested in making sure that as a concept it is understood and supported by government.

    It is my belief that this will result in freer markets for solutions and better outcomes for both the public and private sector. Free and open markets are generally desirable; however, they are imperative when public money is being invested.

    3) Your reference of the “centralized federal dictatorship” is interesting. I couldn’t quite make out your point, however, the sentiment I think you’re channeling is something that lurks at the heart of this conversation.

    I’ll be clear: I value the notion of the “public” and believe it’s the government’s role to facilitate and protect its existence. Vendors have a role in supporting government and in doing so should be required to act on behalf of the public.

    Unfortunately my sense is that, at least in the U.S., we’re hearing more calls for the reverse. Many no longer believe that government supports the public and would prefer that the private sector replace many of its functions. At least in part, the belief is that free market mechanisms should facilitate an ideal, and perhaps more democratic outcome.

    This misses an important point: free and open markets don’t build themselves. They’re built by the public via government.

    It’s even more complicated in this particular instance as the vast majority of money being spent in the GIS market is from public funds. Government investment in GIS dwarfs any other industry. So as a taxpayer I’m quite motivated to ensure that the geospatial software landscape has room for me to participate on my own terms, given that I help support its existence.

    This last point probably deserves a series of posts to itself. Though, I’d love it if you could try and articulate your point a bit better–I’m responding back to you and Brent without a clear sense of what you meant to say.

    Finally, on this theme, I noticed that you apparently work for the federal government (the comment metadata indicates at the USDA). Fascinating! I can’t help but wonder, are you a federal employee or a contractor?

    Comment by kpw — February 23, 2012 @ 10:24 am
  17. ArcGIS Explorer is a free download and can read the ESRI File Geodatabase.

    Comment by l — March 1, 2012 @ 12:31 pm
  18. [...] Structural Knowledge: Why ESRI (as is) can’t be part of the open government movement [...]

    Pingback by HC SVNT DRACONES » Assortert lesestoff — March 8, 2012 @ 3:42 pm
  19. ESRI has done a good job at the job they do best.. i.e. Con Job.. They came disguised as an angel to save the governments, serve the public.. but underneath was a vampire determined to drink their blood. The Govt agencies have flipped for it.. got married.. and now have to live with it. Divorce is expensive and messy.

    ‘fool me once shame on you.. fool me twice shame on me’.

    Comment by Sham Mo — October 4, 2012 @ 3:03 am


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2014 Structural Knowledge