"The only way to discover the limits of the possible is to go beyond them into the impossible."
Arthur C. Clarke
Soon after Codd published his paper on the relational model in 1970, relational databases significantly changed the way people managed data. Today, relational databases are the workhorses of enterprise data storage. Email and the Internet brought a similar shift: it is now hard to imagine a world without them. What will the next "killer app" or "killer service" look like? What kinds of attributes and features will it provide?
In this article, we provide a primer on geospatial technology. We then explain possible reasons for growth in the geospatial industry, examine Ingres' geospatial project, and relate the material to learnings about open source as a protocol for business.
The Storm is Coming
Technology change has made spatially aware applications and devices more affordable and accessible, largely because chips have become smaller, faster, and more power-efficient. Increased bandwidth for both wired and wireless networking has improved the availability of spatial data. The geographic information system (GIS) industry was traditionally dominated by a few large competitors, but new standards and competition have started its evolution towards the mainstream. More importantly, these standards and technologies have hastened the inclusion of spatial awareness into applications from other industries. New opportunities are emerging to add maps and spatial awareness to enterprise information technology (IT) and to do so at a cost that the masses can afford. The stage is set to provide new insights from existing data. As countless more devices become spatially aware and interconnected, we are going to experience an epic storm of spatial data.
Geospatial Data
There are two types of geospatial data: raster data and vector data. Raster data are essentially pictures, although not always in the visible spectrum. Satellite or aerial images are examples of raster data. Vector data are a mathematical representation of real life. Vector constructs include points, lines, polygons, and other shapes which can be used to represent houses, roads, rivers, parks, lakes, and more.
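To make the distinction concrete, the following minimal Python sketch models both kinds of data. The class names (Point, LineString, Polygon, Raster) and the value_at helper are illustrative assumptions and are not taken from any particular GIS library.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (x, y), for example (longitude, latitude)


@dataclass
class Point:
    """Vector construct: a single location, such as a house."""
    position: Coordinate


@dataclass
class LineString:
    """Vector construct: an ordered list of vertices, such as a road or river."""
    vertices: List[Coordinate]


@dataclass
class Polygon:
    """Vector construct: a ring of vertices enclosing an area, such as a park."""
    ring: List[Coordinate]


@dataclass
class Raster:
    """Raster data: a grid of cell values anchored at an upper-left map coordinate."""
    origin: Coordinate      # map coordinate of the upper-left corner
    cell_size: float        # width and height of one cell in map units
    cells: List[List[int]]  # rows of cell values, top row first

    def value_at(self, x: float, y: float) -> int:
        """Return the value of the cell that contains map coordinate (x, y)."""
        col = math.floor((x - self.origin[0]) / self.cell_size)
        row = math.floor((self.origin[1] - y) / self.cell_size)
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[row])):
            raise IndexError("coordinate lies outside the raster")
        return self.cells[row][col]


# Hypothetical sample data: a 2 x 2 image and a small park outline.
image = Raster(origin=(0.0, 2.0), cell_size=1.0, cells=[[1, 2], [3, 4]])
park = Polygon(ring=[(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```

The vector classes store exact shapes as coordinates, while the raster stores sampled values on a regular grid, which is why the two are served and queried differently.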
The industry standards are published by the Open Geospatial Consortium (OGC) and describe how raster, vector, and combined map data can and should be represented. These standards include Web Coverage Service (WCS), Web Feature Service (WFS), and Web Map Service (WMS), which describe serving raster data, vector data, and maps respectively. OGC also defines how relational databases should store and provide interfaces to act upon spatial data. Adhering to these standards means systems can interoperate more easily. This makes it possible to take raster data from one source and vector data from another, and to combine them into a map service that can be consumed by a wide choice of software.
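As an illustration of what interoperability looks like in practice, a WMS client asks for a map by sending a GetMap request whose parameters are defined by the standard. The sketch below assembles such a request for WMS version 1.1.1. The function name build_getmap_url, the server address, and the layer names are hypothetical.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Assemble a WMS 1.1.1 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of the given
    spatial reference system.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default styles
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer names, for illustration only.
url = build_getmap_url("http://maps.example.org/wms",
                       layers=["roads", "rivers"],
                       bbox=(-180, -90, 180, 90),
                       width=800, height=400)
```

Because the parameter names come from the standard rather than from a vendor, the same request shape should work against any compliant server, which is the point of the OGC specifications.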
Coordinate Systems
Most of us are familiar with the concept of latitude and longitude with zero degrees longitude centered on Greenwich, England. There are other systems that have zero degrees centered on Moscow, Paris, and other major cities. Each of these systems is a coordinate system, also called a spatial reference system. Since geospatial data may be stored in any one of a number of coordinate systems, it is important to be able to convert between them. Proj.4 and csmap, software projects sponsored by the Open Source Geospatial Foundation (OSGeo), provide this functionality.
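The simplest conversion of this kind is a change of prime meridian, sketched below. This is a deliberately reduced illustration: real conversions also involve datums and map projections, which is why dedicated libraries exist. The function name is hypothetical, and the Paris offset of 2 degrees 20 minutes 14.025 seconds east of Greenwich is the commonly cited value and should be checked against an authoritative source before real use.

```python
# Commonly cited offset of the Paris meridian, in degrees east of Greenwich
# (2 deg 20 min 14.025 sec). Treat this constant as an assumption.
PARIS_EAST_OF_GREENWICH = 2 + 20 / 60 + 14.025 / 3600
GREENWICH = 0.0


def shift_prime_meridian(longitude_deg, from_offset_deg, to_offset_deg):
    """Re-express a longitude relative to a different prime meridian.

    Each offset is the position of that system's zero meridian, in degrees
    east of Greenwich. The result is normalized to the range [-180, 180).
    """
    relative_to_greenwich = longitude_deg + from_offset_deg
    shifted = relative_to_greenwich - to_offset_deg
    return (shifted + 180.0) % 360.0 - 180.0


# Longitude zero in the Paris-based system, expressed relative to Greenwich.
paris_zero_in_greenwich = shift_prime_meridian(
    0.0, PARIS_EAST_OF_GREENWICH, GREENWICH)
```

Latitude is unaffected by a change of prime meridian, so only the longitude needs to be shifted in this simplified case.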
Geodetics
We have all been told that the shortest distance between two points is a straight line. But on the surface of a sphere, the shortest path between two points is an arc. To complicate things further, most planets are not perfect spheres but ellipsoids with imperfections. The science of geodetics deals with the measurement of the earth.
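The length of that arc can be estimated with the haversine formula, which treats the earth as a perfect sphere. The sketch below is an approximation under that assumption; the mean radius constant and the function name are choices made for this example, and an ellipsoidal model is needed when higher accuracy matters.

```python
import math

# Mean earth radius in kilometres; a spherical approximation of an ellipsoid.
EARTH_MEAN_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=EARTH_MEAN_RADIUS_KM):
    """Return the great-circle distance between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))
```

For example, great_circle_km(0, 0, 0, 180) returns half the circumference of the sphere, since the two points sit on opposite sides of the equator.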
Why Use a Relational Database for Spatial Data?
There are a number of formats for storing spatial data, including several that are just files on a disk. So, why burden oneself with the overhead of a relational database? With one user, one set of data, and fairly simple and unchanging demands for data, it is easy to make the case for storing data as files on a disk. However, once you need to share that data with a team of people, things become more complex. A relational database management system (RDBMS) provides atomicity, consistency, isolation, and durability (ACID). In short, this means that each change either completes fully or not at all, so that concurrent users and system failures do not leave your data corrupted. An RDBMS also provides a client/server architecture that allows data to be shared over a network. The security model of an RDBMS enables roles defining who can view, modify, or delete data. These are all important considerations when sharing data.
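Atomicity, the first of the ACID properties, can be demonstrated in a few lines. The sketch below uses SQLite from the Python standard library purely as a convenient stand-in, not Ingres, and the landmark table and helper names are hypothetical. A batch of inserts that fails partway through leaves no partial data behind.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a small table of named locations."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE landmark ("
        "name TEXT PRIMARY KEY, lon REAL NOT NULL, lat REAL NOT NULL)")
    return connection


def add_landmarks(connection, rows):
    """Insert every row or none of them; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises an exception.
        with connection:
            connection.executemany(
                "INSERT INTO landmark VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def count_landmarks(connection):
    """Return the number of rows currently stored."""
    return connection.execute("SELECT COUNT(*) FROM landmark").fetchone()[0]
```

If a second batch repeats a name that already exists, the whole batch is rejected and the row count is unchanged, which is the behavior that plain files on a disk do not offer without extra work.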
Ingres
Ingres was one of the original RDBMSs and was born out of the INGRES project at the University of California, Berkeley in the 1970s. In 1980, INGRES project founders Michael Stonebraker and Eugene Wong created Relational Technology Incorporated (RTI) based on the technology. RTI changed its name to Ingres Corporation and was purchased by Ask Corporation in 1990. Computer Associates acquired Ask in 1994. In 2005, Ingres was spun out of Computer Associates with venture funding to form the current Ingres Corporation. Today's Ingres Corporation is an open source startup based in Redwood City, California. Ingres' revenues have recently grown to $68M, despite the gloomy economy, making it currently the largest independent open source RDBMS company.
Ingres competes with closed source offerings from Oracle, IBM, and Microsoft. The main open source RDBMS projects are MySQL, now owned by Sun Microsystems, and PostgreSQL. O'Mahony and West propose that there are two major types of community: those initiated at the grassroots and those sponsored by a for-profit firm. In the context of the Ingres community today, the latter is a better fit.
Hindsight is 20/20
It is worth noting that Ingres was one of the first RDBMSs to support geometry datatypes. Geometry datatypes provide mathematical constructs to describe points, lines, polygons, and other data types for describing objects and relating them in Cartesian space. Many of these constructs are used to enable geospatial technology to relate objects on the surface of the earth. Even though Ingres supported geometry types, it had no support for coordinate systems or geodetics, and its geospatial functions were sparse. As the industry defined standards for additional data types and functions in the late 1990s, work was needed to update the code to support them. When the Ingres Spatial Objects Library (SOL) was originally developed, the decision was made to outsource its development. The deal left the intellectual property (IP) in the hands of the outsourcing company, leaving Ingres with the rights to distribute binaries, but not the code. Recall that in those days, geospatial technology was a tiny niche and only those with deep pockets and an urgent need for the technology were interested.
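Two typical operations on such constructs in Cartesian space are computing the area of a polygon and testing whether a point falls inside it. The sketch below uses the shoelace formula and a ray-casting test. It is a generic illustration, not the Ingres implementation, and the function names are hypothetical.

```python
def polygon_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies strictly inside the polygon.

    Points exactly on the boundary are not handled consistently by this
    simple version.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical parcel: a 4 by 3 rectangle.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
```

These functions assume flat, Cartesian coordinates. Applying them directly to latitude and longitude ignores the curvature of the earth, which is exactly the gap that support for coordinate systems and geodetics is meant to fill.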
Ingres Geospatial Project
Ingres' base of over 10,000 customers represents a considerable amount of data and business. Since IT systems often contain spatial data in the form of addresses, it is common for customers and the community to ask what the company is doing in the area of spatial technologies. For an open source company, it is a significant problem to have an in-demand component that is not available as open source. Out of customer and community interest and the emergence of new standards, the Ingres geospatial project was born.
Power of Open Source
In "IT Doesn't Matter", Carr notes that large IT suppliers such as Microsoft, Oracle, and IBM are making huge amounts of money while companies overspend on IT. Carr also notes that there is no correlation between IT spending and superior performance. If anything, the relationship is the inverse. Carr asserts that IT can be done more efficiently and inexpensively as there is no strategic advantage to paying more for platform software. Open source software (OSS), which is distributed for free and has development costs spread across numerous firms, seems well positioned as a commodity and poses a significant threat to the business of the closed source market share leaders.
The success of OSS projects such as Linux, Apache, and Firefox demonstrates that OSS can compete and be successful. In many cases, it can even challenge the market leaders.
To Make a Change, First Look in the Mirror
With a code base that had only recently been re-opened, the Ingres open source community struggled to compete with the enormous mindshare of MySQL. Much like the battle of VHS and Betamax, community developers did not seem to pay much attention to details of how Ingres was technically superior. It is fair to characterize Ingres' early days of returning to its open source roots as "open code" but closed in other ways. While an archive of the source code was available from the website, design discussions, code inspections, the production code repository, product roadmap information, and more were hidden behind the corporate firewall. It does not make sense to be an open source company without benefitting from an open source community.
The changes needed to make Ingres more open to community participation met with resistance within the company. A company exists to "maximize shareholder wealth" by making a lot of money. Open source and making money are not at odds. However, in order to make money with open source, you must first invest. For a startup with a sharp focus on profitability, it is very difficult to set aside money and people to work on something that may not generate a short term return. Despite the odds, a decision to forge ahead was made, and investments were made in infrastructure such as a public code repository, a bug tracking system, public technical documentation, community mentorship, and community management.
Survey of Reusable Components
It is worth explaining that much of our underlying technology, namely the definitions of points, lines, and polygons and the functions for operating on them, is a commodity. We call this a "geometry engine" for the sake of this article. Given the importance of community, the first step was to look at existing communities and opportunities for code reuse. At the top of the list of priorities was to contribute to making an existing code base stronger rather than creating yet another geometry engine.
Contact was made with members of the OSGeo community who assisted in identifying candidates for code reuse. The leading candidate was a project called Geometry Engine Open Source (GEOS) originally developed by Refractions Research to enable PostGIS, the geospatial plugin for PostgreSQL. GEOS had roughly 20 years' worth of investment borne mostly by Refractions. A plan was assembled where Ingres would adopt GEOS and contribute to the development of the code. To make this proposition more attractive, Ingres and others lobbied other companies in the OSGeo community to join. Eventually, the GEOS project was moved to OSGeo, and code contributors came forward from a number of companies. Each of the organizations involved benefits from giving a little to the development of GEOS and receives much more back in return.
OSS provides many benefits including sharing costs, risks, and ideas. OSS enables swift development, open communications, and collaboration. With closed source, just negotiating the legal agreements between the multiple companies involved can take many months. With open source, new companies and people can join the project without having to renegotiate contracts, thus reducing transaction costs.
Summary
The geospatial industry is poised for tremendous growth as location-aware applications and devices grow in popularity. Enterprise IT will discover new value and insights through spatial analysis of existing data. Open source can reduce the transaction costs of technology partnerships. Businesses should seek out partners with aligned interests and work with them through mutual investment in and reuse of OSS. Doing so allows them to re-allocate spending to areas that provide unique value.