Proj.4 may become more similar to Apache SIS
The GDAL/Proj.4 community has raised $142,000 for improving the map projection library behind major Open Source geospatial softwares. The issues to be fixed require an architectural change. Proj.4 is moving away from the early-binding approach (where WGS 84 is used as a pivot system) to a late-binding approach (with no fixed pivot system and with parameters that may depend on the geographic area). This change is required for accurate transformations in some situations. Implementing the late-binding approach requires a more complete EPSG geodetic dataset than the CSV files bundled in Proj.4. SQLite is the proposed alternative. Finally expressing this richer information in a Well Known Text (WKT) format requires ISO 19162 support (also known as « Well Known Text 2 »). The new WKT 2 standard fixes some ambiguities that caused the apparition of non-standard interpretations of WKT 1 standard, and can describe reference systems that can not be described in the previous standard (for example projections over pole). Those improvements to be implemented in Proj.4 were already implemented for more than 10 years in Apache Spatial Information System (SIS) project or its predecessor GeoTools, except WKT 2 which has been implemented in 2014. Indeed the GDAL Coordinate System Barn Raising cites GeoAPI and Apache SIS has inspiration sources. But if Apache SIS was a precursor, it was not because of clever developers. Apache SIS chooses to follow closely the ISO 19111 conceptual model for its design. While apparently overly complex at first sight (developers are often prompt to judge someone’s else model as « unnecessary complexity »), complying with OGC/ISO models allowed Apache SIS to benefit from the insight of the experts who designed those standards. With more than 15 years of experience with this approach, we feel that its long-term benefit has been largely demonstrated since ISO/TC211 experts anticipate in their models many real-world complexities that are not immediately apparent to developers. However this benefit exists only provided that we use the right standards, which are the UML in abstract models (ISO 19115, ISO 19111, etc.). The standards designed for web technologies (WCS, XML, etc.) are usually not suitable for designing the API of a library. Proj.4 is now following a similar path, with a proposed API very similar to ISO 19111 model. If Proj.4 continue on this path, its API would naturally be close to the Java interfaces defined in GeoAPI (and implemented in Apache SIS) since they would share the same conceptual model. This may open an opportunity to use implementation-neutral parts of Proj.4 API for defining a C++ version of GeoAPI, with Proj.4 being one implementation. It would make easier to share tests, or allow users to depend more transparently on the library of their choice.
Quels standards OGC pour l’informatique en nuage ?
Les membres du consortium Open Geospatial (OGC) se sont réunis à Orléans du 19 au 23 mars. Depuis peu, ces réunions expérimentent une nouvelle formule pour la séance plénière de clôture. Durant la semaine, les membres votent pour un sujet qui sera présenté et débattu en séance. À Orléans, le sujet discuté à la clôture était latent depuis plusieurs mois. Il avait été soulevé plus explicitement, sous différentes formes, à au moins deux reprises au cours de la semaine. Depuis une ou deux années, les agences spatiales – notamment la NASA et l’ESA – expriment à l’OGC un besoin d’inverser la logique actuelle dans la façon de manipuler les données. Au lieu d’amener les données aux algorithmes (c’est-à-dire de télécharger des données à partir de serveurs vers le poste du scientifique qui effectuera les calculs), on souhaite amener les algorithmes vers les données. La raison est que les données d’observation de la Terre occupent un volume tel que leur téléchargement peut devenir impraticable. On souhaite plutôt exécuter les algorithmes là où se trouvent les données. Ce souhait est déjà partiellement réalisé par certains acteurs. Par exemple Google Earth Engine (GEE) permet d’exécuter des algorithmes définis par l’utilisateur sur les serveurs de Google. Le guide du développeur contient un exemple d’algorithme (un calcul d’indice de végétation) exécuté localement, puis exécuté plus efficacement sur les serveurs où se trouvent les données. Mais cet exemple utilise une interface de programmation (API) propre à Google ; aucun standard géospatial n’y est appliqué. Cette conception expose les utilisateurs au risque de verrouillage du fournisseur. Une situation où les algorithmes écrits pour GEE peuvent être difficiles à porter vers une autre source de données. L’initiative openEO part de ces constatations (besoin d’exécuter des algorithmes là où se trouve la donnée ; les solutions existantes utilisent des API propriétaires) pour proposer une API neutre en Python. Leur prototype permet déjà de définir des algorithmes pour GEE en utilisant une API qu’ils veulent transposable à d’autres fournisseurs. Toutefois, bien que openEO ait été présenté à la réunion OGC à Orléans, l’API montrée semble leur être propre. Il utilise assez peu les modèles conceptuels de l’OGC. Les standards définis par l’OGC s’articulent autour du transfert de données. Soit en définissant des formats de fichiers (par exemple netCDF) ; soit en définissant des services web permettant de télécharger ces fichiers. La principale exception est le standard Web Processing Service (WPS) qui permet d’exécuter des algorithmes à distance. Cependant, ses possibilités sont limitées aux algorithmes pré-définis sur le serveur (auquel cas on ne fait que transmettre des paramètres) ou aux algorithmes exprimables dans un langage proche du SQL (beaucoup plus limité que Python, Java ou autres langages de programmation). Cette situation a amené la NOAA à poser trois questions à Orléans : La réponse à la question 2 pourrait être « tous les standards web et formats de fichiers ». Ce qui englobe beaucoup des standards les plus populaires de l’OGC ! Notons toutefois que même si ces standards peuvent devenir inutiles pendant la phase d’exécution d’un algorithme par l’informatique en nuage, ils restent pertinents pour récupérer le résultat du calcul. Une réponse à la question 3 pourrait être GeoAPI, un standard OGC qui définit des interfaces de programmation depuis plus de 15 ans. GeoAPI poursuit les mêmes objectifs que openEO, mais en prenant le problème par le bout opposé. OpenEO part du haut (« fournir une image à l’utilisateur »), et descend dans les détails au fur et à mesure des besoins. Ils peuvent se permettre cette approche car ils définissent leur propre API, qu’ils complètent à leur rythme. De l’autre côté, GeoAPI s’est donné comme mission d’implémenter les modèles conceptuels standards de l’OGC et de l’ISO. GeoAPI évite de créer son propre modèle, excepté pour des besoins d’intégration avec le langage de programmation ciblé. Par exemple, pour représenter une image, GeoAPI s’appuiera sur le modèle conceptuel définit par ISO 19123 (schema for coverage geometry and functions), qui lui-même dépend de ISO 19107 (spatial schema), qui dépend de ISO 19111 (spatial referencing by coordinates), qui dépend de ISO 19115 (metadata), qui dépend de ISO 19103 (conceptual schema language)… GeoAPI part donc du bas et monte vers les besoins des utilisateurs en suivant le fil des dépendances entre les standards ISO. C’est un processus plus long que l’approche prise par openEO, mais qui tend vers une solution collant mieux aux standards internationaux. Paradoxalement, l’informatique en nuage remet d’actualité une approche qui était en vogue il y a une vingtaine d’années, avant d’être (temporairement?) éclipsée par la vague des protocoles web. Le mode de fonctionnement de Google Earth Engine ressemble au mécanisme derrière les Remote Method Invocations (RMI) du Java, publié en 1997. D’ailleurs, le slogan de Sun Microsystems (le créateur du langage Java à cette époque) était « The network is the computer ». Les API définis par des interfaces Java – comme GeoAPI – se prêtent naturellement à une utilisation par l’informatique en nuage. Étendre cette approche à des langages autres que Java – à commencer par Python – est en cours sur le dépôt de code de GeoAPI.
Proj.4 versus Apache SIS: an accuracy comparison

Proj.4 versus Apache SIS: an accuracy comparison 20/09/2017 user In August 17th 2017, we presented an introduction to Apache SIS in the Free and Open Source Software for Geospatial (FOSS4G) conference. A discussion about accuracy were planed, but skipped because of lack of time. The results that we intended to show are below. Disclaimer: the author of this blog is an Apache SIS contributor. Summary The accuracy of two map projection libraries were compared: Proj.4, a library in the C language. Apache Spatial Information System (SIS), a library in the Java language. The benchmark code was written in Java. The code measured both performance and accuracy. Those two measurements were done together in order to ensure that performance comparisons were applied on equivalent coordinate operations. Results of performance comparisons were presented in a previous blog post. Present blog post is about the accuracy part. Tests done for this blog post expand on the Geospatial Integrity of Geoscience Software (GIGS) tests procedure. We require an accuracy of one centimetre, unless the coordinate operation is documented in the EPSG database as having a larger accuracy. In every cases, we require the results after 100 “forward projection – inverse projection” round trips to drift by less than one meter. However while we used some procedures, we didn’t used GIGS data for this blog. A subset of GIGS tests is executed by the Apache SIS project and a smaller subset is executed in the GeoAPI project for Proj4 and UCAR libraries, but they need to be expanded before further discussion. The ISO 19111 standard recognizes two broad categories of coordinate operations: conversionsand transformations. Those two categories need to be analyzed separately. Coordinate conversions This category includes map projections. For all tested conversions, Apache SIS and Proj.4 are in close agreement. Both implementations show drifts much smaller than one meter (and almost always smaller than 1 millimetre) after 100 “forward projection – inverse projection” round trips. Apache SIS has smaller drifts for projections implemented by exact formulas (mostly projections on sphere), presumably because of stricter trigonometric functions in Java compared to C/C++ and because of Apache SIS aggressive concatenation of linear steps. Proj.4 has smaller drifts (except for Cylindrical Equal Area — “cea”) for projections on Earth approximated by series expansions or iterative algorithms, presumably because of more conservative formulas in series expansions and because of iteration stop conditions controlled by smaller tolerance thresholds. However in both libraries the drift still much smaller than the accuracy requested by EPSG. For map projections on planets having higher eccentricity than Earth (e.g. Jupiter), two behaviors are observed: map projections implemented by iterative algorithms keep a stable accuracy, at the cost of more iterations. But map projections implemented by series expansions have growing errors as the eccentricity increases. For a planet flattened like Jupiter but scaled at Earth size (only for comparison purpose with other projections in this test), the Proj.4 “cea” projection has a drift of 9 km after 100 iterations. Apache SIS drift for the same projection is still below a millimetre, because of SIS dual implementation (series expansion completed by iterations when necessary). Such dual implementation is possible for Mercator (not transverse) and a few other projections. Coordinate transformations This category includes datum shifts. Apache SIS and Proj.4 produce the same results when they use the same operation method with the same Bursa-Wolf parameters, but their results can differ by 1 or 2 meters when those two libraries selected different parameter values. The Bursa-Wolf parameters are the same for tested datum shifts from/to WGS84 except in North America. But the parameters differ for datum shifts between other datums than WGS84, for example Martinique 1938 to RGAF09. In all tested cases, Apache SIS uses the Bursa-Wolf parameters as specified by the EPSG geodetic dataset (late-binding approach) while the Proj.4 early-binding approach puts a 0.9 ± 0.4 meter error in Martinique case, which is about 10 times the expected accuracy. Proj.4 does not give any indication to users about such inaccuracy; currently, only informed users could suspect a problem by inspecting the Proj.4-specific +towgs84 elements in definition strings and guessing how Proj.4 may use them. For coordinate transformations in North America, Apache SIS and Proj.4 may agree or disagree depending on the geographic area. The EPSG registry does not provide Bursa-Wolf parameters covering North America as a whole (both USA and Canada). But it provides parameters for smaller areas, for example a particular state. Apache SIS takes the geographic area in account when selecting Bursa-Wolf parameters, while Proj.4 API does not have this option. Proj.4 may use NADCON grids instead, but has no standard way to tell what it actually does. For this reason, while both Proj.4 and Apache SIS can handle NADCON grids, their usages have not been compared yet. Material and method The comparison has been performed between Proj.4 release 4.9.3 (August 2016) and Apache SIS 0.8-jdk8-SNAPSHOT (August 2017). The environment is Java 1.8.0_144-b01 on MacOS 10.12.6. The comparison program uses the GeoAPI interfaces. GeoAPI 3.0 is an OGC standard which allows to write code once and run it on different implementations. It is similar in this respect to Java DataBase Connectivity (JDBC) interfaces. The “Proj.4 versus Apache SIS: a performance comparison” blog post gives more details about GeoAPI implementations, which Coordinate Reference System (CRS) are used and how operations are instantiated. The table of CRS is repeated here for convenience: SOURCE CRS TARGET CRS EPSG:4053 – International 1924 Authalic Sphere EPSG:3410 – NSIDC EASE-Grid Global EPSG:4326 – WGS 84 EPSG:6933 – WGS 84 / NSIDC EASE-Grid 2.0 Global EPSG:4326 – WGS 84 EPSG:3857 – WGS 84 / Pseudo-Mercator EPSG:4326 – WGS 84 EPSG:3395 – WGS 84 / World Mercator EPSG:4269 – NAD83 EPSG:3978 – NAD83 / Canada Atlas Lambert EPSG:4326 – WGS 84 EPSG:3031 – WGS 84 / Antarctic Polar Stereographic EPSG:4269 – NAD83 EPSG:5070 – NAD83 / Conus Albers EPSG:3994 – WGS 84 / Mercator 41 EPSG:3395 – WGS 84 / World Mercator EPSG:4301 – Tokyo EPSG:4612 – JGD2000 EPSG:4277 – OSGB
Proj.4 versus Apache SIS: a performance comparison

Proj.4 versus Apache SIS: a performance comparison 28/08/2017 Martin Desruisseaux In August 17th 2017, we presented an introduction to Apache Spatial Information System (SIS) in the Free and Open Source Software for Geospatial (FOSS4G) conference. Discussions about performance and accuracy were planed, but skipped because of lack of time. A first part of the results that we intended to show are below. Disclaimer: the author of this blog is an Apache SIS contributor. Summary The performance of two map projection libraries were compared: Proj.4, a library in the C language. Apache Spatial Information System (SIS), a library in the Java language. The benchmark code was written in Java. The Proj.4 functions were invoked through the Java Native Interfaces (JNI). The use of JNI introduces a bias in benchmark measurements for Proj.4, but this bias is estimated lower than one standard deviation in the time measurements. In our results, Proj.4 is faster except for the Mercator inverse projection. The Proj.4 performance advantage is explained by the cost of Java 8 trigonometric functions like Math.sin(φ) and Math.asin(y)compared to their C counterparts. Java 9 is known to be faster than Java 8 but has not been tested yet. The Mercator inverse projection exception is explained by the mathematical work done in Apache SIS, where some formulas have been rewritten in more efficient ways using mathematical equivalences; in the Mercator case those gains outweigh the handicap of slower trigonometric functions. In a few extreme cases, Apache SIS is 400 times faster than Proj.4. Those extreme cases are explained by Apache SIS capability to detect when a chain of coordinate operations can be simplified as an affine transform. In all tested map projections, Proj.4 and Apache SIS results differ by a few micrometres or less. In datum shift tests, Proj.4 and Apache SIS are sometime in agreement and sometime apart by one or two meters. Those differences are explained by the way the two libraries use the EPSG geodetic dataset. More details are given in the discussion after “Material and method” section. Material and method The benchmark compared Proj.4 release 4.9.3 (August 2016) with Apache SIS 0.8-jdk8-SNAPSHOT (August 2017). The environment is Java 1.8.0_144-b01 on MacOS 10.12.6. The tests use the GeoAPI interfaces. GeoAPI 3.0 is an OGC standard which allows to write code without knowledge of the underlying implementation. It is similar in this respect to Java DataBase Connectivity (JDBC) interfaces. This approach allows us to write benchmark or test codes only once, then execute it on arbitrary GeoAPI implementations. Apache SIS is one such implementations. The sis-gdal module provides another implementation as wrappers around the Proj.4 library (another variant is available in geoapi-proj4 module provided by the GeoAPI project). Coordinate Reference System (CRS) instantiations All Coordinate Reference System (CRS) objects are created from an EPSG code through the GeoAPI CRSAuthorityFactory interface. The Apache SIS CRS class provides convenience static methods which delegate to GeoAPI implementations for performing the real work. Coordinate Reference Systems backed by Proj.4 are created as below. Note that despite the “epsg” part, this is considered a Proj4 definition (indicated by the “Proj4::” prefix) rather than an EPSG definition. Those definitions differ in axis order and sometime in units of measurement. CoordinateReferenceSystem crs = CRS.forCode(“Proj4::+init=epsg:4326”); Coordinate Reference Systems backed by Apache SIS are created as below. The first line creates a CRS as defined by EPSG. The second line modifies the CRS for the same axis order than Proj.4 (note that the CRS intentionally lost its “EPSG::4326” identifier in this process). CoordinateReferenceSystem crs = CRS.forCode(“EPSG::4326”); crs = AbstractCRS.castOrCopy(crs).forConvention(AxesConvention.RIGHT_HANDED); Note: alternatively, we could have created the Proj.4 CRS with “Proj4::+init=epsg:4326 +axis=neu”definition string. But this approach forces us to maintain a list of axis orientations for all supported EPSG codes. This is the approach implemented by GeoAPI wrappers for Proj.4, as a way to get an EPSG factory closer to authoritative definitions. Apache SIS takes a different approach where EPSG codes are handled by Apache SIS itself and the codes provided by Proj.4 are considered to be in a different, non-EPSG, name space. Coordinate Operation instantiations Many coordinate operations may exist for the same pair of source and target CRS. For example the EPSG geodetic dataset contains about 85 operations from NAD27 to WGS84, each operation using a different set of parameters for different state or geographic area. For making a choice, the Apache SIS CRS class provides another convenience method: CoordinateOperation op = CRS.findOperation(sourceCRS, targetCRS, areaOfInterest); The areaOfInterest option is set to null in this test, which default to the widest domain of validity. Note that in the case of NAD27 to WGS84 transformations, the “widest domain of validity” criterion causes Apache SIS to select a transformation for Canada, not for USA. In order to avoid ambiguity, “NAD27 to WGS84” transformations are not compared in this benchmark. The issue of Proj.4 and Apache SIS selecting different transformations will be discussed in more details in another blog post, to be named “Proj.4 versus Apache SIS: an accuracy comparison”. For the performance comparison discussed in this post, suffice to say that we verified that the benchmark compares the same operation methods. All coordinate conversions or transformations are executed through the GeoAPI MathTransforminterface. In the Proj.4 wrapper case, MathTransform delegates to Proj.4 pj_transformfunction. All those functions can work on arbitrary number of coordinates. Our benchmark uses this capability for applying coordinate operations on groups of 65536 coordinates. In Proj.4 case, this means one single JNI call and one single pj_transform call per group of 65536 coordinates. The cost of those calls has been estimated by performing an identity transform (where source and destination CRS are the same) on 65536 random coordinates with Proj.4 and assuming that all the execution time were due to JNI overhead. We measured 0.2 milliseconds, which is about 0.6% of the average execution time of other coordinate operations tested in this benchmark. Conversions or transformations have been tested between the following pairs of CRS. Those pairs have been selected in order to use different operation methods: Cylindrical Equal Area