Saturday, May 3, 2014

Best Practices for Storing BigDecimal Values in Google App Engine Datastore


This article describes some best practices for storing BigDecimal values in the App Engine Datastore using the objectify-appengine library.  The article provides a brief overview of existing options and highlights some drawbacks with each.  The article then outlines an option that overcomes many of the shortcomings of the current solutions while offering flexibility depending on use-case.


-------------------------------------------------------------------------------------

Java's BigDecimal class has many great use-cases, but when you want to store an entity with a BigDecimal property in the Google App Engine Datastore, you'll want to be aware of some potential pitfalls.

Existing Options: Objectify
The objectify-appengine library is an all-around excellent tool in any App Engine developer's tool kit.  It handles all of the typical Java types in the Datastore, and also has the ability to store custom data-types via its Translator framework.  This is a powerful feature of Objectify, and to this end the library comes with an optional translator that can handle BigDecimal types automatically.  It's called BigDecimalLongTranslatorFactory.  As the Javadoc states, however, this is just a simple implementation.  There are potential pitfalls to its usage, and this article explores some of the areas where a different solution might be preferred.

Potential Lack of Space
The first potential drawback to using BigDecimalLongTranslatorFactory is that it does not have the ability to store number values that would exceed the "space" of a Long number in the Datastore.

To explain this better, it's useful to understand that BigDecimalLongTranslatorFactory encodes a BigDecimal into a single Long value by scaling it up with a pre-defined "factor" (configured when the translator is instantiated), so that the decimal digits become part of a whole number.  When loading the encoded BigDecimal value from the datastore, the Long number returned is divided by this "factor" to reproduce the integer and decimal portions of the BigDecimal object.

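Since App Engine Long numbers are 64-bit signed values, the largest encoded value that can be stored is 9,223,372,036,854,775,807.  That allows for some pretty large numbers, but the factor eats into that range: with a factor of 1000, for example, the largest BigDecimal that fits is roughly 9.2 quadrillion.  Depending on the numbers being stored, it's possible your use-case may run out of space.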
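To make the "factor" idea concrete, here is a minimal sketch in plain Java.  It does not use Objectify at all: the class and method names (LongFactorSketch, encode, decode) are hypothetical, and the code only illustrates the multiply-on-save, divide-on-load approach described above, not the translator's actual source.

```java
import java.math.BigDecimal;

// Hypothetical illustration of factor-based encoding; not Objectify's implementation.
class LongFactorSketch {

    // A factor of 1000 keeps 3 decimal places.
    static final BigDecimal FACTOR = BigDecimal.valueOf(1000);

    // Scale the value up by the factor and squeeze it into a 64-bit long.
    // longValueExact() throws ArithmeticException if the scaled value still
    // has a fractional part or does not fit into a long.
    static long encode(BigDecimal value) {
        return value.multiply(FACTOR).longValueExact();
    }

    // Divide the stored long by the same factor to get the original value back.
    static BigDecimal decode(long stored) {
        return BigDecimal.valueOf(stored).divide(FACTOR);
    }

    public static void main(String[] args) {
        long stored = encode(new BigDecimal("123.456"));
        System.out.println(stored);         // 123456
        System.out.println(decode(stored)); // 123.456

        try {
            // 10^16 * 1000 = 10^19, which is larger than Long.MAX_VALUE.
            encode(new BigDecimal("10000000000000000"));
        } catch (ArithmeticException e) {
            System.out.println("Out of space: " + e.getMessage());
        }
    }
}
```

The sketch fails loudly on overflow because it uses longValueExact(); how a given translator reacts to an oversized value is something to verify against the version you use.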

Arbitrary Precision
Another potential drawback to using the objectify-supplied translator for BigDecimal is that the "factor" mentioned above must stay consistent: if it changes over time, existing entities that were written with the old factor will load with the wrong value.  This is not necessarily a showstopper -- it simply means that you need to know ahead of time the largest number of decimal digits you'll need to store and pick a factor large enough to accommodate that value before you start storing data into the datastore.  If you choose a factor of 100 (i.e., 2 decimal places) and you eventually need to move to a factor of 1000 (i.e., 3 decimal places), you'll need to convert your old entities to a format that uses the new factor.
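The effect of a changed factor is easy to see with a little arithmetic.  The following sketch again uses plain Java with hypothetical names (FactorMismatchSketch, decode); it only models the division step, and the migration shown is one possible approach rather than anything Objectify provides.

```java
import java.math.BigDecimal;

// Hypothetical illustration of reading a stored Long with the wrong factor.
class FactorMismatchSketch {

    // Turn a stored long back into a BigDecimal using the given factor.
    static BigDecimal decode(long stored, long factor) {
        return BigDecimal.valueOf(stored).divide(BigDecimal.valueOf(factor));
    }

    public static void main(String[] args) {
        long stored = 12345L; // 123.45 written with a factor of 100

        System.out.println(decode(stored, 100));  // 123.45
        System.out.println(decode(stored, 1000)); // 12.345 (wrong by a factor of 10)

        // One way to migrate: rescale the stored long by newFactor / oldFactor.
        long migrated = Math.multiplyExact(stored, 1000L / 100L);
        boolean sameValue =
            decode(migrated, 1000).compareTo(new BigDecimal("123.45")) == 0;
        System.out.println(sameValue); // true
    }
}
```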

A Potential Improvement: BigDecimalStringTranslatorFactory
A library called objectify-utils provides a potential improvement over the default translator when it comes to storing BigDecimal fields into the Datastore.  The translator class is called BigDecimalStringTranslatorFactory, and the most important difference between the two translators is telegraphed by the name:  the BigDecimalStringTranslatorFactory stores BigDecimals as an encoded String instead of as an encoded Long number.

This allows developers to overcome any space limitations -- String fields in the App Engine datastore allow for up to 500 characters, which leaves room for far larger numbers than a Long can hold.  Additionally, developers can start using arbitrary precision numbers throughout their code without having to know ahead of time how much precision they'll eventually need.

At first glance, BigDecimalStringTranslatorFactory might seem to solve some important potential problems introduced by BigDecimalLongTranslatorFactory.  However, storing numbers as a String can cause problems when it comes to sorting.  This is because there are often subtle instances where the lexicographical ordering of two Strings does not align with the natural ordering of two numbers.

A good example of this can be seen with the numbers 11.23 and 100.23.  Comparing these two, it's obvious that 11.23 is less-than 100.23.  However, if we convert these to their String equivalents, "11.23" and "100.23", then lexicographically speaking, "11.23" is greater-than "100.23" (the Strings are compared character by character, and the second character '1' sorts after '0'), which is counter-intuitive.  We've just lost our ability to have the Datastore natively sort these numbers for us.  :(
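The mismatch can be reproduced with nothing but the JDK.  This small sketch (the class name LexicographicPitfall is made up for illustration) compares the two values first as numbers and then as Strings:

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Shows that String ordering and numeric ordering can disagree.
class LexicographicPitfall {

    public static void main(String[] args) {
        BigDecimal small = new BigDecimal("11.23");
        BigDecimal large = new BigDecimal("100.23");

        // Numeric comparison: 11.23 is less than 100.23.
        System.out.println(small.compareTo(large) < 0);      // true

        // String comparison: "11.23" sorts AFTER "100.23",
        // because '1' > '0' at the second character.
        System.out.println("11.23".compareTo("100.23") > 0); // true

        // Sorting the Strings therefore puts the larger number first.
        String[] values = {"11.23", "100.23"};
        Arrays.sort(values);
        System.out.println(Arrays.toString(values));         // [100.23, 11.23]
    }
}
```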

Thank You, Peter Seymour and Ted Stockwell...
I've never met Peter or Ted, but thankfully they each spent a lot of time thinking about how to solve this issue.  Peter's work inspired Ted, who applied it more specifically to the realm of the App Engine Datastore in a paper called "A Pragmatic Method of Lexicographic Encoding Of Numbers."  Ted also wrote some Java code that implements the algorithms discussed in the two papers, and objectify-utils incorporates this via BigDecimalCodec.java.

In a nutshell, BigDecimalCodec encodes numbers into Strings, and these Strings have the same comparison attributes as their corresponding numbers.


Building off of that, BigDecimalStringTranslatorFactory can effectively store a BigDecimal as a String without breaking the native datastore sorting capabilities. (To reiterate, this was not an issue with BigDecimalLongTranslatorFactory because it was based upon a Long, but is instead an issue that BigDecimalStringTranslatorFactory would have needed to overcome in order to be useful).
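To give a feel for what "a String that sorts like its number" means, here is a deliberately simplified sketch.  It is NOT the algorithm from Peter's or Ted's papers and it is not how BigDecimalCodec works: it just left-pads the integer part with zeros to a fixed width, and it assumes non-negative values with at most 10 integer digits.  The class name, method name, and width are all hypothetical.

```java
import java.math.BigDecimal;

// Toy order-preserving encoding via zero-padding; NOT the BigDecimalCodec algorithm.
class PaddedEncodingSketch {

    // Assumed maximum number of digits before the decimal point.
    static final int INTEGER_DIGITS = 10;

    static String encode(BigDecimal value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("Negative values are not supported in this sketch");
        }
        // Drop trailing zeros so that 11.50 and 11.5 encode identically.
        String plain = value.stripTrailingZeros().toPlainString();
        int dot = plain.indexOf('.');
        String integerPart = dot < 0 ? plain : plain.substring(0, dot);
        String fractionPart = dot < 0 ? "" : plain.substring(dot); // keeps the '.'
        if (integerPart.length() > INTEGER_DIGITS) {
            throw new IllegalArgumentException("Too many integer digits for this sketch");
        }
        // Left-pad the integer part so every encoded value has the same integer width.
        StringBuilder encoded = new StringBuilder();
        for (int i = integerPart.length(); i < INTEGER_DIGITS; i++) {
            encoded.append('0');
        }
        return encoded.append(integerPart).append(fractionPart).toString();
    }

    public static void main(String[] args) {
        String small = encode(new BigDecimal("11.23"));
        String large = encode(new BigDecimal("100.23"));
        System.out.println(small);                      // 0000000011.23
        System.out.println(large);                      // 0000000100.23
        System.out.println(small.compareTo(large) < 0); // true
    }
}
```

The fixed width and the lack of negative-number support are exactly the kinds of limitations that a more careful encoding has to remove, so treat this only as an illustration of the goal.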

Key Features of BigDecimalStringTranslatorFactory
In summary, here are some of the key benefits of using BigDecimalStringTranslatorFactory:
  1. Large Numbers
    The translator can store encoded values whose size approaches 500 digits.  This could be a very small number with a large decimal value, or a very large number with or without a decimal value.
  2. Arbitrary Precision
    The translator allows for arbitrary precision numbers across all entity groups.  For example, one "Car" entity could store a "value" property of "2500.00" while another "Car" entity stores a value of "350.123", and the translator can handle each one.  Additionally, future values with more than 3 decimal places would automatically "just work."
  3. Correctly Indexable
    The translator stores all number values in an encoded String-format that is lexicographically equivalent to each corresponding numeric value when it comes to comparison.  This supports negative values, too.

More Details
For more details about this translator, check out the objectify-utils project.  And if you plan to store Money in your Datastore entities using Objectify, check out the follow-up to this article entitled, "Best Practices for Storing Joda-Money Values in the Google App Engine Datastore".
