Saturday, May 3, 2014

Best Practices for Storing Joda-Money Values in the Google App Engine Datastore

This article describes some best practices for storing Joda-Money currency values in App Engine using the objectify-appengine library.  It builds on a previous article called "Best Practices for Storing BigDecimal Values in the Google App Engine Datastore."


-------------------------------------------------------------------------------------

One of the best libraries around for manipulating currency values in Java is called Joda Money, which supplies two nice classes to represent currency: org.joda.money.Money and org.joda.money.BigMoney.  If you're not familiar with them, the biggest difference is that the BigMoney class is "...not restricted to the standard decimal places of a Money object, and can represent an amount to any precision that a BigDecimal can represent."

See the Javadoc for more details, but it's worth reiterating that both of these Joda classes are backed by BigDecimal, so before continuing you might want to explore some potential issues with storing BigDecimal values in the App Engine Datastore here.

Existing Options: Objectify
The objectify-appengine library is an all-around excellent tool in any App Engine developer's tool-kit.  It handles all of the typical Java types in the Datastore, and also has the ability to store custom data-types via its Translator framework.  This is a powerful feature of Objectify, and to this end the library comes with two optional translators that can handle joda-money values: MoneyStringTranslatorFactory and BigMoneyStringTranslatorFactory.

MoneyStringTranslatorFactory & BigMoneyStringTranslatorFactory
The MoneyStringTranslatorFactory and BigMoneyStringTranslatorFactory classes are great alternatives for storing joda-money values.  Each essentially translates a Money or BigMoney object into its .toString() equivalent when storing to the datastore, and then creates a new object from this String when loading the value from the datastore.  For example, a BigMoney object with the "USD" currency code and a value of "35.751" would be stored to the Datastore as the String "USD 35.751" (Joda-Money's toString() format is the currency code, a space, and the amount).

As the Javadoc states in the class definition, however, this implementation is not ideal for all use-cases.

Improper Native Datastore Comparison
For example, in Java a BigMoney object with a value of "USD 11.25" would be "less-than" a comparable BigMoney object with a value of "USD 100.25", which makes sense from a currency and number perspective.

However, when translated to their String-equivalents and then compared, the value "USD 11.25" is lexicographically "greater-than" the value "USD 100.25", which is counter-intuitive if you're not aware of how lexicographic String comparison works: Strings are compared character by character, and at the first difference the character '1' sorts after '0'.

While technically accurate as a String comparison, this is wildly incorrect from a currency perspective.  Negative values and divergent currency codes will tend to exacerbate this problem.  For example, "USD 35.00" is lexicographically "greater-than" "AUD 500.00" while numerically the 500 amount is greater than the 35 amount.
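The mismatch described above is easy to reproduce with plain Java Strings. This is only a sketch: the class and method names are made up for illustration, it assumes the stored form is the currency code, a space, and the amount, and it assumes the Datastore orders these ASCII Strings the same way String.compareTo does.

```java
import java.math.BigDecimal;

final class MoneyStringOrdering {
    // Compares two stored String forms character by character,
    // returning -1, 0 or 1.
    static int compareAsStrings(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Compares two plain amounts numerically, returning -1, 0 or 1.
    static int compareAsNumbers(String a, String b) {
        return Integer.signum(new BigDecimal(a).compareTo(new BigDecimal(b)));
    }

    public static void main(String[] args) {
        // Numerically 11.25 is smaller than 100.25 ...
        System.out.println(compareAsNumbers("11.25", "100.25"));         // -1
        // ... but the String form sorts after it.
        System.out.println(compareAsStrings("USD 11.25", "USD 100.25")); // 1
        // The currency code dominates the comparison entirely.
        System.out.println(compareAsStrings("USD 35.00", "AUD 500.00")); // 1
    }
}
```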

A Potential Improvement: JodaMoneyTranslatorFactory
A library called objectify-utils provides a potential improvement over the default translator when it comes to storing joda-money fields into the Datastore.  This new translator is called JodaMoneyTranslatorFactory, and is built upon the lexicographic encoding libraries discussed in another post here, which are used to ensure the proper storage of BigDecimal values in the App Engine Datastore.

At a high level, the main benefits of using JodaMoneyTranslatorFactory are that developers can use arbitrary precision money values (just like with BigDecimals) and can rely on consistent greater-than/less-than filtering as well as sorting of Money values natively via the Datastore.

Additionally, JodaMoneyTranslatorFactory separates the Currency code from the currency amount in the Datastore, allowing for finer-grained control over each data point, as well as potential sorting across money values with different currency codes (though the use-case for this is somewhat dubious since it is difficult to envision the usefulness of sorting money values in different currency codes).

A Final Note About Potentially Inconsistent Money/BigMoney Indexing
Another area that can be the source of stubborn inconsistencies when dealing with Money types in general is that of Money/BigMoney object instantiation.  

Since these classes are based upon a BigDecimal, it can be very easy to create two numbers with the same "amount", but with different decimal-place precision values.  An example can be seen in the number "$35.990" and the equivalent number "$35.99".  While numerically equivalent (depending on who you ask), these two numbers would be considered "not equivalent" both per the BigDecimal .equals() contract, and lexicographically when stored as an encoded String in the Datastore.

This may or may not make sense, depending on your point of view, but luckily Joda provides the #isEqual method which allows us to perform money comparisons independent of scale.  In this case, "$35.990" and "$35.99" would be considered equal.
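Since both Joda classes are backed by BigDecimal, the scale issue can be seen with BigDecimal alone. The sketch below uses only the JDK; the class and method names are hypothetical, and compareTo stands in here for the scale-independent comparison that the article attributes to Joda's #isEqual.

```java
import java.math.BigDecimal;

final class ScaleComparison {
    // equals() is true only when both the value and the scale match.
    static boolean sameValueAndScale(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    // compareTo() ignores scale and looks only at the numeric value.
    static boolean sameValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("35.990"); // scale 3
        BigDecimal b = new BigDecimal("35.99");  // scale 2
        System.out.println(sameValueAndScale(a, b)); // false
        System.out.println(sameValue(a, b));         // true
    }
}
```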

In the Datastore, however, we have no such mechanism, even when lexicographically encoding our Money values.  Thus, for purposes of native datastore indexing, the value "$35.990" would still be considered "greater-than" "$35.99".

In practice, this is probably not a big deal, because these two values would still be returned "next to" each other in a sorted result.  However, it's something to be aware of when using either Translator.

Best Practices for Storing BigDecimal Values in Google App Engine Datastore


This article describes some best practices for storing BigDecimal values in the App Engine Datastore using the objectify-appengine library.  The article provides a brief overview of existing options and highlights some drawbacks with each.  The article then outlines an option that overcomes many of the shortcomings of the current solutions while offering flexibility depending on use-case.


-------------------------------------------------------------------------------------

The Java BigDecimal class has many great use-cases, so when you want to store an entity with a BigDecimal property in the Google App Engine Datastore, you'll want to be aware of some potential pitfalls.

Existing Options: Objectify
The objectify-appengine library is an all-around excellent tool in any App Engine developer's tool kit.  It handles all of the typical Java types in the Datastore, and also has the ability to store custom data-types via its Translator framework.  This is a powerful feature of Objectify, and to this end the library comes with an optional translator that can handle BigDecimal types automatically.  It's called BigDecimalLongTranslatorFactory.  As the Javadoc states, however, this is just a simple implementation.  There are potential pitfalls to its usage, and this article explores some of the areas where a different solution might be preferred.

Potential Lack of Space
The first potential drawback to using BigDecimalLongTranslatorFactory is that it does not have the ability to store number values that would exceed the "space" of a Long number in the Datastore.

To explain this better, it's useful to understand that BigDecimalLongTranslatorFactory encodes a BigDecimal into a Long value by multiplying it by a pre-defined "factor" (configured when the translator is instantiated).  When loading the encoded value from the datastore, the Long number returned is divided by this "factor" to reproduce the whole-number and decimal portions of the BigDecimal object.

Since App Engine Long numbers are 64 bits long, this means we can have some pretty large numbers, but it's possible your use-case may run out of space, depending on the number being stored.
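To make the "factor" idea concrete, here is a rough sketch of that kind of encoding using only the JDK. It is not Objectify's actual code: the class and method names are hypothetical, and it assumes the factor is a power of ten so that the division on load is always exact.

```java
import java.math.BigDecimal;

final class LongFactorEncoding {
    // Multiplies by the factor and keeps the result as a long.
    // longValueExact() throws ArithmeticException if the scaled value
    // does not fit in 64 bits or still has a fractional part.
    static long encode(BigDecimal value, long factor) {
        return value.multiply(BigDecimal.valueOf(factor)).longValueExact();
    }

    // Divides the stored long by the same factor to get the value back.
    static BigDecimal decode(long stored, long factor) {
        return BigDecimal.valueOf(stored).divide(BigDecimal.valueOf(factor));
    }

    public static void main(String[] args) {
        long stored = encode(new BigDecimal("35.75"), 100L);
        System.out.println(stored);               // 3575
        System.out.println(decode(stored, 100L)); // 35.75
        try {
            encode(new BigDecimal("1e20"), 100L);
        } catch (ArithmeticException e) {
            System.out.println("does not fit in a long");
        }
    }
}
```

With a factor of 100, any amount whose scaled value exceeds Long.MAX_VALUE (roughly 9.2 * 10^18) cannot be stored this way.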

Arbitrary Precision
Another potential drawback to using the objectify-supplied translator for BigDecimal is that the "factor" mentioned above must be consistent, so changes to it over time would seem to corrupt existing entities that require the old factor.  This is not necessarily a showstopper -- it simply means that you need to know ahead of time the largest number of decimal digits you'll need to store and pick a factor large enough to accommodate that value before you start storing data into the datastore.  If you choose a factor of 100 (i.e., 2 decimal places) and you eventually need to move to a factor of 1000 (i.e., 3 decimal places), you'll need to convert your old entities to a format that uses the new factor.
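The effect of changing the factor can be sketched in a few lines. Again, this only illustrates the arithmetic and is not Objectify's implementation; the names are hypothetical, and the migrate helper assumes the new factor is a whole multiple of the old one.

```java
import java.math.BigDecimal;

final class FactorMismatch {
    // Reads a stored long back using the given factor.
    static BigDecimal decode(long stored, long factor) {
        return BigDecimal.valueOf(stored).divide(BigDecimal.valueOf(factor));
    }

    // Rescales a stored long from the old factor to the new one.
    // multiplyExact throws ArithmeticException on overflow.
    static long migrate(long stored, long oldFactor, long newFactor) {
        return Math.multiplyExact(stored, newFactor / oldFactor);
    }

    public static void main(String[] args) {
        long stored = 3575L; // 35.75 written with a factor of 100
        System.out.println(decode(stored, 100L));  // 35.75
        System.out.println(decode(stored, 1000L)); // 3.575 (wrong factor)
        long migrated = migrate(stored, 100L, 1000L);
        System.out.println(decode(migrated, 1000L)); // 35.75
    }
}
```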

A Potential Improvement: BigDecimalStringTranslatorFactory
A library called objectify-utils provides a potential improvement over the default translator when it comes to storing BigDecimal fields into the Datastore.  The translator class is called BigDecimalStringTranslatorFactory, and the most important difference between the two translators is telegraphed by the name:  the BigDecimalStringTranslatorFactory stores BigDecimals as an encoded String instead of as an encoded Long number.

This allows developers to overcome any space limitations -- String fields in the App Engine datastore allow for up to 500 characters, which leaves room for far larger numbers than a Long can hold.  Additionally, developers can start using arbitrary precision numbers throughout their code without having to know ahead of time how much precision they'll eventually need.

At first glance, BigDecimalStringTranslatorFactory might seem to solve some important potential problems introduced by BigDecimalLongTranslatorFactory.  However, storing numbers as a String can cause problems when it comes to sorting.  This is because there are often subtle instances where the lexicographical ordering of two Strings does not align with the natural ordering of two numbers.

A good example of this can be seen with the numbers 11.23 and 100.23.  Comparing these two, it's obvious that 11.23 is less-than 100.23.  However, if we convert these to their String equivalents, "11.23" and "100.23", then lexicographically speaking, "11.23" is greater-than "100.23", which is counter-intuitive.  We've just lost our ability to have the Datastore natively sort these numbers for us.  :(
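The same effect shows up when sorting a whole list. The sketch below sorts the same values once as Strings and once as numbers; the class name and sample values are illustrative only, and it assumes a String sort in Java is a fair stand-in for how an index on a plain String property would order them.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class StringVersusNumericSort {
    // Sorts the values the way plain Strings are ordered.
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Sorts the same values by their numeric meaning.
    static List<String> sortAsNumbers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort((a, b) -> new BigDecimal(a).compareTo(new BigDecimal(b)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("11.23", "9.5", "100.23");
        System.out.println(sortAsStrings(values)); // [100.23, 11.23, 9.5]
        System.out.println(sortAsNumbers(values)); // [9.5, 11.23, 100.23]
    }
}
```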

Thank-you Peter Seymour and Ted Stockwell...
I've never met Peter or Ted, but thankfully they each spent a lot of time thinking about how to solve this issue.  Peter's work inspired Ted, who applied it more specifically to the realm of the App Engine Datastore in a paper called "A Pragmatic Method of Lexicographic Encoding Of Numbers." Ted also wrote some Java code that implements the algorithms discussed in the two papers, and objectify-utils incorporates this via BigDecimalCodec.java.

In a nutshell, BigDecimalCodec encodes numbers into Strings, and these Strings have the same comparison attributes as their corresponding numbers.
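To give a feel for what "same comparison attributes" means, here is a deliberately simplified stand-in. It is not the BigDecimalCodec algorithm and does not handle decimals or arbitrary size: it only encodes a Java long, using a well-known trick (flip the sign bit, then write fixed-width hex) so that String order matches numeric order. The class and method names are hypothetical.

```java
final class OrderPreservingLongCodec {
    // Flipping the sign bit makes negative values sort before positive
    // ones when the bits are read as unsigned; fixed-width lowercase hex
    // then makes String order agree with that unsigned order.
    static String encode(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Reverses encode(): parse the hex as unsigned, flip the sign bit back.
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(encode(11L).compareTo(encode(100L)) < 0);  // true
        System.out.println(encode(-100L).compareTo(encode(-5L)) < 0); // true
        System.out.println(encode(-5L).compareTo(encode(3L)) < 0);    // true
        System.out.println(decode(encode(-42L)));                     // -42
    }
}
```

The real codec has to go further than this, since it must cover arbitrary precision and decimal digits, but the goal is the same: comparing the encoded Strings gives the same answer as comparing the numbers.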


Building off of that, BigDecimalStringTranslatorFactory can effectively store a BigDecimal as a String without breaking the native datastore sorting capabilities. (To reiterate, this was not an issue with BigDecimalLongTranslatorFactory because it was based upon a Long, but is instead an issue that BigDecimalStringTranslatorFactory would have needed to overcome in order to be useful).

Key Features of BigDecimalStringTranslatorFactory
In summary, here are some of the key benefits of using BigDecimalStringTranslatorFactory:
  1. Large Numbers
    The translator can store encoded values whose size approaches 500 digits.  This could be a very small number with a large decimal value, or a very large number with or without a decimal value.
  2. Arbitrary Precision
    The translator allows for arbitrary precision numbers across all entities.  For example, one "Car" entity could store a "value" property of "2500.00" while another "Car" entity could store a value of "350.123", and the translator can handle each one.  Additionally, future values with more than 3 decimal places would automatically "just work."
  3. Correctly Indexable
    The translator stores all number values in an encoded String-format that is lexicographically equivalent to each corresponding numeric value when it comes to comparison.  This supports negative values, too.

More Details
For more details about this translator, check out the objectify-utils project.  And if you plan to store Money in your Datastore entities using Objectify, check out the follow-up to this article entitled, "Best Practices for Storing Joda-Money Values in the Google App Engine Datastore".

Monday, March 29, 2010

OInvite Draft-3 Released

It's been a while since I've written about OInvite (or anything for that matter). But a few days ago, I had the chance to spend some more time updating the OInvite spec. Below is an overview of some things that have changed. For more information and to read the updated spec, check out http://oinvite.net.

OInvite Core
First of all, I've decided to create a base specification called OInvite Core. This is what I'm referring to when I advertise "Draft 3". OInvite Core defines an XML-based request/response document format to define the parameters surrounding an invitation. This core spec is meant to be a baseline for extension profiles of OInvite that can work appropriately under different situations. For example, there will be an HTTP profile of OInvite that specifies discovery and other aspects which work well for web applications. At the same time, I'm envisioning it might make sense to have a slightly different profile for XMPP, Google Wave, and possibly something for regular email (heavily underline the "possibly" in that one).

OInvite Verification Extensions
The core spec provides a brief outline of the process a server should follow before prompting a user that a new invitation has been received. This is known as OInvite verification, but the exact details of how or what to verify are left to extensions to define.

Spam Prevention
You may notice that previous versions of OInvite had a Proof-of-Work scheme defined that worked to reduce and/or eliminate spam invitations. I'm still working to define this, but it will be in the form of an optional extension. Best of all, the core spec includes a way for servers to advertise any verification extensions in a programmatic way. Since these extensions are optional, individual implementors will have lots of freedom to experiment with the best verification mechanisms to protect from spam, verify senders & receivers, and ensure that every aspect of a particular OInvite has what it needs to work with a recipient's systems.

OInvite over HTTP
Next on my list is to create an HTTP profile of OInvite that relies on OpenID+WebFinger for authentication, XRD for discovery, and OAuth for authorization. This latter piece is going to be very interesting. It's starting to look like OInvite could be an automatic way for two users to exchange OAuth authorization tokens, which could open up some interesting new possibilities for private resource sharing across domains.

More to come!

Wednesday, June 3, 2009

Announcing OInvite

<Some Context>
A while back, I posed a question to the Diso mailing list wondering if there was any work being done to enable cross-domain "friend requests" in the context of social networks and other systems that have the notion of "friend lists".
</Some Context>

As I soon found out, this is a pretty involved topic, so I collected my thoughts and shared them with the world in a blog post here (I recommend reading that post for a lengthy--if surfacey--background relating to the issues surrounding friend requests, open vs. closed communications relationships, and my general musings on this topic).

At the same time, I formalized some more thoughts surrounding this idea into a spec I'm calling OInvite. Perhaps "spec" is too formal of a word. In reality, I just like to use the xml2rfc tool, so it was more enjoyable to try to hone my ideas in the confines of a specification.
To say that this document is anything close to a spec would be an overstatement!

:)

At any rate, the idea behind OInvite is to codify an open protocol that helps facilitate the creation of unsolicited "communications relationships" (i.e., friend requests) between various parties in unaffiliated domains without the worry of spam.

Some key attributes of OInvite (see my musings post here for more background on what these terms mean):
  1. Open: Communications Relationship ("CR") requests can be sent to users with identifiers in unaffiliated domains or federations.

  2. Bi-Directional: OInvite assumes all CR's are two-way, meaning information can be sent and received by both parties (so long as participants can elect to ignore messages from a particular sender, uni-directional relationships can be simulated in a bi-directional CR).

  3. One-to-One: OInvite always involves a CR between only two participants (a.k.a., cardinality) because this greatly simplifies the spec, yet still supports one-to-many communications.

  4. SPAM-Free: OInvite provides support for a pluggable mechanism to guard against "first-contact" spam, which is the only type of SPAM available in a white-list type communications environment, such as a social network (in systems like email, this pluggable model is not sufficient to prevent spam due to the implied "all senders are good" principle underlying the design of SMTP).
Check out the specification, and feel free to share your thoughts in the OInvite discussion group or on the Diso list.

The Case for an Open "Friend Request" Protocol to Enable Communications Nirvana 1.0

A while back, I posed a question to the Diso mailing list wondering if there was any work being done to enable cross-domain "friend requests" in the context of social networks and other systems that have the notion of "friend lists".

There was some interesting discussion surrounding this topic, and I've been meaning to solidify my thoughts around this whole idea. Given that I love to utilize the xml2rfc tool, I decided to just codify my thoughts into a formal specification. However, I quickly realized that I was creating a lot of new terminology, so this blog post is my attempt to provide some background explanation about the "what" and "how" of cross-domain friend requests, as well as to illuminate my new proposed specification, OInvite.

So here goes...

A "Friend Request"?
First, you might be wondering, "What exactly is a 'friend request', anyway?" Well, for my purposes I will use the following definition:

Friend Request: A request from an invitor (the sender of the request) to an invitee (the recipient of the request) to enter into a "communications relationship."

OK, there's some more new terminology.

First, the notion of a "communications relationship" is purposefully left quite vague. Perhaps two users (let's call them "John" & "Beth") simply want to email each other. That's certainly an obvious form of communication. However, communications relationships are much bigger than this. Imagine if John wants to allow Beth to see his photographs on Flickr...that's communication, too, and it generally involves some acceptance of this communication.

Thus, a "communications relationship" is the term I invented to describe an agreement between two parties to send and receive information.

In tandem, an "OInvite" is a request to enter into such a relationship.

In Facebook terms, this type of thing is generally referred to as a "friend request"; in Twitter terms, a "follow me" request; in OAuth terms, it's an "I want to share data with you" request, etc.

One-to-One? Uni-Directional? Bi-Directional?
As we travel down the rabbit-hole of "communications relationships", we quickly encounter the notion of directionality. For example, what if John only wants to share (read: send) information with Beth, but doesn't really care to receive information from Beth? Such a relationship would be considered "unidirectional", because information only flows one way. However, if John wants to both send and receive information to/from Beth, then a bi-directional communications relationship would be required.

In addition to directionality, "friend requests" might be sent to a single individual, or they might be sent to a group of invitees (a.k.a., recipients). This is the notion of Invitation cardinality, and it quickly adds to the complexity of a communications relationship. For example, should an invitation be "one-to-one" (an invitation from one user to a different user), "one-to-many" (an invitation from one user to multiple users), or "many-to-many" (I'm not even sure if this is possible)?

;)

Closed vs. Open Communications Relationships
Going even deeper, there's one last area of communications relationships that should be considered/defined, and that is the notion of an "open" vs. "closed" communications relationship.

On today's Web, we are commonly exposed to "closed" communications relationships. Facebook users can invite other Facebook users to become "friends", thereby allowing communications of various types (messages, photos, activity-streams, etc). Twitter users can only "follow" other Twitter users; MySpace users are limited to communicating with other MySpace users, and so on. These are all considered to be "closed" relationships because a Facebook user (for example) cannot easily "become friends" with a user from Twitter without that Twitter user first having a Facebook account.

Conversely, the notion of an "open" communications relationship would allow Facebook users to "friend" users on any other social-network (assuming the two networks could speak the same protocol). For example, John (on Facebook) might invite Beth (on Orkut) to "share information".

We don't see this type of thing today for a myriad of reasons. For one, most social networking sites are closed ecosystems, so they have never bothered to support a "hook" into other social networks (this is beginning to change, e.g., Facebook Apps).

However, even as social networking websites "open up" their content (and access to that content), they still require an account on the "home" social network. In the example above, Beth (on Orkut) would need an account on Orkut as well as on Facebook to have John's Facebook data show up in Orkut.

These "silos" exist for a number of reasons (competitive, economic, technological, etc), but at the core they exist because there is no protocol that will allow Facebook (for example) to control the number of invitations coming from an infinite number of other social networks. In essence, even if the Internet community could overcome all the hurdles standing in the way of "open" social networking, there would still be the great white elephant in the room: SPAM.

Now, I'm not talking about the kind of spam everyone is familiar with in the email world. Instead, I'm talking about Invitation Spam.

Social Networks don't allow communication between users until each user has given some sort of "approval" to be communicated with (e.g., acceptance of a "friend request"). When a social network provider controls both ends of this interaction, that provider can easily throttle invitations, remove "bad" accounts, and more; thus controlling and eliminating SPAM of any kind.

However, in a truly open world, the ability to control the "bad guys" goes away, demanding some other mechanism that will both 1.) allow a nearly infinite number of cross-domain social networks to interact with users on the local social network (i.e., a completely open Facebook) and 2.) prevent unwanted invitation SPAM from ruining the entire user experience for legitimate users.

Email: Truly (Bad) Open Communications Relationships
O.K., so there's a lot to digest there...but we're not quite done yet. One last thing to consider before delving into the innards of OInvite is the current SMTP-based email system.

SMTP provides for cross-domain information sharing, or "open" communications relationships. In fact, the current email system is perhaps the most successful (and unsuccessful) "open" communications mechanism ever created in the history of the world. For example, messages from an email address in one domain can be freely sent and received to/from email addresses in another domain. As we all know, this is a blessing and a curse, enabling unfettered communication on the one hand, while at the same time enabling mass amounts of SPAM on the other.

Email: Communications Nirvana 0.1
Despite its shortcomings, email is pretty amazing. It has enabled an incredible amount of asynchronous communications, productivity (perhaps that's debatable), and more. However, if there is such a thing as "Communications Nirvana", then email is version 0.1 because it has, despite its open-ness, three significant shortcomings--all of which ultimately have to do with SPAM.

First, email lacks the ability to properly authenticate the sender of a message. I can send an email purporting to be from john@example.com, and it's somewhat difficult to verify whether or not "John" was the actual author of the message.

Editor's Note: The Internet is making progress in this area: SPF, DomainKeys, etc are all positive moves, but they're not universally required nor adopted, making SMTP fundamentally susceptible to this vulnerability.

Second, email puts all of the resource burden onto the recipient of a message, allowing senders to trivially send mass amounts of email at virtually no cost. Bandwidth, CPU resources, and intermediate data storage for email messages are borne by ISPs, and likewise by each recipient, making it very inexpensive to send copious quantities of SPAM.

Third, email has no concept of a "friend list". Email was designed from the perspective that a particular user should receive all incoming messages. Today, this principle still holds, unless a message, sender, or domain is specifically blocked by a SPAM filter.

Social Networks: Communications Nirvana 0.5
So, if email is Communications Nirvana 0.1, then social networks are version 0.5. They solve issues #1, #2, and #3 above; but at the expense of open-ness (e.g., we can't send arbitrary messages to a Facebook account from a system in a different domain).

Yet, social networks are still pretty incredible. Virtually no spam; resource burdens borne by the network provider; and the concept of a friend-list that means I'm only receiving information that I elect to receive (for the most part -- I can't say that I really care about Facebook activity updates from 200 'friends', but again -- perhaps the topic of another post).

The Future: Communications Nirvana 1.0
Well, if you're still reading this, then congratulations--I'm now ready to make a point, which is this: If you buy the argument that social networks are an improvement over email, then the last remaining hurdle to getting to a 1.0 version of Communications Nirvana (in my opinion) is to "open" up social networks. This is what the Diso project is working towards, but it seems to me that as long as we sacrifice "open-ness", we'll never make it to a 1.0 version of Communications Nirvana, let alone a version 2.0, 3.0, or 10.0.

To become "open", we require a mechanism to enable cross-domain invitation requests which will allow us to enter into communications relationships with each other. In layman's terms, we need an open "Friend Request" mechanism that solves the problem of first-contact SPAM.

My answer is OInvite--an open protocol to enable cross-domain "friend requests" with a pluggable anti-spam mechanism--and since this blog post is so stinking long, I'm going to discuss the actual protocol in a different post, namely this one.

<Musing>
Why do we express incredulity when remembering the Internet of ~16 years ago, where for a long time people with an AOL email address could not send email to people with a CompuServe address? Today, we suffer this same abuse when it comes to the communications platform du-jour, namely, social networks.

Imagine 16 years from now when we tell our children how it "used to be":
Dad: "...in my day, you needed an account at every social network provider in order to be able to communicate with people there..."
Daughter: "Seriously? You're kidding, right?"
Dad: "Yep, if I wanted to talk to you on Facebook, I had to sign-up there. Then, if I wanted to follow you on Twitter, I had to sign up there..."
Daughter: "What's Facebook?"
</Musing>

Tuesday, June 2, 2009

Google (Tidal) Wave is here...with lots of questions.

If you haven't yet heard, you will -- Google Wave is here. It's a pretty amazing thing, but prompts many questions in its wake (pardon the pun).

This blog entry details just a few of the provocative questions that Wave raises:
http://storm.alert.sk/blog/2009/06/02/Good-Vibrations

Friday, December 5, 2008

Great Synopsis of the state of Web Metadata Discovery

Eran Hammer-Lahav has posted a great synopsis of the state of Web Metadata Discovery.

Update: Eran has blogged about the Discovery Protocol Stack here.