Business Semantic Interoperability – Part 2

In part 1, I talked about the technical solutions that address interoperability at the interchange level, or more precisely, at the message standards level. Sadly, message developers in different standards organizations, such as OASIS (UBL), OAG, GS1 (formerly EAN-UCC), SWIFT, UN/CEFACT and ASC X12, missed the golden opportunity of creating a single standardized core component library (CCL), such as the UN/CEFACT CCL, for global and cross-sectoral use. Instead, we have many libraries that are based on the ebXML Core Component Technical Specification (CCTS) but are still not 100% interoperable because of the flexibility of the specification. The good news is that the core component names in all of those libraries follow the same strict naming convention, outlined in the CCTS, which results in perfect technical names. The bad news is that most implementers, especially those not involved in message development, use their own domain-specific names, which in most cases are very different from the perfect technical names.

What is needed to bridge the gap between the standardized semantics and everyday, domain-specific semantics is a reference work that contains everyday business terms with their definitions, groups them according to similarity of meaning (including synonyms and sometimes antonyms), and links each term to the perfect technical names used by the various message standards in their libraries and/or directories.

Such a reference work was first envisioned in the early 1990s, when it was called the Basic Semantic Repository (BSR). With the publication of ISO/TS 16668 in 2000, it became the Basic Semantic Register. However, the TS was withdrawn in 2004 for reasons I still don’t understand.

I am not proposing to resurrect the BSR; however, there is much we can use from the work of the BSR and from other efforts, such as the Universal Data Element Framework (UDEF), which provides a foundation for building an enterprise-wide controlled vocabulary. What I am proposing is to create a Business Semantic Thesaurus (BST) that contains the everyday terms used across all domains, both domestically and globally. Let’s not forget that the same business concept may have different names in different domains. To ensure interoperability between those domains, implementers must recognize those differences when trying to figure out what the perfect technical name within a given message specification actually is.

To show what I am trying to convey, I always start with the example of ‘delivery date’, which has different meanings in the shipping industry and in health care. Even within the same industry, there are different terms that either have exactly the same meaning or are very close in meaning, such as:

earliest, estimated, approximate, projected, proposed delivery date

There are a number of colloquial terms for postal code:

  • postal code – The general term; it is used as-is in Canada.
  • postcode – This portmanteau is popular in many English-speaking countries.
  • ZIP code – The standard term in the United States and the Philippines; ZIP is an acronym for Zone Improvement Plan.
  • PIN code/pincode – The standard term in India; PIN is an acronym for Postal Index Number.

The first two terms are very similar, but how would anyone know that ZIP code and PIN code name the same concept?
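To make the idea concrete, here is a minimal Python sketch of how a thesaurus entry might group the postal-code terms above under a single concept, so that a lookup for any of them lands on the same definition. The concept identifier, the definition wording and the technical-name value are hypothetical placeholders, not entries from any actual CCL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One business concept together with the everyday terms that name it."""
    concept_id: str                  # hypothetical BST identifier
    definition: str
    terms: set[str]                  # everyday synonyms, stored in lowercase
    technical_names: list[str] = field(default_factory=list)  # links into message libraries


def normalize(term: str) -> str:
    """Casefold and collapse whitespace so 'ZIP  Code' and 'zip code' match."""
    return " ".join(term.casefold().split())


def build_index(concepts: list[Concept]) -> dict[str, Concept]:
    """Map every everyday term to the concept it names."""
    index: dict[str, Concept] = {}
    for concept in concepts:
        for term in concept.terms:
            index[normalize(term)] = concept
    return index


POSTAL_CODE = Concept(
    concept_id="bst:postal-code",  # hypothetical identifier
    definition="A code assigned by a postal authority to identify a delivery area.",
    terms={"postal code", "postcode", "zip code", "pin code", "pincode"},
    technical_names=["<CCTS dictionary entry name>"],  # placeholder, not a real CCL entry
)

index = build_index([POSTAL_CODE])
```

With this in place, `index.get(normalize("ZIP code"))` and `index.get(normalize("PIN code"))` return the same concept, which is exactly the correlation an implementer cannot make from the terms alone.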

And last, but not least, the principal administrative division of certain countries:

territory, region, state, province, department, canton, area, district, sector, zone, division

So far, all examples are in English. Add other languages to the mix and things get even more complicated; for example, relating the German PLZ (Postleitzahl) to the ZIP and PIN acronyms.

The bottom line: what is needed to help implementers, especially those not involved in standards development (which is most of them), is a solution that allows users to search for their domain-specific (everyday) name and be presented with its definition and any existing related reference links within their own and/or other domains. The related references may be exact matches or narrower or broader terms. The solution should also support multilingual entries, so that global implementers who are not familiar with the English names mostly found in international message definitions get help with the translation.
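One way to picture such a search is the Python sketch below. It borrows the labelling and mapping vocabulary of W3C SKOS (preferred and alternative labels per language, plus exact, close, broader and narrower matches), but it is a plain in-memory illustration, not an RDF or SKOS implementation. Every concept identifier, label, definition and link target is a hypothetical placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Relation names borrowed from the SKOS mapping properties.
RELATIONS = ("exactMatch", "closeMatch", "broadMatch", "narrowMatch")


@dataclass
class Entry:
    concept_id: str
    definition: str
    pref_label: dict[str, str]                                      # language -> preferred term
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # language -> synonyms
    mappings: dict[str, list[str]] = field(default_factory=dict)    # relation -> target ids


def _norm(term: str) -> str:
    return " ".join(term.casefold().split())


class Thesaurus:
    def __init__(self, entries: list[Entry]):
        self.entries = {e.concept_id: e for e in entries}
        self.index: dict[str, list[tuple[str, str]]] = {}  # term -> [(concept_id, language)]
        for e in entries:
            labels = list(e.pref_label.items())
            labels += [(lang, t) for lang, ts in e.alt_labels.items() for t in ts]
            for lang, term in labels:
                self.index.setdefault(_norm(term), []).append((e.concept_id, lang))

    def search(self, term: str, display_lang: str = "en") -> list[dict]:
        """Return definition, display label and related links for an everyday term."""
        results = []
        for concept_id, matched_lang in self.index.get(_norm(term), []):
            e = self.entries[concept_id]
            results.append({
                "concept": concept_id,
                "matched_language": matched_lang,
                "label": e.pref_label.get(display_lang, e.pref_label.get("en")),
                "definition": e.definition,
                "related": {r: e.mappings[r] for r in RELATIONS if r in e.mappings},
            })
        return results


# Hypothetical sample content.
entries = [
    Entry(
        concept_id="bst:postal-code",
        definition="A code assigned by a postal authority to identify a delivery area.",
        pref_label={"en": "postal code", "de": "Postleitzahl"},
        alt_labels={"en": ["postcode", "ZIP code", "PIN code", "pincode"], "de": ["PLZ"]},
        mappings={"exactMatch": ["example-ccl:PostalCode"]},  # hypothetical target
    ),
    Entry(
        concept_id="bst:estimated-delivery-date",
        definition="The date on which delivery is expected to take place.",
        pref_label={"en": "estimated delivery date"},
        mappings={"closeMatch": ["bst:projected-delivery-date"],
                  "broadMatch": ["bst:delivery-date"]},
    ),
]

thesaurus = Thesaurus(entries)
```

Searching for `PLZ`, for example, finds the postal-code concept through its German alternative label and returns the English preferred label, the definition and the `exactMatch` link, which is the kind of cross-language, cross-domain lookup described above. Keeping close, broader and narrower matches as separate relations avoids silently treating near-synonyms such as the delivery-date variants as identical.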

I am not aware of any project currently underway that addresses this problem. So the question arises: how do we get such a solution? The simple answer would be to let UN/CEFACT or ISO work on it. The problem is that not only are their resources limited, but it would also take a long time to get to a working BST, since work would first be done on the “specification” that defines the system. Based on past experience, that would take at least 2 years, if not more, before any content could be created.

The other alternative is to create an open-source project to get this underway. Depending on the volunteers, and by using existing semantic work such as SKOS and XDXF, content could be created very soon. SKOS is an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, and XDXF is a project to unite all existing open dictionaries and provide both users and developers with a universal XML-based format. Neither one has the complete answer, but using the ‘best of breed’ concept as a starting point, it will not take long to define the details of the BST.

I am already in contact with a number of colleagues who are eager to work on this project. If you are interested, please let me know. I will provide updates as work progresses on getting the project up and running. It already has a name: Lingumatic. It is very difficult these days to come up with a name that is related to the project and can also be used as an internet domain name. Playing a bit with Latin, that is the best we could come up with. So please come back soon to see where this ends up.