Application logging using global inception identifier

March 7, 2015

For multithreaded programs that write to log files a best practice is to include a tracking ID. What should this ID be and how to use it? The following is presented as a ‘design pattern’.


For machine readability and tool use, a non-business related global inception ID (GIID) should be used for every log output. This ID is a combination of an ‘origin id’ and ‘unique id’. When each new business level or operational id is obtained or generated, it is logged to provide an ‘association’ with this GIID. This is similar to various existing practices so it is presented as a Software Design Pattern.

Author: Josef Betancourt, Jan 11, 2015. Originally posted here.

CR Categories: H.3.4 [Systems and Software]; D1.3 [Concurrent Programming]; D.2.9 [Management]; K.6 [MANAGEMENT OF COMPUTING AND INFORMATION SYSTEMS]; K.6.3 [Software Management]

  1. Context
  2. Forces
  3. Solution
  4. Consequence
  5. Implementation
  6. Appendix
  7. Further reading


App Logs

An application log file records runtime programmatic events: details, exceptions, debug, and performance related measures. This is different than specialized server log files, such as a webserver’s access logs, error.log, etc. The latter are more standardized, for example with W3C Extended Log File Format, and well supported.

App logs usually capture details at specific logging levels and named contexts. In the Java ecosystem there are plenty of libraries to support this and now the JDK supports this as part of the java.util.logging package.

Despite the advances made in logging APIs and support within operating systems and frameworks, app logs are at a primitive level of software maturity. What A. Chuvakin and G. Peterson describe as the “… horrific world of application logging” is composed of ad hoc, minimally documented, unstructured, untested, and under-specified delivered components.

Attempts to create widely used industry standards have failed and every business, software project, dev team, or industry reinvents and attempt to tackle the same problems.


In the context of server app logs, multiple sessions will output log information that can be intermixed. These sessions can be part of an ongoing process, such as a user interaction with a web site.

External integration points (web services, database, etc) may also be invoked. Unless each session is identified in the log and integration logs, subsequent support, debug, and auditing are very difficult.

The problem is not just tracking where and when ‘integration’ occurred or its non-functional integration criteria (i.e., timing), but the tracking of subsequent logging, if any, at that location.

App logs are used extensively during development. Their importance is illustrated by an old mime “debuggers are for wimps”. As such, logs with impeccable tracking used for design and test are a good Return On Investment (ROI).

The same is true for deployed systems. In many cases the only information available on historical behavior is in a log file.

This seems like a programming 101 lesson, but it is widely disregarded in practice. That log file output is a minor concern and sometimes not even part of a “code review” is puzzling.


1. A service must invoke a distributed call to another system. The service has retry logic, and logs each failure. If each log output does not identify the session or operation, the retries could get mixed with other threads. Identifying an end user’s request is a hit or miss bracketing of time stamps if the development team did not enough identifiable data in each log output.

2. A computer savvy end user or family may attempt to register into your system with multiple browsers simultaneously. This could cause problems if multiple logins are supported and an error occurs. How do you track this and debug it?

3. The app server makes a remote call to a service integration point and that service fails. How is the owner of that service informed as to the specific invocation? There are probably deployed systems where one would have to compare time stamps on log output to even coordinate where the two systems communicated and even then it is vague. Some systems may not even do any logging and the unless there is a fault of some kind.

4. You have to identify time periods based on hazy user complaints, search through multiple log files with regular expressions, then walk each output to recreate a specific error scenario. Isn’t this manual drudgery what computers were supposed to eliminate?


Global Inception ID

Logging with Unique identifiers is encouraged as a best practice:

“Unique identifiers such as transaction IDs and user IDs are tremendously helpful when debugging, and even more helpful when you are gathering analytics. Unique IDs can point you to the exact transaction. Without them, you might only have a time range to use. When possible, carry these IDs through multiple touch points and avoid changing the format of these IDs between modules. That way, you can track transactions through the system and follow them across machines, networks, and services.” —

This unique identifier is generalized so that on first entry into a system or the start of a process, A Global Inception ID (GIID), is assigned to distinguish that ongoing process from others. A more descriptive term would be a Global Tracking ID, but that conjures up privacy and security concerns and is already being used in commerce for a different purpose. But ‘inception ID’ brings up visions of barcodes on people’s foreheads. Ok, how about “bludzwknxxkjysjkajskjjj”?

The term “Global” is to indicate that this ID is unique in a specific enterprise system. The uniqueness comes from its creation on a first contact basis on a specific subsystem. In essence this is a log tracking ID.

For example, a web server or an app server would be the first point of contact or request from an external User. The GIID, consisting of a combination of origin id and a unique id, would be created at this point. GIID ::= originID uniqueID

In article “Write Logs for Machines, use JSON” Paul Querna uses the term “txnId” for this type of ID:

“… this is essentially a unique identifier that follows a request across all our of services. When a User hits our API we generate a new txnId and attach it to our request object. Any requests to a backend service also include the txnId. This means you can clearly see how a web request is tied to multiple backend service requests, or what frontend request caused a specific Cassandra query.”

Another term for this GIID, or ‘txnId’ is Correlation ID. This terminology is used in SharePoint.

The correlation ID is a GUID (globally unique identifier) that is automatically generated for every request that the SharePoint web server receives.

Basically, it is used to identify your particular request, and it persists throughout further communications, if any, between the other servers in the farm. Technically, this correlation ID is visible at every level in the farm, even at a SQL profiler level and possibly on a separate farm from which your SharePoint site consumes federated services. So for example, if your request needs to fetch some information from an application server (say, if you are using the web client to edit an Excel spreadsheet), then all the other operations that occur will be linked to your original request via this unique correlation ID, so you can trace it to see where the failure or error occurred, and get something more specific than “unknown error”. —

Various ‘Structured Logging’ efforts or syslog implementations already contain a ‘sending’ field specification. The GIID incorporates the sender id as the Origin ID, and this combination is more amendable to human and textual tools parsing.




A good candidate for a GIID must be large enough to satisfy uniqueness requirements. This could be, for example, a 36 character field. Where the log files are manually inspected with a text editor, this increases the log line which already contains many common fields like a time stamp.


Unintentionally, “bad” logging practices makes it harder to track and correlate personally identifiable information (PIN). With the use the trans-system GIID, correlation between various business related identifiers is made easier.

The correlation ID is not necessarily a secret, but like other tracking objects like cookies, can be used for information discovery or questionable information storage. But, if an attack can already access your log files, there are other more serious issues?


What determines the first contact subsystem? A true distributed system could be configured or dynamically configured so that any system could be the first contact system. If so, then each subsystem is creating GIID and passing that GIID to other systems that are themselves creating GIIDs.

One approach to handle this is that a system will only create a GIID if none is present in the incoming request.


For user interfaces, exposing the GIID or parts of it in exception situations can be beneficial:

“We also send the txnId to our user’s in our 500 error messages and the X-Response-Idheader, so if a user reports an issue, we can quickly see all of the related log entries.” —

Compare this to the Hunt The Wampus adventure in enterprises that only have an approximate time of an issue and must track this over multiple systems.


If a giid is part of a support system and as above the ID would be shared with Users in some way, would the value need some form of validity testing? Should it be tested that it is wellformed and include a checksum?

Example crc calculation for a UUID, based on textual version of id:

groovy -e " crc = new; crc.update(UUID.randomUUID().toString().getBytes());println Long.toHexString(crc.getValue())"



Origin ID

An OID uniquely identifies a system in an enterprise. This could be a web server or messaging system. Using a code for the actual system is recommended. Thus, instead of Acctsys, it would be a code, PE65K for example. Using a code is more ‘durable’ than a human readable name.

An extension is to also encode other information in the origin ID, such as application or subsystem identifiers.

Unique ID

This ID must not be a business related entity such as user id or account number. The simple reason is that these may occur in the logging record multiple times for different sessions or transactions. For example, user Jean Valjean with account number 24601 may log in multiple times into a web site. Tracking a specific session if a problem occurs is easier if we use a unique ID.

A business level ID may not even be relevant in another system that interacts with the origin point. In one system the ID could be one type of ID, and in the other the same party or user could be identified with a different ID.

Note that as soon as determined, accessed, or generated, a business level ID should be associated with the GIID. This could be a simple log output of that business ID which, since every log output has a GIID, will associate the business ID or value with the GIID.

Similarly when the same process communicates with another system, that system’s unique identifiers and related business IDs will also be associated with the GIID. For example, a web service will take the GIID and relate it to its own ID(s). Now a support engineer can follow the complete path of the system response to a user session.

ID creation

The easiest approach is to use the entry system’s session id created by the server. A potential problem is that this session id is not guaranteed to be unique and ends when the session expires. A UUID solves most problems.

Sample UUID generation in Groovy language:

groovy -e "println UUID.randomUUID().toString().replace('-','')"

If the system ID is 3491 then the above UUID is used to create the GIID and use in logging:

20110227T23:34:37,901; EV={_ID:”34911f788da1ac4a43bb82adb8e61cfcb205″, USERID:”felixthecat”, ….}

Alternative to UUID use?

A UUID is a 32 character string. Could something smaller be used? Perhaps, but eventually the complexity of threaded systems would make the uniqueness constraint of any ID approach a comparable length.

Other approaches are possible. Most of these will incorporate a timestamp in some way. Note that a UUID contains a timestamp.

An example of a ‘unique’ id is used by MongoDB’s ObjectID specification. That spec calls for a 12-byte BSON type of:
• a 4-byte value representing the seconds since the Unix epoch,
• a 3-byte machine identifier,
• a 2-byte process id, and
• a 3-byte counter, starting with a random value.
An example of an ObjectID string representation is ObjectId(“507f1f77bcf86cd799439011”)

Log Framework support for GIID

The use of a GIID is a ‘cross-cutting’ concern. Requiring programmatic use of this ID would be burdensome and error-prone, even if stored in a thread-safe context.

Some logging frameworks support the concept of “nested diagnostic contexts”. This is a way of storing an ID so that interleaved logging is properly identified. See for more information.

Example usage

In a conventional Java server application a JSP or template system would obtain a GIID and insert it into generated pages and associated client side scripts. That GIID would also be stored in the server side session. Since the GIID is stored at the session it is accessible to the various services and components on a server.

This ID is embedded in request to other distributed servers and provides event correlation. Thus the logging systems will have access to the GIID and Client or server side errors can then display or use the GIID for tracking and reporting to assist support engineers.

Since the client also has the GIID, it can display or use this ID for customer service processing.

Of course, this would make more sense if it is a part of a wider Application Service Management (ASM) system.

Standards for IDs

Though many standards specify ID handling, modern architectures, especially web based or distributed, emphasize a stateless protocol. A GIID requirement could be one of those legerdemain stateful practices.

Development impacts

If logging is a deployed feature of an application then it too needs testing. But, since log output is an integration point, it does not fall under the “unit” testing umbrella. There is even some doubt if this should even be tested! Here is one example: Unit Testing: Logging and Dependency Injection
If log files can contain security flaws, convey data, impact support, and impair performance, then they should be tested that they conform to standards. Project management spreadsheets needs to add rows for logging concerns.


Log output can be developer tested using the appropriate XUnit framework, like JUnit.
Mocking frameworks provide a means of avoiding actually sending the output of a logger to an external ‘appender’. “Use JMockit to Unit test logging output”.
In development of a project, the log output changes rapidly as the code changes. Selecting where in the software development life cycle (SDLC) to test logging or even specify what logs should contain is difficult.
One approach is that the deployed system will not do any app logging that was not approved by the stake holders. These must be “unit” tested, and all development support logging is removed or disabled except for use in a development environment.


There is no need to change every subsystem to use this log tracking approach. If the GIID is created somewhere in the “path” of a process or service, it adds value. Other systems can gradually start to use a tracking ID. Thus, the tools and training to use this capability can also be progressively introduced.

About this post

I was going to title this article ‘Logging in Anger’, as a response to my own experiences with application logging. Alas, there are so many issues that I had time to only focus on one as a result of a recent stint supporting an application that exhibits the typical logging anti-patterns. Example: it’s bad to get a null pointer exception, but to not know which argument to a function caused this?


Structured Logging

(this article was going to add more info on incorporating a GIID into a Structured Logging framework. This section is here for refernce)
Structured Logging is a type of app log file that is data based rather than prose based. Thus, it is machine readable and amendable to high-level tools, not just a text editor.

Treating logs as data gives us greater insight into the operational activity of the systems we build. Structured logging, which is using a consistent, predetermined message format containing semantic information, builds on this technique …. We recommend adopting structured logging because the benefits outweigh the minimal effort involved and the practice is becoming the default standard. —

An example, is a system where the log output uses a predetermined message format. An overview of such systems is found in chapter 5 of “Common Event Expression”,

Note this should not be confused with a similar sounding technology called “Log-structured file system”.

Further reading

  1. Log management and intelligence,
  2. Logging a global ID in multiple components,
  3. Application Service Management (APM) system
  4. Application performance management,
  5. The art of application logging,
  6. Patterns For Logging Diagnostic Messages,
  8. How to test valid UUID/GUID?
  9. Log Data as a Valuable Tool in the DevOps Lifecycle (and Beyond),
  10. OWASP – Logging Cheat Sheet,
  11. How to Do Application Logging Right,
  12. Request for comment Structured Logging,
  13. 6 – Logging What You Mean: Using the Semantic Logging Application Block,
  14. A Review of Event Formats as Enablers of event-driven BPM,
  15. Basic Android Debugging with Logs,
  16. Mapped diagnostic context vs Nested diagnostic context,
  17. Building Secure Applications: Consistent Logging,
  18. Log for machines in JSON,
  19. Logging Discussion,
  20. CEE,
  21. CEE is a Failure.,
  22. Centralized Logging Architecture,
  23. Centralized Logging,
  24. Logging and the utility of message patterns,
  25. Payment Application Data Security Standard,
    Payment application must facilitate centralized logging.
    Note: Examples of this functionality may include, but are not limited to:
    • Logging via industry standard log file mechanisms such as Common Log File System (CLFS), yslog, delimited text, etc.
    • Providing functionality and documentation to convert the application’s proprietary log format into industry standard log formats suitable for prompt, centralized logging.
    Aligns with PCI DSS Requirement 10.5.3
  26. NIST 800-92 “Guide to Computer Security Log Management”,
  27. UniqueID Generator: A Pattern for Primary Key Generation,
  28. java.util.logging,

JavaScript Chaining Promises to Promises

October 28, 2014

When you create a Promise you can chain asynchronous behavior by using the then(success,fail) method. The success and fail are callback functions invoked appropriately when the promise is resolved or rejected. Note that since ‘then(…)’ returns a promise, this technique is called promise chaining: p.then(…).then(….).

What I discuss in this post are two subjects
1. How to ‘stitch’ multiple promises together?
2. How to ‘splice’ in an object into a promise chain?
3. How to chain using a promise object as an argument?

What if in a complex application you already have promises objects and want to create promise chains linking these objects to each other? How would you do this with ‘then’ methods? Amazingly, in most examples of using Promises on the web this simple scenario is never presented directly. Granted it doesn’t *seem* to be relevant.

Here is the source of a simple demo based on one of the Dojo Deferred examples. It is ‘normal’ promise chaining. I’m using this as the basis for the investigation presented here. The output displayed on page should be: completed with value: 75

The actual chaining ‘stitching’ is setupTrans().then(function(x){return action()}).then(outputValue);

<!DOCTYPE html>
<html >

<script src=""  data-dojo-config="async:false"></script>
require(["dojo/when", "dojo/Deferred", "dojo/dom", "dojo/on", "dojo/domReady!"],function(when, Deferred, dom, on){
	  var balance = 100;

	  function asyncProcess(x){
	    var deferred = new Deferred();
	      if(x === 'plus' || x === 'minus'){
	      	 x === 'plus' ? (balance += 25) : (balance -= 25);
	    }, 1000);
	    return deferred.promise;

	  function setupTrans(){
	  	return asyncProcess('z');

	  function debitAccount(){
		return asyncProcess('minus');
	  function creditAccount(){
		return asyncProcess('plus');

	  function outputValue(value){
	    dom.byId("output").innerHTML += "<br/>completed with value: " + value;
	  on(dom.byId("startButton"), "click", function(){
	  	var debit = true;
	  	var action = debit ? debitAccount : creditAccount;
	  	.then(function(x){return action()})
<body class="claro">
	<div id="output"></div><p />
	<button type="button" id="startButton">Start</button>

What is ugly is the inline function. Luckily I found the “Flattening Promise Chains” article where the author shows how to get rid of the function and simplify. However that article didn’t answer my question, how can you simply and directly chain promise objects. What the article showed is that you have to use the correct structure or design. Well, of course, but …

I thought this would be difficult. The Dojo documentation page never mentions that instead of the first ‘then’ argument being onFulfilled handler function it could be an actual promise object. But this is mentioned in specs such as the Promises/A+ specification. Or does it? The ‘Promise resolution procedure’ seems to mix compile time view with runtime, or I’m reading it incorrectly. Q documentation is very good on this. Will have to research further.

To directly chain separate promise chains the above code sample is modified as follows by just using an existing promise as the argument to another chains then(…) method. However, this can only be done using a promise returning function.

  on(dom.byId("startButton"), "click", function(){
  	var debit = true;
  	var action = debit ? debitAccount : creditAccount;

Or, to inline the decision:

  on(dom.byId("startButton"), "click", function(){
  	var debit = true;
  	.then(debit ? debitAccount : creditAccount)

Chaining using actual promise objects inside the then(f,r) is made easy using the approach shown next, using “scalars”.

Chaining values into promise chains
While looking at this subject I thought about a situation where we want to directly insert a known value into the chain so that the next ‘link’ will use this as the prior handler’s return value.

We have a chain, and I want to put the great Answer in there. This won’t work: var p1 = promiseReturningFunction().then(…).then(’42’).then(….).

Sure you can wrap ’42’ in an inline function then(function(){return ’42’}), but that is yucky. How about a function that does that:

     /** return a function that returns the value */
     function now(value){
    	  return function(){return value;};

Now you can do: var p1 = async().then(…).then(now(’42’)).then(….);

Here is the prior source code with the changes. This is just some of the script changes. The output on screen is:

completed with value: 75
completed with value: Goodbye

      /* ...  */

      function outputValue(value){
        dom.byId("output").innerHTML += "<br/>completed with value: " + value;
      function now(value){
    	  return function(){return value;};
      on(dom.byId("startButton"), "click", function(){
        var debit = true;
        var action = debit ? debitAccount : creditAccount;
        var promise1 = setupTrans()
        .then(debit ? debitAccount : creditAccount)
        	console.log("now do another promise chain...");

Using promises objects directly in then(f,r) by using now(x)
With the Promises implementation in Dojo, you can’t use promiseObject.then(anotherPromiseObject). And you can’t use promiseObject.then(when(anotherPromiseObject)). At least, I haven’t figured how to do it.

With the now(x) function shown above, this is easy. The code below will output:

completed with value: 75
completed with value: 100
completed with value: undefined
completed with value: finished both chains

     function now(value){
    	  return function(){return value;};
      on(dom.byId("startButton"), "click", function(){
        var debit = true;
        var action = debit ? debitAccount : creditAccount;
        var promise1 = setupTrans()
        	console.log("now do another promise chain...");
        debit = false;
        action = debit ? debitAccount : creditAccount;
        var promise2 = setupTrans()
        .then(now("finished both chains"))

Further reading

  1. Original post
  2. Flattening Promise Chains
  3. A Deeper Dive Into JavaScript Promises
  4. How-to Compose Node.js Promises with Q
  5. Promise Anti-patterns
  6. dojo/when
  7. q
  8. General Promise Resources
  9. Promises/A+

Proactive automotive apps?

January 5, 2013

In a previous post, “Synergistic Social Agent Network Cloud” I argued for more proactive apps. I was just reading something that is related to that topic: “Ford Hybrid’s EV+ Feature Learns and Automatically Adjusts Powertrain to Deliver More Electric-Only Driving” Also see “Proactive Agents.”

More robust JUnit tests

November 13, 2012

I created a blog post “Behavior counts for improved JUnit tests” on my other blog.  It shows a possible technique to assure that JUnit tests are robust and are really testing what you think they are testing.

How to interpret User Ratings?

October 28, 2012

I was thinking about those user ratings on products again. I last posted here. Fascinating topic. We all make choices, how?

In one page they mentioned a comic that is so relevant, not only to the subject but also to the Hurricane Sandy, what a coincidence: TornadoGuard


  1. How do you rate user ratings?

The clunkiness of user interfaces

September 18, 2012

So much research and patents on UI design and interaction, yet we still have the most banal GUIs. For example, to share a web page on my Android phone I have to do about twenty operations. We don’t really notice how much we have to do since we are used to it and it becomes ‘natural’ like riding a bike.

In a new blog post I tackled this issue. I think those twenty steps can be easily reduced to two.



  • User Experience Design (UXD)
  • Interaction Design (IxD)

New blog

July 7, 2012

Starting to use a new blog at:

Why? One reason is to start learning a little more technology and have more control of a web site.


According to Names of large numbers

Name Short Scale Long Scale
Octodecillion 10^57 10^108


Get every new post delivered to your Inbox.