Archive for the ‘Technical’ Category

I ♥ Logs

Sunday, November 26th, 2017

Just finished reading “I Heart Logs: Event Data, Stream Processing, and Data Integration” by Jay Kreps, one of the original developers of Apache Kafka.


  • One of the key challenges in distributed systems is reasoning about a global order of events (these events could be anything, such as clicks on a webpage, transactions or price updates to a stock). Keep in mind that in a distributed system there is no global physical clock (for a good take on this, see ACMQueue)
  • Logs can be used as a key abstraction in distributed systems, as they provide a journal of what happened in which order
  • Logs have been valuable as a mechanism to replicate data across different databases for a long time
  • Traditional ETL processing using hourly or daily data loads is less effective nowadays due to the volume and velocity of data changes
  • An enterprise can leverage a log-based data pipeline by publishing all of its data to a central log system
  • Each application in that case is responsible for publishing data to the central Log pipeline in a canonical data model
  • Downstream systems can subscribe to data feeds from the Log and consume data at their own pace
  • Log-based data processing helps to decouple producers and consumers, in the same way as a traditional message queue does
  • A system like Kafka, apart from providing a publish/subscribe model, can also serve as a very useful buffer between producers and consumers
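The points above can be sketched in a few lines of Java. This is only an illustrative in-memory model, not how Kafka is implemented: the class name LogSketch, the methods append and poll, and the consumer names are all assumptions made for this example. It shows the two ideas from the list: records get a position in a single ordered journal, and each consumer reads from that journal at its own pace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of an append-only log with per-consumer offsets. */
class LogSketch {
    // Records are only ever appended; the list index serves as the record's offset.
    private final List<String> records = new ArrayList<>();
    // For each consumer, the next offset it has not read yet.
    private final Map<String, Integer> nextOffset = new HashMap<>();

    /** Appends a record and returns the offset assigned to it. */
    public synchronized int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Returns up to maxRecords unread records for the consumer and advances its offset. */
    public synchronized List<String> poll(String consumer, int maxRecords) {
        int from = nextOffset.getOrDefault(consumer, 0);
        int to = Math.min(records.size(), from + maxRecords);
        List<String> batch = new ArrayList<>(records.subList(from, to));
        nextOffset.put(consumer, to);
        return batch;
    }

    public static void main(String[] args) {
        LogSketch log = new LogSketch();
        log.append("click:/home");
        log.append("price:ACME=10");
        log.append("click:/cart");

        // Two consumers read the same ordered log independently and at different speeds.
        System.out.println(log.poll("fast-consumer", 10));
        System.out.println(log.poll("slow-consumer", 1));
        System.out.println(log.poll("slow-consumer", 1));
    }
}
```

Because the log keeps every record, the slow consumer never blocks the producer or the fast consumer; it simply lags behind, which is the buffering effect described in the last bullet.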

The rise of APIs

Saturday, August 12th, 2017

The rise of APIs is a phrase that is sometimes used to describe a trend in the software industry whereby requirements are met by consuming external services programmatically via the internet. Traditionally, Independent Software Vendors (ISVs) developed and marketed their products as installable applications and charged per CPU or per number of instances deployed. ISVs made money by selling software, and customers were more or less forced to upgrade their on-premise installation every few years. A gradual shift from an installation model to a subscription model has been happening since the early years of the World Wide Web, though the mightiest vendors such as Microsoft and Oracle resisted it, and not without reason.

By challenging the established delivery model, a new era of application vendors such as Salesforce was able to provide their services in a much faster and cheaper way. Small to medium sized customers who otherwise could not have afforded expensive on-premise installation and maintenance happily adopted the new Software as a Service (SaaS) model.

Application Programming Interfaces, or APIs, enable more fine-grained, targeted and transparent consumption of services over the internet and can be seen as another major milestone in the evolution of SaaS. Nevertheless, until recently, the use of APIs was generally limited to developers who programmed components and services for a particular platform. Only during the last decade did businesses begin to see APIs as a key enabler for innovation, ranging from internal process improvements to the establishment of entirely new markets.

Microservices complement an API first architecture by forcing developers and architects to think about the service contract from the inception phase itself.

Working with legacy code

Thursday, May 11th, 2017

Dealing with legacy code is usually inevitable for a Software Architect. In one way or the other, complex systems are built by cobbling together several lesser or equally complex systems. Successful organisations usually have a portfolio consisting of some legacy code which forms the basis of newer modules and applications which are being actively developed and maintained.

Unfortunately, system architecture rarely evolves in a way whereby it is continuously refined and kept in pace with the addition of new features and functionality, which often leaves legacy code poorly structured and poorly documented. In practice, market pressures, an evolving business landscape and conflicting priorities between business stakeholders and technical stakeholders conspire to make a code base less and less maintainable over the years.

If the organisation and its management are progressive enough, they will already have understood the need to modernize the system and adopt state-of-the-art technologies and architecture practices. Nevertheless, it is rarely a straightforward exercise, and any transition of legacy systems can take several years.

But for any transition from a given architecture (say, a JEE based monolith) to a target architecture (say, microservices) to be successful, people who are involved in the transition efforts must have a decent understanding of the intricacies of the legacy system. Also it is worth noting that more often than not, legacy systems may not have a well defined architecture at all.

Following are some of the important lessons from my experience:

Respect the system

The first thing to understand is that the system has worked well enough and long enough for it to become your problem, so there must be something good about it. Also, despite all its shortcomings, it has managed to mint money for the owners of the company and pay its bills, including your own salary.

Keep the doors of dialogue open to the original developers

It is likely that the original developers have all left the company, leaving the present developers alone to deal with an ill-documented system. Keep in mind that they may have been working under great pressure when they wrote the system. While you must understand the shortcomings of the system which they built, you should also keep a door open for dialogue with them if they are available as consultants or freelancers.

Deal with the sh*t

This may take weeks to months, depending upon the complexity of the system. Do not get carried away by the presentations given at microservice conferences: they make everything sound simpler than it is in real businesses. Also, I doubt whether speakers who are on the conference circuit full time have much time left to architect or develop complex solutions. The following points can smooth the process of understanding a large, complex, monolithic legacy code base:

  • Understand the business requirements: each class with more than 1000 lines probably has a history behind it
  • Read the existing documentation: often you will be disappointed to find that there is not much documentation or Javadoc available, but do not give up. It is also worth trying to understand the evolution of the code through user stories and bug reports by scanning a project management tool such as Jira
  • Check out the code, build it and run it locally, even if you think you are too important to do that
  • Determine the edges of the system. Behind the curtains, almost all systems work by taking some inputs, processing them and presenting or storing the output. Follow the flow of data
  • Understand the data model: for very large systems, the data structure is usually simpler than the code itself. There is also a higher chance that the data model has been taken care of by a DBA and is cleaner
  • Identify and understand seams in the system: no explanations required
  • Approach it as an end user: how do their actions get handled? Take one or two of the most critical use cases
  • Sketch and document: there is no need to use UML; the key here is to keep it simple and understandable


An architect or a developer who works with complex code bases will sooner or later have to deal with legacy code. The key to taming such a system is patience and perseverance, along with some of the practical tips which I have mentioned above.


Questions to ask before choosing a Collection implementation in Java

Wednesday, May 25th, 2016

In any non-trivial software application, developers have to deal with collections of objects. The Java Collections framework provides a rich set of options, but quite often I have seen people choose one of the popular options such as an ArrayList or a HashSet without thinking through the consequences of their choice. On the other hand, some choose “performance” as the foremost criterion even though the scalability requirements are hard to predict beforehand.

My rule of thumb in choosing the right implementation of the Collection or Map interface is as follows:

Thread safety

In my opinion, thread safety requirements are a good start for narrowing down your options. Do I need thread safety or not? Most of the Collection implementations do not provide thread safety by default due to performance reasons. If you need your code to be thread-safe, then either you have to use a thread-safe implementation or be willing to wrap your collection using Collections.synchronized
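As a minimal sketch of the two options mentioned above, the following example fills a list from several threads, once through a synchronized wrapper and once through an implementation that is thread-safe by design. The class ThreadSafeLists and the helper fillConcurrently are names made up for this illustration; Collections.synchronizedList and CopyOnWriteArrayList are the standard JDK APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ThreadSafeLists {
    /** Adds perThread elements to the list from each of the given number of threads. */
    static int fillConcurrently(List<Integer> list, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up; good enough for a sketch.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Option 1: wrap a non-thread-safe list in a synchronized view.
        List<Integer> wrapped = Collections.synchronizedList(new ArrayList<>());
        // Option 2: use an implementation that is thread-safe by design.
        List<Integer> copyOnWrite = new CopyOnWriteArrayList<>();

        System.out.println(fillConcurrently(wrapped, 4, 1000));     // 4000
        System.out.println(fillConcurrently(copyOnWrite, 4, 1000)); // 4000
    }
}
```

Passing a plain ArrayList to the same helper may lose elements or throw an exception, which is exactly the problem the wrapper avoids. Note that a synchronized wrapper still requires you to synchronize on the list yourself while iterating over it.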


The next critical criterion in choosing a good implementation is whether you have to store duplicate elements or not. Many collection implementations permit duplicates, though this may not be desirable depending upon your requirements. This applies to duplicate elements in the case of Lists or Queues and to duplicate keys in the case of a key-value structure such as a Map.


Does the ordering of elements in the collection matter? If yes, what type of ordering — natural ordering or the insertion order?


Whether we have to store nulls is another factor which could influence your choice.
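The questions about duplicates, ordering and nulls can be tried out quickly with a small program. The sketch below uses only standard JDK collections; the class name CollectionTraits and its helper methods are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

class CollectionTraits {
    /** Drops duplicates but keeps the order in which elements were first inserted. */
    static List<String> uniqueInInsertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    /** Drops duplicates and sorts the elements by their natural ordering. */
    static List<String> uniqueSorted(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    /** Returns true if the collection accepts a null element, false if it rejects it. */
    static boolean acceptsNull(Collection<String> target) {
        try {
            target.add(null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig");

        System.out.println(new ArrayList<>(input));        // [pear, apple, pear, fig]
        System.out.println(uniqueInInsertionOrder(input)); // [pear, apple, fig]
        System.out.println(uniqueSorted(input));           // [apple, fig, pear]

        System.out.println(acceptsNull(new HashSet<>()));               // true
        System.out.println(acceptsNull(new ConcurrentSkipListSet<>())); // false
    }
}
```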


Performance should, in my opinion, be the last criterion for most applications. The performance of different implementations varies with the type of implementation and with its thread-safety guarantees. ArrayList provides very fast appends and index-based retrieval, making it a good candidate for collections which require fast access to random elements. If you frequently remove elements while iterating, LinkedList can be a better candidate. In practice, however, you often do not know in advance how much scalability is needed. Performance differences also tend to show up only for very large datasets, so it is better to avoid premature optimization.

Quick look
Type   Implementation         Thread-safe  Ordered  Duplicates  Allows null  Insertion  Retrieval  Removal
List   ArrayList              No           Yes      Yes         Yes          O(1)       O(1)       O(n)
List   LinkedList             No           Yes      Yes         Yes          O(1)       O(n)       O(1)
List   CopyOnWriteArrayList   Yes          Yes      Yes         Yes          O(n)       O(1)       O(n)
Set    HashSet                No           No       No          Yes          O(1)       n/a        O(1)
Set    LinkedHashSet          No           Yes      No          Yes          O(1)       n/a        O(1)
Set    TreeSet                No           Yes      No          No           O(log n)   n/a        O(log n)
Set    ConcurrentSkipListSet  Yes          Yes      No          No           O(log n)   O(log n)   O(log n)
Queue  ArrayBlockingQueue     Yes          Yes      Yes         No           O(1)       O(1)       O(1)
Queue  PriorityQueue          No           Yes      Yes         No           O(log n)   O(1)       O(log n)
Queue  LinkedBlockingQueue    Yes          Yes      Yes         No           O(1)       O(1)       O(1)

Note: TreeSet rejects null elements when it uses natural ordering.


Choosing the right Collection implementation is often tricky, but you are unlikely to make serious mistakes if you keep in mind the above rules of thumb. Thread-safety and ordering requirements should primarily dictate the choice of implementation; only when you are faced with unusual scalability requirements should you start looking at performance as a major decision criterion.


[Book review] Infrastructure As Code

Thursday, March 24th, 2016

I am currently reading the book Infrastructure As Code

Treating Infrastructure as Code requires not just automation and tools but a complete change of mindset from the traditional approaches to infrastructure management.

Key Lessons:

  • Treat Infrastructure as Software and Data
  • Avoid Snowflake servers
  • Automate everything
  • Automate everything... and let the provisioning, maintenance and lifecycle management happen automatically
  • Automate using specialized DevOps tools rather than custom scripts
  • Implement standard and time tested Software engineering practices such as Testing, Version Control, Change and Release Management etc. for Infrastructure

Automation fear spiral

If you do not follow automation consistently, your automation scripts can get out of sync with the physical reality, which makes using them even more risky and results in a fear spiral.


(Image courtesy: Chapter 1 of the book)

I plan to write a review of the book once I have finished reading it.

Lessons Learned in Software Development

Thursday, March 24th, 2016

22 lessons, taken from the blog


1. Start small, then extend. Whether creating a new system, or adding a feature to an existing system, I always start by making a very simple version with almost none of the required functionality. Then I extend the solution step by step, until it does what it is supposed to. I have never been able to plan everything out in detail from the beginning. Instead, I learn as I go along, and this newly discovered information gets used in the solution.

I like this quote from John Gall:  “A complex system that works is invariably found to have evolved from a simple system that worked.”

4. All new lines must be executed at least once. Before you are done with a feature, you have to test it. Otherwise, how do you know that it does what it is supposed to do? Often, the best way is by automatic tests, but not always. But no matter what, every new line of code has to be executed at least once.

Sometimes it can be hard to trigger the right conditions. Fortunately, it’s easy to cheat a bit. For example, the error handling on a database call can be checked by temporarily misspelling a column name. Or, an if-statement can be temporarily inverted (“if error” becomes “if not error”) in order to trigger something that rarely happens, just to make sure that code is run and does what it should.

Sometimes I see bugs that show that a certain line of code can never have been run by the developer. It can look fine when reviewed, but still not work. You avoid embarrassment if your policy is to always execute every new line you write.  => This one is my favorite


9. There will always be bugs. I don’t like approaches to software development that claim to “get it right the first time”. No matter how much effort you put in, there will always be bugs (the definition of a bug pretty much is: “we didn’t think of that”). A much better approach is to have a system in place that lets you quickly troubleshoot problems, fix the bugs and deploy the fixes.


15. Face to face has the highest bandwidth. When discussing how to solve a problem, being face to face beats video, call, chat and email. I am often amazed at how much better the solutions are after discussing them in person with colleagues.


19. Try it. If you are unsure of how a certain language feature works, it is easy to write a little program that shows how it works. The same applies when testing the system you are developing. What happens if I set this parameter to -1? What happens if this service is down when I reboot the system? Explore how it works – fiddling around often reveals bugs, and at the same time it deepens your understanding of how the system works.

8 caricatures of an Enterprise Architect – Part 1 [Perspective]

Sunday, February 21st, 2016

Management guru Tom Peters once wrote an essay about the balancing acts which a Project Manager has to perform (in 8 different areas) in order to be effective. Similarly, the book Collaborative Enterprise Architecture details the balancing acts required of an Enterprise Architect with the help of 8 caricatures.

Each of these 8 EA caricatures falls into one extreme or the other when confronted with challenges in one of the four core duties (or dimensions, as the authors put it) of an EA: Perspective, Governance, Strategy and Transformation.

4 dimensions

Perspective: “Helicopter perspective” with little involvement in the hands-on IT business Vs. knee-deep in concrete architecture work in programs and projects

Governance: A laissez-faire approach, where each project has a high level of freedom Vs. rigorous control system in place to supervise compliance

Strategy: Great vision Vs. no long term planning at all

Transformation: Too Fast Vs. very slow

In each dimension, the EA group should find its proper position between the extremes, though no EA will be a 100% version of a caricature.


On one side, an enterprise architect is expected to have a broad overview of both the IT and the business landscapes. On the other side, the enterprise architect needs to retain her grip on the details of business and technology to such an extent that she still understands the reality on the ground. This is a wide chasm that’s not always easy to bridge.


The first caricature, living in Cloud Cuckoo Land, is an EA who is too decoupled from the community of project architects and developers. In some cases, she may be equally unaware of the desires and frustrations of the actual users of the applications.

Symptoms:  A tendency to focus (only) on the “big picture”. However, many of these big pictures she meets in practice have been over-abstracted to the point of insignificance and may no longer address any relevant question.

Another symptom of Cloud Cuckoo Land is depending too much on top-down thinking, without inviting or considering enough feedback from the ground.

Effect:  An EA or EA organization in Cloud Cuckoo Land is ignored or circumvented by both the IT and business departments on the ground.


The opposite caricature is the EA who focuses too much on purely technical issues or works merely as a project architect. She runs the risk of neglecting the broad view and of not being taken seriously by the business. This promotes the perception of IT as a cost-driving commodity instead of a business enabler. The role of an EA organization is then likely to be reduced to achieving cost reduction by managing standards and conducting IT rationalizations.

Symptoms: An EA who is too much of an “expert” in one technology or another and who is not a generalist. He is a one-track specialist with a narrow focus on specific technologies or business requirements (such as security, performance, or user interface and stability), and misses a holistic view of the enterprise IT landscape.

Insufficient architectural skills can be another symptom — i.e. believing that an Architect is always someone who can code faster or better.

Effect:  An EA working with too much focus on the details and too little attention to the broad vision tends to manage only singular aspects of the system. He will fail to fulfill EA’s claim to shape the IT landscape actively and will capitulate in the face of complexity.
This situation drives IT into the passive role of a mere commodity. Commodities, however, are primarily perceived as cost drivers. Therefore the IT function will quickly find itself in an endless loop of cost reduction demands from the CEO.
An EA group operating too much on the detail level will not be able to ensure the design of appropriate, business-aligned IT applications.

See also: Chief Technology Mechanic, a derogatory term coined for the CIO running the shop in the above caricature

Using trello for managing an apartment moving

Monday, January 11th, 2016

Trello is a very popular task management tool. I use it for managing some of my personal projects such as keeping track of my Reading list and for actively managing my Learning schedule. Though kanbanflow also offers similar capabilities, trello has a good app available for iPhone.

Trello can also be a very useful tool for managing the various tasks related to an apartment move. Though Trello wouldn’t solve the mundane tasks for you (for that, probably could be of help), it can help to make the long list of to-dos somewhat more manageable and sometimes less dull. There is also the mild dopamine rush you get when you move a task from to-do to doing and finally to done.

This is what my Trello board for “moving” looks like


Visual Guide to NoSQL Systems

Sunday, June 21st, 2015

This blog provides a really cool diagram which compares and contrasts the different NoSQL databases that are available in the market, based on the CAP theorem.

I found the comparison here equally useful as well.



Docker explained in layman’s terms

Tuesday, April 14th, 2015

There is quite a lot of good documentation available about Docker, including the official one, but much of it starts explaining lxc, cgroups, the Linux kernel and UnionFS in the very first paragraph, thus scaring off many readers who are not Linux geeks.

In this article I will try to explain Docker in layman’s terms. No prior knowledge of virtual machines, cloud deployments or DevOps practices is assumed.


Docker is an open source platform which can be used to package, distribute and run your applications. Docker provides an easy and efficient way to encapsulate an application (e.g. a Java web application) and any infrastructure required to run it (e.g. Red Hat Linux OS, Apache web server, Tomcat application server, MySQL database etc.) as a single “Docker image”, which can then be shared through a central “Docker registry”. The image can then be used to launch a “Docker container”, which makes the contained application available from the host where the Docker container is running.

Docker provides some convenient tools to build Docker images in a simple and efficient way. A Docker container, on the other hand, can be thought of as a kind of lightweight virtual machine with a considerably smaller memory and disk space footprint than a full-blown virtual machine.

By enabling fast, convenient and automated deployments, Docker has the effect of shortening the cycle between writing code, testing code and getting it live on Production. On the other hand, by providing a light weight container to run the application, Docker enables very efficient utilization of hardware and CPU resources.

Docker is open source and can be installed on your notebook or on any servers where you want to build and run your Docker images and containers (provided you meet the minimum system requirements).

Docker deployment workflow from 1000 feet

Before we look into the individual components of the Docker ecosystem, let us look at one way in which a Docker workflow fits into the Software Development Life Cycle.

In order to support this workflow, we need a CI tool and a configuration management tool. I picked Bamboo and Ansible, even though the workflow would remain the same for other tools or modes of deployment. Below is a possible workflow:

  1. Code is committed to Git repository
  2. A job is triggered by Bamboo to build the application from the source code and run unit/integration tests.
  3. If the tests are successful, Bamboo builds a Docker image which is pushed to a “Docker registry”. (The Dockerfile, which provides a template for building the image, is typically committed as part of the codebase)
  4. Bamboo runs an Ansible playbook to deploy the image to QA servers
  5. If QA tests are passed as well, Ansible can be used to deploy and start the container in production

Docker ecosystem

Docker client and server

Docker is a client-server application. The Docker client talks to a Docker server (also called a daemon), which in turn does all the work. Docker ships with a command line client binary called docker and a full RESTful API. The Docker client and server can run on the same host or on different hosts.

Docker images

Images are the building blocks of the Docker world. They are the build part of Docker’s lifecycle. They are built step-by-step using a series of instructions which are typically described in a simple text configuration file called “Dockerfile”.  Docker provides a simple text based way of declaring the infrastructure and the environment dependencies.

Docker images are highly portable across hosts and environments. Just like compiled Java code can be run on any operating system where JVM is installed, a Docker image can be run in a Docker container on any host that runs Docker.

Docker registry

Docker registries are the distribution component of Docker. Docker stores the images which you build in a registry. There are two types of registries: public and private. Docker Inc. operates the public registry for images, called Docker Hub. You can create an account on Docker Hub to store and share your images, but you have the option of keeping your images in Docker Hub private as well.

It is also possible to create your own registry behind your corporate firewall.

Docker container

If images are the building or packing aspect of Docker, containers are the runtime or execution aspect of Docker. Containers are launched from images and may contain one or more running processes.

Docker borrows the concept of the standard shipping container, used to transport goods globally, as a model for its containers. But instead of goods, Docker containers ship software. Each container contains a software image (its cargo) and, like its physical counterpart, allows a set of operations to be performed. For example, it can be created, started, stopped, restarted and destroyed.

Like a shipping container, Docker doesn’t care about the contents of the container while performing these actions; for example whether a container is a web server, a database or an application server. Each container is loaded the same way as any other container.

Docker also doesn’t care where you ship your container: you can build on your laptop, upload to a registry, then download to a physical or virtual server, test, deploy to a cluster of a dozen Amazon EC2 hosts, and run.

Benefits of Docker

  1. Efficient hardware utilization: When compared to hypervisor-based virtual machines, Docker containers use less memory, CPU cycles and disk space. This enables more efficient utilization of hardware resources. You can run more containers than virtual machines on a given host, resulting in higher density.
  2. Security: Docker isolates many aspects of the underlying host from an application running in a container without root privileges. Also a container has a smaller attack surface than a complete virtual machine.
  3. Consistency and portability: Docker provides a way to standardize application environments and enables portability of the application across environments.
  4. Fast, Efficient deployment cycle: Docker reduces the cycle time between code being written, code being tested, deployed and used.
    The quote below is taken from the official documentation:

    Your developers write code locally and share their development stack via Docker with their colleagues. When they are ready, they push their code and the stack they are developing onto a test environment and execute any required tests. From the testing environment, you can then push the Docker images into production and deploy your code.

  5. Ease of use: Docker’s low overhead and quick startup times make running multiple containers less tedious.
  6. Encourages SOA: Docker is a natural fit for microservices and SOA architectures since each container typically runs a single process or application and you can run multiple containers with little overhead on the same system.
  7. Segregation of duties: In Docker ecosystem, Developers care about making the application work inside the container and Ops cares about managing those containers.

Containers Vs. Virtual machines


  • Virtual machines have a full OS with its own memory management, device drivers, daemons, etc. Containers share the host’s OS and are therefore lighter weight.
  • Because Containers are lightweight, starting a container takes approximately a second whereas booting a VM can take several minutes
  • Containers can generally run only the same or a similar operating system as the host. For example, you can run Red Hat on an Ubuntu host, but you can’t run Windows on an Ubuntu host. (In most practical cases, running different OS types is not a real requirement.)
  • While using VMs, in theory, vulnerabilities in particular OS versions can’t be leveraged to compromise other VMs running on the same physical host. Since containers share the same kernel, admins and software vendors need to apply special care to avoid security issues from adjacent containers.
    Countering this argument is that lightweight containers lack the larger attack surface of the full OS needed by a VM, combined with the potential exposures of the hypervisor itself.


  1. Docker user guide
  2. Security risks and vulnerabilities of Docker
  3. Containers Vs. VMs
  4. Docker: Using Linux Containers to Support Portable Application Deployment
  5. Contain yourself: The layman’s guide to Docker
  6. Deploying Java applications with Docker