I ♥ Logs

November 26th, 2017

Just finished reading “I Heart Logs: Event Data, Stream Processing, and Data Integration”, written by Jay Kreps (who developed Apache Kafka).

Summary:

  • One of the key challenges in distributed systems is reasoning about a global order of events (these events could be anything such as clicks on a webpage, transactions, price updates to a stock etc.). Keep in mind that in a distributed system, there is no global physical clock (for a good take on this, see ACMQueue)
  • Logs can be used as a key abstraction in distributed systems, as they provide a journal of what happened in which order
  • Logs have been valuable as a mechanism to replicate data across different databases for a long time
  • Traditional ETL processing using hourly or daily data loads is less effective nowadays due to the volume and velocity of data changes
  • Enterprises can leverage a Log-based data pipeline by publishing all of their data to a central Log system
  • Each application in that case is responsible for publishing data to the central Log pipeline in a canonical data model
  • Downstream systems can subscribe to data feeds from the Log and consume data at their own pace
  • Log-based data processing helps to decouple producers and consumers, in the same way as a traditional message queue does
  • A system like Kafka, apart from providing a publish/subscribe model, can also serve as a very useful buffer between producers and consumers
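
The idea of producers appending to a shared log while each consumer reads at its own pace can be sketched in a few lines of Java. This is only an in-memory illustration and not Kafka's API: the class names AppendOnlyLog and LogConsumer, and the sample records, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory log: records are only ever appended, never changed.
final class AppendOnlyLog<T> {
    private final List<T> entries = new ArrayList<>();

    // Appends a record and returns its offset (its position in the log).
    synchronized long append(T record) {
        entries.add(record);
        return entries.size() - 1;
    }

    // Returns up to maxRecords records starting at fromOffset.
    synchronized List<T> read(long fromOffset, int maxRecords) {
        int start = (int) Math.max(0, Math.min(fromOffset, entries.size()));
        int end = Math.min(start + Math.max(0, maxRecords), entries.size());
        return new ArrayList<>(entries.subList(start, end));
    }
}

// Each consumer remembers its own offset, so a slow consumer
// never holds back a fast one (or the producers).
final class LogConsumer<T> {
    private final AppendOnlyLog<T> log;
    private long offset = 0;

    LogConsumer(AppendOnlyLog<T> log) {
        this.log = log;
    }

    List<T> poll(int maxRecords) {
        List<T> batch = log.read(offset, maxRecords);
        offset += batch.size();
        return batch;
    }
}

final class LogDemo {
    public static void main(String[] args) {
        AppendOnlyLog<String> log = new AppendOnlyLog<>();
        log.append("click:/home");
        log.append("price:ACME=10");
        log.append("click:/cart");

        LogConsumer<String> fast = new LogConsumer<>(log);
        LogConsumer<String> slow = new LogConsumer<>(log);
        System.out.println(fast.poll(10)); // all three records, in append order
        System.out.println(slow.poll(1));  // only the first record so far
    }
}
```

Because the log keeps the records and the consumers only keep an offset, the log itself acts as the buffer between producers and consumers.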

The rise of APIs

August 12th, 2017

The rise of APIs is a phrase that is sometimes used to describe a trend in the software industry whereby requirements are met by consuming external services programmatically via the internet. Traditionally, Independent Software Vendors (ISVs) developed and marketed their products as installable applications and charged per CPU or per number of instances deployed. ISVs made money by selling software, and customers were more or less forced to upgrade their on-premise installation every few years. A gradual shift from an installation model to a subscription model has been happening since the early years of the World Wide Web, though the mightiest vendors such as Microsoft and Oracle resisted it for some good reasons.

By challenging the established delivery model, a new generation of application vendors such as Salesforce was able to provide its services much faster and more cheaply. Small to medium-sized customers who otherwise could not have afforded expensive on-premise installation and maintenance happily adopted the new Software as a Service (SaaS) model.

Application Programming Interfaces or APIs enable more fine-grained, targeted and transparent consumption of services over the internet and can be seen as another major milestone in the evolution of SaaS. Nevertheless, until recently, the use of APIs was generally limited to developers who programmed components and services for a particular platform. Only during the last decade did businesses begin to see APIs as a key enabler for innovation, ranging from internal process improvements to the establishment of entirely new markets.

Microservices complement an API-first architecture by forcing developers and architects to think about the service contract from the inception phase itself.

Working with legacy code

May 11th, 2017

Dealing with legacy code is usually inevitable for a Software Architect. In one way or the other, complex systems are built by cobbling together several lesser or equally complex systems. Successful organisations usually have a portfolio consisting of some legacy code which forms the basis of newer modules and applications which are being actively developed and maintained.

Unfortunately, system architecture rarely evolves in a way whereby it is continuously refined and kept in pace with the addition of new features and functionality, so legacy code is often poorly structured and poorly documented. In practice, market pressures, an evolving business landscape and conflicting priorities between business stakeholders and technical stakeholders conspire to make a code base less and less maintainable over the years.

If the organisation and its management are progressive enough, they will already have understood the need to modernize the system and adopt state-of-the-art technologies and architecture practices. Nevertheless, it is rarely a straightforward exercise, and any transition of legacy systems can take several years.

But for any transition from a given architecture (say, a JEE-based monolith) to a target architecture (say, microservices) to be successful, the people involved in the transition effort must have a decent understanding of the intricacies of the legacy system. It is also worth noting that, more often than not, legacy systems do not have a well-defined architecture at all.

Following are some of the important lessons from my experience:

Respect the system

The first thing to understand is that the system has worked well enough and long enough for it to become your problem, so there must be something good about it. Also, despite all its shortcomings, it has managed to make money for the owners of the company and pay the bills, including your own salary.

Keep the doors of dialogue open to the original developers

It is likely that the original developers have all left the company, leaving the present developers alone to deal with an ill-documented system. Still, they may have been working under great pressure when they wrote the system. While you must acknowledge the shortcomings of the system they built, you must also keep a door open for dialogue with them if they are available as consultants or freelancers.

Deal with the sh*t

This may take weeks to months, depending upon the complexity of the system. Do not get carried away by the presentations given at microservice conferences; they make everything sound simpler than it is in real businesses. Also, I doubt whether those speakers who are full-time speakers have much time left to architect or develop complex solutions. The following points can smooth the process of understanding a large, complex, monolithic legacy code base:

  • Understand the business requirements: each class with more than 1000 lines probably has a history behind it
  • Read the existing documentation: often you will be disappointed to find that there is not much documentation or Javadoc available, but do not give up. It is also worth trying to understand the evolution of the code through user stories and bug reports by scanning a project management tool such as Jira
  • Check out the code, build it and run it locally, even if you think you are too important to do that
  • Determine the edges of the system. Behind the curtains, almost all systems work by taking some inputs, processing them and presenting or storing the output. Follow the flow of data
  • Understand the data model: for very large systems, the data structure is usually simpler than the code itself. There is also a higher chance that the data model has been looked after by a DBA and is cleaner
  • Identify and understand the seams in the system: no explanation required
  • Approach it as an end user: how do the user’s actions get handled? Take one or two of the most critical use cases
  • Sketch and document: there is no need to use UML; the key here is to keep it simple and understandable
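
For readers who have not met the term, a seam is usually described as a place where behaviour can be changed without editing the code at that place. The Java sketch below shows one common kind, a dependency passed in through the constructor. The class InvoiceRules and its overdue rule are hypothetical and only serve as an illustration; java.time.Clock is the standard JDK class.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical piece of legacy business logic that depends on "today".
final class InvoiceRules {
    private final Clock clock;

    // The seam: production code passes Clock.systemUTC(),
    // a test passes a fixed clock, and isOverdue() stays untouched.
    InvoiceRules(Clock clock) {
        this.clock = clock;
    }

    boolean isOverdue(LocalDate dueDate) {
        return LocalDate.now(clock).isAfter(dueDate);
    }
}

final class SeamDemo {
    public static void main(String[] args) {
        // Freeze "today" so the rule can be checked deterministically.
        Clock fixed = Clock.fixed(Instant.parse("2017-05-11T00:00:00Z"), ZoneOffset.UTC);
        InvoiceRules rules = new InvoiceRules(fixed);
        System.out.println(rules.isOverdue(LocalDate.of(2017, 5, 10))); // true
        System.out.println(rules.isOverdue(LocalDate.of(2017, 5, 11))); // false
    }
}
```

In a real monolith the dependency is more often a database, a mail gateway or a static utility, but the search is the same: look for the points where such a substitution is possible without a rewrite.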

Conclusion

An architect or a developer who works with complex code bases will sooner or later have to deal with legacy code. The key to taming such a system is patience and perseverance, along with some of the practical tips mentioned above.

 

Questions to ask before choosing a Collection implementation in Java

May 25th, 2016

In any non-trivial software application, developers have to deal with collections of objects. The Java Collections Framework provides a rich set of options, but quite often I have seen people choose one of the popular options such as an ArrayList or a HashSet without thinking through the consequences of their choice. On the other hand, some choose “performance” as the foremost criterion even though the scalability requirements are hard to predict beforehand.

My rule of thumb in choosing the right implementation of the Collection or Map interface is as follows:

Thread safety

In my opinion, thread safety requirements are a good start for narrowing down your options. Do I need thread safety or not? Most of the Collection implementations do not provide thread safety by default, for performance reasons. If you need your code to be thread-safe, then you have to either use a thread-safe implementation or wrap your collection using one of the Collections.synchronized* wrappers, such as Collections.synchronizedList.
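
As a rough sketch of the wrapping option, the following Java snippet fills a list from several threads. The class ThreadSafetyDemo and the helper fillConcurrently are hypothetical names made up for this example; Collections.synchronizedList is the standard JDK wrapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ThreadSafetyDemo {
    // Adds perThread elements from each of the given number of threads
    // and returns the final size of the target list.
    static int fillConcurrently(List<Integer> target, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return target.size();
    }

    public static void main(String[] args) {
        List<Integer> wrapped = Collections.synchronizedList(new ArrayList<>());
        System.out.println(fillConcurrently(wrapped, 4, 10_000)); // prints 40000
        // With a plain ArrayList the same call may lose elements or throw,
        // because ArrayList.add is not safe for concurrent use.
    }
}
```

Note that the wrapper only makes individual calls atomic. Iterating over a synchronized list still requires synchronizing on the list manually, which is one reason to consider a purpose-built class such as CopyOnWriteArrayList instead.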

Duplicates

The next critical criterion in choosing a good implementation is whether you have to store duplicate elements or not. Many collection implementations do permit duplicates, though this may not be desirable depending upon your requirements. This applies to duplicate elements in the case of Lists or Queues and duplicate keys in the case of a key-value structure such as Map.
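
A small Java sketch of the difference: a List keeps every element it is given, while Set.add returns false for an element that is already present. The class DuplicatesDemo and the helper duplicatesIn are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DuplicatesDemo {
    // Returns the elements that occur more than once in the given list.
    static <T> Set<T> duplicatesIn(List<T> items) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicates = new LinkedHashSet<>();
        for (T item : items) {
            // Set.add returns false when the element was already present.
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> orders = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(orders.size());        // 5, the List keeps duplicates
        System.out.println(duplicatesIn(orders)); // [a, b]
    }
}
```

Maps behave like Sets with respect to keys: putting a value under an existing key replaces the old value instead of adding a second entry.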

Order

Does the ordering of elements in the collection matter? If yes, what type of ordering — natural ordering or the insertion order?
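
The two kinds of ordering can be compared with a short Java sketch. The class OrderingDemo and the helper iterationOrder are hypothetical names; the Set implementations are the standard JDK ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OrderingDemo {
    // Adds the items to the given set and returns them in the set's iteration order.
    static List<String> iterationOrder(Set<String> target, String... items) {
        Collections.addAll(target, items);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order.
        System.out.println(iterationOrder(new LinkedHashSet<>(), "pear", "apple", "fig"));
        // prints [pear, apple, fig]

        // TreeSet iterates in natural (here: alphabetical) order.
        System.out.println(iterationOrder(new TreeSet<>(), "pear", "apple", "fig"));
        // prints [apple, fig, pear]

        // A plain HashSet makes no promise about iteration order at all.
    }
}
```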

Null

Whether you have to store nulls is another factor which could influence your choice.
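
Null handling is easy to probe directly. In the Java sketch below, the class NullDemo and the helper acceptsNull are hypothetical names; the collections being probed are standard JDK classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.TreeSet;

final class NullDemo {
    // Returns true if the collection accepts a null element,
    // false if it rejects it with a NullPointerException.
    static boolean acceptsNull(Collection<String> target) {
        try {
            target.add(null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNull(new ArrayList<>()));  // true
        System.out.println(acceptsNull(new HashSet<>()));    // true
        System.out.println(acceptsNull(new TreeSet<>()));    // false (natural ordering)
        System.out.println(acceptsNull(new ArrayDeque<>())); // false
    }
}
```

Queue implementations in particular tend to reject null, because methods such as poll() use null as the "queue is empty" return value.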

Performance

In my opinion, this should be the last criterion for most applications. The performance of different implementations varies with the type of implementation and with its thread-safety guarantees. ArrayList provides very fast appends and index-based retrieval, making it a good candidate for collections which require fast access to random elements. If you frequently remove elements at the ends of the list or while iterating over it, LinkedList can be a better candidate. In practice, however, you often do not know in advance how much scalability is needed. Also, performance differences start to show up only for very large datasets, so it is better to avoid premature optimization.

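Quick look

Type   Implementation         Thread-safe  Ordered  Duplicates  Allows null  Insertion  Retrieval  Removal
List   ArrayList              No           Yes      Yes         Yes          O(1)       O(1)       O(n)
List   LinkedList             No           Yes      Yes         Yes          O(1)       O(n)       O(1)
List   CopyOnWriteArrayList   Yes          Yes      Yes         Yes          O(n)       O(1)       O(n)
Set    HashSet                No           No       No          Yes          O(1)       n/a        O(1)
Set    LinkedHashSet          No           Yes      No          Yes          O(1)       n/a        O(1)
Set    TreeSet                No           Yes      No          No           O(log n)   n/a        O(log n)
Set    ConcurrentSkipListSet  Yes          Yes      No          No           O(log n)   O(log n)   O(log n)
Queue  ArrayBlockingQueue     Yes          Yes      Yes         No           O(1)       O(1)       O(1)
Queue  PriorityQueue          No           Yes      Yes         No           O(log n)   O(1)       O(log n)
Queue  LinkedBlockingQueue    Yes          Yes      Yes         No           O(1)       O(1)       O(1)

Notes: ArrayList insertion is amortised O(1) when appending. LinkedList removal is O(1) only once the position is known, for example through an iterator. TreeSet rejects null when it uses natural ordering. Queues are ordered by insertion (FIFO), except PriorityQueue, which is ordered by priority.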

Summary

Choosing the right Collection implementation is often tricky, but you are unlikely to make serious mistakes if you keep the above rules of thumb in mind. Thread-safety and ordering requirements should primarily dictate the choice of an implementation; only when you are faced with unusual scalability requirements should you start looking at performance as a major decision criterion.

 

[Book review] Infrastructure As Code

March 24th, 2016

I am currently reading the book Infrastructure As Code.

Treating Infrastructure as Code requires not just automation and tools, but a complete change of mindset from the traditional approaches to Infrastructure management.

Key Lessons:

  • Treat Infrastructure as Software and Data
  • Avoid Snowflake servers
  • Automate everything, and let the provisioning, maintenance and lifecycle management happen automatically
  • Automate using specialized DevOps tools rather than custom scripts
  • Implement standard and time tested Software engineering practices such as Testing, Version Control, Change and Release Management etc. for Infrastructure

Automation fear spiral

If you do not apply automation consistently, your automation scripts can get out of sync with the physical reality, which makes running those scripts even more risky and results in a fear spiral.

[Figure: the automation fear spiral]

(Image courtesy: Chapter 1 of the book)

I plan to write a review of the book once I have finished reading it.

Lessons Learned in Software Development

March 24th, 2016

22 lessons, taken from the blog http://henrikwarne.com/2015/04/16/lessons-learned-in-software-development/

DEVELOPMENT

1. Start small, then extend. Whether creating a new system, or adding a feature to an existing system, I always start by making a very simple version with almost none of the required functionality. Then I extend the solution step by step, until it does what it is supposed to. I have never been able to plan everything out in detail from the beginning. Instead, I learn as I go along, and this newly discovered information gets used in the solution.

I like this quote from John Gall:  “A complex system that works is invariably found to have evolved from a simple system that worked.”

4. All new lines must be executed at least once. Before you are done with a feature, you have to test it. Otherwise, how do you know that it does what it is supposed to do? Often, the best way is by automatic tests, but not always. But no matter what, every new line of code has to be executed at least once.

Sometimes it can be hard to trigger the right conditions. Fortunately, it’s easy to cheat a bit. For example, the error handling on a database call can be checked by temporarily misspelling a column name. Or, an if-statement can be temporarily inverted (“if error” becomes “if not error”) in order to trigger something that rarely happens, just to make sure that code is run and does what it should.

Sometimes I see bugs that show that a certain line of code can never have been run by the developer. It can look fine when reviewed, but still not work. You avoid embarrassment if your policy is to always execute every new line you write.  => This one is my favorite

TROUBLESHOOTING

9. There will always be bugs. I don’t like approaches to software development that claim to “get it right the first time”. No matter how much effort you put in, there will always be bugs (the definition of a bug pretty much is: “we didn’t think of that”). A much better approach is to have a system in place that lets you quickly troubleshoot problems, fix the bugs and deploy the fixes.

COOPERATION

15. Face to face has the highest bandwidth. When discussing how to solve a problem, being face to face beats video, call, chat and email. I am often amazed at how much better the solutions are after discussing them in person with colleagues.

MISCELLANEOUS

19. Try it. If you are unsure of how a certain language feature works, it is easy to write a little program that shows how it works. The same applies when testing the system you are developing. What happens if I set this parameter to -1? What happens if this service is down when I reboot the system? Explore how it works – fiddling around often reveals bugs, and at the same time it deepens your understanding of how the system works.

8 caricatures of an Enterprise Architect – Part 1 [Perspective]

February 21st, 2016

Management guru Tom Peters once wrote an essay about the balancing acts which a Project Manager has to perform (in 8 different areas) in order to be effective. Similarly, the book Collaborative Enterprise Architecture details the balancing acts required of an Enterprise Architect with the help of 8 caricatures.

Each of these 8 EA caricatures falls into one extreme or the other when confronted with challenges in one of the four core duties (or dimensions, as the authors put it) of an EA. The four core dimensions of EA are Perspective, Governance, Strategy and Transformation.

4 dimensions

Perspective: “Helicopter perspective” with little involvement in the hands-on IT business Vs. knee-deep in concrete architecture work in programs and projects

Governance: A laissez-faire approach, where each project has a high level of freedom Vs. rigorous control system in place to supervise compliance

Strategy: Great vision Vs. no long term planning at all

Transformation: Too Fast Vs. very slow

In each dimension, the EA group should find its proper position between the extremes, though no EA will be a 100% version of a caricature.

PERSPECTIVE: BETWEEN BIRD’S-EYE VIEW AND NITTY-GRITTY ON THE GROUND

On one side, an enterprise architect is expected to have a broad overview of both the IT and the business landscapes. On the other side, the enterprise architect needs to retain her grip on the details of business and technology to such an extent that she still understands the reality on the ground. This is a wide chasm that’s not always easy to bridge.

CARICATURE #1: LIVING IN CLOUD CUCKOO LAND

An EA who is too decoupled from the community of Project Architects and developers. In some cases, she may be equally unaware of the desires and frustrations of the actual users of the applications.

Symptoms:  A tendency to focus (only) on the “big picture”. However, many of these big pictures she meets in practice have been over-abstracted to the point of insignificance and may no longer address any relevant question.

Another symptom of Cloud Cuckoo Land is depending too much on top-down thinking, without inviting or considering enough feedback from the ground.

Effect:  EA or EA organization in Cloud Cuckoo Land is ignored or circumvented by both the IT and business departments on the ground.

CARICATURE #2: IN THE CHIEF MECHANIC’S WORKSHOP

This is the EA who focuses too much on purely technical issues or works merely as a project architect. She runs the risk of neglecting the broad view and of not being taken seriously by the business. This promotes the perception of IT as a cost-driving commodity instead of a business enabler. The role of an EA organization is then likely to be reduced to achieving cost reduction by managing standards and conducting IT rationalizations.

Symptoms: An EA who is too much of an “expert” in one technology or another and who is not a generalist. He is a one-track specialist with a narrow focus on specific technologies or business requirements (such as security, performance, or user interface and stability), and lacks a holistic view of the enterprise IT landscape.

Insufficient architectural skills can be another symptom — i.e. believing that an Architect is always someone who can code faster or better.

Effect:  An EA working with too much focus on the details and too little attention to the broad vision tends to manage only singular aspects of the system. He will fail to fulfill EA’s claim to shape the IT landscape actively and will capitulate in the face of complexity.
This situation drives IT into the passive role of a mere commodity. Commodities, however, are primarily perceived as cost drivers. Therefore the IT function will quickly find itself in an endless loop of cost reduction demands from the CEO.
An EA group operating too much on the detail level will not be able to ensure the design of appropriate, business-aligned IT applications.

See also: Chief Technology Mechanic, a derogatory term coined for the CIO running the shop in the above caricature

Using Trello for managing an apartment move

January 11th, 2016

Trello is a very popular task management tool. I use it for managing some of my personal projects, such as keeping track of my reading list and actively managing my learning schedule. Though kanbanflow offers similar capabilities, Trello has a good app available for iPhone.

Trello can also be a very useful tool for managing the various tasks related to an apartment move. Though Trello won’t do the mundane tasks for you (for that, www.fancyhands.com could probably be of help), it can make the long list of to-dos somewhat more manageable and sometimes less dull. There is also the mild dopamine rush you get when you move a task from to-do to doing and finally to done.

This is what my Trello board for “moving” looks like:

[Screenshot: Trello board for the move]

Visual Guide to NoSQL Systems

June 21st, 2015

This blog provides a really cool diagram which compares and contrasts the different NoSQL databases available in the market, based on the CAP theorem.

I found the comparison here equally useful as well.

 

 

The trajectory of a software engineer… and where it all goes wrong

April 20th, 2015

Just came across a thought-provoking post by a developer called Michael Church on the subject of the trajectory of a software engineer:

https://michaelochurch.wordpress.com/2012/01/26/the-trajectory-of-a-software-engineer-and-where-it-all-goes-wrong/

Using a 0.0 to 3.0 scale, he categorizes software developers from a novice to a “Senior fellow”. A novice is a rookie programmer who may not provide enough value for the remuneration he or she is paid, whereas a Senior Fellow is one of the best-known programmers in the world, typically known for outstanding and groundbreaking contributions to the industry as a whole.

While most programmers may start off as a novice (0.0 to 0.4) or marginally better, very few move beyond 1.0. At a level of 1.0-1.3, a programmer becomes what Michael calls a full-fledged adder, a stage where they are able to make decent contributions to the projects which they directly work on. Unfortunately, most programmers fail to advance much further than that, often not due to intellectual limitations but rather due to a lack of drive and curiosity.

In my opinion, thought leaders like Martin Fowler or Kent Beck probably fall under 2.4 to 2.6: Global multiplier (“Fellow”), whereas the likes of Linus Torvalds, Peter Norvig and Richard Stallman fall under 2.7 to 3.0: Senior fellow, the highest level of distinction possible.