November « 2017 « $foo $bar

Archive for November, 2017

Sunday, November 26th, 2017

Just finished reading “I Heart Logs: Event Data, Stream Processing, and Data Integration”, written by Jay Kreps (who developed Apache Kafka).

Summary:

One of the key challenges in distributed systems is reasoning about a global order of events (these events could be anything such as clicks on a webpage, transactions, price updates to a stock etc.). Keep in mind that in a distributed system, there is no global physical clock (for a good take on this, see ACMQueue)
Logs can be used as a key abstraction in distributed systems, as they provide a journal of what happened in which order
Logs have been valuable as a mechanism to replicate data across different databases for a long time
Traditional ETL processing using hourly or daily data loads are less effective nowadays due to volume and velocity of data changes
Enterprises can leverage Log based data pipeline by publishing all of its data to a central Log system
Each application in that case is responsible for publishing data to the central Log pipeline in a canonical data model
Downstream systems can subscribe to data feeds from the Log and consume data at their own pace
Log based data processing helps to decouple producers and consumers, in the same way as traditional message queue does
A system like Kafka, apart from providing a publish/subscribe model can also serve as a very useful buffer between producers and consumers

Posted in Technical | Comments Closed