System Design

Posts

Apache Kafka

September 08, 2021

Apache Kafka is a distributed streaming platform.

Why use Apache Kafka?

Its core abstraction is a queue, exposed as a distributed pub-sub messaging system. Because every publisher and subscriber talks only to Kafka rather than to each other, the N^2 point-to-point relationships between them shrink to N. Publishers and subscribers can operate at their own rates. Kafka is also very fast, thanks to zero-copy technology, and it supports fault-tolerant data persistence.

It can be applied to logging by topic, messaging systems, geo-replication, and stream processing.

Why is Kafka so fast?

Kafka uses zero copy, in which the CPU does not perform the task of copying data from one memory area to another.

[Figures: without zero copy; with zero copy]

Architecture

Seen from the outside, producers write to brokers and consumers read from brokers. Data is stored in topics, which are split into partitions, and the partitions are replicated.

Producer: publishes messages to a specific topic. Messages are written to an in-memory buffer first and then flushed to disk. Writes are append-only and sequential, which keeps them fast. Messages become available to read after they are written to disk.

Consumer: pulls mess...
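The architecture notes above can be made concrete with a small in-memory sketch. It is not Kafka's implementation or client API: `MiniTopic`, `publish`, and `poll` are hypothetical names, and persistence and replication are left out entirely. It only shows an append-only log split into partitions, with each consumer pulling from its own offset at its own pace.

```python
from __future__ import annotations

from collections import defaultdict
from zlib import crc32


class MiniTopic:
    """Toy model of a topic: a fixed set of append-only partition logs."""

    def __init__(self, num_partitions: int = 2) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]
        # (consumer name, partition index) -> next offset to read
        self.offsets: dict[tuple[str, int], int] = defaultdict(int)

    def publish(self, key: str, message: str) -> tuple[int, int]:
        """Append to the partition chosen by hashing the key; never rewrite."""
        partition = crc32(key.encode()) % len(self.partitions)
        log = self.partitions[partition]
        log.append(message)
        return partition, len(log) - 1  # (partition, offset of the new record)

    def poll(self, consumer: str, partition: int, max_records: int = 10) -> list[str]:
        """Pull up to max_records starting at this consumer's current offset."""
        start = self.offsets[(consumer, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(consumer, partition)] = start + len(batch)
        return batch
```

Because offsets are tracked per consumer, a consumer that polls two records at a time and one that polls everything at once both see the same messages in the same order, just at different speeds.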

September 08, 2021

Content delivery network (CDN)

[Figure source: Why use a CDN]

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from a CDN, although some CDNs, such as Amazon's CloudFront, support dynamic content. The site's DNS resolution will tell clients which server to contact.

Serving content from CDNs can significantly improve performance in two ways: users receive content from data centers close to them, and your servers do not have to serve requests that the CDN fulfills.

Push CDNs

Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN, and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic but maximizing storage. Sites with a small a...
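One way to picture the "upload only when new or changed" rule for a push CDN is to compare content hashes against a manifest of what was pushed before. The sketch below assumes such a manifest exists; `files_to_push`, `content_hash`, and the manifest format are hypothetical and not tied to any CDN provider's API.

```python
from __future__ import annotations

import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint file contents so changes can be detected cheaply."""
    return hashlib.sha256(data).hexdigest()


def files_to_push(local_files: dict[str, bytes], pushed_manifest: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content differs from the last push.

    local_files maps path -> current bytes on the origin server.
    pushed_manifest maps path -> hash recorded at the previous push (assumed format).
    """
    return sorted(
        path
        for path, data in local_files.items()
        if pushed_manifest.get(path) != content_hash(data)
    )
```

Unchanged files are skipped, which is where the traffic saving comes from; everything ever pushed still occupies space on the CDN, which is the storage cost.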

CAP Theorem

September 08, 2021

CAP Theorem: Revisited

In today's technical landscape, there is a strong and growing desire to scale systems out when additional resources (compute, storage, etc.) are needed to complete workloads in a reasonable time frame. This is done by adding commodity hardware to a system to handle the increased load. The cost of this scaling strategy is added complexity in the system. This is where the CAP theorem comes into play.

The CAP theorem states that in a distributed system (a collection of interconnected nodes that share data), you can have only two of the following three guarantees across a write/read pair: Consistency, Availability, and Partition Tolerance. One of them must be sacrificed. However, as you will see below, you don't have as many options here as you might think.

Consistency - A read is guaranteed to return the most recent write for a given client.

Availability - ...
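The trade-off can be illustrated with a toy two-replica store. This is a sketch under simplifying assumptions, not a real replication protocol: `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are all hypothetical. During a partition, the "CP" mode refuses writes so the replicas never diverge, while the "AP" mode accepts them and lets a read on the other replica return stale data.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to preserve consistency."""


class TwoNodeStore:
    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # one dict per node
        self.partitioned = False   # True = nodes cannot reach each other

    def write(self, node: int, key: str, value: str) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Cannot replicate, so reject rather than let replicas diverge.
                raise Unavailable("partition: write rejected")
            self.replicas[node][key] = value  # AP: accept on this node only
            return
        for replica in self.replicas:         # healthy: replicate everywhere
            replica[key] = value

    def read(self, node: int, key: str):
        return self.replicas[node].get(key)
```

With the network healthy, both modes behave identically. The choice only becomes visible once `partitioned` is set, which is why partition tolerance is usually treated as a given and the practical decision is between consistency and availability.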
