When we talk about scaling systems, most people think first of optimized code or powerful infrastructure.
But experience shows that the real starting point is observability.
Without visibility, we are moving in the dark.
📌 In the book Site Reliability Engineering (SRE), Google introduces four main signals (the "four golden signals") that form the foundation of monitoring for any scalable system:
1️⃣ Latency (response time): the first thing the user notices.
2️⃣ Traffic (traffic volume): an accurate picture of the load on the system.
3️⃣ Errors (error rate): the strongest sign that "something is not working right".
4️⃣ Saturation (how full the resources are): as resources approach their capacity limit, the system becomes fragile.
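To make the four signals concrete, here is a minimal sketch of how they could be computed in-process over one observation window. The class `GoldenSignals` and its fields (`window_seconds`, `capacity`, `in_flight`) are hypothetical names chosen for illustration; they do not come from the SRE book or any library. It also assumes saturation can be approximated as in-flight requests divided by a fixed concurrency limit, which will not hold for every system.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GoldenSignals:
    """In-memory tally of the four signals for one observation window."""
    window_seconds: float   # length of the observation window (assumed fixed)
    capacity: int           # hypothetical limit on concurrent requests
    latencies_ms: list = field(default_factory=list)
    errors: int = 0
    in_flight: int = 0      # requests currently being served

    def record(self, latency_ms: float, ok: bool) -> None:
        """Record one finished request and whether it succeeded."""
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def snapshot(self) -> dict:
        """Summarize latency, traffic, errors, and saturation."""
        n = len(self.latencies_ms)
        ordered = sorted(self.latencies_ms)
        # Nearest-rank 95th percentile; None when nothing was recorded.
        p95 = ordered[math.ceil(n * 95 / 100) - 1] if n else None
        return {
            "latency_p95_ms": p95,
            "traffic_rps": n / self.window_seconds,
            "error_rate": self.errors / n if n else 0.0,
            "saturation": self.in_flight / self.capacity,
        }


if __name__ == "__main__":
    signals = GoldenSignals(window_seconds=10, capacity=8)
    signals.record(latency_ms=120.0, ok=True)
    signals.record(latency_ms=480.0, ok=False)
    signals.in_flight = 2
    print(signals.snapshot())
```

In a real service these numbers would more likely be exported through a metrics library and aggregated by a monitoring backend; the point of the sketch is only that all four signals can be wired in from the first request handler.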
💡 As a Tech Lead, our main question is not "which framework do we write the code in?",
but rather:
How will we see and track Latency, Traffic, Errors, and Saturation from day one?
If we put off answering this question, we pay for it later in outages, firefighting, and unhappy customers.
#Observability #SystemDesign #SiteReliabilityEngineering #TechLeadership #Monitoring