Monitoring is a critical part of SRE practice for managing and ensuring the reliability of systems and services. The four golden signals are the key metrics used to monitor the health of your service and its underlying systems effectively.
Monitoring
Monitoring is the process of collecting, processing, aggregating, and displaying real-time quantitative data about a system.
This allows engineers to understand system behavior, detect anomalies, and make informed decisions based on metrics.
Here are a few benefits of monitoring:
1. Analyzing Long-Term Trends
Monitoring helps track the growth and usage patterns of applications over time. You can observe metrics like database size or daily active user count. This historical data supports better technical and business decision-making.
2. Alerting
Monitoring enables the system to notify you when something is broken or about to break. When the system can't self-heal, alerts prompt engineers to investigate the issue, determine the root cause, and take corrective action quickly.
3. Conducting Ad Hoc Retrospective Analysis
Monitoring provides a trail of metrics that can be analyzed after an incident. For example, if your system experiences a spike in latency, you can correlate it with other metrics collected at that time to debug and find the root cause.
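As a rough illustration of correlating a latency spike with other metrics, the sketch below ranks candidate metrics by how strongly they move with latency. The metric names and sample values are hypothetical, and Pearson correlation is only one possible choice here; a high score points to where to look first, not to a proven cause.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def rank_suspects(latency, other_metrics):
    """Sort metrics by absolute correlation with latency, strongest first."""
    scores = {name: pearson(latency, series) for name, series in other_metrics.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical samples taken at the same six points in time.
latency_ms = [20, 22, 21, 180, 190, 25]
metrics = {
    "db_connections": [10, 11, 10, 95, 98, 12],
    "cache_hit_ratio": [0.90, 0.91, 0.90, 0.91, 0.90, 0.91],
}

ranked = rank_suspects(latency_ms, metrics)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

In this made-up data the database connection count rises and falls with latency, so it is ranked first, while the cache hit ratio shows almost no relationship.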
The Four Golden Signals
The four golden signals are the foundational building blocks of an effective monitoring strategy. They cover the most essential aspects of system health and performance. Focusing on these core signals helps minimize noise and reduce maintenance overhead.
1. Latency: Request Service Time
The time it takes for a system to respond to a request. High latency directly degrades user experience and can cost the business. It is critical to track the latency of successful and failed requests separately, since failures that return quickly can skew the overall numbers.
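A minimal sketch of tracking latency separately for successful and failed requests is shown below. It assumes each request is recorded as a hypothetical `(latency_ms, ok)` pair and uses the nearest-rank method for percentiles; production systems usually rely on histograms from a metrics library instead.

```python
import math
from collections import defaultdict

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_by_outcome(requests, pcts=(50, 95, 99)):
    """Group (latency_ms, ok) pairs by outcome and report percentiles per group."""
    groups = defaultdict(list)
    for latency_ms, ok in requests:
        groups["success" if ok else "failure"].append(latency_ms)
    return {
        outcome: {f"p{p}": percentile(values, p) for p in pcts}
        for outcome, values in groups.items()
    }

# Hypothetical sample: (latency in milliseconds, did the request succeed?)
sample = [(12, True), (15, True), (11, True), (240, True), (14, True),
          (3000, False), (5, False)]

report = latency_by_outcome(sample)
for outcome, stats in report.items():
    print(outcome, stats)
```

For this sample the successful requests have a p50 of 14 ms but a p95 of 240 ms, which shows why tail percentiles are usually more informative than a single average.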
With alerting on latency metrics in place, you can detect high latency and act before users complain.
2. Traffic: User Demand
Represents the volume of demand on your system, typically measured in requests per second (RPS), transactions per second (TPS), or similar.
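As a small sketch of measuring requests per second, the code below buckets request arrival times by whole second. The `arrivals` list is hypothetical data, and timestamps are assumed to be non-negative seconds (for example Unix time).

```python
from collections import Counter

def requests_per_second(timestamps):
    """Count requests in each one-second bucket, keyed by the whole second."""
    return Counter(int(ts) for ts in timestamps)

def average_rps(timestamps):
    """Average RPS over the observed span, counting both the first and last second."""
    if not timestamps:
        return 0.0
    span_seconds = int(max(timestamps)) - int(min(timestamps)) + 1
    return len(timestamps) / span_seconds

# Hypothetical arrival times in seconds.
arrivals = [100.1, 100.4, 100.9, 101.2, 103.0, 103.5]

per_second = requests_per_second(arrivals)
peak_rps = max(per_second.values())
mean_rps = average_rps(arrivals)
print(f"peak={peak_rps} rps, mean={mean_rps:.2f} rps")
```

Reporting both the peak and the mean is a deliberate choice: the mean hides short bursts, and bursts are often what drives scaling decisions.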
Monitoring traffic helps anticipate scaling needs and detect abnormal usage patterns.
3. Errors: Rate of Failed Requests
Measures the number of requests that fail either explicitly (e.g., HTTP 500 errors) or implicitly (e.g., timeouts or incorrect responses).
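The sketch below shows one way to count both explicit and implicit failures in a single error rate. The `Request` record, the 1000 ms deadline, and the decision to treat only 5xx statuses as explicit failures are assumptions made for illustration; detecting incorrect responses would need application-specific checks that are not shown.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status: int         # HTTP status code returned to the client
    duration_ms: float  # time taken to respond

def is_failure(req, deadline_ms=1000):
    """Explicit failure: 5xx status. Implicit failure: slower than the deadline."""
    return req.status >= 500 or req.duration_ms > deadline_ms

def error_rate(requests, deadline_ms=1000):
    """Fraction of requests that failed, between 0.0 and 1.0."""
    if not requests:
        return 0.0
    failed = sum(is_failure(r, deadline_ms) for r in requests)
    return failed / len(requests)

# Hypothetical window of recent requests.
window = [
    Request(200, 35),
    Request(200, 1800),  # implicit failure: exceeded the deadline
    Request(500, 20),    # explicit failure: server error
    Request(404, 12),    # client error, not counted as a service failure here
    Request(200, 50),
]

rate = error_rate(window)
print(f"error rate: {rate:.0%}")
```

Two of the five requests count as failures here, one explicit and one implicit, even though only one of them returned an error status.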
Monitoring the error rate helps reveal abnormalities and faults in the system.
4. Saturation: System Capacity
Indicates how "full" your system is. This can refer to CPU, memory, I/O usage, or any resource that might become a bottleneck.
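One simple way to express how "full" a system is: compute utilization as used capacity divided by total capacity for each resource, then look for the resource closest to its limit. The resource names, numbers, and the 80% threshold below are hypothetical.

```python
def utilization(used, capacity):
    """Fraction of a resource currently in use (0.0 = idle, 1.0 = full)."""
    return used / capacity

def most_saturated(resources):
    """Return (name, utilization) for the resource closest to its limit."""
    name = max(resources, key=lambda n: utilization(*resources[n]))
    return name, utilization(*resources[name])

def over_threshold(resources, threshold=0.8):
    """Names of resources at or above the utilization threshold, sorted."""
    return sorted(
        name for name, (used, capacity) in resources.items()
        if utilization(used, capacity) >= threshold
    )

# Hypothetical snapshot: resource name -> (used, capacity)
snapshot = {
    "cpu_cores": (6.0, 8.0),
    "memory_gb": (29.0, 32.0),
    "disk_iops": (1200, 5000),
}

bottleneck = most_saturated(snapshot)
hot = over_threshold(snapshot)
print(f"likely bottleneck: {bottleneck[0]} at {bottleneck[1]:.0%}")
print(f"over threshold: {hot}")
```

In this snapshot memory is the likely bottleneck at roughly 91% utilization, while CPU at 75% stays just under the assumed threshold.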
Saturation metrics help predict and prevent outages caused by overutilization.
By focusing on these golden signals, you gain critical visibility into your system's health and performance.