(Senior DevOps Manager & Principal Architect)
Rajesh Kumar is an award-winning academician and consultant trainer with 15+ years’ experience in diverse skill management and more than a decade of experience training large and diverse groups across multiple industry sectors.
Monitoring is the most basic component in their reliability pyramid and enables incident response and postmortems.
Once upon a time there was “Monitoring”
Observability is a superset of monitoring. It provides not only high-level overviews of the system’s health but also highly granular insights into the implicit failure modes of the system.
In addition, an observable system furnishes ample context about its inner workings, unlocking the ability to uncover deeper, systemic issues.
Monitoring, on the other hand, is best suited to report the overall health of systems and to derive alerts.
Telemetry is the collection of measurements or other data at remote points and their automatic transmission to receiving equipment for monitoring. The word is derived from the Greek roots tele, "remote", and metron, "measure".
Four essential telemetry data types
Metrics represent the data in your system; monitoring is the process of collecting, aggregating, and analyzing those values to improve awareness of your components' characteristics and behavior.
Metrics capture a value pertaining to your systems at a specific point in time, for example the number of users currently logged in to a web application. Therefore, metrics are usually collected once per second, once per minute, or at another regular interval to monitor a system over time.
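As a rough illustration of collecting a value at a regular interval, here is a minimal Python sketch. The names `MetricPoint`, `collect`, and `users.logged_in` are hypothetical and chosen only for this example; they do not come from any particular monitoring product. The clock and sleep functions are passed in as parameters so the loop can be exercised without actually waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MetricPoint:
    """One sample: a metric name, when it was read, and the value read."""
    name: str
    timestamp: float
    value: float


def collect(
    name: str,
    read_value: Callable[[], float],
    samples: int,
    interval_seconds: float,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[MetricPoint]:
    """Read a value `samples` times, pausing `interval_seconds` between reads."""
    points: List[MetricPoint] = []
    for i in range(samples):
        points.append(MetricPoint(name, clock(), float(read_value())))
        if i < samples - 1:  # no need to wait after the final sample
            sleep(interval_seconds)
    return points


if __name__ == "__main__":
    # Hypothetical reader: a real one would query the application.
    logged_in_users = lambda: 42
    for point in collect("users.logged_in", logged_in_users, samples=3, interval_seconds=1):
        print(point)
```

Each `MetricPoint` pairs a value with the moment it was read, which is what makes it possible to view the series over time.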
There are two important categories of metrics in our framework: work metrics and resource metrics.
Work metrics indicate the top-level health of your system by measuring its useful output. It’s often helpful to break them down into four subtypes:
A server’s resources include such physical components as CPU, memory, disks, and network interfaces. A higher-level component, such as a database or a geolocation microservice, can also be considered a resource if another system requires that component to produce work. For each resource in your system, try to collect metrics that cover four key areas:
In addition to metrics, which are collected more or less continuously, some monitoring systems can also capture events: discrete, infrequent occurrences that can provide crucial context for understanding what changed in your system’s behavior. Some examples:
An event usually carries enough information that it can be interpreted on its own. Events capture what happened, at a point in time, with optional additional information.
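One way to picture that shape (what happened, when, plus optional extra information) is the small Python sketch below. The `Event` class and its field names are assumptions made for illustration, not the schema of any specific monitoring system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class Event:
    """A discrete occurrence: what happened, when, and optional context."""
    what: str
    when: datetime
    details: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        """Render the event as a single line that can be read on its own."""
        base = f"{self.when.isoformat()} {self.what}"
        if not self.details:
            return base
        extras = ", ".join(f"{key}={value}" for key, value in sorted(self.details.items()))
        return f"{base} ({extras})"


if __name__ == "__main__":
    # Hypothetical event; the version number is made up for the example.
    deploy = Event(
        what="deploy finished",
        when=datetime.now(timezone.utc),
        details={"version": "1.2.3"},
    )
    print(deploy.describe())
```

Unlike a metric sample, a record like this is meaningful without the readings that came before or after it.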
Datadog Course
2. Datadog Infrastructure Monitoring