Azure Blaze: Observability

Wednesday, 28 August 2019

Observability

What is Observability?

Observability covers the activities involved in measuring, collecting, and analyzing diagnostic signals from a system. These signals may include metrics, traces, logs, events, profiles, and more. The patterns covered in this post are:

Log aggregation
Application metrics
Audit logging
Distributed tracing
Exception tracking
Health check API
Log deployments and changes

Log aggregation

Definition: Use a centralized logging service that aggregates logs from each service instance. Users can search and analyze the logs, and they can configure alerts that are triggered when certain messages appear in the logs.

Issue: Handling a large volume of logs requires substantial infrastructure.
Note: Any solution should have minimal runtime overhead.
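
A minimal sketch of the service side of log aggregation, assuming the central logging service ingests one JSON object per line. The service name "orders", the instance name "orders-1", and the field names are hypothetical; an in-memory buffer stands in for the stdout stream or file that a log shipper would forward.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service, instance):
        super().__init__()
        self.service = service
        self.instance = instance

    def format(self, record):
        return json.dumps({
            "service": self.service,    # which service wrote the entry
            "instance": self.instance,  # which instance of that service
            "level": record.levelname,
            "message": record.getMessage(),
        })


def build_logger(stream, service="orders", instance="orders-1"):
    # Assumption: in a real deployment the stream is stdout or a file
    # that a log shipper forwards to the central logging service.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, instance))
    logger = logging.getLogger(f"{service}.{instance}")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("order %s created", 42)
entry = json.loads(buffer.getvalue())
```

Tagging every entry with the service and instance lets the aggregated logs be filtered per instance once they are all in one place.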

Application metrics

Definition: Instrument a service to gather statistics about individual operations. Aggregate the metrics in a centralized metrics service, which provides reporting and alerting. There are two models for aggregating metrics:

push - the service pushes metrics to the metrics service
pull - the metrics service pulls metrics from the service

Examples

Instrumentation libraries:
- Prometheus client libraries
Metrics aggregation services:
- Prometheus

Benefits: It provides deep insight into application behavior
Drawbacks: Metrics code is intertwined with business logic, which makes the business logic more complicated
Issues: Aggregating metrics can require significant infrastructure
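
A rough sketch of instrumentation and of the push and pull models, using only the Python standard library rather than a real client library such as the Prometheus ones. The Metrics class, the "create_order" operation, and the sender callable are all hypothetical names chosen for illustration.

```python
import functools
import time
from collections import defaultdict


class Metrics:
    """In-process registry: call count and total latency per operation."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)

    def timed(self, name):
        """Decorator that records one call and its duration under `name`."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    self.calls[name] += 1
                    self.seconds[name] += time.perf_counter() - start
            return wrapper
        return decorator

    def snapshot(self):
        # Pull model: the metrics service would read this through an
        # endpoint that the service exposes.
        return {
            name: {"calls": self.calls[name], "seconds": self.seconds[name]}
            for name in self.calls
        }

    def push(self, send):
        # Push model: the service hands its snapshot to a sender callable.
        send(self.snapshot())


metrics = Metrics()


@metrics.timed("create_order")
def create_order(item):
    return {"item": item, "status": "created"}


create_order("book")
create_order("pen")
received = []                  # stand-in for the central metrics service
metrics.push(received.append)
```

Putting the measurement in a decorator keeps it out of the function body, which is one way to limit how much metrics code mixes with business logic.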

Audit logging

Definition: Record user activity in a database.

Benefits: Provides a record of user actions
Drawbacks: The auditing code is intertwined with the business logic, which makes the business logic more complicated
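
As an illustration only, recording user activity in a database could look like the sketch below. It assumes SQLite as the store; the table name audit_log, its columns, and the example users and actions are made up for the example.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_db(path=":memory:"):
    """Open the audit database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        "id INTEGER PRIMARY KEY, occurred_at TEXT NOT NULL, "
        "actor TEXT NOT NULL, operation TEXT NOT NULL, target TEXT NOT NULL)"
    )
    return conn


def record_action(conn, actor, operation, target):
    """Append one audit entry with a UTC timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_log (occurred_at, actor, operation, target) "
            "VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), actor, operation, target),
        )


def actions_by(conn, actor):
    """Return (operation, target) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT operation, target FROM audit_log WHERE actor = ? ORDER BY id",
        (actor,),
    )
    return rows.fetchall()


conn = open_audit_db()
record_action(conn, "alice", "update_price", "product:17")
record_action(conn, "bob", "delete", "product:3")
record_action(conn, "alice", "login", "session")
```

Here the business code still has to call record_action explicitly, which is the intertwining described in the drawback above.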

Distributed Tracing

Definition:

Assigns each external request a unique external request id
Passes the external request id to all services that are involved in handling the request
Includes the external request id in all log messages
Records information (e.g. start time, end time) about the requests and operations performed when handling an external request in a centralized service
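
The first three steps can be sketched with the Python standard library as follows. This is a simplified illustration, not how Jaeger or Zipkin work internally: the header name X-Request-Id and the helper names are assumptions, and the fourth step (recording timings in a centralized service) is left out.

```python
import contextvars
import io
import logging
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # assumed header name
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current external request id to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def start_request(headers):
    """Reuse the incoming request id if present, otherwise assign a new one."""
    request_id = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


def outgoing_headers():
    """Headers to send to downstream services so they see the same id."""
    return {REQUEST_ID_HEADER: request_id_var.get()}


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger = logging.getLogger("tracing-sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

edge_id = start_request({})       # the edge service assigns the id
downstream = outgoing_headers()   # the id travels with the outgoing call
start_request(downstream)         # the downstream service reuses it
logger.info("charging card")      # the log line carries the id
```

With the id on every log line, searching the aggregated logs for that id returns everything written while handling the request.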

Benefits:

It provides useful insight into the behavior of the system including the sources of latency
It enables developers to see how an individual request is handled by searching across aggregated logs for its external request id

Issues: Aggregating and storing traces can require significant infrastructure
Note:

External monitoring only tells you the overall response time and number of invocations - no insight into the individual operations
Any solution should have minimal runtime overhead
Log entries for a request are scattered across numerous logs

Tools:

Jaeger - open source, end-to-end distributed tracing for monitoring and troubleshooting transactions in complex distributed systems
Zipkin (OpenZipkin) - service for recording and displaying tracing information
OpenTracing - standardized API for distributed tracing

Exception tracking

Definition: Report all exceptions to a centralized exception tracking service that aggregates and tracks exceptions and notifies developers.

Benefits: It is easier to view exceptions and track their resolution
Drawbacks: The exception tracking service is additional infrastructure
Note:

Exceptions must be de-duplicated, recorded, and investigated by developers, and the underlying issue must be resolved
Any solution should have minimal runtime overhead
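
One possible way to de-duplicate exceptions is to group them by a fingerprint built from the exception type and the code location that raised it. The sketch below assumes that grouping rule; the ExceptionTracker class is a hypothetical in-memory stand-in for a real tracking service.

```python
import hashlib
import traceback
from collections import Counter


class ExceptionTracker:
    """Stand-in for a centralized tracker: groups exceptions by fingerprint."""

    def __init__(self):
        self.counts = Counter()
        self.samples = {}

    @staticmethod
    def fingerprint(exc):
        # Assumed rule: same exception type raised through the same
        # frames counts as the same underlying issue.
        frames = traceback.extract_tb(exc.__traceback__)
        location = [(f.filename, f.name, f.lineno) for f in frames]
        raw = repr((type(exc).__name__, location))
        return hashlib.sha256(raw.encode()).hexdigest()[:12]

    def report(self, exc):
        key = self.fingerprint(exc)
        self.counts[key] += 1
        # Keep only the first full traceback per group as a sample.
        self.samples.setdefault(
            key,
            "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        )
        return key


tracker = ExceptionTracker()


def parse_quantity(text):
    return int(text)


keys = []
for bad in ["x", "y", "z"]:
    try:
        parse_quantity(bad)
    except ValueError as exc:
        keys.append(tracker.report(exc))
```

The three failures differ only in their message, so they land in one group with a count of three instead of producing three separate reports.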

Health Check API

Definition: A service has a health check API endpoint (e.g. HTTP /health) that returns the health of the service. The API endpoint handler performs various checks, such as

the status of the connections to the infrastructure services used by the service instance
the status of the host, e.g. disk space
application specific logic

A health check client - a monitoring service, service registry or load balancer - periodically invokes the endpoint to check the health of the service instance.

Benefits: The health check endpoint enables the health of a service instance to be periodically tested
Drawbacks: The health check might not be sufficiently comprehensive, or the service instance might fail between health checks, so requests might still be routed to a failed service instance
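
A minimal sketch of such an endpoint using Python's built-in http.server. The check names, the 100 MB free-space threshold, the "UP"/"DOWN" status values, and port 8080 are assumptions for illustration; a production service would normally use its web framework's own routing.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_disk(path="/", min_free_bytes=100 * 1024 * 1024):
    """Host check: is there at least min_free_bytes of free disk space?"""
    return shutil.disk_usage(path).free >= min_free_bytes


# Hypothetical registry; connection checks for databases or message
# brokers would be added here alongside the disk check.
CHECKS = {"disk": check_disk}


def run_checks(checks):
    """Run every check and derive an overall status."""
    results = {name: bool(check()) for name, check in checks.items()}
    status = "UP" if all(results.values()) else "DOWN"
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = run_checks(CHECKS)
        body = json.dumps({"status": status, "checks": results}).encode()
        # 503 tells the load balancer or registry to stop routing here.
        self.send_response(200 if status == "UP" else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the endpoint; blocks until interrupted."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint, and a health check client would then poll GET /health on that port at a fixed interval.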

Log deployments and changes

Definition: Log every deployment and every change to the (production) environment.

Benefits: Enables deployments and changes to be easily correlated with issues, leading to a faster resolution.
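
To illustrate the correlation, the sketch below keeps a change log in memory and lists the changes made shortly before an incident. The record fields, the one-hour window, and all services, versions, and timestamps are made-up example values.

```python
from datetime import datetime, timedelta, timezone

changes = []  # stand-in for a persistent change log


def log_change(kind, service, detail, at):
    """Record one deployment or environment change."""
    changes.append({"at": at, "kind": kind, "service": service, "detail": detail})


def changes_before(log, incident_at, window=timedelta(hours=1)):
    """Return changes made within `window` before the incident."""
    return [c for c in log if incident_at - window <= c["at"] <= incident_at]


t0 = datetime(2019, 8, 28, 9, 0, tzinfo=timezone.utc)
log_change("deployment", "orders", "v1.4.2", t0)
log_change("config", "payments", "timeout=5s", t0 + timedelta(hours=3))
log_change("deployment", "orders", "v1.4.3", t0 + timedelta(hours=5, minutes=40))

incident_at = t0 + timedelta(hours=6)
suspects = changes_before(changes, incident_at)
```

In this example only the v1.4.3 deployment falls inside the window, so it is the first change to investigate.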

Technology stack for Observability

https://opentracing.io/ - OpenTracing is not a download or a program. Distributed tracing requires that software developers add instrumentation to the code of an application, or to the frameworks used in the application.

