Wednesday, 28 August 2019

Observability

What is Observability?

Observability is the set of activities that involve measuring, collecting, and analyzing diagnostic signals from a system. These signals may include metrics, traces, logs, events, profiles, and more. The following patterns support it:

  • Log aggregation
  • Application metrics
  • Audit logging
  • Distributed tracing
  • Exception tracking
  • Health check API
  • Log deployments and changes

Log aggregation

Definition: Use a centralized logging service that aggregates logs from each service instance. Users can search and analyze the logs, and they can configure alerts that are triggered when certain messages appear in the logs.

Issue: Handling a large volume of logs requires substantial infrastructure.
Note: Any solution should have minimal runtime overhead.
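
For a central service to search and alert on logs, each instance usually has to emit them in a machine-readable form. The following is a minimal sketch, assuming the aggregator ingests one JSON object per line; the `JsonFormatter` class, the field names, and the service name `orders` are hypothetical and not taken from any particular logging product.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (hypothetical schema)."""

    def __init__(self, service: str, instance: str) -> None:
        super().__init__()
        self.service = service
        self.instance = instance

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "instance": self.instance,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(stream, service: str, instance: str) -> logging.Logger:
    """Return a logger that writes structured lines to `stream`."""
    logger = logging.getLogger(f"{service}.{instance}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, instance))
    logger.addHandler(handler)
    return logger


# A StringIO buffer stands in for stdout or a log shipper's input.
buffer = io.StringIO()
log = build_logger(buffer, service="orders", instance="orders-1")
log.info("payment declined for order %s", 42)
entry = json.loads(buffer.getvalue())
```

Tagging every line with the service and instance is what lets the aggregator filter logs per instance after they have been merged.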

Application metrics

Definition: Instrument a service to gather statistics about individual operations. Aggregate the metrics in a centralized metrics service, which provides reporting and alerting. There are two models for aggregating metrics:

  • push - the service pushes metrics to the metrics service
  • pull - the metrics service pulls metrics from the service

Examples
  • Instrumentation libraries:
    • Prometheus client libraries
  • Metrics aggregation services:
    • Prometheus

Benefits: It provides deep insight into application behavior
Drawbacks: Metrics code is intertwined with the business logic, which makes the business logic more complicated
Issues: Aggregating metrics can require significant infrastructure
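
The difference between the push and pull models can be sketched with a small in-process registry. This is a stdlib-only illustration, not the Prometheus client library: `MetricsRegistry`, its methods, the `name value` text format, and the metric name `orders_placed_total` are all assumptions made for the example.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class MetricsRegistry:
    """Thread-safe in-process counters (hypothetical, stdlib-only sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters: Dict[str, float] = defaultdict(float)

    def inc(self, name: str, amount: float = 1.0) -> None:
        with self._lock:
            self._counters[name] += amount

    def snapshot(self) -> Dict[str, float]:
        with self._lock:
            return dict(self._counters)

    def render(self) -> str:
        """Pull model: text that a metrics service could fetch from us."""
        snap = self.snapshot()
        return "".join(f"{name} {value}\n" for name, value in sorted(snap.items()))

    def push(self, send: Callable[[Dict[str, float]], None]) -> None:
        """Push model: the service hands its current values to `send`."""
        send(self.snapshot())


registry = MetricsRegistry()


def place_order(order_id: int) -> str:
    """Business logic with instrumentation mixed in, as the drawback notes."""
    registry.inc("orders_placed_total")
    return f"order {order_id} accepted"


place_order(1)
place_order(2)

pulled = registry.render()      # what a scraper would read
pushed = []
registry.push(pushed.append)    # `pushed.append` stands in for a network call
```

In the pull model the service only has to expose `render()` over some endpoint; in the push model it also needs to know where the metrics service lives.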

Audit logging

Definition: Record user activity in a database.

Benefits: Provides a record of user actions
Drawbacks: The auditing code is intertwined with the business logic, which makes the business logic more complicated
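
As a rough illustration of recording user activity in a database, the sketch below uses an in-memory SQLite database. The table name `audit_log`, its columns, and the helper functions are hypothetical; a real system would choose its own schema and storage.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the audit table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               occurred_at TEXT NOT NULL,
               user_id TEXT NOT NULL,
               action TEXT NOT NULL,
               target TEXT NOT NULL
           )"""
    )
    return conn


def record_action(conn: sqlite3.Connection, user_id: str, action: str, target: str) -> None:
    """Append one audit entry; the `with` block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO audit_log (occurred_at, user_id, action, target) "
            "VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), user_id, action, target),
        )


def actions_by_user(conn: sqlite3.Connection, user_id: str) -> List[Tuple[str, str]]:
    """Return (action, target) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT action, target FROM audit_log WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()


conn = open_audit_db()
record_action(conn, "alice", "update_email", "account:17")
record_action(conn, "alice", "delete_order", "order:42")
record_action(conn, "bob", "login", "session")
alice_history = actions_by_user(conn, "alice")
```

Every business operation that should be audited has to call `record_action` explicitly, which is the intertwining the drawback above describes.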

Distributed tracing

Definition:

  • Assigns each external request a unique external request id
  • Passes the external request id to all services that are involved in handling the request
  • Includes the external request id in all log messages
  • Records information (e.g. start time, end time) about the requests and operations performed when handling an external request in a centralized service

Benefits:
  • It provides useful insight into the behavior of the system including the sources of latency
  • It enables developers to see how an individual request is handled by searching across aggregated logs for its external request id

Issues: Aggregating and storing traces can require significant infrastructure
Note:
  • External monitoring only tells you the overall response time and number of invocations - no insight into the individual operations
  • Any solution should have minimal runtime overhead
  • Log entries for a request are scattered across numerous logs
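
The four steps in the definition can be sketched in one process. This is a simplified illustration under stated assumptions: the header name `X-Request-Id`, the `recorded_spans` list (standing in for the centralized tracing service), and all function names are hypothetical, and it does not use OpenTracing or any other tracing library.

```python
import contextvars
import io
import logging
import time
import uuid
from contextlib import contextmanager
from typing import Dict, List

REQUEST_ID_HEADER = "X-Request-Id"  # hypothetical header name
request_id_var = contextvars.ContextVar("request_id", default="-")
recorded_spans: List[dict] = []     # stands in for the centralized service


class RequestIdFilter(logging.Filter):
    """Attach the current external request id to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def accept_request(headers: Dict[str, str]) -> str:
    """Reuse the caller's id if present, otherwise assign a new one."""
    request_id = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


def outgoing_headers() -> Dict[str, str]:
    """Headers to send downstream so every service shares the id."""
    return {REQUEST_ID_HEADER: request_id_var.get()}


@contextmanager
def span(operation: str):
    """Record start and end time of one operation for the current request."""
    start = time.time()
    try:
        yield
    finally:
        recorded_spans.append({
            "request_id": request_id_var.get(),
            "operation": operation,
            "start": start,
            "end": time.time(),
        })


log_stream = io.StringIO()  # stands in for the service's log output
handler = logging.StreamHandler(log_stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
logger = logging.getLogger("tracing_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

edge_id = accept_request({})            # edge service: no id yet, assign one
with span("load_order"):
    logger.info("loading order")
    downstream = outgoing_headers()     # would accompany the downstream call
```

A downstream service would call `accept_request` with the received headers, so its log lines and spans carry the same id as the edge service's.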

Tools:

Exception tracking

Definition: Report all exceptions to a centralized exception tracking service that aggregates and tracks exceptions and notifies developers.

Benefits: It is easier to view exceptions and track their resolution
Drawbacks: The exception tracking service is additional infrastructure
Note:
  • Exceptions must be de-duplicated, recorded, and investigated by developers, and the underlying issue must be resolved
  • Any solution should have minimal runtime overhead
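
De-duplication is the part of exception tracking that is easiest to get wrong, so here is a minimal sketch of one possible approach: group exceptions by a fingerprint built from the exception type and the function that raised it. The fingerprint rule, the in-memory stores, and the function names are assumptions for illustration; real trackers use their own grouping rules.

```python
import hashlib
import traceback
from collections import Counter
from typing import Dict, List

occurrences: Counter = Counter()   # fingerprint -> number of reports
first_seen: Dict[str, str] = {}    # fingerprint -> first formatted traceback


def fingerprint(exc: BaseException) -> str:
    """Group exceptions by type and the function where they were raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    where = f"{frames[-1].filename}:{frames[-1].name}" if frames else "unknown"
    raw = f"{type(exc).__name__}|{where}"
    return hashlib.sha256(raw.encode()).hexdigest()[:12]


def report(exc: BaseException) -> bool:
    """Record the exception; return True only the first time it is seen."""
    key = fingerprint(exc)
    occurrences[key] += 1
    if key in first_seen:
        return False
    first_seen[key] = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return True  # a real tracker would notify developers at this point


def parse_quantity(raw: str) -> int:
    return int(raw)


is_new: List[bool] = []
for raw_value in ["x", "y", "3"]:
    try:
        parse_quantity(raw_value)
    except ValueError as exc:
        is_new.append(report(exc))
```

Both failures come from the same function with the same exception type, so the second one only increments the counter instead of creating a new issue.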

Health check API

Definition: A service has a health check API endpoint (e.g. HTTP /health) that returns the health of the service. The API endpoint handler performs various checks, such as

  • the status of the connections to the infrastructure services used by the service instance
  • the status of the host, e.g. disk space
  • application specific logic

A health check client - a monitoring service, service registry or load balancer - periodically invokes the endpoint to check the health of the service instance.

Benefits: The health check endpoint enables the health of a service instance to be periodically tested
Drawbacks: The health check might not be sufficiently comprehensive, or the service instance might fail between health checks, so requests might still be routed to a failed service instance
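
A health check endpoint along these lines could look like the sketch below, built on Python's standard `http.server`. The free-disk threshold, the placeholder `database_ok` check, the `UP`/`DOWN` response body, and the choice of 503 for an unhealthy instance are assumptions for the example, not a prescribed format.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

MIN_FREE_BYTES = 100 * 1024 * 1024  # hypothetical threshold: 100 MiB


def disk_ok(path: str = "/") -> bool:
    """Host check: is there enough free disk space?"""
    return shutil.disk_usage(path).free >= MIN_FREE_BYTES


def database_ok() -> bool:
    """Placeholder: a real check would ping the database connection."""
    return True


CHECKS: Dict[str, Callable[[], bool]] = {"disk": disk_ok, "database": database_ok}


def evaluate(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every check; a check that raises counts as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    body = {"status": "UP" if healthy else "DOWN", "checks": results}
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = evaluate(CHECKS)
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the endpoint; call this from the service's entry point."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Returning a non-2xx status when any check fails lets a load balancer or service registry act on the status code alone, without parsing the body.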

Log deployments and changes

Definition: Log every deployment and every change to the (production) environment.

Benefits: Enables deployments and changes to be easily correlated with issues, leading to faster resolution.
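
To show how a deployment log helps with correlation, the sketch below appends one JSON line per deployment and then looks up which deployments happened shortly before an incident. The record fields, the service names and versions, and the one-hour window are hypothetical values chosen for the example.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List


def log_deployment(log: List[str], service: str, version: str,
                   deployed_at: datetime) -> None:
    """Append one deployment record as a JSON line."""
    log.append(json.dumps({
        "service": service,
        "version": version,
        "deployed_at": deployed_at.isoformat(),
    }))


def deployments_before(log: List[str], incident_at: datetime,
                       window: timedelta) -> List[dict]:
    """Return deployments that happened within `window` before the incident."""
    hits = []
    for line in log:
        entry = json.loads(line)
        deployed_at = datetime.fromisoformat(entry["deployed_at"])
        if incident_at - window <= deployed_at <= incident_at:
            hits.append(entry)
    return hits


deploy_log: List[str] = []  # stands in for a file or a central store
t0 = datetime(2019, 8, 28, 9, 0, tzinfo=timezone.utc)
log_deployment(deploy_log, "orders", "1.4.0", t0)
log_deployment(deploy_log, "billing", "2.1.3", t0 + timedelta(hours=5))

incident_at = t0 + timedelta(hours=5, minutes=20)
suspects = deployments_before(deploy_log, incident_at, timedelta(hours=1))
```

Here only the `billing` deployment falls inside the hour before the incident, which narrows down where to look first.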

Technology stack for Observability

  • https://opentracing.io/ - OpenTracing is a vendor-neutral API specification for distributed tracing rather than a download or a program. Distributed tracing requires developers to add instrumentation to the application code or to the frameworks the application uses.
