1) Overview

Node degradation is the condition in which a node cannot perform most queries. If Oxla is misconfigured or encounters a startup issue, the node enters a degraded state and rejects incoming requests with an error. This state can be temporary or permanent, and it can affect a single node or the entire cluster. This guide explains when degradation occurs and how it affects the node or cluster.

2) Cluster State

In Oxla, most errors that would otherwise crash a server should instead put it into a degraded state. The key terms related to the cluster state are:

  • Healthiness: A cluster is healthy if the client can connect (e.g., via psql) to all nodes in the cluster.
  • Readiness: A cluster is ready if its leader node is not degraded. If the leader node is degraded, the cluster cannot perform any queries, so readiness can also be defined as the cluster’s ability to execute queries.

Exception

An invalid postgresql_port is an exception to the degraded state. If this port is not set correctly, the server is not even healthy.
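
The two definitions above can be read as simple predicates over the nodes of a cluster. The following is a minimal sketch in Python, not Oxla code: the Node class and its fields (accepts_connections, is_leader, degraded) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical view of one node; field names are illustrative only."""
    name: str
    accepts_connections: bool  # a client (e.g., psql) can connect
    is_leader: bool
    degraded: bool


def is_healthy(cluster: List[Node]) -> bool:
    # Healthy: a client can connect to every node in the cluster.
    return all(node.accepts_connections for node in cluster)


def is_ready(cluster: List[Node]) -> bool:
    # Ready: the cluster has a leader and that leader is not degraded.
    return any(node.is_leader and not node.degraded for node in cluster)


cluster = [
    Node("node-1", accepts_connections=True, is_leader=True, degraded=False),
    Node("node-2", accepts_connections=True, is_leader=False, degraded=True),
]
print(is_healthy(cluster), is_ready(cluster))
```

In this model, a degraded follower leaves the cluster both healthy and ready, while a degraded leader leaves it healthy but not ready.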

3) Degradation State Period

The degradation state of a node can be categorized into two periods: Permanent and Temporary.

Permanent Degradation

Permanent degradation occurs when a node encounters an error from which it cannot recover. The node goes into a degraded state, the server logs the reason for the error, and each query returns that reason. Any attempt to resolve the issue requires rebooting the node. The following are examples of errors that can put an Oxla node in a permanently degraded state:

  • Invalid configuration file.
  • Invalid OXLA_HOME layout or version.
  • An error occurred while reading the database state on the leader node.

Temporary Degradation

Temporary degradation occurs when a node cannot perform queries because it is waiting for certain conditions to be met. The following are reasons for a temporarily degraded state:

  • The Leader has not been elected yet (the default starting state of each node).
  • The node is the Leader, but it has not been initialized yet.
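
The causes listed in this section can be summarized as a lookup from cause to period. The sketch below is illustrative only: the dictionary keys are hypothetical labels rather than Oxla error codes, and the "wait and retry" suggestion for temporary degradation is an assumption based on the node waiting for a condition to be met.

```python
from enum import Enum


class Period(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"


# Causes named in this guide, mapped to their degradation period.
# The keys are hypothetical labels, not identifiers used by Oxla.
DEGRADATION_CAUSES = {
    "invalid_configuration_file": Period.PERMANENT,
    "invalid_oxla_home_layout_or_version": Period.PERMANENT,
    "leader_failed_to_read_database_state": Period.PERMANENT,
    "leader_not_elected": Period.TEMPORARY,
    "leader_not_initialized": Period.TEMPORARY,
}


def suggested_action(cause: str) -> str:
    """Return a coarse operator action for a known cause."""
    period = DEGRADATION_CAUSES[cause]
    if period is Period.PERMANENT:
        # Permanent degradation requires a reboot to attempt recovery.
        return "fix the cause and reboot the node"
    # Assumption: a temporary condition clears without intervention.
    return "wait and retry"


print(suggested_action("invalid_configuration_file"))
print(suggested_action("leader_not_elected"))
```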

4) Effects of Degraded State

Query Handling

  • When a degraded node receives a query, it responds with a degradation error and cannot process it.
  • If the Leader is degraded, the whole cluster is considered degraded, and most queries are not processed.

Degradation Types

  • Permanent Degradation: Nodes permanently degraded are excluded from query planning.
  • Temporary Degradation: Nodes temporarily degraded are assumed to recover and are not considered in query planning.

Query Execution

The SHOW NODES query requires the cluster to be ready and the scheduling node to not be degraded. It allows you to check the degradation status of each node in the cluster. A non-degraded leader collects data on every connected node, including degraded ones.
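
The rules in this section can be modeled in a few lines. This is a rough sketch under stated assumptions, not Oxla's planner: NodeStatus, planning_candidates, and can_run_show_nodes are hypothetical names, and "scheduling node" is assumed here to mean the node that receives the query.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    """Hypothetical per-node status; not the output format of SHOW NODES."""
    name: str
    is_leader: bool
    degradation: Optional[str]  # None, "temporary", or "permanent"


def planning_candidates(nodes: List[NodeStatus]) -> List[str]:
    # Degraded nodes, temporary or permanent, are left out of query planning.
    return [n.name for n in nodes if n.degradation is None]


def can_run_show_nodes(nodes: List[NodeStatus], scheduling_node: str) -> bool:
    # The cluster must be ready (non-degraded leader) ...
    leader_ok = any(n.is_leader and n.degradation is None for n in nodes)
    # ... and the scheduling node itself must not be degraded.
    scheduler_ok = any(
        n.name == scheduling_node and n.degradation is None for n in nodes
    )
    return leader_ok and scheduler_ok


nodes = [
    NodeStatus("node-1", is_leader=True, degradation=None),
    NodeStatus("node-2", is_leader=False, degradation="temporary"),
    NodeStatus("node-3", is_leader=False, degradation="permanent"),
]
print(planning_candidates(nodes))
print(can_run_show_nodes(nodes, "node-1"))
```

With these inputs, only node-1 is a planning candidate, and SHOW NODES can be scheduled on node-1 but not on node-2 or node-3.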