Overview

Node degradation is the condition in which a node cannot perform most queries. If Oxla is misconfigured or runs into a startup issue, it enters a degraded state, returns an error, and rejects all requests. This state can be temporary or permanent, and it can affect a single node or the entire cluster. This guide explains when degradation occurs and how it affects the node or cluster.

Cluster State

In Oxla, most errors that would crash a server should instead put it into a degraded state. Below are the key terms related to node and cluster state:

  • Liveness: the node serves incoming client connections, e.g. via psql. It does not have to allow the user to connect to the database; returning an error on a connection attempt still meets the liveness condition.
  • Readiness: the cluster can execute queries. This requires the leader node to be in a proper state. If the leader node is degraded, the cluster is not ready to execute queries.
Exception
An invalid postgresql_port is an exception to the degraded state. If it is not set properly, even the liveness condition is not met.
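
The liveness condition described above can be approximated from the outside with a plain TCP connection attempt. The following is a minimal sketch, not an Oxla tool: the function name is_live is hypothetical, and a successful TCP connection only shows that the port is being served. It says nothing about readiness, which requires a query to actually execute.

```python
import socket


def is_live(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: this approximates liveness only. A node that
    accepts the connection may still answer with a degradation error.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Pass the port configured as postgresql_port for the node being checked. If that setting is invalid, a probe like this is expected to fail, since the liveness condition is not met in that case.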

Degradation State Period

The degradation state of a node can be either permanent or temporary.

Permanent Degradation

Permanent degradation occurs when a node encounters an error from which it cannot recover. The node goes into a degraded state, the server logs the reason for the error, and each query returns that reason. To resolve the issue, the node requires a reboot. Here are a few examples of errors that can put an Oxla node into a permanently degraded state:

  • Invalid configuration file
  • Invalid OXLA_HOME layout or version
  • An error while reading the database state on the leader node

Temporary Degradation

Temporary degradation occurs when a node cannot perform queries because it is waiting for specific conditions to be met. The following conditions put a node into a temporarily degraded state:

  • Unelected Leader (default starting state of each node)
  • The node is the Leader, but it has not been initialized yet
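
The difference between the two periods matters for clients and automation: temporary degradation is worth waiting out, while permanent degradation is not. The sketch below illustrates that policy under stated assumptions. NodeState, wait_until_ready, and the probe callable are all hypothetical names; how a probe determines the state of a real node (for example, by inspecting the error a query returns) is left out.

```python
import time
from enum import Enum
from typing import Callable


class NodeState(Enum):
    # Hypothetical states used only for this illustration.
    READY = "ready"
    TEMPORARILY_DEGRADED = "temporarily degraded"
    PERMANENTLY_DEGRADED = "permanently degraded"


def wait_until_ready(
    probe: Callable[[], NodeState],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll probe() until the node is ready, or give up.

    Returns True once the node reports READY, and False if the timeout
    expires while the node is still temporarily degraded. Raises
    RuntimeError on permanent degradation, because waiting cannot fix it.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = probe()
        if state is NodeState.READY:
            return True
        if state is NodeState.PERMANENTLY_DEGRADED:
            raise RuntimeError("node is permanently degraded; a reboot is required")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A node that has just started would typically report the temporary state first (unelected leader), so a loop like this returns True once election and initialization complete.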

Effects of Degraded State

Effects and their details:

  • Database connection: If the Leader is degraded, the user cannot connect to the database and all connection attempts return a degradation error.
  • Query Handling:
    - When a degraded node receives a query, it responds with a degradation error and cannot process it.
    - If the Leader is degraded, the whole cluster is considered degraded and most queries are not processed.
  • Degradation Types:
    - Permanent Degradation: Nodes permanently degraded are excluded from query planning.
    - Temporary Degradation: Nodes temporarily degraded are assumed to recover and are not considered in query planning.
  • Query Execution: The SHOW NODES query requires the cluster to be ready and the scheduling node to not be degraded. It allows you to check the degradation status of each node in the cluster. A non-degraded leader collects data on every connected node, including degraded ones.
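
The rules above can be summarized in a small model. This is an illustrative sketch only, not Oxla's implementation: the Node type and both functions are hypothetical, and the model simplifies "most queries are not processed" to "no node is available for planning" when the leader is degraded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    # Hypothetical record of what a non-degraded leader knows about a node.
    name: str
    is_leader: bool = False
    degraded: bool = False


def cluster_ready(nodes: List[Node]) -> bool:
    """The cluster is ready only if a non-degraded leader exists."""
    return any(n.is_leader and not n.degraded for n in nodes)


def plannable_nodes(nodes: List[Node]) -> List[Node]:
    """Nodes that query planning may use.

    Returns an empty list if the cluster is not ready; otherwise every
    node that is not degraded, whether the degradation is temporary or
    permanent.
    """
    if not cluster_ready(nodes):
        return []
    return [n for n in nodes if not n.degraded]
```

For a cluster with a healthy leader and one degraded worker, plannable_nodes returns only the healthy nodes, while the degraded worker remains visible to the leader for status reporting.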