Module cartridge.issues

Monitor issues across cluster instances.

Cartridge detects the following problems:

Replication:

  • "Replication from ... to ... isn't running" - when box.info.replication[n].upstream == nil;
  • "Replication from ... to ... is stopped/orphan/etc. (...)";
  • "Replication from ... to ...: high lag" - when upstream.lag > box.cfg.replication_sync_lag;
  • "Replication from ... to ...: long idle" - when upstream.idle > box.cfg.replication_timeout;
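
The replication checks above can be sketched as a single pure function. This is a minimal illustration, not Cartridge's actual implementation: the helper name replication_issue is hypothetical, the upstream argument is assumed to mirror one box.info.replication[n].upstream table, cfg is assumed to mirror box.cfg, and treating only the 'follow' and 'sync' statuses as healthy is an assumption.

```lua
-- Hypothetical helper: classify one replica's upstream state.
-- `upstream` is assumed to look like box.info.replication[n].upstream,
-- `cfg` like box.cfg. Returns an issue suffix or nil if all is well.
local function replication_issue(upstream, cfg)
    if upstream == nil then
        return "isn't running"
    end
    -- Assumption: only 'follow' and 'sync' count as healthy statuses.
    if upstream.status ~= 'follow' and upstream.status ~= 'sync' then
        return ('is %s (%s)'):format(upstream.status, upstream.message or '')
    end
    if upstream.lag > cfg.replication_sync_lag then
        return 'high lag'
    end
    if upstream.idle > cfg.replication_timeout then
        return 'long idle'
    end
    return nil
end

-- Example with made-up values:
local example_cfg = {replication_sync_lag = 10, replication_timeout = 1}
print(replication_issue({status = 'follow', lag = 12, idle = 0.1}, example_cfg))
```

The checks run in the same order as the list, so a missing or unhealthy upstream is reported before lag or idle time is examined.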

Failover:

  • "Can't obtain failover coordinator (...)";
  • "There is no active failover coordinator";
  • "Failover is stuck on ...: Error fetching appointments (...)";
  • "Failover is stuck on ...: Failover fiber is dead" - this is likely a bug;

Clock:

  • "Clock difference between ... and ... exceed threshold" - when the difference exceeds limits.clock_delta_threshold_warning;

Memory:

  • "Running out of memory on ..." - when all 3 metrics items_used_ratio, arena_used_ratio, quota_used_ratio from box.slab.info() exceed limits.fragmentation_threshold_critical;
  • "Memory is highly fragmented on ..." - when items_used_ratio > limits.fragmentation_threshold_warning and both arena_used_ratio and quota_used_ratio exceed limits.fragmentation_threshold_critical.
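
The two memory conditions differ only in how full items_used_ratio is, which a short sketch makes explicit. The helper names parse_ratio and memory_issue are hypothetical. The sketch assumes the ratios are passed as fractions between 0 and 1; box.slab.info() is assumed to report them as percentage strings such as "99.86%", so a small parser is included for that case.

```lua
-- Hypothetical helper: turn a percentage string like "99.86%" into 0.9986.
-- Returns nil if the string does not look like a percentage.
local function parse_ratio(s)
    local num = s:match('^([%d%.]+)%%$')
    if num == nil then
        return nil
    end
    return tonumber(num) / 100
end

-- Hypothetical helper: classify memory state from three ratios (0..1)
-- and a table shaped like the `limits` table of this module.
local function memory_issue(ratios, limits)
    local critical = limits.fragmentation_threshold_critical
    local warning = limits.fragmentation_threshold_warning
    local arena_and_quota_full = ratios.arena_used_ratio > critical
        and ratios.quota_used_ratio > critical

    if arena_and_quota_full and ratios.items_used_ratio > critical then
        return 'Running out of memory'
    elseif arena_and_quota_full and ratios.items_used_ratio > warning then
        return 'Memory is highly fragmented'
    end
    return nil
end
```

With the default limits (0.9 and 0.6), an instance whose arena and quota are both above 90% is reported as out of memory when items are also above 90%, and as fragmented when items fall between 60% and 90%.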

Tables

limits
Thresholds for issuing warnings. All settings are local to an instance, not clusterwide. They can be changed with the corresponding environment variables (TARANTOOL_*) or command-line arguments. See the cartridge.argparse module for details.

Fields:

  • fragmentation_threshold_critical number default: 0.9.
  • fragmentation_threshold_warning number default: 0.6.
  • clock_delta_threshold_warning number default: 5.
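
As a rough sketch of how a field name relates to its environment variable and command-line argument, assuming the convention described in cartridge.argparse (upper-cased name with a TARANTOOL_ prefix, and a dashed name with a leading --). The helpers env_name, cli_name and read_limit are hypothetical and are not part of Cartridge; the getenv parameter exists only to make the example testable.

```lua
-- Hypothetical helper: environment variable name for a limits field.
local function env_name(option)
    return 'TARANTOOL_' .. option:upper()
end

-- Hypothetical helper: command-line argument name for a limits field.
local function cli_name(option)
    -- Parentheses drop gsub's second return value (the match count).
    return '--' .. (option:gsub('_', '-'))
end

-- Hypothetical helper: read a numeric limit from the environment,
-- falling back to `default` when the variable is unset or not a number.
local function read_limit(option, default, getenv)
    getenv = getenv or os.getenv
    local raw = getenv(env_name(option))
    return tonumber(raw or '') or default
end

print(env_name('clock_delta_threshold_warning'))
print(cli_name('clock_delta_threshold_warning'))
print(read_limit('fragmentation_threshold_critical', 0.9))
```

Under this assumed convention, the clock threshold would be set through TARANTOOL_CLOCK_DELTA_THRESHOLD_WARNING or --clock-delta-threshold-warning.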
generated by LDoc 1.4.6 Last updated 2020-04-24 16:23:15