Module cartridge.issues
Monitor issues across cluster instances.
Cartridge detects the following problems:
Replication:
- "Replication from ... to ... isn't running" -
when
box.info.replication.upstream == nil
; - "Replication from ... to ... is stopped/orphan/etc. (...)";
- "Replication from ... to ...: high lag" -
when
upstream.lag > box.cfg.replication_sync_lag
; - "Replication from ... to ...: long idle" -
when
upstream.idle > box.cfg.replication_timeout
;
Failover:
- "Can't obtain failover coordinator (...)";
- "There is no active failover coordinator";
- "Failover is stuck on ...: Error fetching appointments (...)";
- "Failover is stuck on ...: Failover fiber is dead" - this is likely a bug;
Clock:
- "Clock difference between ... and ... exceed threshold"
limits.clock_delta_threshold_warning
;
Memory:
- "Running out of memory on ..." - when all 3 metrics
items_used_ratio
,arena_used_ratio
,quota_used_ratio
frombox.slab.info()
exceedlimits.fragmentation_threshold_critical
; - "Memory is highly fragmented on ..." - when
items_used_ratio > limits.fragmentation_threshold_warning
and botharena_used_ratio
,quota_used_ratio
exceed critical limit.
Tables
limits | Thresholds for issuing warnings. |
Tables
- limits
-
Thresholds for issuing warnings.
All settings are local, not clusterwide. They can be changed with
corresponding environment variables (
TARANTOOL_*
) or command-line arguments. See cartridge.argparse module for details.Fields:
- fragmentation_threshold_critical number default: 0.9.
- fragmentation_threshold_warning number default: 0.6.
- clock_delta_threshold_warning number default: 5.