Module cartridge.issues

Monitor issues across cluster instances.

Cartridge detects the following problems:

Replication:

  • “Replication from … to … isn’t running” - when the upstream field of the peer’s entry in box.info.replication is nil;

  • “Replication from … to … is stopped/orphan/etc. (…)”;

  • “Replication from … to …: high lag” - when upstream.lag > box.cfg.replication_sync_lag;

  • “Replication from … to …: long idle” - when upstream.idle > 2 * box.cfg.replication_timeout;
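
The replication checks listed above can be sketched as a small pure-Lua function. This is only an illustration, not Cartridge's actual implementation: the helper name check_upstream is hypothetical, the replica and cfg arguments merely mimic one entry of box.info.replication and box.cfg, and treating only the "follow" and "sync" statuses as healthy is an assumption.

```lua
-- Hypothetical helper (not part of cartridge.issues): classify one peer.
-- `replica` mimics an entry of box.info.replication, `cfg` mimics box.cfg.
local function check_upstream(replica, cfg)
    local upstream = replica.upstream
    if upstream == nil then
        return "isn't running"
    end
    -- Assumption: any status other than follow/sync is reported as-is.
    if upstream.status ~= 'follow' and upstream.status ~= 'sync' then
        return ('is %s'):format(upstream.status)
    end
    if (upstream.lag or 0) > cfg.replication_sync_lag then
        return 'high lag'
    end
    if (upstream.idle or 0) > 2 * cfg.replication_timeout then
        return 'long idle'
    end
    return nil -- no issue detected
end
```

On a live instance one might call it for every entry of box.info.replication, passing box.cfg as the second argument and skipping the instance's own entry (the one whose id equals box.info.id), since that entry has no upstream.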

Failover:

  • “Can’t obtain failover coordinator (…)”;

  • “There is no active failover coordinator”;

  • “Failover is stuck on …: Error fetching appointments (…)”;

  • “Failover is stuck on …: Failover fiber is dead” - this is likely a bug;

Tables

limits

Thresholds for issuing warnings. All settings are local, not clusterwide. They can be changed with the corresponding environment variables (TARANTOOL_*) or command-line arguments. See the cartridge.argparse module for details.

Fields:

  • fragmentation_threshold_critical: (number) default: 0.9.

  • fragmentation_threshold_warning: (number) default: 0.6.

  • clock_delta_threshold_warning: (number) default: 5.
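
To make the role of these fields concrete, here is a minimal sketch of how the two fragmentation thresholds might be applied, together with the environment variable name that would correspond to each field. Everything except the field names and default values is an assumption: env_name, resolve_limits and fragmentation_level are hypothetical helpers, the strict ">" comparison against a usage ratio between 0 and 1 is a guess at the semantics, and the variable naming simply follows the TARANTOOL_* pattern mentioned above.

```lua
-- Defaults copied from the field list above.
local default_limits = {
    fragmentation_threshold_critical = 0.9,
    fragmentation_threshold_warning = 0.6,
    clock_delta_threshold_warning = 5,
}

-- Hypothetical: environment variable name for a limit, assuming the
-- TARANTOOL_ prefix followed by the upper-cased field name.
local function env_name(field)
    return 'TARANTOOL_' .. field:upper()
end

-- Hypothetical: overlay instance-local overrides onto the defaults.
local function resolve_limits(overrides)
    local limits = {}
    for name, default in pairs(default_limits) do
        local value = overrides and overrides[name]
        if value == nil then
            value = default
        end
        limits[name] = value
    end
    return limits
end

-- Hypothetical: map a usage ratio (0..1) to a severity level, or nil.
local function fragmentation_level(ratio, limits)
    if ratio > limits.fragmentation_threshold_critical then
        return 'critical'
    elseif ratio > limits.fragmentation_threshold_warning then
        return 'warning'
    end
    return nil
end
```

Under these assumptions, starting an instance with TARANTOOL_FRAGMENTATION_THRESHOLD_WARNING=0.7 would raise the warning threshold for that instance only, leaving the other instances of the cluster on their own settings.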