MDEV-34064 [demo] return error from ha_commit_trans() in rpl_slave_state::truncate_state_table(THD *thd)
just a demo - a similar fix to MDEV-28105 and MDEV-27186.
Whether the 12607 error makes sense is perhaps a different matter:
mysqltest: At line 7: query 'CHANGE MASTER TO master_host='a',master_port=1,master_user='a',master_demote_to_slave=1' failed: <Unknown> (12607): This xid is not exist
MDEV-30929 spider.spider_fixes_part: wait and restart slave
In the absence of insight of the cause of spider.spider_fixes_part
failure as described in MDEV-30929, This is a workaround, which could
help narrow the possibility down to whether slave SQL thread attempts
to read from file that maybe not yet on disk. It does not otherwise
affect the coverage of the test.
I have pushed this commit 4 times, but have yet to encounter the
failure as described in MDEV-30929, so it could also fix the test and
stop the CI pollution.
Also replaced START SLAVE; with --source include/start_slave.inc
inside the slave_test_init.inc files.
MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure
Refinement of the original patch.
Move the code to reset the kill up into the parent class
Xid_apply_log_event, to also fix the similar issue for XA COMMIT.
Increase the number of slave retries in the test case
rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test
generates massive amounts of conflicting transactions in multiple
independent domains, which can cause multiple rollback+retry for a
transaction as it conflicts with transactions in other domains one-by-one.
Signed-off-by: Kristian Nielsen <email address hidden>
Don't deadlock kill event groups in other domains if they are not
SPECULATE_OPTIMISTIC. Such event groups may not be able to safely roll back
and retry (eg. DDL).
But do deadlock kill a transaction T2 from a blocked transaction U in another
domain, even if T2 has lower sub_id than U. Otherwise, in case of a cycle
T2->T1->U->T2, we might not break the cycle if U is not SPECULATE_OPTIMISTIC
Signed-off-by: Kristian Nielsen <email address hidden>
sporadic failures of binlog_encryption.rpl_parallel_gco_wait_kill
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'
An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.
MDEV-33798: ROW base optimistic deadlock with concurrent writes on same table
One case is conflicting transactions T1 and T2 with different domain id, in
optimistic parallel replication in non-GTID mode. Then T2 will
wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would
hang, as different domains caused the deadlock kill to be skipped in
thd_rpl_deadlock_check().
More generally, if we have transactions T1 and T2 in one domain/master
connection, and independent transactions U in another, then we can
still deadlock like this:
T1 row low wait on U
U row lock wait on T2
T2 wait_for_prior_commit on T1
This commit enforces the deadlock kill in these cases. If the waited-for
transaction is speculatively applied, then it will be deadlock killed in
case of a conflict, even if the two transactions are in different domains
or master connections.
Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>
90b95c6...
by
mariadb-DebarunBanerjee <email address hidden>
MDEV-33543 Server hang caused by InnoDB change buffer
Issue: When getting a page (buf_page_get_gen) with no latch option
(RW_NO_LATCH), the caller is not expected to follow the B-tree latching
order. However in buf_page_get_low we try to acquire shared page latch
unconditionally to wait for a page that is being loaded by another
thread concurrently. In general it could lead to latch order violation
and deadlock.
Currently it affects the change buffer insert path btr_latch_prev()
which tries to load the previous page out of order with RW_NO_LATCH and
two concurrent inserts into IBUF tree cause deadlock. This problem is
introduced in 10.6 by following commit.
commit 9436c778c3adba7c29dab5649668433d71e086f2 (MDEV-27058)
Fix: While trying to latch a page with RW_NO_LATCH, always use the
"*lock_try" interface and retry operation on failure after unfixing the
page.