On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> >
> > From my understanding this is not sufficient for SMP sync.
> >
> > If I read this comment correctly:
> >
> > void aio_notify(AioContext *ctx)
> > {
> > /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> > * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> > */
> > smp_mb();
> > if (ctx->notify_me) {
> >
> > it points out that the smp_mb() should be paired. But as
> > I said, the atomics used don't generate any barriers at all.
>
> Based on the rest of the thread, this patch should also fix the bug:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
> aio_notify_accept(ctx);
>
> for (bh = ctx->first_bh; bh; bh = bh->next) {
> - if (bh->scheduled) {
> + if (atomic_mb_read(&bh->scheduled)) {
> return true;
> }
> }
>
>
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>
> void aio_notify(AioContext *ctx)
> {
> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
> + * atomic_add in aio_poll.
> */
> - smp_mb();
> - if (ctx->notify_me) {
> + if (atomic_mb_read(&ctx->notify_me)) {
> event_notifier_set(&ctx->notifier);
> atomic_mb_set(&ctx->notified, true);
> }
>
>
> Would you be able to test these (one by one possibly)?
Sure.
> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within ~20 iterations).
>
> Sorry for asking the obvious---500 iterations of what?
The testcase mentioned in the Canonical issue:
https://bugs.launchpad.net/qemu/+bug/1805256

It's a simple image convert:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Usually it got stuck after 3-20 iterations.
--Jan