[autopkgtest][focal] ruby-gnome/3.4.1-2build1 class:TestWebKit2GtkWebView failure

Bug #1868108 reported by Rafael David Tinoco
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Webkit
Fix Released
Medium
webkit2gtk (Ubuntu)
Fix Released
High
Sebastien Bacher
Focal
Fix Released
High
Sebastien Bacher

Bug Description

ruby-gnome/3.4.1-2* package fails with:

--------------------------------------------------------------------------------
2020-03-19T13:29:50+00:00: Running test for /tmp/autopkgtest.2iRDtE/build.GH8/src/webkit2-gtk
--------------------------------------------------------------------------------
Loaded suite test
Started
(run-test.rb:9142): dbind-WARNING **: 13:29:51.769: AT-SPI: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files

..[2020-03-19 13:29:52] INFO WEBrick 1.6.0
[2020-03-19 13:29:52] INFO ruby 2.7.0 (2019-12-25) [powerpc64le-linux-gnu]
[2020-03-19 13:29:52] INFO WEBrick::HTTPServer#start: pid=9142 port=37651

(WebKitWebProcess:9153): dbind-WARNING **: 13:29:52.607: AT-SPI: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files
F
===============================================================================
Failure: test: #load_uri(TestWebKit2GtkWebView::#load_uri):
  <true> expected but was
  <false>
/tmp/autopkgtest.2iRDtE/build.GH8/src/webkit2-gtk/test/test-webkit2-gtk-web-view.rb:106:in `block (2 levels) in <class:TestWebKit2GtkWebView>'
     103: @view.load_uri(http_url)
     104: loop.run
     105:
  => 106: assert_true(loaded)
     107: end
     108: end
     109: end
===============================================================================[2020-03-19 13:30:22] INFO going to shutdown ...
[2020-03-19 13:30:22] INFO WEBrick::HTTPServer#start done.

......
Finished in 30.95292437 seconds.
-------------------------------------------------------------------------------
9 tests, 9 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
88.8889% passed
-------------------------------------------------------------------------------
0.29 tests/s, 0.29 assertions/s
Failed to run test: webkit2-gtk
--------------------------------------------------------------------------------
Failed targets: webkit2-gtk
Gdk-Message: 13:30:22.622: WebKitWebProcess: Fatal IO error 2 (No such file or directory) on X server :99.

autopkgtest [13:30:22]: test run-test: -----------------------]
run-test FAIL non-zero exit status 1
autopkgtest [13:30:23]: test run-test: - - - - - - - - - - results - - - - - - - - - -
autopkgtest [13:30:23]: @@@@@@@@@@@@@@@@@@@@ summary
run-test FAIL non-zero exit status 1
Exit request sent.

When running in ppc64el. This is likely due a timeout in glib main loop for that architecture but can't know for sure. I'm opening this bug so I can blacklist that particular test.

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

Created attachment 393861
small script testcase

Unsure what useful information to provide but the webkit2gtk 2.26 -> 2.28 upgrade is blocked in Ubuntu because some reverse depends tests fail on s390x/ppc64el

Trying a simple python webkitgtk script, using 2.26 the webpage loads fine but 2.28 gives an empty view...

Any hint on how to debug would be useful since that blocks the new serie to make it to the current Ubuntu serie

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

The MiniBrowser provided by webkitgtk has the same issue

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to seb128 from comment #1)
> The MiniBrowser provided by webkitgtk has the same issue

Can you try running the script after exporting the environment variable WEBKIT_DISABLE_COMPOSITING_MODE=1 ? and with the environment variable JavaScriptCoreUseJIT=0 ? does any of it help?

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

WEBKIT_DISABLE_COMPOSITING_MODE=1 doesn't seem to make a difference

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

BTW we know JavaScriptCore tests on both of these architectures started failing sometime between 2.26 and 2.28, so wouldn't surprise me if they're totally busted. I have pending to bisect the issues and then report bugs, but no promises as to when.

 * ppc64le is special because it uses large page sizes (and different sizes on different distros!)
 * s390x is special because it's big endian

Ideally, we would set up JSCOnly EWS for these architectures, but due to the nature of the architectures, that's not likely going to be easy, especially for s390x.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #4)
> BTW we know JavaScriptCore tests on both of these architectures started
> failing sometime between 2.26 and 2.28, so wouldn't surprise me if they're
> totally busted.

(Separate issues, btw, because of course it would be too easy for there to be just one problem for both arches. ;)

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Carlos Alberto Lopez Perez from comment #2)
> and with the environment variable JavaScriptCoreUseJIT=0 ? does any of it help?

Hopefully that should not make any difference, because these arches should both be using cloop.

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

One of the test failing is ruby-gnome's webkit test unit
https://salsa.debian.org/ruby-team/ruby-gnome/-/blob/master/webkit2-gtk/test/run-test.rb

The Ubuntu package calls it under xvfb. It makes a bit easier to test for the issue/regression from a command line only environment.

I confirmed the issues on a graphical environment by using ssh -X to a porter box

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

Sorry, I overlooked earlier than you asked also about JavaScriptCoreUseJIT=0 ... that doesn't make a difference either

Revision history for this message
In , carloslp (carloslp) wrote :

If your test case, can you see a a process named WebKitWebProcess still alive after several seconds (let's say: 1 minute after loading the test) or does this process dies?

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

> If your test case, can you see a a process named WebKitWebProcess still alive after several seconds

No, that process goes away a few seconds after the view is opened

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

attaching with gdb, it gives

Thread 1 "WebKitWebProces" received signal SIGABRT, Aborted.
0x000078724d300468 in __libc_signal_restore_set (set=0x7ffffda70cb8)
    at ../sysdeps/unix/sysv/linux/internal-signals.h:86
86 ../sysdeps/unix/sysv/linux/internal-signals.h: No such file or directory.
(gdb) bt
#0 0x000078724d300468 in __libc_signal_restore_set (set=0x7ffffda70cb8)
    at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#1 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2 0x000078724d2d7cd0 in __GI_abort () at abort.c:79
#3 0x000078724b27eed4 in JSC::Config::permanentlyFreeze() ()
    at /lib/powerpc64le-linux-gnu/libjavascriptcoregtk-4.0.so.18
#4 0x000078724b4b0fd0 in JSC::VM::VM(JSC::VM::VMType, JSC::HeapType) ()
    at /lib/powerpc64le-linux-gnu/libjavascriptcoregtk-4.0.so.18
#5 0x000078724b4b2ad4 in JSC::VM::create(JSC::HeapType) ()

I will try to install debug symbols

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to seb128 from comment #10)
> > If your test case, can you see a a process named WebKitWebProcess still alive after several seconds
>
> No, that process goes away a few seconds after the view is opened

great.. we have to know why it dies.

Don't attach gdb to it, just enable core generation by running "ulimit -c unlimited", then run your test case and wait until it dies and finishes writting the core file (can take a while).

Then generate a backtrace from the core file with this command.

gdb --batch -ex "thread apply all bt full" /path/to/WebKitWebProcess core &> core_backtrace.txt

and upload here the txt file.

btw: install the dbg-sym packages before genearating the backtrace

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

Created attachment 393897
gdb

Using gdb on the process should be the same no?

the ulimit/core doesn't work, probably because of apport

Anyway, I got one with gdb on the process, attaching the file now, let me know if I should better try your suggested way for some reason

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

in fact apport collected the dump, using gdb on it gives the same stacktrace that it does on the process

Revision history for this message
In , carloslp (carloslp) wrote :
Download full text (7.4 KiB)

So its crashing here:

#3 0x0000705dcc8beed4 in CRASH_WITH_INFO(...) () at DerivedSources/ForwardingHeaders/wtf/Assertions.h:660
#4 JSC::Config::permanentlyFreeze() () at ../Source/JavaScriptCore/runtime/JSCConfig.cpp:78
#5 0x0000705dccaf0fd0 in JSC::VM::VM(JSC::VM::VMType, JSC::HeapType) () at ../Source/JavaScriptCore/runtime/VM.cpp:586
#6 0x0000705dccaf2ad4 in JSC::VM::create(JSC::HeapType) () at ../Source/JavaScriptCore/runtime/VM.cpp:703
#7 0x0000705dd06d0b48 in WebCore::commonVMSlow() () at ../Source/WebCore/bindings/js/CommonVM.cpp:55
#8 0x0000705dd0e9fe74 in WebCore::commonVM() () at ../Source/WebCore/bindings/js/CommonVM.h:52
#9 WebCore::PageScriptDebugServer::PageScriptDebugServer(WebCore::Page&) () at ../Source/WebCore/inspector/PageScriptDebugServer.cpp:58
#10 0x0000705dd0e87548 in WebCore::InspectorController::InspectorController(WebCore::Page&, WebCore::InspectorClient*) () at ../Source/WebCore/inspector/InspectorController.cpp:105
#11 0x0000705dd116e254 in std::make_unique<WebCore::InspectorController, WebCore::Page&, WebCore::InspectorClient*&>(WebCore::Page&, WebCore::InspectorClient*&) () at /usr/include/c++/9/bits/unique_ptr.h:857
#12 WTF::makeUnique<WebCore::InspectorController, WebCore::Page&, WebCore::InspectorClient*&>(WebCore::Page&, WebCore::InspectorClient*&) () at DerivedSources/ForwardingHeaders/wtf/StdLibExtras.h:483
#13 WebCore::Page::Page(WebCore::PageConfiguration&&) () at ../Source/WebCore/page/Page.cpp:279
#14 0x0000705dcfbdf4f8 in std::make_unique<WebCore::Page, WebCore::PageConfiguration>(WebCore::PageConfiguration&&) () at /usr/include/c++/9/bits/unique_ptr.h:857
#15 WTF::makeUnique<WebCore::Page, WebCore::PageConfiguration>(WebCore::PageConfiguration&&) () at DerivedSources/ForwardingHeaders/wtf/StdLibExtras.h:483
#16 WebKit::WebPage::WebPage(WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters&&) () at ../Source/WebKit/WebProcess/WebPage/WebPage.cpp:536
#17 0x0000705dcfbe0254 in WebKit::WebPage::create(WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters&&) () at ../Source/WebKit/WebProcess/WebPage/WebPage.cpp:379
#18 0x0000705dcf994ad8 in WebKit::WebProcess::createWebPage(WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters&&) () at ../Source/WebKit/WebProcess/WebProcess.cpp:685
#19 0x0000705dcf434e08 in IPC::callMemberFunctionImpl<WebKit::WebProcess, void (WebKit::WebProcess::*)(WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters&&), std::tuple<WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters>, 0ul, 1ul>(WebKit::WebProcess*, void (WebKit::WebProcess::*)(WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters&&), std::tuple<WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters>&&, std::integer_sequence<unsigned long, 0ul, 1ul>) () at ../Source/WebKit/Platform/IPC/HandleMessage.h:41
#20 IPC::callMemberFunction<WebKit::WebProcess, void (WebKit::WebProcess::*)(WTF::ObjectIdentifier<WebCore::PageIdentifierType>, WebKit::WebPageCreationParameters&&), std::tuple<WTF::ObjectId...

Read more...

Revision history for this message
In , carloslp (carloslp) wrote :

That code crashing was added on r249808

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Carlos Alberto Lopez Perez from comment #16)
> That code crashing was added on r249808

This crash means that ppc64el/s390x is lacking this support. If you don't want the feature, you can add an #elif for that platform and make this a no-op for it. Alternatively, you can implement the support.

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to Mark Lam from comment #17)
> (In reply to Carlos Alberto Lopez Perez from comment #16)
> > That code crashing was added on r249808
>
> This crash means that ppc64el/s390x is lacking this support.

Do you mean it lacks support for mprotect() with PROT_READ ?

>If you don't want the feature, you can add an #elif for that platform and make this a no-op for it. Alternatively, you can implement the support.

Making it a no-op for this architectures seems fine to me.

@seb128: can you test if this patch fixes the issue?

diff --git a/Source/JavaScriptCore/runtime/JSCConfig.cpp b/Source/JavaScriptCore/runtime/JSCConfig.cpp
index 01e0e63..9c57da8 100644
--- a/Source/JavaScriptCore/runtime/JSCConfig.cpp
+++ b/Source/JavaScriptCore/runtime/JSCConfig.cpp
@@ -70,6 +70,7 @@ void Config::permanentlyFreeze()
     // There's no going back now!
     result = vm_protect(mach_task_self(), reinterpret_cast<vm_address_t>(&g_jscConfig), ConfigSizeToProtect, DisallowPermissionChangesAfterThis, VM_PROT_READ);
 #elif OS(LINUX)
+ return;
     result = mprotect(&g_jscConfig, ConfigSizeToProtect, PROT_READ);
 #elif OS(WINDOWS)
     // FIXME: Implement equivalent, maybe with VirtualProtect.

And if it does can you let me know which one its the value of CMAKE_SYSTEM_PROCESSOR for s390x? you can find this by grepping on the build directory for this string.

And other question: does this only affect ppc64el or also ppc64 and ppc?

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to Carlos Alberto Lopez Perez from comment #18)
> And if it does can you let me know which one its the value of
> CMAKE_SYSTEM_PROCESSOR for s390x? you can find this by grepping on the build
> directory for this string.
>

Also upload here a txt file with the output of the command "echo | gcc -E -dM -" on s390x

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Sebastien, you told me there was no crash. :( A crash makes everything way easier!

(In reply to Carlos Alberto Lopez Perez from comment #18)
> Do you mean it lacks support for mprotect() with PROT_READ ?

errno is set on failure, so WebKit should check errno and print a proper error using strerror(errno) to give us an idea of what's going wrong. But I'm pretty sure it's going to be EINVAL. Look:

EINVAL addr is not a valid pointer, or not a multiple of the system
              page size.

And I see we have in JSCConfig.h:

#if !OS(WINDOWS)
constexpr size_t ConfigSizeToProtect = 16 * KB;
#else
constexpr size_t ConfigSizeToProtect = 4 * KB;
#endif

which is definitely incorrect. This is reminiscent of bug #182923, fixed in r232909. Note that in that revision, the task was to find a block size that was *at least as big* as the actual page size, so it was not a precise fix: we just guessed 64 KB was big enough for strange architectures without attempting to be precise about it. Hopefully the same approach would work here?

I'm not sure where to find correct pages sizes to assume. Looking at an internal email that doesn't cite any sources, it looks like s390x uses 4 KB pages, ppc64le uses 64 KB pages on RHEL but 4 KB on various other distros (would be too easy otherwise!), aarch64 uses 64 KB pages... though aarch64 is getting 16 KB block sizes in MarkedBlock.h without crashing, because we have CPU(ARM64) so it shouldn't be CPU(UNKNOWN). And afaik all of the above can be changed at kernel compile time, so we just hope by convention that it's the same on all distros, except ppc64le where we know it isn't. So conclusion: ?????????

The best solution is to get page size at runtime using sysconf(_SC_PAGESIZE), but it looks like the code really wants a compile-time solution. So maybe just hardcode 64 KB for these CPUs and for CPU(UNKNOWN)? Ideally we would share the value with MarkedBlock.h? clopez, what do you think?

Anyway, Sebastien, try changing ConfigSizeToProtect to 64 * KB and see if that works. My guess is that will fix ppc64le, but something somewhere else will be broken for s390x.

P.S. You know if Debian and Red Hat can't agree on whether to spell ppc64le or ppc64el, we definitely won't be able to agree on page size. :P It looks like Debian distros are saying ppc64el while Red Hat and SUSE say ppc64le. Whatever....

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Carlos Alberto Lopez Perez from comment #18)
> And other question: does this only affect ppc64el or also ppc64 and ppc?

(Looks like neither Debian nor Red Hat supports ppc64 or ppc anymore, so it probably doesn't matter much.)

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Michael Catanzaro from comment #20)
> Sebastien, you told me there was no crash. :( A crash makes everything way
> easier!
>
> (In reply to Carlos Alberto Lopez Perez from comment #18)
> > Do you mean it lacks support for mprotect() with PROT_READ ?
>
> errno is set on failure, so WebKit should check errno and print a proper
> error using strerror(errno) to give us an idea of what's going wrong. But
> I'm pretty sure it's going to be EINVAL. Look:
>
> EINVAL addr is not a valid pointer, or not a multiple of the system
> page size.
>
> And I see we have in JSCConfig.h:
>
> #if !OS(WINDOWS)
> constexpr size_t ConfigSizeToProtect = 16 * KB;
> #else
> constexpr size_t ConfigSizeToProtect = 4 * KB;
> #endif
>
> which is definitely incorrect. This is reminiscent of bug #182923, fixed in
> r232909. Note that in that revision, the task was to find a block size that
> was *at least as big* as the actual page size, so it was not a precise fix:
> we just guessed 64 KB was big enough for strange architectures without
> attempting to be precise about it. Hopefully the same approach would work
> here?

I was surprised that this could be the issue because there's an assert in Config::permanentlyFreeze() that ensures that ConfigSizeToProtect is an integral multiple of page size. However, checking the code, I see:

#if PLATFORM(COCOA)
    RELEASE_ASSERT(roundUpToMultipleOf(vmPageSize(), ConfigSizeToProtect) == ConfigSizeToProtect);
#endif

It's a pity that vmPageSize() is not available on non-cocoa ports, or this issue would be more clearly identified.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Mark Lam from comment #22)
> It's a pity that vmPageSize() is not available on non-cocoa ports, or this
> issue would be more clearly identified.

Looks like everything in ResourceUsage.h is guarded by #if PLATFORM(COCOA).

At least vmPageSize() would be very easy to implement. The Cocoa impl is:

size_t vmPageSize()
{
#if PLATFORM(IOS_FAMILY)
    return vm_kernel_page_size;
#else
    static size_t cached = sysconf(_SC_PAGESIZE);
    return cached;
#endif
}

The Linux impl would be 100% identical to the non-IOS_FAMILY path there. Just needs to return sysconf(_SC_PAGESIZE). There's also _SC_PAGE_SIZE, which is a synonym. Both of these are specified by POSIX.1, so they should work on every Unix, not just Linux.

Revision history for this message
In , Tpopela (tpopela) wrote :

(In reply to Michael Catanzaro from comment #20)
> I'm not sure where to find correct pages sizes to assume. Looking at an
> internal email that doesn't cite any sources, it looks like s390x uses 4 KB
> pages, ppc64le uses 64 KB pages on RHEL but 4 KB on various other distros
> (would be too easy otherwise!), aarch64 uses 64 KB pages...

Some correction:

s390x, ppc64(le) in RHEL and Fedora use 64 KB page size
aarch64 on RHEL uses 64 KB and on Fedora 4 KB page size

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

> @seb128: can you test if this patch fixes the issue?

@Carlos, your return patch does fix the issue indeed!

> And if it does can you let me know which one its the value of
> CMAKE_SYSTEM_PROCESSOR for s390x? you can find this by grepping
> on the build directory for this string.

CMakeFiles/3.16.3/CMakeSystem.cmake:set(CMAKE_SYSTEM_PROCESSOR "s390x")

> And other question: does this only affect ppc64el or also ppc64 and ppc?

Ubuntu also provides the first variant and Debian doesn't run its tests on those archs so I can't easily say

> Sebastien, you told me there was no crash. :( A crash makes everything way easier!

@Michael, sorry about that, the small script doesn't print anything on stdout/stderr and neither does MiniBrowser. Since the UI was still open and no eror was printer I assumed there was no crash, I didn't know there was a background rendering process that could crash like this (I should have checked the apport directory but since it's in a remote server I didn't get notified about that either)

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

I just tested on s390x and the rendering in fact works there, we have some autopkgtest failures but they seem different from the issue described on this bug.
I've updated the title now to mention only ppc64el now
Sorry if that created confusion or extra work

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

> Anyway, Sebastien, try changing ConfigSizeToProtect to 64 * KB and see if
> that works. My guess is that will fix ppc64le, but something somewhere
> else will be broken for s390x.

The change seems to resolve the issue on ppc indeed, and as stated in the previous comment we can ignore s390x now

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Changed in ruby-gnome (Ubuntu Focal):
assignee: nobody → Lucas Kanashiro (lucaskanashiro)
status: New → Confirmed
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Trying to hint this as a force-badtest in britney. Whenever this is solved, please remove force-badtest.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Don't froce/badtest that one, webkit2gtk is busted for real on ppc64el and the test did catch a valid regression, try starting MiniBrowser from webkgit and it fails to load any webpage

affects: ruby-gnome (Ubuntu Focal) → webkit2gtk (Ubuntu Focal)
Changed in webkit2gtk (Ubuntu Focal):
assignee: Lucas Kanashiro (lucaskanashiro) → Sebastien Bacher (seb128)
importance: Undecided → High
status: Confirmed → In Progress
Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Tomas Popela from comment #24)
> Some correction:
>
> s390x, ppc64(le) in RHEL and Fedora use 64 KB page size
> aarch64 on RHEL uses 64 KB and on Fedora 4 KB page size

That makes a *lot* more sense than the mail I was looking at.

In that case, aarch64 must be broken on RHEL for a while now, because MarkedBlock.h uses 16 KB block size on that architecture.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #28)
> In that case, aarch64 must be broken on RHEL for a while now, because
> MarkedBlock.h uses 16 KB block size on that architecture.

Tomas pointed me to a downstream patch:

diff --git a/Source/JavaScriptCore/heap/MarkedBlock.h b/Source/JavaScriptCore/heap/MarkedBlock.h
index e240f0ae..6bf88692 100644
--- a/Source/JavaScriptCore/heap/MarkedBlock.h
+++ b/Source/JavaScriptCore/heap/MarkedBlock.h
@@ -68,7 +68,7 @@ public:
     static constexpr size_t atomSize = 16; // bytes

     // Block size must be at least as large as the system page size.
-#if CPU(PPC64) || CPU(PPC64LE) || CPU(PPC) || CPU(UNKNOWN)
+#if CPU(PPC64) || CPU(PPC64LE) || CPU(PPC) || CPU(ARM64) || CPU(UNKNOWN)
     static constexpr size_t blockSize = 64 * KB;
 #else
     static constexpr size_t blockSize = 16 * KB;

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Hey Mark, how was ConfigSizeToProtect chosen?

#if !OS(WINDOWS)
constexpr size_t ConfigSizeToProtect = 16 * KB;
#else
constexpr size_t ConfigSizeToProtect = 4 * KB;
#endif

Is it supposed to match page size (in which case most Linux architectures should be using 4 KB rather than 16 KB)? Or is it desired to be exactly 16 KB everywhere regardless of page size unless page size is bigger than 16 KB? Why is Windows using 4 KB while everything else uses 16 KB?

Same question applies to the blockSize in MarkedBlock.h. In that case, the code is a bit more clear, and I guess the desired block size is min(16 KB, page size)?

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Seb, here's your workaround patch for Ubuntu, which should be safe until we figure out a proper upstream fix. It maintains the current behavior except on architectures that are already likely broken:

diff --git a/Source/JavaScriptCore/runtime/JSCConfig.h b/Source/JavaScriptCore/runtime/JSCConfig.h
index 1ae53a56431a..ec67610057b8 100644
--- a/Source/JavaScriptCore/runtime/JSCConfig.h
+++ b/Source/JavaScriptCore/runtime/JSCConfig.h
@@ -34,10 +34,12 @@ class ExecutableAllocator;
 class FixedVMPoolExecutableAllocator;
 class VM;

-#if !OS(WINDOWS)
-constexpr size_t ConfigSizeToProtect = 16 * KB;
-#else
+#if OS(WINDOWS)
 constexpr size_t ConfigSizeToProtect = 4 * KB;
+#elif CPU(PPC64) || CPU(PPC64LE) || CPU(PPC) || CPU(UNKNOWN)
+constexpr size_t ConfigSizeToProtect = 64 * KB;
+#else
+constexpr size_t ConfigSizeToProtect = 16 * KB;
 #endif

 #if ENABLE(SEPARATED_WX_HEAP)

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Michael Catanzaro from comment #30)
> Hey Mark, how was ConfigSizeToProtect chosen?
>
> #if !OS(WINDOWS)
> constexpr size_t ConfigSizeToProtect = 16 * KB;
> #else
> constexpr size_t ConfigSizeToProtect = 4 * KB;
> #endif
>
> Is it supposed to match page size (in which case most Linux architectures
> should be using 4 KB rather than 16 KB)? Or is it desired to be exactly 16
> KB everywhere regardless of page size unless page size is bigger than 16 KB?
> Why is Windows using 4 KB while everything else uses 16 KB?
>
> Same question applies to the blockSize in MarkedBlock.h. In that case, the
> code is a bit more clear, and I guess the desired block size is min(16 KB,
> page size)?

Look at JSC::Config::ensureSize. It is padded so that sizeof(JSC::Config) is a multiple of OS page size. ConfigSizeToProtect is shone to be that rounded up multiple of OS page size.

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Mark Lam from comment #32)
> Look at JSC::Config::ensureSize. It is padded so that sizeof(JSC::Config)
> is a multiple of OS page size. ConfigSizeToProtect is shone to be that
> rounded up multiple of OS page size.

/shone/set/

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to Michael Catanzaro from comment #28)
> The best solution is to get page size at runtime using
> sysconf(_SC_PAGESIZE), but it looks like the code really wants a
> compile-time solution. So maybe just hardcode 64 KB for these CPUs and for
> CPU(UNKNOWN)? Ideally we would share the value with MarkedBlock.h? clopez,
> what do you think?
>

(In reply to Tomas Popela from comment #24)
>
> s390x, ppc64(le) in RHEL and Fedora use 64 KB page size
> aarch64 on RHEL uses 64 KB and on Fedora 4 KB page size

I wonder about doing something like the patch below to avoid future problems in cases where we can't predict the page size.

diff --git a/Source/JavaScriptCore/runtime/JSCConfig.cpp b/Source/JavaScriptCore/runtime/JSCConfig.cpp
index 79cc2b67ba9..b85393e1def 100644
--- a/Source/JavaScriptCore/runtime/JSCConfig.cpp
+++ b/Source/JavaScriptCore/runtime/JSCConfig.cpp
@@ -33,6 +33,7 @@
 #include <mach/mach.h>
 #elif OS(LINUX)
 #include <sys/mman.h>
+#include <unistd.h>
 #endif

 namespace JSC {
@@ -70,7 +71,10 @@ void Config::permanentlyFreeze()
     // There's no going back now!
     result = vm_protect(mach_task_self(), reinterpret_cast<vm_address_t>(&g_jscConfig), ConfigSizeToProtect, DisallowPermissionChangesAfterThis, VM_PROT_READ);
 #elif OS(LINUX)
- result = mprotect(&g_jscConfig, ConfigSizeToProtect, PROT_READ);
+ // Some architectures on Linux may have non-default page size.
+ // In that cases, avoid the crash in the RELEASE_ASSERT below due to using mprotect with a wrong page size.
+ if (sysconf(_SC_PAGESIZE) == ConfigSizeToProtect)
+ result = mprotect(&g_jscConfig, ConfigSizeToProtect, PROT_READ);
 #elif OS(WINDOWS)
     // FIXME: Implement equivalent, maybe with VirtualProtect.
     // Also need to fix WebKitTestRunner

Does this looks like a good idea?

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Carlos Alberto Lopez Perez from comment #34)
> (In reply to Michael Catanzaro from comment #28)
> > The best solution is to get page size at runtime using
> > sysconf(_SC_PAGESIZE), but it looks like the code really wants a
> > compile-time solution. So maybe just hardcode 64 KB for these CPUs and for
> > CPU(UNKNOWN)? Ideally we would share the value with MarkedBlock.h? clopez,
> > what do you think?
> >

Hardcoding to 64K is a good approach in addition to the check below.

> (In reply to Tomas Popela from comment #24)
> >
> > s390x, ppc64(le) in RHEL and Fedora use 64 KB page size
> > aarch64 on RHEL uses 64 KB and on Fedora 4 KB page size
>
>
> I wonder about doing something like the patch below to avoid future problems
> in cases where we can't predict the page size.
>
>
> diff --git a/Source/JavaScriptCore/runtime/JSCConfig.cpp
> b/Source/JavaScriptCore/runtime/JSCConfig.cpp
> index 79cc2b67ba9..b85393e1def 100644
> --- a/Source/JavaScriptCore/runtime/JSCConfig.cpp
> +++ b/Source/JavaScriptCore/runtime/JSCConfig.cpp
> @@ -33,6 +33,7 @@
> #include <mach/mach.h>
> #elif OS(LINUX)
> #include <sys/mman.h>
> +#include <unistd.h>
> #endif
>
> namespace JSC {
> @@ -70,7 +71,10 @@ void Config::permanentlyFreeze()
> // There's no going back now!
> result = vm_protect(mach_task_self(),
> reinterpret_cast<vm_address_t>(&g_jscConfig), ConfigSizeToProtect,
> DisallowPermissionChangesAfterThis, VM_PROT_READ);
> #elif OS(LINUX)
> - result = mprotect(&g_jscConfig, ConfigSizeToProtect, PROT_READ);
> + // Some architectures on Linux may have non-default page size.
> + // In that cases, avoid the crash in the RELEASE_ASSERT below due to
> using mprotect with a wrong page size.
> + if (sysconf(_SC_PAGESIZE) == ConfigSizeToProtect)
> + result = mprotect(&g_jscConfig, ConfigSizeToProtect, PROT_READ);

This check is not correct. ConfigSizeToProtect does not need to be strictly equal to sysconf(_SC_PAGESIZE). It needs to be a multiple of sysconf(_SC_PAGESIZE), where the multiple can be 1 or higher. You can use WTF::roundUpToMultipleOf() to compute that.

>
>
>
> Does this looks like a good idea?

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Mark Lam from comment #35)
> (In reply to Carlos Alberto Lopez Perez from comment #34)
> > (In reply to Michael Catanzaro from comment #28)
> > > The best solution is to get page size at runtime using
> > > sysconf(_SC_PAGESIZE), but it looks like the code really wants a
> > > compile-time solution. So maybe just hardcode 64 KB for these CPUs and for
> > > CPU(UNKNOWN)? Ideally we would share the value with MarkedBlock.h? clopez,
> > > what do you think?
> > >
>
> Hardcoding to 64K is a good approach in addition to the check below.

I should qualify this statement: hardcoding to 64K is a better workaround than disabling this feature outright.

The feature is a security mitigation. If preventing the crash is a higher priority, the proposed check is good at the price of disabling this mitigation.

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to Mark Lam from comment #36)
> (In reply to Mark Lam from comment #35)
> > (In reply to Carlos Alberto Lopez Perez from comment #34)
> > > (In reply to Michael Catanzaro from comment #28)
> > > > The best solution is to get page size at runtime using
> > > > sysconf(_SC_PAGESIZE), but it looks like the code really wants a
> > > > compile-time solution. So maybe just hardcode 64 KB for these CPUs and for
> > > > CPU(UNKNOWN)? Ideally we would share the value with MarkedBlock.h? clopez,
> > > > what do you think?
> > > >
> >
> > Hardcoding to 64K is a good approach in addition to the check below.
>
> I should qualify this statement: hardcoding to 64K is a better workaround
> than disabling this feature outright.
>
> The feature is a security mitigation. If preventing the crash is a higher
> priority, the proposed check is good at the price of disabling this
> mitigation.

I have to admit I don't understand the attack/mitigation scenario, neither what "Harden JSC against the abuse of runtime options." really means (even after reading the changelog of r249808 ).

How this options are supposed to be abused? Can that be done by a malicious website?

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

I think RELEASE_ASSERT() is the right thing to do if we somehow get the page size wrong. But let's RELEASE_ASSERT() on the sysconf(_SC_PAGESIZE) check instead of later when mprotect() fails. I'll cook a patch.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Here's an initial patch.

Mark, any idea what's up with WTF::pageSize() vs. WTF::vmPageSize()? They do the same thing on all platforms except iOS family, where pageSize() uses the userspace page size (16 KiB, vm_page_size) whereas vmPageSize() uses the kernel page size (4 KiB, vm_kernel_page_size). That's a strange difference. At first I had implemented vmPageSize() for OS(UNIX), but eventually I realized I could just get rid of that because it's really only needed in ResourceUsageCocoa.cpp, and changed JSCConfig to use pageSize() rather than vmPageSize(). Should be fine for iOS family because roundUpToMultipleOf(4 KiB, 16 KiB) == roundUpToMultipleOf(4 KiB, 16 KiB) == 16 KiB. I wound up not touching vmPageSize() at all, but it seems really suspicious to me. There's also bmalloc::vmPageSize(), which is the same as WTF::pageSize(), i.e. on iOS bmalloc::vmPageSize() returns a different value than WTF::vmPageSize() because bmalloc uses the userspace value while WTF uses the kernel value.)

Behavior change: I reduced the size of JSCConfig from 16 KiB down to 4 KiB on macOS and most Linux architectures. I assume that makes sense because there's no reason for it to be any bigger than a single page, yes? I can be the hero who saved 12 KiB per web process?

Then I made no behavior change in MarkedBlock.h. I used std::max(16 * KiB, CeilingOnPageSize) there, which I assume that makes sense because any changes there would affect object fragmentation, which could have performance impacts. Didn't want to mess with that.

Oh, and I changed pageSize() to use sysconf(_SC_PAGESIZE) instead of getpagesize() just because that's deprecated. And added RELEASE_ASSERTs in pageSize().

Let's see if EWS likes it....

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Created attachment 394053
Patch

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #39)
> Here's an initial patch.
>
> Mark, any idea what's up with WTF::pageSize() vs. WTF::vmPageSize()? They do
> the same thing on all platforms except iOS family,

(Um, well it's only implemented on Cocoa, so they do the same thing on Cocoa platforms, except iOS family.)

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #39)
> Should be fine for iOS family because roundUpToMultipleOf(4
> KiB, 16 KiB) == roundUpToMultipleOf(4 KiB, 16 KiB) == 16 KiB.

I meant to write: roundUpToMultipleOf(4 KiB, 16 KiB) == roundUpToMultipleOf(16 KiB, 16 KiB) == 16 KiB.

Revision history for this message
In , Tpopela (tpopela) wrote :

Comment on attachment 394053
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394053&action=review

> Source/WTF/wtf/PageBlock.h:47
> +#if CPU(UNKNOWN) || CPU(PPC) || (CPU(ARM64) && !PLATFORM(IOS_FAMILY))

I think that PPC here means the old 32-bit PowerPC architecture no? Also CPU(ARM64) here will be wrong for Fedora and others, but I fail to see, where the 4 KB will be chosen over these 64 KB.

Revision history for this message
In , Mark-lam (mark-lam) wrote :

Comment on attachment 394053
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394053&action=review

> Source/JavaScriptCore/runtime/JSCConfig.h:38
> +constexpr size_t ConfigSizeToProtect = CeilingOnPageSize;

We want Mac to match iOS. Please only change this for non-Darwin/Cocoa platforms.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Tomas Popela from comment #43)
> I think that PPC here means the old 32-bit PowerPC architecture no?

No, it means any variety of PPC. The old condition was redundant.

> Also
> CPU(ARM64) here will be wrong for Fedora and others, but I fail to see,
> where the 4 KB will be chosen over these 64 KB.

That's OK because it's just a ceiling. We only need to make sure the value is large enough for everyone.

(In reply to Mark Lam from comment #44)
> We want Mac to match iOS. Please only change this for non-Darwin/Cocoa
> platforms.

Sure, but did you read comment #39? The iOS value seems to be wrong. I don't know much about iOS, but surely using the kernel page size for userspace pages isn't right? Seems like it can probably be reduced to 4 KB on iOS as well? (I bet EWS will prove me wrong if I try it... but would be interesting to see.)

Revision history for this message
In , Sbarati (sbarati) wrote :

Comment on attachment 394053
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394053&action=review

> Source/WTF/wtf/PageBlock.h:50
> +constexpr size_t CeilingOnPageSize = 16 * KB;

This number should be 16k for all Darwin.

Revision history for this message
In , Sbarati (sbarati) wrote :

You probably also need the same thing inside bmalloc unless you’re not using it.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

I'll upload a new patch that always uses 16 KB for all Cocoa builds, so you can have the same value everywhere.

(In reply to Saam Barati from comment #47)
> You probably also need the same thing inside bmalloc unless you’re not using
> it.

bmalloc determines page size at runtime, so it always does the right thing.

(In reply to Michael Catanzaro from comment #45)
> Sure, but did you read comment #39? The iOS value seems to be wrong. I don't
> know much about iOS, but surely using the kernel page size for userspace
> pages isn't right? Seems like it can probably be reduced to 4 KB on iOS as
> well? (I bet EWS will prove me wrong if I try it... but would be interesting
> to see.)

Sorry, I got this completely backwards. iOS kernel page size is 4 KB, userspace page size is 16 KB. So I think ResourceUsageCocoa.cpp is returning bad results on iOS, by a factor of four. Someone from Apple could confirm and report a separate bug for this?

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #48)
> bmalloc determines page size at runtime, so it always does the right thing.

Actually, we disable bmalloc on architectures that don't use 4 KB pages because it doesn't work. I looked for a place within bmalloc where it assumes page size at compile time and found bmalloc/Sizes.h. I guess if we update this, bmalloc might work on more architectures?

We've had bmalloc disabled on these architectures for years. I never realized it might be so simple as changing page size. Anyway, we can address this in a separate bug.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Created attachment 394085
Patch

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Sebastien, please check that this new version of the patch also works for Ubuntu. Thanks!

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #45)
> (In reply to Tomas Popela from comment #43)
> > I think that PPC here means the old 32-bit PowerPC architecture no?
>
> No, it means any variety of PPC. The old condition was redundant.

Ugh sorry, you're right, it's broken. New version incoming.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Michael Catanzaro from comment #52)
> Ugh sorry, you're right, it's broken. New version incoming.

CPU(PPC) means 32-bit PPC only.

CPU(PPC64) means ppc64 OR ppc64le, so having both in the condition is redundant.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Created attachment 394087
Patch

Revision history for this message
In , Tpopela (tpopela) wrote :

(In reply to Michael Catanzaro from comment #53)
> (In reply to Michael Catanzaro from comment #52)
> > Ugh sorry, you're right, it's broken. New version incoming.
>
> CPU(PPC64) means ppc64 OR ppc64le, so having both in the condition is
> redundant.

Is it?

/* CPU(PPC64) - PowerPC 64-bit Big Endian */
#if ( defined(__ppc64__) \
    || defined(__PPC64__)) \
    && defined(__BYTE_ORDER__) \
    && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define WTF_CPU_PPC64 1
#define WTF_CPU_KNOWN 1
#endif

/* CPU(PPC64LE) - PowerPC 64-bit Little Endian */
#if ( defined(__ppc64__) \
    || defined(__PPC64__) \
    || defined(__ppc64le__) \
    || defined(__PPC64LE__)) \
    && defined(__BYTE_ORDER__) \
    && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define WTF_CPU_PPC64LE 1
#define WTF_CPU_KNOWN 1
#endif

from wtf/PlatformCPU.h

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Tomas Popela from comment #55)
> && defined(__BYTE_ORDER__) \
> && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
> #define WTF_CPU_PPC64LE 1
> #define WTF_CPU_KNOWN 1
> #endif

Apparently I am awful at reading code.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Created attachment 394110
Patch

Revision history for this message
In , Sebastien Bacher (seb128) wrote :

@Michael,

I will give a try to the patch on monday, the JSCConfig.h code in webkitgtk 3.28 is different so the patch doesn't apply, I tried to adapt it but probably screwed something since it failed to build but it's a bit late to rework that today now

Revision history for this message
In , carloslp (carloslp) wrote :

*** Bug 209171 has been marked as a duplicate of this bug. ***

Revision history for this message
In , dkg (dkg0) wrote :

I believe the same crash is happening on mipsel as well, so if these fixes can be tested on that platform, that would be great.

Here's a build of geary on the debian mipsel buildd that shows this failure during the test suite:

 https://buildd.debian.org/status/fetch.php?pkg=geary&arch=mipsel&ver=3.36.0-2&stamp=1584411277&raw=0

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to Daniel Kahn Gillmor from comment #60)
> I believe the same crash is happening on mipsel as well, so if these fixes
> can be tested on that platform, that would be great.
>
> Here's a build of geary on the debian mipsel buildd that shows this failure
> during the test suite:
>
> https://buildd.debian.org/status/fetch.php?pkg=geary&arch=mipsel&ver=3.36.0-
> 2&stamp=1584411277&raw=0

We do extensive testing of JSC on mipsel (32-bits) and we have not noticed a problem with this (AFAIK)

The MIPS board we use for testing (ci20 with buidroot) have a pagesize of 4096 (same than Linux/x86_64)

Can you confirm which page size does Debian uses for the mipsel port? You can get that by running the command "getconf PAGESIZE" on that mipsel machine.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Comment on attachment 394110
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394110&action=review

> Source/WTF/wtf/PageBlock.h:50
> +// Use 64 KiB for any unknown CPUs to be conservative. This covers s390x, which doesn't currently
> +// have its own CPU() macro and which also uses 64 KiB pages.

This is wrong. s390x uses 4 KiB.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Created attachment 394268
Patch

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Created attachment 394269
Patch

Revision history for this message
In , Tpopela (tpopela) wrote :

(In reply to Michael Catanzaro from comment #62)
> Comment on attachment 394110 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=394110&action=review
>
> > Source/WTF/wtf/PageBlock.h:50
> > +// Use 64 KiB for any unknown CPUs to be conservative. This covers s390x, which doesn't currently
> > +// have its own CPU() macro and which also uses 64 KiB pages.
>
> This is wrong. s390x uses 4 KiB.

Sorry about that.. I was wrong/mistaken for the last 7 years..

Revision history for this message
In , Mark-lam (mark-lam) wrote :

Comment on attachment 394269
Patch

r=me if EWS bots are happy.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Thanks Mark!

We've confirmed this fixes ppc64le on Red Hat's CI, so I can be pretty confident it will fix Ubuntu's problem too. I trust Sebastien will let us know when he's tested it.

(In reply to Daniel Kahn Gillmor from comment #60)
> I believe the same crash is happening on mipsel as well, so if these fixes
> can be tested on that platform, that would be great.
>
> Here's a build of geary on the debian mipsel buildd that shows this failure
> during the test suite:
>
> https://buildd.debian.org/status/fetch.php?pkg=geary&arch=mipsel&ver=3.36.0-
> 2&stamp=1584411277&raw=0

How do you know it's this particular crash?

This patch is definitely not going to affect anything on MIPS, because it assumes 4 kiB page size on MIPS. We can change that if you have access to an affected MIPS system and report the 'getconf PAGESIZE' you see on that machine. (If it returns 4096, then your bug is something else.) I don't have access to any MIPS systems, so I can't test this for you.

Revision history for this message
In , Ews-feeder (ews-feeder) wrote :

Committed r258857: <https://trac.webkit.org/changeset/258857>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 394269.

Revision history for this message
In , Webkit-bug-importer (webkit-bug-importer) wrote :

<rdar://problem/60780778>

Revision history for this message
In , Sbarati (sbarati) wrote :

(In reply to Michael Catanzaro from comment #49)
> (In reply to Michael Catanzaro from comment #48)
> > bmalloc determines page size at runtime, so it always does the right thing.
>
> Actually, we disable bmalloc on architectures that don't use 4 KB pages
> because it doesn't work. I looked for a place within bmalloc where it
> assumes page size at compile time and found bmalloc/Sizes.h. I guess if we
> update this, bmalloc might work on more architectures?
>
> We've had bmalloc disabled on these architectures for years. I never
> realized it might be so simple as changing page size. Anyway, we can address
> this in a separate bug.

I meant the "config size to protect" thing inside Gigacage code.

Revision history for this message
In , Sbarati (sbarati) wrote :

(In reply to Saam Barati from comment #70)
> (In reply to Michael Catanzaro from comment #49)
> > (In reply to Michael Catanzaro from comment #48)
> > > bmalloc determines page size at runtime, so it always does the right thing.
> >
> > Actually, we disable bmalloc on architectures that don't use 4 KB pages
> > because it doesn't work. I looked for a place within bmalloc where it
> > assumes page size at compile time and found bmalloc/Sizes.h. I guess if we
> > update this, bmalloc might work on more architectures?
> >
> > We've had bmalloc disabled on these architectures for years. I never
> > realized it might be so simple as changing page size. Anyway, we can address
> > this in a separate bug.
>
> I meant the "config size to protect" thing inside Gigacage code.

I believe the "smallPageSize" stuff inside bmalloc's Sizes.h isn't required to be what the OS says it is. But I need to read a bit closer on how it's used.

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to EWS from comment #68)
> Committed r258857: <https://trac.webkit.org/changeset/258857>
>
> All reviewed patches have been landed. Closing bug and clearing flags on
> attachment 394269 [details].

This has caused a regression on our ARM64 JSCOnly testers (which use 4KB of page size).
The tester its crashing all the time.
https://build.webkit.org/builders/JSCOnly%20Linux%20AArch64%20Release

Its also making the JSCOnly ARM64 EWS queue worthless (since it fails all the time) https://build.webkit.org/buildslaves/jsconly-linux-igalia-bot-2

So I'm reverting this now, and I'll see if we can add here more info about why its crashing in order to rework the patch.

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to Carlos Alberto Lopez Perez from comment #72)
> Its also making the JSCOnly ARM64 EWS queue worthless (since it fails all
> the time) https://build.webkit.org/buildslaves/jsconly-linux-igalia-bot-2
>

Agh! forgot about that, we don't have an EWS for JSCOnly ARM64.
That link was just the same than the one for the buildbot, just different view.

Revision history for this message
In , Zan-f (zan-f) wrote :

Created attachment 394736
JSCOnly aarch64 strace log

Here's output of strace for the SIGSEGV that occurs immediately after launch.

Changed in webkit:
importance: Unknown → Medium
status: Unknown → Fix Released
Changed in webkit2gtk (Ubuntu Focal):
status: In Progress → Fix Released
Revision history for this message
In , dkg (dkg0) wrote :

should this remain RESOLVED FIXED if the patches have been reverted? I'm sorry i haven't followed the changes closely enough upstream. Is there a version where these fixes have landed?

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

We wound up not reverting the patches.

aarch64 is still broken (bug #209670), but ppc64le is now fine.

Solving bug #209670 is basically impossible. We are being expected to provide a value at compile time that cannot be known until runtime. Very frustrating.

Oh, and I discovered people are actually using huge pages, so WebKit is guaranteed to be broken for them no matter what. *shrug!*

Revision history for this message
In , Mark-lam (mark-lam) wrote :

(In reply to Michael Catanzaro from comment #76)
> We wound up not reverting the patches.
>
> aarch64 is still broken (bug #209670), but ppc64le is now fine.
>
> Solving bug #209670 is basically impossible. We are being expected to
> provide a value at compile time that cannot be known until runtime. Very
> frustrating.
>
> Oh, and I discovered people are actually using huge pages, so WebKit is
> guaranteed to be broken for them no matter what. *shrug!*

One way around this is to disable this mitigation for those platforms that cannot be statically configured: basically, on those platforms, at runtime, check if page size is within acceptable limits (and whatever other criteria you need), and if not, disable the mitigation and don’t do the page freeze.

On platforms that can guarantee their page sizes, this option to disable the mitigation should be #if’d out.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

Yeah you're right, that's exactly what we should do. We can check page size at runtime only on Linux platforms and disable config freezing, JIT, and bmalloc when pages are large. Will handle in bug #209670.

Revision history for this message
In , Mark-lam (mark-lam) wrote :

Comment on attachment 394269
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394269&action=review

> Source/WTF/ChangeLog:11
> + will be even easier to detect if we change RELEASE_ASSERT_WITH_MESSAGE() to actually print

We should not change RELEASE_ALERT_WITH_MESSAGE() to unconditionally print its message. That message is only meant to be used in debug builds. But if you really want a message in a release build, you can always check the condition first and print your message before doing the RELEASE_ASSERT. I do not recommend doing this in the general case (for performance reasons), but in your implementation here, it’ll only be executed once. So checking and printing that error message won’t hurt much.

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

(In reply to Mark Lam from comment #79)
> We should not change RELEASE_ALERT_WITH_MESSAGE() to unconditionally print
> its message.

You probably want to look at bug #204399 ;)

Revision history for this message
In , Michael Catanzaro (mike-catanzaro) wrote :

*** Bug 204017 has been marked as a duplicate of this bug. ***

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.