autopkgtests broken in hirsute - error: int32 scalar cannot be indexed with .

Bug #1911400 reported by Christian Ehrhardt 
Affects                   Status    Importance  Assigned to  Milestone
Octave                    Unknown   Unknown
octave (Ubuntu)           New       Undecided   Unassigned
octave-parallel (Ubuntu)  New       Undecided   Unassigned

Bug Description

This is failing reliably on the autopkgtest infra:

- initially vs 3.1.3
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20201210_093838_8516f@/log.gz
- Trigger octave/6.1.1~hg.2020.12.27-3 octave-parallel/4.0.0-2build1 octave-struct/1.0.16-8:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210108_091245_ef9a9@/log.gz
- all-proposed:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210112_151852_47229@/log.gz
- this actually failed before the new octave already, with the introduction of the new octave-parallel:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20201024_131217_175b0@/log.gz

Containers with hirsute and hirsute-proposed work fine.
The following is the same in debian-sid, hirsute and hirsute-proposed:

root@h:~/octave-parallel-3.1.3# DH_OCTAVE_TEST_ENV="xvfb-run -a" /usr/bin/dh_octave_check --use-installed-package
Checking package...
Checking m files ...
[inst/pararrayfun.m]
...
[parcellfun]
PASSES 1 out of 1 test
Summary: 11 tests, 11 passed, 0 known failures, 0 skipped

Local VM based autopkgtests all work.
They work for hirsute, hirsute-proposed and a selection of just
octave, octave-parallel and octave-struct.

Even the old tests against octave-parallel 3.1.3 failed,
so the issue does not just "come with 4.x" of octave-parallel.
On the Launchpad infrastructure they failed as well:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210106_025958_c2157@/log.gz

Checking slightly deeper showed that the initial failure vs 3.1.3 was
NOT the same issue - it was a badpkg:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20201210_093838_8516f@/log.gz
But reruns of that appear to fail the same way as all the other runs:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210102_194323_dbfce@/log.gz

All failing runs come down to:

!!!!! test failed
int32 scalar cannot be indexed with {

By comparison this seems fine on debci, where all 4.0.0-2 runs look good:
https://ci.debian.net/packages/o/octave-parallel/unstable/amd64/
https://ci.debian.net/data/autopkgtest/unstable/amd64/o/octave-parallel/9643500/log.gz

Another moving piece in this puzzle is dh-octave, which was updated on
30th December 2020. There isn't an old version of it in hirsute anymore,
it already migrated.
 dh-octave | 0.7.6 | groovy/universe | source, all
 dh-octave | 1.0.3 | hirsute/universe | source, all
But the changelog isn't too suspicious.

Not sure if it is important - but the order of the tests differs.
Each run seems to use a random order, but that is true for good and
bad runs alike.

Just to be clear on the error, it is of this type:
https://octave.org/doc/v4.2.1/Integer-Data-Types.html
octave:4> data.foo
error: scalar cannot be indexed with .
octave:5> data = int32(1234)
data = 1234
octave:6> data.foo
error: int32 scalar cannot be indexed with .

But without a reproducer it is hard to tell where this might come from.
Maybe it is language/locale dependent, as local reproduction attempts tend
to retain some remainders of the local language settings.

I'm out of good ideas, but will continue before taking a step back and masking the test.

I've run it with debug enabled without much gain - not even when comparing that to a debug-enabled good run.
This was done in a PPA (https://launchpad.net/~paelzer/+archive/ubuntu/lp-1911400-octave-test-fails/+packages) that runs the tests more verbosely via:
$ xvfb-run -a octave-cli --debug --verbose --no-history --no-init-file --no-window-system inst/parcellfun.m

Further TODOs:
- try the old dh-octave?


tags: added: update-excuse
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

All proposed + PPA:

Locally:
$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest --no-built-binaries --apt-upgrade --apt-pocket=proposed --setup-commands="add-apt-repository ppa:paelzer/lp-1911400-octave-test-fail; apt update; apt -y upgrade" --shell octave-parallel_4.0.0-2ubuntu1~ppa1.dsc -- qemu --ram-size=1536 --cpus 2 ~/work/autopkgtest-hirsute-amd64.img
=> this kept working

PPA:
$ lp-test-ppa ppa:paelzer/lp-1911400-octave-test-fails --release hirsute --showskip --showpass
=> This kept failing

I've isolated the logs of one of the tests (inst/pararrayfun.m) and diffed them.
Unfortunately there was not much insight. After a whopping 15013 lines that fully match, the bad case runs directly into the failure.
The issue is very close to the end, with just 20 lines following in the good case before the logs fully match again (except reports stating that one test failed).

The asserts of the test are far away - it seems this happens "on the way out" of the test and is not directly associated with the assert.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - I have pinged the octave community summarizing the case as I'm out of ideas on how to further track down the root issue. Maybe there is some octave trick that helps to pinpoint the bad code.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Octave also has an option to echo the code it executes.
That has given me the same snippet for both tests that trigger the failure.
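Presumably via something like the --echo-commands (-x) switch - that flag is my assumption of what was used here, the rest matches the earlier invocation:

$ xvfb-run -a octave-cli --echo-commands --no-history --no-init-file --no-window-system inst/parcellfun.m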

 47367 + if (__verbose < 1)
 47368 + fprintf (__fid, "%s%s\n", __signal_block, __block);
 47369 + fflush (__fid);
 47370 + endif
 47371 + fprintf (__fid, "%s\n", __msg);
 47372 !!!!! test failed
 47373 int32 scalar cannot be indexed with {
 47374 + ## Show the variable context.
 47375 + if (! strcmp (__type, "error")
 47376 + && ! strcmp (__type, "testif")
 47377 + && ! strcmp (__type, "xtest")
 47378 + && ! all (__shared == " "))

That code is from
octave-common: /usr/share/octave/6.1.1~hg.2020.12.27-1/m/testfun/test.m
And it really seems test.m just printed the error that had already occurred.

That message is constructed at
persistent __signal_fail = "!!!!! ";
...
              __msg = "test failed";
            endif
            __msg = [__signal_fail __msg "\n" lasterr()];

So we'd want to add something like a stack trace when this message is constructed.
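A minimal sketch of such a change (my own illustration against the quoted test.m code; dbstack is Octave's builtin to obtain the current call stack):

            __msg = [__signal_fail __msg "\n" lasterr()];
            ## illustration only: append the call stack to the failure message
            __stk = dbstack ();
            for __i = 1:numel (__stk)
              __msg = sprintf ("%s\n    %s at line %d", __msg, __stk(__i).name, __stk(__i).line);
            endfor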

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: I added an octave build to my PPA that adds a stack trace in the place where the message gets constructed. Maybe that helps to pinpoint the original source.

Build and test running...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

And in my tests with the further debug modification the stack trace isn't helpful - so still no insight into where to look.

Also my communication with upstream/community on IRC didn't bring up a new approach to this.

To be sure I also ran things on canonistack instances - and finally, there on arm64, it triggered.

[inst/parcellfun.m]
>>>>> /home/ubuntu/octave-parallel-4.0.0/inst/parcellfun.m
***** test
 assert (res = parcellfun (3, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}), [2, 6, 12, 20])
***** test
 assert (res = parcellfun (4, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "UniformOutput", false), {2, 6, 12, 20})
***** test
 assert (res = parcellfun (2, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "ChunksPerProc", 2), [2, 6, 12, 20])
***** test
 assert (res = parcellfun (4, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "CumFunc", @ (a, b) a + b), 40)
***** test
 assert (res = parcellfun (2, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40)
!!!!! test failed
int32 scalar cannot be indexed with {
***** test
 assert (ischar ((res = parcellfun (4, @ (x) sqrt (x), {1, "a", 3, 4, 5, 6},
                                    "ErrorHandler",
                                    @ (info, x) info.message,
                                    "UniformOutput", false)){2}))
6 tests, 5 passed, 0 known failure, 0 skipped
[inst/pararrayfun.m]

Hopefully that allows more debugging now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The combined script that /usr/bin/dh_octave_check creates out of internal code and octave-parallel-4.0.0/debian/check.m is this:

fid = fopen ("/tmp/tmp.ebXAvMYm49", "w");
disp ('Checking m files ...');
[usr_pkg, sys_pkg] = pkg ('list');
for i = 1 : length (sys_pkg)
    name = sys_pkg {1, i}.name;
    ## Do not load the package being checked, sinc
    ## old, incompatible version may be installed.
    if strcmp ("parallel", name) != 1
        pkg ('load', name);
    endif
endfor
pkg ('load', 'parallel');
disp ("[inst/parcellfun.m]");
[npass, ntest, nxfail, nskip] = test ("inst/parcellfun.m",
                                      ifelse (strcmp ("", ""),
                                              "verbose", ""));
printf ("%d test%s, %d passed, %d known failure%s, %d skipped\n",
        ntest, ifelse (ntest > 1, "s", ""), npass, nxfail,
        ifelse (nxfail > 1, "s", ""), nskip);
fprintf (fid, "%s %d %d %d %d\n", "inst/parcellfun.m", ntest, npass, nxfail, nskip);
disp ("[inst/pararrayfun.m]");
[npass, ntest, nxfail, nskip] = test ("inst/pararrayfun.m",
                                      ifelse (strcmp ("", ""),
                                              "verbose", ""));
printf ("%d test%s, %d passed, %d known failure%s, %d skipped\n",
        ntest, ifelse (ntest > 1, "s", ""), npass, nxfail,
        ifelse (nxfail > 1, "s", ""), nskip);
fprintf (fid, "%s %d %d %d %d\n", "inst/pararrayfun.m", ntest, npass, nxfail, nskip);
disp ("[inst/pserver.m]");
[npass, ntest, nxfail, nskip] = test ("inst/pserver.m",
                                      ifelse (strcmp ("", ""),
                                              "verbose", ""));
printf ("%d test%s, %d passed, %d known failure%s, %d skipped\n",
        ntest, ifelse (ntest > 1, "s", ""), npass, nxfail,
        ifelse (nxfail > 1, "s", ""), nskip);
fprintf (fid, "%s %d %d %d %d\n", "inst/pserver.m", ntest, npass, nxfail, nskip);
disp ('Checking C++ files ...');
[usr_pkg, sys_pkg] = pkg ('list');
for i = 1 : length (sys_pkg);
    name = sys_pkg {1, i}.name;
    ## Do not load the package being checked, sinc
    ## old, incompatible version may be installed.
    if strcmp ("parallel", name) != 1
        pkg ('load', name);
    endif
endfor
warning ('off', 'Octave:autoload-relative-file-name');
disp ("Run tests in debian/check.m");
try
    source ("debian/check.m");
    fprintf (fid, "debian/check.m 1 1 0 0\n");
catch
    fprintf (fid, "debian/check.m 1 0 1 0\n");
end_try_catch
fclose (fid);

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This can be split into several sections:
- load the lib from the build-dir or the system pkg (the latter in this case)
- test file in inst/*: pararrayfun.m, pserver.m, parcellfun.m in our case
- checking C++ files
- run tests in debian/check.m

Knowing that, we can simplify this to:

$ cat > foo << EOF
pkg ('load', 'parallel');
test ("inst/pararrayfun.m", "verbose");
EOF
$ xvfb-run -a octave-cli --no-history --silent --no-init-file --no-window-system foo

That is much better ...

That can then be used in interactive mode like this:
$ octave-cli
octave:1> pkg ('load', 'parallel');
octave:2> pararrayfun (2, @ (x, y) x * y, [1, 2, 3, 4], [2, 3, 4, 5], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    chunk_parcellfun at line 47 column 25
    parcellfun at line 142 column 28
    pararrayfun at line 85 column 28

Even that can be simplified further:

Good:
octave:33> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 1, "CumFunc", @ (a, b) a + b), 40
ans = 40

Bad:
octave:35> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    pararrayfun at line 85 column 28

ChunksPerProc > 1 breaks it.
Might it be that we need at least 2 vCPUs for the test?

Going into a 4-CPU environment where things worked fine before - but there it is also broken:

octave:1> pkg ('load', 'parallel');
octave:2> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    pararrayfun at line 85 column 28

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So without the assert, ChunksPerProc=1 works but ChunksPerProc=2 fails with the int32 scalar issue.

octave:9> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    pararrayfun at line 85 column 28
octave:10> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 1, "CumFunc", @ (a, b) a + b), 40
ans = 40

But that isn't a fix/workaround, as ChunksPerProc=1 with the assert in place fails as well:
assert (res = pararrayfun (2, @ (x, y) x * y, [1, 2, 3, 4], [2, 3, 4, 5], "ChunksPerProc", 1, "CumFunc", @ (a, b) a + b), 40)
error: ASSERT errors for: assert (res = pararrayfun (2, @(x, y)x * y, [1, 2, 3, 4], [2, 3, 4, 5], "ChunksPerProc", 1, "CumFunc", @(a, b)a + b),40)

  Location | Observed | Expected | Reason
     () O E Class int32 != double
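The class mismatch itself can be shown in isolation (my own illustration, not from the test suite - assert compares the class of observed and expected values as well):

octave:11> assert (int32 (40), 40)            # fails with reason "Class int32 != double"
octave:12> assert (double (int32 (40)), 40)   # passes once cast back to double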

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've involved upstream again and will need to see if there is a better solution.
It was shown that Debian also has issues with these tests depending on the parameter.

Since it works fine in all environments except autopkgtest, and I can keep the majority of the testing alive by removing just one test, I think we should do that for now to unblock things.

I'm testing a build with that change ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I found that while local containers work, local VMs fail.
In general it seems to be something that is broken in VMs only - odd but interesting.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hmm - I thought I had checked that in comment #9 where I used a bigger canonistack environment.
But locally in KVM guests I can show that:
 1 vcpu - our failure
 2 vcpus - working

And since we are dealing with software meant for parallelization, it might well be that it just needs at least 2 CPUs to run properly.
That would also explain why all the container-based tests worked, as they see more CPUs.
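As a quick sanity check, Octave's builtin nproc reports how many processors it sees in a given environment - the line below is my own illustration; on the failing testbeds it returns 1, on the working ones 2 or more:

octave:1> nproc ()
ans = 1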

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, the lack of CPUs is indeed the problem we are facing, and it is a regression in the new version.
So much debugging for eventually such a trivial error :-/

I've filed a bug upstream about it at:
=> https://savannah.gnu.org/bugs/index.php?59869

With a local autopkgtest using only 1 CPU I can finally recreate the failure.
And in reverse that means that if we mark this test as "huge" it will succeed on autopkgtest.u.c \o/
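For reference, the local reproduction is essentially the runner invocation shown earlier in this bug, just pinned to a single vCPU (same paths as before; the relevant change is --cpus 1):

$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest --no-built-binaries --apt-upgrade --apt-pocket=proposed --shell octave-parallel_4.0.0-2ubuntu1~ppa1.dsc -- qemu --ram-size=1536 --cpus 1 ~/work/autopkgtest-hirsute-amd64.img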

MP to mark the test as huge:
https://code.launchpad.net/~paelzer/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/396300
