autopkgtests broken in hirsute - error: int32 scalar cannot be indexed with .

Bug #1911400 reported by Christian Ehrhardt 
Affects                   Status    Importance  Assigned to  Milestone
Octave                    Unknown   Unknown
octave (Ubuntu)           New       Undecided   Unassigned
octave-parallel (Ubuntu)  New       Undecided   Unassigned

Bug Description

This is failing reliably on the autopkgtest infra:

- initially vs 3.1.3
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20201210_093838_8516f@/log.gz
- Trigger octave/6.1.1~hg.2020.12.27-3 octave-parallel/4.0.0-2build1 octave-struct/1.0.16-8:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210108_091245_ef9a9@/log.gz
- all-proposed:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210112_151852_47229@/log.gz
- this actually failed before the new octave already, with the introduction of the new octave-parallel:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20201024_131217_175b0@/log.gz

Containers with hirsute and hirsute-proposed work fine.
The following is the same in debian-sid, hirsute and hirsute-proposed:

root@h:~/octave-parallel-3.1.3# DH_OCTAVE_TEST_ENV="xvfb-run -a" /usr/bin/dh_octave_check --use-installed-package
Checking package...
Checking m files ...
[inst/pararrayfun.m]
...
[parcellfun]
PASSES 1 out of 1 test
Summary: 11 tests, 11 passed, 0 known failures, 0 skipped

Local VM based autopkgtests all work.
They work for hirsute, hirsute-proposed and a selection of just
octave, octave-parallel and octave-struct.

Even the old tests against octave-parallel 3.1.3 failed,
so the issue does not just "come with 4.x" of octave-parallel.
On the Launchpad infrastructure they failed as well:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210106_025958_c2157@/log.gz

Checking slightly deeper showed that the initial failure vs 3.1.3 was
NOT the same issue - it was a badpkg:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20201210_093838_8516f@/log.gz
But reruns of that appear to fail the same way as all the other runs:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/o/octave-parallel/20210102_194323_dbfce@/log.gz

All failing runs come down to:

!!!!! test failed
int32 scalar cannot be indexed with {

By comparison this seems fine on debci, where all 4.0.0-2 runs look good:
https://ci.debian.net/packages/o/octave-parallel/unstable/amd64/
https://ci.debian.net/data/autopkgtest/unstable/amd64/o/octave-parallel/9643500/log.gz

Another moving piece in this puzzle is dh-octave, which was updated on
30th December 2020. There isn't an old version of it in hirsute anymore,
it already migrated.
 dh-octave | 0.7.6 | groovy/universe | source, all
 dh-octave | 1.0.3 | hirsute/universe | source, all
But the changelog isn't too suspicious.

Not sure if it is important - but the order of the tests differs.
Each run seems to use a random order, but that is true for good and
bad runs alike.

Just to be clear on the error, it is of this type:
https://octave.org/doc/v4.2.1/Integer-Data-Types.html
octave:4> data.foo
error: scalar cannot be indexed with .
octave:5> data = int32(1234)
data = 1234
octave:6> data.foo
error: int32 scalar cannot be indexed with .

But without a reproducer it is hard to tell where this might come from.
Maybe it is language/locale dependent, as local reproduction attempts tend
to retain some remainders of the local language settings.

I'm out of good ideas, but will continue before taking a step back and masking the test.

I've run it with debug enabled without much gain - not even when comparing that to a debug-enabled good run.
This was done in a PPA (https://launchpad.net/~paelzer/+archive/ubuntu/lp-1911400-octave-test-fails/+packages) that runs the tests more verbosely via:
$ xvfb-run -a octave-cli --debug --verbose --no-history --no-init-file --no-window-system inst/parcellfun.m

Further TODOs:
- try the old dh-octave?


tags: added: update-excuse
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

All proposed + PPA:

Locally:
$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest --no-built-binaries --apt-upgrade --apt-pocket=proposed --setup-commands="add-apt-repository ppa:paelzer/lp-1911400-octave-test-fail; apt update; apt -y upgrade" --shell octave-parallel_4.0.0-2ubuntu1~ppa1.dsc -- qemu --ram-size=1536 --cpus 2 ~/work/autopkgtest-hirsute-amd64.img
=> this kept working

PPA:
$ lp-test-ppa ppa:paelzer/lp-1911400-octave-test-fails --release hirsute --showskip --showpass
=> This kept failing

I've isolated the logs of one of the tests (inst/pararrayfun.m) and diffed them.
Unfortunately there was not much insight. After a whopping 15013 lines that fully match, the bad case runs directly into the failure.
The issue is very close to the end, with just 20 lines following in the good case before the logs fully match again (except reports stating that one test failed).

The asserts of the test are far away - it seems this happens "on the way out" of the test and is not directly associated with the assert.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - I have pinged the octave community summarizing the case as I'm out of ideas on how to further track down the root issue. Maybe there is some octave trick that helps to pinpoint the bad code.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Octave also has an option to echo the code it executes.
That has given me the same snippet for both tests that trigger the failure.
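Presumably via something like the --echo-commands (-x) switch - that flag is my assumption of what was used here, the rest matches the earlier invocation:

$ xvfb-run -a octave-cli --echo-commands --no-history --no-init-file --no-window-system inst/parcellfun.m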

 47367 + if (__verbose < 1)
 47368 + fprintf (__fid, "%s%s\n", __signal_block, __block);
 47369 + fflush (__fid);
 47370 + endif
 47371 + fprintf (__fid, "%s\n", __msg);
 47372 !!!!! test failed
 47373 int32 scalar cannot be indexed with {
 47374 + ## Show the variable context.
 47375 + if (! strcmp (__type, "error")
 47376 + && ! strcmp (__type, "testif")
 47377 + && ! strcmp (__type, "xtest")
 47378 + && ! all (__shared == " "))

That code is from
octave-common: /usr/share/octave/6.1.1~hg.2020.12.27-1/m/testfun/test.m
And it really seems test.m just printed the error that had already occurred.

That message is constructed at
persistent __signal_fail = "!!!!! ";
...
              __msg = "test failed";
            endif
            __msg = [__signal_fail __msg "\n" lasterr()];

So we'd want to add something like a stack trace when this message is constructed.
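A minimal sketch of such a change (my own illustration against the quoted test.m code; dbstack is Octave's builtin to obtain the current call stack):

            __msg = [__signal_fail __msg "\n" lasterr()];
            ## illustration only: append the call stack to the failure message
            __stk = dbstack ();
            for __i = 1:numel (__stk)
              __msg = sprintf ("%s\n    %s at line %d", __msg, __stk(__i).name, __stk(__i).line);
            endfor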

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: I added an octave build to my PPA that adds a stack trace in the place where the message gets constructed. Maybe that helps to pinpoint the original source.

Build and test running...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

And in my tests with the further debug modification the stack trace isn't helpful - so still no insight into where to look.

Also my communication with upstream/community on IRC didn't bring up a new approach to this.

To be sure I also ran things on canonistack instances - and finally, there on arm64, it triggered.

[inst/parcellfun.m]
>>>>> /home/ubuntu/octave-parallel-4.0.0/inst/parcellfun.m
***** test
 assert (res = parcellfun (3, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}), [2, 6, 12, 20])
***** test
 assert (res = parcellfun (4, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "UniformOutput", false), {2, 6, 12, 20})
***** test
 assert (res = parcellfun (2, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "ChunksPerProc", 2), [2, 6, 12, 20])
***** test
 assert (res = parcellfun (4, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "CumFunc", @ (a, b) a + b), 40)
***** test
 assert (res = parcellfun (2, @ (x, y) x * y, {1, 2, 3, 4}, {2, 3, 4, 5}, "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40)
!!!!! test failed
int32 scalar cannot be indexed with {
***** test
 assert (ischar ((res = parcellfun (4, @ (x) sqrt (x), {1, "a", 3, 4, 5, 6},
                                    "ErrorHandler",
                                    @ (info, x) info.message,
                                    "UniformOutput", false)){2}))
6 tests, 5 passed, 0 known failure, 0 skipped
[inst/pararrayfun.m]

Hopefully that allows more debugging now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The combined script that /usr/bin/dh_octave_check creates out of internal code and octave-parallel-4.0.0/debian/check.m is this:

fid = fopen ("/tmp/tmp.ebXAvMYm49", "w");
disp ('Checking m files ...');
[usr_pkg, sys_pkg] = pkg ('list');
for i = 1 : length (sys_pkg)
    name = sys_pkg {1, i}.name;
    ## Do not load the package being checked, sinc
    ## old, incompatible version may be installed.
    if strcmp ("parallel", name) != 1
        pkg ('load', name);
    endif
endfor
pkg ('load', 'parallel');
disp ("[inst/parcellfun.m]");
[npass, ntest, nxfail, nskip] = test ("inst/parcellfun.m",
                                      ifelse (strcmp ("", ""),
                                              "verbose", ""));
printf ("%d test%s, %d passed, %d known failure%s, %d skipped\n",
        ntest, ifelse (ntest > 1, "s", ""), npass, nxfail,
        ifelse (nxfail > 1, "s", ""), nskip);
fprintf (fid, "%s %d %d %d %d\n", "inst/parcellfun.m", ntest, npass, nxfail, nskip);
disp ("[inst/pararrayfun.m]");
[npass, ntest, nxfail, nskip] = test ("inst/pararrayfun.m",
                                      ifelse (strcmp ("", ""),
                                              "verbose", ""));
printf ("%d test%s, %d passed, %d known failure%s, %d skipped\n",
        ntest, ifelse (ntest > 1, "s", ""), npass, nxfail,
        ifelse (nxfail > 1, "s", ""), nskip);
fprintf (fid, "%s %d %d %d %d\n", "inst/pararrayfun.m", ntest, npass, nxfail, nskip);
disp ("[inst/pserver.m]");
[npass, ntest, nxfail, nskip] = test ("inst/pserver.m",
                                      ifelse (strcmp ("", ""),
                                              "verbose", ""));
printf ("%d test%s, %d passed, %d known failure%s, %d skipped\n",
        ntest, ifelse (ntest > 1, "s", ""), npass, nxfail,
        ifelse (nxfail > 1, "s", ""), nskip);
fprintf (fid, "%s %d %d %d %d\n", "inst/pserver.m", ntest, npass, nxfail, nskip);
disp ('Checking C++ files ...');
[usr_pkg, sys_pkg] = pkg ('list');
for i = 1 : length (sys_pkg);
    name = sys_pkg {1, i}.name;
    ## Do not load the package being checked, sinc
    ## old, incompatible version may be installed.
    if strcmp ("parallel", name) != 1
        pkg ('load', name);
    endif
endfor
warning ('off', 'Octave:autoload-relative-file-name');
disp ("Run tests in debian/check.m");
try
    source ("debian/check.m");
    fprintf (fid, "debian/check.m 1 1 0 0\n");
catch
    fprintf (fid, "debian/check.m 1 0 1 0\n");
end_try_catch
fclose (fid);

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This can be split into several sections:
- load the lib from the build-dir or the system pkg (the latter in this case)
- test file in inst/*: pararrayfun.m, pserver.m, parcellfun.m in our case
- checking C++ files
- run tests in debian/check.m

Knowing that, we can simplify this to:

$ cat > foo << EOF
pkg ('load', 'parallel');
test ("inst/pararrayfun.m", "verbose");
EOF
$ xvfb-run -a octave-cli --no-history --silent --no-init-file --no-window-system foo

That is much better ...

That can then be used in interactive mode like this:
$ octave-cli
octave:1> pkg ('load', 'parallel');
octave:2> pararrayfun (2, @ (x, y) x * y, [1, 2, 3, 4], [2, 3, 4, 5], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    chunk_parcellfun at line 47 column 25
    parcellfun at line 142 column 28
    pararrayfun at line 85 column 28

Even that can be simplified further:

Good:
octave:33> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 1, "CumFunc", @ (a, b) a + b), 40
ans = 40

Bad:
octave:35> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    pararrayfun at line 85 column 28

ChunksPerProc > 1 breaks it.
Might it be that we need at least 2 vCPUs for the test?

Going into a 4-CPU environment where things worked fine before - but there it is also broken:

octave:1> pkg ('load', 'parallel');
octave:2> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    pararrayfun at line 85 column 28

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So without the assert, ChunksPerProc=1 works but ChunksPerProc=2 fails with the int32 scalar issue.

octave:9> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 2, "CumFunc", @ (a, b) a + b), 40
error: int32 scalar cannot be indexed with {
error: called from
    parcellfun at line 206 column 25
    pararrayfun at line 85 column 28
octave:10> pararrayfun (1, @ (x, y) x * y, [1, 2], "ChunksPerProc", 1, "CumFunc", @ (a, b) a + b), 40
ans = 40

But that isn't a fix/workaround, as ChunksPerProc=1 with the assert in place fails as well:
assert (res = pararrayfun (2, @ (x, y) x * y, [1, 2, 3, 4], [2, 3, 4, 5], "ChunksPerProc", 1, "CumFunc", @ (a, b) a + b), 40)
error: ASSERT errors for: assert (res = pararrayfun (2, @(x, y)x * y, [1, 2, 3, 4], [2, 3, 4, 5], "ChunksPerProc", 1, "CumFunc", @(a, b)a + b),40)

  Location | Observed | Expected | Reason
     () O E Class int32 != double
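The class mismatch itself can be shown in isolation (my own illustration, not from the test suite - assert compares the class of observed and expected values as well):

octave:11> assert (int32 (40), 40)            # fails with reason "Class int32 != double"
octave:12> assert (double (int32 (40)), 40)   # passes once cast back to double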

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've involved upstream again and will need to see if there is a better solution.
It was shown that Debian also has issues with these tests depending on the parameter.

Since it works fine in all environments except autopkgtest, and I can keep the majority of the testing alive by removing just one test, I think we should do that for now to unblock things.

I'm testing a build with that change ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I found that while local containers work, local VMs fail.
In general it seems to be something that is broken in VMs only - odd but interesting.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hmm - I thought I had checked that in comment #9 where I used a bigger canonistack environment.
But locally in KVM guests I can show that:
 1 vcpu - our failure
 2 vcpus - working

And since we are dealing with software meant for parallelization, it might well be that it just needs at least 2 CPUs to run properly.
That would also explain why all the container-based tests worked, as they see more CPUs.
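As a quick sanity check, Octave's builtin nproc reports how many processors it sees in a given environment - the line below is my own illustration; on the failing testbeds it returns 1, on the working ones 2 or more:

octave:1> nproc ()
ans = 1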

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, the lack of CPUs is indeed the problem we are facing, and it is a regression in the new version.
So much debugging for eventually such a trivial error :-/

I've filed a bug upstream about it at:
=> https://savannah.gnu.org/bugs/index.php?59869

With a local autopkgtest using only 1 CPU I can finally recreate the failure.
And in reverse that means that if we mark this test as "huge" it will succeed on autopkgtest.u.c \o/
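For reference, the local reproduction is essentially the runner invocation shown earlier in this bug, just pinned to a single vCPU (same paths as before; the relevant change is --cpus 1):

$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest --no-built-binaries --apt-upgrade --apt-pocket=proposed --shell octave-parallel_4.0.0-2ubuntu1~ppa1.dsc -- qemu --ram-size=1536 --cpus 1 ~/work/autopkgtest-hirsute-amd64.img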

MP to mark the test as huge:
https://code.launchpad.net/~paelzer/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/396300
