initgroups() fails when using libnss-ldap (but not nscd)

Bug #509734 reported by greenmoss
This bug affects 2 people
Affects               Status   Importance  Assigned to  Milestone
at (Ubuntu)           Triaged  Medium      Unassigned
libnss-ldap (Ubuntu)  Triaged  Medium      Unassigned

Bug Description

Binary package hint: at

In all of my installations of Karmic (5 so far), atd jobs refuse to run. At the requested time of execution, I instead see the following in my cron.log:

Jan 19 11:48:00 myhost atd[9054]: Cannot delete saved userids: Operation not permitted

Assuming you have the "mail" and "at" executables installed, the bug can be repeated with a simple test:

echo 'echo "testing atd" | mail -s "atd test" root' | sudo at `date -d "2 minutes" +%H:%M`

If it worked, root would receive mail. Instead, you will see the error in your logs in one to two minutes. As far as I can tell *all* at jobs are affected, rendering this package completely unusable.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

AppArmor profile? Seems clamav had a similar issue recently.

433764

greenmoss (ktyubuntu) wrote :

Uninstalled apparmor and repeated test; still failed.

Ansgar Burchardt (aburch) wrote :

Hi,

Could you please run strace on atd and then submit a job:

    strace -f -o strace.txt -p $pid
    echo echo foo | at now

(replace $pid with the process id of atd). This should produce a file strace.txt. Please attach it to this bug report.

Regards,
Ansgar

greenmoss (ktyubuntu) wrote :

strace output is attached

Ansgar Burchardt (aburch) wrote : Re: [Bug 509734] Re: execution fails with "Cannot delete saved userids: Operation not permitted"

Hi,

greenmoss writes:
> strace output is attached
Thanks.

This is the relevant section from atd.c:
    if (chdir(ATJOB_DIR) < 0)
        perr("Cannot chdir to " ATJOB_DIR);
    PRIV_START
    nice((tolower((int) queue) - 'a' + 1) * 2);
    if (initgroups(pentry->pw_name, pentry->pw_gid))
        perr("Cannot delete saved userids");

And the same part in the output from strace:
   522 20241 chdir("/var/spool/cron/atjobs") = 0
   523 20241 setreuid32(1, 0) = 0
   524 20241 setregid32(1, 0) = 0
   525 20241 getpriority(PRIO_PROCESS, 0) = 20
   526 20241 setpriority(PRIO_PROCESS, 0, 2) = 0
   527 20241 getpriority(PRIO_PROCESS, 0) = 18

So far everything looks ok. Now only initgroups() is left:
   528 20241 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 5

While looking up the groups, suddenly the following happens:
   828 20241 getuid32() = 1
   829 20241 mlock(0xb7348000, 32768) = 0
   830 20241 geteuid32() = 0
   831 20241 setuid32(1) = 0
   832 20241 getuid32() = 1
   833 20241 geteuid32() = 1
   834 20241 setuid32(0) = -1 EPERM (Operation not permitted)

It looks like the NSS module drops privileges?!
Of course, setgroups thus fails:

  2005 20241 setgroups32(2, [0, 512]) = -1 EPERM (Operation not permitted)
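
To confirm from inside a process that a library call changes the caller's IDs, as the trace suggests, one could snapshot the real and effective IDs around the call. This is only a minimal sketch: `id_snapshot`, `take_snapshot`, `call_changes_ids` and `harmless_call` are hypothetical names, not part of atd or any NSS module, and the callback stands in for whatever lookup is under suspicion.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Real and effective IDs of the calling process (hypothetical helper type). */
struct id_snapshot {
    uid_t ruid, euid;
    gid_t rgid, egid;
};

struct id_snapshot take_snapshot(void)
{
    struct id_snapshot s;
    s.ruid = getuid();
    s.euid = geteuid();
    s.rgid = getgid();
    s.egid = getegid();
    return s;
}

/* Runs fn() and returns 1 if any real or effective ID differs afterwards,
 * 0 if they are all unchanged. */
int call_changes_ids(void (*fn)(void))
{
    struct id_snapshot before = take_snapshot();
    fn();
    struct id_snapshot after = take_snapshot();

    if (before.ruid != after.ruid || before.euid != after.euid ||
        before.rgid != after.rgid || before.egid != after.egid) {
        fprintf(stderr, "ids changed: uid %ld/%ld -> %ld/%ld\n",
                (long)before.ruid, (long)before.euid,
                (long)after.ruid, (long)after.euid);
        return 1;
    }
    return 0;
}

/* Hypothetical stand-in for the suspicious call; it does nothing. */
void harmless_call(void)
{
}
```

In the trace above the effective uid goes from 0 to 1 while the real uid stays 1, so a wrapper like this placed around the group lookup would be expected to report a change.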

You seem to be using the libnss-ldap module. Does at work correctly if
you disable it?

Regards,
Ansgar

greenmoss (ktyubuntu) wrote : Re: execution fails with "Cannot delete saved userids: Operation not permitted"

I removed libnss-ldap, re-tried the at test, and it worked. So you are correct: libnss-ldap and at do not like each other.

Ansgar Burchardt (aburch) wrote : Re: [Bug 509734] Re: execution fails with "Cannot delete saved userids: Operation not permitted"

Hi,

Ansgar Burchardt <ansgar@43-1.org> writes:

> While looking up the groups, suddenly the following happens:
> 828 20241 getuid32() = 1
> 829 20241 mlock(0xb7348000, 32768) = 0
> 830 20241 geteuid32() = 0
> 831 20241 setuid32(1) = 0
> 832 20241 getuid32() = 1
> 833 20241 geteuid32() = 1
> 834 20241 setuid32(0) = -1 EPERM (Operation not permitted)

I think I found the suspect: libgcrypt11/1.4.4-2ubuntu2.
The function lock_pool from src/secmem.c contains the following code:

  uid = getuid ();
  [...]
  err = mlock (p, n);
  [...]
  if (uid && ! geteuid ())
    {
      /* check that we really dropped the privs.
       * Note: setuid(0) should always fail */
      if (setuid (uid) || getuid () != geteuid () || !setuid (0))
        log_fatal ("failed to reset uid: %s\n", strerror (errno));
    }

This matches the output from strace above.

(libgcrypt is used via libnss-ldap → openldap → libgnutls → libgcrypt)
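
Why the check in lock_pool is fatal for atd can be replayed without root using a small model of the three user IDs. This is a sketch under assumptions: `struct cred`, `model_setuid` and `replay_lock_pool_check` are hypothetical names, the model ignores capabilities, and the starting state (real uid 1, effective uid 0, saved uid 0) is inferred from the `setreuid32(1, 0)` call in the trace rather than measured.

```c
#include <errno.h>

/* Simplified model of a process's user IDs (plain ints, no privileges needed). */
struct cred {
    int ruid; /* real uid */
    int euid; /* effective uid */
    int suid; /* saved set-user-ID */
};

/* Models setuid(2) as documented for Linux, ignoring capabilities:
 * with effective uid 0 all three IDs are set; otherwise only the
 * effective uid may change, and only to the real or saved uid. */
int model_setuid(struct cred *c, int uid)
{
    if (c->euid == 0) {
        c->ruid = c->euid = c->suid = uid;
        return 0;
    }
    if (uid == c->ruid || uid == c->suid) {
        c->euid = uid;
        return 0;
    }
    errno = EPERM;
    return -1;
}

/* Replays the branch quoted from lock_pool against the model.
 * Returns 1 if the process ends up unable to regain uid 0, else 0. */
int replay_lock_pool_check(struct cred *c)
{
    int uid = c->ruid;                  /* uid = getuid ()        */
    if (uid && c->euid == 0) {          /* uid && ! geteuid ()    */
        model_setuid(c, uid);           /* setuid (uid)           */
        return model_setuid(c, 0) != 0; /* setuid (0) should fail */
    }
    return 0;
}
```

Starting from `{1, 0, 0}`, the first modelled `setuid(1)` sets all three IDs to 1, so the following `setuid(0)` fails with EPERM. That is the sequence in strace lines 831 to 834, and it leaves no way for atd to get back the privileges it needs for setgroups().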

Regards,
Ansgar

Ansgar Burchardt (aburch) wrote :

Hi,

greenmoss <email address hidden> writes:

> I removed libnss-ldap, re-tried the at test, and it worked. So you are
> correct: libnss-ldap and at do not like each other.

You can try to install "nscd" (name service cache daemon). Then the
communication with the LDAP server should be handled by nscd instead of
the atd process. This should avoid the issue with gcrypt messing up the
user ids.

Regards,
Ansgar

greenmoss (ktyubuntu) wrote : Re: execution fails with "Cannot delete saved userids: Operation not permitted"

I've had nscd installed since before I noticed this problem. So that didn't work for me. My nscd configuration is the Ubuntu default:

#
# /etc/nscd.conf
#
# An example Name Service Cache config file. This file is needed by nscd.
#
# Legal entries are:
#
# logfile <file>
# debug-level <level>
# threads <initial #threads to use>
# max-threads <maximum #threads to use>
# server-user <user to run server as instead of root>
# server-user is ignored if nscd is started with -S parameters
# stat-user <user who is allowed to request statistics>
# reload-count unlimited|<number>
# paranoia <yes|no>
# restart-interval <time in seconds>
#
# enable-cache <service> <yes|no>
# positive-time-to-live <service> <time in seconds>
# negative-time-to-live <service> <time in seconds>
# suggested-size <service> <prime number>
# check-files <service> <yes|no>
# persistent <service> <yes|no>
# shared <service> <yes|no>
# max-db-size <service> <number bytes>
# auto-propagate <service> <yes|no>
#
# Currently supported cache names (services): passwd, group, hosts, services
#

# logfile /var/log/nscd.log
# threads 4
# max-threads 32
# server-user nobody
# stat-user somebody
        debug-level 0
# reload-count 5
        paranoia no
# restart-interval 3600

        enable-cache passwd yes
        positive-time-to-live passwd 600
        negative-time-to-live passwd 20
        suggested-size passwd 211
        check-files passwd yes
        persistent passwd yes
        shared passwd yes
        max-db-size passwd 33554432
        auto-propagate passwd yes

        enable-cache group yes
        positive-time-to-live group 3600
        negative-time-to-live group 60
        suggested-size group 211
        check-files group yes
        persistent group yes
        shared group yes
        max-db-size group 33554432
        auto-propagate group yes

# hosts caching is broken with gethostby* calls, hence is now disabled
# per default. See /usr/share/doc/nscd/NEWS.Debian.
        enable-cache hosts no
        positive-time-to-live hosts 3600
        negative-time-to-live hosts 20
        suggested-size hosts 211
        check-files hosts yes
        persistent hosts yes
        shared hosts yes
        max-db-size ho...


Ansgar Burchardt (aburch) wrote : Re: [Bug 509734] Re: execution fails with "Cannot delete saved userids: Operation not permitted"

Hi,

greenmoss <email address hidden> writes:
> I've had nscd installed since before I noticed this problem. So that
> didn't work for me. My nscd configuration is the Ubuntu default:

That is strange. Enabling nscd works around the problem on Debian for
me. Did you restart atd after installing nscd?

Regards,
Ansgar

greenmoss (ktyubuntu) wrote : Re: execution fails with "Cannot delete saved userids: Operation not permitted"

Here's what I tried:

on 1st Karmic machine:
manually restart atd
atd mail test
== it works

on another Karmic machine:
atd mail test
== it fails
manually restart atd
atd mail test
== it works

In order to be thorough, I rebooted the first machine, did *not* manually restart atd, and re-tried my mail test. It still worked. So my workaround for this problem appears to be manually restarting atd everywhere.

The quirk in this is that all of my Karmic machines already had nscd installed and had since been rebooted several times. So presumably atd had *already* been restarted during those reboots.

Maybe it's also relevant that I first had Jaunty installed on all of these machines, then upgraded to Karmic?

So the following questions still remain:

- shouldn't an atd stop/start due to a machine reboot have fixed this?
- should there be package modifications to atd, nscd, and/or libgcrypt to fix, document, or work around this problem?

Ansgar Burchardt (aburch) wrote : Re: [Bug 509734] Re: execution fails with "Cannot delete saved userids: Operation not permitted"

Hi,

greenmoss writes:
> - shouldn't an atd stop/start due to a machine reboot have fixed this?

This might not be enough if nscd is started after atd, but I don't know
how they interact in detail. In any case this is a bug that should be
fixed:

> - should there be package modifications to atd, nscd, and/or libgcrypt
> to fix, document, or work around this problem?

Yes, I already filed a bug for libgcrypt in Debian [1]. I think the
library's behavior is not right here.

Regards,
Ansgar

[1] http://bugs.debian.org/566351

Changed in at (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
summary: - execution fails with "Cannot delete saved userids: Operation not
- permitted"
+ initgroups() fails when using libnss-ldap (but not nscd)
Ansgar Burchardt (aburch) wrote :

Also affects libnss-ldap.
This only concerns programs running with effective uid = 0 and real uid != 0. The system must be configured to use LDAP via SSL.
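
The ID condition above can be written as a small predicate. This is a sketch with hypothetical names (`ids_match_trigger`, `process_matches_trigger`). It covers only the uid part of the condition and does not check whether LDAP via SSL is configured.

```c
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 for the ID combination described above:
 * effective uid 0 with a non-zero real uid. */
int ids_match_trigger(uid_t ruid, uid_t euid)
{
    return euid == 0 && ruid != 0;
}

/* Applies the predicate to the calling process. */
int process_matches_trigger(void)
{
    return ids_match_trigger(getuid(), geteuid());
}
```

With the values from the strace output (real uid 1, effective uid 0) the predicate is true. For a plain root process or an ordinary user it is false.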

Changed in libnss-ldap (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
greenmoss (ktyubuntu) wrote :

Back with more information:

After running for "a while" with at jobs being successfully executed, atd will start giving errors as described above. I have not yet managed to discover the maximum amount of time between an atd restart and a successful atd job execution.
