add workaround for Windows/IIS6 FTP server strangeness causing unlock failures.

Bug #412244 reported by Alexander Regueiro
4
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Wishlist
Unassigned
Breezy
Won't Fix
Undecided
Unassigned

Bug Description

bzr renames and deletes a directory over FTP to cause an unlock. If the server does not process these commands in an immediate, atomic manner, then attempting to rename a new lock directory into place will fail.

During bzr push to a new branch bzr locks the repository twice:
 - creating it
 - pushing the new content

With a server that doesn't process deletes immediately these two behaviours combine to give the user a confusing lock held error - with the user as the lock holder, and the current bzr process id.

This should eventually right itself when the server does process the requested deletes.

As it is on II6 FTP server, CHMOD errors may be reported as well.

Debugging has shown that the server is not processing either RENAME correctly from the locks held directory to the unlocked state (which is a temporary directory name) or the RM of the lock/held file.

The observed symptoms at the FTP level are:
the rename is reported successful.
deleting the new path of the temp/info and the temp directory succeeds
but the directory is still present when we check later and the contents of the info file can s till be read from lock/held.

Various tests have been done, including reading-back from the held/info file to make sure it still exists, and these have shown some very inconsistent behaviour. The most recent test showed this behaviour:

43.900 FTP rename: /texdotnet/.bzr/branch-lock/held => /texdotnet/.bzr/branch-lock/releasing.a7rj960i72avjpn2u4qr.tmp
44.927 FTP get: /texdotnet/.bzr/branch-lock/held/info
45.220 FTP get: /texdotnet/.bzr/branch-lock/held/info
45.499 FTP rename: /texdotnet/.bzr/branch-lock/held => /texdotnet/.bzr/branch-lock/releasing.a7rj960i72avjpn2u4qr.tmp # ERRORS

That is, this FTP server lets bzr rename a directory, then read a file from within that old directory path twice, but when we try to rename the directory again it errors.

It looks like some sort of fs caching/locking is occuring in the server. We have speculated about virus scanners and cluster file systems but have no concrete details from the FTP server operator, who insist that the server is operating 'normally'.

RFC 959 section 4.2 specifies that incomplete operations should send a 1xx Positive Preliminary Reply - the server is not doing that, rather it is sending a 2xx - Positive completion reply. This is clearly in violation of the FTP specification.

Tags: ftp lock win32

Related branches

summary: - Cannot push to Windows FTP server (Microsoft FTP Service IIS6)
+ unlock fails to unlock over FTP with Windows FTP server (Microsoft FTP
+ Service IIS6)
Revision history for this message
Robert Collins (lifeless) wrote : Re: unlock fails to unlock over FTP with Windows FTP server (Microsoft FTP Service IIS6)

So we did a log of this on IRC with -Dtransport. This shows the held->tmp rename succeeding, a nd the subsequent
rm tmp/info
rmd tmp
succeeding too, but the lock remains in place.

my theory is a virus scanner or some such interfering.

Some things we can do to narrow it down:
-add a read-back after the rename to make sure it does rename away successfully.

Something we did:
-tried a 2s delay before the rename, but this didn't appear to have any effect.

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
tags: added: win32
tags: added: ftp lock
Revision history for this message
Robert Collins (lifeless) wrote :

modified unlock method in lockdir for debugging

            # gotta own it to unlock
            self.confirm()
            print "waiting two seconds"
            time.sleep(2)
            for pos in range(10):
                try:
                    if pos:
                        self.confirm()
                except LockBroken:
                    print "pre-rename missing lock ", i
                    break
                except Exception, e:
                    print "pre-rename fail: ", e
                    raise
                try:
                    self.transport.rename(self._held_dir, tmpname)
                except Exception, e:
                    print "fail: ", e
                    raise
                time.sleep(1)
                try:
                    self.confirm()
                except LockBroken:
                    print "rename-to-unlock successful"
                    break
                except Exception, e:
                    print "confirm fail: ", e
                    raise
                else:
                    print "ftp breakage"
            print "past unlock step"
            self._lock_held = False
            self.transport.delete(tmpname + self.__INFO_NAME)

description: updated
Revision history for this message
Robert Collins (lifeless) wrote :
Revision history for this message
Robert Collins (lifeless) wrote :
Revision history for this message
Robert Collins (lifeless) wrote :

The attachd files add enough data that you can use the StringO stuff in the log trace + the network trace to make a test - put the strings that are uploaded on disk as files with the right names so you can
put branch-lock/tmp.../info
rename branch-lock/tmp... branch-lock/held
etc.

The goal is now to demonstrate the problem clearly outside of bzr.

Revision history for this message
Alexander Regueiro (alexreg) wrote :
Revision history for this message
Alexander Regueiro (alexreg) wrote :
Download full text (3.7 KiB)

Attached is a script file for the Windows command-line FTP client, which reproduces this precise issue.

In general, the result is that the script suceeds in retrieving the info file with the final get command - it clearly should not, because it's parent folder ought to have previously been renamed and deleted. The log produced in this case is the following:

230 User alexreg-repos logged in.
ftp> mkd /texdotnet
257 "/texdotnet" directory created.
ftp> mkd /texdotnet/.bzr
257 "/texdotnet/.bzr" directory created.
ftp> mkd /texdotnet/.bzr/branch-lock
257 "/texdotnet/.bzr/branch-lock" directory created.
ftp> mkd /texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp
257 "/texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp" directory created.
ftp> put info /texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp/info
200 PORT command successful.
150 Opening ASCII mode data connection for /texdotnet/.bzr/branch-lock/0u7zlmi2u
e.tmp/info.
226 Transfer complete.
ftp: 115 bytes sent in 0.14Seconds 0.82Kbytes/sec.
ftp> rename /texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp /texdotnet/.bzr/branch-lo
ck/held
350 File exists, ready for destination name
250 RNTO command successful.
ftp> get /texdotnet/.bzr/branch-lock/held/info info-got
200 PORT command successful.
150 Opening ASCII mode data connection for /texdotnet/.bzr/branch-lock/held/info
(115 bytes).
226 Transfer complete.
ftp: 115 bytes received in 0.00Seconds 115000.00Kbytes/sec.
ftp> rename /texdotnet/.bzr/branch-lock/held /texdotnet/.bzr/branch-lock/releasi
ng.jxclhhpmuppakcm6gbz7.tmp
350 File exists, ready for destination name
250 RNTO command successful.
ftp> delete /texdotnet/.bzr/branch-lock/releasing.jxclhhpmuppakcm6gbz7.tmp/info
250 DELE command successful.
ftp> rm /texdotnet/.bzr/branch-lock/releasing.jxclhhpmuppakcm6gbz7.tmp/info
550 /texdotnet/.bzr/branch-lock/releasing.jxclhhpmuppakcm6gbz7.tmp/info: The sys
tem cannot find the file specified.
ftp> rmd /texdotnet/.bzr/branch-lock/releasing.jxclhhpmuppakcm6gbz7.tmp
250 RMD command successful.
ftp> get /texdotnet/.bzr/branch-lock/held/info
200 PORT command successful.
150 Opening ASCII mode data connection for /texdotnet/.bzr/branch-lock/held/info
(115 bytes).
226 Transfer complete.
ftp: 115 bytes received in 0.00Seconds 115000.00Kbytes/sec.
ftp> quit
221

As previously suggested, this would seem to imply one of two things:
* The rename and/or delete commands report success before their operations have actually concluded.
* The directory that exists before the rename/delete is cached, and the FTP server is using this cache for the final get command, hence its unexpected success.

On many occasions however, the script does not even finish, seemingly due to a failed attempt to open a data connection.

230 User alexreg-repos logged in.
ftp> mkd /texdotnet
257 "/texdotnet" directory created.
ftp> mkd /texdotnet/.bzr
257 "/texdotnet/.bzr" directory created.
ftp> mkd /texdotnet/.bzr/branch-lock
257 "/texdotnet/.bzr/branch-lock" directory created.
ftp> mkd /texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp
257 "/texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp" directory created.
ftp> put info /texdotnet/.bzr/branch-lock/0u7zlmi2ue.tmp/info
200 PORT command successful.
150 Opening ASCII mode data c...

Read more...

summary: - unlock fails to unlock over FTP with Windows FTP server (Microsoft FTP
- Service IIS6)
+ unlock fails to unlock over FTP with Windows IIS6 FTP server which keeps
+ files after delete
description: updated
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

There isn't really anything sensible bzr can do to fix this, downgrading to wishlist, as I'm told that an upgrade to windows 2008 + IIS7 has fixed the users specific problem (though other folk could run into the same issue).

Changed in bzr:
importance: Medium → Wishlist
summary: - unlock fails to unlock over FTP with Windows IIS6 FTP server which keeps
- files after delete
+ add workaround for Windows/IIS6 FTP server strangeness causing unlock
+ failures.
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Won't Fix
tags: removed: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.