Bug #277537 “annotate says revision modified file, “bzr diff -c”...” : Bugs : Bazaar

Revision history for this message

GuilhemBichot (guilhem-bichot) wrote on 2008-10-03:

#1

Bazaar (bzr) 1.8dev
  from bzr checkout /home/mysql_src/logiciels/bzr_versions/dev
    revision: 3759
    revid: <email address hidden>
    branch nick: dev
  Python interpreter: /usr/bin/python 2.5
  Python standard library: /usr/lib/python2.5

Revision history for this message

GuilhemBichot (guilhem-bichot) wrote on 2008-10-03:

#2

I retried with MySQL 5.1 and there, with annotate, I could see that the revision which really introduced the line is:
<email address hidden>/white.intern.koehntopp.de-20071206104827-11862
(bzr diff -c agrees with that).
So, in MySQL 6.0, annotate goes wrong, that's the bug.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2008-10-06:

#3

While investigating, I did the following (6.0 being lp:~mysql/mysql-server/mysql-6.0/)

bzr branch -r 2497.123.9 6.0 to_be_merged
bzr branch -r 2597.36.9 6.0 after
bzr branch -r 2597.36.8 6.0 work
cd work
bzr merge ../to_be_merged
bzr commit -m 'trying to reproduce bug #277537'
bzr gblame sql/unireg.cc 198

In that context the annotations are correct in 'work' while being incorrect in 'after'.

Since there was no conflicts after the merge I wonder how this occurred, but it seems that something was done at that point before the commit (it could be due to a plugin, a different bzr version or yet something else).

Investigation continues but if you have additional info, that will be welcome...

Note that the block that is wrongly attributed is between line 198 and line 242 and that
line 198 in 'work' is attributed to revno 1810.849.1 as is line 242 in 'after'. Both being empty, there should be some action that fooled bzr here.

Revision history for this message

John A Meinel (jameinel) wrote on 2008-10-07:

#4

So, to try and give a simple answer...

It appears the parents are listed in the wrong order for some revisions. This was a known issue with older bzr versions, and I'm not 100% sure which version of bzr was used for the conversion.

bzr 0.91 claims to have fixed it, and updated "bzr check" to find this problem and "bzr reconcile" to correct it.

I'll try some ascii art:

A... introduced 'foo'
|\
B.C. both 'B' and 'C' modify 'foo'
|/
D... Just merging the changes back

At this point the parents of 'foo' should be "D => (B, C)", but for some reason
we have "D => (C, B)".

I believe this specifically happened because the old version used to use a
'set()' which did not preserve ordering.

Anyway, there is already code in "bzr reconcile" which is meant to fix this
case. I haven't finished running it yet (to be positive), but it should be
possible to run "bzr reconcile" in all affected repositories, and fix this.

Also, if this does fix it, it is probably reasonable to write something
streamlined, which only fixes these cases.

Or alternatively, gets run in one repository, and records what needed to be
changed, so it can just be replayed into all the other repositories which
have this problem.

Revision history for this message

GuilhemBichot (guilhem-bichot) wrote on 2008-10-07:

#5

Why should it be (B,C) and not (C,B): does the concept of mainline matter here?
If those reconcile-based solutions are found to work:
1) where should they be run (on the central host where our branches are hosted? On a dev's local branch and then pushed to central?)
2) how much time does such operation take?
3) is such operation blocking on the branch where it is run?
4) assuming it is run on a central host, will "bzr pull" propagate the fixes to dev's local branches?

Revision history for this message

John A Meinel (jameinel) wrote on 2008-10-07:

#6

Download full text (3.3 KiB)

So, I'm not sure why (B, C) matters so much versus (C, B). My best guess is that it has to do with null merges. I haven't looked at the specific texts, but my guess is that C somehow attributes everything to itself, because it had a cherry-pick or a null-merge done. Other than that, the order matters simply because we match one side first, and then only check for lines that did not match in the other parent.

I'm still not 100% sure why it matters, I'll try to get direct annotations and look at it. The annotation code is pretty straightforward, it does:

1) Get annotated lines for all parents
2) Compare the current un-annotated lines with the unannotated lines of the first parent.
   a) Wherever they match, take the annotation value from that parent
   b) Wherever they do not match, mark this as new in current revision
3) Compare the now annotated lines of this versus the annotated lines for the next parent.
  a) Wherever they match, just add them to the final result.
  b) For the annotated lines that do not match, match the unannotated lines against the unannotated lines for this parent.
     i. If the unannotated lines do not match, we keep the annotation as already generated
     ii. For lines which match, and the annotation says "current", we give them the annotation from the new parent
     iii. If the line already has a modified value, and it also matches the new parent, we do a heads() check to see if one revision_id supersedes the current revision_id. (For example, somebody modified a line, but then later modified it back, it will be annotated with the reverted revision)
        x) If one side clearly supersedes, use it
        y) If neither is an ancestor of the other, just pick one and use it

(y) was introduced because of your earlier concerns

For your other questions (in reverse order)

(4) push and pull won't propagate the changes, because they are only occurring in the deep ancestry (revisions which both sides already claim to have.) which effects 1. You can do the fix in any copy of the repository you want, but you need to find a way to get those changes to the other side.

The easiest ways are:
a) Run reconcile directly in that repository
b) Move the repository out of the way, and create a new empty one and do the fetch.

(3) It isn't strictly blocking, but I'm not 100% sure what happens if you add new revisions while doing reconcile. Ideally the new revisions would just get added on the side, and only the old revisions would get updated into the new pack file. I *think* that works, but I haven't specifically tested it.

(2) It is a bit slow, though I'm looking into it. ATM it isn't a 30s operation that is trivial to do on every repository. Which is why I suggested having something which could write out what had been modified, which could then be trivially applied to other repositories.

(1) As mentioned, it would generally be run on all repositories. Or you could run it locally, and then copy that to the central location.

I think reconcile should be run anyway, as it is better to have correct data stored. However, I can try and look to see if there is any obvious reason why the order of parents matters as much as it...

So, I'm not sure why (B, C) matters so much versus (C, B). My best guess is that it has to do with null merges. I haven't looked at the specific texts, but my guess is that C somehow attributes everything to itself, because it had a cherry-pick or a null-merge done. Other than that, the order matters simply because we match one side first, and then only check for lines that did not match in the other parent.

I'm still not 100% sure why it matters, I'll try to get direct annotations and look at it. The annotation code is pretty straightforward, it does:

1) Get annotated lines for all parents
2) Compare the current un-annotated lines with the unannotated lines of the first parent. 
   a) Wherever they match, take the annotation value from that parent
   b) Wherever they do not match, mark this as new in current revision
3) Compare the now annotated lines of this versus the annotated lines for the next parent.
  a) Wherever they match, just add them to the final result.
  b) For the annotated lines that do not match, match the unannotated lines against the unannotated lines for this parent.
     i. If the unannotated lines do not match, we keep the annotation as already generated
     ii. For lines which match, and the annotation says "current", we give them the annotation from the new parent
     iii. If the line already has a modified value, and it also matches the new parent, we do a heads() check to see if one revision_id supersedes the current revision_id. (For example, somebody modified a line, but then later modified it back, it will be annotated with the reverted revision)
        x) If one side clearly supersedes, use it
        y) If neither is an ancestor of the other, just pick one and use it

(y) was introduced because of your earlier concerns

For your other questions (in reverse order)

(4) push and pull won't propagate the changes, because they are only occurring in the deep ancestry (revisions which both sides already claim to have.) which effects 1. You can do the fix in any copy of the repository you want, but you need to find a way to get those changes to the other side.

The easiest ways are:
  a) Run reconcile directly in that repository
  b) Move the repository out of the way, and create a new empty one and do the fetch.

(3) It isn't strictly blocking, but I'm not 100% sure what happens if you add new revisions while doing reconcile. Ideally the new revisions would just get added on the side, and only the old revisions would get updated into the new pack file. I *think* that works, but I haven't specifically tested it.

(2) It is a bit slow, though I'm looking into it. ATM it isn't a 30s operation that is trivial to do on every repository. Which is why I suggested having something which could write out what had been modified, which could then be trivially applied to other repositories.

(1) As mentioned, it would generally be run on all repositories. Or you could run it locally, and then copy that to the central location.

I think reconcile should be run anyway, as it is better to have correct data stored. However, I can try and look to see if there is any obvious reason why the order of parents matters as much as it does. Looking at the algorithm, any lines which we do not agree on are resolved with an algorithm that does not care which came first.

Changed in bzr:
importance:	Undecided → High
status:	New → Triaged

Vincent Ladeuil (vila) on 2008-10-16

Changed in bzr:
assignee:	nobody → vila

Revision history for this message

Vincent Ladeuil (vila) wrote on 2008-10-16:

#7

Since the root cause is in the data produced during the conversion more than in bzr code being tricked by that data, the fix is better addressed by a dedicated plugin.

lp:~vila/bzr/fix277537 is pending review, if you want to try it earlier use a *copy* of your repository and give feedback.

brz branch lp:~vila/bzr/fix277537 ~/.bazaar/plugins/fix277537

bzr fix277537 ~/.bazaar/plugins/fix277537/known-inconsistent-file-revids <path to your *repo* copy>

Changed in bzr:
status:	Triaged → Fix Committed

Revision history for this message

John A Meinel (jameinel) wrote on 2008-10-17:

#8

So way back when, we tried to figure out why parent-order matters for annotate. It turns out that it doesn't in the general case, but it *does* in this specific case because there was a piece of code that was swapped.

Specifically, we have code that looks like:

PARENT 1:
A
B

PARENT 2:
B
C
A

Child:
B
C
A

So the problem seems to be that if you patch against one parent first, you find the match for 'B', and then don't match against A because it is out-of-order.

While if you match against the other parent first, then you match both B and A against the history, and when you match against the other parent, it just considers it introduced in the merge.

Both are valid because of the ambiguity involved in swapping code and a diff algorithm that can only match monotonically increasing lines. (If you have A B => B A, one will be matched and the other will not, but it is somewhat arbitrary which side gets matched.)

It is possible use a diff that can track moved code, though there are still problems. Most notably, if you use the annotation from before the move, how do you note that there *was* a move. As moving a hunk of code is a change to the code base, which should show up somehow in the annotations.

Anyway, by fixing the parent order in the per-file graph, "bzr diff -c XX" will use the same parents as "bzr annotate", which at least will have them agree about what was actually changed in that revision. So while it is arbitrary which we pick, at least post-reconcile both algorithms will make the same arbitrary selection.

Vincent Ladeuil (vila) on 2008-12-12

Changed in bzr:
status:	Fix Committed → Fix Released

Revision history for this message

GuilhemBichot (guilhem-bichot) wrote on 2009-01-16:

#9

Hello. I retried the testcase, and am posting here its current form; now (due to 6.0 evolving) the "EXTRA_DEBUG" etc line is presented by "annotate" as coming from:
revid:<email address hidden> ;
but "bzr diff -c" of this revision is empty.
No action needed, just to make the testcase up-to-date. We can see it's still a revision from the converter.
I then applied the plugin to my shared repository, and annotate says correctly:
revid:<email address hidden>/white.intern.koehntopp.de-20071217083134-45936

Revision history for this message

GuilhemBichot (guilhem-bichot) wrote on 2009-03-17:

#10

EXTRA_DEBUG line is gone now due to 6.0 evolving. A testcase for the MySQL 5.0 branch:
bzr annotate --show-ids --all sql/item_func.cc | grep "slave thread no"
if it says "jim" it's wrong, it should say "guilhem". After applying the plugin it's correct

Bazaar

annotate says revision modified file, "bzr diff -c" widly contradicts

Bug Description

Other bug subscribers

Remote bug watches