Comment 13 for bug 415508

Revision history for this message
Martin Pool (mbp) wrote :

It seems like we agree that path_content_summary, being a pre-content-filtering interface, ought to speak about the canonical form. So I added a test that it should, and it fails.

The generic WorkingTree path_content_summary code goes through self._sha_from_stat so this ought to go through the SHA1 provider that can tell you about content filtering.

(Incidentally it looks a lot like this will fail on pre-dirstate working trees through _sha_from_stat being undefined; maybe this code can never be reached in that case.)

path_content_summary seems to not actually return the hash, because the contract allows it to return None if it would be expensive to compute. In the case of filters its reasonable to assume that it would be expensive. This might mean my test has insufficient coverage if there are cases where it does both filter and cache.

At the moment WorkingTree.path_content_summary always directly uses the st_size as the size, but in fact it needs to got through something that can end up calling internal_size_sha_file_byname. However, that's going to end up filtering the whole file just to throw away the results, which is not great. Indeed there are other places in WorkingTree that seem to make the same assumption that they can just use the stat value. So that perhaps brings us back to just saying this interface should be removed altogether.

bzr-svn provides path_content_summary (nearly copy-and-pasted?) in svn workingtrees, and bzr-git doesn't use it at all in trunk afaics. So we should be ok to remove it.

The commit use of path_content_summary is very basic, really it only strongly cares about the kind. It does pass the content_summary down to record_entry_contents, which uses the link target etc but we could fudge that, or just remove the size.

TreeTransform also uses it and that might be harder to detangle. It uses it in _comparison_data, but only returns the kind and x-bit, so that's easy. It also has its own PreviewTree.path_content_summary which I doesn't seem to know about filtering, and it can probably cope without calling it.

So that's all good.

This has the benefit of path_content_summary never promising to return something that might require reading the whole file, which is likely to cause wasted work.

If you cut out the bits that path_content_summary can't quickly compute all it's doing is detecting the type and the x bit. It is useful to do those together so as to get a single stat call. Some of the other things like symlink targets take a separate io anyhow and the caller may not want them.

We'd need a new test for get_kind_and_executable_by_path.

bzr-svn and git do use record_entry_contents but don't look at the contents_summary. It's probably still better to keep the same form and just blank out the fields that are likely to be wrong?