Comment 27 for bug 607961

Robert Collins (lifeless) wrote : Re: [Bug 607961] Re: wadl generation timeout?

On Sun, Jun 26, 2011 at 7:58 AM, Gary Poster <email address hidden> wrote:
>> The wadl is pretty big; I'd -really- rather not check it in.
>
> Why not?  I don't feel strongly about it, but I don't understand why
> it's a bad idea yet.

IIRC benji landed a change to check it in last year some time; that
got backed out because of fallout ( I don't remember the fallout
specifics ).

I have a strong learnt allergy to checking in build products in
primary source trees; the sorts of things I have seen go wrong are:
 - datestamp based build system confusion [an update that pulls in a
build product touches it and makes it equal-to-or-newer-than a
dependency]
 - conflicts in build products [two developers make independent
changes, and now have to manually resolve conflicts in multiple xml
files]
 - (often a silent failure mode) not adding new build products won't
break anything until it gets all the way through the pipeline to the
intended user-of-the-checked-in-products, when it will break. In our
case the symptoms will be 404 replies on whatever wadl/json file we
added a need for but didn't check-in. This would in principle be
caught in qa...
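
As a minimal sketch of the first failure mode (this is not taken from
our build; the file names and the needs_rebuild helper are hypothetical),
a make-style check compares only mtimes, so an update that rewrites a
checked-in build product makes it look current even though its content
is stale:

```python
import os
import tempfile


def needs_rebuild(target, dependency):
    """make-style check: rebuild if the target is missing or older than its dependency."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(dependency)


def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "interfaces.py")  # hypothetical dependency
        wadl = os.path.join(d, "api.wadl")      # hypothetical checked-in build product
        for path in (src, wadl):
            with open(path, "w") as f:
                f.write("placeholder\n")
        # The source was edited after the product was last generated: stale.
        os.utime(wadl, (1000, 1000))
        os.utime(src, (2000, 2000))
        before = needs_rebuild(wadl, src)
        # An update pulls in the checked-in product and bumps its mtime,
        # without the product having been regenerated from the new source.
        os.utime(wadl, (3000, 3000))
        after = needs_rebuild(wadl, src)
        return before, after
```

Here demo() reports that a rebuild is needed before the update and not
needed after it, although nothing was regenerated in between.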

I think those three modes are the minimum set: all other
things-have-gone-wrong can be reached by mixing and matching. (For
instance, you can have a dev environment that can't build but thinks
it can due to the first one masking a real problem).

I think introducing the potential for these problems into the system
is a bad idea, and the cost of compensating for them in some way
should be included when comparing the costs of (check in the file |
build-and-deploy in a different way).

>> Generating the wadl on our central box is easy.
>
> I had hoped that might be the case, but that was not what I saw.
>
> Right now, we only build the eggs on the central box.  The way we build
> the wadl right now requires much more of the build to exist on the
> central box (sourcecode and mailman, for a start).

The wadl doesn't depend on mailman, does it? Is it because we want to
import our mailman interfaces?

> Other parts of the build, such as mailman integration, have caused
> problems when copied from the central box.  In my experience, changing
> what is built and copied is difficult to qa properly, and has been a
> cause of problems in the past.  Our staging/qa infrastructure has been
> insufficient for these changes, at least the way I and others have
> handled them in the past.  I consider these sorts of changes to be a
> high risk.  I want to make this change as small as possible.

It's a bit of a side note, but if it's high risk now, it's never going to
be lower risk until we improve on it: if we become allergic to
changing things that are high risk, they stop evolving and we
accumulate more and more accommodations to those things in our
environment - making them higher risk than they were before.

> We could generate the wadl on our central box and then try to remove the
> risky parts that we created for the wadl generation, but that's fragile
> and not compelling.

Some other options that may be easier:
 - We could generate the wadl on the apache frontend; rsync between
the two servers to harmonise datestamps;
 - use a date insensitive etag (see bug 714621 and bug 92492 [perhaps
they should be duped once we fix this bug, at the moment they would
require discrete fixes])
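
A date insensitive etag could be sketched as below, assuming that means
deriving the validator from the response body alone rather than from the
file's mtime. content_etag is a hypothetical helper for illustration,
not an existing Launchpad or lazr.restful function, and the choice of
SHA-256 is likewise an assumption:

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Return a strong ETag computed from the bytes served, ignoring timestamps."""
    digest = hashlib.sha256(body).hexdigest()
    # ETag values are quoted strings in HTTP.
    return '"%s"' % digest


# Two servers that generated byte-identical wadl at different times agree.
etag_on_server_a = content_etag(b"<application/>")
etag_on_server_b = content_etag(b"<application/>")
etag_after_change = content_etag(b"<application><resources/></application>")
```

With this approach the two frontends only need to produce identical
bytes, not identical datestamps, for clients to see a consistent etag.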

> Options such as changing how our wadl is initially created, or how our
> build is done, might be valuable, but in my opinion should be well out
> of scope for fixing this critical bug.

Certainly doing other stuff 'just because it's also good' would be
scope inflation and not true to our policy; OTOH the policy (AIUI)
says (things matching certain criteria) jump the queue to be fixed. It
doesn't say anything about how deep or thorough we should go - that's
left to the discretion of the engineer|team lead. If the most easily
reachable solution has other negative impacts, the policy at least
leaves plenty of discretion for folk to do less reachable solutions
that have better overall tradeoffs.

So I'm not encouraging you to do a harder fix because those other
things are intrinsically valuable, but rather because the
shortest-path fix IMO has significant negative outcomes.

> Maybe you can change my mind? :-)

I hope the failure modes I've pointed out, and the alternative
approaches to generating the wadl on the very central server, are
enough :)

-Rob