Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to avoid infinite recursion when finding the list of new revs. #322

Merged
merged 5 commits into from Feb 23, 2015

Conversation

ralphbean
Copy link
Contributor

These are changes that have already been applied to our ansible repo, but I'd appreciate review to see if they make sense/look right.

We hit a scenario today that I don't understand where a set of commits pushed by pbrobinson to the 'file' package sent the fedmsg hook into a loop that published some 30k messages before nirik was able to kill it.

There might be a better way than this to detect what commits should and shouldn't be published. Ideas welcome.

@ralphbean
Copy link
Contributor Author

For the record, it looks like pbrobinson only pushed these commits:

commit a91005c578f7418b6e4882f5503966ceaff541f0
Author: Peter Robinson <pbrobinson@gmail.com>
Date:   Sun Feb 22 16:26:22 2015 +0000

    merge in changes from F-21 that should have been build in master initially

commit 200669a64fa227b0d43faa1de3dd6608b4b8450a
Merge: 49d27e6 4591926
Author: Peter Robinson <pbrobinson@gmail.com>
Date:   Sun Feb 22 16:25:09 2015 +0000

    Merge branch 'f22'

commit 49d27e6147348c3212103541f62765b017136f0d
Author: Peter Robinson <pbrobinson@gmail.com>
Date:   Sun Feb 22 16:25:02 2015 +0000

    Revert "Rebuilt for Fedora 23 Change"

    This reverts commit b0c15a308125903e9e1b8e2b7851b2dd4019a035.

That merge in the middle is suspicious. And the branch that it merged also contained merges not in the base branch.. but trying to trace it all this afternoon turned my brain into mush.

@pypingou
Copy link
Member

I'm not seeing anything wrong with this.

Have we tried to replicate the issue (take the git repo, drop the last commits and re-add them or so)?

@ralphbean
Copy link
Contributor Author

So, I still don't understand how it could have dove into a cycle via recursion, but I do see a flaw in the original implementation and in the new implementation in 3182fe4). Consider this git graph:

* commit A <-- HEAD
|
* merge commit B
| \
| * commit C
| |
| * commit D
| |
| * merge commit E
| | \
| | * commit F
| | |
| | * commit G
| | |
| | |
* | | commit H  <-- BASE
| | |
| * | commit I
|/ /
* | commit J
| |
| * commit K
|/
* commit L
|
* commit M
|
* commit N

If the person pushing was updating master from BASE to HEAD, the
original implementation sought out all parents of the HEAD commit,
recursively, and the only stop-condition was when BASE was found for that
branch of the recursion tree. In this example, the two branches that branch
out from and back into the right side do not have BASE in their history.
For them, the original algorithm would spider all the way back to the first
commit.

I think the current implementation is not right either. It only stops once it
finds a commit that has already been seen (preventing cycles) but it won't
prevent this scenario from publishing messages about commits J, L, M, and N.

How does this new implementation sound?

  • First, calculate all the ancestors of the BASE rev, including the
    BASE rev. In the example above, that would be H, J, L, M, and N.
  • Then, traverse recursively backwards from the HEAD rev. Stop recursion
    for that branch of the call tree when a parent is found that is in the
    ancestors list that we previously computed. This would end up yielding
    commits A, B, C, D, E, F, G, I, K. (The compliment of the ancestors
    list).

Consider this git graph:

```
* commit A <-- HEAD
|
* merge commit B
| \
| * commit C
| |
| * commit D
| |
| * merge commit E
| | \
| | * commit F
| | |
| | * commit G
| | |
| | |
* | | commit H  <-- BASE
| | |
| * | commit I
|/ /
* | commit J
| |
| * commit K
|/
* commit L
|
* commit M
|
* commit N
```

If the person pushing was updating master from ``BASE`` to ``HEAD``, the
original implementation sought out all parents of the ``HEAD`` commit
recursively, and the only stop-condition was when ``BASE`` was found for that
branch of the recursion tree.  In the example above, the two branches that
branch out from and back into the right side *do not have* ``BASE`` in their
history. For them, the original algorithm would spider all the way back to the
first commit.

The current implementation is not right either.  It only stops once it finds a
commit that has already been seen (preventing cycles) but it won't prevent this
scenario from publishing  messages about commits J, L, M, and N.

The implementation in this commit works like this:

- First, calculate all the ``ancestors`` of the ``BASE`` rev, including the
  ``BASE`` rev.  In the example above, that would be H, J, L, M, and N.
- Then, traverse recursively backwards from the ``HEAD`` rev.  Stop recursion
  for that branch of the call tree when a parent is found that is in the
  ``ancestors`` list that we previously computed.  This would end up yielding
  commits A, B, C, D, E, F, G, I, K.  (The compliment of the ``ancestors``
  list).
@ralphbean
Copy link
Contributor Author

How does this new implementation sound?

Implemented in c0ef2be.

@pypingou
Copy link
Member

One idea, maybe we could check how the email hook provided in git handles this?

http://git.kernel.org/cgit/git/git.git/tree/contrib/hooks/multimail/

@bochecha
Copy link

One idea, maybe we could check how the email hook provided in git handles this?

As far as I can see, it doesn't try to iterate through the commit tree to figure out which changes to send an email for.

Instead, if it's run as post-receive hook, it reads its stdin, which gets one line per change, and just sends an email for each of these, nothing more.

If it's run as an update script, it just sends an email for the single change that was passed to it as arguments of the command-line.

This is also what the GNOME email hook we were using before was doing.

What was the reason for trying to iterate through the history tree?

@ralphbean
Copy link
Contributor Author

it reads its stdin, which gets one line per change

I didn't find where the upstream hook does this.. but it's not true.

The post-receive hook receives one line over stdin per ref (i.e., branch or tag) which has the old base commit, the new head commit, and the name of the ref. So, if you pushed three commits to master and three commits to develop at the same time, stdin would have two lines in it.. one for master and one for develop. http://git-scm.com/docs/githooks#post-receive

@ralphbean
Copy link
Contributor Author

In progit, @pypingou does the work done here by shelling out to git rev-list in order to get the list of commits between the old base and the new head. But as far as I can tell, pygit2 doesn't have that feature yet.. so we're computing it in the script ourselves. That's "why all of this stuff is necessary".

I've run tests locally on the latest changes in f860757 and it's looking good to me.

@pypingou
Copy link
Member

👍 for me

ralphbean added a commit that referenced this pull request Feb 23, 2015
Attempt to avoid infinite recursion when finding the list of new revs.
@ralphbean ralphbean merged commit 6c9df7f into develop Feb 23, 2015
@ralphbean ralphbean deleted the feature/git-hook-updates branch February 23, 2015 15:55
henrysher pushed a commit to henrysher/fedora-infra-ansible that referenced this pull request Feb 23, 2015
@bochecha
Copy link

I didn't find where the upstream hook does this..

http://git.kernel.org/cgit/git/git.git/tree/contrib/hooks/multimail/git_multimail.py#n2378

but it's not true.

The post-receive hook receives one line over stdin per ref (i.e., branch or tag) which has the old base commit, the new head commit, and the name of the ref.

And you are right.

I had misread the post-receive hook documentation, and then misread the email hook code in a way that confirmed what I thought.

Reading it more carefully, you clearly are right, and the upstream hook shells out to git log to get the list of commits between the old and new commit:

http://git.kernel.org/cgit/git/git.git/tree/contrib/hooks/multimail/git_multimail.py#n995

Apologies for the noise. 😃

@ralphbean
Copy link
Contributor Author

All good. Genuine thanks for the review. I appreciate more eyes on this one (it was driving me a little crazy).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants