Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bodhi2Client doesn't support retries #144

Closed
lbrabec opened this issue Sep 10, 2015 · 19 comments · Fixed by #163
Closed

Bodhi2Client doesn't support retries #144

lbrabec opened this issue Sep 10, 2015 · 19 comments · Fixed by #163

Comments

@lbrabec
Copy link

lbrabec commented Sep 10, 2015

Bodhi1Client support retries: it tries several times (user specified) and after last unsuccessful try it raises an exception. This unfortunately doesn't work when Bodhi2Client is used.

When you try to run code bellow, you can see that when Bodhi1Client is used it actually tries 10 times before raising. Bodhi2Client fails that respond cannot be parsed as JSON.

from fedora.client.bodhi import Bodhi1Client, Bodhi2Client
import sys
import logging


logging.basicConfig()
logging.getLogger().setLevel(level=logging.DEBUG)

if int(sys.argv[1]) == 1:
    client = Bodhi1Client(base_url='http://httpstat.us/', retries=10, debug=True)
    client.send_request('500')
else:
    client = Bodhi2Client(base_url='http://httpstat.us/', retries=10, debug=True)
    client.send_request('500')

I check the code and it seems that Bodhi2Client's parent class OpenIdBaseClient overrides method send_request from its parent class OpenIdProxyClient, in which the retry code is present.

@kparal
Copy link

kparal commented Sep 10, 2015

Fixing this would make all Taskotron folks happy, because right now every single Bodhi server failure (5xx response) makes one of our checks crash (and we receive a crash notification). Since technology is never reliable, we made use of Bodhi automatic retry functionality in the past and would like to continue doing so with Bodhi2 :)

@tflink
Copy link
Contributor

tflink commented Sep 14, 2015

This also affects fedora-easy-karma. If there are any errors, f-e-k stops and the user will need to start over because f-e-k relies on setting the retries in the bodhi client code.

@tflink
Copy link
Contributor

tflink commented Sep 22, 2015

This is having a decent sized effect in Taskotron right now - between dev/stg/prod, we saw 88 failures yesterday due to failed bodhi requests that weren't retried. The only reason that nobody's noticing is that our current suite of checks will pull in previous items which haven't already been done.

Is there any guess on a timeline for a fix?

@pypingou
Copy link
Member

Looking into the openidproxyclient code we retry in two cases:

  • query times-out
  • query returns an error code >= 400

The first situation seems like something we could fix, I am a little less incline for the second situation as I expect our newer API to return a meaningfull message upon 400 or 404 and asking the question multiple times shouldn't (in theory) return a different result.

Thoughts?

@kparal
Copy link

kparal commented Sep 23, 2015

That sounds reasonable for time-outs and 4xx codes. But our major problem is with 5xx codes, that's what Bodhi returns very often, and retrying definitely helps. Of course it would be great to fix the issue in Bodhi (or Apache or whatever else is along the way), but the safest and easiest way in this library seems to me to retry on 5xx codes, as it worked with Bodhi1 python-fedora (it reduced the number of our Bodhi-related task crashes down by >95%).

@pypingou
Copy link
Member

hm, I'll wait for @lmacken and @ralphbean to weight on this as changing the behavior of the openidbaseclient will impact more than just the bodhi client.

@ralphbean
Copy link
Contributor

Yeah, re-trying on 5xx codes but not 4xx codes makes sense to me (in addition to timeouts).

@pypingou
Copy link
Member

5xx codes might be a problem for other apps (pkgdb2 for example, maybe tagger and probably others)

@ralphbean
Copy link
Contributor

How so? The 500 codes are all server errors, in which case the request shouldn't have changed anything. It should be harmless to try again .. and if it is not harmless, then that's a problem for that server (if a 500 request leaves resources in an inconsistent state, for instance).

@pypingou
Copy link
Member

I agree it should be harmless but that also means (in the example shown above) that we're trying 10 times a request that we know for sure won't work from the first attempt, that was more what I was thinking about.

@kparal
Copy link

kparal commented Sep 23, 2015

we're trying 10 times a request that we know for sure won't work from the first attempt

Not sure I follow. Since it's a server error, another attempt might work.

If there are certain requests which we don't want to retry on 5xx error codes (let's say publishing a bodhi comment - we might rather fail on the first attempt than to risk publishing 10 comments in a row - just an example, not sure if it's a good idea or not), maybe those requests could call the underlying methods with some internal parameter e.g. retry_server_errors=False? However, if it is expected from the server to always stay consistent (I don't see into web development), maybe we don't need to deal with these worried on the client side?

trying 10 times

Btw, if you could introduce something like retry_sleep parameter, people could considerably lower the retry value itself. Sleeping a few seconds seems to give you a better opportunity for the next request to succeed than to try it immediately again (maybe it's related to server being currently overloaded, or some service getting restarted, but I'm just guessing).

@pypingou
Copy link
Member

Not sure I follow. Since it's a server error, another attempt might work.

The pkgdb2 API returns a 500 as error code for generic errors (for which I couldn't find a better code) :-(.

The retry_sleep idea could be a nice addition indeed :)

@kparal
Copy link

kparal commented Oct 9, 2015

A new patch is in #153.

@ralphbean
Copy link
Contributor

Fixed by #153.

@lbrabec
Copy link
Author

lbrabec commented Oct 13, 2015

I tried #153 (python-fedora-0.6.1-1.fc23) and I was still able to reproduce this using the reproducer from the first post - no retry is performed, no matter what number of retries is provided, it sends only one request (checked by wireshark) and ends.

I also tried to use python-requests (inspired by openidbaseclient.py):

import requests
from requests.packages.urllib3.util import Retry

s = requests.session()
s.mount('http://', requests.adapters.HTTPAdapter(Retry(total=7, status_forcelist=[500])))
s.post('http://httpstat.us/500')

And this didn't work either, only one request sent. Tried python-requests from fedora repo and pypi.

@kparal
Copy link

kparal commented Nov 30, 2015

@ralphbean @pypingou Can you please look at this? I have the same problem as @lukassparrow, only one request sent (but with Bodhi1Client the retry code works OK).

@ralphbean
Copy link
Contributor

Re-opening by request from @kparal in #fedora-admin.

@ralphbean ralphbean reopened this Nov 30, 2015
@ralphbean
Copy link
Contributor

OK, I think I see what's going on.

First, there is an HTTP method whitelist that Retry uses.

>>> from requests.packages.urllib3.util import Retry
>>> Retry.DEFAULT_METHOD_WHITELIST
frozenset(['HEAD', 'TRACE', 'GET', 'PUT', 'OPTIONS', 'DELETE'])

Notice that POST is missing.

Second, in your example, @lukassparrow, you have to specify max_retries=Retry(... with a keyword. If I modify it like this, it works:

#!/usr/bin/env python

import logging
logging.basicConfig(level=logging.DEBUG)

import requests
from requests.packages.urllib3.util import Retry

s = requests.session()
s.mount('http://', requests.adapters.HTTPAdapter(max_retries=Retry(
    total=7,
    status_forcelist=[500],
    method_whitelist=Retry.DEFAULT_METHOD_WHITELIST.union(frozenset(['POST'])),
)))
s.post('http://httpstat.us/500')

I'll modify python-fedora to send that new method_whitelist argument in and get a release out the door asap.

@kparal
Copy link

kparal commented Dec 1, 2015

Great detective work. Both the new example and the python-fedora patch work for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants