Discussion:
[SSSD] [sssd PR#5532][comment] ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe
2021-03-17 14:10:09 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
Updated with Alexey's improvements.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801112078
ikerexxe
2021-03-17 09:35:16 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial

ikerexxe commented:
"""
**Wrt 1st patch**
Why did we see poll() previously?
We saw and still see poll(), but not for the authentication handshake sequence. Without any of the options set poll() is used for some part of the communication. This part of the communication is the same one that uses poll() when any of the options is set.
Probably because it requires both ASYNC && (TIMEOUT != 0).
No, I set both options and read() is still used for the authentication handshake. As said before I think we were looking to the wrong part of the logs and that's why we saw poll().
Anyway, at this point I'm totally lost wrt rationale of the 1st patch.
I'm having doubts with the usage of `LDAP_OPT_NETWORK_TIMEOUT` option, as in this case read() fails with EAGAIN instead of EINTR and ldap_install_tls() doesn't fail. Maybe openldap handles the retry in this case but I need to take a closer look.

I'll remove the usage of `LDAP_OPT_CONNECT_ASYNC` from this patch because I don't see any benefit.
**Wrt 2nd patch**
I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context.
Do you mean watchdog_ctx structure?
Yes. Take a note: it's private. Please avoid making it public entirely, better introduce a helper to read required info.
Would it be enough to compare the timestamp in that structure with the actual time and see if it's below a threshold?
Not sure I understand your idea. IMO, it's better to rely on `ticks`.
I've understood your idea.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800937194
ikerexxe
2021-03-18 14:40:42 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-03-17 14:09:38 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-03-18 14:43:46 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
-- why `Marking port ... as 'not working'`? IIUC, this is exactly ip:port that is being retried (and succeeds).
I've also taken this into account in the new version of the patch.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801986134
alexey-tikhonov
2021-03-17 10:17:07 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial

alexey-tikhonov commented:
"""
Post by ikerexxe
**Wrt 1st patch**
Why did we see poll() previously?
We saw and still see poll(), but not for the authentication handshake sequence. Without any of the options set poll() is used for some part of the communication. This part of the communication is the same one that uses poll() when any of the options is set.
I don't think so.
https://bugzilla.redhat.com/show_bug.cgi?id=1839972#c130 (sorry for the private link) :
```
04:45:50.443947 fcntl(23, F_GETFL) = 0x2 (flags O_RDWR) <0.000009>
04:45:50.443984 fcntl(23, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000009>
04:45:50.444071 write(23, "\26\...", 289) = 289 <0.000032>
04:45:50.444142 read(23, 0x55c5711ef360, 7) = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
04:45:50.444190 poll([{fd=23, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 12000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) <1.771686>
```
If this is not a part of a handshake then please tell me what is it?
Post by ikerexxe
Probably because it requires both ASYNC && (TIMEOUT != 0).
No, I set both options and read() is still used for the authentication handshake.
Of course, `read()` is required. After all, you need to get the data from the socket eventually :)

Let me please remind a main purpose of this "settings game".
The purpose was to figure out if using `LDAP_OPT_CONNECT_ASYNC` helps to avoid removal of `O_NONBLOCK` flag from our socket.
The idea was that if we would preserve `O_NONBLOCK` then we wouldn't need `SO_RCVTIMEO/SO_SNDTIMEO` socket options that were identified as a cause of ERESTARTSYS -> EINTR change (that is an immediate cause of the ticket).

But, as strace you provided in https://bugzilla.redhat.com/show_bug.cgi?id=1839972#c129 shows, `_ASYNC` replaces blocking `read()` with timed `poll()` (and frankly this is quite expected).
And unfortunately `poll()` is one of
```
"interfaces are never restarted after being
interrupted by a signal handler, regardless of the use of
SA_RESTART; they always fail with the error EINTR when
interrupted by a signal handler"
```
(signal(7) man page)

At this point I've got an answer for my initial proposal and I personally don't see a reason to touch any settings to resolve this specific ticket.
Post by ikerexxe
I'm having doubts with the usage of LDAP_OPT_NETWORK_TIMEOUT option, as in this case read() fails with EAGAIN instead of EINTR and ldap_install_tls() doesn't fail. Maybe openldap handles the retry in this case but I need to take a closer look.
This is interesting. Could you please quote corresponding strace or at least point to a specific log that exhibits this behavior?

As I pointed out previously, SSSD already sets `LDAP_OPT_NETWORK_TIMEOUT`. Setting the same option in different places needs an explanation.

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800964115
ikerexxe
2021-03-17 12:24:59 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: edited

Changed field: title
Original value:
"""
Handle ldap_install_tls() configuration and retrial
"""
alexey-tikhonov
2021-03-16 14:37:38 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial

alexey-tikhonov commented:
"""
Setting `LDAP_OPT_CONNECT_ASYNC` uses the read()
Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read()
Setting `LDAP_OPT_RESTART` uses the read()
Why did we see poll() previously?
Probably because it requires both ASYNC && (TIMEOUT != 0).
But this raises a question if `sdap_sys_connect_done()` is a proper place to set all the options set there...

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800314529
ikerexxe
2021-03-17 12:26:23 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
I've updated the PR by dropping the first patch and changing the second one to detect if ldap_install_tls() failed because the watchdog interrupted it.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801041154
alexey-tikhonov
2021-03-17 18:02:50 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
Thanks for the updates.
Did you try your latest version with your reproducer?
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801295330
ikerexxe
2021-03-18 09:09:15 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
Post by alexey-tikhonov
Thanks for the updates.
Did you try your latest version with your reproducer?
Yes, when the process fails it is retried.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801756933
alexey-tikhonov
2021-03-16 14:16:20 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial

alexey-tikhonov commented:
"""
**Wrt 1st patch**
I've run a battery of tests by setting only one of the communication flags per test. I've uploaded the most prominent strace logs to the bugzilla in the attachment called `strace battery of tests`.
(I didn't check the logs yet but ...)
Setting `LDAP_OPT_CONNECT_ASYNC` uses the read()
Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read()
Setting `LDAP_OPT_RESTART` uses the read()
Why did we see poll() previously?
Anyway, at this point I'm totally lost wrt rationale of the 1st patch.
**Wrt 2nd patch**
I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context.
Do you mean watchdog_ctx structure?
Yes. Take a note: it's private. Please avoid making it public entirely, better introduce a helper to read required info.
Would it be enough to compare the timestamp in that structure with the actual time and see if it's below a threshold?
Not sure I understand your idea. IMO, it's better to rely on `ticks`.

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800293802
ikerexxe
2021-03-17 13:14:58 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-03-17 12:24:07 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: Handle ldap_install_tls() configuration and retrial
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-03-17 10:53:24 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial

ikerexxe commented:
"""
Post by alexey-tikhonov
Post by ikerexxe
**Wrt 1st patch**
```
04:45:50.443947 fcntl(23, F_GETFL) = 0x2 (flags O_RDWR) <0.000009>
04:45:50.443984 fcntl(23, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000009>
04:45:50.444071 write(23, "\26\...", 289) = 289 <0.000032>
04:45:50.444142 read(23, 0x55c5711ef360, 7) = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
04:45:50.444190 poll([{fd=23, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 12000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) <1.771686>
```
If this is not a part of a handshake then please tell me what is it?
It's part of the handshake, but I haven't seen this behaviour again even though I've tried it another 10 times.
Post by alexey-tikhonov
Post by ikerexxe
I'm having doubts with the usage of LDAP_OPT_NETWORK_TIMEOUT option, as in this case read() fails with EAGAIN instead of EINTR and ldap_install_tls() doesn't fail. Maybe openldap handles the retry in this case but I need to take a closer look.
This is interesting. Could you please quote corresponding strace or at least point to a specific log that exhibits this behavior?
https://bugzilla.redhat.com/attachment.cgi?id=1763588, file strace_network_timeout.out.
Post by alexey-tikhonov
As I pointed out previously, SSSD already sets `LDAP_OPT_NETWORK_TIMEOUT`. Setting the same option in different places needs an explanation.
There isn't any log but it still fails and there isn't any retry procedure. So I'll remove the first patch completely.

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800986678
alexey-tikhonov
2021-03-18 11:34:41 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
Post by ikerexxe
Post by alexey-tikhonov
Did you try your latest version with your reproducer?
Yes, when the process fails it is retried.
Thanks for the logs. Functionally it looks good.

But I have a question:
```
(11:39:30): [pam_print_data] (0x0100): command: SSS_PAM_AUTHENTICATE
(11:39:30): [sdap_uri_callback] (0x0400): Constructed uri 'ldaps://10.0.155.220:636'
(11:39:30): [decide_tls_usage] (0x2000): [ldaps://10.0.155.220:636] is a secure channel. No need to run START_TLS
(11:39:30): [sssd_async_socket_init_send] (0x0400): Setting 12 seconds timeout for connecting
...network delay
(11:39:40): [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: [Connect error] [unknown error]
(11:39:40): [sss_ldap_init_sys_connect_done] (0x0020): Assuming TLS handshake was interrupted
(11:39:40): [sss_ldap_init_state_destructor] (0x0400): calling ldap_unbind_ext for ldap:[0xdf4950] sd:[26]
(11:39:40): [sss_ldap_init_state_destructor] (0x0400): closing socket [26]
(11:39:40): [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [1432158320]: TLS handshake was interrupted.
(11:39:40): [sdap_handle_release] (0x2000): Trace: sh[0xdf7020], connected[0], ops[(nil)], ldap[(nil)], destructor_lock[0], release_memory[0]
(11:39:40): [sdap_cli_connect_done] (0x0040): Performing retry due to TLS handshake interruption
(11:39:40): [fo_set_port_status] (0x0100): Marking port 636 of server '10.0.155.220' as 'not working'
(11:39:40): [fo_set_port_status] (0x0400): Marking port 636 of duplicate server '10.0.155.220' as 'not working'
(11:39:40): [decide_tls_usage] (0x2000): [ldaps://10.0.155.220:636] is a secure channel. No need to run START_TLS
(11:39:40): [sssd_async_socket_init_send] (0x0400): Setting 12 seconds timeout for connecting
(11:39:42): [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldaps://10.0.155.220:636/??base] with fd [26].
```
-- why `Marking port ... as 'not working'`? IIUC, this is exactly ip:port that is being retried (and succeeds).
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801848446
ikerexxe
2021-03-16 10:07:45 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial

ikerexxe commented:
"""
**Wrt 1st patch**
I don't understand it's description, specifically `Configure socket options when calling ldap_install_tls() to avoid hitting EINTR during connect.` part.
IIUC as a result of this patch blocking `read()` is replaced with timed `poll()` under the hood of openlap:openssl (`LDAP_OPT_CONNECT_ASYNC`).
As for `LDAP_OPT_NETWORK_TIMEOUT` - it is set in [`sdap_sys_connect_done()`](https://github.com/SSSD/sssd/blob/master/src/providers/ldap/sdap_async_connection.c#L199). Does this happen too late? Do we need to set it in both places?
Also take a note of `LDAP_OPT_RESTART` set a little bit above and a corresponding comment. This comment looks interesting.
In general, this clearly doesn't "avoid" the issue (both read() with our socket options and poll() return EINTR being interrupted with a signal) and, taking into account our default timeout 6 seconds, I think it won't even make probability lower.
So... does it make sense to make openldap switch to timed poll() instead of blocking read()? Probably "yes" as a general improvement for a 2-x branch, but at least I wouldn't propose this change for 1-16 branch without a better reason. And anyway this requires a better explanation.
I've run a battery of tests by setting only one of the communication flags per test. I've uploaded the most prominent strace logs to the bugzilla in the attachment called `strace battery of tests`. Setting `LDAP_OPT_CONNECT_ASYNC` uses the read() system call to initiate the SSL connection and it fails with EINTR. In my opinion, it doesn't make any sense to use this flag.

Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read() system call to initiate the SSL connection, but in this case it fails with EAGAIN. I'm not sure if openldap retries the process because I don't see sssd printing `ldap_install_tls failed` but I don't see any retrial either. I'm hesitant about setting this flag.

Setting `LDAP_OPT_RESTART` uses the read() system call to initiate the SSL connection and it fails with EINTR. As pointed out in the bugzilla, checking the man page for SSL_get_error API (https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html), it seems that the connection should be dropped when it returns `SSL_ERROR_SYSCALL`. Taking this into account I think that `LDAP_OPT_RESTART` shouldn't be used.
**Wrt 2nd patch**
I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context.
Do you mean watchdog_ctx structure? Would it be enough to compare the timestamp in that structure with the actual time and see if it's below a threshold?

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800125286
alexey-tikhonov
2021-03-30 15:05:03 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Waiting for review
alexey-tikhonov
2021-03-29 14:13:38 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
Thank you.

Besides few minor/cosmetic remarks, looks good to me.

Covscan is clean.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-809410958
ikerexxe
2021-03-30 08:00:13 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
I've changed the PR taking into account your improvements.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810005543
alexey-tikhonov
2021-03-30 15:04:56 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
Hi,

thank you for updates and logs.

There is a weird thing in the log:
```
[sss_ldap_init_sys_connect_done] (0x0020): Assuming TLS handshake was interrupted
[sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [1432158320]: TLS handshake was interrupted.
[sdap_cli_connect_done] (0x0040): TLS handshake was interruped, provider may retry
[be_resolve_server_send] (0x0040): Will not retry, maximum number of attempts (2) reached
[fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
[get_server_status] (0x1000): Status of server '10.0.155.220' is 'working'
[get_port_status] (0x1000): Port status of port 636 for server '10.0.155.220' is 'working'
[fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds
[get_server_status] (0x1000): Status of server '10.0.155.220' is 'working'
[be_resolve_server_process] (0x0040): The fail over cycled through all available servers
[be_resolve_server_done] (0x1000): Server resolution failed: [2]: No such file or directory
```
-- `status ... is 'working'` despite `Will not retry`

I guess the reason is `be_fo_set_port_status(... PORT_NOT_WORKING)` isn't executed in this case (while it should)
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810337498
ikerexxe
2021-03-30 07:59:19 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-03-31 08:18:18 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-03-31 08:18:36 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Waiting for review
ikerexxe
2021-03-31 10:10:48 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
Thanks. From functional point of view this now looks good.
I'm really not sure about implementation of FO code changes though.
Do we actually need to go to FO code? Can't we handle `retry_same_server` right inside `sdap_cli_resolve_next()`?
It's possible but I am not sure if that's better than the actual implementation. @sumit-bose what do you think? Should I move the retry logic from `be_resolve_server_send()` to `sdap_cli_resolve_next()` as Alexey suggests?
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810947952
alexey-tikhonov
2021-03-31 08:58:56 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
Thanks. From functional point of view this now looks good.

I'm really not sure about implementation of FO code changes though.
Do we actually need to go to FO code? Can't we handle `retry_same_server` right inside `sdap_cli_resolve_next()`?
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810900915
ikerexxe
2021-03-31 08:19:27 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
-- `status ... is 'working'` despite `Will not retry`
Pushed a new revision that changes this behaviour.

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810875437
sumit-bose
2021-04-06 08:38:04 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

sumit-bose commented:
"""
Thanks. From functional point of view this now looks good.
I'm really not sure about implementation of FO code changes though.
Do we actually need to go to FO code? Can't we handle `retry_same_server` right inside `sdap_cli_resolve_next()`?
Hi,

in general yes, but I wonder what would be the benefits? Having it in `be_resolve_server_send()` makes the handling of the tevent request quite easy and it would allow other callers to use `retry_same_server` as well, although we currently do not have use-cases for this.

bye,
Sumit
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813941302
alexey-tikhonov
2021-04-06 08:45:28 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
Ok.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813946170
alexey-tikhonov
2021-04-06 09:21:25 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
LGTM, but since this is an important patch, I'd like anyone to have a second look here.

@pbrezina or @sumit-bose , could you please take a look?
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813969840
alexey-tikhonov
2021-04-06 09:22:03 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
LGTM, but since this is an important patch, I'd like somebody to have a second look here.

@pbrezina or @sumit-bose , could you please take a look?
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813969840
alexey-tikhonov
2021-04-06 09:22:25 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

alexey-tikhonov commented:
"""
LGTM, but since this is an important patch, I'd like somebody to have a second look.

@pbrezina or @sumit-bose , could you please take a look?
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813969840
ikerexxe
2021-04-09 10:47:34 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: edited

Changed field: body
Original value:
"""
Configure socket options when calling ldap_install_tls() to avoid hitting
EINTR during connect. Set the communication to asynchronous. This
configuration can't be applied for the connection part, which has to be
always blocking. On top of that set the network timeout to
ldap_opt_timeout option, to decrease the possibility of triggering a
timeout error when polling.

If the call to ldap_install_tls() fails with EINTR, retry it again.
"""
sumit-bose
2021-04-09 13:50:16 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

sumit-bose commented:
"""
Retry logic means that something failed but we have resolved the server successfully so imho the retry logic should be implemented elsewhere. Also at this time the retry logic spans across two components - `be_resolve_server` and `sdap_cli_connect` instead of being implemented at only one place - this is because `be_resolved_server` is not the component that failed. This will have to happen in other components as well if we want to reuse it so the only part that gets reused is only the attempt counter.
Hi,

that's a fair point. @ikerexxe, would you mind the change the code and follow @pbrezina's suggestion?

bye,
Sumit
My suggestion is to implement it only in `sdap_cli_connect_done` and retry only `sdap_connect_send`, but if Summit thinks that this is beneficial I won't oppose. However, triple check that storing `fo_server` pointer will not cause issues and that you don't need `for_ref_server`.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-816696617
alexey-tikhonov
2021-04-09 14:17:49 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Waiting for review
alexey-tikhonov
2021-04-09 14:17:59 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Changes requested
ikerexxe
2021-04-13 10:48:40 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-04-13 10:55:21 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

ikerexxe commented:
"""
Retry logic means that something failed but we have resolved the server successfully so imho the retry logic should be implemented elsewhere. Also at this time the retry logic spans across two components - `be_resolve_server` and `sdap_cli_connect` instead of being implemented at only one place - this is because `be_resolved_server` is not the component that failed. This will have to happen in other components as well if we want to reuse it so the only part that gets reused is only the attempt counter.
My suggestion is to implement it only in `sdap_cli_connect_done` and retry only `sdap_connect_send`, but if Summit thinks that this is beneficial I won't oppose.
I've changed the code to be in line with your proposal. Please review it for further improvement.


"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-818644180
ikerexxe
2021-04-13 10:55:32 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Changes requested
ikerexxe
2021-04-13 10:55:35 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Waiting for review
alexey-tikhonov
2021-04-16 08:28:54 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +branch: sssd-1-16
pbrezina
2021-04-16 10:22:27 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Waiting for review
pbrezina
2021-04-16 10:22:30 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Changes requested
ikerexxe
2021-04-16 14:38:46 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: synchronized

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
ikerexxe
2021-04-16 14:39:56 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Changes requested
ikerexxe
2021-04-16 14:39:54 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Waiting for review
pbrezina
2021-04-16 15:12:16 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

pbrezina commented:
"""
Ack from me.
"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-821246253
pbrezina
2021-04-19 09:18:37 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Ready to push
pbrezina
2021-04-19 09:18:38 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Waiting for review
pbrezina
2021-04-19 09:18:41 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Accepted
ikerexxe
2021-04-20 08:47:15 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -branch: sssd-1-16
pbrezina
2021-04-20 09:15:57 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

pbrezina commented:
"""
Pushed PR: https://github.com/SSSD/sssd/pull/5532

* `master`
* da55e3e69707de416b7949d08c165c950090bbb6 - ldap: retry ldap_install_tls() when watchdog interruption

"""

See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-823119763
pbrezina
2021-04-20 09:16:00 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: +Pushed
pbrezina
2021-04-20 09:16:03 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Accepted
pbrezina
2021-04-20 09:16:08 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Author: ikerexxe
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Action: closed

To pull the PR as Git branch:
git remote add ghsssd https://github.com/SSSD/sssd
git fetch ghsssd pull/5532/head:pr5532
git checkout pr5532
pbrezina
2021-04-20 09:16:05 UTC
Permalink
URL: https://github.com/SSSD/sssd/pull/5532
Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption

Label: -Ready to push

Loading...