Wednesday, July 19, 2023

LLAMA 2 - hello world

$ torchrun --nproc_per_node 1 test2_text_completion.py \
    --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 --ckpt_dir llama-2-7b/ \
    --prompt "Hello everyone, I'm LLAMA-2"

...

Loaded in 4.32 seconds

Hello everyone, I'm LLAMA-2 000. I'm a new member here. I'm from China. I'm a girl, I'm 16. I like animals, especially lamas. I like to play games, especially the games on the internet. I like to make friends. I hope I can make

Tuesday, June 20, 2023

Azure/DALLE-2

The latest version (0.27.8) of the pypi/openai library now supports Azure/DALLE-2. To ensure compatibility, set openai.api_version = '2023-06-01-preview'.
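A minimal configuration sketch (assuming the module-level settings of the 0.27.x Python library; the endpoint and key below are placeholders for your own Azure resource):

import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"  # placeholder endpoint
openai.api_version = "2023-06-01-preview"
openai.api_key = "..."  # your Azure OpenAI key

# generate an image with DALLE-2 on Azure
response = openai.Image.create(prompt="a lama on a skateboard", n=1, size="1024x1024")
print(response["data"][0]["url"])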

Tuesday, March 28, 2023

BPF & Unix timestamp

Recently I've been using bpftrace to trace some events across servers, and it would be really useful if bpftrace provided actual Unix timestamps. The nsecs builtin won't work, as it represents time since boot, while time() and strftime() return strings.

In this specific case, an application running on server A was injecting timestamps into packets being sent to an app on server B, where I ran tracing. This allowed measuring the time it takes to send a packet over the network between the apps. Unfortunately, bpftrace did not make it easy to do so...

This only makes sense if time across servers is synchronised with accuracy much better than the time deltas you need to measure. In our case, servers use PTP and are generally synchronised to the grandmaster to well under 100ns, while the time differences measured here were many microseconds.

I'm not the only one asking for actual timestamps in bpf; for example, see here and here. The latter is a discussion about implementing a bpf function to provide an epoch timestamp. Unfortunately, there was some resistance to doing so (I don't agree with the arguments), but at least a workaround was suggested. It is unnecessarily complicated and clunky, but it works.

> Not sure what problem you're trying to solve and thus what exactly you
> need... but you can probably get something very very close with 1 or 2
> of clock_gettime(CLOCK_{BOOTTIME,MONOTONIC,REALTIME}) possibly in a
> triple vdso call sandwich and iterated a few times and picking the one
> with smallest delta between 1st and 3rd calls. And then assuming the
> avg of 1st and 3rd matches the 2nd.
> ie.
>
> 5 times do:
> t1[i] = clock_gettime(REALTIME)
> t2[i] = clock_gettime(BOOTTIME)
> t3[i] = clock_gettime(REALTIME)
>
> pick i so t3[i] - t1[i] is smallest
>
> t2[i] is near equivalent to (t1[i] + t3[i]) / 2
>
> which basically gives you a REAL to BOOT offset.
Let's quickly implement it:
#include <time.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define ITERATIONS 5

/* convert a struct timespec to nanoseconds */
static uint64_t ts2ns(const struct timespec *ts) {
  return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

int main(void) {
  struct timespec ts1[ITERATIONS], ts2[ITERATIONS], ts3[ITERATIONS];
  uint64_t t1, t2, t3, t4, t5, smallest_dt = 0;
  int i, smallest_dt_i = 0;

  /* REALTIME / BOOTTIME / REALTIME sandwich, repeated ITERATIONS times */
  for (i = 0; i < ITERATIONS; i++) {
    clock_gettime(CLOCK_REALTIME, &ts1[i]);
    clock_gettime(CLOCK_BOOTTIME, &ts2[i]);
    clock_gettime(CLOCK_REALTIME, &ts3[i]);
  }

  /* pick the iteration with the smallest delta between the 1st and 3rd calls */
  for (i = 0; i < ITERATIONS; i++) {
    t1 = ts2ns(&ts1[i]);
    t3 = ts2ns(&ts3[i]);
    printf("i: %d dt: %" PRIu64 "\n", i, t3 - t1);
    if (!smallest_dt || ((t3 - t1) < smallest_dt)) {
      smallest_dt = t3 - t1;
      smallest_dt_i = i;
    }
  }

  t1 = ts2ns(&ts1[smallest_dt_i]);
  t2 = ts2ns(&ts2[smallest_dt_i]);
  t3 = ts2ns(&ts3[smallest_dt_i]);
  t4 = (t1 + t3) / 2;  /* REALTIME at (roughly) the moment of the BOOTTIME read */
  t5 = t4 - t2;        /* the REALTIME-to-BOOTTIME offset */

  printf("\n");
  printf("t5=t4-t2: %" PRIu64 "\n", t5);
  return 0;
}

Now, on a system where you need an actual timestamp in bpftrace, you can do: $t_now = (uint64)NNN + nsecs;
where NNN is the t5 offset reported for that system.
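With that, a hypothetical one-liner stamping openat() calls with wall-clock nanoseconds could look like this (a sketch; NNN again stands for the per-system offset measured above):

# bpftrace -e 't:syscalls:sys_enter_openat {
    $t_now = (uint64)NNN + nsecs;    // NNN: the t5 offset measured above
    printf("%llu %s\n", $t_now, str(args->filename));
  }'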

I wish one could get it directly in bpftrace, but we are where we are...

Tuesday, February 21, 2023

Siri + ChatGPT

Although I appreciate Siri's ability to automate my home, there are times when it can be frustrating, such as when it replies with the annoying message "I found some web results, I sent them to your iPhone." Let's make Siri much more useful and fun by integrating it with OpenAI/ChatGPT via the Shortcuts app. It even works with HomePod and Apple Watch.

To get started, we need to define a variable with our OpenAI token, and then use the Dictate Text action to convert speech to text and store the result in another variable.
Next, let's set another variable with parameters for OpenAI query:
Now, let's issue a REST call (a curl sketch of this call is shown below):
And finally let's extract the reply and pass it back to Siri to read it aloud.
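For reference, the REST call is a POST to the OpenAI completions endpoint. A rough curl equivalent of what the Shortcut does is below; the model and parameters are my assumptions here (the real values live in the Shortcut's configuration), and $OPENAI_TOKEN stands for the token variable defined earlier:

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_TOKEN" \
  -d '{"model": "text-davinci-003", "prompt": "the dictated text goes here", "max_tokens": 256, "temperature": 0.7}'

The generated text comes back in choices[0].text of the JSON response; that's the part the Shortcut extracts and hands back to Siri.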
Now, you can say "Hey Siri, 2001" (I named the shortcut "2001") and it will ask you for a prompt; once you provide it, a moment later it will read the OpenAI/ChatGPT reply back to you, making Siri really useful and fun to engage with. It even works in CarPlay mode while driving.

You can download the Shortcut as well, enjoy.

btw: I asked ChatGPT to improve the blog post, here is the result:

The given blog post can be improved by making some modifications and clarifications. Below is the improved version of the post:

Siri is an amazing tool that can automate various tasks and make life much easier. However, sometimes it can be frustrating when it responds with the message "I found some web results, I sent them to your iPhone." To make Siri even more useful and fun, we can integrate Siri/Shortcuts with OpenAI/ChatGPT, which can work with HomePod too.

To get started, we need to define a variable with our OpenAI token. Then, we can use the Dictate Text action to convert speech to text and store the result in another variable. This allows Siri to understand our speech and perform the necessary actions.

Next, we need to set another variable with the parameters for the OpenAI query. These parameters include the text we want to generate, the model we want to use, and the temperature of the output. Temperature refers to the randomness of the generated text. Higher temperatures will result in more creative and unexpected output, but lower temperatures will produce more accurate and predictable text.

Once we have set the necessary variables, we can issue a REST call to the OpenAI API. This will send our query to the OpenAI server and retrieve the generated text. We can then use the Text action to speak the generated text aloud or display it on the screen.

It's important to note that using OpenAI/ChatGPT with Siri requires some programming skills and knowledge of APIs. However, with some practice and experimentation, anyone can create their own Siri shortcuts and integrate them with OpenAI. With this powerful combination, we can create a wide range of useful and entertaining applications, from generating creative writing prompts to answering trivia questions.

Thursday, January 12, 2023

bpftrace vs. strace

Many years ago I compared the performance overhead of dtrace vs. truss on Solaris in a microbenchmark. This time I ran a similar test, comparing bpftrace vs. strace on Linux. To do so, I wrote a simple C program which spawns X threads, each of which calls stat("/tmp") N times in a loop. The program then prints the total time it took for all threads to execute.
While it is not necessarily a very realistic test, it does show the potential overhead of tracing and the difference between technologies. Also, sometimes you do need to trace a very tight loop, which might then result in overheads like those shown below (or even higher).

Let's run it three times (to see if we get consistent results) with 4 threads, each calling stat() 1 million times.
# for i in $(seq 1 3); do ./threads-loop-test-1 4 1000000; done
Total time: 0.982799
Total time: 0.977269
Total time: 0.980996
It took just below 1s to execute.
Now, let's run it under bpftrace, which will count how many times stat() was called by all threads.
# for i in $(seq 1 3); do bpftrace -qe 't:syscalls:sys_enter_newstat
                                        /pid==cpid/{@[probe]=count();}'
                                   -c "./threads-loop-test-1 4 1000000";
                       done

Total time: 1.192433

@[tracepoint:syscalls:sys_enter_newstat]: 4000019
Total time: 1.204695

@[tracepoint:syscalls:sys_enter_newstat]: 4000019
Total time: 1.143513

@[tracepoint:syscalls:sys_enter_newstat]: 4000019
There is roughly a 20% overhead - not bad.
Adding an extra condition to the predicate, str(args->filename)=="/tmp", has little impact, resulting in total times under 1.24s.
Again, not bad, especially given that a string comparison like this is rather expensive.
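For reference, the full one-liner with the extra condition looks like this (the sys_enter_newstat tracepoint exposes the filename argument):
# for i in $(seq 1 3); do bpftrace -qe 't:syscalls:sys_enter_newstat
                                        /pid==cpid && str(args->filename)=="/tmp"/
                                        {@[probe]=count();}'
                                   -c "./threads-loop-test-1 4 1000000";
                       done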

Now time for strace.
# for i in $(seq 1 3); do strace -qfc -e trace=stat ./threads-loop-test-1 4 1000000; done
Total time: 49.478656
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   87.720184          21   4000019        18 stat
------ ----------- ----------- --------- --------- ----------------
100.00   87.720184          21   4000019        18 total

Total time: 49.336942
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   87.463778          21   4000019        18 stat
------ ----------- ----------- --------- --------- ----------------
100.00   87.463778          21   4000019        18 total

Total time: 49.250562
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   87.923629          21   4000019        18 stat
------ ----------- ----------- --------- --------- ----------------
100.00   87.923629          21   4000019        18 total
It took about 50 times longer to execute!

While there have been many improvements to strace to reduce its impact, the overhead is still significant in some cases.
That doesn't mean strace is a bad tool you should avoid - in fact, it is often handier and quicker to use than bpftrace or systemtap.
However, be mindful of its potentially much higher overhead, especially in tight loops.


The source code for the test program:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

void *thread_handler(void *arg) {
    int N = (long)arg;
    struct stat file_stat;
    while (N--) {
        stat("/tmp", &file_stat);
    }
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    struct timespec start, end;
    int num_threads;
    long N;
    if (argc != 3) {
        printf("Usage: %s  \n", argv[0]);
        exit(-1);
    }
    num_threads = atoi(argv[1]);
    N = abs(atol(argv[2]));
    pthread_t threads[num_threads];
    int rc;
    long t;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for(t=0; t<num_threads; t++){
        rc = pthread_create(&threads[t], NULL, thread_handler, (void *)N);
        if (rc){
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    /* Wait for all threads to complete */
    for(t=0; t<num_threads; t++) {
        pthread_join(threads[t], NULL);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double time_taken = (end.tv_sec - start.tv_sec) + (double)(end.tv_nsec - start.tv_nsec) / (double)1000000000;
    printf("Total time: %lf\n", time_taken);
    
    return 0;
}

Wednesday, December 28, 2022

NFS4ERR_EXPIRED - Linux NFS client bug (fixed)

Another blog entry saved as a draft for a long time and waiting to be published... better late than never, so let's do it now.
Over three years ago I found a bug in the Linux NFS v4.0 client which results in applications temporarily not being able to open a file, getting an error instead. While this is due to a bug in the NFS v4.0 client code, whether you hit the issue depends on the specific NFS server: for example, both Linux and Solaris NFS servers will trigger the bug, while NetApp/ONTAP won't.

Once I debugged and understood the problem, I was able to reproduce it. I ended up with a temporary workaround and later submitted two patches to the Linux kernel, which were quickly integrated into Linux 5.6 and backported to some older versions (see the patches at the very end of this blog entry).

Let's look in more detail at the issue.

btw: some output has been abbreviated and/or modified to remove irrelevant details or to anonymize details.

An application written in Java was sometimes failing to open a file with the below exception:
Caused by: java.io.FileNotFoundException: /.../latest.link (Input/output error) 
        at java.io.FileInputStream.open0(Native Method) 
        at java.io.FileInputStream.open(FileInputStream.java:195) 
        at java.io.FileInputStream.<init>(FileInputStream.java:138) 
        ...

Developers added retry logic, which helped: usually after a few retries the application was able to open the affected file fine. This was happening to different files, at different times, on multiple servers, even for files which were created a long time ago (don't get fooled by the java.io.FileNotFoundException).

To understand how a file is accessed by the application (although some other applications were affected too), I wrote the below code based on the affected application.
$ cat Test.java
import java.io.*;
...

public class Test {
    public static void main(String[] args) {
        FileUtils fu = new FileUtils();
        File dir = new File("/mnt/.../indexweights");
        File ft = new File("/tmp/foo");
        try {
            ft = fu.resolveLinkFile(dir);
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
            e.printStackTrace();
        } finally {
            System.out.println("Destination path: " + ft.getAbsolutePath());
        }
    }
}

Let's run it under strace and see what system calls it calls: 
$ strace -tTfv -o /tmp/a -e trace=file,close,read java Test
Destination path: /mnt/...

$ grep -A2 latest.link /tmp/a
4422  15:33:38 open("/mnt/.../indexweights/latest.link", O_RDONLY) = 5 <0.069349>
4422  15:33:38 read(5, "...", 8192) = 11 <0.000010>
4422  15:33:38 close(5)                 = 0 <0.069149>
It is just a simple open() with O_RDONLY, followed by read() and close(). Nothing suspicious. Based on the Java stack trace in the app log, though, the error is happening in open().
I did run the test program multiple times on affected servers and tried to access the affected files, but couldn't trigger the issue.

We modified one of the affected applications and added retry logic for failed file opens; we also ran part of the application under strace to see the actual error reported by open(). Let's look at an example output:

$ egrep ' open\(|open resume' .../XXX.out.20190625220935.26281
...
[pid 133595] 22:10:17 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:17 <... open resumed> ) = -1 EIO (Input/output error) <0.069362>
...
[pid 133595] 22:10:21 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:21 <... open resumed> ) = -1 EIO (Input/output error) <0.069204>
...
[pid 133595] 22:10:29 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:29 <... open resumed> ) = 192 <0.487662>
The open() failed with EIO two times and then succeeded on the third attempt.
Given these are nfs4 mount points with the "hard" mount option, we shouldn't be getting EIO though.
When the issue happens, no errors are reported in system logs.
I also collected network traffic for the above failures.
# tshark -r tcpdump.out.201906252205 -tad nfs
...
160040 2019-06-25 22:10:17 10.2.1.142 -> 10.2.6.11  NFS 242 V4 Call ACCESS FH: 0x18703b49, [Check: RD LU MD XT DL]
160044 2019-06-25 22:10:17  10.2.6.11 -> 10.2.1.142 NFS 222 V4 Reply (Call In 160040) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
160066 2019-06-25 22:10:17 10.2.1.142 -> 10.2.6.11  NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
160074 2019-06-25 22:10:17  10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160066) OPEN Status: NFS4ERR_EXPIRED
160793 2019-06-25 22:10:18 10.2.1.142 -> 10.2.6.11  NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
160799 2019-06-25 22:10:18  10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160793) OPEN Status: NFS4ERR_EXPIRED
161368 2019-06-25 22:10:19 10.2.1.142 -> 10.2.6.11  NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
161394 2019-06-25 22:10:19  10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 161368) OPEN Status: NFS4ERR_EXPIRED
162036 2019-06-25 22:10:21 10.2.1.142 -> 10.2.6.11  NFS 234 V4 Call GETATTR FH: 0x7a1ffedc
162112 2019-06-25 22:10:21  10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 162036) GETATTR
162114 2019-06-25 22:10:21 10.2.1.142 -> 10.2.6.11  NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
162194 2019-06-25 22:10:21  10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 162114) OPEN Status: NFS4ERR_EXPIRED
162803 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6   NFS 226 V4 Call RENEW CID: 0x8dbc
162811 2019-06-25 22:10:22   10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162803) RENEW Status: NFS4ERR_EXPIRED
162813 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6   NFS 226 V4 Call RENEW CID: 0x8dbc
162825 2019-06-25 22:10:22   10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162813) RENEW Status: NFS4ERR_EXPIRED
162826 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6   NFS 318 V4 Call SETCLIENTID
162850 2019-06-25 22:10:23   10.2.6.6 -> 10.2.1.142 NFS 198 V4 Reply (Call In 162826) SETCLIENTID
162851 2019-06-25 22:10:23 10.2.1.142 -> 10.2.6.6   NFS 234 V4 Call SETCLIENTID_CONFIRM
162858 2019-06-25 22:10:23   10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162851) SETCLIENTID_CONFIRM
163272 2019-06-25 22:10:24 10.2.1.142 -> 10.2.6.11  NFS 226 V4 Call RENEW CID: 0x0a7f
163302 2019-06-25 22:10:25  10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163272) RENEW Status: NFS4ERR_EXPIRED
163304 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11  NFS 226 V4 Call RENEW CID: 0x0a7f
163332 2019-06-25 22:10:25  10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163304) RENEW Status: NFS4ERR_EXPIRED
163333 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11  NFS 318 V4 Call SETCLIENTID
163343 2019-06-25 22:10:25  10.2.6.11 -> 10.2.1.142 NFS 198 V4 Reply (Call In 163333) SETCLIENTID
163344 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11  NFS 234 V4 Call SETCLIENTID_CONFIRM
163382 2019-06-25 22:10:25  10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163344) SETCLIENTID_CONFIRM
164083 2019-06-25 22:10:27 10.2.1.142 -> 10.2.6.5   NFS 1606 V4 NULL Call
164109 2019-06-25 22:10:27   10.2.6.5 -> 10.2.1.142 NFS 302 V4 NULL Reply (Call In 164083)
164289 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.6   NFS 1606 V4 NULL Call
164301 2019-06-25 22:10:28   10.2.6.6 -> 10.2.1.142 NFS 302 V4 NULL Reply (Call In 164289)
164303 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.6   NFS 226 V4 Call RENEW CID: 0x9d10
164326 2019-06-25 22:10:28   10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 164303) RENEW
164417 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11  NFS 242 V4 Call ACCESS FH: 0x118824df, [Check: RD LU MD XT DL]
164427 2019-06-25 22:10:28  10.2.6.11 -> 10.2.1.142 NFS 222 V4 Reply (Call In 164417) ACCESS, [NOT Supported: MD XT DL], [Allowed: RD LU]
164429 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11  NFS 234 V4 Call GETATTR FH: 0x7a1ffedc
164440 2019-06-25 22:10:28  10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 164429) GETATTR
164441 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11  NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
164458 2019-06-25 22:10:28  10.2.6.11 -> 10.2.1.142 NFS 466 V4 Reply (Call In 164441) OPEN StateID: 0xae70
164459 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11  NFS 242 V4 Call OPEN_CONFIRM
164473 2019-06-25 22:10:29  10.2.6.11 -> 10.2.1.142 NFS 166 V4 Reply (Call In 164459) OPEN_CONFIRM
164474 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11  NFS 250 V4 Call READ StateID: 0xa087 Offset: 0 Len: 11
164498 2019-06-25 22:10:29  10.2.6.11 -> 10.2.1.142 NFS 170 V4 Reply (Call In 164474) READ
164499 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11  NFS 234 V4 Call GETATTR FH: 0xedefdae0
164519 2019-06-25 22:10:29  10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 164499) GETATTR
164520 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11  NFS 258 V4 Call CLOSE StateID: 0xa087
164545 2019-06-25 22:10:29  10.2.6.11 -> 10.2.1.142 NFS 230 V4 Reply (Call In 164520) CLOSE
...
Looks like the NFS client's lease is expiring and being renewed asynchronously, with operations failing in the meantime.

On the NFS client, I left a systemtap script running which logs stat() and open() calls (and their return values) for filenames matching a regex:
$ sudo /var/tmp/syscall-open-stat-filename-regex.stp -g --suppress-time-limits -o /var/tmp/failed_stats_and_opens.log 'XXX.*zip' &
[5] 21855
We caught some errors:
# grep '= -' /var/tmp/failed_stats_and_opens.log
Tue Nov 12 17:11:02 2019 java[16043] SyS_open("XXX") = -5 (EIO) : -5
Tue Nov 12 17:11:02 2019 java[16078] SyS_open("XXX") = -5 (EIO) : -5
...
Let's look at one java thread:
# grep 16043 /var/tmp/failed_stats_and_opens.log
Tue Nov 12 17:11:02 2019 java[16043] SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_open("/mnt/.../XXX.zip") = -5 (EIO) : -5
So within the same second it stat'ed the file multiple times just fine (1x lstat() + 3x stat()) only to get EIO on open(). That's consistent with an expired client lease: GETATTR is stateless (and attributes may be cached), while OPEN requires valid per-client state.
Let's look at tcpdump capture:
# tcpdump -n -r /var/tmp/tcpdump.XXX.out
...
17:11:02.766890 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75637865:75638065, ack 2022930725, win 12288, options [nop,nop,TS val 12657161 ecr 997162631], length 200: NFS request xid 2489184212 196 getattr fh 0,0/22
17:11:02.767043 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022930725:2022931065, ack 75638065, win 32806, options [nop,nop,TS val 997162631 ecr 12657161], length 340: NFS reply xid 2489184212 reply ok 336 getattr NON 4 ids 0/15 sz 0
17:11:02.769335 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638065:75638329, ack 2022931065, win 12288, options [nop,nop,TS val 12657164 ecr 997162631], length 264: NFS request xid 2505961428 260 getattr fh 0,0/22
17:11:02.769477 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931065:2022931149, ack 75638329, win 32806, options [nop,nop,TS val 997162631 ecr 12657164], length 84: NFS reply xid 2505961428 reply ok 80 getattr ERROR: unk 10011
17:11:02.769544 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638329:75638593, ack 2022931149, win 12288, options [nop,nop,TS val 12657164 ecr 997162631], length 264: NFS request xid 2522738644 260 getattr fh 0,0/22
17:11:02.769638 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931149:2022931233, ack 75638593, win 32806, options [nop,nop,TS val 997162631 ecr 12657164], length 84: NFS reply xid 2522738644 reply ok 80 getattr ERROR: unk 10011
17:11:02.809052 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [.], ack 2022931233, win 12288, options [nop,nop,TS val 12657204 ecr 997162631], length 0
17:11:02.821772 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638593:75638789, ack 2022931233, win 12288, options [nop,nop,TS val 12657216 ecr 997162631], length 196: NFS request xid 2539515860 192 getattr fh 0,0/22
17:11:02.822228 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931233:2022931573, ack 75638789, win 32806, options [nop,nop,TS val 997162636 ecr 12657216], length 340: NFS reply xid 2539515860 reply ok 336 getattr NON 4 ids 0/15 sz 0
17:11:02.822261 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [.], ack 2022931573, win 12288, options [nop,nop,TS val 12657217 ecr 997162636], length 0
17:11:02.823327 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638789:75638957, ack 2022931573, win 12288, options [nop,nop,TS val 12657218 ecr 997162636], length 168: NFS request xid 2556293076 164 getattr fh 0,0/22
17:11:02.823436 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931573:2022931857, ack 75638957, win 32806, options [nop,nop,TS val 997162636 ecr 12657218], length 284: NFS reply xid 2556293076 reply ok 280 getattr NON 2 ids 0/9 sz 0
...
It correlates with "getattr ERROR: unk 10011"
Let's also look at tcpdump capture for YYY:
# tcpdump -n -r /var/tmp/tcpdump.YYY.out
...
17:10:08.447238 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377189, win 11471, options [nop,nop,TS val 12602842 ecr 997157199], length 0
17:11:08.645057 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377189, win 11471, options [nop,nop,TS val 12663040 ecr 997157199], length 0
17:11:08.645147 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [.], ack 314027332, win 32806, options [nop,nop,TS val 997163219 ecr 12602842], length 0
17:11:08.645173 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027332:314027492, ack 174377189, win 11471, options [nop,nop,TS val 12663040 ecr 997163219], length 160: NFS request xid 398604749 156 getattr fh 0,28/0
17:11:08.645249 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [.], ack 314027492, win 32806, options [nop,nop,TS val 997163219 ecr 12663040], length 0
17:11:08.692687 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377189:174377305, ack 314027492, win 32806, options [nop,nop,TS val 997163223 ecr 12663040], length 116: NFS reply xid 398604749 reply ok 112 getattr ERROR: unk 24
17:11:08.692796 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027492:314027652, ack 174377305, win 11471, options [nop,nop,TS val 12663087 ecr 997163223], length 160: NFS request xid 415381965 156 getattr fh 0,28/0
17:11:08.693123 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377305:174377421, ack 314027652, win 32806, options [nop,nop,TS val 997163223 ecr 12663087], length 116: NFS reply xid 415381965 reply ok 112 getattr ERROR: unk 24
17:11:08.693164 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027652:314027916, ack 174377421, win 11471, options [nop,nop,TS val 12663088 ecr 997163223], length 264: NFS request xid 432159181 260 getattr fh 0,132/0
17:11:08.693592 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377421:174377553, ack 314027916, win 32806, options [nop,nop,TS val 997163223 ecr 12663088], length 132: NFS reply xid 432159181 reply ok 128 getattr ERROR: unk 40
17:11:08.693680 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027916:314028084, ack 174377553, win 11471, options [nop,nop,TS val 12663088 ecr 997163223], length 168: NFS request xid 448936397 164 getattr fh 0,36/0
17:11:08.701988 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377553:174377669, ack 314028084, win 32806, options [nop,nop,TS val 997163224 ecr 12663088], length 116: NFS reply xid 448936397 reply ok 112 getattr ERROR: unk 24
17:11:08.702090 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [S], seq 1851114974, win 32804, options [mss 1460,sackOK,TS val 997163224 ecr 0,nop,wscale 5], length 0
17:11:08.702122 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [S.], seq 2641182587, ack 1851114975, win 28960, options [mss 1460,sackOK,TS val 12663097 ecr 997163224,nop,wscale 8], length 0
17:11:08.702206 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [.], ack 1, win 32806, options [nop,nop,TS val 997163224 ecr 12663097], length 0
17:11:08.704470 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [P.], seq 1:77, ack 1, win 32806, options [nop,nop,TS val 997163225 ecr 12663097], length 76
17:11:08.704487 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [.], ack 77, win 114, options [nop,nop,TS val 12663099 ecr 997163225], length 0
17:11:08.704534 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [P.], seq 1:29, ack 77, win 114, options [nop,nop,TS val 12663099 ecr 997163225], length 28
17:11:08.704620 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [.], ack 29, win 32806, options [nop,nop,TS val 997163225 ecr 12663099], length 0
17:11:08.741063 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377669, win 11471, options [nop,nop,TS val 12663136 ecr 997163224], length 0
17:11:13.717133 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314028084:314028244, ack 174377669, win 11471, options [nop,nop,TS val 12668112 ecr 997163224], length 160: NFS request xid 465713613 156 getattr fh 0,28/0
17:11:13.717704 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377669:174377785, ack 314028244, win 32806, options [nop,nop,TS val 997163726 ecr 12668112], length 116: NFS reply xid 465713613 reply ok 112 getattr ERROR: unk 24
17:11:13.717715 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377785, win 11471, options [nop,nop,TS val 12668112 ecr 997163726], length 0
17:12:02.981295 IP 10.210.6.5.57869 > 10.210.16.12.33663: Flags [S], seq 2961022330, win 32804, options [mss 1460,sackOK,TS val 997168652 ecr 0,nop,wscale 5], length 0
...
A few seconds later we see "getattr ERROR: unk 24", and a corresponding error in the application error log.
Nothing interesting in the system logs on the NFS server. However, on the NFS server I was also running:
# dtrace -q -n nfsv4:::op-getattr-done,nfsv4:::op-readdir-done,nfsv4:::op-access-done,
                        nfsv4:::op-link-done,nfsv4:::op-open-done,nfsv4:::op-create-done,
                        nfsv4:::op-delegreturn-done,nfsv4:::op-verify-done,nfsv4:::op-getfh-done,
                        nfsv4:::op-readlink-done,nfsv4:::op-renew-done,nfsv4:::op-lookup-done,
                        nfsv4:::op-remove-done,nfsv4:::op-rename-done'
                        /args[2]->status != 0/
                        {
                          printf("%Y %s %s %d:%d %s %d\n", walltimestamp, probename, args[0]->ci_remote,
                                 args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid,
                                 args[1]->noi_curpath, args[2]->status);}'
                  -o /var/tmp/nfs-ops-with-errno.log
Let's look at the log around Nov 12 17:11:02:
...
2019 Nov 12 17:10:42 op-lookup-done 10.250.3.29 10006:5000 /.../XXX 2
2019 Nov 12 17:10:41 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022
2019 Nov 12 17:10:46 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022
2019 Nov 12 17:10:46 op-lookup-done 10.250.3.29 10006:5000 <unknown> 2
2019 Nov 12 17:10:49 op-lookup-done 10.210.10.152 10006:5000 <unknown> 2
2019 Nov 12 17:10:50 op-lookup-done 10.250.3.29 10006:5000 /.../XXX 2
...
2019 Nov 12 17:10:55 op-lookup-done 10.250.3.29 10006:5000 <unknown> 2
2019 Nov 12 17:10:56 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022
2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
2019 Nov 12 17:10:59 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
2019 Nov 12 17:10:59 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
...
Notice the op-renew-done errors.
Let's see what these are:
# ggrep -E '10011|10022' /usr/include/nfs/nfs4_kprot.h
        NFS4ERR_EXPIRED = 10011,
        NFS4ERR_STALE_CLIENTID = 10022,
The op-renew-done failure with 10011 (NFS4ERR_EXPIRED) matches the time when the application failed.
The other interesting thing is that the Linux NFS client has gssd running with -t 1800, so it needs to renew credentials every 30 minutes.
# ggrep 'Nov 12' /var/tmp/nfs-ops-with-errno.log | ggrep 10.210.16.12  | ggrep op-renew
2019 Nov 12 15:10:45 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 15:10:45 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 15:40:57 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 15:40:56 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:04:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:04:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:10:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:10:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
...
The interesting entries here are:
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
That is about 39s more than the expected 30-minute interval between them, although I didn't catch the successful op-renew-done calls here.

After some further debugging, let's look at the original network dump again.
The client calls readdir, access, etc. to get to the final directory and then tries to open a file; the server responds with NFS4ERR_EXPIRED, the client tries again, gets the error again, and seems to give up and move on to other files, where the story repeats for each one of them. All directory lookups, readdirs, etc. work fine though. Here are just the open, its retry and the errors for the first file that failed:
102126 17:11:02.769335 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102127) PUTFH | OPEN DH: 0xcbbed605/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR
102127 17:11:02.769477 10.210.6.6 10.210.16.12 NFS 150 V4 Reply (Call In 102126) PUTFH | OPEN Status: NFS4ERR_EXPIRED
102128 17:11:02.769544 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102129) PUTFH | OPEN DH: 0xcbbed605/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR
102129 17:11:02.769638 10.210.6.6 10.210.16.12 NFS 150 V4 Reply (Call In 102128) PUTFH | OPEN Status: NFS4ERR_EXPIRED
After multiple open() calls have failed, the nfs client eventually does:
102831 17:12:02.917119 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102832) RENEW CID: 0xab20
102832 17:12:02.964745 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102831) RENEW Status: NFS4ERR_EXPIRED
102834 17:12:02.967293 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102835) RENEW CID: 0xab20
102835 17:12:02.967972 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102834) RENEW Status: NFS4ERR_EXPIRED
102836 17:12:02.968021 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102837) SETCLIENTID
102837 17:12:02.968332 10.210.6.6 10.210.16.12 NFS 198 V4 Reply (Call In 102836) SETCLIENTID
102838 17:12:02.968365 10.210.16.12 10.210.6.6 NFS 234 V4 Call (Reply In 102839) SETCLIENTID_CONFIRM
102839 17:12:02.981174 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102838) SETCLIENTID_CONFIRM
which fixes the issue, and a moment later it can open files fine:
102874 17:12:04.838647 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102875) PUTFH | OPEN DH: 0x6030bc46/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR
102875 17:12:04.838844 10.210.6.6 10.210.16.12 NFS 478 V4 Reply (Call In 102874) PUTFH | OPEN StateID: 0x9aaf | GETFH | ACCESS, [Access Denied: XE], [Allowed: RD MD XT] | GETATTR
The last RENEW operations the client issued before the first failure were:
101689 17:06:16.805156 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101691) RENEW CID: 0xab20
101691 17:06:16.838027 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101689) RENEW
101694 17:07:16.965131 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101697) RENEW CID: 0xab20
101697 17:07:16.997082 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101694) RENEW
101701 17:08:17.125137 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101703) RENEW CID: 0xab20
101703 17:08:17.157783 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101701) RENEW
101706 17:09:17.285124 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101709) RENEW CID: 0xab20
101709 17:09:17.320893 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101706) RENEW
... no more RENEW operations until:
102831 17:12:02.917119 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102832) RENEW CID: 0xab20
Notice that it had been issuing RENEW every 60s (the file server sets the lease time to 90s), and then it didn't do so for almost 166s, which is more than the 90s lease time requested by the server. For sanity, let's check when the first NFS4ERR_EXPIRED error happened. The last RENEW before the issue was at 17:09:17, so unless the lease was renewed (it wasn't) the server should start responding with errors from about 17:10:47. The first NFS4ERR_EXPIRED is recorded at 17:11:02, which is the first OPEN call happening during that time frame.
Looks like a bug in the nfs client.

Let's run some additional debugging.
Network packet capture:
# tcpdump -i aggr1 -w /var/tmp/tcpdump.20191119.out host 10.210.16.12  &
Let's trace all checks for NFSv4 lease expiry and record all cases where the lease did expire; this is traced on the NFS server:
# dtrace -q -n fbt::rfs4_lease_expired:entry' {self->cp=args[0];}' 
                     -n fbt::rfs4_lease_expired:return'
                        /arg1 && self->cp/
                        {this->s=(struct sockaddr_in *)&self->cp->rc_addr;
                         printf("%Y %s client_ip: %s client_id: %x forced_expire: %d last_access: %u dt: %d rfs4_lease_time: %d ret: %d\n",
                                  walltimestamp, probefunc, inet_ntop(this->s->sin_family, (void *)&this->s->sin_addr), 
                                  self->cp->rc_clientid, self->cp->rc_forced_expire, self->cp->rc_last_access, 
                                  (walltimestamp/1000000000)-self->cp->rc_last_access, `rfs4_lease_time, arg1);
                         self->cp=0;stack()}'  
                     -n fbt::rfs4_lease_expired:return'/self->cp/{self->cp=0;}' -o /var/tmp/rfs4_lease_expired_failed.out &
Let's record all nfs RENEW calls:
# dtrace -q -n nfsv4:::op-renew-start'
                     {self->t=timestamp; 
                      printf("%Y %s client_id: 0x%x noi_xid: %u %s %d:%d %s\n", walltimestamp, probename, args[2]->clientid,
                             args[1]->noi_xid, args[0]->ci_remote, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath);}' 
                     -n nfsv4:::op-renew-done'
                        /self->t/
                        {printf("%Y %s noi_xid: %u %s %s %d:%d %s ret: %d took: %d seconds\n", walltimestamp, probename,
                                args[1]->noi_xid, args[0]->ci_remote, args[0]->ci_local, args[1]->noi_cred->cr_uid,
                                args[1]->noi_cred->cr_gid, args[1]->noi_curpath, args[2]->status,
                                (timestamp-self->t)/1000000000);
                         self->t=0;}'
                     -o /var/tmp/nfs4-renew.log &
Let's catch all nfsv4 operations and their statuses on the NFS server:
# cat /var/tmp/ss
#!/bin/bash

dtrace -q -n nfsv4:::op-open-downgrade-start,nfsv4:::op-secinfo-start,nfsv4:::op-readdir-start,nfsv4:::op-setclientid-start,
             nfsv4:::op-remove-start,nfsv4:::cb-recall-start,nfsv4:::op-rename-start,nfsv4:::op-commit-start,nfsv4:::op-savefh-start,
             nfsv4:::op-delegpurge-start,nfsv4:::op-openattr-start,nfsv4:::op-delegreturn-start,nfsv4:::op-putpubfh-start,
             nfsv4:::op-nverify-start,nfsv4:::op-verify-start,nfsv4:::op-setattr-start,nfsv4:::op-getattr-start,
             nfsv4:::op-open-confirm-start,nfsv4:::null-start,nfsv4:::op-readlink-start,nfsv4:::op-lockt-start,nfsv4:::op-locku-start,
             nfsv4:::op-putrootfh-start,nfsv4:::op-close-start,nfsv4:::op-access-start,nfsv4:::op-getfh-start,nfsv4:::op-read-start,
             nfsv4:::op-lock-start,nfsv4:::op-open-start,nfsv4:::op-link-start,nfsv4:::op-renew-start,nfsv4:::op-write-start,
             nfsv4:::op-restorefh-start,nfsv4:::op-lookupp-start,nfsv4:::op-release-lockowner-start,nfsv4:::compound-start,
             nfsv4:::op-putfh-start,nfsv4:::op-setclientid-confirm-start,nfsv4:::op-lookup-start,nfsv4:::op-create-start'
             /args[0]->ci_remote=="10.210.16.12"/
             {self->t=timestamp;
              printf("%Y %s %s %s\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local);}' \
              
          -n nfsv4:::op-open-downgrade-done,nfsv4:::op-readdir-done,nfsv4:::op-access-done,nfsv4:::op-lock-done,nfsv4:::op-link-done,
             nfsv4:::op-setclientid-done,nfsv4:::op-open-confirm-done,nfsv4:::op-open-done,nfsv4:::op-create-done,
             nfsv4:::op-secinfo-done,nfsv4:::op-locku-done,nfsv4:::op-close-done,nfsv4:::op-write-done,
             nfsv4:::op-delegreturn-done,nfsv4:::op-getattr-done,nfsv4:::op-verify-done,nfsv4:::op-getfh-done,
             nfsv4:::op-putfh-done,nfsv4:::op-nverify-done,nfsv4:::op-readlink-done,nfsv4:::op-renew-done,
             nfsv4:::op-savefh-done,nfsv4:::op-read-done,nfsv4:::compound-done,nfsv4:::cb-recall-done,
             nfsv4:::op-lookup-done,nfsv4:::op-restorefh-done,nfsv4:::op-release-lockowner-done,nfsv4:::op-lookupp-done,
             nfsv4:::op-openattr-done,nfsv4:::op-putpubfh-done,nfsv4:::op-putrootfh-done,
             nfsv4:::op-setclientid-confirm-done,nfsv4:::op-remove-done,nfsv4:::op-rename-done,nfsv4:::op-commit-done,
             nfsv4:::op-delegpurge-done,nfsv4:::op-setattr-done,nfsv4:::op-lockt-done'
             /self->t/
             {printf("%Y %s %s %s took: %d ret: %d\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local,
                     (timestamp-self->t)/1000000000, args[2]->status);
              self->t=0;}' \
              
    -n nfsv4:::null-done'/self->t/
            {printf("%Y %s %s %s took: %d\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local,
                    (timestamp-self->t)/1000000000);
             self->t=0;}' \
  -o /var/tmp/nfs4_all_ops.log

# /var/tmp/ss &
The list of running jobs on the nfs server is:
# jobs
[2]   Running                 tcpdump -i aggr1 -w /var/tmp/tcpdump.20191119.out host 10.210.16.12 &
[3]   Running                 /var/tmp/ss &
[4]-  Running                 dtrace -q -n nfsv4:::op-renew-start'{self->t=timestamp; printf("%Y %s client_id: 0x%x noi_xid: %u %s %d:%d %s\n", walltimestamp, probename, args[2]->clientid, args[1]->noi_xid, args[0]->ci_remote, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath);}' -n nfsv4:::op-renew-done'/self->t/{printf("%Y %s noi_xid: %u %s %s %d:%d %s ret: %d took: %d seconds\n", walltimestamp, probename, args[1]->noi_xid, args[0]->ci_remote, args[0]->ci_local, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath, args[2]->status, (timestamp-self->t)/1000000000);self->t=0;}' -o /var/tmp/nfs4-renew.log &  (wd: /var/log/smblog)
[5]+  Running                 dtrace -q -n fbt::rfs4_lease_expired:entry'{self->cp=args[0];}' -n fbt::rfs4_lease_expired:return'/arg1 && self->cp/{this->s=(struct sockaddr_in *)&self->cp->rc_addr; printf("%Y %s client_ip: %s client_id: %x forced_expire: %d last_access: %u dt: %d rfs4_lease_time: %d ret: %d\n", walltimestamp, probefunc, inet_ntop(this->s->sin_family, (void *)&this->s->sin_addr), self->cp->rc_clientid, self->cp->rc_forced_expire, self->cp->rc_last_access, (walltimestamp/1000000000)-self->cp->rc_last_access, `rfs4_lease_time, arg1);self->cp=0;stack()}' -n fbt::rfs4_lease_expired:return'/self->cp/{self->cp=0;}' -o /var/tmp/rfs4_lease_expired_failed.out &
These have been running without production traffic, so there is little going on there.
Still, the client needs to renew its NFSv4 lease as long as the filesystem is mounted.
First, let's see how often the client issues RENEW operations and when they fail:
# grep "op-renew-done.*10.210.16.12" /var/tmp/nfs4-renew.log | less
...
2019 Nov 19 17:30:01 op-renew-done noi_xid: 807596244 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:31:01 op-renew-done noi_xid: 824373460 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:31:01 op-renew-done noi_xid: 2357017293 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:32:02 op-renew-done noi_xid: 841150676 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:32:02 op-renew-done noi_xid: 2373794509 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:33:02 op-renew-done noi_xid: 857927892 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:33:02 op-renew-done noi_xid: 2390571725 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:34:02 op-renew-done noi_xid: 2407348941 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:34:02 op-renew-done noi_xid: 874705108 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:35:02 op-renew-done noi_xid: 2424126157 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:35:02 op-renew-done noi_xid: 891482324 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:36:02 op-renew-done noi_xid: 2440903373 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:36:02 op-renew-done noi_xid: 908259540 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:37:02 op-renew-done noi_xid: 925036756 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:37:02 op-renew-done noi_xid: 2457680589 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:38:02 op-renew-done noi_xid: 2474457805 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:38:02 op-renew-done noi_xid: 941813972 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:39:03 op-renew-done noi_xid: 958591188 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:39:03 op-renew-done noi_xid: 2491235021 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 796801741 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 813578957 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3491950804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3508728020 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:41:02 op-renew-done noi_xid: 863910605 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:41:02 op-renew-done noi_xid: 3559059668 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
...
It's been issuing RENEW ops every 60s, but then there is a gap of 114s (17:39:03 to 17:40:57), which is more than the 90s requested by the server, so the server replies with 10011 (NFS4ERR_EXPIRED). If an open() had happened in that window, it would have failed, as this client doesn't seem to handle NFS4ERR_EXPIRED returned by OPEN. Notice that it hits the issue for both server IP addresses at the same time (both served by the same NFS server); that's why there are two pairs of RENEW calls. It then starts working again and sending RENEW calls regularly. It must have issued SETCLIENTID to fix the state though; let's check that in the other log with all NFS operations:
# less /var/tmp/nfs4_all_ops.log
...
2019 Nov 19 17:38:02 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:38:02 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:39:03 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:39:03 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:39:03 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:39:03 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:39:03 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:39:03 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0

... many getattr, access, etc. operations

2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-confirm-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-confirm-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:40:57 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:41:02 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:41:02 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:41:02 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:41:02 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:41:02 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:41:02 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:42:03 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:42:03 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:42:03 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
...
Yes, it did issue SETCLIENTID (for both 10.210.6.5 and 10.210.6.6).
Let's see how often it issues SETCLIENTID:
# ggrep op-setclientid-done /var/tmp/nfs4_all_ops.log
2019 Nov 19 13:11:00 op-setclientid-done 10.210.16.12 took: 0 ret: 0
2019 Nov 19 13:40:55 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 14:40:56 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 15:11:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 15:40:54 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 15:40:54 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 16:40:48 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 16:40:48 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 18:40:55 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 18:40:55 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 23:11:08 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 23:11:08 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 23:40:58 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 23:40:58 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 00:41:06 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 00:41:06 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 01:11:02 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 01:11:02 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 02:10:54 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 02:10:54 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 03:10:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 03:10:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 03:41:06 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 03:41:06 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 04:11:15 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 04:11:15 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 04:41:00 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 04:41:00 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 05:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 05:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 05:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 05:40:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 06:11:09 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 06:11:09 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 06:41:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 06:41:05 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 07:11:05 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 07:11:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 07:40:58 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 07:40:58 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 08:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 08:10:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 10:41:18 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 10:41:18 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 11:11:23 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 11:11:23 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
It issues the SETCLIENTID calls at roughly 30-minute or 1-hour intervals.
Let's see if they correspond to RENEW calls which got 10011 (NFS4ERR_EXPIRED):
# grep "op-renew-done.*10.210.16.12.*10011" /var/tmp/nfs4-renew.log
...
2019 Nov 19 13:11:00 op-renew-done noi_xid: 3380034253 10.210.16.12 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 13:11:00 op-renew-done noi_xid: 3363257037 10.210.16.12 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 13:40:55 op-renew-done noi_xid: 2205694669 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 13:40:55 op-renew-done noi_xid: 2222471885 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 14:40:56 op-renew-done noi_xid: 3866704589 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 14:40:56 op-renew-done noi_xid: 3883481805 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:11:05 op-renew-done noi_xid: 2709142221 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:11:05 op-renew-done noi_xid: 2725919437 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 1232812749 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 1249589965 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 4196397268 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 4213174484 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 1713500372 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 1696723156 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 3162258125 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 3179035341 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 796801741 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 813578957 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3491950804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3508728020 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 2407480013 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 2424257229 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 690286804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 673509588 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 4018158285 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 2149970132 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 2166747348 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 4034935501 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
...
It does. So every ~30 minutes or ~1hr the client doesn't issue RENEW on time and has to call SETCLIENTID. It seems that if an OPEN is issued during that window, it will get NFS4ERR_EXPIRED and fail, as Linux doesn't seem to issue RENEW and then retry the OPEN. I think that to hit the issue the file must not be already open and its attributes must not be cached.
Now the question is why Linux is sometimes late to issue RENEW.

I re-enabled the application and it failed, while stap on the nfs client recorded:
# less /var/tmp/failed_stats_and_opens.log
...
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_open("/mnt/.../.zip") = -5 (EIO) : -5
...
From the debug log on the nfs client (for the 16:41:51 case):
Nov 20 16:41:51 xxx kernel: [702405.312031] NFS: open file(.../XXX.zip)
Nov 20 16:41:51 xxx kernel: [702405.312033] --> nfs_put_client({23})
Nov 20 16:41:51 xxx kernel: [702405.312051] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=1024
Nov 20 16:41:51 xxx kernel: [702405.312053] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Nov 20 16:41:51 xxx kernel: [702405.312212] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Nov 20 16:41:51 xxx kernel: [702405.312227] nfs4_map_errors could not handle NFSv4 error 10011
Nov 20 16:41:51 xxx kernel: [702405.312233] NFS: atomic_open(0:109/65275), XXX.zip
Nov 20 16:41:51 xxx kernel: [702405.312268] --> nfs_put_client({23})
Nov 20 16:41:51 xxx kernel: [702405.312297] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=1024
Nov 20 16:41:51 xxx kernel: [702405.312299] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Nov 20 16:41:51 xxx kernel: [702405.312468] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Nov 20 16:41:51 xxx kernel: [702405.312478] nfs4_map_errors could not handle NFSv4 error 10011
On the nfs server:
# grep '20 16:[34].*-done.*10.210.6.6' /var/tmp/nfs4_all_ops.log | less
...
2019 Nov 20 16:41:51 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
...
2019 Nov 20 16:41:51 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
...
Let's see renew operations:
# ggrep "op-renew-done.*10.210.16.12 10.210.6.6" /var/tmp/nfs4-renew.log | ggrep '20 16:[234]'
2019 Nov 20 16:20:46 op-renew-done noi_xid: 2640048340 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:21:46 op-renew-done noi_xid: 2656825556 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:22:46 op-renew-done noi_xid: 2673602772 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:23:46 op-renew-done noi_xid: 2690379988 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:26:18 op-renew-done noi_xid: 3479630036 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:27:19 op-renew-done noi_xid: 3496407252 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:28:19 op-renew-done noi_xid: 3513184468 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:29:19 op-renew-done noi_xid: 3529961684 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:30:19 op-renew-done noi_xid: 3546738900 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:31:19 op-renew-done noi_xid: 3563516116 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:32:19 op-renew-done noi_xid: 3580293332 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:33:19 op-renew-done noi_xid: 3597070548 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:34:20 op-renew-done noi_xid: 3613847764 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:35:20 op-renew-done noi_xid: 3630624980 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:36:20 op-renew-done noi_xid: 3647402196 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:37:20 op-renew-done noi_xid: 3664179412 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:38:20 op-renew-done noi_xid: 3680956628 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:42:16 op-renew-done noi_xid: 3932811476 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 20 16:42:16 op-renew-done noi_xid: 3949588692 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 20 16:44:46 op-renew-done noi_xid: 1438445780 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:45:46 op-renew-done noi_xid: 1455222996 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:46:46 op-renew-done noi_xid: 1472000212 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:47:46 op-renew-done noi_xid: 1488777428 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:48:46 op-renew-done noi_xid: 1505554644 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:49:46 op-renew-done noi_xid: 1522331860 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds

# grep '20 1[56]:.*setclie.*10.210.6.6' /var/tmp/nfs4_all_ops.log
2019 Nov 20 15:11:09 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:11:09 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 15:11:09 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:11:09 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 15:40:49 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:40:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 15:40:49 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:40:49 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 16:42:16 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 20 16:42:16 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 16:42:16 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 20 16:42:16 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
And the client recovery from its debug log:
Nov 20 16:42:16 xxx kernel: [702430.126654] nfs4_renew_state: start
Nov 20 16:42:16 xxx kernel: [702430.126677] nfs4_renew_state: done
Nov 20 16:42:16 xxx kernel: [702430.245186] nfs4_schedule_lease_recovery: scheduling lease recovery for server xxx
Nov 20 16:42:16 xxx kernel: [702430.247174] nfs4_schedule_state_renewal: requeueing work. Lease period = 30
Nov 20 16:42:16 xxx kernel: [702430.247177] --> nfs_put_client({24})
Nov 20 16:42:16 xxx kernel: [702430.247817] nfs4_recovery_handle_error: handled error -10011 for server xxx
Nov 20 16:42:16 xxx kernel: [702430.247821] NFS call  setclientid auth=RPCSEC_GSS, 'Linux NFSv4.0 xxx/10.210.6.6'
Nov 20 16:42:16 xxx kernel: [702430.248233] NFS reply setclientid: 0
Nov 20 16:42:16 xxx kernel: [702430.248235] NFS call  setclientid_confirm auth=RPCSEC_GSS, (client ID ad23bf2032c90f)
Nov 20 16:42:16 xxx kernel: [702430.257473] NFS reply setclientid_confirm: 0
This is a gap of 236s (well over the 90s lease) between 16:38:20 and 16:42:16; the server would have expired the client's lease around 16:39:50 (90s after the last successful renewal), and all OPEN errors happened only during this exact time period:
# grep '20 1[56]:.*open-done.*10.210.6.6' /var/tmp/nfs4_all_ops.log | grep 10011
2019 Nov 20 16:40:36 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:40:36 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:15 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:15 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:30 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:30 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:32 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:31 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:32 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
nfs4renewd.c/nfs4_renew_state() prints "failed to call renewd. Reason: lease not expired" and later nfs4_schedule_state_renewal() runs dprintk("%s: requeueing work. Lease period = %ld\n", __func__, (timeout + HZ - 1) / HZ);
This prints 57s, which suggests that fsinfo() was run against the server 3s earlier. If Solaris doesn't renew the lease on fsinfo(), then RENEW won't be sent by nfs4_renew_state() and the client ends up with a lease that has expired on the server side and has to recover.
I used stap to confirm it.
nfs4proc.c: nfs4_do_open() -> _nfs4_do_open() -> nfs4_recover_expired_lease() -> nfs4_client_recover_expired_lease(), which checks test_bit(NFS4CLNT_LEASE_EXPIRED) in a loop of up to 10 iterations; if the bit is never set, no lease recovery is attempted and the server's NFS4ERR_EXPIRED ends up mapped to EIO for the application ("nfs4_map_errors could not handle NFSv4 error 10011" in the debug log above).
The only thing which will set the bit is nfs4renewd.c/nfs4_renew_state(), but if something else is modifying clp->cl_last_renewal and causing nfs4_renew_state() not to set the bit, then open() won't recover. We do see the dprintk("%s: requeueing work. Lease period = %ld\n", ...) printed by nfs4_schedule_state_renewal() with a value of 57 (among others), which means something updated cl_last_renewal 3s earlier.
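To make that math concrete, here is a quick user-space sketch (plain C, not kernel code; the 90s lease and the 3s offset are assumptions taken from the logs above) of the timeout computation done by nfs4_schedule_state_renewal(), quoted further below:
#include <stdio.h>

/* Sketch of the nfs4_schedule_state_renewal() timeout math with HZ
 * factored out, so all values are in seconds. Inputs are assumed:
 * a 90s lease and cl_last_renewal bumped 3s before the calculation. */
int main(void)
{
        long lease = 90;            /* clp->cl_lease_time, in seconds */
        long now = 1000;            /* jiffies, arbitrary reference point */
        long last_renewal = 997;    /* bumped 3s ago, e.g. by fsinfo() */

        long timeout = (2 * lease) / 3 + last_renewal - now;
        if (timeout < 5)
                timeout = 5;
        /* prints 57, matching "requeueing work. Lease period = 57" */
        printf("requeueing work. Lease period = %ld\n", timeout);
        return 0;
}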
Now nfs4_do_fsinfo() calls nfs4_set_lease_period().
static struct nfs4_state *nfs4_do_open(struct inode *dir,
                                        struct nfs_open_context *ctx,
                                        int flags,
                                        struct iattr *sattr,
                                        struct nfs4_label *label,
                                        int *opened)
{
        struct nfs_server *server = NFS_SERVER(dir);
        struct nfs4_exception exception = { };
        struct nfs4_state *res;
        int status;

        do {
                status = _nfs4_do_open(dir, ctx, flags, sattr, label, opened);
                res = ctx->state;
                trace_nfs4_open_file(ctx, flags, status);
                if (status == 0)
                        break;
…
        return res;
}
static int _nfs4_do_open(struct inode *dir,
                        struct nfs_open_context *ctx,
                        int flags,
                        struct iattr *sattr,
                        struct nfs4_label *label,
                        int *opened)
{
…
        status = nfs4_recover_expired_lease(server);
        if (status != 0)
                goto err_put_state_owner;
…
err_put_state_owner:
        nfs4_put_state_owner(sp);
out_err:
        return status;
}
static int nfs4_recover_expired_lease(struct nfs_server *server)
{
        return nfs4_client_recover_expired_lease(server->nfs_client);
}

nfs4state.c:
int nfs4_client_recover_expired_lease(struct nfs_client *clp)
{
        unsigned int loop;
        int ret;

        for (loop = NFS4_MAX_LOOP_ON_RECOVER; loop != 0; loop--) {
                ret = nfs4_wait_clnt_recover(clp);
                if (ret != 0)
                        break;
                if (!test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state) &&
                    !test_bit(NFS4CLNT_CHECK_LEASE,&clp->cl_state))
                        break;
                nfs4_schedule_state_manager(clp);
                ret = -EIO;
        }
        return ret;
}
$ grep -r NFS4_MAX_LOOP_ON_RECOVER . | grep define
./nfs4_fs.h:#define NFS4_MAX_LOOP_ON_RECOVER (10)
nfs4renewd.c:
nfs4_renew_state(struct work_struct *work)
{
        const struct nfs4_state_maintenance_ops *ops;
        struct nfs_client *clp =
                container_of(work, struct nfs_client, cl_renewd.work);
        struct rpc_cred *cred;
        long lease;
        unsigned long last, now;
        unsigned renew_flags = 0;

        ops = clp->cl_mvops->state_renewal_ops;
        dprintk("%s: start\n", __func__);

        if (test_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state))
                goto out;

        spin_lock(&clp->cl_lock);
        lease = clp->cl_lease_time;
        last = clp->cl_last_renewal;
        now = jiffies;
        /* Are we close to a lease timeout? */
        if (time_after(now, last + lease/3))
                renew_flags |= NFS4_RENEW_TIMEOUT;
        if (nfs_delegations_present(clp))
                renew_flags |= NFS4_RENEW_DELEGATION_CB;

        if (renew_flags != 0) {
                cred = ops->get_state_renewal_cred_locked(clp);
                spin_unlock(&clp->cl_lock);
                if (cred == NULL) {
                        if (!(renew_flags & NFS4_RENEW_DELEGATION_CB)) {
                                set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
                                goto out;
                        }
                        nfs_expire_all_delegations(clp);
                } else {
                        int ret;

                        /* Queue an asynchronous RENEW. */
                        ret = ops->sched_state_renewal(clp, cred, renew_flags);
                        put_rpccred(cred);
                        switch (ret) {
                        default:
                                goto out_exp;
                        case -EAGAIN:
                        case -ENOMEM:
                                break;
                        }
                }
        } else {
                dprintk("%s: failed to call renewd. Reason: lease not expired \n",
                                __func__);
                spin_unlock(&clp->cl_lock);
        }
        nfs4_schedule_state_renewal(clp);
out_exp:
        nfs_expire_unreferenced_delegations(clp);
out:
        dprintk("%s: done\n", __func__);
}


void
nfs4_schedule_state_renewal(struct nfs_client *clp)
{
        long timeout;

        spin_lock(&clp->cl_lock);
        timeout = (2 * clp->cl_lease_time) / 3 + (long)clp->cl_last_renewal
                - (long)jiffies;
        if (timeout < 5 * HZ)
                timeout = 5 * HZ;
        dprintk("%s: requeueing work. Lease period = %ld\n",
                        __func__, (timeout + HZ - 1) / HZ);
        mod_delayed_work(system_wq, &clp->cl_renewd, timeout);
        set_bit(NFS_CS_RENEWD, &clp->cl_res_state);
        spin_unlock(&clp->cl_lock);
}


/**
 * nfs4_set_lease_period - Sets the lease period on a nfs_client
 *
 * @clp: pointer to nfs_client
 * @lease: new value for lease period
 * @lastrenewed: time at which lease was last renewed
 */
void nfs4_set_lease_period(struct nfs_client *clp,
                unsigned long lease,
                unsigned long lastrenewed)
{
        spin_lock(&clp->cl_lock);
        clp->cl_lease_time = lease;
        clp->cl_last_renewal = lastrenewed;
        spin_unlock(&clp->cl_lock);

        /* Cap maximum reconnect timeout at 1/2 lease period */
        rpc_cap_max_reconnect_timeout(clp->cl_rpcclient, lease >> 1);
}
static int nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fsinfo *fsinfo)
{
        struct nfs4_exception exception = { };
        unsigned long now = jiffies;
        int err;

        do {
                err = _nfs4_do_fsinfo(server, fhandle, fsinfo);
                trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err);
                if (err == 0) {
                        nfs4_set_lease_period(server->nfs_client,
                                        fsinfo->lease_time * HZ,
                                        now);
                        break;
                }
                err = nfs4_handle_exception(server, err, &exception);
        } while (exception.retry);
        return err;
}
static int _nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle,
                struct nfs_fsinfo *fsinfo)
{
        struct nfs4_fsinfo_arg args = {
                .fh = fhandle,
                .bitmask = server->attr_bitmask,
        };
        struct nfs4_fsinfo_res res = {
                .fsinfo = fsinfo,
        };
        struct rpc_message msg = {
                .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_FSINFO],
                .rpc_argp = &args,
                .rpc_resp = &res,
        };

        return nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
}
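Since a successful nfs4_do_fsinfo() sets cl_last_renewal to "now", a client that keeps re-mounting sub-mounts can postpone RENEW indefinitely. Here is a toy simulation (plain C, not kernel code; the 10s fsinfo() interval is an assumption matching the reproducer later in this post) of the renewd check against a server that does not treat fsinfo() as a lease renewal:
#include <stdio.h>

/* Toy model: the client bumps its cl_last_renewal on every fsinfo()
 * (every 10s here), but the server only extends the lease on real
 * renewals. The renewd condition is checked once per simulated second
 * for simplicity (the real kernel re-queues a delayed worker). */
int main(void)
{
        int lease = 90;                  /* seconds */
        int client_last_renewal = 0;     /* client's view, bumped by fsinfo() */
        int server_last_renewal = 0;     /* server's view, real renewals only */

        for (int now = 1; now <= 120; now++) {
                if (now % 10 == 0)
                        client_last_renewal = now;   /* fsinfo() every 10s */
                if (now - client_last_renewal > lease / 3)
                        printf("t=%3ds: client would send RENEW\n", now);  /* never fires */
                if (now - server_last_renewal == lease)
                        printf("t=%3ds: server expires lease; next OPEN gets NFS4ERR_EXPIRED\n", now);
        }
        return 0;
}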
Many nfs4 ops reset the lease as well, so they assume that on a successful op the server does the same.
For example:
static void do_renew_lease(struct nfs_client *clp, unsigned long timestamp)
{
        spin_lock(&clp->cl_lock);
        if (time_before(clp->cl_last_renewal,timestamp))
                clp->cl_last_renewal = timestamp;
        spin_unlock(&clp->cl_lock);
}


static void renew_lease(const struct nfs_server *server, unsigned long timestamp)
{
        struct nfs_client *clp = server->nfs_client;

        if (!nfs4_has_session(clp))
                do_renew_lease(clp, timestamp);
}

static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
{
        struct nfs4_opendata *data = calldata;

        nfs40_sequence_done(task, &data->c_res.seq_res);

        data->rpc_status = task->tk_status;
        if (data->rpc_status == 0) {
                nfs4_stateid_copy(&data->o_res.stateid, &data->c_res.stateid);
                nfs_confirm_seqid(&data->owner->so_seqid, 0);
                renew_lease(data->o_res.server, data->timestamp);
                data->rpc_done = 1;
        }
}

static void nfs4_open_done(struct rpc_task *task, void *calldata)
{
        struct nfs4_opendata *data = calldata;

        data->rpc_status = task->tk_status;

        if (!nfs4_sequence_process(task, &data->o_res.seq_res))
                return;

        if (task->tk_status == 0) {
                if (data->o_res.f_attr->valid & NFS_ATTR_FATTR_TYPE) {
                        switch (data->o_res.f_attr->mode & S_IFMT) {
                        case S_IFREG:
                                break;
                        case S_IFLNK:
                                data->rpc_status = -ELOOP;
                                break;
                        case S_IFDIR:
                                data->rpc_status = -EISDIR;
                                break;
                        default:
                                data->rpc_status = -ENOTDIR;
                        }
                }
                renew_lease(data->o_res.server, data->timestamp);
                if (!(data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM))
                        nfs_confirm_seqid(&data->owner->so_seqid, 0);
        }
        data->rpc_done = 1;
}
Let’s see what is calling nfs4_do_fsinfo():
# cat ./nfs4_do_fsinfo.stp
#!/usr/bin/env stap


/*
 module("nfsv4").function("nfs4_do_fsinfo@fs/nfs/nfs4proc.c:4504") $server:struct nfs_server* $fhandle:struct nfs_fh* $fsinfo:struct nfs_fsinfo* $exception:struct nfs4_exception
*/


probe  module("nfsv4").function("nfs4_do_fsinfo") {

        printf("%s %s proc: %s[%d] uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), pid(), uid())
        print_backtrace()
        exit()
}
# ./nfs4_do_fsinfo.stp

WARNING: Missing unwind data for a module, rerun with 'stap -d nfs'
Thu Nov 21 16:39:53 2019 nfs4_do_fsinfo proc: ls[18701] uid: 10006
 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
 0xffffffffc05924e0 [nfs]
 0xffffffffc0592b43 [nfs]
 0xffffffffc059f098 [nfs]
 0xffffffff9104d1fe
 0xffffffff9106b377
 0xffffffff9106b454
 0xffffffffc05ab014 [nfs]
 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
 0xffffffffc05ab21a [nfs]
 0xffffffff91053a69
 0xffffffff910549ad
 0xffffffff91058483
 0xffffffff9105878b
 0xffffffff91058cea
 0xffffffff9105954b
 0xffffffff9105b497
 0xffffffff9105b501
 0xffffffff9104e1e3


# ./nfs4_do_fsinfo.stp
WARNING: Missing unwind data for a module, rerun with 'stap -d nfs'
Thu Nov 21 17:10:05 2019 nfs4_do_fsinfo proc: ls[29463] uid: 10006
 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
 0xffffffffc05924e0 [nfs]
 0xffffffffc0592b43 [nfs]
 0xffffffffc059f098 [nfs]
 0xffffffff9104d1fe
 0xffffffff9106b377
 0xffffffff9106b454
 0xffffffffc05ab014 [nfs]
 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
 0xffffffffc05ab21a [nfs]
 0xffffffff91053a69
 0xffffffff910549ad
 0xffffffff91058483
 0xffffffff9105878b
 0xffffffff91058cea
 0xffffffff9105954b
 0xffffffff9105b497
 0xffffffff9105b501
 0xffffffff9104e1e3
Let’s get some more info:
# cat ./nfs4_do_fsinfo.stp
#!/usr/bin/env stap


/*
 module("nfsv4").function("nfs4_do_fsinfo@fs/nfs/nfs4proc.c:4504") $server:struct nfs_server* $fhandle:struct nfs_fh* $fsinfo:struct nfs_fsinfo* $exception:struct nfs4_exception
*/



probe  module("nfsv4").function("nfs4_do_fsinfo") {

        printf("%s %s proc: %s[%d] ppid: %d args: %s cwd: %s uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), pid(),
               ppid(), cmdline_str(), fullpath_struct_path(task_cwd_path(task_current())), uid())
        print_backtrace()
        exit()
}
# ./nfs4_do_fsinfo.stp
WARNING: Missing unwind data for a module, rerun with 'stap -d nfs'
Thu Nov 21 17:40:05 2019 nfs4_do_fsinfo proc: ls[9275] ppid: 9273 args: ls /mnt/.../ cwd: /... uid: 10006
 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
 0xffffffffc05924e0 [nfs]
 0xffffffffc0592b43 [nfs]
 0xffffffffc059f098 [nfs]
 0xffffffff9104d1fe
 0xffffffff9106b377
 0xffffffff9106b454
 0xffffffffc05ab014 [nfs]
 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
 0xffffffffc05ab21a [nfs]
 0xffffffff91053a69
 0xffffffff910549ad
 0xffffffff91058483
 0xffffffff9105878b
 0xffffffff91058cea
 0xffffffff9105954b
 0xffffffff9105b497
 0xffffffff9105b501
 0xffffffff9104e1e3
Notice that fsinfo is actually a set of GETATTR operations in nfsv4:
/*
 * FSINFO request
 */
static void nfs4_xdr_enc_fsinfo(struct rpc_rqst *req, struct xdr_stream *xdr,
                                struct nfs4_fsinfo_arg *args)
{
        struct compound_hdr hdr = {
                .minorversion = nfs4_xdr_minorversion(&args->seq_args),
        };

        encode_compound_hdr(xdr, req, &hdr);
        encode_sequence(xdr, &args->seq_args, &hdr);
        encode_putfh(xdr, args->fh, &hdr);
        encode_fsinfo(xdr, args->bitmask, &hdr);
        encode_nops(&hdr);
}
static void encode_fsinfo(struct xdr_stream *xdr, const u32* bitmask, struct compound_hdr *hdr)
{
        encode_getattr_three(xdr,
                             bitmask[0] & nfs4_fsinfo_bitmap[0],
                             bitmask[1] & nfs4_fsinfo_bitmap[1],
                             bitmask[2] & nfs4_fsinfo_bitmap[2],
                             hdr);
}
Let's double-check that nfs4_do_fsinfo() is caused by a vfs/fstat() call (or similar) issued by ls:
# umount /mnt/...
$ ./sh/checkXXXMounts.sh
Checking /mnt/...
...
# stap -g -d/bin/ls --ldd --all-modules ./nfs4_do_fsinfo.stp
WARNING: missing unwind/symbol data for module 'stap_6d2b184a53eae72ccbc95adec92f2d3c_25253'
Fri Nov 22 10:32:32 2019 nfs4_do_fsinfo proc: ls[27494] ppid: 27492 args: ls /mnt/... cwd: /... uid: 10006
 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
 0xffffffffc05924e0 : nfs_probe_fsinfo+0xc0/0x660 [nfs]
 0xffffffffc0592b43 : nfs_clone_server+0xc3/0x1f0 [nfs]
 0xffffffffc059f098 : nfs_xdev_mount+0x98/0x130 [nfs]
 0xffffffff9104d1fe : mount_fs+0x3e/0x1b0 [kernel]
 0xffffffff9106b377 : vfs_kern_mount+0x67/0x110 [kernel]
 0xffffffff9106b454 : vfs_submount+0x34/0x40 [kernel]
 0xffffffffc05ab014 : nfs_do_submount+0xb4/0x160 [nfs]
 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
 0xffffffffc05ab21a : nfs_d_automount+0xca/0x1c0 [nfs]
 0xffffffff91053a69 : follow_managed+0x149/0x310 [kernel]
 0xffffffff910549ad : lookup_fast+0x12d/0x230 [kernel]
 0xffffffff91058483 : link_path_walk+0x1b3/0x8b0 [kernel]
 0xffffffff9105878b : link_path_walk+0x4bb/0x8b0 [kernel]
 0xffffffff91058cea : path_lookupat+0x7a/0x8b0 [kernel]
 0xffffffff9105954b : filename_lookup+0x2b/0xc0 [kernel]
 0xffffffff9105b497 : user_path_at_empty+0x67/0xc0 [kernel]
 0xffffffff9105b501 : user_path_at+0x11/0x20 [kernel]
 0xffffffff9104e1e3 : vfs_fstatat+0x63/0xc0 [kernel]
-------
 0x7fd54c693275 : __xstat64+0x15/0x50 [/usr/lib64/libc-2.17.so]
 0x408841 : gobble_file.constprop.50+0x8b1/0xeb0 [/usr/bin/ls]
 0x402e41 : main+0x531/0x1a14 [/usr/bin/ls]
 0x7fd54c5c6505 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.17.so]
 0x40434d : _start+0x29/0x2c [/usr/bin/ls]
-------
Yes, it is. So the fsinfo() call is triggered when ls stat()s the sub-mount and nfs_d_automount() has to mount it again.
btw: just running ls is enough to trigger it once the mount is no longer cached, or better, unmount it first:
# umount /mnt/...
# ls -l /mnt/... >/dev/null

# stap -g -d/bin/ls --ldd --all-modules ./nfs4_do_fsinfo.stp
Fri Nov 22 10:37:17 2019 nfs4_do_fsinfo proc: ls[29995] ppid: 13019 args: ls --color=auto -l /mnt/... cwd: /... uid: 0
 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
 0xffffffffc05924e0 : nfs_probe_fsinfo+0xc0/0x660 [nfs]
 0xffffffffc0592b43 : nfs_clone_server+0xc3/0x1f0 [nfs]
 0xffffffffc059f098 : nfs_xdev_mount+0x98/0x130 [nfs]
 0xffffffff9104d1fe : mount_fs+0x3e/0x1b0 [kernel]
 0xffffffff9106b377 : vfs_kern_mount+0x67/0x110 [kernel]
 0xffffffff9106b454 : vfs_submount+0x34/0x40 [kernel]
 0xffffffffc05ab014 : nfs_do_submount+0xb4/0x160 [nfs]
 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
 0xffffffffc05ab21a : nfs_d_automount+0xca/0x1c0 [nfs]
 0xffffffff91053a69 : follow_managed+0x149/0x310 [kernel]
 0xffffffff910549ad : lookup_fast+0x12d/0x230 [kernel]
 0xffffffff91058ddd : path_lookupat+0x16d/0x8b0 [kernel]
 0xffffffff9105954b : filename_lookup+0x2b/0xc0 [kernel]
 0xffffffff9105b497 : user_path_at_empty+0x67/0xc0 [kernel]
 0xffffffff9105b501 : user_path_at+0x11/0x20 [kernel]
 0xffffffff9104e1e3 : vfs_fstatat+0x63/0xc0 [kernel]
 0xffffffff9104e601 : SYSC_newlstat+0x31/0x60 [kernel]

So what happens here is that Linux will unmount cross-mountpoints after /proc/sys/fs/nfs/nfs_mountpoint_timeout, which on this server has the default value of 500 (seconds). The next time anything triggers mounting a cross-mount, Linux will send an fsinfo() call (nfs4_do_fsinfo()).
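As a side note, the user-space trigger is just any stat()-family lookup that crosses the (now unmounted) sub-mount; a minimal reproducer along the lines of what ls does (the path below is hypothetical) would be:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
        struct stat st;
        /* Hypothetical path: an NFS cross-mount that has been unmounted
         * (or expired); the lookup re-mounts it via nfs_d_automount(),
         * which ends up sending FSINFO (nfs4_do_fsinfo()). */
        if (stat("/mnt/sub/dir", &st) != 0)
                perror("stat");
        return 0;
}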
For now, assuming that nfs4_do_fsinfo() only gets called when mounting a cross-mount (not necessarily true), to test a possible workaround let's modify Linux so it won't unmount sub-mounts for 7 days - on the Linux client:
# bc
60*60*24*7
604800

# echo 604800 > /proc/sys/fs/nfs/nfs_mountpoint_timeout

# date
Fri 22 Nov 12:23:27 GMT 2019
Also running:
# stap -g --ldd --all-modules --suppress-handler-errors -o /var/tmp/nfs4_do_fsinfo.log ./nfs4_do_fsinfo.stp  &
[6] 10295

I've managed to reproduce the issue.
# cat ./nfs4_renew_state.stp
#!/bin/stap


/*
 * module("nfsv4").function("nfs4_renew_state@fs/nfs/nfs4renewd.c:55") $work:struct work_struct* $clp:struct nfs_client* $renew_flags:unsigned int $__func__:char[] const
*/

probe  module("nfsv4").function("nfs4_renew_state") {

        printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(), 
               $clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname))
}
# ./nfs4_renew_state.stp | grep xxx
Fri Nov 22 15:40:41 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
Fri Nov 22 15:41:42 2019 nfs4_renew_state least_time: 90 secs_to_renew: 5 server: xxx
Fri Nov 22 15:42:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
A few seconds before 15:40:41+60s I did:
# umount /mnt/...; ls -l /mnt/.../
ls: cannot open directory /mnt/.../: Permission denied
So now Linux is not going to send RENEW, assuming its lease got extended.
At 15:40:41+90s the Solaris server will assume the client's lease has expired, so if we call open() between 15:42:11 and 15:42:36 the open should fail:
$ date;strace -e trace=open cat /mnt/.../xxx >/dev/null
Fri 22 Nov 15:42:13 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../xxx", O_RDONLY) = -1 EIO (Input/output error)
...

$ date;strace -e trace=open cat /mnt/.../xxx >/dev/null
Fri 22 Nov 15:42:38 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../xxx", O_RDONLY) = 3
+++ exited with 0 +++
And it did fail, while the re-run after Linux sent RENEW worked.
Let's confirm Solaris did respond with 10011:
# ggrep "Nov 22 .*-done.*10.210.16.12" /var/tmp/nfs4-renew.log | grep "ret: 10011" | tail
...
2019 Nov 22 15:42:36 op-renew-done noi_xid: 2258385869 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 22 15:42:36 op-renew-done noi_xid: 2275163085 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds

# grep 15:42:13 /var/tmp/nfs4_all_ops.log | grep 10011
2019 Nov 22 15:42:13 op-open-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
It did.

I've also managed to reproduce the same issue against a Linux NFS server.
# cat ./nfs4_renew_state.stp
#!/bin/stap


/*
 * module("nfsv4").function("nfs4_renew_state@fs/nfs/nfs4renewd.c:55") $work:struct work_struct* $clp:struct nfs_client* $renew_flags:unsigned int $__func__:char[] const
*/

probe  module("nfsv4").function("nfs4_renew_state") {

        printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(), 
               $clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname))
}
# ./nfs4_renew_state.stp | grep xxx
Mon Nov 25 13:49:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 40 server: xxx
Mon Nov 25 13:50:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
Mon Nov 25 13:51:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 3 server: xxx
Mon Nov 25 13:52:33 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
Mon Nov 25 13:52:38 2019 nfs4_renew_state least_time: 90 secs_to_renew: 65 server: xxx
Mon Nov 25 13:53:38 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
^C
At 13:51:32 (4 seconds before the client was due to send the RENEW operation) I ran:
# date; umount /mnt/...; ls -l /mnt/.../
Mon 25 Nov 13:51:32 GMT 2019
total 7
-rw-r--r-- 1 root root 5 Nov 25 13:48 test_f1
Then at 13:52:08, 2 seconds after the client's lease expired from the server's point of view (13:50:36 + 90s = 13:52:06), I ran:
$ date;strace -e trace=open cat /mnt/.../test_f1 >/dev/null
Mon 25 Nov 13:52:08 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../test_f1", O_RDONLY) = -1 EIO (Input/output error)
...
Then, at 13:52:33 (13:51:36 + 57s), the client sent the RENEW operation and had to recover via SETCLIENTID, etc., and now open() works fine:
$ date;strace -e trace=open cat /mnt/.../test_f1 >/dev/null
Mon 25 Nov 13:53:39 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../test_f1", O_RDONLY) = 3
+++ exited with 0 +++
The Linux NFS server behaves the same as the Solaris NFS server.
The Linux server is CentOS 7.4 running kernel 3.10.0-693.11.6.el7.x86_64.
Also reproduced against a Solaris 11.4.15.5.0 NFS server.

A workaround on Linux nfs client hosts is to increase sysctl fs.nfs.nfs_mountpoint_timeout from 500 (the default) to the maximum allowed value: 2147483 (almost 25 days).

I couldn't reproduce the issue against NetApp filers though; some clarification from NetApp support:
Hi Robert,
 
I have gone through this case and your request for explanation on why ONTAP is not throwing “NFS4ERR_EXPIRED” when the lease is expired while other servers does that.
 
Why the Client did not send the RENEW call is client’s behavior and ONTAP cannot control that.
Now on why ONTAP is not throwing “NFS4ERR_EXPIRED” even after lease is expired (30 secs), I believe the RFC has the answer for that.
 
This section explains the behavior of other servers you have tested.
 
https://tools.ietf.org/html/rfc7530#section-9.6.3
9.6.3.  Network Partitions and Recovery
If this occurs, the server may cancel
the lease and free all locks held for the client.
 
The following section explains the ONTAP’s behavior.
 
https://tools.ietf.org/html/rfc7530#section-9.6.3.1
9.6.3.1.  Courtesy Locks
As a courtesy to the client or as an optimization, the server may
continue to hold locks, including delegations, on behalf of a client
for which recent communication has extended beyond the lease period,
delaying the cancellation of the lease.
 
This section clarifies that both the behaviors can be implemented.
 
https://tools.ietf.org/html/rfc7530#section-9.6.3.2
9.6.3.2.  Lease Cancellation
 
As a result of lease expiration, leases may be canceled, either
immediately upon expiration or subsequently, depending on the
occurrence of a conflicting lock or extension of the period of
partition beyond what the server will tolerate.
 
I believe ONTAP is behaving according to the RFC and in more optimized way.
In that case, please let me know if you need any further explanation from our side on the same.
...

Hey Robert,
 
We tested in the lab thoroughly to check for how long we keep the courtesy locks.
 
The courtesy locks will be kept for 2*lease time, if there is no conflicting request from any other client for the same file.
Please note that it might take some more time (40-50 secs) to clear those states from memory as the thread runs periodically to clean that.
 
Yes, they are tunable. As the time depends on lease-time, you can tweak that for a particular NFS server to control the courtesy lock time.
...
Knowing now more about how ONTAP behaves, I re-created the issue against it as well, although the behavior is different.
It is easiest to trigger nfs4_do_fsinfo() every 10s, as this permanently stops the Linux client from sending RENEW operations.
# while [ 1 ] ; do date; umount /mnt/...; ls -l /mnt/.../; sleep 10; done
A few minutes later I ran:
$ date; strace -e trace=open cat /mnt/.../f1
Fri  6 Dec 10:52:53 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../f1", O_RDONLY) = 3
test
+++ exited with 0 +++
Notice that the open did not fail. Network dump:
 34 3.656314120 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
 35 3.861180170 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 34) OPEN Status: NFS4ERR_STALE_CLIENTID
 36 3.861352838 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x5b6f
 37 4.066210651 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 36) RENEW Status: NFS4ERR_STALE_CLIENTID
 38 4.066357237 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID
 39 4.271134903 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 38) SETCLIENTID
 40 4.271256568 10.210.16.12 -> 10.210.196.134 NFS 238 V4 Call SETCLIENTID_CONFIRM
 41 4.476085500 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 40) SETCLIENTID_CONFIRM
 44 4.476296996 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
 49 4.681334155 10.210.196.134 -> 10.210.16.12 NFS 470 V4 Reply (Call In 44) OPEN StateID: 0x900a
 50 4.683485544 10.210.16.12 -> 10.210.196.134 NFS 258 V4 Call READ StateID: 0x900a Offset: 0 Len: 5
 53 4.888349784 10.210.196.134 -> 10.210.16.12 NFS 166 V4 Reply (Call In 50) READ
 54 4.888516437 10.210.16.12 -> 10.210.196.134 NFS 266 V4 Call CLOSE StateID: 0x900a
 56 5.093391513 10.210.196.134 -> 10.210.16.12 NFS 230 V4 Reply (Call In 54) CLOSE
So ONTAP responds with NFS4ERR_STALE_CLIENTID instead of NFS4ERR_EXPIRED.
I think this is handled by fs/nfs/nfs4proc.c: nfs4_do_handle_exception()
Let's confirm it:
# cat ./nfs4_do_handle_exception.stp
#!/bin/stap


// /bin/stap -g -d /bin/ls --ldd --all-modules


/*
 * module("nfsv4").function("nfs4_do_handle_exception@fs/nfs/nfs4proc.c:383") $server:struct nfs_server* $errorcode:int $exception:struct nfs4_exception* $__func__:char[] const
*/


probe  module("nfsv4").function("nfs4_do_handle_exception") {

        printf("%s %s proc: %s[%d] ppid: %d args: %s cwd: %s uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), 
               pid(), ppid(), cmdline_str(), fullpath_struct_path(task_cwd_path(task_current())), uid())
        print_backtrace()
        printf("-------\n")
        //print_ubacktrace()
        //printf("-------\n")
}

# ./nfs4_do_handle_exception.stp --all-modules -o /var/tmp/nfs4_do_handle_exception.out &
$ date; strace -e trace=open cat /mnt/.../f1
Fri  6 Dec 11:28:32 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../f1", O_RDONLY) = 3
test
+++ exited with 0 +++
 24 11:28:32.419449800 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x49b41796, [Check: RD LU MD XT DL]
 25 11:28:32.624732095 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 24) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
 27 11:28:32.625066507 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x951a2495, [Check: RD LU MD XT DL]
 28 11:28:32.830312285 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 27) ACCESS, [Allowed: RD LU MD XT DL]
 29 11:28:32.830765027 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x6d8f1235, [Check: RD LU MD XT DL]
 30 11:28:33.046220915 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 29) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
 31 11:28:33.046554768 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
 32 11:28:33.251846761 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 31) OPEN Status: NFS4ERR_STALE_CLIENTID
 33 11:28:33.252467673 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x3e59
 34 11:28:33.457711654 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 33) RENEW Status: NFS4ERR_STALE_CLIENTID
 35 11:28:33.457873262 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID
 36 11:28:33.663124308 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 35) SETCLIENTID
 37 11:28:33.663262733 10.210.16.12 -> 10.210.196.134 NFS 238 V4 Call SETCLIENTID_CONFIRM
 38 11:28:33.868584648 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 37) SETCLIENTID_CONFIRM
 41 11:28:33.868821676 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
 46 11:28:34.074306487 10.210.196.134 -> 10.210.16.12 NFS 470 V4 Reply (Call In 41) OPEN StateID: 0x985c
 47 11:28:34.074508377 10.210.16.12 -> 10.210.196.134 NFS 258 V4 Call READ StateID: 0x985c Offset: 0 Len: 5
 50 11:28:34.279740081 10.210.196.134 -> 10.210.16.12 NFS 166 V4 Reply (Call In 47) READ
 51 11:28:34.279909666 10.210.16.12 -> 10.210.196.134 NFS 266 V4 Call CLOSE StateID: 0x985c
 53 11:28:34.485207915 10.210.196.134 -> 10.210.16.12 NFS 230 V4 Reply (Call In 51) CLOSE
# less /var/tmp/nfs4_do_handle_exception.out
...
-------
Fri Dec  6 11:28:32 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /...w uid: 14100
 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
 0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4]
 0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs]
 0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs]
 0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel]
 0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel]
 0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel]
 0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel]
 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
 0x7f6b0d228710
-------
Fri Dec  6 11:28:32 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100
 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
 0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4]
 0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs]
 0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs]
 0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel]
 0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel]
 0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel]
 0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel]
 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
 0x7f6b0d228710
-------
Fri Dec  6 11:28:33 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100
 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
 0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4]
 0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs]
 0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs]
 0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel]
 0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel]
 0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel]
 0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel]
 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
 0x7f6b0d228710
-------
Fri Dec  6 11:28:33 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100
 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
 0xffffffffc05e0296 : nfs4_do_open+0x2c6/0x630 [nfsv4]
 0xffffffffc05e06f7 : nfs4_atomic_open+0xf7/0x110 [nfsv4]
 0xffffffffc0597317 : nfs_atomic_open+0x197/0x3a0 [nfs]
 0xffffffff91057623 : do_last+0xa53/0x1290 [kernel]
 0xffffffff91059c5d : path_openat+0xcd/0x5a0 [kernel]
 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
 0x7f6b0d228710
-------
The nfs4_do_open() function does end up calling nfs4_do_handle_exception(); the relevant code is:
static int nfs4_do_handle_exception(struct nfs_server *server,
                int errorcode, struct nfs4_exception *exception)
{
...
                case -NFS4ERR_EXPIRED:
                case -NFS4ERR_BAD_STATEID:
                        if (inode != NULL && stateid != NULL) {
                                nfs_inode_find_state_and_recover(inode,
                                                stateid);
                                goto wait_on_recovery;
                        }
...
                case -NFS4ERR_STALE_STATEID:
                case -NFS4ERR_STALE_CLIENTID:
                        nfs4_schedule_lease_recovery(clp);
                        goto wait_on_recovery;
...
In another function, though, both errors are handled the same way:
@@ -5960,6 +5962,8 @@ static void nfs4_release_lockowner_done(struct rpc_task *task, void *calldata)
                break;
        case -NFS4ERR_STALE_CLIENTID:
        case -NFS4ERR_EXPIRED:
+               nfs4_schedule_lease_recovery(server->nfs_client);
+               break;
Not sure why it is handled differently in nfs4_do_handle_exception().
This was changed by:
commit 272289a3df7297fd798a8ddf652c6d29f675de28
Author: Trond Myklebust
Date:   Thu Sep 22 13:39:15 2016 -0400

    NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid

    If we're not yet sure that all state has expired or been revoked, we
    should try to do a minimal recovery on just the one stateid.

    Signed-off-by: Trond Myklebust
    Tested-by: Oleg Drokin
    Signed-off-by: Anna Schumaker

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 389089e..409b9fa 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -397,13 +397,23 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
        exception->delay = 0;
        exception->recovering = 0;
        exception->retry = 0;
+
+       if (stateid == NULL && state != NULL)
+               stateid = &state->stateid;

The new NFSv4 client behavior in Linux was introduced by https://github.com/torvalds/linux/commit/83ca7f5ab31f57506bb35482e4b0426be653196a. This was the first time the Linux client assumed implicit lease renewal when calling fsinfo() (getattrs for mounts). The code was later refactored a little, but the latest upstream version still did it (at the time of debugging this issue).


The two patches below, which I contributed to the Linux kernel, are now also present in CentOS 8.3 with kernel 4.18.0-240.1.1.el8_3 and newer. https://github.com/torvalds/linux/commit/924491f2e476f7234d722b24171a4daff61bbe13
NFSv4: try lease recovery on NFS4ERR_EXPIRED

Currently, if an nfs server returns NFS4ERR_EXPIRED to open(),
we return EIO to applications without even trying to recover.

Fixes: 272289a ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid")
Signed-off-by: Robert Milkowski
Reviewed-by: Trond Myklebust
Signed-off-by: Anna Schumaker
https://github.com/torvalds/linux/commit/7dc2993a9e51dd2eee955944efec65bef90265b7
NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals

Currently, each time nfs4_do_fsinfo() is called it will do an implicit
NFS4 lease renewal, which is not compliant with the NFS4 specification.
This can result in a lease being expired by an NFS server.

Commit 83ca7f5 ("NFS: Avoid PUTROOTFH when managing leases")
introduced implicit client lease renewal in nfs4_do_fsinfo(),
which can result in the NFSv4.0 lease to expire on a server side,
and servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.

This can easily be reproduced by frequently unmounting a sub-mount,
then stat'ing it to get it mounted again, which will delay or even
completely prevent client from sending RENEW operations if no other
NFS operations are issued. Eventually nfs server will expire client's
lease and return an error on file access or next RENEW.

This can also happen when a sub-mount is automatically unmounted
due to inactivity (after nfs_mountpoint_expiry_timeout), then it is
mounted again via stat(). This can result in a short window during
which client's lease will expire on a server but not on a client.
This specific case was observed on production systems.

This patch removes the implicit lease renewal from nfs4_do_fsinfo().

Fixes: 83ca7f5 ("NFS: Avoid PUTROOTFH when managing leases")
Signed-off-by: Robert Milkowski
Signed-off-by: Anna Schumaker