Mailing List Archive



Re: [tlug] strange nfs crashes



Hello Michal,

At the last TLUG meeting I suggested similar testing to somebody else :-)

On Thu, Apr 15, 2010 at 04:25, Michal Hajek <hajek1@example.com> wrote:
> I have several Linux machines which mount an NFS directory from yet
> another Linux machine (the server). All worked fine in the office. Later
> the machines were moved to the server room and connected to a different
> switch. After that change, NFS started crashing/freezing.
> I have run out of ideas about what could have gone wrong.
>
> The only apparent change was the switch and thus the network speed: in
> the office we used 100Mb/s, and the server room has a 1Gb/s switch.
>
> Other services, like ssh or vnc, are not affected. Both work very well.
> The only problem seems to be nfs.
>
> I have checked logs on server and clients too. Nothing suspicious.
>
> One thing I may mention is that I have set MTU on the clients to be
> 1500. But I do not know if that may cause any troubles with new switch,
> since I do not have any experience with switches whatsoever.
>
> Unfortunately, the switch is not mine and I cannot access it. Thus I
> would like to find out some more convincing argument that the problem
> actually is the switch (or not).
>
> I do not rule out other possible sources of NFS misbehaviour; if you have
> suggestions on how to troubleshoot it, all are welcome. I should also
> mention that NFS typically freezes after several hours (2 or 3) of
> usage. Freezing means that the NFS directory on the clients is not
> accessible (i.e. no response to ls).
>
> Could you please suggest some ideas how to investigate this peculiar
> problem?

First, comparing NFS to ssh is like comparing bikes and trucks on a
highway. If ssh works, that only tells you your TCP/IP layer is fine for
(usually) small bursts of data over TCP.

NFS usually uses UDP. As a first step, you can try switching it to TCP.
You might lose some performance, but you'll gain reliability. See
`man nfs` for details; the short answer is to mount with `-o proto=tcp`.
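After remounting, it is worth confirming that the client really switched transports. Below is a minimal Python sketch, assuming a Linux client where /proc/mounts lists the options in effect for each NFS mount, including `proto=` (current kernels print it; if it is absent the sketch reports "unknown"). The function name `nfs_mount_protocols` is made up for illustration.

```python
from typing import Dict


def nfs_mount_protocols(mounts_text: str) -> Dict[str, str]:
    """Map each NFS mount point to the transport named in its proto= option.

    Expects text in /proc/mounts format: device, mount point, fs type,
    options, dump, pass, separated by whitespace (spaces inside paths
    are escaped as \\040 by the kernel, so split() is safe).
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue
        proto = "unknown"  # used when the kernel does not report proto=
        for opt in options.split(","):
            if opt.startswith("proto="):
                proto = opt.split("=", 1)[1]
        result[mountpoint] = proto
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            for mnt, proto in nfs_mount_protocols(f.read()).items():
                print(f"{mnt}: {proto}")
    except FileNotFoundError:
        print("/proc/mounts not found (not a Linux system?)")
```

Run it on each client after the remount. Every NFS mount point should report `tcp`.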

This will probably solve your problem, but it will not find the culprit.
If you want to blame the switch, you need a packet log of what goes into
it on one side and what comes out on the other. So, sync the time (with
ntp) on both ends, then run tcpdump/tshark on both machines and log
all traffic until the freeze occurs.

tcpdump -i <interface> -s 65535 -w <some-file>

Then check whether everything sent in each direction actually arrives
at the other end. Check for retransmitted packets as well. I think there
was a good utility for comparing the packet captures of the sending and
receiving ends, but I cannot remember its name :-| While searching for it
I found pcapdiff, which looks like it will do the job:
http://www.eff.org/testyourisp/pcapdiff
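The core idea behind such a comparison can be sketched in a few lines of Python. The sketch makes several assumptions. It reads only classic libpcap files. That is what `tcpdump -w` writes, but recent tshark versions default to pcapng, so add `-F pcap` there. It also assumes untagged Ethernet and IPv4. Each packet is keyed on source, destination, IP ID, fragment field, protocol and a hash of its payload. MAC addresses, TTL and the header checksum are deliberately ignored, because they can legitimately change in transit. The script and function names are hypothetical.

```python
import hashlib
import socket
import struct
import sys
from collections import Counter
from typing import Iterator, Optional, Tuple

ETHERTYPE_IPV4 = 0x0800
Key = Tuple[bytes, bytes, int, int, int, str]


def read_pcap(path: str) -> Iterator[bytes]:
    """Yield raw frames from a classic libpcap file (not pcapng)."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("truncated pcap global header")
        magic = struct.unpack("<I", header[:4])[0]
        if magic in (0xA1B2C3D4, 0xA1B23C4D):    # little-endian file (usec/nsec)
            endian = "<"
        elif magic in (0xD4C3B2A1, 0x4D3CB2A1):  # big-endian file
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        if struct.unpack(endian + "I", header[20:24])[0] != 1:
            raise ValueError("only Ethernet captures (linktype 1) are handled")
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                return
            _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            yield f.read(incl_len)


def packet_key(frame: bytes) -> Optional[Key]:
    """Identify an IPv4 packet independently of MACs, TTL and checksum."""
    if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4
    total_len, ip_id, frag = struct.unpack("!HHH", ip[2:8])
    payload = ip[ihl:total_len]  # drop any Ethernet padding after the packet
    return (ip[12:16], ip[16:20], ip_id, frag, ip[9],
            hashlib.sha1(payload).hexdigest())


def missing_packets(sent_path: str, received_path: str) -> Counter:
    """Packets present in the sender's capture but absent from the receiver's."""
    sent = Counter(k for k in map(packet_key, read_pcap(sent_path)) if k)
    received = Counter(k for k in map(packet_key, read_pcap(received_path)) if k)
    return sent - received  # Counter subtraction keeps only positive counts


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: pcap_missing.py SENDER.pcap RECEIVER.pcap")
    else:
        lost = missing_packets(sys.argv[1], sys.argv[2])
        for (src, dst, ip_id, _frag, proto, _digest), n in lost.items():
            print(f"{socket.inet_ntoa(src)} -> {socket.inet_ntoa(dst)} "
                  f"id={ip_id} proto={proto} missing x{n}")
        print(f"{sum(lost.values())} packet(s) seen on the sender but not the receiver")
```

Run it once per direction, swapping the arguments. Packets sent after the receiver's capture stopped will also show up as missing, so look at the whole run rather than the tail. Captures taken on the sending host with segmentation offload enabled can also contain large TCP segments that never cross the wire in that form, so expect false positives in that case.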

Opening the captures in Wireshark will let you go through them manually.
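For the retransmission check, Wireshark's `tcp.analysis.retransmission` display filter already flags retransmissions over TCP. Over UDP, one NFS-level signal is an RPC call that goes out more than once with the same transaction ID (XID). An ONC RPC client normally reuses the XID when it resends a call that got no reply. Below is a minimal Python sketch. It assumes you have already extracted the UDP payloads to and from port 2049 (that step is not shown). It also assumes each payload begins with the RPC header, which holds for UDP but not TCP, where a 4-byte record marker comes first. The function name `retransmitted_xids` is made up for illustration.

```python
import struct
from collections import Counter
from typing import Dict, Iterable

RPC_CALL = 0  # ONC RPC msg_type: 0 = CALL, 1 = REPLY


def retransmitted_xids(udp_payloads: Iterable[bytes]) -> Dict[int, int]:
    """Return {xid: times_sent} for RPC CALLs seen more than once.

    Over UDP an ONC RPC message starts with the 4-byte XID followed by the
    4-byte message type, both big-endian. A repeated XID among CALLs means
    the client resent a request because no reply arrived in time.
    """
    calls = Counter()
    for payload in udp_payloads:
        if len(payload) < 8:
            continue  # too short to hold an RPC header
        xid, msg_type = struct.unpack("!II", payload[:8])
        if msg_type == RPC_CALL:
            calls[xid] += 1
    return {xid: n for xid, n in calls.items() if n > 1}
```

If the count of retransmitted calls climbs in the hours before a freeze, the replies (or the calls themselves) are getting lost somewhere between the two hosts.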

Try and let us know (if) you had (any) success ;-)

Cheers,
Kalin.


