Mailing List Archive



Re: [tlug] network losing connection



Matt Gross wrote:
On Tue, 2007-02-27 at 10:54 +0100, Sigurd Urdahl wrote:
Matt Gross wrote:
Hi Everyone,

I am having a lot of problems with my Fedora Core 6 server connection
[..]
If it's an ethernet problem you should be able to see it in the output from ifconfig:

sigurdur@??:~$ /sbin/ifconfig
here is mine:

[root@?? ~]# ifconfig
eth1 Link encap:Ethernet HWaddr 00:16:17:B7:D9:03
OK, then we can rule that one out.
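For reference, the things to look for in that output are the error counters on the RX and TX lines. Here is a rough sketch that prints any non-zero counters. It assumes the older net-tools ifconfig layout ("RX packets:N errors:N dropped:N ..."), and the function name ifconfig_errors is just made up for illustration:

```shell
# ifconfig_errors: read ifconfig output on stdin and print every non-zero
# error-type counter from the "RX packets:" / "TX packets:" lines.
# Assumes the old net-tools "name:value" layout; newer ifconfig versions
# format these lines differently and would need another pattern.
ifconfig_errors() {
    awk '/(RX|TX) packets:/ {
        dir = $1                          # "RX" or "TX"
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")            # e.g. "errors:3" -> kv[1]="errors", kv[2]="3"
            if (kv[1] != "packets" && kv[2] + 0 > 0)
                print dir, kv[1], kv[2]
        }
    }'
}

# Example (not run here): /sbin/ifconfig eth1 | ifconfig_errors
```

No output means every counter on those two lines is zero.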
Is the problem noticeable between computers on the same LAN, or only through a broadband router or something like that?
Could you describe your network layout? What components are installed? Is the traffic passing through simple hubs, switches or a router?

If possible, it might be an idea to install something like smokeping on both the server and another box on the same net, and then test connectivity between these two boxes, between each of them and your default gateway, and from each to some common point outside your net. Make sure you use IPs when testing, not hostnames, so that name resolution is not a factor. (This actually applies to the other tests you do too, e.g. arp -n, tcpdump -n.)
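Until smokeping is set up, a quick manual version of the same test could look something like the sketch below. It assumes the iputils ping (for -n, -c and -W), and check_hosts is only an illustrative name. Pass it IPs, not hostnames:

```shell
# check_hosts: ping each given IP once and report whether it answered.
#   -n   numeric output only, no reverse DNS lookups
#   -c 1 send a single echo request
#   -W 2 wait at most 2 seconds for the reply (iputils ping)
check_hosts() {
    for ip in "$@"; do
        if ping -n -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
        fi
    done
}

# Example (not run here): check_hosts 192.168.0.254
```

Running it from both the server and a second box on the same net, against the other box, the gateway and one outside IP, should show which leg drops out when the problem occurs.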
If the former, try setting static arp entries on both servers to see if it's an arp related problem.

I don't think it is a problem with the arp cache. After the problem happened, the arp cache is still showing a correct entry:

[root@?? ~]# arp -a
? (192.168.0.254) at 00:90:CC:42:71:58 [ether] on eth1

Does the question mark above signify anything important?

It indicates that arp was not able to resolve a hostname for 192.168.0.254. That could be significant, but not necessarily. (It seems arp -n gives the same output.)
I did notice that when I am having the problem, the following takes
several seconds to complete:

[root@?? ~]# arp -va
? (192.168.0.254) at <incomplete> on eth1
Entries: 1 Skipped: 0 Found: 1
This indicates that you don't have an arp entry for 192.168.0.254, and thus you are unable to send traffic there. That is what you would expect if your network connection is down, if the network is down at 192.168.0.254's end, or if 192.168.0.254 itself is down or so overworked that it won't answer your ARP requests.
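If you want to log when this happens, here is a small sketch that classifies the entry for one IP from "arp -an"-style output. The name arp_state and the three labels are invented for this example:

```shell
# arp_state: read `arp -an`-style lines on stdin and classify the entry for
# the IP given as $1 as "resolved", "incomplete" or "missing".
arp_state() {
    ip="$1"
    # Fixed-string match on "(IP)" so the dots are not treated as wildcards.
    line=$(grep -F "($ip)" | head -n 1) || true
    case "$line" in
        "")               echo "missing" ;;
        *"<incomplete>"*) echo "incomplete" ;;
        *)                echo "resolved" ;;
    esac
}

# Example (not run here): arp -an | arp_state 192.168.0.254
```

Running that from cron or a loop with a timestamp would show whether the entry goes incomplete at the same moments the connection drops.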
I noticed the <incomplete> and tried to add the arp entry manually, but
that did not fix my problem.
I would assume that the problem then is that you actually lose the connection between the server and 192.168.0.254. The next question then is whether it's the server or 192.168.0.254 that loses the connection. If you can test from another box too it would probably be useful.

Another thing that strikes me: you say the problem shows up when you have high traffic volumes, and somewhere in this thread you indicate that you have Gbit interfaces. _If_ 192.168.0.254 is a (cheap) broadband router (it has a very typical default gateway IP :-) it might very well start sweating if your server feeds it data at Gbit speed while it can only get rid of the data at normal broadband connection speed. Could you try to force the network interface on the server down to 100 Mbit/FD?
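One way to do that is with ethtool, roughly as sketched below. This assumes ethtool is installed, that the interface is eth1 as in your ifconfig output, and that the driver accepts these settings. The wrapper name force_100fd is only for illustration:

```shell
# force_100fd: force the given interface to 100 Mbit/s full duplex with
# autonegotiation switched off. Needs root. The setting does not survive a
# reboot or an interface restart.
force_100fd() {
    ethtool -s "$1" speed 100 duplex full autoneg off
}

# Example (not run here):
#   force_100fd eth1
#   ethtool eth1                  # check the reported Speed and Duplex
#   ethtool -s eth1 autoneg on    # go back to autonegotiation afterwards
```

Be aware that with autonegotiation off on only one side, the other end may fall back to half duplex, so check for a duplex mismatch (rising error or collision counters) after the change.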

kind regards,
-sig

--
Sigurd Urdahl
Linux, goofing, cooking, making fire, computer security, having a
beer. Give me good music.


