Hi everyone,

After we bought a Dell PowerEdge R610 server, it suffered constant outages during which it did not respond at all for 1 to 2 minutes and nothing was written to any log file. After a service repair that replaced the motherboard, the problem seemed to be solved and nothing showed up during testing. However, once the server came under load, the outages reappeared, this time "only" 2 to 3 a day.

The Dell technician found no hardware fault (but he found none the first time either, and replacing the motherboard almost fixed the problem). He believes the fault is in software, yet I run this same version of Ubuntu in the same configuration on 5 other Dell servers without any problems. Please point me toward a way to find and fix the fault, or to prove to the technician that it is a hardware fault.

Zabbix monitoring receives no data from the server during an outage. Basically, during an outage the server is completely dead: it is unreachable from the internet, over ssh and over KVM, and it writes to no files. I am attaching excerpts from syslog and sar from the time of an outage:
syslog:
Nov 27 16:54:44 s6 postfix/cleanup[23475]: 706727820CD: message-id=<000701ca6f7a$16894ab0$4a37b6d2@nbnet.nb.ca>
Nov 27 16:54:44 s6 postfix/smtpd[14249]: 5DA4678213C: client=189-11-20-179.mganm702.dsl.brasiltelecom.net.br[189.11.20.179]
Nov 27 16:54:44 s6 postfix/qmgr[12859]: 706727820CD: from=<rclundoo@nbnet.nb.ca>, size=861, nrcpt=1 (queue active)
Nov 27 16:54:45 s6 postfix/smtpd[2892]: disconnect from unknown[189.238.15.3]
Nov 27 16:54:49 s6 postfix/smtpd[14382]: connect from unknown[190.66.169.227]
Nov 27 16:54:50 s6 postfix/smtpd[14382]: setting up TLS connection from unknown[190.66.169.227]
Nov 27 16:56:02 s6 postfix/cleanup[23570]: 5DA4678213C: message-id=<000701ca6f78$9b5ad3a0$ae78a432@acapacific.com.sg>
Nov 27 16:56:02 s6 postfix/smtpd[14391]: warning: 78.128.29.46: hostname ip-30-46.powernet.bg verification failed: Name or service not known
Nov 27 16:56:02 s6 postfix/smtpd[14391]: connect from unknown[78.128.29.46]
Nov 27 16:56:02 s6 imapd: Connection, ip=[::ffff:127.0.0.1]
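The silence in the excerpt above (nothing logged between 16:54:50 and 16:56:02) is how I recognise an outage. To find all such windows in a full syslog, something like the following minimal Python sketch could be used. The function name find_gaps, the 60-second threshold and the year 2009 are my own assumptions (classic syslog timestamps carry no year), so treat it as an illustration rather than a finished tool:

```python
from datetime import datetime, timedelta


def find_gaps(lines, year=2009, threshold=timedelta(seconds=60)):
    """Return (previous, current) timestamp pairs that are further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        # Classic syslog prefix such as "Nov 27 16:54:44" is always 15 characters.
        # The year is an assumption passed in by the caller.
        try:
            current = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a parsable timestamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Three lines taken from the syslog excerpt above.
sample = [
    "Nov 27 16:54:49 s6 postfix/smtpd[14382]: connect from unknown[190.66.169.227]",
    "Nov 27 16:54:50 s6 postfix/smtpd[14382]: setting up TLS connection from unknown[190.66.169.227]",
    "Nov 27 16:56:02 s6 postfix/smtpd[14391]: connect from unknown[78.128.29.46]",
]

for before, after in find_gaps(sample):
    seconds = int((after - before).total_seconds())
    print(f"no log entries from {before:%H:%M:%S} to {after:%H:%M:%S} ({seconds} s)")
```

On a busy mail server a 60-second threshold seems reasonable, because under normal load postfix logs something every few seconds; on a quiet machine it would produce false positives.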
sar (sysstat):
# sar -s 16:50:00 -e 17:00:00
Linux 2.6.24-25-server (www.priklad.cz) 11/27/09
16:50:01  CPU  %user  %nice  %system  %iowait  %steal  %idle
16:51:01  all   4.50   0.03     2.91     0.05    0.00  92.52
16:52:01  all  19.37   0.16    30.32     0.04    0.00  50.11
16:53:01  all  22.21   0.08    32.17     0.08    0.00  45.46
16:54:01  all  24.56   0.15    37.35     0.04    0.00  37.90
16:56:02  all  19.00   0.22    28.01     0.06    0.00  52.71
16:57:01  all   8.41   0.03     7.17     0.07    0.00  84.32
16:58:01  all   5.20   0.02     2.67     0.10    0.00  92.01
16:59:01  all   6.79   0.03     3.50     0.04    0.00  89.63
Average:  all   9.11   0.05     9.40     0.06    0.00  81.38
Thanks for any ideas or pointers.
SupuS