Archive

Posts tagged ‘linux’

ATA bus error SError: { PHYRdyChg DevExch }

June 20, 2011 Comments closed
ATA bus error in /var/log/messages:
SCSI device sdb: 490350672 512-byte hdwr sectors (251060 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
ata1.00: irq_stat 0x00400040, connection status changed
ata1: SError: { PHYRdyChg DevExch }
ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:c4:d1:67:e4/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 490350672 512-byte hdwr sectors (251060 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 104320 blocks.
md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
md: md0: sync done.
md: syncing RAID array md1
RAID1 conf printout:
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 8385856 blocks.
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sda1
 disk 1, wo:0, o:1, dev:sdb1
md: md1: sync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sda2
 disk 1, wo:0, o:1, dev:sdb2
 cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]      
md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]      
md2 : active raid1 sdb3[1] sda3[0]
      236677504 blocks [2/2] [UU]
smartctl -a /dev/sdb
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2502ABYS-01B7A0
Serial Number:    WD-WCAT1C148773
Firmware Version: 02.03B02
User Capacity:    251,059,544,064 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jun 20 03:23:22 2011 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection: (4800) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine 
recommended polling time: (   2) minutes.
Extended self-test routine
recommended polling time: (  59) minutes.
Conveyance self-test routine
recommended polling time: (   5) minutes.
SCT capabilities:       (0x303f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   200   195   021    Pre-fail  Always       -       1000
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16351
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       30
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       36
194 Temperature_Celsius     0x0022   112   099   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        51         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SMART reports the drive as healthy, so the likely cause is a poor-quality SATA cable; the cable may need to be replaced.
ref link:
https://ata.wiki.kernel.org/index.php/Libata_error_messages
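To see how often the link dropped and which SError flags were reported, the kernel log can be summarized with a small shell helper. This is only a sketch: the function name serror_summary is made up for this example, and it assumes the log lines look like the ones pasted above.

```shell
# serror_summary: count each distinct "ataN: SError: { ... }" flag set.
# Hypothetical helper; reads the files given as arguments, or stdin if none.
serror_summary() {
    sed -n 's/.*\(ata[0-9][0-9]*\): SError: { \(.*\) }.*/\1 \2/p' "$@" |
        sort | uniq -c | sort -rn
}
```

Run it as "serror_summary /var/log/messages". A count that keeps growing on the same port after the cable has been replaced would point away from the cable.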
 
Category: linux Tags:

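Passing arguments in the Linux shell environment: use a function, not an alias

June 20, 2011 Comments closed
Example: match a domain name with two arguments, let the rest be filled in automatically to save typing, then ssh to the target host.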
cat .bashrc 
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# update the PATH
export PATH=${PATH}:/usr/sbin:/sbin
# sss <host> <subdomain>: ssh to <host>.<subdomain>.2hei.net
sss() {
  ssh "$1.$2.2hei.net"
}
Usage:
$sss test blog 
The authenticity of host 'test.blog.2hei.net (192.168.1.12)' can't be established.
RSA key fingerprint is 00:45:c8:28:29:cd:a6:50:26:a6:5d:23:a4:fb:10:9a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'test.blog.2hei.net,192.168.1.12' (RSA) to the list of known hosts.
2hei@test.blog.2hei.net's password: 
Last login: Thu Jun  9 06:12:21 2011 from 192.168.1.11
Kickstart-installed Red Hat Linux Wed Sep 15 22:25:51 UTC 2010
$
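A function is needed here because an alias only substitutes text at the start of the command, so its arguments can only be appended at the end, not placed in the middle of a host name. A slightly more defensive variant could look like the sketch below; the helper name sss_host and the usage message are assumptions added for illustration, not part of the original .bashrc.

```shell
# sss_host: print the host name that sss would connect to.
# Hypothetical helper; refuses to build a name unless exactly two
# non-empty labels are given.
sss_host() {
    if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: sss <host> <subdomain>" >&2
        return 1
    fi
    printf '%s.%s.2hei.net\n' "$1" "$2"
}

# sss: same behaviour as the original function, plus the argument check.
sss() {
    local host
    host=$(sss_host "$@") || return 1
    ssh "$host"
}
```

With this version, "sss test" prints the usage line instead of trying to connect to a malformed name such as test..2hei.net.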
Category: linux Tags:

bonding in linux

February 18, 2011 Comments closed

####### bond0 #######
bond0 uses DHCP
$cat ifcfg-bond0
DEVICE=bond0
BOOTPROTO=dhcp
ONBOOT=yes

$cat ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes

$cat ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes

####### bond1 #######
$cat ifcfg-bond1
DEVICE=bond1
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
BONDING_OPTS="mode=1 miimon=80 arp_interval=500 arp_ip_target=192.168.56.1"

$cat ifcfg-eth2
DEVICE=eth2
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond1
SLAVE=yes

$cat ifcfg-eth3
DEVICE=eth3
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond1
SLAVE=yes

#######################
cat /etc/modprobe.conf
add:
alias bond1 bonding
options bond1 miimon=80 mode=1

$modprobe bonding
$sudo /sbin/service network restart
$cat /proc/net/bonding/bond0
$cat /proc/net/bonding/bond1

[root@2hei.net ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 08:00:27:88:8C:E3  
          inet addr:10.100.10.110  Bcast:10.100.10.255  Mask:255.255.254.0
          inet6 addr: fe80::a00:27ff:fe88:8ce3/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:50626 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2674 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4491313 (4.2 MiB)  TX bytes:425663 (415.6 KiB)

bond1     Link encap:Ethernet  HWaddr 08:00:27:6F:EE:83  
          inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe6f:ee83/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:1980 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2064 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:124007 (121.1 KiB)  TX bytes:111090 (108.4 KiB)

eth0      Link encap:Ethernet  HWaddr 08:00:27:88:8C:E3  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:26130 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2405 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2316248 (2.2 MiB)  TX bytes:386534 (377.4 KiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:88:8C:E3  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:24505 errors:0 dropped:0 overruns:0 frame:0
          TX packets:289 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2175605 (2.0 MiB)  TX bytes:42289 (41.2 KiB)

eth2      Link encap:Ethernet  HWaddr 08:00:27:6F:EE:83  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1980 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2032 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:124007 (121.1 KiB)  TX bytes:105295 (102.8 KiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:6F:EE:83  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:5795 (5.6 KiB)

[root@2hei.net ~]# mii-tool
eth0: no autonegotiation, 100baseTx-FD, link ok
eth1: no autonegotiation, 100baseTx-FD, link ok
eth2: no autonegotiation, 100baseTx-FD, link ok
eth3: no autonegotiation, 100baseTx-FD, link ok

[root@2hei.net ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 08:00:27:88:8c:e3

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 08:00:27:2d:3d:56

[root@2hei.net ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 500
ARP IP target/s (n.n.n.n form): 192.168.56.1

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 08:00:27:6f:ee:83

Slave Interface: eth3
MII Status: down
Link Failure Count: 1
Permanent HW addr: 08:00:27:9c:5a:b8          
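In the bond1 output above, eth3 is reported as down. A check like that can be scripted against the same status files; the following is a minimal sketch, and the function name bond_down_slaves is an assumption made for this example.

```shell
# bond_down_slaves: list slave interfaces whose MII status is not "up".
# $1 is a bonding status file such as /proc/net/bonding/bond1.
# The first "MII Status" line belongs to the bond itself and is skipped
# because no "Slave Interface" line has been seen yet.
bond_down_slaves() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        /^MII Status:/ && slave != "" && $2 !~ /^up/ { print slave }
    ' "$1"
}
```

For the bond1 status shown above, "bond_down_slaves /proc/net/bonding/bond1" would be expected to print eth3.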

Category: linux Tags:

finding netcard driver type in linux

June 29, 2010 1 comment

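There are two ways:

1. Find the network card driver in the system log:
grep -i 'driver' /var/log/messages
or
dmesg | grep -i driver

2. Run lsmod to list the loaded modules, then use modinfo on the likely
candidates to identify the network card type.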

[root@2hei.net]# modinfo e1000
filename:       /lib/modules/2.6.9-34.ELsmp/kernel/drivers/net/e1000/e1000.ko
parm:           debug:Debug level (0=none,...,16=all)
version:        6.1.16-k3-NAPI 4BCC06D27AAC4C711223CC9
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>

[root@2hei.net]# modinfo igb
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/net/igb/igb.ko
version:        1.3.16-k2
license:        GPL
description:    Intel(R) Gigabit Ethernet Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
srcversion:     78555F0A019E05BADBD95AA

[root@2hei.net]# modinfo bonding
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis@lbl.gov and many others
description:    Ethernet Channel Bonding Driver, v3.4.0
version:        3.4.0
license:        GPL
srcversion:     7989A7EEF2EE7B5D78C0E79
depends:        ipv6
vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
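On kernels that expose it, sysfs also records which driver is bound to an interface, which avoids guessing from lsmod. The sketch below assumes the /sys/class/net/<iface>/device/driver symlink exists on the system; the function name nic_driver and its optional second argument are made up for this example.

```shell
# nic_driver: print the kernel driver bound to a network interface.
# $1 = interface name (e.g. eth0)
# $2 = sysfs root, defaulting to /sys (only there so the function can be
#      tried against a copy of the tree)
nic_driver() {
    link="${2:-/sys}/class/net/$1/device/driver"
    [ -L "$link" ] || { echo "no driver link for $1" >&2; return 1; }
    basename "$(readlink "$link")"
}
```

Virtual interfaces such as lo or bond0 have no device directory, so the function returns an error for them. "ethtool -i eth0" reports the driver name as well, if ethtool is installed.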

Category: linux Tags:

replacing_hard_disks_in_a_raid1_array

June 9, 2010 Comments closed
cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 sda3[0] sdb3[1]
      522048 blocks [2/2] [U_]
md0 : active raid1 sda2[0] sdb2[1]
      4192896 blocks [2/2] [U_]
md2 : active raid1 sda1[0] sdb1[1]
      128384 blocks [2/2] [U_]
unused devices: <none> 
This shows that disk sdb has failed ([U_] instead of [UU]), so we will replace it.
The procedure follows this guide:
Replacing A Failed Hard Drive In A Software RAID1 Array
Version 1.0 
Author: Falko Timme <ft [at] falkotimme [dot] com> 
Last edited 01/21/2007
This guide shows how to remove a failed hard drive from a Linux RAID1 array (software RAID), and how to add a new hard disk to the RAID1 array without losing data.
I do not issue any guarantee that this will work for you!
 
1 Preliminary Note
In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.
/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.
/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.
/dev/sda1 + /dev/sdb1 = /dev/md0
/dev/sda2 + /dev/sdb2 = /dev/md1
/dev/sdb has failed, and we want to replace it.
 
2 How Do I Tell If A Hard Disk Has Failed?
If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
You can also run
cat /proc/mdstat
and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
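The [U_] check can be automated. The following is a minimal sketch that prints the name of every array whose status field contains an underscore; the function name md_degraded is an assumption for this example, and it only relies on the /proc/mdstat layout shown in this guide.

```shell
# md_degraded: print each md array that has at least one missing member.
# $1 = mdstat file, defaulting to /proc/mdstat.
md_degraded() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        # A status field such as [U_] has one "_" per missing member.
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means every array is complete, so the function can be used in a cron job that mails its output only when something is printed.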
 
3 Removing The Failed Disk
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
mdadm --manage /dev/md0 --fail /dev/sdb1
The output of
cat /proc/mdstat
should look like this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>
Then we remove /dev/sdb1 from /dev/md0:
mdadm --manage /dev/md0 --remove /dev/sdb1
The output should be like this:
server1:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
And
cat /proc/mdstat
should show this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>
Now we do the same steps again for /dev/sdb2 (which is part of /dev/md1):
mdadm --manage /dev/md1 --fail /dev/sdb2
cat /proc/mdstat
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[2](F)
      24418688 blocks [2/1] [U_]
unused devices: <none>
mdadm --manage /dev/md1 --remove /dev/sdb2
server1:~# mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
cat /proc/mdstat
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0]
      24418688 blocks [2/1] [U_]
unused devices: <none>
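The fail and remove steps follow the same pattern for every partition of the failed disk. As a sketch, the commands can be generated from /proc/mdstat and reviewed before anything is run; the function name md_remove_plan is hypothetical, and it deliberately only prints the commands.

```shell
# md_remove_plan: print (do not run) the mdadm commands that fail and
# remove every partition of one disk from the arrays in an mdstat file.
# $1 = disk name without /dev/ (e.g. sdb)
# $2 = mdstat file, defaulting to /proc/mdstat
md_remove_plan() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 4; i <= NF; i++) {
                part = $i
                sub(/\[.*/, "", part)    # sdb1[2](F) -> sdb1
                if (part ~ ("^" disk "[0-9]+$")) {
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, part
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, part
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

Printing instead of executing is a deliberate choice: failing the wrong member of a RAID1 array leaves it with no redundancy, so the list should be checked by hand first.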
Then power down the system:
shutdown -h now
and replace the old /dev/sdb hard drive with a new one (it must have at least the same size as the old one – if it’s only a few MB smaller than the old one then rebuilding the arrays will fail).
 
4 Adding The New Hard Disk
After you have changed the hard disk /dev/sdb, boot the system.
The first thing we must do now is to create the exact same partitioning as on /dev/sda. We can do this with one simple command:
sfdisk -d /dev/sda | sfdisk /dev/sdb
You can run
fdisk -l
to check if both hard drives have the same partitioning now.
Next we add /dev/sdb1 to /dev/md0 and /dev/sdb2 to /dev/md1:
mdadm --manage /dev/md0 --add /dev/sdb1
server1:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
server1:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: re-added /dev/sdb2
Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run
cat /proc/mdstat
to see when it’s finished.
During the synchronization the output will look like this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/1] [U_]
      [=>...................]  recovery =  9.9% (2423168/24418688) finish=2.8min speed=127535K/sec
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/1] [U_]
      [=>...................]  recovery =  6.4% (1572096/24418688) finish=1.9min speed=196512K/sec
unused devices: <none>
When the synchronization is finished, the output will look like this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>
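Instead of re-running cat /proc/mdstat by hand, the wait can be scripted. This is a rough sketch that polls the file until no recovery or resync line is left; both function names and the 30-second default interval are assumptions for this example.

```shell
# md_resync_active: succeed (exit 0) while any array reports a rebuild.
# $1 = mdstat file, defaulting to /proc/mdstat.
md_resync_active() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# md_wait_resync: poll every $2 seconds (default 30) until no rebuild
# is reported any more.
md_wait_resync() {
    while md_resync_active "$1"; do
        sleep "${2:-30}"
    done
}
```

For example, "md_wait_resync && cat /proc/mdstat" returns once the rebuild lines have disappeared and then shows the final state, which should read [UU] for both arrays.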
That’s it, you have successfully replaced /dev/sdb!
Category: linux Tags:

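Removing the "Your default context is root" prompt when logging in to a terminal over ssh

July 15, 2009 Comments closed

When logging in to a terminal over ssh or telnet, some machines show the following prompt:
[local@2hei.net ~]$ su -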
Password:
Your default context is root:system_r:unconfined_t.

Do you want to choose a different one? [n]

[root@2hei.net ~]# getenforce
Permissive

Solution 1:
[root@2hei.net ~]#  vi /etc/pam.d/su  

session           required     /lib/security/$ISA/pam_selinux.so open multiple
change to
session           required     /lib/security/$ISA/pam_selinux.so open

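Solution 2:
vi /etc/selinux/config
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.

SELINUX=permissive
change to
SELINUX=disabled

Reboot the system for SELINUX=disabled to take effect. Without a reboot, the following command only switches the running system to permissive mode:
setenforce 0

Use getenforce to check whether the change has taken effect.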
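getenforce reports the mode of the running system, while /etc/selinux/config decides the mode after the next boot, so the two can differ until the machine is rebooted. A small sketch for reading the configured value follows; the function name selinux_config_mode is made up for this example.

```shell
# selinux_config_mode: print the SELINUX= value from a config file.
# $1 defaults to /etc/selinux/config. Comment lines and the separate
# SELINUXTYPE= setting are ignored; the last matching line wins.
selinux_config_mode() {
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*/\1/p' \
        "${1:-/etc/selinux/config}" | tail -n 1
}
```

Comparing the output of selinux_config_mode with that of getenforce shows whether a reboot is still pending.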

Category: OpenSource Tags:

Limiting the number of password attempts for users on Linux

July 8, 2009 Comments closed

Configuration:
vi /etc/pam.d/system-auth

auth        required      /lib/security/$ISA/pam_tally.so onerr=fail no_magic_root
account     required      /lib/security/$ISA/pam_tally.so deny=5 no_magic_root reset

# Notes
deny=5 : Deny access if the tally for this user exceeds 5.
lock_time=180 : Always deny for 180 seconds after failed attempt. There is also unlock_time=n option. It allow access after n seconds after failed attempt. If this option is used the user will be locked out for the specified amount of time after he exceeded his maximum allowed attempts. Otherwise the account is locked until the lock is removed by a manual intervention of the system administrator.
magic_root : If the module is invoked by a user with uid=0 the counter is not incremented. The sys-admin should use this for user launched services, like su, otherwise this argument should be omitted.
no_magic_root : Avoid root account locking, if the module is invoked by a user with uid=0

Failed logins of locked-out users are recorded in
/var/log/faillog

To clear the lockouts, reset the counters periodically from cron:
crontab -l
*/30 * * * * /sbin/pam_tally --reset
or
*/30 * * * * faillog -r
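After editing the PAM file it is easy to lose track of which threshold is actually configured. The sketch below pulls the deny= value out of the pam_tally lines; the function name pam_tally_deny is an assumption for this example, and it only parses the file, it does not query PAM.

```shell
# pam_tally_deny: print the deny=N value(s) set on pam_tally lines.
# $1 = PAM config file, defaulting to /etc/pam.d/system-auth.
# Commented-out lines are skipped.
pam_tally_deny() {
    grep -v '^[[:space:]]*#' "${1:-/etc/pam.d/system-auth}" |
        grep 'pam_tally' |
        sed -n 's/.*[[:space:]]deny=\([0-9][0-9]*\).*/\1/p'
}
```

With the two lines configured above this prints 5. To inspect the failure record of a single account before resetting it, "faillog -u <name>" can be used.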

Category: others Tags:

kernel: mptscsih: ioc0: task abort: SUCCESS

December 9, 2008 Comments closed

The server raised an alert, and /var/log/messages contains many entries like the following:
Dec  9 00:03:22 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ec8bb00)
Dec  9 00:03:22 kernel: sd 0:0:0:0:
Dec  9 00:03:22 kernel:         command: Read(10): 28 00 05 4c 03 6a 00 01 00 00
Dec  9 00:03:23 kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Dec  9 00:03:23 kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ec8bb00)
Dec  9 00:03:33 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ec8bb00)
Dec  9 00:03:33 kernel: sd 0:0:0:0:
Dec  9 00:03:33 kernel:         command: Test Unit Ready: 00 00 00 00 00 00
Dec  9 00:03:33 kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Dec  9 00:03:33 kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ec8bb00)
Dec  9 00:03:33 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810077e51380)
Dec  9 00:03:33 kernel: sd 0:0:0:0:
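To judge whether the aborts are occasional or constant, they can be counted per hour. This is a minimal sketch that assumes the timestamp layout of the lines above (month, day, time as the first three fields); the function name mpt_abort_summary is made up for this example.

```shell
# mpt_abort_summary: count "attempting task abort" events per hour.
# Reads the files given as arguments, or stdin if none are given.
# Output lines look like: <count> <month> <day> <hour>:00
mpt_abort_summary() {
    awk '/mptscsih: .*attempting task abort/ {
             split($3, t, ":")
             count[$1 " " $2 " " t[1] ":00"]++
         }
         END { for (k in count) print count[k], k }' "$@" | sort -k2
}
```

Run it as "mpt_abort_summary /var/log/messages". Bursts that line up with scheduled jobs such as backups would suggest load-related timeouts rather than a failing disk.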

System status:
uname -a
Linux 2.6.18-53.1.13.el5 #1 SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/redhat-release
CentOS release 5 (Final)

#smartctl -a /dev/sda
smartctl version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP       DF072A9844       Version: HPD0
Serial number: DQA2P6B00GMC0648
Device type: disk
Transport protocol: SAS
Local Time is: Tue Dec  9 09:33:35 2008 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        65 C
Manufactured in week 48 of year 2006
Current start stop count:      6 times
Recommended maximum start stop count:  10000 times
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count:     2293
No self-tests have been logged
Long (extended) Self Test duration: 1815 seconds [30.2 minutes]

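After a long look I could not see anything abnormal. Some sources online say the operating system kernel has poor support for SAS disks, and others put the blame on Dell servers, but my machine is an HP!
I found a little information on HP's official site, but it is about precautions for tape backup operations.

I will keep watching for a few more days and keep searching for a solution. If nothing turns up, I plan to reinstall the system with an older kernel version and go back to 32-bit to see whether that helps.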

Category: OpenSource Tags: