Fix: unable to kmem_alloc enough memory for scatter/gather list in ZFS Solaris 10.5
The ZFS pool on my server was showing a degraded state. After checking the SMART status of the constituent drives and finding no problem, I discovered that there's a bug in Solaris 10.5 where the system reports a growing number of errors and eventually degrades the pool. dmesg shows the error "unable to kmem_alloc enough memory for scatter/gather list"; however, there is actually nothing wrong with the pool: the scrub repaired nothing and ZFS reports no known data errors. Running zpool status shows the degraded state:
```
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM  CAP      Product /Disks    IOstat mess   SN/LUN
        rpool       ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0  32.2 GB  VMware Virtual S  S:5 H:25 T:0  000000000000000

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 12h15m with 0 errors on Fri Dec 21 00:08:43 2020
config:

        NAME                       STATE     READ WRITE CKSUM  CAP   Product /Disks    IOstat mess  SN/LUN
        tank                       DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            c0t50014EE20BF0750Dd0  ONLINE       0     0     0  4 TB  WDC WD40EFRX-68W  S:0 H:0 T:0  WDWCC4E6NAXVAS
            c0t50014EE263348A3Ed0  ONLINE       0     0     0  4 TB  WDC WD40EFRX-68W  S:0 H:0 T:0  WDWCC4E0FRRRRP
            c0t50014EE2B69D2D68d0  DEGRADED     0     0    20  too many errors  4 TB  WDC WD40EFRX-68W  S:0 H:0 T:0  WDWCC4E3AN2Y99

errors: No known data errors
```
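Rather than eyeballing the table for the disk that is racking up errors, the config section of zpool status can be parsed. The following is a minimal Python sketch, assuming the stock NAME/STATE/READ/WRITE/CKSUM column order; extra trailing columns (such as the CAP/Product/IOstat columns in the listing above, which appear to come from a management wrapper rather than plain zpool) are ignored. The names ROW and nonzero_error_devices are hypothetical, and the sketch only handles plain integer counters, so a row whose counter is printed in abbreviated form is skipped.

```python
import re
from typing import List, Tuple

# One row of the config table: NAME STATE READ WRITE CKSUM, then anything else.
# Assumes the stock `zpool status` column order; trailing columns are ignored.
# The lookahead makes sure the CKSUM field is a whole integer (not e.g. "1.2K").
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)(?=\s|$)"
)


def nonzero_error_devices(status_text: str) -> List[Tuple[str, str, int, int, int]]:
    """Return (name, state, read, write, cksum) for every row with a non-zero counter."""
    flagged = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # pool header, status text, column header, blank line, ...
        name, state = m.group(1), m.group(2)
        read, write, cksum = map(int, m.group(3, 4, 5))
        if read or write or cksum:
            flagged.append((name, state, read, write, cksum))
    return flagged
```

Fed the tank listing above, this flags only c0t50014EE2B69D2D68d0 with 20 checksum errors. On the server itself, the input would be the stdout of subprocess.run(["zpool", "status"], capture_output=True, text=True).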
Running zpool clear on the pool resets the error counters and brings it back ONLINE:
```
# zpool clear tank
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t50014EE20BF0750Dd0  ONLINE       0     0     2
            c0t50014EE263348A3Ed0  ONLINE       0     0     0
            c0t50014EE2B69D2D68d0  ONLINE       0     0     0
```
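Because the errors keep coming back, the clear could be automated, for example from cron. The sketch below is an illustration under stated assumptions, not a fix for the underlying bug: it clears the pool only when the kernel log contains the kmem_alloc message, the pool reports DEGRADED, and ZFS still says "No known data errors". The function names should_clear and clear_if_spurious are hypothetical, and the default pool name tank is taken from the listings above. Clearing errors blindly would hide a genuinely failing disk, so the conditions are deliberately narrow.

```python
import re
import subprocess

# Message reported in dmesg when the bug triggers.
KMEM_MSG = "unable to kmem_alloc enough memory for scatter/gather list"


def should_clear(kernel_log: str, pool_status: str) -> bool:
    """Decide whether the pool looks like a victim of the spurious-error bug.

    Heuristic (an assumption, not documented ZFS behaviour): the kernel log
    contains the kmem_alloc message, the pool is DEGRADED, and ZFS still
    reports no known data errors.
    """
    state = re.search(r"^\s*state:\s*(\S+)", pool_status, re.MULTILINE)
    return (
        KMEM_MSG in kernel_log
        and state is not None
        and state.group(1) == "DEGRADED"
        and "errors: No known data errors" in pool_status
    )


def clear_if_spurious(pool: str = "tank") -> bool:
    """Run `zpool clear <pool>` when should_clear() agrees; return True if cleared."""
    log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    status = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    ).stdout
    if should_clear(log, status):
        subprocess.run(["zpool", "clear", pool], check=True)
        return True
    return False
```

One caveat of this design: the kernel message buffer can retain old messages, so a stale kmem_alloc line may still satisfy the first condition long after the event; checking timestamps or reading /var/adm/messages instead would be a stricter variant.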