[tlug] [Suspected Spam] RAID not seen by system



CL writes:

 > I ran into problems with my RAID

Which RAID level (I have some guesses, but...)?  Hardware or software
(I'm guessing the latter, since you mention mdadm)?

You don't use LVM?  (That may be obsolete; I use it because I've
always used it.  DFWAB.)

 > a while back in which one member disk was acting wonky.  I FAILed
 > the wonky disk, installed a brand new one (same maker and model
 > number) and performed a full resynch, which took about 30 days.

30 days doesn't sound reasonable for any devices I'd consider plausible.
The "33000kbps" you quote for the reshape (taking that as kilobytes per
second) implies a little over 8 hours/TB.  In fact, for a RAID 5 resync
I would expect only the new drive to be written (all the others have to
be read, though), so it should take a fraction of the time needed for a
full reshape.
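
As a rough check of the hours-per-TB arithmetic, here is a minimal
Python sketch.  It assumes "kbps" means kilobytes per second and that
1 TB is 10**12 bytes; the function name is only for illustration.

```python
def hours_per_terabyte(speed_kb_per_sec, bytes_per_kb=1000):
    """Hours needed to process 1 TB (10**12 bytes) at a steady speed."""
    bytes_per_sec = speed_kb_per_sec * bytes_per_kb
    seconds = 10**12 / bytes_per_sec
    return seconds / 3600.0

# Assumption: the quoted speed is sustained for the whole run.
print(round(hours_per_terabyte(33000), 1))        # 1 KB = 1000 bytes -> 8.4
print(round(hours_per_terabyte(33000, 1024), 1))  # 1 KB = 1024 bytes -> 8.2
```

At that rate, 30 days of continuous work would correspond to roughly
85 TB processed, which is why the figure looks off to me.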

 > 1.  The RAID is "there", tests healthy

What tests?  May as well give output for the tests, please.

 > 4.  At this time, the RAID starts from boot as inactive.
 > Performing a stop and assemble starts the reshape process, which is
 > stuck around 1%.  Reshape speeds start out circa 33,000kbps and
 > fall to 1kbps in around 4~6 hours.

Does this performance degradation happen gradually or suddenly?  If
suddenly, does it take 4-6 hours and then suddenly fall, or does it
happen in less than an hour?  What other work is the machine doing?
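
One way to answer the gradual-vs-sudden question is to log the speed
at regular intervals.  The following is only a sketch: it assumes the
array is md0 and that the kernel exposes the current speed (in K/sec)
in /sys/block/md0/md/sync_speed; the function names, interval, and
sample count are made up for illustration.

```python
import time

def read_sync_speed(path="/sys/block/md0/md/sync_speed"):
    """Return md's current sync speed in K/sec, or None if it is not numeric."""
    with open(path) as f:
        text = f.read().strip()
    return int(text) if text.isdigit() else None

def log_speed(path="/sys/block/md0/md/sync_speed", interval=60, count=10):
    """Print `count` timestamped speed samples, `interval` seconds apart."""
    for i in range(count):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), read_sync_speed(path))
        if i < count - 1:
            time.sleep(interval)

def largest_drop(samples):
    """Return (index, ratio) of the steepest fall between consecutive samples.

    ratio is new/old: a value near 0 at one index suggests a sudden
    collapse, while ratios all close to 1 suggest a gradual decline.
    """
    best = (None, 1.0)
    for i in range(1, len(samples)):
        old, new = samples[i - 1], samples[i]
        if old and new is not None and new / old < best[1]:
            best = (i, new / old)
    return best
```

Running log_speed() while the reshape is active and comparing the
timestamps against the system logs should show whether the slowdown
coincides with anything else on the machine.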

To be honest, this does not sound like a healthy array to me.  The job
to be done is not terribly complicated in itself (although doing it
while the array is active sounds like an exercise in concurrent
futility), and by adding a disk you provide plenty of buffer space for
the critical region.  I see no algorithmic reason why the reshape
process should stall; in fact as the process goes on it should get
faster as it becomes unnecessary to buffer the transfers.

Are you seeing disk errors on the array component devices in the logs?

 > After about 18~24 hours, the reshape stops completely

How do you know it stops?  Does it say "stopped" or does it say "0kbps"?

 > 5.  sudo mdadm -D /dev/md0 produces completely normal output.

Please provide this output.  Have you tried "-D --verbose"?

 > I want to be able to mount and view the RAID without losing the
 > data.

Have you tried mounting it read-only?  That will prevent the reshape
from restarting, and according to my understanding of the mdadm
manpage, the array should be in a consistent state, and can be read.

It is quite possible that you don't have a working RAID right now.
(I'm assuming that you have a striped array (probably RAID 5 or 6,
since you were able to reconstruct the array after a single disk
failure) rather than mirroring (RAID 1).)  The array is undoubtedly marked
as having reshape in progress, which at least part of the time
probably means it's in an inconsistent state.  Reshaping involves
actually moving data (that's why speed is measured in kbps), so the
normal algorithms used for reading and writing the array as a block
device will be incorrect; the reshape will be using the block devices
representing the individual hardware components.
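
To illustrate why the normal block mapping breaks down mid-reshape,
here is a deliberately simplified Python sketch.  It models plain
striping only (no parity, no parity rotation), so it is not md's
actual layout code; the function names and the 3-to-4 disk example
are assumptions chosen for illustration.

```python
def locate(chunk, data_disks):
    """Map a logical chunk number to (stripe, disk) in a plain striped layout."""
    return chunk // data_disks, chunk % data_disks

def moved_chunks(total_chunks, old_disks, new_disks):
    """List the chunks whose (stripe, disk) position differs between layouts."""
    return [c for c in range(total_chunks)
            if locate(c, old_disks) != locate(c, new_disks)]

# Compare where each chunk lives with 3 data disks versus 4.
for c in range(8):
    print(c, locate(c, 3), locate(c, 4))

# In this toy model, everything past the first stripe has to move.
print(moved_chunks(12, 3, 4))
```

Until the reshape finishes, part of the array follows the old mapping
and part follows the new one, so a reader has to know where the
boundary is to find any given chunk.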

Somebody with recovery experience may be able to help with getting the
data out of an array with a reshape in progress, but if mount ro
doesn't work, I think your only option for mount and view is to
complete the reshape and reassembly.

Steve

-- 
Associate Professor              Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/     Faculty of Systems and Information
Email: turnbull@example.com                   University of Tsukuba
Tel: 029-853-5175                 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN

