Mailing List Archive
Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...
- Date: Tue, 05 Oct 2021 21:58:23 +0900
- From: Jim Blackson <blackson@example.com>
- Subject: Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...
- References: <20210928184725.4C0F.A5A534A3@a1d.co.jp> <CAAhy3dud=yzGQyxdSnjDrbVCyzfbNbQeUFTVrB_HUp=P9Hyg=Q@mail.gmail.com>

Hi Ray,

On Tue, 5 Oct 2021 13:52:01 +0800
Raymond Wan <rwan.kyoto@example.com> wrote:

> Sorry for the late reply and thanks a lot for your advice! After
> Christian's first reply, I had already started shifting away from
> OCFS2 and over to NFS/ext4. And that meant not doing fsck any more
> and copying files to external hard drives so that I can do a reformat.
> So, in terms of the danger of data loss, I am fine now! Thank you!

You're welcome. I am relieved to hear you are out of danger of data
loss.

> So, the problem was at one of the upper layers. The hard disks appear
> to be healthy. While one server was writing to the OCFS2 file system,
> it was restarted because it froze. However, whether something within
> the SAN caused it to freeze, I don't know.

Just being a little paranoid here...

You have a large file system full of important data running on top of a
complicated storage system (the SAN). There was an outage. Before the
cause of the outage was confirmed, it looked like the top-level file
system was being modified (by fsck). This is a common recipe for data
loss.

I was worried fsck could be modifying or deleting critical
inodes/pointers to your files, which means losing the location of the
data. Taking a dd copy of the logical volume containing the whole file
system, then performing a recovery or file copy, is one safe option.
True, 70TB is pretty big ... :-)

I was also worried about the integrity of the SAN. It is very bad if
the SAN becomes corrupted. Taking a dd copy of each physical HDD is my
preferred backup option if the SAN is suspect. Recovery here means
virtually rebuilding the SAN and file system from the physical HDDs -
not for the faint of heart. $$$ helps too. :-)

> The SAN is healthy. All of the lights are fine and the hard drives appear fine.

How about the SAN's internal system logs: do any errors show up?

> For now, I've moved everything off to external hard drives,

Nice. It could be valuable to hang on to those drives for a (long)
while.

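As a rough illustration of the dd-style image copy mentioned above, here is a minimal Python sketch that reads a source in fixed-size chunks, writes it to an image file, and records a SHA-256 checksum so the copy can be verified later. The function names, the chunk size, and the example paths are assumptions made for illustration only; they do not come from the thread. Unlike dd with conv=noerror, this sketch simply stops on the first read error, so it is not a substitute for a proper imaging tool on failing hardware.

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice


def image_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path chunk by chunk.

    Returns (bytes_copied, sha256_hex) for the data that was read.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the source
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # push the image to stable storage
    return copied, digest.hexdigest()


def sha256_of(path, chunk_size=CHUNK_SIZE):
    """Re-read a file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage (paths are placeholders, not from the thread):
#   copied, checksum = image_copy("/dev/vg0/data", "/mnt/backup/data.img")
#   assert sha256_of("/mnt/backup/data.img") == checksum
```

The point of the checksum is that any later recovery attempt can be run against the image, and the image can be re-verified before each attempt, while the original volume stays untouched.
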
> So, back to one of my earlier ideas. If I knew some part of the
> directory structure was corrupted, is it possible to "edit" the data
> structure of the directories (i.e., using debugfs.ocfs2, for example,
> which I believe has an ext4 equivalent) to "fix" the problem? If this
> were to happen again, then I would like to consider this as one
> solution.

To me, it looks the same as using fsck.

First I would confirm the integrity of the SAN, then make backups
and/or copy off what I could. Once the important data is safe, one can
then try to "fix" the file system.

Best regards,
jimb.