Mailing List Archive
Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...
- Date: Tue, 28 Sep 2021 18:47:30 +0900
- From: Jim Blackson <blackson@example.com>
- Subject: Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...
- References: <YUbpc+W897zMrOgz@fluxcoil.net> <611fd353-cde3-0ae7-0dbf-e1b54dc60174@gmail.com>
Hi Ray,

Raymond Wan <rwan.kyoto@example.com> wrote:
> ... I'm more concerned about data loss and how I can get this file system back into read/write mode.

Raymond Wan <rwan.kyoto@example.com> wrote:
> The data itself isn't "raw" data, but processed data that represents
> about 9 months of processing time...I can't lose it or else I'm doomed...
> Wait -- is this mailing list public? Oops... :-P

If data loss is a primary concern, please do not fsck around with (write to) the SAN. The worst case is losing the SAN, so how about shutting down the SAN, carefully labeling and recording the position of each HDD/SSD in the SAN (so you know which disk goes in which slot in which array), then using "dd/ddrescue" to duplicate each and every drive to another set of matching drives? This new set is your backup. After that, you can reboot the SAN and immediately copy off as many files as you can. Once your critical data is copied and verified, you can debug the SAN and file system with a little less fear. :-)

As for SANs, sorry, I'm not familiar with ocfs2 and don't know your configuration. However, many commercial SANs I have seen come in for recovery are composed of 5 or 6 layers. One challenge is identifying the source of errors without making things worse.

The lowest layer is the individual SSD/HDD drives themselves. These are formed into hardware or software low-level RAIDs, which are then formed again into a few large RAIDs. These second-level RAIDs are gathered into "pools" or "tiers" of storage for use by the SAN system. The pools are often divided into system, snapshot, and logical volumes by the SAN software. The logical volumes are for user data; the system area is for mapping user blocks to pool storage; and the snapshot area is for internal system backups, if configured.

The worst case is a failed rebuild of a live low-level RAID. A rebuild overwrites all the data on that RAID; a failure will corrupt the second-level RAID, which corrupts the pool, which corrupts the system mapping and logical user data. When that happens you don't know what you have, and you don't know where it is.

Is your SAN healthy? One possibility is a failing HDD: trying to read bad sectors causes an access delay, with bad data perhaps causing a bad inode number?

Hope this helps,
jimb.
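
The label-then-clone step suggested in this message could be planned roughly as below. This is only a sketch under stated assumptions: the DiskSlot record, the clone_plan helper, the array labels, and every /dev/... path are hypothetical and not taken from the thread. It assumes GNU ddrescue is the cloning tool, with -f (allow overwriting the output device) and -n (skip the scraping pass on the first run). Nothing is executed; the script only records which disk sits in which slot and prints one command per disk for review.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskSlot:
    array: str   # hypothetical array label, e.g. "array0"
    slot: int    # physical slot number within that array
    source: str  # device node of the original disk (assumed path)
    target: str  # device node of the matching spare disk (assumed path)


def clone_plan(slots):
    """Return (label, argv) pairs: one GNU ddrescue invocation per disk.

    Raises ValueError if any device appears twice, since cloning onto
    the wrong disk would destroy the data being protected.
    """
    seen = set()
    plan = []
    for s in slots:
        for dev in (s.source, s.target):
            if dev in seen:
                raise ValueError(f"device listed more than once: {dev}")
            seen.add(dev)
        # The label ties the copy and its mapfile back to the physical slot.
        label = f"{s.array}-slot{s.slot:02d}"
        argv = ["ddrescue", "-f", "-n", s.source, s.target, f"{label}.map"]
        plan.append((label, argv))
    return plan


if __name__ == "__main__":
    # Example inventory; replace with the real, carefully verified layout.
    inventory = [
        DiskSlot("array0", 0, "/dev/sdb", "/dev/sdf"),
        DiskSlot("array0", 1, "/dev/sdc", "/dev/sdg"),
    ]
    for label, argv in clone_plan(inventory):
        print(label, "->", shlex.join(argv))
```

Printing the commands instead of running them keeps a written record of the slot layout and leaves the destructive step under manual control.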