TLUG Mailing List

Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...

Date: Mon, 20 Sep 2021 01:21:17 +0800

From: Raymond Wan <rwan.kyoto@example.com>

Subject: Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...

References: <CAAhy3dsGd-05-PZsO1_rOzLmmkFJJwwnKZt5Bc96+kNrG4hhxA@mail.gmail.com> <YUbpc+W897zMrOgz@fluxcoil.net>

User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
Hi Christian,
Thanks a lot for this! I've been struggling with this problem for a few days, and with OCFS2 in general for a lot longer, and have had very little luck hearing what more experienced people have to say...
I really appreciate the time you took to respond, as I have very little experience with SANs...learning as I go...
On 19/9/2021 3:40 pm, Christian Horn wrote:
On Sat, Sep 18, 2021 at 10:24:50PM +0800, Raymond Wan wrote:
The file system is an Oracle Cluster Filesystem (ocfs2) on a SAN.
[recalling]
- with ocfs2, there are basically one or more block devices, which
   multiple systems are accessing
- so the data is in one place, and the accessing systems are talking
   via network, so that they can lock files for writing
Based on my limited understanding, that is correct.
When I mount it, it now mounts as read-write but once I enter the
directory in question, the file system switches to read-only.  There
isn't a whole lot of information for ocfs2.  I tried to register
myself on the ocfs2 mailing list and it seems they're not approving my
application.  I also tried ServerFault with no luck.
I think oracle provides support for ocfs2, so if these servers are
under a support contract, I would contact them.
Unfortunately, we don't have a support contract for OCFS2. We're running Ubuntu and I'm using it via an official package that's distributed with Ubuntu 20.04.
I'm not sure if a support contract is still possible. But if my employers were willing to purchase that, they should have bought double the disk space so that I could do a proper backup... :-P
Anyway, any suggestions would be appreciated!  I am considering
copying the files out and reformatting as a last resort, but at 70 TB,
I would rather not...  :-(
- GFS2, at least, also has a mount mode which basically disables the network
   locking for a moment and says "believe me, you are the only node with
   access"; that might work better
- if the block device was smaller, it would be best practise to make a
   copy to a file or other block device, and only then try repairs.
   If that 70TB buffer does not exist, taking snapshots (dmsetup snapshot,
   or on the SAN storage itself) could also provide a safety net for
   repair attempts
- But I suspect you will end up copying off from there what you can
   still read, and restoring the rest from a backup
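
The snapshot idea in the list above can be sketched as follows. This is only an illustration under assumptions: the device paths and the `repair_snap` name are hypothetical, the chunk size of 16 sectors is an arbitrary choice, and it presumes every other node has unmounted the volume so that nothing writes to the origin while the snapshot exists. The function only builds the device-mapper table line; it does not run anything.

```python
def snapshot_table(origin_dev, cow_dev, size_bytes, chunk_sectors=16, persistent=True):
    """Build a device-mapper 'snapshot' table line.

    Layout: <start> <length> snapshot <origin> <COW device> <P|N> <chunksize>,
    with length and chunk size counted in 512-byte sectors.
    """
    sectors = size_bytes // 512
    flag = "P" if persistent else "N"
    return f"0 {sectors} snapshot {origin_dev} {cow_dev} {flag} {chunk_sectors}"


if __name__ == "__main__":
    # Hypothetical devices: the SAN LUN and a smaller scratch disk for the COW data.
    table = snapshot_table("/dev/mapper/san_lun", "/dev/sdX", 70 * 10**12)
    # Command one might then run as root (shown, not executed):
    print(f'dmsetup create repair_snap --table "{table}"')
```

Repair attempts would then be pointed at `/dev/mapper/repair_snap`, so that their writes land on the COW device rather than on the origin.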
I see... I was hoping that there was some way to "repair" the inode information. Actually, the area on the disk that is causing problems for me can be erased...if I could just do that, that would be great!
All those times I ran (or auto-ran) fsck on ext4 partitions, it saved my data thanks to the journal, etc., and I guess it's about time that my luck ran out. I guess I'm just a bit disappointed that there's no manual way to fix this problem, even when fsck.ocfs2 tells me exactly where the problems are.
I've been spending the last couple of days scrounging up 70 TB of space to copy the files that are fine. Guess I will end up formatting this drive and restoring from this backup.
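
Copying off "the files that are fine" could look roughly like the sketch below. It is a minimal illustration, not a tested recovery tool: `salvage_tree` is a made-up helper name, and it assumes that unreadable files and directories surface as ordinary `OSError`s, which may not hold if the file system flips to read-only or hangs on access.

```python
import os
import shutil


def salvage_tree(src, dst):
    """Copy every readable file under src into dst; return (copied, failed)."""
    copied, failed = [], []

    def on_walk_error(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append((err.filename, str(err)))

    for root, _dirs, files in os.walk(src, onerror=on_walk_error):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            try:
                # copy2 also preserves timestamps and permission bits.
                shutil.copy2(source, os.path.join(target_dir, name))
                copied.append(source)
            except OSError as err:
                # Record the failure and keep going with the next file.
                failed.append((source, str(err)))
    return copied, failed
```

The `failed` list doubles as a record of what would still need to come from an older backup.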
Thank you for your comments! I thought what I was doing was "dumb"...I see now that it might be my only option.
The data itself isn't "raw" data, but processed data that represents about 9 months of processing time...I can't lose it or else I'm doomed... Wait -- is this mailing list public? Oops... :-P
On an unrelated note, does anyone have an opinion about GFS2 and if
it's better than ocfs2?
GFS2 has the same operation mode, so it's also "coordinating multiple
systems accessing a single block device".  We offer support for GFS2,
but honestly, in most cases I would like to hear the exact use case and
based on that decide whether gfs2 is really needed.  In many cases,
keeping data on an XFS or ext4, and sharing via NFS, would provide
better availability and be easier to administer.
Ideally, if one or multiple nodes accessing an ocfs2 or gfs2 volume
go down, the others stay operating.  If a single/unclustered NFS
server goes down, that whole service is down.  But the easier
administration might effectively make up for that.
My limited understanding was that since this is a SAN, my only option was to use OCFS2 or GFS2.
Actually, the vendor of the SAN performed the initial installation (I won't say who the vendor was, but let's say their name rhymes with "Dell" :-P ). And they used ext4. Since they're the experts, I didn't question it. Within minutes of using it on our cluster, files started mysteriously disappearing. It was quite frustrating.
I asked on ServerFault and a couple of people clarified to me that ext4 wouldn't work. I still don't understand it...I thought a SAN could look after the disk the same way a server looks after an ext4 disk that is NFS exported...
I guess it's because it's a block device? The decision to use OCFS2 over GFS2 was a 50/50 decision. Now that I've had to remove all of the data to format the drive, I'm thinking maybe I should give GFS2 a try...
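
One way to picture why files vanished: ext4 is not cluster-aware, so each host that mounts the same block device keeps its own cached metadata and writes it back without coordinating with the others. The toy model below only illustrates that effect; it is not ext4's real on-disk logic, and the class and file names are invented.

```python
class SharedDisk:
    """A block device visible to every node; modelled as one directory block."""

    def __init__(self):
        self.directory = []  # file names stored in the on-disk directory block


class Node:
    """A host that mounted the disk with a file system that is not cluster-aware."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = list(disk.directory)  # private copy taken at mount time

    def create(self, name):
        self.cache.append(name)  # only the private cache is updated

    def flush(self):
        # Write the whole cached block back, unaware of other writers.
        self.disk.directory = list(self.cache)


disk = SharedDisk()
node_a, node_b = Node(disk), Node(disk)
node_a.create("results_a.dat")
node_b.create("results_b.dat")
node_a.flush()
node_b.flush()  # overwrites the block node_a just wrote
print(disk.directory)  # results_a.dat is no longer listed
```

An NFS export avoids this because only the server touches the block device; OCFS2 and GFS2 avoid it by having the nodes take locks over the network before changing shared metadata.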
Neither seems to have a lot of help pages on the Internet. I presume it's because there are few users...
Perhaps I will give GFS2 a try. I just hope it's better and not worse... We have many file systems on the servers that are ext4 and NFS mounted. So far...none of them have given me any problems the last few years. But this SAN is making me want to scream every couple of months... *sigh*
Thanks again for your reply!

Ray