Re: [tlug] Help with fsck and ocfs2 (or even ext4?)...



Hi Christian,

Sorry for the late reply, but I had a busy week last week.  I was also
busy thinking of my options to help resolve the mess I'm in...


On 20/9/2021 6:12 am, Christian Horn wrote:
> On Mon, Sep 20, 2021 at 01:21:17AM +0800, Raymond Wan wrote:
>> [..]
>> Actually, the vendor of the SAN performed the initial installation (I won't
>> say who the vendor was, but let's say their name rhymes with "Dell" :-P ).
>> And they used ext4. Since they're the experts, I didn't question it.  Within
>> minutes of using it on our cluster, files started mysteriously disappearing.
>> It was quite frustrating.
> ext4 is fine - as long as you ensure that at any time just one of the
> systems who can "see" that blockdevice is actually mounting the device.
> Mounting and writing to the blockdevice from multiple systems is asking
> for havoc, each system "thinks" it has exclusive access.
...
> SAN means here that just blockdevices are handed out, if multiple systems
> need access, they need to coordinate, that is done with ocfs2 or gfs2.
> With NFS, again just one system is accessing the blockdevice, and is then
> doing locks/coordinating as part of NFS.


I see.  I think I grossly underestimated the difficulty in setting up the SAN.

So, would this work, then?  What if I picked one server out of my
cluster of 5 servers, say node1?  I format the SAN volume as ext4 and
only node1 mounts it.  node1 then exports it to the other servers via
its /etc/exports file, and the other nodes mount it over NFS as if it
belonged to node1.  Does this effectively mean that node1 becomes this
SAN's "manager"?
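
Just to make it concrete, I imagine something like this on node1 (the
paths and addresses below are made up, just for illustration):

  # /etc/fstab on node1: only node1 mounts the SAN LUN directly
  /dev/mapper/san_lun   /export/san   ext4   defaults   0  2

  # /etc/exports on node1: share it with the other nodes' subnet
  /export/san   192.168.1.0/24(rw,sync,no_subtree_check)

  # then, with the NFS server running, re-read the exports
  exportfs -ra

  # /etc/fstab on node2..node5: mount node1's export over NFS
  node1:/export/san   /mnt/san   nfs   defaults,_netdev   0  0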

I think I would like to avoid both OCFS2 and GFS2, if I can.  I'm
sure one of them would be the best solution (somehow...).  Perhaps it
would distribute the locking workload equally across all servers
instead of putting the burden entirely on node1.  But it seems
difficult [to me] and more trouble than it's worth [so far].


> If you want to "replicate" that havoc of ext4 from multiple simultaneous
> systems, or gfs2/ocfs, I recommend this:
> - a linux system acting as hypervisor
> - multiple KVM guests
> - the hypervisor sharing one or multiple iSCSI devices
> - the guests accessing these - they are the shared block devices.
>
> NFS would be easier to operate, but when the one NFS server is not clus-
> tered and goes down, the whole storage is unavailable.


I see.  I was looking through the GFS2 documentation recently, and it
seems more detailed than the OCFS2 documentation.  At the very least,
it seems that if there are n nodes connecting to the block device, the
disk keeps n journals.  If a node goes down, another node can replay
its journal to bring the filesystem back to a consistent state.  I can
see how this feature would be advantageous.
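
If I ever do go down that road, my understanding from the docs is that
the journal count is set at mkfs time (one journal per node that will
mount the filesystem), e.g. something like the following -- the device
and cluster/filesystem names here are made up:

  # 5 journals for my 5 nodes; lock_dlm is the clustered locking
  # protocol, and the lock table name is <clustername>:<fsname>
  mkfs.gfs2 -p lock_dlm -t mycluster:sanvol -j 5 /dev/mapper/san_lun

  # and apparently more journals can be added later if the cluster grows
  gfs2_jadd -j 1 /mnt/sanvol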


> a) NFS server
>    + you access the 70TB with one system, run ext4 or XFS
>    + you have no cluster infrastructure, easy to maintain
>    - if that NFS service is down, your clients can not access
>      the data.  But in that case, you could just manually
>      make one of the other systems the NFS server (if they
>      still all have access to the block device)
>
> b) NFS server, clustered: 2 systems, cluster like pacemaker,
>    virtual IP, ext4/XFS on the 70TB, if one server goes down
>    the other mounts the 70TB and offers the service
>    + you still just need the 70TB storage
>    - but a second system, and need to administer a cluster
>
> c) Ceph: all of the systems have a bit local storage, i.e.
>    20TB.  The systems then "work" together with ceph, and
>    together present a 70TB volume.  Your systems/storage
>    should be enough so you have all data twice, so one
>    system can go down without the 70TB data becoming un-
>    available.
>    - needs kind of "coordinating" infra again, ceph here
>    - and each system accessing a bit of storage, instead
>      of one big chunk


Thank you for this list!  I really appreciate how you laid it out for
me in such a clear way.  I actually saw both of your e-mails before I
started writing my reply; I guess I wanted to make sure I understood
what was being said.

In essence, what I said at the beginning of this e-mail is option (a)
exactly, right?  Perhaps I should try that.

If looking after this SAN was my main job, perhaps I'd want to do it
"right".  But it's a small part of my job, unfortunately.  And I
probably want to get it done and over with.  When I have more time
(and if they can give me 70 TB of space to do backups...), perhaps I
can try to do it the "right way".
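
As far as I can tell, the "right way" would be something like your
option (b): a small pacemaker cluster that floats the ext4 mount, the
NFS service, and a virtual IP between two nodes.  Just as a note to
myself, a rough sketch with pcs (the device path, subnet, and IP below
are made up):

  # one resource group so everything fails over together
  pcs resource create san_fs ocf:heartbeat:Filesystem \
      device=/dev/mapper/san_lun directory=/export fstype=ext4 --group nfs_ha
  pcs resource create nfs_daemon ocf:heartbeat:nfsserver \
      nfs_shared_infodir=/export/nfsinfo --group nfs_ha
  pcs resource create nfs_export ocf:heartbeat:exportfs \
      clientspec=192.168.1.0/24 options=rw,sync directory=/export fsid=0 \
      --group nfs_ha
  pcs resource create nfs_vip ocf:heartbeat:IPaddr2 \
      ip=192.168.1.200 cidr_netmask=24 --group nfs_ha

That's well beyond what I have time for right now, though.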

Thank you for all your help!  I'm really lost here... I would have
liked to be looking after this SAN with a team so that we could
discuss options, but it's just me.

Oh, on an unrelated note, in my Serverfault post, someone replied with
this link to a PDF which gave a good GFS2 vs OCFS2 comparison (IMHO):
http://www.linux-kongress.org/2010/slides/gfs2ocfs2-seidel.pdf .  Not
sure if this is of interest to anyone.  Anyway, thank you to the rest
of TLUG for tolerating this thread!  It's been such an uphill battle
with this SAN...

Ray

