Mailing List Archive



Re: [tlug] distributed file systems



On 2010-02-15 15:26 +0900 (Mon), Sach Jobb wrote:

> Indeed technically mogilefs is not a file system in the sense that we
> are used to. However, the general term for this sort of thing, so far
> as I can tell, seems to be "distributed file systems." Even mogilefs
> refers to itself as a "distributed file system." In fact I don't
> actually care much what it's called. I am just trying to use the same
> name that everyone else does.

No problem. Just keep in mind that there are two distinct uses of "file
system": the general one you use above, and one that refers specifically
to things you access via names in the directory tree of your local
system. Most naïve users will use it in the second sense, and so when
talking to people who don't know you, you need to make it clear right
off that you're using it in the first sense.

So, if I understand you correctly, you have a central server with a
database and a growing set of files, some of which change from time to
time. The database is always updated through a web interface running
on that server. Is it the same case for the files? That is, will having
read-only replicas of the files work for you? Or do people need to
be able to edit the files elsewhere and send them to you somehow? How do
people add and edit files?

And why do you need replicas at all? Why is it not possible for
everybody just to make their changes on that one server?

> The updates have to be replicated to the other servers fairly quickly.
> I.E. I don't want to use cron or something triggering a script.

So delays of, say, five minutes are too long. How many seconds of delay can
you handle? And is it practical to ask users to run an update command to
get the latest changes before accessing the replica of the files?

What is the latency and bandwidth between the replicas and the master
server? What is the typical size of an update (in bytes), and how
frequently do they happen?

> That is interesting. The code itself is managed that way as part of a
> deployment process (we'll just add the remote servers into the same
> process).

Sorry, I thought you had only one server, but this implies you'll have
multiple servers running your application. Can you clarify this?

> It seems a bit strange to me to use version control for
> something that doesn't have any versioning.

You yourself said that you change text files. So you've got versioning
right there. You also have versioning even if you just add files; you
have one version of the collection before a certain file was added, and a
different version after that file was added.
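
A small Python sketch of what I mean by a "version of the collection."
This is only an illustration of the idea, not how any particular version
control system does it internally; the function name and the sample file
names are hypothetical. It derives one identifier from all the file names
and contents, so any add or edit yields a different identifier.

```python
import hashlib


def collection_version(files):
    """Return a hex digest identifying the exact state of a file collection.

    `files` maps file name -> content as bytes. Names are processed in
    sorted order so the result does not depend on dict insertion order.
    """
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator between the name and the content hash
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


before = {"a.txt": b"hello"}
after_add = {**before, "b.txt": b"world"}   # a file was added
after_edit = {"a.txt": b"hello, edited"}    # a file was changed

# Both adding a file and editing a file produce a new collection version.
print(collection_version(before) != collection_version(after_add))   # True
print(collection_version(before) != collection_version(after_edit))  # True
```

A replica whose identifier matches the master's has the same files; one
whose identifier differs is out of date.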

> ...but wouldn't it be sexier if I just knew that every write was being
> written to the other servers, and there was something in place tracking
> this against the nodes?

Again, I'm confused about this multiple servers thing. Can you describe
again where data enters the system and how?

At any rate, there are lots of sexy things you can do, but if you're
just trying to solve a business problem, rather than be cool, you
probably don't want to do them. After all, if your 5 km commute to work
on your bicycle takes too long, you could buy a fast sports car, but if
you can change to a different, 1 km route instead, that's going to be a
lot more cost-effective.

The general rule in distributed systems is, if you want it to work well,
adapt your application to the system rather than your system to the
application.

cjs
-- 
Curt Sampson         <cjs@example.com>         +81 90 7737 2974
             http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw

