[tlug] MogileFS: Good or Bad?

[From the tlug-admin list.]

On 2008-09-09 01:37 +0900 (Tue), zev wrote:

> On Sep 8, 2008, at 11:54 PM, Curt Sampson wrote:
>> Hm. I may have some comments relating to this, given both our recent
>> experience with MogileFS on a web site with a couple of terabytes of
>> data to serve,
>> ...
>> Assuming I swing by, of course. I've got a friend coming into town this
>> weekend, and also a whole boatload of work to try to get done before I
>> head out to the ICFP conference next week.
> So can we get a quick executive summary?
> MogileFS good or bad ;-)

For those who don't know about it, Zev is asking about this:

Good news: the storage management scheme itself seems very well designed.
Bad news:  They didn't bother with security; better set up a VPN.
Good news: Nodes can serve files using any HTTP server. (We use lighttpd.)
Bad news:  The engine that copies stuff between nodes uses only WebDAV,
           with their special WebDAV server.
Good news: It's all open source, and written in a scripting language.
Bad news:  That language is perl, and the code is not entirely modular.
Good news: It automatically distributes and replicates the files across nodes.
Bad news:  It stores the information about all this in a database in one
           of the usual DBMSes (MySQL, PostgreSQL, ...), and you're
           responsible for replicating that. You lose access to that DB,
           you lose access to the cluster. You lose the data in the DB,
           you lose all the data in the cluster.
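A minimal sketch of the kind of bookkeeping described in that last point, using Python's built-in sqlite3. The table names `file` and `file_on`, the "least-used devices first" placement rule, and the helpers `store` and `locate` are all hypothetical, loosely modeled on a MogileFS-style tracker rather than taken from its code. The final call shows the failure mode: a fresh, empty DB can't locate anything, even though the bytes would still be sitting on the storage nodes.

```python
import sqlite3

# Hypothetical, simplified schema loosely modeled on a MogileFS-style tracker:
# one row per file, plus one row per (file, device) copy.
SCHEMA = """
CREATE TABLE file (
    fid   INTEGER PRIMARY KEY AUTOINCREMENT,
    dkey  TEXT NOT NULL UNIQUE          -- the user-visible key
);
CREATE TABLE file_on (
    fid   INTEGER NOT NULL REFERENCES file(fid),
    devid INTEGER NOT NULL,
    PRIMARY KEY (fid, devid)
);
"""

def open_tracker_db():
    """Create an empty in-memory metadata DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def store(conn, key, devices, copies=2):
    """Record `key` as replicated on `copies` devices, picking the least-used ones."""
    usage = {d: 0 for d in devices}
    for devid, n in conn.execute("SELECT devid, COUNT(*) FROM file_on GROUP BY devid"):
        if devid in usage:
            usage[devid] = n
    # Ties are broken by device id so placement is deterministic.
    chosen = sorted(devices, key=lambda d: (usage[d], d))[:copies]
    with conn:  # commit both inserts together, or roll back on error
        fid = conn.execute("INSERT INTO file (dkey) VALUES (?)", (key,)).lastrowid
        conn.executemany("INSERT INTO file_on (fid, devid) VALUES (?, ?)",
                         [(fid, d) for d in chosen])
    return chosen

def locate(conn, key):
    """Return the devices holding `key`; empty if the DB has no record of it."""
    rows = conn.execute(
        "SELECT devid FROM file_on JOIN file USING (fid) WHERE dkey = ? ORDER BY devid",
        (key,))
    return [devid for (devid,) in rows]

conn = open_tracker_db()
print(store(conn, "photo:1", devices=[1, 2, 3]))  # [1, 2]
print(store(conn, "photo:2", devices=[1, 2, 3]))  # [3, 1]
print(locate(conn, "photo:2"))                    # [1, 3]
print(locate(open_tracker_db(), "photo:2"))       # []; new, empty DB: copies unreachable
```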

This last point, unfortunately, is MogileFS's great failing. Not only do
they depend on the DBMS to store data about where things are, but they
depend on it for ensuring uniqueness of keys, as well. This makes it
very hard to distribute, hard enough that if I really needed to make
this bulletproof, I'd probably write a replacement for MogileFS rather
than try to deal with the inevitable kludges when trying to distribute
a DBMS that was not designed to be distributed in the first place.
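The uniqueness problem fits in a few lines of Python with the standard sqlite3 module. This uses a hypothetical one-table schema (`file` with a UNIQUE `dkey`) and a hypothetical `create_key` helper, not MogileFS's actual code: a single DB rejects a duplicate key, but two copies of the DB that can't talk to each other each accept it, which is what a naive attempt at distributing the metadata runs into.

```python
import sqlite3

def new_replica():
    """A stand-alone metadata DB; `dkey` uniqueness is enforced only locally."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE file (fid INTEGER PRIMARY KEY, dkey TEXT NOT NULL UNIQUE)")
    return db

def create_key(db, key):
    """Try to claim `key`; return the new fid, or None if this DB already has it."""
    try:
        with db:  # commit on success, roll back on IntegrityError
            return db.execute("INSERT INTO file (dkey) VALUES (?)", (key,)).lastrowid
    except sqlite3.IntegrityError:
        return None

# One DBMS: the UNIQUE constraint rejects the second claim.
single = new_replica()
print(create_key(single, "avatar:42"), create_key(single, "avatar:42"))  # 1 None

# Two copies that can't coordinate (e.g. during a network partition):
# each accepts the claim, so after reconnecting "avatar:42" names two different files.
east, west = new_replica(), new_replica()
print(create_key(east, "avatar:42"), create_key(west, "avatar:42"))  # 1 1
```

Keeping the copies consistent needs some coordination protocol on top, which is the part a conventional DBMS wasn't built to provide.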

As for performance, we've not rolled the thing out into full production
yet, so we'll have to see. But I can't see any reason it would be
different from any other large lighttpd installation, including the one
that it's replacing.

Incidentally, I found it interesting that, now that most 1U servers come
with a pair of gigabit Ethernet interfaces, the main bottleneck when
you're serving a lot of different static content (enough that your
buffer cache is not terribly useful) turns out to be disk I/O. A single
modern disk has no hope of saturating a gigE interface, and even a pair
of them may not, if you've got a heavy seek load.
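A back-of-the-envelope check of the disk versus gigE claim, sketched in Python. The disk figures (about 100 random reads per second for a 7200 rpm drive, and 100 KB average files) are illustrative assumptions, not measurements from our installation; plug in your own numbers.

```python
# Can one disk keep a gigabit link busy when every request needs a seek?
# All disk figures here are illustrative assumptions, not measurements.
GIGE_BYTES_PER_SEC = 1_000_000_000 / 8  # 1 Gbit/s = 125 MB/s, ignoring protocol overhead

def seek_bound_throughput(iops, avg_file_bytes):
    """Bytes/s a disk delivers when each file read costs one random I/O."""
    return iops * avg_file_bytes

# Assumption: ~100 random reads/s, serving ~100 KB files.
one_disk = seek_bound_throughput(iops=100, avg_file_bytes=100_000)
print(f"one disk: {one_disk / 1e6:.0f} MB/s, "
      f"{one_disk / GIGE_BYTES_PER_SEC:.0%} of one gigE port")   # one disk: 10 MB/s, 8% of one gigE port
print(f"two disks: {2 * one_disk / GIGE_BYTES_PER_SEC:.0%} of one gigE port")  # two disks: 16% of one gigE port
```

Under these assumptions the disks, not the network, set the ceiling; only a larger average file size or a much better buffer-cache hit rate would change that.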

Curt Sampson       <>        +81 90 7737 2974   
Mobile sites and software consulting:
