Re: [tlug] distributed file systems



On 2010-02-15 17:48 +0900 (Mon), Sach Jobb wrote:

> We have considered a solution that splits the input/update/delete
> processes from the read process at the software level, so that the
> local client would read everything from a local "read only" server,
> but anything that required an upload or delete would directly access
> our data center in Tokyo. In fact, that's sort of the fallback plan.
> 
> The real issue here is performance. It's just too slow from the UK
> (NYC seems to be a bit better, but still very slow). If it's too slow,
> we are justifiably afraid that people will grow frustrated and won't
> use it. Before anyone asks: yes, the site is heavy, and that's because
> it's packed with media. There is a little we can do to improve it, but
> fundamentally the business requirements demand that the page remain
> fairly large.

Ah, I see! If you're looking for a pleasing user experience, you
definitely want to stay away from anything too fancy. Sophisticated
synchronization can be a real performance killer, and once it hits you
(if it does), you may have a hell of a time trying to dig yourself out
of it.

Trying to get large files from Tokyo to London in under ten seconds on
a consistent basis is probably not something you want to start chasing.
I think what you really want to do is to try to get *most* things
happening quickly; most users can live fairly happily with something
being a bit slow now and then if that's rare.

For the large files that change rarely or never, Edmund had the right
idea: serve them all from one central location, but use caching proxies
at all of the remote locations. Make sure you set your expiry times
appropriately. Preheat the cache by feeding information about new files
to a program at each remote site that will request them through the
cache. Unless you've got a huge number of updates and everybody's only
looking at the very newest stuff, your speed problems (at least for
those files) will probably have more to do with local networks and the
web browsers' rendering speed than anything else.
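
To make the preheating idea concrete, here's a rough sketch (in Python)
of a warmer you could run at each remote site. It assumes the central
site feeds each site a plain-text list of new file URLs, and that each
site runs an HTTP caching proxy on localhost:3128 (Squid or similar);
the addresses and file names are placeholders, not a description of
your setup.

    # Warm the local caching proxy with files the centre has just published.
    import urllib.request

    PROXY = "http://localhost:3128"   # local caching proxy (assumed address)
    NEW_FILES = "new-files.txt"       # one URL per line, fed from the centre

    # Route all requests through the local proxy so its cache gets populated.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": PROXY})
    )

    with open(NEW_FILES) as f:
        for url in (line.strip() for line in f):
            if not url:
                continue
            try:
                # Fetch and discard the body; the point is just to get the
                # file into the local cache before any user asks for it.
                with opener.open(url, timeout=60) as resp:
                    resp.read()
                print("warmed", url)
            except OSError as e:
                print("failed", url, e)

Run that from cron, or hook it into whatever announces new files, and
by the time a user in London asks for something it's already sitting in
the local cache.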

As far as putting data into the system goes, centralize it in the sense
that everything going out to the remote locations passes through the
central location. That doesn't mean, however, that you can't push some
of the user interface out to the remote sites. If the submission process is
a multi-step one, perhaps it can do most of the work on a remote web
server, and then that remote server can submit a final consolidated and
checked request to the central location. Possibly it can cache some
of the database data that has a master copy at the central location.
(That's a trade-off you have to decide: accuracy versus speed of
response. You can't have both.) Doing some research on consistency
models, particularly eventual consistency (which is what Amazon uses for
its shopping carts), will be useful here.
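
To sketch what that looks like in code (Python again, with a made-up
CENTRAL_URL, a key-per-URL JSON API and a thirty-second staleness
window, none of which is meant to describe your actual stack): reads
are answered from a local copy that's allowed to go a little stale,
and every write goes straight back to the master in Tokyo.

    import json
    import time
    import urllib.request

    CENTRAL_URL = "http://master.example.jp/api"  # assumed central (Tokyo) API
    CACHE_TTL = 30.0                              # seconds a stale read is tolerated

    _cache = {}   # key -> (value, fetched_at)

    def read(key):
        # Serve reads locally, refreshing only when the copy is stale.
        # Users may see data up to CACHE_TTL seconds old: eventual consistency.
        value, fetched_at = _cache.get(key, (None, 0.0))
        if time.time() - fetched_at > CACHE_TTL:
            with urllib.request.urlopen(f"{CENTRAL_URL}/{key}", timeout=10) as resp:
                value = json.load(resp)
            _cache[key] = (value, time.time())
        return value

    def write(key, value):
        # Writes always go straight to the central master, which stays the
        # single source of truth; the local copy is updated opportunistically.
        req = urllib.request.Request(
            f"{CENTRAL_URL}/{key}",
            data=json.dumps(value).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        with urllib.request.urlopen(req, timeout=30):
            pass
        _cache[key] = (value, time.time())

How stale you can let reads get is exactly the accuracy-versus-speed
trade-off above; the eventual consistency literature is mostly about
making that window safe to live with.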

> This is literally less than 50 files. We needed some way for some of
> the content to be changed that did not involve a database for a couple
> of customers in some rare circumstances and this horrible hack is the
> result of that.

Yeah, then split these off from the other files and handle them in a
different way.

cjs
-- 
Curt Sampson         <cjs@example.com>         +81 90 7737 2974
             http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw

