
Re: [tlug] Making better use of SSDs?



Hi Satoshi,


On Tue, May 29, 2012 at 2:37 PM, Satoshi Nagayasu
<satoshi.nagayasu@example.com> wrote:
> 2012/05/29 0:28, Raymond Wan wrote:
> I built a distributed version of BLAST, one of the most popular
> programs in bioinformatics for similarity search in genome
> databases, for my master's thesis in grad school.
> It was really fun! :)
>
> BLAST is the kind of software that performs a large amount of I/O,
> particularly sequential reads.
>
> If you're using similar software, I recommend using an SSD
> with a multi-core CPU to execute multiple queries at once.
>
> You will be able to take advantage of the SSD in terms of throughput
> (not single-query response time), because executing multiple
> sequential reads at once would (theoretically) act as random reads
> on a single spindle (hard drive).
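
To illustrate the point in the quoted text, here is a rough sketch (not a
benchmark) of why several concurrent sequential readers can look like random
access to a single drive. It assumes the drive services the streams in
round-robin order and that each stream starts at a different region of the
disk; the function names, the block-number model, and the stride value are
all made up for illustration.

```python
from itertools import zip_longest


def interleaved_offsets(num_streams, blocks_per_stream, stream_stride):
    """Return the block numbers a single drive would see.

    Each stream reads its own region sequentially (assumed to start at
    s * stream_stride); the drive is assumed to serve one block per
    stream in turn.
    """
    streams = [
        [s * stream_stride + b for b in range(blocks_per_stream)]
        for s in range(num_streams)
    ]
    order = []
    for group in zip_longest(*streams):
        order.extend(off for off in group if off is not None)
    return order


def count_seeks(offsets):
    """Count transitions where the next block is not the adjacent one."""
    return sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur != prev + 1)


if __name__ == "__main__":
    for n in (1, 2, 4):
        offsets = interleaved_offsets(n, blocks_per_stream=3, stream_stride=1000)
        print(n, "stream(s):", count_seeks(offsets), "seeks")
```

With one stream, every read is adjacent to the previous one; with two or more,
almost every read lands in a different region, which is the case where a
spinning disk pays for head movement and an SSD does not.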


Well, we're not using BLAST, but something BLAST-like.  So, your past
experience with BLAST is relevant.  The problem, as Nava pointed out, is
the generally sequential nature of the algorithm.  You're going to be
doing something like:

foreach chromosome
  foreach sequence
    BLAST search sequence in the chromosome

And you end up wondering which to put on the SSD.  The sequences would
be my first choice, but that data is in the terabyte range nowadays.
Even if there were an SSD that big, you would just make 23 passes over
it in sequential order, so it may not be worth putting it on the SSD.
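
As a minimal sketch of that access pattern, the loop above could be written
like this in Python.  The search function here is only a stand-in (an exact
substring match, not BLAST or anything BLAST-like), and the names
count_passes and naive_search are invented for the example.  The point is
just that the whole sequence set is read once, in order, per chromosome.

```python
def naive_search(seq, chrom):
    # Stand-in for a BLAST-like search: exact substring match only.
    return seq in chrom


def count_passes(chromosomes, sequences, search=naive_search):
    """Run the nested loop and count how often each sequence is read."""
    reads = {name: 0 for name in sequences}
    hits = []
    for chrom_name, chrom in chromosomes.items():
        # One full, in-order pass over the sequence set per chromosome.
        for seq_name, seq in sequences.items():
            reads[seq_name] += 1
            if search(seq, chrom):
                hits.append((chrom_name, seq_name))
    return reads, hits


if __name__ == "__main__":
    chromosomes = {"chr1": "ACGTACGT", "chr2": "TTTTGGGG"}
    sequences = {"s1": "GTAC", "s2": "TTGG"}
    reads, hits = count_passes(chromosomes, sequences)
    print(reads)
    print(hits)
```

Every sequence is read exactly once per chromosome, so the number of passes
over the sequence data equals the number of chromosomes, and each pass is a
plain sequential scan.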

There will be some gains, and when 'time is money' even a small gain is
better than nothing.  Still, Nava seems to have convinced me :-) that
such algorithms may not justify the use of an SSD, especially given the
data sizes nowadays.  I do agree that a multi-core CPU framework is
useful, and many open-source programs have enabled that.

Ray

