Discussion:
In coarse-level parallelisation, how to handle output?
Mike
2007-01-08 18:45:52 UTC
Hi all,

If my executable non-parallel program needs to write to a datafile as
output, what happens if I run 100 independent copies of it, using MPI
as a dispatcher? The simplest way is to change my executable program to
write to a different output datafile for each copy, say data001 through data100.

Then I have to manually post-process all the datafiles myself.

Is there a way they can write into one file, by appending? Maybe there will
be conflicts, I guess...
Logan Shaw
2007-01-09 05:48:52 UTC
Post by Mike
Hi all,
If my executable non-parallel program needs to write to a datafile as
output, what happens if I run 100 independent copies of it, using MPI
as a dispatcher? The simplest way is to change my executable program to
write to a different output datafile for each copy, say data001 through data100.
Then I have to manually post-process all the datafiles myself.
Is there a way they can write into one file, by appending?
Yes, but that's a lot more difficult than just writing into separate
files. You'll have to make sure that your updates to the file don't
stomp on each other. To put it in concrete terms, if two processes
each have, say, 5000 bytes to write to one file, and if they both do
it at once, there's a very good chance that one of them will write
only part of the 5000 bytes before the other writes some. You will
end up with one process's output arbitrarily interleaved with others'.
It's just not a good way to go.

How to combine the separate outputs depends on the type of output,
though. For instance, if all 100 processes are trying to find
members of a set, just have each process write the members it found
into its own output file; when all are done, concatenate the files
and (if duplicates are possible) eliminate the duplicates. That's
pretty trivial. For example, if you had 100 processes trying to find
all the prime numbers between 1 and 1,000,000, you might have one
process handle the range 1 to 10,000, another handle the range
10,001 to 20,000, and so on. The results would be really easy to
merge together in the end.

Of course, with other types of output, things won't be as easy,
or maybe they will. It all depends on the type of output you are
going to have.

For what it's worth, in Unix it is trivial to concatenate 100 files
together. If you have file00 through file99, you just do this:

cat file* > combined-output

If you want to sort them and eliminate duplicates on a line-by-line
basis, that's also trivially easy in Unix:

sort -u file* > combined-output-without-duplicates

By the way, starting 4 separate threads for this same basic subject
is bordering on excessive...

- Logan
Martin Blume
2007-01-09 17:49:50 UTC
"Logan Shaw" wrote
Post by Logan Shaw
By the way, starting 4 separate threads for this same basic
subject is bordering on excessive...
Hey, it's about parallelisation, no? :-)

SCNR
Martin
Josef Moellers
2007-01-09 08:20:50 UTC
Mateusz Pabiś
2007-01-21 15:01:35 UTC
Post by Mike
Hi all,
If my executable non-parallel program needs to write to a datafile as
output, what happens if I run 100 independent copies of it, using MPI
as a dispatcher? The simplest way is to change my executable program to write
There is something called MPI-IO, or parallel I/O, in the MPI 2.0 spec
(thanks to IBM). Do some googling about:
- MPI_File_open ( MPI_COMM_WORLD...
- MPI_File_set_view ( ...
- MPI_File_write ( ...
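For the curious, a minimal sketch of those three calls. It assumes every rank writes one fixed-length record, so rank-dependent offsets never collide; error handling is omitted, and it needs mpicc/mpirun to build and launch:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fixed-width record, so every rank's output is the same length. */
    char buf[64];
    int len = snprintf(buf, sizeof buf, "output of rank %03d\n", rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "combined.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank views the shared file starting at its own offset. */
    MPI_File_set_view(fh, (MPI_Offset)rank * len, MPI_CHAR, MPI_CHAR,
                      "native", MPI_INFO_NULL);
    MPI_File_write(fh, buf, len, MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

All 100 ranks end up writing into one combined.dat with no interleaving, because the MPI-IO layer, not the application, arbitrates the offsets.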

--
Best regards
Mateusz Pabis
