I will provide test files in binary form (here). Your output should be a binary file.
The fan-out k should be given as a command-line argument. In class we discussed that k should be chosen on the order of M/B, where M is the size of main memory and B is the block size. Experiment with different values of k and try to optimize k. To decide the size of a run, you will need to know the size of main memory. The simplest way is to let the user provide this information as a command-line argument.
iosort -i filename -o filename -k value -m value
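To get a feel for how k and the memory size interact, the following is a rough sketch in Python of the number of initial runs and merge passes. It assumes each initial run is exactly as large as main memory and that every merge pass combines up to k runs at a time; a real implementation reserves part of memory for buffers, so the actual numbers may differ. The function name merge_passes is only illustrative.

```python
def merge_passes(data_bytes, mem_bytes, k):
    """Return (initial_runs, merge_passes) for a k-way external merge sort.

    Assumption: each initial run holds exactly mem_bytes of data.
    """
    if k < 2:
        raise ValueError("fan-out k must be at least 2")
    runs = -(-data_bytes // mem_bytes)   # ceiling division
    passes = 0
    remaining = runs
    while remaining > 1:
        remaining = -(-remaining // k)   # each pass merges groups of k runs
        passes += 1
    return runs, passes


MB = 1 << 20
# 8 GB of data with 256 MB of memory gives 32 initial runs.
for k in (2, 8, 32):
    runs, passes = merge_passes(8 * 1024 * MB, 256 * MB, k)
    print(f"k={k}: {runs} runs, {passes} merge passes")
```

Each merge pass reads and writes the whole dataset once, so fewer passes mean less IO, while a larger k leaves less buffer space per run. This trade-off is what the experiments with k should expose.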
Try to optimize efficiency as much as you can with respect to both IO and CPU.
If you need to use temporary files, place them in /tmp/scratch. This is a hard disk that's local to the machine, and thus not on NFS. The location of the scratch space may differ slightly from one machine to the other, so make it a parameter on the command line.
iosort -i filename -o filename -k value -m value -s scratchlocation
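As an illustration of this command line, here is a minimal sketch of the argument handling in Python using the standard argparse module. The attribute names (fanout, memory, scratch), the default of /tmp/scratch for -s, and the validation rules are assumptions made for the example; the unit of the -m value is not fixed by the assignment, so the sketch just stores it as an integer.

```python
import argparse


def parse_args(argv=None):
    """Parse the iosort command line (sketch; option meanings are assumed)."""
    parser = argparse.ArgumentParser(prog="iosort")
    parser.add_argument("-i", dest="input", required=True, help="input file")
    parser.add_argument("-o", dest="output", required=True, help="output file")
    parser.add_argument("-k", dest="fanout", type=int, required=True,
                        help="merge fan-out")
    parser.add_argument("-m", dest="memory", type=int, required=True,
                        help="main memory size (unit chosen by you)")
    # Assumption: fall back to /tmp/scratch when -s is not given.
    parser.add_argument("-s", dest="scratch", default="/tmp/scratch",
                        help="directory for temporary files")
    args = parser.parse_args(argv)
    if args.fanout < 2:
        parser.error("-k must be at least 2")
    if args.memory <= 0:
        parser.error("-m must be positive")
    return args


if __name__ == "__main__":
    print(parse_args())
```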
Run your experiments with 256MB of memory and datasets of various sizes. Focus on data sizes that show the IO-efficiency of your sort (that is, don't run lots of experiments for small datasets that fit in memory).
It is not clear in advance exactly how long any of the experiments will take. There is no need to let an algorithm run for more than a day to verify that it takes a long time. Write scripts. Let experiments run overnight while you work on other things. Keep me updated on your progress, so that we can adapt the schedule if necessary.
Add a timer to your code to measure the total running time. Include the time to read the input file and to write the output file.
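One possible way to do the timing, sketched in Python: wrap the whole sort, including reading the input and writing the output, in a single wall-clock measurement. The helper name timed is hypothetical. The sketch uses time.perf_counter, which measures elapsed real time and therefore includes time spent waiting on IO; a CPU-only clock such as time.process_time would leave that out.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; in the assignment this would be the full sort,
    # from opening the input file to closing the output file.
    _, seconds = timed(sorted, range(1_000_000, 0, -1))
    print(f"total running time: {seconds:.3f} s")
```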
Here is the code for a heap, which you may need to use (if you do, you'll need to adapt it to your problem): pqueue
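To show where a heap fits in, here is a minimal Python sketch of a k-way merge. It is not the pqueue code linked above: it swaps in Python's standard heapq module, and it merges in-memory lists that stand in for sorted runs on disk. The name kway_merge is made up for the example. The heap holds one (value, run index) pair per non-empty run, so each output element costs O(log k) comparisons.

```python
import heapq

_END = object()  # sentinel marking an exhausted run


def kway_merge(runs):
    """Yield the elements of several sorted runs in sorted order."""
    iters = [iter(r) for r in runs]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, _END)
        if first is not _END:
            heap.append((first, idx))
    heapq.heapify(heap)
    while heap:
        value, idx = heap[0]          # smallest remaining element
        yield value
        nxt = next(iters[idx], _END)  # refill from the same run
        if nxt is _END:
            heapq.heappop(heap)       # run exhausted, shrink the heap
        else:
            heapq.heapreplace(heap, (nxt, idx))


if __name__ == "__main__":
    print(list(kway_merge([[1, 4, 7], [2, 5], [3, 6, 9], []])))
```

In an external sort, each list would be replaced by a buffered reader over one run file, and the yielded values would go to a buffered writer.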