This memory manager scales perfectly on multi-core and multi-CPU systems. I made this MM because FastMM does not scale at all!
ScaleMM scales perfectly on a quad-core system: plotted as execution time (y-axis) against thread count (x-axis), it gives a horizontal line from 1 to 4 threads and rises only slowly up to 16 threads. The execution time of FastMM/D2010, by contrast, doubles when 2 threads are used (and quadruples when 4 threads are used, etc.).
I made 2 versions: the first version worked on top of FastMM (or another MM) and only handled small memory (<>1Mb). It took some time, however, to make this second version, because the medium memory handling was more difficult to implement (splitting one block into dynamic pieces of different sizes).
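To illustrate what "splitting one block into dynamic pieces of different sizes" involves, here is a minimal sketch in Python. It is not ScaleMM's actual code (which is Delphi): the class name `MediumBlock`, the first-fit strategy and the merge-on-free step are all assumptions chosen for illustration, and the "memory" is only modelled as offsets into one block.

```python
class MediumBlock:
    """Toy model (hypothetical, not ScaleMM code) of one block split into variable-size pieces."""

    def __init__(self, size):
        self.size = size
        # Free spans as (offset, length) tuples, kept sorted by offset.
        self.free_spans = [(0, size)]
        # Maps the offset of each live piece to its length.
        self.used = {}

    def alloc(self, length):
        """First fit: take the first free span that is large enough and split it."""
        if length <= 0:
            raise ValueError("length must be positive")
        for i, (offset, span_len) in enumerate(self.free_spans):
            if span_len >= length:
                remainder = span_len - length
                if remainder:
                    # Keep the unused tail of the span as a smaller free span.
                    self.free_spans[i] = (offset + length, remainder)
                else:
                    del self.free_spans[i]
                self.used[offset] = length
                return offset
        return None  # no free span is large enough

    def free(self, offset):
        """Return a piece to the free list and merge it with adjacent free spans."""
        length = self.used.pop(offset)
        self.free_spans.append((offset, length))
        self.free_spans.sort()
        merged = [self.free_spans[0]]
        for off, ln in self.free_spans[1:]:
            last_off, last_len = merged[-1]
            if last_off + last_len == off:
                # The two spans touch, so join them into one.
                merged[-1] = (last_off, last_len + ln)
            else:
                merged.append((off, ln))
        self.free_spans = merged


if __name__ == "__main__":
    block = MediumBlock(1024)
    a = block.alloc(100)
    b = block.alloc(300)
    block.free(a)
    print(a, b, block.free_spans)
```

The difficulty compared to small memory is visible even in this toy: pieces have arbitrary sizes, so every free has to look at its neighbours to avoid fragmenting the block into unusable slivers.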
How fast ScaleMM is compared to FastMM is difficult to say: my own benchmark with mixed sizes and operations shows ScaleMM2 is 3x faster than FastMM/D2010, but in other benchmarks (allocations of the same size, first allocating everything and then freeing everything) ScaleMM2 turns out to be 50% slower than FastMM/D2010.
I hope I can find some time to investigate further why it is much faster in one test and slower in another. An ASM-optimized version would also help, of course :-).
The quality of ScaleMM2 seems good: I made some simple unit tests, and also ran the FastCode MM Benchmark (a good and extensive stress test!). I made a lot of internal "CheckMem" procedures which check for corruption etc. But I haven't done a real-life production test yet...
I have many ideas to improve and extend ScaleMM: detailed logging and statistics/usage (overhead, amount cached and over-allocated, etc.), a "GC" thread (delayed release of freed memory, in case the same memory is needed again in e.g. the next for loop, background handling of inter-thread memory, etc.), "Object Garbage Collection", etc.
However: I have little spare time! So if anyone can help me (or any company wants to hire/pay me :-) )...