Rethinking multicore

July 10, 2010

In recent months I keep coming back to the question of how we can get past the multicore crisis we are currently heading into. So I started with a few considerations that I think are important for any future multicore processor design.

  1. Inherently sequential code can be sped up by only a small factor. There is therefore no need to put too much effort into making it 100% parallel; any small improvement in running sequential code is a good achievement.
  2. If a good parallel programming model is provided, software developers will eventually start writing more parallel code. This again reduces the need to make sequential code run in parallel.
  3. Any future design should address both fine-grain and coarse-grain parallelism. Addressing both types will give the programmer the flexibility to exploit the maximum parallelism in the application. But it would be better not to make the programmer worry about which type of parallelism is in play; compilers, operating systems, and hardware should take care of that.
  4. The Von Neumann architecture uses a single memory port for both data and code, while the Harvard architecture uses one port for data and another for code. Current architectures, even those that started as pure Von Neumann, use a mix of both. For example, Intel’s L1 cache is split into a data cache and a code cache, very similar to the Harvard architecture, while there is only one main memory for both data and code, as in Von Neumann. But having 1000s of cores on the same chip will require more memory ports. By more memory ports I don’t mean using a Harvard architecture or building systems similar to NUMA machines. We need to figure out smarter memory schemes that meet the needs of 1000s of cores, keeping in mind that memory speed advances much more slowly than CPU speed.
  5. With all these cores per chip and the amount of available parallelism, concurrency is going to be a major concern, and efficient, standardized methods are needed to handle it. Handling concurrency should be embedded deep in the stack, from the hardware to the operating system to the programming language, so that the end programmer doesn’t have to worry much about it. Of course, using concurrency is not going to be totally free for the programmer, but it should at least be very easy, pain-free, and efficient.
  6. Another thing, which I already talked about in my previous post: we need to rethink the contracts between the different layers of the stack we have built so far. Each layer needs rethinking, as well as the contract between the layers.
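
One way to put rough numbers on point 1 is Amdahl's law, which the post does not name but which is the usual way to reason about how much the sequential part of a program limits speedup. The sketch below is only an illustration: the function name `amdahl_speedup` and the parallel fractions and core counts in the loop are assumptions chosen for the example, and the model assumes the parallel part scales perfectly with no communication or memory overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the single-core run time that can be
    spread across cores (0.0 = fully sequential, 1.0 = fully parallel).
    cores: number of cores the parallel part is spread over.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part shrinks as cores are added.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    for fraction in (0.50, 0.90, 0.99):
        for cores in (4, 64, 1000):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.2f} cores={cores:4d} speedup={speedup:6.2f}x")
```

Under this model, a program that is 90% parallel stays below a 10x speedup even on 1000 cores, because the remaining 10% still runs on a single core. That is why a small gain in sequential performance can matter as much as adding many more cores.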
