CPU vs GPU
How are they different?
In this blog post, the discussion will revolve around the differences between the CPU and the GPU, but we should keep in mind that a GPU can never fully replace a CPU.
A GPU complements CPU architecture by allowing repetitive calculations within an application to be run in parallel while the main program continues to run on the CPU.
CPU architecture:
For the sake of explanation, let’s take one CPU core into consideration.
What we notice is that we have a layer-1 cache for data and instructions, which is dedicated per core.
We also have a dedicated layer-2 cache. Depending on the make and model, it can be up to 2 MB in size, which is a good size for a cache.
The last-level cache is the layer-3 cache, which is shared among all the cores within the CPU package.
The process of fetching data starts at the layer-1 cache. If the data is not present in layer-1, the core looks in the layer-2 cache; if it is not in layer-2, it looks in the layer-3 cache. If the data is not present in any cache layer, the core talks to the memory controller, which is connected to the actual global memory, the DRAM.
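The lookup order above can be sketched as a small simulation. The latency figures here are rough, assumed cycle counts for illustration only, not measurements of any particular CPU:

```python
# Illustrative model of the cache lookup order described above.
# Latencies (in CPU cycles) are assumed figures, not real measurements.
CACHE_LEVELS = [
    ("L1", 4),     # per-core data/instruction cache
    ("L2", 12),    # per-core
    ("L3", 40),    # shared among all cores in the package
    ("DRAM", 200), # global memory, reached via the memory controller
]

def lookup(address, contents):
    """Walk the hierarchy from L1 outward; return (level, cost) of the hit.

    `contents` maps a level name to the set of addresses it currently holds.
    DRAM always hits, since it backs the whole hierarchy.
    """
    for level, cost in CACHE_LEVELS:
        if level == "DRAM" or address in contents.get(level, set()):
            return level, cost
```

For example, an address held only in layer-2 costs the L2 latency, while an address held nowhere falls through to DRAM:

```python
contents = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x30}}
lookup(0x20, contents)  # -> ("L2", 12)
lookup(0x99, contents)  # -> ("DRAM", 200)
```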
So, CPU architecture is about quick memory access and performing calculations in a sequential way.
However, when we are talking about ML/AI workloads, where computations need to be performed on much larger datasets, we need throughput and parallelism. Taking this into account, the CPU alone is not the right choice to achieve that.
Another important thing to note here is that, in the case of the CPU, work is scheduled per core.
GPU architecture:
When we take a look at the GPU architecture, we find that it is based on many processing clusters. Within these processing clusters, we have streaming multiprocessors, and that is where the cores reside.
The important thing to notice here is that a GPU consists of multiple processing clusters. A common GPU consists of 30-40 processing clusters, and a single processing cluster can have multiple streaming multiprocessors.
When the GPU performs computation on a large dataset, the work is scheduled on a processing cluster, not on an individual core. The cluster then schedules it across multiple streaming multiprocessors. Since each streaming multiprocessor contains many cores, a large number of cores end up doing the computation at once.
This architecture makes it possible to achieve parallelism and thereby delivers very high throughput.
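The cluster-to-multiprocessor-to-core hierarchy described above can be sketched as a simple index mapping. The counts here (4 streaming multiprocessors per cluster, 32 cores per multiprocessor) are assumptions for illustration, not the figures of any specific GPU:

```python
# Sketch of how a flat work-item index maps onto the GPU hierarchy
# described above. SMS_PER_CLUSTER and CORES_PER_SM are assumed,
# illustrative counts, not real hardware parameters.
SMS_PER_CLUSTER = 4
CORES_PER_SM = 32

def schedule(index):
    """Map a work-item index to a (cluster, sm, core) triple."""
    core = index % CORES_PER_SM
    sm = (index // CORES_PER_SM) % SMS_PER_CLUSTER
    cluster = index // (CORES_PER_SM * SMS_PER_CLUSTER)
    return cluster, sm, core
```

With these assumed counts, work items 0-31 land on the first multiprocessor of the first cluster, item 32 spills to the next multiprocessor, and item 128 starts the second cluster; the dataset spreads across all the cores rather than queueing on one.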