Monday, April 25, 2016

Spreading the... love?

I originally started this blog as a place to explore and discuss topics that interest me and projects I'm working on, but even setting aside how scarce the posts have been, I haven't upheld that mission very well. In the beginning I was reluctant to discuss my dissertation project for fear of someone else beating me to the punch, but at this point I think I've narrowed it enough that my work will be unique even if someone were to find the general outlines.

Speaking of spreading something

So what am I really going to waste your time with? While at least half a dozen smart-ass remarks instantly spring to mind, I'll leave those remarks as an exercise for the readers and mention that much of my research time the past three years has been spent teaching myself the Python language and distributed computing, with particular emphasis on Big Data analytics.

But why distributed?

A few months ago, on a rare occasion when I happened to come up for air, I noticed that processor speed (X.XX GHz) wasn't really mentioned much in advertising anymore, but every ad talked about how many cores the computer has. This seemed curious to me, so, armed with a two-year degree in electronics, I set about researching this shift.

In reasonably short order I found a research paper that explained the shift. As with so many things in tech, it all boils down to physics - specifically, the physics of transistors.

Review of computers and electronics

First, recall that two of the most important computer subsystems -- the CPU (which contains the ALU, or Arithmetic Logic Unit) and RAM -- are composed in large part of transistors, and transistors have been getting smaller on a fairly regular basis (the shrinking that has kept Moore's Law going).

Now if we look at a transistor (right) we see four areas of interest:
  • G - Gate
  • S - Source
  • D - Drain
  • B - Body

Transistors are made through a process similar to photography (photolithography): areas of the silicon are masked and the unmasked areas are treated in some way, then the mask is washed off and a different pattern is masked for the next layer.

On our transistor to the right, imagine that the S and D areas have been doped (bombarded with ions) so that they carry an excess of electrons, making negatively charged (n-type) regions. The area under the gate is doped the opposite way to make a positive (p-type) region, and an insulator is placed on top. This creates an n-channel field effect transistor (FET), with an N-P-N arrangement of regions from source to drain. Metal wires are connected to the S, D, and G areas. When a positive voltage is applied to the gate, positive charge in the area under the gate is repelled, creating a negative channel through which electrons flow from S to D.

The problem: as transistors get smaller, all of those regions get smaller. Remember that the gate is the actual "switch" part of the switch, and it has an insulator under it so that the gate voltage can control the channel without current flowing through the gate itself. But if that insulator gets too thin, some current will leak through even when it isn't wanted. Thus the transistor cannot be definitively "on" or "off." That's roughly where we are right now: it is becoming very hard to make transistors any smaller and still control the gate leakage.

For this reason, chip makers have shifted their focus from making a single processor ever faster to finding ways to do more things at the same time. Thus multi-core CPUs.

Divide and Conquer

Computer science has long acknowledged the power of dividing a problem into smaller pieces and then solving the small pieces separately. This is the driving force behind two bedrock algorithms: Binary Search and Quicksort. When the pieces don't depend on one another, they can even be solved simultaneously.
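
For anyone who hasn't met those two algorithms, here is a minimal Python sketch of each. The function names are my own, and the Quicksort shown is a simplified version that builds new lists rather than sorting in place, so treat it as an illustration of the divide-and-conquer idea rather than a production implementation.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


def quicksort(items):
    """Return a new sorted list (simplified, not in-place)."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Each side is a smaller, independent sorting problem.
    return quicksort(smaller) + equal + quicksort(larger)


if __name__ == "__main__":
    numbers = quicksort([5, 2, 9, 1, 5, 6])
    print(numbers)
    print(binary_search(numbers, 6))
```

Notice that Binary Search throws away half of the problem at every step, while Quicksort keeps both halves and solves each one on its own. That second pattern is the one that lends itself to doing the pieces at the same time.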

Now, let's imagine you have a task going to the CPU that can be broken up into pieces (and a surprisingly large number of tasks can be). It might look something like below:

Perhaps each grey rectangle is a part of a list of numbers and the task is to see if a specific number exists in the whole list. It is reasonably easy to see that searching 1/4 of the list takes less time than searching the whole list, and searching all four quarters simultaneously, one per core, finishes in roughly the time it takes to search a single quarter. Thus the idea of multiprocessing with multi-core CPUs.
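
The list search described above can be sketched in Python with the standard library's `concurrent.futures.ProcessPoolExecutor`. The helper names (`split_into_chunks`, `chunk_contains`, `parallel_contains`) and the choice of four workers are my own assumptions for illustration; this is one way to do it, not the only one.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(numbers, pieces):
    """Split a list into `pieces` contiguous slices of nearly equal size."""
    size, extra = divmod(len(numbers), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        chunks.append(numbers[start:end])
        start = end
    return chunks


def chunk_contains(args):
    """Plain linear search of one chunk; runs inside a worker process."""
    chunk, target = args
    return target in chunk


def parallel_contains(numbers, target, workers=4):
    """Search each chunk in its own process and combine the answers."""
    chunks = split_into_chunks(numbers, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(chunk_contains, [(c, target) for c in chunks]))
    return any(results)


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_contains(data, 765_432))  # True
    print(parallel_contains(data, -1))       # False
```

One caveat: each chunk has to be copied to its worker process, so for a search this simple the copying can easily cost more than the search itself. The pattern pays off when the work done on each piece is heavy compared to the cost of handing the piece out.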

Next time we'll talk about Hadoop and MapReduce.
