Monday, December 22, 2008

Intel's latest processor, the i7, and we are part of it



Do you know that Bangladeshi engineers and scientists worked on the i7, the fastest processor on the planet, developed by Intel Corporation? At least 10 to 15 engineers of Bangladeshi origin worked in various development and production phases of the i7. To put in perspective how big a task this was, I must say that it took almost 4 years, from 2004 to today. Moreover, about 300 to 1200 designers, architects, mask designers, debuggers, design validators, test engineers, fault engineers, scientists, pathfinders and more worked on this project in various stages of product development. Over its life span, thousands of Intel experts may have helped design and test the new i7.

 

Paradigm shift:

After completing my Potomac design task, I joined Nehalem in late 2004. I recall the time when we were brainstorming on how to bring the new architecture and new features together in the most power-efficient, high-performance mix. During that time Intel introduced a philosophy called the "right-hand turn". We do not just go for speed or performance; we aim for the total experience, and we think outside the box of traditional design philosophy.

 

It is not easy for me to write about my work and the inside stories of how we designed the i7, because I must be very careful about Intel's intellectual property, which is Intel's most important asset. I have to follow Intel's information security guidelines, so I will limit my scope to press releases and public documents.

 

i7 Highlights:

As I already said, the i7 is the fastest processor on the planet. The Core i7 processor is the first product of the Nehalem processor design. This is the most sophisticated processor ever built. It has the technology to boost performance on demand and maximize data throughput. The Core i7 processor speeds up video editing, immersive games and other internet and computer activities by up to 40 percent without increasing power consumption.

 

The key features at a glance:

·         Four cores, or processing units, all on one piece of silicon.

·         New Intel® Turbo Boost Technology that matches the computer's energy use to its workload, automatically adjusting the number of cores and the clock speed depending on the application.

·         Double the memory bandwidth of previous processors, thanks to an innovation called Intel® QuickPath Technology.

·         Hyper-Threading technology that allows up to eight computing threads to run simultaneously.

·         Processing speeds up to 3.2 GHz.

·         Energy efficiency: Despite the extra computing muscle, the chips don't use any more power than Intel's previous top-of-the-line processors.

 

 

The Core i7 processor is the first member of the Intel Nehalem microarchitecture family; server and mobile product versions will be in production later. Each Core i7 processor features an 8 MB level 3 cache and three channels of DDR3 1066 memory to deliver the best memory performance of any desktop platform. Intel's top performance processor, the Intel® Core™ i7 Extreme Edition, also removes overspeed protection, allowing Intel's knowledgeable customers or hobbyists to further increase the chip's speed.

 

Shift in traditional design philosophy:

In mid 2005, Intel introduced its Intel® Xeon® processor MP-based platform with up to 8 MB of third-level "L3" cache memory, codenamed "Potomac". Potomac was the top-of-the-line 32-bit Intel Xeon processor with Hyper-Threading technology. I joined the Potomac design team in 2003 after completing my assignments on VT (Virtualization Technology, or Vanderpool Technology). We introduced a lot of *T (what we called Star-T) technologies in various Prescott products. There was a lot of innovation going on because of the tough competition and a market ever hungry for innovative high-performance products.

 

Now let me go back to my early days on Nehalem, which is the codename of the i7. I started my work on the instruction queue (IQ). My job was to design circuitry so that the stream of bits can be converted into useful "instructions". These instructions can then be processed by the processor's execution unit to deliver what each instruction is asking for. If you are familiar with the IA (Intel Architecture) instruction set, then you know what an instruction means.

 

Let me back up a little bit and give a very high-level view of the activity here. To learn the most accurate science, please consult any good textbook in a library near you.

 

Think about the very basic electrical engineering that we all see every day in our lives. A flip of any switch on the wall turns a light bulb in our home on or off. Hence a switch works as a "gate" (or door, if you will) for the flow of electricity. When the circuit is complete, the voltage across the bulb drives current through the filament, and the material properties of the filament and the enclosed atmosphere convert electrical energy into light and heat.

 

A similar thing happens in a fan. However, the material is different and a different property is used: magnetism, or the magnetic field produced by an electric current. Opposing magnetic fields push on the drum of the motor, and the result is the rotating motion of the fan. Hence electrical energy is converted to electromagnetic energy to rotate the shaft, and the blades deliver wind. There is a lot of talk nowadays about doing the reverse, which is to use wind energy from wind turbines to produce electrical energy and store it in batteries or deliver it to the grid.

 

Remember I called the light bulb switch a "gate"? The "gate" is the most important basic building block of a microprocessor. Think of it like a "brick", which is widely used to build multistoried buildings nowadays.

You may think of each "transistor" as a gate. There are 731 million transistors in this i7 chip. You may recall that the i7 has 4 cores. Look at the attached picture of the i7. Can you identify the 4 cores? There is also a huge uncore section in the picture, which helps the cores and the outside world communicate.


Now we know that a transistor is like a switch. It can have two positions, on or off. Let us do some coding. My code for "on" is "1" and my code for "off" is "0". See how easy it is: we are doing software programming.

 

My design:

The processing speed of the i7 is 3.2 gigahertz. What exactly does that mean? The processor's clock ticks 3.2 billion times a second, so you can think of it as producing 3.2 billion "0"s and "1"s in a second. Of course this is a mere simplification. That's a lot of 0s and 1s.

You know the microprocessor wants everything simple. We proudly call it a dumb machine as it will do the exact same thing over and over. It only understands various combinations of "0" and "1" in a string. Think about a mile long line of 0 and 1 with various combinations. From that mile long string, look at the following section:

000001110101010011100000000101010101010100000101111000001010101010000010110101010101010

 

This is a typical stream of binary data coming into the instruction queue. The highlighted bits, the byte 00000101 and the 32 bits that follow it, translate into: "Add 32 bits of immediate data to the EAX register". The 00000101 is what we call a 1-byte opcode (05) for the addition operation. Hence this highlighted stretch of 8 opcode bits and 32 data bits represents a complete instruction.
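
To make the idea concrete, here is a minimal Python sketch that cuts the stream shown above into bytes and looks for the opcode byte 05. It is only an illustration, not how the real instruction queue works: it assumes the stream starts on a byte boundary and simply scans for the first byte equal to 05, whereas real hardware has to work out the length of every instruction along the way. The function names are made up for this example.

```python
# The bit stream from the article, copied as-is (87 bits; the last 7 bits
# do not form a full byte and are ignored below).
STREAM = ("0000011101010100111000000001010101010101"
          "00000101111000001010101010000010110101010101010")

ADD_EAX_IMM32 = 0x05  # 1-byte opcode: add a 32-bit immediate to EAX


def split_bytes(bits):
    """Cut a string of 0s and 1s into whole bytes (assumes byte alignment)."""
    usable = len(bits) - len(bits) % 8
    return [int(bits[i:i + 8], 2) for i in range(0, usable, 8)]


def find_add_eax(bits):
    """Return (bit offset of the opcode, 32-bit immediate) or None.

    Simplification: scans for the first 0x05 byte instead of decoding
    every instruction in order, as a real front end must.
    """
    data = split_bytes(bits)
    for i, byte in enumerate(data):
        if byte == ADD_EAX_IMM32 and i + 4 < len(data):
            # IA-32 stores immediates least significant byte first.
            imm = int.from_bytes(bytes(data[i + 1:i + 5]), "little")
            return i * 8, imm
    return None


offset, immediate = find_add_eax(STREAM)
print(f"opcode found at bit {offset}, immediate = {immediate:#010x}")
```

Counting in whole bytes matters here: the same eight-bit pattern also shows up earlier in the stream straddling two bytes, which is one reason finding instruction boundaries is real work for the hardware.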

 

While I was designing, my job was to find those instructions in this type of stream of 0s and 1s and place them in a specific area. Voilà! You have a queue of instructions ready for the execution unit to execute. My design is such that it can place six instructions in one clock cycle, enough to keep the execution pipe busy completing 4 instructions in one clock cycle. This is parallel processing in action. Even one thread of a core can compute 4 instructions simultaneously within the span of one clock cycle.
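
Why place six when only four are executed? A toy simulation in Python may help. It is a sketch under stated assumptions, not the actual design: the queue capacity of 18 entries is a number picked for illustration, and every instruction is treated as ready to execute as soon as it is queued.

```python
from collections import deque


def simulate(cycles, fill_per_cycle=6, drain_per_cycle=4, capacity=18):
    """Count how many instructions the execution side gets in each cycle.

    capacity is a hypothetical queue size chosen for this illustration.
    """
    queue = deque()
    next_id = 0
    executed_per_cycle = []
    for _ in range(cycles):
        # Front end: place up to fill_per_cycle instructions, if there is room.
        room = capacity - len(queue)
        for _ in range(min(fill_per_cycle, room)):
            queue.append(next_id)
            next_id += 1
        # Back end: take up to drain_per_cycle instructions for execution.
        taken = min(drain_per_cycle, len(queue))
        for _ in range(taken):
            queue.popleft()
        executed_per_cycle.append(taken)
    return executed_per_cycle


print(simulate(10))                    # filling 6 per cycle
print(simulate(10, fill_per_cycle=3))  # a slower front end, for comparison
```

With six placed per cycle the execution side gets its four instructions every cycle and the queue builds up a small reserve; with only three placed per cycle it can never execute more than three.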

 

What is one clock cycle in a 3.2 GHz i7? It is one 3.2-billionth of a second, or about 0.31 nanoseconds. Hence, at 4 instructions per cycle, one core may complete more than 10 billion IA-32 instructions in a second. With 4 cores, the possibilities are endless. Moreover, with Hyper-Threading in action, you may have the equivalent of as many as 8 logical cores at work. Yesterday's mainframe computer becomes today's desktop with the i7. Did you realize that we are talking about the circuit and logic design of a microprocessor? See how easy it is.
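
The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope calculation that assumes the peak of 4 instructions per cycle is sustained for a whole second, which real programs rarely manage.

```python
CLOCK_HZ = 3_200_000_000      # 3.2 GHz
INSTRUCTIONS_PER_CYCLE = 4    # peak per core, as described above
CORES = 4

# Length of one clock cycle, in nanoseconds.
cycle_ns = 1e9 / CLOCK_HZ

# Peak instructions per second for one core and for the whole chip.
per_core = CLOCK_HZ * INSTRUCTIONS_PER_CYCLE
whole_chip = per_core * CORES

print(f"one clock cycle: {cycle_ns} ns")
print(f"peak per core:   {per_core / 1e9} billion instructions per second")
print(f"peak, 4 cores:   {whole_chip / 1e9} billion instructions per second")
```

One cycle comes to 0.3125 nanoseconds, and the peak works out to 12.8 billion instructions per second for one core, which is where "more than 10 billion" comes from.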

 

Hardware and Software Synergy:

Let me take a stab at how application software works with the operating system and gets the job done by the hardware. Although I do not claim to be an expert on software or operating systems, here is how I see the synergy among them. Let us go from the lowest level of operation to the user level. We just learned how a lowest-level instruction works in a core. With the help of combinational logic and sequential control by the clock, the execution unit can perform all the operations and instructions of IA-32.

 

Let us try to put things into perspective. The 8-bit opcode 00000101 that we learned today asks the machine to add the adjacent 32 bits of data to the data in the register called EAX. You may think of the register EAX as a small memory storage very close to the execution unit. So we are adding two pieces of data. The output of this operation will be stored in the same register, EAX, erasing the old data after the operation is done. In everyday use, this result can be a fraction of digitized sound bits, a pixel of an image or an intermediate stage of a mathematical computation. Then there should be another opcode to take the result to the proper destination or IO. Hold that thought right there.
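
In Python, the effect of this one instruction on EAX can be sketched as below. It is a simplified model: the function name is made up for this example, the input values are arbitrary, and the status flags that the real instruction also updates are ignored.

```python
MASK32 = 0xFFFFFFFF  # EAX holds 32 bits, so results wrap around at 2**32


def add_eax_imm32(eax, imm32):
    """Model of opcode 05: add a 32-bit immediate to EAX.

    Returns the new EAX value; the old value is overwritten.
    Flags (carry, zero, and so on) are left out of this sketch.
    """
    return (eax + imm32) & MASK32


eax = 0x00000010                     # arbitrary starting value
eax = add_eax_imm32(eax, 0x00000005)
print(hex(eax))                      # the sum replaces the old content

eax = add_eax_imm32(0xFFFFFFFF, 1)   # too big for 32 bits: wraps around
print(hex(eax))
```

The second call shows why the register width matters: a sum that does not fit in 32 bits wraps around to 0 instead of growing.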

 

If you do not know how to read and understand English, then everything I have written so far is gibberish to you. Similarly, if the application software does not understand all the 0s and 1s that the core calculated and delivered, then they are of no use. So they need translation into an understandable language.

 

What happens when you hit a key on your keyboard:

You hit a key on the keyboard of your i7 desktop. The keyboard's driver program translates that keystroke into a form of instruction codes. The keyboard is considered an IO (input/output) device. A sequence of tasks follows. The driver program may be written in the C++ programming language. In the high-level C++ code there may be a lot of really easy-to-understand code. This code calls library modules, and those library modules may in fact be written in IA-32 opcodes. The compiler takes the high-level code of the keyboard driver and translates it into machine-level code of 0s and 1s. With the help of the operating system's resources and the motherboard, these code streams go through the bus and enter the processor. The processor generates the output and returns it to another IO device, which is your computer's monitor. Now you see the letter you typed.

 

How music or video plays on a computer:

The same thing happens when you play music on your desktop. The software application reads the song's data with the help of the OS. The application's compiled code reaches the processor as a stream of opcodes, and the processor continuously loads the next stream of digitized bits, processes it and sends it to another IO device, in this case the speakers. Now think about a digital photo or movie. The processor sends the processed output to the display. The demand for processing capacity increases dramatically from a keystroke to a song to a movie. The next toughest job for a processor is interactive extreme gaming. The processor has to do zillions of IO operations involving multiple drivers, sound, video, the console and mathematical calculations. Taking inputs from multiple IO devices and processing them instantly to deliver to multiple IO devices requires a lot of processing power. The synergy of execution should be smooth enough to deliver the ultimate user experience. Do you want a processor that delivers just that?

 

The i7 offers up to 40% faster performance for immersive 3D games, video processing, music and other demanding applications. Through a sophisticated on-die power control unit and new "power gate" transistors based on Intel's advanced 45 nanometer, high-k metal gate manufacturing process, Turbo Boost automatically adjusts the clock speed of one or more of the four individual processing cores for single- and multi-threaded applications to boost performance, without increasing power consumption.

 

Now you might wonder what Intel will do next. Is there really anything more to it? Wait for the next product.

 

Disclaimer: Opinion expressed is solely the author's. Author does not speak for or represent Intel Corporation.

 

About the Author:

 

Sohel Abdullah is a BUET graduate and Purdue University alumnus. He believes technology has the potential to change the living standard of every human being if it is implemented right and conducted on the basis of a transparent policy.

 

More information and essays:

http://abuabdullah-sohel.blogspot.com/

 
