Processor performance depends on the type and frequency of the processor bus, the arithmetic logic unit, and the control unit.


The microprocessor, or CPU (Central Processing Unit), performs all basic calculations and data processing. All PC-compatible computers use processors compatible with the Intel x86 architecture, produced and designed both by Intel itself and by third-party companies such as AMD, Cyrix, IDT, and Rise Technologies.

Standard processor specifications:

    architecture

    bit depth

    type and number of cores

    cache memory

    processor bus type and frequency

    processor speed

architecture

Most modern personal computer processors are based on some version of the cyclic, sequential information-processing scheme invented by John von Neumann.

The system detects and sets the correct voltage automatically by reading certain pins on the processor. Some enthusiasts bridge such pins themselves, for example with the special copper paint sold in small bottles at any auto parts store for repairing a car window's anti-theft mesh. The real problem is that the pins are very small, and if you connect adjacent rather than the intended pins you can render the chip non-functional; if you are not careful, you can easily ruin a processor worth several hundred dollars. Some experimenters have also found that by slightly raising or lowering the voltage from the standard value, higher overclocking speeds can be reached while keeping the system stable. My recommendation is to be careful when playing with voltages, because you can damage the chip this way. Even without changing the voltage, motherboards with adjustable bus speeds make overclocking simple and quite useful. See Chapter 19, "Power Supplies," for more information about upgrading power supplies and chassis.

As CPU core speeds increased, memory speed could not keep up. How can you run the processor faster than the memory you load it from without a major performance hit? Simply put, cache memory is a high-speed memory buffer that temporarily stores the data the processor needs, allowing the processor to retrieve it faster than if it came from main memory. But a cache has one more feature that a simple buffer does not: intelligence.

From a programmer's point of view, the processor architecture is its ability to execute a certain set of machine instructions. Most modern desktop CPUs belong to the x86 family, that is, Intel-compatible processors of the IA-32 architecture (32-bit Intel processor architecture). Its foundation was laid by Intel in the i80386 processor, but in subsequent processor generations it was supplemented and extended both by Intel itself (the new MMX, SSE, SSE2 and SSE3 instruction sets) and by third-party manufacturers (the EMMX, 3DNow! and Extended 3DNow! instruction sets developed by AMD). From the point of view of computer hardware developers, the term "processor architecture" has a slightly different meaning: it reflects the basic principles of the internal organization of specific processor families capable of executing certain instruction sets (SSE, SSE2, SSE3, 3DNow!, Enhanced 3DNow!, and so on). The main common architectures are CISC, RISC, NetBurst, K7, K8, MultiRISC and their subsequent modifications.
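As an aside, here is a minimal sketch of how a program can ask the processor which of these instruction-set extensions it actually supports. It relies on the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports(), which query CPUID on x86; these builtins are a compiler-specific assumption, not standard C.

```c
/* A minimal sketch (GCC/Clang on x86 only): asking the processor, via the
 * compiler builtins around CPUID, which instruction-set extensions it reports.
 * __builtin_cpu_supports() is compiler-specific, not standard C. */
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();   /* populate the CPU feature flags */

    printf("MMX:  %s\n", __builtin_cpu_supports("mmx")  ? "yes" : "no");
    printf("SSE:  %s\n", __builtin_cpu_supports("sse")  ? "yes" : "no");
    printf("SSE2: %s\n", __builtin_cpu_supports("sse2") ? "yes" : "no");
    printf("SSE3: %s\n", __builtin_cpu_supports("sse3") ? "yes" : "no");
    return 0;
}
```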

The cache is a buffer with a brain. A plain buffer holds arbitrary data, usually on a first-in, first-out or first-in, last-out basis. The cache, on the other hand, holds the data the processor is likely to need before it is actually needed. This allows the processor to continue running at or near full speed without waiting for data to arrive from the slower main memory.

These caches and their functions are described in the following sections. To understand the importance of cache, you need to know the relative speeds of processors and memory. This may seem like a rather dated example, but you'll see in a moment that the numbers here make it easy for me to explain how cache memory works.

bit depth

The number of bits of information that the processor can process in one cycle is determined by the width of its internal registers (memory cells inside the processor). Register width is the number of flip-flops, connected in parallel, that make up the register: the more of them there are, the wider the register. Modern processors have a bit depth of 32 or 64 bits, less often 128 (mostly in server parts).
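As a small illustration, the widths of native types and pointers as the compiler sees them roughly track this bit depth on common platforms. This is a sketch only; the exact sizes depend on the platform's data model.

```c
/* A small sketch: the widths of native types as the compiler sees them.
 * On a typical 64-bit build, pointers are 64 bits wide; on a 32-bit build
 * they are 32 bits. Exact sizes depend on the platform's data model. */
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("pointer:   %zu bits\n", sizeof(void *)    * CHAR_BIT);
    printf("int:       %zu bits\n", sizeof(int)       * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    return 0;
}
```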

This cache is basically an area of very fast memory built into the processor, used to store some of the current working set of code and data. Cache memory can be accessed without wait states because it runs at the same speed as the processor core. Using cache memory keeps the processor from waiting for code and data from the slower main memory, which improves performance.

The cache is even more important in modern processors because it is often the only memory in the entire system that can actually keep up with the chip. Most modern processors are clock multiplied, which means they run at a speed that is a multiple of the speed of the motherboard they are plugged into.

type and number of cores

Within the same architecture, different processors can differ quite noticeably from one another. These differences are embodied in a variety of processor cores, each with its own strictly defined set of characteristics: most often different system bus (FSB) frequencies, second-level cache sizes, support for certain new instruction sets, or the process technology used to manufacture the processor. Often a core change within the same processor family also means a change of processor socket, which raises questions about the continued compatibility of motherboards. In the course of refining a core, manufacturers also make minor changes that do not merit a name of their own; these are called core revisions and are usually denoted by alphanumeric combinations. Even so, new revisions of the same core can bring quite noticeable innovations: Intel, for example, added support for the 64-bit EM64T architecture to individual Pentium 4 processors precisely in the course of a revision change.

This story involves a person eating food, who plays the role of the processor requesting and working on data from memory. The kitchen, where the food is prepared, is the main system memory. Say that you start eating at a certain restaurant every day at the same time. You come in, sit down, and order a hot dog. To keep this story proportionally accurate, let's say you normally eat at the rate of one bite every four seconds, and the kitchen takes 60 seconds to produce any item you order.

cache memory

All modern processors have a cache: an array of ultra-fast memory that acts as a buffer between the relatively slow system memory controller and the processor. This buffer stores the blocks of data the CPU is currently working with, which significantly reduces the number of processor accesses to the (by processor standards) extremely slow system memory and thereby noticeably raises overall processor performance.

So, when you first arrive, you sit down, order a hot dog, and have to wait 60 seconds for the food to be produced before you can start eating. After the waiter brings the food, you start eating at the usual rate. You finish your hot dog pretty quickly, so you call the waiter and order a hamburger. Again you wait 60 seconds while the hamburger is being produced. When it arrives, you start eating at full speed again. After you finish your hamburger, you order a plate of french fries.

Again you wait, and after it is delivered 60 seconds later you eat it at full speed. Finally, you decide to finish your meal and order cheesecake for dessert. After another 60-second wait, you eat the cheesecake at full speed. Your overall dining experience consists mostly of long waits interrupted by short bursts of actual eating at full speed.

At the same time, in modern processors the cache is no longer a single memory array, as it once was, but is divided into several levels. The fastest, but relatively small, first-level cache (L1), which the processor core works with directly, is most often split into two halves: an instruction cache and a data cache. The L1 cache interacts with the second-level cache, L2, which is usually much larger and unified, with no division into instruction and data caches. Some processors, most often server models, also have an L3 cache. The L3 cache is usually larger still, though somewhat slower than L2 (because the bus between L2 and L3 is narrower than the bus between L1 and L2), but its speed is in any case incomparably higher than that of system memory.
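The practical effect of this hierarchy is easy to feel with a toy experiment. The sketch below (illustrative only; timings depend on the machine and compiler) sums the same matrix twice: row by row, where consecutive accesses reuse the same cache lines, and column by column, where almost every access misses and has to go further down the hierarchy.

```c
/* A rough illustration of the cache hierarchy at work: summing the same
 * matrix row by row (consecutive addresses, cache-friendly) and column by
 * column (large stride, cache-hostile). The arithmetic is identical; only
 * the memory access pattern differs. Timings vary from machine to machine. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096   /* 4096 x 4096 ints = 64 MiB, larger than any CPU cache */

int main(void)
{
    int *m = malloc((size_t)N * N * sizeof *m);
    if (!m) return 1;
    for (size_t i = 0; i < (size_t)N * N; i++) m[i] = 1;

    long long sum = 0;

    clock_t t0 = clock();
    for (size_t r = 0; r < N; r++)        /* row-major: walks consecutive ints */
        for (size_t c = 0; c < N; c++)
            sum += m[r * N + c];
    double row_s = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    for (size_t c = 0; c < N; c++)        /* column-major: jumps N ints ahead  */
        for (size_t r = 0; r < N; r++)
            sum += m[r * N + c];
    double col_s = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("row-major:    %.2f s\ncolumn-major: %.2f s\n(sum = %lld)\n",
           row_s, col_s, sum);
    free(m);
    return 0;
}
```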

So you walk into the restaurant, order a hot dog, and the waiter immediately puts it on your plate, no waiting! You finish the hot dog, and right as you are about to ask for a hamburger, the waiter puts one on your plate. The rest of the meal continues in the same way: you eat every item at your full speed and never have to wait for the kitchen to prepare the food. Your overall dining experience this time consists of eating without waiting for food to be prepared, thanks to the intelligence and attentiveness of the waiter.

Without the waiter, the space on the table is just a food buffer. When it is stocked, you can eat until the buffer is empty, but nobody refills it intelligently. The waiter is the cache controller: he takes action and adds the intelligence to decide which dishes should be placed on the table before you need them. Like a real cache controller, he uses his skill to, in effect, guess which food you will want next, and when he guesses correctly, you never have to wait.

There are two cache organizations: exclusive and non-exclusive. In the first case, the information held at the different cache levels is strictly demarcated, and each level contains only unique data; with a non-exclusive cache, information can be duplicated at all caching levels. It is hard to say which of the two schemes is more correct; each has both minuses and pluses. The exclusive caching scheme is used in AMD processors, the non-exclusive one in Intel processors.
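One practical consequence concerns effective capacity, shown in the toy calculation below. The cache sizes are hypothetical, chosen purely for illustration rather than taken from any particular processor.

```c
/* A toy sketch of the capacity difference. In an exclusive scheme the levels
 * hold distinct data, so the capacities add up; in a non-exclusive scheme L2
 * also holds copies of what is in L1, so the distinct total is just L2.
 * The sizes below are hypothetical, purely for illustration. */
#include <stdio.h>

int main(void)
{
    int l1_kb = 64, l2_kb = 512;

    printf("exclusive (AMD-style):       %d KB of distinct cached data\n", l1_kb + l2_kb);
    printf("non-exclusive (Intel-style): %d KB of distinct cached data\n", l2_kb);
    return 0;
}
```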

Now let's say that on the fourth night you arrive a little early, or order something other than your usual hot dog. The waiter, feeling confident by now, has already put a hot dog on your plate when you arrive, but this time he guessed wrong, and you have to wait the full 60 seconds while the kitchen prepares what you actually ordered.

This is called a cache miss: the cache controller filled the cache with data other than what the processor actually needed. About 10% of the time the cache controller guesses wrong and the data has to be fetched from the much slower main memory, which means the processor has to wait. In our analogy, the processor was 14 times faster than main memory; the cache is what makes up the difference.
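To see what that hit rate is worth, here is a back-of-the-envelope sketch. The latencies are assumptions chosen to match the analogy above (1 cycle for a hit, 14 cycles for a trip to main memory), not measured values.

```c
/* A back-of-the-envelope sketch of what a 90% hit rate buys. The latencies
 * are assumptions chosen to match the analogy in the text: 1 cycle for data
 * already in the cache, 14 cycles when we must wait for main memory. */
#include <stdio.h>

int main(void)
{
    double hit_rate  = 0.90;   /* controller guesses right 90% of the time */
    double hit_cost  = 1.0;    /* cycles for a cache hit                   */
    double miss_cost = 14.0;   /* cycles for a trip to main memory         */

    double avg = hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost;
    printf("average access cost: %.1f cycles (vs %.0f cycles with no cache)\n",
           avg, miss_cost);
    return 0;
}
```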

processor bus type and frequency

The processor (or system) bus, most often called the FSB (Front Side Bus), is a collection of signal lines grouped by purpose (data, address, control) with defined electrical characteristics and data transfer protocols. The FSB thus acts as the backbone between the processor (or processors) and all other devices in the computer: memory, video card, hard drive, and so on. Only the CPU is connected directly to the system bus; other devices reach it through special controllers, concentrated mainly in the north bridge of the motherboard's chipset (system logic set). There are exceptions: in AMD K8 processors, for example, the memory controller is integrated directly into the processor, giving a much more efficient memory-to-CPU interface than Intel's solutions, which remained faithful to the classic organization of the processor's external interface. The main FSB parameters of some processors are given in Table 1:

For an analogy describing these newer chips, imagine that the waiter simply places a food cart right next to your table. If the item you need is not already on the table but is on that cart, the waiter returns with it in just 15 seconds; if it is on neither the table nor the first cart, the waiter can reach over to a second cart for it. This raises other interesting points. Given that main memory is used directly only about 1% of the time, even if you doubled its performance, you would speed up the system only 1% of the time!

Table 1

CPU                 | FSB frequency, MHz | FSB type       | Theoretical FSB throughput, MB/s
Intel Pentium III   | 100/133            | AGTL+          | 800/1066
Intel Pentium 4     | 100/133/200        | QPB            | 3200/4266/6400
Intel Pentium D     | 133/200            | QPB            | 4266/6400
Intel Pentium 4 EE  | 200/266            | QPB            | 6400/8533
Intel Core          | 133/166            | QPB            | 4266/5333
Intel Core 2        | 200/266            | QPB            | 6400/8533
AMD Athlon          | 100/133            | EV6            | 1600/2133
AMD Athlon XP       | 133/166/200        | EV6            | 2133/2666/3200
AMD Sempron         | 800                | HyperTransport | 6400
AMD Athlon 64       | 800/1000           | HyperTransport | 6400/8000

Intel processors use the QPB (Quad Pumped Bus) system bus, which transfers data four times per clock, while the EV6 system bus of the AMD Athlon and Athlon XP transfers data twice per clock (Double Data Rate). In the AMD64 architecture used in the Athlon 64/FX/Opteron processors, a new approach to the CPU interface is taken: instead of an FSB, the high-speed serial (packet-based) HyperTransport bus, built on a point-to-point (Peer-to-Peer) scheme, is used both as the processor bus and for communication between processors, providing very fast data exchange.
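The "theoretical throughput" column in Table 1 follows directly from these transfer rates: bandwidth is the base clock multiplied by the number of transfers per clock and by the bus width (8 bytes for a 64-bit data bus). A small sketch that reproduces a few of the table's figures:

```c
/* A small sketch reproducing the "theoretical throughput" column of Table 1:
 * bandwidth = base clock (MHz) x transfers per clock x bus width (8 bytes
 * for a 64-bit data bus). */
#include <stdio.h>

static double fsb_mb_per_s(double clock_mhz, int transfers_per_clock)
{
    return clock_mhz * transfers_per_clock * 8.0;
}

int main(void)
{
    printf("Pentium 4, QPB, 100 MHz:  %.0f MB/s\n", fsb_mb_per_s(100.0, 4));
    printf("Pentium 4, QPB, 200 MHz:  %.0f MB/s\n", fsb_mb_per_s(200.0, 4));
    printf("Athlon XP, EV6, 200 MHz:  %.0f MB/s\n", fsb_mb_per_s(200.0, 2));
    return 0;
}
```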

One issue was the speed of the third-party cache chips that were available. Now you take a bite every half second. The real jump in speed comes when you need something that is not already on the table: in that case the waiter walks over to the cart and, nine times out of ten, finds the food you want in just over a quarter of a second. If only the restaurant's productivity had grown at the same rate as CPU performance! Remember that the cache stores copies of data from various addresses in main memory.

Since the cache cannot hold copies of the data from every main memory address at once, there must be a way of knowing which addresses are currently copied into the cache, so that when data from those addresses is needed it can be read from the cache rather than from main memory. Each cache line has a corresponding address tag, which stores the main memory address of the data currently copied into that particular line. If data from a particular main memory address is needed, the cache controller can quickly search the address tags to see whether the requested address is currently in the cache or not.

processor speed

Processor performance is characterized by its clock frequency, usually measured in megahertz (MHz). The frequency is set by a quartz resonator: a quartz crystal enclosed in a small tin case. Under an applied voltage, an alternating electric current oscillates in the crystal at a frequency determined by the crystal's shape and size; the frequency of this alternating current is called the clock frequency. The chips of an ordinary computer operate at frequencies of many millions of hertz (a hertz is one cycle per second), so speed is measured in megahertz, that is, in millions of cycles per second. The smallest unit of time (quantum) for a processor is the clock period, or simply a tick; every operation takes at least one cycle. For example, a memory access on a Pentium II processor takes three cycles plus a few wait cycles.
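The clock period is simply the reciprocal of the clock frequency, so a faster clock means a shorter tick. A tiny sketch of that relationship:

```c
/* A tiny sketch of the relationship between clock frequency and clock period:
 * period = 1 / frequency, so a faster clock means a shorter tick. */
#include <stdio.h>

int main(void)
{
    double freqs_mhz[] = { 100.0, 500.0, 1000.0, 3000.0 };

    for (int i = 0; i < 4; i++) {
        double period_ns = 1000.0 / freqs_mhz[i];   /* 1/MHz is microseconds; x1000 gives ns */
        printf("%6.0f MHz  ->  clock period %.2f ns\n", freqs_mhz[i], period_ns);
    }
    return 0;
}
```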

If the data is there, it can be read from the faster cache; if not, it has to be read from the much slower main memory. The different ways of organizing (mapping) the tags affect how the cache works. A cache can be mapped as fully associative, direct-mapped, or set associative.

If the requested main memory address is found in a tag, the contents of the corresponding cache location are returned. If the requested address is not found among the address tags, a cache miss occurs and the data has to be fetched from main memory instead of the cache.

It should also be noted that all modern processors support superscalar execution, a very important technique that allows independent instruction streams to be executed in parallel and, where possible, reordered to improve performance.

Historically, the desktop processor market has been dominated by two companies: Intel and AMD.

In a direct-mapped cache, each main memory address is assigned in advance to a specific location in the cache where it will be stored. This also makes lookups fast, since only one tag address needs to be checked for a given memory address.

A set-associative cache is a modified direct-mapped cache. A direct-mapped cache has only one set of memory associations, meaning a given memory address can map to only one specific cache line location. A two-way set-associative cache has two sets, so a given memory location can be in either of two places. A four-way set-associative cache can store a given memory address in any of four different cache line locations.

Both companies began mass production of dual-core processors in 2005. By that time, classic single-core CPUs had almost completely exhausted their reserves of performance growth through higher operating frequencies. The stumbling block was not only the excessive heat dissipation of processors running at high frequencies but also problems with their stability. So the extensive path of processor development was closed off for the years ahead, and manufacturers, like it or not, had to master a new, intensive way of increasing product performance. There are currently two leading desktop processor architectures: Intel Core and AMD64 (K8).

Increasing the associativity raises the probability of finding a value in the cache; however, lookups take a little longer because more tag addresses must be checked when looking up a specific location. As the number of ways, or sets, increases, the cache approaches fully associative: a situation in which any memory address can be stored in any cache line location. In general, a direct-mapped cache is the fastest at finding and retrieving data from the cache, because it only has to look at one specific tag address for a given memory address.
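To make the mechanics concrete, here is a sketch of how a controller splits an address into tag, set index, and byte offset, assuming a hypothetical 32 KB four-way set-associative cache with 64-byte lines (32 KB / 64 B = 512 lines, 512 / 4 ways = 128 sets). The geometry and the example address are assumptions for illustration only.

```c
/* A sketch of how a cache controller might split a 32-bit address, assuming
 * a hypothetical 32 KB, 4-way set-associative cache with 64-byte lines:
 * 32 KB / 64 B = 512 lines, 512 / 4 ways = 128 sets, so 6 offset bits and
 * 7 set-index bits. The example address is arbitrary. */
#include <stdio.h>
#include <stdint.h>

#define LINE_BITS 6    /* 64-byte cache line */
#define SET_BITS  7    /* 128 sets           */

int main(void)
{
    uint32_t addr = 0x0040A2C4;

    uint32_t offset = addr & ((1u << LINE_BITS) - 1);
    uint32_t set    = (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
    uint32_t tag    = addr >> (LINE_BITS + SET_BITS);

    /* The controller looks only in set 'set' and compares 'tag' against the
     * 4 tags stored there: a match is a hit, no match is a miss. */
    printf("address 0x%08X -> tag 0x%X, set %u, offset %u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)set, (unsigned)offset);
    return 0;
}
```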

Before arguing about which processor is better, you first need to look at processor architecture in some detail: how it works and how that affects performance.

This will come in handy when buying a new processor, because it will help you answer questions like these: Why do you need a PC? To play games that, the more demanding they are on resources, the cooler they make their owner look? For typing, occasionally making presentations and spreadsheet charts, or working with graphics? For video and audio work (which is not the same as listening to music and watching movies)? Speaking of movies, do you want to watch Blu-ray discs? Or maybe you want a PC for some other purpose, if only to show off what a cool PC you have: "I have an LG PC, it says so right on the monitor!"
By the way, you can slightly speed up your computer without an upgrade by disabling the motherboard's built-in controllers that you do not use.
This can shorten boot times. For example, additional SATA controllers use their own BIOS, and users with only a few drives are unlikely to need an extra controller at all.
Disabling it saves the time spent initializing the controller BIOS and checking for connected drives, and it also gets rid of "drive not found" messages.
So, in this article I will tell you what main elements the processor consists of.

Processor architecture

A simple example: why were Athlons once the top sellers? Simply because AMD introduced HyperTransport technology (the speed advantage of 12.8 GB/s came from a shorter pipeline, the abandonment of the FSB, and an integrated memory controller). Intel responded with Hyper-Threading technology and higher clock frequencies.
Hyper-Threading (HT) is a technology whose idea is simple: one physical processor appears to the operating system as two logical processors, and the operating system sees no difference between one HT processor and two conventional processors.
In both cases, the operating system schedules threads as if it were a dual-processor system; everything else is resolved at the hardware level.
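A quick way to see this from software is to ask the operating system how many logical processors it can schedule on. The sketch below uses sysconf() with _SC_NPROCESSORS_ONLN, a widely available glibc/Linux extension rather than strict standard C; on an HT machine the reported count is typically twice the number of physical cores.

```c
/* A small sketch (Linux/glibc): asking the OS how many logical processors it
 * can schedule on. On a Hyper-Threading CPU this count is typically twice
 * the number of physical cores, since each core shows up as two logical CPUs.
 * _SC_NPROCESSORS_ONLN is a common extension, not guaranteed by strict POSIX. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN);

    if (logical < 1) {
        perror("sysconf");
        return 1;
    }
    printf("logical processors visible to the OS: %ld\n", logical);
    return 0;
}
```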
I will explain in more detail. Single-core processor models include the following cooperating units:

  1. Control unit. It coordinates the work of all the other units, performs control functions, and manages the computer's calculations.
  2. Arithmetic logic unit (ALU). This is the unit for integer operations. Arithmetic operations such as addition, multiplication, and division, as well as logical operations (OR, AND, ASL, ROL, and so on), are handled by the ALU. A processor can have several ALUs, each capable of performing arithmetic or logical operations independently of the others, which allows several operations to be carried out simultaneously.
  3. AGU (Address Generation Unit). This unit is no less important than the ALU, because it is responsible for correct addressing when loading or storing data.
  4. Math coprocessor (FPU). A processor may contain several math coprocessors, each capable of performing at least one floating-point operation regardless of what the ALUs are doing.

Pipelined data processing allows a single math coprocessor to perform several operations at the same time. The coprocessor supports high-precision calculations, both integer and floating point, and in addition contains a set of useful constants that speed up calculations.
The coprocessor works in parallel with the central processor, thus providing high performance. Incidentally, at the dawn of personal computers the FPU was a separate chip on the board.
The instruction (command) decoder parses instructions to extract the operands and the addresses where results are to be placed. It then tells the other, independent units what has to be done to execute the instruction. The decoder allows several instructions to be executed at once so that all the execution units stay loaded.
Cache. Unlike RAM, this memory is located inside the CPU and has a very high access speed; it is designed to speed up access to data held in the slower RAM.
Whereas accessing ordinary computer memory takes no more than 1.25 ns (for top-end DDR3 memory the access time is 0.41 ns), the cache operates at the processor frequency. Next time I will talk about how cache size affects processor performance. The processor also contains ultra-fast static memory (SRAM) known as the processor registers, intended primarily for storing intermediate results of calculations or data needed for the processor's operation. Register access time is about 0.3 ns.
Buses (groups of conductors). The processor is connected to other devices, above all to main memory, by buses.
So the difference between a single-core processor and a processor with HT technology is that in the HT architecture the ALU and FPU come not as one group but as two; that is, the processor structure contains not 7 groups of units but 9.
The additional ALU and FPU groups work in parallel with their "twin" ALUs and FPUs rather than sequentially, processing two operations in one cycle instead of queuing them; in effect, a second core is virtualized.
When the operations are similar, processor speed drops; when the programs are varied, or adapted to this technology, it rises.
Still, you do not get a full-fledged core: performance grows by roughly 25% or falls by about 10%, because this technology merely imitates a multi-core processor. It is useful, for example, when processing 3D graphics.
In the new Nehalem architecture (Intel Core i3 to i7), Intel engineers tried to eliminate all the weaknesses of Hyper-Threading, and the end result was called Simultaneous MultiThreading (SMT). One feature of this technology is that cores are divided dynamically into real and virtual ones, which allows them to be used more efficiently.
It should also be kept in mind that Intel aggressively markets its new processors not only against AMD but also against its own previous models. Once I leafed through a booklet of new Intel products. It had plenty of diagrams of chipset and south bridge capabilities, and information about new features and processor architecture…
On the very first pages and on the cover of the booklet there are bright histograms with headings like "Performance difference between processors of the old and new architecture*". At the bottom of the last page, in small print, it says: "* the new architecture gives a performance increase of as much as 10%".
I start looking closely at the graphs, and sure enough, the columns showing the advantages of the new processors have been stretched! I will not even mention the colors chosen for the graphs of the "new" and "old" processor models.
Okay, back to the brochure. Suppose you decide to buy a new computer or to upgrade, and you head to a computer store. Suppose, too, that you are not a sophisticated buyer and simply ask the salesperson a question.
He will then start talking about the new processors, showing you the brochure and working out how much money he can get out of you. Tell me, please, who is going to explain the fine points of those graphs to you? Seeing them, you will immediately want to buy one of the advertised processors; you will not even read past the pictures.
Why read, when everything is explained for you while you are being nudged toward more expensive components and shown something very much like what I described in the booklet? It turns out just like in the joke: he went out for fish hooks and came back in an all-terrain vehicle with a trailer carrying a boat and a pile of fishing tackle.
Intel does this because it has moved its new processor models to a new socket, so when you buy a processor and a motherboard for it, you also have to buy DDR3 memory. You may also want support for new interfaces in the future; besides, Intel has already abandoned the PCI line.
But back to the technology war between AMD and Intel. Later, in response, Intel reworked its FSB into the Quad-Pumped Bus (QPB), which can transmit four data blocks and two addresses per clock. That is, on each bus clock either a command or four pieces of data can be transmitted.
AMD was the first to integrate a memory controller into its processors, back in 2003, although the controller clearly works better in Intel's implementation. At nominal settings, however, the Core i5 only works with memory up to 1333 MHz, against 1600 MHz for AMD platforms. Intel followed the idea in 2008. With the Socket 1156 platform, Intel went a step further, moving the main graphics controller (PCI Express 2.0) into the processor, and then released dual-core Socket 1156 processors with a built-in graphics engine, something previously found in "budget" north bridges; here, too, Intel was first.

CPU temperature

Based on my experience, I can say that I have never seen a "cooler" processor than a VIA. The argument is simple: the office PC in question runs nothing but Windows XP, Office 2003, and an antivirus, and even in hot weather it has never given any trouble, unlike two PCs with Intel processors of the same frequency. Given a load of "paperwork", that is a strong argument. I should add that the PC with the VIA processor runs with its side panels in place, while the others have their side panels removed and are still "hot".
But back to temperatures on AMD and Intel processors. It has long been known that AMD processors run hotter than Intel ones, but cooling them is not that big a problem.
It is enough to add at least two extra case fans and connect them to the motherboard, or to buy power adapters from IDE connectors, or a Multi Fan Power Port for six fans. Some fan manufacturers supply power adapters with their fans.
You will have a headache if the CPU temperature exceeds 55-56 °C at the moment you load the PC with "heavy" programs, especially if the case is poorly ventilated.
I have already seen such a "miracle of technology" several times: a compact system unit in which the power supply sat right above the processor fan, covering half of it, while on the other side, above the motherboard, were the DVD-RW and the hard drive.
At the same time, there was a duct on the side panel meant to feed cold air directly to the processor. I ended up attaching a fan to the side panel and then fitting the "proprietary" duct tunnel to it so that, in theory, it would deliver cool air straight to the processor. Before that, the duct was of almost no use.
One friend solved it differently: he simply went at it with a file.
So, before buying a PC, think about the size, layout, and quality of the system unit; you will not be replacing it any time soon.
Also bear in mind that chipsets with additional controllers, together with a video card and hard drives, raise the temperature inside the case quite noticeably.
I confess that when I bought a PC for myself, I spent almost twice as much on the system unit as on the motherboard together with a 2.8 GHz Celeron processor.

Video viewing

The presence of the abbreviation Vivo (Video Input Video Output) from nVidia, or Avivo from ATI, in a video card's specification means that when playing a movie in MPEG-2, MPEG-4, H.264, VC-1, WMV9 or DVD format, the GPU takes over part of the work of decoding the stream.
This generally reduces the load on the central processor and, accordingly, increases overall system performance. Note that decoding is not done by the video chip alone; part of the work still falls on the shoulders of the central processor.
The problem was relevant for owners of computers based on processors with clock frequencies up to 500 MHz: their power was not enough for smooth video playback, and films often stuttered. Modern processors with clock frequencies of 1.7 GHz and above are not at risk, even without Vivo.
The video card affects the quality of the video being played only very indirectly. The amount of video memory has no effect at all on playback performance (what matters are the frequencies of the video card's GPU and memory modules, as well as the shaders), so on a 19-inch display there is no difference between 512 MB and 1024 MB of video memory.
Most modern video cards have a video output, and some have a video input and, accordingly, analog-to-digital and digital-to-analog converters.
If you want to watch Blu-ray and HD DVD video, the minimum system requirements are:

  • ArcSoft TotalMedia Theater or Media Player Classic - Home Cinema;
  • Microsoft Windows XP SP2 operating system;
  • a processor of at least 3.2 GHz (a 2.8 GHz chip with 1 MB of cache already "stutters"), and even that cannot handle everything, so a simple dual-core is better;
  • 120 MB of free hard disk space;
  • 512 MB of RAM, preferably 1 GB;
  • a Blu-ray or HD DVD drive (unless the movie has been downloaded from the Internet);
  • 256 MB of video memory (128 MB minimum) or more.

To capture high-quality video with a tuner you need a reasonably powerful system: a processor of at least 1 GHz, 256 MB of RAM, and a capacious hard drive.

Alexander Romanov. Based on material from Computer magazine.


