Dynamic range: compressed or natural? Mastering in reverse: is it possible to increase the dynamic range of compressed recordings?


The sound level is almost the same throughout the composition; there are only a few pauses.

Narrowing the dynamic range

Narrowing the dynamic range, or, more simply, compression, serves several purposes. The most common of them are:

1) Achieving a uniform volume level throughout a composition (or an individual instrument part).

2) Achieving a uniform volume level across the compositions of an album or a radio broadcast.

3) Increasing intelligibility, mainly when compressing a particular part (vocals, bass drum).

How does the narrowing of the dynamic range happen?

The compressor analyzes the input audio level by comparing it to a user-defined Threshold value.

If the signal level is below the Threshold value, the compressor keeps analyzing the sound without changing it. If the sound level exceeds the Threshold value, the compressor starts acting. Since the role of the compressor is to narrow the dynamic range, it is logical to assume that it limits the largest and smallest amplitude values (signal levels). At the first stage the largest values are limited: they are reduced by a certain amount, which is set by the Ratio parameter. Let's look at an example:

The green curves show the sound level; the greater the amplitude of their oscillations from the X axis, the higher the signal level.

The yellow line is the threshold (Threshold) above which the compressor operates. By raising the Threshold value, the user moves it away from the X axis; by lowering it, the user brings it closer to the X axis. Clearly, the lower the threshold, the more often the compressor will act, and the higher it is, the less often. If the Ratio value is very high, then once the signal level reaches the Threshold, the entire subsequent signal will be suppressed almost to silence. If the Ratio value is very small, nothing audible will happen. The choice of Threshold and Ratio values will be discussed later. For now we should ask ourselves: what is the point of suppressing all subsequent sound? Indeed, there is none; we only need to get rid of the amplitude values (peaks) that exceed the Threshold (marked in red on the graph). To solve exactly this problem there is the Release parameter, which sets how long the compression lasts.
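To make the roles of Threshold and Ratio concrete, here is a minimal sketch (Python, with made-up example numbers) of the usual static rule: everything below the threshold passes unchanged, and only the excess above it is divided by the Ratio:

```python
def compressed_level(input_db: float, threshold_db: float, ratio: float) -> float:
    """Static compressor curve: levels below the threshold pass unchanged,
    levels above it have their excess divided by the Ratio."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

# A peak at -3 dB with a threshold of -12 dB and a 4:1 ratio
# comes out at -12 + (-3 - (-12)) / 4 = -9.75 dB.
print(compressed_level(-3.0, threshold_db=-12.0, ratio=4.0))   # -9.75
print(compressed_level(-20.0, threshold_db=-12.0, ratio=4.0))  # -20.0 (below threshold)
```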

The example shows that the first and second Threshold overshoots last less than the third one. So, if the Release parameter is tuned for the first two peaks, then when processing the third peak an unprocessed tail may remain (since that Threshold overshoot lasts longer). If the Release parameter is tuned for the third peak, then after processing the first and second peaks an undesirable dip in the signal level is left behind them.

The same goes for the Ratio parameter. If Ratio is tuned for the first two peaks, the third will not be suppressed enough. If Ratio is tuned for the third peak, the first two peaks will be processed too heavily.

These problems can be solved in two ways:

1) By setting the attack parameter (Attack) - a partial solution.

2) By using dynamic compression - a complete solution.

The Attack parameter sets the time after which the compressor starts working once the Threshold has been exceeded. If the parameter is close to zero (it is set exactly to zero in parallel compression, see the corresponding article), the compressor starts suppressing the signal immediately and keeps doing so for the time set by the Release parameter. If the attack time is long, the compressor starts acting only after a certain delay (this is used to preserve clarity). In our case, you can set the Threshold, Release and Ratio parameters to process the first two peaks and set the Attack value close to zero. The compressor will then suppress the first two peaks, and when processing the third it will keep suppressing it for as long as the Threshold is exceeded. However, this does not guarantee high-quality processing and comes close to limiting (a rough cut of all amplitude values above a set level; in this case the compressor is called a limiter).

Let's look at the result of sound processing by the compressor:

The peaks have disappeared. Note that the processing settings were quite gentle and we suppressed only the most protruding amplitude values. In practice the dynamic range is narrowed much more, and this trend keeps progressing. Many composers think they are making their music louder, but in practice they completely deprive it of dynamics for listeners who will most likely hear it at home rather than on the radio.

It remains to consider the last compression parameter, Gain. Gain (make-up gain) is intended to raise the amplitude of the entire composition and is, in effect, equivalent to another sound-editor tool, normalize. Let's look at the end result:
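Putting the parameters together, here is a deliberately simplified feed-forward compressor sketch (Python with NumPy; a real compressor's level detector and smoothing are more sophisticated, but the roles of Threshold, Ratio, Attack, Release and make-up Gain are the same):

```python
import numpy as np

def compress(x, sr, threshold_db=-12.0, ratio=4.0,
             attack_ms=5.0, release_ms=150.0, makeup_db=3.0):
    """Very simplified peak compressor: level detection in dB,
    attack/release smoothing of the gain, then make-up gain."""
    eps = 1e-9
    level_db = 20.0 * np.log10(np.abs(x) + eps)

    # Static curve: how much the part of the signal above the threshold is reduced.
    over = np.maximum(level_db - threshold_db, 0.0)
    target_gain_db = -over * (1.0 - 1.0 / ratio)

    # Attack/release smoothing: gain reduction sets in quickly (attack)
    # and recovers slowly (release).
    att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    gain_db = np.zeros_like(x)
    g = 0.0
    for n, t in enumerate(target_gain_db):
        coeff = att if t < g else rel      # more reduction requested -> attack phase
        g = coeff * g + (1.0 - coeff) * t
        gain_db[n] = g

    return x * 10.0 ** ((gain_db + makeup_db) / 20.0)

# Example: a 440 Hz tone with a loud burst in the middle that crosses the threshold.
sr = 44100
t = np.arange(sr) / sr
x = 0.2 * np.sin(2 * np.pi * 440 * t)
x[sr // 2: sr // 2 + 2000] *= 4.0
y = compress(x, sr)
```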

In our case the compression was justified and improved the sound, since the prominent peak was more an accident than an intentional result. Besides, you can see that the music is rhythmic and therefore already has a narrow dynamic range. In cases where high amplitude values were created on purpose, compression can turn into a mistake.

Dynamic compression

Dynamic compression differs from ordinary (static) compression in that the amount of signal suppression (Ratio) depends on the level of the incoming signal. Dynamic compressors are available in all modern programs; the Ratio and Threshold parameters are controlled via a window in which each parameter has its own axis:

There is no single standard for displaying this graph: in some programs the Y axis shows the level of the incoming signal, in others the level of the signal after compression; in some the point (0,0) is in the upper right corner, in others in the lower left. In any case, moving the mouse over this field changes the numbers that correspond to the Ratio and Threshold parameters. In other words, you set the amount of compression for each Threshold value, so compression can be configured very flexibly.
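As a sketch of the idea (hypothetical curve points; NumPy's np.interp simply plays the role of the static transfer curve you would draw in such a window):

```python
import numpy as np

# Hypothetical transfer curve drawn by the user: input dB -> output dB.
# Below -40 dB the signal passes unchanged; above that it is compressed
# progressively harder (the local Ratio grows with the input level).
curve_in  = np.array([-96.0, -40.0, -20.0, -10.0,  0.0])
curve_out = np.array([-96.0, -40.0, -28.0, -24.0, -22.0])

def transfer(input_db):
    """Output level for a given input level, following the drawn curve."""
    return np.interp(input_db, curve_in, curve_out)

print(transfer(-50.0))  # -50.0 (below the knee: unchanged)
print(transfer(-15.0))  # -26.0 (pulled down relative to a 1:1 line)
```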

Side Chain

A side-chain compressor analyzes the signal of one channel and, when its level exceeds the threshold, applies compression to another channel. The side chain is useful when working with instruments that share the same frequency region (the bass / bass-drum pair is the classic case), but sometimes instruments from different frequency regions are used, which leads to an interesting side-chain pumping effect.
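A minimal sketch of the side-chain idea (hypothetical NumPy code): the kick channel drives the gain reduction that is applied to the bass channel, so the bass "ducks" every time the kick exceeds the threshold:

```python
import numpy as np

def sidechain_duck(bass, kick, sr, threshold=0.5, depth_db=-9.0, release_ms=120.0):
    """Whenever the kick signal exceeds the threshold, the bass is attenuated
    by depth_db and then recovers with the given release time."""
    rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    gain_db = np.zeros_like(bass)
    g = 0.0
    for n in range(len(bass)):
        if abs(kick[n]) > threshold:
            g = depth_db                 # instant attack on the other channel
        else:
            g = rel * g                  # decay back towards 0 dB
        gain_db[n] = g
    return bass * 10.0 ** (gain_db / 20.0)
```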

Part Two - Compression Steps

There are three stages of compression:

1) The first stage is the compression of individual sounds (one-shots).

The timbre of any instrument can be described by the following envelope stages: Delay, Attack, Hold, Decay, Sustain, Release.

The stage of compression of individual sounds is divided into two parts:

1.1) Compression of individual sounds of rhythmic instruments

Often the components of a beat require separate compression to give them clarity. Many people process the bass drum separately from the other rhythmic instruments, both at the stage of compressing individual sounds and at the stage of compressing individual parts. This is because it sits in the low-frequency region, where, apart from it, usually only the bass is present. By "clarity" of the bass drum we mean the presence of a characteristic click (the bass drum has a very short attack and hold time). If there is no click, process the drum with a compressor, setting the threshold to zero and the attack time to 10-50 ms. The compressor's Release must end before the kick hits again. This last problem can be solved using the formula 60,000 / BPM, where BPM is the tempo of the composition. For example, 60,000 / 137 = 437.96 (the time in milliseconds until the next beat of a 4/4 composition).
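The 60,000 / BPM rule is trivial to keep at hand as a helper; a small sketch:

```python
def beat_ms(bpm: float, note_fraction: float = 1.0) -> float:
    """Length of one beat (or a fraction of it) in milliseconds."""
    return 60_000.0 / bpm * note_fraction

print(beat_ms(137))        # ~437.96 ms until the next quarter-note kick at 137 BPM
print(beat_ms(137, 0.5))   # an eighth note, if the kick pattern is denser
```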

All of the above applies to other rhythmic instruments with a short attack time: they should have an accentuated click that must not be suppressed by the compressor at any of the compression stages.

1.2) Compression of individual sounds of harmonic instruments

Unlike rhythmic instruments, the parts of harmonic instruments rarely consist of isolated one-shot sounds. However, this does not mean that they should not be processed at the individual-sound level. If you use a sample with an already recorded part, that belongs to the second level of compression; this level applies only to synthesized harmonic instruments. These can be samplers or synthesizers using various synthesis methods (physical modeling, FM, additive, subtractive, etc.). As you have probably guessed, we are talking about programming the synthesizer settings. Yes, that is compression too! Almost all synthesizers have a programmable envelope (ADSR). The envelope sets the Attack time, the Decay time, the Sustain level and the Release time. And if you tell me that this is not compression of each individual sound - you are my enemy for life!
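For illustration, a minimal ADSR envelope sketch (linear segments; real synthesizers usually use exponential curves, and parameter names vary):

```python
import numpy as np

def adsr(sr, attack=0.01, decay=0.1, sustain=0.7, release=0.2, hold=0.5):
    """Linear ADSR envelope: attack/decay/release are times in seconds,
    sustain is a level (0..1), hold is how long the note stays at the sustain level."""
    a = np.linspace(0.0, 1.0, int(sr * attack), endpoint=False)
    d = np.linspace(1.0, sustain, int(sr * decay), endpoint=False)
    s = np.full(int(sr * hold), sustain)
    r = np.linspace(sustain, 0.0, int(sr * release))
    return np.concatenate([a, d, s, r])

sr = 44100
env = adsr(sr)
tone = np.sin(2 * np.pi * 220 * np.arange(len(env)) / sr) * env  # an enveloped note
```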

2) The second stage - Compression of individual parts.

By compression of individual parts I mean narrowing the dynamic range of a number of combined individual sounds. This stage also includes recorded parts, vocals among them, which need compression to gain clarity and intelligibility. When processing parts with compression, keep in mind that when individual sounds are summed, unwanted peaks may appear; you need to get rid of them at this stage, because if you do not do it now, the picture may get worse at the stage of mixing the whole composition. At the stage of compressing individual parts, the compression applied at the individual-sound stage must be taken into account: if you have achieved clarity of the bass drum, careless re-processing at the second stage can ruin everything. Not every part has to be processed by a compressor, and neither does every individual sound. I advise putting an amplitude analyzer on the bus, just in case, to spot unwanted side effects of combining individual sounds. Besides compression, at this stage you should also make sure that the parts sit, as far as possible, in different frequency bands (this is what equalization is for). It is also useful to remember that sound has a property called masking (psychoacoustics):

1) A quieter sound is masked by a louder sound in front of it.

2) A quieter sound is masked by a louder sound at a nearby lower frequency (masking spreads mainly from low frequencies upward).

So, for example, in a synth part the notes often start playing before the previous notes have finished. Sometimes this is necessary (harmony, playing style, polyphony), but sometimes not at all: you can cut off their tails (the Release stage) if the tail is audible in solo mode but not when all parts play together. The same applies to effects such as reverb: it should not still be sounding when the sound source starts again. By cutting out the unwanted signal you make the sound cleaner, and this, too, can be considered a kind of compression, because you remove unwanted waves.

3) The third stage - Compression of the composition.

When compressing the whole composition you have to remember that all the parts are themselves combinations of many individual sounds, so when they are summed and then compressed, care must be taken that the final compression does not spoil what was achieved in the first two stages. You also need to distinguish compositions for which a wide dynamic range matters from those where a narrow one does. When compressing a composition with a wide dynamic range it is enough to insert a compressor that tames the short-term peaks formed when the parts are summed. When compressing a composition where a narrow dynamic range is the goal, everything is much more complicated. The compressors used here have lately been called maximizers. A maximizer is a plugin that combines a compressor, limiter, graphic equalizer, enhancer and other sound-shaping tools, and it must also include sound-analysis tools. Maximizing, the final compressor processing, is largely needed to fight the mistakes made at the previous stages - mistakes not so much of compression (although doing at the last stage what could have been done at the first is already a mistake) as of the initial choice of good samples and instruments that do not get in each other's way (in terms of frequency ranges); this is what frequency-response correction is for. It often happens that with strong compression on the master you have to change the compression and mixing parameters at earlier stages, because with a strong narrowing of the dynamic range, quiet sounds that were previously masked come forward and the sound of individual components of the composition changes.

In these parts I deliberately did not talk about specific compression settings. I considered it necessary to stress that during compression you have to pay attention to all sounds and all parts at all stages of creating a composition. Only then will you end up with a harmonious result, not only from the point of view of music theory, but also from the point of view of sound engineering.

The table below gives practical advice on processing individual parts. However, in compression, numbers and presets can only suggest the area in which to search; the ideal settings depend on each individual case. The Gain and Threshold parameters assume a normal sound level (sensible use of the entire range).

Part Three - Compression Options

Quick reference:

Threshold - determines the sound level of the incoming signal, upon reaching which the compressor starts to work.

Attack - determines the time after which the compressor starts to work once the threshold is exceeded.

Ratio - determines the degree of reduction of the amplitude values (relative to the original amplitude).

Release - determines the time after which the compressor stops working.

Gain - determines how much the signal is boosted after it has been processed by the compressor.

Compression table:

Instrument | Threshold | Attack | Ratio | Release | Gain | Notes
Vocals | 0 dB | 1-2 ms / 2-5 ms / 10 ms / 0.1 ms | less than 4:1 / 2.5:1 / 4:1-12:1 / 2:1-8:1 | 150 ms / 50-100 ms / 0.5 s | - | Compression while recording should be minimal; mandatory processing at the mixing stage makes the vocal clear and intelligible.
Wind instruments | - | 1-5 ms | 6:1-15:1 | 0.3 s | - | 
Bass drum (kick) | - | 10-50 ms / 10-100 ms | 4:1 and above / 10:1 | 50-100 ms / 1 ms | - | The lower the Threshold, the higher the Ratio and the longer the Attack, the more pronounced the click at the start of the kick.
Synthesizers | - | - | - | - | - | Depends on the waveform and the ADSR envelope settings.
Snare drum | - | 10-40 ms / 1-5 ms | 5:1 / 5:1-10:1 | 50 ms / 0.2 s | - | 
Hi-hat | - | 20 ms | 10:1 | 1 ms | - | 
Overhead microphones | - | 2-5 ms | 5:1 | 1-50 ms | - | 
Drums (overall) | - | 5 ms | 5:1-8:1 | 10 ms | - | 
Bass guitar | - | 100-200 ms / 4-10 ms | 5:1 | 1 ms / 10 ms | - | 
Strings | - | 0-40 ms | 3:1 | 500 ms | - | 
Synth bass | - | 4-10 ms | 4:1 | 10 ms | - | Depends on the envelopes.
Percussion | - | 0-20 ms | 10:1 | 50 ms | - | 
Acoustic guitar, piano | - | 10-30 ms / 5-10 ms | 4:1 / 5:1-10:1 | 50-100 ms / 0.5 s | - | 
Electric guitar | - | 2-5 ms | 8:1 | 0.5 s | - | 
Final (mix-bus) compression | - | 0.1 ms | 2:1 / 2:1-3:1 | 50 ms / 0.1 ms | 0 dB output | The attack time depends on the goal: whether to remove peaks or just make the track smoother.
Limiter after final compression | - | 0 ms | 10:1 | 10-50 ms | 0 dB output | If you need a narrow dynamic range and a rough "cut" of the waves.

Where several values are listed in one cell, they come from different sources.

The information was gathered from various sources that are cited by popular resources on the Internet. The spread in the parameters is explained by differences in sound preferences and in the material being worked on.

Let's think about why we need to raise the volume at all: in order to hear quiet sounds that are inaudible in our listening conditions (for example, when you cannot listen loudly, or when there is extraneous noise in the room). Is it possible to amplify quiet sounds but not loud ones? It turns out you can. This technique is called Dynamic Range Compression (DRC). To do this, the gain has to change continuously: quiet sounds are amplified, loud ones are not. The simplest law of volume change is linear, i.e. the volume changes according to the law output_loudness = k * input_loudness, where k is the dynamic range compression ratio:

Figure 18. Dynamic range compression.

For k = 1 no change is made (the output volume equals the input volume). For k < 1 the volume increases and the dynamic range narrows. Look at the graph for k = 1/2: a quiet sound that had a level of -50 dB becomes 25 dB louder, which is considerably louder, while the level of the dialogue (-27 dB) rises by only 13.5 dB, and the level of the loudest sounds (0 dB) does not change at all. For k > 1 the volume decreases and the dynamic range widens.
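In code, this law is a one-liner applied in the logarithmic (dB) domain; a sketch reproducing the k = 1/2 numbers from the graph:

```python
def drc_level(input_db: float, k: float) -> float:
    """Dynamic range compression in the log domain: output = k * input (in dB)."""
    return k * input_db

for level in (-50.0, -27.0, 0.0):
    print(level, "->", drc_level(level, k=0.5))
# -50 dB -> -25 dB (+25 dB), -27 dB -> -13.5 dB (+13.5 dB), 0 dB -> 0 dB (unchanged)
```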

Let's look at the loudness graphs (k = 1/2: the dynamic range is compressed by a factor of two):

Figure 19. Loudness graphs.

As you can see, the original contained both very quiet sounds, 30 dB below the dialogue level, and very loud ones, 30 dB above it; the dynamic range was thus 60 dB. After compression, loud sounds are only 15 dB above and quiet sounds 15 dB below the dialogue (the dynamic range is now 30 dB). Loud sounds become noticeably quieter, quiet sounds noticeably louder, and no overflow occurs!

Now let's turn to the histograms:

Figure 20. An example of compression.

As you can clearly see, at +30 dB of gain the histogram shape is well preserved, which means that loud sounds remain well defined (they do not pile up at the maximum and are not clipped, as happens with simple gain). At the same time the quiet sounds are brought out. The histogram does not show this well, but the difference is very noticeable by ear. The drawback of the method is, again, volume jumps. However, their mechanism differs from the jumps caused by clipping, and their character is different: they appear mainly when quiet sounds are amplified very strongly (not when loud sounds are cut off, as with ordinary gain). An excessive level of compression flattens the sound picture: all sounds tend toward the same volume and become inexpressive.

Strongly amplifying quiet sounds may make the recording noise audible. Therefore the filter uses a slightly modified algorithm so that the noise level rises less:

Figure 21. Increasing the volume, without increasing the noise.

That is, at a level of -50 dB there is an inflection in the transfer function, and the noise is amplified less (yellow line). Without such an inflection the noise would be much louder (grey line). This simple modification significantly reduces the amount of noise even at very high compression levels (1:5 compression in the figure). The "DRC" level in the filter sets the amount of gain for the quietest sounds (at -50 dB), so the 1:5 compression shown in the figure corresponds to a +40 dB setting in the filter.
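A sketch of the modified transfer function (assumed two-slope shape: slope k above the inflection at -50 dB and slope 1 below it, so the gain applied to the noise floor stops growing; the filter's actual curve may differ):

```python
import numpy as np

def drc_with_knee(input_db, k=0.2, knee_db=-50.0):
    """Above the knee: output = k * input (strong compression, 1:5 for k = 0.2).
    Below the knee: a unity-slope line through the knee point, so the gain
    applied far below the knee stays at its knee value instead of growing."""
    input_db = np.asarray(input_db, dtype=float)
    knee_out = k * knee_db                      # output level at the inflection point
    above = k * input_db
    below = knee_out + (input_db - knee_db)     # slope 1 below the knee
    return np.where(input_db >= knee_db, above, below)

print(drc_with_knee([-50.0, -90.0, -20.0]))     # [-10. -50.  -4.]
```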


Dynamic range, or the photographic latitude of a photographic material, is the ratio between the maximum and minimum exposure values that can be correctly captured in the picture. Applied to digital photography, dynamic range is effectively the ratio of the maximum and minimum possible values of the useful electrical signal generated by the photosensor during exposure.

Dynamic range is measured in exposure stops (EV). Each stop corresponds to a doubling of the amount of light. So, for example, if a camera has a dynamic range of 8 EV, this means that the maximum possible value of the useful signal of its sensor relates to the minimum as 2^8 : 1, i.e. the camera can capture objects within one frame that differ in brightness by no more than 256 times. More precisely, it can capture objects of any brightness, but objects whose brightness exceeds the maximum allowable value will come out dazzling white in the picture, and objects whose brightness falls below the minimum will be pitch black. Details and texture will be distinguishable only on objects whose brightness fits into the dynamic range of the camera.

To describe the relationship between the brightness of the lightest and darkest of the subjects being photographed, the not quite correct term "dynamic range of the scene" is often used. It would be more correct to talk about the range of brightness or the level of contrast, since the dynamic range is usually a characteristic of the measuring device (in this case, the matrix of a digital camera).

Unfortunately, the brightness range of many beautiful scenes that we encounter in real life can significantly exceed the dynamic range of a digital camera. In such cases, the photographer is forced to decide which objects should be worked out in great detail, and which can be left outside the dynamic range without compromising the creative intent. In order to make the most of your camera's dynamic range, sometimes you may need not so much a thorough understanding of the principle of operation of the photosensor as a developed artistic flair.

Factors limiting dynamic range

The lower limit of the dynamic range is set by the intrinsic noise level of the photosensor. Even an unlit sensor generates a background electrical signal called dark noise. Interference also occurs when the charge is transferred to the analog-to-digital converter, and the ADC itself introduces a certain error into the digitized signal - the so-called quantization noise.

If you take a picture in complete darkness or with a lens cap on, the camera will only record this meaningless noise. If a minimum amount of light is allowed to hit the sensor, the photodiodes will begin to accumulate electric charge. The magnitude of the charge, and hence the intensity of the useful signal, will be proportional to the number of captured photons. In order for any meaningful details to appear in the picture, it is necessary that the level of the useful signal exceed the level of background noise.

Thus, the lower limit of the dynamic range or, in other words, the sensor sensitivity threshold can be formally defined as the output signal level at which the signal-to-noise ratio is greater than one.

The upper limit of the dynamic range is determined by the capacity (full well) of a single photodiode. If during the exposure a photodiode accumulates the maximum electric charge it can hold, the image pixel corresponding to the overloaded photodiode will turn out absolutely white, and further irradiation will not affect its brightness in any way. This phenomenon is called clipping. The higher the overload capacity of the photodiode, the more signal it can deliver at the output before it reaches saturation.

For greater clarity, let's turn to the characteristic curve, a graph of the output signal versus exposure. The horizontal axis plots the binary logarithm of the irradiation received by the sensor, and the vertical axis the binary logarithm of the electrical signal generated by the sensor in response to that irradiation. My drawing is largely schematic and serves illustrative purposes only; the characteristic curve of a real photosensor has a slightly more complex shape, and the noise level is rarely so high.

Two critical turning points are clearly visible on the graph: at the first, the useful signal level crosses the noise threshold, and at the second, the photodiodes reach saturation. The exposure values lying between these two points constitute the dynamic range. In this abstract example it is, as you can easily see, 5 EV, i.e. the camera can digest five doublings of exposure, which is equivalent to a 32-fold (2^5 = 32) difference in brightness.

The exposure zones that make up the dynamic range are not equivalent. The upper zones have a higher signal-to-noise ratio and therefore look cleaner and more detailed than the lower ones. As a result, the upper limit of the dynamic range is very real and noticeable - clipping cuts off the highlights at the slightest overexposure - while the lower limit quietly drowns in noise, and the transition to black is not as sharp as the transition to white.

The linear dependence of the signal on exposure, as well as the abrupt cut-off at saturation, are distinctive features of the digital photographic process. For comparison, take a look at the conventional characteristic curve of traditional photographic film.

The shape of the curve, and especially its slope, depends strongly on the type of film and on how it is developed, but the main, conspicuous difference between the film graph and the digital one remains: the non-linear dependence of the film's optical density on the exposure value.

The lower limit of the photographic latitude of negative film is determined by the density of the base fog, and the upper limit by the maximum achievable optical density of the photo layer; for reversal films it is the other way round. Both in the shadows and in the highlights the characteristic curve bends smoothly, indicating a drop in contrast near the boundaries of the dynamic range, because the slope of the curve is proportional to the contrast of the image. Exposure zones lying on the middle section of the graph therefore have maximum contrast, while in the highlights and shadows contrast is reduced. In practice the difference between film and a digital sensor is especially noticeable in the highlights: where the digital image has its highlights burned out by clipping, on film the details are still distinguishable, albeit with low contrast, and the transition to pure white looks smooth and natural.

In sensitometry two separate terms are even used: photographic latitude proper, limited by the relatively linear section of the characteristic curve, and useful photographic latitude, which also includes the toe and shoulder of the curve.

It is noteworthy that when processing digital photographs, as a rule, a more or less pronounced S-curve is applied to them, increasing the contrast in midtones at the cost of reducing it in shadows and highlights, which gives the digital image a more natural and pleasing look to the eye.

Bit depth

Unlike a digital camera's sensor, human vision perceives the world, so to speak, logarithmically. Successive doublings of the amount of light are perceived by us as equal changes in brightness. Exposure values can even be compared to musical octaves, because two-fold changes in sound frequency are perceived by ear as the same musical interval. Other sense organs work on the same principle. The non-linearity of perception greatly expands the range of human sensitivity to stimuli of varying intensity.

When a RAW file containing linear data is converted (whether in the camera or in a RAW converter), a so-called gamma curve is applied to it. It is designed to non-linearly increase the brightness of the digital image, bringing it into line with the characteristics of human vision.

With linear conversion, the image is too dark.

After gamma correction, the brightness returns to normal.

The gamma curve, in effect, stretches the dark tones and compresses the light ones, making the distribution of gradations more uniform. The result is a natural-looking image, but noise and quantization artifacts in the shadows inevitably become more noticeable, which is only aggravated by the small number of brightness levels in the lower zones.

Linear distribution of gradations of brightness.
Uniform distribution after applying the gamma curve.
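As an illustration, here is a sketch of the gamma step on linear data normalized to 0..1 (a plain power-law gamma of 2.2; actual RAW converters apply the slightly different sRGB curve):

```python
import numpy as np

def apply_gamma(linear, gamma=2.2):
    """Brighten linear sensor data with a power-law gamma curve:
    dark tones are stretched, light tones are compressed."""
    linear = np.clip(np.asarray(linear, dtype=float), 0.0, 1.0)
    return linear ** (1.0 / gamma)

print(apply_gamma([0.01, 0.18, 1.0]))  # dark values gain the most: ~0.12, ~0.46, 1.0
```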

ISO and dynamic range

Although digital photography uses the same concept of photosensitivity as film photography, it should be understood that this is purely a matter of tradition, since the approaches to changing sensitivity in digital and film photography differ fundamentally.

Increasing the ISO speed in traditional photography means changing from one film to another with coarser grain, i.e. there is an objective change in the properties of the photographic material itself. In a digital camera, the light sensitivity of the sensor is rigidly set by its physical characteristics and cannot be literally changed. When increasing the ISO, the camera does not change the actual sensitivity of the sensor, but only amplifies the electrical signal generated by the sensor in response to irradiation and adjusts the algorithm for digitizing this signal accordingly.

An important consequence of this is that the effective dynamic range decreases in proportion to the increase in ISO, because noise increases along with the useful signal. If at ISO 100 the entire range of signal values is digitized, from zero to the saturation point, then at ISO 200 only half of the photodiodes' capacity is taken as the maximum. With each doubling of ISO, the top stop of the dynamic range is, as it were, cut off, and the remaining stops are pulled up in its place. That is why ultra-high ISO values are of little practical use: you could just as well brighten the photo in the RAW converter and get a comparable noise level. The difference between raising the ISO and artificially brightening the image is that when the ISO is raised, the signal is amplified before it enters the ADC, so the quantization noise is not amplified, unlike the sensor's own noise, whereas in the RAW converter the ADC errors are amplified as well. In addition, reducing the digitized range means finer quantization of the remaining values of the input signal.
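As a rough, idealized model of this trade-off (real sensors lose somewhat less than a full stop per ISO doubling, because read noise does not scale exactly with the gain):

```python
import math

def effective_dr(base_dr_ev: float, iso: int, base_iso: int = 100) -> float:
    """Idealized estimate: each doubling of ISO costs about one stop of headroom."""
    return base_dr_ev - math.log2(iso / base_iso)

for iso in (100, 200, 800, 3200):
    print(iso, round(effective_dr(13.0, iso), 1))   # 13.0, 12.0, 10.0, 8.0
```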

By the way, lowering the ISO below the base value (for example, to ISO 50), available on some cameras, does not expand the dynamic range at all; it simply attenuates the signal by half, which is equivalent to darkening the image in the RAW converter. This function can even be considered harmful, since using a below-base ISO value provokes the camera into increasing the exposure, which, with the sensor's saturation threshold unchanged, increases the risk of clipping in the highlights.

True value of dynamic range

There are a number of programs (DxO Analyzer, Imatest, RawDigger, etc.) that allow you to measure the dynamic range of a digital camera at home. In principle, this is not really necessary, since the data for most cameras can be freely found on the Internet, for example at DxOMark.com.

Should we believe the results of such tests? Quite. With the only caveat that all these tests determine the effective or, so to speak, technical dynamic range, i.e. the ratio between the saturation level and the noise level of the sensor. For the photographer, the useful dynamic range is of primary importance, i.e. the number of exposure zones that really allow capturing useful information.

As you remember, the lower threshold of the dynamic range is set by the noise level of the photosensor. The problem is that in practice the lower zones, although technically already inside the dynamic range, still contain too much noise to be of much use. Here a lot depends on personal tolerance: everyone decides for themselves what noise level is acceptable.

My subjective opinion is that the details in the shadows begin to look more or less decent at a signal-to-noise ratio of at least eight. On that basis, I define useful dynamic range for myself as technical dynamic range minus about three stops.

For example, if a reflex camera has, according to reliable tests, a dynamic range of 13 EV, which is very good by today's standards, then its useful dynamic range will be about 10 EV, which, on the whole, is also quite good. Of course, we are talking about shooting in RAW, at the base ISO and maximum bit depth. When shooting in JPEG, the dynamic range depends heavily on the contrast settings, but on average another two to three stops should be written off.

For comparison: color reversible films have a useful photographic latitude of 5-6 steps; black-and-white negative films give 9-10 stops with standard development and printing procedures, and with certain manipulations - up to 16-18 stops.

Summing up the above, let's try to formulate a few simple rules, the observance of which will help you squeeze the maximum performance out of your camera's sensor:

  • The dynamic range of a digital camera is fully available only when shooting in RAW.
  • Dynamic range decreases as ISO increases, so avoid high ISO unless absolutely necessary.
  • Using higher bit depths for RAW files does not increase the true dynamic range, but it improves tonal separation in the shadows thanks to the larger number of brightness levels.
  • Expose to the right. The upper exposure zones always contain the maximum useful information with a minimum of noise and should be used as fully as possible. At the same time, do not forget about the danger of clipping: pixels that have reached saturation are absolutely useless.

And most importantly, don't worry too much about your camera's dynamic range. Its dynamic range is fine. Your ability to see the light and manage the exposure properly is far more important. A good photographer will not complain about a lack of photographic latitude, but will wait for better light, or change the angle, or use the flash - in short, act according to the circumstances. I'll say more: some scenes only benefit from not fitting into the camera's dynamic range. Often an unnecessary abundance of detail is best hidden in a semi-abstract black silhouette, which makes the photo both more laconic and richer.

High contrast is not always bad - you just need to be able to work with it. Learn to exploit the equipment's weaknesses as well as its strengths, and you'll be surprised at how much your creativity expands.

Thank you for your attention!

Vasily A.

post scriptum

If the article turned out to be useful and informative for you, you can kindly support the project by contributing to its development. If you did not like the article, but you have thoughts on how to make it better, your criticism will be accepted with no less gratitude.

Do not forget that this article is subject to copyright. Reprinting and quoting are permissible provided there is a valid link to the original source, and the text used must not be distorted or modified in any way.

This group of methods is based on subjecting the transmitted signals to non-linear amplitude transformations, with the non-linearities in the transmitting and receiving parts being mutually inverse. For example, if the transmitter applies the non-linear function √u, the receiver applies u². Successive application of mutually inverse functions keeps the overall transformation linear.

The idea of non-linear data compression methods is that, for the same amplitude of the output signals, the transmitter can convey a larger range of variation of the transmitted parameter (that is, a larger dynamic range). Dynamic range is the ratio of the largest allowable signal amplitude to the smallest, expressed in relative units or in decibels:

D = U_max / U_min ;   (2.17)
D [dB] = 20·lg(U_max / U_min) .   (2.18)

The natural desire to increase the dynamic range by reducing U min is limited by the sensitivity of the equipment and the increase in the influence of interference and intrinsic noise.

Most often, dynamic range compression is performed using a pair of mutually inverse functions: the logarithm and the exponential. The first operation, which reduces the amplitude range, is called compression; the second, which stretches it back, is called expansion. These functions are chosen because they offer the greatest degree of compression.

At the same time, these methods have drawbacks. The first is that the logarithm of a small number is negative, and in the limit lim(u→0) log u = -∞, that is, the sensitivity near zero is highly non-linear.

To reduce these shortcomings, both functions are modified by adding an offset and by approximation. For example, for telephone channels the approximated characteristic has the form (A-law):

F(x) = sign(x) · A·|x| / (1 + ln A)              for |x| < 1/A,
F(x) = sign(x) · (1 + ln(A·|x|)) / (1 + ln A)    for 1/A ≤ |x| ≤ 1,

where A = 87.6. The gain from compression in this case is 24 dB.
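For reference, a sketch of the standard A-law compressor and its inverse expander (Python; straightforwardly implementing the piecewise formula above):

```python
import math

A = 87.6

def a_law_compress(x: float) -> float:
    """A-law compression of a sample x in the range [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

def a_law_expand(y: float) -> float:
    """Inverse (expansion), so that the overall transfer stays linear."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)

print(a_law_expand(a_law_compress(0.3)))   # ~0.3 (the round trip is transparent)
```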

Data compression by non-linear procedures is implemented with analog means only with large errors. Digital tools can significantly improve the accuracy or speed of the conversion. At the same time, a head-on use of computing (i.e. direct calculation of logarithms and exponentials) would not give the best result because of low performance and accumulating calculation errors.

Because of these accuracy limitations, compression of this kind is used in non-critical cases, for example for voice transmission over telephone and radio channels.

Efficient coding

Efficient codes were proposed by K. Shannon, Fano and Huffman. Their essence is that they are non-uniform, i.e. the codewords have an unequal number of digits, and the code length is inversely related to the probability of the symbol's occurrence. Another great feature of efficient codes is that they do not require delimiters, i.e. special characters separating neighboring code combinations. This is achieved by observing a simple rule: shorter codes must not be the beginning of longer ones. In that case a continuous bit stream is decoded unambiguously, because the decoder recognizes each codeword as soon as it is complete. For a long time efficient codes were purely academic, but recently they have been successfully used in building databases, as well as for compressing information in modern modems and software archivers.

Because the codes are non-uniform, the notion of average code length is introduced. The average length is the mathematical expectation of the code length:

l_avg = Σ p_i · l_i ,

and l_avg tends to H(x) from above, that is

l_avg ≥ H(x) .   (2.23)

Condition (2.23) is approached ever more tightly as N increases.

There are two types of efficient codes: Shannon-Fano and Huffman. Let's construct them using an example. Suppose the probabilities of the symbols in the sequence have the values given in Table 2.1.

Table 2.1.

Symbol probabilities

N   | 1   | 2   | 3   | 4   | 5    | 6    | 7    | 8    | 9
p_i | 0.1 | 0.2 | 0.1 | 0.3 | 0.05 | 0.15 | 0.03 | 0.02 | 0.05

The symbols are ranked, i.e. arranged in descending order of probability. After that, the Shannon-Fano method repeatedly applies the following procedure: the current group of events is divided into two subgroups with equal (or approximately equal) total probabilities. The procedure continues within each subgroup until a subgroup contains a single element; that element then receives its code, and the partitioning continues with the remaining elements until only single elements are left in the last two subgroups. Let's continue with our example, which is summarized in Table 2.2.

Table 2.2.

Shannon-Fano coding

N | Pi   | Partition steps
4 | 0.3  | I
2 | 0.2  | I II
6 | 0.15 | I I
3 | 0.1  | II
1 | 0.1  | I I
9 | 0.05 | II II
5 | 0.05 | II I
7 | 0.03 | II II I
8 | 0.02 | II

As can be seen from Table 2.2, the first symbol, with probability p4 = 0.3, participated in two partitioning steps and both times fell into group I; accordingly, it is encoded with the two-digit code 11. The second element belonged to group I at the first partitioning step and to group II at the second, so its code is 10. The codes of the remaining symbols need no further comment.

Non-uniform codes are usually depicted as code trees. A code tree is a graph showing the allowed code combinations. The directions of the edges of this graph are fixed in advance, as shown in Fig. 2.11 (the choice of directions is arbitrary).

The graph is used as follows: trace the route to the chosen symbol; the number of bits of its code equals the number of edges in the route, and the value of each bit equals the direction of the corresponding edge. The route is traced from the starting point (marked with the letter A in the drawing). For example, the route to vertex 5 consists of five edges, all but the last of which have direction 0; we get the code 00001.

For this example, we calculate the entropy and the average length of a word.

H(x) = -(0.3·log2 0.3 + 0.2·log2 0.2 + 2·0.1·log2 0.1 + 0.15·log2 0.15 + 2·0.05·log2 0.05 + 0.03·log2 0.03 + 0.02·log2 0.02) ≈ 2.76 bits

l_avg = 0.3·2 + 0.2·2 + 0.15·3 + 0.1·3 + 0.1·4 + 0.05·5 + 0.05·4 + 0.03·6 + 0.02·6 = 2.9

As you can see, the average word length is close to the entropy.
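Both numbers are easy to verify; a small sketch using the probabilities from Table 2.1 and the code lengths from the calculation above:

```python
import math

p = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.03, 0.02]   # probabilities, ranked
lengths = [2, 2, 3, 3, 4, 5, 4, 6, 6]                    # Shannon-Fano code lengths

H = -sum(pi * math.log2(pi) for pi in p)                 # entropy
l_avg = sum(pi * li for pi, li in zip(p, lengths))       # average code length
print(round(H, 2), round(l_avg, 2))                      # 2.76 2.9
```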

Huffman codes are built by a different algorithm. The encoding procedure consists of two stages. At the first stage, the alphabet is repeatedly compressed one step at a time: a single compression replaces the two symbols with the lowest probabilities by one symbol with their total probability. Compression continues until only two symbols remain. At the same time a coding table is filled in, recording the resulting probabilities and the routes along which the new symbols migrate at each step.

At the second stage the actual encoding takes place, starting from the last step: the first of the two remaining symbols is assigned the code 1, the second the code 0. Then we move to the previous step. Symbols that did not take part in the merge at that step keep the codes assigned at the following step, while the two symbols that were merged receive the code of the merged symbol with one bit appended: 1 for the upper symbol, 0 for the lower one. If a symbol takes no further part in merging, its code remains unchanged. The procedure continues all the way back to the first step.

Table 2.3 shows the Huffman encoding. As can be seen from the table, the encoding took 7 steps. On the left are the probabilities of the symbols, on the right the intermediate codes. The arrows show the movements of the newly formed symbols. At each step the last two symbols differ only in the least significant bit, which corresponds to the encoding technique. Let's calculate the average word length:

l_avg = 0.3·2 + 0.2·2 + 0.15·3 + 2·0.1·3 + 0.05·4 + 0.05·5 + 0.03·6 + 0.02·6 = 2.8

This is even closer to the entropy: the code is more efficient. Fig. 2.12 shows the Huffman code tree.

Table 2.3.

Huffman encoding

N pi code I II III IV V VI VII
0.3 0.3 11 0.3 11 0.3 11 0.3 11 0.3 11 0.4 0 0.6 1
0.2 0.2 01 0.2 01 0.2 01 0.2 01 0.3 10 0.3 11 0.4 0
0.15 0.15 101 0.15 101 0.15 101 0.2 00 0.2 01 0.3 10
0.1 0.1 001 0.1 001 0.15 100 0.15 101 0.2 00
0.1 0.1 000 0.1 000 0.1 001 0.15 100
0.05 0.05 1000 0.1 1001 0.1 000
0.05 0.05 10011 0.05 1000
0.03 0.05 10010
0.02

Both codes satisfy the requirement of unambiguous decoding: as can be seen from the tables, shorter combinations are not the beginning of longer codes.
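For comparison, a Huffman code for the same alphabet can be built in a few lines; a sketch using Python's heapq (tie-breaking may produce different codewords than Table 2.3, but the average length of any Huffman code for these probabilities is the same):

```python
import heapq

def huffman(probabilities):
    """Return a dict symbol -> codeword for the given symbol probabilities."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial code}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "1" + code for s, code in c1.items()}
        merged.update({s: "0" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

p = {1: 0.1, 2: 0.2, 3: 0.1, 4: 0.3, 5: 0.05, 6: 0.15, 7: 0.03, 8: 0.02, 9: 0.05}
codes = huffman(p)
avg = sum(p[s] * len(code) for s, code in codes.items())
print(codes)
print(round(avg, 2))   # 2.8
```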

As the number of symbols increases, the efficiency of the codes grows; therefore in some cases larger blocks are encoded (for example, for texts you can encode the most frequent syllables, words and even phrases).

The effect of introducing such codes is determined by comparing them with a uniform code:

K = n / l_avg ,   (2.24)

where n is the number of digits of the uniform code that is replaced by the efficient one.

Modifications of Huffman codes

The classical Huffman algorithm is a two-pass one: it first requires gathering statistics on the symbols of the message, and only then performs the procedures described above. This is inconvenient in practice, since it increases the processing time and requires accumulating a dictionary. One-pass methods, in which accumulation and encoding are combined, are used more often. Such methods are called adaptive Huffman compression [46].

The essence of adaptive Huffman compression comes down to building an initial code tree and then modifying it after each incoming symbol. As before, the trees here are binary, i.e. at most two arcs leave each vertex of the tree. It is customary to call the initial vertex the parent and the two vertices connected to it its children. Let's introduce the concept of vertex weight: it is the number of symbols (words) corresponding to a given vertex in the sequence processed so far. Obviously, the sum of the children's weights equals the weight of the parent.

After the next symbol of the input sequence arrives, the code tree is revised: the weights of the vertices are recalculated and, if necessary, the vertices are rearranged. The rearrangement rule is: the weights of the lower vertices are the smallest, and among vertices of the same level those to the left of the graph have the smallest weights.

At the same time the vertices are numbered. Numbering starts from the lower (hanging, i.e. childless) vertices, from left to right, then moves to the next level up, and so on, until the last, root vertex is numbered. This achieves the following: the smaller the weight of a vertex, the smaller its number.

The rearrangement mainly concerns the hanging vertices. When rearranging, the rule formulated above must be observed: vertices with a larger weight also have a larger number.

After the control (test) sequence has been passed through, code combinations are assigned to all hanging vertices. The assignment rule is similar to the one above: the number of code bits equals the number of edges on the route from the root to the given hanging vertex, and the value of a particular bit corresponds to the direction from parent to child (say, going left from the parent corresponds to 1, going right to 0).

The resulting code combinations are stored in the memory of the compression device together with their originals and form a dictionary. The algorithm is then used as follows: the sequence of symbols being compressed is split into fragments according to the existing dictionary, after which each fragment is replaced by its code from the dictionary. Fragments not found in the dictionary form new hanging vertices, acquire weight and are also entered into the dictionary. In this way an adaptive dictionary-building algorithm is formed.

To increase the efficiency of the method, it is desirable to increase the size of the dictionary; in this case, the compression ratio is increased. In practice, the size of a dictionary is 4 - 16 KB of memory.


Let's illustrate the algorithm with an example. Fig. 2.13 shows the initial diagram (also called a Huffman tree). Each vertex of the tree is shown as a rectangle containing two numbers separated by a slash: the first is the vertex number, the second its weight. As you can see, the correspondence between the weights of the vertices and their numbers holds.

Now suppose the symbol corresponding to vertex 1 occurs a second time in the test sequence. Its weight changes, as shown in Fig. 2.14, so the vertex-numbering rule is violated. At the next step we change the arrangement of the hanging vertices: we swap vertices 1 and 4 and renumber all the vertices of the tree. The resulting graph is shown in Fig. 2.15. The procedure then continues in the same way.

It should be remembered that each hanging vertex of the Huffman tree corresponds to a certain symbol or group of symbols. The parent differs from its children in that its group of symbols is one symbol shorter than theirs, and the children differ from each other in the last symbol. For example, the parent may correspond to the characters "kar"; then its children may correspond to the sequences "kara" and "karp".

The algorithm described above is far from academic and is actively used in archiving programs, including for compressing graphic data (these will be discussed below).

Lempel–Ziv algorithms

These are the most widely used compression algorithms today. They are employed in most archivers (for example, PKZIP, ARJ, LHA). The essence of the algorithms is that a certain set of characters is replaced, during archiving, by its number in a specially built dictionary. For example, the phrase "Outgoing number for your letter ...", common in business correspondence, may occupy position 121 in the dictionary; then, instead of transmitting or storing the phrase itself (30 bytes), you can store its number (1.5 bytes in BCD or 1 byte in binary).

The algorithms are named after the authors who first proposed them in 1977. The first of them is LZ77. For archiving, a sliding window is created over the message, consisting of two parts. The first, larger part serves as the dictionary and has a size of the order of several kilobytes. The second, smaller part (usually up to about 100 bytes) receives the current characters of the text being examined. The algorithm tries to find in the dictionary a string of characters matching those in the look-ahead window. If it succeeds, a code of three parts is produced: the offset of the matching substring in the dictionary, the length of that substring, and the character following it. For example, say the matching substring consists of the characters "app" and the character following it is "e"; then, if the substring is found at position 45 in the dictionary, the output entry looks like (45, 3, "e"). After that the window is shifted and the search continues. This is how the dictionary is built.
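A toy sketch of the LZ77 idea (greedy longest-match search over a small window, operating on strings rather than bytes; real implementations such as the one in PKZIP are far more elaborate):

```python
def lz77_encode(data: str, window: int = 1024):
    """Encode data as a list of (offset, length, next_char) triples."""
    out, i = [], 0
    while i < len(data):
        start = max(0, i - window)
        best_len, best_off = 0, 0
        # Search the dictionary part of the window for the longest match.
        for j in range(start, i):
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        nxt = data[i + best_len]
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Rebuild the text from (offset, length, next_char) triples."""
    s = ""
    for off, length, ch in triples:
        start = len(s) - off
        for k in range(length):          # copy char by char (handles overlaps)
            s += s[start + k]
        s += ch
    return s

s = "abracadabra abracadabra"
triples = lz77_encode(s)
print(triples)
print(lz77_decode(triples) == s)   # True
```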

The advantage of the algorithm is an easily formalized dictionary-building procedure. In addition, decompression is possible without the initial dictionary (having a test sequence is desirable, though): the dictionary is rebuilt in the course of decompression.

The disadvantages of the algorithm show up as the dictionary grows: the search time increases. Moreover, if a string of characters that is not in the dictionary appears in the current window, every character is written with a three-element code, i.e. the result is not compression but expansion.

The LZSS algorithm, proposed in 1982, performs better. It differs in how the sliding window is maintained and in the compressor's output codes. In addition to the window, the algorithm maintains a binary tree, similar to a Huffman tree, to speed up the search for matches: each substring leaving the current window is added to the tree as one of the children. The algorithm also allows the current window to be enlarged (its size should preferably be a power of two: 128, 256, etc. bytes). Sequence codes are also formed differently: an additional one-bit prefix is introduced to distinguish unencoded characters from "offset, length" pairs.

An even higher degree of compression is obtained with algorithms of the LZW type. The algorithms described earlier have a fixed window size, which makes it impossible to enter phrases longer than the window into the dictionary. In the LZW algorithms (and their predecessor LZ78) the look-ahead window has an unlimited size, and the dictionary accumulates phrases (rather than fixed groups of characters, as before). The dictionary has unlimited length, and the encoder (decoder) works in phrase-waiting mode. When a phrase matching one in the dictionary is formed, the match code (i.e. the code of that phrase in the dictionary) and the code of the character following it are emitted. If, as characters accumulate, a new phrase is formed, it is entered into the dictionary as well, just like the shorter one. The result is a recursive procedure that provides fast encoding and decoding.

An additional compression gain comes from compactly encoding repeated characters. If some characters in a sequence follow one another in a run (for example, spaces in a text, or consecutive zeros in a numerical sequence), it makes sense to replace them with a pair "character, run length" or "repetition flag, length". In the first case the code contains a flag indicating that a run is being encoded (usually 1 bit), then the code of the repeated character and the length of the run. In the second case (reserved for the most frequently repeated characters) the prefix simply indicates a repetition.
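A minimal sketch of the "character, run length" variant:

```python
def rle_encode(data: str):
    """Collapse runs of identical characters into (character, run length) pairs."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((data[i], j - i))
        i = j
    return out

print(rle_encode("AAAAABBBCCD"))  # [('A', 5), ('B', 3), ('C', 2), ('D', 1)]
```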

People fascinated by home audio exhibit an interesting paradox. They are ready to rebuild the listening room and construct loudspeakers with exotic drivers, yet they shy away from the canned music itself, like a wolf in front of a red flag. But really, why not step over the flag and try to cook something more edible out of those canned goods?

Every so often plaintive questions appear on forums: "Recommend some well-recorded albums." That's understandable. Special audiophile releases may please the ear for the first minute, but nobody listens to them to the end: the repertoire is painfully dull. As for the rest of the music library, the problem seems obvious. You can economize, or you can pour a lot of money into components; either way, few people enjoy listening to their favourite music at high volume, and the capabilities of the amplifier have nothing to do with it.

Today, even on Hi-Res albums, the peaks of the recording are cut off and the loudness is driven into clipping. It is assumed that the majority listens to music on any old junk, and therefore the gas has to be floored - a kind of loudness compensation.


Of course, this is not done on purpose to upset audiophiles; hardly anyone remembers them at all. Somebody merely thought of giving them the very master files from which the main run is copied - CDs, MP3s and so on. Naturally, that master has long since been flattened by the compressor; nobody is going to prepare special versions for HD Tracks. The exception is the vinyl release, for which a certain procedure is followed and which for that reason sounds more humane. For the digital path everything ends the same way - with a big fat compressor.

So at present essentially all released recordings, with the exception of classical music, are compressed during mastering. Some perform this procedure more or less skilfully, others with complete ineptitude. As a result we have pilgrims on the forums clutching the readings of the DR meter plugin, agonizing comparisons of different editions, and flight to vinyl, where you then have to hunt down first pressings as well.

The most frostbitten, at the sight of all these outrages, have literally turned into audio satanists. No kidding: they read the sound engineer's holy scripture backwards! Modern audio editors have tools for restoring a clipped waveform.

Initially this functionality was intended for studios. During mixing there are situations when clipping gets onto the recording and, for a number of reasons, the session can no longer be redone; this is where the audio editor's arsenal comes to the rescue - declipper, decompressor and so on.

And now ordinary listeners, whose ears bleed after yet another new release, are reaching for such software more and more boldly. Some prefer iZotope, some Adobe Audition, some split the operations between several programs. The point of restoring the original dynamics is to programmatically repair the clipped signal peaks which, flattened against 0 dB, look like a row of gear teeth.

Of course, there is no question of a 100% resurrection of the original, since the interpolation involved relies on rather speculative algorithms. Still, some of the processing results seemed interesting and worth studying to me.

For example, Lana Del Rey's album "Lust For Life", with its consistently vile - ugh! - mastering. The original of the song "When the World Was at War We Kept Dancing" looked like this.


And after a series of declippers and decompressors it became like this. The DR value went from 5 to 9. You can download and listen to the sample before and after processing.


I can't say the method is universal and suitable for every ruined album, but in this case I preferred to keep in my collection this version, processed by a rutracker enthusiast, rather than the official 24-bit edition.

Even if artificially pulling the peaks back out of the mincemeat does not restore the true dynamics of the performance, your DAC will still thank you. It was so hard for it to work without errors at the limiting levels, where the probability of so-called inter-sample peaks (ISP) is high; now only rare flashes of the signal will jump up to 0 dB. In addition, the attenuated track, when compressed to FLAC or another lossless codec, now takes up less space: more "air" in the signal saves hard drive space.

Try to revive your most hated albums killed in the loudness war. For headroom, first lower the track level by 6 dB and only then start the declipper. Those who do not believe in computers can simply insert a studio expander between the CD player and the amplifier. This device does essentially the same thing: it restores and stretches the peaks of a compressed audio signal as far as possible. Such devices from the 80s and 90s are not very expensive, and trying them as an experiment would be very interesting.
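For those comfortable with a bit of code, the core idea of a naive declipper can be sketched as follows (Python with NumPy/SciPy; a crude illustration, nowhere near the quality of iZotope's algorithms): pad the level, find the runs of samples stuck at the ceiling, and re-draw them with a spline that is allowed to overshoot the old ceiling.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def naive_declip(x, clip_level=0.99, context=8):
    """Crude declipper: pad the level by 6 dB, then re-draw each run of samples
    stuck at the ceiling with a cubic spline fitted to the samples around it,
    so the flattened tops are rounded back into peaks."""
    x = np.asarray(x, dtype=float)
    y = x * 10.0 ** (-6.0 / 20.0)                 # headroom so restored peaks fit
    clipped = np.abs(x) >= clip_level
    n = len(x)
    i = 0
    while i < n:
        if clipped[i]:
            j = i
            while j < n and clipped[j]:
                j += 1
            lo, hi = max(0, i - context), min(n, j + context)
            support = np.arange(lo, hi)[~clipped[lo:hi]]
            if len(support) >= 4:
                spline = CubicSpline(support, y[support])
                y[i:j] = spline(np.arange(i, j))
            i = j
        else:
            i += 1
    return y
```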


The DBX 3BX dynamic range controller processes the signal separately in three bands - bass, midrange and treble

Once upon a time equalizers were taken for granted in an audio system, and nobody was afraid of them. Today we no longer need to compensate for the high-frequency roll-off of magnetic tape, but something has to be done about crippled dynamics, brothers.


