Have you ever wondered how sound is reproduced in digital devices? How does a sound signal emerge from a combination of ones and zeros? Surely you have, since you have already started reading! Yet even professionals often have only a general idea of the modern audio path. In this article you will learn how the different formats appeared, what a digital-to-analog converter (DAC) is, what types of DAC exist and what determines the quality of sound reproduction.
As you know, almost every digital audio format, with rare exceptions, stores sound as a pulse-code modulation (PCM) stream.
FLAC, MP3, WAV, Audio CD, DVD-Audio and other formats are just ways of packaging, or "preserving", the PCM stream.
How it all began
The theoretical foundations of digital sound transmission were laid at the dawn of the twentieth century, when scientists tried to transmit an audio signal over long distances not by telephone but in a way that seemed rather strange at the time.
By dividing the sound wave into small parts, they could send it to the recipient as a mathematical representation. The recipient, in turn, could restore the original wave and listen to the recording. The scientists also faced the task of increasing the capacity of the "ether".
V. A. Kotelnikov's theorem was published in 1933. In Western sources it is called the Nyquist–Shannon theorem. Harry Nyquist was indeed the first to raise the topic: in 1927 he calculated the minimum sampling rate needed to transmit a waveform, a quantity later named the "Nyquist frequency" after him. Kotelnikov's theorem, however, was published 16 years before Shannon's 1949 formulation.
The essence of the theorem is simple: a continuous signal can be represented as an interpolation series of discrete samples, from which the signal can be reconstructed. To restore the original signal, the sampling frequency must be at least twice the signal's upper cutoff frequency.
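To make the factor of two concrete, here is a minimal Python sketch. The function names are illustrative and do not come from any library. It computes the minimum sampling rate for a given upper frequency and shows where a pure tone above half the sampling rate would fold back ("alias") if it were sampled anyway.

```python
def min_sampling_rate(f_max_hz):
    """Minimum sampling rate for a signal whose highest frequency is f_max_hz."""
    return 2 * f_max_hz


def alias_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz            # sampling cannot tell f from f + k * fs
    return min(f, fs_hz - f)    # fold the result into the range 0 .. fs / 2


print(min_sampling_rate(22_000))         # 44000
print(alias_frequency(20_000, 44_100))   # 20000: below fs / 2, preserved
print(alias_frequency(30_000, 44_100))   # 14100: above fs / 2, folded back
```

A 30 kHz tone sampled at 44.1 kHz becomes indistinguishable from a 14.1 kHz tone, which is why the signal has to be band-limited before sampling.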
For many years the theorem was not in demand, until the digital age arrived and it found a use. In particular, it came in handy in the development of the CDDA (Compact Disc Digital Audio) format, commonly known as Audio CD or Red Book. The format was released by engineers at Philips and Sony in 1980 and became the standard for audio CDs. Its main parameters are:
- sampling rate – 44.1 kHz;
- quantization bit depth – 16 bits.
These terms mean the following:
- Sampling rate – the number of signal samples taken per second. Measured in hertz.
- Bit depth – the number of binary digits used to express the signal amplitude. Measured in bits.
The sampling rate of 44.1 kHz follows from Kotelnikov's theorem. It is believed that the average person cannot hear sound above 19–22 kHz, so 22 kHz was probably chosen as the upper boundary.
22,000 × 2 + 100 = 44,100 Hz
Where did the extra 100 Hz come from? One version holds that it is a small margin for errors or resampling. In fact, Sony chose this frequency for compatibility with the PAL broadcast standard.
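The link to video can be illustrated with simple arithmetic. The sketch below uses the commonly cited figures for early PCM adaptors, which stored audio samples on video tape. These figures are not from this article and should be read as an assumption: 3 samples per video line, 294 usable lines per field at 50 fields per second for PAL, and 245 usable lines at 60 fields per second for NTSC.

```python
def pcm_adaptor_rate(lines_per_field, fields_per_second, samples_per_line=3):
    """Audio samples per second that fit into the usable video lines."""
    return lines_per_field * fields_per_second * samples_per_line


print(pcm_adaptor_rate(294, 50))   # PAL:  44100
print(pcm_adaptor_rate(245, 60))   # NTSC: 44100
```

Under these assumptions both television systems give exactly 44,100 samples per second.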
The bit depth of the CDDA format is 16 bits, or 65,536 quantization levels, which corresponds to a dynamic range of approximately 96 dB. Such a large number of levels was not chosen by chance: first, because of the strong influence of quantization noise, and second, to provide a formal dynamic range higher than that of the main competitors of the time, cassette tapes and vinyl records. I'll cover this in more detail in the section on digital-to-analog converters.
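The 96 dB figure can be reproduced in a few lines. This is a sketch that assumes the usual definition of formal dynamic range as the ratio between the largest and the smallest quantization step, expressed in decibels. The function names are illustrative.

```python
import math


def quantization_levels(bits):
    """Number of distinct amplitude values for a given bit depth."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Formal dynamic range: 20 * log10 of the number of levels."""
    return 20 * math.log10(quantization_levels(bits))


for bits in (16, 24, 32):
    print(bits, quantization_levels(bits), f"{dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives 65,536 levels and roughly 96.3 dB; each additional bit adds about 6 dB.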
PCM then developed further on the principle of multiplying by two.
Other sampling rates appeared: first 48 kHz was added, and later its multiples 96, 192 and 384 kHz. The 44.1 kHz rate was likewise doubled to 88.2, 176.4 and 352.8 kHz. The bit depth grew from 16 to 24 and later to 32 bits.
The next format after CDDA was DAT (Digital Audio Tape), which appeared in 1987. Its sampling rate was 48 kHz, while the bit depth stayed the same. Although the format failed, the 48 kHz sampling rate caught on in recording studios, reportedly because it is convenient for digital processing.
In 1999 the DVD-Audio format was released. It allowed one disc to carry up to six channels of audio at a sampling rate of 96 kHz and a bit depth of 24 bits, or two channels (stereo) at 192 kHz and 24 bits.
In the same year the SACD (Super Audio CD) format was introduced, but discs for it went into production only three years later. I'll say more about this format in the section on DSD.
These are the main formats that are considered the standard for digital audio recordings on media. Now let's look at how data is transmitted in a digital audio path.
The structure of the digital audio path
When music is played, roughly the following happens: the player, using a codec implemented as a device or a program, unpacks a file in a given format (FLAC, MP3 and others) or reads data from a CD, DVD-Audio or SACD disc, and obtains a standard PCM data stream. This stream is then sent over USB, LAN, S/PDIF, PCI and so on to an I2S converter, which in turn converts the received data into frames of the I2S data interface (not to be confused with I2C!).
I2S is a serial bus for streaming digital audio. Today it is the standard way to connect a signal source (a computer or a player) to a digital-to-analog converter, and the vast majority of DACs are connected through it, directly or indirectly. Other digital audio streaming standards exist, but they are much less common.
The I2S bus can consist of three, four or even five lines:
- continuous serial clock (SCK) – the bit clock (may be called BCK or BCLK);
- word select (WS) – the frame clock (may be called LRCK or FSYNC);
- serial data (SD) – the transmitted data signal (may be called DATA, SDOUT or SDATA). As a rule, data travels from a transmitter to a receiver, but some devices can act as both at the same time. In that case one more line may be present;
- serial data in (SDIN) – on this line, data travels in the receive direction rather than the transmit direction.
SD or SDOUT is used to connect a D/A converter, and SDIN is used to connect an A/D converter to the I2S bus.
Sometimes there is one more line, Master Clock (MCLK or MCK), which synchronizes the receiver and the transmitter from the same clock to reduce the transmission error rate. For external MCLK synchronization, two clock generators are used, at 22,579.2 kHz and 24,576 kHz. The first is for sampling rates that are multiples of 44.1 kHz (88.2, 176.4, 352.8 kHz), and the second is for multiples of 48 kHz (96, 192, 384 kHz). There may also be generators at 45,158.4 kHz and 49,152 kHz: as you have probably noticed, the world of digital sound likes to multiply everything by two.
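The relationship between the two generator frequencies and the sampling rates can be checked with integer arithmetic. In this sketch the function name and the idea of choosing a clock by divisibility are illustrative assumptions, not a description of any particular device.

```python
MCLK_44K1_FAMILY_HZ = 22_579_200   # 22,579.2 kHz
MCLK_48K_FAMILY_HZ = 24_576_000    # 24,576 kHz


def pick_master_clock(sample_rate_hz):
    """Return (mclk_hz, divider) for the generator that divides evenly."""
    for mclk_hz in (MCLK_44K1_FAMILY_HZ, MCLK_48K_FAMILY_HZ):
        if mclk_hz % sample_rate_hz == 0:
            return mclk_hz, mclk_hz // sample_rate_hz
    raise ValueError(f"no suitable generator for {sample_rate_hz} Hz")


for rate in (44_100, 88_200, 176_400, 352_800, 48_000, 96_000, 192_000, 384_000):
    mclk_hz, divider = pick_master_clock(rate)
    print(f"{rate} Hz -> MCLK {mclk_hz} Hz = {divider} x sampling rate")
```

Both generators turn out to be 512 times their base rate (44.1 kHz and 48 kHz), and the divider halves each time the sampling rate doubles.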
Three lines are mandatory in I2S: SCK, WS and SD. The rest are optional.
The SCK line carries the clock pulses to which the frames are synchronized.
The WS line conveys the length of the "word" and also indicates the channel by its logic state: if WS is a logic one, right-channel data is being transmitted; if zero, left-channel data.
The SD line carries the data bits, that is, the quantized amplitude values of the audio signal, the same 16, 24 or 32 bits. The I2S bus provides no checksums or service channels. If data is lost in transit, there is no way to recover it.
Expensive DACs often have external I2S connectors. Using such connectors and cables can degrade the sound, up to audible "artifacts" and stuttering, depending on the quality and length of the cable. After all, I2S is an in-circuit interface, and the wires from the transmitter to the receiver should be as short as possible.
Let's look at how a PCM data stream is transmitted over the I2S bus. For example, when transmitting 44.1 kHz, 16-bit PCM, the word length on the SD line is those same sixteen bits, and the frame length is 32 bits (right + left). Most often, however, transmitters use a 24-bit word length.
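The bit clock on SCK follows directly from these numbers. The sketch below assumes one clock tick per transmitted bit and two channels per frame. The function name is illustrative.

```python
def i2s_bit_clock_hz(sample_rate_hz, bits_per_word, channels=2):
    """Bit clock needed to send one frame per sample period."""
    return sample_rate_hz * bits_per_word * channels


print(i2s_bit_clock_hz(44_100, 16))   # 1411200 Hz: 32-bit frames
print(i2s_bit_clock_hz(44_100, 24))   # 2116800 Hz: 48-bit frames
print(i2s_bit_clock_hz(44_100, 32))   # 2822400 Hz: 64-bit frames
```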
When playing 44.1×16 PCM over such a link, the unused bits of each word are filled with zeros and simply ignored (with the standard MSB-first I2S ordering these are the trailing, least significant positions) or, in the case of old multi-bit DACs, they can spill over into the next frame. The word length (WS) can also depend on the player used to play the music and on the driver of the playback device.
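Here is a small sketch of how a 16-bit sample might sit inside a 24-bit word. It assumes the standard I2S convention that the most significant bit is sent first, so the sample occupies the first 16 bit positions and the remaining positions carry zeros. The function name and defaults are illustrative.

```python
def pad_to_slot(sample, sample_bits=16, slot_bits=24):
    """Return the slot as a bit string in transmission order (MSB first)."""
    mask = (1 << sample_bits) - 1
    # Keep the two's-complement bit pattern, then shift it to the top of the slot.
    word = (sample & mask) << (slot_bits - sample_bits)
    return format(word, f"0{slot_bits}b")


print(pad_to_slot(0x1234))   # 000100100011010000000000
print(pad_to_slot(-1))       # 111111111111111100000000
```

A receiver with a 16-bit word length reads the first 16 bits and ignores the rest; a 24-bit receiver sees the same amplitude scaled to its own range.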
An alternative to PCM and I2S is to record the audio signal in DSD. This format developed in parallel with PCM, although Kotelnikov's theorem had some influence here too. To improve sound quality over CDDA, the emphasis was placed not on increasing the bit depth, as in DVD-Audio, but on increasing the sampling frequency.
DSD stands for Direct Stream Digital. It originated in the laboratories of Sony and Philips, like the other formats discussed in this article.
DSD first saw the light of day on Super Audio CDs back in 2002.
At the time, SACD seemed like a masterpiece of engineering: it used a completely new method of recording and playback, very close to that of analog devices. The implementation was both simple and elegant.
The media even had copy protection, although pirates posed no threat even without it. Under the Sony and Philips brands, "closed" playback-only devices were produced, with no means of copying discs. The manufacturers sold recording equipment to studios but retained control over SACD production.
Who knows, perhaps the SACD format could have become as popular as Audio CD, if not for the cost of playback devices. By unreasonably inflating player prices, the heads of Sony and Philips themselves held back the popularity of their format. The next mistake put an end to sales of specialized devices: to promote the Sony PlayStation, Sony engineers added SACD playback to it. Hackers immediately cracked the console and began copying SACD discs into ISO images, which could be burned to an ordinary DVD and played on any competing player; others simply ripped the tracks for playback on a computer.
The record companies share the blame: contrary to what music lovers expected, they did not take full advantage of the new high-definition format. Instead of recording music from the master tape in DSD, the studios took a digital PCM recording, remixed it and processed it with everything at hand: limiters, compressors, dithering with noise shaping and various digital filters. The result was a sound so sterile and dry that even Audio CD could sound much better. Listeners' confidence in SACD was thus undermined, and with it their confidence in new formats in general.
Alas, with vinyl records this vicious practice continues to this day: studios press vinyl from a digital recording even when they have the recording on master tape. So a modern vinyl record may well contain 44.1×16 audio.
What is DSD? It is a one-bit stream with a very high sampling rate compared to PCM. DSD also uses a different type of modulation, PDM (pulse density modulation). Recording in this format is performed by a one-bit analog-to-digital converter; such ADCs, based on sigma-delta modulation, are now used everywhere. The recording process looks like this: the wave amplitude is compared with its previous value. While the amplitude rises, the ADC outputs a logic one; when it falls, the ADC outputs a logic zero. There is no intermediate value.
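The description above is simplified. As an illustration only, here is a first-order sigma-delta modulator in its common textbook form. It is an assumption for the sake of the example, not the circuit of any particular ADC. The input is expected in the range -1 to +1, and the density of ones in the output tracks the input level, which is what pulse density modulation means.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of 0/1 bits."""
    integrator = 0.0
    feedback = 0.0                         # previous output mapped to -1.0 / +1.0
    bits = []
    for x in samples:
        integrator += x - feedback         # accumulate the error against the last output
        bit = 1 if integrator >= 0 else 0  # one-bit quantizer
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def ones_density(bits):
    """Fraction of ones in the stream."""
    return sum(bits) / len(bits)


print(ones_density(sigma_delta_1bit([0.5] * 4000)))    # 0.75
print(ones_density(sigma_delta_1bit([-1.0] * 4000)))   # 0.0
```

A constant input of 0.5 yields three ones for every zero, so the average of the output (counting a one as +1 and a zero as -1) equals the input. Low-pass filtering such a stream recovers the original waveform.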
DSD offers important advantages over PCM:
- more accurate reproduction of the waveform;
- higher noise immunity;
- a simpler way to switch and transmit the digital stream;
- a theoretical possibility of reducing cost by simplifying the DAC circuit, although manufacturers are unlikely to do so because of backward compatibility with older formats.
Originally, SACDs used the DSD x64 format with a sampling rate of 2,822.4 kHz. The 44.1 kHz sampling rate of Audio CD was taken as the basis and multiplied by 64, hence the name x64. The following DSD variants are in use today:
- x64 = 2,822.4 kHz;
- x128 = 5,644.8 kHz;
- x256 = 11,289.6 kHz;
- x512 = 22,579.2 kHz;
- DSD x1024 has been announced.
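These rates are all simple multiples of the Audio CD sampling rate, as a short sketch shows. The function name is illustrative.

```python
CD_RATE_HZ = 44_100


def dsd_rate_hz(multiplier):
    """Sampling rate of DSD xN, derived from the Audio CD rate."""
    return CD_RATE_HZ * multiplier


for m in (64, 128, 256, 512, 1024):
    print(f"DSD x{m}: {dsd_rate_hz(m) / 1000:.1f} kHz")
```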
There is also an intermediate format between PCM and DSD called DXD (Digital eXtreme Definition). It is essentially high-definition PCM: 352.8 kHz or 384 kHz with 24-bit or 32-bit quantization. It is used in studios for processing and subsequent mixing of material.
But this approach has drawbacks: first, it does not exploit all the advantages of DSD, and second, the files are larger than in DSD. Today, flagship DACs accept at their I2S input a PCM stream with a sampling rate of up to 768 kHz and a bit depth of up to 32 bits. It is scary even to think how much disk space one album would take at this resolution.
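A rough estimate is easy to make. The sketch below assumes uncompressed stereo and ignores container overhead; the 60-minute album length is an arbitrary example, and the function names are illustrative.

```python
def pcm_bytes_per_second(sample_rate_hz, bits, channels=2):
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * bits * channels // 8


def dsd_bytes_per_second(multiplier, channels=2):
    """Raw DSD data rate: one bit per sample per channel."""
    return 44_100 * multiplier * channels // 8


ALBUM_SECONDS = 60 * 60   # assumed album length

for label, rate in (
    ("PCM 44.1 kHz / 16 bit", pcm_bytes_per_second(44_100, 16)),
    ("DSD x64", dsd_bytes_per_second(64)),
    ("DXD 352.8 kHz / 24 bit", pcm_bytes_per_second(352_800, 24)),
    ("PCM 768 kHz / 32 bit", pcm_bytes_per_second(768_000, 32)),
):
    print(f"{label}: {rate * ALBUM_SECONDS / 1e9:.2f} GB per album")
```

Under these assumptions an hour of 768 kHz, 32-bit stereo takes about 22 GB, and DXD at 352.8 kHz and 24 bits needs three times the space of DSD x64.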
DSD has practically separated from SACD. The DSD format is now often found packed in files with the DSF and DFF extensions. Many turntables capable of recording to DSF and DFF have been released, and lovers of good sound are increasingly digitizing vinyl records in DSD. In recording studios, however, no one wants to invest in unpopular formats, so they keep churning out sound at the bare minimum: 44.1×16.
DSD switching and data transmission
A three-pin connection is used to transfer a DSD stream:
- DSD Clock Pin (DCLK) – synchronization;
- DSD Lch Data Input Pin (DSDL) – left channel data;
- DSD Rch Data Input Pin (DSDR) – right channel data.
Compared with I2S, DSD data transmission is extremely simple. DCLK sets the bit clock, and the left and right channel data are transmitted serially over the DSDL and DSDR pins, respectively. There are no tricks here: recording and playback in DSD are done bit by bit. This approach gives the closest approximation to the analog signal, and thanks to the high sampling frequency, quantization noise is reduced and reproduction accuracy increases by an order of magnitude.