ASCII encoding. ASCII encoding table
Encoding information in a computer is the process of converting it into a form that allows the data to be transferred, stored, or processed automatically more conveniently. Various code tables are used for this purpose. ASCII was the first such system, developed in the United States for working with English-language text, and it subsequently spread throughout the world. Its description, features, properties, and further use are presented in the article below.
Display and storage of information in the computer
Characters on a computer monitor or a mobile digital gadget are formed from two things: a set of vector shapes (glyphs) for the various signs, and a code that makes it possible to find, among those shapes, the character that needs to be inserted in the right place. This code is a sequence of bits; thus, each character must correspond uniquely to a set of zeros and ones arranged in a specific, unique order.
How it all began
Historically, the first computers were English-speaking. To encode symbolic information in them, it was enough to use only 7 bits of memory, although a full byte of 8 bits was allocated for the purpose. The number of characters the computer understood in this case was 128. These included the English alphabet with its punctuation marks, the digits, and some special characters. The English seven-bit encoding with its corresponding table (code page), developed in 1963, was named the American Standard Code for Information Interchange. The abbreviation "ASCII" has been used to designate it ever since.
Transition to multilingual
Over time, computers became widely used in non-English-speaking countries, creating a need for encodings that supported national languages. It was decided not to reinvent the wheel but to take ASCII as the basis. The code table in the new edition expanded significantly: using the 8th bit made it possible to encode 256 characters.
The ASCII table is divided into 2 parts. Only its first half is considered the generally accepted international standard. It includes:
- Characters with sequence numbers from 0 to 31, encoded by the sequences 00000000 to 00011111. They are reserved for control characters, which control the process of displaying text on a screen or printer, sound a beep, and so on.
- Characters with numbers from 32 to 127, encoded by the sequences 00100000 to 01111111, constitute the standard part of the table. These include the space (No. 32), the letters of the Latin alphabet (upper and lower case), the ten digits 0 to 9, punctuation marks, brackets of different types, and other characters.
- Characters with sequence numbers from 128 to 255, encoded by the sequences 10000000 to 11111111. These include letters of national alphabets other than Latin. This alternative part of the table is the ASCII character set used to convert, for example, Russian characters into computer form.
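The division described above can be illustrated with a minimal Python sketch (only the standard 7-bit part is shown; the built-in `ord()` and `chr()` functions map characters to their code numbers and back):

```python
# Sketch of the ASCII table's two standard regions:
# control characters occupy codes 0-31, printable characters 32-127.
for code in (10, 32, 53, 65, 97, 126):
    kind = "control" if code < 32 else "printable"
    print(f"{code:3d} = {code:08b} -> {chr(code)!r} ({kind})")

assert ord(" ") == 32          # the space opens the printable part
assert ord("A") == 65
assert ord("~") == 126         # last printable character before DEL (127)
```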
A special feature of the ASCII encoding is that the uppercase and lowercase letters "A" to "Z" differ by only one bit. This circumstance greatly simplifies case conversion, as well as checking whether a character belongs to a given range of values. In addition, every letter in the ASCII system is represented by its own sequence number in the alphabet, written as 5 binary digits, preceded by 011₂ for lowercase letters and 010₂ for uppercase ones.
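This one-bit difference can be demonstrated directly; the helper name `toggle_case` below is just for illustration:

```python
# Upper- and lowercase Latin letters differ only in one bit (value 0x20),
# so case can be flipped or tested with simple bit operations.
CASE_BIT = 0x20

def toggle_case(ch: str) -> str:
    """Flip the case bit of a single Latin letter."""
    return chr(ord(ch) ^ CASE_BIT)

print(f"A = {ord('A'):07b}, a = {ord('a'):07b}")  # 1000001 vs 1100001
assert toggle_case("A") == "a"
assert toggle_case("q") == "Q"

# The low five bits give the letter's position in the alphabet:
assert ord("C") & 0b11111 == 3
```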
Another feature of the ASCII encoding is the representation of the 10 digits "0" to "9". In binary, their codes all begin with 0011₂ and end with the binary value of the digit itself. Since 0101₂ is equivalent to the decimal number five, the character "5" is written as 0011 0101₂. Based on the above, you can easily convert binary-coded decimal numbers to an ASCII string by prepending the bit sequence 0011₂ to each nibble.
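A small sketch of this digit rule, verifying that prefixing a digit's 4-bit value with 0011₂ yields its ASCII code:

```python
# Converting a decimal digit to its ASCII code means prefixing the
# 4-bit nibble with 0011 (equivalently, adding 0x30 = ord("0")).
for digit in range(10):
    code = 0b0011_0000 | digit
    assert chr(code) == str(digit)

assert ord("5") == 0b0011_0101   # "5" is written as 0011 0101
print("digit codes:", [f"{ord(str(d)):08b}" for d in range(10)])
```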
As is well known, thousands of characters are required to display texts in the languages of Southeast Asia. Such a number cannot be described in one byte of information, so even extended versions of ASCII could no longer meet the growing needs of users from different countries.
Thus, it became necessary to create a universal text encoding, a task taken up, in collaboration with many leaders of the global IT industry, by the Unicode Consortium. Its specialists first created the UTF-32 system, in which 32 bits, i.e. 4 bytes of information, are allocated to encode 1 character. Its main disadvantage was a sharp, fourfold increase in the amount of required memory, which entailed many problems.
At the same time, for most countries whose official languages belong to the Indo-European group, 2³² characters is more than redundant.
As a result of further work by the Unicode Consortium's specialists, the UTF-16 encoding appeared. It became the option for converting symbolic information that suited everyone, both in the amount of memory required and in the number of encoded characters, and it was widely adopted as a default. It reserves 2 bytes for one character (characters outside the basic plane take two such 2-byte units).
Even this fairly advanced and successful version of Unicode had drawbacks: switching from an extended version of ASCII to UTF-16 doubled the weight of a document.
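The doubling effect is easy to observe in Python (the little-endian codec `utf-16-le` is used here so that no byte-order mark inflates the count):

```python
# A Latin-only text doubles in size when stored as UTF-16
# instead of single-byte ASCII.
text = "Hello ASCII"
ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-le")  # LE variant, no BOM

assert len(ascii_bytes) == len(text)       # 1 byte per character
assert len(utf16_bytes) == 2 * len(text)   # 2 bytes per character
print(len(ascii_bytes), "vs", len(utf16_bytes))
```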
For this reason, it was decided to use the variable-length encoding UTF-8. In this scheme, each character of the source text is encoded as a sequence from 1 to 4 bytes in length (the original design allowed sequences of up to 6 bytes).
Connection with the American Standard Code for Information Interchange
All Latin characters in variable-length UTF-8 are encoded in 1 byte, just as in the ASCII encoding system.
A special feature of UTF-8 is that if a text is written in Latin characters without any others, even programs that do not understand Unicode can still read it. In other words, the basic part of the ASCII text encoding carries over unchanged into the variable-length UTF. Cyrillic characters in UTF-8 occupy 2 bytes, while, for example, Georgian ones occupy 3 bytes. The creation of UTF-16 and UTF-8 solved the main problem of establishing a single code space in fonts. Since then, font producers need only fill the table with vector shapes of text symbols according to their needs.
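These byte lengths, and the byte-for-byte compatibility of UTF-8 with ASCII, can be checked directly (sample letters: Latin "A", Cyrillic "ж", Georgian "ა"):

```python
# UTF-8 byte lengths for characters from different scripts:
# ASCII/Latin -> 1 byte, Cyrillic -> 2 bytes, Georgian -> 3 bytes.
samples = {"A": 1, "ж": 2, "ა": 3}
for ch, expected in samples.items():
    encoded = ch.encode("utf-8")
    assert len(encoded) == expected
    print(f"{ch!r}: {encoded!r} ({len(encoded)} byte(s))")

# The ASCII range is byte-identical in UTF-8, which is why
# ASCII-only programs can still read Latin UTF-8 text:
assert "ASCII".encode("utf-8") == "ASCII".encode("ascii")
```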
Different operating systems prefer different encodings. To read and edit texts typed in another encoding, programs for converting Russian text are used. Some text editors contain embedded transcoders and can read text regardless of its encoding.
Now you know how many characters there are in ASCII and how and why it was developed. Today, of course, the Unicode standard has become the most widespread in the world. But we must not forget that it was built on the basis of ASCII, so the contribution of its developers to the field of IT deserves appreciation.