The use of compressed files is mainstream and many format-algorithm combinations exist to reduce the amount of data stored on computer or exchanged over the Internet.
One of the most widely used, and oldest, compression formats is ZIP. The standard was originally developed by Phil Katz (PK); the latest specification, 6.3.10, is maintained by PKWARE and can be found at [1].
Initially, ZIP files could not be encrypted, but encryption features have been added over time. Compressed files can be encrypted in several ways, and this post describes how the new TZCClass in TMS Cryptography Pack provides encryption and decryption services for compressed files using the ZIP format.
ZIP format description
ZIP archives contain various tags starting with PK (Phil Katz) followed by two bytes to indicate what type of section (compressed file, central directory header, end of central directory header, etc.) is introduced by the tag.
The first tag to look for is the end_of_central_directory_header, as it is the entry point to the central_directory_header. The latter is a table of contents for the compressed files stored in the archive. It serves the same purpose as the PDF "xref" (cross-reference) section, with offsets to compressed files counted from the beginning of the archive.
The relevant tags are:
$504b0506 = end of central directory header
$504b0102 = central directory header
$504b0304 = file header
We won't use other tags to add encryption services.
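As an illustration of using the end of central directory tag as the entry point, here is a minimal sketch that scans a byte buffer backwards for the bytes $50 $4B $05 $06. The function name FindEndOfCentralDirectory is hypothetical and not part of TZCClass or System.Zip. The sketch relies on the fact that this header is at least 22 bytes long, and it ignores the corner case where the same four bytes occur inside an archive comment.

```pascal
program FindEocdDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns the offset of the last "PK" $05 $06 tag
// in Archive, or -1 when the tag is not found.
// The end of central directory header is at least 22 bytes long, so the
// scan starts 22 bytes before the end of the buffer.
function FindEndOfCentralDirectory(const Archive: TBytes): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := Length(Archive) - 22 downto 0 do
    if (Archive[I] = $50) and (Archive[I + 1] = $4B) and
       (Archive[I + 2] = $05) and (Archive[I + 3] = $06) then
      Exit(I);
end;

var
  Empty: TBytes;
begin
  // An empty ZIP archive consists of a 22-byte end of central directory
  // header only: the tag followed by 18 zero bytes.
  SetLength(Empty, 22);
  Empty[0] := $50;
  Empty[1] := $4B;
  Empty[2] := $05;
  Empty[3] := $06;
  Writeln(FindEndOfCentralDirectory(Empty));
end.
```

In a real archive, the offset and size of the central directory would then be read from the fields that follow the tag.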
ZIP encryption header format description
The standard supports several types of encryption, but we will focus on "modern" AES-based encryption.
Whenever AES encryption is used, a specific extension must be added to each file header and to each entry of the central directory.
The encryption header format has the following fixed structure [3][4]:
Offset  Size (bytes)  Content
0       2             Extra field header ID (0x9901)
2       2             Data size (currently 7, but subject to possible increase in the future)
4       2             Integer version number specific to the zip vendor
6       2             2-character vendor ID
8       1             Integer mode value indicating AES encryption strength
9       2             The actual compression method used to compress the file
The "Integer version number specific to the zip vendor" is actually set to 1 or 2. When set to 1, a CRC is added to the file and central directory headers. When set to 2, this CRC is set to 0.
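To make the layout concrete, the following sketch decodes the 11 bytes of the extra field into a record. The record TAesExtraField and the helpers ReadWordLE and ParseAesExtraField are hypothetical names introduced for illustration; they are not part of TZCClass or System.Zip. All multi-byte values are read as little-endian, as is usual in ZIP headers.

```pascal
program AesExtraFieldDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical record mirroring the table of offsets above.
  TAesExtraField = record
    HeaderId: Word;          // expected $9901
    DataSize: Word;          // expected 7
    VendorVersion: Word;     // 1 = AE-1, 2 = AE-2
    VendorId: string;        // 2-character vendor ID, e.g. 'AE'
    Strength: Byte;          // AES encryption strength mode
    CompressionMethod: Word; // actual compression method, e.g. 8 = deflate
  end;

// Reads a 16-bit little-endian value starting at Offset.
function ReadWordLE(const Data: TBytes; Offset: Integer): Word;
begin
  Result := Data[Offset] or (Word(Data[Offset + 1]) shl 8);
end;

function ParseAesExtraField(const Data: TBytes): TAesExtraField;
begin
  if Length(Data) < 11 then
    raise EArgumentException.Create('AES extra field needs 11 bytes');
  Result.HeaderId := ReadWordLE(Data, 0);
  Result.DataSize := ReadWordLE(Data, 2);
  Result.VendorVersion := ReadWordLE(Data, 4);
  Result.VendorId := Char(Data[6]) + Char(Data[7]);
  Result.Strength := Data[8];
  Result.CompressionMethod := ReadWordLE(Data, 9);
end;

var
  Field: TAesExtraField;
begin
  // Sample bytes: AE-2, vendor 'AE', 256-bit key, deflate.
  Field := ParseAesExtraField(
    TBytes.Create($01, $99, 7, 0, 2, 0, $41, $45, 3, 8, 0));
  Writeln(Format('id=$%x version=%d vendor=%s strength=%d method=%d',
    [Field.HeaderId, Field.VendorVersion, Field.VendorId,
     Field.Strength, Field.CompressionMethod]));
end.
```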
This extra field is followed by a random sequence of 8, 12 or 16 bytes, used as a salt, for 128, 192 or 256-bit AES keys respectively. Our implementation only uses 256-bit keys for encryption and therefore only uses 16-byte salts.
This salt is followed by a verification code of 2 bytes, computed with PBKDF2-SHA1 using the user password for the archive and the previous salt. The result of this computation is a byte sequence of size bytes with size = 2 x AES key length + verification code length. In our implementation, size = 2 * 32 + 2 = 66 bytes.
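The size arithmetic can be written down for the three key strengths. In this sketch the function names are hypothetical, and the mapping of mode values 1, 2 and 3 to 128, 192 and 256-bit keys is taken from the AE-x format description rather than from TZCClass, which only uses mode 3.

```pascal
program AesZipSizesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Strength is the mode byte of the extra field:
// 1, 2 or 3 for 128, 192 or 256-bit keys.
function KeyLength(Strength: Byte): Integer;
begin
  case Strength of
    1: Result := 16;
    2: Result := 24;
    3: Result := 32;
  else
    raise EArgumentException.Create('Unknown AES strength');
  end;
end;

// The salt is half as long as the key: 8, 12 or 16 bytes.
function SaltLength(Strength: Byte): Integer;
begin
  Result := KeyLength(Strength) div 2;
end;

// size = 2 x AES key length + verification code length (2 bytes).
function DerivedLength(Strength: Byte): Integer;
begin
  Result := 2 * KeyLength(Strength) + 2;
end;

var
  S: Byte;
begin
  for S := 1 to 3 do
    Writeln(Format('mode %d: key=%d salt=%d derived=%d',
      [S, KeyLength(S), SaltLength(S), DerivedLength(S)]));
end.
```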
Finally, an authentication code is added to the encrypted compressed data. The computation is the following:
authentication code = |HMACSHA1(salt2, encrypted data)|10, where |x|n represents the first n bytes of x and salt2 is the second part of the PBKDF2 output, used as the HMAC key.
In practice, this means that the authentication code contains the first 10 bytes of the HMAC-SHA1 value computed over the encrypted (compressed) data.
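The truncation step can be sketched as follows. This is not the code of TZCClass: the function name AuthenticationCode is hypothetical, and the sketch assumes that the RTL unit System.Hash is available and that THashSHA1.GetHMACAsBytes takes the data first and the key second. The salt and data values are placeholders.

```pascal
program AuthCodeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Hash;

// Hypothetical helper: HMAC-SHA1 keyed with salt2 over the encrypted data,
// truncated to its first 10 bytes (the full HMAC-SHA1 value is 20 bytes).
function AuthenticationCode(const AuthSalt, EncryptedData: TBytes): TBytes;
begin
  Result := Copy(THashSHA1.GetHMACAsBytes(EncryptedData, AuthSalt), 0, 10);
end;

var
  AuthSalt, Data, Code: TBytes;
begin
  // Placeholder values: in a real archive, AuthSalt comes from PBKDF2
  // and Data is the AES-encrypted, compressed file content.
  SetLength(AuthSalt, 32);
  Data := TEncoding.UTF8.GetBytes('encrypted bytes would go here');
  Code := AuthenticationCode(AuthSalt, Data);
  Writeln(Length(Code));
end.
```

When decrypting, the same value would be recomputed over the ciphertext and compared with the 10 bytes stored after the encrypted data.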
Beyond these additions to the headers, encryption is marked as another type of compression in the ZIP archive, with compression method 99 ($63). Some bits in the headers are also set to 1 to indicate that the content is encrypted (overall, there is a lot of redundancy).
We can see the relevant bytes and byte sequence on the hexadecimal snapshot (from the excellent FrHed [8]). In black rectangles, ZIP tags; in red rectangles, encryption headers and relevant byte values; in blue, the random seed (or salt - 16 bytes), the verification code (2 bytes) and the authentication code (10 bytes).
With this detailed information and more tips ([4], [5], [6]), we can now encrypt ZIPped files.
ZIP encryption design choices
We first observe that RAD Studio comes with System.Zip and System.ZLib libraries. Quick tests show that it is rather simple to create/read ZIP archives with the TZipFile class. Similarly it is simple to create/read compressed files with ZLIB.
We also note that TZipFile contains a specific interface for crypto:
IZipCryptor = interface(IInterface)
  ['{CB1D5970-6CCC-402D-BAAD-02ED425ED07B}']
  procedure Init(const APassword: string; AEncrypt: Boolean);
  procedure Decrypt(var Buffer: TBytes);
  procedure Encrypt(var Buffer: TBytes);
end;
And finally, we also note in the ZIP library header that "Support for Compression modes 0(store) and 8(deflate) are implemented in this unit".
Compression mode "0" means no compression at all. This mode is used when files are very small, typically less than a few dozen bytes.
With that in mind, we have several options to develop a class to encrypt and decrypt ZIP archives:
- code every ZIP and encryption function from scratch
- add an IZipCryptor to the TZipFile class
- develop an encryption class over the zlib [2]
- "hack" an existing compression service (e.g., generate a ZIP archive with a tool and rewrite the result)
We started by coding an IZipCryptor service to use with the TZipFile class, but this option was plagued with issues: 1) the proposed IZipCryptor interface is not fit for AE-x encryption and 2) too many modifications to the TZipFile class were required.
So, our backup plan was to develop an encryption class on top of zlib, which is described in the next section.
Implementation of encryption services for ZIP archives
The principle we retained for the new class is to use data structures from System.Zip, such as TZipEndOfCentralHeader and TZipHeader. Note that TZipHeader also contains elements belonging to a central directory header.
To begin with, we create an encryption sequence "constant":
TEncryptedFileHeader: array [0..10] of byte = (
  $01, $99, 7, 0, 2, 0, $41, $45, 3, 8, 0);
where $01 $99 is the little-endian encoding of the extra encryption field tag ($9901); 7, 0 is the fixed size of the remainder of this sequence; 2, 0 selects AE-2 (our default value); $41 $45 stands for 'AE'; 3 represents a 256-bit key; and 8, 0 is the deflate compression method (8) as supported by System.ZLib.
If and when AE-1 encryption is required, we just assign 1 to the byte at index 4 of the sequence (once copied to the relevant header).
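The copy-then-patch step described above might look like the sketch below. The function MakeExtraField and the constant name EncryptedFileHeader are hypothetical; the bytes are the same as in the sequence above.

```pascal
program PatchVersionDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Same bytes as the encryption sequence shown above (AE-2 by default).
  EncryptedFileHeader: array [0..10] of Byte = (
    $01, $99, 7, 0, 2, 0, $41, $45, 3, 8, 0);

// Hypothetical helper: returns a copy of the sequence, patched for AE-1
// when requested. The constant itself is left untouched.
function MakeExtraField(UseAE1: Boolean): TBytes;
begin
  SetLength(Result, Length(EncryptedFileHeader));
  Move(EncryptedFileHeader[0], Result[0], Length(EncryptedFileHeader));
  if UseAE1 then
    Result[4] := 1; // low byte of the vendor version: 1 = AE-1
end;

begin
  Writeln(MakeExtraField(False)[4]); // vendor version for AE-2
  Writeln(MakeExtraField(True)[4]);  // vendor version for AE-1
end.
```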
We create a new type:
TEncryptionType = (etnone, etae1, etae2); // no encryption, AE-1 encryption, AE-2 encryption
to be able to deal with all scenarios.
Encryption keys are derived from passwords using PBKDF2-SHA1, a cryptographic algorithm that TMS Cryptography Pack did not previously provide because it relies on the deprecated SHA1 hash. We added this PBKDF2 variant to comply with ZIP encryption requirements. We defined the number of iterations for PBKDF2 in a constant:
NumberOfIterations = 1000;
To support all crypto services we need:
FPassword: string; // the password provided by the user
FKey: TBytes; // the key derived with PBKDF2 and stored in bytes 1 to 32 of the PBKDF2 issued sequence
FSalt: TBytes; // the random salt used by PBKDF2, together with the password
FAuthSalt: TBytes; // the second salt, computed with PBKDF2 and stored in bytes 33 to 64 of the PBKDF2 issued sequence
FVerifCode: TBytes; // the verification code, stored in bytes 65 and 66 of the PBKDF2 issued sequence
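The way the 66-byte PBKDF2 output is divided among these fields can be sketched as follows. The procedure SplitDerived is a hypothetical name, and the input here is a dummy sequence rather than a real PBKDF2 result. Note that the comments above count bytes from 1, while Copy on a dynamic array counts from 0.

```pascal
program SplitDerivedDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: splits the PBKDF2 output for a 256-bit key into
// the AES key, the second salt and the 2-byte verification code.
procedure SplitDerived(const Derived: TBytes;
  out Key, AuthSalt, VerifCode: TBytes);
begin
  if Length(Derived) <> 66 then
    raise EArgumentException.Create('Expected 66 derived bytes');
  Key := Copy(Derived, 0, 32);       // bytes 1 to 32
  AuthSalt := Copy(Derived, 32, 32); // bytes 33 to 64
  VerifCode := Copy(Derived, 64, 2); // bytes 65 and 66
end;

var
  Derived, Key, AuthSalt, VerifCode: TBytes;
  I: Integer;
begin
  // Dummy input: byte I holds the value I, so the split is easy to check.
  SetLength(Derived, 66);
  for I := 0 to 65 do
    Derived[I] := I;
  SplitDerived(Derived, Key, AuthSalt, VerifCode);
  Writeln(Format('%d %d %d', [Length(Key), Length(AuthSalt), Length(VerifCode)]));
  Writeln(Format('%d %d %d', [Key[0], AuthSalt[0], VerifCode[0]]));
end.
```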
Because we need to both encrypt and decrypt ZIP archives, the Init procedure of TZCClass has 3 parameters:
procedure Init(const APassword: string; var Data: TBytes; AEncrypt: Boolean);
APassword: always required
Data: collects bytes generated when decrypting (needs to be allocated then), not used to encrypt
AEncrypt: set to true when encrypting, to false otherwise
The other crypto-specific methods are:
procedure GetSeedAndCode(var Seed: TBytes; var Code: TBytes); // used to get parameters when encrypting, Seed is salt2 to generate the Authentication Code
function ComputeAuthenticationCode(CryptoGram: TBytes): TBytes; // generates the Authentication Code for both encryption and decryption (used to verify integrity in decryption mode)
Then, standard encryption/decryption methods are provided by the class.
Typical use is the following:
// encryption example
procedure TMainForm.EncryptBtnClick(Sender: TObject);
var
  ZC: TZCClass;
begin
  if not OpenDialog.Execute then // get a file name from the open dialog box
    Exit;
  ZC := TZCClass.Create('.\encryptedTest.zip', PwdEdit.Text); // PwdEdit.Text contains the user password
  try
    ZC.EncryptionType := etae2;
    ZC.FileList.Add(OpenDialog.FileName); // in this example, one file is added to the class file list, before being compressed and encrypted
    // Create a new ZIP file
    ZC.EncryptZipArchive;
  finally
    ZC.Free;
  end;
end;
// decryption example
procedure TMainForm.DecryptBtnClick(Sender: TObject);
var
  ZC: TZCClass;
begin
  if not OpenDialog.Execute then
    Exit;
  // create the object before entering try, so that Free is never called on an uninitialized variable
  ZC := TZCClass.Create(OpenDialog.FileName, PwdEdit.Text);
  try
    ZC.DecryptZipArchive; // done! files decrypted, extracted, created
    // let's play a bit
    case ZC.EncryptionType of
      etnone: MainMemo.Lines.Add('ZIP file is in the clear');
      etae1: MainMemo.Lines.Add('ZIP file is encrypted in AE-1 mode');
      etae2: MainMemo.Lines.Add('ZIP file is encrypted in AE-2 mode');
    end;
  finally
    ZC.Free;
  end;
end;
Conclusion
There are many compressed file formats in use, such as ZIP, GZIP, RAR, etc. We presented a new class that can encrypt and decrypt ZIP archives using the AE-1 and AE-2 formats.
The class has been successfully tested with 7-Zip for interoperability [7]. It is available to registered users with the latest version 5.x of the TMS Cryptography Pack.
References
[1]
[2] RFC 1950: ZLIB Compressed Data Format Specification version 3.3
[3]
[4] (in French)
[5]
[6]
[7]
[8]