Linux "leftovers" and "cold knowledge" - file archiving and compression

Original link: https://www.itylq.com/linux-archive-and-compress.html Release date: 2026-02-21 Migration time: 2026-03-21

P1. File archiving and compression

In Linux, archiving and compression of files and directories are two distinct concepts that are often used together, but they work differently and serve different purposes. This article defines each one, explains what it is for, and clears up some common points of confusion.

1 What is file archiving

Archiving packages multiple files or directories into a single file, usually called a tarball. The archive takes up roughly the same space on disk as its contents (slightly more, because of per-file headers and padding). It normally preserves metadata such as the directory structure, permissions, owner, and timestamps.

The purpose of archiving is to make storage, backup, and transfer easier. Archiving is not concerned with file size: the total size is almost unchanged before and after archiving.

The typical archiving tool on Linux is tar (short for "tape archive").
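
As a rough illustration of the point that archiving alone does not shrink anything, the following sketch archives two files without compression and compares sizes. The scratch directory and the file names (demo, a.bin, b.bin) are made up for this example.

```shell
#!/bin/sh
set -eu

# Scratch directory and file names are hypothetical, chosen for the demo.
workdir=$(mktemp -d)
mkdir "$workdir/demo"

# Two files of 100 KiB each.
dd if=/dev/zero of="$workdir/demo/a.bin" bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$workdir/demo/b.bin" bs=1024 count=100 2>/dev/null

# Archive only: no -z/-j/-J, so nothing is compressed.
tar -cf "$workdir/demo.tar" -C "$workdir" demo

payload_bytes=$((2 * 100 * 1024))
archive_bytes=$(( $(wc -c < "$workdir/demo.tar") ))

# The archive is never smaller than the payload: tar adds headers and padding.
echo "payload: $payload_bytes bytes, archive: $archive_bytes bytes"
```

The exact archive size depends on the tar implementation and its block padding, so no specific number is claimed here.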

2 What is file compression

Compression applies an algorithm to reduce the disk space a file occupies. Different algorithms and tools are conventionally identified by different file suffixes, such as .tar.gz, .tar.bz2, .tar.xz, .tar.zst, .zip, .rar, and .7z.

The purpose of compression is to save storage space and shorten transfer time. A compressor of this kind is not designed to preserve file metadata such as ownership or directory structure.

Note that the compression tools commonly used on Linux can generally compress only a single file stream. This is the underlying reason they are usually combined with the tar archiving tool when multiple files or directories need to be compressed.

Typical compression tools on Linux are gzip, bzip2, and xz. Other common tools include 7zip, rar, and Zstandard (zstd), which may need to be installed manually.
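
The "single file stream" point can be made visible by running the two steps separately: tar writes one archive stream to standard output, and gzip compresses that stream. This is only a sketch; the directory and file names (logs, app.log, app2.log) are invented for the example.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data: two highly repetitive log files.
workdir=$(mktemp -d)
mkdir "$workdir/logs"
i=0
while [ "$i" -lt 2000 ]; do
    echo "the same log line again"
    i=$((i + 1))
done > "$workdir/logs/app.log"
cp "$workdir/logs/app.log" "$workdir/logs/app2.log"

# Step 1 (tar): pack the directory into ONE stream on stdout ("-f -").
# Step 2 (gzip): compress that single stream.
tar -cf - -C "$workdir" logs | gzip > "$workdir/logs.tar.gz"

# Size of the uncompressed archive stream, for comparison.
tar_bytes=$(( $(tar -cf - -C "$workdir" logs | wc -c) ))
gz_bytes=$(( $(wc -c < "$workdir/logs.tar.gz") ))
echo "archive stream: $tar_bytes bytes, after gzip: $gz_bytes bytes"

# Reverse direction: gzip decompresses to stdout, tar lists from stdin.
members=$(gzip -dc "$workdir/logs.tar.gz" | tar -tf -)
echo "$members"
```

The pipeline does by hand roughly what a single tar invocation with -z does in one command.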

3 Detailed explanation of tar command

Most usage scenarios involve batch operations on multiple files/directories, which requires file archiving first and then file compression.

A typical file archive compression processing logic is:

c (create) → z (gzip compression) → v (show details) → f (specify file name)

tar command implementation:

tar -czvf file.tar.gz /path/to/files

The core options of the tar command are:

    -c, --create: create a new archive file;

    -x, --extract: extract (unpack) files from an archive;

    -t, --list: list the contents of an archive;

    -r, --append: append files to the end of an existing archive;

    -u, --update: append only files that are newer than the copy in the archive (similar in effect to replacing them);

    -d, --diff: compare the archive against the file system and report differences.
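
A minimal walk through several of these operations on a throwaway archive might look like the following. It assumes GNU tar (the -d mode in particular may be missing in other implementations), and the file names are invented. The archive is left uncompressed because appending with -r does not work on compressed archives.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area and file names.
workdir=$(mktemp -d)
cd "$workdir"
echo "one" > a.txt
echo "two" > b.txt

tar -cf demo.tar a.txt          # -c: create an archive containing a.txt
tar -rf demo.tar b.txt          # -r: append b.txt to the existing archive
listing=$(tar -tf demo.tar)     # -t: list the members
echo "$listing"

# -d: compare archive members with the files on disk.
# tar exits non-zero when a difference is found.
if tar -df demo.tar > /dev/null 2>&1; then before=same; else before=different; fi

echo "changed content" > a.txt  # modify a file after archiving
if tar -df demo.tar > /dev/null 2>&1; then after=same; else after=different; fi
echo "before edit: $before, after edit: $after"

mkdir out
tar -xf demo.tar -C out         # -x: extract into ./out
```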

Compression algorithm options:

    -z, --gzip: compress with gzip; the result is conventionally named with a .tar.gz or .tgz suffix;

    -j, --bzip2: compress with bzip2; the result is conventionally named with a .tar.bz2 or .tbz2 suffix;

    -J, --xz: compress with xz; the result is conventionally named with a .tar.xz suffix;

    --lzma: compress with LZMA; the result is conventionally named with a .tar.lzma suffix;

    --zstd: compress with Zstandard; the result is conventionally named with a .tar.zst suffix.

Note that tar does not add these suffixes by itself: the archive gets exactly the name passed to -f, so choosing a matching suffix is up to the user.
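
To show that the suffix is a naming convention rather than something the compression option produces, this sketch writes the same gzip-compressed archive under three names and checks each one with gzip -t. The names (data, data.backup, and so on) are hypothetical, and only gzip is used so that no extra compressor needs to be installed.

```shell
#!/bin/sh
set -eu

# Hypothetical sample directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo "sample" > data/notes.txt

# Same option (-z) each time; only the name given to -f differs.
tar -czf data.tar.gz data
tar -czf data.tgz data
tar -czf data.backup data       # unconventional name, still gzip data

# gzip -t tests stream integrity; reading from stdin avoids suffix checks.
for f in data.tar.gz data.tgz data.backup; do
    gzip -t < "$f" && echo "$f: valid gzip stream"
done

# The oddly named file lists like any other gzip-compressed tarball.
names=$(tar -tzf data.backup)
echo "$names"
```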

Other commonly used auxiliary options:

    -v, --verbose: verbose output, listing each file as it is processed;

    -f, --file: specify the archive file name (the option must be followed immediately by the name);

    -C, --directory: change to the given directory before operating, for example to choose where files are extracted;

    -p, --preserve-permissions: when extracting, restore the permissions recorded in the archive;

    -P, --absolute-names: keep the leading "/" of absolute paths in member names instead of stripping it (use with caution);

    --exclude: exclude the specified files or directories.
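
The sketch below combines -C and --exclude: it archives a project directory while skipping a cache subdirectory, then extracts into a separate location. The layout (project/src, project/cache, restore) is invented for the example, and --exclude is placed before the directory it applies to.

```shell
#!/bin/sh
set -eu

# Hypothetical project layout.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/src" "$workdir/project/cache" "$workdir/restore"
echo "code" > "$workdir/project/src/main.c"
echo "tmp"  > "$workdir/project/cache/blob.tmp"

# -C: switch into $workdir first, so member names are stored as "project/..."
#     rather than as absolute paths.
# --exclude: leave out the cache directory and everything below it.
tar -czf "$workdir/project.tar.gz" -C "$workdir" --exclude='project/cache' project

# -C on extraction selects the target directory.
tar -xzf "$workdir/project.tar.gz" -C "$workdir/restore"

members=$(tar -tzf "$workdir/project.tar.gz")
echo "$members"
```

Storing relative names with -C is usually a safer default than -P, since the archive can then be unpacked anywhere.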

4 Combined use of archive compression

  1. Archive and compress all files in the specified directory:
    tar -czvf file.tar.gz /path/to/files #czvf=create+gzip+verbose+file
  2. Decompress and unpack the compressed archive:
    tar -xzvf file.tar.gz [-C /target/to/unzips] #xzvf=extract+gzip+verbose+file; -C specifies the directory to extract into
  3. View the list of files in the archive:
    tar -tf file.tar.gz #tf=list+file; only lists the contents, without extracting

5 Extensions

Here is an interesting detail you may have noticed:

Logically, unpacking a compressed tarball means decompressing first (-z) and then extracting (-x). So why is the conventional spelling tar -xzvf rather than tar -zxvf?

The answer touches on two ideas from the Unix philosophy. The first is single responsibility: tar focuses on archiving (packing, unpacking, and listing archive contents), while gzip focuses on compression and decompression. The second is interface consistency: Unix convention favors "the tool as the subject, and its core function as the first parameter". Putting tar's core operation first lets a reader see at a glance what the command is meant to do.

By the same principle, if gzip also supported archiving (packing and unpacking), the recommended option order for gzip would put its own core function first. With the tool as the subject and the core function as the first parameter, that would read:

gzip - decompress - tar unpack - show details - file file.tar.gz

The recommended option order for tar is:

tar - unpack - gzip decompression - show details - file file.tar.gz
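
The ordering discussed here is a readability convention, not something tar enforces. As a small check, the sketch below extracts the same archive with both spellings and then shows the one position that does matter: f has to come last in the cluster, because it takes the next word as the archive name. File and directory names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical sample archive.
workdir=$(mktemp -d)
cd "$workdir"
mkdir src out_xz out_zx
echo "hello" > src/hello.txt
tar -czf file.tar.gz src

# Both spellings are accepted and produce the same result.
tar -xzvf file.tar.gz -C out_xz
tar -zxvf file.tar.gz -C out_zx

# Here "v" is read as the argument of -f, so tar looks for an archive
# named "v" and fails.
if tar -xzfv file.tar.gz 2>/dev/null; then
    f_misplaced=accepted
else
    f_misplaced=rejected
fi
echo "f in the middle of the cluster: $f_misplaced"
```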


This article was moved from WordPress to MkDocs