Basic storage abstractions on Linux

Some of the first concepts you come into contact with when installing and using a GNU/Linux distribution for the first time are those related to storage. I would have saved a lot of time in my Linux projects if I had studied the fundamentals of storage sooner.

The following is a condensed and simplified summary of five concepts you need in order to understand Linux (and most other operating systems) better. They apply whether you work with a personal computer, bare-metal servers, virtual machines, Docker containers or IoT devices.

Here is a list, from the most concrete to the most abstract:

  1. Drive: Physical storage medium or device
  2. Disk: OS representation of a drive
  3. Partition: Contiguous part of a disk
  4. Volume: Formatted partition
  5. Image: Snapshot of a disk, partition or volume

Let’s elaborate on each one.

Drive

A physical device capable of storing raw digital data. It is the physical substrate that holds the otherwise immaterial data. A more precise name for it would be storage medium or storage device.

The term is also used informally to refer to any of the other categories 🤦‍♂️

By storage technology, there are two main types:

  • HDD: Hard Disk Drive. Stores bits magnetically on one or more spinning platters.
  • SSD: Solid State Drive. Stores bits in flash memory chips (NAND flash, built from floating-gate or charge-trap transistors), with no moving parts.
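
Linux reports which kind of drive it believes it is dealing with. Every whole disk has a sysfs attribute, queue/rotational, that reads 1 for spinning media and 0 for non-rotational media such as SSDs. The following is a minimal sketch that assumes sysfs is mounted at /sys. The function name is_rotational and the sysfs_root parameter are hypothetical, added only to make the function testable. Some USB bridges and virtual disks report this flag incorrectly, so treat it as a hint.

```python
from pathlib import Path

def is_rotational(disk: str, sysfs_root: str = "/sys") -> bool:
    """True for spinning media (HDD), False for non-rotational media such as SSDs.

    Reads /sys/block/<disk>/queue/rotational, which the kernel sets to
    '1' for rotational devices and '0' otherwise.
    """
    flag = Path(sysfs_root, "block", disk, "queue", "rotational").read_text().strip()
    return flag == "1"
```

For example, is_rotational("nvme0n1") should return False on a machine with an NVMe SSD.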

Disk

It is the representation of the Drive within an operating system, like Linux.

The OS assigns a name to each disk.

On Linux, the most common names are:

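  • sdX for SATA, SAS and USB drives (both SSDs and HDDs), where X is a letter starting from a: sda, sdb, sdc, ...
  • nvmeXnY for NVMe SSDs, where X is the controller number (starting from 0) and Y the namespace (starting from 1): nvme0n1, nvme1n1, ...
  • hdX for old IDE/PATA drives. This is a legacy scheme; modern kernels name such drives sdX too.

Example: A computer with two SATA SSDs and one SATA HDD will have disks named sda, sdb and sdc, in the order the kernel detects them.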
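
The sd driver hands out letter suffixes in detection order. After sdz it continues with two letters (sdaa, sdab, ...), which is a bijective base-26 numbering. The function below is a small sketch of that rule. Its name, sd_disk_name, is hypothetical, and its 0-based index argument is simply the position in which the kernel happened to detect the disk. That dependence on probe order is why these names are not stable hardware identifiers.

```python
import string

def sd_disk_name(index: int) -> str:
    """Name given to the index-th (0-based) disk detected by the sd driver.

    0 -> sda, 25 -> sdz, 26 -> sdaa, 27 -> sdab, ...
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    suffix = ""
    n = index + 1  # bijective numeration has no zero digit
    while n > 0:
        n, rem = divmod(n - 1, 26)
        suffix = string.ascii_lowercase[rem] + suffix
    return "sd" + suffix
```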

On Windows, physical disks are numbered from 0 regardless of their type. They appear as Disk 0, Disk 1, ... in Disk Management, and programs see them as \\.\PhysicalDrive0, \\.\PhysicalDrive1, ...

Example: A computer with two SSDs and one HDD will have disks named Disk 0, Disk 1 and Disk 2.

A disk reserves special sectors for the partition table: a data structure that defines how the disk is divided into one or more partitions.

Several partitioning schemes exist. The two most popular are:

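  • MBR: Master Boot Record. Supports only four primary partitions (or three primary partitions plus one extended partition holding logical partitions) and disks of up to 2 TiB with 512-byte sectors.
  • GPT: GUID Partition Table. Supports 128 partitions by default and disks far larger than 2 TiB. It also keeps a backup copy of the table at the end of the disk.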
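
The MBR layout is simple enough to parse by hand, and doing so shows where its limits come from. The first sector holds boot code, then exactly four 16-byte partition entries starting at byte 446, then the signature 0x55 0xAA at byte 510. Each entry stores its start and length as 32-bit sector numbers, so with 512-byte sectors the largest addressable disk is 2^32 × 512 bytes = 2 TiB. The following is a minimal sketch assuming 512-byte sectors; parse_mbr and MbrEntry are hypothetical names. A GPT disk keeps a "protective MBR" in this same sector, with a single entry of type 0xEE.

```python
import struct
from typing import NamedTuple

SECTOR_SIZE = 512  # assumption: classic 512-byte logical sectors

class MbrEntry(NamedTuple):
    bootable: bool
    type_id: int      # e.g. 0x83 = Linux, 0x07 = NTFS/exFAT, 0xEE = GPT protective
    first_lba: int    # first sector of the partition
    num_sectors: int  # partition length in sectors

def parse_mbr(sector0: bytes) -> list[MbrEntry]:
    """Return the used primary partition entries from the first 512 bytes of a disk."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) at offset 510")
    entries = []
    for i in range(4):  # room for exactly four primary entries
        raw = sector0[446 + 16 * i : 446 + 16 * (i + 1)]
        status, type_id = raw[0], raw[4]
        first_lba, num_sectors = struct.unpack_from("<II", raw, 8)  # little-endian uint32s
        if type_id != 0:  # type 0 marks an unused slot
            entries.append(MbrEntry(status == 0x80, type_id, first_lba, num_sectors))
    return entries

# 32-bit sector counts are the origin of the 2 TiB limit.
MAX_MBR_BYTES = (2**32) * SECTOR_SIZE  # 2**41 bytes
```

On Linux you could feed it the real first sector with open("/dev/sda", "rb").read(512), which normally requires root.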

Partition

It is a contiguous chunk of a disk, with a specific starting block, ending block and size. These are set when the disk is partitioned and are recorded in the partition table.

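A partition by itself carries very little metadata: at most a type code and, on GPT, an optional name. It knows nothing about the file system stored inside it. A partition can be resized, but doing so requires rewriting the disk’s partition table. Growing it into adjacent blocks overwrites whatever data is stored there, and the file system inside must be resized separately.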
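
The resizing constraint comes down to simple interval arithmetic: a partition can only grow into blocks that no other partition uses. The following is a minimal sketch with inclusive sector ranges. The Partition class and can_grow are hypothetical names, and the sketch ignores real-world details such as alignment and the backup GPT stored at the end of the disk.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    start: int  # first sector (inclusive)
    end: int    # last sector (inclusive)

    @property
    def size(self) -> int:
        return self.end - self.start + 1

def can_grow(partitions: list[Partition], name: str, new_end: int, disk_sectors: int) -> bool:
    """True if partition `name` can be extended to `new_end` without overlapping another one."""
    target = {p.name: p for p in partitions}[name]
    if new_end < target.end:
        raise ValueError("new_end is smaller than the current end: that is a shrink")
    if new_end >= disk_sectors:
        return False  # would run past the end of the disk
    # Every partition that starts after the target must remain untouched.
    return all(p.start > new_end for p in partitions if p.start > target.end)
```

With 512-byte sectors, Partition("sda1", 2048, 1050623) is 1048576 sectors, that is 512 MiB. It cannot grow past sector 1050623 while sda2 starts at 1050624.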

On Linux, partitions are named after their parent disk. For example, if the sda disk has two partitions, the OS identifies them as sda1 and sda2. NVMe partitions add a p, as in nvme0n1p1. These names are not tied to the hardware device: they may change depending on the order in which the kernel detects the connected devices. For stable references, use the file system UUID or label, for example through the symlinks under /dev/disk/by-uuid/.
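
The naming rule for partitions is mechanical. The kernel appends the partition number to the disk name, inserting a p first when the disk name already ends in a digit (nvme0n1, mmcblk0, loop0). The following is a minimal sketch; partition_name is a hypothetical helper, not a system API.

```python
def partition_name(disk: str, number: int) -> str:
    """Device name of partition `number` (1-based) on `disk`.

    A 'p' separator is inserted when the disk name ends in a digit,
    so the partition number stays readable.
    """
    if number < 1:
        raise ValueError("partition numbers start at 1")
    sep = "p" if disk[-1].isdigit() else ""
    return f"{disk}{sep}{number}"
```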

Volume

It is the abstract, logical object that the computer user interacts with. A volume has a name (label), a file system (ext4, NTFS, FAT32, exFAT, Btrfs, ZFS, etc.) and stores system and user files.
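
Tools can tell which file system a volume holds because most file systems write a recognizable signature ("magic") at a fixed offset. The following sketch checks a handful of them and is only a tiny subset of what tools such as blkid do. The function name detect_fs is hypothetical, and the sketch assumes you pass the first 128 KiB of the partition or volume. The offsets used are as follows:

  • NTFS and exFAT: their name at byte 3 of the boot sector
  • FAT32: its type string at byte 82
  • ext2/3/4: the 16-bit value 0xEF53 at byte 56 of the superblock, which starts at byte 1024
  • Btrfs: the string _BHRfS_M at byte 0x40 of the superblock, which starts at 64 KiB

```python
def detect_fs(head: bytes) -> str:
    """Guess the file system from the first 128 KiB of a partition or volume."""
    if head[3:11] == b"NTFS    ":
        return "ntfs"
    if head[3:11] == b"EXFAT   ":
        return "exfat"
    if head[82:90] == b"FAT32   ":
        return "vfat (FAT32)"
    # ext2/3/4: little-endian 0xEF53 at 1024 (superblock) + 56 (s_magic)
    if head[1080:1082] == b"\x53\xef":
        return "ext2/3/4"
    # Btrfs: superblock at 0x10000, magic string at offset 0x40 inside it
    if head[0x10040:0x10048] == b"_BHRfS_M":
        return "btrfs"
    return "unknown"
```

Reading a real device, for example open("/dev/sda1", "rb").read(128 * 1024), normally requires root.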

A volume can reside in a single partition, but it can also span multiple partitions, as happens with a RAID array or LVM. Either way, the user is unaware of the difference and only sees a single volume in which files can be created, modified or deleted.
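
To see how one volume can sit on several partitions, consider the simplest RAID level, RAID 0 (striping without redundancy). The volume's logical blocks are dealt out in fixed-size chunks to each member in turn. The following is a minimal sketch that assumes equal-sized members and no metadata area, which is simpler than a real Linux md array. The function name raid0_locate is hypothetical.

```python
def raid0_locate(block: int, members: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block of a RAID 0 volume to (member index, block on that member).

    Chunk 0 goes to member 0, chunk 1 to member 1, and so on,
    wrapping around after the last member.
    """
    chunk, offset_in_chunk = divmod(block, chunk_blocks)
    member = chunk % members
    row = chunk // members  # full stripes that precede this chunk
    return member, row * chunk_blocks + offset_in_chunk
```

For example, with two members and 4-block chunks, logical block 9 lands on member 0 at block 5.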

Before the OS can use a volume, it must mount it at a point (a directory) within the local directory tree. At any given time, a volume is therefore either mounted or unmounted on the local system.
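
On Linux, the current mount table is exposed as the text file /proc/mounts. Each line lists the device, the mount point, the file system type, the options and two numeric fields. Spaces and other special characters in paths are written as octal escapes such as \040. The following is a minimal sketch of reading that table to check whether a volume is mounted. The helper names are hypothetical. Python's own os.path.ismount answers the related question "is this directory a mount point?".

```python
import re

def _unescape(field: str) -> str:
    # /proc/mounts encodes space, tab, newline and backslash as octal escapes, e.g. \040
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)

def parse_mounts(text: str) -> dict[str, tuple[str, str]]:
    """Map mount point -> (device, file system type) from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fstype = (_unescape(f) for f in fields[:3])
        mounts[mount_point] = (device, fstype)  # later (stacked) mounts win
    return mounts

def is_mounted(device: str, mounts_text: str) -> bool:
    return any(dev == device for dev, _ in parse_mounts(mounts_text).values())
```

On a live system you would pass it open("/proc/mounts").read().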

💡 On Linux, disks and partitions/volumes are represented in the directory tree as device files under the /dev directory. Their naming conventions tell disks (e.g. /dev/sda) apart from partitions (e.g. /dev/sda1).
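
Name patterns are a convention. The kernel also states the disk/partition relationship directly in sysfs: /sys/block lists only whole disks, and each partition appears as a subdirectory of its disk that contains a file named partition. The following is a minimal sketch built on that layout. The function name and the sysfs_root parameter are hypothetical, added for testability.

```python
from pathlib import Path

def disks_and_partitions(sysfs_root: str = "/sys") -> dict[str, list[str]]:
    """Map each whole disk to the names of its partitions, using sysfs."""
    result = {}
    for disk_dir in sorted(Path(sysfs_root, "block").iterdir()):
        # Only partition subdirectories contain a 'partition' file (holding their number).
        parts = sorted(
            child.name for child in disk_dir.iterdir() if (child / "partition").is_file()
        )
        result[disk_dir.name] = parts
    return result
```

On a typical laptop this might return something like {"nvme0n1": ["nvme0n1p1", "nvme0n1p2"]}. It also lists loop devices if any exist.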

On Linux, the following commands are useful for getting information about disks, partitions/volumes and mount points:

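  • df -h: mounted file systems with their size, used and available space, in human-readable units
  • lsblk -f: block devices as a tree, with file system type, label, UUID and mount point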
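
If you need the df figures from a program, Python's standard shutil.disk_usage returns the total, used and free bytes of the file system containing a given path. The following is a minimal sketch. The helper names human and usage are hypothetical, and human only approximates df -h formatting (powers of 1024, one decimal). As with df, size is usually not exactly used plus avail, because file systems such as ext4 reserve some blocks for root.

```python
import shutil

def human(n: float) -> str:
    """Format a byte count roughly like `df -h` (powers of 1024, one decimal)."""
    units = ["B", "K", "M", "G", "T", "P"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n /= 1024
        i += 1
    return f"{n:.1f}{units[i]}"

def usage(mount_point: str = "/") -> str:
    """One-line size/used/avail summary for the file system holding `mount_point`."""
    total, used, free = shutil.disk_usage(mount_point)
    return f"{mount_point}: size {human(total)}, used {human(used)}, avail {human(free)}"
```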

For other useful commands, such as fdisk, cfdisk, parted, sfdisk or lshw, see https://linuxhandbook.com/linux-list-disks/ and https://devconnected.com/how-to-list-disks-on-linux/

Image

It is a snapshot: an exact, bit-by-bit copy of the contents of a disk, a partition or a volume, saved as an ordinary file within a volume. Images are often used for backups and recoveries. An image of a partition or volume can be mounted into the OS directory tree just like a volume (on Linux, through a loop device). It can also be written back, bit by bit, to a partition or an entire disk, as is done when performing a recovery.
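
Creating an image is conceptually what dd if=/dev/sdb1 of=backup.img does: read the source sequentially and write every byte to a file. The following is a minimal sketch that also records a SHA-256 checksum so the image can be verified later. The names make_image and sha256_of and the 4 MiB chunk size are illustrative assumptions. Reading a device node normally requires root, and the source should be unmounted (or mounted read-only) so the copy is consistent.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # copy in 4 MiB pieces to keep memory use flat

def make_image(source: str, image_path: str) -> str:
    """Copy `source` (a device such as /dev/sdb1, or any file) byte for byte into `image_path`.

    Returns the SHA-256 hex digest of the copied data.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(image_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()

def sha256_of(path: str) -> str:
    """SHA-256 of an existing file, read in chunks, for verifying an image."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()
```

Restoring is the same copy in the opposite direction, from the image file to the target device.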
