NVMe physical block size

From Thomas-Krenn-Wiki

SSDs store data in memory cells that are grouped into pages, and multiple pages form a block. However, SSDs do not report a physical_block_size. Since kernel version 5.3, the Linux kernel calculates the physical_block_size of NVMe SSDs from several parameters.

Calculation of physical_block_size in the Linux kernel

Since version 5.3, the Linux kernel calculates the physical_block_size of NVMe SSDs as follows:[1]

  1. To start with, the following applies:
    phys_bs = logical_block_size
    The logical block size is 512 bytes or 4,096 bytes, depending on the block size of the SSD's default namespace or on the block size with which the namespace has been formatted.
  2. If an NPWG (Namespace Preferred Write Granularity) is defined for the NVMe namespace, phys_bs is increased to the corresponding multiple:
    phys_bs = (1 + NPWG) * logical_block_size
  3. In addition, atomic_bs is determined (the atomic block size, i.e. the size that is still written atomically if a power failure occurs).
  4. Linux file systems assume that writing a single physical block is an atomic operation. Therefore, the smaller of the two values (phys_bs, atomic_bs) is used as the physical_block_size:
    physical_block_size = min(phys_bs, atomic_bs)
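
The four steps can be sketched in Python as follows. This is a minimal illustration, not the kernel's code: the function name and its parameters are assumptions made for this example, and atomic_bs is passed in as an already determined value.

```python
from typing import Optional


def physical_block_size(logical_block_size: int,
                        npwg: Optional[int],
                        atomic_bs: int) -> int:
    """Sketch of steps 1-4; the names are illustrative, not kernel identifiers."""
    # Step 1: start with the logical block size.
    phys_bs = logical_block_size
    # Step 2: NPWG is a 0-based count of logical blocks, hence the "1 +".
    # None stands for "NPWG is not defined for this namespace".
    if npwg is not None:
        phys_bs = (1 + npwg) * logical_block_size
    # Steps 3 and 4: never report more than can be written atomically.
    return min(phys_bs, atomic_bs)


print(physical_block_size(512, 7, 262144))     # 4096
print(physical_block_size(512, None, 262144))  # 512
print(physical_block_size(4096, 15, 4096))     # 4096
```

The last call shows the limiting effect of step 4: a preferred write granularity of 16 blocks is capped by an atomic block size of a single block.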

Calculation in detail

The following information explains the calculation in detail.

atomic_bs

Definitions:[2]

  • Namespace Features (NSFEAT):
    • Bit 1 (NSABP), if set to '1', indicates that the fields NAWUN, NAWUPF and NACWU are defined for this namespace and should be used by the host instead of the fields AWUN, AWUPF and ACWU of the "Identify Controller" data structure. If this bit is cleared to '0', the controller does not support the fields NAWUN, NAWUPF and NACWU for this namespace. In this case, the host should use the fields AWUN, AWUPF and ACWU.
  • Namespace Atomic Boundary Offset (NABO)
    This field specifies the LBA in this namespace at which the first atomic boundary starts. Typically, NABO refers to LBA 0.
  • Namespace Atomic Write Unit Power Fail (NAWUPF) (≥ AWUPF)
    This field specifies the namespace-specific size of a write operation that is guaranteed to be written atomically to the NVM during a power failure or error condition. If the NSABP bit is set to '0', this field is reserved.
    A value of 0h means that the size for this namespace is the same as that specified in the AWUPF field of the Identify Controller data structure. All other values specify the size in logical blocks, using the same encoding as the AWUPF field.
  • Atomic Write Unit Power Fail (AWUPF)
    This field specifies the size of a write operation that is guaranteed to be written atomically to the NVM across all namespaces with any supported namespace format during a power failure or error condition.
    If a specific namespace guarantees a larger size than specified in this field, this namespace-specific size is specified in the NAWUPF field of the Identify Namespace data structure.
    This field is specified in logical blocks and is a 0-based value. The AWUPF value must be less than or equal to the AWUN value.
    When a write command with a size less than or equal to the AWUPF value is submitted, the host is guaranteed that the write operation to the NVM is atomic with respect to other read or write commands. When a write command larger than this size is submitted, there is no guarantee of atomicity for the command. If the write size is less than or equal to the AWUPF value and the write command fails, subsequent read commands for the corresponding logical blocks return the data from the previous successful write command. If a write command with a size greater than the AWUPF value fails, there is no guarantee as to which data subsequent read commands return for the associated logical blocks.
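
Because AWUPF is a 0-based value, a reported value of 63 stands for 64 logical blocks. The following Python sketch illustrates this encoding. The function names are assumptions made for this example, and the check deliberately ignores atomic boundaries (NABO) and the namespace-specific NAWUPF field.

```python
def max_atomic_write_blocks(awupf: int) -> int:
    """Convert the 0-based AWUPF value into a number of logical blocks."""
    return awupf + 1


def is_power_fail_atomic(num_blocks: int, awupf: int) -> bool:
    """True if a write of num_blocks logical blocks fits the AWUPF guarantee."""
    return 1 <= num_blocks <= max_atomic_write_blocks(awupf)


print(max_atomic_write_blocks(63))   # 64
print(is_power_fail_atomic(64, 63))  # True
print(is_power_fail_atomic(65, 63))  # False
```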

Calculation:

  1. atomic_bs = bs by default
  2. If NABO has the value 0:
    1. If the NSABP bit is set and NAWUPF is non-zero, the following applies: atomic_bs = (1 + NAWUPF) * bs
    2. Otherwise, the following applies: atomic_bs = (1 + AWUPF) * bs
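
The atomic_bs calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the kernel source: the function and constant names are made up for this example, and a NAWUPF value of 0h is treated as "use AWUPF instead", in line with the NAWUPF definition above.

```python
NSABP_BIT = 1 << 1  # NSFEAT bit 1


def atomic_block_size(bs: int, nsfeat: int, nabo: int,
                      nawupf: int, awupf: int) -> int:
    """Sketch of the atomic_bs rules; bs is the logical block size in bytes."""
    # With a non-zero atomic boundary offset, the default (bs) is kept.
    if nabo != 0:
        return bs
    # Prefer the namespace-specific value if it is defined and non-zero.
    if (nsfeat & NSABP_BIT) and nawupf != 0:
        return (1 + nawupf) * bs
    # Otherwise fall back to the controller-wide AWUPF value.
    return (1 + awupf) * bs


print(atomic_block_size(512, 0x12, 0, 511, 63))  # 262144
print(atomic_block_size(512, 0x10, 0, 511, 63))  # 32768
print(atomic_block_size(512, 0x12, 8, 511, 63))  # 512
```

In the second call the NSABP bit is not set, so the controller-wide AWUPF value of 63 is used. In the third call NABO is non-zero, so atomic_bs remains the logical block size.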

phys_bs

Definitions:[2]

  • Namespace Features (NSFEAT):
    Bit 4 (OPTPERF), if set to '1', indicates that the fields NPWG, NPWA, NPDG, NPDA and NOWS are defined for this namespace and should be used by the host for I/O optimization. If the value is '0', the NVMe SSD does not support the NPWG, NPWA, NPDG, NPDA and NOWS fields for this namespace.
  • Namespace Preferred Write Granularity (NPWG)
    This field specifies the smallest recommended write granularity in logical blocks for this namespace. It is a 0-based value. If the OPTPERF bit is set to "0", this field is reserved. The specified size should be less than or equal to the maximum data transfer size (MDTS), which is specified in units of the minimum memory page size. The value of this field may change when the namespace is reformatted. The size should be a multiple of Namespace Preferred Write Alignment (NPWA).

Calculation:

  1. phys_bs = bs by default
  2. If OPTPERF has the value 1, the following applies: phys_bs = (1 + NPWG) * bs
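
A minimal Python sketch of the phys_bs rule, assuming that NSFEAT is available as an integer bit field. The function and constant names are illustrative only.

```python
OPTPERF_BIT = 1 << 4  # NSFEAT bit 4


def preferred_write_block_size(bs: int, nsfeat: int, npwg: int) -> int:
    """Sketch of the phys_bs rule; bs is the logical block size in bytes."""
    # NPWG is only valid if OPTPERF is set; it is a 0-based block count.
    if nsfeat & OPTPERF_BIT:
        return (1 + npwg) * bs
    return bs


print(preferred_write_block_size(512, 0x10, 7))   # 4096
print(preferred_write_block_size(512, 0x00, 7))   # 512
print(preferred_write_block_size(4096, 0x10, 0))  # 4096
```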

Example of an NVMe SSD

The following example shows a Micron 7450 Pro M.2 SSD with a logical block size of 512 bytes and E2MU200 firmware (older firmware versions reported larger values for NPWG):[3]

root@ubuntu2204:~# nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          22503FB3D7CE         Micron_7450_MTFDKBA960TFR                1           0,00   B / 960,20  GB    512   B +  0 B   E2MU200
root@ubuntu2204:~# cat /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/nvme/nvme0/nvme0n1/queue/physical_block_size
4096
root@ubuntu2204:~# cat /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/nvme/nvme0/nvme0n1/queue/logical_block_size
512
root@ubuntu2204:~# nvme id-ns -H /dev/nvme0n1 |grep -i NAWUPF
  [1:1] : 0x1   Namespace uses NAWUN, NAWUPF, and NACWU
nawupf  : 511
root@ubuntu2204:~# nvme id-ctrl -H /dev/nvme0 |grep -i AWUPF
awupf     : 63
root@ubuntu2204:~# nvme id-ns -H /dev/nvme0n1 |grep -i NPWG
  [4:4] : 0x1   NPWG, NPWA, NPDG, NPDA, and NOWS are Supported
npwg    : 7
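
The reported value of 4096 can be reproduced from the fields shown above. The following Python sketch parses the relevant output lines and applies the calculation. The helper name parse_fields is made up for this example, and NABO is assumed to be 0 because it is not part of the output shown.

```python
import re

# Lines copied from the command output above.
ID_NS_OUTPUT = """\
  [1:1] : 0x1   Namespace uses NAWUN, NAWUPF, and NACWU
nawupf  : 511
  [4:4] : 0x1   NPWG, NPWA, NPDG, NPDA, and NOWS are Supported
npwg    : 7
"""
ID_CTRL_OUTPUT = "awupf     : 63\n"


def parse_fields(text: str) -> dict:
    """Collect 'name : integer' lines; the indented [bit:bit] lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+)\s*:\s*(\d+)\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


ns = parse_fields(ID_NS_OUTPUT)
ctrl = parse_fields(ID_CTRL_OUTPUT)
bs = 512        # logical_block_size as reported in sysfs
nabo = 0        # ASSUMPTION: NABO is not shown in the output above
nsabp = True    # NSFEAT bit [1:1] is 0x1
optperf = True  # NSFEAT bit [4:4] is 0x1

if nabo == 0:
    if nsabp and ns["nawupf"] != 0:
        atomic_bs = (1 + ns["nawupf"]) * bs
    else:
        atomic_bs = (1 + ctrl["awupf"]) * bs
else:
    atomic_bs = bs
phys_bs = (1 + ns["npwg"]) * bs if optperf else bs
result = min(phys_bs, atomic_bs)

print(atomic_bs, phys_bs, result)  # 262144 4096 4096
```

Here phys_bs (8 * 512 bytes) is the smaller value and therefore determines the physical_block_size. Purely as an illustration: with a hypothetical NPWG of 511 (a value not taken from this article), phys_bs would also be 262144, and the physical_block_size would be 256 KiB.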

The following example shows a Micron 7450 Max U.2 SSD with a logical block size of 4,096 bytes and E2MU200 firmware:

root@ubuntu2204:~# nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
[...]
/dev/nvme1n1          22323B6AC5C5         Micron_7450_MTFDKCC3T2TFS                1         977,32  MB /   3,20  TB      4 KiB +  0 B   E2MU200
root@ubuntu2204:~# cat /sys/devices/pci0000:5d/0000:5d:01.0/0000:5e:00.0/nvme/nvme1/nvme1n1/queue/physical_block_size
4096
root@ubuntu2204:~# cat /sys/devices/pci0000:5d/0000:5d:01.0/0000:5e:00.0/nvme/nvme1/nvme1n1/queue/logical_block_size
4096
root@ubuntu2204:~# nvme id-ns -H /dev/nvme1n1 |grep -i NAWUPF
  [1:1] : 0x1   Namespace uses NAWUN, NAWUPF, and NACWU
nawupf  : 63
root@ubuntu2204:~# nvme id-ctrl -H /dev/nvme1 |grep -i AWUPF
awupf     : 63
root@ubuntu2204:~# nvme id-ns -H /dev/nvme1n1 |grep -i NPWG
  [4:4] : 0x1   NPWG, NPWA, NPDG, NPDA, and NOWS are Supported
npwg    : 0
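
For this drive, the arithmetic can be checked with a few lines of Python. The values are taken from the output above; NABO is again assumed to be 0, as it is not shown.

```python
bs = 4096     # logical_block_size as reported in sysfs
nawupf = 63   # NSABP bit is set and NAWUPF is non-zero, so NAWUPF is used
npwg = 0      # OPTPERF bit is set, so NPWG is used
nabo = 0      # ASSUMPTION: NABO is not shown in the output above

atomic_bs = (1 + nawupf) * bs if nabo == 0 else bs  # 64 * 4096
phys_bs = (1 + npwg) * bs                           # 1 * 4096
physical_block_size = min(phys_bs, atomic_bs)

print(atomic_bs, phys_bs, physical_block_size)  # 262144 4096 4096
```

With NPWG = 0, the preferred write granularity is a single logical block, so the physical_block_size equals the logical block size.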

More information

References

  1. nvme: set physical block size and optimal I/O size (git.kernel.org, Bart Van Assche, 28.06.2019)
  2. NVM Command Set Specification, Revision 1.0c (nvmexpress.org, 03.10.2022)
  3. ashift=18 needed for NVMe with physical block size 256k (github.com/openzfs/zfs/issues)


Author: Werner Fischer



Translator: Alina Ranzinger


