Breaking News

ExT3 File System (LINUX)




The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions, including Debian. Stephen Tweedie first revealed that he was working on extending ext2 in Journaling the Linux ext2fs Filesystem in a 1998 paper and later in a February 1999 kernel mailing list posting, and the filesystem was merged with the mainline Linux kernel in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4....


Advantages

Although its performance (speed) is less attractive than competing Linux filesystems such as ext4, JFS, ReiserFS and XFS, it has a significant advantage in that it allows in-place upgrades from the ext2 file system without having to back up and restore data. Benchmarks suggest that ext3 also uses less CPU power than ReiserFS and XFS. It is also considered safer than the other Linux file systems due to its relative simplicity and wider testing base.
The ext3 file system adds, over its predecessor:
  • A Journaling file system.
  • Online file system growth.
  • Htree indexing for larger directories. An HTree is a specialized version of a B-tree (not to be confused with the H tree fractal).
Without these, any ext3 file system is also a valid ext2 file system. This has allowed well-tested and mature file system maintenance utilities for maintaining and repairing ext2 file systems to also be used with ext3 without major changes. The ext2 and ext3 file systems share the same standard set of utilities, e2fsprogs, which includes an fsck tool. The close relationship also makes conversion between the two file systems (both forward to ext3 and backward to ext2) straightforward.
While in some contexts the lack of "modern" filesystem features such as dynamic inode allocation and extents could be considered a disadvantage, in terms of recoverability this gives ext3 a significant advantage over file systems with those features. The file system metadata is all in fixed, well-known locations, and there is some redundancy inherent in the data structures that may allow ext2 and ext3 to be recoverable in the face of significant data corruption, where tree-based file systems may not be recoverable.


SiZE LIMITS  :
ext3 has a maximum size for both individual files and the entire filesystem. For the filesystem as a whole that limit is  blocks. Both limits are dependent on the block size of the filesystem; the following chart summarizes the limits


Block sizeMaximum
file size
Maximum
file system size
1 KiB16 GiB2 TiB
2 KiB256 GiB8 TiB
4 KiB2 TiB16 TiB
8 KiB[limits 1]2 TiB32 TiB    



Disadvantages

Functionality

Since ext3 aims to be backwards compatible with the earlier ext2, many of the on-disk structures are similar to those of ext2. Because of that, ext3 lacks a number of features of more recent designs, such as extents, dynamic allocation of inodes, and block suballocation There is a limit of 31998 sub-directories per one directory, stemming from its limit of 32000 links per inode.
ext3, like most current Linux filesystems, cannot be fsck-ed while the filesystem is mounted for writing. Attempting to check a file system that is already mounted may detect bogus errors where changed data has not reached the disk yet, and corrupt the file system in an attempt to "fix" these errors.

Defragmentation

There is no online ext3 defragmentation tool that works on the filesystem level. An offline ext2 defragmenter, e2defrag, exists but requires that the ext3 filesystem be converted back to ext2 first. But depending on the feature bits turned on in the filesystem, e2defrag may destroy data; it does not know how to treat many of the newer ext3 features.
There are userspace defragmentation tools like Shake and defrag Shake works by allocating space for the whole file as one operation, which will generally cause the allocator to find contiguous disk space. It also tries to write files used at the same time next to each other. Defrag works by copying each file over itself. However they only work if the filesystem is reasonably empty. A true defragmentation tool does not exist for ext3.
That being said, as the Linux System Administrator Guide states, "Modern Linux filesystem(s) keep fragmentation at a minimum by keeping all blocks in a file close together, even if they can't be stored in consecutive sectors. Some filesystems, like ext3, effectively allocate the free block that is nearest to other blocks in a file. Therefore it is not necessary to worry about fragmentation in a Linux system."
While ext3 is more resistant to file fragmentation than the FAT filesystem, nonetheless ext3 filesystems can get fragmented over time or on specific usage patterns, like slowly-writing large files. Consequently the successor to the ext3 filesystem, ext4, is planned to eventually include an online filesystem defragmentation utility, and currently supports extents (contiguous file regions).

Recovery

There is no support of deleted file recovery in the file system design. The ext3 driver actively deletes files by wiping file inodes for crash safety reasons. This is why an accidental 'rm -rf ...' command may cause permanent data loss.
There are still several technique and some free and commercialsoftware for recovery of deleted or lost files using file system journal analysis; however, they do not guarantee any specific file recovery.

Compression

Support for transparent compression is available as an unofficial patch for ext3. This patch is a direct port of e2compr and still needs further development, it compiles and boots well with upstream kernels but journaling is not implemented yet. The current patch is named e3compr.

Lack of snapshots support

Unlike a number of modern file systems, Ext3 does not have native support for snapshots - the ability to quickly capture the state of the filesystem at arbitrary times, instead relying on less space-efficient volume level snapshots provided by the Linux LVM. The Next3 file system is a modified version of Ext3 which offers snapshots support, yet retains compatibility to the EXT3 on-disk format.

No checksumming in journal

Ext3 does not do checksumming when writing to the journal. If barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware is doing out-of-order write caching, one runs the risk of severe filesystem corruption during a crash.
Consider the following scenario: If hard disk writes are done out-of-order (due to modern hard disks caching writes in order to amortize write speeds), it is likely that one will write a commit block of a transaction before the other relevant blocks are written. If a power failure or unrecoverable crash should occur before the other blocks get written, the system will have to be rebooted. Upon reboot, the file system will replay the log as normal, and replay the "winners" (transactions with a commit block, including the invalid transaction above which happened to be tagged with a valid commit block). The unfinished disk write above will thus proceed, but using corrupt journal data. The file system will thus mistakenly overwrite normal data with corrupt data while replaying the journal. There is a test program available to trigger the problematic behavior. If checksums had been used, where the blocks of the "fake winner" transaction were tagged with a mutual checksum, the file system could have known better and not replayed the corrupt data onto the disk. Journal checksumming has been added to ext4.
Filesystems going through the device mapper interface (including software RAID and LVM implementations) may not support barriers, and will issue a warning if that mount option is used. There are also some disks that do not properly implement the write cache flushing extension necessary for barriers to work, which causes a similar warning. In these situations, where barriers are not supported or practical, reliable write ordering is possible by turning off the disk's write cache and using the data=journal mount option. Turning off the disk's write cache may be required even when barriers are available. Applications like databases expect a call to fsync() will flush pending writes to disk, and the barrier implementation doesn't always clear the drive's write cache in response to that call. There is also a potential issue with the barrier implementation related to error handling during events such as a drive failure It is also known that sometimes some virtualization technologies do not properly forward flush command to the underlaying devices (files, volumes, disk) from guest operating system. Similarly some hard disks or controllers implements cache flushing incorrectly or not at all, but still advertise that it is supported, and do not return any error when it is used. For this reasons it is safer to assume that cache flushing do not work, or test it extensively with more reliable and tested components (like SCSI disks).

No comments