Commit 9b6f9d17 authored by Andreas Dilger

Branch HEAD

Description: data loss for recently-modified files
Details    : In some cases it is possible that recently written or created
	     files may not be written to disk in a timely manner (this should
	     normally be within 30s unless client IO load is very high).
	     After a client crash or client eviction, the problem appears as
	     zero-length files, or as files whose size is a multiple of 1MB
	     and that are missing data at the end of the file.

	     This problem is more likely to be hit when files are repeatedly
	     created and unlinked in the same directory, when clients have a
	     large amount of RAM or many CPUs, when the filesystem has many
	     OSTs, when clients are rebooted frequently, and/or when the files
	     are not accessed by other nodes after being written.

	     The presence of the problem can be detected by looking at
	     /proc/sys/fs/inode-state.  If the first number (nr_inodes) is
	     smaller than the second (nr_unused) then dirty files will not
	     be flushed automatically to disk.  "sync; sleep 10" should be
	     run several times on the node before unmounting it to update
	     Lustre (this is also safe to run on nodes without this problem).
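
The check and workaround described above can be sketched as a small script. This is only an illustration, not part of the patch: the helper names (parse_inode_state, is_affected, flush_before_unmount) are hypothetical, and the default of 3 rounds is an assumed reading of "several times".

```python
import os
import time

# Path named in the description above.
INODE_STATE = "/proc/sys/fs/inode-state"


def parse_inode_state(text):
    """Return (nr_inodes, nr_unused), the first two fields of inode-state."""
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected at least two fields in inode-state")
    return int(fields[0]), int(fields[1])


def is_affected(nr_inodes, nr_unused):
    """Condition from the description: the first number is smaller than the second."""
    return nr_inodes < nr_unused


def flush_before_unmount(rounds=3, delay=10):
    """Rough equivalent of running "sync; sleep 10" several times.

    rounds=3 is an assumption; the description only says "several times".
    """
    for _ in range(rounds):
        os.sync()
        time.sleep(delay)
```

On a client, one would read INODE_STATE, pass its contents to parse_inode_state(), and call flush_before_unmount() before unmounting. As noted above, the flush is safe to run whether or not is_affected() reports the problem.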

	     There is also a related kernel bug in the RHEL4 2.6.9 kernel
	     that can cause this same problem, so customers using that kernel
	     also need to update the kernel in addition to Lustre.  In order
	     to properly fix this bug, the RHEL3 2.4.21 kernel is also updated.

	     It is normal that files written just before a client crash (less
	     than 30s) may not yet have been flushed to disk, even for local
	     filesystems.
i=green(original patch), i=shadow
b=12181, b=12203
parent 6460bf88