Wednesday, July 16, 2008

The way Linux pages data

I've been spending some more time looking at why the bad sectors on the NIST tests (see last post) were in the middle of a read run. In their conclusions they state that:-

"Up to seven accessible sectors adjacent to a faulty sector may be missed when imaged with dd based tools in the Linux environment directly from the ATA interface."

This doesn't seem to make any sense if we are saying that some sectors are skipped when a bad sector is encountered. Surely it would always be the first sector with later sectors skipped? This explanation seems to go some of the way in finding a solution.

When dd requests a block, the mapping layer figures the position of the data on the disk via its logical block number. The kernel issues the read operation and the generic block layer kicks off the I/O operation to copy the data. Each transfer of data involves not just the block in question but also blocks that are adjacent to the required block. Hence a 4096byte 'page' transfered from the device to the block buffering layer in the kernel (often a page segment in RAM) will contain the bad block and adjacent 'good' blocks.

If you have a 4096byte page with a single 512byte bad block you will have, wait for it, 7 good 512byte blocks in that page. This fits with the observations of NIST that 7 sectors may be missed, obviously something bad is happening to the entire 4096byte page.

They then go on to conclude that:-

"For imaging with dd over the firewire interface, the length of runs of missed sectors associated with a single, isolated faulty sector was a multiple of eight sectors."

This makes perfect sense, as the kernel pages the data in 4096byte blocks including 7 good and 1 bad sectors, any 'loss' of data by the block buffering layer would be in 'whole pages' or 8 sector multiples. Am I making any sense?

Hence, I'm reasoning that when dd hits a bad block, something is happening to the block buffering layer to either overwrite, clear or otherwise remove some or all of the buffered pages. The speed of the differences in moving blocks to and from different media such as ATA rather than firewire may help to explain the different numbers of lost pages. e.g. there is physically more or less data in the buffer when it gets deleted/wiped/overwritten etc.

I now need to look at why the buffer is possibly being affected. Any comments are welcomed!