How to write random data to the hard drive with dd

Note: the script posted here is a bit buggy. There is an improved version on a newer post.

Since I bought my netbook (an Acer Aspire One 532h-2514), I've been considering encrypting all my data. After some searching, I found that a non-encrypted /boot in a small partition, plus a dm-crypt/LUKS encrypted partition, is enough for me. Configuring this was quite tricky ^^

I will document the steps here, starting with the most time-consuming one: writing random data to the device.

All the documentation I found states that it's very useful to write random data over the whole disk before doing anything. This is because the disk usually comes filled with zeroes (you can check with hexdump /dev/devicename), so otherwise it's straightforward to see how much encrypted data the disk actually holds. (Some of them: DM-Crypt with LUKS (gentoo wiki), System Encryption.. (arch wiki), a howto, a stackoverflow thread, some other howto..)
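
You can check this yourself before writing anything; looking at the first megabyte is enough (a sketch, with /dev/sdX as a placeholder for your device):

# On a zero-filled disk, hexdump collapses the repeated all-zero lines
# into a single line followed by a '*'.
hexdump -n 1048576 /dev/sdX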

They suggest either badblocks with the option for "writing random data", or dd from /dev/urandom. Badblocks is a tool for searching for bad blocks on the HD; it can do a destructive write test that overwrites the disk with a pattern, and if this pattern comes from a pseudorandom source it will write almost random data to the disk. Except that its pseudorandom number generator isn't reseeded, so the data written will be periodic.
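
For reference, the badblocks variant they suggest looks something like this (a sketch, not my command history; /dev/sdX is a placeholder, and -w destroys everything on the device):

# -w: destructive write test; -t random: use the (non-reseeded) random pattern;
# -s: show progress
badblocks -w -t random -s /dev/sdX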

dd from /dev/urandom is safer, but it's also very slow. Here, with my Atom N450 at 1.6GHz, it writes at 1.6MB/s at most. On my 250GB hard drive, that's more than 40 hours! The man page of urandom says:

The kernel random-number generator is designed to produce a small amount of high-quality seed material to seed a cryptographic pseudo-random number generator (CPRNG). It is designed for security, not speed, and is poorly suited to generating large amounts of random data.
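
The slow approach itself is just this (a sketch; /dev/sdX is a placeholder, and status=progress requires a reasonably recent GNU dd):

# Overwrite the whole device with kernel-generated random data.
# Here /dev/urandom, not the disk, is the bottleneck.
dd if=/dev/urandom of=/dev/sdX bs=4M status=progress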

On this stack overflow thread, someone suggests that writing an encrypted stream using openssl could work. But openssl also has a pseudo-random number generator.. so I think it might be enough to generate chunks of data (say, 4MB) with openssl and write them with dd. Between each chunk, the PRNG would be reseeded from /dev/urandom. Sounds ok. I'm not sure how secure it is, though.. but it improved the performance to about 5MB/s (that is, around 15h).

Here is a script that does just this:

#!/bin/bash

blocksize="$((4*1024*1024))" # 4 MiB

if [[ -z "$1" ]]; then
  echo "usage: $0 <device>"
  exit 1
fi
target="$1"

# Resume support: the index of the last written block is kept in ./step.
if [[ -f step ]]; then
  initial="$(< step)"
else
  initial=0
fi

for ((i = initial; 1; i++)); do
  # Generate one block of pseudorandom data, reseeding from /dev/urandom,
  # and write it at block offset i.
  openssl rand -rand /dev/urandom "$blocksize" |
    dd of="$target" bs="$blocksize" seek="$i"
  echo "$i" > step.new
  mv step.new step
done

In order to use it, I create a directory, put the script there, and execute it from that directory; a file named step is created alongside it to record the progress. If you stop it before it finishes and run it again from the same directory, it will resume a little before the last write (unless there is a bug..)

Unfortunately this script is poorly written, and dd will just fail to write once the disk is completely randomized. I suppose the right thing would be to stop the loop the first time dd fails (see the sketch after the hexdump output below). Meanwhile, I'm checking the progress with hexdump. hexdump -s 20000m /dev/sdb prints

4e2000000 0000 0000 0000 0000 0000 0000 0000 0000
*

(The * means the following blocks are identical.)

but hexdump -s 19000m /dev/sdb prints some random data, which means dd is somewhere between 19000MB and 20000MB (you can see exactly where with echo $(($(< step) * 4)), which gives the position in megabytes).
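
The stop-on-failure fix I mentioned would be a one-line change in the loop (an untested sketch, assuming dd exits non-zero once it can no longer write, e.g. past the end of the device):

# The pipeline's exit status is dd's, so break as soon as a write fails.
openssl rand -rand /dev/urandom "$blocksize" |
  dd of="$target" bs="$blocksize" seek="$i" || break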

I will maybe document the exact steps on how to create a LUKS partition, then an LVM volume inside it, then Debian installed on a logical volume (that's what I did). But if you are looking for that, you can follow the links I gave above (most of them are tutorials that describe just that), or return to google 🙂
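
In outline, though, the layering goes something like this (a hedged sketch of the general technique, not my exact commands; device names, volume names and sizes are placeholders):

# Encrypt the big partition and open it as /dev/mapper/cryptroot
cryptsetup luksFormat /dev/sda2
cryptsetup luksOpen /dev/sda2 cryptroot

# Build LVM inside the encrypted container, then a filesystem on a logical volume
pvcreate /dev/mapper/cryptroot
vgcreate vg0 /dev/mapper/cryptroot
lvcreate -L 10G -n root vg0
mkfs.ext4 /dev/vg0/root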

But I will give some additional links: there is an incredibly straightforward migration guide for this setup that is almost identical to installing from scratch, except that instead of doing an rsync you would do an installation (but I didn't use yaird to install the initrd; I used update-initramfs and tweaked it a bit – Debian has some options); this helped too, and this ubuntu tutorial is also nice. I installed Debian via debootstrap from another Debian system. It's just debootstrap /directory, then mount --bind some things (like /dev, etc), then chrooting into it..
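
The bootstrap step looks roughly like this (a sketch from memory; the suite name, target directory and mirror are placeholders, and the bind-mount list is the usual minimum):

# Install a base Debian system into /mnt/debinst
debootstrap squeeze /mnt/debinst http://ftp.debian.org/debian

# Expose the kernel interfaces inside the chroot
mount --bind /dev /mnt/debinst/dev
mount --bind /proc /mnt/debinst/proc
mount --bind /sys /mnt/debinst/sys

chroot /mnt/debinst /bin/bash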

The trick here was that update-initramfs had a bug (strangely marked as fixed in the Debian bug tracker) that prevented the correct interpretation of /etc/crypttab, but I followed the workaround from a poster there and it's working like a charm 🙂

There is a continuation to this post, explaining how to partition the disk the way I did.


5 Responses to How to write random data to the hard drive with dd

  1. Mark Essel says:

    Heyo!

    Very cool that you’re blogging, I’ll start reading through (you posted a bunch in 2 days so it may take me a while to catch up).

    Glad you're blogging. When disqus isn't closing comments it's a nice comment tool, but with a hosted wordpress blog you can't do too much fun stuff like adding nostat.us in your footer 🙂

  2. Mark Essel says:

    Question:
    Couldn't I just write some C code to allocate massive blocks and write them to disk faster than the pseudo-random stuff you described? Those rates seemed low.

    • Elias says:

      Omg. I was writing a long, long, very long reply, and simply lost it, and I don't know why…

      Trying to be less verbose: I don't think so. Do blocks allocated by malloc come randomized? If so, where would that randomness come from? From reused freed memory, or from the OS. Reused memory: if you /have/ random data, it's better to write it straight to the disk, which is waiting for it. From the OS: malloc uses brk or mmap (man malloc, see NOTES; man brk, man mmap). I don't know the internals, but I think that if the data comes out random, it will be garbage from another process. How many processes write large amounts of random data to memory? Regular program data surely isn't random. Also, I have only 2GB of memory. Even if it were full of random numbers, that's about 1/150 of this drive.

      Maybe you are confusing data you can't completely predict (i.e. data you don't have) with random data. By random data here I mean statistically random, and there are tests for identifying most non-random data (wikipedia has something about it). The problem with using data from an unknown source (such as garbage you got from malloc) as randomness is that if the entropy of the source is low, you will be fooling yourself. But you may still try to extract the bits of entropy from this data; that's basically what /dev/urandom does..

      (Re-reading it, I was still verbose, damn..)

    • Elias says:

      I've just done an rsync to a USB hard drive. Dunno about its speed, but I got this:

      sent 8.66G bytes received 32.74K bytes 6.67M bytes/sec

      Maybe it was slow because it involved a lot of seeking. In any case, this means that 5MB/s isn’t completely unacceptable for a HD..
