Wednesday, September 24, 2008

Slow speeds transferring data between my machines

Alright, I'm not sure what I'm missing, or where I'm not testing, so I'm appealing to the people who know more than me.

I've got a couple of machines. We'll call them DB1 and DB2.

If I test the network connection between the two of them, it looks fine:

DB1 -> DB2
1024.75 Mbit/sec

DB2 -> DB1
895.13 Mbit/sec

When you convert those to Gb/s, I'm getting right around the theoretical max for the network.
(this was tested by using ttcp, by the way). So at least my cables aren't broken.

Now, the next thing I thought of was that my disks were slow.

DB1 has an internal array. It's fast enough:


[root@DB1 ~]# dd if=/dev/zero of=/db/testout bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 58.2554 seconds, 176 MB/s

DB2 is connected to the SAN, and is no slouch either:

[root@DB2 ~]# dd if=/dev/zero of=/db/testout bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 76.7791 seconds, 133 MB/s

When I read from the big array on DB1 to the mirrored disks, I get very fast speeds. Because my free space is small enough (< 4GB free) on my mirrored disks, I can't get a big enough file to make the transfer count. It reports 1,489.63Mb/s, which is baloney, but it lets me know that it's fast.

Reading from the SAN to DB2's local disks is, if not fast, passable
10240000000 bytes (10 GB) copied, 169.405 seconds, 60.4 MB/s
That works out 483.2Mb/s

Now, when I try to rsync from DB2 to DB1, I have issues. Big issues.

I tried to rsync across a 10GB file. Here were the results:

sent 10241250100 bytes received 42 bytes 10935664.86 bytes/sec
(10.93 MB/s or 87.49Mb/s)

Less than 100Mb/s.

I was alerted to this problem earlier, when it took all damned day to transfer my 330GB database image. Here's the output from that ordeal:

sent 68605073418 bytes received 3998 bytes 2616367.39 bytes/se
(2.62 MB/s or 20.93Mb/s)

It only says 68GB because I used the -z flag on the rsync.

To prove that it isn't some sort of bizarre combination of the SAN causing some problem when being read from rsync, here's a mirrored-disk transfer from DB2 to the root partition on DB1:

sent 1024125084 bytes received 42 bytes 11192624.33 bytes/sec
(11.12MB/s or 89.54Mb/s)

I'm willing to say that maybe the network was congested earlier, or maybe the SAN was under stress, but on an otherwise unused network, I should be getting a lot damned more than 89Mb/s between two servers on the same Gb LAN.

Any ideas?


[UPDATE]
Figured it out. Stupid compression flag on rsync