[gPXE-devel] gPXE performance under virtualization
Stefan Hajnoczi
stefanha at gmail.com
Tue Jul 6 15:58:26 EDT 2010
Here are the results from an investigation into gPXE performance under
virtualization. They suggest optimizing the main loop to make as few
real-mode switches as possible.
I originally sent this as a series of emails to Andrei Faur, but
neither of us has had time to dig deeper yet and I thought others
would be interested, too.
CCed Michael Brown, who may be interested in KVM performance too.
I ran 5 HTTP downloads of a 100 MB file from tmpfs and compared gPXE
with the console built in against gPXE without a console. The console
makes a roughly 15% difference in gPXE's response latency under KVM. I
wonder if the difference without console is even larger in VirtualBox.
KVM with tap networking using gPXE andreif/pcnet32_tmp
Webserver is Python's SimpleHTTPServer
$ ./tcp_stats.py console.csv
10.0.6.1:irdmi-10.0.6.2:blackjack duration=19.578165
bytes_transferred=99959422 [10.0.6.2: avg=0.000270 min=0.000207
max=0.527400] [10.0.6.1: avg=0.000014 min=0.000008 max=0.004641]
10.0.6.1:irdmi-10.0.6.2:1028 duration=19.541665
bytes_transferred=104783153 [10.0.6.2: avg=0.000256 min=0.000148
max=0.013202] [10.0.6.1: avg=0.000014 min=0.000008 max=0.006002]
10.0.6.1:irdmi-10.0.6.2:1024 duration=19.458486
bytes_transferred=104049017 [10.0.6.2: avg=0.000258 min=0.000163
max=0.144666] [10.0.6.1: avg=0.000013 min=0.000008 max=0.003486]
10.0.6.1:irdmi-10.0.6.2:1027 duration=19.521474
bytes_transferred=104704961 [10.0.6.2: avg=0.000256 min=0.000137
max=0.016374] [10.0.6.1: avg=0.000014 min=0.000008 max=0.006024]
10.0.6.1:irdmi-10.0.6.2:cap duration=19.448346
bytes_transferred=104762678 [10.0.6.2: avg=0.000255 min=0.000193
max=0.016795] [10.0.6.1: avg=0.000014 min=0.000008 max=0.006083]
$ ./tcp_stats.py noconsole.csv
10.0.6.1:irdmi-10.0.6.2:blackjack duration=16.669561
bytes_transferred=103219313 [10.0.6.2: avg=0.000221 min=0.000122
max=0.213510] [10.0.6.1: avg=0.000013 min=0.000008 max=0.006060]
10.0.6.1:irdmi-10.0.6.2:1028 duration=16.685741
bytes_transferred=104792126 [10.0.6.2: avg=0.000218 min=0.000131
max=0.010243] [10.0.6.1: avg=0.000013 min=0.000008 max=0.005084]
10.0.6.1:irdmi-10.0.6.2:1024 duration=17.296785
bytes_transferred=104620977 [10.0.6.2: avg=0.000226 min=0.000184
max=0.016814] [10.0.6.1: avg=0.000014 min=0.000008 max=0.005454]
10.0.6.1:irdmi-10.0.6.2:1027 duration=16.681651
bytes_transferred=104777134 [10.0.6.2: avg=0.000217 min=0.000079
max=0.012261] [10.0.6.1: avg=0.000013 min=0.000008 max=0.006327]
10.0.6.1:irdmi-10.0.6.2:cap duration=16.699250
bytes_transferred=104775998 [10.0.6.2: avg=0.000218 min=0.000114
max=0.012540] [10.0.6.1: avg=0.000013 min=0.000008 max=0.005322]
Observations:
The server always responds quickly, in ~13-14 us.
With console enabled the client responds in ~256 us. With console
disabled it responds in ~218 us. Looks like the console is_key() does
make a difference in response latency.
The 15% latency reduction directly translates to a 15% download time
improvement.
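As a rough check of the 15% figure, the per-run client averages and
durations from the two runs above can be averaged and compared. This is
only a sketch: the numbers are copied from the tcp_stats.py output, and
the helper name reduction() is my own. On these numbers the latency
reduction comes out at roughly 15% and the download time reduction at
roughly 14%.

```python
from statistics import mean

# Per-run client (10.0.6.2) average response latency in seconds,
# copied from the console.csv and noconsole.csv output above.
console_latency = [0.000270, 0.000256, 0.000258, 0.000256, 0.000255]
noconsole_latency = [0.000221, 0.000218, 0.000226, 0.000217, 0.000218]

# Per-run download duration in seconds, from the same output.
console_duration = [19.578165, 19.541665, 19.458486, 19.521474, 19.448346]
noconsole_duration = [16.669561, 16.685741, 17.296785, 16.681651, 16.699250]


def reduction(before, after):
    """Relative reduction of the mean, as a fraction of the 'before' mean."""
    return (mean(before) - mean(after)) / mean(before)


latency_reduction = reduction(console_latency, noconsole_latency)
duration_reduction = reduction(console_duration, noconsole_duration)

print(f"client latency reduction: {latency_reduction:.1%}")
print(f"download time reduction:  {duration_reduction:.1%}")
```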
Thomas suggested running with direct VGA and PC keyboard; here are the results:
$ ./tcp_stats.py vgaconsole.csv
10.0.6.1:irdmi-10.0.6.2:blackjack duration=16.847296
bytes_transferred=104857001 [10.0.6.2: avg=0.000220 min=0.000166
max=0.006442] [10.0.6.1: avg=0.000013 min=0.000008 max=0.000555]
10.0.6.1:irdmi-10.0.6.2:1028 duration=16.869025
bytes_transferred=104775913 [10.0.6.2: avg=0.000220 min=0.000101
max=0.011546] [10.0.6.1: avg=0.000013 min=0.000008 max=0.000574]
10.0.6.1:irdmi-10.0.6.2:1024 duration=17.265904
bytes_transferred=101391937 [10.0.6.2: avg=0.000233 min=0.000050
max=0.346175] [10.0.6.1: avg=0.000014 min=0.000008 max=0.029232]
10.0.6.1:irdmi-10.0.6.2:1027 duration=17.162903
bytes_transferred=103062929 [10.0.6.2: avg=0.000228 min=0.000099
max=0.306272] [10.0.6.1: avg=0.000014 min=0.000008 max=0.004575]
10.0.6.1:irdmi-10.0.6.2:cap duration=16.911902
bytes_transferred=104797633 [10.0.6.2: avg=0.000221 min=0.000142
max=0.009305] [10.0.6.1: avg=0.000013 min=0.000008 max=0.004723]
These results are close to those without a console, so the BIOS
console is slowing us down compared to direct access.
No console and rdtsc timer instead of BIOS timer:
$ ./tcp_stats.py noconsole_rdtsc.csv
10.0.6.1:irdmi-10.0.6.2:blackjack duration=11.130568
bytes_transferred=104710654 [10.0.6.2: avg=0.000143 min=0.000079
max=0.012379] [10.0.6.1: avg=0.000011 min=0.000008 max=0.007807]
10.0.6.1:irdmi-10.0.6.2:1028 duration=10.752914
bytes_transferred=104214089 [10.0.6.2: avg=0.000139 min=0.000047
max=0.059598] [10.0.6.1: avg=0.000011 min=0.000008 max=0.003159]
10.0.6.1:irdmi-10.0.6.2:1024 duration=11.115069
bytes_transferred=104570086 [10.0.6.2: avg=0.000143 min=0.000031
max=0.016413] [10.0.6.1: avg=0.000012 min=0.000008 max=0.028772]
10.0.6.1:irdmi-10.0.6.2:1027 duration=11.153725
bytes_transferred=104439977 [10.0.6.2: avg=0.000144 min=0.000107
max=0.025765] [10.0.6.1: avg=0.000011 min=0.000008 max=0.000550]
10.0.6.1:irdmi-10.0.6.2:cap duration=10.874129
bytes_transferred=104744057 [10.0.6.2: avg=0.000140 min=0.000118
max=0.010973] [10.0.6.1: avg=0.000011 min=0.000008 max=0.000522]
We've gone from 19.5 s down to 11.1 s, an improvement of more than 40%!
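For anyone who wants to redo the arithmetic, here is a small sketch
that compares the first run (BIOS console, BIOS timer) with the last
one (no console, rdtsc timer). The (bytes, seconds) pairs are copied
from the tcp_stats.py output above; the variable and function names
are mine. On these numbers the mean download time drops by about 44%
and throughput goes from roughly 5.3 MB/s to roughly 9.5 MB/s.

```python
# (bytes_transferred, duration in seconds) per download, from the output above.
console_runs = [
    (99959422, 19.578165),
    (104783153, 19.541665),
    (104049017, 19.458486),
    (104704961, 19.521474),
    (104762678, 19.448346),
]
rdtsc_runs = [
    (104710654, 11.130568),
    (104214089, 10.752914),
    (104570086, 11.115069),
    (104439977, 11.153725),
    (104744057, 10.874129),
]


def mean_duration(runs):
    """Average download time in seconds."""
    return sum(duration for _, duration in runs) / len(runs)


def throughput(runs):
    """Aggregate throughput in bytes per second over all runs."""
    total_bytes = sum(nbytes for nbytes, _ in runs)
    total_time = sum(duration for _, duration in runs)
    return total_bytes / total_time


improvement = 1 - mean_duration(rdtsc_runs) / mean_duration(console_runs)
console_tput = throughput(console_runs)
rdtsc_tput = throughput(rdtsc_runs)

print(f"mean duration: {mean_duration(console_runs):.2f} s -> "
      f"{mean_duration(rdtsc_runs):.2f} s ({improvement:.1%} less)")
print(f"throughput: {console_tput / 1e6:.2f} MB/s -> {rdtsc_tput / 1e6:.2f} MB/s")
```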
To gather this data:
1. Capture packets on the VM network interface for the entire benchmark duration.
2. Export the capture to CSV using Wireshark.
3. Use tcp_stats.py on the CSV file to produce statistics.
The script is available at http://etherboot.org/share/stefanha/tcp_stats.py.
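To give an idea of the kind of statistic involved, here is a minimal
sketch of a per-host response latency calculation on a Wireshark CSV
export. This is NOT tcp_stats.py itself. It assumes the export has
Wireshark's default "Time" and "Source" columns, that rows are in
capture order, and that a "response" is the gap between the peer's last
packet and this host's next packet. The sample capture embedded below
is synthetic, and it does not split packets by TCP connection the way
the real output does.

```python
import csv
import io
from collections import defaultdict


def response_latencies(rows):
    """Return {host: [latency, ...]} from packet rows in capture order.

    Assumption: a latency sample is taken whenever the sender changes,
    measured from the previous packet (sent by the peer) to this one.
    """
    samples = defaultdict(list)
    prev_src = None
    prev_time = None
    for row in rows:
        src = row["Source"]
        now = float(row["Time"])
        if prev_src is not None and src != prev_src:
            samples[src].append(now - prev_time)
        prev_src, prev_time = src, now
    return dict(samples)


def summarize(samples):
    """Reduce each host's samples to avg/min/max."""
    return {
        host: {"avg": sum(vals) / len(vals), "min": min(vals), "max": max(vals)}
        for host, vals in samples.items()
    }


# Synthetic capture for illustration only (not real benchmark data).
SAMPLE_CSV = """No.,Time,Source,Destination
1,0.000000,10.0.6.2,10.0.6.1
2,0.000010,10.0.6.1,10.0.6.2
3,0.000020,10.0.6.1,10.0.6.2
4,0.000270,10.0.6.2,10.0.6.1
5,0.000284,10.0.6.1,10.0.6.2
"""

stats = summarize(response_latencies(csv.DictReader(io.StringIO(SAMPLE_CSV))))
for host, s in sorted(stats.items()):
    print(f"[{host}: avg={s['avg']:.6f} min={s['min']:.6f} max={s['max']:.6f}]")
```

To try it on a real export, pass csv.DictReader(open("console.csv"))
instead of the embedded sample, after checking that the column names
match.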
Config options in config/defaults/pcbios.h:
#undef CONSOLE_PCBIOS /* disable BIOS console */
#undef TIMER_PCBIOS /* switch from BIOS timer to rdtsc instruction */
#define TIMER_RDTSC
Testers under VMware and on physical machines appreciated!
Stefan