<pre>Hey fellows,<br><br>When booting hundreds of similar systems at the same time, we need to add a random delay before the dhcp & pxe stuff starts.<br>That avoids a massive incast problem. To illustrate the point, I'm facing up to 720 similar machines that boot at exactly the same time.<br>
Adding a random sleep before dhcp starts works well for me.<br><br>I've been working on a prototype and hit one big issue. The current random() implementation uses currticks() as its seed.<br>But as you can guess, the time needed to reach the dhcp step is nearly the same on all my systems, so I get nearly the same result everywhere.<br>
So I read the time from the rtc clock and mix in the last byte of the mac address to increase the entropy.<br><br>I know the patch isn't perfect, and the cmos code might be moved into the random() code ... but I preferred submitting a first prototype to raise issues & comments about this strategy.<br>
<br>I just have to say this trick worked great on my hosts.<br><br>Please find below my git commit in my personal gpxe repo.<br><br>Cheers,<br>Erwan<br><br>------------------<br><br>From: Erwan Velu <<a href="mailto:erwan.velu@zodiacaerospace.com">erwan.velu@zodiacaerospace.com</a>><br>
Date: Fri, 20 Aug 2010 15:14:44 +0000 (+0200)<br>Subject: MAX_RANDOM_SLEEP_TIME to avoid incast troubles<br>X-Git-Url: <a href="http://gitweb.ife-sit.info/?p=gpxe.git;a=commitdiff_plain;h=3b9111e487e45201226b9c3426965ffd843d0687;hp=02a0646fec8011c73f31a83a967873e5fe896575">http://gitweb.ife-sit.info/?p=gpxe.git;a=commitdiff_plain;h=3b9111e487e45201226b9c3426965ffd843d0687;hp=02a0646fec8011c73f31a83a967873e5fe896575</a><br>
<br>MAX_RANDOM_SLEEP_TIME to avoid incast troubles<br><br>When booting hundreds of similar systems at the same time, we need to<br>add a random delay before the dhcp & pxe stuff starts.<br><br>By default, gpxe enabled systems will wait up to 30 seconds before<br>
booting.<br>---<br><br>diff --git a/src/usr/dhcpmgmt.c b/src/usr/dhcpmgmt.c<br>index f82a3bb..97be87f 100644<br>--- a/src/usr/dhcpmgmt.c<br>+++ b/src/usr/dhcpmgmt.c<br>@@ -20,7 +20,10 @@ FILE_LICENCE ( GPL2_OR_LATER );<br>
<br> #include <string.h><br> #include <stdio.h><br>+#include <stdlib.h><br> #include <errno.h><br>+#include <unistd.h><br>+#include <gpxe/io.h><br> #include <gpxe/netdevice.h><br>
#include <gpxe/dhcp.h><br> #include <gpxe/monojob.h><br>@@ -29,6 +32,7 @@ FILE_LICENCE ( GPL2_OR_LATER );<br> #include <usr/dhcpmgmt.h><br> <br> #define LINK_WAIT_MS        15000<br>+#define MAX_RANDOM_SLEEP_TIME 30<br>
<br> /** @file<br> *<br>@@ -56,6 +60,35 @@ int dhcp ( struct net_device *netdev ) {<br>         while ( hlen-- )<br>                 printf ( "%02x%c", *(chaddr++), ( hlen ? ':' : ')' ) );<br> <br>+        /* In some particular setups like large clusters, many systems can bootup at the same time.<br>
+         * This could generate a huge load on the main servers; this is known as the incast effect.<br>+         * We can avoid this phenomenon by introducing a variable sleep time comprised<br>+         * between 0 and MAX_RANDOM_SLEEP_TIME.<br>
+         * To generate random numbers, we grab the time from the cmos, mixed with the last byte of<br>+         * the network card's MAC address. That's clearly not secure but that's enough for getting entropy at<br>+         * boot time.<br>+         */<br>
+        if (MAX_RANDOM_SLEEP_TIME > 0 ) {<br>+                uint8_t random_sleep_time;<br>+<br>+                /* Grabbing time from the CMOS */<br>+                uint8_t clock_ctl_addr = 0x70;<br>+                uint8_t clock_data_addr = 0x71;<br>+                uint8_t cmos_time;<br>+                outb (0x80, clock_ctl_addr);<br>
+                cmos_time=inb (clock_data_addr);<br>+<br>+                /* Let's power the cmos time with the last digit of the mac address */<br>+                cmos_time ^= *(--chaddr);<br>+<br>+                /* Initialize the random number generator to compute the sleeping time*/<br>
+                srandom(cmos_time);<br>+                random_sleep_time=random() % MAX_RANDOM_SLEEP_TIME;<br>+<br>+                printf ( " \nWaiting %i seconds to avoid incast problems",random_sleep_time);<br>+                sleep(random_sleep_time);<br>+        }<br>
+<br>         if ( ( rc = start_dhcp ( &monojob, netdev ) ) == 0 ) {<br>                 rc = monojob_wait ( "" );<br>         } else if ( rc > 0 ) {<br></pre>
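The patch above seeds the generator with a single byte (the CMOS reading XORed with the last MAC byte), so hosts that share that byte and read the clock in the same second would pick the same delay. As a possible variation, and only as a sketch, the delay could be derived from the clock reading and every byte of the MAC address. The helper name `boot_jitter_seconds` and the use of an FNV-1a style hash are assumptions made for illustration; neither is part of gPXE or of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of gPXE: derive a start-up delay in the
 * range [0, max_seconds) from one RTC byte and the NIC's MAC address.
 * The mixing step is a 32-bit FNV-1a hash, swapped in here in place of
 * the patch's srandom()/random() pair so that all MAC bytes contribute. */
static unsigned int boot_jitter_seconds(uint8_t rtc_seconds,
                                        const uint8_t *mac, size_t mac_len,
                                        unsigned int max_seconds)
{
    uint32_t hash = 2166136261u;   /* FNV-1a 32-bit offset basis */
    size_t i;

    assert(mac != NULL || mac_len == 0);

    if (max_seconds == 0)
        return 0;                  /* a limit of 0 disables the delay */

    /* Mix in the clock byte first, then every byte of the MAC address. */
    hash = (hash ^ rtc_seconds) * 16777619u;   /* FNV-1a 32-bit prime */
    for (i = 0; i < mac_len; i++)
        hash = (hash ^ mac[i]) * 16777619u;

    /* Reduce to the requested range; the result is never max_seconds. */
    return (unsigned int)(hash % max_seconds);
}
```

With a limit of 30, as in MAX_RANDOM_SLEEP_TIME, the result is always between 0 and 29 seconds. The function is deterministic for a given clock byte and MAC address, so the spread across hosts comes entirely from those two inputs.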