<div dir="ltr"><p>Hi Matthew,</p>
<p>Stefan is right: you should reduce the verbosity of the DEBUG messages to find the cause of the failure.</p>
<p>I have only tried SRP boot with the Hermon driver (ConnectX), and it worked for me.</p>
<p>Regards,</p>
<div>Itay<br><br></div>
<div>On Wed, Jun 23, 2010 at 11:27 AM, Stefan Hajnoczi <span dir="ltr"><<a href="mailto:stefanha@gmail.com">stefanha@gmail.com</a>></span> wrote:<br></div>
<div class="gmail_quote">
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">
<div class="im">On Wed, Jun 23, 2010 at 6:44 AM, M Lowe <<a href="mailto:mlowe@shaw.ca">mlowe@shaw.ca</a>> wrote:<br>> My motherboard doesn't have a serial port, so that's not an<br>> option. Unless gPXE supports USB-Serial converters?<br>
<br></div>gPXE doesn't support USB. Can your BIOS redirect the console to a (USB-)serial port?<br>
<div class="im"><br>> Arbel 0x1f7c4 command failed with status 22:<br>> 000404c8: 00 00 00 00 00 00 00 00-00 00 00 00 00 (rest of line cut off)<br>> 000404d8: cf ec 00 00 00 00 0 (rest of line cut off)<br><br></div>
I think this is the error code (from the Linux driver):<br><br>/* HCA local attached memory not present: */<br>MTHCA_CMD_STAT_LAM_NOT_PRE = 0x22,<br><br>The gPXE source says this error can be ignored.<br>
<div class="im"><br>> Arbel 0x1f7c4 command failed with status 0a:<br>> 0004019c: 00 00 00 00 cf eb f0 00-00 00 00 02 00 00 00 00 :<br>> ...............<br>> 000401ac: cf ec 00 00 00 00 00 00-0a 00 30 24 : .........0$<br>
> Arbel 0x1f7c4 could not issue MAD IFC: Input/output error (0x1d714039)<br><br></div>Error code from Linux again:<br><br>/* Index out of range: */<br>MTHCA_CMD_STAT_BAD_INDEX = 0x0a,<br><br>I think this happens here:<br>
/* Update MAD parameters */<br>for ( i = 0 ; i < ARBEL_NUM_PORTS ; i++ )<br> ib_smc_update ( arbel->ibdev[i], arbel_mad );<br><br>The driver defines ARBEL_NUM_PORTS as 2, so perhaps it is probing a port that<br>
doesn't exist. If so, this error should be harmless, too.<br>
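<br>As a rough sketch (not gPXE code), the two status values discussed above could be decoded with a small C helper. The constant names and values are the ones quoted from the Linux driver; cmd_status_name() and cmd_status_is_benign() are hypothetical helpers, and treating both codes as harmless is only the working assumption in this thread.<br>
<pre>
```c
/* Command status values quoted in this thread (from the Linux mthca driver). */
enum {
    MTHCA_CMD_STAT_BAD_INDEX   = 0x0a, /* Index out of range */
    MTHCA_CMD_STAT_LAM_NOT_PRE = 0x22  /* HCA local attached memory not present */
};

/* Hypothetical helper: map a command status to a readable name. */
static const char *cmd_status_name ( unsigned int status ) {
    switch ( status ) {
    case MTHCA_CMD_STAT_BAD_INDEX:
        return "BAD_INDEX (index out of range)";
    case MTHCA_CMD_STAT_LAM_NOT_PRE:
        return "LAM_NOT_PRE (HCA local attached memory not present)";
    default:
        return "unknown status";
    }
}

/* Hypothetical helper: non-zero if the status is one of the two codes
 * that this thread assumes can be ignored. This is an assumption, not
 * a statement about what the gPXE driver actually does. */
static int cmd_status_is_benign ( unsigned int status ) {
    return ( ( status == MTHCA_CMD_STAT_BAD_INDEX ) ||
             ( status == MTHCA_CMD_STAT_LAM_NOT_PRE ) );
}
```
</pre>
Any other status value would still need to be looked up in the driver's command status definitions.<br>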
<div class="im"><br>> It seems that running gdbstub halts whatever thread is handling the network<br>> IO, making it impossible to connect to gdbstub over udp. After exiting<br>> gdbstub, gPXE starts responding to pings and arp requests again.<br>
<br></div>The gdbstub performs low-level network I/O - it directly polls the network<br>device for packets. The network stack will not respond while the gdbstub is<br>active. However, the gdbstub answers ARP requests itself.<br>
<br>Are you running gdbudp on the NIC you are trying to debug? In order to be able<br>to debug the arbel driver, gdbudp needs to use another NIC (e.g. an e1000<br>card). This is because setting breakpoints in the arbel code won't work if<br>
gdbudp is using the arbel card.<br><br>> Any ideas?<br><br>I think you are on the right track looking at DBG() messages. You've<br>established that transmit is working and the target receives the login<br>request.<br>
<br>You might need to reduce the number of DBG() messages in gPXE's<br>receive code path when debugging without a serial port. Run without<br>the ":3" on the DEBUG= options for less verbose output. You can also<br>
try commenting out or moving DBG() messages that are too frequent and<br>not useful.<br><br>The aim would be to find out if the response is being received at each<br>layer of the stack (arbel driver, infiniband, srp) and then understand<br>
the reason for dropping the response.<br><br>Michael Brown and Itay Gazit may have better Infiniband and SRP<br>debugging ideas. I have CCed them and added the gPXE mailing list<br>(the Etherboot-discuss list has been replaced by <a href="mailto:gpxe@etherboot.org">gpxe@etherboot.org</a>).<br>
<font color="#888888"><br>Stefan<br></font></blockquote></div><br></div>