<html xmlns="http://www.w3.org/TR/REC-html40" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml">
<head>
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
-->
</style>
</head>
<body vlink="purple" link="blue" lang="EN-US">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D">Even ideas on how I can debug this issue further would help. I
don’t mind putting in the legwork at all, but a lot of this code is over
my head. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Itay Gazit
[mailto:itaygazit@gmail.com] <br>
<b>Sent:</b> Tuesday, July 13, 2010 8:41 AM<br>
<b>To:</b> M Lowe; Michael Brown<br>
<b>Cc:</b> Stefan Hajnoczi; etherboot-discuss@lists.sourceforge.net; gpxe<br>
<b>Subject:</b> Re: [Etherboot-discuss] SRP timeout<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Michael,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Do you have any idea what the problem with the Arbel
driver could be?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p style="margin-bottom:12.0pt" class="MsoNormal">Itay<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">On Mon, Jul 12, 2010 at 3:18 AM, M Lowe &lt;<a href="mailto:mlowe@shaw.ca">mlowe@shaw.ca</a>&gt; wrote:<o:p></o:p></p>
<p class="MsoNormal">I have now been able to log the debug messages; however, I see
no errors<br>
that would indicate where the problem is.<br>
<br>
Just to recap quickly, the problem is that san-booting over InfiniBand<br>
using SRP doesn't work and just times out. The timeout occurs while<br>
waiting for a response to the SRP login request. I'm fairly certain the<br>
problem lies within gPXE because I can access the SRP target just fine<br>
through a local installation of Windows. In addition, on the SRP target<br>
side I have traced through the ib_srpt module and found that a login<br>
response is generated and sent (or at least posted to the mthca module<br>
work queue).<br>
<br>
On the gPXE side I've found that I'm not receiving the SRP_LOGIN_RSP<br>
packet even at the InfiniBand protocol level (net/infiniband.c). So far<br>
I have been able to determine the packet is lost at some point in the<br>
Arbel driver (drivers/infiniband/arbel.c) before arbel_complete(). This<br>
would indicate the problem exists within the Arbel driver and explains<br>
why SRP sanboot worked with the Hermon driver. Despite compiling with<br>
DEBUG=arbel:3 I get no errors indicating there are any problems or<br>
dropped packets.<br>
<br>
Here is the output from autoboot with<br>
DEBUG=srp,ipoib,arp,infiniband,ib_cm,ib_cmrc,ib_mcast,ib_mi,ib_packet,ib_pathrec,ib_sma,ib_smc,ib_srp<br>
<br>
Note: I have added some debug messages to help illustrate the flow of<br>
packets. At the beginning of ipoib_complete_recv, ib_complete_recv, and<br>
ib_mi_complete_recv I have added "RX" debug messages.<br>
<br>
Booting from root path "ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4"<br>
SRP 0xbb134 using ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4<br>
SRP attached successfully<br>
IBDEV 0xb9a84 creating completion queue<br>
IBDEV 0xb9a84 created 8-entry completion queue 0xbb4c4 (0xbb214) with CQN 0x83<br>
IBDEV 0xb9a84 creating queue pair<br>
IBDEV 0xb9a84 created queue pair 0xbb4f4 (0xbb5c4) with QPN 0x550403<br>
IBDEV 0xb9a84 QPN 0x550403 has 4 send entries at [0xbb5a0,0xbb5b0)<br>
IBDEV 0xb9a84 QPN 0x550403 has 2 receive entries at [0xbb5b0,0xbb5b8)<br>
CMRC 0xbb1b4 using QPN 550403<br>
SRP 0xbb134 TX login request tag 0000000000000001<br>
CM 0xbbb64 created for IBDEV 0xb9a84 QPN 550403<br>
CM 0xbbb64 connecting to fe800000:00000000:0002c902:0022e5e5 0002c902:0022e5e4<br>
MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000<br>
infiniband RX<br>
MI 0xba564 RX<br>
MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000<br>
IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
infiniband RX<br>
IPoIB 0xb9ccc RX<br>
ARP cache add: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5<br>
ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035<br>
IPoIB peer 4 has MAC 80000404:fe800000:00000000:0002c902:0022e5e5<br>
MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000<br>
infiniband RX<br>
MI 0xba564 RX<br>
MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000<br>
MI 0xba564 RX TID 6750584500000005 handling via transaction handler<br>
IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6<br>
infiniband RX<br>
IPoIB 0xb9ccc RX<br>
ARP cache update: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5<br>
ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 abandoning TID 6750584500000004<br>
CM 0xbbb64 connection request failed: Connection timed out (0x4c206035)<br>
CMRC 0xbb1b4 disconnected: Connection timed out (0x4c206035)<br>
SRP 0xbb134 socket closed: Connection timed out (0x4c206035)<br>
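<br>
One way to narrow this down from the log alone is to pair up the MI "TX TID" and<br>
"RX TID" lines and list the transaction IDs that were sent but never answered.<br>
The following is only a rough sketch in Python, not part of gPXE: the function<br>
names and the regular expression are my own, and it assumes the log lines keep<br>
the exact "MI ... TX TID ... (...)" layout shown above.<br>

```python
import re
from collections import OrderedDict

# Matches lines such as:
#   MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
# Lines without the parenthesised field list (e.g. "abandoning TID ...",
# "handling via transaction handler") are deliberately not matched.
MI_LINE = re.compile(r"MI \S+ (TX|RX) TID ([0-9a-fA-F]{16}) \(([^)]*)\)")


def summarize_tids(log_text):
    """Count TX and RX lines per transaction ID, in first-seen order."""
    counts = OrderedDict()
    for line in log_text.splitlines():
        match = MI_LINE.search(line)
        if match is None:
            continue
        direction, tid, fields = match.groups()
        entry = counts.setdefault(tid, {"fields": fields, "TX": 0, "RX": 0})
        entry[direction] += 1
    return counts


def unanswered_tids(log_text):
    """Return the TIDs that were transmitted but never seen on receive."""
    return [tid for tid, entry in summarize_tids(log_text).items()
            if entry["TX"] > 0 and entry["RX"] == 0]


# Shortened excerpt of the log above, used as sample input.
SAMPLE = """\
MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 abandoning TID 6750584500000004
"""

if __name__ == "__main__":
    for tid in unanswered_tids(SAMPLE):
        print("no response seen for TID", tid)
```

Run against the log above, this flags only TID 6750584500000004, the request that<br>
is retransmitted and finally abandoned just before the connection timeout, while<br>
TIDs ...0003 and ...0005 each get a matching RX line.<br>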
<br>
<br>
<br>
From: Itay Gazit [mailto:<a href="mailto:itaygazit@gmail.com">itaygazit@gmail.com</a>]<br>
Sent: Friday, June 25, 2010 11:47 AM<br>
To: Stefan Hajnoczi; M Lowe<br>
Cc: <a href="mailto:etherboot-discuss@lists.sourceforge.net">etherboot-discuss@lists.sourceforge.net</a>;
gpxe; Michael Brown<br>
Subject: Re: [Etherboot-discuss] SRP timeout<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><br>
Hi Matthew,<br>
Stefan is right; you should reduce the depth of the DEBUG messages to find the<br>
cause of the failure.<br>
I have tried SRP boot only with the Hermon driver (ConnectX), and it worked<br>
for me.<br>
Regards,<br>
Itay<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>