I just looked into the arbel code. It does not implement the RC QP type at all (it implements UD only). This means SRP will not work with an Arbel card, since SRP uses RC/RDMA for data transfer.<br><br>Your best bet at the moment is to use a Hermon card (unless you want to add RC QP support to the arbel driver :-) )<br>
<br>Itay,<br><br>Can you reconfirm this from the arbel code?<br><br>Thx,<br>Viswa<br><br><br><div class="gmail_quote">On Wed, Jul 21, 2010 at 9:19 PM, M Lowe <span dir="ltr"><<a href="mailto:mlowe@shaw.ca">mlowe@shaw.ca</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Even just ideas on how to debug this issue further would help. I
don’t mind putting in the legwork at all, but a lot of this code is over
my head. </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<div style="border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; padding: 3pt 0in 0in;">
<p class="MsoNormal"><b><span style="font-size: 10pt;">From:</span></b><span style="font-size: 10pt;"> Itay Gazit
[mailto:<a href="mailto:itaygazit@gmail.com" target="_blank">itaygazit@gmail.com</a>] <br>
<b>Sent:</b> Tuesday, July 13, 2010 8:41 AM<br>
<b>To:</b> M Lowe; Michael Brown<br>
<b>Cc:</b> Stefan Hajnoczi; <a href="mailto:etherboot-discuss@lists.sourceforge.net" target="_blank">etherboot-discuss@lists.sourceforge.net</a>; gpxe<div><div></div><div class="h5"><br>
<b>Subject:</b> Re: [Etherboot-discuss] SRP timeout</div></div></span></p>
</div><div><div></div><div class="h5">
<p class="MsoNormal"> </p>
<div>
<div>
<p class="MsoNormal">Michael,</p>
</div>
<div>
<p class="MsoNormal">Do you have any idea what the problem with the arbel
driver could be?</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom: 12pt;">Itay</p>
</div>
<div>
<p class="MsoNormal">On Mon, Jul 12, 2010 at 3:18 AM, M Lowe <<a href="mailto:mlowe@shaw.ca" target="_blank">mlowe@shaw.ca</a>> wrote:</p>
<p class="MsoNormal">I have now been able to log the debug messages; however, I see
no errors<br>
that would indicate where the problem is.<br>
<br>
Just to recap quickly, the problem is that san-booting over InfiniBand<br>
using SRP doesn't work and just times out. The timeout occurs while<br>
waiting for a response to the SRP login request. I'm fairly certain the<br>
problem lies within gPXE because I can access the SRP target just fine<br>
through a local installation of Windows. In addition, on the SRP target<br>
side I have traced through the ib_srpt module and found that a login<br>
response is generated and sent (or at least posted to the mthca module<br>
work queue).<br>
<br>
On the gPXE side I've found that I'm not receiving the SRP_LOGIN_RSP<br>
packet even at the InfiniBand protocol level (net/infiniband.c). So far<br>
I have been able to determine that the packet is lost somewhere in the<br>
Arbel driver (drivers/infiniband/arbel.c) before arbel_complete(). This<br>
would indicate that the problem lies within the Arbel driver, and would explain<br>
why SRP sanboot worked with the Hermon driver. Despite compiling with<br>
DEBUG=arbel:3, I get no errors indicating any problems or<br>
dropped packets.<br>
<br>
Here is the output from autoboot with<br>
DEBUG=srp,ipoib,arp,infiniband,ib_cm,ib_cmrc,ib_mcast,ib_mi,ib_packet,ib_pathrec,ib_sma,ib_smc,ib_srp:<br>
<br>
Note: I have added some debug messages to help illustrate the flow of<br>
packets: an "RX" message at the beginning of each of ipoib_complete_recv,<br>
ib_complete_recv and ib_mi_complete_recv.<br>
<br>
Booting from root path "ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4"<br>
SRP 0xbb134 using ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4<br>
SRP attached successfully<br>
IBDEV 0xb9a84 creating completion queue<br>
IBDEV 0xb9a84 created 8-entry completion queue 0xbb4c4 (0xbb214) with CQN 0x83<br>
IBDEV 0xb9a84 creating queue pair<br>
IBDEV 0xb9a84 created queue pair 0xbb4f4 (0xbb5c4) with QPN 0x550403<br>
IBDEV 0xb9a84 QPN 0x550403 has 4 send entries at [0xbb5a0,0xbb5b0)<br>
IBDEV 0xb9a84 QPN 0x550403 has 2 receive entries at [0xbb5b0,0xbb5b8)<br>
CMRC 0xbb1b4 using QPN 550403<br>
SRP 0xbb134 TX login request tag 0000000000000001<br>
CM 0xbbb64 created for IBDEV 0xb9a84 QPN 550403<br>
CM 0xbbb64 connecting to fe800000:00000000:0002c902:0022e5e5 0002c902:0022e5e4<br>
MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000<br>
infiniband RX<br>
MI 0xba564 RX<br>
MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000<br>
IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
infiniband RX<br>
IPoIB 0xb9ccc RX<br>
ARP cache add: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5<br>
ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035<br>
IPoIB peer 4 has MAC 80000404:fe800000:00000000:0002c902:0022e5e5<br>
MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000<br>
infiniband RX<br>
MI 0xba564 RX<br>
MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000<br>
MI 0xba564 RX TID 6750584500000005 handling via transaction handler<br>
IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6<br>
infiniband RX<br>
IPoIB 0xb9ccc RX<br>
ARP cache update: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5<br>
ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035<br>
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000<br>
MI 0xba564 abandoning TID 6750584500000004<br>
CM 0xbbb64 connection request failed: Connection timed out (0x4c206035)<br>
CMRC 0xbb1b4 disconnected: Connection timed out (0x4c206035)<br>
SRP 0xbb134 socket closed: Connection timed out (0x4c206035)<br>
<br>
<br>
<br>
From: Itay Gazit [mailto:<a href="mailto:itaygazit@gmail.com" target="_blank">itaygazit@gmail.com</a>]<br>
Sent: Friday, June 25, 2010 11:47 AM<br>
To: Stefan Hajnoczi; M Lowe<br>
Cc: <a href="mailto:etherboot-discuss@lists.sourceforge.net" target="_blank">etherboot-discuss@lists.sourceforge.net</a>;
gpxe; Michael Brown<br>
Subject: Re: [Etherboot-discuss] SRP timeout</p>
<div>
<div>
<p class="MsoNormal"><br>
Hi Matthew,<br>
Stefan is right: you should reduce the depth of the DEBUG messages to find<br>
the cause of the failure.<br>
I have tried SRP boot only with the Hermon driver (ConnectX), and it worked<br>
for me.<br>
Regards,<br>
Itay</p>
</div>
</div>
</div>
<p class="MsoNormal"> </p>
</div>
</div></div></div>
</div>
<br>_______________________________________________<br>
Etherboot-discuss mailing list<br>
<a href="mailto:Etherboot-discuss@lists.sourceforge.net">Etherboot-discuss@lists.sourceforge.net</a><br>
<a href="https://lists.sourceforge.net/lists/listinfo/etherboot-discuss" target="_blank">https://lists.sourceforge.net/lists/listinfo/etherboot-discuss</a><br>
<br></blockquote></div><br>