[gPXE] [Etherboot-discuss] SRP timeout

Itay Gazit itaygazit at gmail.com
Sat Jul 24 07:01:31 EDT 2010


Viswa,
Michael is trying to add the RC support for the Arbel.

Michael,
I suspect something went wrong earlier, before the WQE was posted. Can you
verify that the whole MAP_FA process succeeds?

Regards,

Itay
On Fri, Jul 23, 2010 at 8:03 AM, Viswanath Krishnamurthy <
viswa.krish at gmail.com> wrote:

> I just looked into the arbel code. It does not implement the RC QP type at all (it
> implements UD only). This means SRP will not work with an arbel card, since SRP
> uses RC/RDMA for data transfer.
>
> Your best bet at the moment is to use a Hermon card (unless you want to add
> RC QP capability to the arbel driver :-) )
>
> Itay,
>
> Can you reconfirm this from the arbel code?
>
> Thx,
> Viswa
>
>
>   On Wed, Jul 21, 2010 at 9:19 PM, M Lowe <mlowe at shaw.ca> wrote:
>
>>    Even ideas on how I can debug this issue further would help. I don’t
>> mind putting in the leg work at all, but a lot of this code is over my head.
>>
>>
>>
>>
>>
>>
>> *From:* Itay Gazit [mailto:itaygazit at gmail.com]
>> *Sent:* Tuesday, July 13, 2010 8:41 AM
>> *To:* M Lowe; Michael Brown
>> *Cc:* Stefan Hajnoczi; etherboot-discuss at lists.sourceforge.net; gpxe
>>
>> *Subject:* Re: [Etherboot-discuss] SRP timeout
>>
>>
>>
>> Michael,
>>
>> Do you have an idea? What can be the problem with the arbel driver?
>>
>>
>>
>> Itay
>>
>> On Mon, Jul 12, 2010 at 3:18 AM, M Lowe <mlowe at shaw.ca> wrote:
>>
>> I have been able to log the debug messages now; however, I see no errors
>> that would indicate where the problem is.
>>
>> Just to recap quickly, the problem is that san-booting over InfiniBand
>> using SRP doesn't work and just times out. The timeout occurs while
>> waiting for a response to the SRP login request. I'm fairly certain the
>> problem lies within gPXE because I can access the SRP target just fine
>> through a local installation of Windows. In addition, on the SRP target
>> side I have traced through the ib_srpt module and found that a login
>> response is generated and sent (or at least posted to the mthca module
>> work queue).
>>
>> On the gPXE side I've found that I'm not receiving the SRP_LOGIN_RSP
>> packet even at the InfiniBand protocol level (net/infiniband.c). So far
>> I have been able to determine the packet is lost at some point in the
>> Arbel driver (drivers/infiniband/arbel.c) before arbel_complete(). This
>> would indicate the problem exists within the Arbel driver and explains
>> why SRP sanboot worked with the Hermon driver. Despite compiling with
>> DEBUG=arbel:3 I get no errors indicating there are any problems or
>> dropped packets.
>>
>> Here is the output from autoboot with
>> DEBUG=srp,ipoib,arp,infiniband,ib_cm,ib_cmrc,ib_mcast,ib_mi,ib_packet,ib_pathrec,ib_sma,ib_smc,ib_srp
>>
>> Note: I have added some debug messages to help illustrate the flow of
>> packets. At the beginning of ipoib_complete_recv, ib_complete_recv, and
>> ib_mi_complete_recv I have added "RX" debug messages.
>>
>> Booting from root path
>> "ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4"
>> SRP 0xbb134 using
>> ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4
>> SRP attached successfully
>> IBDEV 0xb9a84 creating completion queue
>> IBDEV 0xb9a84 created 8-entry completion queue 0xbb4c4 (0xbb214) with
>> CQN 0x83
>> IBDEV 0xb9a84 creating queue pair
>> IBDEV 0xb9a84 created queue pair 0xbb4f4 (0xbb5c4) with QPN 0x550403
>> IBDEV 0xb9a84 QPN 0x550403 has 4 send entries at [0xbb5a0,0xbb5b0)
>> IBDEV 0xb9a84 QPN 0x550403 has 2 receive entries at [0xbb5b0,0xbb5b8)
>> CMRC 0xbb1b4 using QPN 550403
>> SRP 0xbb134 TX login request tag 0000000000000001
>> CM 0xbbb64 created for IBDEV 0xb9a84 QPN 550403
>> CM 0xbbb64 connecting to fe800000:00000000:0002c902:0022e5e5
>> 0002c902:0022e5e4
>> MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
>> infiniband RX
>> MI 0xba564 RX
>> MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
>> IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0
>> rate 6
>> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
>> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
>> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
>> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
>> infiniband RX
>> IPoIB 0xb9ccc RX
>> ARP cache add: IP 10.20.76.1 => IPoIB
>> 80000404:fe800000:00000000:0002c902:0022e5e5
>> ARP reply: IP 10.20.76.45 => IPoIB
>> 00550402:fe800000:00000000:0002c902:00243035
>> IPoIB peer 4 has MAC 80000404:fe800000:00000000:0002c902:0022e5e5
>> MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000
>> infiniband RX
>> MI 0xba564 RX
>> MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000
>> MI 0xba564 RX TID 6750584500000005 handling via transaction handler
>> IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0
>> rate 6
>> infiniband RX
>> IPoIB 0xb9ccc RX
>> ARP cache update: IP 10.20.76.1 => IPoIB
>> 80000404:fe800000:00000000:0002c902:0022e5e5
>> ARP reply: IP 10.20.76.45 => IPoIB
>> 00550402:fe800000:00000000:0002c902:00243035
>> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
>> MI 0xba564 abandoning TID 6750584500000004
>> CM 0xbbb64 connection request failed: Connection timed out (0x4c206035)
>> CMRC 0xbb1b4 disconnected: Connection timed out (0x4c206035)
>> SRP 0xbb134 socket closed: Connection timed out (0x4c206035)
>>
>>
>>
>> From: Itay Gazit [mailto:itaygazit at gmail.com]
>> Sent: Friday, June 25, 2010 11:47 AM
>> To: Stefan Hajnoczi; M Lowe
>> Cc: etherboot-discuss at lists.sourceforge.net; gpxe; Michael Brown
>> Subject: Re: [Etherboot-discuss] SRP timeout
>>
>>
>> Hi Matthew,
>> Stefan is right, you should reduce the depth of the DEBUG messages to find the
>> cause of the failure.
>> I have tried SRP boot only with Hermon driver (ConnectX) and it worked
>> for me.
>> Regards,
>> Itay
>>
>>
>>
>>
>>
>>
>
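One way to narrow down the quoted debug output is to pair up the MI TX and RX lines by transaction ID. The sketch below is only an illustration and not part of gPXE: it assumes the four values in parentheses on each MI line are the management class, class version, method, and attribute ID of the MAD header, and it labels only the values that appear in this log (class 0x03 as subnet administration, class 0x07 as communication management, attribute 0x0035 as PathRecord, attribute 0x0010 as ConnectRequest). The names `summarize` and `unanswered` are hypothetical helpers.

```python
import re

# Labels for the values seen in the quoted log only (assumed field meanings).
MGMT_CLASS = {0x03: "SubnAdm", 0x07: "CommMgmt"}
METHOD = {0x01: "Get", 0x03: "Send", 0x81: "GetResp"}
ATTRIBUTE = {(0x03, 0x0035): "PathRecord", (0x07, 0x0010): "ConnectRequest"}

# Matches e.g. "MI 0xba564 TX TID 6750584500000004 (07,02,03,0010)".
MI_LINE = re.compile(
    r"MI \S+ (TX|RX) TID ([0-9a-f]{16}) "
    r"\(([0-9a-f]{2}),([0-9a-f]{2}),([0-9a-f]{2}),([0-9a-f]{4})\)"
)

# MI lines copied from the quoted debug output (other lines omitted).
SAMPLE_LOG = """\
MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000
MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 abandoning TID 6750584500000004
"""


def summarize(log_text):
    """Count TX and RX MADs per transaction ID and label the first TX."""
    summary = {}
    for match in MI_LINE.finditer(log_text):
        direction, tid, cls, _version, method, attr = match.groups()
        cls, method, attr = int(cls, 16), int(method, 16), int(attr, 16)
        entry = summary.setdefault(tid, {"what": None, "tx": 0, "rx": 0})
        if direction == "TX":
            entry["tx"] += 1
            if entry["what"] is None:
                entry["what"] = " ".join([
                    MGMT_CLASS.get(cls, hex(cls)),
                    METHOD.get(method, hex(method)),
                    ATTRIBUTE.get((cls, attr), hex(attr)),
                ])
        else:
            entry["rx"] += 1
    return summary


def unanswered(summary):
    """Return the transaction IDs that were sent but never got a reply."""
    return sorted(t for t, e in summary.items() if e["tx"] and not e["rx"])


if __name__ == "__main__":
    result = summarize(SAMPLE_LOG)
    for tid in unanswered(result):
        print(tid, result[tid]["what"], "sent", result[tid]["tx"], "times")
```

On that reading, both path record queries are answered, while the connection request with TID 6750584500000004 is transmitted five times and never receives a reply before it is abandoned. That would be consistent with the reply being lost on the receive side, but it does not by itself show where.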