[gPXE] [Etherboot-discuss] SRP timeout
Itay Gazit
itaygazit at gmail.com
Fri Jun 25 13:46:59 EDT 2010
Hi Matthew,
Stefan is right: you should reduce the verbosity of the DEBUG messages to
find the cause of the failure.
I have tried SRP boot only with the Hermon driver (ConnectX), and it worked
for me.
Regards,
Itay
On Wed, Jun 23, 2010 at 11:27 AM, Stefan Hajnoczi <stefanha at gmail.com> wrote:
> On Wed, Jun 23, 2010 at 6:44 AM, M Lowe <mlowe at shaw.ca> wrote:
> > My motherboard doesn't have a serial port, so that's not an
> > option. Unless gPXE supports USB-Serial converters?
>
> gPXE doesn't support USB. Can your BIOS redirect to (USB-)serial?
>
> > Arbel 0x1f7c4 command failed with status 22:
> > 000404c8: 00 00 00 00 00 00 00 00-00 00 00 00 00 (rest of line cut off)
> > 000404d8: cf ec 00 00 00 00 0 (rest of line cut off)
>
> I think this is the error code (from the Linux driver):
>
> /* HCA local attached memory not present: */
> MTHCA_CMD_STAT_LAM_NOT_PRE = 0x22,
>
> The gPXE source says this error can be ignored.
>
> > Arbel 0x1f7c4 command failed with status 0a:
> > 0004019c: 00 00 00 00 cf eb f0 00-00 00 00 02 00 00 00 00 : ...............
> > 000401ac: cf ec 00 00 00 00 00 00-0a 00 30 24 : .........0$
> > Arbel 0x1f7c4 could not issue MAD IFC: Input/output error (0x1d714039)
>
> Error code from Linux again:
>
> /* Index out of range: */
> MTHCA_CMD_STAT_BAD_INDEX = 0x0a,
>
> I think this happens here:
> /* Update MAD parameters */
> for ( i = 0 ; i < ARBEL_NUM_PORTS ; i++ )
> ib_smc_update ( arbel->ibdev[i], arbel_mad );
>
> The driver defines ARBEL_NUM_PORTS as 2, so perhaps it is probing a port
> that doesn't exist. This should be fine, too.
>
> > It seems that running gdbstub halts whatever thread is handling the
> > network IO, making it impossible to connect to gdbstub over udp. After
> > exiting gdbstub, gPXE starts responding to pings and arp requests again.
>
> The gdbstub performs low-level network I/O - it directly polls the network
> device for packets. The network stack will not respond while the gdbstub is
> active. However, the gdbstub implements ARP response directly.
>
> Are you running gdbudp on the NIC you are trying to debug? In order to be
> able to debug the arbel driver, gdbudp needs to use another NIC (e.g. an
> e1000 card). This is because setting breakpoints in the arbel code won't
> work if gdbudp is using the arbel card.
>
> > Any ideas?
>
> I think you are on the right track looking at DBG() messages. You've
> established that transmit is working and the target receives the login
> request.
>
> You might need to reduce the number of DBG() messages in gPXE's
> receive code path when debugging without a serial port. Run without
> the ":3" on the DEBUG= options for less verbose output. You can also
> try commenting out or moving DBG() messages that are too frequent and
> not useful.
>
> The aim would be to find out if the response is being received at each
> layer of the stack (arbel driver, infiniband, srp) and then understand
> the reason for dropping the response.
>
> Michael Brown and Itay Gazit may have better Infiniband and SRP
> debugging ideas. I have CCed them and added the gPXE mailing list
> (the Etherboot-discuss list has been replaced by gpxe at etherboot.org).
>
> Stefan
>
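For reference, the two status codes discussed above can be decoded with a
small lookup. This is only a sketch: the two constants and their values are
the ones quoted from the Linux driver in this thread, while
cmd_status_from_log() and cmd_status_name() are hypothetical helper names
that exist in neither gPXE nor Linux. It also assumes that the status in the
"command failed with status" line is printed as bare hexadecimal.

```c
#include <assert.h>
#include <stdlib.h>

/* Status values as quoted in this thread from the Linux driver. */
#define MTHCA_CMD_STAT_BAD_INDEX   0x0a /* Index out of range */
#define MTHCA_CMD_STAT_LAM_NOT_PRE 0x22 /* HCA local attached memory not present */

/* Hypothetical helper: parse the status field of a log line such as
 * "command failed with status 0a", assuming it is bare hex text. */
static unsigned int cmd_status_from_log ( const char *text ) {
	assert ( text != NULL );
	return ( unsigned int ) strtoul ( text, NULL, 16 );
}

/* Hypothetical helper: map a status value to a readable name.
 * Only the two codes seen in this thread are covered. */
static const char * cmd_status_name ( unsigned int status ) {
	switch ( status ) {
	case MTHCA_CMD_STAT_BAD_INDEX:
		return "BAD_INDEX (index out of range)";
	case MTHCA_CMD_STAT_LAM_NOT_PRE:
		return "LAM_NOT_PRE (HCA local attached memory not present)";
	default:
		return "unknown status";
	}
}
```

Passing "22" and "0a" from the messages above through cmd_status_from_log()
and then cmd_status_name() gives the LAM_NOT_PRE and BAD_INDEX names
respectively; any other value falls through to "unknown status".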