[gPXE] [Etherboot-discuss] SRP timeout

Stefan Hajnoczi stefanha at gmail.com
Wed Jun 23 04:27:59 EDT 2010


On Wed, Jun 23, 2010 at 6:44 AM, M Lowe <mlowe at shaw.ca> wrote:
> My motherboard doesn't have a serial port, so that's not an
> option. Unless gPXE supports USB-Serial converters?

gPXE doesn't support USB.  Can your BIOS redirect to (USB-)serial?

> Arbel 0x1f7c4 command failed with status 22:
> 000404c8: 00 00 00 00 00 00 00 00-00 00 00 00 00 (rest of line cut off)
> 000404d8: cf ec 00 00 00 00 0 (rest of line cut off)

I think this is the error code (from the Linux driver):

/* HCA local attached memory not present: */
MTHCA_CMD_STAT_LAM_NOT_PRE    = 0x22,

The gPXE source says this error can be ignored.

> Arbel 0x1f7c4 command failed with status 0a:
> 0004019c: 00 00 00 00 cf eb f0 00-00 00 00 02 00 00 00 00 :
> ...............
> 000401ac: cf ec 00 00 00 00 00 00-0a 00 30 24             : .........0$
> Arbel 0x1f7c4 could not issue MAD IFC: Input/output error (0x1d714039)

Error code from Linux again:

/* Index out of range: */
MTHCA_CMD_STAT_BAD_INDEX      = 0x0a,
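
To make the two lookups above easier to repeat, here is a minimal sketch of a
decoder for the status byte.  The two constants are the ones quoted from the
Linux driver; the helper name arbel_status_name() is hypothetical and is not
part of gPXE or Linux.  It assumes the "status 22" and "status 0a" values in
the debug output are printed in hexadecimal, which is how they were read above.

```c
#include <stdint.h>

/* Status values as quoted from the Linux mthca driver above.
 * Only these two are listed; the real driver defines more.
 */
enum {
	MTHCA_CMD_STAT_BAD_INDEX   = 0x0a, /* Index out of range */
	MTHCA_CMD_STAT_LAM_NOT_PRE = 0x22, /* HCA local attached memory not present */
};

/* Hypothetical helper (not a gPXE function): describe a status byte */
static const char * arbel_status_name ( uint8_t status ) {
	switch ( status ) {
	case MTHCA_CMD_STAT_BAD_INDEX:
		return "index out of range";
	case MTHCA_CMD_STAT_LAM_NOT_PRE:
		return "HCA local attached memory not present";
	default:
		return "unknown (not in this table)";
	}
}
```

With this, arbel_status_name ( 0x22 ) gives the "local attached memory not
present" description and arbel_status_name ( 0x0a ) gives "index out of range".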

I think this happens here:
/* Update MAD parameters */
for ( i = 0 ; i < ARBEL_NUM_PORTS ; i++ )
	ib_smc_update ( arbel->ibdev[i], arbel_mad );

The driver defines ARBEL_NUM_PORTS as 2, so perhaps it is probing a port that
doesn't exist.  This error should be harmless, too.

> It seems that running gdbstub halts whatever thread is handling the network
> IO, making it impossible to connect to gdbstub over udp.  After exiting
> gdbstub, gPXE starts responding to pings and arp requests again.

The gdbstub performs low-level network I/O - it directly polls the network
device for packets.  The network stack will not respond while the gdbstub is
active.  However, the gdbstub implements ARP response directly.

Are you running gdbudp on the NIC you are trying to debug?  In order to be able
to debug the arbel driver, gdbudp needs to use another NIC (e.g. an e1000
card).  This is because setting breakpoints in the arbel code won't work if
gdbudp is using the arbel card.

> Any ideas?

I think you are on the right track looking at DBG() messages.  You've
established that transmit is working and the target receives the login
request.

You might need to reduce the number of DBG() messages in gPXE's
receive code path when debugging without a serial port.  Run without
the ":3" on the DEBUG= options for less verbose output.  You can also
try commenting out or moving DBG() messages that are too frequent and
not useful.

The aim would be to find out if the response is being received at each
layer of the stack (arbel driver, infiniband, srp) and then understand
the reason for dropping the response.
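
One way to do this without flooding the screen is to count packets per layer
instead of printing each one.  The following is only a sketch under
assumptions: the names rx_trace(), rx_highest_layer() and rx_report() are
hypothetical and do not exist in gPXE, and you would have to add the
rx_trace() calls to the receive paths yourself.

```c
#include <stdio.h>

/* Hypothetical layer identifiers, one per layer named above */
enum rx_layer { RX_ARBEL, RX_INFINIBAND, RX_SRP, RX_NUM_LAYERS };

static const char *rx_layer_names[RX_NUM_LAYERS] = {
	"arbel", "infiniband", "srp",
};

/* Number of packets seen at each layer */
static unsigned long rx_counts[RX_NUM_LAYERS];

/* Call once from the receive path of the given layer */
static void rx_trace ( enum rx_layer layer ) {
	rx_counts[layer]++;
}

/* Return the highest layer that has seen any packet, or -1 if none */
static int rx_highest_layer ( void ) {
	int i;

	for ( i = RX_NUM_LAYERS - 1 ; i >= 0 ; i-- ) {
		if ( rx_counts[i] != 0 )
			return i;
	}
	return -1;
}

/* Print a single summary line, e.g. after the login attempt times out */
static void rx_report ( void ) {
	int i;

	for ( i = 0 ; i < RX_NUM_LAYERS ; i++ )
		printf ( "%s=%lu ", rx_layer_names[i], rx_counts[i] );
	printf ( "\n" );
}
```

If rx_highest_layer() stops at RX_INFINIBAND, for example, the response is
probably being dropped between the infiniband and srp layers.  Not every
packet is meant to reach the upper layers, so treat the counts as a hint
rather than proof.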

Michael Brown and Itay Gazit may have better InfiniBand and SRP
debugging ideas.  I have CCed them and added the gPXE mailing list
(the Etherboot-discuss list has been replaced by gpxe at etherboot.org).

Stefan

