An SGI booting problem

From: Hsin Wang <wang_at_postbox.csi.cuny.edu>
Date: Fri, 1 Dec 2000 16:57:32 -0500

Dear AMMRLs,

This is not a spectrometer problem, but this is a pretty knowledgeable
group, so perhaps I can get some advice here.

Our SGI Octane system has a problem rebooting. While it is up, it appears
normal, but it frequently fails to reboot because it cannot find the boot
file in the OSLoadPartition. We do have a Varsity program with SGI, and
after talking with them we narrowed the problem down to either the SCSI
controller or the disk connected to it. I am trying to decide which is the
culprit and what to do next.

There are two SCSI controllers on this system. All devices connected to
scsi(1) are fine. The /root partition is on the internal disk, which is
connected to scsi(0). When the system is up, it sees scsi(0): hinv shows
both controllers and all their devices. But when the system is brought down
to the command monitor, hinv displays only the devices on scsi(1).
Furthermore, printenv shows that the OSLoadPartition has been changed from
scsi(0) to scsi(1), where of course the boot file cannot be found. We reset
OSLoadPartition and SystemPartition back to scsi(0) with setenv, but the
system still sometimes fails to see the scsi(0) devices and reports that the
boot file was not found. At other times it sees them and boots without a
problem. So far, unplugging the system and plugging it back in has always
resulted in a successful boot.

I also saw SCSI parity error messages for controller 0 while trying to get
into the miniroot. Once it booted into the miniroot, it saw the internal
disk (on scsi(0)) and one disk on scsi(1), but not the other disk on
scsi(1). Once we unmounted /root, the miniroot no longer saw /dev/dsk (which
includes the devices on both controllers) at all, so we could not run
xfs_check or xfs_repair.

It seems that either the scsi(0) controller is bad or the disk is causing
SCSI problems. I do not have hardware support and have to troubleshoot this
myself. One approach is to clone the /root disk onto another disk and put
that disk in place of the /root disk, which would rule the disk in or out.
The trouble is that I would have to wipe the data from another disk to run
this test, which seems like a pretty painful process. Does anyone have other
suggestions?

Hsin

--
Hsin Wang, Ph.D.
NMR Facility Manager
College of Staten Island
2800 Victory Boulevard
Staten Island, NY 10314
Phone: 718-982-3809
Fax: 718-982-3910
Email: wang_at_postbox.csi.cuny.edu
Received on Fri Dec 01 2000 - 16:54:56 MST
