Article: 81919 of comp.unix.solaris
From: Juhan Leemet <juhan@logicognosis.com>
Newsgroups: comp.unix.solaris
Subject: Re: DiskSuite - puzzling issue
Date: Thu, 22 Jul 2004 22:21:45 -0200
Organization: Logicognosis, Inc.
Lines: 130
Message-ID: <pan.2004.07.23.00.21.44.867282@logicognosis.com>
References: <69a4dd7c.0407200137.6fe74488@posting.google.com> <pan.2004.07.20.13.58.00.881704@logicognosis.com> <1090500769.423115@docbert>
NNTP-Posting-Host: wiley-161-34910.roadrunner.nf.net
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: nntp-stjh-01-01.rogers.nf.net 1090542639 3194 205.251.70.11 (23 Jul 2004 00:30:39 GMT)
X-Complaints-To: newsadmin@thezone.net
NNTP-Posting-Date: Fri, 23 Jul 2004 00:30:39 +0000 (UTC)
User-Agent: Pan/0.14.2 (This is not a psychotic episode. It's a cleansing moment of clarity.)
Path: news.meer.net!newsread1.mlpsca01.us.to.verio.net!newsartnum1.dllstx09.us.to.verio.net!newspeer1.stngva01.us.to.verio.net!news.verio.net!news.glorb.com!newsfeed2.telusplanet.net!newsfeed.telus.net!west2.newsfeed.sprint-canada.net!newsfeed.grouptelecom.net!news.rogers.nf.net!news.logicognosis.com!not-for-mail
Xref: archive.mv.meer.net comp.unix.solaris:81919

On Thu, 22 Jul 2004 12:52:49 +0000, Scott Howard wrote:
> Juhan Leemet <juhan@logicognosis.com> wrote:
>> I believe the reason the system does not load the metadbs is that you do
>> not have the more-than-half quorum of valid metadevice database replicas
>> that SVM requires at boot. I'll bet your /etc/system does not have the line:
>> 
>>        set md:mirrored_root_flag=1
> 
> Please don't use this flag. Please?

Um, OK... I'll take it "under advisement". I have not read any explanation
for why this should never be used, considering that it is mentioned in
some Sun document(s). I have always been careful to qualify any mention of
this flag by saying "some claim that this is dangerous" (or some such).

I would like to understand how (under what circumstances) SVM/SDS "gets
confused" and trashes the mirror by resyncing in the wrong direction,
i.e. copying the stale side over the good one. I have not read a clear
description anywhere of how/why that would occur, your post
notwithstanding. I would really like to understand this issue.

> To take a step back...
> A _correctly configured_ SDS/SVM, even running on only two disks, will
> _never_ drop to the OK prompt, crash, reboot or anything else when a disk
> fails.  If it does, then either your system isn't correctly configured,
> or you've found a bug.

Yes, I agree, and I've said as much in past posts. I have corrected others
who suggested not mirroring swap (which would cause a crash the moment the
disk holding swap failed).

> The only time that the md:mirrored_root_flag comes into play is during
> boot, where it will allow the system to boot with exactly 50% or more of
> metadb replicas available, instead of the normal >50% of replicas
> available.

Yes, I agree. No argument from me.
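
(To spell out the arithmetic, assuming the usual layout of 3 replicas on
each of 2 disks:

    2 disks x 3 replicas = 6 replicas total
    one disk gone        = 3 of 6 left = exactly 50%
    default rule         = need more than 50%, so the boot stops
    with the flag set    = 50% is enough, so the boot proceeds)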

> There are generally only 3 situations where a machine will be rebooting
> with exactly 50% of its replicas available:
> * After a disk replacement in a system where disks are not hot-swap, and
> the replicas on the disk being replaced were not removed before the machine
> was shut down.  This is a process issue - the admins need to be educated
> to remove the metadb replicas before shutting the machine down. The fix
> in this situation is trivial (metadb -d and reboot) and relatively quick
> (especially given that machines with non-hotswap disks are generally fairly
> quick to post).

OK. Doesn't apply to me (SCA disks), but I'll grant that.
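
(For anyone following along, the sequence Scott describes would look
something like this; the device names are just examples:

    # metadb -d c0t0d0s4        delete replicas on the dying disk
    # shutdown -y -g0 -i5       power off and swap the disk
      ... replace the disk, boot, repartition to match ...
    # metadb -a -c 3 c0t0d0s4   recreate replicas on the new disk

That way the machine never counts stale replicas at boot.)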

BTW, what happens if you make a mistake and just replace the disks without
deleting the metadbs on them first? There should not be any valid metadb
on the replacement disk, and I would hope that SVM/SDS can detect that?!?
The metadb man page says "Each copy, referred to as a replica, is subject
to strict consistency checking to ensure correctness." Are you saying that
does not work? Do you have documented cases/incidents where it didn't?

> * When rebooting a machine where the admins have not noticed a failed
> disk.  If you've got a failed disk and you haven't noticed, then the
> machine failing to reboot so that you notice is probably a good thing!
> Certainly it's far better than waiting for the 2nd disk to fail to find
> out. Again the fix is trivial, and fairly quick.

Unless you're not there, and the machine won't boot, and other machines
then get stuck for whatever reason (NFS files not accessible?).

> * If a disk fails during a reboot (or power cycle, or relocation, etc).
> Again as the machine is already down the overhead of having to fix this
> is minimal, and being notified of this failure fairly early in the boot
> process is not a bad thing in my mind.

Unlikely, but I would want the system to boot, if possible.

> The only situation where this really becomes a problem in terms of outage
> time is if you've got a failed disk and a machine reboots unexpectedly
> (panic, power outage, failfast). There's a really simple solution to this
> one too - monitor your systems! Running metadb/metastat from cron every
> 10 minutes and looking for errors is simple to set up, not to mention the
> dozens of scripts out there to do it for you - not least the one in
> the SDS manuals!

OK, this is the situation that I would like to avoid. I have my servers
running off in the corner. I do keep an eye on them, but I don't monitor
them every second; I have other things to do. I would like to set up some
monitoring (every 10 minutes sounds excessive, though; you don't expect 2
disk failures within anything close to a 10-minute interval, do you?). I
was thinking more of a few times a day, sending e-mail and/or an SMS
message to my cell phone. I might not be there, leaving the servers to
run, sort of "lights out" but without the fancy/expensive gear. For my
use, a replacement response "within 1 day" would be acceptable. No?
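
Something like this untested sketch is what I have in mind (the script
name, schedule and address are my own inventions, not from any manual):

    #!/bin/sh
    # svmcheck.sh - mail me if SVM looks unhappy
    PATH=/usr/sbin:/usr/bin; export PATH
    TMP=/tmp/svmcheck.$$

    # metastat reports "Needs maintenance" for a broken submirror, and
    # capital status flags in metadb output (W M D F R) mark bad
    # replicas; either one means trouble.
    metastat 2>&1 | grep -i maintenance          >  $TMP
    metadb   2>&1 | awk 'NR > 1 && $1 ~ /[A-Z]/' >> $TMP

    if [ -s $TMP ]; then
        mailx -s "SVM problem on `hostname`" juhan@logicognosis.com < $TMP
    fi
    rm -f $TMP

run from cron a few times a day, something like:

    0 8,14,20 * * * /usr/local/bin/svmcheck.sh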

I'm not recommending that people be foolish with crucial gear that has
contractual or severe availability implications. If you have a
big/important system, definitely use 3-way root mirrors; problem solved.
If you're controlling a nuclear reactor (Sun specifically excludes that
usage in their license text!), go for full clustering on other gear. There
is no sense in being stupid and jeopardizing your systems, your job, and
your career.
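
(Adding the third submirror is just one more metainit/metattach pair;
something like this, with device names that are only examples:

    # metainit d13 1 1 c2t0d0s0    third submirror on a third disk
    # metattach d10 d13            attach it; resync runs in background

plus a set of metadb replicas on that third disk as well.)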

Now, my situation is probably different from yours. I don't have 1000
high-paying customers and a contractual obligation for 99.99% availability
(does anyone actually reach those numbers?!?), and not everyone does.
Maybe the qualification should be: don't use X when Y, where we spell out
Y. I haven't seen anything in your post to discourage me from using that
flag, nor the specific condition under which it is discouraged, just your
blanket preference/recommendation, which has validity but does nothing to
explain why (can't help it, I'm an EE).

In other discussions, we have agreed that it is better to have more
metadbs scattered across more (than 2) disks, in which case the question
is moot. One could even argue that a small machine with only 2 disks (so
I cannot put more metadbs anywhere else) is unlikely to be exposed to any
catastrophic contractual failure either: I'm not going to be supporting
1000 users on an Ultra2 with 2 x 9GB, am I? I was initially concerned
about a small Ultra2 with a 711: I wondered if I would ever want to boot
the Ultra2 alone, without the 711, and I didn't want to have problems.
Now I'm thinking that I'll put more replicas on the 2 system disks and
fewer replicas in the 711. Then the Ultra2 will boot without the 711, or
the Ultra2 (without a mirror) will boot with the 711. The only thing that
won't boot is the Ultra2 with a failed mirror and without the 711. I will
have quorum and will not need the flag when I reconfigure like that.
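
(Concretely, assuming two disks in the 711 for the arithmetic, one layout
that does this is two replicas per internal disk and one per 711 disk:

    internal disks: 2 replicas each = 4
    711 disks:      1 replica each  = 2    (6 total)

    boot without the 711:              4/6 = 67% -> boots
    internal disk dead, 711 present:   4/6 = 67% -> boots
    internal disk dead AND no 711:     2/6 = 33% -> stops

which is exactly the behaviour described above.)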

If I had a small machine tucked away somewhere doing monitoring or process
control, I would probably use the flag with root mirrors too, unless
someone can show how such a system might come up "insane and dangerous"?

If you're never supposed to use this flag, then why did Sun put it into
Solaris? Why did they document it? Where were the retractions? Reasons?

-- 
Juhan Leemet
Logicognosis, Inc.



Article: 81712 of comp.unix.solaris
From: martin_google@oasys.demon.co.uk (MartinH)
Newsgroups: comp.unix.solaris
Subject: Re: DiskSuite - puzzling issue
Date: 20 Jul 2004 07:52:39 -0700
Organization: http://groups.google.com
Lines: 103
Message-ID: <dc448bdf.0407200652.6323fc39@posting.google.com>
References: <69a4dd7c.0407200137.6fe74488@posting.google.com>
NNTP-Posting-Host: 195.171.114.194
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: posting.google.com 1090335160 2227 127.0.0.1 (20 Jul 2004 14:52:40 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Tue, 20 Jul 2004 14:52:40 +0000 (UTC)
Path: news.meer.net!newsread1.mlpsca01.us.to.verio.net!newsartnum1.dllstx09.us.to.verio.net!newspeer1.stngva01.us.to.verio.net!news.verio.net!news.maxwell.syr.edu!postnews2.google.com!not-for-mail
Xref: archive.mv.meer.net comp.unix.solaris:81712

arviek32@yahoo.com (Arvie) wrote in message news:<69a4dd7c.0407200137.6fe74488@posting.google.com>...
> Hi 
> 
> Environment: 
> 
> V210 with 2 x 36GB disks, Solaris 9 & SVM (DiskSuite)
> 
> - Had a disk failure where the primary failed (c0t0d0)
> - Broke mirroring
> - Replaced primary disk with new disk
> - Re-enabled mirroring
> - Waited for disk sync to complete
> 
> However, when I booted up the system, I got:
> ==========================================
> Hostname: aw1.abcxxx.net
> metainit: aw1.abcxxx.net: stale databases
> 
> Insufficient metadevice database replicas located.
> 
> Use metadb to delete databases which are broken.
> Ignore any "Read-only file system" error messages.
> Reboot the system when finished to reload the metadevice database.
> After reboot, repair any broken database replicas which were deleted.
> 
> Type control-d to proceed with normal startup,
> (or give root password for system maintenance):
> =============================================
> 
> So , I checked with metadb
> # metadb
>         flags           first blk       block count
>     M     p             16              unknown         /dev/dsk/c0t0d0s4
>     M     p             8208            unknown         /dev/dsk/c0t0d0s4
>     M     p             16400           unknown         /dev/dsk/c0t0d0s4
>      a m  p  lu         16              8192            /dev/dsk/c0t1d0s4
>      a    p  l          8208            8192            /dev/dsk/c0t1d0s4
>      a    p  l          16400           8192            /dev/dsk/c0t1d0s4
> 
> Then I removed the stale db
> 
> # metadb -d -f c0t0d0s4
> metadb: aw1.abcxxx.net: Bad address
> 
> and checked to make sure it was gone: 
> 
> # metadb
>         flags           first blk       block count
>      a m  p  lu         16              8192            /dev/dsk/c0t1d0s4
>      a    p  l          8208            8192            /dev/dsk/c0t1d0s4
>      a    p  l          16400           8192            /dev/dsk/c0t1d0s4
> 
> I then recreated the db on the disk and re-checked by:
> 
> # metadb -a -c 3 c0t0d0s4
> metadb: aw1.abcxxx.net: Bad address
> 
> # metadb
>         flags           first blk       block count
>      a        u         16              8192            /dev/dsk/c0t0d0s4
>      a        u         8208            8192            /dev/dsk/c0t0d0s4
>      a        u         16400           8192            /dev/dsk/c0t0d0s4
>      a m  p  lu         16              8192            /dev/dsk/c0t1d0s4
>      a    p  l          8208            8192            /dev/dsk/c0t1d0s4
>      a    p  l          16400           8192            /dev/dsk/c0t1d0s4
> 
> However, now if I reboot, I still get the same problem:
> ========================================================
> Hostname: aw1.abcxxx.net
> metainit: aw1.abcxxx.net: stale databases
> 
> Insufficient metadevice database replicas located.
> 
> Use metadb to delete databases which are broken.
> Ignore any "Read-only file system" error messages.
> Reboot the system when finished to reload the metadevice database.
> After reboot, repair any broken database replicas which were deleted.
> 
> Type control-d to proceed with normal startup,
> (or give root password for system maintenance):
> ========================================================
> 
> So what am I missing? Thanks for any help!
> 
> Regards
> 
> Arvie

Ensure you have got:

set md:mirrored_root_flag=1

in /etc/system
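
(you can check it is there afterwards with: grep md: /etc/system)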

Then delete the bad metadb with:
metadb -d -f c0t0d0s4

THEN REBOOT

You will be thrown back to single user; NOW recreate the metadb:
metadb -a -c 3 c0t0d0s4

HTH.


