What's happening with the Forums
Comments

One thing I'd like to say is that I love the openness of this site. Martin, you're actually connected to the site and its posters, not just an admin like on many other sites. It's refreshing to see!

> It's something in the slave code. Both slaves stopped - but never at the same time. To be fair, it was always under heavy load.
It's completely unacceptable for replication to fail because of any amount of load. That would be bug territory and MySQL wants to know about it. This is the stuff that Google's ad system, Wikipedia and lots of others rely on at very high loads, so load-related failures are not something to tolerate. But from your description later it's data inconsistency, and that's usually down to someone or an application changing the slave, or it starting out of sync somehow. Occasionally bugs, depending in part on how old the MySQL version is or whether you're lucky enough to find a new one.

> You can't do that. In order to get a consistent database from the slave you'd need to stop the database (either full lock or down it), and then do the copy. That would mean you were no longer resilient.
This is part of why I like 4 as a number, not three. But if you're not using InnoDB, just MyISAM, STOP SLAVE SQL_THREAD will stop just the SQL part while the I/O thread keeps fetching changes from the master. Then you can use FLUSH TABLES WITH READ LOCK in one connection, make a file-level copy, release the lock and START SLAVE SQL_THREAD. Do not do this if you are using InnoDB; it will not work properly. For InnoDB you'd need a filesystem snapshot instead (which is not the same as copying the files in the filesystem), and for mixed MyISAM and InnoDB you can't really do it online reliably: if mysqldump is too slow, it really needs a server shutdown.
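A minimal sketch of that MyISAM-only copy, assuming the datadir is /var/lib/mysql and a made-up backup target:

    -- session 1: keep this connection open, the read lock only holds while it lives
    mysql> STOP SLAVE SQL_THREAD;
    mysql> FLUSH TABLES WITH READ LOCK;

    # session 2 (shell): file-level copy while the tables are locked and quiet
    tar czf /backup/slave-copy.tar.gz -C /var/lib/mysql .

    -- session 1 again, once the copy finishes
    mysql> UNLOCK TABLES;
    mysql> START SLAVE SQL_THREAD;

The I/O thread keeps pulling the master's changes into the relay log the whole time, so the slave catches up quickly afterwards.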
> The *easiest* way is to just wipe the database on the faulty slave, set the transaction number back to zero and then point it at the master again. The slave then completely rebuilds the database from the beginning from the transaction log.
> Takes time, but no more time than copying it from the other slave.
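(For concreteness, the quoted rebuild is roughly the following; the host and account names are invented, and it only works if the master still has every binary log from the very beginning:)

    mysql> STOP SLAVE;
    mysql> RESET SLAVE;
    -- drop and recreate the databases (or wipe the datadir with mysqld down), then:
    mysql> CHANGE MASTER TO
        ->   MASTER_HOST='master.example.com',
        ->   MASTER_USER='repl',
        ->   MASTER_PASSWORD='...',
        ->   MASTER_LOG_FILE='mysql-bin.000001',
        ->   MASTER_LOG_POS=4;   -- position 4 is the first event in a binary log
    mysql> START SLAVE;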
I'm surprised that you find it no slower than copying OS files from another slave. It's usually far slower. But if it works for you, that's what matters.

> They usually stop because the values/structure of the database is different from what the db engine was expecting, so the transaction can't go through. It's almost impossible to fix because you don't know what values would be acceptable where. The logs in that situation are less than helpful...
If a replay of all of the binary logs works, then it'll probably be some human or the application changing something on the slave, usually by accident. Not always; bugs happen too. Setting the slave to read_only is handy to limit the potential for this, though SUPER/root can still make changes. It doesn't block replication or, in recent versions, the creation of temporary tables.
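Turning that on is one line in the slave's config, or a runtime switch; replication and accounts with SUPER are unaffected:

    # my.cnf on each slave
    [mysqld]
    read-only

    -- or, without a restart:
    mysql> SET GLOBAL read_only = 1;
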
Thanks for all the great work you guys do.
We are all behind you 100%.

> It's completely unacceptable for replication to fail because of any amount of load. That would be bug territory and MySQL wants to know about it.
It was definitely a bug somewhere; it had the smell of "race condition" about it. I just never had the time to chase it up.

> This is part of why I like 4 as a number, not three.
I had enough trouble scraping up 2 slaves... And yes, their "day job" was something else (not db related), but not very heavily loaded doing that.

> I'm surprised that you find it no slower than copying OS files from another slave. It's usually far slower. But if it works for you, that's what matters.
Well, the database was an "accumulate only" kind of database (a bit like MSE is), with virtually no changing of stuff once it's there, so the transaction log would not be hugely different from the database. I use scp, which isn't the fastest in the world.
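For what it's worth, a couple of ways to speed that copy up, with mysqld stopped or the tables locked as above (rsync here is just an alternative tool, not something anyone in the thread mentioned):

    # scp: -C compresses on the wire, -rp recurses and keeps permissions
    scp -rpC /var/lib/mysql/ slave2:/var/lib/mysql/

    # rsync: only moves the differences, which suits an accumulate-only database
    rsync -az /var/lib/mysql/ slave2:/var/lib/mysql/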
Don't tell me you use NFS... :eek:

> If a replay of all of the binary logs works, then it'll probably be some human or the application changing something on the slave, usually by accident. Not always; bugs happen too.
No db related applications running on the slaves - backups only.

Great site I discovered, and it's brilliant.
Good luck to the whole team and keep up the good work!

NFS via something like a NetApp box isn't so bad. Otherwise it's a right pain so far as reliability goes. Putting database files on a plain NFS mount (not a good appliance) is asking for trouble. Only worth doing if you need so many drives that you can't fit them in a regular box, or if you need the sort of backup capabilities that a really nice SAN setup offers. Cost of entry is a bit on the high side, though.
I hesitated before suggesting putting logs on an NFS-mounted volume, but it does protect you better than the same computer against whole-box failures, even though it'll break more often than a local drive. Better is replication, so you avoid NFS but still get the continuous backup on another computer.
Better still is something like the semi-synchronous replication that Google implemented, and which I think MySQL is planning to include in the standard server, along with several of the other Google improvements. That waits until at least one slave has confirmed that it has the update before the transaction commits on the master.
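For reference, a sketch of the interface as it later appeared in the standard server's semisync plugins; the timeout is what stops a network outage from stalling commits forever, by falling back to asynchronous:

    -- on the master:
    mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
    mysql> SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- milliseconds, then go async

    -- on each slave:
    mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;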
For replication you might be suffering from the glitch where a slave seems to rewrite a few snippets of past events sometimes after a disconnection. The usual workaround for that is CHANGE MASTER TO the position of the SQL thread. At least until MySQL manages to find the cause.
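Spelled out, with the file and position taken from the slave's own status output (the bracketed placeholders are whatever SHOW SLAVE STATUS reports):

    mysql> STOP SLAVE;
    mysql> SHOW SLAVE STATUS\G
    -- note Relay_Master_Log_File and Exec_Master_Log_Pos: the SQL thread's position
    mysql> CHANGE MASTER TO
        ->   MASTER_LOG_FILE='<Relay_Master_Log_File>',
        ->   MASTER_LOG_POS=<Exec_Master_Log_Pos>;
    mysql> START SLAVE;

Leaving out MASTER_HOST and friends keeps the existing connection settings; the slave throws away its relay logs and re-fetches from that position, skipping past the rewritten snippets.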
Good to hear that you got in a couple of slaves even though you didn't need them for load. And that you seem to have all of the binary logs. Just the sort of prudence that I love to hear about, and it saved my own skin at least once.

> NFS via something like a NetApp box isn't so bad. Otherwise it's a right pain so far as reliability goes.
Thinking more of security - NFS is unencrypted and has a past history (like sendmail) of being taken to the cleaners.
I don't like to trust my own network, because that's a sure way of getting owned...

> Putting database files on a plain NFS mount (not a good appliance) is asking for trouble. Only worth doing if you need so many drives that you can't fit them in a regular box, or if you need the sort of backup capabilities that a really nice SAN setup offers. Cost of entry is a bit on the high side, though.
Probably better to either go the full SAN route (rather than route it through NFS) or use a dedicated iSCSI connection. Both have the advantage of removing a layer of indirection from your setup. NetApps do have other advantages that may make NFS more worthwhile - but then again, db storage in one giant file tends to negate them.

> I hesitated before suggesting putting logs on an NFS-mounted volume, but it does protect you better than the same computer against whole-box failures, even though it'll break more often than a local drive.
That's only a good idea if uptime is unimportant. When NFS goes, your db process will stop until it comes back. DRBD is probably a better solution in that circumstance, with logcheck doing the appropriate singing and dancing about your network/machine problem.
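A minimal DRBD resource for that, with hostnames and devices invented for the sketch; protocol C means a write doesn't complete until the peer has it too:

    # /etc/drbd.conf
    resource mysql {
      protocol C;
      on db1 {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   192.168.0.1:7788;
        meta-disk internal;
      }
      on db2 {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   192.168.0.2:7788;
        meta-disk internal;
      }
    }

The database writes to /dev/drbd0 as if it were a local disk, and the block-level mirror on the peer stays current without NFS in the data path.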
If you really want to live dangerously, use Microsoft's NFS implementation... :eek: :rotfl:
It very rarely fails more than once in the same minute...

> Better still is something like the semi-synchronous replication that Google implemented, and which I think MySQL is planning to include in the standard server, along with several of the other Google improvements. That waits until at least one slave has confirmed that it has the update before the transaction commits on the master.
Does it just let a backlog of unconfirmed transactions accumulate in cases of network outages?

> For replication you might be suffering from the glitch where a slave seems to rewrite a few snippets of past events sometimes after a disconnection. The usual workaround for that is CHANGE MASTER TO the position of the SQL thread. At least until MySQL manages to find the cause.
That may well be it, but I never really had time to look into it.

> Good to hear that you got in a couple of slaves even though you didn't need them for load. And that you seem to have all of the binary logs. Just the sort of prudence that I love to hear about, and it saved my own skin at least once.
Well, ironically it never has for me. If I prepare for things failing, they never do. My life is like that... :rolleyes:

"We were born and raised in a summer haze." Adele 'Someone like you.'
"Blowing your mind, 'cause you know what you'll find, when you're looking for things in the sky." OMD 'Julia's Song'0 -
Martin, if it was up to all of us on here you'd definitely get a knighthood.
Thanks for all you do. x

NFS security (and MySQL security) in a colo. Hmm, I wonder just how good this colo is at keeping customers apart so one can't attack the others, perhaps through a compromised fellow customer? I hope they at least don't run MySQL on the standard port (a known target, easy protection), do use the latest, more secure client protocols and do follow the basic account security recommendations that MySQL makes.
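Something like this, as a rough starting point; the port, address and account names here are all invented:

    # my.cnf
    [mysqld]
    port             = 3307       # anything but the well-known 3306
    bind-address     = 10.0.0.5   # listen on the private interface only
    skip-name-resolve             # grants by IP address, no DNS games
    secure-auth                   # refuse the weak pre-4.1 password hashes

    -- and keep the grants narrow, per host:
    mysql> GRANT SELECT ON app.* TO 'reader'@'10.0.0.%' IDENTIFIED BY 'changeme';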
MySQL will buffer writes for a while when the disk is full; no bets on how NFS might break in a way that defeats that. But it's a trade-off: do you want an outage that's going to be really obvious, or the potential loss of recent data? I tend to prefer obvious trouble, so it's noticed and fixed quickly. But since you can use replication to make the whole question irrelevant, that's the better answer.
Haven't looked at the finer details of semi-sync replication but it probably assumes a reliable network, or at least that one of them is reliable - say a dedicated connection via the second network port that's common on servers.
I've never used MS NFS. You make me happy that I haven't.

We're all behind you, Martin, and we will all do our little bit to help - whatever we can and whatever it takes to keep this site on top form.

"Marleyboy you are a legend!"
MarleyBoy "You are the Greatest"
Marleyboy You Are A Legend!
Marleyboy speaks sense
marleyboy (total legend)
Marleyboy - You are, indeed, a legend.0
This discussion has been closed.