- 1.58 MB
- 2022-04-29 14:31:54 发布
- 1、本文档共5页,可阅读全部内容。
- 2、本文档内容版权归属内容提供方,所产生的收益全部归内容提供方所有。如果您对本文有版权争议,可选择认领,认领后既往收益都归您。
- 3、本文档由用户上传,本站不保证质量和数量令人满意,可能有诸多瑕疵,付费之前,请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形,可联系本站下载客服投诉处理。
- 文档侵权举报电话:19940600175。
'Chapter19:DistributedDatabases
Chapter19:DistributedDatabasesHeterogeneousandHomogeneousDatabasesDistributedDataStorageDistributedTransactionsCommitProtocolsConcurrencyControlinDistributedDatabasesAvailabilityDistributedQueryProcessingHeterogeneousDistributedDatabasesDirectorySystems
DistributedDatabaseSystemAdistributeddatabasesystemconsistsoflooselycoupledsitesthatsharenophysicalcomponentDatabasesystemsthatrunoneachsiteareindependentofeachotherTransactionsmayaccessdataatoneormoresites
HomogeneousDistributedDatabasesInahomogeneousdistributeddatabaseAllsiteshaveidenticalsoftwareAreawareofeachotherandagreetocooperateinprocessinguserrequests.EachsitesurrenderspartofitsautonomyintermsofrighttochangeschemasorsoftwareAppearstouserasasinglesystemInaheterogeneousdistributeddatabaseDifferentsitesmayusedifferentschemasandsoftwareDifferenceinschemaisamajorproblemforqueryprocessingDifferenceinsoftwareisamajorproblemfortransactionprocessingSitesmaynotbeawareofeachotherandmayprovideonlylimitedfacilitiesforcooperationintransactionprocessing
DistributedDataStorageAssumerelationaldatamodelReplicationSystemmaintainsmultiplecopiesofdata,storedindifferentsites,forfasterretrievalandfaulttolerance.FragmentationRelationispartitionedintoseveralfragmentsstoredindistinctsitesReplicationandfragmentationcanbecombinedRelationispartitionedintoseveralfragments:systemmaintainsseveralidenticalreplicasofeachsuchfragment.
DataReplicationArelationorfragmentofarelationisreplicatedifitisstoredredundantlyintwoormoresites.Fullreplicationofarelationisthecasewheretherelationisstoredatallsites.Fullyredundantdatabasesarethoseinwhicheverysitecontainsacopyoftheentiredatabase.
DataReplication(Cont.)AdvantagesofReplicationAvailability:failureofsitecontainingrelationrdoesnotresultinunavailabilityofrisreplicasexist.Parallelism:queriesonrmaybeprocessedbyseveralnodesinparallel.Reduceddatatransfer:relationrisavailablelocallyateachsitecontainingareplicaofr.DisadvantagesofReplicationIncreasedcostofupdates:eachreplicaofrelationrmustbeupdated.Increasedcomplexityofconcurrencycontrol:concurrentupdatestodistinctreplicasmayleadtoinconsistentdataunlessspecialconcurrencycontrolmechanismsareimplemented.Onesolution:chooseonecopyasprimarycopyandapplyconcurrencycontroloperationsonprimarycopy
DataFragmentationDivisionofrelationrintofragmentsr1,r2,…,rnwhichcontainsufficientinformationtoreconstructrelationr.Horizontalfragmentation:eachtupleofrisassignedtooneormorefragmentsVerticalfragmentation:theschemaforrelationrissplitintoseveralsmallerschemasAllschemasmustcontainacommoncandidatekey(orsuperkey)toensurelosslessjoinproperty.Aspecialattribute,thetuple-idattributemaybeaddedtoeachschematoserveasacandidatekey.
HorizontalFragmentationofaccountRelationbranch_nameaccount_numberbalanceHillsideHillsideHillsideA-305A-226A-15550033662account1=branch_name=“Hillside”(account)branch_nameaccount_numberbalanceValleyviewValleyviewValleyviewValleyviewA-177A-402A-408A-639205100001123750account2=branch_name=“Valleyview”(account)
VerticalFragmentationofemployee_infoRelationbranch_namecustomer_nametuple_idHillsideHillsideValleyviewValleyviewHillsideValleyviewValleyviewLowmanCampCampKahnKahnKahnGreendeposit1=branch_name,customer_name,tuple_id(employee_info)1234567account_numberbalancetuple_id500336205100006211237501234567A-305A-226A-177A-402A-155A-408A-639deposit2=account_number,balance,tuple_id(employee_info)
AdvantagesofFragmentationHorizontal:allowsparallelprocessingonfragmentsofarelationallowsarelationtobesplitsothattuplesarelocatedwheretheyaremostfrequentlyaccessedVertical:allowstuplestobesplitsothateachpartofthetupleisstoredwhereitismostfrequentlyaccessedtuple-idattributeallowsefficientjoiningofverticalfragmentsallowsparallelprocessingonarelationVerticalandhorizontalfragmentationcanbemixed.Fragmentsmaybesuccessivelyfragmentedtoanarbitrarydepth.
DataTransparencyDatatransparency:DegreetowhichsystemusermayremainunawareofthedetailsofhowandwherethedataitemsarestoredinadistributedsystemConsidertransparencyissuesinrelationto:FragmentationtransparencyReplicationtransparencyLocationtransparency
NamingofDataItems-Criteria1.Everydataitemmusthaveasystem-wideuniquename.2.Itshouldbepossibletofindthelocationofdataitemsefficiently.3.Itshouldbepossibletochangethelocationofdataitemstransparently.4.Eachsiteshouldbeabletocreatenewdataitemsautonomously.
CentralizedScheme-NameServerStructure:nameserverassignsallnameseachsitemaintainsarecordoflocaldataitemssitesasknameservertolocatenon-localdataitemsAdvantages:satisfiesnamingcriteria1-3Disadvantages:doesnotsatisfynamingcriterion4nameserverisapotentialperformancebottlenecknameserverisasinglepointoffailure
UseofAliasesAlternativetocentralizedscheme:eachsiteprefixesitsownsiteidentifiertoanynamethatitgeneratesi.e.,site17.account.Fulfillshavingauniqueidentifier,andavoidsproblemsassociatedwithcentralcontrol.However,failstoachievenetworktransparency.Solution:Createasetofaliasesfordataitems;Storethemappingofaliasestotherealnamesateachsite.Theusercanbeunawareofthephysicallocationofadataitem,andisunaffectedifthedataitemismovedfromonesitetoanother.
DistributedTransactionsTransactionmayaccessdataatseveralsites.Eachsitehasalocaltransactionmanagerresponsiblefor:MaintainingalogforrecoverypurposesParticipatingincoordinatingtheconcurrentexecutionofthetransactionsexecutingatthatsite.Eachsitehasatransactioncoordinator,whichisresponsiblefor:Startingtheexecutionoftransactionsthatoriginateatthesite.Distributingsubtransactionsatappropriatesitesforexecution.Coordinatingtheterminationofeachtransactionthatoriginatesatthesite,whichmayresultinthetransactionbeingcommittedatallsitesorabortedatallsites.
TransactionSystemArchitecture
SystemFailureModesFailuresuniquetodistributedsystems:Failureofasite.LossofmassagesHandledbynetworktransmissioncontrolprotocolssuchasTCP-IPFailureofacommunicationlinkHandledbynetworkprotocols,byroutingmessagesviaalternativelinksNetworkpartitionAnetworkissaidtobepartitionedwhenithasbeensplitintotwoormoresubsystemsthatlackanyconnectionbetweenthemNote:asubsystemmayconsistofasinglenodeNetworkpartitioningandsitefailuresaregenerallyindistinguishable.
CommitProtocolsCommitprotocolsareusedtoensureatomicityacrosssitesatransactionwhichexecutesatmultiplesitesmusteitherbecommittedatallthesites,orabortedatallthesites.notacceptabletohaveatransactioncommittedatonesiteandabortedatanotherThetwo-phasecommit(2PC)protocoliswidelyusedThethree-phasecommit(3PC)protocolismorecomplicatedandmoreexpensive,butavoidssomedrawbacksoftwo-phasecommitprotocol.Thisprotocolisnotusedinpractice.
TwoPhaseCommitProtocol(2PC)Assumesfail-stopmodel–failedsitessimplystopworking,anddonotcauseanyotherharm,suchassendingincorrectmessagestoothersites.Executionoftheprotocolisinitiatedbythecoordinatorafterthelaststepofthetransactionhasbeenreached.TheprotocolinvolvesallthelocalsitesatwhichthetransactionexecutedLetTbeatransactioninitiatedatsiteSi,andletthetransactioncoordinatoratSibeCi
Phase1:ObtainingaDecisionCoordinatorasksallparticipantstopreparetocommittransactionTi.CiaddstherecordstothelogandforceslogtostablestoragesendsprepareTmessagestoallsitesatwhichTexecutedUponreceivingmessage,transactionmanageratsitedeterminesifitcancommitthetransactionifnot,addarecordtothelogandsendabortTmessagetoCiifthetransactioncanbecommitted,then:addtherecordtothelogforceallrecordsforTtostablestoragesendreadyTmessagetoCi
Phase2:RecordingtheDecisionTcanbecommittedofCireceivedareadyTmessagefromalltheparticipatingsites:otherwiseTmustbeaborted.Coordinatoraddsadecisionrecord,or,tothelogandforcesrecordontostablestorage.Oncetherecordstablestorageitisirrevocable(eveniffailuresoccur)Coordinatorsendsamessagetoeachparticipantinformingitofthedecision(commitorabort)Participantstakeappropriateactionlocally.
HandlingofFailures-SiteFailureWhensiteSirecovers,itexaminesitslogtodeterminethefateoftransactionsactiveatthetimeofthefailure.Logcontainrecord:siteexecutesredo(T)Logcontainsrecord:siteexecutesundo(T)Logcontainsrecord:sitemustconsultCitodeterminethefateofT.IfTcommitted,redo(T)IfTaborted,undo(T)ThelogcontainsnocontrolrecordsconcerningTrepliesthatSkfailedbeforerespondingtotheprepareTmessagefromCisincethefailureofSkprecludesthesendingofsucharesponseC1mustabortTSkmustexecuteundo(T)
HandlingofFailures-CoordinatorFailureIfcoordinatorfailswhilethecommitprotocolforTisexecutingthenparticipatingsitesmustdecideonT’sfate:Ifanactivesitecontainsarecordinitslog,thenTmustbecommitted.Ifanactivesitecontainsanrecordinitslog,thenTmustbeaborted.Ifsomeactiveparticipatingsitedoesnotcontainarecordinitslog,thenthefailedcoordinatorCicannothavedecidedtocommitT.CanthereforeabortT.Ifnoneoftheabovecasesholds,thenallactivesitesmusthavearecordintheirlogs,butnoadditionalcontrolrecords(suchasof).InthiscaseactivesitesmustwaitforCitorecover,tofinddecision.Blockingproblem:activesitesmayhavetowaitforfailedcoordinatortorecover.
HandlingofFailures-NetworkPartitionIfthecoordinatorandallitsparticipantsremaininonepartition,thefailurehasnoeffectonthecommitprotocol.Ifthecoordinatoranditsparticipantsbelongtoseveralpartitions:Sitesthatarenotinthepartitioncontainingthecoordinatorthinkthecoordinatorhasfailed,andexecutetheprotocoltodealwithfailureofthecoordinator.Noharmresults,butsitesmaystillhavetowaitfordecisionfromcoordinator.Thecoordinatorandthesitesareinthesamepartitionasthecoordinatorthinkthatthesitesintheotherpartitionhavefailed,andfollowtheusualcommitprotocol.Again,noharmresults
RecoveryandConcurrencyControlIn-doubttransactionshavea,butneithera,noranlogrecord.Therecoveringsitemustdeterminethecommit-abortstatusofsuchtransactionsbycontactingothersites;thiscanslowandpotentiallyblockrecovery.Recoveryalgorithmscannotelockinformationinthelog.Insteadof,writeoutL=listoflocksheldbyTwhenthelogiswritten(readlockscanbeomitted).Foreveryin-doubttransactionT,allthelocksnotedinthelogrecordarereacquired.Afterlockreacquisition,transactionprocessingcanresume;thecommitorrollbackofin-doubttransactionsisperformedconcurrentlywiththeexecutionofnewtransactions.
AlternativeModelsofTransactionProcessingNotionofasingletransactionspanningmultiplesitesisinappropriateformanyapplicationsE.g.,transactioncrossinganorganizationalboundaryNoorganizationwouldliketopermitanexternallyinitiatedtransactiontoblocklocaltransactionsforanindeterminateperiodAlternativemodelscarryouttransactionsbysendingmessagesCodetohandlemessagesmustbecarefullydesignedtoensureatomicityanddurabilitypropertiesforupdatesIsolationcannotbeguaranteed,inthatintermediatestagesarevisible,butcodemustensurenoinconsistentstatesresultduetoconcurrencyPersistentmessagingsystemsaresystemsthatprovidetransactionalpropertiestomessagesMessagesareguaranteedtobedeliveredexactlyonceWilldiscussimplementationtechniqueslater
AlternativeModels(Cont.)Motivatingexample:fundstransferbetweentwobanksTwophasecommitwouldhavethepotentialtoblockupdatesontheaccountsinvolvedinfundstransferAlternativesolution:DebitmoneyfromsourceaccountandsendamessagetoothersiteSitereceivesmessageandcreditsdestinationaccountMessaginghaslongbeenusedfordistributedtransactions(evenbeforecomputerswereinvented!)Atomicityissueoncetransactionsendingamessageiscommitted,messagemustguaranteedtobedeliveredGuaranteeaslongasdestinationsiteisupandreachable,codetohandleundeliverablemessagesmustalsobeavailablee.g.,creditmoneybacktosourceaccount.Ifsendingtransactionaborts,messagemustnotbesent
ErrorConditionswithPersistentMessagingCodetohandlemessageshastotakecareofvarietyoffailuresituations(evenassumingguaranteedmessagedelivery)E.g.,ifdestinationaccountdoesnotexist,failuremessagemustbesentbacktosourcesiteWhenfailuremessageisreceivedfromdestinationsite,ordestinationsiteitselfdoesnotexist,moneymustbedepositedbackinsourceaccountProblemifsourceaccounthasbeenclosedgethumanstotakecareofproblemUsercodeexecutingtransactionprocessingusing2PCdoesnothavetodealwithsuchfailuresTherearemanysituationswhereextraeffortoferrorhandlingisworththebenefitofabsenceofblockingE.g.,prettymuchalltransactionsacrossorganizations
PersistentMessagingandWorkflowsWorkflowsprovideageneralmodeloftransactionalprocessinginvolvingmultiplesitesandpossiblyhumanprocessingofcertainstepsE.g.,whenabankreceivesaloanapplication,itmayneedtoContactexternalcredit-checkingagenciesGetapprovalsofoneormoremanagersandthenrespondtotheloanapplicationWestudyworkflowsinChapter25Persistentmessagingformstheunderlyinginfrastructureforworkflowsinadistributedenvironment
ConcurrencyControlModifyconcurrencycontrolschemesforuseindistributedenvironment.Weassumethateachsiteparticipatesintheexecutionofacommitprotocoltoensureglobaltransactionautomicity.WeassumeallreplicasofanyitemareupdatedWillseehowtorelaxthisincaseofsitefailureslater
Single-Lock-ManagerApproachSystemmaintainsasinglelockmanagerthatresidesinasinglechosensite,saySiWhenatransactionneedstolockadataitem,itsendsalockrequesttoSiandlockmanagerdetermineswhetherthelockcanbegrantedimmediatelyIfyes,lockmanagersendsamessagetothesitewhichinitiatedtherequestIfno,requestisdelayeduntilitcanbegranted,atwhichtimeamessageissenttotheinitiatingsite
Single-Lock-ManagerApproach(Cont.)Thetransactioncanreadthedataitemfromanyoneofthesitesatwhichareplicaofthedataitemresides.WritesmustbeperformedonallreplicasofadataitemAdvantagesofscheme:SimpleimplementationSimpledeadlockhandlingDisadvantagesofschemeare:Bottleneck:lockmanagersitebecomesabottleneckVulnerability:systemisvulnerabletolockmanagersitefailure.
DistributedLockManagerInthisapproach,functionalityoflockingisimplementedbylockmanagersateachsiteLockmanagerscontrolaccesstolocaldataitemsButspecialprotocolsmaybeusedforreplicasAdvantage:workisdistributedandcanbemaderobusttofailuresDisadvantage:deadlockdetectionismorecomplicatedLockmanagerscooperatefordeadlockdetectionMoreonthislaterSeveralvariantsofthisapproachPrimarycopyMajorityprotocolBiasedprotocolQuorumconsensus
PrimaryCopyChooseonereplicaofdataitemtobetheprimarycopy.SitecontainingthereplicaiscalledtheprimarysiteforthatdataitemDifferentdataitemscanhavedifferentprimarysitesWhenatransactionneedstolockadataitemQ,itrequestsalockattheprimarysiteofQ.ImplicitlygetslockonallreplicasofthedataitemBenefitConcurrencycontrolforreplicateddatahandledsimilarlytounreplicateddata-simpleimplementation.DrawbackIftheprimarysiteofQfails,Qisinaccessibleeventhoughothersitescontainingareplicamaybeaccessible.
MajorityProtocolLocallockmanagerateachsiteadministerslockandunlockrequestsfordataitemsstoredatthatsite.WhenatransactionwishestolockanunreplicateddataitemQresidingatsiteSi,amessageissenttoSi‘slockmanager.IfQislockedinanincompatiblemode,thentherequestisdelayeduntilitcanbegranted.Whenthelockrequestcanbegranted,thelockmanagersendsamessagebacktotheinitiatorindicatingthatthelockrequesthasbeengranted.
MajorityProtocol(Cont.)IncaseofreplicateddataIfQisreplicatedatnsites,thenalockrequestmessagemustbesenttomorethanhalfofthensitesinwhichQisstored.ThetransactiondoesnotoperateonQuntilithasobtainedalockonamajorityofthereplicasofQ.Whenwritingthedataitem,transactionperformswritesonallreplicas.BenefitCanbeusedevenwhensomesitesareunavailabledetailsonhowhandlewritesinthepresenceofsitefailurelaterDrawbackRequires2(n/2+1)messagesforhandlinglockrequests,and(n/2+1)messagesforhandlingunlockrequests.Potentialfordeadlockevenwithsingleitem-e.g.,eachof3transactionsmayhavelockson1/3rdofthereplicasofadata.
BiasedProtocolLocallockmanagerateachsiteasinmajorityprotocol,however,requestsforsharedlocksarehandleddifferentlythanrequestsforexclusivelocks.Sharedlocks.WhenatransactionneedstolockdataitemQ,itsimplyrequestsalockonQfromthelockmanageratonesitecontainingareplicaofQ.Exclusivelocks.WhentransactionneedstolockdataitemQ,itrequestsalockonQfromthelockmanageratallsitescontainingareplicaofQ.Advantage-imposeslessoverheadonreadoperations.Disadvantage-additionaloverheadonwrites
QuorumConsensusProtocolAgeneralizationofbothmajorityandbiasedprotocolsEachsiteisassignedaweight.LetSbethetotalofallsiteweightsChoosetwovaluesreadquorumQrandwritequorumQwSuchthatQr+Qw>Sand2*Qw>SQuorumscanbechosen(andScomputed)separatelyforeachitemEachreadmustlockenoughreplicasthatthesumofthesiteweightsis>=QrEachwritemustlockenoughreplicasthatthesumofthesiteweightsis>=QwFornowweassumeallreplicasarewrittenExtensionstoallowsomesitestobeunavailabledescribedlater
TimestampingTimestampbasedconcurrency-controlprotocolscanbeusedindistributedsystemsEachtransactionmustbegivenauniquetimestampMainproblem:howtogenerateatimestampinadistributedfashionEachsitegeneratesauniquelocaltimestampusingeitheralogicalcounterorthelocalclock.Globaluniquetimestampisobtainedbyconcatenatingtheuniquelocaltimestampwiththeuniqueidentifier.
Timestamping(Cont.)AsitewithaslowclockwillassignsmallertimestampsStilllogicallycorrect:serializabilitynotaffectedBut:“disadvantages”transactionsTofixthisproblemDefinewithineachsiteSialogicalclock(LCi),whichgeneratestheuniquelocaltimestampRequirethatSiadvanceitslogicalclockwheneverarequestisreceivedfromatransactionTiwithtimestampandxisgreaterthatthecurrentvalueofLCi.Inthiscase,siteSiadvancesitslogicalclocktothevaluex+1.
ReplicationwithWeakConsistencyManycommercialdatabasessupportreplicationofdatawithweakdegreesofconsistency(i.e.,withoutaguaranteeofserializabiliy)E.g.,master-slavereplication:updatesareperformedatasingle“master”site,andpropagatedto“slave”sites.Propagationisnotpartoftheupdatetransaction:itsisdecoupledMaybeimmediatelyaftertransactioncommitsMaybeperiodicDatamayonlybereadatslavesites,notupdatedNoneedtoobtainlocksatanyremotesiteParticularlyusefulfordistributinginformationE.g.,fromcentralofficetobranch-officeAlsousefulforrunningread-onlyqueriesofflinefromthemaindatabase
ReplicationwithWeakConsistency(Cont.)Replicasshouldseeatransaction-consistentsnapshotofthedatabaseThatis,astateofthedatabasereflectingalleffectsofalltransactionsuptosomepointintheserializationorder,andnoeffectsofanylatertransactions.E.g.,OracleprovidesacreatesnapshotstatementtocreateasnapshotofarelationorasetofrelationsataremotesitesnapshotrefresheitherbyrecomputationorbyincrementalupdateAutomaticrefresh(continuousorperiodic)ormanualrefresh
MultimasterandLazyReplicationWithmultimasterreplication(alsocalledupdate-anywherereplication)updatesarepermittedatanyreplica,andareautomaticallypropagatedtoallreplicasBasicmodelindistributeddatabases,wheretransactionsareunawareofthedetailsofreplication,anddatabasesystempropagatesupdatesaspartofthesametransactionCoupledwith2phasecommitManysystemssupportlazypropagationwhereupdatesaretransmittedaftertransactioncommitsAllowsupdatestooccurevenifsomesitesaredisconnectedfromthenetwork,butatthecostofconsistency
DeadlockHandlingConsiderthefollowingtwotransactionsandhistory,withitemXandtransactionT1atsite1,anditemYandtransactionT2atsite2:T1:write(X)write(Y)T2:write(Y)write(X)X-lockonXwrite(X)X-lockonYwrite(Y)waitforX-lockonXWaitforX-lockonYResult:deadlockwhichcannotbedetectedlocallyateithersite
CentralizedApproachAglobalwait-forgraphisconstructedandmaintainedinasinglesite;thedeadlock-detectioncoordinatorRealgraph:Real,butunknown,stateofthesystem.Constructedgraph:Approximationgeneratedbythecontrollerduringtheexecutionofitsalgorithm.theglobalwait-forgraphcanbeconstructedwhen:anewedgeisinsertedinorremovedfromoneofthelocalwait-forgraphs.anumberofchangeshaveoccurredinalocalwait-forgraph.thecoordinatorneedstoinvokecycle-detection.Ifthecoordinatorfindsacycle,itselectsavictimandnotifiesallsites.Thesitesrollbackthevictimtransaction.
LocalandGlobalWait-ForGraphsLocalGlobal
ExampleWait-ForGraphforFalseCyclesInitialstate:
FalseCycles(Cont.)Supposethatstartingfromthestateshowninfigure.1.T2releasesresourcesatS1resultinginamessageremoveT1T2messagefromtheTransactionManageratsiteS1tothecoordinator)2.AndthenT2requestsaresourceheldbyT3atsiteS2resultinginamessageinsertT2T3fromS2tothecoordinatorSupposefurtherthattheinsertmessagereachesbeforethedeletemessagethiscanhappenduetonetworkdelaysThecoordinatorwouldthenfindafalsecycleT1T2T3T1Thefalsecycleaboveneverexistedinreality.Falsecyclescannotoccuriftwo-phaselockingisused.
UnnecessaryRollbacksUnnecessaryrollbacksmayresultwhendeadlockhasindeedoccurredandavictimhasbeenpicked,andmeanwhileoneofthetransactionswasabortedforreasonsunrelatedtothedeadlock.Unnecessaryrollbackscanresultfromfalsecyclesintheglobalwait-forgraph;however,likelihoodoffalsecyclesislow.
AvailabilityHighavailability:timeforwhichsystemisnotfullyusableshouldbeextremelylow(e.g.,99.99%availability)Robustness:abilityofsystemtofunctionspiteoffailuresofcomponentsFailuresaremorelikelyinlargedistributedsystemsToberobust,adistributedsystemmustDetectfailuresReconfigurethesystemsocomputationmaycontinueRecovery/reintegrationwhenasiteorlinkisrepairedFailuredetection:distinguishinglinkfailurefromsitefailureishard(partial)solution:havemultiplelinks,multiplelinkfailureislikelyasitefailure
ReconfigurationReconfiguration:AbortalltransactionsthatwereactiveatafailedsiteMakingthemwaitcouldinterferewithothertransactionssincetheymayholdlocksonothersitesHowever,incaseonlysomereplicasofadataitemfailed,itmaybepossibletocontinuetransactionsthathadaccesseddataatafailedsite(moreonthislater)Ifreplicateddataitemswereatfailedsite,updatesystemcatalogtoremovethemfromthelistofreplicas.Thisshouldbereversedwhenfailedsiterecovers,butadditionalcareneedstobetakentobringvaluesuptodateIfafailedsitewasacentralserverforsomesubsystem,anelectionmustbeheldtodeterminethenewserverE.g.,nameserver,concurrencycoordinator,globaldeadlockdetector
Reconfiguration(Cont.)Sincenetworkpartitionmaynotbedistinguishablefromsitefailure,thefollowingsituationsmustbeavoidedTwooremorecentralserverselectedindistinctpartitionsMorethanonepartitionupdatesareplicateddataitemUpdatesmustbeabletocontinueevenifsomesitesaredownSolution:majoritybasedapproachAlternativeof“readonewriteallavailable”istantalizingbutcausesproblems
Majority-BasedApproachThemajorityprotocolfordistributedconcurrencycontrolcanbemodifiedtoworkevenifsomesitesareunavailableEachreplicaofeachitemhasaversionnumberwhichisupdatedwhenthereplicaisupdated,asoutlinedbelowAlockrequestissenttoatleast½thesitesatwhichitemreplicasarestoredandoperationcontinuesonlywhenalockisobtainedonamajorityofthesitesReadoperationslookatallreplicaslocked,andreadthevaluefromthereplicawithlargestversionnumberMaywritethisvalueandversionnumberbacktoreplicaswithlowerversionnumbers(noneedtoobtainlocksonallreplicasforthistask)
Majority-BasedApproachMajorityprotocol(Cont.)Writeoperationsfindhighestversionnumberlikereads,andsetnewversionnumbertooldhighestversion+1WritesarethenperformedonalllockedreplicasandversionnumberonthesereplicasissettonewversionnumberFailures(networkandsite)causenoproblemsaslongasSitesatcommitcontainamajorityofreplicasofanyupdateddataitemsDuringreadsamajorityofreplicasareavailabletofindversionnumbersSubjecttoabove,2phasecommitcanbeusedtoupdatereplicasNote:readsareguaranteedtoseelatestversionofdataitemReintegrationistrivial:nothingneedstobedoneQuorumconsensusalgorithmcanbesimilarlyextended
ReadOneWriteAll(Available)BiasedprotocolisaspecialcaseofquorumconsensusAllowsreadstoreadanyonereplicabutupdatesrequireallreplicastobeavailableatcommittime(calledreadonewriteall)Readonewriteallavailable(ignoringfailedsites)isattractive,butincorrectIffailedlinkmaycomebackup,withoutadisconnectedsiteeverbeingawarethatitwasdisconnectedThesitethenhasoldvalues,andareadfromthatsitewouldreturnanincorrectvalueIfsitewasawareoffailurereintegrationcouldhavebeenperformed,butnowaytoguaranteethisWithnetworkpartitioning,sitesineachpartitionmayupdatesameitemconcurrentlybelievingsitesinotherpartitionshaveallfailed
SiteReintegrationWhenfailedsiterecovers,itmustcatchupwithallupdatesthatitmissedwhileitwasdownProblem:updatesmaybehappeningtoitemswhosereplicaisstoredatthesitewhilethesiteisrecoveringSolution1:haltallupdatesonsystemwhilereintegratingasiteUnacceptabledisruptionSolution2:lockallreplicasofalldataitemsatthesite,updatetolatestversion,thenreleaselocksOthersolutionswithbetterconcurrencyalsoavailable
ComparisonwithRemoteBackupRemotebackup(hotspare)systems(Section17.10)arealsodesignedtoprovidehighavailabilityRemotebackupsystemsaresimplerandhaveloweroverheadAllactionsperformedatasinglesite,andonlylogrecordsshippedNoneedfordistributedconcurrencycontrol,or2phasecommitUsingdistributeddatabaseswithreplicasofdataitemscanprovidehigheravailabilitybyhavingmultiple(>2)replicasandusingthemajorityprotocolAlsoavoidfailuredetectionandswitchovertimeassociatedwithremotebackupsystems
CoordinatorSelectionBackupcoordinatorssitewhichmaintainsenoughinformationlocallytoassumetheroleofcoordinatoriftheactualcoordinatorfailsexecutesthesamealgorithmsandmaintainsthesameinternalstateinformationastheactualcoordinatorfailsexecutesstateinformationastheactualcoordinatorallowsfastrecoveryfromcoordinatorfailurebutinvolvesoverheadduringnormalprocessing.ElectionalgorithmsusedtoelectanewcoordinatorincaseoffailuresExample:BullyAlgorithm-applicabletosystemswhereeverysitecansendamessagetoeveryothersite.
BullyAlgorithmIfsiteSisendsarequestthatisnotansweredbythecoordinatorwithinatimeintervalT,assumethatthecoordinatorhasfailedSitriestoelectitselfasthenewcoordinator.Sisendsanelectionmessagetoeverysitewithahigheridentificationnumber,SithenwaitsforanyoftheseprocessestoanswerwithinT.IfnoresponsewithinT,assumethatallsiteswithnumbergreaterthanihavefailed,Sielectsitselfthenewcoordinator.IfanswerisreceivedSibeginstimeintervalT’,waitingtoreceiveamessagethatasitewithahigheridentificationnumberhasbeenelected.
BullyAlgorithm(Cont.)IfnomessageissentwithinT’,assumethesitewithahighernumberhasfailed;Sirestartsthealgorithm.Afterafailedsiterecovers,itimmediatelybeginsexecutionofthesamealgorithm.Iftherearenoactivesiteswithhighernumbers,therecoveredsiteforcesallprocesseswithlowernumberstoletitbecomethecoordinatorsite,evenifthereisacurrentlyactivecoordinatorwithalowernumber.
DistributedQueryProcessingForcentralizedsystems,theprimarycriterionformeasuringthecostofaparticularstrategyisthenumberofdiskaccesses.Inadistributedsystem,otherissuesmustbetakenintoaccount:Thecostofadatatransmissionoverthenetwork.Thepotentialgaininperformancefromhavingseveralsitesprocesspartsofthequeryinparallel.
QueryTransformationTranslatingalgebraicqueriesonfragments.ItmustbepossibletoconstructrelationrfromitsfragmentsReplacerelationrbytheexpressiontoconstructrelationrfromitsfragmentsConsiderthehorizontalfragmentationoftheaccountrelationintoaccount1=branch_name=“Hillside”(account)account2=branch_name=“Valleyview”(account)Thequerybranch_name=“Hillside”(account)becomesbranch_name=“Hillside”(account1account2)whichisoptimizedintobranch_name=“Hillside”(account1)branch_name=“Hillside”(account2)
ExampleQuery(Cont.)Sinceaccount1hasonlytuplespertainingtotheHillsidebranch,wecaneliminatetheselectionoperation.Applythedefinitionofaccount2toobtainbranch_name=“Hillside”(branch_name=“Valleyview”(account)Thisexpressionistheemptysetregardlessofthecontentsoftheaccountrelation.FinalstrategyisfortheHillsidesitetoreturnaccount1astheresultofthequery.
SimpleJoinProcessingConsiderthefollowingrelationalalgebraexpressioninwhichthethreerelationsareneitherreplicatednorfragmentedaccountdepositorbranchaccountisstoredatsiteS1depositoratS2branchatS3ForaqueryissuedatsiteSI,thesystemneedstoproducetheresultatsiteSI
PossibleQueryProcessingStrategiesShipcopiesofallthreerelationstositeSIandchooseastrategyforprocessingtheentirelocallyatsiteSI.ShipacopyoftheaccountrelationtositeS2andcomputetemp1=accountdepositoratS2.Shiptemp1fromS2toS3,andcomputetemp2=temp1branchatS3.Shiptheresulttemp2toSI.Devisesimilarstrategies,exchangingtherolesS1,S2,S3Mustconsiderfollowingfactors:amountofdatabeingshippedcostoftransmittingadatablockbetweensitesrelativeprocessingspeedateachsite
SemijoinStrategyLetr1bearelationwithschemaR1storesatsiteS1Letr2bearelationwithschemaR2storesatsiteS2Evaluatetheexpressionr1r2andobtaintheresultatS1.1.Computetemp1R1R2(r1)atS1.2.Shiptemp1fromS1toS2.3.Computetemp2r2temp1atS24.Shiptemp2fromS2toS1.5.Computer1temp2atS1.Thisisthesameasr1r2.
FormalDefinitionThesemijoinofr1withr2,isdenotedby:r1r2itisdefinedby:R1(r1r2)Thus,r1r2selectsthosetuplesofr1thatcontributedtor1r2.Instep3above,temp2=r2r1.Forjoinsofseveralrelations,theabovestrategycanbeextendedtoaseriesofsemijoinsteps.
JoinStrategiesthatExploitParallelismConsiderr1r2r3r4whererelationriisstoredatsiteSi.TheresultmustbepresentedatsiteS1.r1isshippedtoS2andr1r2iscomputedatS2:simultaneouslyr3isshippedtoS4andr3r4iscomputedatS4S2shipstuplesof(r1r2)toS1astheyproduced;S4shipstuplesof(r3r4)toS1Oncetuplesof(r1r2)and(r3r4)arriveatS1(r1r2)(r3r4)iscomputedinparallelwiththecomputationof(r1r2)atS2andthecomputationof(r3r4)atS4.
HeterogeneousDistributedDatabasesManydatabaseapplicationsrequiredatafromavarietyofpreexistingdatabaseslocatedinaheterogeneouscollectionofhardwareandsoftwareplatformsDatamodelsmaydiffer(hierarchical,relational,etc.)TransactioncommitprotocolsmaybeincompatibleConcurrencycontrolmaybebasedondifferenttechniques(locking,timestamping,etc.)System-leveldetailsalmostcertainlyaretotallyincompatible.Amultidatabasesystemisasoftwarelayerontopofexistingdatabasesystems,whichisdesignedtomanipulateinformationinheterogeneousdatabasesCreatesanillusionoflogicaldatabaseintegrationwithoutanyphysicaldatabaseintegration
AdvantagesPreservationofinvestmentinexistinghardwaresystemsoftwareApplicationsLocalautonomyandadministrativecontrolAllowsuseofspecial-purposeDBMSsSteptowardsaunifiedhomogeneousDBMSFullintegrationintoahomogeneousDBMSfacesTechnicaldifficultiesandcostofconversionOrganizational/politicaldifficultiesOrganizationsdonotwanttogiveupcontrolontheirdataLocaldatabaseswishtoretainagreatdealofautonomy
UnifiedViewofDataAgreementonacommondatamodelTypicallytherelationalmodelAgreementonacommonconceptualschemaDifferentnamesforsamerelation/attributeSamerelation/attributenamemeansdifferentthingsAgreementonasinglerepresentationofshareddataE.g.,datatypes,precision,CharactersetsASCIIvsEBCDICSortordervariationsAgreementonunitsofmeasureVariationsinnamesE.g.,KölnvsCologne,MumbaivsBombay
QueryProcessingSeveralissuesinqueryprocessinginaheterogeneousdatabaseSchematranslationWriteawrapperforeachdatasourcetotranslatedatatoaglobalschemaWrappersmustalsotranslateupdatesonglobalschematoupdatesonlocalschemaLimitedquerycapabilitiesSomedatasourcesallowonlyrestrictedformsofselectionsE.g.,webforms,flatfiledatasourcesQuerieshavetobebrokenupandprocessedpartlyatthesourceandpartlyatadifferentsiteRemovalofduplicateinformationwhensiteshaveoverlappinginformationDecidewhichsitestoexecutequeryGlobalqueryoptimization
MediatorSystemsMediatorsystemsaresystemsthatintegratemultipleheterogeneousdatasourcesbyprovidinganintegratedglobalview,andprovidingqueryfacilitiesonglobalviewUnlikefullfledgedmultidatabasesystems,mediatorsgenerallydonotbotherabouttransactionprocessingButthetermsmediatorandmultidatabasearesometimesusedinterchangeablyThetermvirtualdatabaseisalsousedtorefertomediator/multidatabasesystems
DirectorySystemsTypicalkindsofdirectoryinformationEmployeeinformationsuchasname,id,email,phone,officeaddr,..Evenpersonalinformationtobeaccessedfrommultipleplacese.g.,WebbrowserbookmarksWhitepagesEntriesorganizedbynameoridentifierMeantforforwardlookuptofindmoreaboutanentryYellowpagesEntriesorganizedbypropertiesForreverselookuptofindentriesmatchingspecificrequirementsWhendirectoriesaretobeaccessedacrossanorganizationAlternative1:Webinterface.NotgreatforprogramsAlternative2:SpecializeddirectoryaccessprotocolsCoupledwithspecializeduserinterfaces
DirectoryAccessProtocolsMostcommonlyuseddirectoryaccessprotocol:LDAP(LightweightDirectoryAccessProtocol)SimplifiedfromearlierX.500protocolQuestion:WhynotusedatabaseprotocolslikeODBC/JDBC?Answer:Simplifiedprotocolsforalimitedtypeofdataaccess,evolvedparalleltoODBC/JDBCProvideanicehierarchicalnamingmechanismsimilartofilesystemdirectoriesDatacanbepartitionedamongstmultipleserversfordifferentpartsofthehierarchy,yetgiveasingleviewtouserE.g.,differentserversforBellLabsMurrayHillandBellLabsBangaloreDirectoriesmayusedatabasesasstoragemechanism
LDAP:LightweightDirectoryAccessProtocolLDAPDataModelDataManipulationDistributedDirectoryTrees
LDAPDataModelLDAPdirectoriesstoreentriesEntriesaresimilartoobjectsEachentrymusthaveuniquedistinguishedname(DN)DNmadeupofasequenceofrelativedistinguishednames(RDNs)E.g.,ofaDNcn=Silberschatz,ou-BellLabs,o=Lucent,c=USAStandardRDNs(canbespecifiedaspartofschema)cn:commonnameou:organizationalunito:organizationc:countrySimilartopathsinafilesystembutwritteninreversedirection
LDAPDataModel(Cont.)EntriescanhaveattributesAttributesaremulti-valuedbydefaultLDAPhasseveralbuilt-intypesBinary,string,timetypesTel:telephonenumberPostalAddress:postaladdressLDAPallowsdefinitionofobjectclassesObjectclassesspecifyattributenamesandtypesCanuseinheritancetodefineobjectclassesEntrycanbespecifiedtobeofoneormoreobjectclassesNoneedtohavesinglemost-specifictype
LDAPDataModel(cont.)EntriesorganizedintoadirectoryinformationtreeaccordingtotheirDNsLeaflevelusuallyrepresentspecificobjectsInternalnodeentriesrepresentobjectssuchasorganizationalunits,organizationsorcountriesChildrenofanodeinherittheDNoftheparent,andaddonRDNsE.g.,internalnodewithDNc=USAChildrennodeshaveDNstartingwithc=USAandfurtherRDNssuchasoorouDNofanentrycanbegeneratedbytraversingpathfromrootLeaflevelcanbeanaliaspointingtoanotherentryEntriescanthushavemorethanoneDNE.g.,personinmorethanoneorganizationalunit
LDAPDataManipulationUnlikeSQL,LDAPdoesnotdefineDDLorDMLInstead,itdefinesanetworkprotocolforDDLandDMLUsersuseanAPIorvendorspecificfrontendsLDAPalsodefinesafileformatLDAPDataInterchangeFormat(LDIF)Queryingmechanismisverysimple:onlyselection&projection
LDAPQueriesLDAPquerymustspecifyBase:anodeintheDITfromwheresearchistostartAsearchconditionBooleancombinationofconditionsonattributesofentriesEquality,wild-cardsandapproximateequalitysupportedAscopeJustthebase,thebaseanditschildren,ortheentiresubtreefromthebaseAttributestobereturnedLimitsonnumberofresultsandonresourceconsumptionMayalsospecifywhethertoautomaticallydereferencealiasesLDAPURLsareonewayofspecifyingqueryLDAPAPIisanotheralternative
LDAPURLsFirstpartofURLspecifisserverandDNofbaseldap:://aura.research.bell-labs.com/o=Lucent,c=USAOptionalfurtherpartsseparatedby?symbolldap:://aura.research.bell-labs.com/o=Lucent,c=USA??sub?cn=KorthOptionalpartsspecifyattributestoreturn(emptymeansall)Scope(subindicatesentiresubtree)Searchcondition(cn=Korth)
CCodeusingLDAPAPI#include#includemain(){LDAP*ld;LDAPMessage*res,*entry;char*dn,*attr,*attrList[]={“telephoneNumber”,NULL};BerElement*ptr;intvals,i;//Openaconnectiontoserverld=ldap_open(“aura.research.bell-labs.com”,LDAP_PORT);ldap_simple_bind(ld,“avi”,“avi-passwd”);…actualquery(nextslide)…ldap_unbind(ld);}
CCodeusingLDAPAPI(Cont.)ldap_search_s(ld,“o=Lucent,c=USA”,LDAP_SCOPE_SUBTREE,“cn=Korth”,attrList,/*attrsonly*/0,&res);/*attrsonly=1=>returnonlyschemanotactualresults*/printf(“found%dentries”,ldap_count_entries(ld,res));for(entry=ldap_first_entry(ld,res);entry!=NULL;entry=ldap_next_entry(id,entry)){dn=ldap_get_dn(ld,entry);printf(“dn:%s”,dn);/*dn:DNofmatchingentry*/ldap_memfree(dn);for(attr=ldap_first_attribute(ld,entry,&ptr);attr!=NULL;attr=ldap_next_attribute(ld,entry,ptr)){//foreachattributeprintf(“%s:”,attr);//printnameofattributevals=ldap_get_values(ld,entry,attr);for(i=0;vals[i]!=NULL;i++)printf(“%s”,vals[i]);//sinceattrscanbemultivaluedldap_value_free(vals);}}ldap_msgfree(res);
LDAPAPI(Cont.)LDAPAPIalsohasfunctionstocreate,updateanddeleteentriesEachfunctioncallbehavesasaseparatetransactionLDAPdoesnotsupportatomicityofupdates
DistributedDirectoryTreesOrganizationalinformationmaybesplitintomultipledirectoryinformationtreesSuffixofaDITgivesRDNtobetaggedontotoallentriestogetanoverallDNE.g.,twoDITs,onewithsuffixo=Lucent,c=USAandanotherwithsuffixo=Lucent,c=IndiaOrganizationsoftensplitupDITsbasedongeographicallocationorbyorganizationalstructureManyLDAPimplementationssupportreplication(master-slaveormulti-masterreplication)ofDITs(notpartofLDAP3standard)AnodeinaDITmaybeareferraltoanodeinanotherDITE.g.,Ou=BellLabsmayhaveaseparateDIT,andDITforo=Lucentmayhavealeafwithou=BellLabscontainingareferraltotheBellLabsDITReferallsarethekeytointegratingadistributedcollectionofdirectoriesWhenaservergetsaqueryreachingareferralnode,itmayeitherForwardquerytoreferredDITandreturnanswertoclient,orGivereferralbacktoclient,whichtransparentlysendsquerytoreferredDIT(withoutuserintervention)
EndofChapter
ThreePhaseCommit(3PC)Assumptions:NonetworkpartitioningAtanypoint,atleastonesitemustbeup.AtmostKsites(participantsaswellascoordinator)canfailPhase1:ObtainingPreliminaryDecision:Identicalto2PCPhase1.EverysiteisreadytocommitifinstructedtodosoPhase2of2PCissplitinto2phases,Phase2andPhase3of3PCInphase2coordinatormakesadecisionasin2PC(calledthepre-commitdecision)andrecordsitinmultiple(atleastK)sitesInphase3,coordinatorsendscommit/abortmessagetoallparticipatingsites,Under3PC,knowledgeofpre-commitdecisioncanbeusedtocommitdespitecoordinatorfailureAvoidsblockingproblemaslongas
您可能关注的文档
- 数据结构课件PPT110章全 第十章 内部排序old.ppt
- 数据库系统概念全套配套课件PPT ch13.ppt
- 数据库系统概念全套配套课件PPT ch5.ppt
- 数据库系统概念全套配套课件PPT ch3.ppt
- 数据库系统概念全套配套课件PPT ch1.ppt
- 数据库系统概念全套配套课件PPT ch25.ppt
- 数据库系统概念全套配套课件PPT ch24.ppt
- 数据库系统概念全套配套课件PPT ch17.ppt
- 数据库系统概念全套配套课件PPT ch20.ppt
- 数据库系统概念全套配套课件PPT ch11.ppt
- 数据库系统概念全套配套课件PPT ch14.ppt
- 数据库系统概念全套配套课件PPT ch10.ppt
- 数据库系统概念全套配套课件PPT ch8.ppt
- 数据库系统概念全套配套课件PPT appC.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 03土木工程材料.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 02土木工程材料.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 15土木工程材料.ppt
- 新编经济法高职财经大类专业基础课 配套教学课件PPT 第四章.ppt