Kinesia Online Course
Advanced Operating Systems
Kinesia LLC, 2003

    1. Review and Overview
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Theoretical Foundations
    5. Distributed Mutual Exclusions
        6. Agreement Protocols
    7. Distributed Resource Management
    8. Distributed Scheduling
    9. Secutiry and Protection
    10. Recovery and Fault Tolerance
     
    Distributed Mutual Exclusion

    
    Life consists not in holding good cards but in playing those you hold well.
    
    							Josh Billings
    
    1. Introduction nontoken-based
      token-based

    2. Requirements of Mutual Exclusion Algorithms
      • only one request accessess the CS at a time ( primary goal )
      • Freedom from deadlocks
      • Freedom from starvation
      • Fairness
      • Fault Tolerance

    3. Performance of a mutual exclusion algorithm

      System throughput S ( rate at which the system executes requests for the CS )

        S = 1
        --------
        Sd + E
      Sd = synchronization delay
      E = average execution time
    4. low load and high load performance
    5. best and worst case performance; if fluctuates statistically, take average
    6. Non-token-based algorithms
    7. a) Lamport's Algorithm

      Si -- site, N sites

      each site maintains a request set

        Ri = { S1, S2, ..., SN }
      request-queuei containing mutual exclusion requests ordered by their timestamps,
      use => total order relation ( with Lamport's clock )
      tsi -- timestamp of site i
      Assume
      • messages are received in the same order as they are sent
      • eventually every message is received

      1. To request entering the CS, process Pi sends a REQUEST( tsi, i ) message to every process ( including itself ) puts the rquest on request-queuei

      2. When process Pj receives REQUEST(tsi, i ), it places it on its request-queuej and sends a timestamped REPLY ( acknowledgement ) to Pi

      3. Process Pi enters CS when the following 2 conditions are satisfied:
        • Pi's request is at the head of request-queuei
        • Pi has received a ( REPLY ) message from every other process time-stamped later than tsi

      4. When exiting the CS, process Pi removes its request from head of its request-queue and sends a timestamped RELEASE to every other process

      5. When Pj receives a RELEASE from Pi, it removes Pi's request from its request queue.

        Performance
        for each CS invocation

        		(N-1) REQUEST
        		(N-1) REPLY
        		(N-1) RELEASE
        		total 3(N-1) messages
        	  synchronization delay Sd = average delay
        		
      Ricart, Agrawala optimized Lamport's algorithm by merging the RELEASE and REPLY messages.

    8. b) Maekawa's Algorithm

      request sets

      N = { 1, 2, ..., N }

      Ri ∩ Rj ≠ ∅   all i, j ∈ N

      A site can send a REPLY ( LOCKED ) message only if it has not been LOCKED.

      Properties:

      1. Ri ∩ Rj ≠ ∅
      2. Si ∈ Ri
      3. |Ri| = K     for all i ∈ N
      4. any site Si is in K number of Ri's

      Maekawa found that:

        N = K ( K - 1 ) + 1

        or K = |Ri| ≈ √N

      Messages exchange:

        Failed -- F,   Sj cannot grant permission to Sk because Sj has granted permission to a site with higher request priority.
        Inquire -- I,   Sj wants to find out if Sk has successfully locked all sites.
        Yield -- Y,   Sj yields to Sk
      Sj sends F to Sk if Sk's request has lower priority; otherwise Sj sends Y to Sk
      (i.e. A node will yield to others if the priority of its request is lower than that of any other conflicting request. The request's priority is determined by its sequence number ( timestamp ); the samller the sequence number, the higher the priority; if sequence # same, the one with smaller site number has higher priority )

      Algorithm:

      1. A site Si requests access to CS by sending REQUEST(i) messages to all the sites in its request set Ri
      2. When a site Sj receives the REQUEST(i) message, it sends a REPLY(j) message to Si provided it hasn't sent any REPLY to any site since last RELEASE. Otherwise, it queues up the REQUEST.
      3. Site Si could access the CS only after it has received REPLY from all sites in Ri

      Example
      13 nodes, 13 = 4(4-1) + 1, thus K = 4

        R1 = { 1, 2, 3, 4 }
        R2 = { 2, 5, 8, 11 }
        R3 = { 3, 6, 8, 13 }
        R4 = { 4, 6, 10, 11 }
        R5 = { 1, 5, 6, 7 }
        R6 = { 2, 6, 9, 12 }
        R7 = { 2, 7, 10, 13 }
        R8 = { 1, 8, 9, 10 }
        R9 = { 3, 7, 9, 11 }
        R10 = { 3, 5, 10, 12 }
        R11 = { 1, 11, 12, 13 }
        R12 = { 4, 7, 8, 12 }
        R13 = { 4, 5, 9, 13 }

      suppose sites 11, 8, 7 want to enter CS; they all send requests with sequence number 1.

      1. site 11 wants to enter; requests have arrived at 12, 13; R to 1 is on the way
      2. 7 wants to enter CS; R arrived at 2 and 10 but R to 13 is on its way
      3. 8 also wants to enter CS; sends R to 1, 9, 10 but fails to lock 10 because 10 has been locked by 7 with higher priority

      4. R from 11 finally arrived at 1 and R from 7 arrived at 13

        diagram

        11, 7, 8 are circularly locked:

      5. 8 receives F and cannot enter CS
      6. 11 receives F and cannot enter CS
      7. 7 cannot enter CS because it has not received all REPLY ( LOCKED) messages
      8. 13 is locked by 11 ( has larger timestamp ) and receives request from 7, so it sends an INQUIRE to 11 to ask it to yield
      9. When 11 receives an INQUIRE, it knows that it cannot enter CS; therefore it sends a YIELD to 13
      10. then 13 can send L to 7 which enters CS
      11. when 7 finished, sends RELEASE
      12. then 8 locks all members, ... , sends RELEASE
      13. then 11 enters

        diagram

    9. Token-based algorithms

    10. Principles
      • one token, shared among all sites
      • site can enter its CS iff it holds token
      • The major difference is the way the token is searched
      • use sequence numbers instead of timestamps o used to distinguish requests from same site
        o kept independently for each site
      • The proof of mutual exclusion is trivial
      • The proof of other issues (deadlock and starvation) may be less so

    11. a) Suzuki-Kasami's Broadcast Algorithm

      • TOKEN -- a special PRIVILEGE message
      • node owns TOKEN can enter CS
      • initially node 1 has the TOKEN
      • node holding TOKEN can execute CS repeatedly if no request from others comes
      • if a node wants TOKEN, it broadcasts a REQUEST message to all other nodes
      • node:
        REQUEST(j, n)
          node j requesting n-th CS invocation n = 1, 2, 3, ... , sequence #
        node i receives REQUEST from j update RNi[j] = max ( RNi[j], n )
        RNi[j] = largest seq # received so far from node j

      • TOKEN:
        TOKEN(Q, LN ) ( suppose at node i ) Q -- queue of requesting nodes
        LN -- array of size N such that
           LN[j] = the seq # of the request of node j granted most recently
        When node i finished executing CS, it does the following
        1. set LN[i] = RNi[i] to indicate that current request of node i has been granted ( executed )
        2. all node k such that RNi[k] > LN[i] (i.e. node k requesting ) is appended to Q if its not there
        When these updates are complete, if Q is not empty, the front node is deleted and TOKEN is sent there

        FCFS

        Example:

        There are three processes, p1, p2, and p3.
        p1 and p3 seek mutually exclusive access to a shared resource.

        Initially: the token is at p2 and the token's state is LN = [0, 0, 0] and Q empty;

        p1's state is: n1 ( seq # ) = 0, RN1 = [0, 0, 0];
        p2's state is: n2 = 0, RN2 = [0, 0, 0];
        p3's state is: n3 = 0, RN3 = [0, 0, 0];

      • p1 sends REQUEST(1, 1) to p2 and p3; p1: n1 = 1, RN1 = [ 1, 0, 0 ]

      • p3 sends REQUEST(3, 1) to p1 and p2; p3: n3 = 1, RN3 = [ 0, 0, 1 ]

      • p2 receives REQUEST(1, 1) from p1; p2: n2 = 1, RN2 = [ 1, 0, 0 ], holding token

      • p2 sends the token to p1

      • p1 receives REQUEST(3, 1) from p3: n1 = 1, RN1 = [ 1, 0, 1 ]

      • p3 receives REQUEST(1, 1) from p1; p3: n3 = 1, RN3 = [ 1, 0, 1 ]

      • p1 receives the token from p2

      • p1 enters the critical section

      • p1 exits the critical section and sets the token's state to LN = [ 1, 0, 0 ] and Q = ( 3 )
      • p1 sends the token to p3; p1: n1 = 2, RN1 = [ 1, 0, 1 ], holding token; token's state is LN = [ 1, 0, 0 ] and Q empty

      • p3 receives the token from p1; p3: n3 = 1, RN3 = [ 1, 0, 1 ], holding token
      • p3 enters the critical section

      • p3 exits the critical section and sets the token's state to LN = [ 1, 0, 1 ] and Q empty
      • Performance:
      • It requires at most N message exchange per CS execution ( (N-1) REQUEST messages + TOKEN message
      • or 0 message if TOKEN is in the site
      • synchronization delay is 0 or T
      • deadlock free ( because of TOKEN requirement )
      • no starvation ( i.e. a requesting site enters CS in finite time )
      • Comparison of Lamport and Suzuki-Kazami Algorithms


        The essential difference is in who keeps the queue. In one case every site keeps its own local copy of the queue. In the other case, the queue is passed around within the token.

    12. b)Raymond's Tree-based Algorithm

      sites are arranged logically as a tree ( e.g. minimal spanning tree ) edges are directed toward the site that holds the token

      each site has a variable HOLDER -- indicates the location of TOKEN relative to the site ( node ) itself

      		HOLDER(A) = D
      		HOLDER(B) = A
      		HOLDER(C) = A
      		HOLDER(D) = E
      		HOLDER(E) = self
      		HOLDER(F) = D
      
      	  
    13. When a nontoken-node (e.g. A ) wants to enter CS, it sends REQUEST to HOLDER(A) ( i.e. D )
    14. D sends REQUEST to HOLDER(D), i.e. E
    15. when E no longer needs TOKEN, it sends TOKEN to one of its neighbours who has requested TOKEN
    16. Information held by each node
        HOLDER HOLDER(x)

        USING

        = TRUE means using TOKEN

        REQUEST_Q

        FIFO, holds neighbor requests

        ASKED

        = TRUE, when a non-token holding node has sent a REQUEST to current holder

    17. Necessary requirement for a node k to send TOKEN to requesting node
        ( HOLDER(k) = self ) and ( NOT USING ) and ( REQUST_Q(k) ≠ empty ) and ( head( REQUEST_Q(k) ) ≠ self )

    18. node i MAKE_REQUEST ( either for itself or for others ):
        if ( ( HOLDER(i) ≠ self ) and ( REQUEST_Q(i) ≠ empty ) and ( NOT ASKED ) ) { send REQUEST to HOLDER;
        set ASKED = TRUE;
        }

    19. Sj receives request from Si
        if Sj is holding token o send token to Si
        o set HOLDER(j) to Si
        if Sj is not holding token o place request in REQUEST_Q(j)

    20. Si Entering CS A site enters CS when it receives the TOKEN and its own entry and head ( REQUEST_Q(i)) == self
    21. Si Exit CS
      • If ( REQUEST_Q(i) &ne empty ), delete front, send TOKEN to that site and set HOLDER(i) to that site
      • If ( REQUEST_Q(i) &ne empty ), send REQUEST to HOLDER(i)
      • set USING = false

      Performance

      • no deadlock ( because no circular wait )
      • no starvation
      • synchronization delay T(log N ) / 2     ( T = average message delay )
      • messages
        At light load, 2(log N )/2 messages per CS execution
        At heavy load 4 messages per CS because:
        N nodes, TOKEN vists all nodes once and back to original node, need to traverse edge twice 2(N-1)
        A TOKEN message travels along an edge in response to a REQUEST message
        thus a total 4(N-1) messages are sent among N nodes
        # of messages per CS execution ~ 4(N-1) / N ~ 4