Kinesia Online Course
Advanced Operating Systems
Kinesia LLC, 2003

    1. Review and Overview
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Theoretical Foundations
    5. Distributed Mutual Exclusions
        6. Agreement Protocols
    7. Distributed Resource Management
    8. Distributed Scheduling
    9. Secutiry and Protection
    10. Recovery and Fault Tolerance
     
    Distributed Resource Management

    1. Distributed File Systems
    2. Overview
      most visible part of OS, organized with tree structure

    3. Architecture
      • file servers and file clients interconnected by a communication network.
      • two most important components: name server and cache manager.
      • name server: map names to stored objects (files, directories)
      • cache manager: perform file caching. Can present on both servers and clients.
          o Cache on the client deals with network latency
          o Cache on the server deals with disk latency

    4. Typical steps to access data
      • Check client cache, if present, return data
      • Check local disk, if present, load into local cache, return data
      • Send request to file server
      • ...
      • server checks cache, if present, load into client cache, return data
      • disk read
      • load into server cache
      • load into client cache
      • return data

    5. Mechanisms:
      • mounting: binding together different filename spaces to form a single hierarchically structured name space. o The mount table maintains the mapping from mount points to storage devices.
        o Can be used in distributed file systems to do name resolution. Where to maintain the mount information?
        • at the clients: (NFS) different clients may see different files
        • at the server: all clients may see the same filename space. Good when files move around the servers.
      • Caching: a common technique to exploit locality and reduce delays.
      • Hints: caching without cache coherence. Cache coherence operations too expensive in the distributed systems (too many communications). Hints work well for applications that can recover from invalid data (e.g. address mapping)
      • Bulk Data Transfer: communication cost = S + C*B, S: startup cost, mostly software, C: per byte cost, B: number of bytes to send.
      • Encryption: typical method to enforce security.

    6. Concerns
    7. Naming
    8. Data transfer
    9. Locking and access synchronization
    10. Coherency
    11. Security
    12. Case Studies

    13. Sun NFS ( Network File System )

      At application layer of TCP/IP model, make use of RPC


      When someone access a file over NFS, the kernel places an RPC call to nfsd ( the NFS daemon ) on the server machine

      $cat /proc/filesystems displays file systems it supports

      stateless

      Caching

    14. cache consistency

      treat cache data as links -- cached data are not expected to be completely accurate

      e.g. NFS tries to make server as simple as possible and let clients do most of the work

      NFS does not specify a mechanism for assuring that cached copies of data blocks are consistent
      Each implementation is free to use its own method of determining consistency

      A typical approach is

        client inquires periodically about last modification time for a given block

        if client detects that the copy has been modified since the client received it, it can invalidate its own copy

    15. Virtual File System ( VFS ) interface -- defines the procedures that operate on the file as a whole

      By supporting different VFS interfaces, NFS can support different file systems

      a virtual node ( vnode ) for every object

      vnodes are unique reimplementation of inodes

      contains a numerical designator that is network node unique

      Thus VFS distinguishes local files from remote ones, and local files are further distinguished according to their file-system types

      Client
      System Call
      OS maps this call to a VFS
      operation on the appropriate vnode
      VFS identifies the call as remote
      and invoke NFS procedure
      RPC Call
      |
      |
                                v ……… ………
      ---------
      VFS finds its local
      invokes NFS
      RPC Procedure

      |
      ……… ………                           

      • client and server are identical, a machine can be client or server
      • standard -- specifying how files should be encoded, data conversion
        XDR ( External Data Representation ) is used to describe RPC protocols in a system-independent way
      • NFS servers are stateless: file servers do not matain records of past requests, clients requests are completely self-contained ( in client )
        robust under crashes of server
      • name resolution -- process of mapping a name to an object or objects
      • How does NFS lookup a file given by the filename /a/b/c in which a corresponds to vnode1:
        • lookup on vnode1/b
        • might return vnode2 for component b and indicates that the object is on Server X
        • next, Server X looks up
        • file handle returns to client by Server X

    16. Intermezzo File System

      See Lab 2

    17. Distributed Shared Memory ( DSM )

      Node 1
       
      Memory
       
       
      Mapping
      Manager
       
      Node 2
       
      Memory
       
       
      Mapping
      Manager
      . . . . . .
      Node 3
       
      Memory
       
       
      Mapping
      Manager
            *                                         *                                                                          *
                      *                                  *                                                            *
                                *                           *                                              *
                                          *                    *                                *
                                                    *             *                  *
      Shared Memory

      Advantages of DSM:

    18. Easier programming:
      • No need to deal with communication details, unlike message passing model ( e.g. RPC ), data movement is transparent to users
      • Easy to handle complex data structures
    19. DSM systems are much cheaper than tightly coupled multiprocessor systems (DSMs can be built over commodity components).
    20. DSM takes advantages of the memory reference locality -- data are moved in the unit of pages.
    21. DSM can form a large physical memory.
    22. Programs written for shared memory multiprocessors can easily be ported to DSMs
    23. Challenges of DSM:

    24. How to keep track of the location of remote data?
    25. How to overcome the communication delays and high overhead associated with the references to remote data?
    26. How to allow "controlled" concurrent accesses to shared data?
    27. Mapping manager:

      Basic Concept of
      Shared Memory
      Actual Implementation

      Shared memory partitioned into pages

      pages |
      |
      -


      -
      read-only -- can have copies reside in physical memories
      of many processors at the same time
      write -- can reside in only one processor's physical memory

      A memory reference causes a page fault when the page containing the memory location is not in a processor's current physical memory.
      When this happens, M.M.M. retrieves the page from either disk or from the memory of another processor
      If the page also has copies in other nodes, then some work must be done to keep the memory coherent.

      A parallel program is a set of threads or processes that share a virtual address space.

      allow processes of a program to execute on different processors in parallel

      Implementations

    28. Central Server
      • A central-server ( centralized manager at a single processor ) maintains all the shared data.
      • Manager maintains mutual exclusive access to data.
      • for read: the server just return the data
      • for write: update the data and send acknowledgement to the client
      • Data can be distributed -- need a directory to store the location of a page, done by maintaining a table which has one entry for each page
      • The central manager can be a bottle neck.

    29. Data Migration
    30. Each node has a MM
    31. Data blocks are sent to the request location; subsequent accesses can be performed locally
    32. For both read/write: get the remote page to the local machine, then perform the operation.
    33. Keeping track of memory location: location service, home machine for each page, broadcast.
    34. Problems: thrashing -- pages move between nodes frequently, false sharing
    35. Multiple reads can be costly.
    36. Read-replication
    37. In previous approaches, only processes on one node could access shared data at any one moment
    38. read-replication -- replicating data blocks, allow multiple nodes to have read access and one node to have write-access
    39. a write to a copy causes all copies of the data to be updated or invalidated
    40. All locations must be kept track of: locations of service/home machines
    41. Full-replication
      • Allows multiple read and multiple write concurrently
      • Must control the access to shared memory
      • Need to maintain consistency
        e.g. use a gap-free sequencer, all nodes wishing to modify shared data will send the modification to a sequencer which multicast the modifications

        if gap between sequence # => something missing → request retransmission

    42. Memory Coherence

    43. The needs to make the data replicas consistent
    44. Two types of basic protocols
      • Write-Invalidate Protocol: a write to a shared data causes the invalidation of all copies except one
      • Write-Update Protocol: a write to a shared data causes all copies of that data to be updated
      • Case Study: Cache coherence in the PLUS system.
      • Write-update protocol
      • Supports general consistency ( all the copies of a memory location eventually contain the same data when all the writes issued by every processor have completed )
      • A memory coherence manager ( MCM ) running at each node is responsible for maintaining the consistency.
      • Unit of replication: a page (4KB)
      • Coherence maintenance in the unit of one word ( 32 bit )
      • A virtual page in PLUS corresponds to a list of replicas, one of the replica is the master copy. The locations of other replicas are maintained through a distributed link list (copy list)
      • On a read fault: if local memory, read local memory. Otherwise, send request to a specified remote node and get the data
      • For write: First update the master copy and then propagated to the copies linked by the copy list. On a write fault, if the address indicates a remote node, the update request is sent to the remote node. If the copy is not the master copy, the update request is sent to the node containing the master copy for updating and then further propagation. write is nonblocking.
      • read is blocked until all writes completes
      • write-fence is used to flush all previous writes
    45. Granularity

    46. Granularity: size of the shared memory unit
    47. the page size is usually a multiple of the size provided by the underlying hardware and memory management system.
    48. large page size -- more locality, less communication overheads, more contention, more false sharing
    49. separate the unit of replication and the unit for coherence maintenance.
    50. Page replacement

    51. traditional methods like least recently used (LRU) may not be appropriate as data need to be moved
    52. data can be accessed in different mode: shared, private, read-only, writable, etc.
    53. replacement policy needs to take access modes into consideration.e.g. private data should be replaced before shared data. READ-only page can just be deleted.