Kinesia Online Course
Advanced Operating SystemsKinesia LLC, 2003
We are what we repeatedly do.
Excellence, then, is not an act, but a habit.
Aristotle
Distributed Systems Architecture
- What is a distributed system?
- It consists of multiple computers that do not share memory.
- Each Computer has its own memory and runs its own operating system.
- The computers can communicate with each other through a communication network.
- Why distributed systems?
Advantages of distributed systems over traditional time-sharing systems
- much better price/performance ratio
- resource sharing
- enhanced performance -- tasks can be executed concurrently
- higher reliability -- data replication
- easier modular expansion -- hardware and software resources
can be easily added without replacing existing resources
- Distributed OS
networking; transparent to users; virtual uniprocessor
A few issues need to be addressed
- lack of global knowledge of whole network
- very complex, no global memory and global clock, unpredictable delays
- Communication delays are at the core of the problem
- Information may become false before it can be acted upon
- these create some fundamental problems:
o no global clock -- scheduling based on fifo queue?
o no global state -- what is the state of a task? What is a correct program?
- naming
- name objects ( e.g. files, printers, users, services )
- namespace must be large
- unique (or at least unambiguous) names are needed
- name service maps a logical name to a physical address by
table-look-up or through an algorithm ( or a combination of two );
but in distributed OS, directories may be replicated --> synchronization problem
- mapping must be changeable, expandable, reliable, fast
- scalability
- system grows with time
- How large is the system designed for?
- How does increasing number of hosts affect overhead?
- broadcasting primitives, directories stored at every computer -- these design options will not work for large systems.
- compatibility
- binary -- all processes execute same code
- execution -- source codes can be compiled in any machine
- protocol -- use same rules for communication
- process synchronization
- difficult, because no shared memory and clock
- test-and-set-lock instruction won't work.
- Need all new synchronization mechanisms for distributed systems.
- resource management
- allocate local and remote resources in an effective manner
- Data migration: data are brought to the location that needs them.
o distributed filesystem (file migration)
o distributed shared memory (page migration)
- Computation migration: the computation migrates to another location.
o remote procedure call: computation is done at the remote machine.
o processes migration: processes are transferred to other processors.
- Data migration: data are brought to the location that needs them.
o distributed filesystem (file migration)
o distributed shared memory (page migration)
- distributed scheduling
- security
authentication, authorization
- structuring
defines how various parts of the OS are structured ( e.g. monolithic kernel,
collective kernel structure ( layered, separation of policy and
mechanisms ), object oriented OS )
- Client-Server computing model
- Communication Networks
ISO OSI model ( International Standard's Reference Model of Open Systems Interconnection )
| Host A Layers | | Host B Layers | | Examples |
| |
| Application | <--------> |
Application | |
MP3 |
| |
| Presentation | <--------> |
Presentation | |
|
| |
| Session | <--------> |
Session | |
|
| |
| Transport | <--------> |
Transport | |
HTTP, SMTP, FTP, etc. |
| |
| Network | <--------> |
Network | |
IP |
| |
| Datalink | <--------> |
Datalink | |
SDLC, HDLC |
| |
| Physical | <--------> |
Physical | |
RS 232, X.21, ISDN |
- Physical
raw bit streams
- Datalink
check and recover from errors during transmission
- Network
form bits into packets,
defines how packets are organized, assembled, disassembled and routed
- Transport
controls the traffic flow, reliable or unreliable communications
- Session
establishing and maintaining a connection known as a session
- Presentation
interface between a user program and the rest of the network
- Application
any application
Point-to-Point network
Other network topologies
- Communication Primitives
- Message Passing Model
send ( destination, buffer ); //buffer contains message to send
receive( source, buffer ); //buffer is used for saving message while receiving
nonblocking :
blocking:
send() does not return control to user immediately until the
message has been sent or until an acknowledgement has been
received
receive() does not return control to user until the message
is copied to user buffer
Advantages -- behavior of program predictable
Disadvantages -- lack of flexibility
synchronous:
send() is blocked until receive() is executed ( rendezvous )
asynchronous:
messages are buffered, not necessarily blocked
- Remote Procedure Calls ( RPC )
simplest way to implement client-server applications
- The caller process sends a call message to the server process and blocks (that is, waits) for a reply message. The call message contains the parameters of the procedure and the reply message contains the procedure results.
- When the caller receives the reply message, it gets the results of the procedure.
- The caller process then continues executing.
|
 |
- mechanisms: stub procedures
a call to client stub, client stub communicates with
server stub using RPC runtime library
- binding
- when a client makes an RPC call, a binding relationship
is established with the server:
o server address may be looked up by service-name
o or port number may be looked up
- binding information -- information location for a particular server
- protocol sequence -- containing a combination of communication
protocols used in client-server communication
- server host -- client needs to identify the host ( name ) IP address
- endpoint -- client needs to identify a server process on the
server host
- Parameter and Result Passing
Local procedure call
parameters
|
| ↓ |
stub procedure
|
| ↓ |
Convert parameters and
results to appropriate
form ( marshalling )
|
| ↓ |
Pack into buffer
|
| ↓ |
|
|
|
|
| |
Local procedure call
|
| ↑ |
Unpack message
( unmarshalling )
|
| ↑ |
Stub procedure
|
| ↑ |
|
-----------------------------------------------------------
network connection
|
- time-consuming if data conversion needs to be done on every call
- alternatively, send parameters with format code, let receiver
do the conversion
but this requires machine hnow how to convert all formats,
also leads to poor portability
- alternatively, each data type may be in standard format in message
sender converts data to standard format
receiver converts standard format to its local representations
wasteful if both sender and receiver use same internal representation
- by value -- simple, stub procedure simply copies values ( parameters ) into message
- by reference -- complicated, global memory
- Example
Java APIs for XML-based Remote Procedure Call (JAX-RPC)
help with Web service interoperability and accessibility by defining
Java APIs that Java applications use to develop and access
Web services. JAX-RPC fully embraces the heterogeneous nature of
Web services -- it allows a JAX-RPC client to talk to another Web
service deployed on a different platform and coded in a different
language. Similarly, it also allows clients on other platforms and
coded in different languages to talk to a JAX-RPC
service
- Error handling, semantics and corrections
in case of failure
if remote server slow, client may suspect message loss and invoke
RP more than once
if client crashes, server needs to wait to send back results ( wasting time )
- At-least-once semantics
if RPC succeeds => at least one execution of RPC in remote machine,
if fails, it is possible that 0, partial, 1 or more execution has taken place
- Exactly-once semantics
RPC succeeds => exactly one execution of RPC
fails, it is possible 0, partial or 1 execution has taken place
no call will be executed more than one time ( no time out )
- At-most-once semantics
same as exactly-once but calls that do not terminate normally
do not produce side effects
- Correctness condition
Ci denotes a call made by a machine
Wi denotes corresponding computation of called machines
C1 → C2 ( C1 happened before C2 )
Correctness criteria: C1 → C2 => W1 → W2
- Other issues
- connection-oriented
virtual-circuit, e.g. TCP
more reliable, complex
- connectionless-oriented
datagram, e.g. UDP ( user datagram protocol )
less complex, less reliable
- incremental results -- rpc cannot easily return incremental results
- protocol flexibility -- cannot be freely stored in memory,
and thus make the protocol inflexible