.nr PO 1.5i
.sp 2i
.ft B
.ce 2
The MultiProcessor Simulator\(co / Experimental Version 1.0
.sp 2
.ft R
by F\*'elix E. Quevedo Stuva
.sp 4
.ad b
.fi
.ls 2
.NH 1
Nice test program
.PP
This example has no objective other than to test the communication
and shared-memory features of the simulator. It allocates a
shared-memory segment and sends a round-robin greeting message.
.PP
The host program (hih) loads the processor program (hi)
onto the available processors. It then creates the shared-memory segment
and saves the pointer. Finally, it sends a greeting message of type 1
to processor 0 and waits for a message whose type is the number of
processors plus 1. Once that message is received, it displays the
contents of the shared-memory segment.
.PP
Each processor program waits for a message of any type. Once one is
received, it attaches the shared-memory segment and writes its processor
number into the area whose index equals that number. Finally, it sends a
message to the next processor, incrementing the received type by 1.
If the processor is the last one, it sends the message to the host instead.
.bp
.sp 5
.NH 2
Maximum Finding Algorithm, CRCW SM MIMD.
.PP
This example uses the shared-memory feature of MPsim. It also shows
how to build a user-level semaphore and how to simulate a SIMD machine.
This algorithm is known to be the fastest for this model.
.PP
The host program creates the shared segment, prepares the data, and loads the
processor programs one at a time. Once all the processors are loaded, the
host waits for all of them to complete the first step. When all the
processors have completed the first step, it resets the step flag and
waits for the second step. Finally, it displays the results from the
corresponding shared-memory area.
.PP
In the first step, each processor program obtains the maximum of its two
assigned values, marking the loser vector. It then increments the flag
and, if it is a second-step processor, waits for the host to reset the flag.
Once the host resets the flag, the processor verifies whether its assigned
loser-index value is a winner (not a loser) and, if so, saves the maximum
value to the corresponding shared-memory area.
.bp
.sp 5
.NH 2
Maximum Finding Algorithm for a Binary Tree Machine, mapped onto a
Linear Array Computer.
.PP
This example uses the interprocess-communication feature. It also simulates
the mapping of a Binary Tree Machine onto a simulated Linear Array Computer.
The algorithm is known to be cost optimal.
.PP
The host program prepares the data, loads the processor program onto all
the available processors, and sends each one its assigned partition.
Finally, it waits for the result as a message from the root processor.
.PP
Each processor program receives its partition, finds the maximum, and sends
the result to its parent. At this point the mapping of a binary tree
machine onto a linear array computer, proposed by Sarkar and Deo [1988],
comes into effect. Depending on the stage of the algorithm, a processor
will receive, process, and send; or it will only receive and send. Finally,
if the processor is the mapped root of the binary tree, it sends the result
to the host process.
.PP
The output attached to the implementation shows step by step how the
communication is performed. Observe how the execution of the processors
(nodes) is completely asynchronous at the beginning.
.bp
.sp 5
.NH 2
Mesh Matrix Multiplication
.PP
This example uses the interprocess-communication feature, simulating a Mesh
network. The basic algorithm was taken from Akl [1989], page 179.
.PP
The host program sets up by loading the processor program and sending
the matrix dimension as messages. It then prepares the data and sends the
elements, as described in the algorithm, as messages of type 1 (horizontal)
and type 2 (vertical). Finally, it waits for the result from each processor.
.PP
After setup, each processor program waits for messages of types 1 and 2.
Once both are received, it performs the multiplication. After the given
number of operations, it sends its result to the host as a message of type 3.
.PP
The attached output shows the result of multiplying two matrices and the
status of each processor after execution.
.bp
.sp 5
.NH 2
Prim's MST Algorithm for a Hypercube machine.
.PP
This example uses the interprocess-communication feature, simulating a
Hypercube machine.
.PP
The host sets up by loading the processor program and sending the dimension
and the partition of the graph. It then waits for node 0 to send the paths
of the MST.
.PP
Each processor program finds the minimum at each entry level and sends it
to its parent. A parent receives its children's results and compares them
with the one it obtained. After receiving from all its children, the parent
selects the smallest and sends the result to its own parent, and so on.
If the processor is node 0, it sends the new MST path to the host and the
new entry to its child, which forwards it to its child, and so on down to
the leaf nodes.
.bp
.sp 5
.NH 1
Implementation
.PP
The simulator has been implemented in C under UNIX. Its portability
has been verified between a SUN 3/60 and a VaxStation II. In both cases
the main modules worked, but with no guarantee of full feature
capability.
.PP
The implementation consists of the following files:
.IP a)
mpsim.c: the interactive module. Contains the
.I main()
and the
.I _stop_handler()
functions.
.IP b)
mplib.c: the common function library. Contains the
.I
mpopen(), mptime(), mpprint(), mperro(), mehost(), _mplnode()
.R
functions.
.IP c)
mpsys.c: the system function library. Contains the
.I
_handler_req(), _handler_chl(), _handler_sys(), _mpinit(), _mpkill(),
_mplhost(), _mprun(), _mpstart(), _mpstatus(), _mpterm(),
.R and
_mpwait()
.R
functions.
.IP d)
mpusr.c: the user function library. Contains the
.I
_pause_handler()*, _getpt(), _trchd(), _imhost(), mprnd(), mpdim(),
mynode(), load(), _mvstr(), _rcvm(), recv(), trcv()*, send(), 
probe(), crtshm(), attshm(), detshm()
.R and
slpmsq()*
.R
functions. The ones marked with a * are still in an experimental phase.
