Phd Journal: July 2014

These are some notes from

Adaptive workflow scheduling for dynamic grid and cloud computing environment, Rahman et al, Journal of concurrency and control, 2013

Definitions

Workflow: W(T,E) consists of a set of tasks
T= {T1,T2, Tx,.... Ty, Tn}
A set of task dependencies among the tasks,
E = {<Ta,Tb>, ....., <Tx,Ty>}
where Tx is the parent of Ty
R = R1,R2, .... Rx,.... Ry, Rm }
set of available resources in the computational grid.

Workflow Scheduling Problem :
represented a s DAG ( directed acyclic graph )
nodes : tasks
graphs edges : data dependenies

mapping of the WF tasks to grid resources (T --> R).
where makespan M is minimized.

Workflow Task : a set of instructions that can be executed on a single PE of a computing resource.
Entry Task: no parent task.
Exit task : no child task.
Ready task : task that has all of its parent tasks finished.

Schedule length / makespan : overall finish / completion time of an application

Objective : to minimize the makespan of a parallel application by proper allocation of tasks to processors/resources and arrangements of task execution sequences.

* It is an NP complete problem, i.e., problem can be reduced to be solved in polynomial time
Notes on NP time

NP stands for Non-deterministic Polynomial time.
This means that the problem can be solved in Polynomial time using a Non-deterministic Turing machine (like a regular Turing machine but also including a non-deterministic "choice" function). Basically, a solution has to be testable in poly time. If that's the case, and a known NP problem can be solved using the given problem with modified input (an NP problem can be reduced to the given problem) then the problem is NP complete.
The main thing to take away from an NP-complete problem is that it cannot be solved in polynomial time in any known way. NP-Hard/NP-Complete is a way of showing that certain classes of problems are not solvable in realistic time.

These NP-complete problems really come up all the time. Knowing they're hard lets you stop beating your head against a wall trying to solve them, and do something better:

Use a heuristic: If you can't quickly solve the problem with a good worst case time, maybe you can come up with a method for solving a reasonable fraction of the common cases.

Solve the problem approximately instead of exactly: A lot of the time it is possible to come up with a provably fast algorithm, that doesn't solve the problem exactly but comes up with a solution you can prove is close to right.

Use an exponential time solution anyway: If you really have to solve the problem exactly, you can settle down to writing an exponential time algorithm and stop worrying about finding a better solution.

Choose a better abstraction: The NP-complete abstract problem you're trying to solve presumably comes from ignoring some of the seemingly unimportant details of a more complicated real world problem. Perhaps some of those details shouldn't have been ignored, and make the difference between what you can and can't solve.

Most popular heuristics used for WF Scheduling

(A heuristic describes a rule or a method that comes from experience and helps you think through things, like the process of elimination, or the process of trial and error. You can think of a heuristic as a shortcut.)

Myopic
Min Min
Max-Min
HEFT

Metaheuristics used

(Meta-heuristics are general-purpose algorithms that can be applied to solve almost any optimization problem. They may be viewed as upper level general methodologies that can be used as a guiding strategy in designing underlying heuristic.)

GRASP
GA

dff

The most common performance metrics are

response time
According to IBM dictionary of Computing.
"the elapsed time between the end of an enquiry or demand on a computer system and the beginning of a response."
In the simulation I've done.... it will be the difference between start time and submit time, i.e.,
response time = start time - submit time.
In this case, it will be the same for all, because we are using time shared mode.

CPU Utilization
Came across this post from Rodrigo on calculating CPU utilization in cloudsim...
time vs. % of cpu utilization.
found this article on how to calculate CPU utilization.
another link on calculating CPU utilization.
another one here.

Throughput : the number of served requests.
time vs. number of served requests

Phd Journal

Friday, July 4, 2014

Notes on Workflow Scheduling and heuristics used

Definitions

Most popular heuristics used for WF Scheduling

Metaheuristics used

Discussion with Guide

To be done:

Performance metrics