Tuesday, August 26, 2014

Notes on A market-oriented hierarchical scheduling strategy in cloud WF systems

Notes on A market-oriented hierarchical scheduling strategy in cloud WF systems

Authors
Zhangjun Wu, Zhiwei Ni : Hefei Univ of Tech., China
Zhangjun Wu, Xiao Liu, Dong Yuan, Yun Yang - Swinburne Univ of Tech, Australia

A market-oriented hierarchical scheduling strategy is proposed.
service-level scheduling deals with task-to-service assignment
tasks in indv. WF instances are mapped to cloud services in the global cloud markets based on functional and non-functional QoS req.
task-level scheduling : deals with optimization of the Task -to-VM assignment in local cloud data centres.
Running cost of cloud WF systems will be minimized give the satisfaction of QoS constraints for individual tasks.
A package based random scheduling algorithm is presented for candidate service-level scheduling algorithm
3 metaheuristic scheduling algorithms implemented for candidate task-level scheduling algorithm

  • GA
  • ACO - best performance.
  • PSO

Performance metric :
Optimization rate on makespan,
Optimization rate on cost
CPU time



Why do research?

Found this article inspiring and helpful when sometimes I lost track of what exactly i'm doing

Doing a PhD – Preparing for a Career in Research 

“The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skill. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science.” – Albert Einstein 

What is a problem formulation? The problem formulation consists of just one sentence and should make it clear to everyone what research problem, you aim to address and to whom and where it is relevant. In other words, the problem formulation is the heart (or core) of your thesis to which you should always return if you lose track during your further research and writing process

The problem formulation is based on the rationale you reached through your explorative search and may be the first thing you write related to your thesis. The aim of a problem formulation is also to set a framework for your research and a good problem formulation is essential for completing a good study.

Having read all that... I feel inspired... but still lost :(

Monday, August 11, 2014

Notes from agent based elastic cloud bag-of-tasks - Garcia et al

A family of heuristics for agent-based elastic Cloud bag-of-tasks concurrent scheduling

September 2013

Authors : 
J. Octavio Guitierrez -Garcia, Post doc in Korea (now in ITAM )
Kwang Mong Sim, United Kingdom.
Published in FGCS, link

In this paper they have presented the results of 14 scheduling heuristics which they have used to schedule BoTs in a dynamic environment using agents communicating and coordination using Contract Net Protocol.

BoTs : numerous unconnected (i.e. without precedence constraints) tasks which can be highly parallelized given their unconnected nature.

Classification of scheduling heuristics

immediate mode : map BoT tasks to cloud resources as soon as they arrive at the scheduler, eg. FCFS
batch mode : pre-schedule BoT tasks on a previously defined set of cloud resources before starting the execution, eg. Min-min, Max-min.

Aim : to maximize resource utilization

Why agents?  They are endowed with distributed and cooperative problem solving techniques.

Contract Net Protocol : allow automated and dynamic selection and composition of cloud resources from a pool of cloud providers to execute BoTs.

Why agent based problem solving technique?  it supports distributed, concurrent, and parallel scheduling and execution of BoTs in heterogeneous sets of dynamically provisionned cloud resources allocated in terms of 1 hour time slots (after the model in Amazon EC2)

Scheduling heuristics - derived from a combination of the following two phases

1) Task Ordering 
  • Unordered (U)
  • Tasks ordered by size from large to small (LtoS)
  • Tasks ordered by sized from small to large (StoL)
2)Task Mapping Policies
  • Random Mapping(R)
  • Maximum expected remaining allocation time mapping (MaxET)
  • Maximum current remaining allocation time mapping (MaxCT)
  • Minimum expected remaining allocation time mapping (MinET)
  • Minimum current remaining allocation time mapping (MinCT)
{U, LtoS, StoL) X {R, MaxET, MaxCT, MinET, MinCT} /{U,R}

{U,R} represents a completely random immediate mode scheduling heuristic that follows FCFS allocation policy... therefore this is used for comparison.

Implementation

It is implemented using JADE (Jave Agent Development framework).... (finally! something familiar :) )


Important Observations and conclusions from Garcia et al (that I feel may be relevant for my work )
  •  BoTs containing many small tasks and few big tasks scheduled using the MaxET task mapping policy ( {U, LtoS, StoL} X {MaxET}) had the shortest makespans regardless of the task ordering type.
  • Due to the NP-complete nature of the scheduling problem, there was not dominant scheduling heuristic for all the BoT types ( uniformly, normally, right-normally and left-half normally distributed BoT's ), neither the proposed cloud scheduling heuristics based on the remaining allocation time of cloud resources nor the benchmark scheduling heuristics.
  • Agents in the testbed were capable of successfully executing and scheduling BoTs concurrently with a 100% success rate for randomly generated BoTs of a wide variety of sizes in randomly generated cloud environments even with high execution request rates and handling multiple BoTs at the same time.
  • agents in the testbed efficiently scheduled and executed BoTs by sending only a linearly increasing number of messages among CAs, BAs, SPAs and RAs that depends only in the number of tasks contained in BoTs.
  • Agents' distributed and cooperative problem solving techniques are efficient enough to support concurrent scheduling and execution of multiple BoTs with extremely high execution request rates and stringent constraints.

Performance Metrics

  • Av BoT makespan
  • BoT Success rate
  • Av. BoT scheduling overhead time
  • Av. no. of msg exchanged for executing BoT's
Found this in future research directions useful
Designing cloud scheduling heuristics for handling data 

Scheduling in hybrid cloud notes from Bosche et al

Objective of hybrid clouds :

  The usage of internal infrastructure in tandem with public cloud services.

Hybrid Clouds

In a hybrid cloud setting, the execution of applications occurs
through the deployment of virtual machine instances on resources
internal to the consumer organization, as well as resources
provided by the public cloud providers.

Characteristics of Provider


Need for research :

  optimization problem of allocating resources in a cost-optimal manner with support of QoS such as deadlines.

From what I understand, they are calculating cost and the tasks are being scheduling in public or private cloud... whichever is cheaper
A linear optimization solution was done in their previous work for static jobs... they concluded it is not feasible.

Another dead end... I really thought I can formulate a problem with the future work in this paper... :(

Friday, August 8, 2014

Popular heuristics for Workflow Scheduling

Myopic

It is an individual task scheduling heruristic
It is the simplest scheduling method for scheduling workflow applications.
It makes scheduling decisions based on individual task.
System Implemented : Condor DAGMan
  It schedules an unmapped ready task in arbitrary order to the resource, which is expected to complete the task earliest.

Min-min

  This prioritizes workflow tasks based on their priorities
It assigns task priority on the basis of its expected completion time (ECT) on a resource.
It organizes workflow tasks on the basis of its expected ECT on a resource.
It organizes WF tasks into several independent task groups
schedules each group of independent tasks interactively.
In every iteration, it takes the set of all unmapped independent tasks T.
generates the minimum ECTs (MCT) for each task t in T where MCT t = min ECT (t,r) or every resource in R.

** It considers all unmapped tasks during each mapping decision.
In comparison, Myopic only considers one task at a time.
Systems implemented in : vGrADS, Pegasus
http://vgrads.rice.edu/

Max -min

It is similar to min-min
it sets the priority of the task to that which requires the longest execution time rather than shortest execution time.
At each iteration after attaining MCT values for all unmapped independent tasks, tasks having maximum MCT is chosen to be scheduled earliest.  Any attempts to minimize WF execution time by assigning longer tasks to comparitively best resources.
System Implemented in : Pegasus

HEFT (Heterogeneous Earliest Finish Time)

It is a list scheduling algorithm
gives higher priority to WF task having higher rank value
rank value:
  It is calculated by utilizing avg. execution time for each task and avg. communication times between resources of two successive tasks.
  Tasks in CP have comparitively higher rank values.
It sorts the tasks in decreasing order of their rank values.
tasks with higher rank value have higher priority.
tasks are scheduled in order of their priorities.
each task is assigned to the resource that can complete the task at the earliest time.
WF is represented as DAG and rank values are calculated by traversing the task graph in BFS (Breadth First Search) manner. in reverse order of task dependencies.

Advantage over min-min and max-min

while assigning priorities to tasks, it considers entire WF rather than focusing on only unmapped independent tasks.
Systems Implemented : ASKALON WF manager to provide scheduling for quantum chemistry appln WIEN2K

Metaheuristics

GRASP: Greedy ranodmized adaptive search procedure

iterative radomized search technique
A number of iterations are conducted to search a possible optimal solution for mapping tasks on resources.
A solution is generated at each iterative step.
best solution is kept as the final schedule.
It terminates when the specified termination criterion (no. of iterations) is satisfied.
* can generate better schedules than other scheduling techniques.
* It searches the whole solution space considering entire workflow and available resources.

GA : Genetic Algorithm

similar to GRASP
It allows high quality solution to be derived from a large search space in polynomial time by applying principles of evolution.
It combines exploitation of best solution from past searches with exploration of new regions of solution space.
Instead of creating a new solution by randomized search as in GRASP, GA generates new solution at each step by randomly modifying the good solutions generated in previous steps.
This results in a better solution.