Research OutlineI have mainly interest in issues linked to high-performance computing (HPC) and distributed applications (Big Data or Cloud Computing for example). I focus in particular on scheduling aspects, more precisely on static scheduling (computed before execution) based on theoretical models. During my (short) career I deal with several specific problems. Work in progress (If you can read it, French version is more complete). |
This problem is close to the scheduling of the "Map" phase in a computation of type MapReduce. To solve this problem there is two possible approaches. The first one is to keep the initial replication and try to have the best possible makespan with it. The second one is to create additional duplication to reach the best possible makespan and in this case the goal is to have the least possible amount of new replicated data. During my PhD we focused on homogeneous tasks and proved, first, that a greedy algorithm can be modelled by the probabilistic process "balls-into-bins" with choice, which implies an overhead (the difference between the most loaded machine with the average) of size O(m log m) where m is the number of processors. Second, in the case of forbidden non-local tasks, we linked the problem to graph orientability, which leads to polynomial optimal algorithms and the existence of an allocation with an overhead of at most 1 with high probability. In the case where non-local tasks are allowed, the problem is similar to the search of a semi-matching (a variant of matching problem on bipartite graph), what also leads to polynomial optimal algorithms. Finally, we provided a set of simulations to prove the efficiency of this techniques in case of heterogeneous tasks.
Adapted from a template by FreeHTML5.co