This is a 'Paper Reading' post for Course ECE1747. The course name is 'Parallel Computing'

Introduction

Background

In games, players in one big map are usually to separated into different game regions which are hosted on some servers.

Target

Introduce an architecture that, players can feel like play in one big region. The interaction between users from different regions are allowed and will not be aware by the users.

Method

A locality aware load balancing algorithm which can adaptively disperse or aggregate the regions.

Trades-off between:

Balancing the server load in terms of numbers of players
Decreasing inter-server communication

Review of previous work

Two types:

Global algorithm that does not consider spatial locality
- Defect: Compute and communication intensive,
Static game partitioning algorithm
- Defect: Cannot solve the flocking problems

Results:

while simulate two types of configurations:

A centralized local area network server (LAN) as in state-of-the-art game servers
- better than static
- same as without locality
A large-scale wide-area distributed server (WAN)
- better than static
- improve without locality by up to a factor of 6

Core Ideas

GAME SERVER ARCHITECTURE BACKGROUND

Current games require game designers to carefully plan the load distribution at the design phase in order to avoid flocking.

LOCALITY AWARE DYNAMIC LOAD MANAGEMENT

Every server monitor
- quality of service (QoS) violations: average update interval by sampling
Measure locality in two dimensions
- Communication based on network proximity in the game
- Region clustering based on adjacency on the game map.

Definition

SLA violation: when a server exceeds the predefined update interval for 90% of its clients.
Overload threshold (Overload th): the server load in terms of number of clients for which the SLA is violated.
Safe load threshold (Safety_th): the highest server load for which the SLA is still met for all clients

Locality Aware Load Shedding

Target: To solve the hotspot problems by shedding some load to other servers with lighter load.

Best Expectation: Both overloaded source server and lightly loaded target server can become safe loaded.

Two Opt:

Locality is preserved, i.e., the same number of strongly connected components is maintained after load shedding as before and
The minimum number of region migrations occur.

Locality Aware Aggregation

Target: In order to solve net consumption problems caused by locality disruption through merge regions among servers.

Expectation: Reduce the number of inter-server boundaries.

Other Static and Dynamic Partitioning Algorithms Used for Comparison

Spread

Dynamic
Uniformly and simply spread players to other servers regardless of the locality.
Works good for global optimum in terms of the smallest difference between the lowest and highest loaded server.
But ignore network proximity or region adjacency. So may cause high network consumptions.

Lightest

Dynamic
In order to keep the remapping cost low, prioritize shedding load to a single server (the lightest loaded server) instead of several servers.
Clustering of adjacent regions is concerned but as secondary factor.

Static Partitioning

All bad
Use static block partitioning as baseline

IMPLEMENTATION

Based on the game: SimMud

Questions

Why not regions from servers to servers instead of move servers?

Advantages

Resource management is always one of the most critical topics in optimization. Unlike most papers that only consider CPU resources, the paper also finds network bandwidth resources as an essential factor, further relieving players' crowding and increasing resource utilization rates.
Use an actual game to illustrate and experiment. The paper's experiment was based on the game, SimMud. Since the paper presents an algorithm useful for game servers, using a real game to experiment is the most understandable and convincing way.
The experiment section examines the algorithms in two environments, LAN and WAN, which bring detailed and comprehensive experiment data.

Disadvantages

The merge algorithm regards the number of inter-server boundaries as the criterion for a load of inter-server communication. But the inter-server communication frequencies among the different boundaries differ. So a better algorithm should also take the inter-server communication load like bandwidth through a boundary as a weight while merging.
Network bandwidth is now becoming less important considering the popularity of 10 Gigabyte networks (IEEE 802.3a) in the industry. But CPU load is still the bottleneck. If the paper is written now, it may need to test whether reducing inter-server communication can still help reduce user response time. If the answer is no, then the locality seems to be less meaningful. Instead, increasing CPU utilization rate is more critical.
The research is for static-map-partitioning games. It would be better if the algorithm could handle the dynamic partitioning maps, which means the adjacencies of regions are not fixed. The region distribution may also be dynamic and divided or aggregated, thus providing more flexibility for balancing.

kexiang's blog

Locality Aware Dynamic Load Management for Massively Multiplayer Games