ABSTRACT

In recent years, there has been tremendous growth in the use of Internet services. In particular, the World Wide Web and applications like News-on-Demand and Video-on-Demand have become very popular. Thus, the number of users, as well as the amount of data each user downloads from servers in the Internet, is rapidly increasing. The use of multimedia data to present information in a user-friendly way is one important reason for these two developments. Today, mid-priced personal computers are capable of handling the load that such multimedia applications impose on the client system. However, the potentially very high number of concurrent users downloading data from Media-on-Demand (MoD) servers represents a generic problem for this kind of client-server application.

In MoD servers, the data retrieval operations represent a severe bottleneck, because many clients concurrently retrieve data at high data rates. We have developed a new architecture for MoD servers that maximizes the number of concurrent clients a single server can support. Traditional bottlenecks, such as copy operations, multiple in-memory copies of the same data element, and checksum calculation in communication protocols, are avoided by applying three orthogonal techniques: (1) the zero-copy-one-copy memory architecture removes all in-memory copy operations and shares a single data element among all concurrent clients; (2) the network level framing mechanism precalculates the transport level checksum and thereby removes most of the communication protocol execution overhead; and (3) the integrated error management scheme removes redundant error management functionality, i.e., it eliminates the parity data encoding cost in a forward error correction scenario.
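
The checksum precalculation in network level framing exploits the fact that the Internet checksum used by the transport protocols (RFC 1071) is a one's complement sum of 16-bit words, which is commutative and associative: a partial sum over a stored packet payload can be computed off-line, so that only the header words need to be added at transmission time. The following C sketch illustrates this split; the function names are ours for illustration, not the server's actual interface.

    #include <stddef.h>
    #include <stdint.h>

    /* One's complement sum over a byte buffer, continuing from a
     * previous partial sum (RFC 1071 style). */
    static uint32_t ones_sum(const uint8_t *buf, size_t len, uint32_t sum)
    {
        while (len > 1) {
            sum += ((uint32_t)buf[0] << 8) | buf[1];
            buf += 2;
            len -= 2;
        }
        if (len)                  /* odd trailing byte, padded with zero */
            sum += (uint32_t)buf[0] << 8;
        while (sum >> 16)         /* fold carries back into 16 bits */
            sum = (sum & 0xffff) + (sum >> 16);
        return sum;
    }

    /* Off-line: precompute the partial sum over the stored payload. */
    uint32_t precompute_payload_sum(const uint8_t *payload, size_t len)
    {
        return ones_sum(payload, len, 0);
    }

    /* At transmission time: add only the (pseudo-)header bytes and
     * complement the result to obtain the final checksum field. */
    uint16_t finish_checksum(uint32_t payload_sum,
                             const uint8_t *hdr, size_t hdr_len)
    {
        uint32_t sum = ones_sum(hdr, hdr_len, payload_sum);
        return (uint16_t)~sum;
    }

Because the payload's partial sum dominates the work for large packets, almost the entire per-packet checksum cost is moved out of the transmission path.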

Our performance measurements show that using our proposed improvements for streamed multimedia data frees considerable server resources for other tasks, i.e., it enables more concurrent clients. The broadcasting scheme eliminates identical data elements in memory while keeping the start-up delay at a minimum, i.e., an unlimited number of users may retrieve data from the streams broadcast by our server. Furthermore, we achieve throughputs of 1 Gbps (limited by the network card) using our zero-copy data path, and CPU time is reduced by approximately 35 %. The communication protocol processing overhead is almost eliminated with network level framing, which reduces the checksum calculation by at least 95 % and gives a total server speed-up of a factor of two. Finally, our integrated error management performs the parity data encoding operation off-line, and the parity information is retrieved together with the application data from the storage system at transmission time, eliminating the potential encoding bottleneck: if this operation were performed at transmission time, our measurements show a maximum throughput of only 25 Mbps in a gigabit environment. Thus, our server supports a high number of broadcast streams to an unlimited number of clients, and the number of concurrent streams is increased by reducing the resource usage of each stream.
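
To make the off-line parity encoding concrete: in the simplest forward error correction configuration, a parity block is the bitwise XOR of the data blocks in a group, so any single lost block can later be reconstructed by XOR-ing the parity block with the surviving blocks. The C sketch below shows such an encoder; it illustrates the principle under this XOR-parity assumption and is not the server's actual code.

    #include <stddef.h>
    #include <stdint.h>

    /* Compute one parity block over a group of k equally sized data
     * blocks. Run once when the data is written to the storage system,
     * so no encoding work remains at transmission time; the parity
     * block is stored and streamed like ordinary application data. */
    void encode_parity(const uint8_t *const blocks[], size_t k,
                       size_t block_len, uint8_t *parity)
    {
        for (size_t i = 0; i < block_len; i++) {
            uint8_t p = 0;
            for (size_t j = 0; j < k; j++)
                p ^= blocks[j][i];
            parity[i] = p;
        }
    }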