Figure 1: Application scenario.
As shown in Figure 2, current server-based systems maintain an end-to-end relationship between server and client. Consequently, a server has to run all the end-to-end protocols whenever it satisfies a request from a client. The data management system and the application are placed on top of these protocols. Thus, for each client, the same data must be processed through all the end-to-end protocols, which repeats the same CPU-intensive operations.
Figure 2: Traditional data storage in a server.
Figure 3: Network level framing.
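The per-client repetition described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: `zlib.crc32` stands in for whatever CPU-intensive protocol operation the server performs, and the function names `serve_traditional` and `serve_precomputed` are hypothetical. Real transport checksums also cover per-connection header fields, so at most the payload-dependent part could be reused in this way.

```python
import zlib


def serve_traditional(block, n_clients):
    """Run the checksum again for every client that requests the same block."""
    packets = []
    computations = 0
    for _ in range(n_clients):
        checksum = zlib.crc32(block)  # same input, same result, every time
        computations += 1
        packets.append((checksum, block))
    return packets, computations


def serve_precomputed(block, n_clients):
    """Compute the checksum once and reuse it for every client."""
    checksum = zlib.crc32(block)
    packets = [(checksum, block) for _ in range(n_clients)]
    return packets, 1


block = b"one video block requested by many clients"
traditional_packets, traditional_cost = serve_traditional(block, 100)
reused_packets, reused_cost = serve_precomputed(block, 100)
```

Both functions produce identical packets; only the number of checksum computations differs (100 versus 1 in this run).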
Traditionally, as depicted in Figure 4, the storage system and the communication system use separate error management mechanisms. A RAID system generates redundant parity information so that it can recover from a disk crash and restore the data. When the data is then handed to the communication system, a second error mechanism is applied and a new set of redundant parity information is computed.
Figure 4: Traditional error management.
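As a minimal sketch of the storage-side mechanism, the following code computes a single XOR parity block over a stripe and uses it to rebuild one lost block. The text does not say which RAID level or code is used, so single XOR parity (as in RAID levels 4 and 5) is an assumption here, and the function names are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the bytewise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
parity = xor_parity(stripe)            # stored on the parity disk
# Simulate a crash of the second disk and restore its block.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

In the traditional arrangement, the communication system would then run its own, independent encoder over the same data before transmission.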
Figure 5 shows our approach to integrating the different error recovery mechanisms of the storage system and the communication system. In a traditional read from a RAID system, the parity information is read only when a disk error occurs. In contrast, we also read the redundant error recovery data and transmit it to the remote client, where it serves as a forward error correction scheme in the communication system. Thus, instead of reading only the original data, we read the parity data as well. All the data retrieved from the storage system is passed to the communication system, where the error encoder, which performs CPU-expensive operations, can now be omitted. Finally, the data is sent to the client, whose communication system has an error decoder for forward error correction similar to the one in the storage system on the server side.
Figure 5: Integrated error management.
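The integrated data path can be sketched as follows, assuming (as an illustration only) that the stored redundancy is a single XOR parity block per stripe and that each block travels in its own packet. The names `server_read_stripe` and `client_decode` are hypothetical. The point of the sketch is that the server never runs an encoder at transmission time; the parity that was computed when the data was stored doubles as the forward error correction data.

```python
def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def server_read_stripe(data_blocks, stored_parity):
    """Server: forward data and stored parity as packets, with no re-encoding."""
    packets = [(index, block) for index, block in enumerate(data_blocks)]
    packets.append((len(data_blocks), stored_parity))  # parity gets the last index
    return packets


def client_decode(received, n_data):
    """Client: repair at most one lost packet, then return the data blocks.

    `received` maps packet index to payload.
    """
    missing = set(range(n_data + 1)) - set(received)
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost packet")
    if missing:
        received = dict(received)
        # XOR of all packets that arrived equals the one that was lost.
        received[missing.pop()] = xor_blocks(list(received.values()))
    return [received[index] for index in range(n_data)]


data = [b"vid0", b"vid1", b"vid2"]
parity = xor_blocks(data)                  # computed once, when the data is stored
packets = server_read_stripe(data, parity)
arrived = {index: payload for index, payload in packets if index != 1}  # packet 1 lost
decoded = client_decode(arrived, n_data=len(data))
```

A single parity block tolerates only one loss per stripe; whether a storage-oriented code suits the loss pattern of a given network is a separate question that the sketch does not address.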
In the traditional disk-to-network data path, data is copied between the memory locations of all the different subsystems (see Figure 6). This results in several expensive cross-domain copy operations, so the approach is not suitable for high-performance systems such as a multimedia storage server. To reduce the copy overhead, several zero-copy memory architectures have been proposed, for example IO-Lite [8], mmbufs [2], the UVM virtual memory system [4], and the Genie I/O system [1] (see [11] for more references). As shown in Figure 7, they enable data transfers from disk to network without physically copying data between the subsystems.
Figure 6: Datapath in a traditional memory architecture.
Figure 7: Zero-copy memory architecture.
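The difference between the two data paths can be illustrated inside a single process. The sketch below is only an analogy, not an implementation of any of the cited architectures: a `bytearray` stands in for the disk buffer, each subsystem is reduced to one line, and the function names are hypothetical. The copying path duplicates the data at every step, while the shared path passes `memoryview` references to the same memory.

```python
def copy_path(disk_buffer):
    """Traditional path: every subsystem keeps its own physical copy."""
    fs_copy = bytearray(disk_buffer)   # file system buffer (copy 1)
    app_copy = bytearray(fs_copy)      # application buffer (copy 2)
    net_copy = bytearray(app_copy)     # network buffer     (copy 3)
    return net_copy


def shared_path(disk_buffer):
    """Zero-copy style path: subsystems exchange references to one buffer."""
    fs_view = memoryview(disk_buffer)  # no data is copied here
    app_view = fs_view[:]              # slicing a memoryview does not copy
    net_view = app_view[:]
    return net_view


disk = bytearray(b"frame-0001")
copied = copy_path(disk)
shared = shared_path(disk)

# Changing the original buffer shows which path still refers to the same memory.
disk[0] = ord("F")
copied_bytes = bytes(copied)   # still the old contents
shared_bytes = bytes(shared)   # reflects the change
```

The copied result is unaffected by the later write, whereas the shared view sees it, which confirms that no separate copy was made along the second path.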
Figure 8: Traditional broadcast.
Figure 9: Delay minimized broadcast.
Figure 10: Integrated zero-copy and minimized delay broadcasting.
Future work includes the implementation and evaluation of these mechanisms, the evaluation of other existing mechanisms that may be integrated (including existing error mechanisms and zero-copy mechanisms), and the integration of all these concepts into the operating system to support a multimedia storage server.
[1] Brustoloni, J. C.: ``Interoperation of Copy Avoidance in Network and File I/O'', Proceedings of the 18th IEEE Conference on Computer Communications (INFOCOM'99), New York, NY, USA, March 1999
[2] Buddhikot, M. M.: ``Project MARS: Scalable, High Performance, Web Based Multimedia-on-Demand (MOD) Services and Servers'', PhD Thesis, Sever Institute of Technology, Department of Computer Science, Washington University, St. Louis, MO, USA, August 1998
[3] Chen, P. M., Lee, E. K., Gibson, G. A., Katz, R. H., Patterson, D. A.: ``RAID: High-Performance, Reliable Secondary Storage'', ACM Computing Surveys, Vol. 26, No. 2, June 1994, pp. 145-185
[4] Cranor, C. D.: ``The Design and Implementation of the UVM Virtual Memory System'', PhD Thesis, Sever Institute of Technology, Department of Computer Science, Washington University, St. Louis, MO, USA, August 1998
[5] Gao, L., Kurose, J., Towsley, D.: ``Efficient Schemes for Broadcasting Popular Videos'', Proceedings of the 8th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'98), Cambridge, UK, 1998
[6] Halsall, F.: ``Data Communications, Computer Networks and Open Systems'', Fourth edition, Addison-Wesley, 1995
[7] Hua, K. A., Sheu, S.: ``Skyscraper Broadcasting: A New Broadcasting Scheme for Metropolitan Video-on-Demand System'', Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'97), Cannes, France, September 1997, pp. 89-100
[8] Pai, V. S., Druschel, P., Zwaenepoel, W.: ``IO-Lite: A Unified I/O Buffering and Caching System'', Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation (OSDI'99), New Orleans, LA, USA, February 1999, pp. 15-28
[9] Patterson, D. A., Gibson, G., Katz, R. H.: ``A Case for Redundant Arrays of Inexpensive Disks (RAID)'', Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, June 1988, pp. 109-116
[10] Plagemann, T., Goebel, V.: ``INSTANCE: The Intermediate Storage Node Concept'', Proceedings of the 3rd Asian Computing Science Conference (ASIAN'97), Kathmandu, Nepal, December 1997, pp. 151-165
[11] Plagemann, T., Goebel, V., Halvorsen, P., Anshus, O.: ``Operating System Support for Multimedia Systems'', to be published in The Computer Communications Journal, Elsevier, Fall 1999 or Spring 2000
[12] Viswanathan, S., Imielinski, T.: ``Metropolitan area Video-on-Demand Service Using Pyramid Broadcasting'', Multimedia Systems, Vol. 4, No. 4, 1996, pp. 197-208