Caltech Control and Dynamical Systems Technical Reports

Heavy-Tailed Distributions, Generalized Source Coding and Optimal Web Layout Design

Zhu, Xiaoyun and Yu, Jie and Doyle, John (2000) Heavy-Tailed Distributions, Generalized Source Coding and Optimal Web Layout Design. Technical Report. California Institute of Technology, Pasadena, CA. [CaltechCDSTR:2000.001]

Full text available as:

Postscript - Requires a viewer, such as GhostView

Abstract

The design of robust and reliable networks and network services has become an increasingly challenging task in today's Internet world. To achieve this goal, understanding the characteristics of Internet traffic plays a more and more critical role. Empirical studies of measured traffic traces have led to the wide recognition of self-similarity in network traffic. Moreover, a direct link has been established between the self-similar nature of measured aggregate network traffic and the underlying heavy-tailed distributions of the Web traffic at the source level. This report provides a natural and plausible explanation for the origin of heavy tails in Web traffic by introducing a series of simplified models for optimal Web layout design with varying levels of realism and analytic tractability. The basic approach is to view the minimization of the average file download time as a generalization of standard source coding for data compression, but with the design of the Web layout rather than the codewords. The results, however, are quite different from standard source coding, as all assumptions produce power law distributions for a wide variety of user behavior models. In addition, a simulation model of more complex Web site layouts is proposed, with more detailed hyperlinks and user behavior. The throughput of a Web site can be maximized by taking advantage of information on user access patterns and rearranging (splitting or merging) files on the Web site accordingly, with a constraint on available resources. A heuristic optimization on random graphs is formulated, with user navigation modeled as Markov Chains. Simulations on different classes of graphs as well as more realistic models with simple geometries in individual Web pages all produce power law tails in the resulting size distributions of the files transferred from the Web sites. This again verifies our conjecture that heavy-tailed distributions result naturally from the tradeoff between the design objective and limited resources, and suggests a methodology for aiding in the design of high-throughput Web sites.

EPrint Type:Monograph (Technical Report)
Subjects:All Records
ID Code:63
Deposited By:Caltech Library System
Deposited On:16 July 2006
Unique Identifier:CaltechCDSTR:2000.001
Official Persistent URL:http://resolver.caltech.edu/CaltechCDSTR:2000.001
Usage Policy:You are granted permission for individual, educational, research and non-commercial reproduction, distribution, display and performance of this work in any format.

Archive Staff Only: edit this record