Hello,

We need some volunteers to collect wide-area traffic traces. Below we briefly describe the research context this collection fits into and what you need to do if you would like to help us with the collection.

1. Research Context

We are trying to characterize the traffic that flows through the Internet; you may have seen our SIGCOMM '91 paper on characterizing wide-area TCP/IP traffic. We are now extending this work into a complete traffic source model of the Internet, to be used in driving performance evaluations of routing, flow, congestion, and resource management algorithms.

This type of research requires instrumenting lots of stub networks so that they save and make available descriptions of all the network traffic that terminates or originates at their stub. The traffic descriptions we need have to be much more detailed than the monthly traffic reports NSFNET already makes available. More specifically, we need sufficient information to describe every "conversation" with an endpoint at the stub. This requires saving three or four packet headers per TCP conversation.

This data collection would not interfere with normal operation of the network and does not pose a network security threat. Sites sensitive to privacy issues could perform a one-way transformation on each conversation's source and destination IP addresses.

2. What You Need to Do

If you would like to help us with the collection, do an anonymous ftp to jerico.usc.edu (128.125.51.6) and get the collection package. The package is in the directory ~ftp/pub/jamin/collect, which contains the following two files:

     13585  collect.tar.Z
    262019  tcpdump-2.0.tar.Z

The file "tcpdump-2.0.tar.Z" is the tcpdump distribution from LBL. You need this file only if you don't have tcpdump (version 2.0) installed on your system. The file "collect.tar.Z" contains the two programs used in the data collection process: "collect" and "generator."
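For sites sensitive to privacy, the one-way transformation on IP addresses mentioned above could look something like the following minimal sketch. The use of sha256sum, the keyed-prefix scheme, and the secret value are all our assumptions for illustration, not part of the collection package; the point is only that the same real address always maps to the same opaque token, so conversation structure survives while the address itself is hidden.

```shell
#!/bin/sh
# Hypothetical one-way transform of an IP address (not part of
# collect.tar.Z). A site-local secret keeps outsiders from simply
# hashing all addresses and inverting the mapping by lookup.
SECRET="site-local-secret"     # assumed value; keep it private

anonymize() {
    # Hash secret+address, keep a short hex token as the pseudonym.
    printf '%s%s' "$SECRET" "$1" | sha256sum | cut -c1-8
}

anonymize 128.125.51.6         # same input always yields the same token
```

Because the transform is deterministic, two packets from the same host still map to the same pseudonym, which is what makes per-conversation analysis possible on anonymized traces.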
"Collect" is a shell script which invokes the tcpdump program to collect network packets. (Tcpdump is a network instrumentation program from Lawrence Berkeley Lab.) It is important that the collection routine be run on a machine on the ethernet segment connected to the site's internetwork gateway, so that all internet packets can be observed.

When running on Ultrix or BSD+BPF, tcpdump can log dropped packets. Not so on SunOS with the nit interface; there, the program "generator" is used to estimate the loss rate of the collection run. The validity of studies based on collected traces depends on the loss rate of the collection. Since the loss rate depends on the collection machine's CPU utilization, it is best that the collection be run on a dedicated workstation.

The nit interface of SunOS cannot collect packets generated by the local machine, so on SunOS "generator" should run on a different machine from the one running "collect." This means that estimating the loss rate of a collection requires three machines: one to run "collect," one to run "generator," and one to act as the target of the packets generated by "generator" (the load put on the "generator" machines is negligible). If you absolutely can spare only one machine, you can forgo loss-rate estimation; we would still encourage you to collect data, which we will use to support conclusions drawn from primary traces.

There are more detailed instructions on installing and running the collection package in the README file and manual pages included in collect.tar.Z, but to summarize, this is the equipment you need for doing the collection:

1. A machine running an operating system supported by the tcpdump program, i.e. one of SunOS, Ultrix, or BSD+BPF (see the README file in tcpdump-2.0.tar.Z).

2. Enough disk space on that machine to hold the collected data. This space requirement depends on the load on your network.
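As a rough illustration of what a collection run involves, here is a hedged sketch of a tcpdump invocation in the spirit of "collect"; the actual script in collect.tar.Z is authoritative. The interface name "le0", the snap length, the output file name, and the SYN/FIN/RST filter (one way to keep only a few headers per conversation) are all our assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a collection run (NOT the real "collect"
# script). Assumptions: interface le0, output file trace.out, and a
# filter keeping packets with SYN, FIN, or RST set -- the handful of
# headers that delimit each TCP conversation.
IFACE=le0                    # interface on the gateway's ethernet segment
SNAPLEN=68                   # enough bytes for link + IP + TCP headers
FILTER='tcp[13] & 7 != 0'    # byte 13 of the TCP header holds the flags

CMD="tcpdump -i $IFACE -s $SNAPLEN -w trace.out '$FILTER'"
echo "$CMD"                  # dry run; run the printed command as root
```

Capturing only connection-delimiting headers at a small snap length is what keeps both the CPU load (and hence the loss rate) and the disk usage low.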
For the three sites where we have done our collection so far, UC Berkeley saw 64,800 conversations during the 24-hour collection period, USC saw 17,950 conversations, and Bellcore saw 14,130 conversations. From these figures, we calculated that 20MB of disk space should be more than enough to hold a day's worth of data.

And if you are doing loss-rate estimation:

3. A machine to run the "generator" program. Any UNIX machine will do.

4. A machine to act as a target of the "generator" program.

                  ======================================
                                   ||
                                 +----+
                                 | GW |
                                 +----+
                                   |
         -------------------------------------------------
              |                    |                    |
         +---------+          +--------+           +---------+
         |generator|          |collect |           |generator|
         | source  |          |  host  |           | target  |
         |OPTIONAL |          |REQUIRED|           |OPTIONAL |
         +---------+          +--------+           +---------+

              Fig. 1: Example collection run setup.

If you have any problems or questions, please feel free to contact us at: traffic@excalibur.usc.edu.
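As a back-of-the-envelope check on the 20MB figure quoted above, the busiest site (UC Berkeley, 64,800 conversations/day) at three or four saved headers per conversation works out to roughly 20MB, assuming each saved record (timestamp plus IP and TCP headers) is on the order of 80 bytes. The per-record size is our assumption, not a number from the collection package.

```shell
#!/bin/sh
# Sanity check of the daily disk-space estimate. The 80-byte record
# size is an assumption; conversation counts are from the text above.
awk 'BEGIN {
    conversations    = 64800   # UC Berkeley, busiest site observed
    headers_per_conv = 4       # "three or four packet headers" saved
    bytes_per_record = 80      # assumed: timestamp + IP + TCP header
    mb = conversations * headers_per_conv * bytes_per_record / (1024*1024)
    printf "%.1f MB\n", mb     # prints: 19.8 MB
}'
```

So even under the generous four-headers assumption, a day of the heaviest observed load fits within the quoted 20MB.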