Hello,

We need some volunteers to collect wide-area traffic traces. Below we briefly describe the research context this collection fits into and what you need to do if you would like to help us with the collection.

1. Research Context

We are trying to characterize the traffic that flows through the Internet; you may have seen our SIGCOMM '91 paper on characterizing wide-area TCP/IP traffic. We are now extending this work into a complete traffic source model of the Internet, to be used in driving performance evaluations of routing, flow, congestion, and resource management algorithms.

This type of research requires instrumenting lots of stub networks so that they save and make available descriptions of all the network traffic that terminates or originates at their stub. The traffic descriptions we need have to be much more detailed than the monthly traffic reports NSFNET already makes available. More specifically, we need sufficient information to describe every "conversation" with an endpoint at the stub. This requires saving three or four packet headers per TCP conversation.

This data collection would not interfere with normal operation of the network and does not pose a network security threat. Sites sensitive to privacy issues could perform a one-way transformation on each conversation's source and destination IP addresses.

2. What You Need to Do

If you would like to help us with the collection, do an anonymous ftp to jerico.usc.edu (128.125.51.6) and get the collection package. The package is in the directory ~ftp/pub/jamin/collect, which contains the following two files:

     13585  collect.tar.Z
    262019  tcpdump-2.0.tar.Z

The file "tcpdump-2.0.tar.Z" is the tcpdump distribution from LBL. You need this file only if you don't have tcpdump (version 2.0) installed on your system. The file "collect.tar.Z" contains the two programs used in the data collection process: "collect" and "generator."
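For sites sensitive to privacy, the one-way transformation on IP addresses mentioned above could look something like the following minimal sketch. The use of sha256sum, the keyed-prefix scheme, and the secret value are all our assumptions for illustration, not part of the collection package; the point is only that the same real address always maps to the same opaque token, so conversation structure survives while the address itself is hidden.

```shell
#!/bin/sh
# Hypothetical one-way transform of an IP address (not part of
# collect.tar.Z). A site-local secret keeps outsiders from simply
# hashing all addresses and inverting the mapping by lookup.
SECRET="site-local-secret"     # assumed value; keep it private

anonymize() {
    # Hash secret+address, keep a short hex token as the pseudonym.
    printf '%s%s' "$SECRET" "$1" | sha256sum | cut -c1-8
}

anonymize 128.125.51.6         # same input always yields the same token
```

Because the transform is deterministic, two packets from the same host still map to the same pseudonym, which is what makes per-conversation analysis possible on anonymized traces.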
"Collect" is a shell script which invokes the tcpdump program to collect network packets. (Tcpdump is a network instrumentation program from Lawrence Berkeley Lab.) It is important that the collection routine be run on a machine on the ethernet segment connected to the site's internetwork gateway, so that all internet packets can be observed.

When running on Ultrix or BSD+BPF, tcpdump can log dropped packets. Not so on SunOS with the nit interface; there, the program "generator" is used to estimate the loss rate of the collection run. The validity of studies based on collected traces depends on the loss rate of the collection. Since the loss rate depends on the collection machine's CPU utilization, it is best that the collection be run on a dedicated workstation.

The nit interface of SunOS cannot collect packets generated by the local machine, so on SunOS "generator" should run on a different machine from the one running "collect." This means that estimating the loss rate of a collection requires three machines: one to run "collect," one to run "generator," and one to act as the target of the packets generated by "generator" (the load put on the "generator" machines is negligible). If you absolutely can spare only one machine, you can forgo loss-rate estimation; we would still encourage you to collect data, which we will use to support conclusions drawn from primary traces.

There are more detailed instructions on installing and running the collection package in the README file and manual pages included in collect.tar.Z, but to summarize, this is the equipment you need for doing the collection:

1. A machine running an operating system supported by the tcpdump program, i.e. one of SunOS, Ultrix, or BSD+BPF (see the README file in tcpdump-2.0.tar.Z).

2. Enough disk space on that machine to hold the collected data. This space requirement depends on the load on your network.
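As a rough illustration of what a collection run involves, here is a hedged sketch of a tcpdump invocation in the spirit of "collect"; the actual script in collect.tar.Z is authoritative. The interface name "le0", the snap length, the output file name, and the SYN/FIN/RST filter (one way to keep only a few headers per conversation) are all our assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a collection run (NOT the real "collect"
# script). Assumptions: interface le0, output file trace.out, and a
# filter keeping packets with SYN, FIN, or RST set -- the handful of
# headers that delimit each TCP conversation.
IFACE=le0                    # interface on the gateway's ethernet segment
SNAPLEN=68                   # enough bytes for link + IP + TCP headers
FILTER='tcp[13] & 7 != 0'    # byte 13 of the TCP header holds the flags

CMD="tcpdump -i $IFACE -s $SNAPLEN -w trace.out '$FILTER'"
echo "$CMD"                  # dry run; run the printed command as root
```

Capturing only connection-delimiting headers at a small snap length is what keeps both the CPU load (and hence the loss rate) and the disk usage low.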
For the three sites where we have done our collection so far, UC Berkeley saw 64,800 conversations during the 24-hour collection period, USC saw 17,950 conversations, and Bellcore saw 14,130 conversations. From these figures, we calculated that 20MB of disk space should be more than enough to hold a day's worth of data.

And if you are doing loss-rate estimation:

3. A machine to run the "generator" program. Any UNIX machine will do.

4. A machine to act as a target of the "generator" program.

                  ======================================
                                   ||
                                 +----+
                                 | GW |
                                 +----+
                                   |
         -------------------------------------------------
              |                    |                    |
         +---------+          +--------+           +---------+
         |generator|          |collect |           |generator|
         | source  |          |  host  |           | target  |
         |OPTIONAL |          |REQUIRED|           |OPTIONAL |
         +---------+          +--------+           +---------+

              Fig. 1: Example collection run setup.

If you have any problems or questions, please feel free to contact us at: traffic@excalibur.usc.edu.
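As a back-of-the-envelope check on the 20MB figure quoted above, the busiest site (UC Berkeley, 64,800 conversations/day) at three or four saved headers per conversation works out to roughly 20MB, assuming each saved record (timestamp plus IP and TCP headers) is on the order of 80 bytes. The per-record size is our assumption, not a number from the collection package.

```shell
#!/bin/sh
# Sanity check of the daily disk-space estimate. The 80-byte record
# size is an assumption; conversation counts are from the text above.
awk 'BEGIN {
    conversations    = 64800   # UC Berkeley, busiest site observed
    headers_per_conv = 4       # "three or four packet headers" saved
    bytes_per_record = 80      # assumed: timestamp + IP + TCP header
    mb = conversations * headers_per_conv * bytes_per_record / (1024*1024)
    printf "%.1f MB\n", mb     # prints: 19.8 MB
}'
```

So even under the generous four-headers assumption, a day of the heaviest observed load fits within the quoted 20MB.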