HTTrack
The Web Mirror Utility

(Short) Documentation


I- How to use WinHTTrack (on Windows 95/98)

WinHTTrack is an easy way to use HTTrack, through a wizard-like interface.

Launch WinHTTrack, and enter URLs in the URL list. Mirroring a web site with the wizard lets you decide which links must be followed whenever httrack is unable to make the decision itself. Without the wizard, httrack will try to guess.

Note: The "recurse get" is not often used, because it is not as simple or efficient as the two methods above (mirroring with or without the wizard).
Tip: You can enter more than one URL by pressing Control-Enter after each line. The sites will then be mirrored together.

Other fields can be filled in if you want to use them:

Proxy: Set the proxy field if you want to use a proxy (ask your internet provider if you do not know the proxy name or the proxy port).

Filters: Clicking this button lets you fill in two list boxes: one for forbidden links, the other for accepted links. You can use limited wildcards (*) to refuse/accept multiple links. Here are some examples:
www.thisweb.com    refuses/accepts this web site (all links located in it are refused/accepted)
*.com              refuses/accepts all links that contain .com
*cgi-bin*          refuses/accepts all links that contain cgi-bin
*.zip              refuses/accepts all zip files
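
To get a feel for how such wildcard patterns behave, the sketch below tests a few links against two of the patterns above using the shell's own glob matching. This is only an approximation: the helper link_matches and the sample links are hypothetical, and HTTrack's filter engine may differ from shell globbing in its details.

```shell
# Hypothetical helper (not part of HTTrack): succeed when a link matches
# a wildcard pattern, using the shell's glob rules as a stand-in.
# $1 = pattern such as '*.zip', $2 = link to test
link_matches() {
    case "$2" in
        $1) return 0 ;;   # unquoted $1 is treated as a glob pattern
        *)  return 1 ;;
    esac
}

# Sample links, invented for illustration only.
for link in www.thisweb.com/cgi-bin/search \
            www.thisweb.com/files/archive.zip \
            www.thisweb.com/index.html; do
    for pattern in '*cgi-bin*' '*.zip'; do
        if link_matches "$pattern" "$link"; then
            echo "$pattern matches $link"
        fi
    done
done
```

With these sample links, index.html matches neither pattern, so a filter list made only of these two patterns would leave it untouched.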

Options: Many options can be defined (maximum file size, site size, build options, timeout, etc.)

Tip: To use WinHTTrack as a spider (for checking links), set the scan mode to "Just scan", check the "Log files" and "Test all links" boxes, and uncheck the "Cache" box.
Combine these options to obtain different results.



II- How to use HTTrack (the command-line version)

The command-line program is available for many systems (PC, Linux PC, Sun Solaris, AIX) and lets you control the robot from the command line. This is useful for mirroring a web site automatically.

IIb- Example: Use of HTTrack (the command-line version)



You are a webmaster, and you would like to make a mirror of a web site:
Every week (or every day), you can launch (e.g. from crontab):

httrack --update www.myweb.abc -O /public_html/,/home/root/

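This will keep an up-to-date copy of the web site on your host. The -O option takes the mirror path, optionally followed by a comma and the path for the cache and log files.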
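
A minimal sketch of a script that a crontab entry could call for this periodic update. The helper update_mirror, the HTTRACK_BIN variable and the script path are assumptions made for illustration; only the httrack command line itself comes from the example above.

```shell
#!/bin/sh
# Program to run. Overridable for dry runs, e.g. HTTRACK_BIN=echo
# (this variable is an assumption of this sketch, not an HTTrack feature).
HTTRACK_BIN="${HTTRACK_BIN:-httrack}"

# Hypothetical helper: refresh an existing mirror.
# $1 = site URL, $2 = mirror directory, $3 = cache/log directory
update_mirror() {
    "$HTTRACK_BIN" --update "$1" -O "$2,$3"
}

# To use it from cron, add the call below at the end of this file:
#   update_mirror www.myweb.abc /public_html/ /home/root/
# and, assuming the file is saved as /home/root/update-mirror.sh,
# a crontab line such as this one runs it every Monday at 03:00:
#   0 3 * * 1  /bin/sh /home/root/update-mirror.sh
```

Keeping the command in a function makes it easy to try the script with HTTRACK_BIN=echo first and read the exact command line before letting cron run it.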


You are a simple user, and you would like to make a mirror of a web site for your own use:
Just type:
httrack www.myweb.abc


When you want to update the mirror, just launch: httrack --update and httrack will update it automatically.


You want to check the links in a site or web page:
Just type:

httrack www.myweb.abc --spider

Then look at the file hts-err.txt: all errors are reported there.
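
This check can also be scripted. The sketch below assumes that hts-err.txt is found in the directory given as argument (the current directory by default), and that a missing or empty file means nothing was reported. check_link_errors is a hypothetical helper, not part of HTTrack.

```shell
# Hypothetical helper: tell whether a spider run left anything in hts-err.txt.
# $1 = directory holding the log files (defaults to the current directory)
check_link_errors() {
    err_file="${1:-.}/hts-err.txt"
    if [ -s "$err_file" ]; then
        # File exists and is not empty: show its content and signal failure.
        echo "Errors were reported in $err_file:"
        cat "$err_file"
        return 1
    fi
    echo "No errors reported in $err_file"
    return 0
}

# Possible use after a spider run in the current directory:
#   httrack www.myweb.abc --spider
#   check_link_errors . || echo "Some links need attention"
```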


Comments, problem reports and bug reports are welcome, for both the shell and the robot.