(Short) Documentation
This program is an easy way to use HTTrack through a wizard-like
interface.
Launch WinHTTrack and enter URLs in the URL list. Mirroring a web site with
the wizard lets you decide which links are followed whenever httrack
is unable to make a decision. Without the wizard, httrack will try to guess.
Note: The "recurse get" is not often used, because it is not
as simple or efficient as the two former methods.
Tip: You can enter more than one URL by pressing Control-Enter
after each line. This will mirror several sites together.
Other fields can be filled, if you want to use them:
Proxy: Set the proxy field if you want to use a proxy (ask your
internet provider if you do not know the proxy name or the proxy port).
Filters: Clicking this button lets you fill two list boxes: one is for
forbidden links, the other is for accepted links. You can use
limited wildcards (*) to refuse/accept multiple links. Here are some examples:
www.thisweb.com | Refuses/accepts this web site (and all links located in it)
*.com | Refuses/accepts all links that contain .com
*cgi-bin* | Refuses/accepts all links that contain cgi-bin
*.zip | Refuses/accepts all zip files
Options: Many options can be defined (maximum file size, site size, build options, timeout, etc.).
Tip: To use WinHTTrack as a spider (for checking links), set
the scan mode to "Just scan", check the "Log files" and "Test all links" boxes,
and uncheck the "Cache" box.
Combine these options to obtain different results.
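To make the wildcard filters described above more concrete, here is a minimal shell sketch. It uses the shell's own pattern matching as a stand-in, so it only approximates how httrack evaluates its filters. The function names and the FORBIDDEN list are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Illustration only: shell glob matching stands in for httrack's filter
# matching. Names below (link_matches, filter_decision, FORBIDDEN) are
# hypothetical and not part of httrack.

# link_matches PATTERN LINK: succeeds if LINK matches the wildcard PATTERN
link_matches() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example list of forbidden patterns, taken from the examples in the text
FORBIDDEN='*cgi-bin* *.zip'

# filter_decision LINK: prints "refused" if any forbidden pattern matches,
# otherwise "accepted"
filter_decision() {
    set -f                      # keep the patterns from expanding to file names
    for pattern in $FORBIDDEN; do
        if link_matches "$pattern" "$1"; then
            set +f
            echo refused
            return 0
        fi
    done
    set +f
    echo accepted
}

filter_decision "www.thisweb.com/cgi-bin/search"      # prints: refused
filter_decision "www.thisweb.com/files/archive.zip"   # prints: refused
filter_decision "www.thisweb.com/index.html"          # prints: accepted
```

A link is refused as soon as one forbidden pattern matches it. How httrack resolves conflicts between the forbidden and accepted lists is not covered by this sketch.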
The command-line program is available for many systems (PC, Linux PC, Sun Solaris, AIX) and lets you control the robot from the command line. This can be useful for automatically mirroring a web site.
httrack --update www.myweb.abc -O /public_html/,/home/root/
This will maintain an up-to-date copy of the web site on your host.
httrack www.myweb.abc
When you want to update the mirror, just launch httrack --update and httrack
will update it automatically.
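The mirror-then-update sequence can be sketched as a small helper that picks the right command. This is only a sketch: it assumes that an earlier run leaves an hts-cache directory in the mirror folder, and the function name mirror_command is hypothetical. It prints the command instead of running it, so it can be tried safely.

```shell
#!/bin/sh
# Sketch: choose between a first mirror and an update.
# Assumption: an earlier httrack run left an hts-cache directory in the
# mirror folder. mirror_command is a hypothetical helper name.

SITE="www.myweb.abc"    # example site used in the text

# mirror_command DIR: prints the httrack command to launch from DIR
mirror_command() {
    if [ -d "$1/hts-cache" ]; then
        echo "httrack --update"
    else
        echo "httrack $SITE"
    fi
}

# Show which command would be used for the current directory
mirror_command .
```

Printing the command rather than executing it keeps the sketch free of side effects. A scheduled job could run the printed command from the mirror folder.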
httrack www.myweb.abc --spider
Then look at the file hts-err.txt: all errors will be reported there.
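Checking the error file after a spider run can be automated with a few lines of shell. The sketch below assumes that hts-err.txt stays empty when no error occurred, which the text does not state. The function name report_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize the error file after "httrack www.myweb.abc --spider".
# Assumption: the file is empty when no error occurred.
# report_errors is a hypothetical helper name.

# report_errors FILE: returns 0 if FILE is empty, 1 if it has content,
# 2 if it does not exist
report_errors() {
    if [ ! -e "$1" ]; then
        echo "$1 not found"
        return 2
    fi
    if [ -s "$1" ]; then
        echo "errors reported in $1:"
        cat "$1"
        return 1
    fi
    echo "no errors reported in $1"
}

# Run from the folder where httrack wrote its files
report_errors hts-err.txt || true
```

The distinct return codes let a calling script tell a clean run from a missing file.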