Content Tagged 275

JoBo, crawler program to download complete websites to computer

JoBo is a simple program to download complete websites to your local computer. Internally it is basically a web spider. Its main advantage over other download tools is that it can automatically fill out forms (e.g. for automated login) and use cookies for session handling. Compared to other products the GUI may seem very simple, but it is the internal features that matter! Do you know any other download tool that can log in to a web server and download content when that server uses web forms for login and cookies for session handling? It also features very flexible rules to limit downloads by URL, size and/or MIME type.

For programmers it offers a very flexible object model and is easily expandable; expect new modules in the future! It is implemented in Java and the source code is available. If you want to implement your own web spider, the WebRobot class will be a good starting point. Even if you don't want to use it as a download tool but for indexing, link checking or whatever else you want, JoBo is the right tool. Retrieving documents and handling those documents are completely separated, so you can plug in your own module easily.

Features

* command line and graphical version (the command line version needs a major update; currently the GUI version has many more features)
* recursive search of all documents starting from a given start document
* support of tags (with fault tolerance)
* support of the robot exclusion protocol
* user-controlled maximum search depth
* user agent name can be defined
* support of referrer headers
* support of automated form handling (JoBo can fill fields with predefined values)
* cookie support
* XML configuration
* used bandwidth can be limited
* allow/deny downloads by MIME type and document size (e.g. ignore all image/* files)
* allow/deny downloads by regular expressions (e.g. don't download /cgi-bin)
* can convert absolute links to relative
* download only files newer than a given age
* resume job

JoBo Crawler Home Page http://www.matuschek.net/jobo/
JoBo Crawler Download http://www.matuschek.net/jobo-download/
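To make the allow/deny rules mentioned above more concrete, here is a minimal Java sketch of how a spider might decide whether to download a document based on URL pattern, MIME type and size. It is not JoBo's actual code or API: the class name DownloadRules and all of its methods are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical rule set for illustration; these are NOT JoBo's own classes. */
class DownloadRules {
    private final List<Pattern> deniedUrls = new ArrayList<>();
    private final List<String> deniedMimeTypes = new ArrayList<>();
    private long maxSizeBytes = Long.MAX_VALUE;

    /** Deny every URL in which the regular expression matches somewhere. */
    DownloadRules denyUrl(String regex) {
        deniedUrls.add(Pattern.compile(regex));
        return this;
    }

    /** Deny an exact MIME type ("text/css") or a whole family ("image/*"). */
    DownloadRules denyMimeType(String pattern) {
        deniedMimeTypes.add(pattern.toLowerCase(Locale.ROOT));
        return this;
    }

    /** Deny documents larger than the given number of bytes. */
    DownloadRules maxSize(long bytes) {
        maxSizeBytes = bytes;
        return this;
    }

    /** Returns true only if no deny rule applies to the document. */
    boolean allows(String url, String mimeType, long sizeBytes) {
        if (sizeBytes > maxSizeBytes) {
            return false;
        }
        for (Pattern p : deniedUrls) {
            if (p.matcher(url).find()) {
                return false;
            }
        }
        String mime = mimeType.toLowerCase(Locale.ROOT);
        for (String denied : deniedMimeTypes) {
            if (matchesMime(denied, mime)) {
                return false;
            }
        }
        return true;
    }

    private static boolean matchesMime(String pattern, String mime) {
        if (pattern.endsWith("/*")) {
            // "image/*" becomes the prefix "image/"
            return mime.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return mime.equals(pattern);
    }
}

class DownloadRulesDemo {
    public static void main(String[] args) {
        DownloadRules rules = new DownloadRules()
                .denyUrl("/cgi-bin")       // as in the feature list example
                .denyMimeType("image/*")   // ignore all image files
                .maxSize(1_000_000);       // assumed limit of about 1 MB

        System.out.println(rules.allows("http://example.com/index.html", "text/html", 20_000));
        System.out.println(rules.allows("http://example.com/cgi-bin/search", "text/html", 20_000));
        System.out.println(rules.allows("http://example.com/logo.png", "image/png", 20_000));
    }
}
```

The sketch uses deny rules only and treats everything else as allowed; a real spider would probably also want explicit allow rules and a defined order in which rules are evaluated.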

Java: Open Source Java (OpenJDK)
