Content Tagged with Configuration + linux

IsiSetup - APT for your config

"IsiSetup is an utility for the system administrator. It helps you managing your configuration files: You can rollback changes. You can explore the history of changes. You can replicate your configuration. You can backup your configuration. You can blame changes to admins."

IsiSetup: IsiSetup - del.icio.us links

etckeeper

etckeeper is a collection of tools to let /etc be stored in a git, mercurial, or bzr repository. It hooks into apt (and other package managers) to automatically commit changes made to /etc during package upgrades. It tracks file metadata that revision control systems do not normally support, but that is important for /etc, such as the permissions of /etc/shadow. It's quite modular and configurable, while also being simple to use if you understand the basics of working with revision control.

git: del.icio.us tag/git

User:daveg: del.icio.us/daveg

Why is MSNBot ignoring robots.txt?

Today, the root file system on our public svn server nearly ran out of disk space. The reason? The /tmp directory was quickly filling up with temporary files created by websvn, which I had set up alongside the FishEye repository browser for testing purposes. A quick look at the Apache log files revealed the culprit: a crawler from Microsoft was running haywire and ignoring the rules in the robots.txt file, even though it had actually fetched the file beforehand!

Here is what robots.txt looked like (I have now changed it to disallow everything):

User-agent: *
Disallow: /fisheye/
Disallow: /websvn/
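
The rules above can be checked mechanically. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the test paths are illustrative assumptions, and the result only shows how a compliant parser reads these rules, not how msnbot itself interprets them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /fisheye/
Disallow: /websvn/
"""


def is_allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch path.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of fetching them over HTTP.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Example paths; only /websvn/rss.php appears in the log excerpts.
    for path in ("/websvn/rss.php", "/fisheye/", "/index.html"):
        print(path, is_allowed("msnbot/1.1", path))
```

With these rules, a compliant parser reports both browser directories as off limits for any user agent, while leaving the rest of the site crawlable.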

If I am not mistaken, no crawler should even consider going into the SVN browser directories. Some snippets from the Apache log:

$ grep robots.txt /var/log/apache2/access_log | grep msn
65.55.208.178 - - [03/Aug/2008:16:58:35 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [03/Aug/2008:19:05:55 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.235.139 - - [03/Aug/2008:22:14:47 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.25.136 - - [04/Aug/2008:00:31:32 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [04/Aug/2008:00:57:38 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.235.139 - - [04/Aug/2008:06:49:33 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [04/Aug/2008:07:16:21 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.25.136 - - [04/Aug/2008:09:29:17 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.104.156 - - [04/Aug/2008:11:08:24 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [04/Aug/2008:11:29:34 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [05/Aug/2008:13:30:20 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.208.178 - - [05/Aug/2008:16:17:59 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"

Good boy, it checks the robots.txt file. But what is this?

$ grep msnbot /var/log/apache2/access_log | tail -20
65.55.208.164 - - [05/Aug/2008:22:48:15 +0200] "GET /websvn/filedetails.php?repname=MySQL+Documentation&path=%2Fworkbench%2Fall-entities.ent&rev=9981&sc=1 HTTP/1.1" 200 6408 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:15 +0200] "GET /websvn/dl.php?repname=MySQL+Connector%2FJ&path=%2Fbranches%2Fbranch_5_0%2Fconnector-j%2F&rev=6600&isdir=1 HTTP/1.1" 200 40960 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:19 +0200] "GET /websvn/rss.php?repname=MySQL+Documentation&path=%2Fproto-doc%2F&rev=9994&sc=1&isdir=1 HTTP/1.1" 200 36907 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=MySQL+Documentation&path=%2Ffalcon%2F&rev=8323&sc=0&isdir=1 HTTP/1.1" 200 15278 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=MySQL+Proxy&path=%2Ftrunk%2FDoxyfile&rev=365&sc=1&isdir=0 HTTP/1.1" 200 4162 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=Eventum&path=%2Feventum%2Freports%2F&rev=3542&sc=1&isdir=1 HTTP/1.1" 200 90591 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:23 +0200] "GET /websvn/log.php?repname=MySQL+Documentation&path=%2Fndbapi%2F&rev=9749&sc=0&isdir=1 HTTP/1.1" 200 21440 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:23 +0200] "GET /websvn/log.php?repname=MySQL+Documentation&path=%2Ffalcon%2F&rev=8511&sc=0&isdir=1 HTTP/1.1" 200 18541 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
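
The two grep commands can be combined into one pass that counts how often a given bot requested paths that robots.txt disallows. This is a rough sketch, assuming the log uses the Apache combined format seen in the excerpts; the regular expression, the function name disallowed_hits and the constant names are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Apache combined log format:
# host ident user [time] "method path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"$'
)

# Prefixes taken from the Disallow lines in robots.txt.
DISALLOWED_PREFIXES = ("/fisheye/", "/websvn/")

# Three lines copied from the log excerpts in the post.
SAMPLE_LINES = [
    '65.55.208.178 - - [03/Aug/2008:16:58:35 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=MySQL+Proxy&path=%2Ftrunk%2FDoxyfile&rev=365&sc=1&isdir=0 HTTP/1.1" 200 4162 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '65.55.208.164 - - [05/Aug/2008:22:48:23 +0200] "GET /websvn/log.php?repname=MySQL+Documentation&path=%2Fndbapi%2F&rev=9749&sc=0&isdir=1 HTTP/1.1" 200 21440 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
]


def disallowed_hits(lines, agent_substring="msnbot"):
    """Count requests by matching agents to disallowed paths, keyed by script path."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in combined format
        if agent_substring not in match["agent"].lower():
            continue
        path = match["path"]
        if path.startswith(DISALLOWED_PREFIXES):
            # Drop the query string so requests group by script.
            hits[path.split("?", 1)[0]] += 1
    return hits


if __name__ == "__main__":
    for path, count in disallowed_hits(SAMPLE_LINES).most_common():
        print(count, path)
```

To run it against a real log, pass an open file object instead of SAMPLE_LINES, for example open("/var/log/apache2/access_log").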

As you can see, it is happily crawling everything below /websvn/, which also includes links named "Tarball". Guess what they are good for? Yes, they create tarballs of a given SVN directory, using /tmp to build the archive file. Within a very short time, this used up more than 6 GB of disk space, because websvn appears to leave these temporary directories behind if the connection is aborted or times out. We do have a cron job that removes files older than a certain number of days from /tmp, but /tmp currently fills up much faster than the cron job clears it. I need to investigate whether it is actually a bug in websvn to leave these temporary dirs behind.
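
As a stopgap, a cleanup with a much shorter age threshold than "days" could run more frequently. The sketch below is not the cron job mentioned above. It is a hypothetical helper, remove_stale_entries, that deletes top-level entries of a directory whose modification time is older than a given number of seconds. It assumes that everything stale in that directory is safe to delete, which may not hold for a shared /tmp, so it defaults to a dry run.

```python
import shutil
import time
from pathlib import Path


def remove_stale_entries(root, max_age_seconds, now=None, dry_run=True):
    """Remove top-level entries of root whose mtime is older than max_age_seconds.

    Returns the list of entries selected for removal. With dry_run=True
    (the default) nothing is deleted.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in Path(root).iterdir():
        try:
            # lstat() so that symlinks are judged by their own mtime.
            age = now - entry.lstat().st_mtime
        except FileNotFoundError:
            continue  # entry vanished between listing and stat
        if age <= max_age_seconds:
            continue
        removed.append(entry)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
    return removed


if __name__ == "__main__":
    # Dry run: list entries in /tmp untouched for more than one hour.
    for stale in remove_stale_entries("/tmp", max_age_seconds=3600):
        print("would remove", stale)
```

Note that a directory's modification time only changes when entries directly inside it are added or removed, so a tarball still being written into an existing temporary directory could look stale; a conservative threshold is safer.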

Hello Microsoft? Can you please fix your bots so that they not only read but also honor robots.txt files, and stop DoSing our site? Thanks :-)

MySQL: Planet MySQL

Augeas — Main

Augeas is a configuration editing tool. It parses configuration files in their native formats and transforms them into a tree. Configuration changes are made by manipulating this tree and saving it back into native config files.

User:daveg: del.icio.us/daveg

Webmin

Webmin is a web-based interface for system administration for Unix. Using any modern web browser, you can setup user accounts, Apache, DNS, file sharing and much more. Webmin removes the need to manually edit Unix configuration files like /etc/passwd, and

open-source: del.icio.us tag/open-source

WiFi Radar

WiFi Radar is a Python/PyGTK2 utility for managing WiFi profiles. It enables you to scan for available networks and create profiles for your preferred networks. At boot time, running WiFi Radar will automatically scan for an available preferred network an

open-source: del.icio.us tag/open-source
