
Using SWISH-E To Index Your Site

SWISH-Enhanced (SWISH-E) is a fast, powerful, flexible, free, and easy-to-use system for indexing collections of Web pages or other text files. Once your pages are indexed, you can search them quickly using the index file. It is currently installed on CGI101, so if you're a customer, you don't need to install it. If you're not a customer, check with your ISP to see if SWISH-E is already installed; if not, you can download it from SunSITE.

There are three parts to making your web site searchable with SWISH-E. First, you create a configuration file that SWISH-E reads to index your site. Then you index the site. Lastly, you need a CGI that performs the search and returns the results.

1. Create the Config File

To create an index of your pages, SWISH-E reads a configuration file to determine which pages should (or should not) be indexed. You should download (or copy) the following file to your own account:

Then you'll need to edit it. There are three things that have to be changed:

Fix the paths to your web directory: replace /home/yourusername/public_html with the actual (full) path to your web files.

Nothing else should need changing unless you want to fine-tune your search engine (such as omitting files with certain names, etc.). If you read through the config file you'll see the different options, plus help for each one. Any line that starts with a "#" is a comment, and many options are commented out by default.
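
Here's a rough sketch of what a pared-down config might contain, written out from the shell so you can try it. IndexDir, IndexFile, and IndexOnly are real SWISH-E directives, but the file name swish.conf in the current directory, the index name index.swish, and both paths are placeholders assumed for illustration; the actual sample file has many more options than this.

```shell
# Write a minimal SWISH-E config to ./swish.conf (overwrites any existing one).
# Both paths below are placeholders (assumptions); use your own full paths.
WEB_ROOT="/home/yourusername/public_html"
INDEX_FILE="/home/yourusername/index.swish"

cat > swish.conf <<EOF
# Lines starting with "#" are comments.
# Directory tree to index:
IndexDir $WEB_ROOT
# Where the finished index is written:
IndexFile $INDEX_FILE
# Only index files with these suffixes:
IndexOnly .html
EOF

# Show the active (non-comment) directives.
grep -v '^#' swish.conf
```

If you started from the sample file instead, you'd only be editing the path values, not writing the file from scratch.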

The sample config file is also set up so that it only indexes .html files. If you want to index other files, for example .txt or .shtml files, you'll need to change the following line near the bottom of the config file:

And add the suffixes you want, for example:
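
As a sketch, assuming the line in question uses SWISH-E's IndexOnly directive (check the name against your own copy of the config) and that your config is ./swish.conf, you could make the change in an editor or script it like this. The suffix list is just an example.

```shell
# Assumes the config is ./swish.conf and has an "IndexOnly" line.
CONF="swish.conf"

# For a self-contained demo, create a stand-in config if none exists.
[ -f "$CONF" ] || printf 'IndexOnly .html\n' > "$CONF"

# Rewrite the IndexOnly line to list every suffix you want indexed.
# (Written via a temp file because "sed -i" is not portable.)
sed 's/^IndexOnly .*/IndexOnly .html .shtml .txt/' "$CONF" > "$CONF.new" \
  && mv "$CONF.new" "$CONF"

# Confirm the change.
grep '^IndexOnly' "$CONF"
```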

2. Index The Site

Once your config file is saved, you'll have to run swish-e to create the index file. This can be done from the Unix command line like so:

If all goes well, the index file will be created at the location specified by the IndexFile directive in the conf file. The first time you run this, you'll also want to chmod 644 swish.conf to make it readable by your CGIs.
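
At its simplest, indexing is a single command: swish-e -c followed by the path to your config file. The sketch below wraps that in a small helper, which is an illustration rather than part of SWISH-E: the function name reindex is made up, and it assumes swish-e is on your PATH.

```shell
# reindex CONF: run the indexer against CONF, then make CONF readable.
# "reindex" is a hypothetical helper name, not a SWISH-E command.
reindex() {
    conf="$1"
    if ! command -v swish-e >/dev/null 2>&1; then
        echo "swish-e not found on PATH" >&2
        return 127
    fi
    # -c names the configuration file; the index lands wherever
    # the IndexFile directive in that file points.
    swish-e -c "$conf" && chmod 644 "$conf"
}
```

You'd call it as reindex /home/yourusername/swish.conf, with your own path in place of the placeholder.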

You'll need to re-index your pages whenever you make changes to them. You can either do this manually every few weeks or so (depending on the frequency of the changes), or you may want to create a cron job to re-index your site nightly. (I recommend this, because it lets you change your pages without worrying about the index.) To set it up in cron, type

to edit your cron file. You'll be put into an editor (which will be whatever your default editor is - possibly pico or vi). You'll then add the following line:

Then save the file. This will schedule the indexer to run at midnight every night.
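
For reference, here's a sketch of what the cron setup might look like. crontab -e is the standard command for editing your own cron file; the two paths are assumptions, so substitute wherever swish-e and your config actually live (cron usually runs with a minimal PATH, so a full path to swish-e is safer).

```shell
# Open your personal crontab in your default editor (interactive):
#   crontab -e
# Then add a line like the one printed below.

SWISH_BIN="/usr/local/bin/swish-e"          # assumption: find yours with "command -v swish-e"
SWISH_CONF="/home/yourusername/swish.conf"  # assumption: your config's full path

# Fields: minute hour day-of-month month day-of-week command.
# "0 0 * * *" means 00:00 every day.
CRON_LINE="0 0 * * * $SWISH_BIN -c $SWISH_CONF"
printf '%s\n' "$CRON_LINE"
```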

3. A CGI To Perform Searches

Once you've indexed your site, you can make it searchable by adding the following form to any of your pages:

And here's the CGI to handle the actual search:

Source code: http://www.cgi101.com/help/search.txt
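
Before wiring up the form and CGI, it can help to confirm from the command line that the index answers queries. This is a sketch, not the CGI itself: search_index is a made-up helper name, and the index path in the usage note is a placeholder. The -f (index file) and -w (search words) options are swish-e's own.

```shell
# search_index INDEX WORD...: query INDEX for the given words.
# "search_index" is a hypothetical helper name, not a SWISH-E command.
search_index() {
    index="$1"
    shift
    # -f names the index file to search; -w gives the search words.
    swish-e -f "$index" -w "$@"
}
```

For example, search_index /home/yourusername/index.swish perl and cgi would search for pages containing both words, assuming that's where your IndexFile directive points.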

You may also want to check out SunSITE's collection of other scripts to search-enable your pages.

Spidering

The above example uses SWISH-E's filesystem method of indexing. SWISH-E can also be configured to spider a site instead (using HTTP requests); this is useful if you want to index a remote site. Visit http://sunsite.berkeley.edu/SWISH-E/Manual/spidering.html for instructions on how to do this.

I don't recommend this for indexing your own site, especially if your account has bandwidth limits, because the spider traffic counts against your web traffic quota: some of it for a small site, a lot of it for a large one.

