AnalogX
CGISearch

version 3.00

Documentation


I've always liked the idea of having a search engine of some sort on my site, but I never found one that seemed both easy to set up and maintain, and powerful; so I wrote one! In its simplest form, AnalogX CGISearch is really just a string matching search engine, but I've taken it one step further and given it the ability to extract data from files and do a whole lot more...

Setting up CGISearch is EASY; just copy the executable into your /cgi-bin/ directory, make a subdirectory called 'srchdata', and make sure it contains at least four files: 'header.htm', 'footer.htm', 'body.htm', and 'config.txt'. Then simply add the search page to your site, and voila! Your entire site is now searchable!

Now, before you get all excited, there are a few security issues you should be aware of. First, by default it ONLY searches files that end with '.htm' or '.html', so NOTHING else will be processed. If you need to add more, just go into the 'config.txt' file and add another FILESPEC line with the new extension. Second, if you have some 'hidden' pages, i.e., pages that don't have any direct links to them, the search engine WILL parse them and return them in the results, so leaving a page unlinked is not a valid way to keep it unreadable. You can have the engine exclude a specific path by adding an EXCLUDEPATH line to the 'config.txt' file, but keep in mind this is NOT recursive, so it will still search any directories within the excluded directory (use EXCLUDEALL if you want the whole directory tree excluded). Finally, it starts its search one directory above where it's installed; so if it's in C:/Internet/Webserver/Site/cgi-bin/, it will search EVERY file (matching its search spec) in C:/Internet/Webserver/Site/ and any directory below it, EXCEPT that it will ALWAYS skip the /cgi-bin/ directory (for security reasons). If none of these are big deals to you (they weren't for me), then get ready to search!
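
If the file selection rules above are hard to picture, here's a rough sketch of them in Python. This is NOT how CGISearch itself is written; the function name and its arguments are made up for illustration, and it assumes that the cgi-bin directory is recognized by name and that FILESPEC entries are plain extensions.

```python
import os

def select_files(root, filespecs=(".htm", ".html"), exclude_paths=(), exclude_all=()):
    """Return the files under root that the documented rules would search.

    Hypothetical helper: it models the behaviour described in the text only.
    """
    norm = lambda p: os.path.normcase(os.path.abspath(p))
    skip_dirs = {norm(p) for p in exclude_paths}   # EXCLUDEPATH: that directory only
    skip_trees = {norm(p) for p in exclude_all}    # EXCLUDEALL: directory plus subdirs
    extensions = tuple(ext.lower() for ext in filespecs)
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune cgi-bin (assumed: matched by name) and every EXCLUDEALL tree.
        dirnames[:] = [
            d for d in dirnames
            if d.lower() != "cgi-bin"
            and norm(os.path.join(dirpath, d)) not in skip_trees
        ]
        if norm(dirpath) in skip_dirs:
            continue  # skip this directory's files, but still walk its subdirectories
        for name in filenames:
            if name.lower().endswith(extensions):
                selected.append(os.path.join(dirpath, name))
    return selected
```

Notice that a directory listed in exclude_paths still has its subdirectories walked, which is exactly the non-recursive behaviour to watch out for.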

AnalogX CGISearch has many of the same searching features as the big boys like AltaVista; you can tell it that a file MUST contain a word (with the '+' operator), that it MUST NOT contain a word (with the '-' operator), that it should look for an exact phrase (with the '"' operator), or that it is just looking for part of a word (with the '*' operator). The actual use of these is just like all the other engines, so it should be very easy for users to get the hang of. It's also very fast; the average search on my site takes about 0.52 seconds, and that's to search 163 files... Your mileage may vary depending on how many files you have, and how fast the processor is on the server, but I would be surprised if a search ever took too long.
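
To make the operators concrete, here's a small Python sketch of how a query using them could be interpreted. It's only an illustration, not the engine's actual matching code; the function names are invented, and it assumes case-insensitive whole-word matching, that '*' stands for any run of word characters, and that terms without '+' or '-' are optional.

```python
import re

def parse_query(query):
    """Split a query into (required, excluded, optional) lists of terms."""
    required, excluded, optional = [], [], []
    # Each token is an optional +/- prefix followed by a "quoted phrase" or a bare word.
    for prefix, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = (phrase or word).lower()
        if not term:
            continue
        bucket = required if prefix == "+" else excluded if prefix == "-" else optional
        bucket.append(term)
    return required, excluded, optional

def term_matches(term, text):
    """True if the term (word, phrase, or '*' wildcard) occurs in the text."""
    # Assumption: '*' expands to any run of word characters.
    pattern = r"\b" + r"\w*".join(re.escape(part) for part in term.split("*")) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def page_matches(query, text):
    """Apply the '+', '-', phrase and '*' rules to one page's text."""
    required, excluded, optional = parse_query(query)
    if any(term_matches(t, text) for t in excluded):
        return False
    if not all(term_matches(t, text) for t in required):
        return False
    # With no '+' terms, at least one plain term has to match.
    return bool(required) or any(term_matches(t, text) for t in optional)
```

For example, page_matches('+cgi serv*', 'A CGI program for your web server') is true under these assumptions, while page_matches('+cgi -perl', 'A CGI script written in Perl') is not.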

The results are displayed in much the same fashion as AltaVista (which is my favorite search engine); it lists the match number first, followed by a clickable link to the page, with the title being the <TITLE> from the page. The description is extracted from the Meta tag Description, so you can easily control how the engine describes anything. Meta tag Keywords are given higher ratings than normal text in a page, so you can direct people more easily to a given section by using this field as well.
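
If you want to check what a page exposes through these tags before you rely on them, something like the following Python sketch will pull them out. It uses the standard html.parser module; the class and function names are made up for this example, and it says nothing about how CGISearch reads the tags internally.

```python
from html.parser import HTMLParser

class PageInfo(HTMLParser):
    """Collects the <TITLE> text and the META name/content pairs of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_page_info(html):
    """Return (title, description, keywords) for one HTML document."""
    info = PageInfo()
    info.feed(html)
    info.close()
    return info.title.strip(), info.meta.get("description", ""), info.meta.get("keywords", "")
```

A page with no Description tag simply comes back with an empty description here, which is a good hint that it's worth adding one.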

Inside the /srchdata subdirectory, there are two files, 'header.htm' and 'footer.htm', which unsurprisingly are what it puts before and after (respectively) the returned matches. This is primarily to allow you to customize the engine to fit in perfectly with the look of your site. Also, if you create a directory called 'logs' inside /srchdata, then it will log all of the searches performed by users, which can be handy in trying to figure out what portions of your site should be redesigned.

Here's a brief rundown of the keywords that may be passed to the search engine:

    Search          The search string
    Count           The number of matches to return on each page
    StartAt         The match number to start displaying results from
    SortBy          Sort the results in some specific way (Weight, Percent, Date)

The enclosed 'search.htm' template should give you an idea of how it works, and you can always see it in operation on my site in the 'Search' section.

You will also need to modify the 'config.txt' file that's located in the srchdata directory. This file contains the default settings that CGISearch uses when it performs its search. Currently it supports the following keywords:

    BASEURL         Your site's base URL (e.g. http://www.analogx.com/)
    FILESPEC        The extension of a filetype to search
    SKIPLIST        Filename of textfile containing words to exclude
    EXCLUDEALL      Directory tree you want to exclude (all subdirs included)
    EXCLUDEFILE     A specific filename you want excluded (system wide)
    EXCLUDEPATH     A specific directory you want to exclude from the search
    SEARCHPATH      Overrides the default search path (one dir up), e.g. C:\search\these\files\

If you need to specify more than one of a given type, such as more than one FILESPEC, simply add another line with the FILESPEC command and the new extension. Check out the 'config.txt' file if you're confused and you should get the idea; it's actually very easy and very flexible.

If you're one of the really hardcore users who like to customize exactly how things look, then you'll love the file 'body.htm'; it specifies the formatting of each returned result, allowing you to totally change the way the results are displayed. Just want the title and a link? No prob, just make a few simple changes and voila! The look and feel is however you want it. CGISearch looks for special keywords in this file and replaces each one with the corresponding value for the result:

    &Search.URL;            The URL of the page (does not include starting '/')
    &Search.Size;           The size of the page, in either bytes, k, or megs
    &Search.Title;          The title of the page
    &Search.Match;          The match number of the page
    &Search.Weight;         The match weighting value calculated from the search
    &Search.Percent;        The percentage of keywords that matched
    &Search.BaseURL;        The base URL of the website (with ending '/')
    &Search.Modified;       The date the page was last modified
    &Search.MatchTotal;     The total number of matches found
    &Search.Description;    The META Description of the page
    &Search.MatchReturned;  The number of matches returned

If you need to check the version number, you can get it by typing the following at the command prompt:

    cgisrch /?

It will display the version, subversion, and release status.

Although I don't know if I need to state this or not, I'll do it anyway: this will ONLY work on a server running some flavor of Windows, and it should work with any web server (whether it be AnalogX SimpleServer, IIS, or Apache).