Replacing ‘imageindex’ with my own software: part 2

Over the weekend, I finished the first pass at ‘mcphotoweb’. It generates basic photo albums. In the generated pages, I’m using JavaScript to expand the medium-sized images from the thumbnails and the full-size images from the medium-sized images. The JavaScript is hackish, but it works, and the presentation is nicer than what ‘imageindex’ produced. For an example, see my August 30, 2007 album from the 2007 BMW Z Homecoming.

The next round will probably emit XML files with more meta-information, which will enable display of some of the EXIF details of each photo.

Replacing ‘imageindex’ with my own software

I’ve grown somewhat tired of ‘imageindex’. It doesn’t do what I want without my modifying ~/.imageindexrc each time, which is annoying. I modified it to permit reading a separate configuration file, but the Perl code is spaghetti and I don’t like the structure of the index file (tables… ugh!). I also want output that fits my site’s style.

There are options other than imageindex, but I haven’t found any that work exactly as I’d like. It’s particularly important that I be able to easily bring in galleries from my old web site while still fitting my site’s style, and I can’t do that with Gallery3 or any of the freely available generators I’ve found.

I’m almost done with a replacement written in C++, using the GraphicsMagick library. It’s smart enough to avoid unnecessary image regeneration, and automatically recurses into subdirectories to generate ‘medium’ images, thumbnails and montages. It also generates an index.php containing all of the thumbnails and subdirectory montages as links. I need to decide how I want to display the medium images (via a slide file or via JavaScript) and how I want to link to the full-size image from the medium image.

Indexing my web site for search: part 4

Why do I get the feeling this will be a never-ending blog series? Perhaps because my JavaScript skills are still at the novice level?

I’ve finished my tweaking of the search interface for now. When a search is started from a non-search page, I give the user the search page with results. Once on the search page, up to five previous searches are saved in browser memory (via jQuery calls and some other JavaScript), and may be flipped through by clicking the search results title (“N matches found for …”). I may change this limit in the future. At the moment it has little effect on browser memory consumption, and it’s very handy for flipping through results for different search terms.

You can check out the search page here.

My JavaScript-fu is getting better, as is my CSS-fu. It won’t be long before I move from procedural to object-oriented JavaScript (as unintuitive as its OO syntax is, it works).

I like jQuery and the YUI library.

Automatic menu generation for my web site

It’s tedious to update the menu on every one of my web pages whenever I add a new child of a page, add a new sibling, or move a page or directory. Since I keep my content organized in the underlying filesystem, why should I have to do more than that to get a menu showing child directories, siblings and ancestors? I shouldn’t. One could argue that the easiest way to achieve this is to not have any index files, but that would be quite ugly.

I’ve instead written a CGI program in C++ that automatically adds a ‘Genealogy’ menu to a page’s side menu. Within the ‘Genealogy’ menu are submenus: ‘Children’, ‘Siblings’ and ‘Ancestors’. I call the CGI from the RenderHeader() member of my PageStart PHP class to add to the side menu. Simple, and it always works. The only downside is that it’s a little slow, since it queries the Xapian back end for the page titles used in the menu. I can live with that for now; later I may write the menus to files in batch mode and remove the CGI.

Indexing my web site for search: part 3

I now have my indexer working on the albums and photos in the photo gallery. It needs some additional features, but it’s functional. So I now have indexing and searching for my HTML content, PHP content, blog posts and photos. This should be sufficient to get me by for a while; I’ll tweak it as needed.

The indexer (dwmsiteindex) reads a configuration file that permits skipping some directories when searching for HTML and PHP content; the same file holds the database settings for the blog and the gallery.
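For illustration only — this is not dwmsiteindex’s actual syntax, and every key and value below is hypothetical — such a configuration might look like:

```
# Hypothetical dwmsiteindex configuration -- illustrative only.
skip_directories = [ "/www/private", "/www/tmp" ];
blog_db    = { host = "localhost"; name = "wordpress"; user = "indexer"; };
gallery_db = { host = "localhost"; name = "gallery3";  user = "indexer"; };
```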

Indexing my web site for search: part 2

I now have a site indexer that uses my own parsers to put data into a Xapian back end. I’m parsing my HTML and PHP page content, and also my blog posts.

I also now have a usable search facility. It needs some cosmetic work, but it’s functional and picks up my site’s styling.

Both the indexer and the search CGI are written in C++. The indexer uses Xapian, mysql++, htmlcxx, libDwm and application code; the search CGI uses Xapian, libDwm and application code.

Indexing my web site for search

I’ve been looking for site indexing code to support searching my web site. Yes, you can certainly use Google for some of my content. But Google typically won’t see content that isn’t linked from anywhere, and then there’s the data hidden away in databases instead of HTML.

Looking around at this time, nothing really excites me. I want something fairly lightweight, fast, and capable of indexing content from my HTML, PHP, blogs and photos. I also want something that easily integrates into the look and feel of my site, today and two years from now.

Conceptually, I like Xapian. There’s also the ‘omega’ package built on Xapian. However, omega’s ‘omindex’ doesn’t work for me; I’ve got considerable textual content in PHP files, and ‘omindex’ skips PHP. It also doesn’t handle WordPress blogs, Gallery3 databases, etc. And I’m not terribly fond of the output of omega (though that’s not difficult to change).

I’ve decided I’ll write my own indexer and search using Xapian as the back end. In fact, I’ve already written the code to index HTML and PHP files, and have a design for the code to index my blog posts. I have a test search program that emits simple HTML. All looks good so far; after indexing a handful of pages, searches yield appropriate document weights and rankings.