The next round will probably emit XML files with more meta-information, which will enable some of the EXIF details of photos to be shown.
I’ve grown somewhat tired of ‘imageindex’. It doesn’t do what I want unless I modify my ~/.imageindexrc file each time, which is annoying. I modified it to permit reading in a separate configuration, but the Perl code is spaghetti and I don’t like the structure of the index file (tables… ugh!). I also want output that fits into my site’s style.
There are options other than imageindex, but I haven’t found any that work exactly as I’d like. It’s particularly important that I be able to easily bring in galleries from my old web site, and none of the freely available generators I’ve found, Gallery3 included, lets me do that while also fitting into my site’s style.
You can check out the search page here.
I like jQuery and the YUI library.
It’s tedious to need to update the menu on each of my web pages when I add new children of a page, new siblings of a page, or move a page or directory. Especially since I keep my content organized via the underlying filesystem; why should I have to do more than that to get a menu that shows child directories, siblings and ancestors? I shouldn’t. One could argue that the easiest way to achieve what I want is to not have any index files, but that’d be quite ugly.
I’ve instead written a CGI program in C++ that will automatically add a ‘Genealogy’ menu to a page’s side menu. Within the ‘Genealogy’ menu are submenus: ‘Children’, ‘Siblings’ and ‘Ancestors’. I call the CGI from the RenderHeader() member of my PageStart PHP class to add to the side menu. Simple, always works. The only downside is that it’s a little slow since it accesses the Xapian backend to get page titles to use in the menu. I can live with it for now. I may later write the menus to files in batch mode and remove the CGI.
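As a rough illustration of the idea (not the actual CGI code), the three submenus can be derived from the directory layout alone. The sketch below uses only std::filesystem; the Genealogy struct and the SubdirNames() and BuildGenealogy() names are hypothetical, it assumes both paths are given without a trailing slash, and it returns directory names rather than the page titles the real CGI looks up in Xapian.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical container for the three submenus; not the author's type.
struct Genealogy {
    std::vector<std::string> children;   // subdirectories of the page's directory
    std::vector<std::string> siblings;   // other subdirectories of the parent
    std::vector<std::string> ancestors;  // site-relative paths, nearest parent first
};

// Names of the immediate subdirectories of dir, sorted for a stable menu order.
static std::vector<std::string> SubdirNames(const fs::path& dir)
{
    std::vector<std::string> names;
    if (!fs::is_directory(dir)) {
        return names;
    }
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_directory()) {
            names.push_back(entry.path().filename().string());
        }
    }
    std::sort(names.begin(), names.end());
    return names;
}

// Build the menu data for pageDir, which is assumed to live under siteRoot.
Genealogy BuildGenealogy(const fs::path& siteRoot, const fs::path& pageDir)
{
    Genealogy g;
    g.children = SubdirNames(pageDir);

    if (pageDir != siteRoot) {
        // Siblings are the parent's subdirectories minus the page's own directory.
        g.siblings = SubdirNames(pageDir.parent_path());
        const std::string self = pageDir.filename().string();
        g.siblings.erase(std::remove(g.siblings.begin(), g.siblings.end(), self),
                         g.siblings.end());
    }

    // Walk upward until the site root is reached.
    fs::path p = pageDir;
    while (p != siteRoot) {
        const fs::path parent = p.parent_path();
        if (parent.empty() || parent == p) {
            break;  // pageDir was not under siteRoot; stop at the filesystem root
        }
        p = parent;
        const fs::path rel = p.lexically_relative(siteRoot);
        g.ancestors.push_back(rel == fs::path(".") ? std::string("/")
                                                   : "/" + rel.generic_string());
    }
    return g;
}
```

With a site root containing a/x, a/y and b, calling BuildGenealogy(root, root / "a") gives children x and y, sibling b, and the single ancestor "/". Writing these lists to files in batch mode, as mentioned above, would only require running the same function over every directory.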
I now have my indexer working on albums and photos in the photo gallery. It needs some additional features, but it’s functional. So now I have indexing and searching for my HTML content, PHP content, blog posts and photos. This should be sufficient to get me by for a while; I’ll tweak it as needed.
The indexer (dwmsiteindex) uses a configuration file that permits skipping some directories when searching for HTML and PHP content. The same file also holds the database settings for the blog database and the gallery.
I now have a site indexer that uses my own parsers to put data into a Xapian back end. I’m parsing my HTML and PHP page content, and also my blog posts.
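The parsers themselves aren’t shown, and the real indexer uses htmlcxx for HTML. Purely as an illustration of what pulling indexable text out of a PHP page involves, here is a deliberately naive, standard-library-only sketch. ExtractText() is a hypothetical name; it drops PHP blocks and tags and collapses whitespace, and it makes no attempt to handle script or style bodies, comments, or character entities.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the visible text of an HTML/PHP source string (naive sketch).
std::string ExtractText(const std::string& src)
{
    std::string raw;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, 2, "<?") == 0) {
            // PHP block: skip to the closing "?>" so a '>' inside the code
            // (as in $a->b) is not mistaken for the end of a tag.
            const std::size_t close = src.find("?>", i + 2);
            if (close == std::string::npos) {
                break;  // unterminated block runs to end of file
            }
            i = close + 2;
            raw += ' ';
        } else if (src[i] == '<') {
            // Ordinary tag: skip to the next '>'.
            const std::size_t close = src.find('>', i + 1);
            if (close == std::string::npos) {
                break;
            }
            i = close + 1;
            raw += ' ';
        } else {
            raw += src[i++];
        }
    }

    // Collapse runs of whitespace and trim both ends.
    std::string text;
    bool pendingSpace = false;
    for (unsigned char c : raw) {
        if (std::isspace(c)) {
            pendingSpace = !text.empty();
        } else {
            if (pendingSpace) {
                text += ' ';
            }
            pendingSpace = false;
            text += static_cast<char>(c);
        }
    }
    return text;
}
```

The resulting string is the kind of input one would hand to a Xapian term generator; a real HTML parser is the better choice for anything beyond simple pages.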
I also now have a usable search facility. It needs some cosmetic work, but it’s functional and styled correctly.
Both the indexer and the search CGI are written in C++. The indexer uses Xapian, mysql++, htmlcxx, libDwm and application code. The search CGI uses Xapian, libDwm and application code.
I’ve been looking for site indexing code to support searching my web site. Yes, you can certainly use Google for some of my content. But Google won’t typically see content that’s not linked in. Then there’s the data hidden away in databases instead of HTML.
Looking around at this time, nothing really excites me. I want something fairly lightweight, fast, and capable of indexing content from my HTML, PHP, blogs and photos. I also want something that easily integrates into the look and feel of my site, today and two years from now.
Conceptually, I like Xapian. There’s also the ‘omega’ package built on Xapian. However, omega’s ‘omindex’ doesn’t work for me; I’ve got considerable textual content in PHP files and ‘omindex’ skips PHP. It also doesn’t handle WordPress blogs, Gallery3 databases, etc. I’m also not terribly fond of the output of omega (though that’s not difficult to change).
I’ve decided I’ll write my own indexer and search using Xapian as the back end. In fact I’ve already written the code to index HTML and PHP files, and have a design for the code to index my blog posts. I have a test search program that emits simple HTML. All looks good so far; after indexing a handful of pages, searches yield appropriate document weights and rankings.