Sphider for WordPress

About three years ago, I attempted a port of Sphider to WordPress. The result was buggy and incomplete. The Search tab on this blog actually contains a sample of what came out of the effort.

Among the MANY problems:
1. It gives more results than is really desired, making it pretty useless.
2. If the number of results goes beyond one page… well, it breaks if you try to go to the next page!
3. Suggestions don’t even begin to work.
4. The effort was based on Sphider 1.5.1, and PHP has advanced since then. Now I can’t even get a screen to do a re-index, even if I wanted to.
5. It is VERY difficult to integrate into a WordPress theme.
6. There are other issues, but they don’t come to mind offhand.

So, in a nutshell, that attempted port was a dud. A laughable and unmitigated disaster might be a good way to describe it.

Now, Sphider seems to be stable (famous last words?), and I am often a glutton for punishment, so I am THINKING about trying again… kind of a Sphider for WordPress, Take 2, pre-alpha…

This would have to be thought out before actually doing anything, but these are my considerations so far:
1. History has taught that not all hosts provide the MySQLnd module for PHP. Therefore any future WordPress port would need to be based on the PDO Sphider. Version 2 supports PHP 7.1, so that would be the starting point.
2. WordPress uses its own class, the wpdb class, to interact with the database. So code would need to be changed to use wpdb. That is a LOT of code… BUT… why would the spider part of Sphider need to use the wpdb class? Spidering (indexing) itself really doesn’t need to be integrated into WordPress, does it? All it is doing is populating the Sphider database. So why couldn’t the spider and search functions of Sphider be separated? The only thing those two functions currently share with each other is the database connection. The current spider part could remain as is (with some modifications specific to WordPress page needs), and only the search function would be rewritten to use the wpdb class (with its own database connection). Both functions would connect to the same database, but in different manners.
3. Would a WordPress Sphider really need to use categories as used in Sphider? I am thinking not. So scratch that capability. I don’t think we need RSS feed indexing or image indexing, so those can also be cut. We are only concerned with a single site (the blog on which it would be installed), so more code simplification. This all reduces the size and complexity of spidering (indexing).
4. Perhaps the indexing function could be made to skip unnecessary places, like /wp-json, /category, /feed… This would reduce the size of the database and eliminate some of the redundant “finds” when a search is performed.
5. Naturally, the search function would eliminate RSS and image search functions and retain the keyword search.
6. Try to get the search page to more easily integrate with themes.
7. Get the multipage search returns to function, forward and backward, without producing an error.
8. Get suggestions to work.
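
To make consideration 4 a bit more concrete, here is a rough sketch of the kind of URL filter the indexer could apply before fetching a page. It is written in Python purely as an illustration, not in Sphider's PHP, and the names EXCLUDED_PREFIXES and should_index are made up for this example. The three prefixes are just the ones mentioned above.

```python
from urllib.parse import urlparse

# Hypothetical skip list, taken from the examples in the text above.
EXCLUDED_PREFIXES = ("/wp-json", "/category", "/feed")


def should_index(url: str) -> bool:
    """Return False for URLs whose path falls under an excluded prefix."""
    path = urlparse(url).path.rstrip("/") or "/"
    for prefix in EXCLUDED_PREFIXES:
        # Match the prefix itself or anything beneath it, but not
        # look-alikes such as /feedback.
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True


print(should_index("https://example.com/wp-json/wp/v2/posts"))  # False
print(should_index("https://example.com/2017/10/some-post/"))   # True
```

Matching on whole path segments rather than a plain substring keeps a page like /feedback from being dropped by accident.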

Okay. Before I get in too deep…
1. Is there any real interest in a Sphider for WordPress?
2. Anything I’m missing in thinking ahead?
3. Anybody have any experience integrating content into WordPress themes? Care to share?

Feedback would be appreciated. In fact, without feedback, I may conclude the whole idea is more trouble than it’s worth.


UPDATE: So… I got brave and changed my theme. And the theme had the ability to add a Search widget. And playing around with this simple search, it seems to work just fine. Granted, it is just a simple search, not one with and/or or phrase options, but quite functional nonetheless. I have to imagine any decent theme can do the same thing. Unless there is really a big need for a Sphider for WordPress, I think I’ll save myself the trouble and pass. 🙂

Maintenance releases for Sphider

Sphider release 2.0.1 has some code cleanup and a jQuery update.

PDO Sphider, Sphider for PostgreSQL, and Sphider for SQLite are at release 2.0.2. While these too have some code cleanup and a jQuery update, they are mainly to correct a few problems introduced by release 2.0.1!

No change to the functionality is involved in these releases. They are mainly to clean up a few messy items, although release 2.0.1 of the PDO versions did correct some problems with database error reporting. Those changes are included in 2.0.2.

Sphider has a new home

Sphider – a PHP spider and search engine

While this blog will continue to provide news and information about Sphider, and links to downloads will continue to be provided from the blog, the principal home for Sphider is now:

http://www.sphidersearch.com or https://www.sphider.worldspaceflight.com. Either URL will bring you to the same page. The Sphider Forum has not moved and is accessible from the new domain.

Besides the main page, there is a downloads page, an About page, a document page from which the Sphider User’s Guide may be downloaded, and a changelog page. Other pages will be added as the need arises.

Minor bug fix to all Sphider flavors

All the current releases of Sphider had a minor bug when doing an image search by url. The corrected code is available on the downloads page. The main Sphider 2.0 is designated by an “a” suffix. All of the PDO versions have a “c” suffix.

The ONLY file changed is search.php. And in search.php, there is only one line altered. A passed parameter “type” was having uppercase characters stripped. A column in the database image table “images” is “imgUrl”. The uppercase “U” was stripped and the query failed when it couldn’t find the column “imgrl”!
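
For anyone curious how a filter like that turns “imgUrl” into “imgrl”, here is a small illustration. It is in Python, not the actual PHP from search.php (the altered line itself isn't reproduced in this post), and both function names are invented for the example. The second function shows one possible alternative, an exact-match whitelist, which is an assumption on my part and not necessarily how the fix was made.

```python
import re


def strip_uppercase(value: str) -> str:
    """Mimic the kind of filter described: drop every uppercase letter."""
    return re.sub(r"[A-Z]", "", value)


def pick_column(value: str, allowed: set) -> str:
    """Accept the parameter only if it exactly matches a known column name."""
    if value not in allowed:
        raise ValueError(f"unsupported search type: {value!r}")
    return value


print(strip_uppercase("imgUrl"))          # imgrl
print(pick_column("imgUrl", {"imgUrl"}))  # imgUrl
```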

If you don’t use the “Image Search” your version will work fine. If you DO use the “Image Search”, the ONLY file you need to replace is search.php. There is no need to do a reinstall.

The embarrassing part of it all is that this problem WAS caught and corrected during testing prior to the 2.0 release. HOWEVER, that corrected piece of code wasn’t placed into the zip files, which shipped with the uncorrected version of search.php. 😳

New Sphider downloads available for PDO versions

A minor problem was found affecting the PDO versions (PDO, SQLite PDO, and PostgreSQL PDO) of Sphider.

During indexing, if the “Use site map” switch was set, but the site map was not found or not usable, the code to update the database to turn the switch off was failing to execute.

The code has been corrected to enable the database to update. The updated downloads are reflected as a “b” version.
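
As a rough picture of the intended behavior, the sketch below turns the switch off when the site map can't be used. It is Python with an in-memory SQLite database, not the Sphider PHP code, and the table name “sites”, the columns “site_id” and “use_sitemap”, and the function name are all hypothetical.

```python
import sqlite3


def finish_sitemap_check(conn, site_id, sitemap_usable):
    """Turn the per-site switch off when the site map could not be used."""
    if sitemap_usable:
        return
    # Parameterized UPDATE; commit so the change is actually stored.
    conn.execute("UPDATE sites SET use_sitemap = 0 WHERE site_id = ?", (site_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, use_sitemap INTEGER)")
conn.execute("INSERT INTO sites VALUES (1, 1)")
finish_sitemap_check(conn, 1, sitemap_usable=False)
print(conn.execute("SELECT use_sitemap FROM sites WHERE site_id = 1").fetchone()[0])  # 0
```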

The non-PDO version was unaffected. This was strictly a PDO issue.

Thanks go out to Webbo for the catch.

Minor corrections to PDO Sphider versions

It has come to our attention there are typos in the code for all PDO versions. For the normal PDO (MySQL/MariaDB), spider.php and spiderfuncs.php have been slightly modified. The file spider.php had a single typo, and spiderfuncs.php was missing 5 lines of code. While the version number is unchanged, the new download is designated as 2.0.0a.

For the PostgreSQL and SQLite versions, only spider.php was affected, with a single typo in each. No other files are affected. As with the regular PDO, the version is unchanged but the download designations are shown as 2.0.0a.

Our apologies for the inconvenience. These anomalies went uncaught during testing of all these versions, so it seems that, for the most part, crawling functionality was not adversely impacted, although it COULD be under certain circumstances.

Our deepest thanks go out to Ed Parrish for having caught these issues.

Sphider 2.0.0 nearing release

Sphider 2.0.0 is undergoing final testing and will probably be released by mid-October.

Virtually every file has undergone at least some alteration. The features of Sphider 2.0.0 are:
– Better page charset handling to ensure that the database receives only UTF-8 input. UTF-8 encoding of web pages already in UTF-8 format is avoided to eliminate garbled entries.
– Phrase searches have been improved.
– This version is PHP 7.1 ready.
– Integrated indexing of images, with the option to NOT index images. An image search page is also provided.
– RSS content may also be indexed and searched.
– jQuery has been updated to a more recent version.
– While not fully PSR-2 compliant when it comes to PHP coding standards, the code is a LOT closer than it ever has been. This involved the renaming of many functions and the elimination of a few functions which were found to be obsolete (and thus, unused). Coding style had to be changed in virtually every module. This is why so much code has been altered, affecting nearly every Sphider PHP code segment.
– The search page is integrated for legacy, RSS, and image searches. Knowing that RSS and images are something not every user will be interested in, an updated (as in 2.0.x compliant) version of the 1.6.x search page is provided. The revised 1.6.x search form will work fine with 2.0.x. It will need to be renamed to replace the provided search.php.
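
The charset item in the list above is easier to see with an example. The sketch below is Python, not Sphider's PHP, and the function name to_utf8 is invented for illustration. It passes UTF-8 pages through untouched and converts anything else exactly once. The last lines show the garbling that results when UTF-8 bytes are mistaken for ISO-8859-1 and encoded a second time.

```python
def to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Return page bytes as UTF-8, converting only when the source is not UTF-8."""
    charset = (declared_charset or "utf-8").strip().lower()
    if charset in ("utf-8", "utf8"):
        # Already UTF-8: validate, but do not encode a second time.
        raw.decode("utf-8")
        return raw
    # Any other charset: decode with it, then encode once as UTF-8.
    return raw.decode(charset).encode("utf-8")


page = "café".encode("utf-8")
latin_page = "café".encode("iso-8859-1")

print(to_utf8(page, "UTF-8") == page)              # True
print(to_utf8(latin_page, "ISO-8859-1") == page)   # True

# Double encoding: UTF-8 bytes misread as ISO-8859-1, then encoded again.
garbled = page.decode("iso-8859-1").encode("utf-8")
print(garbled.decode("utf-8") == "cafÃ©")          # True
```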

Also, finding that porting PDO to databases other than MySQL was messier than anticipated (too many DB specific requirements for each), Sphider 2.0.0 will actually have 4 flavors. The “kits” for PostgreSQL and SQLite were too cumbersome and confusing.
1) The legacy Sphider, using the MySQL database (or MariaDB) and using MySQLi and MySQLnd.
2) PDO Sphider, also using the MySQL database (or MariaDB), but using a PDO implementation (for installations lacking MySQLnd support).
3) PostgreSQL version, using a PostgreSQL database accessed via PDO.
4) SQLite version, using a SQLite database accessed via PDO.

All flavors are testing well and, after working out some “peculiarities” for each, it seems no more coding changes will be needed. Now each version must have a final full set of operations performed to ensure everything works. This includes new installation via PHP script, installation using SQL queries, upgrade installation, adding sites, indexing sites, deleting sites, and adding, editing, and deleting categories. The same is also done for RSS indexing. The search functions need to be tested for various situations. We have found a few websites which have, uh…, what you might call “unusual” methods resulting in unusual problems. (Ever seen an image “alt” tag with text running in excess of 1000 characters? We have!)