Future considerations for Sphider (but not guarantees)

I’ve been giving thought to just what should come next for Sphider.

Integrating the Sphider Image Indexing functions with the main Sphider, thus making content and image indexing a single operation, is a rather obvious improvement.

The ability to index and search RSS feeds would also be a nice addition. I actually have an alpha of this running on both Linux and Windows machines. Since the spidering operations can be run from a command prompt, a simple cron job keeps the feeds updated on the Linux box. The Windows Task Scheduler is being a bit more stubborn, mainly because of a pesky PHP error I haven’t solved yet. PHP is fine in a browser, but the command prompt is giving trouble: it works, but I keep getting an error that DEMANDS a response! I’ll figure it out.

Since searching for content is different from searching for images, which in turn is different from searching RSS feeds, three different sets of search and results pages are needed. To a user, the only obvious difference is the search page, as the results portion is integrated. So I am giving thought to a possible “unified” search page with tabs, so that the appropriate search form (and corresponding results) can be presented to the user. This is not definite yet, just a thought.

These are all ideas for the future. For now, version 1.6 remains the latest. If the need arises, minor release improvements/fixes are not out of the question.

Anything you would like to see in the Sphider of the future? Give me your ideas and … well, who knows? It might be a very good, very doable idea!

Sphider 1.6.0 Released

Sphider 1.6.0 and Sphider 1.6.0 PDO version have been released.

Also released is the Sphider Image Indexer, a companion add-on to Sphider allowing the user to index and search images from a website.

And finally, there is also a conversion kit which will allow the PDO version of Sphider to work with SQLite databases in place of MySQL.

Sphider 1.6 Release Status

The regular version of Sphider 1.6.0 and the associated Sphider Image Indexer are completed, tested, and ready to go. The only holdup is that I want to release the PDO version in tandem.

The PDO version and associated Image Indexer are also essentially complete, but are undergoing further testing due to some last-minute code changes. These changes involve code portability between database types. The release, as usual, targets MySQL (and presumably MariaDB). There will also be a small set of four replacement modules (install.php, database.php, db_main.php, and db_backup.php) available for SQLite users! A similar set is expected to be introduced soon for PostgreSQL users. The power of PDO will finally be realized.

As soon as everything has been more thoroughly tested, the appropriate zips will be posted in the Downloads section.

Preview of the OPTIONAL Sphider Image Indexer search results

Work has progressed to the testing phase of both Sphider 1.6 and the OPTIONAL* Sphider Image Indexer. This is a screenshot of the results of an image search during testing. To get these results, the PHP installation needs to have the imagick module installed. The search will still work without it, but the thumbnail previews will be absent; the rest of the results will remain. Searches can match the image name, image URL, or alt tag contents, and can cover all indexed sites or a specific site.

Release is planned for mid-July.


* – Sphider 1.6 will work normally without the Sphider Image Indexer and will automatically detect when it has been installed. Once the add-on is installed, image indexing is integrated into Sphider.

What’s next for Sphider?

Work is proceeding with Sphider 1.6!

What will be new in 1.6?

  • The ability to truncate selected tables from the database tab
  • The ability to clear all site data without deleting the site
  • The ability to crawl a site using a sitemap.xml, provided one exists
  • The option to preview pages from the results listing
  • An issue with resuming suspended indexing has finally been resolved
  • Support for an optional Sphider Image Indexer
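Sitemap crawling works because a sitemap.xml simply lists the URLs a site wants visited, so the crawler can read that list instead of discovering every page by following links. Sphider's own implementation is PHP; the Python sketch below only illustrates the idea, assuming a standard sitemaps.org-format file. The function name urls_from_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap.xml document, in file order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

A sitemap index file (one that points to other sitemaps) would need an extra pass; this sketch handles only a plain list of pages.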

At this point, the changes have been made in both the vanilla and PDO versions of 1.6 and testing is ongoing.

And what? An optional Sphider Image Indexer? This is an add-on that will work with Sphider 1.6. You will be able to build a catalog of images from sites where you have previously indexed the pages. Currently, the indexer itself is being tested, with excellent results. Work has begun on an image search function, but that is still in the VERY early stages and nowhere near being a viable tool. While the indexer required some modification of the core Sphider, the search function will not.

What this means is that once testing of the vanilla and PDO versions of 1.6 is complete, 1.6 can be released. The Image Indexer add-on still needs its search function completed, then both the indexer and the search function ported to PDO, and finally full testing. At that time it will be released as version 0.99.

Since the add-on's search function is in the very early stages of development, any input on how you would like to see it operate will be considered.

Just what IS this Sphider, anyway?

Sphider is a program designed to visit a web site in an ordered fashion to find the information necessary to create an index for a search engine. This, in turn, allows the site to be searched for pages containing certain keywords or phrases. Spidering programs are also called web crawlers or bots. They operate by following the hyperlinks on each page.
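The "follow the hyperlinks" process can be sketched as a breadth-first search: start at one page, collect its links, and queue every same-site link that hasn't been seen yet. This is not Sphider's code (Sphider is written in PHP). The fetch callback is a hypothetical stand-in for an HTTP request, which keeps the sketch runnable offline, and the same-host rule and page limit are assumptions for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags on a page."""
    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> List[str]:
    """Breadth-first crawl that stays on the start URL's host.

    `fetch` returns a page's HTML, or None if it cannot be retrieved.
    Returns the URLs successfully visited, in crawl order.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

A real crawler would also honor robots.txt and skip non-HTML content; both are left out to keep the sketch short.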

The crawlers which build major internet search sites (Google, Yahoo, Bing, etc.) are quite sophisticated and can find not only keywords and phrases, but images and other content as well. The ranking system of these crawlers is equally sophisticated. Not only are keywords considered, but so are keyword location and density, relevancy, traffic patterns, TLD names, page design, and domain registration length. In fact, Google has a list of over 200 page ranking factors.

Sphider is much simpler. Pages are ranked solely on keyword weighting. Keyword weighting is calculated from word position and frequency, and the user has a level of control over the weighting process. Images are not indexed and relevancy is not a factor (although better word position and greater frequency DO indicate higher relevance). While Sphider can index practically any website, the main purpose of the application is for the user to index his or her own website so that an internal search can be made available to site visitors.
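To make "position and frequency" concrete, here is a toy weighting scheme in Python. The formula is invented for illustration and is NOT Sphider's actual calculation: each occurrence of a word adds 1 (frequency), plus a bonus that shrinks from position_boost for the first word on the page to 0 for the last (position). The position_boost parameter is a hypothetical stand-in for the user-adjustable weighting settings.

```python
import re
from collections import defaultdict
from typing import Dict, List

def keyword_weights(text: str, position_boost: float = 1.0) -> Dict[str, float]:
    """Hypothetical weighting: +1 per occurrence, plus a linear position bonus."""
    words = re.findall(r"\w+", text.lower())
    n = len(words)
    weights: Dict[str, float] = defaultdict(float)
    for i, word in enumerate(words):
        # Earlier words get a larger bonus; the last word gets none.
        bonus = position_boost * (n - 1 - i) / (n - 1) if n > 1 else position_boost
        weights[word] += 1.0 + bonus
    return dict(weights)

def rank_pages(pages: Dict[str, str], keyword: str) -> List[str]:
    """Order page URLs by the keyword's weight on each page, highest first."""
    scored = [(keyword_weights(text).get(keyword.lower(), 0.0), url)
              for url, text in pages.items()]
    return [url for score, url in sorted(scored, key=lambda s: -s[0]) if score > 0]
```

In this toy scheme, a page that mentions "sphider" twice near the end ("search with sphider sphider") outranks one that mentions it once at the very start ("sphider search"), so frequency can outweigh position. Raising position_boost shifts that balance.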

There are a number of Sphider flavors. The original Sphider (version 1.3.6) can be found at http://www.sphider.eu. It is free, but has the disadvantages of being insecure and badly outdated. It is no longer maintained and will start throwing errors on any system running PHP 5.5 or greater. It will not function at all on PHP 7.

Sphider-Plus (http://www.sphider-plus.eu) and Sphider-Pro (http://www.sphiderpro.eu) are both paid versions of the original and do have added features. I cannot speak to their security or support. Sphider-Pro is at version 3.3, which is dated 2013, so that may not speak well of its maintenance status. For a small website, many of the enhancements these variations provide may be overkill.

Then there is the Sphider located here on our Downloads page. It, too, is based upon the original, but has been updated. It functions without error with PHP 5.5 or greater, including PHP 7. It is also much more secure: all SQL queries are made using prepared statements to avoid the risk of SQL injection, and other security measures have been taken as well. We even have a variation (PDO) which not only operates in environments lacking MySQLnd support, but can also be used with databases other than MySQL (with some tweaking). It can work with SQLite, PostgreSQL (port kits available for both), ODBC, Microsoft SQL Server, and others. Both the normal and PDO variations are supported. And best of all, they are still free!
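Why prepared statements stop SQL injection: the query text and the user's input travel to the database separately, so the input is only ever treated as a value. Sphider does this in PHP; the sketch below shows the same principle with Python's built-in sqlite3 module. The pages table and its columns are hypothetical and are not Sphider's schema.

```python
import sqlite3
from typing import List

def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, keyword TEXT)")
    conn.executemany(
        "INSERT INTO pages (url, keyword) VALUES (?, ?)",
        [("https://example.com/", "sphider"), ("https://example.com/php.html", "php")],
    )
    return conn

def find_pages(conn: sqlite3.Connection, keyword: str) -> List[str]:
    # The ? placeholder binds `keyword` as a value, separate from the SQL text,
    # so input such as "x' OR '1'='1" is compared as a literal string, never run as SQL.
    cur = conn.execute("SELECT url FROM pages WHERE keyword = ?", (keyword,))
    return [row[0] for row in cur.fetchall()]
```

Had the query been built by string concatenation instead, the input x' OR '1'='1 would produce WHERE keyword = 'x' OR '1'='1' and return every row in the table.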

Sphider 1.5.4 and Sphider 1.5.4 PDO may not have installed properly

If you did an upgrade, the regular and PDO versions of Sphider 1.5.4 may not have installed properly. You can tell whether you are affected by checking the Settings tab on the Sphider admin page. If a version other than 1.5.4 (or 1.5.4 PDO) is reported, there is a problem: the settings table in your database is missing a column. Downloads from this point on are not affected.

The issue can be easily fixed and is addressed in this Sphider forum post.

Sphider 1.5.3 has a similar defect and can be repaired the same way, by editing update_rollup.php and re-running it. However, 1.5.3 is less critical, as no changes to the settings table take place except for the version number update.

Sphider 1.5.4 and Sphider 1.5.4-PDO to be released on 29 May

On 29 May 2017, Sphider versions 1.5.4 and 1.5.4-PDO will be released and posted on our Downloads page.

Although addressed in the 1.5.3 series, table prefixes containing a hyphen continued to be a problem. Hopefully this time we have tracked down ALL the sources of this problem and corrected them.
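The post doesn't spell out where the hyphen problem came from, but one well-known source of this class of bug is that MySQL does not allow a hyphen in an unquoted identifier. A prefixed table name like my-site_links must therefore be wrapped in backticks everywhere it appears in SQL; otherwise the query fails. The Python sketch below shows that quoting rule. The function names and the example prefix are hypothetical, and this is not a description of the actual Sphider fix.

```python
def quote_mysql_identifier(name: str) -> str:
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick.

    Unquoted MySQL identifiers may not contain '-', so a name such as
    'my-site_links' only works in a query when quoted like this.
    """
    return "`" + name.replace("`", "``") + "`"

def prefixed_table(prefix: str, table: str) -> str:
    """Build the quoted name of a table that uses a user-chosen prefix."""
    return quote_mysql_identifier(prefix + table)

# Example: f"SELECT url FROM {prefixed_table('my-site_', 'links')}"
# produces SELECT url FROM `my-site_links`
```

The catch with this kind of bug is that every query touching a prefixed table has to use the quoted form, which is why stragglers can survive one round of fixes.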

Another problem was that the presence of an emoji on a web page (generally uncommon except on blog or forum pages) would cause an error and that page would not be indexed. Emojis are now purged before indexing.
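Purging emojis before indexing can be as simple as removing characters in the Unicode blocks where emojis live; most emojis sit outside the Basic Multilingual Plane, so a code-point range filter catches them. The Python sketch below uses an approximate set of ranges chosen for illustration. It is an assumption, not Sphider's actual list, and it will also remove some non-emoji symbols such as ♥ or ✓.

```python
import re

# Approximate emoji ranges (an assumption for this sketch, not Sphider's list).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F000-\U0001FAFF"  # pictographs, emoticons, transport, flags, skin tones
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # emoji variation selector and zero-width joiner
    "]"
)

def strip_emoji(text: str) -> str:
    """Remove emoji characters, leaving ordinary (including accented) text intact."""
    return EMOJI_PATTERN.sub("", text)
```

Accented letters such as é are in a different block entirely, so they pass through untouched.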

The ability to index decimal numbers has been added. In earlier versions, numbers could be indexed but decimal numbers could not. For example, ‘12345.56789’ would be indexed as ‘12345’ and ‘56789’. If the setting for indexing decimals (on the Settings page) is checked, ‘12345.56789’ will now be correctly indexed. A side benefit is that ANY numerical string containing periods will be recognized. For example, ‘123.456.789’ would be indexed as a single term. This could be useful for pages containing part numbers. Mixing numeric and alphabetic characters will still omit the period: ‘12345.abcde’ will still be indexed separately as ‘12345’ and ‘abcde’.
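The rules above boil down to a tokenizing choice: with the setting on, runs of digits joined by periods are kept as one token; otherwise (or when letters are involved) the period splits the text. Here is a Python sketch of those rules using regular expressions. It is not Sphider's PHP code, and the index_decimals parameter is a hypothetical stand-in for the checkbox on the Settings page.

```python
import re
from typing import List

WORD = re.compile(r"\w+")
# Digit groups joined by periods (12345.56789, 123.456.789), else an ordinary word.
DOTTED_NUMBER_OR_WORD = re.compile(r"\d+(?:\.\d+)+|\w+")

def tokenize(text: str, index_decimals: bool) -> List[str]:
    """Split text into index terms, keeping dotted numbers whole if enabled."""
    pattern = DOTTED_NUMBER_OR_WORD if index_decimals else WORD
    return pattern.findall(text.lower())
```

Because the dotted-number alternative requires a digit on both sides of every period, a sentence-ending period after a number is dropped, and ‘12345.abcde’ falls through to the ordinary word rule and splits in two.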

Also changed in these versions are the language files. Even though the search page is UTF-8 compliant, “special characters” like è or ç previously failed to display properly; they, along with Cyrillic characters such as Ц or й, will now display correctly. This does NOT mean the text displayed will be the proper translation, as I am no linguist and am either relying on the work of others where possible, or winging it with Google Translate. Simply put, these characters are now coded in the language files as Unicode entities.
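Coding characters as entities means the language file itself contains only plain ASCII, and the browser turns something like &#232; back into è when the page is displayed, regardless of how the file was saved. The Python sketch below converts text to decimal numeric entities using the standard xmlcharrefreplace error handler. The actual Sphider language files may use a different entity form (for example named entities such as &eacute;), so treat this as an illustration only. The function name is hypothetical.

```python
def to_numeric_entities(text: str) -> str:
    # Replace every non-ASCII character with a decimal character reference,
    # e.g. "è" -> "&#232;" and "Ц" -> "&#1062;"; plain ASCII passes through unchanged.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, to_numeric_entities("Résultats") gives "R&#233;sultats".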