What’s next for Sphider?

Work is proceeding with Sphider 1.6!

What will be new in 1.6?

  • The ability to truncate selected tables from the database tab
  • The ability to clear all site data without deleting the site
  • The ability to crawl a site using a sitemap.xml, provided one exists
  • The option to preview pages from the results listing
  • An issue with resuming suspended indexing has finally been resolved
  • Support for an optional Sphider Image Indexer
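
For readers curious what sitemap-driven crawling involves, here is a minimal sketch in Python. It is not Sphider's code (Sphider is written in PHP); the function name and sample URLs are made up for illustration. It simply pulls the page addresses out of a sitemap.xml that follows the standard sitemaps.org format, so a crawler could visit them directly instead of following hyperlinks.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text):
    """Return the <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]

# Hypothetical sitemap content, for illustration only.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
</urlset>"""

for url in urls_from_sitemap(example):
    print(url)
```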

At this point, the changes have been made in both the vanilla and PDO versions of 1.6 and testing is ongoing.

And what? An optional Sphider Image Indexer? This is an add-on that will work with Sphider 1.6. You will be able to build a catalog of images from sites where you have previously indexed the pages. Currently, the indexer itself is being tested, with excellent results. Work has begun on an image search function, but that is still in the VERY early stages and nowhere near being a viable tool. While the indexer required some modification of the core Sphider, the search function will not.

What this means is that once testing of the vanilla and PDO versions of 1.6 is complete, 1.6 can be released. The Image Indexer add-on still has to have the search function completed, then both the indexer and search function ported to PDO, and finally fully tested. At that time it will be released as version 0.99.

Since the search function of the add-on is in the very early stages of development, input on how you would like to see it operate is welcome and will be considered.

Just what IS this Sphider, anyway?

Sphider is a program designed to visit a web site in an ordered fashion to find the information necessary to create an index for a search engine. This, in turn, allows the site to be searched for pages containing certain keywords or phrases. Spidering programs are also called web crawlers or bots. They operate by following the hyperlinks on each page.

The crawlers which build major internet search sites (Google, Yahoo, Bing, etc.) are quite sophisticated and can find not only keywords and phrases, but images and other content as well. The ranking system of these crawlers is equally sophisticated. Not only are keywords considered, but so are keyword location and density, relevancy, traffic patterns, TLD names, page design, and domain registration length. In fact, Google has a list of over 200 page ranking factors.

Sphider is much simpler. Pages are ranked solely on keyword weighting. Keyword weighting is calculated by word position and frequency, and the user has a level of control over the weighting process. Images are not indexed and relevancy is not a factor (although better word position and greater frequency DO indicate higher relevance). While Sphider can index practically any website, the main purpose of the application is for the user to index his or her own website so that an internal search can be made available to site visitors.
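
To make "position and frequency" a little more concrete, here is a rough Python sketch. It is not Sphider's actual formula: the two locations and their multipliers are invented for the example. The idea is only that each occurrence of a word adds to its score, and an occurrence in a more prominent place (here, the title) adds more.

```python
from collections import Counter

# Hypothetical per-location multipliers; not Sphider's actual settings or defaults.
LOCATION_WEIGHTS = {"title": 20, "body": 1}

def keyword_weights(title, body):
    """Score each word by how often it appears, scaled by where it appears."""
    scores = Counter()
    for location, text in (("title", title), ("body", body)):
        for word in text.lower().split():
            scores[word] += LOCATION_WEIGHTS[location]
    return scores

scores = keyword_weights(
    "Sphider search",
    "sphider indexes pages so visitors can search pages",
)
print(scores.most_common(3))
```

Changing the numbers in LOCATION_WEIGHTS is the kind of user control over weighting described above.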

There are a number of Sphider flavors. The original Sphider (version 1.3.6) can be found at http://www.sphider.eu. It is free, but has the disadvantages of being insecure and badly outdated. It is no longer maintained and will start throwing errors on any system running PHP 5.6 or greater. It will not function at all on PHP 7.

Sphider-Plus (http://www.sphider-plus.eu) and Sphider-Pro (http://www.sphiderpro.eu) are both paid versions of the original and do have added features. I cannot speak as to security or support. Sphider-Pro is at version 3.3, which has a date of 2013, so that may not speak well as to its status. For a small website, many of the enhancements provided by these variations may be overkill.

Then there is the Sphider located here on our Downloads page. It, too, is based upon the original, but has been updated. It functions without error with PHP 5.5 or greater, even with PHP 7. It is much more secure. All SQL queries are made using prepared statements to avoid the risk of SQL injection. Other security measures have also been taken. We even have a variation (PDO) which can not only operate in environments lacking MySQLnd support, but can be used with databases other than MySQL (with some tweaking). It can work with SQLite, PostgreSQL (port kits available for both), ODBC, Microsoft SQL Server, and others. Both the normal and PDO variations are supported. And best of all, they are still free!
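
Why do prepared statements matter? The short Python sketch below shows the general idea using an in-memory SQLite database. It is not Sphider code, and the table and column names are made up. The search value is bound to a placeholder rather than pasted into the SQL text, so a hostile value is treated as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT, title TEXT)")  # hypothetical table
conn.execute("INSERT INTO links VALUES (?, ?)", ("https://example.com/", "Home"))

def find_by_title(conn, title):
    # The value is bound to the placeholder, never spliced into the SQL text.
    return conn.execute(
        "SELECT url FROM links WHERE title = ?", (title,)
    ).fetchall()

hostile = "Home' OR '1'='1"           # classic injection attempt
print(find_by_title(conn, "Home"))    # finds the one matching row
print(find_by_title(conn, hostile))   # matches nothing; the quote is just data
```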

Sphider 1.5.4 and Sphider 1.5.4 PDO may not have installed properly

If you did an upgrade, the regular and PDO versions of Sphider 1.5.4 may not have installed properly. You can check whether or not you are affected by checking the Settings tab on the Sphider admin page. If a version other than 1.5.4 (or 1.5.4 PDO) is reported, there is a problem. The settings table in your database is missing a column. Any downloads from this point on will not be affected.

The issue can be easily fixed and is addressed in this Sphider forum post.

Sphider 1.5.3 has a similar defect and can be repaired the same way, by editing update_rollup.php and re-running it. However, the 1.5.3 defect is not as critical, since no changes to the settings table take place except for the version number update.

Sphider 1.5.4 and Sphider 1.5.4-PDO to be released on 29 May

On 29 May 2017, Sphider versions 1.5.4 and 1.5.4-PDO will be released and posted on our Downloads page.

Although addressed in the 1.5.3 series, table prefixes containing a hyphen continued to be a problem. Hopefully this time we have tracked down ALL the sources of this problem and corrected them.

Another problem was that the presence of an emoji on a web page (generally uncommon except on blog or forum pages) would cause an error and that page would not be indexed. Emojis are now purged before indexing.
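
As an illustration of what purging can look like, here is a small Python sketch. It is not the code used in Sphider. It assumes the troublesome characters are the ones above U+FFFF, which is where most emoji live; emoji built from lower code points would need additional ranges.

```python
import re

# Matches any code point above U+FFFF. Assumed rule for this sketch, not Sphider's own.
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")

def purge_emoji(text):
    """Remove characters outside the Basic Multilingual Plane."""
    return ASTRAL.sub("", text)

print(purge_emoji("Great post \U0001F600 thanks"))
```

Accented and Cyrillic letters are below U+FFFF, so they pass through untouched.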

The ability to index decimal numbers has been added. In earlier versions, numbers could be indexed but decimal numbers would not be. For example, ‘12345.56789’ would be indexed as ‘12345’ and ‘56789’. If the setting for indexing decimals (on the settings page) is checked, ‘12345.56789’ will now be correctly indexed. A side benefit is that ANY numerical string with a period will be recognized. For example, ‘123.456.789’ would be indexed. This could be useful for pages containing part numbers. The mixing of numeric and alpha characters will still omit the period: ‘12345.abcde’ will still be indexed separately as ‘12345’ and ‘abcde’.
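
The behavior described above can be mimicked with a couple of regular expressions. This Python sketch is only an approximation written for this post, not the tokenizer Sphider uses, and the function and parameter names are invented.

```python
import re

# Digits joined by periods count as one token; anything else splits on non-word characters.
WITH_DECIMALS = re.compile(r"\d+(?:\.\d+)+|\w+")
WITHOUT_DECIMALS = re.compile(r"\w+")

def tokenize(text, index_decimals=False):
    """Split text into index terms, optionally keeping dotted numbers whole."""
    pattern = WITH_DECIMALS if index_decimals else WITHOUT_DECIMALS
    return pattern.findall(text)

print(tokenize("12345.56789"))                       # setting unchecked
print(tokenize("12345.56789", index_decimals=True))  # setting checked
print(tokenize("123.456.789", index_decimals=True))  # part-number style string
print(tokenize("12345.abcde", index_decimals=True))  # mixed numeric and alpha
```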

Also changed in these versions are the language files. Although the search page is UTF-8 compliant, “special characters” like è or ç previously failed to display properly. They now display correctly, as do Cyrillic characters such as Ц or й. This does NOT mean the text displayed will be the proper translation, as I am no linguist and am either relying on the work of others where possible, or winging it with the use of Google Translate. Simply put, these characters are now coded in the language files as Unicode entities.
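
For anyone wanting to convert their own translation strings, the Python sketch below shows one way to produce such entities. It assumes "entities" means numeric character references like &#232; and is not the tool used to prepare the Sphider language files.

```python
def to_entities(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_entities("Résultats"))
print(to_entities("Ц"))
```

Plain ASCII text comes back unchanged, so the function can be run over a whole string safely.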

Sphider 1.5.3 and Sphider 1.5.3.PDO have been released

Updates to the Sphider search engine have been made. The latest version is 1.5.3. Sphider 1.5.3 is for use when both MySQLi and MySQLnd modules are available in PHP. For individuals whose host does NOT provide MySQLnd support, but DOES provide PDO support, Sphider 1.5.3.PDO is also available. You may find both on the Downloads page. (Click the Downloads tab at the top of this page.)

To avoid confusion concerning versions, the PDO version no longer contains a “.1” at the end of the version number, but a simple “.PDO” to distinguish it from the non-PDO version. (Some people thought 1.5.2.1 was a minor update from 1.5.2 when it actually was identical but coded for PDO instead of MySQLnd.)

Changes in 1.5.3 from 1.5.2 are:

  • Better support for https sites.
  • Ability to better recognize and follow the directives in a robots.txt file.
  • Correction of a potential problem when using the CleanDomains function in the event there was only a single domain to clean.
  • Fixed a number of errors which could appear when a database table prefix contains a hyphen.
  • Fixed a potential error when running under PHP 7.

Sphider Help Forum is now available

The new Sphider Help Forum for help concerning Sphider 1.4.2 or later is now open, at least on a trial basis. Out of necessity, ALL posts will be moderated. This is because of the tremendous amount of blog, forum, and guestbook spam present on the internet. Apologies for the inconvenience, but that’s life.

Hopefully, this forum can be used by the slowly growing community of users of the updated Sphider. The original Sphider Forum (located at sphider.eu) has become steadily less help and more sales pitch for Sphider-Plus. We have no gripe about Sphider-Plus, per se, but the original Sphider was free and just because the original developer moved on to other interests several years ago, we don’t see why the original can’t live on and evolve with the rest of technology.

The original (1.3.6 and before) has problems with anything later than PHP 5.4, and here we are, most platforms on 5.5 or 5.6 and the trend well underway towards PHP 7. Any internet technology which simply stands still for 4 to 7 years is going to become lost in the cloud of dust.

Anyway, hopefully the forum will be a better place to air problems and find solutions than blog comments.

Considering another Sphider improvement

The original version of Sphider had very erratic support for indexing HTTPS pages, and wouldn’t even look at the robots.txt file on an HTTPS site. That failing has never been addressed, and even the latest version, 1.5.2, has the same failings when it comes to HTTPS. This has never really been an issue for me before, and even now it is more annoyance than issue as I can work around it.

Still, the “problem” does seem intriguing. After a bit of experimenting, a fix may not be all that difficult. (Famous last words, right?)

I am debating now whether or not to continue investigating alternatives and make more code changes which would improve HTTPS support in Sphider, not only to ensure more reliable connectivity but to enable the robots.txt to be utilized as well. I don’t know that there is that big of a need. We’ve never received any complaints or comments on the issue…

Anyway, at this point there is a POSSIBILITY, but no definite plans one way or the other.

*******************************

UPDATE (Apr 6): I was able to get the robots.txt file read from a https site. First problem, regardless of http or https, the parsing of allowed or disallowed user agents and disallowed files/directories was iffy. If the robots.txt file had lines like “user-agent” or “disallow”, it was parsed, but “User-agent” or “Disallow” was not. It was a case issue. That is now fixed (on my side, not published yet). Second problem, now that I know the file IS being read and parsed, Sphider will STILL index some files in disallowed directories!

If you have any files or directories listed as “url_not_inc” in your settings, that will work, but not the robots.txt disallows, even though that SHOULD be the case. Well, this situation certainly has gotten my interest!
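
To show what the case issue looks like, here is a simplified Python sketch of a robots.txt parser that lowercases the field name before comparing it. This is not the fix made in Sphider; the function name is invented, and the sketch ignores finer points such as Allow lines and several User-agent lines sharing one group.

```python
def parse_robots(text, agent="*"):
    """Collect Disallow paths that apply to `agent`, ignoring field-name case."""
    disallowed, applies = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()                 # "User-agent" == "user-agent"
        if field == "user-agent":
            applies = value == "*" or value.lower() == agent.lower()
        elif field == "disallow" and applies and value:
            disallowed.append(value)
    return disallowed

# Hypothetical robots.txt mixing upper and lower case field names.
robots = (
    "User-agent: *\n"
    "Disallow: /private/\n"
    "DISALLOW: /tmp/\n"
    "\n"
    "user-agent: otherbot\n"
    "disallow: /\n"
)
print(parse_robots(robots))
```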

*******************************

UPDATE (Apr 7): I have begun the process of troubleshooting the code to see what is going awry and where. Working alone and having other things to do in life, this can be both time consuming and frustrating. So far, I do know the robots.txt is read and parsed properly. Just where and why the instructions are not acted upon is another matter. At least the question of whether or not I will be attempting another modification has been answered!

*******************************

UPDATE (Apr 8): GOT IT! Preliminary tests show robots.txt is now being followed in both http and https. More testing to follow (found a couple other misc issues and fixed them). Once everything is validated, there will be a 1.5.3. Stay tuned.

Sphider 1.5.2 and 1.5.2.1 (the PDO version) have been released

The newest version(s) of the Sphider search tool have been released and are available from the Downloads tab above. While there isn’t really anything NEW in these releases, they do address a couple of problems encountered. Of most importance, the problem of having Sphider exit during indexing due to web page coding errors on the site being indexed has been addressed. Instead of issuing a fatal error and stopping, only warnings are generated and indexing continues on its merry way. A potential database error when updating the settings has also been thwarted.

Also, the previous PDO version had a bug in which descriptions could disappear from search results listings. This has been fixed.
If you had the previous PDO version (1.5.1.1) and have lost the descriptions, you will need to restore them after upgrading to 1.5.2.1: go into the settings tab, go down to the “Search settings” section where it says “Maximum length of page summary displayed in search results”, change the selection to 250, and click “Save settings”. (Updating the settings before would change this from the default 250 to either 0 or 1!)

Happy Holidays and Happy indexing!

Sphider 1.5.2 – coming soon

The next version of the Sphider search tool is now in testing. Sphider 1.5.2 (and its companion PDO version, 1.5.2.1) is not very different from the previous version, except for a couple of minor fixes on the Settings tab and the fact that the indexing portion has been toned down to issue warnings only when an improperly coded web page is encountered. Sphider 1.5.1 exits with a fatal error instead of continuing to index the site. While improper coding in a web page (commonly having to do with some offbeat special character the database has no idea how to interpret) is rare, it sure was a monkey wrench when it came to indexing a web site. A couple of other page conditions which could have produced a fatal exit (like the URL exceeding the length the database could store) now simply issue warnings.

At any rate, both the PDO and non-PDO varieties are now being tested to make sure the intended fixes work properly, and that we haven’t introduced any new problems. Expected arrival at this time is early December.

PDO version of Sphider

Sphider 1.5.1 has proven to be a good, stable version of Sphider. HOWEVER, it seems some people can’t use it because their host chooses not to support MySQLnd, typically for shared hosting. It isn’t because it can’t be done, but because they don’t want to do it. In those instances, if you want MySQLnd, you have to upgrade to VPS, at an additional charge of course. Sphider users in that scenario now have an option.

We have taken Sphider 1.5.1 and converted the sql to PDO (PHP Data Objects). PDO support is virtually guaranteed. The PDO version is referred to as Sphider 1.5.1.1. PDO has some advantages over MySQLi/MySQLnd, but there are also disadvantages.

MySQLi/MySQLnd is SPECIFIC to a MySQL database, whereas PDO is a generic interface supporting a variety of databases, one of which is MySQL. There is an overhead involved. For Sphider, we STILL prefer the MySQLnd prepared statement methodology over PDO prepared statements. However, reality dictates a PDO version be made available. Our recommendation is that you install the PDO version only if the standard MySQLi/MySQLnd option is not available. If you already have a working Sphider 1.5.1, DO NOT install 1.5.1.1.

One issue encountered was that PDO has no need to use the real_escape_string function…. EXCEPT WHERE IT IS NEEDED!!! The backup and restore functions failed without it. All research indicated “You don’t need real_escape_string, just use PDO prepared statements!” Dogmatic statements like that can come back to bite you. Well, our scenario wasn’t executing sql, it was CREATING sql, specifically, an sql string. Real_escape_string was necessary to create a valid string, and a prepared statement was not possible. We had ALREADY run a query, now we were manipulating the queried data to create a string for LATER use in a different kind of query. So we had to create an emulation for real_escape_string, which was a bit of trial and error. So much for “PDO NEVER needs real_escape_string”.
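
To illustrate the kind of emulation involved, here is a Python sketch (Sphider's own version is in PHP and is not reproduced here). It backslash-escapes the same characters MySQL's real_escape_string does, then uses the result to build an INSERT statement as text, the way a backup routine might. The function and table names are made up, and unlike the real function this sketch takes no account of the connection character set.

```python
# Characters that MySQL's real_escape_string backslash-escapes.
_ESCAPES = {
    "\\": "\\\\", "\0": "\\0", "\n": "\\n", "\r": "\\r",
    "'": "\\'", '"': '\\"', "\x1a": "\\Z",
}

def escape_string(value):
    """Return value with special characters escaped for a quoted SQL literal."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)

def insert_statement(table, row):
    """Build an INSERT statement as a string, e.g. for a backup file."""
    values = ", ".join("'" + escape_string(str(v)) + "'" for v in row)
    return "INSERT INTO " + table + " VALUES (" + values + ");"

# Hypothetical table name and row data.
print(insert_statement("links", ["O'Reilly", "line1\nline2"]))
```

A prepared statement cannot help here because nothing is being executed at this point; the output is a string saved for later.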