SphiderLite

After several years of improvements and enhancements to Sphider, we have decided to do something different. We are going to produce a scaled-back edition!

All security and indexing improvements will remain, but the indexing and search capabilities will be removed for images and RSS feeds, resulting in SphiderLite.

A good number of users don’t need image or RSS capability. SphiderLite will be smaller, simpler, more compact.

Look for SphiderLite later this year, perhaps early November.

Sphider 3.3.0-MB released

Version 3.3.0-MB of Sphider has been released. As far as indexing and searching functionality is concerned, this version is IDENTICAL to version 3.2.1.

What HAS changed is that the database has been altered to include the use of foreign key constraints. With the database thus being able to maintain key relationships on its own, some admin functions have been simplified as the code no longer needs to maintain the relationships. Database maintenance functions are faster and more reliable.

The BEST way to implement this newest version is either with a clean install, or to empty the database, upgrade, and re-index. It IS possible to upgrade in place, but the larger the database, the larger the risk. The upgrade process will attempt to back up the data only, delete the tables, recreate the tables with the foreign key constraints, then restore the data. This has been tested numerous times, but as previously mentioned, the more data there is, the higher the risk of data loss.

There is a file, “README_FIRST.” You are definitely encouraged to do just that for the simple reason that not all MySQL installations are created equal.

Segurazo Anti(??) Virus – AN UNWANTED INSTALL

I was trying to install a codec for Windows Media Player. (I know what you are thinking… WINDOWS? I usually work with Ubuntu, but confess I do sometimes use Windows…). Anyway, lo and behold, I happened to discover something called Segurazo Antivirus running on my machine!

Supposedly, it is a good, lightweight antivirus… But I have a question…

IF Segurazo is SO good, why do they have to do a clandestine install, without asking for permission or giving ANY notification? I have to suspect that, claims to the contrary notwithstanding, there is nothing ANTI about this Segurazo Virus!

And it wasn’t easy to get rid of, either!

So remember SEGURAZO, a product to shun, avoid, stay away from… and uninstall if you find yourself victim. After the uninstall, there was STILL a lot of Segurazo crap in my registry.

Segurazo is NOT an antivirus, it IS a virus, an unwanted program, possibly spyware or adware.  No reputable program needs to secretly install itself.

Segurazo… you need to be ashamed of yourselves!

PHP, Shared hosting, and MySQLnd

I have posted before about the problems Sphider may have on websites using a shared hosting plan. Sphider, in its normal form, uses both mysqli and mysqlnd extensions. Mysqli means “mysql improved”, and Mysqlnd stands for “mysql native driver”. In the past, mysqlnd was actually an optional extension, whereas beginning with PHP 7 it is integral to a MySQL installation.
With most PHP installations, nd (native driver) is the default. This is not the case with many installations used in shared hosting. The default may be mysqli, and not nd_mysqli. You can determine if nd is the default or not by running phpinfo() on your website and examining the results. Firstly, the results should contain a section with the title “mysqlnd”. Within that section, you will find a line “API Extensions”. If the value for “API Extensions” is “no value”, nd is NOT your default. Below is a screenshot of a typical installation in which nd is NOT the default.

If nd is not the default, you may be able to change it. If your control panel gives you the option to view/change PHP extension settings, check that page. If you see “mysqli” is checked, and “nd_mysqli” isn’t, uncheck “mysqli” and check “nd_mysqli”. (“mysqlnd” should also be checked.) Save your changes. Now when you view the mysqlnd section of phpinfo(), the API Extensions should show mysqli. (You might need to do a browser refresh.) Note that having BOTH “mysqli” and “nd_mysqli” checked will give you an error when trying to save the settings.

In the event you do not have the ability to edit the PHP extension settings, contact your host administrator and ask if they will perform this change. Changing to the native driver as default should have zero impact on other parts of your website while making Sphider usable.

If you can’t change the extension settings, and your host admins can’t or won’t, your only Sphider option is to use the PDO edition. The PDO edition is currently at 2.4.2-PDO, which is stable, but there are no plans for further development. Meanwhile, the normal Sphider, which is currently at 3.2.0, continues to be developed and improved.

Sphider Backup Tips

Sphider comes with the ability to back up and restore your database. How well this works depends not only on the size of the database, but also on your MySQL settings. The restore could restore a single record at a time, but this would be time consuming. It would be reliable, but for a larger database you could probably spend the weekend at the shore while it ran. So, to speed things up, the restore process works on blocks of records. However, this increase in speed comes with a cost. If a block of records is too big, the restore will fail. There is a way to prevent this.
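To make the speed-versus-size trade-off concrete, here is a minimal sketch in Python. It is not Sphider's restore code: the (id, text) record shape, the `in_blocks` and `block_bytes` helpers, and the block size of 4 are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # hypothetical (id, text) row


def in_blocks(records: Iterable[Record], block_size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most block_size records."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    block: List[Record] = []
    for record in records:
        block.append(record)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # final, possibly shorter, block
        yield block


def block_bytes(block: List[Record]) -> int:
    """Rough payload size of one block: UTF-8 bytes of the text column only."""
    return sum(len(text.encode("utf-8")) for _, text in block)


rows = [(i, "x" * 1000) for i in range(10)]
blocks = list(in_blocks(rows, block_size=4))
# 10 rows in blocks of 4 means 3 statements instead of 10,
# but each statement now carries several rows' worth of bytes.
print(len(blocks), [block_bytes(b) for b in blocks])
```

Bigger blocks mean fewer round trips, but each block must still fit inside whatever packet size the server accepts.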

First off, check to see if you might have an issue. From a command prompt:
mysqld --help --verbose --pid-file

In the results, look for “max_allowed_packet”. If the value is less than 33554432 (32M), you might have an issue. With a value of 67108864 (64M) or greater, you should be good to go. 64M is recommended, although larger won’t harm a thing! The value can be up to a maximum of 1G (1073741824).
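The thresholds above can be sketched as a small Python check. This is only an illustration: `parse_size` and `packet_status` are hypothetical helper names, and the "borderline" label for values between 32M and 64M is my own assumption, since nothing is said here about that middle range.

```python
# Size suffixes MySQL accepts in option files (K, M, G).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

LOW_WATER = 32 * 1024 ** 2     # 33554432: below this you might have an issue
RECOMMENDED = 64 * 1024 ** 2   # 67108864: at or above this you should be fine


def parse_size(value: str) -> int:
    """Convert '64M', '1G' or a plain byte count such as '33554432' to bytes."""
    text = value.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def packet_status(value: str) -> str:
    """Classify a max_allowed_packet setting against the thresholds above."""
    size = parse_size(value)
    if size < LOW_WATER:
        return "possible issue"
    if size < RECOMMENDED:
        return "borderline"  # assumption: the 32M-64M range is not covered above
    return "ok"


for setting in ("16M", "32M", "67108864", "1G"):
    print(setting, parse_size(setting), packet_status(setting))
```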

If you need to increase the value of “max_allowed_packet”, there are two ways of doing so. The first is a permanent fix. Edit my.cnf (my.ini in Windows). In the “[mysqld]” or “[client]” section, set “max_allowed_packet=64M”, adding the line if it doesn’t already exist. Then restart the mysql service.

The second method is temporary, existing until the next time the service is restarted. Run this simple query:
SET GLOBAL max_allowed_packet=67108864

Of course, in either instance, entering larger numbers will do no harm. More importantly, you can have confidence that the backup and restore procedures will work properly.

 

Sphider database

The Sphider database, like most databases, has relationships between various tables. Unlike most databases, however, these relationships have never been defined within the database! Sure, tables have had their keys, but there were no foreign key constraints. All relationships between tables and the consequences of record deletions or modifications have been handled strictly by the Sphider code. MySQL can handle this more efficiently than any Sphider code ever could.
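Here is a small demonstration of what it means for the database itself to enforce a relationship. It is a sketch only: it uses Python's built-in SQLite as a stand-in for MySQL, and the `sites` and `links` tables and their columns are simplified, hypothetical versions rather than Sphider's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT NOT NULL);
    CREATE TABLE links (
        link_id INTEGER PRIMARY KEY,
        site_id INTEGER NOT NULL REFERENCES sites(site_id) ON DELETE CASCADE,
        url TEXT NOT NULL
    );
    INSERT INTO sites VALUES (1, 'https://example.com');
    INSERT INTO links VALUES (10, 1, 'https://example.com/a'),
                             (11, 1, 'https://example.com/b');
""")

# Deleting the parent row removes its child rows; no application code needed.
conn.execute("DELETE FROM sites WHERE site_id = 1")
remaining_links = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]

# A child row pointing at a non-existent parent is refused outright.
try:
    conn.execute("INSERT INTO links VALUES (12, 99, 'https://example.com/c')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(remaining_links, orphan_rejected)
```

Without the constraint, both the cleanup of child rows and the orphan check would have to live in application code.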

At this point, adding foreign key constraints is not really a straightforward task, since for the typical user, the tables all have data. It would be easy for new installations, but the presence of data complicates things. What has been done is to develop a method to back up the database, destroy and recreate all the tables (with foreign key constraints), and then repopulate the tables with the backed up data. The existing backup procedure was not an option because when a restore is run from that, the database would revert to one without the constraints. Then, Sphider code can be REDUCED in size by a couple hundred lines to eliminate code that handles relationship changes that MySQL can do more efficiently.
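The back up, recreate, and restore sequence described above can be sketched roughly as follows. This is not the actual upgrade script: it runs against an in-memory SQLite database instead of MySQL, and the two-table schema and the `add_constraints` function are hypothetical.

```python
import sqlite3

OLD_SCHEMA = """
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE links (link_id INTEGER PRIMARY KEY, site_id INTEGER, url TEXT);
"""
NEW_SCHEMA = """
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE links (
        link_id INTEGER PRIMARY KEY,
        site_id INTEGER NOT NULL REFERENCES sites(site_id) ON DELETE CASCADE,
        url TEXT
    );
"""
TABLES = ["sites", "links"]  # parent tables listed before their children


def add_constraints(conn: sqlite3.Connection) -> dict:
    """Back up the rows, rebuild the tables with constraints, restore the rows."""
    # 1. Back up the data only (held in memory here; a real tool would use a file).
    backup = {t: conn.execute(f"SELECT * FROM {t}").fetchall() for t in TABLES}
    # 2. Drop the old tables, children first.
    for table in reversed(TABLES):
        conn.execute(f"DROP TABLE {table}")
    # 3. Recreate the tables, this time with foreign key constraints.
    conn.executescript(NEW_SCHEMA)
    # 4. Restore parents first so every child row finds its parent.
    for table in TABLES:
        rows = backup[table]
        if rows:
            marks = ",".join("?" * len(rows[0]))
            conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    conn.commit()
    return {table: len(rows) for table, rows in backup.items()}


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(OLD_SCHEMA)
conn.execute("INSERT INTO sites VALUES (1, 'https://example.com')")
conn.execute("INSERT INTO links VALUES (10, 1, 'https://example.com/a')")
conn.commit()

restored = add_constraints(conn)
print(restored)
```

The restore order matters: once the constraints exist, a child row inserted before its parent would be rejected.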

The procedure to back up the data, recreate the database with constraints, and restore the data has been built. Now it has to be thoroughly tested. Early tests are positive, but we want to be sure. Once we have a high level of confidence in the process, Sphider 3.3.0-MB will be released. The intent is that the ONLY changes in 3.3.0 will be the database structure and associated code related to structure.

Look for Sphider 3.3.0-MB towards the end of July.

Sphider 3.2.0-MB will have a couple new features

The next release of Sphider, 3.2.0-MB, will feature two new enhancements. The first will be the ability to show the query score using a 0-5 star system, using half-stars. The options to either show no score, or to show a score as a percentage (100% being the highest) will remain.
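One plausible way to map a percentage score onto a 0-5 scale in half-star steps is sketched below in Python. The rounding rule (nearest half star, with exact halves rounding up) and the function names are assumptions; how Sphider actually rounds is not stated here.

```python
def percent_to_stars(percent: float) -> float:
    """Map a 0-100 relevancy score to 0-5 stars in half-star steps."""
    clamped = max(0.0, min(100.0, float(percent)))
    # 100% = 10 half-stars, so half_steps = percent / 10, rounded half up.
    half_steps = int((clamped * 10 + 50) // 100)
    return half_steps / 2


def render_stars(stars: float) -> str:
    """Text rendering: full stars, an optional half marker, then empty stars."""
    full = int(stars)
    half = 1 if stars - full >= 0.5 else 0
    return "★" * full + "½" * half + "☆" * (5 - full - half)


for score in (100, 74, 45, 44, 0):
    stars = percent_to_stars(score)
    print(score, stars, render_stars(stars))
```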

Query results with relevancy shown as stars

The second enhancement is the ability to limit the number of query results by relevancy percentage. Currently, a query will find every possible result. Scores range from 100% to 0% relevancy. In 3.2.0-MB, a minimum relevancy can be set. Users can choose from 0, 20, 40, 60, and 80% relevancy as the minimum. The higher the number, the fewer results will be returned. In the example shown above, the query produced 74 results. If the floor had been set at 20%, the number of results would be reduced to 12. A floor of 40% would further reduce the results returned to 7.
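The relevancy floor amounts to a simple filter on the score. Here is a rough Python sketch: the (url, score) result shape, the sample data, and the choice to keep results whose score exactly equals the floor are all assumptions, not taken from Sphider's code.

```python
from typing import List, Tuple

ALLOWED_FLOORS = (0, 20, 40, 60, 80)  # the selectable minimums, in percent

Result = Tuple[str, float]  # hypothetical (url, relevancy percent) pair


def apply_floor(results: List[Result], floor: int) -> List[Result]:
    """Keep only results whose relevancy is at or above the chosen floor."""
    if floor not in ALLOWED_FLOORS:
        raise ValueError(f"floor must be one of {ALLOWED_FLOORS}")
    return [(url, score) for url, score in results if score >= floor]


sample = [
    ("https://example.com/a", 100.0),
    ("https://example.com/b", 41.5),
    ("https://example.com/c", 20.0),
    ("https://example.com/d", 3.2),
]
for floor in ALLOWED_FLOORS:
    print(floor, len(apply_floor(sample, floor)))
```

A floor of 0 keeps everything, which matches the current behavior of returning every possible result.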

Along with a couple minor bug fixes, Sphider 3.2.0-MB will be coming in perhaps early June.

Sphider 3.1.0-MB and Sphider 2.4.2-PDO released

Sphider 3.1.0-MB is multibyte capable, like 3.0.0-MB. However, 3.1.0-MB does NOT require the PHP mbstring extension. Mbstring is recommended, but not required. If it is available, it will be used. If not, Sphider will emulate the multibyte character string functions. Also, 3.1.0-MB continues the improvements always being made to the original fork. Since there are no longer any special requirements other than the typical MySQLi/MySQLnd extensions, there is no longer a need for the 2.4.x line.

Sphider 2.4.2-PDO provides a fix for a problem with 2.4.1-PDO which could cause some UTF-8 characters to be mistaken for ISO-8859-1 characters. The resulting “conversion” produced rubbish. The PDO fork will continue to be available and supported, but no further product enhancements are anticipated.

What’s next for Sphider?

Sphider 2.4.0 is barely out the door, and thoughts are already turning to — “What next?”

There actually are some plans well in the works. Sphider 2.4.1 will be pretty low impact. The “major” change will be in SQL error reporting when a statement preparation fails. At this point, an SQL statement should never fail, but on the off chance one ever does, better to have a meaningful error message! A second very minor change will improve utf8 text handling.

Thought has been given to the status of the PDO edition. The fact that many people, particularly those on shared hosting, are “forced” into using PDO dictates that the edition should continue to be available. At the same time, PDO users tend to be smaller in scope and less demanding in requirements than others (who tend to be either fully hosted or self hosted). With these thoughts in mind, the PDO edition will continue to be supported and there may even be minor updates from time to time, but major updates in functionality will be discontinued.

Now… the regular/classic/legacy edition will continue on. There is a NEW fork in the works, also. Sphider has been constantly improving with the use of unicode (utf8 variety), but there is still one stumbling block. Unicode has multi-byte characters and character strings. Standard PHP string functions aren’t equipped to handle multi-byte characters/strings. The mbstring module for PHP is equipped… that’s what the “mb” part of the name means: “multi-byte”. The problem is, not every installation of PHP comes with mbstring.
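The difference between byte-oriented and character-oriented string handling is easy to see in a short sketch. This one uses Python rather than PHP, purely for illustration: working on the encoded bytes plays the role of PHP's standard functions such as strlen(), and working on the decoded text plays the role of mb_strlen(). The helper names are hypothetical.

```python
word = "na\u00efve caf\u00e9"   # "naïve café": two of its characters need 2 bytes each
as_bytes = word.encode("utf-8")

print(len(word))      # number of characters
print(len(as_bytes))  # number of bytes, which is larger


def byte_truncate(text: str, n: int) -> str:
    """Cut after n BYTES, as a byte-oriented function would."""
    return text.encode("utf-8")[:n].decode("utf-8", errors="replace")


def char_truncate(text: str, n: int) -> str:
    """Cut after n CHARACTERS, as a multi-byte aware function would."""
    return text[:n]


# Cutting at byte 3 splits the two-byte "ï" in half and leaves rubbish behind.
print(repr(byte_truncate(word, 3)))
print(repr(char_truncate(word, 3)))
```

Any indexing or searching step that cuts, counts, or compares text by bytes can corrupt multi-byte characters in the same way.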

For the time being, the “normal” Sphider will use standard PHP string handling functions, with the drawback that indexing and searching of multi-byte strings may be unpredictable. Sphider 3-MB has replaced all standard string handling with multi-byte string handling, with the drawback that it won’t work for all clients.

Once again, there will be two editions of Sphider — standard string handling, and multi-byte string handling. The eventual goal will be to merge the two so that if mbstring is available, it will be used. If mbstring isn’t available, some custom functions will try to achieve the same result.

Sphider 3-MB will require a MINIMUM of MySQL server 5.5.3. Recommended MySQL server is 5.6 or better. Utf8mb4 is NOT supported in MySQL server versions earlier than 5.5.3. Sphider 3-MB will also require that PHP have both MySQLnd and mbstring installed and available. Sphider 3-MB will be available by the end of April or early May. A test script will be provided so that support can be determined before installation.

Speaking of MySQL server and utf8mb4, Sphider 2.3+, both standard and PDO, use utf8mb4, so they too require MySQL server 5.5.3+. IF you happen to have a lower version of MySQL server, and are unable to upgrade, we can provide an earlier version (2.2.0) of Sphider upon request.  (Specify standard or PDO.)
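The version requirement boils down to a tuple comparison. The sketch below is not the promised test script, just a Python illustration; `parse_version` and `supports_utf8mb4` are hypothetical names, and it assumes a MySQL-style version string such as the one SELECT VERSION() returns.

```python
import re
from typing import Tuple

MIN_UTF8MB4 = (5, 5, 3)  # first MySQL server version with utf8mb4


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from strings like '5.6.51-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version: str) -> bool:
    """True if this MySQL server version is 5.5.3 or later."""
    return parse_version(version) >= MIN_UTF8MB4


for candidate in ("5.1.73", "5.5.2", "5.5.3", "5.6.51-log", "8.0.36"):
    print(candidate, supports_utf8mb4(candidate))
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "5.10.0" would sort before "5.5.3".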

Emojis revisited

Not very long ago, I wondered whether or not Sphider still needed to scan for, and remove, emojis. This came about because of a change in the database from 3-byte utf to 4-byte. Upon testing, the scan and removal of emojis will continue. Sphider, and more particularly, MySQL, just doesn’t like emojis. When trying to store any full text containing an emoji, an SQL exception is thrown and the page is not stored.

The earlier reported issue with the function is due to the removeEmoji() function operating at the utf-8 level, and the probability of the input NOT being utf-8. For future releases of Sphider, this function will execute AFTER it is (nearly) guaranteed that the input will be utf-8. (I say “nearly” because there are no guarantees in this world where code is involved.)
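The ordering point can be illustrated with a short Python sketch: decode first, then filter. This is not Sphider's removeEmoji() function. The code point ranges are a small illustrative subset of the simpler, more common emojis, and `to_utf8_text` and `remove_emoji` are hypothetical names.

```python
import re

# Assumption: a deliberately small set of ranges, not a complete emoji list.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator letters (used for flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def to_utf8_text(raw: bytes, declared_encoding: str = "utf-8") -> str:
    """Decode page bytes first, so the filter only ever sees real text."""
    try:
        return raw.decode(declared_encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown encoding: fall back rather than fail outright.
        return raw.decode("utf-8", errors="replace")


def remove_emoji(text: str) -> str:
    """Strip characters in the emoji ranges above; leave everything else alone."""
    return EMOJI_PATTERN.sub("", text)


page = "caf\u00e9 \u2615 and a smile \U0001F600!".encode("utf-8")
print(remove_emoji(to_utf8_text(page)))
```

Running the filter on text whose encoding is still unknown risks matching byte sequences that were never emojis to begin with.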

It was also noted that the function, as currently implemented, is somewhat outdated. While updating it is possible, the function would become a bit unwieldy. Leaving it alone is practical, however. This is because pages containing emojis are, while not rare, relatively uncommon. And with the pages that DO contain an emoji, the odds are that emoji is of the simpler, more common type. The kind an expanded filter would catch ARE rare in web pages, being more likely to occur in messaging applications used on smartphones and tablets. In other words, why add the complexity to Sphider to catch something that the vast majority of users are never going to encounter?

Maybe someday I will once again update the database collation to use utf8mb4_unicode_ci as opposed to the current utf8mb4_general_ci, which should allow these emojis, but even if I do, there will probably be a setting to exclude them anyway.
