Apache TIKA parser update for TeamForge 6.1, 6.1.1 and 6.2.x

This topic describes how to install the Apache TIKA parser update available for TeamForge 6.1, 6.1.1 and 6.2.x.

What is this Apache TIKA parser update?

In TeamForge 7.0, the underlying parser library for indexing has been changed from Stellent to Apache TIKA. To extend the benefits of the Apache TIKA parser library to the existing TeamForge 6.1, 6.1.1 and 6.2.x customers, an update is now available for TeamForge 6.1, 6.1.1 and 6.2 (including 6.2 Patch 1 release).

Why TIKA?

Improved performance and reliability

The Apache TIKA parser library has advantages over the Stellent parser library used in TeamForge 6.1, 6.1.1, 6.2 and 6.2 Patch 1.

Stale process issue: Parsing of corrupt or unrecognized files
  • Use of the Stellent parser libraries can result in stale processes, which can at times lead to a site outage. Stale processes must be removed manually to prevent an outage.
  • The Apache TIKA parser libraries are robust: there are no stale process issues, so no manual intervention is needed.
Search queue processing speed
  • When the Stellent parser library encounters a corrupt or unrecognized file, it takes five minutes to time out, which can slow down search queue processing.
  • The Apache TIKA parser library can determine whether a file can be parsed. Because no time is wasted waiting for a response (or a timeout), the search queue can be processed faster.
Multiple processes vs. single JVM
  • The Stellent parser library spawns one subprocess per file. Stale processes (see above) can accumulate, leaving fewer resources for other processes and applications.
  • The Apache TIKA parser runs within a single JVM and does not have this issue.

For information about the benefits of using the TIKA parser library over the Stellent parser library, refer to the topic Advantages of using the Apache TIKA parser library for indexing.

Who needs the update?

Customers running a version prior to TeamForge 7.0 (that is, TeamForge 6.1, 6.1.1, 6.2, or 6.2.0.1) need to install this update.
Important: This Apache TIKA update is available only for Red Hat Enterprise Linux and CentOS platforms.
To find out which version of TeamForge you run:
  1. Click About TeamForge in the TeamForge Help menu.
  2. Verify the TeamForge version shown in the About TeamForge window.

How do I get the update?

Contact CollabNet Support to get the Apache TIKA update RPM package for the version of TeamForge you run.
  • RPM for TeamForge 6.1: tika_hotfix-1.0-6.1.0.0.noarch.rpm
  • RPM for TeamForge 6.1.1: tika_hotfix-1.0.0.0-6.1.1.0.noarch.rpm
  • RPM for TeamForge 6.2: tika_hotfix-1.0.0.0-6.2.0.0.noarch.rpm
  • RPM for TeamForge 6.2 Patch 1: tika_hotfix-1.0.0.0-6.2.0.1.noarch.rpm
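The version-to-package mapping above can be sketched as a small shell helper. This is only an illustration: tika_rpm_for_version is a hypothetical function name, and the version strings it accepts (6.1, 6.1.1, 6.2, 6.2.0.1) are assumed to match what the About TeamForge window shows on your site.

```shell
#!/bin/sh
# Map a TeamForge version to the update package name listed in this topic.
# tika_rpm_for_version is a hypothetical helper, not a TeamForge command.
tika_rpm_for_version() {
    case "$1" in
        6.1)     echo "tika_hotfix-1.0-6.1.0.0.noarch.rpm" ;;
        6.1.1)   echo "tika_hotfix-1.0.0.0-6.1.1.0.noarch.rpm" ;;
        6.2)     echo "tika_hotfix-1.0.0.0-6.2.0.0.noarch.rpm" ;;
        6.2.0.1) echo "tika_hotfix-1.0.0.0-6.2.0.1.noarch.rpm" ;;
        *)
            # Any other version has no package listed in this topic.
            echo "No Apache TIKA update package is listed for version: $1" >&2
            return 1
            ;;
    esac
}
```

For example, tika_rpm_for_version 6.2 prints the package name to request from CollabNet Support for a TeamForge 6.2 site.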

Should I stop TeamForge to install this update? What's the estimated downtime to install this update?

You can install the Apache TIKA update without stopping TeamForge. However, after installing the update, you must restart TeamForge so that the new TIKA library takes effect. Note that the restart may take 5 to 15 minutes, depending on your server's processing speed and application configuration.

Installing the Apache TIKA update for TeamForge 6.1, 6.1.1 or 6.2.x

Is the Customization Installer add-on for CollabNet TeamForge installed on your site?

Verify whether the Customization Installer add-on for CollabNet TeamForge is installed on your site. If it is not installed, you may have to install it before installing this Apache TIKA update.
  1. Run the command custom-install --version and check the system output.
    The system output differs depending on whether the Customization Installer add-on is installed or not installed.
  2. If the Customization Installer add-on for CollabNet TeamForge is not installed, contact CollabNet Support to get the Customization Installer RPM package and install it.
  3. Run the command, rpm -ivh ctf_customization_installer-3.8.0.2-1019.x86_64.rpm, to install the add-on.
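The check in step 1 can be wrapped in a short shell sketch. It assumes that a missing add-on means the custom-install command is not on the PATH, which may not hold for every site; check_customization_installer is a hypothetical helper name.

```shell
#!/bin/sh
# Run "custom-install --version" if the command is available; otherwise
# report that it was not found. check_customization_installer is a
# hypothetical helper, and treating "not on PATH" as "not installed" is an
# assumption made for this sketch.
check_customization_installer() {
    if command -v custom-install >/dev/null 2>&1; then
        custom-install --version
    else
        echo "custom-install was not found on PATH; the add-on may not be installed." >&2
        return 1
    fi
}
```

Calling check_customization_installer returns a non-zero status when the command is missing, so a wrapper script can stop before it attempts to install the Apache TIKA update.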

Are there any search-related hotfixes already installed on your site?

If you run TeamForge 6.1, 6.1.1 or 6.2, make sure that no search-related hotfix is installed on your site, so that this Apache TIKA update installs without problems. The default TeamForge add-on install directory is /opt/collabnet/teamforge/add-ons/. Run the following commands, and contact CollabNet Support if they find any search-related add-ons on your site:
  • cd $SITE_DIR/add-ons/ (default location: /opt/collabnet/teamforge/add-ons/ )
  • find . -name search.sar
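The two commands above can be combined into one check that reports its result through the exit status. This is a minimal sketch: find_search_addons is a hypothetical helper name, and the default path is the default add-on install directory given in this topic.

```shell
#!/bin/sh
# Look for search.sar files under the add-ons directory.
# find_search_addons is a hypothetical helper, not a TeamForge command.
# Exit status: 0 = none found, 1 = at least one found, 2 = directory missing.
find_search_addons() {
    addons_dir="${1:-/opt/collabnet/teamforge/add-ons}"
    if [ ! -d "$addons_dir" ]; then
        echo "Add-ons directory not found: $addons_dir" >&2
        return 2
    fi
    # Same search as the documented commands: cd into the directory, then find.
    matches=$(cd "$addons_dir" && find . -name search.sar)
    if [ -n "$matches" ]; then
        echo "$matches"
        return 1
    fi
    return 0
}
```

If the function prints any paths, contact CollabNet Support before installing the update, as described above.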

Installing the Apache TIKA update

Important: Before installing the Apache TIKA update on TeamForge 6.2.0.1, make sure that you have the latest version of common_hotfixes installed.
  1. Run the following commands.
    • cd /tmp/
    • rpm -ivh tika_hotfix-1.0.0.0-6.x.x.x.noarch.rpm
    • service collabnet restart all
  2. After the installation is complete, run the following command to verify the installation.
    • custom-install --list
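The install, restart and verify commands above can be rehearsed with a dry-run wrapper before they are run for real. This is a sketch under stated assumptions: install_tika_update, run_step and the DRY_RUN variable are hypothetical names introduced here, and only the commands already shown in this topic are used.

```shell
#!/bin/sh
# Print a command, or run it when DRY_RUN=0. DRY_RUN is a hypothetical
# variable for this sketch and defaults to 1, so nothing is executed
# unless it is explicitly set to 0.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Install the update package, restart TeamForge, then list installed
# customizations. install_tika_update is a hypothetical helper name.
install_tika_update() {
    rpm_file="$1"
    if [ -z "$rpm_file" ]; then
        echo "usage: install_tika_update <path-to-tika_hotfix-rpm>" >&2
        return 2
    fi
    # Each step runs only if the previous one succeeded.
    run_step rpm -ivh "$rpm_file" &&
    run_step service collabnet restart all &&
    run_step custom-install --list
}
```

With DRY_RUN unset, install_tika_update followed by the path to your RPM only prints the three commands in order. Setting DRY_RUN=0 runs them, which restarts TeamForge.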