=head1 NAME Full text indexing in RT =head1 LIMITATIONS While all of the below solutions can search for Unicode characters, they are not otherwise Unicode aware, and do no case folding, normalization, or the like. That is, a string that contains C followed by C will not match a search for C. They also only know how to tokenize C-ish languages where words are separated by whitespace or similar characters; as such, support for searching for Japanese and Chinese content is extremely limited. =head1 POSTGRES =head2 Creating and configuring the index Postgres 8.3 and above support full-text searching natively; to set up the required C column, and create either a C or C index on it, run: sbin/rt-setup-fulltext-index If you have a non-standard database administrator username or password, you may need to pass the C<--dba> or C<--dba-password> options: sbin/rt-setup-fulltext-index --dba postgres --dba-password secret This will also output an appropriate C<%FullTextSearch> configuration to add to your F; you will need to restart your webserver after making these changes. However, the index will also need to be filled before it can be used. To update the index initially, run: sbin/rt-fulltext-indexer --all This will tokenize and index all existing attachments in your database; it may take quite a while if your database already has a large number of tickets in it. =head2 Updating the index To keep the index up-to-date, you will need to run: sbin/rt-fulltext-indexer ...at regular intervals. By default, this will only tokenize up to 100 tickets at a time; you can adjust this upwards by passing C<--limit 500>. Larger batch sizes will take longer and consume more memory. Care should be taken to ensure that multiple instances of C are not run at the same time. =head1 MYSQL MySQL does not support full-text indexing natively. However, it does integrate with the external Sphinx engine, available from L. Unfortunately, Sphinx integration (using SphinxSE) does require that you recompile MySQL from source. Most distribution-provided packages for MySQL do not include SphinxSE integration, merely the external Sphinx tools; these are not sufficient for RT's needs. =head2 Compiling MySQL and SphinxSE SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not been tested at this time. Sphinx version 2.0.1 has been tested to work, but version 0.9.9 may work as well. Compilation and installation instructions for MySQL with SphinxSE can be found at L. =head2 Creating and configuring the index Once MySQL has been recompiled with SphinxSE, and Sphinx itself is installed, you may create the required SphinxSE communication table via: sbin/rt-setup-fulltext-index If you have a non-standard database administrator username or password, you may need to pass the C<--dba> or C<--dba-password> options: sbin/rt-setup-fulltext-index --dba root --dba-password secret This will also provide you with the appropriate C<%FullTextSearch> configuration to add to your F; you will need to restart your webserver after making these changes. It will also print a sample Sphinx configuration, which should be placed in F, or equivalent. To fill the index, you will need to run the C command-line tool provided by Sphinx: indexer rt Finally, start the Sphinx search daemon: searchd =head2 Updating the index To keep the index up-to-date, you will need to run: indexer rt --rotate ...at regular intervals in order to pick up new and updated attachments from RT's database. Failure to do so will result in stale data. =head2 Caveats Sphinx only returns a finite number of matches to any query; this number is controlled by C in F and C<%FullTextSearch>'s C in C, which must be kept in sync. The default, set during C, is 10000. This limit may lead to false negatives in search results if the maximum number of matches is reached but the results returned do not match RT's other criteria. Take, for example, the instance where Sphinx is configured to return a maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the string "target", but only ticket 5 is in status "Open". A search for C may return no results, despite ticket 5 matching those criteria, as Sphinx will only return tickets 1, 2, and 3 as possible matches. After index creation, altering C in C is insufficient to adjust this limit; both C in F and C<%FullTextSearch>'s C in C must be updated. =head1 ORACLE =head2 Creating and configuring the index Oracle supports full-text indexing natively using the Oracle Text package. Once Oracle Text is installed and configured, run: sbin/rt-setup-fulltext-index If you have a non-standard database administrator username or password, you may need to pass the C<--dba> or C<--dba-password> options: sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret This will create an Oracle CONTEXT index on the Content column in the Attachments table, as well as several preferences, functions and triggers to support this index. The script will also output an appropriate C<%FullTextSearch> configuration to add to your F. =head2 Updating the index To update the index, you will need to run the following at regular intervals: sbin/rt-fulltext-indexer This, in effect, simply runs: begin ctx_ddl.sync_index('rt_fts_index', '2M'); end; The amount of memory used for the sync can be controlled with the C<--memory> option: rt-fulltext-indexer --memory 10M Instead of being run via C, this may instead be run via a DBMS_JOB; read the B chapter of Oracle's B for details how to keep the index optimized, perform garbage collection, and other tasks. =cut