From 6587f6ba7d047ddc1686c080090afe7d53365bd4 Mon Sep 17 00:00:00 2001 From: Ivan Kohler Date: Tue, 24 Apr 2012 11:35:56 -0700 Subject: first pass RT4 merge, RT#13852 --- rt/docs/full_text_indexing.pod | 174 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 rt/docs/full_text_indexing.pod (limited to 'rt/docs/full_text_indexing.pod') diff --git a/rt/docs/full_text_indexing.pod b/rt/docs/full_text_indexing.pod new file mode 100644 index 000000000..0952b4eb9 --- /dev/null +++ b/rt/docs/full_text_indexing.pod @@ -0,0 +1,174 @@ +=head1 NAME + +Full text indexing in RT + +=head1 LIMITATIONS + +While all of the below solutions can search for Unicode characters, they +are not otherwise Unicode aware, and do no case folding, normalization, +or the like. That is, a string that contains C followed by C will not match a +search for C. They also only +know how to tokenize C-ish languages where words are separated +by whitespace or similar characters; as such, support for searching for +Japanese and Chinese content is extremely limited. + +=head1 POSTGRES + +=head2 Creating and configuring the index + +Postgres 8.3 and above support full-text searching natively; to set up +the required C column, and create either a C or C +index on it, run: + + sbin/rt-setup-fulltext-index + +If you have a non-standard database administrator username or password, +you may need to pass the C<--dba> or C<--dba-password> options: + + sbin/rt-setup-fulltext-index --dba postgres --dba-password secret + +This will also output an appropriate C<%FullTextSearch> configuration to +add to your F; you will need to restart your webserver +after making these changes. However, the index will also need to be +filled before it can be used. To update the index initially, run: + + sbin/rt-fulltext-indexer --all + +This will tokenize and index all existing attachments in your database; +it may take quite a while if your database already has a large number of +tickets in it. + +=head2 Updating the index + +To keep the index up-to-date, you will need to run: + + sbin/rt-fulltext-indexer + +...at regular intervals. By default, this will only tokenize up to 100 +tickets at a time; you can adjust this upwards by passing +C<--limit 500>. Larger batch sizes will take longer and +consume more memory. Care should be taken to ensure that multiple +instances of C are not run at the same time. + +=head1 MYSQL + +MySQL does not support full-text indexing natively. However, it does +integrate with the external Sphinx engine, available from +L. Unfortunately, Sphinx integration (using +SphinxSE) does require that you recompile MySQL from source. Most +distribution-provided packages for MySQL do not include SphinxSE +integration, merely the external Sphinx tools; these are not sufficient +for RT's needs. + +=head2 Compiling MySQL and SphinxSE + +SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not +been tested at this time. Sphinx version 2.0.1 has been tested to work, +but version 0.9.9 may work as well. Compilation and installation +instructions for MySQL with SphinxSE can be found at +L. + +=head2 Creating and configuring the index + +Once MySQL has been recompiled with SphinxSE, and Sphinx itself is +installed, you may create the required SphinxSE communication table via: + + sbin/rt-setup-fulltext-index + +If you have a non-standard database administrator username or password, +you may need to pass the C<--dba> or C<--dba-password> options: + + sbin/rt-setup-fulltext-index --dba root --dba-password secret + +This will also provide you with the appropriate C<%FullTextSearch> +configuration to add to your F; you will need to +restart your webserver after making these changes. It will also print a +sample Sphinx configuration, which should be placed in +F, or equivalent. + +To fill the index, you will need to run the C command-line tool +provided by Sphinx: + + indexer rt + +Finally, start the Sphinx search daemon: + + searchd + +=head2 Updating the index + +To keep the index up-to-date, you will need to run: + + indexer rt --rotate + +...at regular intervals in order to pick up new and updated attachments +from RT's database. Failure to do so will result in stale data. + +=head2 Caveats + +Sphinx only returns a finite number of matches to any query; this number +is controlled by C in F and +C<%FullTextSearch>'s C in C, which must be +kept in sync. The default, set during C, is +10000. This limit may lead to false negatives in search results if the +maximum number of matches is reached but the results returned do not +match RT's other criteria. + +Take, for example, the instance where Sphinx is configured to return a +maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the +string "target", but only ticket 5 is in status "Open". A search for +C may return no results, +despite ticket 5 matching those criteria, as Sphinx will only return +tickets 1, 2, and 3 as possible matches. + +After index creation, altering C in C is +insufficient to adjust this limit; both C in +F and C<%FullTextSearch>'s C in +C must be updated. + +=head1 ORACLE + +=head2 Creating and configuring the index + +Oracle supports full-text indexing natively using the Oracle Text +package. Once Oracle Text is installed and configured, run: + + sbin/rt-setup-fulltext-index + +If you have a non-standard database administrator username or password, +you may need to pass the C<--dba> or C<--dba-password> options: + + sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret + +This will create an Oracle CONTEXT index on the Content column in the +Attachments table, as well as several preferences, functions and +triggers to support this index. The script will also output an +appropriate C<%FullTextSearch> configuration to add to your +F. + +=head2 Updating the index + +To update the index, you will need to run the following at regular +intervals: + + sbin/rt-fulltext-indexer + +This, in effect, simply runs: + + begin + ctx_ddl.sync_index('rt_fts_index', '2M'); + end; + +The amount of memory used for the sync can be controlled with the +C<--memory> option: + + rt-fulltext-indexer --memory 10M + +Instead of being run via C, this may instead be run via a +DBMS_JOB; read the B +chapter of Oracle's B for details +how to keep the index optimized, perform garbage collection, and other +tasks. + +=cut -- cgit v1.2.1