diff options
author | Ivan Kohler <ivan@freeside.biz> | 2012-04-24 11:35:56 -0700 |
---|---|---|
committer | Ivan Kohler <ivan@freeside.biz> | 2012-04-24 11:35:56 -0700 |
commit | 6587f6ba7d047ddc1686c080090afe7d53365bd4 (patch) | |
tree | ec77342668e8865aca669c9b4736e84e3077b523 /rt/docs/full_text_indexing.pod | |
parent | 47153aae5c2fc00316654e7277fccd45f72ff611 (diff) |
first pass RT4 merge, RT#13852
Diffstat (limited to 'rt/docs/full_text_indexing.pod')
-rw-r--r-- | rt/docs/full_text_indexing.pod | 174 |
1 files changed, 174 insertions, 0 deletions
diff --git a/rt/docs/full_text_indexing.pod b/rt/docs/full_text_indexing.pod new file mode 100644 index 000000000..0952b4eb9 --- /dev/null +++ b/rt/docs/full_text_indexing.pod @@ -0,0 +1,174 @@ +=head1 NAME + +Full text indexing in RT + +=head1 LIMITATIONS + +While all of the below solutions can search for Unicode characters, they +are not otherwise Unicode aware, and do no case folding, normalization, +or the like. That is, a string that contains C<U+0065 LATIN SMALL +LETTER E> followed by C<U+0301 COMBINING ACUTE ACCENT> will not match a +search for C<U+00E9 LATIN SMALL LETTER E WITH ACUTE>. They also only +know how to tokenize C<latin-1>-ish languages where words are separated +by whitespace or similar characters; as such, support for searching for +Japanese and Chinese content is extremely limited. + +=head1 POSTGRES + +=head2 Creating and configuring the index + +Postgres 8.3 and above support full-text searching natively; to set up +the required C<ts_vector> column, and create either a C<GiN> or C<GiST> +index on it, run: + + sbin/rt-setup-fulltext-index + +If you have a non-standard database administrator username or password, +you may need to pass the C<--dba> or C<--dba-password> options: + + sbin/rt-setup-fulltext-index --dba postgres --dba-password secret + +This will also output an appropriate C<%FullTextSearch> configuration to +add to your F<RT_SiteConfig.pm>; you will need to restart your webserver +after making these changes. However, the index will also need to be +filled before it can be used. To update the index initially, run: + + sbin/rt-fulltext-indexer --all + +This will tokenize and index all existing attachments in your database; +it may take quite a while if your database already has a large number of +tickets in it. + +=head2 Updating the index + +To keep the index up-to-date, you will need to run: + + sbin/rt-fulltext-indexer + +...at regular intervals. By default, this will only tokenize up to 100 +tickets at a time; you can adjust this upwards by passing +C<--limit 500>. Larger batch sizes will take longer and +consume more memory. Care should be taken to ensure that multiple +instances of C<rt-fulltext-indexer> are not run at the same time. + +=head1 MYSQL + +MySQL does not support full-text indexing natively. However, it does +integrate with the external Sphinx engine, available from +L<http://sphinxsearch.com>. Unfortunately, Sphinx integration (using +SphinxSE) does require that you recompile MySQL from source. Most +distribution-provided packages for MySQL do not include SphinxSE +integration, merely the external Sphinx tools; these are not sufficient +for RT's needs. + +=head2 Compiling MySQL and SphinxSE + +SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not +been tested at this time. Sphinx version 2.0.1 has been tested to work, +but version 0.9.9 may work as well. Compilation and installation +instructions for MySQL with SphinxSE can be found at +L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>. + +=head2 Creating and configuring the index + +Once MySQL has been recompiled with SphinxSE, and Sphinx itself is +installed, you may create the required SphinxSE communication table via: + + sbin/rt-setup-fulltext-index + +If you have a non-standard database administrator username or password, +you may need to pass the C<--dba> or C<--dba-password> options: + + sbin/rt-setup-fulltext-index --dba root --dba-password secret + +This will also provide you with the appropriate C<%FullTextSearch> +configuration to add to your F<RT_SiteConfig.pm>; you will need to +restart your webserver after making these changes. It will also print a +sample Sphinx configuration, which should be placed in +F</etc/sphinx.conf>, or equivalent. + +To fill the index, you will need to run the C<indexer> command-line tool +provided by Sphinx: + + indexer rt + +Finally, start the Sphinx search daemon: + + searchd + +=head2 Updating the index + +To keep the index up-to-date, you will need to run: + + indexer rt --rotate + +...at regular intervals in order to pick up new and updated attachments +from RT's database. Failure to do so will result in stale data. + +=head2 Caveats + +Sphinx only returns a finite number of matches to any query; this number +is controlled by C<max_matches> in F</etc/sphinx.conf> and +C<%FullTextSearch>'s C<MaxMatches> in C<RT_SiteConfig.pm>, which must be +kept in sync. The default, set during C<rt-setup-fulltext-index>, is +10000. This limit may lead to false negatives in search results if the +maximum number of matches is reached but the results returned do not +match RT's other criteria. + +Take, for example, the instance where Sphinx is configured to return a +maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the +string "target", but only ticket 5 is in status "Open". A search for +C<Content LIKE 'target' AND Status = 'Open'> may return no results, +despite ticket 5 matching those criteria, as Sphinx will only return +tickets 1, 2, and 3 as possible matches. + +After index creation, altering C<MaxMatches> in C<RT_SiteConfig.pm> is +insufficient to adjust this limit; both C<max_matches> in +F</etc/sphinx.conf> and C<%FullTextSearch>'s C<MaxMatches> in +C<RT_SiteConfig.pm> must be updated. + +=head1 ORACLE + +=head2 Creating and configuring the index + +Oracle supports full-text indexing natively using the Oracle Text +package. Once Oracle Text is installed and configured, run: + + sbin/rt-setup-fulltext-index + +If you have a non-standard database administrator username or password, +you may need to pass the C<--dba> or C<--dba-password> options: + + sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret + +This will create an Oracle CONTEXT index on the Content column in the +Attachments table, as well as several preferences, functions and +triggers to support this index. The script will also output an +appropriate C<%FullTextSearch> configuration to add to your +F<RT_SiteConfig>. + +=head2 Updating the index + +To update the index, you will need to run the following at regular +intervals: + + sbin/rt-fulltext-indexer + +This, in effect, simply runs: + + begin + ctx_ddl.sync_index('rt_fts_index', '2M'); + end; + +The amount of memory used for the sync can be controlled with the +C<--memory> option: + + rt-fulltext-indexer --memory 10M + +Instead of being run via C<cron>, this may instead be run via a +DBMS_JOB; read the B<Managing DML Operations for a CONTEXT Index> +chapter of Oracle's B<Text Application Developer's Guide> for details +how to keep the index optimized, perform garbage collection, and other +tasks. + +=cut |