Full text indexing in RT

While all of the below solutions can search for Unicode characters, they
are not otherwise Unicode aware, and do no case folding, normalization,
or the like. That is, a string that contains C<U+0065 LATIN SMALL
LETTER E> followed by C<U+0301 COMBINING ACUTE ACCENT> will not match a
search for C<U+00E9 LATIN SMALL LETTER E WITH ACUTE>. They also only
know how to tokenize C<latin-1>-ish languages where words are separated
by whitespace or similar characters; as such, support for searching for
Japanese and Chinese content is extremely limited.

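
To make the normalization caveat concrete, the following shell sketch
compares the UTF-8 byte sequences of the two spellings of an accented
"e". It only illustrates the byte-level mismatch; it is not how RT or
any of the search backends compare text, and the variable names are
chosen purely for illustration.

```shell
# UTF-8 bytes for U+0065 followed by U+0301 (decomposed form).
decomposed=$(printf 'e\314\201')
# UTF-8 bytes for U+00E9 (precomposed form).
precomposed=$(printf '\303\251')

# Both render as the same glyph, but they are different byte strings, so
# a search that performs no normalization treats them as different text.
if [ "$decomposed" = "$precomposed" ]; then
    same=yes
else
    same=no
fi
printf 'identical: %s\n' "$same"
```
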
=head1 POSTGRES

=head2 Creating and configuring the index

Postgres 8.3 and above support full-text searching natively; to set up
the required C<tsvector> column, and create either a C<GIN> or C<GiST>
index on it, run:

    sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

    sbin/rt-setup-fulltext-index --dba postgres --dba-password secret

This will also output an appropriate C<%FullTextSearch> configuration to
add to your F<RT_SiteConfig.pm>; you will need to restart your webserver
after making these changes. However, the index will also need to be
filled before it can be used. To update the index initially, run:

    sbin/rt-fulltext-indexer --all

This will tokenize and index all existing attachments in your database;
it may take quite a while if your database is already large.

=head2 Updating the index

To keep the index up-to-date, you will need to run:

    sbin/rt-fulltext-indexer

...at regular intervals. By default, this will only tokenize up to 100
tickets at a time; you can adjust this upwards by passing, for example,
C<--limit 500>. Larger batch sizes will take longer to complete.

If there is already an instance of C<rt-fulltext-indexer> running, new
ones will exit abnormally (with exit code 1) and the error message
"rt-fulltext-indexer is already running." You can suppress this message
and end those processes normally (with exit code 0) using the C<--quiet>
option; this is particularly useful when running the command via
C<cron>:

    sbin/rt-fulltext-indexer --quiet

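
One way to schedule this, sketched under the assumption that RT is
installed in F</opt/rt4> and that a ten-minute interval suits your site
(both values are illustrative, not ones RT prescribes), is to build a
crontab entry like so:

```shell
# Assumed installation prefix and interval; adjust both for your site.
RT_HOME=/opt/rt4
INTERVAL_MINUTES=10

# --quiet makes an overlapping run exit 0 silently, so cron sends no mail
# when the previous run has not finished yet.
cron_line="*/$INTERVAL_MINUTES * * * * $RT_HOME/sbin/rt-fulltext-indexer --quiet"
printf '%s\n' "$cron_line"
```

The printed line can then be added to the crontab of the user that runs
RT, for example with C<crontab -e>.
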
=head1 MYSQL

MySQL does not support full-text indexing natively. However, it does
integrate with the external Sphinx engine, available from
L<http://sphinxsearch.com>. Unfortunately, Sphinx integration (using
SphinxSE) does require that you recompile MySQL from source. Most
distribution-provided packages for MySQL do not include SphinxSE
integration, merely the external Sphinx tools; these are not sufficient
for RT.

=head2 Compiling MySQL and SphinxSE

SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not
been tested at this time. Sphinx version 2.0.1 has been tested to work,
but version 0.9.9 may work as well. Compilation and installation
instructions for MySQL with SphinxSE can be found at
L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>.

=head2 Creating and configuring the index

Once MySQL has been recompiled with SphinxSE, and Sphinx itself is
installed, you may create the required SphinxSE communication table via:

    sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

    sbin/rt-setup-fulltext-index --dba root --dba-password secret

This will also provide you with the appropriate C<%FullTextSearch>
configuration to add to your F<RT_SiteConfig.pm>; you will need to
restart your webserver after making these changes. It will also print a
sample Sphinx configuration, which should be placed in
F</etc/sphinx.conf>, or equivalent.

To fill the index, you will need to run the C<indexer> command-line tool
provided by Sphinx.

Finally, start the Sphinx search daemon, C<searchd>.

=head2 Updating the index

To keep the index up-to-date, you will need to re-run C<indexer> at
regular intervals in order to pick up new and updated attachments from
RT's database. Failure to do so will result in stale data.

Sphinx only returns a finite number of matches to any query; this number
is controlled by C<max_matches> in F</etc/sphinx.conf> and
C<%FullTextSearch>'s C<MaxMatches> in F<RT_SiteConfig.pm>, which must be
kept in sync. The default, set during C<rt-setup-fulltext-index>, is
10000. This limit may lead to false negatives in search results if the
maximum number of matches is reached but the results returned do not
match RT's other criteria.

Take, for example, the instance where Sphinx is configured to return a
maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the
string "target", but only ticket 5 is in status "Open". A search for
C<Content LIKE 'target' AND Status = 'Open'> may return no results,
despite ticket 5 matching those criteria, as Sphinx will only return
tickets 1, 2, and 3 as possible matches.

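
The same scenario can be walked through as a small shell simulation.
This is only a sketch of the reasoning, not of how RT or Sphinx are
implemented; the variable names and the hard-coded ticket lists are
hypothetical stand-ins for the real query results.

```shell
max_matches=3                  # illustrative limit of three results
content_matches="1 2 3 4 5"    # tickets whose content contains "target"
open_tickets="5"               # tickets whose status is Open

# The full-text engine hands back at most max_matches candidates.
candidates=$(printf '%s\n' $content_matches | head -n "$max_matches")

# The remaining criteria are then applied to those candidates only.
result=""
for id in $candidates; do
    for open in $open_tickets; do
        if [ "$id" = "$open" ]; then
            result="$result $id"
        fi
    done
done
printf 'tickets returned:%s\n' "$result"
```

Because ticket 5 never makes it into the candidate list, the final
result is empty even though ticket 5 satisfies both conditions.
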
After index creation, altering C<MaxMatches> in F<RT_SiteConfig.pm> is
insufficient to adjust this limit; both C<max_matches> in
F</etc/sphinx.conf> and C<%FullTextSearch>'s C<MaxMatches> in
F<RT_SiteConfig.pm> must be updated.

=head1 ORACLE

=head2 Creating and configuring the index

Oracle supports full-text indexing natively using the Oracle Text
package. Once Oracle Text is installed and configured, run:

    sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

    sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret

This will create an Oracle CONTEXT index on the Content column in the
Attachments table, as well as several preferences, functions and
triggers to support this index. The script will also output an
appropriate C<%FullTextSearch> configuration to add to your
F<RT_SiteConfig.pm>.

=head2 Updating the index

To update the index, you will need to run the following at regular
intervals:

    sbin/rt-fulltext-indexer

This, in effect, simply runs:

    begin
        ctx_ddl.sync_index('rt_fts_index', '2M');
    end;

The amount of memory used for the sync can be controlled with the
C<--memory> option:

    rt-fulltext-indexer --memory 10M

If there is already an instance of C<rt-fulltext-indexer> running, new
ones will exit abnormally (with exit code 1) and the error message
"rt-fulltext-indexer is already running." You can suppress this message
and end those processes normally (with exit code 0) using the C<--quiet>
option; this is particularly useful when running the command via
C<cron>:

    sbin/rt-fulltext-indexer --quiet

Instead of being run via C<cron>, this may instead be run via a
DBMS_JOB; read the B<Managing DML Operations for a CONTEXT Index>
chapter of Oracle's B<Text Application Developer's Guide> for details
on how to keep the index optimized, perform garbage collection, and
carry out other tasks.