=head1 NAME

Full text indexing in RT

=head1 LIMITATIONS

While all of the below solutions can search for Unicode characters, they
are not otherwise Unicode aware, and do no case folding, normalization,
or the like. That is, a string that contains C<U+0065 LATIN SMALL
LETTER E> followed by C<U+0301 COMBINING ACUTE ACCENT> will not match a
search for C<U+00E9 LATIN SMALL LETTER E WITH ACUTE>. They also only
know how to tokenize C<latin-1>-ish languages where words are separated
by whitespace or similar characters; as such, support for searching for
Japanese and Chinese content is extremely limited.

=head1 POSTGRES

=head2 Creating and configuring the index

Postgres 8.3 and above support full-text searching natively; to set up
the required C<tsvector> column, and create either a C<GIN> or C<GiST>
index on it, run:

    sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

    sbin/rt-setup-fulltext-index --dba postgres --dba-password secret

This will also output an appropriate C<%FullTextSearch> configuration to
add to your F<RT_SiteConfig.pm>; you will need to restart your webserver
after making these changes. However, the index will also need to be
filled before it can be used. To populate the index initially, run:

    sbin/rt-fulltext-indexer --all

This will tokenize and index all existing attachments in your database;
it may take quite a while if your database already has a large number of
tickets in it.
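
As a sketch only, the two commands above can be wrapped in a shell
function. It assumes RT is installed under F</opt/rt4> (set C<RT_HOME>
otherwise), and C<pg_fts_initial_setup> is a hypothetical name, not
something RT ships. Copying the printed C<%FullTextSearch> settings into
F<RT_SiteConfig.pm> and restarting the webserver are still done by hand.

```sh
#!/bin/sh
# Sketch: one-time PostgreSQL full-text setup for RT.
# Assumption: RT is installed under /opt/rt4 unless RT_HOME says otherwise.
# pg_fts_initial_setup is a hypothetical helper, not part of RT.
RT_HOME="${RT_HOME:-/opt/rt4}"

pg_fts_initial_setup() {
    # Create the tsvector column and its index. Extra arguments such as
    # "--dba postgres" are passed straight through. Stop if this fails.
    "$RT_HOME/sbin/rt-setup-fulltext-index" "$@" || return 1

    # Tokenize and index every existing attachment (can take a long time).
    "$RT_HOME/sbin/rt-fulltext-indexer" --all
}
```

For example, C<pg_fts_initial_setup --dba postgres --dba-password secret>
passes the DBA options through; if the setup script fails, the initial
fill is skipped.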

=head2 Updating the index

To keep the index up-to-date, you will need to run:

    sbin/rt-fulltext-indexer

...at regular intervals. By default, this will only tokenize up to 100
tickets at a time; you can adjust this upwards by passing, for example,
C<--limit 500>. Larger batch sizes will take longer and consume more
memory.

If there is already an instance of C<rt-fulltext-indexer> running, new
ones will exit abnormally (with exit code 1) and print the error message
"rt-fulltext-indexer is already running." You can suppress this message
and make those processes exit normally (with exit code 0) using the
C<--quiet> option; this is particularly useful when running the command
via C<cron>:

    sbin/rt-fulltext-indexer --quiet
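
One way to schedule the update, sketched under assumptions: RT lives in
F</opt/rt4>, a batch of 500 tickets suits your hardware, and every ten
minutes is often enough. The function name C<rt_fts_update> is
hypothetical; the crontab line in the comment calls the indexer directly.

```sh
#!/bin/sh
# Sketch: periodic full-text index update, suitable for cron.
# Assumptions: RT under /opt/rt4 and a batch size of 500 tickets.
# An equivalent crontab entry (every 10 minutes, also an assumption):
#   */10 * * * * /opt/rt4/sbin/rt-fulltext-indexer --quiet --limit 500
RT_HOME="${RT_HOME:-/opt/rt4}"
FTS_LIMIT="${FTS_LIMIT:-500}"

rt_fts_update() {
    # --quiet: if another indexer is still running, exit 0 without the
    # "already running" message, so cron has no output to mail.
    # --limit: tokenize up to FTS_LIMIT tickets per run instead of 100.
    "$RT_HOME/sbin/rt-fulltext-indexer" --quiet --limit "$FTS_LIMIT"
}
```

Since larger batches take longer and use more memory, a modest limit run
frequently is one reasonable trade-off; if the index falls behind, raise
C<--limit> or shorten the interval.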

=head1 MYSQL

MySQL does not support full-text indexing natively. However, it does
integrate with the external Sphinx engine, available from
L<http://sphinxsearch.com>. Unfortunately, Sphinx integration (using
SphinxSE) does require that you recompile MySQL from source. Most
distribution-provided packages for MySQL do not include SphinxSE
integration, merely the external Sphinx tools; these are not sufficient
for RT's needs.

=head2 Compiling MySQL and SphinxSE

SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not
been tested at this time. Sphinx version 2.0.1 has been tested to work,
but version 0.9.9 may work as well. Compilation and installation
instructions for MySQL with SphinxSE can be found at
L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>.

=head2 Creating and configuring the index

Once MySQL has been recompiled with SphinxSE, and Sphinx itself is
installed, you may create the required SphinxSE communication table via:

    sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

    sbin/rt-setup-fulltext-index --dba root --dba-password secret

This will also provide you with the appropriate C<%FullTextSearch>
configuration to add to your F<RT_SiteConfig.pm>; you will need to
restart your webserver after making these changes. It will also print a
sample Sphinx configuration, which should be placed in
F</etc/sphinx.conf>, or equivalent.

To fill the index, you will need to run the C<indexer> command-line tool
provided by Sphinx:

    indexer rt

Finally, start the Sphinx search daemon:

    searchd
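
Once the sample configuration has been installed, the last two steps can
be sketched as a shell function. It assumes the configuration lives at
F</etc/sphinx.conf> (set C<SPHINX_CONF> otherwise) and that Sphinx's
C<indexer> and C<searchd> are on the C<PATH>; both tools accept
C<--config> to name the file explicitly. C<sphinx_build_and_start> is a
hypothetical name.

```sh
#!/bin/sh
# Sketch: build RT's Sphinx index and start searchd, after the sample
# config printed by rt-setup-fulltext-index has been installed.
# Assumptions: config at /etc/sphinx.conf (override with SPHINX_CONF);
# indexer and searchd are on PATH. The function name is hypothetical.
SPHINX_CONF="${SPHINX_CONF:-/etc/sphinx.conf}"

sphinx_build_and_start() {
    # Refuse to continue if the Sphinx configuration is not in place yet.
    if [ ! -r "$SPHINX_CONF" ]; then
        echo "missing $SPHINX_CONF; install the sample Sphinx config first" >&2
        return 1
    fi
    # Fill the "rt" index from RT's database; stop if indexing fails.
    indexer --config "$SPHINX_CONF" rt || return 1
    # Start the search daemon that SphinxSE queries.
    searchd --config "$SPHINX_CONF"
}
```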

=head2 Updating the index

To keep the index up-to-date, you will need to run:

    indexer rt --rotate

...at regular intervals in order to pick up new and updated attachments
from RT's database. Otherwise, full-text searches will not find content
added or changed since the last run.
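
A sketch of the periodic re-index, assuming the same F</etc/sphinx.conf>
location and a fifteen-minute schedule; C<sphinx_refresh_rt_index> is a
hypothetical name. With C<--rotate>, C<indexer> builds the new index
alongside the live one and then signals the running C<searchd> to switch
to it, so searches keep working while indexing runs.

```sh
#!/bin/sh
# Sketch: refresh RT's Sphinx index while searchd keeps serving queries.
# Assumptions: config at /etc/sphinx.conf (override with SPHINX_CONF) and
# Sphinx tools on PATH. A matching crontab entry (schedule is an
# assumption; use indexer's full path if cron's PATH lacks it):
#   */15 * * * * indexer --config /etc/sphinx.conf --rotate rt
SPHINX_CONF="${SPHINX_CONF:-/etc/sphinx.conf}"

sphinx_refresh_rt_index() {
    # --rotate: build the new "rt" index next to the live one, then have
    # searchd swap to it once indexing completes.
    indexer --config "$SPHINX_CONF" --rotate rt
}
```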

=head2 Caveats

Sphinx only returns a limited number of matches to any query; this
number is controlled by C<max_matches> in F</etc/sphinx.conf> and
C<%FullTextSearch>'s C<MaxMatches> in F<RT_SiteConfig.pm>, which must be
kept in sync. The default, set during C<rt-setup-fulltext-index>, is
10000. This limit may lead to false negatives in search results if the
maximum number of matches is reached but the results returned do not
match RT's other criteria.

For example, suppose Sphinx is configured to return a maximum of three
results, and tickets 1, 2, 3, 4, and 5 contain the string "target", but
only ticket 5 is in status "Open". A search for
C<Content LIKE 'target' AND Status = 'Open'> may return no results,
despite ticket 5 matching those criteria, as Sphinx will only return
tickets 1, 2, and 3 as possible matches.
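
The example can be reproduced with a toy shell model. It does not talk
to Sphinx or RT: the ticket numbers, C<MAX_MATCHES>, and the helper names
are all hypothetical, and "the first N hits" stands in for Sphinx's
relevance ranking.

```sh
#!/bin/sh
# Toy model of the max_matches false negative; it does not contact Sphinx
# or RT. All values and names here are hypothetical, taken from the example.
MAX_MATCHES=3               # Sphinx's max_matches / RT's MaxMatches
FULLTEXT_HITS="1 2 3 4 5"   # tickets whose content contains "target"
OPEN_TICKETS="5"            # tickets whose status is Open

# Sphinx returns at most MAX_MATCHES hits (here simply the first ones,
# standing in for its relevance ranking).
sphinx_results() {
    n=0
    for t in $FULLTEXT_HITS; do
        if [ "$n" -ge "$MAX_MATCHES" ]; then
            break
        fi
        echo "$t"
        n=$((n + 1))
    done
}

# RT then applies Status = 'Open', but only to what Sphinx returned.
rt_search() {
    for t in $(sphinx_results); do
        case " $OPEN_TICKETS " in
            *" $t "*) echo "$t" ;;
        esac
    done
}
```

Here C<rt_search> prints nothing even though ticket 5 qualifies; after
setting C<MAX_MATCHES=5> it prints C<5>.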

After index creation, changing C<MaxMatches> in F<RT_SiteConfig.pm>
alone is not enough to adjust this limit; both C<max_matches> in
F</etc/sphinx.conf> and C<%FullTextSearch>'s C<MaxMatches> in
F<RT_SiteConfig.pm> must be updated.
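
Because the two settings live in different files, a quick consistency
check can help. This is a heuristic sketch using text matching, not a
Perl or Sphinx parser: it assumes the default paths shown (override with
C<SPHINX_CONF> and C<RT_SITE_CONFIG>), a single C<max_matches = N> line
in the Sphinx configuration, and C<< MaxMatches => N >> (optionally
quoted) in F<RT_SiteConfig.pm>. The function names are hypothetical.

```sh
#!/bin/sh
# Heuristic check that Sphinx's max_matches and RT's MaxMatches agree.
# Assumptions: the default paths below, and simple "max_matches = N" /
# "MaxMatches => N" lines. Text matching only; names are hypothetical.
SPHINX_CONF="${SPHINX_CONF:-/etc/sphinx.conf}"
RT_SITE_CONFIG="${RT_SITE_CONFIG:-/opt/rt4/etc/RT_SiteConfig.pm}"

# First "max_matches = N" value found in the Sphinx configuration.
sphinx_max_matches() {
    sed -n 's/^[[:space:]]*max_matches[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$SPHINX_CONF" | head -n 1
}

# First "MaxMatches => N" value (quotes optional) in RT_SiteConfig.pm.
rt_max_matches() {
    sed -n "s/.*MaxMatches[[:space:]]*=>[[:space:]]*['\"]\{0,1\}\([0-9][0-9]*\).*/\1/p" "$RT_SITE_CONFIG" | head -n 1
}

# Print the shared value, or report a mismatch and return 1.
check_max_matches() {
    s=$(sphinx_max_matches)
    r=$(rt_max_matches)
    if [ -n "$s" ] && [ "$s" = "$r" ]; then
        echo "in sync: $s"
    else
        echo "MISMATCH: sphinx.conf=${s:-unset} RT_SiteConfig.pm=${r:-unset}" >&2
        return 1
    fi
}
```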

=head1 ORACLE

=head2 Creating and configuring the index

Oracle supports full-text indexing natively using the Oracle Text
package. Once Oracle Text is installed and configured, run:

    sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

    sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret

This will create an Oracle CONTEXT index on the Content column in the
Attachments table, as well as several preferences, functions and
triggers to support this index. The script will also output an
appropriate C<%FullTextSearch> configuration to add to your
F<RT_SiteConfig.pm>.

=head2 Updating the index

To update the index, you will need to run the following at regular
intervals:

    sbin/rt-fulltext-indexer

This, in effect, simply runs:

    begin
        ctx_ddl.sync_index('rt_fts_index', '2M');
    end;

The amount of memory used for the sync can be controlled with the
C<--memory> option:

    sbin/rt-fulltext-indexer --memory 10M
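
For troubleshooting, the same sync can be issued by hand through
SQL*Plus, as a sketch. It assumes C<sqlplus> is on the C<PATH> and that
C<ORACLE_CONNECT> (a hypothetical variable) holds a connect string for
RT's schema; C<rt_fts_index> is the index name shown above, and the
function's argument plays the role of C<--memory>. C<rt-fulltext-indexer>
remains the normal way to run this.

```sh
#!/bin/sh
# Sketch: run Oracle Text's index sync manually via SQL*Plus.
# Assumptions: sqlplus on PATH; ORACLE_CONNECT (hypothetical) holds
# user/password@service for RT's schema; index name rt_fts_index.
oracle_sync_rt_index() {
    mem="${1:-2M}"   # sync memory, like rt-fulltext-indexer --memory
    # -s: silent mode (no banner). Exit non-zero on any SQL error.
    sqlplus -s "$ORACLE_CONNECT" <<SQL
WHENEVER SQLERROR EXIT FAILURE
begin
    ctx_ddl.sync_index('rt_fts_index', '$mem');
end;
/
exit
SQL
}
```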

If there is already an instance of C<rt-fulltext-indexer> running, new
ones will exit abnormally (with exit code 1) and print the error message
"rt-fulltext-indexer is already running." You can suppress this message
and make those processes exit normally (with exit code 0) using the
C<--quiet> option; this is particularly useful when running the command
via C<cron>:

    sbin/rt-fulltext-indexer --quiet

Rather than via C<cron>, the sync may also be run as a DBMS_JOB; read
the B<Managing DML Operations for a CONTEXT Index> chapter of Oracle's
B<Text Application Developer's Guide> for details on how to keep the
index optimized, perform garbage collection, and other tasks.

=cut