=head1 NAME

Full text indexing in RT

=head1 LIMITATIONS

While all of the solutions below can search for Unicode characters, they
are not otherwise Unicode aware, and do no case folding, normalization,
or the like. That is, a string that contains C<U+0065 LATIN SMALL
LETTER E> followed by C<U+0301 COMBINING ACUTE ACCENT> will not match a
search for C<U+00E9 LATIN SMALL LETTER E WITH ACUTE>. They also only
know how to tokenize C<latin-1>-ish languages where words are separated
by whitespace or similar characters; as such, support for searching for
Japanese and Chinese content is extremely limited.

=head1 POSTGRES

=head2 Creating and configuring the index

Postgres 8.3 and above support full-text searching natively; to set up
the required C<tsvector> column, and create either a C<GIN> or C<GiST>
index on it, run:

 sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

 sbin/rt-setup-fulltext-index --dba postgres --dba-password secret

This will also output an appropriate C<%FullTextSearch> configuration to
add to your F<RT_SiteConfig.pm>; you will need to restart your webserver
after making these changes. However, the index will also need to be
filled before it can be used. To update the index initially, run:

 sbin/rt-fulltext-indexer --all

This will tokenize and index all existing attachments in your database;
it may take quite a while if your database already has a large number of
tickets in it.

=head2 Updating the index

To keep the index up-to-date, you will need to run:

 sbin/rt-fulltext-indexer

...at regular intervals. By default, this will only tokenize up to 100
tickets at a time; you can adjust this upwards by passing, for example,
C<--limit 500>. Larger batch sizes will take longer and
consume more memory. Care should be taken to ensure that multiple
instances of C<rt-fulltext-indexer> are not run at the same time.
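
One way to guard against overlapping runs is a small lock wrapper around
the indexer. The following is only a sketch, not something RT ships:
C<run_locked> and the lock directory are hypothetical names, and the
approach relies on C<mkdir> failing when the directory already exists.
A lock left behind by a killed run would have to be removed by hand.

```shell
# run_locked LOCKDIR COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND only if LOCKDIR can be created, so an
# invocation that starts while another one is still running is skipped.
run_locked() {
    lockdir=$1
    shift
    # mkdir fails if the directory already exists, which makes it usable
    # as a simple lock.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir is held; skipping this run" >&2
        return 75
    fi
    rc=0
    "$@" || rc=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$lockdir"
    return "$rc"
}
```

From a C<cron> job this might be invoked as
C<run_locked /var/tmp/rt-fulltext-indexer.lock sbin/rt-fulltext-indexer --limit 500>;
the lock path is an assumption and should point somewhere only the
indexing user can write.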

=head1 MYSQL

MySQL does not support full-text indexing natively. However, it does
integrate with the external Sphinx engine, available from
L<http://sphinxsearch.com>. Unfortunately, Sphinx integration (using
SphinxSE) requires that you recompile MySQL from source. Most
distribution-provided packages for MySQL do not include SphinxSE
integration, merely the external Sphinx tools; these are not sufficient
for RT's needs.

=head2 Compiling MySQL and SphinxSE

SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not
been tested at this time. Sphinx version 2.0.1 has been tested and
works; version 0.9.9 may work as well. Compilation and installation
instructions for MySQL with SphinxSE can be found at
L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>.

=head2 Creating and configuring the index

Once MySQL has been recompiled with SphinxSE, and Sphinx itself is
installed, you may create the required SphinxSE communication table via:

 sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

 sbin/rt-setup-fulltext-index --dba root --dba-password secret

This will also provide you with the appropriate C<%FullTextSearch>
configuration to add to your F<RT_SiteConfig.pm>; you will need to
restart your webserver after making these changes. It will also print a
sample Sphinx configuration, which should be placed in
F</etc/sphinx.conf>, or equivalent.

To fill the index, you will need to run the C<indexer> command-line tool
provided by Sphinx:

 indexer rt

Finally, start the Sphinx search daemon:

 searchd

=head2 Updating the index

To keep the index up-to-date, you will need to run:

 indexer rt --rotate

...at regular intervals in order to pick up new and updated attachments
from RT's database. Failure to do so will result in stale data.

=head2 Caveats

Sphinx only returns a finite number of matches to any query; this number
is controlled by C<max_matches> in F</etc/sphinx.conf> and
C<%FullTextSearch>'s C<MaxMatches> in F<RT_SiteConfig.pm>, which must be
kept in sync. The default, set during C<rt-setup-fulltext-index>, is
10000. This limit may lead to false negatives in search results if the
maximum number of matches is reached but the results returned do not
match RT's other criteria.

Take, for example, the instance where Sphinx is configured to return a
maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the
string "target", but only ticket 5 is in status "Open". A search for
C<Content LIKE 'target' AND Status = 'Open'> may return no results,
despite ticket 5 matching those criteria, as Sphinx will only return
tickets 1, 2, and 3 as possible matches.
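
The effect can be reproduced with a toy model in shell. Nothing here
talks to Sphinx or RT: the ticket list, the C<search_open> function and
the "Resolved" status of tickets 1 to 4 are assumptions made for
illustration. C<head> stands in for the match limit and C<awk> for RT's
status filter.

```shell
# Toy data: "<ticket id> <status>", all assumed to contain "target".
# Assumption: tickets 1-4 are Resolved; the example only says they are
# not Open.
tickets='1 Resolved
2 Resolved
3 Resolved
4 Resolved
5 Open'

# search_open MAX
# Keep the first MAX content matches (the Sphinx step), then apply the
# status filter (the RT step) and print the ids that survive.
search_open() {
    printf '%s\n' "$tickets" | head -n "$1" | awk '$2 == "Open" { print $1 }'
}

search_open 3   # prints nothing: ticket 5 was cut off before the filter ran
search_open 5   # prints 5
```

Raising the limit only moves the cutoff; any query with more content
matches than the limit can lose results in the same way.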

After index creation, altering C<MaxMatches> in F<RT_SiteConfig.pm>
alone is insufficient to adjust this limit; both C<max_matches> in
F</etc/sphinx.conf> and C<%FullTextSearch>'s C<MaxMatches> in
F<RT_SiteConfig.pm> must be updated to the same value.
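
A quick consistency check can catch the two settings drifting apart.
The sketch below is an assumption-laden helper, not part of RT or
Sphinx: C<first_number_after> and C<max_matches_in_sync> are
hypothetical names, and the pattern assumes each file contains a line
such as C<max_matches = 10000> or C<MaxMatches =E<gt> '10000'>. It does
not skip commented-out lines.

```shell
# first_number_after KEY FILE
# Print the first run of digits that follows KEY on any line of FILE.
first_number_after() {
    sed -n "s/.*$1[^0-9]*\([0-9][0-9]*\).*/\1/p" "$2" | head -n 1
}

# max_matches_in_sync SPHINX_CONF RT_SITECONFIG
# Succeeds only if both files carry the same, non-empty limit.
max_matches_in_sync() {
    sphinx_limit=$(first_number_after max_matches "$1")
    rt_limit=$(first_number_after MaxMatches "$2")
    [ -n "$sphinx_limit" ] && [ "$sphinx_limit" = "$rt_limit" ]
}
```

It could be called with the paths to F</etc/sphinx.conf> and your
F<RT_SiteConfig.pm>, printing a warning when it returns a non-zero
status; the location of both files depends on the installation.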

=head1 ORACLE

=head2 Creating and configuring the index

Oracle supports full-text indexing natively using the Oracle Text
package. Once Oracle Text is installed and configured, run:

 sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password,
you may need to pass the C<--dba> or C<--dba-password> options:

 sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret

This will create an Oracle CONTEXT index on the Content column in the
Attachments table, as well as several preferences, functions and
triggers to support this index. The script will also output an
appropriate C<%FullTextSearch> configuration to add to your
F<RT_SiteConfig.pm>.

=head2 Updating the index

To update the index, you will need to run the following at regular
intervals:

 sbin/rt-fulltext-indexer

This, in effect, simply runs:

 begin
 ctx_ddl.sync_index('rt_fts_index', '2M');
 end;

The amount of memory used for the sync can be controlled with the
C<--memory> option:

 sbin/rt-fulltext-indexer --memory 10M

Instead of being run via C<cron>, this may instead be run via a
C<DBMS_JOB>; read the B<Managing DML Operations for a CONTEXT Index>
chapter of Oracle's B<Text Application Developer's Guide> for details
on how to keep the index optimized, perform garbage collection, and
carry out other tasks.

=cut