Commit | Line | Data |
---|---|---|
84fb5b46 MKG |
1 | =head1 NAME |
2 | ||
3 | Full text indexing in RT | |
4 | ||
5 | =head1 LIMITATIONS | |
6 | ||
7 | While all of the below solutions can search for Unicode characters, they | |
8 | are not otherwise Unicode aware, and do no case folding, normalization, | |
9 | or the like. That is, a string that contains C<U+0065 LATIN SMALL | |
10 | LETTER E> followed by C<U+0301 COMBINING ACUTE ACCENT> will not match a | |
11 | search for C<U+00E9 LATIN SMALL LETTER E WITH ACUTE>. They also only | |
12 | know how to tokenize languages in which words are separated by | |
13 | whitespace or similar characters (as in most C<latin-1> text); as such, | |
14 | support for searching Japanese and Chinese content is extremely limited. | |
15 | ||
16 | =head1 POSTGRES | |
17 | ||
18 | =head2 Creating and configuring the index | |
19 | ||
20 | Postgres 8.3 and above support full-text searching natively; to set up | |
21 | the required C<tsvector> column, and create either a C<GIN> or C<GiST> | |
22 | index on it, run: | |
23 | ||
24 | sbin/rt-setup-fulltext-index | |
25 | ||
26 | If you have a non-standard database administrator username or password, | |
27 | you may need to pass the C<--dba> or C<--dba-password> options: | |
28 | ||
29 | sbin/rt-setup-fulltext-index --dba postgres --dba-password secret | |
30 | ||
31 | This will also output an appropriate C<%FullTextSearch> configuration to | |
32 | add to your F<RT_SiteConfig.pm>; you will need to restart your webserver | |
33 | after making these changes. The index must also be populated before | |
34 | it can be used. To fill the index initially, run: | |
35 | ||
36 | sbin/rt-fulltext-indexer --all | |
37 | ||
38 | This will tokenize and index all existing attachments in your database; | |
39 | it may take quite a while if your database already has a large number of | |
40 | tickets in it. | |
41 | ||
42 | =head2 Updating the index | |
43 | ||
44 | To keep the index up-to-date, you will need to run: | |
45 | ||
46 | sbin/rt-fulltext-indexer | |
47 | ||
48 | ...at regular intervals. By default, this will only tokenize up to 100 | |
49 | tickets at a time; you can adjust this upwards by passing | |
50 | C<--limit 500>. Larger batch sizes will take longer and | |
51 | consume more memory. Care should be taken to ensure that multiple | |
52 | instances of C<rt-fulltext-indexer> are not run at the same time. | |
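
One way to avoid overlapping runs is to guard the indexer with a lock. The following is a minimal sketch and not part of RT: the C<run_exclusive> function, the C<LOCKDIR> path, and the exit code 75 are all assumptions chosen for illustration. It relies only on C<mkdir> failing when the directory already exists.

```shell
#!/bin/sh
# Hypothetical lock location; RT itself does not define this path.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/rt-fulltext-indexer.lock}"

# run_exclusive COMMAND [ARGS...]
# Runs COMMAND only if no other run currently holds the lock directory.
run_exclusive() {
    # mkdir is atomic: it fails if the directory already exists,
    # which we treat as "another run is still in progress".
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another run holds $LOCKDIR; skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$LOCKDIR"
    return "$status"
}
```

A wrapper script run from C<cron> could then call C<run_exclusive /opt/rt4/sbin/rt-fulltext-indexer --limit 500>, where F</opt/rt4> stands in for your actual RT install path. If a run is killed before it removes the lock directory, the directory has to be deleted by hand.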
53 | ||
54 | =head1 MYSQL | |
55 | ||
56 | MySQL does not support full-text indexing natively. However, it does | |
57 | integrate with the external Sphinx engine, available from | |
58 | L<http://sphinxsearch.com>. Unfortunately, Sphinx integration (using | |
59 | SphinxSE) does require that you recompile MySQL from source. Most | |
60 | distribution-provided packages for MySQL do not include SphinxSE | |
61 | integration, merely the external Sphinx tools; these are not sufficient | |
62 | for RT's needs. | |
63 | ||
64 | =head2 Compiling MySQL and SphinxSE | |
65 | ||
66 | SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not | |
67 | been tested at this time. Sphinx version 2.0.1 has been tested to work, | |
68 | and version 0.9.9 may work as well. Compilation and installation | |
69 | instructions for MySQL with SphinxSE can be found at | |
70 | L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>. | |
71 | ||
72 | =head2 Creating and configuring the index | |
73 | ||
74 | Once MySQL has been recompiled with SphinxSE, and Sphinx itself is | |
75 | installed, you may create the required SphinxSE communication table via: | |
76 | ||
77 | sbin/rt-setup-fulltext-index | |
78 | ||
79 | If you have a non-standard database administrator username or password, | |
80 | you may need to pass the C<--dba> or C<--dba-password> options: | |
81 | ||
82 | sbin/rt-setup-fulltext-index --dba root --dba-password secret | |
83 | ||
84 | This will also provide you with the appropriate C<%FullTextSearch> | |
85 | configuration to add to your F<RT_SiteConfig.pm>; you will need to | |
86 | restart your webserver after making these changes. It will also print a | |
87 | sample Sphinx configuration, which should be placed in | |
88 | F</etc/sphinx.conf>, or equivalent. | |
89 | ||
90 | To fill the index, you will need to run the C<indexer> command-line tool | |
91 | provided by Sphinx: | |
92 | ||
93 | indexer rt | |
94 | ||
95 | Finally, start the Sphinx search daemon: | |
96 | ||
97 | searchd | |
98 | ||
99 | =head2 Updating the index | |
100 | ||
101 | To keep the index up-to-date, you will need to run: | |
102 | ||
103 | indexer rt --rotate | |
104 | ||
105 | ...at regular intervals in order to pick up new and updated attachments | |
106 | from RT's database. Failure to do so will result in stale data. | |
107 | ||
108 | =head2 Caveats | |
109 | ||
110 | Sphinx only returns a finite number of matches to any query; this number | |
111 | is controlled by C<max_matches> in F</etc/sphinx.conf> and | |
112 | C<%FullTextSearch>'s C<MaxMatches> in F<RT_SiteConfig.pm>, which must be | |
113 | kept in sync. The default, set during C<rt-setup-fulltext-index>, is | |
114 | 10000. This limit may lead to false negatives in search results if the | |
115 | maximum number of matches is reached but the results returned do not | |
116 | match RT's other criteria. | |
117 | ||
118 | Take, for example, the instance where Sphinx is configured to return a | |
119 | maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the | |
120 | string "target", but only ticket 5 is in status "Open". A search for | |
121 | C<Content LIKE 'target' AND Status = 'Open'> may return no results, | |
122 | despite ticket 5 matching those criteria, as Sphinx will only return | |
123 | tickets 1, 2, and 3 as possible matches. | |
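
The example above can be reproduced with a toy simulation. This is only a sketch of the two-stage filtering, not how RT or Sphinx is implemented: the C<simulate_search> function is hypothetical, the C<Resolved> status for tickets 1 to 4 is an assumption (the text only says they are not "Open"), and matches are assumed to come back in ticket-id order.

```shell
#!/bin/sh
# Toy data: ticket id and status, one per line.
# All five tickets are assumed to contain the string "target".
tickets='1 Resolved
2 Resolved
3 Resolved
4 Resolved
5 Open'

# simulate_search MAX_MATCHES
# Stage 1: the full-text engine returns at most MAX_MATCHES candidates.
# Stage 2: the remaining criterion (Status = Open) is applied to those
#          candidates only, never to the tickets that were cut off.
simulate_search() {
    printf '%s\n' "$tickets" |
        head -n "$1" |
        awk '$2 == "Open" { print $1 }'
}

echo "max_matches=3: $(simulate_search 3)"
echo "max_matches=5: $(simulate_search 5)"
```

With a limit of three, ticket 5 never reaches the second stage, so the first line lists no tickets; with a limit of five it does.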
124 | ||
125 | After index creation, altering C<MaxMatches> in F<RT_SiteConfig.pm> is | |
126 | insufficient to adjust this limit; both C<max_matches> in | |
127 | F</etc/sphinx.conf> and C<%FullTextSearch>'s C<MaxMatches> in | |
128 | F<RT_SiteConfig.pm> must be updated. | |
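
Because the two values live in different files, a small check can catch drift between them. The sketch below is an illustration only: the function names are hypothetical, and it assumes the settings appear as C<max_matches = 10000> in the Sphinx configuration and as C<< MaxMatches => '10000' >> in the RT configuration. Adjust the patterns if your files are laid out differently.

```shell
#!/bin/sh
# Print the first max_matches value found in a Sphinx config file.
sphinx_max_matches() {
    sed -n 's/^[[:space:]]*max_matches[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" |
        head -n 1
}

# Print the first MaxMatches value found in an RT site config file.
rt_max_matches() {
    sed -n 's/.*MaxMatches[^0-9]*\([0-9][0-9]*\).*/\1/p' "$1" |
        head -n 1
}

# check_max_matches SPHINX_CONF RT_SITECONFIG
# Succeeds only if both files define the limit and the values are equal.
check_max_matches() {
    s=$(sphinx_max_matches "$1")
    r=$(rt_max_matches "$2")
    if [ -n "$s" ] && [ "$s" = "$r" ]; then
        echo "in sync: $s"
    else
        echo "mismatch: sphinx=${s:-unset} rt=${r:-unset}" >&2
        return 1
    fi
}
```

For example, C<check_max_matches /etc/sphinx.conf /opt/rt4/etc/RT_SiteConfig.pm> could be run after any configuration change; the second path is an assumed install location.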
129 | ||
130 | =head1 ORACLE | |
131 | ||
132 | =head2 Creating and configuring the index | |
133 | ||
134 | Oracle supports full-text indexing natively using the Oracle Text | |
135 | package. Once Oracle Text is installed and configured, run: | |
136 | ||
137 | sbin/rt-setup-fulltext-index | |
138 | ||
139 | If you have a non-standard database administrator username or password, | |
140 | you may need to pass the C<--dba> or C<--dba-password> options: | |
141 | ||
142 | sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret | |
143 | ||
144 | This will create an Oracle CONTEXT index on the Content column in the | |
145 | Attachments table, as well as several preferences, functions and | |
146 | triggers to support this index. The script will also output an | |
147 | appropriate C<%FullTextSearch> configuration to add to your | |
148 | F<RT_SiteConfig.pm>. | |
149 | ||
150 | =head2 Updating the index | |
151 | ||
152 | To update the index, you will need to run the following at regular | |
153 | intervals: | |
154 | ||
155 | sbin/rt-fulltext-indexer | |
156 | ||
157 | This, in effect, simply runs: | |
158 | ||
159 | begin | |
160 | ctx_ddl.sync_index('rt_fts_index', '2M'); | |
161 | end; | |
162 | ||
163 | The amount of memory used for the sync can be controlled with the | |
164 | C<--memory> option: | |
165 | ||
166 | sbin/rt-fulltext-indexer --memory 10M | |
167 | ||
168 | Rather than being run via C<cron>, this may be run via a | |
169 | DBMS_JOB; read the B<Managing DML Operations for a CONTEXT Index> | |
170 | chapter of Oracle's B<Text Application Developer's Guide> for details | |
171 | on how to keep the index optimized, perform garbage collection, and | |
172 | carry out other maintenance tasks. | |
173 | ||
174 | =cut |