MATCH AGAINST
Description
This expression is used for text retrieval on a full-text index. The AGAINST clause accepts a search string and searches the index based on the character set comparison. For each row in the table, the return value of MATCH represents the relevance between the search string and the data in the row, which is the similarity between the text in the search string and the text in the data table.
Syntax
MATCH (column_set) AGAINST (query_expr [mode_flag])
column_set:
column_name [, column_name ...]
mode_flag:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
Parameters
| Parameter | Description |
|---|---|
| column_set | Specifies the column to be searched. If multiple columns are specified, they must be separated by commas. The query must have a full-text index that exactly matches the column_set parameter to execute the MATCH AGAINST expression. |
| query_expr | Specifies the keyword or phrase to be searched, which is the parameter to be matched. seekdb will tokenize query_expr using the tokenizer of the full-text index and then perform the search on the full-text index. |
| mode_flag | Optional. Specifies the mode of the full-text search. The default value is IN NATURAL LANGUAGE MODE. For more information, see mode_flag. |
mode_flag
-
IN NATURAL LANGUAGE MODE: The default value. This mode specifies that the search is performed in natural language mode. In this mode, the full-text search is performed by tokenizing the query expression (query_expr) to obtain a set of tokens (token), and then matching the tokens with those in the index. A match is considered successful if at least one token matches. Additionally, the relevance of the matched rows is ranked using the Okapi BM25 method based on term frequency.By default, or when
IN NATURAL LANGUAGE MODEis specified, theMATCH AGAINSTexpression uses theNATURAL LANGUAGEmode for full-text search. In this mode,AGAINSTaccepts a search string and searches the index based on the character set comparison. For each row in the table, the return value ofMATCHrepresents the relevance between the search string and the data in the row, which is the similarity between the text in the search string and the text in the data table. -
IN BOOLEAN MODE: This mode specifies that the search is performed in boolean mode. The current version supports three commonly used boolean operators and nested operations, as follows:-
+: Represents theANDoperator, which is used to find the intersection of sets. -
-: Represents the negation operator, which is used to find the difference between sets. -
No operator: When used alone, it represents the
ORoperator, which is used to find the union of sets. For example,A BrepresentsA OR B. When used with other operators, it can increase the relevance of the sentences but will lose theORsemantics. For example,+A Bmeans that A must be present, and the relevance of A and B in the sentence is calculated. -
(): Represents nested operations. When used without an operator, it has theORsemantics. For example,+A (nested clause)means that A or the nested clause must be present.Examples:
-
The output sentence must contain the word "computer".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE); -
The output sentence must contain the word "computer" and must not contain the word "weather".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE); -
The output sentence must contain the word "computer", and "oceanbase" is more relevant.
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
-
-
Considerations
The MATCH AGAINST expression not only represents text matching but also has other semantics in different clauses of SQL:
-
Projection of relevance:
- Full-text search supports ranking based on term frequency. The
MATCH AGAINSTexpression can be used to project the relevance. - Relevance represents the degree of match between the matched data rows and the query (Query) in the
MATCH AGAINSTexpression. - Relevance is a
DOUBLEtype data greater than or equal to 0. A value of 0 indicates that the data row is unrelated to the query. The higher the value, the stronger the relevance.
- Full-text search supports ranking based on term frequency. The
-
Filtering semantics:
-
If the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to other filtering conditions usingAND, it has filtering semantics. This means that data rows that do not match the query will be filtered out.Example:
SELECT id, digest, detail
FROM t1
WHERE MATCH (detail) AGAINST ('oceanbase');This is equivalent to:
SELECT id, digest, detail
FROM t1
WHERE MATCH (detail) AGAINST ('oceanbase') > 0;
-
-
Sorting semantics:
- If the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to other filtering conditions usingAND, it has sorting semantics. - This means that the results of the full-text search will be sorted in descending order based on the relevance ranking of the
MATCH AGAINSTexpression. - When multiple
MATCH AGAINSTexpressions are present, the sorting will be based on the relevance of the firstMATCH AGAINSTexpression.
Example:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
FROM t1
WHERE MATCH (detail) against ('oceanbase');This is equivalent to:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
FROM t1
WHERE MATCH (detail) AGAINST ('oceanbase')
ORDER BY relevance DESC; - If the
Execution method of text retrieval
-
When the
MATCH AGAINSTexpression in the SQL statement has filtering semantics, seekdb can execute the query by scanning the full-text index, calculating theMATCH AGAINSTexpression, and then accessing the table. -
When the
MATCH AGAINSTexpression in the SQL statement does not have filtering semantics, seekdb can execute the query by scanning and filtering using other secondary indexes and then accessing the full-text index based on the primary key. -
When the
WHEREclause of the SQL statement contains theMATCH AGAINSTexpression and multiple filtering conditions connected byAND, and other secondary indexes can be hit, seekdb will try to select the most suitable index based on the cost. -
When scanning the full-text index to calculate the
MATCH AGAINSTexpression, if the query contains aLIMITclause, seekdb will try to push the top-k calculation down to the full-text index scan to improve performance. -
When querying with
MATCH AGAINSTpredicates connected byOR, an index merge plan will be automatically generated. Whether to use the index merge plan depends on the cost competition result.
Examples
-
Create a table named
test_tbl1, and create two full-text indexes namedft_idx1_test_tbl1andft_idx2_test_tbl1, and an index namedidx_test_tbl1.CREATE TABLE test_tbl1(id INT, ref_no INT, digest VARCHAR(512), detail VARCHAR(4096),
FULLTEXT INDEX ft_idx1_test_tbl1(detail),
FULLTEXT INDEX ft_idx2_test_tbl1(digest, detail),
INDEX idx_test_tbl1 (id)); -
Insert test data into the
test_tbl1table.INSERT INTO test_tbl1 VALUES
(1, 1234, 'fulltext', 'Try text retrieval with OceanBase fulltext index'),
(2, 2345, 'log', 'OceanBase can halp with log analysis'),
(3, 3456, 'order', 'Simple text retrieval scan will return result set in order of descending ranking in OceanBase'),
(4, 4567, 'ranking', 'OceanBase will ranking relevance to query for matched result set'),
(5, 5678, 'filter', 'Using text retrieval as a filter condition would be probably more efficient'); -
Project the relevance by using the
MATCH AGAINSTexpression.SELECT id, digest, detail, MATCH (detail) AGAINST ('oceanbase') AS relevance
FROM test_tbl1;The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+
| id | digest | detail | relevance |
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+
| 1 | fulltext | Try text retrieval with OceanBase fulltext index | 0.2989130434782609 |
| 2 | log | OceanBase can halp with log analysis | 0.3142857142857143 |
| 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | 0.240174672489083 |
| 4 | ranking | OceanBase will ranking relevance to query for matched result set | 0.2849740932642488 |
| 5 | filter | Using text retrieval as a filter condition would be probably more efficient | 0 |
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+
5 rows in set (0.003 sec) -
Filter the data by using the
MATCH AGAINSTexpression.SELECT id, digest, detail
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase');The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+
| id | digest | detail |
+------+----------+-----------------------------------------------------------------------------------------------+
| 2 | log | OceanBase can halp with log analysis |
| 1 | fulltext | Try text retrieval with OceanBase fulltext index |
| 4 | ranking | OceanBase will ranking relevance to query for matched result set |
| 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase |
+------+----------+-----------------------------------------------------------------------------------------------+
4 rows in set (0.002 sec) -
Retrieve and sort the data by using the
MATCH AGAINSTexpression.SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase');or
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase')
ORDER BY relevance DESC;The return result is as follows:
+------+----------+--------------------+
| id | digest | relevance |
+------+----------+--------------------+
| 2 | log | 0.3142857142857143 |
| 1 | fulltext | 0.2989130434782609 |
| 4 | ranking | 0.2849740932642488 |
| 3 | order | 0.240174672489083 |
+------+----------+--------------------+
4 rows in set (0.002 sec)
Example of text retrieval execution plan
-
The
MATCH AGAINSTexpression contains filtering semantics, and the full-text indexft_idx1_test_tbl1is scanned.EXPLAIN SELECT id, digest, detail
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase')
AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================================= |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------------------------------------- |
| |0 |SORT | |1 |30 | |
| |1 |└─TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|1 |30 | |
| ============================================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 |
| sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]) |
| 1 - output([test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest], [MATCH(test_tbl1.detail) AGAINST('oceanbase')]), filter([test_tbl1.id = 3]), rowset=16 |
| access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), |
| pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
15 rows in set (0.008 sec) -
The
MATCH AGAINSTexpression contains filtering semantics, and it hits the general secondary indexidx_test_tbl1. The indexidx_test_tbl1has a lower cost than the full-text indexft_idx1_test_tbl1, so the indexidx_test_tbl1is scanned.EXPLAIN SELECT id
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase simple text retreival scan on fulltext index')
AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ====================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------------------- |
| |0 |SORT | |1 |103 | |
| |1 |└─TABLE RANGE SCAN|test_tbl1(idx_test_tbl1)|1 |103 | |
| ====================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([test_tbl1.id]), filter(nil), rowset=16 |
| sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index'), DESC]) |
| 1 - output([test_tbl1.id], [MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index')]), filter([MATCH(test_tbl1.detail) |
| AGAINST('oceanbase simple text retreival scan on fulltext index')]), rowset=16 |
| access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(3,MIN ; 3,MAX), |
| range_cond([test_tbl1.id = 3]), has_functional_lookup=true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
16 rows in set (0.006 sec) -
The
MATCH AGAINSTexpression does not contain filtering semantics. Therefore, the result set cannot be filtered, and the full-text index cannot be scanned. The primary table is scanned.EXPLAIN SELECT id, digest, detail
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase')
OR id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------- |
| |0 |TABLE FULL SCAN|test_tbl1|5 |503 | |
| ==================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 |
| access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([test_tbl1.__pk_increment]), range(MIN ; MAX)always true, has_functional_lookup=true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec) -
The
MATCH AGAINSTexpression does not contain filtering semantics. Therefore, the result set cannot be filtered, and the full-text index cannot be scanned. The indexidx_test_tbl1is hit, and the indexidx_test_tbl1is scanned.EXPLAIN SELECT id
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase')
OR id = 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------------------+
| =================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------------- |
| |0 |TABLE FULL SCAN|test_tbl1(idx_test_tbl1)|5 |503 | |
| =================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([test_tbl1.id]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 |
| access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true, has_functional_lookup=true |
+------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec) -
The
MATCH AGAINSTexpression contains filtering semantics, and the corresponding full-text indexft_idx1_test_tbl1is scanned. TheLIMITclause is pushed down to the index scan for top-k retrieval.EXPLAIN SELECT id, digest, detail
FROM test_tbl1
WHERE MATCH (detail) AGAINST ('oceanbase')
LIMIT 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------+
| =========================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| --------------------------------------------------------------------------- |
| |0 |TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|4 |31 | |
| =========================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 |
| access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), |
| pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')), |
| sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]), limit(3), offset(nil), with_ties(false) |
+------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.004 sec)