Skip to main content
Version: V1.0.0

MATCH AGAINST

Description

This expression is used for text retrieval on a full-text index. The AGAINST clause accepts a search string and searches the index based on the character set comparison. For each row in the table, the return value of MATCH represents the relevance between the search string and the data in the row, which is the similarity between the text in the search string and the text in the data table.

Syntax

MATCH (column_set) AGAINST (query_expr [mode_flag])

column_set:
column_name [, column_name ...]

mode_flag:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE

Parameters

ParameterDescription
column_setSpecifies the column to be searched. If multiple columns are specified, they must be separated by commas. The query must have a full-text index that exactly matches the column_set parameter to execute the MATCH AGAINST expression.
query_exprSpecifies the keyword or phrase to be searched, which is the parameter to be matched. seekdb will tokenize query_expr using the tokenizer of the full-text index and then perform the search on the full-text index.
mode_flagOptional. Specifies the mode of the full-text search. The default value is IN NATURAL LANGUAGE MODE. For more information, see mode_flag.

mode_flag

  • IN NATURAL LANGUAGE MODE: The default value. This mode specifies that the search is performed in natural language mode. In this mode, the full-text search is performed by tokenizing the query expression (query_expr) to obtain a set of tokens (token), and then matching the tokens with those in the index. A match is considered successful if at least one token matches. Additionally, the relevance of the matched rows is ranked using the Okapi BM25 method based on term frequency.

    By default, or when IN NATURAL LANGUAGE MODE is specified, the MATCH AGAINST expression uses the NATURAL LANGUAGE mode for full-text search. In this mode, AGAINST accepts a search string and searches the index based on the character set comparison. For each row in the table, the return value of MATCH represents the relevance between the search string and the data in the row, which is the similarity between the text in the search string and the text in the data table.

  • IN BOOLEAN MODE: This mode specifies that the search is performed in boolean mode. The current version supports three commonly used boolean operators and nested operations, as follows:

    • +: Represents the AND operator, which is used to find the intersection of sets.

    • -: Represents the negation operator, which is used to find the difference between sets.

    • No operator: When used alone, it represents the OR operator, which is used to find the union of sets. For example, A B represents A OR B. When used with other operators, it can increase the relevance of the sentences but will lose the OR semantics. For example, +A B means that A must be present, and the relevance of A and B in the sentence is calculated.

    • (): Represents nested operations. When used without an operator, it has the OR semantics. For example, +A (nested clause) means that A or the nested clause must be present.

      Examples:

      1. The output sentence must contain the word "computer".

        SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);
      2. The output sentence must contain the word "computer" and must not contain the word "weather".

        SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);
      3. The output sentence must contain the word "computer", and "oceanbase" is more relevant.

        SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);

Considerations

The MATCH AGAINST expression not only represents text matching but also has other semantics in different clauses of SQL:

  • Projection of relevance:

    • Full-text search supports ranking based on term frequency. The MATCH AGAINST expression can be used to project the relevance.
    • Relevance represents the degree of match between the matched data rows and the query (Query) in the MATCH AGAINST expression.
    • Relevance is a DOUBLE type data greater than or equal to 0. A value of 0 indicates that the data row is unrelated to the query. The higher the value, the stronger the relevance.
  • Filtering semantics:

    • If the MATCH AGAINST expression appears in the WHERE clause and is connected to other filtering conditions using AND, it has filtering semantics. This means that data rows that do not match the query will be filtered out.

      Example:

      SELECT id, digest, detail
      FROM t1
      WHERE MATCH (detail) AGAINST ('oceanbase');

      This is equivalent to:

      SELECT id, digest, detail
      FROM t1
      WHERE MATCH (detail) AGAINST ('oceanbase') > 0;
  • Sorting semantics:

    • If the MATCH AGAINST expression appears in the WHERE clause and is connected to other filtering conditions using AND, it has sorting semantics.
    • This means that the results of the full-text search will be sorted in descending order based on the relevance ranking of the MATCH AGAINST expression.
    • When multiple MATCH AGAINST expressions are present, the sorting will be based on the relevance of the first MATCH AGAINST expression.

    Example:

    SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
    FROM t1
    WHERE MATCH (detail) against ('oceanbase');

    This is equivalent to:

    SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
    FROM t1
    WHERE MATCH (detail) AGAINST ('oceanbase')
    ORDER BY relevance DESC;

Execution method of text retrieval

  • When the MATCH AGAINST expression in the SQL statement has filtering semantics, seekdb can execute the query by scanning the full-text index, calculating the MATCH AGAINST expression, and then accessing the table.

  • When the MATCH AGAINST expression in the SQL statement does not have filtering semantics, seekdb can execute the query by scanning and filtering using other secondary indexes and then accessing the full-text index based on the primary key.

  • When the WHERE clause of the SQL statement contains the MATCH AGAINST expression and multiple filtering conditions connected by AND, and other secondary indexes can be hit, seekdb will try to select the most suitable index based on the cost.

  • When scanning the full-text index to calculate the MATCH AGAINST expression, if the query contains a LIMIT clause, seekdb will try to push the top-k calculation down to the full-text index scan to improve performance.

  • When querying with MATCH AGAINST predicates connected by OR, an index merge plan will be automatically generated. Whether to use the index merge plan depends on the cost competition result.

Examples

  1. Create a table named test_tbl1, and create two full-text indexes named ft_idx1_test_tbl1 and ft_idx2_test_tbl1, and an index named idx_test_tbl1.

    CREATE TABLE test_tbl1(id INT, ref_no INT, digest VARCHAR(512), detail VARCHAR(4096),
    FULLTEXT INDEX ft_idx1_test_tbl1(detail),
    FULLTEXT INDEX ft_idx2_test_tbl1(digest, detail),
    INDEX idx_test_tbl1 (id));
  2. Insert test data into the test_tbl1 table.

    INSERT INTO test_tbl1 VALUES
    (1, 1234, 'fulltext', 'Try text retrieval with OceanBase fulltext index'),
    (2, 2345, 'log', 'OceanBase can halp with log analysis'),
    (3, 3456, 'order', 'Simple text retrieval scan will return result set in order of descending ranking in OceanBase'),
    (4, 4567, 'ranking', 'OceanBase will ranking relevance to query for matched result set'),
    (5, 5678, 'filter', 'Using text retrieval as a filter condition would be probably more efficient');
  3. Project the relevance by using the MATCH AGAINST expression.

    SELECT id, digest, detail, MATCH (detail) AGAINST ('oceanbase') AS relevance
    FROM test_tbl1;

    The return result is as follows:

    +------+----------+-----------------------------------------------------------------------------------------------+--------------------+
    | id | digest | detail | relevance |
    +------+----------+-----------------------------------------------------------------------------------------------+--------------------+
    | 1 | fulltext | Try text retrieval with OceanBase fulltext index | 0.2989130434782609 |
    | 2 | log | OceanBase can halp with log analysis | 0.3142857142857143 |
    | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | 0.240174672489083 |
    | 4 | ranking | OceanBase will ranking relevance to query for matched result set | 0.2849740932642488 |
    | 5 | filter | Using text retrieval as a filter condition would be probably more efficient | 0 |
    +------+----------+-----------------------------------------------------------------------------------------------+--------------------+
    5 rows in set (0.003 sec)
  4. Filter the data by using the MATCH AGAINST expression.

    SELECT id, digest, detail
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase');

    The return result is as follows:

    +------+----------+-----------------------------------------------------------------------------------------------+
    | id | digest | detail |
    +------+----------+-----------------------------------------------------------------------------------------------+
    | 2 | log | OceanBase can halp with log analysis |
    | 1 | fulltext | Try text retrieval with OceanBase fulltext index |
    | 4 | ranking | OceanBase will ranking relevance to query for matched result set |
    | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase |
    +------+----------+-----------------------------------------------------------------------------------------------+
    4 rows in set (0.002 sec)
  5. Retrieve and sort the data by using the MATCH AGAINST expression.

    SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase');

    or

    SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase')
    ORDER BY relevance DESC;

    The return result is as follows:

    +------+----------+--------------------+
    | id | digest | relevance |
    +------+----------+--------------------+
    | 2 | log | 0.3142857142857143 |
    | 1 | fulltext | 0.2989130434782609 |
    | 4 | ranking | 0.2849740932642488 |
    | 3 | order | 0.240174672489083 |
    +------+----------+--------------------+
    4 rows in set (0.002 sec)

Example of text retrieval execution plan

  1. The MATCH AGAINST expression contains filtering semantics, and the full-text index ft_idx1_test_tbl1 is scanned.

    EXPLAIN SELECT id, digest, detail
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase')
    AND id = 3;

    The return result is as follows:

    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | ============================================================================= |
    | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
    | ----------------------------------------------------------------------------- |
    | |0 |SORT | |1 |30 | |
    | |1 |└─TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|1 |30 | |
    | ============================================================================= |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 |
    | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]) |
    | 1 - output([test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest], [MATCH(test_tbl1.detail) AGAINST('oceanbase')]), filter([test_tbl1.id = 3]), rowset=16 |
    | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) |
    | is_index_back=true, is_global_index=false, filter_before_indexback[false], |
    | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), |
    | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')) |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    15 rows in set (0.008 sec)
  2. The MATCH AGAINST expression contains filtering semantics, and it hits the general secondary index idx_test_tbl1. The index idx_test_tbl1 has a lower cost than the full-text index ft_idx1_test_tbl1, so the index idx_test_tbl1 is scanned.

    EXPLAIN SELECT id
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase simple text retreival scan on fulltext index')
    AND id = 3;

    The return result is as follows:

    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | ====================================================================== |
    | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
    | ---------------------------------------------------------------------- |
    | |0 |SORT | |1 |103 | |
    | |1 |└─TABLE RANGE SCAN|test_tbl1(idx_test_tbl1)|1 |103 | |
    | ====================================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([test_tbl1.id]), filter(nil), rowset=16 |
    | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index'), DESC]) |
    | 1 - output([test_tbl1.id], [MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index')]), filter([MATCH(test_tbl1.detail) |
    | AGAINST('oceanbase simple text retreival scan on fulltext index')]), rowset=16 |
    | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) |
    | is_index_back=false, is_global_index=false, filter_before_indexback[false], |
    | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(3,MIN ; 3,MAX), |
    | range_cond([test_tbl1.id = 3]), has_functional_lookup=true |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    16 rows in set (0.006 sec)
  3. The MATCH AGAINST expression does not contain filtering semantics. Therefore, the result set cannot be filtered, and the full-text index cannot be scanned. The primary table is scanned.

    EXPLAIN SELECT id, digest, detail
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase')
    OR id = 3;

    The return result is as follows:

    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | ==================================================== |
    | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
    | ---------------------------------------------------- |
    | |0 |TABLE FULL SCAN|test_tbl1|5 |503 | |
    | ==================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 |
    | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) |
    | is_index_back=false, is_global_index=false, filter_before_indexback[false], |
    | range_key([test_tbl1.__pk_increment]), range(MIN ; MAX)always true, has_functional_lookup=true |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------+
    11 rows in set (0.005 sec)
  4. The MATCH AGAINST expression does not contain filtering semantics. Therefore, the result set cannot be filtered, and the full-text index cannot be scanned. The index idx_test_tbl1 is hit, and the index idx_test_tbl1 is scanned.

    EXPLAIN SELECT id
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase')
    OR id = 3;

    The return result is as follows:

    +------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +------------------------------------------------------------------------------------------------------------------------------+
    | =================================================================== |
    | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
    | ------------------------------------------------------------------- |
    | |0 |TABLE FULL SCAN|test_tbl1(idx_test_tbl1)|5 |503 | |
    | =================================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([test_tbl1.id]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 |
    | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) |
    | is_index_back=false, is_global_index=false, filter_before_indexback[false], |
    | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true, has_functional_lookup=true |
    +------------------------------------------------------------------------------------------------------------------------------+
    11 rows in set (0.005 sec)
  5. The MATCH AGAINST expression contains filtering semantics, and the corresponding full-text index ft_idx1_test_tbl1 is scanned. The LIMIT clause is pushed down to the index scan for top-k retrieval.

    EXPLAIN SELECT id, digest, detail
    FROM test_tbl1
    WHERE MATCH (detail) AGAINST ('oceanbase')
    LIMIT 3;

    The return result is as follows:

    +------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +------------------------------------------------------------------------------------------------------------------+
    | =========================================================================== |
    | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
    | --------------------------------------------------------------------------- |
    | |0 |TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|4 |31 | |
    | =========================================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 |
    | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) |
    | is_index_back=true, is_global_index=false, |
    | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), |
    | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')), |
    | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]), limit(3), offset(nil), with_ties(false) |
    +------------------------------------------------------------------------------------------------------------------+
    13 rows in set (0.004 sec)

References

Full-text query