Using FULLTEXT Indexes in MySQL

Programming for Search Engines 101. An area for avid PHP and .NET developers to chat about Programming techniques and how to make better use of search engines.

Moderator: Moderators

Using FULLTEXT Indexes in MySQL

Postby jay » Tue May 06, 2008 6:19 am

This post is made in response to the query made by our programmer Ashik.

One of the more useful MySQL features is the ability to search for text using a FULLTEXT index. Currently this is only available if you use the MyISAM table type (which is the default table type, so if you don't know what table type you're using, it'll most likely be MyISAM). A fulltext index can be created for a TEXT, CHAR or VARCHAR type field, or combination of fields.

DO IT YOURSELF.

1. Create a table
CREATE TABLE fulltext_sample(copy TEXT,FULLTEXT(copy)) TYPE=MyISAM;

2.Insert values in to table
INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');


3.Searching for text using FULLTEXT

The syntax of a FULLTEXT search is simple. You MATCH the field AGAINST the text you are searching for, for example:

SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('alert');


Searches on a FULLTEXT index are performed case-insensitively (as are searches on TEXT and non-binary VARCHAR fields generally). So the following works as well:

SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('ALERT');

You can also search for more than one word at a time, by separating the words with commas.

SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');


FULLTEXT indexes are most often used to search natural language text, such as through newspaper articles, web page contents and so on. For this reason MySQL has added a number of features to assist this kind of searching. MySQL does not index any words less than or equal to 3 characters in length, nor does it index any words that appear in more than 50% of the rows. This means that if your table contains 2 or less rows, a search on a FULLTEXT index will never return anything. In future, MySQL will make this behavior more flexible, but for now it should suit most natural language uses. If most fields in your database contain the word 'music', you probably don't want these records returned.

A list of the main features of a standard FULLTEXT search follows:

* Excludes partial words
* Excludes words less than 4 characters in length (3 or less)
* Excludes words that appear in more than half the rows (meaning at least 3 rows are required)
* Hyphenated words are treated as two words
* Rows are returned in order of relevance, descending
* Words in the stopword list (common words) are also excluded from the search results. The stopword list is based upon common English words, so if your data is used for a different purpose, you'll probably want to change the list. Unfortunately, doing so at present is not easy. You'll need to edit the file myisam/ft_static.c. recompile MySQL, and rebuild the indexes! To save you hunting through the source, or if you have a binary version of MySQL, here is a list of stopwords. Note that these can and do change with different versions. To be absolutely sure, you'll have to check the specific list for your version.

Stopwords

"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "after", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully", "b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course", "currently", "d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore", "g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept", "know", "knows", "known", "l", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t", "t's", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value", "various", "very", "via", "viz", "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "would", "wouldn't", "x", "y", "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z", "zero".

Thanks Ashik, good information.
Jay M
Write Less, Do More
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Return to Programming

Who is online

Users browsing this forum: No registered users and 6 guests