Monday, July 30, 2012

Noise/Stop Words in SQL Server


Understanding Stop/Noise Words in Full Text Search for SQL Server

                A stopword can be a word with meaning in a specific language, or it can be a token that does not have linguistic meaning. Those words are left out of the full-text index since they are known to be useless to a search. Stopwords are managed in databases using objects called stoplists. A stoplist is a list of stopwords that, when associated with a full-text index, is applied to full-text queries on that index. Noise word in SQL server 2005 is known as Stopword in SQL Server 2008.
                You can use the below query to find the system specified stop words in English Language (language id =1033)
            SELECT ssw.stopword, slg.name
      FROM sys.fulltext_system_stopwords ssw
      JOIN sys.fulltext_languages slg
      ON slg.lcid = ssw.language_id
      WHERE slg.lcid =1033
If you want to know all the system specified stopwords in all languages please remove the where clause in above query.
                If you have configured the stopwords in your application then you can view the list of stop words in the list using the below query. [In English Language]

      SELECT *
FROM sys.fulltext_stopwords
WHERE language_id = 1033

So, if you include these words in full text search (for exact match, match giving in double quotes) it won’t give you the exact result.
Eg: Consider there are words such as test, test 1, test 2, testing, test for computers etc in search_content column. Please find the below scenarios for exact matches.
Query 1:  SELECT *
 FROM tbl_search_index
 WHERE CONTAINS(search_content,’”test”’)
Result 1: test, test 1, test 2, test for computers
Query 2: SELECT *
 FROM tbl_search_index
 WHERE CONTAINS(search_content,’”testing”’)
Result 2: testing
Query 3: SELECT *
 FROM tbl_search_index
 WHERE CONTAINS(search_content,’”test 1”’)
Result 3: test, test 1, test 2, test for computers
Query 4: SELECT *
 FROM tbl_search_index
 WHERE CONTAINS(search_content,’” test for computers”’)
Result 4: test for computers

Please find below a list of system specified stopwords in sql server

stopword
name
$
English
0
English
1
English
2
English
3
English
4
English
5
English
6
English
7
English
8
English
9
English
A
English
B
English
C
English
D
English
E
English
F
English
G
English
H
English
I
English
J
English
K
English
L
English
M
English
N
English
O
English
P
English
Q
English
R
English
S
English
T
English
U
English
V
English
W
English
X
English
Y
English
Z
English
about
English
after
English
all
English
also
English
an
English
and
English
another
English
any
English
are
English
as
English
at
English
be
English
because
English
been
English
before
English
being
English
between
English
both
English
but
English
by
English
came
English
can
English
come
English
could
English
did
English
do
English
does
English
each
English
else
English
for
English
from
English
get
English
got
English
had
English
has
English
have
English
he
English
her
English
here
English
him
English
himself
English
his
English
how
English
if
English
in
English
into
English
is
English
it
English
its
English
just
English
like
English
make
English
many
English
me
English
might
English
more
English
most
English
much
English
must
English
my
English
never
English
no
English
now
English
of
English
on
English
only
English
or
English
other
English
our
English
out
English
over
English
re
English
said
English
same
English
see
English
should
English
since
English
so
English
some
English
still
English
such
English
take
English
than
English
that
English
the
English
their
English
them
English
then
English
there
English
these
English
they
English
this
English
those
English
through
English
to
English
too
English
under
English
up
English
use
English
very
English
want
English
was
English
way
English
we
English
well
English
were
English
what
English
when
English
where
English
which
English
while
English
who
English
will
English
with
English
would
English
you
English
your
English