Problem: To support Turkish capital İ (İ with dot) is ı (ı without dot) characters while fetching the strings with those two characters from MSSQL database while doing case insensitive query. Problem comes because Turkish has got different meaning altogether to the i and I. Upper case of i is İ and upper case of ı is I. So, the normal upper functions doesnt work correctly. The correct collation should be used as explained below.Solution: Idea is to use TURKISH_CI_AI collation to get the expected results. Queries run on a standard MSSQL database are as below. What I have done is created a TEST table and inserted the different string with two english i/I and two turkish ı/İ using following commands and then ran few select queries to get the correct results.
create table dbo.TEST(row int, f1 nvarchar(10));
insert into dbo.TEST values(1,N'AIr');
insert into dbo.TEST values(2,N'aIr');
insert into dbo.TEST values(3,N'air');
insert into dbo.TEST values(4,N'Air');
insert into dbo.TEST values(5,N'Aİr');
insert into dbo.TEST values(6,N'aİr');
insert into dbo.TEST values(7,N'aır');
insert into dbo.TEST values(8,N'Aır');
select * from dbo.TEST where f1 like (N'aı%' collate Turkish_CI_AI)
1 AIr
2 aIr
7 aır
8 Aır
select * from dbo.TEST where f1 like (N'aI%' collate Turkish_CI_AI)
1 AIr
2 aIr
7 aır
8 Aır
select * from dbo.TEST where f1 like (N'ai%' collate Turkish_CI_AI)
3 air
4 Air
5 Aİr
6 aİr
select * from dbo.TEST where f1 like (N'aİ%' collate Turkish_CI_AI)
3 air
4 Air
5 Aİr
6 aİr
Please note that the character ‘N’ has to be used on the value side while doing a select so that we get correct results while doing query on (ı and İ). If ‘N’ is not appended, the results are not inline with expectation.
e.g.
select * from dbo.TEST where f1 like ('aı%' collate Turkish_CI_AI)
3 air
4 Air
5 Aİr
6 aİr
select * from dbo.TEST where f1 like ('aİ%' collate Turkish_CI_AI)
1 AIr
2 aIr
7 aır
8 Aır
Now, considering the case of a database which has entries containing all four I’s (two Turkish and two English)
select * from dbo.TEST where (upper(f1) like upper(N'ai%'))
1 AIr
2 aIr
3 air
4 Air
7 aır
8 Aır
‘aır’ and ‘Aır‘ in above results are not expected ones. To resolve the above issue SQL_Latin1_General_CP1_CI_AS collation need to be used even for non Turkish OMs when the database have all the four types of I’s.
e.g.
select * from dbo.TEST where (f1 like upper(N'ai%' collate SQL_Latin1_General_CP1_CI_AS))
1 AIr
2 aIr
3 air
4 Air
Alternate Solutions Considered and Dropped:There were other solutions as well which were proposed and thought not to be implemented for the reasons mentioned below each solution.
Solution 1 -
select * from dbo.TEST where (f1) like ('aı%' COLLATE SQL_Latin1_General_CP1_CI_AI)
1 AIr
2 aIr
3 air
4 Air
5 Aİr
6 aİr
7 aır
8 Aır
Above queries are returning all the four db values. This solution is not good if customer has millions of records with all types of I(Turkish and English). This will give lot of unwanted results.
Solution 2 – select * from dbo.TEST where (f1 like UPPER(N'aİ%')) OR (UPPER(f1) like UPPER(N'aı%'))
1 AIr
2 aIr
3 air
4 Air
5 Aİr
7 aır
8 Aır
select * from dbo.TEST where (UPPER(f1) like UPPER(N'ai%')) OR
(UPPER(f1) like UPPER(N'aı%'))OR (UPPER(f1) like UPPER(N'aI%'))
1 AIr
2 aIr
3 air
4 Air
7 aır
8 Aır
UPPER() does support Unicode but it doesn’t work for Tukish I’s because the sorting is done as per the English (i and I). So, the results are returned as per English. e.g. if ‘i’ is used in the query, it gets converted to English upper case I (upper case I without dot) and NOT into Turkish upper case İ with dot. So the results are as per ‘I’ and NOT ‘İ’. For more information, visit http://msdn2.microsoft.com/en-us/library/aa214408(SQL.80).aspx.
Solution 3 –
C++ approach: use IBM ICU library, a generic C++ internationalization library which we already license and ship that can be used by OM to do upper function on the query string. This solution is not feasible because problem is not only on client side but also on server side. So fixing it on client side might still mean the characters get converted at the db side.
I am still trying to find a solution for Oracle DB. Once I get it I'll update it here.
Write to me at anurag.gupta@marketfry.com if you have any ques or feedback.