Edit

Share via


Contained database collations

Applies to: SQL Server Azure SQL Managed Instance

Various properties affect the sort order and equality semantics of textual data, including case sensitivity, accent sensitivity, and the base language being used. These qualities are expressed to SQL Server through the choice of collation for the data. For a more in-depth discussion of collations themselves, see Collation and Unicode support.

Collations apply not only to data stored in user tables, but to all text handled by SQL Server, including metadata, temporary objects, variable names, etc. The handling of these differs in contained and non-contained databases. This change doesn't affect many users, but helps provide instance independence and uniformity. But this might also cause some confusion, and problems for sessions that access both contained and non-contained databases.

The collation behavior of contained databases differs subtly from the behavior in non-contained databases. This behavior is generally beneficial, providing instance-independence and simplicity. Some users might have issues, particularly when a session accesses both contained and non-contained databases.

This article clarifies the content of the change, and examines areas where the change might cause problems.

Note

For Azure SQL Database, collations for contained databases are different. The database collation and catalog collation can be set on database creation and can't be updated. Specify a collation for data (COLLATE) and a catalog collation for system metadata and object identifiers (CATALOG_COLLATION). For more information, see CREATE DATABASE.

Non-contained databases

All databases have a default collation (which can be set when creating or altering a database). This collation is used for all metadata in the database, and the default for all string columns within the database. Users can choose a different collation for any particular column by using the COLLATE clause.

Example 1

For example, if we were working in Beijing, we might use a Chinese collation:

ALTER DATABASE MyDB
    COLLATE Chinese_Simplified_Pinyin_100_CI_AS;

Now if we create a column, its default collation is this Chinese collation, but we can choose another one if we want:

CREATE TABLE MyTable
(
    mycolumn1 NVARCHAR,
    mycolumn2 NVARCHAR COLLATE Frisian_100_CS_AS
);
GO

SELECT name, collation_name
FROM sys.columns
WHERE name LIKE 'mycolumn%';
GO

Here's the result set.

name            collation_name
--------------- ----------------------------------
mycolumn1       Chinese_Simplified_Pinyin_100_CI_AS
mycolumn2       Frisian_100_CS_AS

This appears relatively simple, but several problems arise. Because the collation for a column is dependent on the database in which the table is created, problems arise with the use of temporary tables that are stored in tempdb. The collation of tempdb usually matches the collation for the instance, which doesn't have to match the database collation.

Example 2

For example, consider the (Chinese) database shown previously, when used on an instance with a Latin1_General collation:

CREATE TABLE T1 (T1_txt NVARCHAR (MAX));
GO

CREATE TABLE #T2 (T2_txt NVARCHAR (MAX));
GO

At first glance, these two tables look like they have the same schema, but since the collations of the databases differ, the values are incompatible:

SELECT T1_txt, T2_txt
FROM T1
     INNER JOIN #T2
         ON T1.T1_txt = #T2.T2_txt;

Here's the result set.

Msg 468, Level 16, State 9, Line 2

Can't resolve the collation conflict between "Latin1_General_100_CI_AS_KS_WS_SC" and "Chinese_Simplified_Pinyin_100_CI_AS" in the equal to operation.

We can fix this by explicitly collating the temporary table. SQL Server makes this easier by providing the DATABASE_DEFAULT keyword for the COLLATE clause.

CREATE TABLE T1 (T1_txt NVARCHAR (MAX));
GO

CREATE TABLE #T2 (T2_txt NVARCHAR (MAX) COLLATE DATABASE_DEFAULT);
GO

SELECT T1_txt, T2_txt
FROM T1
     INNER JOIN #T2
         ON T1.T1_txt = #T2.T2_txt;

This query now runs without error.

We can also see collation-dependent behavior with variables. Consider the following function:

CREATE FUNCTION f (@x INT)
RETURNS INT
AS
BEGIN
    DECLARE @I AS INT = 1;
    DECLARE @İ AS INT = 2;
    RETURN @x * @i;
END

This is a rather peculiar function. In a case-sensitive collation, the @i in the return clause can't bind to either @I or . In a case-insensitive Latin1_General collation, @i binds to @I, and the function returns 1. But in a case-insensitive Turkish collation, @i binds to , and the function returns 2. This can wreak havoc on a database that moves between instances with different collations.

Contained databases

Since a design objective of contained databases is to make them self-contained, the dependence on the instance and tempdb collations must be severed. To do this, contained databases introduce the concept of the catalog collation. The catalog collation is used for system metadata and transient objects. Details are provided as follows.

In a contained database, the catalog collation is Latin1_General_100_CI_AS_WS_KS_SC. This collation is the same for all contained databases on all instances of SQL Server and can't be changed.

The database collation is retained, but is only used as the default collation for user data. By default, the database collation is equal to the model database collation, but can be changed by the user through a CREATE or ALTER DATABASE command as with non-contained databases.

A new keyword, CATALOG_DEFAULT, is available in the COLLATE clause. This is used as a shortcut to the current collation of metadata in both contained and non-contained databases. That is, in a non-contained database, CATALOG_DEFAULT returns the current database collation, since metadata is collated in the database collation. In a contained database, these two values might be different, since the user can change the database collation so that it doesn't match the catalog collation.

The behavior of various objects in both non-contained and contained databases is summarized in this table:

Item Non-contained database Contained database
User data (default) DATABASE_DEFAULT DATABASE_DEFAULT
Temp data (default) tempdb collation DATABASE_DEFAULT
Metadata DATABASE_DEFAULT / CATALOG_DEFAULT CATALOG_DEFAULT
Temporary metadata tempdb collation CATALOG_DEFAULT
Variables Instance collation CATALOG_DEFAULT
Goto labels Instance collation CATALOG_DEFAULT
Cursor names Instance collation CATALOG_DEFAULT

In the temp table example previously described, we can see that this collation behavior eliminates the need for an explicit COLLATE clause in most temp table uses. In a contained database, this code now runs without error, even if the database and instance collations differ:

CREATE TABLE T1 (T1_txt NVARCHAR (MAX));
GO

CREATE TABLE #T2 (T2_txt NVARCHAR (MAX));
GO

SELECT T1_txt, T2_txt
FROM T1
     INNER JOIN #T2
         ON T1.T1_txt = #T2.T2_txt;

This query works because both T1_txt and T2_txt are collated in the database collation of the contained database.

Cross between contained and uncontained contexts

As long as a session in a contained database remains contained, it must remain within the database to which it connected. In this case, the behavior is straightforward. But if a session crosses between contained and non-contained contexts, the behavior becomes more complex, since the two sets of rules must be bridged. This can happen in a partially contained database, since a user might USE to another database. In this case, the difference in collation rules is handled by the following principle.

  • The collation behavior for a batch is determined by the database in which the batch begins.

This decision is made before any commands are issued, including an initial USE. That is, if a batch begins in a contained database, but the first command is USE to a non-contained database, the contained collation behavior is still used for the batch. Given this scenario, a reference to a variable, for example, might have multiple possible outcomes:

  • The reference might find exactly one match. In this case, the reference works without error.

  • The reference might not find a match in the current collation where there was one before. This raises an error indicating that the variable doesn't exist, even though it was apparently created.

  • The reference might find multiple matches that were originally distinct. This also raises an error.

We illustrate this with a few examples. For these, we assume there's a partially contained database named MyCDB with its database collation set to the default collation, Latin1_General_100_CI_AS_WS_KS_SC. We assume that the instance collation is Latin1_General_100_CS_AS_WS_KS_SC. The two collations differ only in case sensitivity.

Example 1

The following example illustrates the case where the reference finds exactly one match.

USE MyCDB;
GO

CREATE TABLE #a (x INT);

INSERT INTO #a VALUES (1);
GO

USE master;
GO

SELECT * FROM #a;
GO

Results:

Here's the result set.

x
-----------
1

In this case, the identified #a binds in both the case-insensitive catalog collation and the case-sensitive instance collation, and the code works.

Example 2

The following example illustrates the case where the reference doesn't find a match in the current collation where there was one before.

USE MyCDB;
GO

CREATE TABLE #a (x INT);

INSERT INTO #A VALUES (1);
GO

Here, the #A binds to #a in the case-insensitive default collation, and the insert works,

Here's the result set.

(1 row(s) affected)

But if we continue the script...

USE master;
GO

SELECT * FROM #A;
GO

We get an error trying to bind to #A in the case-sensitive instance collation;

Here's the result set.

Msg 208, Level 16, State 0, Line 2
Invalid object name '#A'.

Example 3

The following example illustrates the case where the reference finds multiple matches that were originally distinct. First, we start in tempdb (which has the same case-sensitive collation as our instance) and execute the following statements.

USE tempdb;
GO

CREATE TABLE #a (x INT);
GO

CREATE TABLE #A (x INT);
GO

INSERT INTO #a VALUES (1);
GO

INSERT INTO #A VALUES (2);
GO

This query succeeds, since the tables are distinct in this collation:

Here's the result set.

(1 row(s) affected)
(1 row(s) affected)

If we move into our contained database, however, we find that we can no longer bind to these tables.

USE MyCDB;
GO

SELECT * FROM #a;
GO

Here's the result set.

Msg 12800, Level 16, State 1, Line 2
The reference to temp table name #a is ambiguous and cannot be resolved. Possible candidates are #a and #A.