Change Data Capture (CDC): What it is and How it Works

 

What is Change Data Capture?

Change Data Capture is a software process that identifies and tracks changes to data in a database. CDC provides real-time or near-real-time movement of data by moving and processing data continuously as new database events occur.

 

In high-velocity data environments where time-sensitive decisions are made, Change Data Capture is an excellent fit to achieve low-latency, reliable, and scalable data replication. Change Data Capture is also ideal for zero-downtime migrations to the cloud.

With over 80% of companies planning on implementing multi-cloud strategies by 2025, picking the right change data capture method for your business is more critical than ever given the need to replicate data across multiple environments.

 

Built-in SQL Server CDC

SQL Server offers a built-in CDC functionality, as depicted in the diagram below. 

The source table contains the data that’s being inserted, updated, and deleted by applications. The SQL Server transaction log keeps a record of the changes. If CDC is enabled, a SQL Server agent reads inserts, updates, and deletes from the transaction log. These changes are added to a separate change table, where query functions are used to capture the changes so they can be delivered to a target data warehouse or other destination via a data integration process like ETL.

What are the benefits and drawbacks of the built-in SQL Server Change Data Capture feature?

Pros:

  • Built-in feature, no external tools needed
  • Uses familiar T-SQL language

Cons:

  • Adding a change table adds overhead to the source database
  • Querying change tables adds latency
  • Change tables need to be cleaned up on a regular basis
  • Since a change table is created for each CDC-enabled table, it can be challenging to consolidate changes from multiple tables (and databases)

 

Note: See Monitoring changes in SQL Server using change data capture for more information about CDC.

 

The additional columns include

  • __$start_lsn and __$end_lsn that show the commit log sequence number (LSN) assigned by the SQL Server Engine to the recorded change
  • __$seqval that shows the order of that change related to other changes in the same transaction, 
  • __$operation that shows the operation type of the change, where 1 = delete, 2 = insert, 3 = update (before change), and 4 = update (after change)
  • __$update_mask that is a bit mask defined for each captured column, identifying the updating columns 

This detailed information makes it easier to monitor the database changes for security or auditing purposes, or incrementally load these changes from the OLTP source to the target OLAP data warehouse, using T-SQL or ETL methods. 

 

How do you configure Change Data Capture in SQL Server?

 

CDC must be enabled both at the database and table level in SQL Server. Database-level CDC functionality is enabled by setting is_cdc_enabled to 1 in the sys.databases table.

Invoke the following commands to check the current state of database-level CDC functionality:

SELECT
database_id,
name,
is_cdc_enabled
FROM sys.databases
GO

 

Invoke the following commands to enable database-level CDC. These commands use SQL Server’s built-in sys.sp_cdc_enabled_dbd stored procedure to perform the necessary updates:

IF NOT EXISTS (SELECT 1 FROM sys.databases WHERE name = 'CDCDB')

 BEGIN

 USE master

 CREATE DATABASE [CDCDB];

 PRINT 'db Created...'

 END

 

 USE [CDCDB]

 GO

 

 IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'User')

 BEGIN

 PRINT 'CREATING USER TABLE...'

 CREATE TABLE [USER](

 [UserId] INT IDENTITY(1,1) NOT NULL,

 [FirstName] NVARCHAR(50) NOT NULL,

 [LastName] NVARCHAR(50) NOT NULL)

 END

 

USE [CDCDB]

GO

 EXEC sys.sp_cdc_enable_db

 

Once CDC is enabled on the table, a number of system tables will be created under the CDC schema of the database to store the CDC related information. These tables include the following

  • CDC.captured_columns table that contains the list of captured column
  • CDC.change_tables table that contains the list of tables that are enabled for capture
  • CDC.ddl_history table that records the history of all the DDL changes since capture data enabled
  • CDC.index_columns table that contains all the indexes that are associated with change table
  • CDC.lsn_time_mapping table that is used to map the LSN number with the time and finally one change table for each CDC enabled table that is used to capture the DML changes on the source table, as shown below:

 



… and the SQL Agent jobs associated to CDC enabled table, the capture and cleanup jobs, will be created like below:

 



 Set up Table-Level CDC

In addition to setting up database-level CDC, you must also enable CDC on a per-table basis.

This section describes how to set up and verify table-level CDC on the CDCDB database’s USER table.

Invoke the following commands to set up CDC on the USER table. Note the source_schema parameter must be assigned the name of the schema to which the table (USER) belongs, in this case dbo:

USE CDCDB
GO

 EXEC sys.sp_cdc_enable_table

 @source_schema = N'dbo',

 @source_name = N'User',

 @role_name = NULL; 

 

Verify that CDC is enabled for the USER table by checking that is_tracked_by_cdc is set to 1 for the database in sys.tables.

Invoke the following command to return all tables listed in sys.tables:

USE CDCDB
GO
SELECT
object_id,
SCHEMA_NAME(Schema_id) As [Schema Name],
name As [Table Name],
is_tracked_by_cdc
FROM sys.tables
GO

 

This process results in the creation of a system table prefixed with cdc. called cdc.Dbo_user_CT that can be seen in Object Explorer and further verified by querying the table.

Invoke the following command to verify that cdc.Dbo_user_CT can be queried and that a row is returned:

USE CDCDB
GO
SELECT * from cdc.Dbo_user_CT
GO

 

Disabling CDC

 The Change Data Capture can be easily disabled on a specific table using the sys.sp_cdc_disable_table system stored procedure, as shown below:

 



… or disabled completely at the database level, without the need to disable it on CDC enabled tables one by one, using the sys.sp_cdc_disable_db system stored procedure, as shown below:

 



 

 

Useful queries to work with the CDC.

 

1.To review the status of the Log of the databases with the CDC activated.

select name, log_reuse_wait_desc, is_cdc_enabled from sys.databases

 2. On the databases that the CDC has active, we can consult which tables are included in the CDC.

select name,type,type_desc,is_tracked_by_cdc from sys.tables where is_tracked_by_cdc = 1

 3.To review the CDC configuration status.

 EXECUTE sys.sp_cdc_help_jobs

 




4. Change CDC settings.

EXECUTE sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 30;

 

5. To review the operation of our CDC by database and the errors we can review the following system views.

 

select * from sys.dm_cdc_log_scan_sessions

 



  • Here we will see all the sessions that have occurred, and in the first row the aggregate since the CDC was started.

 select * from sys.dm_cdc_errors

 

  • Disable CDC completely.

Use 'Database_name'
GO
EXEC sys.sp_cdc_disable_db
GO

 6. CDC.fn_cdc_get_all_changes function can be queried by providing the @from_lsn@to_lsn@row_filter_option parameters, that will retrieve all the DML changes information, as shown below: 



 Thanks!!

Subbu CN


Azure SQL Elastic Pools: a way for saving costs

  Azure SQL Elastic Pools: a way for saving costs  Elastic pools enable you to purchase resources for a pool shared by multiple databases to...