Trying to validate source to stage (ETL) SQL Information schema. I did connect two servers/databases through pyodbc+pandas and assigned those Information Schemas as 'sourceDF' and 'StageDF'. The problem is 'COLUMN_NAME' is not identically matched in two dataframe.
Ex: 'Account_ID' and 'AccountId'
But I want to check 'IS_NULLABLE' with source and stage relevant to each row.
Source
Database: Source_Database / Server: Source_Server >>> sourceDF
<table><tbody><tr><th>|TABLE_SCHEMA|</th><th>|TABLE_NAME|</th><th>|COLUMN_NAME|</th><th>|IS_NULLABLE|</th></tr><tr><td>Stage</td><td>AccountDetails</td><td>Account_ID</td><td>Yes</td></tr><tr><td>Stage</td><td>AccountDetails</td><td>Engagement_ID</td><td>Yes</td></tr><tr><td>Stage</td><td>AccountDetails</td><td>Client_ID</td><td>NO</td></tr></tbody></table>
Database: Stage_Databse / Server: Stage_Server >>> stageDF
<table><tbody><tr><th>|TABLE_SCHEMA|</th><th>|TABLE_NAME|</th><th>|COLUMN_NAME|</th><th>|IS_NULLABLE|</th></tr><tr><td>Source</td><td>AccountDetails</td><td>AccountID</td><td>NO</td></tr><tr><td>Source</td><td>AccountDetails</td><td>EngagementID</td><td>NO</td></tr><tr><td>Source</td><td>AccountDetails</td><td>ClientID</td><td>NO</td></tr></tbody></table>
python code to connect two servers/Database:
import pyodbc
import pandas as pd
import time
from termcolor import colored, cprint
server1 = "serverSource.net"
database1 = "SourceDBneu01"
username1 = "SourceAdmin"
password1 = "Sourcepassword"
conn = pyodbc.connect(
"DRIVER={ODBC Driver};SERVER="
+ server1
+ ";DATABASE="
+ database1
+ ";UID="
+ username1
+ ";PWD="
+ password1
)
SQL_QuerySource = pd.read_sql_query("""SELECT [TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME ], [IS_NULLABLE]
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'AccountDetails'""", conn)
sourceDF = pd.DataFrame(SQL_QuerySource, columns=('TABLE_SCHEMA'
, 'TABLE_NAME'
, 'COLUMN_NAME'
, 'IS_NULLABLE''))
print(sourceDF)
server1 = "Stageserver1.net"
database1 = "StageDBneu01"
username1 = "StageAdmin"
password1 = "Stagepassword"
conn = pyodbc.connect(
"DRIVER={ODBC Driver};SERVER="
+ server1
+ ";DATABASE="
+ database1
+ ";UID="
+ username1
+ ";PWD="
+ password1
)
SQL_QueryStage = pd.read_sql_query("""SELECT [TABLE_SCHEMA], [TABLE_NAME], [COLUMN_NAME ], [IS_NULLABLE]
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'AccountDetails' AND TABLE_SCHEMA = 'Stage'""", conn)
stageDF = pd.DataFrame(SQL_QueryStage, columns=('TABLE_SCHEMA'
, 'TABLE_NAME'
, 'COLUMN_NAME'
, 'IS_NULLABLE''))
print(stageDF)
Note: Tying to prepare automated database validation code for the 'SQL infomation schema' which focusing source to Stage validation (ETL test validation). Stuck due to different column name
Aucun commentaire:
Enregistrer un commentaire