I could use some remedial help on a query that compares rows in the same table. I'm revamping our code for pushing to Postgres, and have set up a test bed to push using our old and new code. The row counts are fine, but that doesn't tell me if the data is the same. For that, I realized I can use built-in features to get a hash of the row. Here's one of my simple tables as a start:
CREATE TABLE IF NOT EXISTS data.hsys (
"id" uuid NOT NULL DEFAULT NULL,
"marked_for_deletion" boolean NOT NULL DEFAULT false,
"name_" citext NOT NULL DEFAULT NULL,
CONSTRAINT hsys_id_pkey
PRIMARY KEY ("id")
);
The hash then just takes this line:
select hashtext(hsys::text) from hsys;
What I want to do is store the ID, hash, code version and table name for each row after each test push into a small table:
CREATE TABLE IF NOT EXISTS data.row_check (
id uuid NOT NULL DEFAULT NULL,
version int8 NOT NULL DEFAULT NULL,
row_hash int8 NOT NULL DEFAULT NULL,
table_name text NOT NULL DEFAULT NULL,
CONSTRAINT row_check_pkey
PRIMARY KEY (id, version)
);
Inserting the data into row_check isn't hard. The proof-of-concept query looks like this:
select
id,
0 as version,
hashtext(hsys::text) as row_hash,
'hsys' as table_name,
from hsys;
A working insert query looks like this:
INSERT INTO row_check (id,version,row_hash,table_name)
SELECT id, 0, hashtext(hsys::text),'hsys'
FROM hsys
ON CONFLICT ON CONSTRAINT row_check_pkey DO UPDATE SET
row_hash = EXCLUDED.row_hash,
table_name = EXCLUDED.table_name;
Once the data is in place, I can then see it like this:
select * from row_check
order by 1,2
limit 6;
id version row_hash table_name
17ea1ed4-87b0-0840-912f-d29de2a06f5d 0 -1853961325 hsys
17ea1ed4-87b0-0840-912f-d29de2a06f5d 1 -1853961325 hsys
2200d1da-73e7-419c-9e4c-efe020834e6f 0 -482794730 hsys
2200d1da-73e7-419c-9e4c-efe020834e6f 1 482794730 hsys <--- Different from version 0
47f4a50e-2358-434b-b30d-1f707ea9ee1b 0 -1539190835 hsys
47f4a50e-2358-434b-b30d-1f707ea9ee1b 1 -1539190835 hsys
What I'd ideally like from a sample like that is:
table_name id v0 v1
hsys 2200d1da-73e7-419c-9e4c-efe020834e6f 0 -482794730 482794730
But even this minimal result would be helpful:
2200d1da-73e7-419c-9e4c-efe020834e6f hsys
And this is where I'm stumped. What I would like to build is a query on row_check that spots any IDs where the hash is different between versions. I've got versions 0 and 1 above. Can someone point me in the right direction for grouping and/or joining to get only the rows that don't match across versions? Those are my red flags that I need to spot and track down. I really need the ID and table name back, the version and hash are of secondary importance. I've got a few constraints going into this, some of which help:
-
The ID values are unique across all tables.
-
I'll only compare two versions at a time.
-
I have dozens of tables to test.
-
Some of the tables have millions of rows.
That last point might matter. I though of using SELECT DISTINCT id
in a CTE, but I didn't get far with that.
Thanks for advice.
Aucun commentaire:
Enregistrer un commentaire