Hi guys, What tools in matillion can I use get a row count of all the tables in a snowflake schema without using the information schema

Hi guys,

What tools in matillion can I use get a row count of all the tables in a snowflake schema without using the information schema.

Hi @Suhayr​, the only way to get row counts would be to write a query against every table in a schema or use information_schema. Virtually any statement/operation in Matillion that has anything to do with Snowflake is nothing more than a SQL statement. Meaning, if you wrote the query or a component in Matillion was used, it's still just a query in Snowflake.

With that said, I would be curious as to why you wouldn't want to or can't use information_schema? The only other method would be much slower. You could run a query to get database, schema, and table names which then populate a grid variable using the Query to Grid component. You would then iterate over the grid variable and do a (select count(*) cnt from "DB.SCHEMA.TABLE_NAME_FROM_GRID_VARIABLE") you could then populate another column in the same grid variable with the count, save the results to a table, or many other storage methods depending on what you are trying to accomplish.

I hope this helps and thanks for posting!

Hi @Bryan, thanks for getting back to me really appreciate it. I'm still fairly new with matillion, how would i loop over and repopulate the same grid variable column? Would I need to use a grid iterator and python component ?

Also the reason I don't want to use the information schema is because I'm working on a data integrity project where I need to track number of rows being staged and merged into our database. This would be done every 30 minutes. There is a worry that the info schema may not always be up to date or accurate. So I want to get the count myself.

After thinking about this more, I am not sure the method I proposed is going to help. The problem is that in order to get a list of databases, schemas, and tables, you are going to have to query the information schema anyway. As a matter of fact any solution I can think of is going to use information_schema to some extent.

 

Either way if you really want to go down that path, the approach you use could be different depending on the version of Matillion you are using. As an example, in the newest versions of Matillion you can use the Table iterator and supply a query which will iterate over the results returned from the query. This is the easiest way to accomplish this.

 

As an example using Table Iterator:

 

The other way is to create a grid variable with a single text column and then use Query Result to Grid component to set the grid with the SQL for count statements. The SQL for this approach and the Table iterator would be something like this:

SELECT 'select count(*) cnt from ’ || TABLE_CATALOG || ‘.’ || TABLE_SCHEMA || ‘.’ || TABLE_NAME || ‘;’ as statement

FROM Your_Database.INFORMATION_SCHEMA.TABLES

WHERE TABLE_TYPE = ‘BASE TABLE’

You would then use a Grid Iterator to iterate over the results in the grid. Assuming you want to write the counts back to a table, you could attach a Database Query component to the iterator where you execute the SQL statement and write it to a table.

 

I hope this helps!