MongoDB Query guesses the wrong schema for collection

anhkhoadx · April 23, 2021, 4:53am

I'm trying to load data from MongoDB to Redshift. As I know, it will try to guess the schema by reading the first 100 documents returned from Mongo.

However, there're new columns added to the Mongo collection, which means that MongoDB Query can't find the new columns and it says: Invalid column name for the table

There's a am article about fixed schema for Mongo collection, but it hasn't been updated for the new Matillion version

Do you have any ideas on how to solve this one?

Thank you

ganesh_pitchai · April 23, 2021, 3:53pm

Hi, Did you tried the connection option "RowScanDepth"? You can set the scan limit here in the MongoDB component.

anhkhoadx · April 27, 2021, 8:30am

Thank you, but unluckily it doesn't work (it still can't recognize the new column)

Sarin · July 8, 2021, 12:42pm

I'm currently stuck with the same issue. Any workarounds for this?

ian.funnell · July 8, 2021, 3:54pm

Hi all,

The RowScanDepth option that @ganesh_pitchai mentioned is good, but it doesn't work beyond an undocumented limit (which seems to be fairly low).

But I can confirm that the Fixed Schema Definitions workaround definitely does work in the current latest version of Matillion ETL, v1.54. Although agreed with @anhkhoadx's comment that the instructions are now out of date.

Now you have to go to Project / Manage API Profiles / Manage Query Profiles

.. and to get to the table definition you have to toggle the Advanced Mode button.

Here's an updated screenshot. I copied line 7 and pasted a new line 8 containing a new column that it did not autodetect. The "rarecolumn" does not occur in the first 1 million documents of the collection.

Note you also have to set the other:bsonpath = "$.rarecolumn" at the end of that line.

After running a MongoDB component once where you have set Location and GenerateSchemaFiles, you must create a second MongoDB component containing only the same Location parameter.

After that the component should show your manually added column even in Basic mode, for example like this:

A couple of last gotchas:

You have to restart Matillion after any change to the RSD files
The QueryPassthrough option looks promising, but itself suffers from the same problem: it doesn't list columns that it doesn't find in the first handful of documents.

Hope that's helpful.

Ian

Sarin · July 8, 2021, 4:20pm

Thanks for sharing Ian. I will give this a shot.

Are there plans to fix this, where we don't have to do this?

-Sarin

ian.funnell · July 14, 2021, 6:30am

Hi @Sarin,

All new features and improvements like this come via the Ideas Portal. Currently there's nothing there about making the MongoDB interface simpler, so I would ask please add a new Idea there. To our other community members... please upvote if it's relevant to your use case.

Ian

Topic		Replies	Views
How to specify a filter parameter when using a custom RSD schema for MongoDB Matillion ETL	1	0	September 11, 2023
Is anyone facing this Issue? Matillion ETL	6	23	March 12, 2025
Managing change in the column definition from source DB Matillion ETL	1	0	September 7, 2023
Incremental load fails for a table when a new column is added in the source table. What can be done to solve this issue? Matillion ETL	6	0	December 20, 2023
Is anyone else facing client driver issues when using the MongoDB connector? Matillion ETL	3	0	April 19, 2023

MongoDB Query guesses the wrong schema for collection

Related topics