MongoDB Query guesses the wrong schema for collection

I'm trying to load data from MongoDB to Redshift. As I know, it will try to guess the schema by reading the first 100 documents returned from Mongo.

However, there're new columns added to the Mongo collection, which means that MongoDB Query can't find the new columns and it says: Invalid column name for the table

There's a am article about fixed schema for Mongo collection, but it hasn't been updated for the new Matillion version

Do you have any ideas on how to solve this one?

Thank you

Hi, Did you tried the connection option "RowScanDepth"? You can set the scan limit here in the MongoDB component.

Thank you, but unluckily it doesn't work (it still can't recognize the new column)

I'm currently stuck with the same issue. Any workarounds for this?

Hi all,

The RowScanDepth option that @ganesh_pitchai​ mentioned is good, but it doesn't work beyond an undocumented limit (which seems to be fairly low).

But I can confirm that the Fixed Schema Definitions workaround definitely does work in the current latest version of Matillion ETL, v1.54. Although agreed with @anhkhoadx​'s comment that the instructions are now out of date.

Now you have to go to Project / Manage API Profiles / Manage Query Profiles

.. and to get to the table definition you have to toggle the Advanced Mode button.

Here's an updated screenshot. I copied line 7 and pasted a new line 8 containing a new column that it did not autodetect. The "rarecolumn" does not occur in the first 1 million documents of the collection.

Note you also have to set the other:bsonpath = "$.rarecolumn" at the end of that line.

After running a MongoDB component once where you have set Location and GenerateSchemaFiles, you must create a second MongoDB component containing only the same Location parameter.

After that the component should show your manually added column even in Basic mode, for example like this:

A couple of last gotchas:

  • You have to restart Matillion after any change to the RSD files
  • The QueryPassthrough option looks promising, but itself suffers from the same problem: it doesn't list columns that it doesn't find in the first handful of documents.

Hope that's helpful.

Ian

Thanks for sharing Ian. I will give this a shot.

 

Are there plans to fix this, where we don't have to do this?

 

-Sarin

Hi @Sarin​,

All new features and improvements like this come via the Ideas Portal. Currently there's nothing there about making the MongoDB interface simpler, so I would ask please add a new Idea there. To our other community members... please upvote if it's relevant to your use case.

Ian