Hi, I have a requirement to reconcile metrics across a huge set of tables, using a wide set of input columns that currently lives in a pandas DataFrame. Ripping out the pandas logic and redoing it in native Python or METL components would be cumbersome.
Hence, I am planning to have an orchestration job call the Python/pandas program directly from the METL server.
But the METL server freezes due to lack of memory, even when I use a sample set of tables.
Has anyone used pandas successfully this way? If so, is there any trick I can use, rather than simply calling the Python program from a Bash Script component?
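For context, the usual pandas-side trick for memory pressure is to read the data in chunks and keep only running aggregates, so the full table never sits in memory at once. A minimal sketch, assuming the input can be read as CSV; the function name `chunked_totals` and the column names are hypothetical and only for illustration:

```python
import io

import pandas as pd


def chunked_totals(csv_source, value_col, chunksize=100_000):
    """Return (row_count, total) for value_col, reading chunksize rows at a time."""
    row_count = 0
    total = 0.0
    # usecols limits the read to the one column we need; chunksize makes
    # read_csv return an iterator of DataFrames instead of one big frame.
    for chunk in pd.read_csv(csv_source, usecols=[value_col], chunksize=chunksize):
        row_count += len(chunk)
        total += float(chunk[value_col].sum())
    return row_count, total


# Tiny in-memory sample standing in for a real extract (hypothetical data).
sample = io.StringIO("order_id,amount\n1,10.5\n2,20.0\n3,4.5\n")
print(chunked_totals(sample, "amount", chunksize=2))
```

This only bounds the memory used by pandas itself; whether that is enough on a given METL instance depends on the instance size and what else is running there.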
Correct me if I am wrong, but I think that won't work and is not intended to work.
This is the disadvantage of an ELT tool.
The ELT concept relies on doing all the compute-intensive work inside the database and not in the tool.
The main purpose of the Python component inside Matillion is to make smaller calls to APIs, work with variables and so on. Unlike, for example, Amazon SageMaker, it is not designed to perform massive transformations.
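To illustrate the push-down idea for a reconciliation job: let the database compute the metrics and pull back only the small summary rows for comparison. This is a rough sketch, not Matillion-specific code. It uses Python's built-in `sqlite3` as a stand-in for the warehouse, and the table names, column names, and helper functions are all hypothetical:

```python
import sqlite3


def table_metrics(conn, table, amount_col):
    """Compute row count and column total inside the database."""
    # Identifiers cannot be bound as SQL parameters, so only pass
    # trusted table and column names here.
    row = conn.execute(
        f'SELECT COUNT(*), COALESCE(SUM("{amount_col}"), 0) FROM "{table}"'
    ).fetchone()
    return {"rows": row[0], "total": row[1]}


def reconcile(conn, source, target, amount_col):
    """Return only the metrics that differ, as (source_value, target_value)."""
    src = table_metrics(conn, source, amount_col)
    tgt = table_metrics(conn, target, amount_col)
    return {name: (src[name], tgt[name]) for name in src if src[name] != tgt[name]}


# In-memory database with two small sample tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src_orders (id INTEGER, amount INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 10), (2, 20), (3, 5);
    INSERT INTO tgt_orders VALUES (1, 10), (2, 20);
    """
)
print(reconcile(conn, "src_orders", "tgt_orders", "amount"))
```

Only two aggregate rows ever reach the Python process, no matter how large the tables are, which is the point of keeping the heavy lifting in the database.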
Thanks, Nils. That is my thought as well. The Matillion Python component should NOT be used as a "shell" for something completely outside ELT processing. But I wanted a second opinion, as I was under peer pressure to use pandas with Matillion, since the Matillion documentation says that required Python packages can be installed on the METL EC2 instance.
I believe that is a misinterpretation of the documentation. Thanks for your comment.