After about a year of development, we are getting ready to deploy a major project to production. From what we can tell, it's going to involve a lot of manual setup in the production environment and be extremely prone to all kinds of possible pitfalls. This is a simplified version of our setup:
The key thing to note here is that our Development and Production environments are completely isolated. They are not, and cannot be, connected in any fashion due to regulatory, security, and business requirements. The only connection we have between the two environments is GitHub. Our original thought when we first started on Matillion was that we would simply use the Git integration, push our changes from our Dev Matillion instance to the master/production branch in GitHub, and then fetch them on the production Matillion instance. After a lot of knowledge gained during the project, we now have a much better understanding of Matillion. With that knowledge came the realization that we are truly set up for failure when it comes to deploying this project, and pretty much any other project, to production. Our production Matillion instance is brand new and has nothing on it, so we are starting from scratch on that instance.
As we have learned, the idea of just fetching our project(s) from GitHub on the production instance doesn't really get us anywhere. If we went the route of exporting the projects and importing them on the production side, we would still be in the same position. We won't have schedules, the password store, API profiles, shared jobs, etc. None of these things are included in an export of the project, but they are an absolute requirement for our projects to run. We have a serious concern about having to set up several of these dependencies manually; it just opens the door for endless possibilities of something going very wrong.
Sorry for the long-winded explanation of how we got here. The question I have is: what are others doing for their deployments? I don't suspect there is a magic solution, but it just seems like it should be easier to do a deployment to another environment. Thanks in advance to all who post their thoughts, processes, and suggestions!
The setup you have here is quite similar to what we have ended up with. I have a personal issue with the Matillion scheduler because it uses Quartz for scheduling. We had an incident where our production EC2 instance hung while a snapshot was being taken.
We managed to boot the EC2 instance back up the next morning, but because of the Quartz scheduling, all of the 'night run' jobs were fired at us on reboot.
We have 10+ sets of jobs executed at various times during the night (database sources/file sources), so it's quite hard to stop these tasks in time, before they go too far, to keep the data consistent.
And sometimes, when fetching changes to our production instance, the project breaks: the version ID gets into a corrupt state or something and the Matillion GUI doesn't recognize the project anymore. I've learned that the best way to recover is by using the Matillion API and the script built for migration (which also uses the API).
I have been thinking of moving the scheduling of the Matillion tasks somewhere else, like Jenkins or something with more of a CI/CD approach, which could also handle the production deployment and perhaps test automation with Robot Framework.
I guess the only way to really pull off these deployments is to build the proper platform around all the building blocks to support it. And Matillion's API does well when it comes to these kinds of tasks that 'need to be done automatically'.
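For example, kicking off a Matillion job from something like Jenkins is essentially one REST call. A rough Python sketch; the host, credentials, and group/project/version/job/environment names are placeholders, and the endpoint path follows the v1 API naming convention, so verify it against your instance's API docs before relying on it:

```python
# Sketch: trigger a Matillion job from an external scheduler (e.g. a Jenkins step).
# The host, credentials, and all names below are placeholders; the endpoint path
# follows the v1 API convention and should be checked against your instance's docs.
import requests

MATILLION_URL = "https://matillion.example.com"   # placeholder host
AUTH = ("api_user", "api_password")               # placeholder credentials

def run_job(group, project, version, job, environment):
    url = (f"{MATILLION_URL}/rest/v1/group/name/{group}"
           f"/project/name/{project}/version/name/{version}"
           f"/job/name/{job}/run")
    resp = requests.post(url, params={"environmentName": environment}, auth=AUTH)
    resp.raise_for_status()
    return resp.json()   # the response should include a task id you can poll

if __name__ == "__main__":
    print(run_job("MyGroup", "MyProject", "default", "Nightly Load", "Prod"))
```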
Oh, I noticed I've started rambling now, so better to stop :)
This is a really good post, thank you for taking the time to write it and present it to the community. It has sparked some discussions here at Matillion.
We have found that users tend to opt for the migrate feature; there is a document in relation to this here:
We push job code (Orchestrations and Transformations) to Git. We do not push project-level configuration (things like passwords, environment variables, OAuth profiles, etc.), which is why we have seen users have more success using the migrate feature.
Again @Bryan, this is a brilliant post and I look forward to seeing what other members of the community have to bring to the discussion.
Thank you for sharing your use case with us all. I will forward your feedback regarding the scheduler to the @MatillionProductTeam, particularly the information about your instance hanging.
It's really good to see you sharing your user story, as this will not only encourage and help your fellow users but also help us identify areas we can improve.
Thanks for sharing, Michael, your post was all new and important information for us. I appreciate your insight into the pitfalls you have experienced in your process. We also use Jenkins for other, non-data deployments within the company, so that is of interest to us as well. We built our test automation using Snowflake and Matillion, with most of the work being done in Snowflake. It works quite well.
We are likely going to go down the route of using the Migration tool. For other users reading this thread: I would warn you that the UI Migration tool is all but useless for anything other than migrating projects until you get to more recent versions of Matillion. This is a bit of a guess, but I believe most of the migration abilities only started showing up around version 1.46.
After a ton of testing and trial and error, we have come to the conclusion that the Migration tool has too many pitfalls, too many unknowns, and throws way too many errors in places where it should just work, so we cannot see ourselves using it for deployments to production. Migrating schedules seems to be the worst of all. If a schedule already exists on the target, it fails 100% of the time. If it doesn't exist, it may or may not fail. Orchestrations fail 100% of the time if they already exist on the target and have a schedule attached to them. I am guessing that most of these issues don't exist if you are migrating from one server to another where the target server is empty. If users are in fact using the Migration tool in a deployment capacity, I would love to hear how they are getting around all the errors, because we haven't figured it out.
For what it's worth, I believe most of the issues in the Migration tool boil down to order-of-operations problems and Matillion API issues I have seen while using them. From everything I can tell, when a user selects Schedules and some Orchestrations to migrate, it tries to migrate the Schedules first. If one of those Schedules uses an Orchestration that is not on the target yet, it blows up with an error. If the Schedules and Orchestrations are already on the target and one or more Schedules reference an Orchestration that is being replaced, the API call fails. The API should be removing the Schedule, replacing the Orchestration, and then migrating the Schedule, in that order. Because the order of operations is off and the API does not remove the Schedule before trying to replace the Orchestration, it fails.
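If you wanted to script around it today rather than wait for a fix, the workaround is simply to enforce the order yourself: drop the schedule, replace the orchestration, recreate the schedule. A rough Python sketch; the endpoint paths are assumptions modelled on the v1 API naming and need to be checked against your Matillion version:

```python
# Sketch of enforcing the correct order by hand via the REST API.
# All endpoint paths below are assumptions modelled on the v1 API naming
# convention; verify them for your Matillion version before using.
import requests

BASE = "https://matillion.example.com/rest/v1"   # placeholder host
AUTH = ("api_user", "api_password")              # placeholder credentials

def call(method, path, **kwargs):
    resp = requests.request(method, f"{BASE}{path}", auth=AUTH, **kwargs)
    resp.raise_for_status()
    return resp

def replace_orchestration(group, project, version, job_json, schedule_json, schedule_name):
    project_path = f"/group/name/{group}/project/name/{project}"
    # 1. Remove the schedule that references the orchestration (assumed endpoint).
    call("POST", f"{project_path}/schedule/name/{schedule_name}/delete")
    # 2. Replace the orchestration job in the target version (assumed endpoint).
    call("POST", f"{project_path}/version/name/{version}/import", json=job_json)
    # 3. Recreate the schedule from its exported definition (assumed endpoint).
    call("POST", f"{project_path}/schedule/import", json=schedule_json)
```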
I hope this helps others. We are still searching for a solution to our situation. Thanks!
Hi Nils, we wrote a Python script for deployments. Basically, it exports the project to a file and allows you to bring along other artifacts like API Profiles and Shared Jobs; those items end up in their own export files. We then have another Python script that imports them into the production Matillion instance. We wrote the import script so that it only imports the project environment for Production and only a specific version of the project. This keeps the production environment super clean. It also helped us get around the old requirement of 15 or fewer environments on a single Matillion instance; that requirement has since gone away as long as you are using the Hub to manage your Matillion instances.
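To give a flavor of the import half (a stripped-down sketch, not our exact script): it posts a previously exported JSON file into the production instance, keeping only the Production environment and one named version. The endpoint path and JSON keys are assumptions about the v1 API and the export format, so adjust them to what your instance actually exposes.

```python
# Sketch of the import half: push an exported project JSON into the production
# instance, keeping only the Prod environment and one version.  The endpoint
# path and the JSON keys ("environments", "versions", "name") are assumptions.
import json
import requests

PROD_URL = "https://matillion-prod.example.com/rest/v1"   # placeholder host
AUTH = ("api_user", "api_password")                       # placeholder credentials

def import_project(group, project, export_file, keep_env="Prod", keep_version="default"):
    with open(export_file) as f:
        payload = json.load(f)
    # Keep only the production environment and the version being deployed.
    payload["environments"] = [e for e in payload.get("environments", [])
                               if e.get("name") == keep_env]
    payload["versions"] = [v for v in payload.get("versions", [])
                           if v.get("name") == keep_version]
    url = f"{PROD_URL}/group/name/{group}/project/name/{project}/import"
    requests.post(url, json=payload, auth=AUTH).raise_for_status()
```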
With all of that said, most of our schedules start at night, so we don't battle issues where jobs are running during our deployments. I suspect we would run into the same issue as you even with our deployment pattern.
Based on what you described, I think it's worth submitting an idea on this, because it is going to become an issue for a lot of customers as time goes on. To me, Matillion could build an option into their import that checks whether any job in the project is running before importing. If a job is running, wait until it finishes, pause the schedules for the jobs in the project, do the import, then resume the schedules. This would make it easy on us customers because we wouldn't have to do anything extra other than maybe select an option for the method described above. Does this make sense to you? I would be interested in your thoughts.
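Something like this flow, roughly (the running-task, pause, and resume endpoints are assumptions based on the v1 API naming, so treat the paths as illustrative):

```python
# Sketch of a "safe import": wait for running tasks to drain, pause the
# project's schedules, import, then resume.  The running-task, pause, and
# resume endpoints are assumptions modelled on the v1 API naming.
import time
import requests

BASE = "https://matillion.example.com/rest/v1"   # placeholder host
AUTH = ("api_user", "api_password")              # placeholder credentials

def get(path):
    resp = requests.get(f"{BASE}{path}", auth=AUTH)
    resp.raise_for_status()
    return resp.json()

def post(path, **kwargs):
    requests.post(f"{BASE}{path}", auth=AUTH, **kwargs).raise_for_status()

def safe_import(group, project, payload):
    project_path = f"/group/name/{group}/project/name/{project}"
    # 1. Wait until no tasks are running in the project (assumed endpoint).
    while get(f"{project_path}/task/running"):
        time.sleep(60)
    # 2. Pause every schedule in the project (assumed endpoint).
    post(f"{project_path}/schedule/pause")
    try:
        # 3. Do the import itself (assumed endpoint).
        post(f"{project_path}/import", json=payload)
    finally:
        # 4. Resume schedules even if the import fails.
        post(f"{project_path}/schedule/resume")
```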
Are you using the API or something else inside the Python script to export/import the files?
Do you export/import everything in the environment or only the actual changes?
In the past most of our workload was done at night as well.
We are starting to have more near-real-time requirements, or at least hourly loads, to fulfill the business needs on some topics.
We also have some nice-to-have data loads during the day, delivering data to other departments, built on top of all the other loads we do.
In my opinion, the main issue we face with our approach is that the Git checkout replaces everything.
This leads to the problem that you cannot even change objects that are not currently running.
So one solution might be to use something else for deployments and try to deploy only the changed objects.
But yes, when an object we would like to change is running, this will be a problem as well.
A more sophisticated approach might be to cache the old jobs and avoid the issue altogether.
That way, the running jobs would not care if you make changes to the existing job.
I don't think Matillion will be able to build something like this with the current architecture.
So your solution might be a good workaround.
Hopefully the new SaaS solution will not face these kinds of issues.
I would expect that solution to have worker nodes that copy the job information from a central repository, and the repository should not care whether anything is running.
So maybe this will be fixed by a more horizontally scalable architecture.
Hey all, nice post on code migration between environments. We almost ended up with the same model; however, the migration between environments using Git wasn't that successful for us.
Export and Import has been the most successful approach for us so far, but it has its own hiccups. Whenever we try to import an exported JSON, we get the error "Error parsing file". Not sure what exactly is causing the issue. Has anyone had this issue before and tried to solve it?
We are currently using Git as version control only, to keep track of changes. It would be great if there were some documentation around using Git for code migration. If there already is some, can someone please share the link?
We went from exporting to using Git for deployment to production about 3 months ago, and we are happy with that. We only do small, frequent deployments. When we recently upgraded our servers, we used the Matillion migrate function to migrate everything except the projects. Then we added the environment variables and passwords by hand (mainly copy/paste in text mode). Our passwords are handled by AWS Secrets Manager. The projects themselves we migrated using Git.
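For anyone curious about the Secrets Manager side, retrieving a password there is a single boto3 call; the region and secret name below are just placeholders:

```python
# Fetch a password from AWS Secrets Manager with boto3.
# The region and secret name are placeholders.
import boto3

client = boto3.client("secretsmanager", region_name="eu-west-1")
response = client.get_secret_value(SecretId="prod/matillion/warehouse-password")
password = response["SecretString"]
```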
By the way, I have started another thread here about merging the remote in the pull request process.
Hi Nils, sorry it took me so long to reply; we have had some big deliverables to finish up along with holiday activities.
We are leveraging the API but putting our own spin on it. Basically, we do a full export of the project and then strip out the elements from the JSON that we know we are not going to need in Production. For example, we have Dev and Test Matillion environments in all of our projects, so we strip out all environments that are not Prd. Once we are done, we write the results out as a JSON file to a path we pass in as an argument when calling the export script. Other arguments determine whether we export Shared Jobs, API Profiles, etc.; each of those items ends up in its own JSON file. We then use those same files to import into the production instance. This definitely isn't a long-term solution, but we didn't care for the documented Git approach.
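A condensed sketch of that export-and-strip step (the export endpoint follows the v1 API naming, and the "environments"/"name" keys are assumptions about the export format, so adjust both to whatever your version actually returns):

```python
# Sketch: export a project, strip every environment except Prd, and write
# the result to a file path passed in as an argument.  The export endpoint
# and the JSON keys are assumptions about the v1 API and export format.
import argparse
import json
import requests

BASE = "https://matillion.example.com/rest/v1"   # placeholder host
AUTH = ("api_user", "api_password")              # placeholder credentials

def export_project(group, project):
    url = f"{BASE}/group/name/{group}/project/name/{project}/export"
    resp = requests.get(url, auth=AUTH)
    resp.raise_for_status()
    return resp.json()

def strip_environments(export_json, keep="Prd"):
    # Drop every environment except the one we deploy to production.
    export_json["environments"] = [e for e in export_json.get("environments", [])
                                   if e.get("name") == keep]
    return export_json

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--group", required=True)
    parser.add_argument("--project", required=True)
    parser.add_argument("--out", required=True)
    args = parser.parse_args()
    data = strip_environments(export_project(args.group, args.project))
    with open(args.out, "w") as f:
        json.dump(data, f, indent=2)
```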
The biggest hurdle is knowing the dependency chain and understanding how to do a low-level diff on objects, which is why Matillion's Git deployment replaces everything in the project. The other issue is that Matillion's Git approach only keeps track of projects. There are so many objects outside a project that Matillion should be tracking and committing to a repo, like API and Extract Profiles, Shared Jobs, etc. If there are any changes to those objects in combination with other changes in a project, the whole deploy-by-Git method falls apart, which means you are back to doing manual deployments to a certain extent anyway. Based on that, we made the decision to just handle deployments on our own via Python across the board rather than trying to figure out which objects we need to manually import.
I think I just came up with an approach that would work for you, and it's solid enough that I would think about implementing it on my end. It's basically a Blue/Green deployment. Every project has versions, with the default being called "default". What if you had a "Blue" version and a "Green" version? To start, you would point all your scheduled jobs to the "Blue" version. Your deployment would consist of replacing the project in the "Green" version, which works because nothing is running on that version. Once deployed, you would go into the Schedules and point the jobs to the "Green" version. With every deployment you just toggle between the versions. You could easily automate that with Python using the Matillion API; all the functionality exists in the API, you just have to wire it up. Hopefully that sparks an idea.
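If it helps, here is a rough sketch of the toggle step; the schedule export/import endpoints and the "versionName" field are assumptions about the v1 API, so confirm them on your instance first:

```python
# Sketch of the Blue/Green toggle: repoint every schedule that targets the
# live version at the freshly deployed one.  The schedule endpoints and the
# "versionName" field are assumptions about the v1 API.
import requests

BASE = "https://matillion.example.com/rest/v1"   # placeholder host
AUTH = ("api_user", "api_password")              # placeholder credentials

def toggle_schedules(group, project, live="Blue", idle="Green"):
    project_path = f"{BASE}/group/name/{group}/project/name/{project}"
    # List the project's schedules (assumed endpoint).
    resp = requests.get(f"{project_path}/schedule/export", auth=AUTH)
    resp.raise_for_status()
    for schedule in resp.json():
        if schedule.get("versionName") == live:
            schedule["versionName"] = idle   # point at the freshly deployed version
            requests.post(f"{project_path}/schedule/import",
                          json=schedule, auth=AUTH).raise_for_status()
```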
I'm with you, though; I am hoping the SaaS solution is an architectural reboot and these very common but critical activities work like they should.
Happy New Year! I hope you and your family had a great start to the new year.
Thanks for the detailed reply.
Hopefully Matillion is implementing a valid CI/CD approach for the "new" SaaS platform.
But I think with the new design a lot of things should be easier to implement, so fingers crossed.
It seems like you put a lot of effort into your custom deployment solution; it sounds pretty advanced to me. I think we will stick with the Git approach for now. But yeah, missing a lot of other stuff besides jobs is annoying...
Actually, we are not using versions at the moment. To be honest, we never took a closer look at versions.
But I think your idea is great and might solve our issue, so thank you very much.
As always, implementing a new approach depends on the pain versus the time we can afford to spend solving the issue. Maybe Matillion will be faster with the SaaS solution.
We will see, but I will keep your solution in mind. So thanks again.
As a person, who is currently developing his custom scripts to automate the deployment process, this entire thread is a gold mine. Thanks @Bryan and @Nils for a great discussion and knowledge share.
I have seen the issue you are talking about when the source Matillion instance (the one you exported from) is running a version lower or higher than the Matillion instance you are importing into. Where this hit us was when a dev who had admin access on the Matillion instance decided to run an in-place upgrade on one of the instances without letting everyone else know. We started having issues shortly after, and it took us a while to figure out that the Matillion versions were different between Dev and Prod.
This is the best documentation currently available: https://documentation.matillion.com/docs/2974180
This has some good info at the tail end of the document: https://documentation.matillion.com/docs/2978750
Long story short, it specifically calls out that nothing outside the project is in Git or source controlled. Meaning, environment variables, Shared Jobs, API Profiles, Extract Profiles, etc. are not in source control and have to be deployed manually. In my opinion, there is a huge gap here if you are looking to bundle a set of changes for a specific feature release that encompasses project changes as well as changes outside the project. For us, even small changes often involve a combination of an orchestration change and a Shared Job or API Profile change, which would require us to manually deploy those changes anyway.
Thanks for the thoughtful response, Lars! How do you handle deployments for API Profile and Shared Job changes, since those live outside the Git repo and at a higher level than the project you are deploying changes for?
FYI: if it helps, we found that running a Postgres database via RDS, instead of using the MongoDB that comes on the Matillion instance, has made upgrades a breeze. No more needing to run the Matillion migration, update passwords, environment variables, etc. The only hiccup we have from time to time is that the Git integrations within the projects get out of sorts. We have started going through the projects and ensuring they are all exported or committed to the remote repo before migration, which makes bringing the changes back easy.
Hi Bryan, we haven't updated any API Profiles or Shared Jobs for a while, so thanks for reminding us of this issue to come! And thanks for the Postgres tip!