How to load data from GitHub Raw?

Hi all,

I am trying to load JSON files that are in a private GitHub repo.

The issue is that the API Query component can't handle the response, because it is served as "text/plain" (even though it's perfectly fine JSON).

An alternative would be to use the Data Transfer component, but that can't handle the Bearer Token auth, which is needed for the private repo.

Am I missing something, or do you have other ideas on how to load these files?

Hi @Manuel​, just to make sure I understand what you are attempting to retrieve... Are you retrieving files from a repo in GitHub or pulling some other GitHub information?

The file content itself, as you would get it via the Raw endpoint:

Thanks for your proposal. I checked it, but it didn't work. As far as I understand, that's the header that Matillion is sending to GitHub, right?

So the Matillion to GitHub communication is not the issue here; that works.

The problem is the response, GitHub to Matillion, which gives this error:

Error: Invalid JSON markup. Expected json, but instead found [text/plain; charset=utf-8].

 

However, the content is proper JSON; it is only the response from GitHub to Matillion that is labelled "text/plain". Content of the response (this works in Postman, but Postman also detects it as text, not as JSON):

{
    "type": "record",
    "name": "vacancy",
    "namespace": "marketplace.core.entity",
    "fields": [
        {
            "name": "id",
            "type": "string",
            "doc": "Unique id of the vacancy (UUIDv4)"
        },
        {
            "name": "accountId",
            "type": "string",
            "doc": "Id of the account to which owns the vacancy"
        },
...
    ]
}
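For anyone who wants to confirm outside of Matillion that the body really is valid JSON despite the "text/plain" label, here is a minimal Python sketch. It is not what the API Query component does internally; it only shows a client that sends the Bearer token and parses the body as JSON regardless of the declared content type. The function names, the URL and the token are placeholders I made up for illustration.

```python
import json
import urllib.request
from email.message import Message


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that authenticates with a Bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_json_body(body: bytes, content_type: str) -> object:
    """Parse the body as JSON, using Content-Type only to find the charset.

    The media type itself (e.g. text/plain) is deliberately ignored.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset() or "utf-8"
    return json.loads(body.decode(charset))


def load_raw_json(url: str, token: str) -> object:
    """Fetch a raw file and parse it as JSON (url and token are placeholders)."""
    with urllib.request.urlopen(build_request(url, token)) as response:
        content_type = response.headers.get("Content-Type", "")
        return parse_json_body(response.read(), content_type)
```

Calling parse_json_body on a body served as "text/plain; charset=utf-8" returns the parsed document as long as the bytes are valid JSON, so a failure here would point at the file content rather than at the header.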

Thanks for providing the info and screenshot. That is really helpful. One thing that comes to mind: if you are using the GitHub API to pull file contents, you would typically need to include a specific "Accept" attribute and value in the header. In the API Query component, make sure you have this connection option configured:

 

If this doesn’t work, perhaps you could post the API Profile you are using to get the file contents from GitHub and the API Query configuration you currently have? Feel free to obfuscate any content you deem intellectual property or security related to protect your GitHub environment and company. I hope the above helps.

The RSD file I used:

Hi @Manuel​,

What version of Matillion ETL are you using there? It worked OK for me on version 1.55 (for Snowflake), querying this raw JSON file from the Developer Relations account...

Best regards,

Ian

We're using Version 1.54.10 for AWS Redshift.

OK, I got it running after recreating the profile with the wizard. I will check what may have caused the original problem.

Ok, looks like

 

 <api:set attr="BackwardsCompatibilityMode" value="true" />

 

is needed here. Without this being set, you'll run into the error described above.

I should have known that! I have run into issues with BackwardsCompatibilityMode being true when it needed to be false and vice versa. The errors I have seen were different from the one you are getting, which probably makes sense because you are going after a completely different API than I was. Glad you got it working!