I cannot get Microsoft Graph API pagination to work using @odata.nextLink

Microsoft Graph documentation shows how they use @odata.nextLink to provide the next URL for paging data. However, the RSD file for that API query cannot interpret the column. If I change the RepeatingElement to "/" instead of "/value" and try displaying @odata.nextLink and @odata.context, nothing it returned. I found that if I put brackets around it in the in the column mapping, I get data ( other:xPath="/[@odata.nextLink]"). Using that same logic in pageurlpath does not work.

Variations I've tried for pageurlpath:

  • value="/@odata.nextLink" - just returns the first 100 records, no paging
  • value="/nextLink" - returns only first 100 records
  • value="/[@odata.nextLink]" - error that object @odata does not exist
  • value="[/@odata.nextLink] " - error that object /@odata does not exist

I can't find any documentation on how to get around this. I can use a Python control to do this but would much rather use native API Query capabilities.

I have the same issue as well. Any reply is greatly appreciated. thanks.

I got an answer from support that helped me resolve this:

You can escape a string in the paging URL in a RSD file (Query Profile) like this:

<api:set attr="pageurlpath" value="/json/\[@odata.nextLink\]" />

According to our driver vendor, you would need to use the escape syntax which is just putting square brackets around the name you want to use:

<attr name="nextlink" xs:type="string"

readonly="false"

other:xPath="/json/[@odata.nextLink]"/>

When you're setting the pageurlpath, you'd also have to escape the square brackets themselves like so:

<api:set attr="pageurlpath" value="/json/\[@odata.nextLink\]" />

There are 2 things to note here:

1) For an element outside the JSON path, the value will not be pushed until it's read from the document. For example, if the /json/[@odata.nextLink] element appears at the end of the JSON file, only the last row of output will have the data in that field. If it were at the beginning of the file, then all rows would output it.

2) If you set pageurlpath, this will override the output value for the column and you'll just see a null value. If you're just doing testing to see the output, you'll want to comment out the line where you set pageurlpath.

Thank you so much @bbenfield​ !

The details that you explain enabled me to finish the data pipe with ease. From the start I was using 'Managed Extract Profiles' but after a few unsuccessful tries with the formatting as suggested above, I switched to 'Managed Query Profiles' instead.

I have also raised a ticket with the support team specifically on using 'Managed Extract Profiles', which I will keep it open to see how it actually gets resoluted - I can share the findings with you if you are keen :)

Key reason on using 'Managed Query Profiles' is because I can keep the actual JSON data for data lake use case, I haven't figured out how to extract the entire JSON in 'Managed Query Profiles' yet.

I have not used Query Profiles before, it is a bit time consuming to learn how it works and how to get around some of the quirks i.e. I am using MS Graph - List Users: https://learn.microsoft.com/en-us/graph/api/user-list?view=graph-rest-1.0&amp;tabs=http, and also need to apply $select=..... and $expand=manager in the query call. But with 'Managed Query Profiles', it keeps getting a bad request when running the API Query component in the Orc job. Which then I managed to resolve this by comment out the parameters and put them directly in the URI instead:

I also notice that when there is multiple values within an array (repeat element), it will get the first one by default, it will not flatten it into multiple rows, nor keep the combined array values as string:

But fortunately I can live without having this data for now.

Effort is not wasted though, I managed to get it going - all thanks to your timely post above :slight_smile:

Winston

Thanks for the tip @bbenfield​ , they will come in handy for sure!

Another question that I haven't asked which is related to the below:

Use the nextLink field as example, mine return null for all rows. Correct me if I am wrong, it is because of point 2 - I have set it with pageurlpath. May I know whether you have managed to output to a field in any of your use cases?

Rationale of the question: I would like to store the actual and entire JSON contents for data lake use case, and hence I was trying to use ‘Managed Extract Profiles’ Method as the first instance.

Thanks again

Winston

I am trying to extract data from microsoft defender through api extract component , however pagination is not working is there a way to enable pagination using odata.nextlink in api extract component . specific to my scenario i dont want to use api query profile ?

I am trying to extract data from microsoft defender through api extract component , however pagination is not working is there a way to enable pagination using odata.nextlink in api extract component . specific to my scenario i dont want to use api query profile ?@winston.l​ did you get any resolution arround api extract component for this?

Thank you! Escaping the @odata.nextLink worked for me.

Winston,

I had the same issues too. Documentation around these files is very lacking. I've been able to piece a lot of this together though. For the parameters piece, the wizard is annoying that way in breaking out all URL parameters. I typically hard code like you did but found that you can populate those using the Connection Options in the API Query control. This is very handy if you're using iterators. The [_connection.***] definitions will use whatever you pass in the Connection Options using Other. The nice thing is you can put the variables in the URL instead of the breakout the wizard does. You could do something like this:

<api:set attr="uri" value="https://graph.microsoft.com/v1.0/users?$select=[_connection.select]&$expand=[_connection.expand]" />

Then, in the Connection Options, you could set those to the values you want. You separate the different definitions with a semicolon.

Not very practical in your scenario since these values never change, but doable. I typically use this where one of those values are variable. Then you would set your value to something like expand=${expandVariable}.

I also had the same issue with the expanded nodes having multiple values, and Matillion only grabbing the first one. I couldn’t find a way to do it in a single query profile, but you can do with a second query profile using the SubRepeatElement. You use this to create another table that you then join to the user table. Here’s a basic example that queries users and captures their assigned SKUs:

It returns two columns = id (the user’s ID) and skuId. It will do a row per SKU so some users will only have one row, with or without a SKU, and others with have an entry per SKU.

@winston.l​ I haven't had a need to export the full JSON package but in my initial testing, I was able to confirm your point. I could only output the value of nextLink if I only looked at / without pagination. I've never used Extra Profiles so I can't speak to the differences there.

All good and thank you again.

 

Winston

Hi @terwiela​ and welcome to the forums, so glad that @bbenfield​'s response was able to help you!

Many thanks,

Claire

Here’s how i got the pagnation to work correctly with Microsoft Graph API. :slight_smile: Enjoy.

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
  <api:info title="Azure_Entra" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
    <!-- You can modify the name, type, and column size here. -->
    <attr name="mail"                                                      xs:type="string" readonly="false"  other:xPath="mail"                                                      />
    <attr name="id"                                                        xs:type="string" readonly="false"  other:xPath="id"                                                        />
    <attr name="userPrincipalName" 										   xs:type="string" readonly="false"  other:xPath="userPrincipalName" />
    <attr name="displayName"											   xs:type="string" readonly="false"  other:xPath="displayName" />
    <attr name="givenName"												   xs:type="string" readonly="false"  other:xPath="givenName"  />
    <attr name="surname"												   xs:type="string" readonly="false"  other:xPath="surname"  />
    <input name="rows@next" desc="Identifier for the next page of results. Do not set this value manually." />
  </api:info>

  <api:set attr="BackwardsCompatibilityMode" value="true" />
  <api:set attr="uri" value="https://graph.microsoft.com/v1.0/users" />
  <api:set attr="EnablePaging" value="TRUE" />
  <api:set attr="pageurlpath" value="/json/\[@odata.nextLink\]" />

  <!-- Column XPaths are relative to a RepeatElement that splits the JSON into rows. -->
  <api:set attr="RepeatElement" value="/value" />

  <!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->

  <!-- Uri parameters set up -->
  <api:set attr="paramname#1" value="$select" /> 
  <api:set attr="paramvalue#1" value="mail,id,userPrincipalName,displayName,givenName,surname" />  

  <!-- Uri unset parameters set up -->
    <api:check attr="_input.rows@next">
    <api:unset attr="paramname#1" />
      <api:unset attr="paramvalue#1" /> 
    </api:check> 

  <api:script method="GET">
    <api:set attr="method" value="GET"/> <!-- HTTP request type -->

    <api:call op="jsonproviderGet">
      <api:push/>
    </api:call>
  </api:script>

</api:script>


















Thanks for sharing @gbird

That is awesome!

Kind regards, Joe