Category Archives: XPath

Non Deterministic XPath Expression

I have an XML document with a recursive element. A Category node carries its own definition along with the ancestors of that category, and those ancestors repeat the same structure up to the topmost category. To complicate things, a product can have multiple category paths. To identify the topmost category, I used

//Category[position()=1]/descendant::Category[last()]/Name

This worked fine in Java. But when I tried the same expression in Perl (XML::XPath), it didn’t work. It turned out that the innermost Category happened to be the first one within the descendant::Category node-set. So, using

//Category[position()=1]/descendant::Category[position()=1]/Name

it worked. But I wanted an XPath expression that is language independent.

So, I ended up using

//Category[position()=1]/descendant::Category[count(Ancestors)=0]/Name

I am not sure whether the Perl implementation of XPath is incorrect or whether the XPath specification is unclear on the ordering of nodes for certain types of expressions.
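To make the three expressions concrete, here is a minimal Java sketch that runs them against a made-up document. The sample XML is an assumption: the post does not show the real feed, so the layout (a Category holding a Name and an Ancestors wrapper around the parent Category) is only inferred from the count(Ancestors)=0 predicate. The class name TopCategoryDemo and the helper names() are hypothetical as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TopCategoryDemo {
    // Hypothetical sample: the element layout is an assumption, not the real feed.
    static final String XML =
          "<Product><Category><Name>Laptops</Name><Ancestors>"
        + "<Category><Name>Computers</Name><Ancestors>"
        + "<Category><Name>Electronics</Name></Category>"
        + "</Ancestors></Category></Ancestors></Category></Product>";

    // Evaluates an XPath 1.0 expression and returns the text of every selected node.
    static List<String> names(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "//Category[position()=1]/descendant::Category[last()]/Name",
            "//Category[position()=1]/descendant::Category[position()=1]/Name",
            "//Category[position()=1]/descendant::Category[count(Ancestors)=0]/Name"
        };
        for (String expr : exprs) {
            System.out.println(names(expr) + "  <-  " + expr);
        }
    }
}
```

On this sample, the Java XPath engine selects only the Electronics name for the first and third expressions, while the second one selects more than one Name. The third expression does not depend on position at all, which is what makes it the portable choice.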


XPath: What about the between operator?

XPath 1.0 has the following and preceding axes to help you get all the elements after a certain element or all the elements before a certain element. But what if you want all the elements between two elements? I didn’t find any explicit construct in the XPath 1.0 specification to do this.

One way to do this is to use the following axis to get all the elements after the first element, use the preceding axis to get all the elements before the second element, and then take the intersection of the two. Sounds like a good idea, but how do you do this using XPath only, with no procedural code? So, I searched for an intersection operator and, to my delight, found the intersect operator in the XPath 2.0 specification, alongside except and union.

But the tools I am working with only support XPath 1.0, so I was back to finding a way to do this in XPath 1.0. After a bit of experimentation, I came up with the following strategy.

Take a simple XML like

<a><b/><c/><b/><b/><d/><b/><b/></a>

The goal is to get all the b elements between c and d.

  • Using //b, you get 5 <b> elements
  • Using //c/following::b, you get 4 <b> elements
  • Using //d/preceding::b, you get 3 <b> elements
  • Using //c/following::b[following::d] you get 2 <b> elements, the end goal!
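
The four counts above can be checked with a few lines of Java. This is a minimal sketch using the JDK's built-in XPath 1.0 engine. The class name BetweenCounts and the helper count() are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BetweenCounts {
    // The sample document from the post.
    static final String XML = "<a><b/><c/><b/><b/><d/><b/><b/></a>";

    // Returns how many nodes the expression selects in the sample document.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "//b",                             // 5
            "//c/following::b",                // 4
            "//d/preceding::b",                // 3
            "//c/following::b[following::d]"   // 2
        };
        for (String expr : exprs) {
            System.out.println(count(expr) + "  " + expr);
        }
    }
}
```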

Technically, the above solution is not 100% correct as there can be multiple c and d elements, but it’s good enough for my use case. However, the general idea is

first-element-axis/following::desired-elements[following::second-element]

to get the desired behavior of a between axis. I would be curious to hear any other solutions. And BTW, you can verify all this with the online XPath evaluator, a nice tool for experimenting with XPath expressions.
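
As one possible answer to the "other solutions" question, the sketch below tries two alternative formulations and also shows the multiple c and d caveat. Neither formulation comes from the post. The first is the well-known XPath 1.0 intersection idiom (often called the Kaysian method), which keeps the nodes of one set that are also in another by comparing counts. The second pins the start to the first c and stops at the first d after it, and should be read as an assumption-laden sketch for flat sibling layouts. The second document, the class name BetweenVariants, and the helper count() are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BetweenVariants {
    // The sample from the post, plus a hypothetical document with two c/d pairs.
    static final String SAMPLE = "<a><b/><c/><b/><b/><d/><b/><b/></a>";
    static final String MULTI  = "<a><c/><b/><d/><b/><c/><b/><d/></a>";

    // The expression from the post.
    static final String ORIGINAL = "//c/following::b[following::d]";

    // Intersection idiom: a node is in both sets if adding it to the
    // second set does not change that set's size.
    static final String INTERSECT =
        "//c/following::b[count(. | //d/preceding::b) = count(//d/preceding::b)]";

    // Pinned variant: start at the first c, keep b elements that still have a d
    // ahead of them and that have no d between the first c and themselves.
    static final String PINNED =
        "(//c)[1]/following::b[following::d]"
      + "[count(preceding::d) = count((//c)[1]/preceding::d)]";

    // Returns how many nodes the expression selects in the given document.
    static int count(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, ORIGINAL));   // 2
        System.out.println(count(SAMPLE, INTERSECT));  // 2
        System.out.println(count(MULTI, ORIGINAL));    // 3, includes the b between d and the second c
        System.out.println(count(MULTI, PINNED));      // 1, only the b between the first c and the first d
    }
}
```

The intersection idiom selects the same nodes as the original expression, so it shares the same caveat. Only the pinned variant narrows the result when several c and d elements are present.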
