Extended join algorithms and path based approach for XPATH evaluation, M S thesis, by M Archana, March 2006. XML(eXtensible Markup Language) is gaining prevalence in the representation and exchange of web data. XML document can be modeled as a tree and the queries specify selection predicates on this tree structure. XPath queries can specify axis such as parent-child(P-C), ancestor-descendant(A-D) and following-preceding(F-P). Efficient evaluation of XPath queries is a major research issue. Structural join techniques are well known for query evaluation. Stack-tree algorithms are structural join algorithms which advocate the use of A-D algorithms to evaluate the P-C axis also. Though P-C axis generates less number of results than A-D axis, this approach needs same processing overhead to evaluate both the axes. To reduce the computation overhead for evaluating P-C axis, we propose algorithms exclusively for P-C axis. These join algorithms use interval encoding to uniquely identify the nodes. Current algorithms concentrate mainly on A-D and P-C axis but pay little attention towards evaluating F-P axis. But F-P axis is also useful in some queries. In the framework of interval encoding, algorithms for all the axes are present except for the F-P axis in the literature. To fiill this gap, we also propose structural join algorithms to evaluate F-P axis. The proposed algorithms have linear time complexity. The join techniques for XPath evaluation are costly as they need large number of disk accesses. Path based techniques can reduce the disk I/O cost by reducing the number of joins. Existing path based approaches store pathIDs along with interval encoding labels for all the elements present in the document. These approaches have di_culties in assigning pathIDs to the documents which are deep and have large number of distinct tags and they require reassigning of the labels in case of updates to the document. To overcome these drawbacks we use structural summary techniques to assign pathIDs. We propose novel metadata guided query evaluation scheme which uses string matching technique to solve the queries. Our experimental evaluation shows that the proposed algorithms perform better than the existing algorithms. |