Implementing an XML Object Identification System on an archive data

2011

Although various techniques and tools existed at an early stage, the data quality problem did not receive the attention it deserves until recently. In the 1990s, attention to data quality was restricted to certain sectors, but after the huge losses caused by data quality problems were exposed, a range of work on the subject appeared. A few scholars have been involved in exposing the data quality problem and in finding solutions; among the initiatives to study the problem systematically was the Total Data Quality Management methodology.

The archiving sector is no different: in archiving or long-term preservation, unless the preserved data is accurate and authentic, its use would be of little value.

This paper studies how to ensure the accuracy of digital archive data, and it presents a data quality approach called object identification as a way of ensuring that archive data is accurate. Most research has focused on relational data, but with the increasing popularity and importance of XML data, there is a need for data quality tools and methodologies suited to XML data. For this reason, the object identification technique in this study focuses on XML data.

The research used Noark data as a case study and developed a prototype of an object identification technique. The prototype showed good results when tested on a representative sample of Noark data.

This study is significant in taking the initiative to create awareness of data quality issues in the archive context.

Joint Master Degree in Digital Library Learning (DILL)

Høgskolen i Oslo. Avdeling for journalistikk, bibliotek- og informasjonsvitenskap
Universitetet i Tallinn
Universitetet i Parma