Thursday 13 November 2014

Good progress on database de-duplication

Over the last six months or so the network has combined a manual blitz on merging catalogue records with an automated de-duplication script run across our database.  The results of this work are really promising & so I thought I would report on them.

Providing accurate information is a little difficult because, while people were busy merging records, the last of the libraries were being added to the database.  This group of libraries added over 200,000 items.  Despite these additions, the net number of bibliographic records on the system decreased by 6,172.

In March the number of items per bibliographic record was 3.39, and at the end of October it was 3.58.  This is a good indicator that we are moving towards a cleaner database, with fewer duplicate bibliographic records.  During this period approximately 79,000 records were merged, either by the diligent work of library staff or through the automated script.
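
For anyone curious why this ratio goes up when records are merged, here is a minimal sketch.  The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not our actual database totals.  The point is that a merge removes a bib record but keeps every item, so the same items are spread over fewer records.

```python
def items_per_bib(total_items: int, total_bibs: int) -> float:
    """Average number of items attached to each bibliographic record."""
    if total_bibs <= 0:
        raise ValueError("total_bibs must be positive")
    return total_items / total_bibs


# Hypothetical counts, for illustration only (not real database figures).
items = 1000
bibs_before = 400
bibs_after = bibs_before - 50  # 50 duplicate bib records merged away

before = items_per_bib(items, bibs_before)
after = items_per_bib(items, bibs_after)

print(f"before merging: {before:.2f} items per bib record")
print(f"after merging:  {after:.2f} items per bib record")
```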

The PLS team ran a Workflows Duplicate Titles report in March & again in October.  While the report can't be 100% accurate, in March it indicated that approximately 14% of our bibliographic records were likely to be duplicates that could be merged.  The October report indicates that this figure is now down to approximately 10%.

The benefits of de-duplication are felt by all in the system.  Customers have a greater probability of getting their reserved item more quickly if all items are attached to one bib record.  It also stops customers placing holds on multiple bib records of the same title! Fewer items get shipped, which means less work for library staff at both sending and receiving libraries.  And of course less shipping & sorting by TOLL = lower costs.

While many (but not all) libraries contributed to the clean-up, I would like to particularly mention Campbelltown, the Flinders Mobile, Port Pirie, Mitcham, West Torrens and Public Library Services for their contributions. Your service on behalf of all libraries and customers is appreciated.

PLS intends to run another de-duplication project next year, because of the benefits for all.  We look forward to all libraries being prepared to contribute to this work.

As part of this next round we intend to tweak the "match points" that we use in the automated de-duplication script. We believe that we will get a higher (but still accurate) hit rate by doing this. This should increase the efficiency of this process. 

However, making changes to this script does require considerable testing to make sure that we don't incorrectly merge records.  We'll be looking for more people to undertake this testing.
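
To illustrate the trade-off, here is a minimal sketch of how match points might work in general.  This is not the script we actually run: the field names (`isbn`, `title`, `year`), the `normalise` and `find_duplicate_groups` helpers and the sample records are all made up for the example.  The idea is that two records are treated as duplicates when every chosen match point agrees after simple normalisation.

```python
from collections import defaultdict


def normalise(value):
    """Lower-case and keep only letters and digits, so that punctuation
    and capitalisation differences don't block a match."""
    return "".join(ch for ch in str(value).lower() if ch.isalnum())


def find_duplicate_groups(records, match_points):
    """Group record ids whose normalised match points are all identical."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalise(record.get(field, "")) for field in match_points)
        # Skip records missing any match point: a blank is not evidence of a match.
        if all(key):
            groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "isbn": "0-000-00000-1", "title": "The Long Road", "year": 2008},
    {"id": 2, "isbn": "0000000001", "title": "the long road.", "year": 2008},
    {"id": 3, "isbn": "0000000001", "title": "Long Road, The", "year": 2008},
]

strict = find_duplicate_groups(records, ["isbn", "title", "year"])
loose = find_duplicate_groups(records, ["isbn", "year"])

print("strict match points:", strict)
print("loose match points: ", loose)
```

In this made-up case, dropping the title from the match points picks up a third record.  That could be a genuine duplicate or a wrong merge, which is exactly why any change needs to be tested against real records before it is used.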

And although we're not in a formal de-duplication blitz phase, by all means keep working on bib record merging if you want to.  It can only be a good thing!
