diglib Archive
Date: Mon May 02 11:24:02 2005

diglib: Fwd: Re: [DIGLIB] filenaming



Another interesting viewpoint on this topic.

Carol


Date: Wed, 20 Apr 2005 09:02:30 +1200
From: "Steve Knight" <Steve.Knight@natlib.govt.nz>
Cc: <diglib@infoserv.inist.fr>
Subject: Re: [DIGLIB] filenaming

Hi Kate

We have come up with a file naming system for all the digital objects
we are storing in our file repository, currently about 40,000 objects.

The file naming scheme was developed to allow us to perform
management tasks on the objects and storage media manually,
because we lack an appropriate object management system.
However, we intend to incorporate the naming scheme into our
new object management system.

Each of our objects has a unique Internal Identifier (IID) which we
use to reference the object. This identifier is a running number.

Each object can have multiple instances, defined by a Role Code.
DO - Digital Original; only exists if the object is born digital.
PM - Preservation Master, our preservation copy of an object's files.
MM - Modified Master, a copy of a PM which has been modified by hand.
AC - Access Copy, a derivative of a PM or MM which has been
     programmatically created to support access, e.g. a thumbnail or
     low-resolution web browser copy.

There can be multiple versions of an object's instance. For example, a
web application may require a thumbnail, preview and medium-resolution
copy of an image; each of these would be an AC copy, versions 01, 02
and 03. The same goes for MM and PM instances, though only the latest
version of a PM is considered to be a preservation master.

An instance of an object may also have multiple parts. For example, a
publication may have each page as a separate TIFF file, or have separate
PDF files for each chapter, where each file is a part of that object's
instance. Recently I was challenged as to why we don't just call parts
files. This has to do with embedded objects within other bitstreams:
calling them parts allows us to reference embedded objects within other
bitstreams without getting them confused with files. We still don't know
if this is a good idea or not, so we're hedging our bets.


Our file naming scheme is as follows.

[IID]_[Role Code]_[Version]_[Part Number].[Extension]

IID = integer
Role Code = two upper case alpha chars.
Version = zero-padded integer, e.g. 02
Part number = zero-padded integer, e.g. 01
Extension = the file extension of the file, e.g. jpg, wav, tiff.

An example of a PM TIFF, version two, with one part:

12345_PM_02_01.tif
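
For anyone who wants to check names against this scheme in a script,
here is a rough Python sketch. It is not code we run at the Library; the
names ObjectFile, parse_name and build_name are made up for illustration,
and the two-digit width for Version and Part Number is assumed from the
example above.

```python
import re
from typing import NamedTuple

# Role codes listed in the message above.
ROLE_CODES = {"DO", "PM", "MM", "AC"}

# Assumption: version and part number are two digits wide, as in
# 12345_PM_02_01.tif. Adjust the pattern if wider numbers are allowed.
NAME_RE = re.compile(r"(\d+)_([A-Z]{2})_(\d{2})_(\d{2})\.([A-Za-z0-9]+)")


class ObjectFile(NamedTuple):
    iid: int        # Internal Identifier (running number)
    role: str       # two-letter Role Code
    version: int
    part: int
    extension: str


def parse_name(name: str) -> ObjectFile:
    """Split a file name into its fields, or raise ValueError."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid object file name: {name!r}")
    iid, role, version, part, ext = match.groups()
    if role not in ROLE_CODES:
        raise ValueError(f"unknown role code: {role!r}")
    return ObjectFile(int(iid), role, int(version), int(part), ext)


def build_name(f: ObjectFile) -> str:
    """Format the fields back into a file name."""
    return f"{f.iid}_{f.role}_{f.version:02d}_{f.part:02d}.{f.extension}"
```

Parsing a name and building it again should give back the same string,
which is a cheap consistency check to run over a directory listing.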


Justification for the naming scheme:
We are very cautious about embedding metadata in file names; it's
just a bad thing to do. We have had big problems with some of our
legacy systems doing this, especially when you don't take into account
the amount of data that can accumulate over 5-10 years of operation
and the effects on your systems.
Notwithstanding that, we do believe that the metadata we have embedded
in our scheme is justifiable, since we are managing these objects
manually without any application support at this time, and the type of
metadata we are embedding shouldn't impede future systems.

Descriptive or technical metadata in a file name is completely
unacceptable for us, but you must have links back to a descriptive or
Object Management System for the files to be of any real use. The
metadata has to exist somewhere, just not in your file name.

We chose to use an underscore as a data delimiter because it's an
inoffensive character to most of the file systems and operating systems
used today. Characters such as - / . all mean something to someone, but
underscores just aren't used for anything much at all.

We are currently storing our files on a Solaris UFS filesystem, so we
don't see any need to restrict ourselves to an 8.3 character naming
scheme.

These file names are not intended to be referenced by systems anyway.
We've developed a resolver system which allows us to request objects
using the metadata listed in the filename without actually using the
filename used on the filesystem. This means that we can change our file
naming scheme at any time without affecting any references to these
objects by applications, websites etc. (We're still testing the
resolver.)
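
To illustrate the idea only (this is not our resolver, which is still
being tested), a minimal Python sketch of such a lookup might look like
the following. The Resolver class, its method names and the paths used
below are all hypothetical; the index is a plain in-memory dictionary
where a real system would use a database.

```python
from typing import Dict, Optional, Tuple

# Lookup key: (IID, role code, version, part number).
Key = Tuple[int, str, int, int]


class Resolver:
    """Maps object metadata to wherever the file currently lives."""

    def __init__(self) -> None:
        self._index: Dict[Key, str] = {}

    def register(self, iid: int, role: str, version: int, part: int,
                 path: str) -> None:
        # Re-registering a key replaces the stored path, which is how a
        # change of naming scheme stays invisible to callers.
        self._index[(iid, role, version, part)] = path

    def resolve(self, iid: int, role: str, version: int,
                part: int) -> Optional[str]:
        # Returns None when nothing is registered for this key.
        return self._index.get((iid, role, version, part))

    def latest_version(self, iid: int, role: str) -> Optional[int]:
        # Highest registered version number for this object and role.
        versions = [k[2] for k in self._index if k[0] == iid and k[1] == role]
        return max(versions) if versions else None
```

Applications ask for (IID, role, version, part) and never see the name
on disk, so renaming the files only means updating the index.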

Case sensitive? We haven't made any official recommendations about
this, but I suggest that you stick to one or the other, uppercase or
lowercase. If you mix your cases then you WILL run into problems if, for
example, you export a case-sensitive Unix file system to a Windows
machine.
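
One way to catch this before an export is to look for names that differ
only by case. A small Python sketch follows; the function name
case_collisions is made up for illustration, and it simply compares
names after case folding, which approximates but does not exactly
reproduce how any particular case-insensitive file system compares
names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def case_collisions(names: Iterable[str]) -> List[List[str]]:
    """Return groups of distinct names that are equal when case is ignored."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Only groups with more than one distinct spelling are a problem.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

An empty result means no two names in the listing would clash on a
case-insensitive target.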

We are also looking at whether we can use the internal identification
number as part of any persistent identifier system that the Library
decides to proceed with.


Sorry for the overly lengthy response, I hope this helps.


Mat Black
Technical Analyst
National Digital Heritage Archive
National Library of New Zealand.


>>> "Kate Boyd" <boydkf@gwm.sc.edu> 16/04/05 06:13:23 >>>
I am interested in how people are creating filenaming schemes for their
cultural heritage digital collections.  A few questions I have are:

Is it still important to keep only 8 characters for the filename?  If
not, should there be some uniform limit for all the collections, or not?

Is it important to use only lower case alphanumeric characters?  Will
other characters like hyphens cause issues in databases and servers?

How important is it to relate the name of digital files to the actual
object being scanned?  Is this better than arbitrary consecutive
numbering of the files, so that there is overall uniformity across all
of the digital collections?

Any ideas or thoughts on filenaming for digital collections will be
greatly appreciated.  Thank you for your time on this.
Kate

Kate Foster Boyd
Digital Projects Librarian
Thomas Cooper Library
University of South Carolina
1322 Greene Street
Columbia, SC 29208
(803)-777-2249
----------------------------------------------------------------------
To post messages to DIGLIB, send messages to: diglib@infoserv.inist.fr


Manage your account, change subscription options, or visit the archive
at:

     http://infoserv.inist.fr/wwsympa.fcgi/info/diglib

DIGLIB requires that subscribers login with a password to change
their list profiles. First time users can request a password from the
page above. Any questions can be directed to one of the list
moderators.

To unsubscribe:
mailto:[conf->email]@[conf->host]?subject=sig%20[list->name]%20[user->email]

----------------------------------------------------------------------