Database
Introduction
This document establishes standards for Oracle databases developed for
Purdue University's Administrative Computing environment. It has been
determined that, with the open environment offered by the Oracle client/server
architecture and current LAN configurations, adherence to common standards
of data quality, data integrity, and data naming conventions has become
much more critical. Anyone wishing more information on this need may request
the Data Quality and Integrity: Information Technology's
Responsibility document from Data Administration.
Terminology
Any questions regarding terminology used in this document should be answered
in the glossary at the following web site.
If any terms are missing, please refer the question to Data Administration
for inclusion in the glossary.
Goals
The goals of Data Administration this document is intended to help attain
are as follows:
Audience
Given the aforementioned goals, the intended audience for this document
is the DCM's, Data Stewards, Project Managers, and Application Developers
in all of Administrative Computing. Since these standards seek a University-wide
view of the University's data, the audience must include data representatives
from all departments in the Administrative Computing area.
Scope
While it is preferable that all applications in Administrative Computing
which utilize a relational database adhere to the following standards,
this is not always possible. Most purchased applications will not adhere
to all of these standards, and it should not be anticipated that these
applications will ever be brought into conformance. In addition, there
are applications that we have already developed which do not adhere to
these standards. However, lessons learned from these applications have
been helpful in defining these standards.
Therefore, these standards apply to new, in-house developed databases. This includes Online Transaction Processing (OLTP), Operational Data Stores (ODS), and Data Warehouse databases, even ODS databases being developed for a purchased or existing OLTP databases.
Authority
The central authority for these standards will be Data Administration
in coordination with the Data Stewards.
Exceptions
It is recognized that business needs may require exceptions or modifications
to these standards. The basic rule will be, "Bend, but don't break."
The quality of the data we provide cannot be compromised. However, where
business needs dictate, exceptions to these standards are possible and
will be handled on a case-by-case basis. Requests for exceptions must
be coordinated with and approved by the Data Stewards and with Data Administration
so that an appropriate approval decision may be made. All requests for
exceptions must be well documented.
As we move further into the relational database arena, modifications, additions, and/or subtractions from these standards may be necessary. Requests or recommendations for such items should be passed to Data Administration for consideration.
Modeling
For OLTP applications, databases must be normalized to at least 3rd
Normal Form. If necessary, for performance or well-defined business purposes,
certain relationships within an application database may be denormalized,
but these situations must be carefully reviewed by Database Administration
and Data Administration.
Data Warehouse models must continue to follow the Star Join methodology. As this is a common repository, exceptions will likely not be made.
For ODS databases, flexibility will be in order. The first priority of an ODS is that it fulfills specific operational needs of a specific business community. Different business areas may require different architectures. For instance, if access to the database will generally be via ad hoc queries from a diverse user base, a Star Join topology may be appropriate. If the access is generally via canned reports or from a smaller number of well-trained users, a normalized topology may be more appropriate. In either case, however, the standards dictating quality, integrity, and naming must still apply.
The distinction of an ODS supporting a specific business need as opposed to a specific business application must also be made. An ODS database, by definition of supporting a business needs, may certainly contain information from one or more than one external database. An ODS should not be thought of as simply a mirrored image of one, specific OLTP!
Naming/Sizing Conventions
Consistent usage of names and data types in an open database environment
is important for user operability as well as administrative ease. Where
users are free to run ad-hoc queries against one or multiple databases,
the users should be able to expect consistent implementation of names
and data types. Names and data types should be consistent both between
and within databases.
Between databases, it is expected that certain core data elements will be named and sized the same between databases, regardless of the form (OLTP, ODS, Data Warehouse). Core data is defined as data which is shared between multiple databases and/or departments within the university. The Data Stewards will have the ultimate authority to determine what is considered to be core data. Listed below are initial examples of Core Data Elements, their Oracle Data Type and Size, and their Descriptions. This listing is not considered to be complete. It was arrived at after a brief review of the current major data structures. This listing may be considered a starting point for the Data Stewards in the development of a final listing.
|
Additional Naming and Sizing Standards
For alphanumeric fields, the VARCHAR2 data type should always be used instead of the CHAR datatype.
Oracle's maximum length for field names is 30 characters.
All fields which contain a flag (such as "YES/NO" fields) must have the word, "FLAG," appended to their name.
All fields which contain a code (such as STATE_CODE, CAMPUS_CODE, etc.) must have the word, "CODE" appended to their name.
Fields which are not a flag or a code field should not have the associated word appended to their title.
Field names, which are not listed in the above table, should be consistent within an application.
Every effort should be made to make field names as descriptive as possible. This is especially important in an ad hoc database.
Data Integrity
This term addresses data issues in the environment of relational and distributed
databases. This has less to do with the content of data within a database
as it does with how that data "relates" with data elsewhere
in the same database or in another database. It is intertwined with Data
Quality.
In a simple example, if a Student's gender is listed as "FEMALE," then "FEMALE" should be one of the allowable values for gender. No value for gender should be allowed that is not in the list of acceptable values.
In a more important example, if a contributor wishes to give money to a specific account, that account must be contained within the list of acceptable accounts. In an example between databases, if a major contributor's child enrolls at the university, the individual's listed as the student's parents in the Student database should be easily matchable to the individuals in the Donor database.
The surest way of enforcing relational integrity is to apply constraints at the server. These constraints will provide quality assurance to all data which enters the database, regardless of the data's source. However, some purchased applications enforce integrity on the "client side," using the application software to enforce the relationships. This provides much greater opportunity for violations of the database's relational integrity. Not only could parts of the application fail to adequately check all of the incoming data, but data which is loaded "via the backdoor" (such as during migration of legacy data) could easily compromise the integrity of the database.
It may be possible that some purchased applications cannot be altered to enforce integrity constraints at the server level. Attempting to enforce constraints across the board could have unknown impacts upon the application software. However, it is necessary to ensure and enforce data integrity at some point so any pieces of data which violate the integrity are not passed along to other applications or to end-users.
Therefore, the following specific standards apply to the enforcement of data integrity:
OLTP application database must be periodically validated to ensure that they are enforcing data integrity.
Likewise, whenever data leaves a purchased OLTP application, it must be checked for data integrity. This is regardless of whether it's heading for an ODS, a Data Warehouse, or some sort of reporting database. This referential integrity must be enforced either through thorough scripting or through the activation of database level constraints. This is critically important where the ODS provides data to another OLTP application.
The presence of spaces or "blanks," in the place of nulls, represent a threat to referential integrity and cannot be spread beyond purchased OLTP applications to an ODS, Data Warehouse, or other OLTP application! Spaces/Blanks must be replaced with nulls when data leaves a purchased OLTP which contains spaces/blanks.
OLTP's developed in-house must comply with the null standard.
Spaces/Blanks must also be trimmed from the end of all alphanumeric fields where spaces/blanks are or may be present.
Data Quality
Data Quality answers the question, "How good is our data?" Do
we have accurate information about each student, each employee, each alumnus,
and each donor? Is the information about classes correct? Are all addresses
correct? Are all grades correct? Do we have all information correctly
and completely filled in for each person? There are a number of questions
we could ask, and the list would vary by database, but the basic questions
are "Is the data we filled in correct?" and "Should the
data we're missing be filled in?"
Data Quality will be more difficult to enforce than Data Integrity. It will require the involvement of individuals and/or Data Stewards who know the data to check the data's accuracy.
Metadata
Metadata is data which describes our data. Capturing and maintaining metadata
is critical to the maintenance of our database applications. Specific
standards and recommendations regarding metadata will be set forth in
a separate document, but metadata must be captured for all Administrative
Computing databases.