Login   |   ITaP Home > Business Technology > IRM > Data Standards

Database

Introduction
This document establishes standards for Oracle databases developed for Purdue University's Administrative Computing environment. It has been determined that, with the open environment offered by the Oracle client/server architecture and current LAN configurations, adherence to common standards of data quality, data integrity, and data naming conventions has become much more critical. Anyone wishing more information on this need may request the Data Quality and Integrity: Information Technology's Responsibility document from Data Administration.

Terminology
Any questions regarding terminology used in this document should be answered in the glossary at the following web site. If any terms are missing, please refer the question to Data Administration for inclusion in the glossary.

Goals
The goals of Data Administration this document is intended to help attain are as follows:

  1. Encourage recognition of all administrative data as a Purdue University asset.
  2. Promote Data and Database consistency throughout Administrative Computing.
  3. Encourage interaction between different Administrative Computing areas.
  4. Promote the viewing of individual databases from a University perspective.
  5. Ensure and improve data accuracy and data integrity.
  6. Assist with logical design of new databases.
  7. Promote access to centralized core university data.

Audience
Given the aforementioned goals, the intended audience for this document is the DCM's, Data Stewards, Project Managers, and Application Developers in all of Administrative Computing. Since these standards seek a University-wide view of the University's data, the audience must include data representatives from all departments in the Administrative Computing area.

Scope
While it is preferable that all applications in Administrative Computing which utilize a relational database adhere to the following standards, this is not always possible. Most purchased applications will not adhere to all of these standards, and it should not be anticipated that these applications will ever be brought into conformance. In addition, there are applications that we have already developed which do not adhere to these standards. However, lessons learned from these applications have been helpful in defining these standards.

Therefore, these standards apply to new, in-house developed databases. This includes Online Transaction Processing (OLTP), Operational Data Stores (ODS), and Data Warehouse databases, even ODS databases being developed for a purchased or existing OLTP databases.

Authority
The central authority for these standards will be Data Administration in coordination with the Data Stewards.

Exceptions
It is recognized that business needs may require exceptions or modifications to these standards. The basic rule will be, "Bend, but don't break." The quality of the data we provide cannot be compromised. However, where business needs dictate, exceptions to these standards are possible and will be handled on a case-by-case basis. Requests for exceptions must be coordinated with and approved by the Data Stewards and with Data Administration so that an appropriate approval decision may be made. All requests for exceptions must be well documented.

As we move further into the relational database arena, modifications, additions, and/or subtractions from these standards may be necessary. Requests or recommendations for such items should be passed to Data Administration for consideration.

Modeling
For OLTP applications, databases must be normalized to at least 3rd Normal Form. If necessary, for performance or well-defined business purposes, certain relationships within an application database may be denormalized, but these situations must be carefully reviewed by Database Administration and Data Administration.

Data Warehouse models must continue to follow the Star Join methodology. As this is a common repository, exceptions will likely not be made.

For ODS databases, flexibility will be in order. The first priority of an ODS is that it fulfills specific operational needs of a specific business community. Different business areas may require different architectures. For instance, if access to the database will generally be via ad hoc queries from a diverse user base, a Star Join topology may be appropriate. If the access is generally via canned reports or from a smaller number of well-trained users, a normalized topology may be more appropriate. In either case, however, the standards dictating quality, integrity, and naming must still apply.

The distinction of an ODS supporting a specific business need as opposed to a specific business application must also be made. An ODS database, by definition of supporting a business needs, may certainly contain information from one or more than one external database. An ODS should not be thought of as simply a mirrored image of one, specific OLTP!

Naming/Sizing Conventions
Consistent usage of names and data types in an open database environment is important for user operability as well as administrative ease. Where users are free to run ad-hoc queries against one or multiple databases, the users should be able to expect consistent implementation of names and data types. Names and data types should be consistent both between and within databases.

Between databases, it is expected that certain core data elements will be named and sized the same between databases, regardless of the form (OLTP, ODS, Data Warehouse). Core data is defined as data which is shared between multiple databases and/or departments within the university. The Data Stewards will have the ultimate authority to determine what is considered to be core data. Listed below are initial examples of Core Data Elements, their Oracle Data Type and Size, and their Descriptions. This listing is not considered to be complete. It was arrived at after a brief review of the current major data structures. This listing may be considered a starting point for the Data Stewards in the development of a final listing.

Core Data Element Oracle Data Type and Size Description Example
ACADEMIC_SESSION_CODE VARCHAR2(1) A code indicating one of Purdue's academic sessions. 1, 2, S
ACADEMIC_YEAR VARCHAR2(7) A Purdue academic year. 1997-98
ACCOUNT_NAME VARCHAR2(40) The name of an account.  
ACCOUNT_NUMBER VARCHAR2(13) The number of an account. 010 1061-0000
BIRTH_DATE DATE A person's date of birth. 25-DEC-1997
CAMPUS_CODE VARCHAR2(1) The code associated with a Purdue University campus.  
CAMPUS_NAME VARCHAR2(26) The name of a Purdue University campus.  
COURSE_NUMBER VARCHAR2(4) The number associated with a course.  
DEPARTMENT_CODE VARCHAR2(4) The code associated with a Purdue University department.  
DEPARTMENT_NAME VARCHAR2(40) The name or title of a Purdue University department.  
DEPARTMENT_REFERENCE_CODE VARCHAR2(5) A departmental reference number/code.  
ETHNIC_GROUP VARCHAR2(35) The ethnic group of which a person is a part.  
FISCAL_YEAR VARCHAR2(7) A Purdue fiscal year. 1997-98
FUND VARCHAR2(3) An accounting fund.  
GENDER VARCHAR2(12) A person's gender. Female, Male, Not Reported
MARITAL_STATUS VARCHAR2(10) A person's marital status.  
NAME_FIRST VARCHAR2(35) The first name of a person.  
NAME_LAST VARCHAR2(35) The last name of a person.  
NAME_MIDDLE VARCHAR2(35) The middle name of a person.  
NAME_SUFFIX VARCHAR2(10) A person's suffix, such as Jr., Sr., Esq., etc.  
NAME_TITLE VARCHAR2(??) A person's title, such as Mr., Mrs., Major, etc.  
PROJECT VARCHAR2(4) An accounting project.  
RESIDENCE_CODE VARCHAR2(3) The code describing a person's geographical area of residence.  
SUBJECT_CODE VARCHAR2(4) The code associated with a course's subject.  
SUBJECT_NAME VARCHAR2(30) The name of a course's subject.  
VETERAN_FLAG VARCHAR2(3) Indicates if a person is a US military veteran or not.  
VISA_TYPE VARCHAR2(27) The type of visa possessed by a person.  
xxxxx_CITY VARCHAR2(25) The city name of a person's particular address where the "xxxxx" is replaced with the type of the address. For instance, HOME_CITY or LOCAL_CITY.  
xxxxx_COUNTRY VARCHAR2(25) The country of a person's particular address where the "xxxxx" is replaced with the type of the address. For instance, HOME_COUNTRY or BUSINESS_COUNTRY.  
xxxxx_STATE_CODE VARCHAR2(2) The state code of a person's particular address where the "xxxxx" is replaced with the type of the address. For instance, HOME_STATE_CODE or LOCAL_STATE_CODE.  
xxxxx_STREET1 VARCHAR2(25) The first line of a person's particular street address where the "xxxxx" is replaced with the type of the address. For instance, HOME_ADDRESS1 or LOCAL_ADDRESS1.  
xxxxx_STREET2 VARCHAR2(25) The second line of a person's particular street address where the "xxxxx" is replaced with the type of the address. For instance, HOME_ADDRESS2 or LOCAL_ADDRESS2.  
xxxxx_ZIP_CODE VARCHAR2(10) The zip code of a person's particular address where the "xxxxx" is replaced with the type of the address. For instance, HOME_ZIP_CODE or LOCAL_ZIP_CODE.  

 

Additional Naming and Sizing Standards

For alphanumeric fields, the VARCHAR2 data type should always be used instead of the CHAR datatype.

Oracle's maximum length for field names is 30 characters.

All fields which contain a flag (such as "YES/NO" fields) must have the word, "FLAG," appended to their name.

All fields which contain a code (such as STATE_CODE, CAMPUS_CODE, etc.) must have the word, "CODE" appended to their name.

Fields which are not a flag or a code field should not have the associated word appended to their title.

Field names, which are not listed in the above table, should be consistent within an application.

Every effort should be made to make field names as descriptive as possible. This is especially important in an ad hoc database.

Data Integrity
This term addresses data issues in the environment of relational and distributed databases. This has less to do with the content of data within a database as it does with how that data "relates" with data elsewhere in the same database or in another database. It is intertwined with Data Quality.

In a simple example, if a Student's gender is listed as "FEMALE," then "FEMALE" should be one of the allowable values for gender. No value for gender should be allowed that is not in the list of acceptable values.

In a more important example, if a contributor wishes to give money to a specific account, that account must be contained within the list of acceptable accounts. In an example between databases, if a major contributor's child enrolls at the university, the individual's listed as the student's parents in the Student database should be easily matchable to the individuals in the Donor database.

The surest way of enforcing relational integrity is to apply constraints at the server. These constraints will provide quality assurance to all data which enters the database, regardless of the data's source. However, some purchased applications enforce integrity on the "client side," using the application software to enforce the relationships. This provides much greater opportunity for violations of the database's relational integrity. Not only could parts of the application fail to adequately check all of the incoming data, but data which is loaded "via the backdoor" (such as during migration of legacy data) could easily compromise the integrity of the database.

It may be possible that some purchased applications cannot be altered to enforce integrity constraints at the server level. Attempting to enforce constraints across the board could have unknown impacts upon the application software. However, it is necessary to ensure and enforce data integrity at some point so any pieces of data which violate the integrity are not passed along to other applications or to end-users.

Therefore, the following specific standards apply to the enforcement of data integrity:

OLTP application database must be periodically validated to ensure that they are enforcing data integrity.

Likewise, whenever data leaves a purchased OLTP application, it must be checked for data integrity. This is regardless of whether it's heading for an ODS, a Data Warehouse, or some sort of reporting database. This referential integrity must be enforced either through thorough scripting or through the activation of database level constraints. This is critically important where the ODS provides data to another OLTP application.

The presence of spaces or "blanks," in the place of nulls, represent a threat to referential integrity and cannot be spread beyond purchased OLTP applications to an ODS, Data Warehouse, or other OLTP application! Spaces/Blanks must be replaced with nulls when data leaves a purchased OLTP which contains spaces/blanks.

OLTP's developed in-house must comply with the null standard.

Spaces/Blanks must also be trimmed from the end of all alphanumeric fields where spaces/blanks are or may be present.

Data Quality
Data Quality answers the question, "How good is our data?" Do we have accurate information about each student, each employee, each alumnus, and each donor? Is the information about classes correct? Are all addresses correct? Are all grades correct? Do we have all information correctly and completely filled in for each person? There are a number of questions we could ask, and the list would vary by database, but the basic questions are "Is the data we filled in correct?" and "Should the data we're missing be filled in?"

Data Quality will be more difficult to enforce than Data Integrity. It will require the involvement of individuals and/or Data Stewards who know the data to check the data's accuracy.

Metadata
Metadata is data which describes our data. Capturing and maintaining metadata is critical to the maintenance of our database applications. Specific standards and recommendations regarding metadata will be set forth in a separate document, but metadata must be captured for all Administrative Computing databases.