Login   |   ITaP Home > Business Technology > IRM > Data Standards

 

Data Integrity

This term addresses data issues in the environment of relational and distributed databases. This has less to do with the content of data within a database as it does with how that data "relates" with data elsewhere in the same database or in another database. It is intertwined with Data Quality.

In a simple example, if a Student's gender is listed as "FEMALE," then "FEMALE" should be one of the allowable values for gender. No value for gender should be allowed that is not in the list of acceptable values.

In a more important example, if a contributor wishes to give money to a specific account, that account must be contained within the list of acceptable accounts. In an example between databases, if a major contributor's child enrolls at the university, the individual's listed as the student's parents in the Student database should be easily matchable to the individuals in the Donor database.

The surest way of enforcing relational integrity is to apply constraints at the server. These constraints will provide quality assurance to all data which enters the database, regardless of the data's source. However, some purchased applications enforce integrity on the "client side," using the application software to enforce the relationships. This provides much greater opportunity for violations of the database's relational integrity. Not only could parts of the application fail to adequately check all of the incoming data, but data which is loaded "via the backdoor" (such as during migration of legacy data) could easily compromise the integrity of the database.

It may be possible that some purchased applications cannot be altered to enforce integrity constraints at the server level. Attempting to enforce constraints across the board could have unknown impacts upon the application software. However, it is necessary to ensure and enforce data integrity at some point so any pieces of data which violate the integrity are not passed along to other applications or to end-users.

Therefore, the following specific standards apply to the enforcement of data integrity:

  • OLTP application database must be periodically validated to ensure that they are enforcing data integrity.
  • Likewise, whenever data leaves a purchased OLTP application, it must be checked for data integrity. This is regardless of whether it's heading for an ODS, a Data Warehouse, or some sort of reporting database. This referential integrity must be enforced either through thorough scripting or through the activation of database level constraints. This is critically important where the ODS provides data to another OLTP application.
  • The presence of spaces or "blanks," in the place of nulls, represent a threat to referential integrity and cannot be spread beyond purchased OLTP applications to an ODS, Data Warehouse, or other OLTP application! Spaces/Blanks must be replaced with nulls when data leaves a purchased OLTP which contains spaces/blanks.
  • OLTP's developed in-house must comply with the null standard.
  • Spaces/Blanks must also be trimmed from the end of all alphanumeric fields where spaces/blanks are or may be present.