The data architecture describes how data is created, processed, stored, distributed, and managed by a business and/or its applications. In other words, it should define an end-to-end vision of how data flows from source to target to users. A documented understanding of the enterprise data architecture is an essential prerequisite to many common IS and business improvement initiatives.
The data architecture has many uses. It helps to get a handle on data as it is really used by the business, and it is a key artifact if one wants to develop and implement governance supporting a data strategy. It also helps to guide cross-system developments such as Enterprise Application Integration (EAI), common reporting, and data warehousing initiatives.
The data architecture is never complete, so the framework should be designed from the outset to be scalable and flexible.
The following are the key stages, or phases, of a good data architecture:
- Organize
- Move
- Store
- Access
- Present
Organize
Identify the sources from which data is collected. It is important to identify the actual source systems and to break the data down to the atomic level so that it can be used or integrated in a meaningful way. Consider reworking or reformatting the original data into the future state required by the business. This effort can be time-consuming, depending on the state of the original data and the new requirements.
Develop a data model (conceptual, logical, and physical) that identifies existing and new entities, attributes, and relationships. Define the metadata and a data dictionary.
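As a rough illustration of what a data dictionary derived from a logical model might look like, here is a minimal Python sketch. The `Attribute` and `Entity` classes, the `build_data_dictionary` helper, and the `customer` entity are all hypothetical names chosen for this example; a real project would more likely use a modeling tool or a metadata repository.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    description: str
    nullable: bool = True


@dataclass
class Entity:
    name: str
    description: str
    attributes: list = field(default_factory=list)
    # Relationships as (related entity name, cardinality) pairs.
    relationships: list = field(default_factory=list)


def build_data_dictionary(entities):
    """Flatten entities into dictionary entries keyed by 'entity.attribute'."""
    dictionary = {}
    for entity in entities:
        for attr in entity.attributes:
            dictionary[f"{entity.name}.{attr.name}"] = {
                "type": attr.data_type,
                "nullable": attr.nullable,
                "description": attr.description,
            }
    return dictionary


# Hypothetical example entity, not taken from any real model.
customer = Entity(
    name="customer",
    description="A party that places orders.",
    attributes=[
        Attribute("customer_id", "integer", "Surrogate key", nullable=False),
        Attribute("email", "varchar(255)", "Primary contact address"),
    ],
    relationships=[("order", "one-to-many")],
)

data_dictionary = build_data_dictionary([customer])
```

Keeping entity definitions and the dictionary in one structure means the documentation cannot drift away from the model it describes.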
Move
Identify the method and technology for moving data from the source to the new target. This involves choosing a tool that will carry out extraction, transformation, and loading (ETL). Develop business rules and a framework for integrating the data. The framework needs to cover error handling as well. Develop a process and methodology to ensure the validity of the data that is moved to the target.
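The steps above (extract, transform, load, handle errors, validate) can be sketched in a few lines of Python using only the standard library. This is an illustrative toy, not a substitute for an ETL tool: the `orders` table, the inline CSV sample, and the rule that an amount must be a non-negative number are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the second row is deliberately invalid.
SOURCE_CSV = """order_id,amount
1,19.99
2,not-a-number
3,5.00
"""


def extract(text):
    """Read source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(row):
    """Apply an assumed business rule: amount must be a non-negative number."""
    amount = float(row["amount"])  # raises ValueError on malformed input
    if amount < 0:
        raise ValueError("negative amount")
    return int(row["order_id"]), amount


def run_etl(text, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    rows = extract(text)
    loaded, rejected = 0, []
    for row in rows:
        try:
            record = transform(row)
        except (ValueError, TypeError, KeyError) as exc:
            # Error handling: keep the bad row and the reason, then continue.
            rejected.append((row, str(exc)))
            continue
        conn.execute("INSERT INTO orders VALUES (?, ?)", record)
        loaded += 1
    conn.commit()
    # Validation: every source row is accounted for and the target count matches.
    (target_count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    reconciled = target_count == loaded and loaded + len(rejected) == len(rows)
    return {"loaded": loaded, "rejected": rejected, "reconciled": reconciled}


result = run_etl(SOURCE_CSV, sqlite3.connect(":memory:"))
```

Rejected rows are collected rather than raised so that one bad record does not abort the whole load, and the final reconciliation check gives a simple answer to whether the move can be trusted.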
Store
Identify a database platform that meets business and technology criteria. Create the database based on the physical model. Ensure the database is sized to accommodate future growth. Pay special attention to performance: data loading, retrieval, and reporting. Develop a data retention and archival strategy. Develop a process to capture data changes and audit them.
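One common way to capture and audit data changes is a database trigger that writes to an audit table. The sketch below uses SQLite through Python's `sqlite3` module purely for illustration; the `customer` and `customer_audit` tables and the trigger name are hypothetical, and other platforms offer their own change-capture mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT);

    CREATE TABLE customer_audit (
        audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER,
        old_email   TEXT,
        new_email   TEXT,
        changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Record a row only when the email value actually changes.
    CREATE TRIGGER customer_email_audit
    AFTER UPDATE OF email ON customer
    FOR EACH ROW
    WHEN OLD.email IS NOT NEW.email
    BEGIN
        INSERT INTO customer_audit (customer_id, old_email, new_email)
        VALUES (OLD.id, OLD.email, NEW.email);
    END;
    """
)

conn.execute("INSERT INTO customer (id, email) VALUES (1, 'a@example.com')")
conn.execute("UPDATE customer SET email = 'b@example.com' WHERE id = 1")
# Same value again: the WHEN clause means nothing is audited.
conn.execute("UPDATE customer SET email = 'b@example.com' WHERE id = 1")

audit_rows = conn.execute(
    "SELECT customer_id, old_email, new_email FROM customer_audit"
).fetchall()
```

Putting the capture logic in the database means changes are audited no matter which application makes them.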
Develop policies for data management in each business area:
- What data is stored.
- Who is responsible for its collection and quality.
- Who controls it, and who administers it.
- How long it must be stored, and how it will be disposed of or archived afterwards.
- Who may have access to it, and how it should be disclosed to others outside the normal user groups.
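The policy questions above can be recorded as structured data so that they can be checked automatically. The following is a minimal sketch; the `DataPolicy` fields, the `retention_action` helper, and the sample values are hypothetical and only mirror the bullet points.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataPolicy:
    business_area: str
    data_stored: str          # what data is stored
    quality_owner: str        # who is responsible for collection and quality
    controller: str           # who controls it
    administrator: str        # who administers it
    retention_days: int       # how long it must be stored
    disposal: str             # "archive" or "delete" once retention ends
    allowed_roles: frozenset  # who may have access to it


def retention_action(policy, created_on, today):
    """Return 'retain' while inside the retention period, else the disposal action."""
    expires_on = created_on + timedelta(days=policy.retention_days)
    return policy.disposal if today >= expires_on else "retain"


# Hypothetical policy for illustration only.
sales_policy = DataPolicy(
    business_area="sales",
    data_stored="order history",
    quality_owner="sales operations",
    controller="head of sales",
    administrator="database team",
    retention_days=365,
    disposal="archive",
    allowed_roles=frozenset({"analyst", "steward"}),
)

old_record = retention_action(sales_policy, date(2020, 1, 1), date(2021, 6, 1))
new_record = retention_action(sales_policy, date(2021, 1, 1), date(2021, 6, 1))
```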
Access
Identify the platform through which the data is accessed: web (intranet, extranet), desktop, etc. Develop a security model that identifies the users who will access the data and their rights. Take firewalls and other security software into account when data is accessed externally.
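A security model of this kind is often expressed as users mapped to roles and roles mapped to rights. The sketch below shows the idea in Python; the user names, role names, and the `sales_mart` dataset are invented for the example, and a production system would normally rely on the database's or platform's own access controls.

```python
# Rights granted to each role, as (dataset, action) pairs. Hypothetical values.
ROLE_RIGHTS = {
    "analyst": {("sales_mart", "read")},
    "steward": {("sales_mart", "read"), ("sales_mart", "write")},
}

# Roles held by each user. Hypothetical values.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"steward"},
}


def is_allowed(user, dataset, action):
    """Return True if any of the user's roles grants the action on the dataset."""
    return any(
        (dataset, action) in ROLE_RIGHTS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Unknown users and unknown roles fall through to an empty set, so access is denied by default rather than granted by accident.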
Develop a semantic layer that sits between business users and the database, so that they do not access it directly, and that incorporates some of the reporting metrics and rules.
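In its simplest form, a semantic layer is a mapping from business terms to the expressions that compute them, so users ask for a metric by name instead of writing SQL. The following sketch assumes a hypothetical `orders` table with `region` and `amount` columns; the metric names and the `metric_query` helper are illustrative, not part of any particular tool.

```python
import sqlite3

# Business metric name -> SQL expression (assumed definitions).
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
    "average_order_value": "AVG(amount)",
}

# Business dimension name -> physical column.
DIMENSIONS = {"region": "region"}


def metric_query(metric, dimension=None):
    """Build SQL for a named metric, optionally grouped by a named dimension.

    Only names present in METRICS and DIMENSIONS are accepted, so user input
    never reaches the SQL text directly; unknown names raise KeyError.
    """
    expr = METRICS[metric]
    if dimension is None:
        return f"SELECT {expr} AS {metric} FROM orders"
    col = DIMENSIONS[dimension]
    return f"SELECT {col}, {expr} AS {metric} FROM orders GROUP BY {col} ORDER BY {col}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)

revenue_by_region = conn.execute(metric_query("total_revenue", "region")).fetchall()
order_count = conn.execute(metric_query("order_count")).fetchall()
```

Because each metric is defined once, every report that uses "total revenue" computes it the same way.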
Present
Select a suitable presentation tool that satisfies the business needs and meets the technology challenges. Define the presentation layer metrics and layout. Develop a strategy for running the reports. When possible, schedule them to minimize the impact on network traffic and the load on the database.
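As a small illustration of scheduling reports away from peak hours, the function below computes the next run time for a fixed off-peak slot. The 02:00 default and the `next_off_peak_run` name are assumptions for this sketch; in practice the slot would come from observed load patterns, and a scheduler or the reporting tool itself would trigger the run.

```python
from datetime import datetime, time, timedelta


def next_off_peak_run(now, run_at=time(2, 0)):
    """Return the next datetime at which the off-peak slot occurs after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


afternoon_request = next_off_peak_run(datetime(2024, 1, 15, 14, 30))
early_request = next_off_peak_run(datetime(2024, 1, 15, 1, 0))
```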