Sunday, February 23, 2014

Data management


The official definition provided by DAMA International, the professional organization for those in the data management profession, is: "Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise." {{DAMA International}} This definition is fairly broad and encompasses a number of professions which may not have direct technical contact with lower-level aspects of data management, such as relational database management.
Alternatively, the definition provided in the DAMA Data Management Body of Knowledge (DAMA-DMBOK) is: "Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets."
The concept of "Data Management" arose in the 1980s as technology moved from sequential processing (first cards, then tape) to random access processing. Since it was now technically possible to store a single fact in a single place and access that using random access disk, those suggesting that "Data Management" was more important than "Process Management" used arguments such as "a customer's home address is stored in 75 (or some other large number) places in our computer systems." During this period, random access processing was not competitively fast, so those suggesting "Process Management" was more important than "Data Management" used batch processing time as their primary argument. As applications moved more and more into real-time, interactive applications, it became obvious to most practitioners that both management processes were important. If the data was not well defined, the data would be mis-used in applications. If the process wasn't well defined, it was impossible to meet user needs.




Corporate Data Quality Managemen

Corporate Data Quality Management (CDQM) is, according to the European Foundation for Quality Management and the Competence Center Corporate Data Quality (CC CDQ, University of St. Gallen), the whole set of activities intended to improve corporate data quality (both reactive and preventive). Main premise of CDQM is the business relevance of high-quality corporate data. CDQM comprises with following activity areas
Strategy for Corporate Data Quality: As CDQM is affected by various business drivers and requires involvement of multiple divisions in an organization; it must be considered a company-wide endeavor.
Corporate Data Quality Controlling: Effective CDQM requires compliance with standards, policies, and procedures. Compliance is monitored according to previously defined metrics and performance indicators and reported to stakeholders.
Corporate Data Quality Organization: CDQM requires clear roles and responsibilities for the use of corporate data. The CDQM organization defines tasks and privileges for decision making for CDQM.
Corporate Data Quality Processes and Methods: In order to handle corporate data properly and in a standardized way across the entire organization and to ensure corporate data quality, standard procedures and guidelines must be embedded in company’s daily processes.
Data Architecture for Corporate Data Quality: The data architecture consists of the data object model - which comprises the unambiguous definition and the conceptual model of corporate data - and the data storage and distribution architecture.
Applications for Corporate Data Quality: Software applications support the activities of Corporate Data Quality Management. Their use must be planned, monitored, managed and continuously improved.


In information technology, data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations. Data is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.

A data architecture should[neutrality is disputed] set data standards for all its data systems as a vision or a model of the eventual interactions between those data systems. Data integration, for example, should be dependent upon data architecture standards since data integration requires data interactions between two or more data systems. A data architecture, in part, describes the data structures used by a business and its computer applications software. Data architectures address data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations etc.
Essential to realizing the target state, Data Architecture describes how data is processed, stored, and utilized in an information system. It provides criteria for data processing operations so as to make it possible to design data flows and also control the flow of data in the system.
The Data Architect is typically responsible for defining the target state, aligning during development and then following up to ensure enhancements are done in the spirit of the original blueprint.
During the definition of the target state, the Data Architecture breaks a subject down to the atomic level and then builds it back up to the desired form. The Data Architect breaks the subject down by going through 3 traditional architectural processes:
Conceptual - represents all business entities.
Logical - represents the logic of how entities are related.
Physical - the realization of the data mechanisms for a specific type of functionality.
The "data" column of the Zachman Framework for enterprise architecture –
Layer View Data (What) Stakeholder
1 Scope/Contextual List of things and architectural standards[3] important to the business Planner
2 Business Model/Conceptual Semantic model or Conceptual/Enterprise Data Model Owner
3 System Model/Logical Enterprise/Logical Data Model Designer
4 Technology Model/Physical Physical Data Model Builder
5 Detailed Representations Actual databases Subcontractor
In this second, broader sense, data architecture includes a complete analysis of the relationships between an organization's functions, available technologies, and data types.
Data architecture should be defined in the planning phase of the design of a new data processing and storage system. The major types and sources of data necessary to support an enterprise should be identified in a manner that is complete, consistent, and understandable. The primary requirement at this stage is to define all of the relevant data entities, not to specify computer hardware items. A data entity is any real or abstracted thing about which an organization or individual wishes to store data.
Physical data architecture[edit]

Physical data architecture of an information system is part of a technology plan. As its name implies, the technology plan is focused on the actual tangible elements to be used in the implementation of the data architecture design. Physical data architecture encompasses database architecture. Database architecture is a schema of the actual database technology that will support the designed data architecture.
Elements of data architecture[edit]

There are certain elements that must be defined as the data architecture schema of an organization is designed. For example, the administrative structure that will be established in order to manage the data resources must be described. Also, the methodologies that will be employed to store the data must be defined. In addition, a description of the database technology to be employed must be generated, as well as a description of the processes that will manipulate the data. It is also important to design interfaces to the data by other systems, as well as a design for the infrastructure that will support common data operations (i.e. emergency procedures, data imports, data backups, external transfers of data).
Without the guidance of a properly implemented data architecture design, common data operations might be implemented in different ways, rendering it difficult to understand and control the flow of data within such systems. This sort of fragmentation is highly undesirable due to the potential increased cost, and the data disconnects involved. These sorts of difficulties may be encountered with rapidly growing enterprises and also enterprises that service different lines of business (e.g. insurance products).
Properly executed, the data architecture phase of information system planning forces an organization to specify and delineate both internal and external information flows. These are patterns that the organization may not have previously taken the time to conceptualize. It is therefore possible at this stage to identify costly information shortfalls, disconnects between departments, and disconnects between organizational systems that may not have been evident before the data architecture analysis.[4]
Constraints and influences[edit]

Various constraints and influences will have an effect on data architecture design. These include enterprise requirements, technology drivers, economics, business policies and data processing needs.
Enterprise requirements
These will generally include such elements as economical and effective system expansion, acceptable performance levels (especially system access speed), transaction reliability, and transparent management of data. In addition, the conversion of raw data such as transaction records and image files into more useful information forms through such features as data warehouses is also a common organizational requirement, since this enables managerial decision making and other organizational processes. One of the architecture techniques is the split between managing transaction data and (master) reference data. Another one is splitting data capture systems from data retrieval systems (as done in a data warehouse).
Technology drivers
These are usually suggested by the completed data architecture and database architecture designs. In addition, some technology drivers will derive from existing organizational integration frameworks and standards, organizational economics, and existing site resources (e.g. previously purchased software licensing).
Economics
These are also important factors that must be considered during the data architecture phase. It is possible that some solutions, while optimal in principle, may not be potential candidates due to their cost. External factors such as the business cycle, interest rates, market conditions, and legal considerations could all have an effect on decisions relevant to data architecture.
Business policies
Business policies that also drive data architecture design include internal organizational policies, rules of regulatory bodies, professional standards, and applicable governmental laws that can vary by applicable agency. These policies and rules will help describe the manner in which enterprise wishes to process their data.
Data processing needs
These include accurate and reproducible transactions performed in high volumes, data warehousing for the support of management information systems (and potential data mining), repetitive periodic reporting, ad hoc reporting, and support of various organizational initiatives as required (i.e. annual budgets, new product development).




Data access



Data  access typically refers to software and activities related to storing,  retrieving, or acting on data housed in a database or other repository.  There are two types of data access, sequential access and random access.
Data Access is simply the authorization you have to access  different data files. Data access can help distinguish the abilities of  Administrators and users. E.g. Admins may be able to remove, edit and  add data, while a general user may not be able as they don’t have the  access to that particular file.

Historically, different methods and languages were required for every repository, including each different database, file system, etc., and many of these repositories stored their content in different and incompatible formats.

In more recent days, standardized languages, methods, and formats, have been created to serve as interfaces between the often proprietary, and always idiosyncratic, specific languages and methods. Such standards include SQL, ODBC, JDBC, XQJ, ADO.NET, XML, X Query, X Path, and Web Services.
Some of these standards enable translation of data from unstructured (such as HTML or free-text files) to structured (such as XML or SOL).




Data quality


Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. Furthermore, apart from these definitions, as data volume increases, the question of internal consistency within data becomes paramount, regardless of fitness for use for any external purpose, e.g. a person's age and birth date may conflict within different parts of a database. The first views can often be in disagreement, even about the same set of data used for the same purpose. This article discusses the concept as it related to business data processing, although of course other data have various quality issues as well.

Definitions

This list is taken from the online book "Data Quality: High-impact Strategies". See also the Glossary of data quality terms 
Degree of excellence exhibited by the data in relation to the portrayal of the actual scenario.
The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.
The totality of features and characteristics of data that bears on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data.
The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.
Complete, standards based, consistent, accurate and time stamped.






No comments:

Post a Comment