

INF-3701 Eksamensoppgaver

V2010

Task 1

a.

Describe the problem dimensions of information system integration: distribution, heterogeneity and autonomy. Illustrate.

What does it mean that components of a distributed system are autonomous?
What does it mean that components of a distributed system are heterogeneous?


Systems integration is when participating systems are assimilated into a larger whole.

The problem dimensions related to information system integration are distribution, heterogeneity and autonomy.

A distributed system is a collection of independent computers that communicate through a computer network and act as a single system.

Some of the problems related to distribution are data access over the network, inconsistent data replication and availability of data. Distribution can be "hidden" through the use of proxy servers and remote procedure calls (RPCs).

A heterogeneous system is one where two or more participants run on different operating systems, use different data models, different management systems, etc.

Issues within heterogeneous systems include differing data models, schemas and types of data. Solutions to these problems include mappings and schema integration, and well-defined standards are useful for defining the meaning of information to be shared among dissimilar organizations. Wrappers that provide unified interfaces are an established technique for integrating legacy systems.

A fully autonomous system is a self-governing system, where the system is isolated and only follows its own policies.


The possibility of reducing autonomy is highly limited. The fact that these systems are architecturally or structurally different makes it difficult to solve integration issues while preserving their autonomous nature. One solution, as with multidatabase systems, is to use some kind of mediator (or middleware) on top of the collection of autonomous systems.

b.

In a Distributed system, integration of autonomous and heterogeneous source systems (for instance database systems) is needed.

During integration, a number of different integration conflicts can occur.

Describe at least two different integration conflicts, and possible solutions to the problems.



The conflicts related to interoperability occur at two levels: the schema level and the data level. Schema-level conflicts are naming, structural, constraint and classification conflicts; data-level conflicts are identification conflicts, representational conflicts and data errors.

Within naming conflicts you have homonyms, where the same name is used for different concepts; introducing prefixes to distinguish the names can solve this. Synonyms are when different names are used for the same concept; introducing mappings to a common name can solve this. When it comes to structural conflicts you can have different, non-corresponding attributes, where the solution can be to create a relation with the union of the attributes. A mapping function can solve the issue of differing data types. An example of different data model constructs is integrating object-oriented (attribute) data models with relational data models, which requires higher-order mappings to be resolved.
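A minimal sketch of the two naming-conflict resolutions in Python; the attribute names, the synonym table and the "HR" source system are made-up examples, not from any real schema:

```python
# Sketch: resolving naming conflicts when merging two source schemas.
# Attribute names and the mapping table are hypothetical examples.

# Synonyms: different names for the same concept, mapped to a common name.
SYNONYM_MAP = {"SSN": "ssnum", "Ssnum": "ssnum", "phone_no": "phone"}

def resolve_attribute(source: str, name: str, homonyms: set[str]) -> str:
    # Homonyms: the same name used for different concepts in different
    # sources, disambiguated by prefixing with the source system's name.
    if name in homonyms:
        return f"{source}.{name}"
    return SYNONYM_MAP.get(name, name)   # synonym: map to common name

record = {"SSN": "123", "address": "Main St 1"}
# "address" means postal address in HR but IP address elsewhere -> homonym
unified = {resolve_attribute("HR", k, {"address"}): v for k, v in record.items()}
print(unified)   # {'ssnum': '123', 'HR.address': 'Main St 1'}
```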

c.

An Integrated system can also have components that are mobile.

Describe some specific data management problems/challenges that are consequences of having mobile devices in such an environment.

Describe solutions to the problems.



When utilizing mobile components within integrated systems there are major issues concerning data consistency and availability. These issues can be addressed by using short-lived ACID transactions with strict rules for timeouts and broken connections.
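A sketch of such a short-lived transaction in Python; the in-memory "database", the staging scheme and the timeout value are illustrative assumptions:

```python
import time

# Sketch: a short-lived transaction guarded by a strict deadline, so a mobile
# client with a broken connection cannot hold the data hostage indefinitely.

def run_transaction(db: dict, updates: dict, timeout: float = 0.5) -> None:
    deadline = time.monotonic() + timeout
    staged = {}                       # stage changes; commit is all-or-nothing
    for key, value in updates.items():
        if time.monotonic() > deadline:
            raise TimeoutError("transaction aborted: deadline exceeded")
        staged[key] = value
    db.update(staged)                 # commit the staged changes

db = {"balance": 100}
run_transaction(db, {"balance": 80})
print(db)   # {'balance': 80}
```

On a deadline violation nothing staged reaches the database, which mimics the strict abort-on-timeout rule described above.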

Task 2

a. Describe what semantic interoperability is.

Semantic interoperability is the ability to exchange meaningful information and produce useful results – what is sent is the same as what is understood.

b. What are the main challenges related to providing semantic interoperability?

There are challenges in translating concepts, relationships, and the actual meanings of information between heterogeneous systems, for instance because of different languages and the different fields in which a concept originates.

c. Describe what an ontology is and how it can support semantic interoperability.

Ontologies are constructed as a class hierarchy containing concepts and relationships between concepts; they are used to describe the semantics of data.

By defining a common standard for describing information, ontologies enable applications to exchange data on a semantic level. They are represented in an ontology language (such as OWL).

d. Describe what RDF, RDF Schema and OWL are.

The Resource Description Framework (RDF) is a family of languages for representing information about resources. A triple in RDF consists of three components: subject, predicate, and object. The subject is the resource the statement describes. The predicate is the specific property that the statement describes. The object is the value assigned to that property. An example could be this paper and who created it. As an RDF triple, the statement would have the subject "Name of Paper", the predicate "creator", and the object "Erlend Høgset". This is often represented as nodes and arcs in a graph.
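The triple structure can be sketched in plain Python; the resource names mirror the paper example above, and the extra "language" triple is made up for illustration:

```python
# Sketch: RDF statements as (subject, predicate, object) triples, kept in a
# plain Python set rather than a real triple store.

triples = {
    ("Name of Paper", "creator", "Erlend Høgset"),
    ("Name of Paper", "language", "English"),   # extra illustrative triple
}

def objects(subject: str, predicate: str) -> set[str]:
    """Query: which objects does (subject, predicate) point to?"""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

print(objects("Name of Paper", "creator"))   # {'Erlend Høgset'}
```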

RDF Schema (RDFS) defines a vocabulary for RDF and organizes this vocabulary in a typed hierarchy. It is a rich, web-based publication format for declaring semantics, capable of explicitly declaring semantic relations between vocabulary terms.

Web Ontology Language (OWL) extends the vocabulary of RDFS and adds axioms. It is intended to provide a language that can be used to describe the classes, and the relations between them, that are inherent in Web documents and applications. OWL provides three increasingly expressive sublanguages: OWL Lite, OWL DL and OWL Full.

e. 

OWL and RDFS use the same data model, but OWL extends the vocabulary and adds axioms.

Task 3

a.

A data warehouse (DW) provides information for analytical processing, decision-making and data-mining tools. A DW collects data from multiple heterogeneous operational source systems (OLTP – On-Line Transaction Processing) and stores summarized, integrated business data in a central repository used by analytical applications (OLAP – On-Line Analytical Processing) with different user requirements. The data area of a data warehouse usually stores the complete history of a business.

INSERT SKETCH

Data warehouses are populated through a process called Extract, Transform and Load (ETL).

Extract

In this step of the ETL process, the warehouse obtains a snapshot of the source data.

The source systems of a data warehouse generally store their data in different forms, and their metadata is heterogeneous. In this step the data is converted to a single manageable format used in the warehouse.

Transform

This step is responsible for:

▪ Cleaning the data, removing errors and inconsistent data.

▪ Correcting syntactic differences, e.g. attribute names that differ in name but not in meaning: SSN vs. Ssnum.

▪ Reconciling attribute domains, e.g. Integer vs. String.

▪ Sorting the data.

▪ Etc.

In other words: this is where all the data is transformed into a representation that is interoperable with the data from the other sources.

Load

Here the transformed data is placed into the warehouse, and indexes are created.

Data warehouses are typically used for analytic processing based on vast amounts of historical data. Information in data warehouses is typically read-only, and the system is designed to achieve high performance when processing large amounts of data.
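The three ETL steps above can be sketched in Python; the hard-coded source data, the SSN/Ssnum attribute names and the dict-based warehouse are illustrative assumptions:

```python
# Sketch of Extract, Transform, Load over two hypothetical sources that
# disagree on attribute naming (SSN vs. Ssnum) and domains (int vs. string).

def extract() -> list[dict]:
    # Snapshot of two heterogeneous sources (hard-coded for illustration).
    source_a = [{"SSN": "123", "sales": "10"}]
    source_b = [{"Ssnum": "456", "sales": 7}]
    return source_a + source_b

def transform(rows: list[dict]) -> list[dict]:
    cleaned = []
    for row in rows:
        ssn = row.get("SSN") or row.get("Ssnum")          # unify names
        cleaned.append({"ssnum": ssn, "sales": int(row["sales"])})  # unify domains
    return sorted(cleaned, key=lambda r: r["ssnum"])      # sort the data

def load(warehouse: dict, rows: list[dict]) -> None:
    for row in rows:
        warehouse[row["ssnum"]] = row    # the dict key acts as the index

warehouse: dict = {}
load(warehouse, transform(extract()))
print(warehouse["123"]["sales"])   # 10
```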

b.

The cube model is an N-dimensional table for storing data, where each dimension represents a table dimension. A two-dimensional cube is a standard table. An example of a 3D cube is a table of data changing over time, with time as the third dimension. The cube is then a series of 2D tables forming a 3D cube.

Operations:[1]

Pivot:

▪ Rotate the cube

▪ Example: Change the perspective from "Region X Product" to "Region X Time"

Slice:

▪ Cut through the cube, so that users can focus on some specific perspectives

▪ Example: only focus on a specific customer

Dice:

▪ Get one cell from the cube (the smallest slice)

▪ Example: Get the production volume of Armonk, for CellPhone 1001, in January
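The cube and the three operations can be sketched in Python; the dimensions mirror the examples above, and all data values are made up:

```python
# Sketch: a 3-D cube as a dict keyed by (region, product, month).

cube = {
    ("Armonk", "CellPhone 1001", "January"): 500,
    ("Armonk", "CellPhone 1001", "February"): 450,
    ("Oslo",   "CellPhone 1001", "January"): 300,
}

def pivot(cube):
    """Rotate the cube: reorder dimensions to (month, region, product)."""
    return {(m, r, p): v for (r, p, m), v in cube.items()}

def slice_(cube, month):
    """Cut out the 2-D plane for one value of the time dimension."""
    return {(r, p): v for (r, p, m), v in cube.items() if m == month}

def dice(cube, region, product, month):
    """Pick a single cell (the smallest slice)."""
    return cube[(region, product, month)]

print(dice(cube, "Armonk", "CellPhone 1001", "January"))   # 500
```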

c.

The reason why data warehouses use different data models than the underlying source systems is that the DW is optimized for OLAP, while the sources are updated and maintained frequently because of their OLTP needs.

V2009

Task 1

a.

Semantic interoperability is the ability to exchange meaningful information and produce useful results – what is sent is the same as what is understood. Both sides must refer to a common information exchange reference model. Syntactic interoperability, however, is the ability to communicate and exchange data; it requires specified data formats and communication protocols.

You cannot achieve semantic interoperability without already having support for syntactic interoperability. Syntactic interoperability is easier to achieve, but in many scenarios the ability to agree upon semantics is needed.

b.

Metadata is a layer of data abstraction that provides information about one or more aspects of the data it describes. While the frequently used term "data about data" captures the notion of metadata, it is somewhat inaccurate because metadata tends to be categorized into different types.

An early interpretation of these types was given by Bretherton & Singley (1994), who separated metadata into two distinct types: structural/control metadata and descriptive/guide metadata.

While descriptive metadata describes the human-relatable content inside documents or files, structural metadata describes the formats and structure of data containers such as tables, columns and indexes. The view on metadata varies between fields of application, but the concept remains the same.

c.

Metadata describes the structure and context of data, and can be used to create mappings between data formats and data models.

Descriptive metadata helps support semantic interoperability, while syntactic/structural interoperability can be achieved through structural metadata.

d and e.

Ontologies are constructed as a class hierarchy containing concepts and relationships between concepts; they are used to describe the semantics of data.

By defining a common standard for describing information, ontologies enable applications to exchange data on a semantic level. They are represented in an ontology language (such as OWL).

Task 2

a.

Transactions in a distributed system are more complex, since multiple computers or systems need to perform the same operation. A solution to this is two-phase commit (2PC), which is illustrated in the image below.
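A minimal sketch of 2PC in Python; participant behaviour is simulated in memory, whereas a real implementation would add write-ahead logging, timeouts and network messaging:

```python
# Sketch of two-phase commit: the coordinator first asks every participant
# to vote (prepare), and only if all vote yes does it send the commit.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "init"
    def prepare(self):                  # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit
    def commit(self):                   # phase 2: apply the decision
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):      # phase 1: collect votes
        for p in participants:                      # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                          # phase 2: global abort
        p.abort()
    return "aborted"

print(two_phase_commit([Participant("db1"), Participant("db2")]))  # committed
```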

b.

If parts of the system are autonomous, problems will occur, since each database retains its autonomy and might support local (non-distributed) transactions. Such a system is known as a multidatabase system (or, if the databases are partially autonomous, a federated system). The image below illustrates this.

A problem that arises now is that the GTM (global transaction manager) may require a commit from a local system that already has a transaction running, even though the GTM sees no running transactions (the GTM only sees the global state). This is known as the global atomicity problem. Other problems related to multidatabases are global deadlock (if a deadlock happens, the GTM may not know about it, which may make the whole system stall) and global serializability (since the GTM only sees the global state, a transaction can appear serializable to the GTM while the LTM has other transactions pending that make it non-serializable).

Solutions to this can be semantic atomicity (sub-transactions are allowed to commit independently, and compensation is used for recovery by inverting the already-committed sub-transactions when a failure occurs) and local serializability.
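Semantic atomicity with compensation can be sketched in Python; the account model and the function names are illustrative, not from any particular system:

```python
# Sketch: each sub-transaction commits locally right away, and a
# compensating (inverse) action is registered so a later failure can undo it.

def run_with_compensation(steps):
    """steps: list of (action, compensation) pairs. On failure, run the
    compensations of the already-committed steps in reverse order."""
    done = []
    try:
        for action, compensation in steps:
            action()                 # sub-transaction commits independently
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):
            compensation()           # invert the committed sub-transactions
        return "compensated"
    return "committed"

account = {"balance": 100}

def withdraw(): account["balance"] -= 30
def deposit():  account["balance"] += 30     # inverse of withdraw
def fail():     raise RuntimeError("remote system unavailable")

print(run_with_compensation([(withdraw, deposit), (fail, lambda: None)]))
print(account["balance"])   # 100  (the withdrawal was compensated)
```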

Task 3

a.

The web services architecture is an interoperability architecture: a software system designed to support interoperable machine-to-machine interaction over a network. Its interface is described in a machine-processable format (WSDL). When interacting with web services, other systems use SOAP, typically over HTTP with XML-structured messages. It consists of loosely coupled, reusable components and separates a capability from its user interface.

Web services are intended to be accessed by other applications or web services in other autonomous systems. A service may consist of everything from simple operations, like checking a bank account balance online, to complex business processes running on enterprise resource planning systems. Because web services are based on open standards like HTTP, XML, SOAP and WSDL, they are implicitly hardware, platform and programming language independent.

DETAILS ABOUT THE TECHNOLOGIES WILL BE ADDED LATER.

b.

The Semantic Web is a vision of the World Wide Web where computers can automatically interpret and understand information without human direction. This would enable automated software agents to intelligently browse the web and locate information.

DETAILS ABOUT THE TECHNOLOGIES AND MORE INFO WILL BE ADDED LATER

c.

SOAs are architectures that combine loosely coupled services to gain flexibility (i.e. the ability to remove, add or change components). The main goal of SOAs is to make components highly reusable and modular. W3C defines an SOA as "a set of components which can be invoked, and whose interface descriptions can be published and discovered" (W3C).


An SOA provides a flexible architecture that unifies business processes by modularizing large applications into services. A client on any device, using any operating system, in any programming language, can access an SOA service to create a new business process. An SOA creates a collection of services that can communicate with each other, using service interfaces to pass messages from one service to another or to coordinate an activity between services.


In an SOA, software resources are packaged as “services”, which are well defined, self-contained modules that provide standard business functionality and are independent of the state or context of other services. Services are described in a standard definition language, have a published interface, and communicate with each other requesting execution of their operations in order to collectively support a common business task or process.

V2007

Task 2

a.

The web services architecture is an interoperability architecture defined as a software system designed to support interoperable machine-to-machine interactions over a network. Its interface is described in a machine processable format (WSDL). When interacting with web services, other systems utilize SOAP, typically through HTTP with XML structured messages. It consists of loosely coupled, reusable components and separates a capability from its user interface.

b.

When composing web services, there are several issues and critical decisions that have to be made. Firstly, what architecture is to be used, and which other web services are going to interoperate with your WS.

Secondly, the coordination of web services needs to be handled by a WS coordinator, and the transaction protocol for this coordinator needs to be chosen. For example, if your web service is to handle long-lived transactions, the WS-BusinessActivity protocol would be suitable; for short-lived transactions the WS-AtomicTransaction protocol is preferable (a combination is also possible). You also need to consider which domains your web service is to be part of, and who is to be granted access to your WS. On a more detailed level, an appropriate data model and structure have to be chosen.

When it comes to the three problem dimensions, distribution can be hidden and made transparent through the use of proxies and RPCs. Issues concerning heterogeneity and autonomy could occur, and be solved through mediators or other kinds of middleware.

c.

WS-Coordination defines a generic framework for applications to identify related operations across web services. In other words, it provides transactional characteristics for web services. The current goal of coordination is to support termination.

d.

Protocols for Atomic Transactions (WS-AtomicTransaction)[2]

The protocols for atomic transactions typically handle activities that are short-lived. Atomic transactions are provided through a two-phase commit protocol. The transaction scope states that all work is either successfully completed in its entirety, or not at all. This means that if an activity is successful, all changes are made permanent. Alternatively, if the activity fails to complete successfully, none of the changes are made permanent.

Protocols for Business Transactions (WS-BusinessActivity)

The protocols for business transactions handle long-lived activities. These differ from atomic transactions in that they take much longer to complete. Also, to minimize the latency of access by other users to the resources, the results of interim operations need to become visible to others before the overall activity is completed. Because of this, mechanisms for fault and compensation handling (such as compensation and reconciliation) are introduced to reverse the effects of tasks previously completed within a business activity.

The protocols can be used in combination. For example, short-running atomic transactions can be part of a long-running business activity. The actions of the embedded atomic transactions might be committed and made visible before the long-running business activity completes.

Task 3

a.

A data warehouse (DW) provides information for analytical processing, decision-making and data-mining tools. A DW collects data from multiple heterogeneous operational source systems (OLTP – On-Line Transaction Processing) and stores summarized, integrated business data in a central repository used by analytical applications (OLAP – On-Line Analytical Processing) with different user requirements. The data area of a data warehouse usually stores the complete history of a business.

INSERT SKETCH

A data warehouse periodically retrieves its data from several (possibly autonomous and heterogeneous) source database systems, while a traditional centralized database system holds all its data in a single location. A data warehouse is configured for OLAP (Online Analytical Processing) and uses a multidimensional data model (the cube model).

The main use of data warehouses is related to analyzing large amounts of data (read-only), whilst centralized databases are frequently updated through data insertions.

Data warehouse vs. multidatabase system: a data warehouse houses the actual data, while a multidatabase system retrieves data when a query is executed.

A multidatabase system is a collection of fully autonomous database systems acting as a single database through a mediator front end. Multidatabase systems are used for OLTP (Online Transaction Processing).
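The contrast can be sketched in Python: a hypothetical mediator fans a query out to the autonomous sources at query time, instead of housing the data itself like a warehouse; the sources are simulated as plain in-memory dictionaries:

```python
# Sketch: a mediator front end over autonomous source systems.

class Source:
    def __init__(self, data):
        self.data = data              # each source retains its own data
    def query(self, key):
        return self.data.get(key)

class Mediator:
    def __init__(self, sources):
        self.sources = sources        # the mediator stores no data itself
    def query(self, key):
        # Fan the query out to every source and collect the answers.
        return [r for s in self.sources if (r := s.query(key)) is not None]

mediator = Mediator([Source({"x": 1}), Source({"y": 2}), Source({"x": 3})])
print(mediator.query("x"))   # [1, 3]
```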

b.

Data warehouses are populated through a process called Extract, Transform and Load (ETL).

Extract

In this step of the ETL process, the warehouse obtains a snapshot of the source data.

The source systems of a data warehouse generally store their data in different forms, and their metadata is heterogeneous. In this step of ETL, the data is converted to a single manageable format used in the warehouse.

Transform

This step is responsible for:

▪ Cleaning the data, removing errors and inconsistent data.

▪ Correcting syntactic differences, e.g. attribute names that differ in name but not in meaning: SSN vs. Ssnum.

▪ Reconciling attribute domains, e.g. Integer vs. String.

▪ Sorting the data.

▪ Etc.

In other words: this is where all the data is transformed into a representation that is interoperable with the data from the other sources.

Load

Here the transformed data is placed into the warehouse, and indexes are created.

Data warehouses are typically used for analytic processing based on vast amounts of historical data. Information in data warehouses is typically read-only, and the system is designed to achieve high performance when processing large amounts of data.

b.

The cube model is an N-dimensional table for storing data, where each dimension represents a table dimension. A two-dimensional cube is a standard table. An example of a 3D cube is a table of data changing over time, with time as the third dimension. The cube is then a series of 2D tables forming a 3D cube.

Operations:[3]

Pivot:

▪ Rotate the cube

▪ Example: Change the perspective from "Region X Product" to "Region X Time"

Slice:

▪ Cut through the cube, so that users can focus on some specific perspectives

▪ Example: only focus on a specific customer

Dice:

▪ Get one cell from the cube (the smallest slice)

▪ Example: Get the production volume of Armonk, for CellPhone 1001, in January

c.

The reason why data warehouses use different data models than the underlying source systems is that the DW is optimized for OLAP, while the sources are updated and maintained frequently because of their OLTP needs.