Have your say
One type of research that in my view is very appealing is associated with time-based queries. Take, for example, the spread of a disease like Ebola: the appearance of the virus will likely be mentioned in the local media, and thanks to the information contained in the news database it will be possible to reconstruct the evolution of the epidemic over space and time and the history of its propagation. Another important application for companies is to identify the geographical areas where a particular brand is mentioned more or less often in newspapers, on TV, on the Web, and in other media.
For example, this can help plan advertising rationally, rather than running generalized campaigns that may be more expensive and less effective. Moreover, it is often useful for embassies to collect mentions of their country of origin in the local news media and report them to their central government. These are all examples of investigations, typically carried out by human experts, which NewsStand could dramatically accelerate.
A very impressive idea you have explored in the course of your research work 14 is the attempt to predict the characteristics of future events using information from the recent past and present. Can you tell us more about this topic? It's a good question. Think about the previous example of disease propagation: I can try to compute the future distribution pattern of the epidemic from its recent trend.
It is a very general topic: it covers both events that are much discussed in the media and unexpected events. The applications are obvious, such as resource planning or minimizing the risks related to natural events that may be catastrophic, for example, predicting where a hurricane will strike. It is a natural direction for future research that can have evident benefits for society. Will you work in this direction? More generally, how do you choose your research topics?
And what is your relationship with companies dealing with such data? This is a general problem of research: you can have a good idea, but then you have to convince others. You cannot force anyone to work on something that they do not believe in. As I like to say, you can lead a horse to water but you cannot make it drink. The relationship with companies is not easy.
In general, they have a short-term vision of what can be done. We do not have much financial support for our students from companies. Also, we must always be aware of the "elephant in the room", the company most famous in the world for search engines. Most people believe that this company has solved all existing problems related to searching and manipulating data. However, this is not true for all data and topics. Many research topics and important open problems remain to be tackled and solved in the coming years.
While "classical" relational databases rely on a traditional 2D structure of rows and columns, multidimensional databases use a hypercube representation, where each dimension is associated with an object attribute.
As an illustration, consider the example below, consisting of the ratings of three leading Italian red wines presented using both a relational database (Table 1) and a multidimensional database (Figure 1).
Let us now suppose that we want to add more data to our red wine database, say the ratings of these wines for two more years. In the case of a relational database, a simple way to do this is to add 3 new rows to Table 1 for each additional year. Alternatively, we can represent the same information using the multidimensional database shown in Figure 2. From now on, we no longer dwell on the difference between relational and multidimensional databases (often referred to as models), as either model can be used to represent spatial data, the actual object of our interest, simply by setting the appropriate attributes.
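To make the contrast concrete, here is a minimal Python sketch of the two models. The wine names (Wine A, B, C), years and ratings are hypothetical placeholders, not the values in Table 1 or Figure 1, and the cube is assumed to have just two axes (wine and year). Adding a year appends three rows to the relational table, while in the multidimensional version it adds one slice along the year axis, which can then be read back by indexing that axis.

```python
# Hypothetical placeholder data: wine names, years and ratings are
# illustrative only and do not reproduce Table 1 or Figure 1.
WINES = ["Wine A", "Wine B", "Wine C"]

# Relational model: a flat table with one (wine, year, rating) row per fact.
table = [("Wine A", 2019, 90), ("Wine B", 2019, 88), ("Wine C", 2019, 92)]


def add_year_relational(table, year, ratings):
    """Append one new row per wine for the given year."""
    for wine, rating in zip(WINES, ratings):
        table.append((wine, year, rating))


# Multidimensional model: a 2D cube with a wine axis and a year axis;
# cube[i][j] holds the rating of WINES[i] in years[j].
years = [2019]
cube = [[90], [88], [92]]


def add_year_cube(cube, years, year, ratings):
    """Grow the year axis by one slice containing every wine's rating."""
    years.append(year)
    for row, rating in zip(cube, ratings):
        row.append(rating)


def ratings_in_year(cube, years, year):
    """Read one slice of the cube by indexing the year axis directly."""
    j = years.index(year)
    return [row[j] for row in cube]


for year, ratings in [(2020, [91, 87, 93]), (2021, [89, 90, 94])]:
    add_year_relational(table, year, ratings)
    add_year_cube(cube, years, year, ratings)

print(len(table))                          # 9: three rows per year
print(ratings_in_year(cube, years, 2020))  # [91, 87, 93]
```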
Independently of the model adopted, the representation of spatial data has a well-known problem: the indexes commonly used for queries and lookups in non-spatial databases are not well suited for manipulating and searching spatial datasets. In fact, while traditional indexes do a great job when simple retrieval of data is needed, they have serious drawbacks with spatial queries, one of the most important being that they do not preserve proximity. This representation shows a major issue in practical queries with spatial data, namely that it only identifies the space spanned by the data objects e.
For example, it is not practical for the database to store the names of all of the roads that pass through each point in the underlying space, or equivalently to store the name of every road that crosses every river. Such considerations motivated researchers to introduce new index types that preserve proximity, often based on suitable mathematical structures such as trees to speed up access to the underlying data. The obvious idea is to separate the non-spatial components, which can be handled in a standard way, from the spatial components, which deserve special indexing treatment.
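The following Python sketch, using hypothetical points, illustrates why an ordinary one-dimensional index does not preserve proximity: ordering the points on x alone makes entries that are far apart in the plane adjacent in the index, and a 2D window query can only use the index to select a vertical strip, after which y must be checked point by point.

```python
import bisect
import math

# Hypothetical 2D points (x, y); values chosen only for illustration.
points = [(1.0, 9.0), (1.1, 0.5), (1.2, 8.8), (5.0, 5.0), (9.0, 0.2)]

# A conventional one-dimensional index: the points ordered on x alone.
index = sorted(points)
keys = [p[0] for p in index]


def x_range(lo, hi):
    """Range scan on the 1D index: everything in the vertical strip lo <= x <= hi."""
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return index[i:j]


def window(xlo, xhi, ylo, yhi):
    """2D window query: the x index narrows the scan, y is checked one by one."""
    return [p for p in x_range(xlo, xhi) if ylo <= p[1] <= yhi]


# Neighbours in the index are not necessarily neighbours in the plane:
print(index[0], index[1], math.dist(index[0], index[1]))  # adjacent keys, far apart
print(index[0], index[2], math.dist(index[0], index[2]))  # not adjacent, but close
print(window(0, 2, 8, 10))  # [(1.0, 9.0), (1.2, 8.8)] after scanning 3 entries
```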
A comprehensive introduction and analysis of the different spatial index types may be found in the previously mentioned book of Hanan Samet. Here we limit ourselves to summarizing the main features of the most important families of such methods, without going into too much detail. The basic idea is to decompose the space to which the data belong into regions called buckets.
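As a minimal illustration of bucketing, the sketch below uses the simplest possible decomposition, a fixed uniform grid, which is not one of the methods discussed here; EXTENT, N and the sample points are hypothetical. Each point is assigned to exactly one bucket, and a search near a query point scans only that point's bucket.

```python
from collections import defaultdict

# Hypothetical setup: the square [0, EXTENT) x [0, EXTENT) cut into N x N buckets.
EXTENT = 10.0
N = 2
SIZE = EXTENT / N


def bucket_of(point):
    """Return the (column, row) of the bucket containing a point."""
    x, y = point
    # Clamp so that points on the upper boundary land in the last bucket.
    return (min(int(x // SIZE), N - 1), min(int(y // SIZE), N - 1))


def build_buckets(points):
    buckets = defaultdict(list)
    for p in points:
        buckets[bucket_of(p)].append(p)
    return buckets


def same_bucket(buckets, query):
    """Candidates near `query`: only the points in its own bucket are scanned."""
    return buckets.get(bucket_of(query), [])


pts = [(1, 1), (2, 3), (7, 8), (9, 9), (5, 5)]
buckets = build_buckets(pts)
print(same_bucket(buckets, (8, 7)))  # [(7, 8), (9, 9), (5, 5)]
```

A fixed grid ignores how the data are distributed, so clustered data can overload a few buckets while leaving others empty; the hierarchical methods described next adapt the decomposition to the data.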
There are several ways to do that.
Object hierarchies with minimum bounding rectangles (MBRs)
In this case we aggregate the objects into collections of distinct subgroups of finite size, which are recursively aggregated so that the result is a tree. The fact that the objects can be arbitrarily shaped means that operations such as point lookup i.
This approach is primarily used with minimum bounding rectangles (known as MBRs), and here we restrict ourselves to the two-dimensional case. The result is a non-disjoint decomposition of the underlying space, as the MBRs can overlap. This means that the area spanned by a single object e. An example of the space decomposition induced by an R-tree for a piecewise linear curve is shown in Figure 3.
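A small Python sketch of how MBRs are computed and aggregated into a hierarchy. The curve coordinates and CAPACITY are hypothetical, and grouping consecutive segments is a deliberately simplified packing rule, not the R-tree insertion and node-splitting algorithm. Even in this tiny example the two sibling MBRs overlap, producing the non-disjoint decomposition just described.

```python
from itertools import combinations


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def union(rects):
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlaps(a, b):
    """Closed rectangles intersect (touching edges count as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical piecewise linear curve: consecutive vertices form segments.
curve = [(0, 0), (4, 3), (1, 5), (5, 6), (2, 1)]
segments = list(zip(curve, curve[1:]))
leaf_mbrs = [mbr(seg) for seg in segments]

# Simplified bottom-up packing: consecutive segments are grouped CAPACITY at a
# time. A real R-tree builds its nodes by insertion and node splitting instead.
CAPACITY = 2
groups = [leaf_mbrs[i:i + CAPACITY] for i in range(0, len(leaf_mbrs), CAPACITY)]
node_mbrs = [union(g) for g in groups]
root_mbr = union(node_mbrs)

overlapping = [(i, j) for i, j in combinations(range(len(node_mbrs)), 2)
               if overlaps(node_mbrs[i], node_mbrs[j])]
print(node_mbrs)    # [(0, 0, 4, 5), (1, 1, 5, 6)]
print(overlapping)  # [(0, 1)]: sibling MBRs overlap
```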
Figure 4 is the tree structure corresponding to the R-tree of Figure 3. Notice the presence of overlapping MBRs in Figure 3. This means that when, for example, we want to determine which line segment contains a particular point, we may have to search the entire underlying space, as the point may lie in several MBRs even though the line segment object on which it lies is associated with only one MBR. In other words, we must examine all of the MBRs in which the point lies.
Decomposition of the embedding space
In this case, we focus on methods that decompose the space in which the objects are embedded (and hence also the objects) into disjoint, non-overlapping cells.
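The cost of overlap can be sketched in Python with a hypothetical curve grouped into two nodes (the grouping is an assumption, not Figure 4). Because both node MBRs contain the query point, the search must descend into both, collecting every segment MBR containing the point as a candidate, and only the exact geometric test (collinearity plus containment in the segment's MBR) identifies the one segment on which the point actually lies.

```python
def mbr(seg):
    """MBR (xmin, ymin, xmax, ymax) of a segment given by its two endpoints."""
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def union(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def contains(rect, p):
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def on_segment(seg, p, eps=1e-9):
    """Exact test: p is collinear with the segment and inside its MBR."""
    (x1, y1), (x2, y2) = seg
    cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
    return abs(cross) <= eps and contains(mbr(seg), p)


# Hypothetical curve and a two-level hierarchy: root -> 2 nodes -> segment ids.
curve = [(0, 0), (4, 3), (1, 5), (5, 6), (2, 1)]
segments = list(zip(curve, curve[1:]))
groups = [[0, 1], [2, 3]]
node_mbrs = [union([mbr(segments[i]) for i in g]) for g in groups]


def point_query(p):
    """Descend into EVERY node whose MBR contains p, since siblings may overlap."""
    visited_nodes, candidates = 0, []
    for ids, rect in zip(groups, node_mbrs):
        if contains(rect, p):
            visited_nodes += 1
            candidates += [i for i in ids if contains(mbr(segments[i]), p)]
    hits = [i for i in candidates if on_segment(segments[i], p)]
    return hits, candidates, visited_nodes


print(point_query((2, 1.5)))  # ([0], [0, 3], 2)
```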
There are many variants of such methods. One such method starts with an object hierarchy that employs MBRs, such as the R-tree discussed in the previous section, and then decomposes the MBRs into disjoint cells so that they span the space that contains the objects.
As objects are split into several parts i. However, this enables more efficient performance of many operations, including those that find the object containing a particular point, as there is no need to traverse the entire object collection. Figure 5 is the disjoint-cell analog of the piecewise linear curve of Figure 3. Comparing Figure 6 with Figure 4, we see that some of the objects appear more than once in Figure 6, while this is never the case in Figure 4.
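The trade-off can be sketched in Python with a hypothetical 2 x 2 grid of disjoint cells standing in for the decomposition. For simplicity each segment is registered in every cell that its MBR touches, a conservative approximation of clipping the segment against the cells. The same four segments now generate twelve cell references, but a point lookup only has to examine the single cell containing the point.

```python
from collections import defaultdict

# Hypothetical setup: the square [0, 8) x [0, 8) split into 2 x 2 disjoint cells.
EXTENT, N = 8.0, 2
SIZE = EXTENT / N


def cell_index(v):
    """Row or column of the cell containing coordinate v (clamped to the grid)."""
    return min(int(v // SIZE), N - 1)


def mbr(seg):
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def on_segment(seg, p, eps=1e-9):
    """Exact test: p is collinear with the segment and inside its MBR."""
    (x1, y1), (x2, y2) = seg
    xmin, ymin, xmax, ymax = mbr(seg)
    cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
    return abs(cross) <= eps and xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


curve = [(0, 0), (4, 3), (1, 5), (5, 6), (2, 1)]
segments = list(zip(curve, curve[1:]))

# Register each segment in every cell its MBR touches. This is a conservative
# simplification; exact clipping against each cell could store fewer copies.
cells = defaultdict(list)
for sid, seg in enumerate(segments):
    xmin, ymin, xmax, ymax = mbr(seg)
    for i in range(cell_index(xmin), cell_index(xmax) + 1):
        for j in range(cell_index(ymin), cell_index(ymax) + 1):
            cells[(i, j)].append(sid)


def lookup(p):
    """Point lookup examines only the one cell containing p."""
    ids = cells.get((cell_index(p[0]), cell_index(p[1])), [])
    return [sid for sid in ids if on_segment(segments[sid], p)]


copies = sum(len(ids) for ids in cells.values())
print(copies)            # 12 references for 4 segments
print(lookup((2, 1.5)))  # [0]
```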
The quadtree family of representations is a very different example of the methods that decompose the embedding space. In this case the embedding space is usually decomposed recursively into four congruent cells (assuming, without loss of generality, that the underlying space is two-dimensional) until some property of the relationship between the objects and the cells is satisfied.
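The recursive decomposition can be sketched in Python for a binary image, taking "the block is homogeneous" as the stopping property (an assumption made for this example; the image is hypothetical and its side must be a power of 2). Uniform blocks become single leaves, so the 16 pixels below are represented by 7 leaves.

```python
def build(img, x, y, size):
    """Region quadtree of the size x size block whose top-left pixel is (x, y).

    A leaf is the block's pixel value (0 or 1); an internal node is a list of
    its four quadrants in the order NW, NE, SW, SE. `size` must be a power of 2.
    """
    first = img[y][x]
    if all(img[r][c] == first
           for r in range(y, y + size) for c in range(x, x + size)):
        return first                       # homogeneous block: stop splitting
    h = size // 2
    return [build(img, x, y, h),           # NW
            build(img, x + h, y, h),       # NE
            build(img, x, y + h, h),       # SW
            build(img, x + h, y + h, h)]   # SE


def count_leaves(node):
    return 1 if isinstance(node, int) else sum(count_leaves(c) for c in node)


# Hypothetical 4 x 4 binary image (row 0 at the top); 1 marks the region.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build(img, 0, 0, 4)
print(tree)                # [0, 1, 0, [0, 1, 1, 1]]
print(count_leaves(tree))  # 7 leaves instead of 16 pixels
```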
For example, one stopping condition in the case of two-dimensional region data is that each cell belongs to at most one region. In this case, the quadtree is an alternative to the bitmap representation, as shown in Figure 7. More generally, quadtrees can be used to represent various types of spatial data. One common use is to represent a collection of point data e. Figure 8 is an example of the decomposition of the underlying space induced by such a quadtree for a collection of European cities with a bucket capacity of 1.
This variant of the quadtree is known as a PR quadtree, where P and R denote point and region, respectively. In this case, the leaf nodes are either empty or contain just one point object, along with the coordinate values of the point (not shown in the figure). Figure 9 is the tree representation of the quadtree structure for the collection of European cities in Figure 8. The leaf nodes are represented either by white squares (empty nodes) or by black squares (nodes containing data). A major drawback of this implementation is that points in close proximity may result in deep trees e.
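The PR quadtree with bucket capacity 1 can be sketched in Python as follows; PRNode and its methods are hypothetical names, and the coordinates are arbitrary rather than those of the European cities. A leaf that already holds a point is split into four congruent blocks until the two points are separated, and locate performs a location-based query by descending to the leaf block containing a given position. The two nearby points (10, 10) and (12, 14) are enough to push the tree to depth 3, which illustrates the drawback mentioned above.

```python
class PRNode:
    """PR quadtree node for the square block [x, x + size) x [y, y + size).

    Bucket capacity is 1: a leaf holds at most one point. Points are assumed
    to lie inside the root block; duplicates are ignored.
    """

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.point = None      # payload of a leaf (None means empty)
        self.children = None   # [NW, NE, SW, SE] after a split

    def _child_for(self, p):
        h = self.size / 2
        east = p[0] >= self.x + h
        north = p[1] >= self.y + h
        return self.children[(0 if north else 2) + (1 if east else 0)]

    def insert(self, p):
        if self.children is None:
            if self.point is None:
                self.point = p           # empty leaf: store the point here
                return
            if self.point == p:
                return                   # duplicate point
            # Full leaf: split into four congruent blocks, push the old point down.
            old, self.point = self.point, None
            h = self.size / 2
            self.children = [PRNode(self.x, self.y + h, h),      # NW
                             PRNode(self.x + h, self.y + h, h),  # NE
                             PRNode(self.x, self.y, h),          # SW
                             PRNode(self.x + h, self.y, h)]      # SE
            self._child_for(old).insert(old)
        self._child_for(p).insert(p)

    def locate(self, p):
        """Location query: descend to the leaf block containing p, return its point."""
        node = self
        while node.children is not None:
            node = node._child_for(p)
        return node.point

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(c.depth() for c in self.children)


# Hypothetical coordinates in a 100 x 100 square (not the cities of Figure 8).
root = PRNode(0, 0, 100)
for p in [(10, 10), (80, 80), (12, 14)]:
    root.insert(p)

print(root.depth())           # 3: two nearby points force repeated splits
print(root.locate((12, 14)))  # (12, 14)
print(root.locate((60, 10)))  # None (empty leaf)
```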
Quadtrees are especially well suited for location-based queries, that is, when we are given a location and want to know what features or objects are associated with it. There are many more methods for indexing spatial data than space constraints allow us to review.
Among them is the pyramid, whose internal nodes contain a summary of the information in the nodes below them, which allows us to skip nodes with no relevant information when performing a search query. The pyramid performs well with feature-based queries, which correspond to the case where we are given a feature or an object and seek to know which locations are associated with it (also known as spatial data mining). Other interesting methods include octrees, partition fieldtrees, hierarchical decompositions by triangles or rectangles, etc.
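A minimal Python sketch of a pyramid over a hypothetical 4 x 4 map, where each cell holds a set of feature labels and each coarser node stores the union of its four children as its summary (one possible choice of summary, assumed here). A feature-based query starts at the top and descends only into nodes whose summary contains the feature, so a feature absent from the map is rejected after inspecting a single node.

```python
def build_pyramid(grid):
    """Return pyramid levels, finest first; grid is 2^k x 2^k sets of labels.

    Each coarser cell stores the union of its four children: a summary of
    which features occur somewhere below it.
    """
    levels = [grid]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[prev[2 * r][2 * c] | prev[2 * r][2 * c + 1]
                        | prev[2 * r + 1][2 * c] | prev[2 * r + 1][2 * c + 1]
                        for c in range(n)] for r in range(n)])
    return levels


def find_feature(levels, feature):
    """Feature-based query: finest cells containing `feature` and nodes inspected.

    Subtrees whose summary lacks the feature are skipped entirely.
    """
    frontier, visited = [(0, 0)], 0
    for level in range(len(levels) - 1, -1, -1):
        next_frontier = []
        for r, c in frontier:
            visited += 1
            if feature in levels[level][r][c]:
                if level == 0:
                    next_frontier.append((r, c))
                else:
                    next_frontier += [(2 * r + dr, 2 * c + dc)
                                      for dr in (0, 1) for dc in (0, 1)]
        frontier = next_frontier
    return frontier, visited


# Hypothetical 4 x 4 map: each cell holds the set of features located there.
grid = [[set() for _ in range(4)] for _ in range(4)]
grid[0][1].add("river")
grid[3][2].add("road")
grid[3][3].add("road")
levels = build_pyramid(grid)

print(find_feature(levels, "road"))  # ([(3, 2), (3, 3)], 9)
print(find_feature(levels, "lake"))  # ([], 1): pruned at the top
```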
For a detailed review of these techniques we refer to the already mentioned book of Hanan Samet 7 and the associated spatial data structure applets.
The reference in the title is so celebrated that no further details are needed. Instead, we take this opportunity to remember a former colleague and friend who left us a few days ago. This focus is a homage to Gianfranco Frau (Cagliari, Cagliari).
Actually, a relational database may consist of multiple linked tables which, along with a suitable relational database management system (RDBMS), should comply with strictly defined rules, described in Codd EF, A relational model of data for large shared data banks, Communications of the ACM, 13, and subsequent technical papers.
The latter are based on a hypercube representation, where all similar information is grouped along the same axis, making search operations easier. See the Technical Corner for more details.
Hamlet by William Shakespeare: Act 1. Scene V
Oliver AC, Which freaking database should I use?
We also wish to thank Hanan for the effort he has devoted to improving this focus, including its English style and grammar.
It is important to stress the difference between multidimensional databases (as opposed to relational ones; see the Technical Corner) and multidimensional data.