This chapter explains, at a high level, why and how to use content-based retrieval. It covers the following topics:
Overview and benefits of content-based retrieval (see Section 6.1)
How content-based retrieval works, including definitions and explanations of the visual attributes (color, texture, shape, location) and why you might emphasize specific attributes in certain situations (see Section 6.2)
Image matching using a specified comparison image, including comparing how the weights of visual attributes determine the degree of similarity between images (see Section 6.3)
Use of indexing to improve search and retrieval performance (see Section 6.4)
Image preparation or selection to maximize the usefulness of comparisons (see Section 6.5)
Inexpensive image-capture and storage technologies have allowed massive collections of digital images to be created. However, as an image database grows, the difficulty of finding relevant images increases. Two general approaches to this problem have been developed. Both use metadata for image retrieval:
Using information manually entered or included in the table design, such as titles, descriptive keywords from a limited vocabulary, and predetermined classification schemes
Using automated image feature extraction and object recognition to classify image content -- that is, using capabilities unique to content-based retrieval
With Oracle interMedia ("interMedia"), you can combine both approaches in designing a table to accommodate images: use traditional text columns to describe the semantic significance of the image (for example, that the pictured automobile won a particular award, or that its engine has six or eight cylinders), and use the ORDImageSignature type to permit content-based queries based on intrinsic attributes of the image (for example, how closely its color and shape match a picture of a specific automobile).
As an alternative to defining image-related attributes in columns separate from the image, a database designer could create a specialized composite data type that combines an ORDImage object and the appropriate text, numeric, and date attributes.
The primary benefit of using content-based retrieval is reduced time and effort required to obtain image-based information. With frequent adding and updating of images in massive databases, it is often not practical to require manual entry of all attributes that might be needed for queries, and content-based retrieval provides increased flexibility and practical value. It is also useful in providing the ability to query on attributes such as texture or shape that are difficult to represent using keywords.
Examples of database applications where content-based retrieval is useful -- where the query is semantically of the form, "find objects that look like this one" -- include:
Trademarks, copyrights, and logos
Art galleries and museums
Fashion and fabric design
Interior design or decorating
For example, a Web-based interface to a retail clothing catalog might allow users to search by traditional categories (such as style or price range) and also by image properties (such as color or texture). Thus, a user might ask for formal shirts in a particular price range that are off-white with pin stripes. Similarly, fashion designers could use a database with images of fabric swatches, designs, concept sketches, and finished garments to facilitate their creative processes.
A content-based retrieval system processes the information contained in image data and creates an abstraction of its content in terms of visual attributes. Any query operations deal solely with this abstraction rather than with the image itself. Thus, every image inserted into the database is analyzed, and a compact representation of its content is stored in a feature vector, or signature.
The signature for the image in Figure 6-1 is extracted by segmenting the image into regions based on color as shown in Figure 6-2. Each region has associated with it color, texture, and shape information. The signature contains this region-based information along with global color, texture, and shape information to represent these attributes for the entire image. In Figure 6-2, the segmented image contains a total of 55 shapes (patches of connected pixels with similar color). There is also a "background" shape, which consists of small disjoint dark patches. These tiny patches (usually having distinct colors) do not belong to any of their adjacent shapes and are all classified into a single "background" shape. This background shape is also taken into consideration for image retrieval.
Images are matched based on the color, texture, and shape attributes. The positions of these visual attributes in the image are represented by location. Location by itself is not a meaningful search parameter, but in conjunction with one of the three visual attributes represents a search where the visual attribute and the location of that visual attribute are both important.
The signature contains information about the following visual attributes:
Texture represents the low-level patterns and textures within the image, such as graininess or smoothness. Unlike shape, texture is very sensitive to features that appear with great frequency in the image.
Location represents the positions of the shape components, color, and texture components. For example, the color blue could be located in the top half of the image. A certain texture could be located in the bottom right corner of the image.
Feature data for all these visual attributes is stored in the signature, whose size typically ranges from 3000 to 4000 bytes. For better performance with large image databases, you can create an index based on the signatures of your images. See Section 6.4 for more information about indexing.
Images in the database can be retrieved by matching them with a comparison image. The comparison image can be any image inside or outside the current database, a sketch, an algorithmically generated image, and so forth.
The matching process requires that signatures be generated for the comparison image and each image to be compared with it. Images are seldom identical, and therefore matching is based on a similarity-measuring function for the visual attributes and a set of weights for each attribute. The score is the relative distance between two images being compared. The score for each attribute is used to determine the degree of similarity when images are compared, with a smaller distance reflecting a closer match, as explained in Section 6.3.3.
Color reflects the distribution of colors within the entire image.
Color and location specified together reflect the color distributions and where they occur in the image. To illustrate the relationship between color and location, consider Figure 6-3.
Image 1 on the left and Image 2 on the right are the same size and are filled with solid colors. In Image 1, the top left quarter (25%) is red, the bottom left quarter (25%) is blue, and the right half (50%) is yellow. In Image 2, the top right quarter (25%) is red, the bottom right quarter (25%) is blue, and the left half (50%) is yellow.
If the two images are compared first solely on color and then color and location, the following are the similarity results:
Color: complete similarity (score = 0.0), because each color (red, blue, yellow) occupies the same percentage of the total image in each one
Color and location: no similarity (score = 100), because there is no overlap in the placement of any of the colors between the two images
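The contrast between the two results can be reproduced with a toy model. The following Python sketch is only an illustration under stated assumptions: each image is reduced to a 2x2 grid of color names, and the two scoring functions (`color_score` and `color_location_score`) are hypothetical stand-ins, not the algorithm that interMedia uses.

```python
from collections import Counter

# Toy 2x2 grids standing in for the two images in Figure 6-3 (an assumption).
IMAGE_1 = [["red", "yellow"],
           ["blue", "yellow"]]
IMAGE_2 = [["yellow", "red"],
           ["yellow", "blue"]]


def color_score(a, b):
    """Color only: compare the share of each color, ignoring position."""
    cells_a = [c for row in a for c in row]
    cells_b = [c for row in b for c in row]
    share_a, share_b = Counter(cells_a), Counter(cells_b)
    colors = set(share_a) | set(share_b)
    diff = sum(abs(share_a[c] / len(cells_a) - share_b[c] / len(cells_b))
               for c in colors)
    return 50.0 * diff  # diff ranges over 0..2, so the score ranges over 0..100


def color_location_score(a, b):
    """Color and location: percentage of cells whose color differs."""
    pairs = [(ca, cb) for ra, rb in zip(a, b) for ca, cb in zip(ra, rb)]
    return 100.0 * sum(ca != cb for ca, cb in pairs) / len(pairs)


print(color_score(IMAGE_1, IMAGE_2))           # 0.0
print(color_location_score(IMAGE_1, IMAGE_2))  # 100.0
```

In this toy model the color-only score is 0.0 because the color shares are identical, while the position-aware score is 100.0 because no cell has the same color in both grids.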
Thus, if you need to select images based on the dominant color or colors (for example, to find apartments with blue interiors), give greater relative weight to color. If you need to find images with common colors in common locations (for example, red dominant in the upper portion to find sunsets), give greater relative weight to location and color together.
Texture reflects the texture of the entire image. Texture is most useful for full images of textures, such as catalogs of wood grains, marble, sand, or stones. These images are generally hard to categorize using keywords alone because our vocabulary for textures is limited. Texture can be used effectively alone (without color) for pure textures, but also with a little bit of color for some kinds of textures, like wood or fabrics. Figure 6-6 shows two similar fabric samples.
Texture and location specified together compare texture and location of the textured regions in the image.
Shape represents the shapes that appear in the image. Shapes are determined by identifying regions of uniform color.
Shape is useful to capture objects such as horizon lines in landscapes, rectangular shapes in buildings, and organic shapes such as trees. Shape is very useful for querying on simple shapes (like circles, polygons, or diagonal lines) especially when the query image is drawn by hand. Figure 6-7 shows two images very close in shape.
Shape and location specified together compare shapes and location of the shapes in the images.
When you match images, you assign an importance measure, or weight, to each of the visual attributes, and interMedia calculates a similarity measure for each visual attribute.
Each weight value reflects how sensitive the matching process for a given attribute should be to the degree of similarity or dissimilarity between two images. For example, if you want color to be completely ignored in matching, assign a weight of 0.0 to color; in this case, any similarity or difference between the color of the two images is totally irrelevant in matching. On the other hand, if color is extremely important, assign it a weight greater than any of the other attributes; this will cause any similarity or dissimilarity between the two images with respect to color to contribute greatly to whether or not the two images match.
Weight values can be between 0.0 and 1.0. During processing, the values are normalized such that they total 1.0. The weight of at least one of the color, texture, or shape attributes must be set to greater than zero. See Section 6.3.3 for details of the calculation.
The similarity measure for each visual attribute is calculated as the score or distance between the two images with respect to that attribute. The score can range from 0.0 (no difference) to 100.0 (maximum possible difference). Thus, the more similar two images are with respect to a visual attribute, the smaller the score will be for that attribute.
As an example of how distance is determined, assume that the dots in Figure 6-8 represent scores for three images with respect to two visual attributes, such as color and shape, plotted along the x-axis and y-axis of a graph.
For matching, assume Image 1 is the comparison image, and Image 2 and Image 3 are each being compared with Image 1. With respect to the color attribute plotted on the x-axis, the distance between Image 1 and Image 2 is relatively small (for example, 15), whereas the distance between Image 1 and Image 3 is much greater (for example, 75). If the color attribute is given more weight, then the fact that the two distance values differ by a great deal will probably be very important in determining whether or not Image 2 and Image 3 match Image 1. However, if color is minimized and the shape attribute is emphasized instead, then Image 3 will match Image 1 better than Image 2 matches Image 1.
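The way the better match changes with the weights can be sketched in a few lines of Python. The color distances (15 and 75) come from the example above; the shape distances (60 and 10) and the weight pairs are hypothetical values chosen only to illustrate the effect, and the helper names are not part of any interMedia API.

```python
# Distances from Image 1. Color values are from the text; shape values are
# hypothetical, chosen so that Image 3 is closer to Image 1 in shape.
DISTANCES = {
    "Image 2": {"color": 15.0, "shape": 60.0},
    "Image 3": {"color": 75.0, "shape": 10.0},
}


def combined(distance, color_weight, shape_weight):
    """Weighted sum of the two distances, with weights normalized to total 1."""
    total = color_weight + shape_weight
    return (color_weight * distance["color"]
            + shape_weight * distance["shape"]) / total


def best_match(color_weight, shape_weight):
    """Return the image whose combined distance from Image 1 is smallest."""
    return min(DISTANCES,
               key=lambda name: combined(DISTANCES[name], color_weight, shape_weight))


print(best_match(0.9, 0.1))  # color emphasized
print(best_match(0.1, 0.9))  # shape emphasized
```

With these assumed numbers, emphasizing color selects Image 2 and emphasizing shape selects Image 3.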
In Section 6.3.2, Figure 6-8 showed a graph of only two of the attributes that interMedia can consider. In reality, when images are matched, the degree of similarity depends on a weighted sum reflecting the weight and distance of all three of the visual attributes in conjunction with location of the comparison image and the test image.
For example, assume that for the comparison image (Image 1) and one of the images being tested for matching (Image 2), Table 6-1 lists the relative distances between the two images for each attribute. Note that you would never see these individual numbers unless you computed three separate scores, each time highlighting one attribute and setting the others to zero. For simplicity, the three attributes are not considered in conjunction with location in this example.
In this example, the two images are most similar with respect to texture (distance = 5) and most different with respect to shape (distance = 50), as shown in Table 6-1.
Table 6-1 Distances Between Image 1 and Image 2 for Each Visual Attribute
Visual Attribute    Distance
Color               15
Texture             5
Shape               50
Assume that for the matching process, the following weights have been assigned to each visual attribute:
Color = 0.7
Texture = 0.2
Shape = 0.1
The weights are supplied in the range of 0.0 to 1.0. Within this range, a weight of 1 indicates the strongest emphasis, and a weight of 0 means the attribute should be ignored. The values you supply are automatically normalized such that the weights total 1.0, still maintaining the ratios you have supplied. In this example, the weights were specified such that normalization was not necessary.
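Normalization as described here amounts to dividing each weight by the total. The following Python sketch shows that arithmetic only; the function name `normalize_weights` is hypothetical, and treating location as part of the same total is an assumption, not a statement about how interMedia handles it internally.

```python
def normalize_weights(weights):
    """Scale weights so that they total 1.0 while keeping their ratios."""
    for name, value in weights.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"weight for {name} must be between 0.0 and 1.0")
    # At least one of color, texture, or shape must be greater than zero.
    if not any(weights.get(k, 0.0) > 0.0 for k in ("color", "texture", "shape")):
        raise ValueError("color, texture, or shape must have a weight above 0.0")
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


print(normalize_weights({"color": 1.0, "texture": 0.5, "shape": 0.5}))
# {'color': 0.5, 'texture': 0.25, 'shape': 0.25}
```

The 2:1:1 ratio of the supplied weights is preserved after normalization.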
The following formula is used to calculate the weighted sum of the distances, which is used to determine the degree of similarity between two images:
weighted_sum = color_weight * color_distance + texture_weight * texture_distance + shape_weight * shape_distance
The degree of similarity between two images in this case is computed as:
0.7*color_distance + 0.2*texture_distance + 0.1*shape_distance
Using the supplied values, this becomes:
(0.7*15 + 0.2*5 + 0.1*50) = (10.5 + 1.0 + 5.0) = 16.5
To illustrate the effect of different weights in this case, assume that the weights for color and shape were reversed. In this case, the degree of similarity between two images is computed as:
0.1*color_distance + 0.2*texture_distance + 0.7*shape_distance
(0.1*15 + 0.2*5 + 0.7*50) = (1.5 + 1.0 + 35.0) = 37.5
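The two calculations can be checked with a short Python sketch. It uses the distances and weights from this example; the names `DISTANCES` and `weighted_sum` are illustrative only and do not correspond to an interMedia interface.

```python
# Distances between Image 1 and Image 2, as used in the worked example.
DISTANCES = {"color": 15.0, "texture": 5.0, "shape": 50.0}


def weighted_sum(weights, distances):
    """Sum of weight * distance over all visual attributes."""
    return sum(weights[attr] * distances[attr] for attr in distances)


first = weighted_sum({"color": 0.7, "texture": 0.2, "shape": 0.1}, DISTANCES)
second = weighted_sum({"color": 0.1, "texture": 0.2, "shape": 0.7}, DISTANCES)
print(round(first, 1), round(second, 1))  # 16.5 37.5
```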
In this second case, the images are considered to be less similar than in the first case, because the overall score (37.5) is greater than in the first case (16.5). Whether or not the two images are considered matching depends on the threshold value (explained in Section 6.3.4). If the weighted sum is less than or equal to the threshold, the images match; if the weighted sum is greater than the threshold, the images do not match.
In these two cases, the correct weight assignments depend on what you are looking for in the images. If color is extremely important, then the first set of weights is a better choice than the second set of weights, because the first set of weights grants greater significance to the disparity between these two specific images with respect to color. The two images differ greatly in shape (50), but that difference contributes less to the final score because the weight assigned to the shape attribute is low. With the second set of weights, the score is higher because shape is assigned the greater weight and the images are less similar with respect to shape than with respect to color.
When you match images, you assign a threshold value. If the weighted sum of the distances for the visual attributes is less than or equal to the threshold, the images match; if the weighted sum is greater than the threshold, the images do not match.
Using the examples in Section 6.3.3, if you assign a threshold of 20, the images do not match when the weighted sum is 37.5, but they do match when the weighted sum is 16.5. If the threshold is 10, the images do not match in either case; and if the threshold is 37.5 or greater, the images match in both cases.
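The threshold rule is a single comparison. The following Python sketch applies it to the two weighted sums from Section 6.3.3; the function name `is_match` is hypothetical and only restates the rule given above.

```python
def is_match(weighted_sum, threshold):
    """Images match when the weighted sum does not exceed the threshold."""
    return weighted_sum <= threshold


for threshold in (10, 20, 37.5):
    print(threshold, is_match(16.5, threshold), is_match(37.5, threshold))
```

Each printed row shows a threshold followed by the match result for the first and second set of weights.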
The following example shows a cursor (getphotos) that selects the product_id and product_photo columns from the online_media table where the threshold value is 20 for comparing photographs with a comparison image:
CURSOR getphotos IS
  SELECT product_id, product_photo
  FROM online_media
  WHERE ORDSYS.IMGSimilar(photo_sig, comparison_sig,
          'color="0.4", texture="0.10", shape="0.3", location="0.2"', 20) = 1;
Before the cursor executes, the generateSignature( ) method must be called to compute the signature of the comparison image (comparison_sig), and to compute signatures for each image in the table. See Oracle interMedia Reference for a description of all the operators, including IMGSimilar and IMGScore.
The number of matches returned generally increases as the threshold increases. Setting the threshold to 100 would return all images as matches. Such a result, of course, defeats the purpose of content-based retrieval. If your images are all very similar, you may find that even a threshold of 50 returns too many (or all) images as matches. Through trial and error, adjust the threshold to an appropriate value for your application.
You will probably want to experiment with different weights for the visual attributes and different threshold values, to see which combinations retrieve the kinds and approximate number of matches you want.
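One way to organize this experimentation is to record the scores returned for a sample query and count how many fall under each candidate threshold. The Python sketch below assumes the scores have already been collected; the values in `HYPOTHETICAL_SCORES` are invented for illustration, and `sweep` is not an interMedia function.

```python
# Invented weighted-sum scores for seven stored images against one query image.
HYPOTHETICAL_SCORES = [4.2, 11.0, 16.5, 23.8, 37.5, 52.1, 80.3]


def count_matches(scores, threshold):
    """Number of images whose score is at or below the threshold."""
    return sum(score <= threshold for score in scores)


def sweep(scores, thresholds):
    """Map each candidate threshold to the number of matches it returns."""
    return {t: count_matches(scores, t) for t in thresholds}


print(sweep(HYPOTHETICAL_SCORES, (10, 20, 50, 100)))
# {10: 1, 20: 3, 50: 5, 100: 7}
```

As expected, the match count never decreases as the threshold rises, and a threshold of 100 returns every image.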
A domain index, or extensible index, is an approach for supporting complex data objects. Oracle Database and interMedia cooperate to define, build, and maintain an index for image data. This index is of type ORDImageIndex. Once it is created, the index automatically updates itself every time an image is inserted, updated, or removed from the database table. The index is created, managed, and accessed by the index type routines.
For better performance with large image databases, you should always create and use an index for searching through the image signatures. The default search model compares the signature of the query image to the signatures of all images stored in the database. This works well for simple queries against a few images such as, "Does this picture of an automobile match the image stored with the client's insurance records?" However, if you want to compare that image with thousands or millions of images to determine if the images match, then a linear search through the database would be impractical. In this case, an index based on the image signatures would greatly improve performance.
Assume you are using the online_media table from the Product Media schema.
Process each image using the generateSignature( ) method to generate the signatures.
DECLARE
  t_image   ORDSYS.ORDImage;
  image_sig ORDSYS.ORDImageSignature;
BEGIN
  -- Lock the row and fetch the image and its signature object.
  SELECT p.product_photo, p.product_photo_signature
    INTO t_image, image_sig
    FROM pm.online_media p
   WHERE p.product_id = 1910 FOR UPDATE;
  -- Generate a signature:
  image_sig.generateSignature(t_image);
  -- Store the generated signature back in the row.
  UPDATE pm.online_media p
     SET p.product_photo_signature = image_sig
   WHERE product_id = 1910;
  COMMIT;
END;
/
Note: Performance is greatly improved by loading the data tables prior to creating the index.
Once the signatures are created, the following command creates an index on this table, based on the data in the product_photo_signature column:
CREATE INDEX idx1 ON online_media(product_photo_signature)
  INDEXTYPE IS ORDSYS.ORDIMAGEINDEX
  PARAMETERS ('ORDImage_Filter_Tablespace = <name>,ORDImage_Index_Tablespace = <name>');
The index name is limited to 24 or fewer characters. As with any Oracle table, do not use pound signs (#) or dollar signs ($) in the name. Also as usual, the tablespaces must be created before creating the index.
Note: The standard Oracle restriction is 30 characters for table or index names. However, interMedia requires an extra 6 characters for internal processing of the domain index.
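The naming rules above are easy to check before issuing the CREATE INDEX statement. The following Python sketch is a hypothetical helper, not part of interMedia or any Oracle tool; it tests only the two rules stated in this section (length and the # and $ characters).

```python
# 30-character Oracle limit minus the 6 characters reserved by interMedia.
MAX_INDEX_NAME_LENGTH = 24


def check_index_name(name):
    """Return a list of problems with a proposed index name (empty if none)."""
    problems = []
    if len(name) > MAX_INDEX_NAME_LENGTH:
        problems.append(f"name has {len(name)} characters; the limit is "
                        f"{MAX_INDEX_NAME_LENGTH}")
    if "#" in name or "$" in name:
        problems.append("name contains a pound sign (#) or dollar sign ($)")
    return problems


print(check_index_name("idx1"))  # []
print(check_index_name("product_photo_signature_idx$"))
```

The second call reports both problems, because that name has 28 characters and contains a dollar sign.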
The index data resides in two tablespaces, which must be created first. The first contains the actual index data, and the second is an internal index created on that data.
The following recommendations are good starting points for further index tuning:
ORDIMAGE_FILTER_TABLESPACE -- Each signature requires approximately 350 bytes in this tablespace. The tablespace size should therefore be at least 350 bytes times the number of signatures in the table.
ORDIMAGE_INDEX_TABLESPACE -- The size of the tablespace should be 100 times the size of the initial and final extents specified. For example, if an extent is 10 KB, the tablespace size should be 1 MB. The initial and final extents should be equal to each other. The size of the tablespace should also be approximately equal to the size of ORDIMAGE_FILTER_TABLESPACE.
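These two recommendations reduce to simple arithmetic. The Python sketch below is a sizing aid under the figures given above (350 bytes per signature, 100 extents per tablespace); the function names are hypothetical, and `suggested_extent_bytes` is an inference that combines the two recommendations rather than a documented formula.

```python
BYTES_PER_SIGNATURE = 350      # approximate filter-tablespace cost per signature
EXTENTS_PER_TABLESPACE = 100   # index tablespace = 100 x extent size


def filter_tablespace_bytes(signature_count):
    """Minimum size of ORDIMAGE_FILTER_TABLESPACE for the given row count."""
    return BYTES_PER_SIGNATURE * signature_count


def index_tablespace_bytes(extent_bytes):
    """Size of ORDIMAGE_INDEX_TABLESPACE for a given (equal) extent size."""
    return EXTENTS_PER_TABLESPACE * extent_bytes


def suggested_extent_bytes(signature_count):
    """Extent size that makes the index tablespace about as large as the
    filter tablespace (rounded up to a whole byte)."""
    needed = filter_tablespace_bytes(signature_count)
    return (needed + EXTENTS_PER_TABLESPACE - 1) // EXTENTS_PER_TABLESPACE


print(filter_tablespace_bytes(1_000_000))   # 350000000
print(index_tablespace_bytes(10 * 1024))    # 1024000
print(suggested_extent_bytes(1_000_000))    # 3500000
```

For one million signatures this gives a filter tablespace of about 350 MB, and an extent of about 3.5 MB would make the index tablespace roughly the same size. Treat these as starting points for tuning, as the section says.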
Typically, it will be much faster if you create the index after the images are loaded into the database and signatures have been generated for them.
When importing a large number of images, you should postpone index creation until after the import operation completes. Do this by specifying the following parameters to the Import utility: INDEXES=N and INDEXFILE=<filename>. See Oracle Database Utilities for details.
Rollback segments of an appropriate size are required. The size depends on the size of your transactions, for example, how many signatures are indexed at one time.
Analyze the new index.
As with other Oracle indexes, you should analyze the new index as follows:
ANALYZE INDEX idx1 COMPUTE STATISTICS;
Two operators, IMGSimilar and IMGScore, support queries using the index. The operators automatically use the index if it is present. See Oracle interMedia Reference for syntax information and examples.
Queries for indexed and nonindexed comparisons are identical. The Oracle optimizer uses the domain index if it determines that the first argument passed to the IMGSimilar operator is a domain-indexed column. Otherwise, the optimizer invokes a functional implementation of the operator that compares the query signature with the stored signatures, one row at a time.
See Oracle interMedia Reference for examples of retrieving similar images. As in the example, be sure to specify the query signature as the second parameter.
The human mind is far better than a computer at matching images. If we are near a street and want to identify all red automobiles, we can easily do so because our minds rapidly adjust for the following factors:
Whether the automobile is stopped or moving
The distinction among red automobiles, red motorcycles, and red trailers
The absolute size of the automobile, as well as its relative size in our field of vision (because of its distance from us)
The location of the automobile in our field of vision (center, left, right, top, bottom)
The direction in which the automobile is pointing or traveling (left or right, toward us, or away from us)
However, for a computer to find red automobiles (retrieving all red automobiles and no or very few images that are not red or not automobiles), it is helpful if all the automobile images have the automobile occupy almost the entire image, have no extraneous elements (people, plants, decorations, and so on), and have the automobiles pointing in the same direction. In this case, a match emphasizing color and shape would produce useful results. However, if the pictures show automobiles in different locations, with different relative sizes in the image, pointing in different directions, and with different backgrounds, it will be difficult to perform content-based retrieval with these images.
The following are some suggestions for selecting images or preparing images for comparison. The list is not exhaustive, but the basic principle to keep in mind is this: Know what you are looking for, and use common sense. If possible, crop and edit images in accordance with the following suggestions before performing content-based retrieval:
Have what you expect to be looking for occupy almost all the image space, or at least occupy the same size and position on each image. For example, if you want to find all the red automobiles, each automobile image should show only the automobile and should have the automobile in approximately the same position within the overall image.
Minimize any extraneous elements that might prevent desired matches or cause unwanted matches. For example, if you want to match red automobiles and if each automobile has a person standing in front of it, the color, shape, and position of the person (skin and clothing) will cause color and shape similarities to be detected, and might reduce the importance of color and shape similarities between automobiles (because part of the automobile is behind the person and thus not visible). If you know that your images vary in this way, experiment with different thresholds and different weights for the various visual attributes until you find a combination that provides the best result set for your needs.
During signature generation, images are temporarily scaled to a common size such that the resulting signatures are based on a common frame of reference. If you crop a section of an image, and then compare that piece back to the original, interMedia will likely find that the images are less similar than you would expect.
Note: interMedia has a fuzzy search engine, and is not designed to recognize objects. For example, interMedia cannot find a specific automobile in a parking lot. However, if you crop an individual automobile from a picture of a parking lot, you can then compare the automobile to known automobile images.
When there are several objects in the image, interMedia matches them best when:
The colors in the image are distinct from each other. For example, an image containing green and red works better than one containing dark green and light green.
The colors of adjacent objects in the image contrast with each other.
The image consists of a few, simple shapes.
The photographs in Figure 6-9 illustrate roads. The human mind would focus on the similarities in these images and consider them a match. Specifically, the human mind recognizes that the object in both images is a road. Because the road is roughly in the center of both images, it is the focal point in both photographs. interMedia, however, focuses on the entire image in each photograph without recognizing the objects in these images. Thus, interMedia concludes that these images do not match in color, texture, or shape.
Image 1 appears far away; the road is long and narrow with a barren-looking countryside in the background and an open view of the sky. Image 2 appears closer; the road is short and wide with hedges and a thick forest in the background with the sky barely visible between the trees on the left.
interMedia image matching involves making comparisons based on the visual attributes of color, texture, and shape. Because image matching is based on these attributes rather than on object recognition, interMedia cannot recognize the fact that both of these images contain a road. Thus, interMedia image matching does not work well for these two images.
First, the color of the roads in these images is different. In Image 1, the road is gray because the open sky illuminates the road in full daylight. In Image 2, however, the road is brown because the sky is obscured by the thick forest, allowing very little daylight to shine on the road. The remaining colors in these images are also different. For example, interMedia identifies the following colors in Image 1: a large percentage of gray (the road), some blue (the sky), and some golden brown (the barren sides of the road). The colors in Image 2, however, include grayish brown (the road), some green (the grass on the side of the road and the leaves of the trees), and brown (the tree trunks). Clearly, the colors in these two images do not match.
Second, the shape of the roads in these images is different. In Image 1, the road occupies a large area, and is square (or rectangular) in shape. The road in Image 2 occupies a small area, and is triangular in shape. Thus two of the criteria used for shape matching, the area and the geometric shape, do not match.
Finally, the texture of the roads in these images is similar, which results in a slightly better match. However, other objects contained in these images, such as the sky and the leaves, have a different texture. Thus, these images do not match in texture.
In summary, the comparisons based on the visual attributes of color, texture, and shape are insufficient to match these two images.
In both Image 1 and Image 2, the flowers are approximately in the center of the image, surrounded by a leafy green background. In Image 2, the foreground also includes part of a bird bath or a small fountain.
interMedia recognizes that the following visual attributes are similar in Image 1 and Image 2: the color of the background leaves and the texture of the flowers and the leaves. interMedia also recognizes that the colors of the flowers are different. Thus, the match based on color is good, but not perfect. This match is performed without identifying that the objects in the green background are leaves and the objects in the colored segments are flowers.
interMedia segments the images to identify the areas of different colors. Both images contain a large green area, because in both images the background is predominantly green. The smaller areas in Image 1 and Image 2 are identified as purple and white, respectively. Thus, these images match well in color because the predominant area of green color matches in both images. The images do not match perfectly, however, because the smaller areas represented by the flowers are different. In addition, the texture of the leaves and the flowers is similar in both images, resulting in good matches based on texture.
The photographs in Figure 6-11 illustrate flowers of the same type and color, but of different sizes. These images are a better match than either the images in Section 6.5.1 or those in Section 6.5.2.
Flowers of the same color and type make up the contents of both Image 1 and Image 2. In Image 1, the flowers are smaller and more tightly spaced, with few details visible. In Image 2, however, the flowers are larger and more loosely spaced, with clearer details.
Because these flowers are the same color and occupy almost all the area in both images, interMedia segments the images by color and identifies the predominant areas of violet in both images. It does not matter that the flowers in Image 1 and Image 2 are not the same size; interMedia identifies the flowers in both images as areas of violet. Thus, the images match well based on color. Additionally, interMedia recognizes that the texture of the flowers is a match. interMedia performs this match without recognizing that the objects in both images are flowers. Based on color and texture, these images match well.