I'm currently completing a Masters course in Human-Centred Computer Systems at the University of Sussex. This section of the website is for an on-going 'learning diary', for me to write my thoughts and notes on various courses and my dissertation.

Entries from October 1, 2007 - November 1, 2007

Reading on auditory systems

Well, to try and look at some of the questions I came up with on the sound design for blind users, I read a couple of papers.

First up was "Electronic Sensory Systems for the Visually Impaired" by B. Ando. Published by IEEE.

This had lots of good follow-up leads and overviews of existing handheld systems, plus material on hearing that should help with choosing which beep pitches to use.

Next I looked into the "Design of Auditory UI for Blind Users" by Hilko Donker, Palle Klante and Peter Gorny. Published by the ACM.

This one was less useful for me, but included some good information about merging sounds. Apparently it's important for the noises to be pleasant for the user, which suggests nice chords of abstract notes would be better than clashing sounds that might more accurately represent the objects we're looking for. Also, apparently the human ear is better at distinguishing the position of sounds in a horizontal plane than vertically.

The other interesting bit was how they tested it. They tried to get users to represent their understanding of the screen layout using a set of pins in a cork board, but the mental maps that the blind users produced were completely incomprehensible to the researchers. When they tried the same thing with blindfolded sighted users, the maps were more what they were expecting. That implies that the way the blind users internalised the spatial awareness was very different to the sighted version, and that this isn't necessarily a good way to measure the success of the model. It's pretty interesting in terms of how to represent visual information to people who have no concept of many of the 'standard' visual cues.

Posted on Tuesday, October 30, 2007 at 16:38 by martian77

Image manipulation

About 99% of images we see in magazines will have been digitally manipulated, and there is a range of software available for doing that.

Digital images are made up of pixels, so complex processing can be applied by manipulating the values of those pixels, either individually or by applying a formula to all of them.

Statistical operations

The intensity histogram of an image gives us the distribution of intensity values across that image. We can then create a bi-level threshold by picking a point on the histogram and setting every pixel with an intensity value below it to 0 (black) and every other pixel to 1 (white). The thresholds are normally chosen at minima in the histogram. You can also tell from the intensity histogram what kind of image this is. A dark image will have peaks in the low end of the chart, while a light one will have peaks at the high end. A low contrast picture will have all of its intensity values in a small range, while a high contrast picture will have a more spread out histogram. To increase the contrast you can stretch the intensity histogram out (known as linear contrast stretching), or lighten a dark image by moving the peaks.
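To make the histogram, threshold and contrast-stretch ideas concrete, here is a minimal sketch in plain Python. The function names and the tiny 8-pixel "image" are my own inventions for illustration (they are not from the lecture), and it assumes 8-bit greyscale values stored in a flat list.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each intensity value."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def threshold(pixels, t):
    """Bi-level threshold: below t becomes 0 (black), everything else 1 (white)."""
    return [0 if p < t else 1 for p in pixels]


def stretch_contrast(pixels, out_max=255):
    """Linear contrast stretch: map the image's [min, max] onto [0, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image, nothing to stretch
        return list(pixels)
    return [round((p - lo) * out_max / (hi - lo)) for p in pixels]


# A made-up low-contrast image: every value sits between 100 and 140.
image = [100, 110, 120, 130, 140, 120, 110, 100]
binary = threshold(image, 120)
stretched = stretch_contrast(image)
print(binary)
print(min(stretched), max(stretched))
```

After stretching, the values span the full 0 to 255 range instead of the narrow 100 to 140 band, which is the 'spread out histogram' of a higher contrast picture.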

Gamma correction

When an intensity is passed to an output device, there is a non-linear relationship between the intensity value and the output value. So if the intensity increases by 10 times, the output intensity may increase by more than that... Say 100 times for example. So a nice linear grey scale may come out looking not very linear. Gamma correction is applied to correct this.

$\text{Intensity} = \text{Voltage}^{\gamma}$, where $\gamma$ (gamma) is a value for the particular device. It will vary between devices, but for CRT monitors it is generally somewhere around 2.2 to 2.5.
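A rough sketch of what the correction does, assuming values normalised to the range 0 to 1 and a gamma of 2.4 (an illustrative value I picked, not a measurement of any real device). The function names are made up for this example.

```python
def displayed_intensity(v, gamma=2.4):
    """Model of the device: a normalised input v in [0, 1] comes out as v ** gamma."""
    return v ** gamma


def gamma_correct(v, gamma=2.4):
    """Pre-distort the value so that the device's response cancels out."""
    return v ** (1.0 / gamma)


for v in (0.25, 0.5, 0.75):
    uncorrected = displayed_intensity(v)
    corrected = displayed_intensity(gamma_correct(v))
    print(f"in={v:.2f}  uncorrected={uncorrected:.3f}  corrected={corrected:.3f}")
```

Without correction the mid-greys come out much darker than intended; with the inverse power applied first, the output tracks the input linearly again.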

Pixel group processing 

Pixel group processing uses the values of the neighbouring pixels to determine the final value of a given pixel. It is done by applying a 'convolution matrix' to the values. Doing this can give smoothing, sharpening, edge detection and noise removal, depending on the values used in the matrix. So, you take a 3x3 block of pixels and calculate the value of the one in the middle by applying a 3x3 set of weights to that block.

You can average out the values by applying a constant 1/9 weight across the matrix. That gives you a smoothing effect. It's a simple thing to implement, but gives some pretty crude results. An alternative is to apply a weighting based on a Gaussian distribution, and this is known as Gaussian smoothing or blur. There's a good outline of the procedure here. (I'm pretty sure I did all about Gaussian distributions in my engineering degree - there are faint bells going off, but nothing particularly useful!)
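Here is a small sketch of the 3x3 weighting idea, comparing the flat 1/9 average with a Gaussian-style kernel. The 1-2-1 / 2-4-2 / 1-2-1 weights (divided by 16) are a commonly used 3x3 approximation to a Gaussian, not something from the lecture, and copying the border pixels unchanged is just one simple way of handling edges that I've assumed here.

```python
def convolve3x3(image, kernel):
    """Apply a 3x3 set of weights to every interior pixel.

    Border pixels are copied unchanged. The kernel is not flipped, which
    makes no difference for the symmetric kernels used below.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = 0.0
            for ky in range(3):
                for kx in range(3):
                    total += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = total
    return out


MEAN = [[1 / 9] * 3 for _ in range(3)]
# Assumed 3x3 approximation to a Gaussian; the weights sum to 1.
GAUSSIAN = [[1 / 16, 2 / 16, 1 / 16],
            [2 / 16, 4 / 16, 2 / 16],
            [1 / 16, 2 / 16, 1 / 16]]

# A single bright pixel (noise) on a dark background.
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 90, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]

mean_result = convolve3x3(image, MEAN)
gauss_result = convolve3x3(image, GAUSSIAN)
print(mean_result[2][2], gauss_result[2][2])
```

The mean filter smears the bright pixel evenly over its neighbours, while the Gaussian weights keep more of it at the centre and less at the corners, which is why the result looks less crude.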

 

Posted on Saturday, October 27, 2007 at 16:36 by martian77

Image formats

CompuServe developed the Graphics Interchange Format (GIF). There are 2 versions, 87a and 89a; the 89a version is more useful, allowing for animated GIFs and transparent backgrounds. GIFs work on a restricted palette, so are best for images with a restricted colour range. They are great for line art or icons. However, be wary of using animated GIFs. They can look really naff and add nothing.

JPEG was created by a working group from ISO and CCITT (since renamed ITU-T) called the Joint Photographic Experts Group. This is the dominant format for true-colour images. It can achieve 15:1 to 30:1 compression ratios, but compression and unpacking are relatively slow. There are many incompatible coding schemes, but standardisation has occurred.

The stages of JPEG compression are as follows:

  1. Conversion from RGB to luminance/chrominance colour space (because the human eye is less sensitive to chrominance change than brightness).
  2. Colour information is then sub-sampled (lossy). Pixels in 2x2 cells are stored as 4 intensity values and 2 colour difference values. This requires 6 values per group instead of 12.
  3. The image is then split into blocks of 8x8 and a Discrete Cosine Transform (DCT) is applied. Intensity data is transformed to frequency data. This process is reversible (apart from rounding), so this stage is effectively lossless.
  4. The frequency information is then quantized, which is lossy. This is applied particularly to high frequency chrominance information, as the eye can't detect this. It may remove some values and increase the occurrence of others. E.g. the values 78, 79, 80, 81, 82 could all be replaced by 80.
  5. The data is then subjected to Huffman coding (as mentioned in the compression entry!).
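Two of the steps above are easy to check with a few lines of arithmetic. This is only a sketch: the step size of 5 is an assumption chosen so that the 78 to 82 example works out, and as far as I understand it a real JPEG encoder quantizes each DCT coefficient with its own step size from a quantization table and stores the rounded quotient rather than the snapped value.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of `step` (the lossy part)."""
    return [round(v / step) * step for v in values]


quantized = quantize([78, 79, 80, 81, 82], step=5)
print(quantized)

# Sub-sampling arithmetic for one 2x2 cell of pixels.
full = 4 * 3        # 4 pixels, each with 3 components
subsampled = 4 + 2  # 4 intensity values plus 2 shared colour-difference values
print(full, subsampled)
```

Once several neighbouring values have collapsed to the same number, the Huffman stage has far more repetition to exploit, which is where the real saving comes from.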

It is possible to relate the quality setting to the compression ratio. The default is a 75% quality setting, which gives roughly a 12:1 compression ratio. As the compression ratio gets higher, the 8x8 blocks become increasingly noticeable in the final image.

There is now a (relatively) new bitmapped graphics format called Portable Network Graphics (PNG). This is similar to the GIF format, and has been approved by the World Wide Web Consortium (W3C) as a standard. GIF was subject to patent restrictions because the LZW compression that it uses was patented (the main patents expired in 2003-2004). PNG uses lossless compression and handles full colour images, giving better quality than GIF. It doesn't support animation (but that may not be a big problem!). Apparently there may be a new version that does support animation.

 

Posted on Saturday, October 27, 2007 at 16:05 by martian77

Compression

This follows on from the colour model discussion in the bitmap image entry. If we are going to store high quality, memory intensive images, we may need some way to compress the data so that it takes up less space.

There are two ways to do this: lossy and lossless. Lossy throws away information. Lossless (guess) doesn't. In the lecture it was demonstrated as two different ways to tidy a room. You could either tidy it by restacking everything so that it takes up less space (but doesn't actually reduce the amount of stuff) or you can tidy by throwing out all the stuff you don't need any more (empty pizza boxes were the example used). Lossless compression restacks the information, but the original information is always still available. Lossy compression throws away the information we say we don't need any more.

Run Length Encoding (RLE) is a very basic form of lossless compression. It stores the data in a more compact form by recording the length of each run of identical values. So if (for example) you have a long string of 1s and 0s as your data like this: 11111111111111111100000111111111, it could be stored as 18,5,9. That takes it from 4 bytes (32 bits) to 3 bytes (one per run length). This method of encoding rarely gives compression ratios of better than 2:1.
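A quick sketch of the run-length idea for a string of bits. The function names are mine, and I've assumed the input is non-empty and that the decoder is told which symbol the first run starts with (the 18,5,9 form above implicitly assumes it starts with a 1).

```python
from itertools import groupby


def rle_encode(bits):
    """Return (first_symbol, run_lengths) for a non-empty string of '0'/'1'."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return bits[0], runs


def rle_decode(first, runs):
    """Rebuild the bit string, alternating symbols starting from `first`."""
    out = []
    symbol = first
    for n in runs:
        out.append(symbol * n)
        symbol = "1" if symbol == "0" else "0"
    return "".join(out)


data = "1" * 18 + "0" * 5 + "1" * 9  # the 32-bit example: 18 ones, 5 zeros, 9 ones
first, runs = rle_encode(data)
print(first, runs)
print(rle_decode(first, runs) == data)
```

Because decoding gets back exactly the original string, nothing has been thrown away, which is what makes this lossless.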

You can get more complex with lossless compression and use Huffman coding. That replaces common pixel values with a short code, and less common values with a longer code. Some individual values may take more memory to store, but these should be infrequently used, and overall this can give from a 27% to maybe a 40% reduction in size. This does rely on an unequal distribution of pixel values, so trying to compress an evenly distributed image may actually result in an increase in size. It is used as the final stage in the compression of JPEGs.
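To see the 'short codes for common values' idea in action, here is a minimal Huffman coder using Python's standard heapq module. The pixel values and their frequencies are invented for the example, and this only builds the code table; it is a sketch of the general technique rather than what any particular image format does internally.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Build a prefix code: frequent symbols get short codes, rare ones longer."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# A made-up, very unevenly distributed set of pixel values.
pixels = [0] * 12 + [255] * 5 + [128] * 2 + [64] * 1
codes = huffman_codes(pixels)
encoded_bits = sum(len(codes[p]) for p in pixels)
fixed_bits = 2 * len(pixels)  # 4 distinct values need 2 bits each at fixed width
print(codes)
print(encoded_bits, "bits instead of", fixed_bits)
```

With a more even spread of values the code lengths level out and the saving disappears, which matches the warning above about equally-distributed images.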

Then we have Lempel-Ziv-Welch (LZW) compression. This is probably the most widely used lossless compression - used in the Unix compress utility and GIF. It works very well for text documents but much less well for noisy images. It is based on a similar idea to Huffman coding, but instead of individual colours it replaces whole repeating sequences with a code that points to a data dictionary entry.
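A bare-bones sketch of the dictionary idea, working on text rather than pixels. It assumes every character has a code below 256 and outputs a plain list of integer codes; real implementations such as GIF also pack the codes into variable-width bit fields and reset the dictionary, which is left out here.

```python
def lzw_encode(text):
    """Return a list of dictionary indices for `text` (characters below 256 only)."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # remember the new sequence
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decode(codes):
    """Rebuild the text, reconstructing the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:  # special case: the code refers to the entry still being built
            entry = previous + previous[0]
        result.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)


text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
print(len(text), "characters ->", len(codes), "codes")
print(lzw_decode(codes) == text)
```

Notice that the dictionary itself is never stored: the decoder rebuilds it from the codes as it goes. Repetitive text gives long matches, while noisy data rarely repeats a sequence, so few useful dictionary entries ever get reused.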

Lossy compression works on removing the information that we don't need. It works by taking advantage of the properties of human vision and hearing, and tries to remove data where it won't be noticed. In the case of images, that is normally colour information (the eye is not so good at recognising tiny changes in colour) and in sound it's the bits you can't really hear.  

Posted on Saturday, October 27, 2007 at 15:31 by martian77

Bitmapped Images

Digital pictures are made up of a series of pixels (or picture elements). The resolution of an image is the number of pixels in the x and y direction. With a digital camera, the quality of the lens is more important than the number of pixels available. A 2 megapixel camera with a really good lens can take better pictures than a 7 megapixel camera with a less good lens.

Bitmapped images are also known as raster images. Apparently raster is a grid square. A second definition for raster is the parallel lines that form the scan pattern on display screens or TVs. Bitmapped images are good at representing real-world images, where there is a complex variation in colours, shades and shapes. (As opposed to vector graphics.)

The size that the image will be displayed at affects the resolution that should be used to store it. So a small image or thumbnail can be stored at a really low resolution and still get across the data. If that same image was displayed at a larger size it would look all rubbish and pixelated. This is the problem with lots of 'show large image' links - where clicking just increases the size to show a nasty grainy image and gives no extra information. Getting this right is a major concern in multimedia presentations, because you want to avoid things looking crap.

Resolution is often specified in terms of Dots Per Inch (DPI). Printers are normally very high resolution at 600-1200 dpi, scanners range from 300-3600 dpi, and monitors are 70-204 dpi. Interesting that the iPod Nano reportedly has the highest pixel density of any screen so far, because they need to squeeze more detail and information out of a very small display area. Smaller pixels allow finer detail, and provide a more readable (and vivid) display. One side effect of the difference in resolution between a monitor and a printer is that an image that fills the screen on a 72 dpi monitor will look tiny on a 600 dpi printer. To avoid this being an issue, most image formats allow you to specify the resolution in pixels per inch.
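The screen versus printer size difference is just a division. A tiny sketch, using a hypothetical image 1024 pixels wide (my own example figure):

```python
def physical_width_inches(width_px, dpi):
    """Width the image occupies when each inch holds `dpi` pixels (or dots)."""
    return width_px / dpi


width_px = 1024  # hypothetical image width
on_screen = physical_width_inches(width_px, 72)
on_paper = physical_width_inches(width_px, 600)
print(f"72 dpi monitor: {on_screen:.1f} in, 600 dpi printer: {on_paper:.1f} in")
```

So the same pixels that span about 14 inches of monitor shrink to under 2 inches on paper unless the file says what resolution it should be printed at.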

A 4:3 aspect ratio is normal for PAL TV format. That's the same for a lot of screen resolutions, and standard sizes like 320x240, 640x480, 800x600... 1280x1024 is an exception at 5:4, so circles drawn on a screen at that resolution will not be circles when displayed on a 4:3 screen. Digital TV and widescreen stuff has introduced a whole new range of aspect ratios.
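You can check the aspect ratios by dividing out the greatest common divisor. This sketch also works out how much a 1280x1024 frame gets stretched if it is shown filling a 4:3 screen; the function name is my own.

```python
from fractions import Fraction
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to its simplest whole-number form."""
    d = gcd(width, height)
    return width // d, height // d


for w, h in [(320, 240), (640, 480), (800, 600), (1280, 1024)]:
    print(f"{w}x{h} -> {aspect_ratio(w, h)}")

# Horizontal stretch when a 1280x1024 frame fills a 4:3 display.
stretch = Fraction(4, 3) / Fraction(1280, 1024)
print(stretch, float(stretch))
```

The stretch factor comes out at 16/15, so a circle ends up nearly 7% wider than it is tall.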

One thing to remember is that a graphics package such as Photoshop is better at resizing than a browser, so it's better to resize photos to the right size and save them that way, rather than using the width and height attributes of the image tag to do it on the fly.

The simplest data model used in images is the true colour image data model. Each pixel has a 24 bit value, containing an RGB value (or HSV, or YUV - basically three pieces of information).

Alternatively a palette colour model can be used. Each pixel has an 8 bit index, which refers to a colour in a palette. The palette stores the full RGB (or HSV or YUV) value for that colour. GIFs use this system.

True colour vs. palette: True colour gives a very high quality image, but takes 3 times the memory to store. Palette limits the colour range used, but is cheaper storage-wise.  
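The '3 times the memory' figure is easy to verify for an uncompressed image. A sketch assuming a 640x480 picture and a 256-entry palette with 3 bytes per entry (both are just example figures):

```python
def true_colour_bytes(width, height):
    """24 bits (3 bytes) stored for every pixel."""
    return width * height * 3


def palette_bytes(width, height, palette_entries=256):
    """One 8-bit index per pixel, plus the palette itself at 3 bytes per entry."""
    return width * height + palette_entries * 3


tc = true_colour_bytes(640, 480)
pal = palette_bytes(640, 480)
print(tc, pal, round(tc / pal, 2))
```

The palette itself only costs 768 bytes, so for anything bigger than a tiny icon the ratio is very close to 3:1.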

 

Posted on Saturday, October 27, 2007 at 14:48 by martian77