Biology

CS5238 Combinatorial methods in bioinformatics 2004/2005 Semester 1

Lecture 8: Finding structural similarities among proteins (II)

Lecturer: Prof Jean-Claude Latombe

Scribe: Cheng Chi Kan, Lee Pern Chern and Moritz Buck

1 Voting scheme with hash table

Many-to-many comparisons are evaluated when we align protein structures. In order to avoid repetition, a better

organization of computation is necessary. This could be achieved by pre-computing the indexes of proteins and

arranging them in a hash table. Then, queries are evaluated based on a voting scheme using the hash table. This

voting scheme replaces the seed generation process.

In this lecture, we look into the voting scheme used in 3dSEARCH [2]. The algorithm is based on the concept of

geometric hashing [1] developed in the eld of computer vision. The basic idea is to represent all secondary structure

elements (SSEs) from all target proteins with a large, highly redundant hash (or index) table. Once the table has been

constructed, every SSE from a given query structure can be compared simultaneously to the entire set of SSEs of the

target structures, by indexing the SSE into the table.

The hash table that consists of 3-dimensional regular grid of cube bins ( 2A) is constructed as follows. For each

target structure, we compute a coordinate system P for every pair of vectors (i; j) (Figure 1). Then, we transform all

y

z j

i

x

Figure 1: The coordinate systems for the vectors (i; j). The z-axis is coincident with i

CS5238 Combinatorial methods in bioinformatics 2004/2005 Semester 1

Lecture 8: Finding structural similarities among proteins (II)

Lecturer: Prof Jean-Claude Latombe

Scribe: Cheng Chi Kan, Lee Pern Chern and Moritz Buck

1 Voting scheme with hash table

Many-to-many comparisons are evaluated when we align protein structures. In order to avoid repetition, a better

organization of computation is necessary. This could be achieved by pre-computing the indexes of proteins and

arranging them in a hash table. Then, queries are evaluated based on a voting scheme using the hash table. This

voting scheme replaces the seed generation process.

In this lecture, we look into the voting scheme used in 3dSEARCH [2]. The algorithm is based on the concept of

geometric hashing [1] developed in the eld of computer vision. The basic idea is to represent all secondary structure

elements (SSEs) from all target proteins with a large, highly redundant hash (or index) table. Once the table has been

constructed, every SSE from a given query structure can be compared simultaneously to the entire set of SSEs of the

target structures, by indexing the SSE into the table.

The hash table that consists of 3-dimensional regular grid of cube bins ( 2A) is constructed as follows. For each

target structure, we compute a coordinate system P for every pair of vectors (i; j) (Figure 1). Then, we transform all

y

z j

i

x

Figure 1: The coordinate systems for the vectors (i; j). The z-axis is coincident with i