PROSITE motif search on genomic data?

Post by Paddywhacker » Jun 17 2012 5:20 am

Is there a tool that gives you BLASTP-style searching of genomic data, but with a PROSITE-style protein pattern as the query?

Thanks,
Paddy
Paddywhacker
technician-in-training
Posts: 10
Joined: Jul 01 2011 5:12 am
Location: Auckland, New Zealand

Re: PROSITE motif search on genomic data?

Post by Astarte » Jun 21 2012 4:08 am

Most motif search tools only accept protein sequences; I have not yet found one that accepts genomic DNA. The best approach would be to translate your sequence in all six reading frames and submit the translations to PROSITE. If you know a bit of Perl, R, or MATLAB, this is something you can easily automate.
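
For anyone who wants to automate the translation step, here is a minimal sketch in Python rather than Perl, R, or MATLAB. It assumes the standard genetic code (NCBI translation table 1) and a plain DNA string as input; the function names are made up for this illustration, and any codon containing an ambiguous base is simply written as X.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping '+1'..'+3' and '-1'..'-3' to protein strings."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)  # frame +1 is "MA*", frame -1 is "LGH"
```

Stop codons are kept as `*` so that a later motif scan can avoid hits that run across them.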
Astarte
Prolific Post-Master
Posts: 118
Joined: Nov 13 2007 7:41 am
Location: Belgium

Re: PROSITE motif search on genomic data?

Post by Paddywhacker » Jun 22 2012 12:58 am

Yes, that is what I want, and that is what I have done in the past. But doing so requires downloading the sequence of the genome. It would be very useful if the online BLAST engines could be upgraded to do regular expression searching.

Re: PROSITE motif search on genomic data?

Post by Astarte » Jun 26 2012 2:54 am

It's not really clear to me what exactly you want to do. BLAST engines accept IUPAC codes, and although these are not as flexible as regular expressions, they are usually all you need in the genome world.
Do you want to do a motif search on a whole genome? Or are you looking for a specific motif in a genome?

Re: PROSITE motif search on genomic data?

Post by Paddywhacker » Jun 26 2012 6:50 pm

Astarte wrote: It's not really clear to me what exactly you want to do. BLAST engines accept IUPAC codes, and although these are not as flexible as regular expressions, they are usually all you need in the genome world.
Do you want to do a motif search on a whole genome? Or are you looking for a specific motif in a genome?


I'm defining motifs from alignments between my model and the few annotated genomes that I can find, and then trying to find matches in whatever raw genomic scaffolds are available online. The motifs are protein-based, with ambiguities and variable filler lengths (typical PROSITE), because that is where the conservation can be seen, but the scaffolds are DNA.
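
To make "typical PROSITE" concrete, here is a rough Python sketch of how such a pattern maps onto a regular expression. It covers only the common syntax elements (`x`, `[..]`, `{..}`, repeat counts, and the `<` / `>` anchors); the function name and the example pattern are invented for illustration and are not taken from the PROSITE database.

```python
import re

# One pattern element, optionally followed by a repeat such as (3) or (2,4).
ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        core, low, high = ELEMENT.fullmatch(element.strip("<>")).groups()

        if core in ("x", "X"):
            piece = "."                       # any residue (also matches '*')
        elif core.startswith("["):
            piece = core                      # allowed set, same syntax as regex
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"   # forbidden set
        else:
            piece = re.escape(core)           # a single fixed residue

        if low and high:
            piece += "{%s,%s}" % (low, high)  # variable-length filler
        elif low:
            piece += "{%s}" % low             # fixed repeat count

        pieces.append(("^" if anchor_start else "") + piece
                      + ("$" if anchor_end else ""))
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical pattern: Phe or Tyr, 2 to 4 of anything, Cys, not Pro, Gly.
    regex = prosite_to_regex("[FY]-x(2,4)-C-{P}-G.")
    print(regex)                                   # [FY].{2,4}C[^P]G
    print(re.search(regex, "AAFKKCAGAA").group())  # FKKCAG
```

The ambiguity sets and the variable filler both survive the conversion, which is why a regex over translated sequence can express things that a single fixed-length query cannot.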

Re: PROSITE motif search on genomic data?

Post by Astarte » Jun 27 2012 9:40 am

The IUPAC code will take care of the ambiguities, but the variable filler length is a problem. You can try setting a low penalty for opening a gap. This will hopefully give you a good first selection. But I'm afraid that from there on you will have to download this first selection and do a more stringent matching with a regex in MATLAB or R.

Good luck!

Re: PROSITE motif search on genomic data?

Post by Paddywhacker » Jun 27 2012 3:32 pm

Thanks for your input, but the IUPAC codes don't have a mechanism to restrict an ambiguity to a specific set of residues. 'X' stands for any amino acid residue, when you might want a choice between, say, Phe and Tyr.

It looks as if I have to carry on with what I've been doing: download all the scaffolds, translate each scaffold in all six reading frames, and scan each translation repeatedly for motif matches, increasing the permitted error count each time until I get a match. Then examine the genomic neighbourhood of the match to check that it isn't just a garbage hit.
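
A rough Python sketch of that scan loop, under some assumptions: the six translations are already available as strings keyed by frame label ("+1" to "-3"), and "increasing the permitted error count" is modelled as a list of regexes ordered from strict to permissive, since Python's built-in `re` module has no error-tolerant matching. The function name is invented. Hits are mapped back to 0-based, half-open coordinates on the forward strand so the genomic neighbourhood can be inspected afterwards.

```python
import re


def scan_frames(frames, scaffold_length, regexes):
    """Try each regex in turn and stop at the first one that gives hits.

    frames          -- dict such as {"+1": "MA*", ..., "-3": "RP"}
    scaffold_length -- length of the DNA scaffold in bases
    regexes         -- regex strings, ordered from strict to permissive

    Returns (index of the regex that matched, list of hits), where each hit
    is (frame, start, end, matched_text) in forward-strand coordinates.
    """
    for level, regex in enumerate(regexes):
        compiled = re.compile(regex)
        hits = []
        for label, protein in frames.items():
            offset = int(label[1]) - 1          # 0, 1 or 2 bases skipped
            # finditer reports non-overlapping matches only
            for match in compiled.finditer(protein):
                start = offset + 3 * match.start()
                end = offset + 3 * match.end()
                if label.startswith("-"):
                    # positions were on the reverse complement; flip them
                    start, end = scaffold_length - end, scaffold_length - start
                hits.append((label, start, end, match.group()))
        if hits:
            return level, hits
    return None, []


if __name__ == "__main__":
    # Six-frame translations of the toy scaffold ATGGCCTAA (9 bases).
    frames = {"+1": "MA*", "+2": "WP", "+3": "GL",
              "-1": "LGH", "-2": "*A", "-3": "RP"}
    print(scan_frames(frames, 9, ["GW", "G[HL]"]))
    # (1, [('+3', 2, 8, 'GL'), ('-1', 0, 6, 'GH')])
```

The returned level tells you how far the pattern had to be loosened before anything matched, which is a useful hint when judging whether a hit is garbage.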

Re: PROSITE motif search on genomic data?

Post by Astarte » Jun 28 2012 10:24 am

Can't you back-translate your protein motif into a DNA sequence? For nucleotides, every possible ambiguity is covered by the IUPAC code. You could then use this as the query for a regular BLASTN search.
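
As a sketch of what that back-translation could look like in Python: each motif position (a single residue, a set of residues, or x) is collapsed into one degenerate codon using the IUPAC nucleotide codes. The standard genetic code is assumed, the function names are invented, and fixed-length motifs only are handled. How a particular BLAST server scores ambiguity codes in the query is a separate question that this does not address.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def degenerate_codon(residues):
    """Collapse all codons for the given residues into one IUPAC codon."""
    codons = [c for aa in residues for c in CODONS_FOR[aa]]
    return "".join(
        IUPAC[frozenset(c[i] for c in codons)] for i in range(3)
    )


def back_translate(motif):
    """motif is a list of positions, e.g. ["FY", "C", "x", "G"]."""
    return "".join(
        "NNN" if position in ("x", "X") else degenerate_codon(position)
        for position in motif
    )


if __name__ == "__main__":
    print(back_translate(["FY", "C", "x", "G"]))  # TWYTGYNNNGGN
    print(degenerate_codon("L"))                  # YTN
```

Two limitations are worth keeping in mind. The codon for Leu comes out as YTN, which also covers TTT and TTC (Phe), so the degenerate DNA can match more than the protein motif allows. A variable-length filler such as x(2,4) cannot be written as a single fixed-length query at all.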
