2

There is a one question of Algorithm.

Question is as follows:-

You are given a protein String consisting of characters, A, B, C, D. You have to find a minimum length sequence in that.

Example

0 1 2 3 4 5 6 7 8 9 10 11 12
A B A C D C A B C D  C  C  D  

String to find is : BCD 

This string is find between (StartPoint, EndPoint)
1, 4
7, 9
1, 12
7, 12

Minimum length is of 7, 9.

So the answer is 7, 9

My work,

  1. We can solve this using Brute force approach in O(n^2).
  2. We can find the first sequence present in main string by using DP, and my DP logic is as follows,
A = Main string
B = String to be find
DP = Dynamic programming function

n = A.size, m = B.size

Build an array of  DP[m+1][n+1]

DP[i][j], means If in A[0...i], B[0...j] is present or not.

This way we can find our first occurence of B in A. Now after this, I am stuck.

I need some hint from your side.

Please give me hint/guidance only, no code or implementation required.

11
  • 1
    "Question is as follows"? There's currently no question, just an assignment :/ Commented Oct 7, 2013 at 9:24
  • @Zeta This is not an assignment. I found this question on one of the online coding competition. I stuck to develop efficient approach. I just need a start. I am not asking for a complete implementation. Commented Oct 7, 2013 at 9:29
  • @Dukeling There they are sharing codes. And already made codes ruin your thinking, thats why I am asking for a hint/start. Commented Oct 7, 2013 at 9:39
  • In my experience, hints don't make for particularly good StackOverflow answers, and, to me, code, pseudo-code and a high-level approach mainly differs in terms of readability, not amount of ruin to one's thinking. Commented Oct 7, 2013 at 9:51
  • 1
    @justhalf Yes, I normally posted questions here after putting my best efforts. Commented Oct 8, 2013 at 5:51

2 Answers 2

0

Your sample problem and its solution clearly suggests that the solution will always be a numeric pair containing position of first letter of substring and position of last letter of substring i.e.

If the substring is BCD, then solution will be position of B, position of D

Provided that rest of the substring (C in this case) falls in between the solution pair.

So, to give a hint, we can start by finding the positions of first letter of substring in main string and store those positions in an array. Similarly we can find the positions of last letter of substring and store them in an array. This will give us a set of probable solution set wherein each pair will comprise of one number from array 1 and one number from array 2 such that number from array 2 is greater than number from array 1. Now we might end up observing that there is no such pair, which means there is no solution i.e. substring does not exist in main string, or we might end up find one or more such pairs, that means there can be a solution. Now all that is left to do is find out if rest of the substring exists between the solution pairs or not. If there are more than one such pairs found at the end, then just higher number minus the lower number should resolve to the right solution. Hope this helps, as you mentioned you do not wish to know the entire answer, you are just looking for hint :)

Sign up to request clarification or add additional context in comments.

Comments

0

Based on the example, I'm assuming the search string needs to be found in the same order as given (i.e. ACB isn't a valid find for ABC).

General DP approach / hints:

The function we're trying to minimize is the distance so far, so this should be the value stored in each cell of your matrix.

For some position in the string and some position in the search string, we need to look back to all previous positions in the string for one position back in the search string. For all of these we need to add the distance to there and record the minimum.

To illustrate, assume a search string of A, B, C, D. Then for ABC in the search string and position i in the string, we need to look at positions 0 through i-1 for AB.

Given a string BACCD and a search string BCD, when looking at the last position of both, we'd have something like:

DP(BACCD, BCD) = min(4+DP(B, BC), 3+DP(BA, BC), 2+DP(BAC, BC), 1+DP(BACC, BC))

But DP(B, BC) and DP(BA, BC) are invalid since B and BA don't contain BC and, more specifically, don't end with a C (thus they can be assigned some arbitrary large value).

Once we get to the last character in the search string, the value would indicate we found the complete search string, ending at that position in the string, thus it should compared to the global minimum.

Optimization:

To get an O(m*n) rather than O(m*n^2) running time, it's worth noting that you can stop iterating backwards as soon as you see another of the current letter (because, any sequence up to that point is longer than the same sequence with only the last letter moved forward), i.e.:

Given a string ABCCD and a search string ABC, when checking the second C, we can stop as soon as we get to the first C (which is right away), since ABC is shorter than ABCC.

Side note:

I think one can do better than the DP approach, but if I were to suggest something else here, it would likely just be copied from / inspired by one of the answers to Find length of smallest window that contains all the characters of a string in another string.

1 Comment

Means, If A = BABCBDD , and B = BCD , then DP(BABCB, BCD) can be calculated by taking minimum from DP(BABC, BC), DP(BABC, B). Here, DP is your minimum function. Is this the same approach, you are talking about ? If this is not your approach, give me your mathematical formula, please.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.