Maybe some of you may wonder why lately I wasn’t posting any algorithm, or maybe not. But, I feel the compelling need to explain You the reason behind this behavior! Well, I’m working on my combinatorics book, which just happen to be a very demanding one, and it takes all my free time and free energies.
The Book has over one hundred exercise per chapter, there are almost more exercises than combinatorics arguments in that thing, and since I would like to have it done all –or almost all– then became clear why I don’t have time to code problems.. I will write sooner or later a review of that manuscript, I live it very much!
Additionality I’m supposedly going to run some half-marathon this year, which means I need extra time to train and to restore from the training’s! Sometime I wish to be a BORG, you know, no sleep no food no problems, more time!
But, today I’ve something for you! Here’s the problem:
Given an array A of n integers, where 1<=n<=5*10E5 and 1<=A[i]<=10E6, youre asked to find the longest subsequence of A containing at most k different values, with 1<=k<=n.
Return the left and right index of any of those sub-sequence.
We have repetition allowed, and the array is not sorted –the problem statement is pretty clear about this-, and from the problem is also clear that multiple sub sequences are possible but we need to output only one of them.
This problem is part of a large family of related riddles where one is asked to find a certain sub-sequence under a given constrain, the most popular one is the problem of finding in A the sub-sequence which sum is maximized, if you don’t how to solve this problem perhaps you should have a look at here or here.
Here we have a pretty simple statement indeed, and an easy O(n) brute force could be easily implemented using the two pointer technique*:
//Need a fast mapping, std::map is too slow. long cnt[1000001]; pair<long,long> solution(long k, const vector<long>& A) { long current_l{0},current_k{0},r{0},l{0},N{(long)A.size()}; for(long current_r{0};current_r<N;current_r++) { //Have we encountered a new element? if(++cnt[A[current_r]]==1) ++current_k; if(current_k>k) { //We need to remove something while(--cnt[A[current_l]]>0) ++current_l; ++current_l; --current_k; } //Any better solution to our problem? if((r-l)<(current_r-current_l)) l = current_l, r = current_r; } return make_pair(l+1,r+1); }
The only interesting part of this code is the cnt array which is required to keep track of the amount of items you’ve encountered so far, each and every time when increasing the counter you hit a 1 you know for sure that this is a new item which is entering the windows –current_l/current_r-, so the counter of the amount of different items need to be updated as well –current_k-.
if the amount of different items in the current window is larger than the constrain k, then the while loop will remove them starting from the head by shifting current_l to right until cnt[A[current_l]]==0, which basically means we have one kind of item less.
You may very well implement this code using a map instead of the nasty array cnt, but the implementation will be substantially slower when A get’s big.
Thanks for reading!
*Folks in the algorithmic community call this way of solving a problem a technique, but is just a pompous way to call an array traversal using two indexes. Sometime I do not understand those computer NERDS..