Let’s say that we have a sequence A of n elements (1<=n<=10^5), and that this sequence may or may not be ordered, but every number in {1…n} is present exactly once. Now, given 1<=k<=10, you’re asked to count all the increasing subsequences of A with exactly k+1 elements.
This is a problem I got from a competitive programming portal; you may find the details here. If at first glance you have no intuition about how to solve a problem, the best approach is to write down and analyze an example, so let’s take A={1,2,3,5,4} with n=5 and k=2 -Totally random numbers!-.
So, we want to have a look at all the increasing subsequences of three elements. First, let’s list all the possible subsequences of length k<=n (from now on, let our k be the k+1 from the problem statement), so for k={1..5} we have:
- 1;2;3;4;5, five subsequences
- 1,2;1,3;1,5;1,4;2,3;2,5;2,4;3,5;3,4, nine subsequences
- 1,2,3;1,2,5;1,2,4;1,3,5;1,3,4;2,3,5;2,3,4, seven subsequences
- 1,2,3,5;1,2,3,4, two subsequences
- no subsequences
And now we know that for k=3 the answer is seven. A brute-force approach to the problem is simply to check all the possible C(n, k+1) combinations of A and count those whose elements are strictly increasing; this is going to work, but will take ages for big n’s.
To do better we need to make some observations. Note first that, for increasing k, we can build the length-k subsequences by taking all the length k-1 subsequences and checking whether we can append a new element while keeping the subsequence in strictly increasing order. Let’s have a look at k=3:
- For k-1 we have: 1,2;1,3;1,5;1,4;2,3;2,5;2,4;3,5;3,4. Can we use any of them to build k?
- {1,2} is good with 3,5,4: We have {1,2,3},{1,2,5},{1,2,4}.
- {1,3} is good with 5,4: We have {1,3,5},{1,3,4}
- {1,5} is good with nothing, no Ai>5 is available
- {1,4} is good with nothing: 4 is the last element of A, there is nothing left to append.
- {2,3} is good with 5,4: {2,3,5},{2,3,4}
- The remaining length k-1 subsequences are not good; we cannot build an increasing sequence of k elements from them.
Note that we are appending the new element to the length k-1 subsequences (built over the first n-1 elements) that end with a number smaller than the new one. This of course makes sense, since we want to build increasing subsequences; and since from the problem statement we know that all the numbers in {1…n} appear exactly once, the closest number to a given Ai must be Ai-1 (or smaller, if A is not in order).
This is a good starting point, but it is not sufficient. Let’s think about how the number of subsequences changes when we increase n by adding more elements to A. If for a given n we have counted, say, c subsequences, increasing n will never decrease c: c will either stay the same or grow!
So for k=1 we have:

| n | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| c | 1 | 2 | 3 | 4 | 5 |
For bigger k the same rule applies, so let’s start drawing a table for {1,2,3,5,4} and k=1…5:

| k\n | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1   | 1 | 2 | 3 | 4 | 5 |
| 2   | 0 | 1 | ? | ? | ? |
| 3   | 0 | 0 | 1 | ? | ? |
| 4   | 0 | 0 | 0 | 1 | ? |
| 5   | 0 | 0 | 0 | 0 | 0 |
Drawing the table is easy for k=1: we know the values of c. For increasing k we can only say that if n<k then c=0 for sure (you cannot build a k-element sequence from n elements if n<k), and that if k==n the prefix may contain one subsequence or none. For example, for k=4 and n=4 we have {1,2,3,5}, which is increasing.
But what about the rest of the table? Well, we may use our first observation, that it is possible to build k from k-1 (and n-1, of course):

| k\n | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1   | 1 | 2 | 3 | 4 | 5 |
| 2   | 0 | 1 | 1+2 | 1+2+3 | 1+2+3+3 |
| 3   | 0 | 0 | 1 | 1+3 | 1+3+3 |
| 4   | 0 | 0 | 0 | 1 | 1+1 |
| 5   | 0 | 0 | 0 | 0 | 0 |
So for example, if n=5 and k=3 then c=7 (the correct answer to our sample problem). We built 7 as 4+3, where 4 is 1+3 (the value of c for n=4) and 3 came from the c calculated for {1,2,3}. We are not extending anything ending with 5, since 5>4!
At any point in the table, once we have calculated c we reuse that c in all the following columns as new n’s are added. Look again at the row k=3: we have 0, 0, 1, 1+3, 1+3+3 and so forth. What we have here are cumulative sums! As already said, once a value of c is calculated for a given k and n, increasing n gives us c'>=c, or in other words c'=c+q, where q is the number of additional subsequences introduced by the new element!
Now we can write down our first algorithm to solve this problem:
```cpp
#include <iostream>

using namespace std;
using ll = long long;

ll num_of_elements;
ll max_k;
ll cum[100001][12];

// Update the cumulative values for the row k
void update_cum(ll val, int start_index, int k)
{
    for (int i{start_index}; i <= num_of_elements; i++) {
        cum[i][k] += val;
    }
}

void run()
{
    cin >> num_of_elements >> max_k;
    ++max_k;
    ll value;
    update_cum(1, 0, 0);
    for (ll i{1}; i <= num_of_elements; i++) {
        cin >> value;
        for (int k{1}; k <= max_k; k++) {
            ll cur_cum = cum[value - 1][k - 1]; // calculate c
            update_cum(cur_cum, value, k);
        }
    }
    cout << cum[num_of_elements][max_k] << endl;
}
```
The code performs the same operations we completed on the table above. The first call to update_cum(1,0,0) fills the base row k=0 with ones (one empty subsequence per prefix); we know for sure what values this row will have. For the other rows we use the observations made before to find all the c‘s.
This code works perfectly fine but has one major flaw: it is slow! The most expensive operation this algorithm performs is the update of the cumulative table: if you look at the inner loop you’ll notice that update_cum is called for every Ai in A, and even if A is not in order, over the whole run update_cum sweeps values from one to num_of_elements.
Since update_cum iterates from i to num_of_elements for each 1<=i<=num_of_elements, the total execution time is given by n+(n-1)+(n-2)+…+1=n(n+1)/2, or O(n^2). We should really improve update_cum (yes, you can ignore k, since k<<n)!
The Fenwick tree (or Binary Indexed Tree) is a beautiful data structure which is exceptionally well suited for operating on cumulative tables. If this is the first time you have heard of the Fenwick tree, you should absolutely read the original paper by Peter Fenwick, or at least have a look here or here! You’ll find this data structure useful very frequently.
Let’s have a look at the modified version of the algorithm, which is identical except for the way it handles the cumulative table. For the sake of the performance comparison I introduced one minor modification to the code presented above: run no longer contains any cin or cout operation; the data are collected in main and then provided to run.
Here’s the code:
```cpp
ll cum_sum(ll idx, int k)
{
    if (k == 0) return 1;
    ll sum{0};
    while (idx >= 1) {
        sum += cum[idx][k];
        idx -= idx & -idx;
    }
    return sum;
}

void cum_add(ll val, ll idx, int k)
{
    while (idx <= num_of_elements) {
        cum[idx][k] += val;
        idx += idx & -idx;
    }
}

ll run()
{
    for (ll i{1}; i <= num_of_elements; i++) {
        for (int k{1}; k <= max_k; k++) {
            ll cur_cum = cum_sum(data[i] - 1, k - 1);
            cum_add(cur_cum, data[i], k);
        }
    }
    return cum_sum(num_of_elements, max_k);
}
```
If you don’t understand how cum_add or cum_sum work, don’t worry, just have a look at the links I provided before. My point is that run is exactly the same as before, but now much faster! cum_sum and cum_add are O(log2(n)) functions, so the new algorithm is O(n*log2(n)).
You may find the code for both algorithms on my GitHub; if you execute both the brute force approach and the smart force approach –using the Fenwick tree– you’ll notice a huge difference in execution time. Let’s have a look at this very simple testing procedure:
```cpp
void bench(std::function<ll()> fnc, const string& name)
{
    // Make sure the cum table is clear
    for (ll i{0}; i <= num_of_elements; i++)
        for (int k{0}; k <= max_k; k++)
            cum[i][k] = 0;
    // Ok, let's go..
    cout << "Running: " << name << endl;
    auto start = chrono::high_resolution_clock::now();
    auto ret_value = fnc();
    auto stop = chrono::high_resolution_clock::now();
    cout << "Execution time: "
         << chrono::duration_cast<chrono::milliseconds>(stop - start).count() << "ms\n";
    cout << "Result: " << ret_value << endl;
}

int main()
{
    data.resize(10000);
    std::iota(data.begin(), data.end(), 0);
    num_of_elements = data.size() - 1;
    max_k = 10;
    bench(brute_force::run, "Brute");
    bench(smart_force::run, "Smart");
}
```
For n=9999 (note that the 0 is ignored, since indices run 1<=i<=n) and k=10, the output is:
```
Running: Brute
Execution time: 1515ms
Result: 4152669778378200987
Running: Smart
Execution time: 4ms
Result: 4152669778378200987
```
I think no more comments are required here: Fenwick trees are absolutely awesome!
Thanks for reading!