# UCLA ACM's Second Programming Competition: Subset Sum Solver

## Problem

Fill a truckbed to maximum weight capacity using the available
packages with known weights, computing a correct answer as quickly as
possible: the subset sum problem.

(The full UCLA ACM description is no longer on the web.)

## Results

The results announcement email:

Date: Sun, 13 Mar 2005 22:38:27 -0800
From: UCLA ACM Programming Contest <ucla.acm.mpc@gmail.com>
Subject: ACM Contest Results!

Hello all!

First off, I'm sorry for taking so long. These things actually didn't take all that long to grade; I've just been extremely busy (you can blame CSM152B mostly). However, as ACM has had a year of incredible growth and larger campus presence, the coordinator staff has strived to be as professional as possible. So do accept my apology for the delay. Without further ado . . .

How the submissions were graded: First, submissions were put through a series of simple tests (testing just a few numbers and small pallet sizes). Incorrect answers eliminated several submissions at this stage. Typical problems included a) people not checking every single possible product combination and b) people not testing the case of all the products fitting onto the pallet exactly.

Those that survived the initial phase proceeded to the next round. In this stage, basic efficiency was tested. Moderately sized product sets that had many potential pallet combinations were tested. Care was taken to try pallet sizes that were one larger than the biggest product size, as well as cases where the entire product set was used to fill the pallet. Pre-ordered product sizes were also tried. Any submission that showed promise on at least a few test cases remained in the running. However, some submissions that obviously took much longer than average were eliminated at this stage (these were submissions that used brute force, "try every combination", without pruning, or submissions in languages other than C/C++; Java seemed to be an inappropriate choice for this challenge).

The remaining submissions then proceeded to the final round, where the following classes of test cases were used:

a) A pallet that could not be filled (thus requiring programs to test every combination). This was tested by generating even-numbered product sizes and asking for an odd-numbered pallet.
b) A pallet where three particular products were the only valid combination (tested by generating numbers divisible by ten, inserting three numbers whose values were one larger than a number divisible by ten, and using the sum of these three numbers as the pallet size).

c) A pallet size and product sizes such that one particular number must be part of any solution (although many solutions were possible). This was done by taking all even numbers except for one odd-numbered product size and then asking for an odd-sized pallet.

Note that for the above three cases, different seeds were provided to the random number generator across several test runs. Care was also taken to make sure the desired pallet size and special product sizes were not extraneous values.

The notion of testing a "random" test case (one with randomly generated product sizes and a randomly generated pallet size) did not distinguish among the submissions. With a large number of products there are many solutions to this "average" example, and most programs could find a solution quickly. In this type of test case, memory usage was more important; since memory usage is only a secondary factor, this type of test case was not used in the final grading.

In the upper-class undergraduate category, there are two winners of our competition: Kim Tran and Matthew Wilgenburg. While neither used a dynamic programming solution, both of their solutions consistently out-performed any other correct solution submitted by undergraduates. Differences between Kim's and Matthew's solutions depended principally on the random number seed. The general approach for both solutions was the same: sort the input values and then apply some algorithm that could take advantage of the sorted array. Kim's method was especially "computer science-y"; he treated the problem as a generic search problem and applied as many tree searching techniques as possible (both domain-specific pruning and generic speedups).
In the undergraduate category, Rizwan Kassim and Shant Hovsepian both deserve credit (and prizes) for remaining in the race until the last stage.

The winning graduate submission was Chris Frost (who was also the winner of the Fall competition). He competed with Steve VanDeBogart in the final round. Chris's README for his submission is attached to this email. It is actually a great general overview of the problem and mentions the two types of successful submissions (branch and bound as well as dynamic programming). Chris combined the two approaches to maximize the case when dynamic programming performs at its best. I also suspect his liberal use of compiler optimizations had an effect on the speed of his program. Chris's solution handily beat any undergraduate submission.

Congrats Kim, Matthew, and Chris on a job well done! All those mentioned in this email have won prizes. All those who competed but were not mentioned will receive separate emails detailing how the testing of their program went as well as what prize they will receive.

Detail of prizes:

* Chris Frost - $20 Best Buy Gift Card, copy of Halo 2 for Xbox, and one of the items below
* Kim Tran - $40 Best Buy Gift Card and copy of Mech Assault 2: Lone Wolf for Xbox
* Matthew Wilgenburg - $40 Best Buy Gift Card and copy of Mech Assault 2: Lone Wolf for Xbox
* Steve VanDeBogart - Can choose one of the items below
* Rizwan Kassim - Can choose one of the items below
* Shant Hovsepian - Can choose one of the items below
* Hyduke Noshadi - Can choose one of the items below

Software choice:

* Microsoft Visual Studio .NET
* Microsoft OneNote
* Microsoft Office 2003 (not available on MSDN AA)
* Microsoft Digital Image Suite
* Microsoft Windows XP
* Halo for PC

There is a chance that if everyone wants the same item from the above list, we may run out. We'll have to do it on a first-come, first-served basis. Any of the winners/honorable mentions can also have a Google Prize Pack: Google Nalgene, flashlight, & T-shirt.
Please only take me up on this if you'll use the free stuff! In addition, any of our winners/honorable mentions can have their resumes forwarded to our sponsors: Google, Microsoft, and Real Networks. If you are looking for a job/internship and want to take advantage of this opportunity, send me your resume. Obviously, feel free to include this award on your resume. ACM can validate the award to a potential employer if necessary.

To pick up your prizes, I will be in BH4760 from 6PM-8PM on Wednesday as well as from 6PM-midnight on Thursday. You can also find me tomorrow (Monday) by Panda Express from 12-1:30 or so. My cell is 310.435.5411. If none of those times work, let me know and we'll work something out. Let me know if you are coming by to 4760 so I'll be sure to be there.

In addition, ACM has reserved BH4760 Tuesday, Wednesday, Thursday, and Friday of 10th week and Tuesday of finals week for its officers to study in. Everyone, feel free to make use of this quiet study space if you would like. You've earned it.

As far as "what's next" for the ACM competitions, here is some news:

* The next competition will take place during E-week, the 2nd week of Spring quarter. Riddles, challenges, and puzzles will be provided throughout the day. Make sure to stay on the ACM email list to keep updated!
* The next "at home" competition will hopefully be a partnership with Real Networks and will take place sometime after E-week.
* If you are interested in running these competitions next year, you should apply to be an ACM officer: http://acm.cs.ucla.edu/officerselection

Please email me if you have any questions or would like to know more about the winning submissions. Thank you for participating.

Take care,
Adam
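For concreteness, the three final-round test classes described in the email can be sketched as generators. This is my reconstruction of the schemes Adam describes, not the contest's actual harness, and all names (`TestCase`, the generator functions, the 1-500 size range) are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Illustrative reconstruction of the three final-round test classes.
struct TestCase {
    std::vector<uint64_t> products;
    uint64_t pallet;
};

// Class (a): all-even product sizes with an odd pallet size, so no subset
// can hit the target and a solver must rule out every combination.
TestCase unfillable_case(std::mt19937& rng, int n) {
    std::uniform_int_distribution<uint64_t> d(1, 500);
    TestCase t;
    for (int i = 0; i < n; ++i) t.products.push_back(2 * d(rng));
    t.pallet = 2 * d(rng) + 1;  // odd target, even items: impossible
    return t;
}

// Class (b): sizes divisible by ten plus three "10k+1" items whose sum is
// the pallet size; only those three items can produce a sum ending in 3.
TestCase unique_triple_case(std::mt19937& rng, int n) {
    std::uniform_int_distribution<uint64_t> d(1, 500);
    TestCase t;
    for (int i = 0; i < n; ++i) t.products.push_back(10 * d(rng));
    t.pallet = 0;
    for (int i = 0; i < 3; ++i) {
        uint64_t special = 10 * d(rng) + 1;
        t.products.push_back(special);
        t.pallet += special;
    }
    return t;
}

// Class (c): all even sizes except one odd item, with an odd pallet size,
// so the odd item must be part of any solution. The pallet is built from
// the odd item plus some even items to guarantee at least one solution.
TestCase forced_item_case(std::mt19937& rng, int n) {
    std::uniform_int_distribution<uint64_t> d(1, 500);
    TestCase t;
    for (int i = 0; i < n; ++i) t.products.push_back(2 * d(rng));
    uint64_t odd = 2 * d(rng) + 1;  // the single odd product size
    t.pallet = odd;
    for (int i = 0; i < n / 2; ++i) t.pallet += t.products[i];
    t.products.push_back(odd);
    return t;
}
```

As the email notes, each class would be regenerated under several different seeds.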

## Solution

The email's attached README, which describes and analyzes my solution:

Chris Frost's knapsack competition submission.
chris@frostnet.net. CS Graduate Student. Submitted 2/27/2005.

* Installation

See INSTALL for build directions; this source is built via the usual ./configure && make, but beforehand you should set CXXFLAGS and CFLAGS to enable higher optimization.

* Usage

The program binary will be ./src/truckload. Input may come from a file specified on the command line ("truckload <file>") or from stdin. Input is assumed to be comma delimited. (Side note: whitespace-delimited input with the last item being the weight would have been easier to parse.) The output will be a comma-delimited list of weights to fill if the pallet can be filled; else the output will be "CANNOT FILL PALLET".

* How it works

truckload reads in the input. It then asks each of its solvers whether they apply to the given input, asking the first applicable solver to solve the problem.

A good first question to ask when wondering how quickly a problem can be solved is how well a solver can do asymptotically, in terms of its input. Our truckloading problem can be seen as the 0-1 knapsack problem, with each item's price equal to its weight. Unfortunately, for the general 0-1 knapsack problem things look bleak: a 0-1 knapsack solver can be used to solve the subset sum problem, which is NP-Complete. Reduction of subset sum to 0-1 knapsack: given a set of numbers and the desired sum, give the knapsack solver this set of numbers and tell it the pallet limit is the subset sum problem's sum. The items put on the pallet by the knapsack solver form a subset with the requested sum.

Computer scientists often get sad when they see a problem is NP-Complete, and I do not foresee proving P == NP during a programming contest. Studying the problem further, however, I do see two ways to help runtime:

(1) While the general problem is NP-C, there appear to be special cases with lower-bound solve times that are sub-exponential.
(2) The knapsack problem appears to be NP-C not because all or most inputs are hard, but only because there exist a few bad cases.

To address these two cases I've implemented two solvers: an O(n) time solver for superincreasing inputs (since these seem common for our input and fall under category 1) and an E[O(n)] time and E[O(n)] space solver that handles all valid inputs (taking advantage of category 2).

** Superincreasing Solver (superincreasing_solver.cc)

Of course, informing us that 20% of the inputs will be superincreasing is a great hint. I've devised a [low-order] O(n) time and O(1) space algorithm for solving this input space.

Proof of O(n) time, O(1) space:

- Assume the set of elements is superincreasing.
- Assume the elements are sorted in increasing order. (If not, one could sort and stay in O(n) time using, say, radix and counting sort or bucket sort, though these do require O(n) space. Another option would be to raise to O(n lg n) time and O(1) space using, say, heapsort. And in fact, knowing the elements are superincreasing, we might be able to swing O(n) time and O(1) space sorting, but anyway...)
- Init "weight to fill" to the pallet's capacity. O(1)
- For each element, starting with the largest element: O(n)
  - If the elt's weight is no more than the weight to fill, move the item to the pallet and subtract its weight from the weight to fill. O(1)

Thus the runtime is in O(n). The only state kept is the current weight to fill, thus space usage is O(1), assuming we can throw output to someone else's problem.

Proof of correctness: At each loop iteration there either is weight to fill on the pallet or not. If the pallet is full we are done. Else the pallet is not full. The current elt can either fit or not onto the pallet, and we can decide greedily whether to move it: the elt's weight is greater than the summation of all elts to go, thus if the weight to fill is at least this elt's weight, the only way the pallet can be filled is by this elt being in it.
This applies to each loop iteration. At the end, either the pallet's weight to fill is zero, in which case the pallet has been filled exactly, or the weight to fill is positive, in which case there are no elts having weight less than the weight to fill, and so the pallet can not be filled.

I believe (though do not prove here) that algorithms can be devised to solve other special cases in sub-exp time. However, because of my lack of knowledge of the inputs our programs will receive, that finite thing known as 'time', and because my non-special-case solver runs in E[O(n)] time and space, I've devised and implemented only one special case solver.

** "Combo" Solver (combo_solver.cc and combo.c)

While special case solvers allow the solving of our problem in fast time compared to the NP-C bound (I'll assume P != NP for the purposes of my algorithm development), who knows what crazy inputs the ACM people will give our programs, so a general solver is a Good Idea.

One obvious approach is to enumerate all combinations of items and to search for a combination whose total weight is the pallet's weight. While this is correct, I can only prove it to be in O(2^n) running time. Perhaps I can do better.

AI loves searching, so we could try doing a branch-and-bound approach. I believe that a few such algorithms have been devised that have fast runtimes when tested experimentally; however, no one has proven sub-exp runtimes for all cases. A practical improvement, but without guarantees this is scary! (Run from the exp monster!)

Dynamic programming seems applicable here: we may be solving the same subproblems repeatedly. Many (Bellman was the first) have done this, solving knapsack in O(nc) runtime, "pseudo-poly", where c is the pallet's capacity. Were our tricky acmers to give us a large capacity and number of elements, however, we'd be back to scary-land.
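The superincreasing greedy proved above can be sketched as follows. This is a minimal illustration, not the actual superincreasing_solver.cc; the function name and interface are made up, and it sorts in O(n lg n) for simplicity rather than using the O(n) options the README discusses:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Greedy solver for superincreasing inputs (each weight exceeds the sum of
// all smaller weights). Returns the chosen weights if the pallet can be
// filled exactly, or an empty vector ("CANNOT FILL PALLET") otherwise.
std::vector<uint64_t> solve_superincreasing(std::vector<uint64_t> weights,
                                            uint64_t capacity) {
    std::sort(weights.begin(), weights.end());
    std::vector<uint64_t> chosen;
    uint64_t to_fill = capacity;
    // Walk from the largest element down. If an element fits, it must be
    // taken: it outweighs the sum of everything smaller.
    for (auto it = weights.rbegin(); it != weights.rend(); ++it) {
        if (*it <= to_fill) {
            chosen.push_back(*it);
            to_fill -= *it;
        }
    }
    if (to_fill != 0) chosen.clear();
    return chosen;
}
```

For example, on weights {1, 2, 4, 8, 16} with capacity 21 the loop takes 16, then 4, then 1, filling the pallet exactly.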
Not super-scary-land, though: Horowitz and Sahni a while back came up with a way of partitioning the inputs (in linear time) to get a runtime in O(2^(n/2)). But come on, we want to solve this *quickly*! These trucks hafta get in and get out.

Dynamic programming can do badly when it has to consider many inputs; what if we were able to restrict the size of the inputs dynamic programming must consider? The general idea is that all valid solutions fall into a [hopefully very] restricted subset of all the solutions, where the weight sum is "close" to the pallet's capacity. This then leads to the combo algorithm the combo solver implements, which attains an expected O(n) runtime. The specifics of the combo algorithm and much of its proof are due to David Pisinger's work "An O(nr) algorithm for the Subset-Sum Problem". Although the algorithm and its proof are too complicated to really get into in this README file (these are supposed to be short, right!?), I continue the above description here with an overview!

Right, so we'll use our tool, dynamic programming, on a restricted subset of the possible solutions by whittling these solutions down to a smaller space that we know still contains all possible valid solutions. First we can find the first item too large to fit into the pallet (min { j : sum_(i=1)^(j) w_i > c }). We can then consider only solutions using items up to this too-large item. We can find a balanced filling from this too-large item, using insert and remove operations to piece by piece try moving our sum of valid items, and we can then prove that any valid solution must come from here. We can now use Bellman's algorithm (O(nc) time). Since we know all weights are limited to some max r (2^32-1 for C's unsigned long on x86), we can bound the complexity to O(n^2 r). Nothing new there, but using our balancing this is improved to just O(nr): the balancing lets us create a weight constraint around the too-large item when trying each element.
We'll need an array of size n*r. Additionally, our insert/remove ops will run at most r times the too-large item's weight, plus the use of the array. Blamo! E[O(nr)], and r is bounded: E[O(n)] time and space. You can see the origin of the name "combo"; this algorithm combines bounding techniques and dynamic programming.

*** Remaining problems with the combo solver

Ok, so this all sounds kind of sweet. There are two areas to go still, however. The above algo runs in E[O(n)] time and space. While we can not do lower than linear (say we did and found no solution; one of the inputs we did not consider may have allowed the problem to be solved, therefore Omega(n) runtime), can we improve the probability of reaching O(n)? Can we improve the *constant factors*?!

After coming up with the basic above idea, I looked through the NP-C, knapsack, and subset sum literature a bit and found that Rene Beier and Berthold Voecking have been able to extend the combo algorithm, combining it with a strong idea of bounds, Pareto-optimal fillings, some search halving, and Nemhauser and Ullmann's old algorithm to get super-fast E[O(n)] times, an *order of magnitude* faster than combo in the tests a followup paper ran. However, I'm but a lowly first-year grad student, and so decided not to attempt to implement their algorithm because it seemed complex and because, unless all the optimizations are applied, it is slower than combo.

combo_solver.cc could also be optimized a bit more; making a copy of the entire input set for a linear solver can't help time. Such a speedup might hurt superincreasing_solver's runtime, however, so I've left things simple.

A bigger concern for me is that my implementation is unable to solve some problems I give it that do have solutions. Unfortunately, time (again, that seemingly finite resource) has been too short for me to fully understand my problem, although at first guess it appears that the sum of elt weights is overflowing the 32 bits used for storing ints.
truckload is able to solve, say, a problem of 10,000,000 elements (non-superincreasing), each of weight 100 or less, in 5 seconds on my laptop. Increasing this max weight much, or increasing it some and increasing the number of elements, results in not finding many of the small elements. Perhaps truckload is sufficient to solve the problems acm will test it with; we'll see!

Sorry for such a long README, but hopefully the above explanations and proofs were helpful in analyzing my submission. Enjoy? :)

(In addition to serving as a chance to learn about subset sum solving, this contest served as my first use of autoconf and automake, wahooo!)