Monday, June 22, 2009

Krypton: pseudo-code




Here is some code to get across the basic flow of the algorithm.

1.) Open the file, convert hex to a binary string/number.

2.) Count the ones in the string, get the index for the pattern within its subgroup using the binomial coefficient / column in Pascal's triangle.

3.) Store the ones count in an array; the index number (as binary) becomes the new input data/string.

4.) Go to step 2 using the new input until the data is exhausted.








PSEUDO-CODE:





function krypton(filePath)
{
string fileBinary = hexToBinary(filePath);

kompress(fileBinary);
}


//VARIABLES, will be used to string together final output.

array kountArray; //ones counts for each iteration level

array optimizeArray; //optimize BOOL, did-flip setting per level

int numberOfOnes; //holds number of ones for current level

string answerIndex; //holds index binary string for current level

array trimmedOnes; //leading ones trimmed

//VARIABLES



//the index will probably always lead with a 1, may need to
//track a leading zero for the very first 'file' input

function kompress(inputBinaryString)
{

//keep looping until the answer is a single digit, 1 or 0
while(inputBinaryString.length > 1)
{
//determine the number of ones in the input string
numberOfOnes = kountOnes(inputBinaryString);
//if the string has more ones than zeros, then flip...
//ie 10101 transforms into 01010
//simple 1-bit replacement based optimization,
//could use 2-bit.... or other algorithms.
//may need to track trimmed leading 1's
if(numberOfOnes > inputBinaryString.length/2)
{
//store iteration = DID flip
optimizeArray[] = 1; //flipbits
//faster than reparsing
numberOfOnes = inputBinaryString.length - numberOfOnes;
//swap 1's and 0's for each other
inputBinaryString = flipString(inputBinaryString);
}
else
{
//store iteration = did NOT flip
optimizeArray[] = 0; //flipbits
}
//store 1's count
kountArray[] = numberOfOnes;

//get index for the data pattern within the triangle column...
answerIndex = indexForPattern(inputBinaryString,numberOfOnes);
//iterate, the index in binary becomes the new input
inputBinaryString = answerIndex;
}

}







///// INDEXING CODE BELOW
///// INDEXING CODE BELOW


//input 11001 = 3 ones in 5 slots = 10 possible patterns
//INDEX BREAKDOWN
//00111 = 0
//01011 = 1
//01101 = 2
//01110 = 3
//10011 = 4
//10101 = 5
//10110 = 6
//11001 = 7
//11010 = 8
//11100 = 9
//INDEX BREAKDOWN
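
Side note: the table above can be regenerated with a few lines of Python (my illustration only; the names are mine, not part of Krypton):

from itertools import combinations

# List every 5-bit pattern containing exactly three 1s,
# in lexicographic order, alongside its index.
def pattern_table(slots, ones):
    patterns = sorted(
        "".join("1" if i in chosen else "0" for i in range(slots))
        for chosen in combinations(range(slots), ones)
    )
    for index, pattern in enumerate(patterns):
        print(pattern, "=", index)

pattern_table(5, 3)  # prints 00111 = 0 through 11100 = 9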



//ALL THANKS to user 'Paul Miner' of ArsTechnica Forum
//for original algorithm used below (July 2005)
//arstechnica.com/groupee/forums?a=tpc&f=6330927813&m=787007444731
//major speed bottleneck in its current form, this function
//needs to be refactored/vectorized for negligible per-loop cost

function indexForPattern(binaryString, ones)
{

int encounteredOnes;
int onePos;
int index;
int startPosition;
//11001 becomes 10011
string reversedString = reverse(binaryString);

index = 0;
encounteredOnes = 0;
startPosition = 0;
//get position of the next '1' in the reversed string...
//could a one-time scan into a position array be quicker?
//assumes position() returns -1 when no match remains
while ((onePos = position('1', reversedString, startPosition)) != -1)
{

//another 1 encountered... since the while condition matched
encounteredOnes++;
//add value of binomial coefficient to existing index value
index += binCoe(onePos, encounteredOnes);
//new start
startPosition = onePos + 1;

//LOOP BREAKDOWN, positions
//10011 =
//1 = pos: 0
//0 = nope
//0 = nope
//1 = pos: 3
//1 = pos: 4
//LOOP BREAKDOWN, binCoe(onePos, encounteredOnes);
//binCoe(0,1); = 0
//binCoe(3,2); = 3
//binCoe(4,3); = 4
//LOOP BREAKDOWN

}

return index;
}


///// END INDEXING CODE
///// END INDEXING CODE
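
For the curious, here is a minimal runnable translation of both pieces into Python. This is a sketch under my own naming, not a drop-in implementation... math.comb stands in for binCoe, the flip bookkeeping is simplified into plain lists, and the decompression side is omitted entirely:

from math import comb

# Combinadic index of binary_string among all strings of the same
# length and ones count, listed lexicographically. Mirrors
# indexForPattern(): walk the reversed string and add
# C(position, ones_seen) for every 1 encountered.
def index_for_pattern(binary_string):
    index = 0
    ones_seen = 0
    for position, bit in enumerate(reversed(binary_string)):
        if bit == "1":
            ones_seen += 1
            index += comb(position, ones_seen)
    return index

# The kompress() loop: flip if 1s dominate, record the ones count
# and the flip flag, then feed the index (in binary) back in.
def kompress(bits):
    kount_list, flip_list = [], []
    while len(bits) > 1:
        ones = bits.count("1")
        if ones > len(bits) // 2:
            flip_list.append(1)  # DID flip
            bits = "".join("1" if b == "0" else "0" for b in bits)
            ones = len(bits) - ones
        else:
            flip_list.append(0)  # did NOT flip
        kount_list.append(ones)
        bits = format(index_for_pattern(bits), "b")  # index becomes input
    return kount_list, flip_list, bits

assert index_for_pattern("11001") == 7   # matches the breakdown above
print(kompress("11001"))

Running it on '11001' flips to '00110' (count 2), indexes that to 2 ('10' in binary), then indexes '10' down to '1'.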







Friday, June 19, 2009

Project: Krypton

Mathematical compression based on Pascal's Triangle, binomial coefficient, combinadic(s), etc.





[Pascal's triangle graphic, stolen from http://cas.umkc.edu/math/ExtraCaption.asp]


Basically, I've taken this as far as I can go alone... without help it will never be seen, let alone work.

The premise is very simple... it is a mathematical, iterative algorithm that uses the binomial coefficient rather than being dictionary/symbol based.

It has several performance problems that do not allow me to test it and that otherwise affect its general feasibility. Among these problems are the massive amounts of memory and computing power currently required by the algorithm. I believe intelligent optimizations in the underlying math can improve the computing speed at least.

Algorithm: combinations

The algorithm started as a non-iterative, computational formula based on the idea of finite systems for a given data length.

In the simplest of terms, given a fixed number of slots to occupy with the binary digits 1 or 0, there is a set number of possible combinations for those digits. The number of possibilities is easily computed using the following formula: 2^N (two raised to the exponent N), where N is the number of available slots.

For example, a 2-slot system has 2^2 (= 4) possible data sets (00, 01, 10, 11). A 3-slot system would have 2^3 (= 8) possible combinations (000, 001, 010, 100, 011, 101, 110, 111).
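
If you want to see this for yourself, a couple of lines of Python (my illustration, not part of the algorithm) will enumerate a slot system:

from itertools import product

# Enumerate every possible pattern for an N-slot system: 2^N of them.
N = 3
patterns = ["".join(bits) for bits in product("01", repeat=N)]
print(len(patterns), patterns)   # 8 patterns: 000, 001, ..., 111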

These combinations can further be subdivided into groups based on how many 1's occupy the available slots. In the 3 slot example there are 4 groups... containing combinations of either 0, 1, 2, or 3 ones. Here are the groups broken down:

0 ones = 000

1 one = 001, 010, or 100

2 ones = 011, 101, or 110

3 ones = 111

As you can see there are 1, 3, 3, 1 combinations for 0, 1, 2, 3 ones.

This same distribution is visible in the fourth row of the graphic below (commonly known as Pascal's Triangle, though the concept predates Pascal by centuries). Any row (and slot index) can be computed using the binomial coefficient.
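
Here is the same grouping checked against the binomial coefficient, again as my own Python sketch:

from math import comb
from itertools import product

# Group every 3-bit pattern by how many 1s it contains, and check
# each group size against the binomial coefficient C(3, k).
groups = {}
for bits in product("01", repeat=3):
    pattern = "".join(bits)
    groups.setdefault(pattern.count("1"), []).append(pattern)

for k in sorted(groups):
    print(k, "ones:", groups[k], "-> C(3,%d) = %d" % (k, comb(3, k)))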


Algorithm: compression, failed

So how do we get compression out of combinations?

If you look at the 3 slot example you see that the range of ones in a pattern must be between 0 and 3. The numbers 0-3 can be represented using two binary digits... 00, 01, 10, 11 represent 0, 1, 2, 3 counts for the available slots.

The goal of any compression algorithm is to represent data using smaller amounts of data. For a 3-slot system you can represent the number of ones with 2 digits... one less than the 3 comprising the data.

Unfortunately those 2 digits can only tell the de-compression algorithm which subgroup the original data is in. For example... a count of 2 (10 in binary) lands the decompressor in the group containing these 3 possibilities: 011, 101, or 110 (each containing two 1 digits and one 0 digit).

There is no way for the decompressor to know which is the desired pattern without more information, and we only have 1 bit left to help it out. The 2 problems with using that 1 bit: 1.) it can only specify 2 choices rather than the 3 we need to choose between... 2.) using this 1 bit brings us back to the original 3 bits, therefore not saving us any data.

How do we solve these problems?

In any system it is a good idea to see how it scales. For example... if 2 digits can count a 3-slot data set, how many can 3 digits count? The answer of course is 2^3, which equals 8 and can represent a count of between 0 and 7 ones. Therefore a data set of seven digits, such as '1010101', has a count of 4 ones (and 3 zeros), which will be represented by the binary string '100'. In this case we now have 4 of the original 7 digits to help us guess which pattern within a group is the correct one.

Unfortunately, since our count is 4 there are 35 possible patterns (see the 8th row in the triangle graphic)... if we had a count of 3 it would also be 35 possible patterns. We have 4 bits left to help us guess (3 if we want compression). The 4 bits only help us guess within a set of 16 possibilities (2^4 = 16). If we had 5 bits we could tell the decompressor that it is one of a possible 32 patterns (2^5). Still, that doesn't cover the full 35 available in a 7-slot system with 3 or 4 of the slots filled with ones. Also, 3 counting bits plus the 5-6 bits needed to help choose the correct pattern is a total of 8-9 bits... which is actually more than the original 7 slots.
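
That accounting, spelled out in Python (my sketch; comb is the standard library's binomial coefficient):

from math import comb, ceil, log2

# The 7-slot accounting from the paragraph above.
slots = 7
ones = 4
patterns = comb(slots, ones)        # 35 patterns with four 1s
count_bits = 3                      # enough to count 0..7 ones
index_bits = ceil(log2(patterns))   # 6 bits to pick one of 35
print(count_bits + index_bits, ">", slots)   # 9 > 7: no savings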

This is a major problem in pursuing a counting index-based algorithm.

The trend of having more guessing bits continues as you move to larger slotted data sets. Below are some examples of how many slots/bits are supported by a particular number of counting bits.


counting bits    data set slots             file size
16               65,535 slots               8 KB
24               16,777,215 slots           2 MB
32               4,294,967,295 slots        512 MB
35               34,359,738,367 slots       4 GB
43               8,796,093,022,207 slots    1 TB
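
The chart is easy to reproduce (my sketch; the file size column is the slot count in bits, converted to bytes):

# N counting bits can count the ones in a data set of up to
# 2^N - 1 slots; print each row of the chart above.
for counting_bits in (16, 24, 32, 35, 43):
    slots = 2**counting_bits - 1
    print(counting_bits, f"{slots:,} slots", f"{(slots + 1) // 8:,} bytes")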


As you can see in the chart above, even at 24 counting bits we can represent a large number of slots. The problem is when the number of filled slots is close to half the number of slots (for example 8,388,607 ones out of 16,777,215 slots... as it would be in the worst-case scenario).

In this case the count would cost you 24 bits, and the binary number to represent the index (of the correct pattern) within the group would probably be at least a bit or 2 over 16,777,192... let's say 16,777,200... plus the 24 counting bits... so 16,777,224... slightly over our original data length.
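
For the skeptical, those numbers check out. A quick Python sanity check (mine, using the log-gamma function so the gigantic coefficient itself is never built):

from math import lgamma, log

# Bits needed to index one pattern out of C(16,777,215; 8,388,607).
def log2_comb(n, k):
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

n, k = 16_777_215, 8_388_607
index_bits = log2_comb(n, k)      # about 16,777,203 bits
print(index_bits + 24, ">", n)    # count + index: slightly over n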

This is the same problem we encountered using 3 bits to count a 7 bit/slot data system.

Frustrating isn't it?

Algorithm: iteration

So what's the current problem... we just spent 24 bits to represent how many 1's are in these 16,777,215 slots... and we need a ton more to represent which is the correct pattern out of the many possible ones that exist for that particular slot count. We need almost, but not quite, as many digits as the original data... so really no compression is happening if we store our count and our 'answer index' bits together for the decompressor.

But what if we didn't store our 'answer index' bits?

What if we instead counted how many bits are 1's in our 'answer' bits?

But how is that going to help?... If the number of slots is almost as big as the original... then won't the new count and the new answer also be too big? Well the key here is that it's almost as big... but it's not quite as big. That, and we aren't going to store the answer index this time either... we are just going to store the 24 count bits from the original data as well as the 24 count bits from the 'answer bits' (now our new data).

Now if the original data is 16,777,215 bits we can do about 699,050 iterations using a 24-bit counter. Of course if our final answer is more than 1 bit then we don't have compression again.
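
That iteration budget is just the original length divided by the counter size:

# Rough iteration budget: how many 24-bit counters fit inside the
# original data length before the counters alone break even.
original_bits = 16_777_215
counter_bits = 24
print(f"{original_bits // counter_bits:,}")   # 699,050 iterations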

The question becomes how many iterations we have to do before our answer is small enough that, if we tack our answer onto our list of counters, it is still smaller than the original. Is it even possible? Yes, it is possible... most mathematical compression algorithms work at least some of the time. We are trying to tackle the worst-case scenario here as well (cases further from the center of the normal curve decay faster).

And in practice, after a certain number of iterations it will take only 23 bits to count the new data, then that will shrink to 22 bits eventually after a similar number of iterations.

I think this is enough for the first post. All ideas and concepts are mine alone and go back over many years. Don't steal them unless you wanna pay me one day. I'm going to slowly add the rest of the concepts, to hopefully get some kind of open yet profitable implementation going.

I have a few illustrative examples via code and graphics that I will post at some point.



Sunday, March 29, 2009

Wisdom Teeth (and general ugliness)

I had my wisdom teeth out the day Princess Diana died. All four teeth, while in college, with my roommate home on break. I had friends Paul and Gaby bring me there and bring me home. The actual operation wasn't so bad... an IV anesthetic, preceded by giggly gas... I did at one point hear my teeth being yanked or crushed, though I was only very barely conscious. I went home with my Mepergan pain killers and my Ensure drinks (which I was truthfully told would be all I would want to eat for a while). I upchucked a stomach-full of blood (not my first bloody surgery), took my pills (which made me so sick I didn't use them further) and passed out from 4pm until 1am. I was greeted on the television by George Clooney telling me that the princess of whales was dead. Now... in the state that I is was in (Georgia 1997, LOL)... I took that at face value and surely thought I would soon see the expected National Geographic tribute specials and be told about how I needed to cut up my soda pack plastic, etc. But no, it wasn't any species of whale and I quickly lost interest... which brings me to this:

Should people have their wisdom teeth removed... and more generally, should cosmetic and intrinsic flaws be corrected by man or nature?

I'll tell you, the pain of recovery (which took a year fully) was about as annoying as continuously having the teeth break through my gums and then having my gums try to heal up over them. Though I looked forward to kissing a girl and not being scared of having a wisdomy teethy mouth. In the long term it made MY life better... but for humanity's sake it was a selfish act and will keep future generations in pain. You see... if someone hadn't been born into 'modern' society thousands of years ago... they would have had to deal with it naturally... perhaps getting impacted teeth, infection, and dying. If he had been lucky in the genetic lottery and gotten smaller or no wisdom teeth at all... well then that good trait would have evolved.

The problem is society outraced nature and we ended up in an eddy of being neither here nor there as far as it being a good trait. In ancient times I have no doubt that we had bigger jaws... but also by the age of 16 had probably lost a molar or 2. I mean, we get 2 sets of teeth already... I think of the wisdom teeth as a last ditch effort to give a caveman some usable choppers. But what happened was... I don't know. Perhaps bigger cavemen met smaller cavemen and there was a disconnect in jaw sizes. I'm leaning more towards people living healthier lives as society and written history came into existence. People's teeth lasted longer and they didn't need a spare set of molars anymore. But this happened far too quickly for genetics and natural selection to deal with. They would have eventually course-corrected via the death of people with infected teeth, and procreation for people with bigger jaws or smaller teeth or eventually no wisdom teeth at all. But something else 'bad' happened that stopped evolution in its tracks... people and society could now medically (or otherwise) deal with this scourge of painful teeth. Just yank them out and get your lollipop. Done and done, right? Nope... here comes the next part:

Fixing that which is broken in human physicality and not passing that fix on is bad long term

By fixing the teeth problem you allow a man and a woman who both had super, giant, ridiculously badly engineered wisdom teeth to mate without ever knowing that the other one was carrying this flaw. 9 months later a bouncing baby bad-ass is born, who will doubly suffer later in life through his parents fixing themselves in a non-genetic manner. If the teeth had been removed through thousands of years of evolution (or modern gene manipulation)... then every child that baby had would no longer have to suffer. The amount of pain in the world would be drastically reduced on a planetary scale... whether that's good, I don't know (in the sense that sometimes you learn through suffering or adversity). But this doesn't just apply to teeth.

The cosmetic and the intrinsic

Bigger boobs, perfect noses, straight teeth... these are cosmetic flaws that people perfect through surgery and devices. Cancer, liver failure, chronic diseases... these are intrinsic flaws that medicine tries to deal with. Both good in the short term, and both lead to suffering for society at large. Flaws can be amplified if both parents have the flaw and both have hidden or otherwise dealt with it. If only these fixes were on a genetic level... think of the suffering that would disappear. From the minor suffering of being an awkward teen in braces to the horrible suffering of dying before your time. These could be wiped out, forever. Of course... you don't want everyone with the same nose, same smile, same hair and same skin color. It would be creepy and just plain boring. But certain baseline 'ideals' could be strived for... then minor cosmetic fixes could be applied. It would still be a 'lie' to a degree... but a much smaller lie. As far as diseases go... we don't need a baseline... just get rid of them. Now yes, some diseases may carry beneficial genetic traits along for the ride... so careful study should be made. Baby/bath-water... you get the point?

Yeah, what is the point?

I dunno... marry the ugliest, average, most unhealthy person you can stand? Hah... OK, not that... I guess I just hate the lie of perfection that society has given us and the huge amounts of past and future suffering that could be avoided if everyone truly were more 'perfect'. OK, 2 rants 1 day... done for now.

Perfect Vision

I often hear people use the term 'perfect vision' and on occasion want to correct them. What they really are describing is more so 'ideal' or, as I (incorrectly) call it, 'average' vision. If there were no nearsighted people, there would have been no watchmakers... likewise farsightedness is likely useful in archery and sports. But I agree, 20/20 vision is ideal for most people as it allows society to have a standard baseline which can be used to determine how big to make traffic signs, etc. I guess it just irks me when these 'average' people try to pawn themselves off as 'perfect'. They are ideal and in some instances superior, but they are far from being perfect. If that were the case, 'perfect' intelligence would be the center of the bell curve and anyone falling on either side should be considered faulty. Anyways... I hope all of you perfect people are enjoying your perfect vision and your perfect intelligence on this drizzly Sunday morning.