Fenwick tree

A Fenwick tree or binary indexed tree is a data structure that can efficiently update elements and calculate prefix sums in a table of numbers. This structure was proposed by Peter Fenwick in 1994 to improve the efficiency of arithmetic coding compression algorithms.[1]

When compared with a flat array of numbers, the Fenwick tree achieves much higher performance for two operations: element update and prefix sum calculation. In a flat array of numbers, calculating prefix sum and updating the elements both require time. Fenwick trees allow both operations to be performed in time. This is achieved by representing the numbers as a tree, where the value of each node is the sum of the numbers in that subtree. The tree structure allows operations to be performed using only node accesses.

Description

Given a table of elements, it is sometimes desirable to calculate the running total of values up to each index according to some associative binary operation (addition on integers, for example). Fenwick trees provide a method to query the running total at any index, in addition to allowing changes to the underlying value table and having all further queries reflect those changes. Although Fenwick trees are trees in concept, in practice they are implemented using a flat array analogous to implementations of a binary heap. Given an index in the array representing a vertex, the index of a vertex's parent or child is calculated through bitwise operations on the binary representation of its index. Each index contains the pre-calculated sum of a section of the table, and by combining that sum with section sums encountered during an upward traversal to the root, the prefix sum is calculated. When a table value is modified, all section sums which contain the modified value are in turn modified during a similar traversal of the tree. The section sums are defined in such a way that queries and modifications to the table are executed in asymptotically equivalent time - in the worst case.

The initial process of building the Fenwick tree over a table of values runs in time. Other efficient operations include locating the index of a value if all values are positive, or all indices with a given value if all values are non-negative. Also supported is the scaling of all values by a constant factor in time. The sum of an arbitrary subarray can also be calculated by subtracting the prefix sum before its start point from the prefix sum at its end point.

Applications

Fenwick trees are used to implement the arithmetic coding algorithm. Development of operations it supports were primarily motivated by use in that case.

Using a Fenwick tree is very easy to generate the cumulative sum table. From this cumulative sum table it is possible to calculate the summation of the frequencies in a certain range in order of .

Fenwick trees can also be used to update and query subarrays of multidimensional arrays. These operations can be performed with complexity , where d is number of dimensions and n is the number of elements along each dimension.[2]

Implementation

A simple C implementation follows.

/*******************************************************************/
/* This program demonstrates the speedup for a Fenwick tree vs a   */
/* flat array for performing multiple prefix sums.  It generates a */
/* sequence of random numbers, then performs 1000 summations,      */
/* each of which starts and ends at random indices.                */
/*******************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Number of elements in the array
#define N (1024*1024*16)

// Number of queries to perform
#define NQUERIES 1000

// Macro to zero all except the least significant non-zero bit
#define LSB(i) ((i) & -(i)) 

// Fen_sum: returns the sum of elements from index 0 to index i
double Fen_sum(double *a, int i) 
{
  double sum = 0.0;
  i++;
  while (i > 0)
    {
      sum += a[i-1];
      i -= LSB(i);
    }
  return sum;
}

// Fen_add: adds k to element i
void Fen_add(double *a, int size, double k, int i) 
{
  i++;
  size++;
  while (i < size) 
    {
      a[i-1] += k;
      i += LSB(i);
    }
}

// Fen_get: Returns the value of the element at index i
double Fen_get(double *a, int i)
{
  return Fen_sum(a,i)-Fen_sum(a,i-1);
}

// Fen_set: sets the value of the element at index i
void Fen_set(double *a, int size, double value, int i)
{
  Fen_add(a,size,value-Fen_get(a,i),i);
}


// main: allocates an array of doubles and fills it with a sequence of
// random numbers, then performs a series of summations (queries).  It
// then repeats the process using a Fenwick tree rather than a flat
// array.  The sequence of random numbers and the bounds of each query
// are identical for the flat array and the Fenwick tree.  The time
// requierd to populate the data structure and the total time required
// for all queries is calculated and reported for the flat array and
// for the Fenwick tree.
int main()
{
  double *a;
  double asum;
  int i,j,tmp;
  long long seed;
  double time1,time2,time3;
  int queries[NQUERIES*2];

  // get a random number seed based on time
  seed = time(NULL);
  // generate the bounds for all of the queries
  srandom(seed);
  for(i=0;i<NQUERIES*2;i+=2)
    {
      queries[i]=random()%N;   // lower bound
      queries[i+1]=random()%N; // upper bound
      if(queries[i]>queries[i+1]) // may need to swap them
        {
          tmp=queries[i];
          queries[i]=queries[i+1];
          queries[i+1]=tmp;
        }
    }
  // allocate the array of doubles
  a = malloc((N)*sizeof(double));
  // The following loop forces all pages into memory (otherwise the
  // timing of the algorithm could include a lot of page faults)
  for(i=0;i<N;i++)  
    a[i]=0.0;

  /*****************************************************************/
  /*   FLAT ARRAY METHOD                                           */
  /*****************************************************************/
  
  // get the current time 
  time1 = clock();
  // add random numbers to a flat arrary
  srand48(seed);
  for(i=0;i<N;i++)
    a[i] = drand48();
  // track time required to fill the data structure  
  time2 = clock();
  // perform the queries
  for(j=0;j<NQUERIES*2;j+=2)
    {
      asum=0.0;
      for(i=queries[j];i<queries[j+1];i++)
        asum += a[i];
      printf("%.3lf ",asum);
    }
  // track time required for the queries  
  time3 = clock();
  // print out the times
  printf("\nFlat Array:\n Build: %lf\n Query: %lf\n",
         (time2-time1)/CLOCKS_PER_SEC,
         (time3-time2)/CLOCKS_PER_SEC);

  /*****************************************************************/
  /*   FENWICK TREE METHOD                                         */
  /*****************************************************************/

  // get the current time 
  time1 = clock();  
  // Add random numbers to a Fenwick tree.  Fen_set is relatively
  // expensive, so initialize all elements to 0.0, then use Fen_add
  // instead of Fen_set... Much faster!
  srand48(seed);
  for(i=0;i<N;i++)
    a[i]=0.0;
  for(i=0;i<N;i++)
    Fen_add(a,N,drand48(),i);
  // track time required to fill the data structure        
  time2 = clock();  
  // perform the queries
  for(j=0;j<NQUERIES*2;j+=2)
    {
      asum = Fen_sum(a,queries[j+1])-Fen_sum(a,queries[j]);
      printf("%.3lf ",asum);
    }
  // track time required for the queries  
  time3 = clock();
  // print out the times
  printf("\nFenwick:\n Build: %lf\n Query: %lf\n",
         (time2-time1)/CLOCKS_PER_SEC,
         (time3-time2)/CLOCKS_PER_SEC);

  free(a);
  return 0;
}

See also

References

  1. Peter M. Fenwick (1994). "A new data structure for cumulative frequency tables". Software: Practice and Experience. 24 (3): 327–336. doi:10.1002/spe.4380240306.
  2. Pushkar Mishra (2013). "A New Algorithm for Updating and Querying Sub-arrays of Multidimensional Arrays". arXiv:1311.6093Freely accessible.

External links

This article is issued from Wikipedia - version of the 9/22/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.