Dynamic Datastructures in ACL2: A Challenge

                      David Greve and Matt Wilding
                               Nov. 2002

Background
----------

Arrays, structures, and stacks are common data structures that can be
found in some form in nearly every useful software system.  The
utility that such structures provide in implementing complex systems,
however, hinges on the often overlooked assumption of independence.
What the programmer often takes for granted - that modifying structure
X will not change the value of structure Y - becomes a significant
burden when the formal correctness of the final implementation is
considered in the context of a linear address space.

Several verification projects with which we are familiar have faced
this challenge, including Moore's Piton compiler, Bevier's KIT proof,
and Yu's 68020 code proofs.  In each project the disjointness of data
structures was described and verified as part of the larger
verification effort.  However, the proof architecture in each of these
projects involves blocks of data that resided within distinct ranges
rather than arbitrary, dynamic structures.  This approach is not
feasible for the kinds of complex, pointer-rich data structures we
encounter in proofs about microprocessor microarchitectures.

A first challenge
-----------------

We start with two functions, a-mark-objects and b-mark-objects, that
operate on "A" and "B" type data structures, respectively.  The
functions a-collect and b-collect crawl over their respective data
structures collecting lists of addresses used by the structures. The
function compose-bab is a composition of two a-mark-objects and a
b-mark-object.  (Note the definitions in the code below.)

Prove the following theorems:

;; The read of an arbitrary address location outside of the
;; data structure modified by a-mark-objects is not affected
;; by a-mark-objects.

(defthm rd-read-over-a-mark-objects
  (implies
   (let ((list (a-collect ptr n ram)))
     (and
      (not (member addr list))
      (unique list)))
   (equal
    (g addr (a-mark-objects ptr n ram))
    (g addr ram))))

;; The read of an arbitrary address location outside of the
;; union of the three data structures modified by compose-bab
;; is not effected by compose-bab.

(defthm read-over-bab
  (implies
   (let ((list (append (b-collect ptr1 n1 ram)
                   (a-collect ptr2 n2 ram)
                   (b-collect ptr3 n3 ram)
                   )))
     (and
      (not (member addr list))
      (unique list)))
   (equal
    (g addr (compose-bab ptr1 n1 ptr2 n2 ptr3 n3 ram))
    (g addr ram))))


;; a-mark-objects and b-mark-objects commute when their respective
;; data structures are disjoint.

(defthm a-mark-over-b-mark
  (implies
   (let ((list (append (a-collect ptr1 n1 ram)
                   (b-collect ptr2 n2 ram))))
     (unique list))
   (equal
    (a-mark-objects ptr1 n1 (b-mark-objects ptr2 n2 ram))
    (b-mark-objects ptr2 n2 (a-mark-objects ptr1 n1 ram))))
  :hints (("goal" :in-theory (enable a-mark-objects-open))))


More challenges
---------------

1) Add fields to the "A" and "B" data structures and measure the
performance of your proof.  You may have constructed a proof that
scales quadratically with the number of entries in the data structure.
Is there a more efficient solution?

2) The proof of read-over-bab suggests that many theorems will need to
be proved about a system with many unique functions and data
structures. Generalize the notion of "collect" so that only a single
version of it need be introduced.  (And, therefore, only a single
"read-over-write theorem need be proved for each data structure
operation.)  One approach for accomplishing this would be to provide a
structure "map" as an argument to "collect".

3) In a system with many functions and data structures, it would be
tedious to prove the various "a-mark-over-b-mark"-type theorems for
all possible combinations of functions and data structures.  Is there
some technique that would allow us to state and prove, in general,
that for all functions (f1,f2) that operate only on well defined data
structures (d1,d2) we get:

(defthm xxx
  (implies
   (unique (append d1 d2))
   (equal (f1 (f2 ram))
        (f2 (f1 ram)))))

Can the application of this theorem be automated?

4) The proposed representation of data structures is somewhat limited.
Circular data structures, data structures with several pointers to one
node, are precluded by the assumption of uniqueness.  How might such
structures be incorporated into the reasoning framework developed so
far?

5) What other issues might be encountered in the application of these
techniques to real-world problems?


Code
----

; We assume that two books have been loaded, part of the ACL2
; distribution:

; (include-book "XXX/books/misc/records")
; (include-book "XXX/books/arithmetic/top-with-meta")

(defun seq-int (start len)
  (if (zp len)
      nil
    (cons start
        (seq-int (1+ start) (1- len)))))

(defun unique (list)
  (if (consp list)
      (and (not (member (car list) (cdr list)))
         (unique (cdr list)))
    t))

;; We introduce two data structures.

;; "A" nodes have 4 words.  Words 0 and 3 are scalars and words 1 and
;; 2 are A node pointers.

;; Collect addresses in the first n nodes of an a structure
(defun a-collect (ptr n ram)
  (declare (xargs :measure (nfix n)))

  ;    +---+
  ; 0: |   |
  ;    +---+
  ; 1: | o-+--> a-node
  ;    +---+
  ; 2: | o-+--> a-node
  ;    +---+
  ; 3: |   |
  ;    +---+

  (if (zp n)
      nil
    (if (zp ptr)
      nil
      (append
       (seq-int ptr 4)
       (a-collect (g (+ 1 ptr) ram) (1- n) ram)
       (a-collect (g (+ 2 ptr) ram) (1- n) ram)
       ))))

;; "B" nodes have 3 words.  Word 2 is an integer, words 0 and 1 are
;; B-node pointers.  "0" is a null pointer

;; Collect addresses from the first n nodes of a b structure
(defun b-collect (ptr n ram)
  (declare (xargs :measure (nfix n)))

  ;    +---+
  ; 0: | o-+--> b-node
  ;    +---+
  ; 1: | o-+--> b-node
  ;    +---+
  ; 2: |   |
  ;    +---+

  (if (zp n)
      nil
    (if (zp ptr)
      nil
      (append
       (seq-int ptr 3)
       (b-collect (g ptr ram) (1- n) ram)
       (b-collect (g (+ 1 ptr) ram) (1- n) ram)
       ))))

;; Crawl through at most n nodes in an "a" structure along the second
;; pointer.  Modify word 0 of each node by adding word 2 to it.
(defun a-mark-objects (addr n ram)
  (declare (xargs :measure (nfix n)))
  (if (zp n) ram
    (if (zp addr) ram
      (let ((ram (s addr (+ (g addr ram) (g (+ 2 addr) ram)) ram)))
      (a-mark-objects (g (+ addr 2) ram) (1- n) ram)))))

;; Crawl through at most n nodes in an "b" structure along the pointer
;; in the first field

(defun b-mark-objects (addr n ram)
  (declare (xargs :measure (nfix n)))
  (if (zp n) ram
    (if (zp addr) ram
      (let ((ram (s (+ 2 addr) 0 ram)))
      (b-mark-objects (g addr ram) (1- n) ram)))))

(defun compose-bab (ptr1 n1 ptr2 n2 ptr3 n3 ram)
  (let ((ram (b-mark-objects ptr1 n1 ram)))
    (let ((ram (a-mark-objects ptr2 n2 ram)))
      (let ((ram (b-mark-objects ptr3 n3 ram)))
      ram))))


 