Pushing the Information-Theoretic Limits of Random Access Lists

Edward Peters, Yong Qi Foo, Michael D. Adams

Published at ICFP 2025.

Pre-print | DOI | Talk

Abstract

Accessing an arbitrary element of a traditional singly linked list or cons list requires traversing up to a linear number of pointers. A random-access list is a data structure that behaves like a cons list except that accessing an arbitrary element requires traversing only a logarithmic number of pointers. Specifically, in a list of length n, an arbitrary element can be accessed by traversing at most 3 * ceil(log(n)) - 5 pointers.

In this paper, we present a simple variation on Myers lists that improves this bound and requires traversing at most 2 * ceil(log(n + 1)) - 1 pointers. We then present a more complicated variation that improves this bound to (1 + 1 / s) * ceil(log(n)) + s + 9 for any s >= 2. This shows that it is possible to get asymptotically close to the information-theoretically optimal bound of ceil(log(n + 1)) - 1.
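For context, the classic Myers list that these bounds refine can be sketched in a few lines of Haskell (a minimal sketch of the pre-existing structure, not the paper's improved variants): each node keeps, besides the usual next pointer, a jump pointer with a recorded length, and lookup greedily takes any jump that does not overshoot.

```haskell
-- A classic Myers list: a cons list in which every node also carries a
-- 'jump' pointer that skips ahead a recorded number of elements,
-- giving O(log n) pointer traversals per lookup.
data Myers a = Nil | Node a Int (Myers a) (Myers a)
  -- element, jump distance, next pointer, jump pointer

cons :: a -> Myers a -> Myers a
cons x xs = case xs of
  Node _ d1 _ (Node _ d2 _ j')
    | d1 == d2 -> Node x (d1 + d2 + 1) xs j'  -- merge two equal jumps into one longer jump
  _            -> Node x 1 xs xs              -- otherwise the jump is a single step

lookupM :: Int -> Myers a -> Maybe a
lookupM _ Nil            = Nothing
lookupM 0 (Node x _ _ _) = Just x
lookupM i (Node _ d nxt jmp)
  | d <= i    = lookupM (i - d) jmp  -- take the long jump when it does not overshoot
  | otherwise = lookupM (i - 1) nxt  -- otherwise step one element
```

The variants in the paper keep this cons-list interface and change how jump pointers are laid out to lower the worst-case number of traversed pointers.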

Tags: Data Structures, Functional Programming, Published

Class-Dictionary Specialization with Rank-2 Polymorphic Functions

Yong Qi Foo, Michael D. Adams

Abstract

Rank-2 polymorphism enables powerful abstractions and code reuse by allowing functions to accept arguments that are themselves polymorphic. Optimizing compilers like the Glasgow Haskell Compiler (GHC) rely on techniques like specialization and inlining for optimization. However, these techniques can falter in the presence of rank-2 polymorphism. Specializing the polymorphic arguments of recursive rank-2 polymorphic function applications requires inlining the applied function to expose type applications to the arguments, but the compiler's heuristic-driven inliner is reluctant to inline recursive functions, as doing so risks non-termination during compilation. This stalemate causes recursive rank-2 polymorphic functions to remain unoptimized, leaving them with a runtime penalty.

In this paper, we present a new optimization technique that breaks this stalemate and enables type specialization with recursive rank-2 polymorphic functions. Our approach was discovered and refined through the challenge of optimizing Scrap Your Boilerplate (SYB), one of Haskell's most widely used generic-programming libraries, whose combinators are archetypal examples of this problematic pattern. We introduce a partial evaluator that strategically applies the standard transformations of inlining, β-reduction and memoization to applications of rank-2 polymorphic functions that are partially static, i.e., whose arguments have some information known statically. This process exposes type applications of the polymorphic arguments, which can then be specialized by the standard compiler optimization pipeline. Additionally, we introduce a novel type-specialization strategy—type-constant folding—to evaluate run-time type-equality tests statically, further leveraging the static type information gained from partial evaluation. We implement our technique as a GHC plugin and demonstrate its effectiveness by completely eliminating the overhead of SYB traversals, achieving speedups of 43× on average (up to 155×) to match the performance of their hand-written counterparts. The generality of our technique is shown with benchmarks on the Scrap Your Boilerplate With Class library, the main variant of SYB, with speedups of 6.1× on average (up to 9.5×), and on the lens optics library, with speedups of 1.9× on average (up to 3.3×) on one of its modules.
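To illustrate the problematic pattern, here is a hand-rolled miniature analogue of SYB (the class and names here are hypothetical; the real library is Data.Generics): a recursive traversal whose rank-2 polymorphic argument is applied at several types.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Typeable (Typeable, cast)

-- A miniature SYB-style traversal. 'everywhere' is recursive and its
-- argument 'f' is rank-2 polymorphic: 'f' is applied at Int, Rose and
-- [Rose], so specializing those type applications would require inlining
-- the recursive 'everywhere', which GHC's inliner declines to do.
class Typeable a => Trav a where
  gmapT :: (forall b. Trav b => b -> b) -> a -> a  -- transform immediate children

data Rose = Rose Int [Rose] deriving (Eq, Show)

instance Trav Int where
  gmapT _ n = n                    -- an Int has no children
instance Trav a => Trav [a] where
  gmapT _ []       = []
  gmapT f (x : xs) = f x : f xs    -- children of a cons cell: its head and its tail
instance Trav Rose where
  gmapT f (Rose n cs) = Rose (f n) (f cs)

-- apply 'f' everywhere in a value, bottom-up
everywhere :: (forall a. Trav a => a -> a) -> (forall a. Trav a => a -> a)
everywhere f = f . gmapT (everywhere f)

-- lift a monomorphic function to a polymorphic one via a run-time type test
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = case cast f of Just g -> g; Nothing -> id
```

Because everywhere is recursive, GHC will not inline it at a call like everywhere (mkT f); the applications of f at Int, Rose and [Rose] are therefore never exposed for specialization, and every one of them pays for a run-time type test inside mkT.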

Tags: Optimization, Generic Programming, Haskell, Partial Evaluation, Submitted

A Fixed-Point-Oriented Programming Language with User-Specified Optimization Strategies

A wide range of problems are more easily solved using a fixed-point algorithm than with the more common programming strategies of loops and recursion. Examples include parsing, static analysis, type-checking, graph algorithms, and automata manipulation. Existing language paradigms, such as the imperative and functional paradigms, support loops and recursion well but support fixed-point algorithms rather poorly.

In this project, we will develop a programming language providing such support. In addition, by specifying problems at the abstraction level of fixed points, our language can include mechanisms for the programmer to influence and optimize the solution strategy used by the implementation without having to modify the problem specification. This will allow users of the language to rapidly prototype and test a problem specification and to later specify optimizations while making only minimal changes to existing code, a task that is particularly difficult for fixed-point problems in existing languages.
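As a stand-in for the language under development, here is how one such problem, graph reachability, reads as a naive least-fixed-point iteration in Haskell (a sketch of the general pattern, not the project's language):

```haskell
import qualified Data.Set as Set

-- Naive Kleene iteration: repeatedly apply a monotone step function
-- until the set of facts stops changing.
fixpoint :: Ord a => (Set.Set a -> Set.Set a) -> Set.Set a -> Set.Set a
fixpoint step s
  | s' == s   = s
  | otherwise = fixpoint step s'
  where s' = step s

-- Graph reachability phrased as a least fixed point: the reachable set
-- is the smallest set containing 'start' and closed under the edges.
reachable :: Ord a => [(a, a)] -> a -> Set.Set a
reachable edges start = fixpoint step (Set.singleton start)
  where step s = Set.union s (Set.fromList [w | (v, w) <- edges, Set.member v s])
```

A fixed-point-oriented language would let the programmer state only the step relation and separately choose or tune the iteration strategy (here hard-wired as naive re-evaluation of every edge on every round).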

Tags: Programming Paradigms, Declarative Programming, Fixed-Point-Oriented Programming, Optimization, In Progress (nearing completion)

Optimization Tactics (Functional Pearl)

Program optimizations are frequently performed by the compiler. These optimizations are general-purpose, so as to extract the best performance from as many programs as possible. However, general-purpose optimizations may not be well-suited to some specific programs.

Thus, we present optimization tactics, a new construct that allows users to define their own optimization strategies and to apply optimizations at specific program points. Our library supporting optimization tactics is implemented with Lean 4's powerful metaprogramming and theorem-proving facilities, allowing users to:

  • Easily define provably correctness-preserving optimization tactics as expression-to-expression transformation functions.
  • Apply an optimization wherever a given program term occurs.
  • Leverage Lean 4's interactive view to observe every step of an optimization.
  • Separate algorithms from their optimizations.

Optimization tactics thus allow library authors to define domain-specific optimizations, and library users to incorporate, extend and adapt them to their specific use cases. We also propose that the interactivity of optimization tactics in Lean 4 allows researchers to explore new optimization techniques that can be applied in other functional programming languages.
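As a hypothetical sketch of the pattern (the names here are illustrative, not the library's actual API), an algorithm and its optimized form can be written separately in Lean 4, with a theorem discharging the proof obligation that a correctness-preserving optimization tactic would rely on to justify the rewrite:

```lean
-- Algorithm: naive quadratic list reversal.
def rev {α : Type} : List α → List α
  | []      => []
  | x :: xs => rev xs ++ [x]

-- Optimized form: linear accumulator-passing reversal.
def revAcc {α : Type} : List α → List α → List α
  | [],      acc => acc
  | x :: xs, acc => revAcc xs (x :: acc)

-- The proof obligation that makes replacing 'rev xs' by 'revAcc xs []'
-- a correctness-preserving optimization.
theorem revAcc_eq {α : Type} (xs acc : List α) :
    revAcc xs acc = rev xs ++ acc := by
  induction xs generalizing acc with
  | nil => rfl
  | cons x xs ih => simp [rev, revAcc, ih]
```

Separating the two definitions keeps the algorithm readable while the proved equation lets the optimization be applied at chosen program points without re-verifying it each time.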

Tags: Optimization, Metaprogramming, Lean 4, In Progress (early)

Reconstruction of Incomplete Polymorphic Programs

The ability to reconstruct the dependencies of incomplete programs to form complete and well-typed ones provides several benefits: it allows static analysis tools to analyse incomplete programs whose surrounding dependencies are unavailable, and it enables stub-generation and testing tools to work on program snippets. However, earlier efforts in this area are unable to handle incomplete programs containing parametrically polymorphic types.

In this paper, we present a technique that takes an incomplete Java program, possibly containing parametrically polymorphic types, and reconstructs its surrounding dependencies to form a complete and well-typed program. We then present empirical results from JavaCIP, our prototype implementation of this algorithm, which outperforms prior work in this area.

Submitted Author's Copy (Under Review) | Poster

Tags: Type Systems, Polymorphism, Program Analysis, Java, Submitted

Motivating Mathematics Study in Prospective Computer Scientists with Interactive Theorem Provers

The Curry-Howard correspondence shows that terms, types, programs and program evaluation correspond to witnesses, propositions, proofs and proof normalization in (intuitionistic) logic. This has spawned a large body of work on proof assistants, or interactive theorem provers, which exploit this correspondence via programming languages whose type systems support proof checking.

On the other hand, the need to study mathematics may not be immediately apparent to budding software engineers, particularly those aiming to enter industry after their studies. Our aim is to motivate students to study mathematics by showing that the art of writing programs is the same as the art of writing proofs. We do so by writing both proofs and programs in Lean 4.
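A minimal illustration of the correspondence (our own sketch, not tied to any particular course material): the same Lean 4 term can be read once as a program and once as a proof.

```lean
-- Read as a program: composing two functions.
-- Read as a proof: implication is transitive — from P → Q and Q → R,
-- conclude P → R.  The term is identical under both readings.
theorem imp_trans {P Q R : Prop} (f : P → Q) (g : Q → R) : P → R :=
  fun p => g (f p)
```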

Tags: Education, In Progress (early)