Introduction

Arc-Lang is a programming language for continuous analytics. Continuous analytics is about analyzing data as soon as it is produced and possibly using the results for critical decision-making. Data is generally unstructured (e.g., images, audio, text) or semi-structured (e.g., JSON, XML, CSV), and is produced at a high velocity. Programs for continuous analytics must be able to scale-out their compute and storage while simultaneously execute forever with fault-tolerance.

{{#include ../../arc-lang/examples/wordcount.arc:example}}

The goal of Arc-Lang is to make big-data analytics easy. Arc-Lang targets streaming analytics (i.e., processing data continuously as it is being generated) and batch analytics (i.e., processing data in large chunks all-at-once). From the streaming-perspective, Arc-Lang must be able to manage data at a fine granularity that is generated by many types of sensors, arriving at varying rates, in different formats, sizes, qualities, and possibly out-of-order. Datastreams can in addition be massive in numbers, ranging into the billions, due to the plethora of data sources that have emerged in the recent IoT boom. From the batch-perspective, Arc-Lang must be able to handle different kinds of collection-based data types whose sizes can scale to massive sizes, e.g., tensors and dataframes. Operations should to a large degree be agnostic of the collection type.

{{#include ../../arc-lang/examples/wordcount.arc:polymorphic}}

To cope with the requirements of batch and stream data management, a runtime system is needed which can exploit distributed programming to enable scalability through partitioning and parallelism. Distributed programming is however difficult without abstraction. Application developers must manage problems such as fault tolerance, exactly-once-processing, and coordination while considering tradeoffs in security and efficiency. To this end, distributed systems leverage high-level DSLs which are more friendly towards end-users. DSLs in the form of query languages, frameworks, and libraries allow application developers to focus on domain-specific problems, such as the development of algorithms, and to disregard engineering-related issues. In addition, DSLs that are intermediate languages have been adopted by multiple systems both as a solution to enable reuse by breaking the dependence between the user and runtime, and to enable target-independent optimisation. There is always a tradeoff that must be faced in DSL design. DSLs make some problems easier to solve at the expense of making other problems harder to solve. How a DSL is implemented can also have an impact on its ability to solve problems. DSLs can be categorized as follows:

Approach

In contrast to other DSLs, Arc-Lang is a standalone compiled DSL implemented in OCaml. The idea of Arc-Lang's is to combine general purpose imperative and functional programming over small data with declarative programming over big data. As an example, it should be possible to perform both fine-grained processing over individual data items of a datastream, while also being able to compose pipelines of relational operations through SQL-style queries. Arc-Lang is statically typed for the purpose of performance and safety, but at the same time also inferred and polymorphic to enable ease of use and reuse.

The approach of implementing the language as a standalone DSL allows for more creative freedom in the language design. At the same time, this approach requires everything, including optimisations and libraries, to be implemented from scratch.

To address the issue of optimisation, we are using the MLIR compiler framework to implement Arc-MLIR - an intermediate language - which Arc-Lang programs translate into for optimisations. MLIR defines a universal intermediate language which can be extended with custom dialects. A dialect includes a set of operations, types, type rules, analyses, rewrite rules (to the same dialect), and lowerings (to other dialects). All dialects adhere to the same meta-syntax and meta-semantics which allows them to be interweaved in the same program code. The MLIR framework handles parsing, type checking, line information tracking among other things. Additionally, MLIR provides tooling for testing, parallel compilation, documentation, CLI usage, etc. The plan is to extend Arc-MLIR with custom domain-specific optimisations for the declarative part of Arc-Lang and to capitalize on MLIR's ability to derive general-purpose optimisations such as constant propagation for Arc-Lang's functional and imperative side.

To address the shortcoming of libraries, Arc-Lang allows both types and functions to be defined externally (inside Rust) and imported into the language. Most of the external functionality is encapsulated inside a runtime library named Arc-Sys. Arc-Sys builds on the kompact Component-Actor framework to provide distributed abstractions.

Summary

In summary, Arc-Lang as a whole consists of three parts:

  • Arc-Lang: A high-level programming language for big data analytics.
  • Arc-MLIR: An intermediate language for optimising Arc-Lang.
  • Arc-Sys: A distributed system for executing Arc-Lang.

Getting Started

This section explains how to get started with Arc-Lang.

Prerequisites

The following dependencies are required to build Arc-Lang from source:

  • CMake
  • Clang
  • Ninja
  • Rust

macOS

On macOS, dependencies can be installed with:

brew install cmake ninja clang
curl https://sh.rustup.rs -sSf | sh

Ubuntu

On Ubuntu, dependencies can be installed with:

sudo apt install cmake ninja-build clang
curl https://sh.rustup.rs -sSf | sh

Installation

To install Arc-Lang, clone the repo and run the build script:

git clone https://github.com/cda-group/arc/
cd arc
git submodule update --init --recursive
./build

The build script installs the arc command-line utility along with the arc-sys library.

Hello World

Arc-Lang files have the .arc file extension. For example:

# hello-world.arc

def main() = print(

"Hello World"

)

To execute the above program, run:

arc run hello-world.arc

Tour of Arc-Lang

This section presents a brief tour of Arc-Lang, introducing the basics and unique concepts of the language. The tour assumes you have basic knowledge in some other programming language.

Queries

Arc-Lang allows the formulation of queries over data collections (similar to SQL). This concept is borrowed from Morel which embeds SQL-style queries over relations in StandardML. The following query defines a word-count application.

{{#include ../../../arc-lang/examples/wordcount.arc:example}}

The top-level statements provided by Arc-Lang are:

OperatorDescription
fromIterate over data items
yieldMap data items
whereFilter data items
joinJoin data items on a common predicate
groupGroup data items by a key
windowSliding or tumbling window
computeAggregate data items
reduceRolling aggregate data items
sortSort data items
splitSplit data items into distinct collections

The exact syntax of these statements is described here.

Query expressions

A query in Arc-Lang is a type of expression that takes data collections as operands and evaluates into a data collection as output. A data collection could either be a finite dataframe or an infinite datastream. Queries begin with the from keyword and are followed by a set of statements. The from statement can in of itself be used to compute the cross product between collections.

val a = [

1

,

2

]; val b = [

3

]; val c = from x in a, y in b; # Equivalent to: # val c = a.flatmap(fun(x) = b.map(fun(y) = {x:x,y:y})); assert(c == [{x:

1

,y:

3

},{x:

2

,y:

3

}]);

Projection

The yield clause can be used to project elements

val a = [[[

0

,

1

]],[[

2

,

3

]],[[

4

,

5

]]]; val b = from x in a, y in x yield y+

1

; # Equivalent to: # val b = a.flatmap(fun(x) = x.map(fun(y) = {x:x,y:y})) # .map(fun(r) = r.y +

1

); assert(b == [

1

,

2

,

3

,

4

,

5

,

6

]);

Selection

The where clause can be used for retaining elements which satisfy a predicate.

val a = [

0

,

1

,

2

,

3

,

4

]; val b = from x in a where x %

2

==

0

; # Equivalent to: # val b = a.filter(fun(x) = x %

2

==

0

); assert(a, [

0

,

2

,

4

]);

Any kind of expression can be used as a predicate, as long as it evaluates into a boolean value.

Ordering

The order clause can be used to sort elements according to a comparison function. By default, sorting is ascending.

val a = [

3

,

1

,

2

,

4

,

0

]; val b = [{k:

3

,v:

3

},{k:

1

,v:

2

},{k:

2

,v:

9

},{k:

4

,v:

1

},{k:

0

,v:

3

}]; val c = from x in a order x; val d = from x in a order x desc; val e = from x in a order x with fun(x,y) = x <= y; val f = from x in a order x on x.k; # Equivalent to: # val c = x._sort(fun(x) = x); # val d = x._sort(fun(x) = x, _desc: Order::Descending); # val e = x._sort(fun(x) = x, _compare: fun(x,y) = x <= y); # val f = x._sort(fun(x) = x, _on: fun(x) = x.k); assert(c == [

0

,

1

,

2

,

3

,

4

]); assert(d == [

4

,

3

,

2

,

1

,

0

]); assert(e == [

0

,

1

,

2

,

3

,

4

]); assert(f == [{k:

0

,v:

3

},{k:

1

,v:

2

},{k:

2

,v:

9

},{k:

3

,v:

3

},{k:

4

,v:

1

}];

Streams cannot be sorted since sorting requires that the input is finite.

Aggregating

The compute clause can be used to aggregate values.

val a = [

1

,

2

,

3

]; val b = [{v:

1

},{v:

2

},{v:

3

}]; val c = from x in a compute sum; val d = from x in b compute sum of x.v; # Equivalent to: # val c = a._compute(sum); # val d = a._compute(sum, _of: fun(x) = x.v); assert(c == {sum:

6

}); assert(d == {sum:

6

});

Aggregation is only allowed on finite-sized data collections.

Rolling Aggregates

The reduce clause can be used to compute rolling aggregates.

val a = [

1

,

2

,

3

]; val c = from x in a reduce sum; # Equivalent to: # val c = a._reduce(sum); assert(c == [{sum:

1

}, {sum:

3

}, {sum:

6

}]);

Rolling aggregates are allowed on both finite- and infinite-sized data collections.

Custom Aggregators

It is possible to define custom aggregators that can be used inside compute clauses. The sum, count, and average aggregators are monoids:

val count = Aggregator::Monoid({
    lift: fun(x) = {count: 

1

}, identity: fun() = {count:

0

}, merge: fun(x,y) = {count: x.count+y.count}, lower: fun(x) = x, }); val sum = Aggregator::Monoid({ lift: fun(x) = {sum: x}, identity: fun() = {sum:

0

}, merge: fun(x,y) = {sum: x.sum+y.sum}, lower: fun(x) = x, }); val average = Aggregator::Monoid({ lift: fun(x) = {sum: x, count:

1

}, identity: fun() = {sum:

0

, count:

0

}, merge: fun(x,y) = {sum: x.sum+y.sum, count: x.count+y.count}, lower: fun(x) = {average: x.sum/x.count}, });

Aggregators that operate on the whole input such as median are holistic:

val median = Aggregator::Holistic(
    fun(v) = {
        val n = v.len();
        if n % 

2

==

0

{ (v[n/

2

] + v[n/

2

+

1

])/

2

} else { v[n/

2

+

1

] } } );

Grouping

The group clause can be used to group elements by key into partitions. After grouping, it is possible to apply a partition-wise transformation.

val a = [{k:

1

,v:

2

}, {k:

2

,v:

2

}, {k:

2

,v:

2

}]; val b = from x in a group x.k; val c = from x in a group x.k compute average of x.v; # Equivalent to: # val b = a._group(fun(x) = x.k); # val d = a._group(fun(x) = x.k, _compute: sum, _of: fun(v) = x.v); assert(b == [{k:

1

,v:[

2

]},{k:

2

,v:[

2

,

2

]}]); assert(d == [{k:

1

,sum:

2

},{k:

2

,sum:

4

}]);

Windowing

The window clause can be used to slice data collections into multiple, possibly overlapping, partitions. Like group, a transformation can be applied per-partition. The after clause starts the window at an offset, and the every clause specifies the slide of a sliding window.

val a = [{t:

2022

-

01

-01T00:

00

:

01

,v:

1

}, {t:

2022

-

01

-01T00:

00

:

04

,v:

2

}, {t:

2022

-

01

-01T00:

00

:

09

,v:

3

}]; val b = from x in a window 5s on x.t; val e = from x in a window 5s on x.t compute count; val c = from x in a window 5s on x.t after

00

:

00

:

02

; val d = from x in a window 5s on x.t every 3s; # Equivalent to: # val b = a._window(_length: 5s); # val b = a._window(_length: 5s, _on: fun(x): x.t, _compute: count); # val c = a._window(_length: 5s, _on: fun(x): x.t, _after:

2020

-

01

-01T00:

00

:

02

); # val d = a._window(_length: 5s, _on: fun(x): x.t, _every: 3s); assert(b == [{t:

2020

-

01

-01T00:

00

:

05

,v:[

1

,

2

]}, {t:

2020

-

01

-01T00:

00

:

10

,v:[

3

]}]); assert(e == [{t:

2020

-

01

-01T00:

00

:

05

,count:

2

}, {t:

2020

-

01

-01T00:

00

:

10

,count:

1

}]); assert(c == [{t:

2020

-

01

-01T00:

00

:

07

,v:[

1

,

2

]}, {t:

2020

-

01

-01T00:

00

:

12

,v:[

3

]}]); assert(d == [{t:

2020

-

01

-01T00:

00

:

05

,v:[

1

,

2

]}, {t:

2020

-

01

-01T00:

00

:

08

,v:[

2

]}, {t:

2020

-

01

-01T00:

00

:

11

,v:[

3

]}]);

Joining

The join clause can be used to join multiple data collections together whose records satisfy a common predicate. It is logically equivalent to a Cartesian-product between the two collections followed by a selection.

val a = [{id:

0

,name:

"Tim"

}, {id:

1

,name:

"Bob"

}]; val b = [{id:

0

,age:

30

}, {id:

1

,age:

100

}]; val c = from x in a join y in b on x.id == y.id; # Equivalent to # val c = a._join(b, fun(x,y) = x.id == y.id); assert(c == [{a:{id:

0

,name:

"Tim"

},b:{id:

0

,age:

30

}}, {a:{id:

1

,name:

"Bob"

},b:{id:

1

,age:

100

}}]);

Windowed Joins

Joins require at least one of their input collections to be finite. A join operation over two streams must be followed by a window operation.

val a = [{id:

0

,name:

"Tim"

,t:

2020

-

01

-01T00:

00

:

01

}, {id:

1

,name:

"Bob"

,t:

2020

-

01

-01T00:

00

:

09

}]; val b = [{id:

0

,age:

30

,t:

2020

-

01

-01T00:

00

:

02

}, {id:

1

,age:

100

,t:

2020

-

01

-01T00:

00

:

04

}]; val c = from x in a join y in b on x.id == y.id window 5s on x.t, y.t; assert(c == [{a:{id:

0

,name:

"Tim"

,t:

2020

-

01

-01T00:

00

:

01

}, b:{id:

0

,age:

30

,t:

2020

-

01

-01T00:

00

:

02

}}]);

Splitting

The split operator can be used to split a collection into two collections.

val a = [Ok(

1

), Err(

2

), Ok(

3

)]; val (b, c) = from a split error; val (d, e) = from a split with fun(x) = match x { Ok(v) => Either::L(v), Err(v) => Either::R(v), }; # Equivalent to: # val (b, c) = a._split(_with: split_error); # val (d, e) = a._split(_with: fun(x) = match x { # Ok(v) => Either::L(v), # Err(v) => Either::R(v), # }); assert(b == [

1

,

3

] and c == [

2

]); assert(d == [

1

,

3

] and e == [

2

]);

Examples

In this section we describe examples of how to use Arc-Lang.

Word Count

Word count is a simple way to count the number of words in a text. The code for calculating a wordcount is as follows.

{{#include ../../../arc-lang/examples/wordcount.arc:example}}

TF-IDF

The formula that is used to compute tf-idf is defined as follows:

  • tf-idf(t, d) = tf(t, d) * idf(t)
    • t is a term
    • d is a document in a document set
    • idf(t) = log [n/df(t)] + 1 is the inverse document frequency
      • n is the total number of documents in the document set
      • df(t) is the document frequency of t
        • i.e., number of documents containing the term t
      • tf(t, d) is the term frequency of t in d
        • i.e., number of occurrences of the term t within the document d
      • 1 is added so that terms which occur in all documents will not be entirely ignored.

Sensor Data Aggregation

Deduplication

A common problem in data intensive computing is to deduplicate events of data streams [kleppmann-2017]. As an example, a login system must ensure that all usernames uniquely identify user accounts. This can be implemented as follows.

{{#include ../../../arc-lang/examples/deduplication.arc:example}}

Fraud Detection

This problem is about identifying fraudulent transactions.

Credit card fraud is a growing concern in the digital age. Criminals steal credit card numbers by running scams or hacking into insecure systems. Stolen numbers are tested by making one or more small purchases, often for a dollar or less. If that works, they then make more significant purchases to get items they can sell or keep for themselves. Flink Tutorial.

{{#include ../../../arc-lang/examples/fraud-detection.arc:example}}

Command Line Interface

Language Reference

The syntax of Arc-Lang is formalized using a Regex-based variation of the BNF grammar where:

  • + and * denote repetition.
  • ? is for optional rules.
  • ( ) indicates grouping.
  • | is for alternation.
  • [ ] for is character-alternation (e.g., [abc]).
  • - is for ranges (e.g., [a-zA-Z]).
  • . is for matching any character.
  • \ is for escaping characters.
  • Non-terminals are written as uppercase (e.g., Expr).
  • Terminals are written in blue text (e.g., )

Programs

An Arc-Lang program is a sequence of items. Programs start their execution from a main function.

Program ::= Item*

Source files are suffixed with the .arc file extension.

Examples

{{#include ../../../arc-lang/examples/program.arc:example}}

Names

A name is a symbolic identifier which can be assigned to a variable or item.

Name ::= [a-zA-Z_][a-zA-Z_0-9]*

Examples

{{#include ../../../arc-lang/examples/names.arc:example}}

Paths

A path is a list of names which point to an item in the module hierarchy.

Path ::= ::? Name::* Name

Paths can be both relative and absolute. Absolute paths are absolute with respect to the root of the module hierarchy. Relative paths are relative with respect to the current namespace.

mod1::mod2::MyItem   # Relative path
::mod1::mod2::MyItem # Absolute path

Examples

{{#include ../../../arc-lang/examples/paths.arc:example}}

Values

A value is the result of evaluating an expression.

Value ::=
  | { (Name : Value),+ }              Record-value
  | ( Value,+ )                        Tuple-value
  | fun ( (Pattern : Type),+ ) : Expr  Lambda-value
  | Value? .. (=? Value)?              Range-value
  | Path                               Item-value
  | Path ( Value )                     Enum-value
  | Literal

Literal ::=
  | true | false        Literal boolean
  | unit                Literal unit
  | '[^']'              Literal character
  | "[^"]*"             Literal string
  | [1-9][0-9]*Name?     Literal integer with optional suffix
  | [1-9][0-9]*.[0-9]*Name? Literal floating point with optional suffix
  | DateTime
  | Duration

DateTime ::=
  | [0-9]+-[0-9]+-[0-9]+                                         Date
  | [0-9]+-[0-9]+-[0-9]+T[0-9]+:[0-9]+:[0-9]+                    Date + Time
  | [0-9]+-[0-9]+-[0-9]+T[0-9]+:[0-9]+:[0-9]+(+|-)[0-9]+:[0-9]+  Date + Time + Zone

Duration ::= 
  | [0-9]+ns  Nanosecond
  | [0-9]+us  Microsecond
  | [0-9]+ms  Millisecond
  | [0-9]+s   Second
  | [0-9]+m   Minute
  | [0-9]+h   Hour
  | [0-9]+d   Day
  | [0-9]+w   Week

Examples

Some examples of different values:

{{#include ../../../arc-lang/examples/values.arc:example}}

Types

All expressions in Arc-Lang have a statically inferred data type. Types of items and variables can be inferred, and thus do not need to be annotated unless desired.

Type ::=
  | { (Name : Type),* (| Type)? }       Record-type
  | < (Name : Type),* (| Type)? >       Enum-type
  | ( Type,* )                          Tuple-type
  | fun ( Type,* ) : Type (~ Effects)?  Function-type (with optional effects)
  | [ Type ]                            Array-type
  | Type ?                              Option-type
  | Path ([ Type,+ ])?                  Nominal-type (with optional type parameters)

Examples

Some examples of different types:

{{#include ../../../arc-lang/examples/types.arc:example}}

Standard types

The following types are provided in the standard library of Arc-Lang:


Nominal types

Nominal types are context-sensitive. They could either be type aliases, extern types, or generics.

Effects

Blocks

A block is a sequence of statements optionally terminated by an expression which the block evaluates into. If no expression is specified, then the block evaluates into unit.

Block ::= { Stmt* Expr? }

Examples

Expression-blocks are represented by the do keyword.

{{#include ../../../arc-lang/examples/blocks.arc:example}}

Statements

A statement is either a variable assignment or an expression whose result is ignored.

Stmt ::=
  | ;
  | Expr ;
  | Assign

Expressions

An expression is syntactic construct which can be evaluated into a value.

Expr ::=
  | Name                Variable reference
  | Path (:: TypeArgs)  Item reference
  | Value               Value literal
  | Query               Queries
  | _                   Placeholder
  | new Name (: Expr)?             Variant-construction
  | { ExprRecordField,* (| Expr)? }  Record-construction
  | dyn { ExprRecordField,* (| Expr)? }  Dynamically-typed record
  | [ Expr,* (| Expr)? ]           Array-construction
  | Expr? .. (=? Expr)?            Range-construction
  | fun Params (: Type)? = Expr    Lambda-function construction
  | Expr BinOp Expr   Binary operation
  | Expr UnOp         Unary operator
  | Expr ( Expr,* )   Function call
  | Expr . Name       Field projection
  | Expr . [0-9]+     Index projection
  | Expr . { Name,+ } Fields projection
  | Expr . ( ([0-9]+),+ ) Indexes projection
  | Expr : Type      Type annotated
  | if Expr Block (else Block)?            If-else-expression
  | match Expr Arms                        Match-expression
  | for Pattern in Expr Block              For-loop
  | while Expr Block                       While-loop
  | loop Block                             Infinite loop
  | break Expr? | continue | return Expr?  Jumps
  | try Expr catch Arms (finally Expr)?    Exceptions

UnOp ::=
  | -    Arithmetic negation
  | not  Logical negation

BinOp ::=
  | +   | -  | *   | /    | **  | %     Arithmetic
  | ==  | != | <   | >    | <=  | >=    Equality and comparison
  | and | or | xor | band | bor | bxor  Logical and bitwise
  | in  | not in                              Contains

ExprRecordField ::=
  | Name : Expr
  | Expr
  | Expr . Name

Operators

Operators are defined as follows, with precedence from highest to lowest:

OperatorArityAffixAssociativityOverloadable?
return breakUnaryPrefix*No
fun task onUnaryPrefixNo
= ! += -= %= *= /= **=BinaryInfixNoneNo
in not inBinaryInfixLeftNo
.. ..=BinaryInfixNoneNo
and or xor bor band bxorBinaryInfixLeftYes
== !=BinaryInfixNoneNo
< > <= >=BinaryInfixNoneNo
- + %BinaryInfixLeftYes
* /BinaryInfixLeftYes
**BinaryInfixRightYes
not -UnaryPrefixYes
asBinaryInfixLeftNo
(exprs) [exprs]UnaryPostfixNo
.index .name .name(exprs) .name[exprs]UnaryPostfixNo
Primary expressionsNullaryNo

(*) Operand is optional.

Builtin Functions

The builtin functions of Arc-Lang are listed here.


Examples

Basic function calls

{{#include ../../../arc-lang/examples/basic.arc:example}}

Lambda functions

{{#include ../../../arc-lang/examples/lambda.arc:example}}

Binary operators

{{#include ../../../arc-lang/examples/binops.arc:example}}

Placeholders

An expression-argument such as _ + _ desugars into a lambda function fun(x0, x1): x0 + x1

{{#include ../../../arc-lang/examples/placeholder.arc:example}}

Binary operator lifting

Binary operators can be lifted into functions.

{{#include ../../../arc-lang/examples/binopref.arc:example}}

String interpolation

String interpolation is supported using the $ and ${} syntax.

{{#include ../../../arc-lang/examples/interpolate.arc:example}}

Query

{{#include ../../../arc-lang/examples/query.arc:implicit}}

Patterns

Patterns can be used to deconstruct values and bind their constituents with variables.

Pattern ::=
  | Name                       Variable binding
  | Value                      Value comparison
  | Pattern : Type         Type annotated
  | case Path Pattern?         Variant deconstruction
  | { (Name (: Pattern)?),* }  Record deconstruction
  | ( Pattern , Pattern,* )    Tuple deconstruction
  | Pattern? .. (=? Pattern)?  Range deconstruction
  | Pattern or Pattern         Alternation

Examples

Tuples

{{#include ../../../arc-lang/examples/tuples.arc:patterns}}

Records

{{#include ../../../arc-lang/examples/records.arc:patterns}}

Enums

{{#include ../../../arc-lang/examples/enums.arc:patterns}}

Arrays

{{#include ../../../arc-lang/examples/arrays.arc:patterns}}

Queries

Query ::= from (Pattern in Expr),+ { QueryStmt+ }

QueryStmt ::=
  | select Expr (as Name)?                Select
  | where Expr                            Filter
  | join (Pattern in Expr),+ on Expr      Join
  | group Expr,+ (as Name)?               Partition
  | compute Expr (of Expr)? (as Name)?    Aggregation
  | order Expr desc?                      Ordering
  | window Expr as Name { WindowLogic+ }  Window
  | into Expr                             Piping

WindowLogic ::= WindowKind compute Expr (of Expr)? (as Name)?

WindowKind ::=
    | length Expr (step Expr)?  Tumbling and sliding window
    | count Expr                Count window

Items

An item is a named top-level definition which can be referenced by path. The ordering of how items are defined in the source code insignificant. Items can be prefixed by annotations for configuration.

Item ::=
  | Annots? Global      Global immutable value
  | Annots? Def         Function definition
  | Annots? TypeAlias   Type alias
  | Annots? ExternDef   Extern function declaration
  | Annots? ExternType  Extern type declaration
  | Annots? Use         Import
  | Annots? Module      Module
  | Annots? Query       Query

Type Aliases

A type alias is a purely cosmetic alias of a type.

TypeAlias ::= type Name Generics? = Type Where? ;

Example

The following code defines type aliases for representing lines on a two-dimensional plane, and a function for calculating the length of a line.

{{#include ../../../arc-lang/examples/type-alias.arc:example}}

Type Classes

Type classes are written with the class keyword.

TypeClass ::= class Name Generics? { Decl,+ }

Decl ::= def Name Generics? ( Type,+ ) ;

Examples

{{#include ../../../arc-lang/examples/type-class.arc:class}}

Type Class Instances

Type class instances are written with the instance keyword.

Instance ::= instance Generics Path TypeArgs? Where? { Def,+ }

Examples

{{#include ../../../arc-lang/examples/type-class.arc:instance}}

Extern Functions

An extern function is a function-declaration whose implementation is defined externally, outside of Arc-Lang, inside Rust. Extern functions can be annotated with effects that indicate side effects that occur when they are called.

ExternDef ::= extern def Name Generics? ( Type,* ) : Type (~ Effects)? Where? ;

Examples

{{#include ../../../arc-lang/stdlib/std.arc:string}}

Extern types

An extern type is an abstract data type with methods whose implementations are defined in Rust.

ExternType ::= extern type Name Generics? Where? ;

Example

The following code shows how to define an extern type String in arc-lang.

{{#include ../../../arc-lang/stdlib/std.arc:array}}

Uses

A use-item imports a name into the current namespace and optionally aliases it.

Use ::= use Path (as Name)?;

Examples

The following code creates a Person-type and an alias and use it as a Human.

{{#include ../../../arc-lang/examples/uses.arc:example}}

Modules

A module is a named unit of encapsulation which may contain items and other modules.

Module ::= mod Name { Item* }

A currently unsupported but planned feature is to allow modules to form a hierarchy through nesting. This hierarchy is in addition tied to the file system hierarchy (similar to Rust):

my-project/
  src/
    main.arc     # :: (root module)
    foo/
      mod.arc    # ::foo
      bar/
        mod.arc  # ::foo::bar
    baz/
      mod.arc    # ::foo::baz

Examples

{{#include ../../../arc-lang/examples/modules.arc:example}}

Body

The body of a function or task can be written either inline using = syntax or directly as a block.

Body ::=
  | = Expr ;
  | Block

Examples

{{#include ../../../arc-lang/examples/body.arc:example}}

Arms

Arms are used for conditional branching inside match and try expressions.

Arms ::= { Arm,* }

Arm ::=
  | Expr (if Expr)? => Expr

Sinks

Sinks are named parameters into which tasks can emit their output.

Sinks ::= ( Sink,+ )
Sink ::= Name (: Type)?

Parameters

Parameters are used in function and task definitions.

Params ::= ( Param,+ )

Param ::= Pattern

Examples

{{#include ../../../arc-lang/examples/params.arc:example}}

Generics

Items can be parameterised by generic types. Generics can in addition be bounded by type class constraints using a where clause.

Generics ::= [ Generic,+ ]

Generic ::= Name

Examples

Explicit generic function

{{#include ../../../arc-lang/examples/generic-function.arc:example}}

Inferred generic function

{{#include ../../../arc-lang/examples/inferred.arc:example}}

Type Arguments

Type arguments use bracket-syntax as separators.

TypeArgs ::= [ Type,+ ]

Assignments

Variables can be assigned as mutable with var and immutable with val.

Assign ::=
 | val Pattern (: Type)? = Expr ;
 | var Name (: Type)? = Expr ;

Examples

{{#include ../../../arc-lang/examples/assign.arc:example}}

Annotations

Annotations can optionally be attached to items.

Annots ::= @ { Annot,* }

Annot ::= Name (: Value)?

Examples

{{#include ../../../arc-lang/stdlib/std.arc:unit}}

Standard library

The standard library is defined in includes

Strings

Arrays

An array is a collection of values which are stored contiguously.


Arrays are indexable, and may grow and shrink in size. The functions supported for arrays are listed as follows:


Syntactic sugar

{{#include ../../../arc-lang/examples/arrays.arc:example}}

DataFrames

DataFrames store data in rows and columns, where cells are scalars. The idea in Arc-Lang is to build up DataFrames incrementally.

extern type DataFrame;
extern def df(): DataFrame;
extern def append[T: BaseRow](DataFrame, T);
extern def head(DataFrame, i32): DataFrame;
extern def tail(DataFrame, i32): DataFrame;
extern def take(DataFrame, i32): DataFrame;
extern def join(DataFrame, DataFrame, [String], [String]): DataFrame;
val df = DataFrame(
  {name: 

"John"

, age:

30

, city:

"New York"

}, {name:

"Jane"

, age:

22

, city:

"San Francisco"

}, {name:

"Joe"

, age:

49

, city:

"Boston"

}, {name:

"Jill"

, age:

35

, city:

"New York"

}, ); df.append({name:

"Mary"

, age:

12

}); val df = DataFrame({

"name"

: [

"John"

,

"Jane"

,

"Joe"

,

"Jill"

],

"age"

: [

30

,

22

,

49

,

35

],

"city"

: [

"New York"

,

"San Francisco"

,

"Boston"

,

"New York"

]} ); df.append({

"name"

:

"Mary"

,

"age"

:

12

}); extern type Series; extern def df(data: Array[Series]?): DataFrame; extern def series[T: Scalar](name: String, data: Array[T]); extern def Series[T: Scalar](name: String, values: Array[T]): Series; extern def __index__(df: DataFrame, col: i32): Array[String]; extern def ds(): DataSet; extern def append[T](DataSet[T], T); extern def head[T](DataSet[T], i32): DataSet[T]; extern def tail[T](DataSet[T], i32): DataSet[T]; extern def take[T](DataSet[T], i32): DataSet[T]; extern def join_df[A, B](DataSet[A], DataSet[B], [String], [String]): DataSet[A ++ B];

A series is a named one-dimensional array of data representing a column in a DataFrame.

val obj = {

"name"

:

"John"

,

"age"

:

30

,

"city"

:

"New York"

}; val name = obj[

"name"

];

Channels

Development

Below are instructions for setting up a custom runner, running in a Docker container, for GitHub actions.

# Setup docker

docker pull ubuntu:latest
docker run -i -t ubuntu:latest /bin/bash

# Setup user

passwd # change root password
adduser arc-runner sudo
su -l arc-runner

# Install apt dependencies

sudo apt install software-properties-common
sudo add-apt-repository ppa:git-core/ppa -y
sudo apt update && apt upgrade -y
sudo apt install -y git vim curl z3 libz3-dev curl libssl-dev gcc pkg-config make ninja-build zip openjdk-8-jdk software-properties-common texlive-xetex latexmk gettext ccache cmake

sudo ln -s /usr/bin/clang-16 /usr/local/bin/clang
sudo ln -s /usr/bin/clang++-16 /usr/local/bin/clang

# Install Rust

curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
echo 'source $HOME/.cargo/env' >> ~/.bashrc
cargo install mdbook

# Install GitHub Actions Runner

# Follow this tutorial (Which generates a unique token):
#     https://github.com/cda-group/arc/settings/actions/runners/new?arch=x64&os=linux

# Setup Runner

./run.sh &
disown <PID>

# To exit the container: <C-p><C-q>

# [OPTIONAL] Check that everything builds

cd ~
git clone https://github.com/cda-group/arc
cd ~/arc
git checkout mlir
git submodule update --init --recursive

# Check if arc-lang builds

cd ~/arc/arc-lang
cargo check --all-features --tests --bins --examples --benches

# Check if arc-mlir builds

cd ~/arc/arc-mlir/
./arc-mlir-build

Arc-Lang Reference

Language Reference

This section gives a definition of Arc-MLIR's grammar.

Programs

Program ::= mod Path { Item,* }

Names

A name is a symbolic identifier which can be assigned to a variable or item.

Name ::= [a-zA-Z_][a-zA-Z_0-9]*

Operations

This section describes the operations of Arc-MLIR.

Operation ::=
  | Arc
  | Std
  | Deprecated

Arc ::=
  | arc.keep ( Value ) : ( Type ) -> ()
  | arc.tan Value : Type
  | arc.acos Value : Type
  | arc.asin Value : Type
  | arc.cosh Value : Type
  | arc.sinh Value : Type
  | arc.erf Value : Type
  | arc.make_struct ( Value,* : Type,* ) : Type
  | arc.make_enum Value ( Value : Type ) : Type
  | arc.if ( Value ) ( Block, Block )
  | arc.loop.break ( Value,+ ) : ( Type,+ ) -> ()
  | arc.adt_constant String : Type
  | arc.constant Literal : Type
  | arc.cmpi Cmp, Value, Value : Type
  | arc.receive ( Value ) : ( Type ) -> Type
  | arc.select Value, Value, Value : Type
  | arc.send ( Value, Value ) -> ()
  | arc.enum_access String in ( Value : Type ) : Type
  | arc.enum_check ( Value : Type ) is String : Type
  | arc.struct_access ( Value ) { field = String } : ( Type ) -> Type
  | arc.addi Value, Value : Type
  | arc.and Value, Value : Type
  | arc.divi Value, Value : Type
  | arc.or Value, Value : Type
  | arc.muli Value, Value : Type
  | arc.subi Value, Value : Type
  | arc.remi Value, Value : Type
  | arc.xor Value, Value : Type
  | arc.panic () (msg = String)? : () -> ()

Std ::=
  | call Path ( Value,+ ) : ( Type,+ ) -> Type
  | call_indirect Value ( Value,+ ) : ( Type,+ ) -> Type
  | return ( Value ) : ( Type ) -> ()

Deprecated ::= Deprecated and unused operations
  | arc.emit ( Value, Value ) -> ( Type, Type ) -> ()
  | arc.make_vector ( Value,* ) : (Type,*) -> Type
  | arc.make_tuple ( Value,* ) : ( Type,* ) -> Type
  | arc.make_tensor ( Value,* ) : ( Type,* )
  | arc.index_tuple ( Value ) { index = Int } : ( Type ) -> Type
  | arc.make_appender () : () -> Type
  | arc.merge ( Value, Value ) : ( Type,* ) -> Type
  | arc.result ( Value ) : ( Type,* ) -> Type
  | arc.appender_push ( Value, Value ) : ( Type, Type ) -> ()
  | arc.appender_fold ( Value, Value ) : ( Type, Type ) -> Type
  | arc.map_contains ( Value, Value ) : ( Type, Type ) -> Type
  | arc.map_get ( Value, Value ) : ( Type, Type ) -> Type
  | arc.map_insert ( Value, Value, Value ) : ( Type, Type, Type ) -> Type
  | arc.map_remove ( Value, Value ) : ( Type, Type ) -> Type
  | arc.value_write ( Value, Value ) : ( Type, Type ) -> Type
  | arc.value_read ( Value ) : ( Type ) -> Type

Int ::= [1-9][0-9]*.[0-9]*

String ::= "[^"]*"

Literal ::=
  | Int
  | String

Cmp ::= eq | ne | lt | le | gt | ge

Builtin functions


Types

Unlike Arc-Lang, Arc-MLIR has no nominal types.

Type ::=
  | !arc.struct < (Name : Type),* >
  | !arc.enum < (Name : Type),* >
  | !arc.adt < String >
  | !arc.stream < Type >
  | f32 | f64 | i32 | i64 | si32 | si64
  | ()
  | Deprecated

Deprecated ::= Deprecated and unused types
  | !arc.arcon.value < Type >
  | !arc.arcon.appender < Type >
  | !arc.arcon.map < Type, Type >

String ::= "[^"]*"

Values

Value ::= % Name

Blocks

A block is a sequence of operations.

Block ::= { Operation* }

Items

Item ::=
  | Func

Functions

Func ::= func Path ( Params ) -> Type Block

Parameters

Params ::= Param,*

Param ::= Value : Type

Arc-Sys Reference

Arc-Sys is a distributed system for executing Arc-Lang programs that focuses on flexibility, portability, native performance, and scalability. Arc-Sys is divided into a set of components:

Orchestration  Client <--> Coordinator <--> Worker
    Layer      (Rust)        (Rust)         (Rust)
                 ^             ^              ^
                 |             |              |
                 v             v              v
  Execution   Arc-Lang      Arc-MLIR       Dataflow
    Layer     Compiler      Compiler       Executor
              (Ocaml)        (C++)          (Rust)
The workflow is as follows:

Users write Arc-Lang programs on their local machine and execute them using a client program. The client communicates with a local Arc-Lang process using pipes to compile and interpret the program. During interpretation, the interpreter may construct dataflow graphs that should be executed on a distributed system. These logical dataflow graphs and their UDFs are holistically represented as Arc-MLIR code and sent back to the client process using standard output. The client process forwards the code to a coordinator process using TCP. The coordinator process uses an Arc-MLIR compiler to compile the MLIR source into a Rust module representing the UDFs, and a JSON object representing an optimised logical dataflow graph. The coordinator then compiles the logical graph into a physical graph mapped to specific hardware threads and sockets in a cluster of workers.

Arc-MLIR and Arc-Runtime share different responsibilities in making Arc-Lang programs execute efficiently. Arc-MLIR supports ahead-of-time standard compiler optimisations as well as logical and physical optimisations of streaming-relational operators. However, Arc-MLIR makes no assumptions of where programs will execute and what the input data will be. That is, all Arc-MLIR optimisations are based on the source code itself.

Arc-Runtime supports runtime optimisations which might rely on information that is only known during execution. Streaming programs are expected to run for an indefinite duration and must therefore be able to adapt to changes in their environment. Among its responsibilities, Arc-Runtime must therefore be able to take care of specialisation, scheduling, and scaling decisions.

Execution Model

Arc-Runtime is implemented on top of the Kompact framework. To understand how Arc-Runtime executes,

Bibliography

[kleppmann-2017] - Kleppmann, Martin - Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems. - 2017. -

Summary/Abstract

N/A