The ScalaLab project aims to provide an efficient scientific
programming environment for the Java Virtual Machine. The scripting
language is based on the Scala programming language, enhanced with
high-level scientific operators and an integrated environment that
provides a Matlab-like working style. The scripting code is fast,
close to Java in speed (sometimes slower, sometimes faster), and
usually faster than equivalent Matlab .m scripts.
A high-performance numerical linear algebra library for Scala, with rich
Matlab-like operators on vectors and matrices, a library of numerical
routines, and support for plotting.
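As an illustration of how Matlab-like element-wise operators can be expressed in Scala, here is a minimal sketch of a dense vector type. The Vec class, the VecDemo object, and the *:* operator name (element-wise product, like Matlab's .*) are hypothetical choices for this example, not the library's API.

```scala
// Hypothetical dense vector with Matlab-style element-wise operators.
// Not the library's API; a sketch of how such operators can be defined.
final case class Vec(data: Array[Double]) {
  require(data.nonEmpty, "empty vector")
  def length: Int = data.length

  // Apply f pairwise; both vectors must have the same length.
  private def zipWith(that: Vec)(f: (Double, Double) => Double): Vec = {
    require(length == that.length, s"length mismatch: $length vs ${that.length}")
    Vec(Array.tabulate(length)(i => f(data(i), that.data(i))))
  }

  def +(that: Vec): Vec = zipWith(that)(_ + _)
  def -(that: Vec): Vec = zipWith(that)(_ - _)
  // Element-wise product, analogous to Matlab's v .* w
  def *:*(that: Vec): Vec = zipWith(that)(_ * _)
  // Scalar scaling, analogous to Matlab's v * 2
  def *(s: Double): Vec = Vec(data.map(_ * s))
  // Inner product, analogous to Matlab's v' * w
  def dot(that: Vec): Double = zipWith(that)(_ * _).data.sum

  override def toString: String = data.mkString("[", ", ", "]")
}

object VecDemo {
  def main(args: Array[String]): Unit = {
    val v = Vec(Array(1.0, 2.0, 3.0))
    val w = Vec(Array(4.0, 5.0, 6.0))
    println(v + w)      // [5.0, 7.0, 9.0]
    println(v *:* w)    // [4.0, 10.0, 18.0]
    println(v * 2.0)    // [2.0, 4.0, 6.0]
    println(v.dot(w))   // 32.0
  }
}
```

Because the Array field makes case-class equality reference-based, compare vectors through data.toSeq rather than with == on Vec itself.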
FACTORIE is a toolkit for deployable probabilistic modeling,
implemented as a software library in Scala. It provides its users with
a succinct language for creating relational factor graphs, estimating
parameters and performing inference.
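A factor graph scores each joint assignment of its variables as the product of its factors' values, and inference answers questions such as a variable's marginal probability by summing those scores. The sketch below shows this with exact inference by brute-force enumeration over two Boolean variables. The TinyFactorGraph object, its Factor type, and the rain/wet-grass numbers are hypothetical; FACTORIE has its own modeling API and inference methods that scale far beyond enumeration, which is exponential in the number of variables.

```scala
// Hypothetical tiny factor graph with exact inference by enumeration.
// Not FACTORIE's API; illustrates what a factor graph computes.
object TinyFactorGraph {
  type Assignment = Vector[Boolean]

  // A factor touches some variables (by index) and scores their values.
  final case class Factor(vars: Seq[Int], score: Seq[Boolean] => Double)

  // Unnormalized probability of a full assignment: product of factor scores.
  def weight(factors: Seq[Factor], a: Assignment): Double =
    factors.map(f => f.score(f.vars.map(i => a(i)))).product

  // All 2^n assignments of n Boolean variables (variable i = bit i).
  def allAssignments(n: Int): Seq[Assignment] =
    (0 until (1 << n)).map(bits => Vector.tabulate(n)(i => ((bits >> i) & 1) == 1))

  // Marginal P(x_i = true), normalized by the partition function Z.
  def marginal(n: Int, factors: Seq[Factor], i: Int): Double = {
    val assignments = allAssignments(n)
    val z = assignments.map(a => weight(factors, a)).sum
    assignments.filter(a => a(i)).map(a => weight(factors, a)).sum / z
  }

  def main(args: Array[String]): Unit = {
    // Variable 0 = rain, variable 1 = wet grass (hypothetical numbers).
    val prior = Factor(Seq(0), v => if (v(0)) 0.2 else 0.8)
    val link = Factor(Seq(0, 1), v => (v(0), v(1)) match {
      case (true, true)   => 0.9
      case (true, false)  => 0.1
      case (false, true)  => 0.2
      case (false, false) => 0.8
    })
    println(marginal(2, Seq(prior, link), 1)) // about 0.34
  }
}
```

Here P(wet) = 0.2 * 0.9 + 0.8 * 0.2 = 0.34, up to floating-point rounding.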
Cassovary is designed from the ground up to efficiently handle graphs
with billions of edges. It comes with some common node and graph data
structures and traversal algorithms. Typical uses are large-scale
graph mining and analysis.
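Compact, array-backed adjacency storage is one common way to keep very large graphs in memory. The following sketch stores out-edges in compressed sparse row (CSR) form and runs a breadth-first traversal over it. ArrayGraph, fromEdges, and bfs are hypothetical names for this example; they are not Cassovary's API, and the sketch does not claim to mirror Cassovary's internal layout.

```scala
import scala.collection.mutable

// Hypothetical CSR graph builder; not Cassovary's API.
object ArrayGraph {
  def fromEdges(numNodes: Int, edges: Seq[(Int, Int)]): ArrayGraph = {
    // Count out-degrees, then prefix-sum them into start offsets.
    val offsets = new Array[Int](numNodes + 1)
    edges.foreach { case (u, _) => offsets(u + 1) += 1 }
    for (u <- 0 until numNodes) offsets(u + 1) += offsets(u)
    // Place each edge target into its source node's slice.
    val targets = new Array[Int](edges.size)
    val next = offsets.clone()
    edges.foreach { case (u, v) => targets(next(u)) = v; next(u) += 1 }
    new ArrayGraph(offsets, targets)
  }
}

// Out-neighbors of u are targets(offsets(u)) until targets(offsets(u + 1)).
final class ArrayGraph(offsets: Array[Int], targets: Array[Int]) {
  def numNodes: Int = offsets.length - 1
  def outDegree(u: Int): Int = offsets(u + 1) - offsets(u)
  def neighbors(u: Int): Iterator[Int] =
    (offsets(u) until offsets(u + 1)).iterator.map(i => targets(i))

  // Hop distance from source to every node; -1 marks unreachable nodes.
  def bfs(source: Int): Array[Int] = {
    val dist = Array.fill(numNodes)(-1)
    val queue = mutable.Queue(source)
    dist(source) = 0
    while (queue.nonEmpty) {
      val u = queue.dequeue()
      for (v <- neighbors(u) if dist(v) == -1) {
        dist(v) = dist(u) + 1
        queue.enqueue(v)
      }
    }
    dist
  }
}

object BfsDemo {
  def main(args: Array[String]): Unit = {
    val g = ArrayGraph.fromEdges(5, Seq(0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3))
    println(g.bfs(0).mkString(", ")) // 0, 1, 1, 2, -1
  }
}
```

Two flat Int arrays avoid a per-node object, which is the main reason this layout suits graphs with billions of edges.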
At Twitter, Cassovary forms the bottom layer of a stack that we use to
power many of our graph-based features, including "Who to Follow" and
"Similar to". We also use it for relevance in Twitter Search and the
algorithms that determine which Promoted Products users will see. Over
time, we hope to bring more non-proprietary logic from some of those
product features into Cassovary.
This code is targeted at building aggregation systems (via Scalding or
Storm). It was originally developed as part of Scalding's Matrix API,
where matrix values are elements of Monoids, Groups, or Rings. It
later became clear that the code had broader application within
Scalding and in other projects at Twitter.
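Monoids suit aggregation because their combine operation is associative and has an identity, so partial results computed on separate machines can be merged in any grouping. The following is a minimal sketch of that idea; the Monoid trait, its instances, and the sum helper are hypothetical and written for this example, not the library's actual API.

```scala
// Hypothetical minimal Monoid type class: associative plus with identity zero.
trait Monoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

object Monoid {
  implicit val intSum: Monoid[Int] = new Monoid[Int] {
    def zero = 0
    def plus(a: Int, b: Int) = a + b
  }

  // Maps merge key-wise, combining colliding values with the value monoid.
  implicit def mapMonoid[K, V](implicit m: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def zero = Map.empty[K, V]
      def plus(a: Map[K, V], b: Map[K, V]) =
        b.foldLeft(a) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).fold(v)(m.plus(_, v)))
        }
    }

  // Associativity lets partial sums from separate shards be combined later.
  def sum[T](xs: Iterable[T])(implicit m: Monoid[T]): T =
    xs.foldLeft(m.zero)(m.plus)
}

object MonoidDemo {
  import Monoid._

  def main(args: Array[String]): Unit = {
    // Word counts computed independently on two hypothetical shards.
    val shard1 = Map("a" -> 2, "b" -> 1)
    val shard2 = Map("b" -> 3, "c" -> 4)
    println(sum(Seq(shard1, shard2)))
  }
}
```

The same sum works for any type with a Monoid instance, which is what makes such abstractions reusable across aggregation jobs.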
Markov chains represent stochastic processes where the probability
distribution of the next step depends non-trivially on the current
step, but does not depend on previous steps. Give this library some
training data and it will generate new random data that statistically
resembles it.
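The memoryless property described above can be sketched as a first-order Markov chain over tokens: training records which tokens follow which, and generation samples each next token using only the current one. The MarkovChain class, train, and generate are hypothetical names for this example, not the library's API.

```scala
import scala.util.Random

// Hypothetical first-order Markov chain; not the library's API.
final class MarkovChain[T](transitions: Map[T, Vector[T]]) {
  // Random walk of at most `length` tokens starting at `start`. Stops early
  // if the current token was never followed by anything in training data.
  def generate(start: T, length: Int, rng: Random): Vector[T] = {
    @annotation.tailrec
    def loop(cur: T, acc: Vector[T]): Vector[T] =
      if (acc.size >= length) acc
      else transitions.get(cur) match {
        case Some(nexts) =>
          // The next token depends only on `cur`, not on earlier tokens.
          val next = nexts(rng.nextInt(nexts.size))
          loop(next, acc :+ next)
        case None => acc
      }
    if (length <= 0) Vector.empty else loop(start, Vector(start))
  }
}

object MarkovChain {
  // Record every observed (token, next token) pair. Keeping duplicates makes
  // frequent successors proportionally more likely to be sampled.
  def train[T](sequence: Seq[T]): MarkovChain[T] = {
    val pairs = sequence.zip(sequence.drop(1))
    new MarkovChain(pairs.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2).toVector })
  }
}

object MarkovDemo {
  def main(args: Array[String]): Unit = {
    val words = "the cat sat on the mat and the cat ran".split(" ").toSeq
    val chain = MarkovChain.train(words)
    println(chain.generate("the", 8, new Random(42)).mkString(" "))
  }
}
```

Every adjacent pair in the generated output is a pair that occurred in the training data, which is the sense in which the output statistically resembles it.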
Signal/Collect is a programming model and framework for large-scale
graph processing. The model is expressive enough to concisely
formulate many iterated and data-flow algorithms on graphs, while
allowing the framework to transparently parallelize the processing.
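To make the programming model concrete, here is a sequential sketch of PageRank written in signal/collect style: each edge computes a signal from its source vertex's state, and each vertex collects its incoming signals into a new state. SignalCollectPageRank, Edge, and pageRank are hypothetical names; this is not the framework's API, and it runs synchronous iterations on one thread rather than parallelizing them.

```scala
// Hypothetical signal/collect-style PageRank; not the framework's API.
object SignalCollectPageRank {
  final case class Edge(source: Int, target: Int)

  def pageRank(numVertices: Int, edges: Seq[Edge],
               damping: Double = 0.85, iterations: Int = 50): Vector[Double] = {
    val outDegree = edges.groupBy(_.source).map { case (v, es) => v -> es.size }
    var state = Vector.fill(numVertices)(1.0 - damping)
    for (_ <- 1 to iterations) {
      // Signal: each edge sends a share of its source vertex's rank.
      val signals = edges.map(e => e.target -> state(e.source) / outDegree(e.source))
      val incoming = signals.groupBy(_._1).map { case (v, s) => v -> s.map(_._2).sum }
      // Collect: each vertex folds its incoming signals into a new state.
      state = Vector.tabulate(numVertices)(v =>
        (1.0 - damping) + damping * incoming.getOrElse(v, 0.0))
    }
    state
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq(Edge(0, 1), Edge(1, 2), Edge(2, 0), Edge(0, 2))
    println(pageRank(3, edges).map(r => f"$r%.3f").mkString(", "))
  }
}
```

Because each vertex's collect step reads only its own incoming signals, a framework can run these steps for different vertices in parallel without the algorithm code changing.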
The author of this library is currently writing a book on
probabilistic programming using Figaro; see the book page,
"Probabilistic Programming Book".
Spire is a numeric library for Scala which is intended to be generic,
fast, and precise.
Using features such as specialization, macros, type classes, and
implicits, Spire works hard to defy conventional wisdom around
performance and precision trade-offs. A major goal is to allow
developers to write efficient numeric code without having to "bake in"
particular numeric representations. In most cases, generic
implementations using Spire's specialized type classes perform as well
as the corresponding direct implementations.
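The idea of not "baking in" a numeric representation can be sketched with a type class: the function is written once and the caller picks the number type. For illustration this sketch uses the standard library's scala.math.Numeric rather than Spire's own type classes, and it does not use the specialization or macros Spire relies on for speed; GenericNumeric and dot are hypothetical names.

```scala
// Generic dot product via a type class (standard library Numeric, not Spire).
object GenericNumeric {
  // Written once; works for Int, Long, Double, BigDecimal, and so on.
  def dot[A](xs: Seq[A], ys: Seq[A])(implicit num: Numeric[A]): A = {
    require(xs.length == ys.length, "length mismatch")
    xs.zip(ys).foldLeft(num.zero) { case (acc, (x, y)) =>
      num.plus(acc, num.times(x, y))
    }
  }

  def main(args: Array[String]): Unit = {
    println(dot(Seq(1, 2, 3), Seq(4, 5, 6)))      // 32
    println(dot(Seq(0.5, 1.5), Seq(2.0, 4.0)))    // 7.0
    // Exact decimal arithmetic: 0.1 + 0.2 without binary rounding error.
    println(dot(Seq(BigDecimal("0.1"), BigDecimal("0.2")),
                Seq(BigDecimal(1), BigDecimal(1))))
  }
}
```

Switching from Double to BigDecimal changes precision without touching dot, which is the trade-off the paragraph above says generic code should let developers choose freely.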