STIX Prototyping Language
=========================

The STIX prototyping language is intended to be a simple, readable way to
express STIX object graphs.  This library can automatically create STIX content
from the language.  The language and library can be useful for creating content
for testing and experimentation.

Basic Syntax
------------

The language is composed of a sequence of statements.  Each statement is
terminated by a period, like an English sentence.  STIX domain objects (SDOs)
and relationships are referenced by name.  Domain object type names must begin
with a capital letter and may contain only letters, digits, and underscores.
Relationship names must begin with a lowercase letter and, in keeping with the
STIX specification, may contain only lowercase alphanumerics and hyphens.
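As a rough illustration of these lexical rules, the following Python sketch
classifies bare names with regular expressions.  The patterns are hypothetical,
transcribed from the prose rather than taken from the library's grammar.  They
allow digits after the first letter of a type name because supported names such
as ``IPv4_Address`` and ``X509_Certificate`` contain them.

.. code:: python

    import re

    # Hypothetical patterns transcribed from the prose rules, not the library's grammar.
    # Type names: capital first letter; digits allowed later (e.g. IPv4_Address).
    OBJECT_TYPE = re.compile(r"[A-Z][A-Za-z0-9_]*")
    # Relationship names: lowercase letter first, then lowercase alphanumerics and hyphens.
    RELATIONSHIP = re.compile(r"[a-z][a-z0-9-]*")


    def classify(name: str) -> str:
        """Return "object", "relationship", or "invalid" for a bare name."""
        if OBJECT_TYPE.fullmatch(name):
            return "object"
        if RELATIONSHIP.fullmatch(name):
            return "relationship"
        return "invalid"


    for name in ["Identity", "Attack_Pattern", "targets", "derived-from", "Bad-Name"]:
        print(name, "->", classify(name))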

The simplest statement names a single SDO:

.. code::

    Identity.

To relate two objects together:

.. code::

    Malware targets Identity.

When SDOs are named this way, they have no reusable identity within the
language.  That means each use indicates a different object:

.. code::

    Attack_Pattern uses Malware.
    Malware targets Identity.

Here, the two ``Malware`` objects are different.  Other syntax, described
below, allows objects to be reused.

Multiplicity
------------

Lists of objects are expressed with parentheses:

.. code::

    (Identity Location).

This trivial example is equivalent to putting the two objects in separate
statements.  However, lists can also be used as sources and targets of
relationships:

.. code::

    Attack_Pattern uses (Malware Tool).

This relates an attack pattern to both a ``malware`` and a ``tool`` object, via
two separate relationships.  The meaning differs from writing two statements:
here both relationships share the *same* source, whereas two statements would
create two different ``attack-pattern`` objects as sources.  A list in the
source position works analogously: all of the resulting relationships share
the same target.

If lists appear in both the source and target positions, then every object in
the source is related to every object in the target.  This is similar to a
set-theoretic Cartesian product, or a relational join: if there are N objects in
the source and M objects in the target, N*M relationships are created.
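The counting rule can be checked with a small Python sketch using
``itertools.product``.  The ``relate`` function and the string labels are
hypothetical stand-ins for the library's internals; each label represents a
distinct generated object.

.. code:: python

    from itertools import product


    def relate(sources, relationship_type, targets):
        """Relate every source to every target (a Cartesian product).

        Plain string labels stand in for generated objects in this sketch.
        """
        return [(s, relationship_type, t) for s, t in product(sources, targets)]


    # N = 2 sources and M = 3 targets yield N*M = 6 relationships.
    edges = relate(["Attack_Pattern", "Campaign"], "uses",
                   ["Malware", "Tool", "Infrastructure"])
    print(len(edges))  # 6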

Counts
~~~~~~

An integer count prefix may be given; it means the same as a homogeneous list:

.. code::

    2 Identity.

This means the same as ``(Identity Identity)``.  A count prefix may appear in
most places where an object type name is allowed, which makes it usable in
contexts where a list is not, e.g. inside another list:

.. code::

    (Malware 2 Identity).

This means one ``malware`` object and two ``identity`` objects.

Lastly, counts are allowed on relationships, which has the effect of creating
multiple parallel edges in the graph:

.. code::

    Attack_Pattern 2 uses (Malware Tool).

This relates a single ``attack-pattern`` to a ``malware`` and a ``tool``, but
creates two relationships to each, for a total of four relationships.
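The arithmetic behind counts can be sketched in Python.  Both helper functions
are hypothetical, and ``relationship_total`` assumes that a relationship count
applies to every source/target pair, which is consistent with the four
relationships in the example above but is an extrapolation from it.

.. code:: python

    def expand_count(count, type_name):
        """A count prefix such as "2 Identity" is equivalent to a homogeneous list."""
        return [type_name] * count


    def relationship_total(rel_count, sources, targets):
        """SROs created by a counted relationship between two lists.

        Assumes the count applies to every source/target pair.
        """
        return rel_count * len(sources) * len(targets)


    # "(Malware 2 Identity)." -> one malware and two identities
    print(["Malware"] + expand_count(2, "Identity"))
    # "Attack_Pattern 2 uses (Malware Tool)." -> 2 * 1 * 2 = 4
    print(relationship_total(2, ["Attack_Pattern"], ["Malware", "Tool"]))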

Chaining
--------

A relationship between a source and target can be chained to another target:

.. code::

    Attack_Pattern delivers Malware targets Identity.

This represents two relationships, where the ``malware`` delivered by the
``attack-pattern`` is the same one which targets the ``identity``.  This is another
way of reusing an object.  These chains can be arbitrarily long.
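A chain can be read as a sequence of overlapping (source, relationship, target)
triples.  The Python sketch below illustrates that reading; ``chain_to_triples``
and the labels ``ap1``, ``mal1`` and ``id1`` are hypothetical placeholders for
generated objects, not library API.

.. code:: python

    def chain_to_triples(chain):
        """Split [obj, rel, obj, rel, obj, ...] into (source, rel, target) triples.

        Each interior object is shared: it is the target of one relationship and
        the source of the next.
        """
        if len(chain) % 2 == 0:
            raise ValueError("a chain must alternate objects and relationships")
        return [tuple(chain[i:i + 3]) for i in range(0, len(chain) - 2, 2)]


    # "Attack_Pattern delivers Malware targets Identity."
    triples = chain_to_triples(["ap1", "delivers", "mal1", "targets", "id1"])
    print(triples)
    # [('ap1', 'delivers', 'mal1'), ('mal1', 'targets', 'id1')]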

Property Blocks
---------------

Property blocks are primarily used to represent embedded relationships, i.e.
those which are realized in STIX via an object property, not an SRO.  They use
a JSON object-like syntax with curly braces, positioned after the object
type name:

.. code::

    Report {
        object_refs: (Malware)
    }.

Note that a length-1 list is used because the STIX property is list-valued.

The property name must not be quoted, and the property value may be any STIX
prototyping language graph statement, including relationships and nested
property blocks.  When a more complex graph is used as a property value, only
its top-level source objects are assigned to the property.  In keeping with the
STIX specification's requirements on property names, these names must begin
with a letter and may contain only lowercase alphanumerics and underscores.
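To make the "top-level source" rule concrete, the sketch below hand-builds, as
plain Python dicts, the kind of output one might expect when a report's
``object_refs`` value is a graph in which a ``malware`` targets an
``identity``.  The ``make_object`` helper is hypothetical, and real STIX objects
need more properties (``spec_version``, ``created``, and so on) than are shown.

.. code:: python

    import uuid


    def make_object(stix_type, **props):
        """Hypothetical helper: a minimal STIX-like dict with a random ID."""
        return {"type": stix_type, "id": f"{stix_type}--{uuid.uuid4()}", **props}


    # The property value is a small graph: a malware that targets an identity.
    malware = make_object("malware")
    identity = make_object("identity")
    sro = make_object("relationship", relationship_type="targets",
                      source_ref=malware["id"], target_ref=identity["id"])

    # Only the top-level source (the malware) is assigned to the embedded
    # property; the identity and the SRO still exist as separate objects.
    report = make_object("report", object_refs=[malware["id"]])
    objects = [report, malware, identity, sro]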

String Literals
~~~~~~~~~~~~~~~

String literals are the only primitive literal type supported in the
prototyping language, and they are supported only in property blocks.  Their
primary purpose is to assign simple names to things, to help people match
generated STIX objects to the components of language statements.  When usage is
complex or generates numerous objects, it can otherwise be difficult to
understand what was generated.  Graphical visualization tools sometimes use
certain properties to create graph labels; for example, some objects have a
``name`` property, and ``labels`` is a common property.

String literals are enclosed in double quotes, and lists of literals are
written with square brackets.  For example:

.. code::

    Malware {name: "Downloader"} downloads Malware {name: "Backdoor"}.

and

.. code::

    Indicator {
        labels: ["label1", "label2"]
    }.

Special Relationship Syntax
---------------------------

To make the STIX prototyping language more English-like, two relationship names
are treated specially: ``on`` and ``of``.  These special relationships may not
have counts.

object_refs and `on`
~~~~~~~~~~~~~~~~~~~~

``on`` is shorthand for setting the ``object_refs`` property of an object, and
may be used instead of a property block.  Although the statement looks like
those that create SROs, it does not create one.  If you use this special syntax
on a source object, you can't also relate that object to a target via a normal
SRO relationship.  You may still use a property block on the source object to
populate other properties.  For example:

.. code::

    Report on (Malware Campaign).

Sightings and `of`
~~~~~~~~~~~~~~~~~~

Sightings are a special relationship type that breaks the mold of all other
SROs.  They are ternary (they relate up to three kinds of objects) and don't use
the usual ``source_ref`` and ``target_ref`` properties, so they don't fit the
infix notation used for other relationships.  A sighting statement begins with
``Sighting`` and may be followed by ``of`` to represent the required
``sighting_of_ref`` property:

.. code::

    Sighting of Malware.

The other related objects must be represented in a property block:

.. code::

    Sighting {
        observed_data_refs: (Observed_Data),
        where_sighted_refs: 2 Location
    } of Malware.

If desired, ``sighting_of_ref`` can also be given in a property block, and the
trailing ``of`` clause omitted:

.. code::

    Sighting {
        observed_data_refs: (Observed_Data),
        where_sighted_refs: 2 Location,
        sighting_of_ref: Malware
    }.

Note that ``Sighting`` *must not* have a count prefix, or it will be interpreted
as a "normal" graph statement, not this special syntax.
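For comparison with an ordinary SRO, here is a hand-written Python sketch of
roughly what the last statement describes, using plain dicts.  The ``new_id``
helper is hypothetical, and the dicts omit other properties a real STIX 2.1
sighting requires, such as ``spec_version``, ``created`` and ``modified``.

.. code:: python

    import uuid


    def new_id(stix_type):
        """Hypothetical placeholder ID generator (random UUIDv4)."""
        return f"{stix_type}--{uuid.uuid4()}"


    observed = new_id("observed-data")
    locations = [new_id("location") for _ in range(2)]  # "2 Location"
    malware = new_id("malware")

    # One object carries all three kinds of reference as properties; unlike an
    # ordinary SRO there is no source_ref/target_ref pair.
    sighting = {
        "type": "sighting",
        "id": new_id("sighting"),
        "sighting_of_ref": malware,
        "observed_data_refs": [observed],
        "where_sighted_refs": locations,
    }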

Variables
---------

If other methods of object reuse won't work or are undesirable, the language
supports variables.  A variable declaration statement looks like:

.. code::

    var_a, var_b: Identity.

Variable names must be all lowercase, begin with a letter, and consist of
alphanumerics, hyphens, and underscores only.  Variables may only hold domain
objects; they may not hold relationships.

Where *used*, a variable may not have either a count or a property block.  Where
*declared*, it may have both:

.. code::

    malware_a {name: "bad malware"}: Malware.
    2 victims {name: "a victim"}: Identity.

    malware_a targets victims.

The count on a variable is given before the variable name, similar to how it is
done with domain objects and relationships in normal graph statements.  This
allows variables to hold multiple values.  The above represents a ``malware``
targeting two ``identity`` objects, the "victims".
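One way to picture variables is as a table from names to already-generated
objects.  In this Python sketch the ``new_object`` helper and the ``variables``
dictionary are hypothetical, not the library's internals.  It assumes each
counted value receives the declared properties, and shows the example above
producing two SROs that share one ``malware`` source.

.. code:: python

    import uuid


    def new_object(stix_type, **props):
        """Hypothetical helper: a minimal STIX-like dict with a random ID."""
        return {"type": stix_type, "id": f"{stix_type}--{uuid.uuid4()}", **props}


    # Declarations create objects once and bind them to names.
    # Assumes each counted value gets its own copy of the declared properties.
    variables = {
        "malware_a": [new_object("malware", name="bad malware")],
        "victims": [new_object("identity", name="a victim") for _ in range(2)],
    }

    # "malware_a targets victims." reuses the bound objects instead of
    # creating new ones: one SRO per (source, target) pair.
    sros = [
        {"type": "relationship", "relationship_type": "targets",
         "source_ref": s["id"], "target_ref": t["id"]}
        for s in variables["malware_a"]
        for t in variables["victims"]
    ]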

Property blocks on variables may refer to other variables, which creates
dependencies among them.  Declaration order is unimportant; the library
determines an appropriate initialization order automatically:

.. code::

    note {object_refs: (loc id)}: Note.
    loc: Location.
    id: Identity.

    Report on note.

A dependency cycle will cause an error.
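Finding an initialization order amounts to a topological sort of the variable
dependency graph.  The sketch below uses Python's standard ``graphlib`` module
(Python 3.9+) to illustrate the idea; the ``initialization_order`` function is
hypothetical and not necessarily how the library does it.

.. code:: python

    from graphlib import CycleError, TopologicalSorter


    def initialization_order(dependencies):
        """Order variables so each is initialized after those it refers to.

        ``dependencies`` maps a variable name to the names used in its property
        block.  Raises graphlib.CycleError if the dependencies form a cycle.
        """
        return list(TopologicalSorter(dependencies).static_order())


    # From the example above: note refers to loc and id.
    order = initialization_order({"note": {"loc", "id"}, "loc": set(), "id": set()})
    print(order)  # "loc" and "id" (in either order), then "note"

    try:
        initialization_order({"a": {"b"}, "b": {"a"}})
        cycle_detected = False
    except CycleError:
        cycle_detected = True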

Implementation Notes
--------------------

An obvious question is which STIX object types the library currently supports,
and what names are used for them in the language.  The answer may be
counterintuitive, and requires some understanding of the library architecture.

The library is composed of two components:

1. A language "processor"
2. An object generator

The first component understands the language and connects the objects
together.  The second component is a delegate of the first, and is responsible
for generating the objects it needs.

So the counterintuitive answer is that the language processor has *no*
hard-coded lists of STIX object type names or properties.  Anything goes, as
long as you follow the lexical rules described above: for example, domain
object type names start with a capital letter, and everything else starts with
a lowercase letter.  Domain object type names are passed to the object
generator, and if the generator doesn't know how to generate an object of that
type, it will produce an error.  But that issue is unrelated to the language
itself.  You can also use any lexically legal relationship name you want; the
language processor will happily create an SRO with that relationship type.  It
knows little of the STIX specification.
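This division of labor can be sketched in a few lines of Python.
``ToyProcessor`` and ``TemplateGenerator`` are hypothetical classes, not this
library's API.  The processor accepts any relationship name, and an unknown type
name fails only when the generator is asked to build it.

.. code:: python

    import uuid


    class TemplateGenerator:
        """Toy object generator: knows a fixed set of type names."""

        def __init__(self, templates):
            self.templates = templates  # language type name -> base properties

        def generate(self, type_name):
            if type_name not in self.templates:
                raise ValueError(f"don't know how to generate {type_name!r}")
            obj = dict(self.templates[type_name])
            obj["id"] = f"{obj['type']}--{uuid.uuid4()}"
            return obj


    class ToyProcessor:
        """Toy language processor: wires objects together, delegating creation."""

        def __init__(self, generator):
            self.generator = generator

        def relate(self, source_type, relationship_type, target_type):
            source = self.generator.generate(source_type)
            target = self.generator.generate(target_type)
            # Any relationship name is accepted; no STIX vocabulary check here.
            sro = {"type": "relationship", "relationship_type": relationship_type,
                   "source_ref": source["id"], "target_ref": target["id"]}
            return [source, target, sro]


    processor = ToyProcessor(TemplateGenerator({
        "Malware": {"type": "malware"},
        "Identity": {"type": "identity"},
    }))
    objects = processor.relate("Malware", "made-up-relationship", "Identity")

    unknown_type_error = None
    try:
        processor.relate("Widget", "targets", "Identity")
    except ValueError as exc:
        unknown_type_error = str(exc)  # raised by the generator, not the processor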

Another important architectural point is that all objects generated by the
object generator, and by the language processor internally, are plain "parsed
JSON", i.e. simple Python values like dicts and lists.  Only in the very last
step are those values passed to the ``stix2`` library, which creates the final
objects that are returned.  The ``stix2`` library is therefore a dependency of
this one.  It has its own STIX support, and performs certain compliance checks
that the components of this library do not necessarily perform.

The built-in object generator operates based on "specifications" contained in a
JSON data file; it has no STIX rules built into its code.  The advantage of this
design is that custom objects can potentially be supported without changing any
code in this library at all!  (The ``stix2`` library is a different story,
though.)
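The idea of a data-driven generator can be illustrated with a Python sketch in
which each language type name maps to a small specification.  The JSON layout
and the ``generate`` function are hypothetical, not the library's real
specification format or API; the point is only that a new type such as
``X_Widget`` needs a new data entry rather than new code.

.. code:: python

    import json
    import uuid

    # Hypothetical specification format, not the library's real one.
    SPECS = json.loads("""
    {
        "Malware": {"stix_type": "malware", "properties": {"is_family": false}},
        "Identity": {"stix_type": "identity", "properties": {"identity_class": "individual"}}
    }
    """)


    def generate(type_name, specs):
        """Build a plain-dict object purely from a specification entry."""
        spec = specs[type_name]  # KeyError for types with no specification
        obj = {"type": spec["stix_type"], "id": f"{spec['stix_type']}--{uuid.uuid4()}"}
        obj.update(spec["properties"])
        return obj


    # Supporting a custom type means adding data, not code.
    custom_specs = dict(SPECS, X_Widget={"stix_type": "x-widget",
                                         "properties": {"color": "blue"}})
    widget = generate("X_Widget", custom_specs)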

So the final answer as to current STIX object support boils down to what object
types the object generator and the ``stix2`` library recognize.  The latter
library has its own documentation.  The built-in object generator in this
library recognizes the following types:

.. code::

    Attack_Pattern
    Campaign
    Course_of_Action
    Grouping
    Identity
    Indicator
    Infrastructure
    Intrusion_Set
    Location
    Malware
    Malware_Analysis
    Note
    Observed_Data
    Opinion
    Report
    Threat_Actor
    Tool
    Vulnerability
    Artifact
    Autonomous_System
    Directory
    Domain_Name
    Email_Address
    Email_Message
    File
    IPv4_Address
    IPv6_Address
    MAC_Address
    Mutex
    Network_Traffic
    Process
    Software
    URL
    User_Account
    Windows_Registry_Key
    X509_Certificate