Types
Native CMake is a weakly typed language where all values are strings, and, in
certain circumstances, certain strings are interpreted as being of another type.
A common example is when a string is used as an argument to CMake’s if
statement, where the string is implicitly cast to a boolean. In practice, this
weak typing can lead to subtle, hard-to-detect errors. CMakePPLang implements
strong-typing in order to avoid/catch such errors.
The following sections describe the types recognized by CMake and CMakePPLang. Subsections describe the individual types in more detail (the type literal associated with the subsection’s type is listed in parentheses after the type’s name).
CMake Types
To our knowledge, CMake does not provide a list of recognized types. CMake does, in some contexts, recognize a string as one of the following types: boolean, command, file path, floating-point number, generator expression, integer, or target.
Boolean (bool)
Any CMake string which case-insensitively matches one of the established boolean literals is deemed to be a boolean. The recognized boolean literals adopted by CMakePPLang are a subset of those recognized by CMake itself. The list of CMakePPLang boolean literals are:
True boolean literals:
ON
,YES
,TRUE
, andY
False boolean literals:
OFF
,NO
,FALSE
,N
,IGNORE
,NOTFOUND
, and string literals ending in-NOTFOUND
CMake’s documentation additionally lists integer literals as being boolean
literals; however, CMakePPLang adopts the philosophy that 0
and
1
are integer literals.
Command (fxn)
CMake is able to determine if a string is actually the name of an invocable
CMake command. The name is case-insensitive and is considered to define a
command if it is the name of a built-in CMake function (e.g.,
add_subdirectory
) or a user-defined macro/function. The type “Command” has
been chosen to avoid confusion with CMake’s native function
command. It
should be noted that CMakePPLang uses name-mangling to implement
overloads, so the mangled function name is a command and not the unmangled
one. For example, if you define a CMakePPLang function a_fxn
, the literal
string "a_fxn"
is not recognized as a command (it will be a desc
).
Instead, the actual, mangled command is something like "_a_fxn_int_bool_"
(this is just an illustrative example of what the name mangling looks like, the
actual mangled name will be different, and less human-readable than this).
Note
CMake convention is that member functions of a type T
are called using a
function named T
. This creates an ambiguity; for example, is list
a
command or a type? CMakePPLang resolves this ambiguity by deciding
that list
is a type and that list(LENGTH
is the command. This
interpretation is consistent with Python (and C++) where MyClass.fxn
(MyClass::fxn
) is the function belonging to a type MyClass
. This
means that, in CMakePPLang, type and class take precedence over
function. In practice, the actual member functions have mangled names and
this is only an ambiguity for the syntactic sugar to call them.
File Path (path)
CMake has builtin support for determining whether a string is a valid file path
for a target operating system. While CMake natively has some support for
relative file paths, best CMake practice is to always use absolute file paths.
Therefore, CMakePPLang stipulates that any CMake string that can be
interpreted as an absolute file path is an object of type path
. It is
worth noting that the path does NOT need to exist in order for it to be a
file path object. Additionally, elements of the path should be separated by
/
characters regardless of the convention of any particular OS.
Floating-Point Number (float)
CMake strings which contain only a single decimal character (.
) in addition
to at least one digit in the range 0-9 are floating point numbers. Floating
point numbers may optionally be prefixed with a single minus sign character
(-
) to denote that the value is negative. Since ultimately the
floating-point value is stored as a string, it is of infinite precision.
CMake’s math
command does not support arithmetic involving floating point
values, so floating point numbers are uncommon in typical CMake scripts.
Generator Expressions (genexpr)
CMake expressions of the form $<...>
are generator expressions.
Integers (int)
Any CMake string comprised entirely of the digits 0-9 is an integer. Integers
may optionally be prefixed with a single minus sign (-
). Integers are
ultimately stored as strings and are of infinite precision. That said,
CMake’s backend is written in C++ and it is likely that passing integers into
native CMake functions will result in a loss of precision.
List (list)
In CMake, strings that contain at least one un-escaped semicolon are lists. In CMakePPLang, we avoid lists as much as possible due to the many potential gotchas associated with them. Nonetheless, particularly at interfaces with standard CMake code, the use of lists is unavoidable. It is worth noting that CMakePPLang’s definition of a list means that a single-element list will be identified as being of the type of that element. In practice, this poses no problem because we allow any single object to be implicitly cast to a single-element list in order to satisfy a function’s signature.
String (str)
This is the fundamental type that all valid CMake values share. In CMakePPLang,
the type str
functions like void*
in C/C++. If a CMakePPLang
function accepts an argument of type str
that means it accepts any valid
CMake value. It should be noted that str
is not the same thing as desc
.
In particular, all desc
are str
, but not all str
are desc
. For
example, the str
, TRUE
, is a bool
. In CMakePPLang, the
intrinsic type of an object will never be str
.
Target (target)
CMake keeps an internal list of targets. Any CMake string that CMake recognizes
as the name of a target is of the target
type.
Quasi-CMake Types
The types in this section straddle the line between being native to CMake and being part of CMakePPLang.
Description (desc)
CMake itself takes the point of view that everything is a string, so str
is
the type common to all CMake values. If we want to assert that every valid
value has one, and only one, intrinsic type, we need a type for the subset of
str
objects that are not of any other intrinsic type aside from str
.
We call this subset descriptions. Description is a catchall for any valid
CMake value which fails to meet the criteria of being another intrinsic type
(not counting str
). In practice, descriptions are usually used to name
and/or document things and tend to be human-readable. The name “description”
was chosen to avoid confusion with CMake’s fundamental string type (the use of
which is fairly common throughout CMake). Also of note, descriptions tend to
be the type of an object when there is a syntax error, for example the string
literal " 1"
is a description and not an integer because it includes
whitespace.
Type (type)
If you are going to recognize types, you need a way to express them in the code. That is what the abbreviations (e.g., bool, int) we have been introducing are for. The abbreviations are reserved strings that need to have a type associated with them, that’s what the “type” type is for (in actual code the type name “type” is far more natural and less confusing than it comes off here). More generally, a CMake string is of type “type” if it matches (case-insensitively) any of the type abbreviations in this chapter. The list of which is: bool, class, desc, float, fxn, genexpr, int, list, map, obj, path, str, target, and type.
Pointers
Pointers are an important concept in vanilla CMake, but
are loosely defined and usually not called pointers directly.
They typically show up when we are dealing with lists
or function return values. In CMakePPLang, a pointer is
a variable which dereferences to a value. In CMake, the ${...}
syntax can
be thought of as dereferencing whatever variable is in the brackets. If a
function takes a pointer to, for example, a list, then you do not pass in the
list explicitly, but rather the name of the variable which holds the list. In
code:
function(take_list_by_pointer pointer_to_list)
list(LENGTH "${pointer_to_list}" length_of_list)
message("List length: ${length_of_list}")
endfunction()
set(a_list 1 2 3)
# Meant to be called like:
take_list_by_pointer(a_list) # Prints "List length: 3"
# Not like (this passes the value of the list):
take_list_by_pointer("${a_list}") # Prints "List length: 0"
To document the type of pointer_to_list
in the above code we use the syntax
list*
, which is borrowed from C/C++ and should be read as “pointer to a
list”. If a function takes an argument of type T*
(and it
does type checking), then any value of type R*
is allowed where R
is
a type convertible to T
. So, a value of type fxn*
is allowed
to be used where str*
is expected, because a fxn
is convertible
to a str
. The inverse does not hold, because a str
might not
be a fxn
, and so you cannot use a str*
where a fxn*
is
expected.
Additionally, all pointer types are considered subtypes of desc
,
so a value of type desc
can be used wherever a pointer type is
expected. This is because a pointer in CMake is the name of an identifier,
which must follow the rules of a desc
.
If a desc
does not resolve to a variable within the
current scope, dereferencing it resolves to the empty string,
and so the original desc
can be thought of as a null pointer.
Pointers can point to any valid type, including other pointers, so
T**
is a valid type that is a pointer to a pointer to T.
It should be noted that the official CMake documentation does not differentiate well between the variable holding a list and the list itself. By introducing the concept of a pointer to CMake it becomes easier to make this distinction. In almost all circumstances, native CMake functions take pointers to lists and not the lists themselves.
CMakePPLang Types
Class (class)
The “Class” type is the type of an object which holds the specification of a
user-defined type (i.e. it’s the type of a class in the object-oriented sense
of the word). This should not be confused with the type of an instance of a
class. For example, if you declared a class MyClass
CMakePPLang would treat
the literal, case-insensitive string "MyClass"
as being of
type “Class”. Instances of the MyClass
class would be of type MyClass
.
The “class” type generalizes objects of type “Type” (and CMakePPLang
allows implicit conversion from class
to type
). The reason for the
distinction between “Class” and “Type” is that CMakePPLang needs to keep track
of additional meta-data for a “Class” (such as base classes) which is not
associated with a type. In other words, the distinction between a class and a
type is to most users immaterial (and in fact you almost always want to take
instances of type “Type” and just let classes be implicitly converted) so
ignore it if it is confusing and know that the string “class” is reserved
and can’t be used.
Map (map)
Borrowing from Python’s design, it becomes much simpler to implement objects if we have an associative array object. These arrays hold the state of the object instances. For this reason, CMakePPLang supports a map type. The map type is essentially a fundamental, built-in type. You cannot inherit from it, but you can use instances of it in your code. CMakePPLang prefers to use maps instead of lists whenever feasible, as maps can be arbitrarily nested without further consideration and tend to work cleaner than CMake’s native lists (although there is an abstraction penalty).
Object (obj)
This is the base class for all user-defined classes. In practice it works a bit
like str
in that it is usually used as the lowest-common denominator for a
function taking any object. The object type is mainly needed for writing
generic routines that operate on instances of user-defined classes and is not
expected to be of interest to most users aside from the fact that it is the
Object class which defines the default implementations for member functions like
equality comparisons.
Other Types
Particularly for documentation purposes, CMakePPLang introduces several other types. These types may only be conceptual or they may have some code support.
Tuples
Tuples are purely a documentation convenience for functions with multiple
return values. The standard way to document the return type of a function is
using the :rtype:
field which takes the type of the returned value(s) as
an argument. To get multiple returns in a single return, one wraps them in a
tuple. CMakePPLang documentation is intended to be parsed using
CMinx, which borrows this syntax
from Python and the Sphinx package. For example, (int, bool)
means the
function returns two values, the first value is of type int
and the second
value is of type bool
. As a slight aside, multiple values are returned in
CMake by having multiple arguments to a function be denoted as results.
Summary
The following UML diagram is intended to help summarize the type system of CMakePPLang.
At the top, in blue, are the types native to CMake. For all intents and purposes they all derive from a single type, String. In the middle, in red, are the types native to CMakePPLang. Native CMakePPLang types derive from String as well, with “Class” also deriving from “Type”. User-defined classes are symbolized by the green box at the bottom, all of which derive from “Object”, and may have relationships among themselves as well. Not shown are pointers. Each type has an associated pointer type, including pointer types themselves. Each pointer type is derived from “Description.”