Revision 02-06-2013
Derived from Google C++ Style Guide
Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges.
SNAP is written in the C++ programming language.
This programming guide describes a set of conventions for the SNAP C++ code as well as the most important constructs that are used in the code.
To see an example of SNAP programming style, see file graph.h.
C++ has many powerful features, but this power brings with it complexity, which can make code more error-prone and harder to read and maintain. The goal of this guide is to manage this complexity by describing the rules of writing SNAP code. These rules exist to keep the code base consistent and easier to manage while still allowing productive use of C++ features.
Code consistency is important to keep the code base manageable. It is very important that any programmer be able to look at another's code and quickly understand it. Maintaining a uniform style and following conventions means that we can more easily use "pattern-matching" to infer what various symbols are and what they do, which makes code much easier to understand. In some cases there might be good arguments for changing certain style rules, but we nonetheless keep things as they are in order to preserve consistency.
Another issue this guide addresses is that of C++ feature bloat. C++ is a huge language with many advanced features. In some cases we constrain, or even ban, use of certain features. We do this to keep code simple and to avoid the various common errors and problems that these features can cause. This guide lists these features and explains why their use is restricted.
Note that this guide is not a C++ tutorial. We assume that you are familiar with the language.
Coding style and formatting can be pretty arbitrary, but code is much easier to follow and learn if everyone uses the same style. Not everyone may agree with every aspect of the formatting rules, but it is important that all SNAP contributors follow the style rules so that we can all read and understand everyone's code easily.
We use spaces for indentation. Do not use tabs in your code. You should set your editor to emit 2 spaces when you hit the tab key.
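Use no spaces inside the parentheses of a condition. The else keyword belongs on a new line, together with the closing brace of the preceding block: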
if (condition) {  // no spaces inside parentheses
  ...  // 2 space indent.
} else if (...) {  // The else goes on the same line as the closing brace.
  ...
} else {
  ...
}
You must have a space between the if and the open parenthesis. You must also have a space between the close parenthesis and the curly brace.
if(condition) // Bad - space missing after IF.
if (condition) { // Good - proper space after IF and before {.
Short conditional statements may be written on one line if this enhances readability. You may do this only when the line is brief and the statement does not use the else clause. Always use curly braces:
if (x == kFoo) { return new Foo(); }
if (x == kBar) { return new Bar(); }
Single-line statements without curly braces are prohibited:
if (condition) DoSomething();
In most cases, conditional or loop statements with complex conditions or statements are more readable with curly braces.
if (condition) {
  DoSomething();  // 2 space indent.
}
while (condition) {
  ...  // 2 space indent
}
for (int i = 0; i < Num; i++) {
  ...  // 2 space indent
}
An empty loop body must not be written as a bare semicolon:
while (condition);  // Bad - looks like part of do/while loop.
case blocks in switch statements can have curly braces or not, depending on your preference. If you do include curly braces, they should be placed as shown below. If the condition is not an enumerated value, switch statements should always have a default case:
switch (var) {
  case 0: {  // 2 space indent
    ...      // 4 space indent
    break;
  }
  case 1: {
    ...
    break;
  }
  default: {
    ...
  }
}
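To make the switch rules concrete, here is a small self-contained function written in this style. GetDegLabel and its labels are hypothetical, made up for this sketch. Because the switch value is a plain int rather than an enumerated value, the switch ends with a default case.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a node degree to a coarse label.
std::string GetDegLabel(const int Deg) {
  assert(Deg >= 0);  // degrees are never negative
  std::string Label;
  switch (Deg) {  // Deg is an int, not an enum, so a default case is required
    case 0: {
      Label = "isolated";
      break;
    }
    case 1: {
      Label = "leaf";
      break;
    }
    default: {
      Label = "other";
      break;
    }
  }
  return Label;
}
```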
The following are examples of correctly-formatted pointer and reference expressions:
x = *p;
p = &x;
x = r.y;
x = r->y;
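Note that there are no spaces around the period or arrow when accessing a member, and pointer operators have no space after the * or &.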
When declaring a pointer variable or argument, place the asterisk * adjacent to the type, and likewise place the ampersand & adjacent to the type:
char* C;
const int& P;
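When a boolean expression is too long to fit on one line, break it after a logical operator and align the continuation lines:
if ((ThisOneThing > ThisOtherThing) &&
    (AThirdThing == AFourthThing) &&
    YetAnother && LastOne) {
  ...
}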
Function calls have the following format:
bool RetVal = DoSomething(Arg1, Arg2, Arg3);
If the arguments do not all fit on one line, they should be broken up onto multiple lines. Do not add spaces after the open paren or before the close paren:
DoSomethingThatRequiresALongFunctionName(Argument1, Argument2,
                                         Argument3, Argument4);
If the parameter names are very long and there is not much space left due to line indentation, you may place all arguments on subsequent lines:
DoSomethingElseThatRequiresAEvenLongerFunctionName(
    Argument1, Argument2, Argument3, Argument4);
Functions look like this:
ReturnType ClassName::FunctionName(Type ParName1, Type ParName2) {
  DoSomething();
  ...
}
If you have too much text to fit on one line, split the code over several lines:
ReturnType ClassName::ReallyLongFunctionName(Type ParName1, Type ParName2,
                                             Type ParName3) {
  DoSomething();
  ...
}
Do not needlessly surround the return expression with parentheses. Parentheses are ok to make a complex expression more readable:
return Result;                // No parentheses in the simple case.
return (SomeLongCondition &&  // Parentheses ok to make a complex
        AnotherCondition);    // expression more readable.
Declarations within a class are organized in the following order:
class TMyClass : public TOtherClass {
public:
  typedef TMyClass TDef;                       // typedefs
  typedef enum { meOne, meTwo, ... } TMyEnum;  // enums
public:
  class TPubClass1 {                           // public subclasses
    ...
  };
private:
  class TPriClass2 {                           // private subclasses
    ...
  };
private:
  TInt Var;                                    // private data
  ...
private:
  TInt GetTmpData();                           // private methods
  ...
public:
  TMyClass();                                  // constructors
  ...
  int SetStats(const int N);                   // public methods
  ...
  friend class TMyOtherClass;                  // friends
};
Each public SNAP class must define the following methods: a default constructor, a copy constructor, a TSIn constructor, a Load() method, a Save() method and an assignment operator =:
class TMyClass : public TOtherClass {
  ...
public:
  TMyClass();                                    // default constructor
  explicit TMyClass(int Var);                    // an explicit constructor (optional)
  TMyClass(const TMyClass& MCVar);               // copy constructor
  TMyClass(TSIn& SIn);                           // TSIn constructor
  void Load(TSIn& SIn);                          // Load() method
  void Save(TSOut& SOut) const;                  // Save() method
  TMyClass& operator = (const TMyClass& MCVar);  // '=' operator
  ...
  int GetVar() const;                            // get value of Var
  int SetVar(const int N);                       // set value of Var
  ...
};
Make data members private and provide access to them through Get...() and Set...() methods.
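The sketch below fills in the required methods for a class with a single int member. It is an illustration under stated assumptions, not SNAP code. std::istream and std::ostream stand in for TSIn and TSOut so that the example compiles on its own; SNAP code itself should use the SNAP streams. Storing the raw bytes of the int is an assumed serialization format.

```cpp
#include <cassert>
#include <sstream>

// Sketch only: std::istream/std::ostream stand in for SNAP's TSIn/TSOut,
// and the raw bytes of Var are an assumed serialization format.
class TMyClass {
private:
  int Var;  // private data, reached through GetVar()/SetVar()
public:
  TMyClass() : Var(0) { }                               // default constructor
  explicit TMyClass(const int N) : Var(N) { }           // explicit constructor
  TMyClass(const TMyClass& MCVar) : Var(MCVar.Var) { }  // copy constructor
  explicit TMyClass(std::istream& SIn) : Var(0) {       // "TSIn" constructor
    Load(SIn);
  }
  void Load(std::istream& SIn) {                        // Load() method
    SIn.read(reinterpret_cast<char*>(&Var), sizeof(Var));
    assert(!SIn.fail());  // the stream held a complete int
  }
  void Save(std::ostream& SOut) const {                 // Save() method
    SOut.write(reinterpret_cast<const char*>(&Var), sizeof(Var));
  }
  TMyClass& operator = (const TMyClass& MCVar) {        // '=' operator
    if (this != &MCVar) { Var = MCVar.Var; }
    return *this;
  }
  int GetVar() const { return Var; }                    // get value of Var
  void SetVar(const int N) { Var = N; }                 // set value of Var
};
```

For example, you can call Save() on a std::stringstream and then construct a new TMyClass from that same stream. The new object should contain the saved value. Copying and assignment both copy Var.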
More complex classes with support for "smart" pointers have additional requirements. See Smart Pointers for details.
For a class format example in the SNAP code, see file graph.h:TUNGraph.
Constructor initializer lists are formatted as follows:
TMyClass::TMyClass(int Var) : SomeVar(Var), SomeOtherVar(Var + 1) { }
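Make sure that the members in the list appear in the same order in which they are declared in the class. Members are always initialized in declaration order, regardless of their order in the list. A list order that differs from the declaration order can therefore produce errors that are hard to find.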
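The following sketch shows why the order matters. The class TRange and its members are hypothetical. The list reads Beg while computing End. This is safe only because Beg is declared first, and is therefore initialized first.

```cpp
#include <cassert>

// Hypothetical class: a half-open range [Beg, End).
class TRange {
private:
  int Beg;  // declared first, so initialized first
  int End;  // declared second, so initialized second
public:
  // The list matches the declaration order, so Beg is already set
  // when End(Beg + Len) is evaluated.
  TRange(const int Start, const int Len) : Beg(Start), End(Beg + Len) { }
  int GetBeg() const { return Beg; }
  int GetEnd() const { return End; }
  int GetLen() const { return End - Beg; }
};

// If the members were instead declared as "int End; int Beg;", the same
// list would still initialize End first, reading Beg before it is set.
// GCC and Clang report such mismatches with -Wreorder.
```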
An example of a template forward declaration (see alg.h for more):
template <class PGraph> int GetMxDegNId(const PGraph& Graph);
The corresponding template implementation is:
template <class PGraph>
int GetMxDegNId(const PGraph& Graph) {
  ...
}
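The following is a self-contained sketch of the same declare-then-define pattern. TToyGraph and PToyGraph are hypothetical stand-ins for a SNAP graph and its smart pointer, with std::shared_ptr used in place of TPt. This GetMxDegNId simply returns the first node with the largest degree. It is not the alg.h implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Forward declaration, as it would appear in a header.
template <class PGraph> int GetMxDegNId(const PGraph& Graph);

// Hypothetical toy undirected graph; node IDs are 0..Nodes-1.
class TToyGraph {
private:
  std::vector<std::vector<int> > AdjV;  // adjacency list per node
public:
  explicit TToyGraph(const int Nodes) : AdjV(Nodes) { }
  void AddEdge(const int SrcNId, const int DstNId) {
    AdjV[SrcNId].push_back(DstNId);
    AdjV[DstNId].push_back(SrcNId);
  }
  int GetNodes() const { return static_cast<int>(AdjV.size()); }
  int GetDeg(const int NId) const {
    return static_cast<int>(AdjV[NId].size());
  }
};
typedef std::shared_ptr<TToyGraph> PToyGraph;  // stand-in for a SNAP smart pointer

// Template implementation: ID of the first node with maximum degree,
// or -1 for an empty graph.
template <class PGraph> int GetMxDegNId(const PGraph& Graph) {
  int MxNId = -1;
  int MxDeg = -1;
  for (int NId = 0; NId < Graph->GetNodes(); NId++) {
    if (Graph->GetDeg(NId) > MxDeg) {
      MxDeg = Graph->GetDeg(NId);
      MxNId = NId;
    }
  }
  return MxNId;
}
```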
This is more a principle than a rule: don't use blank lines when you don't have to. In particular, don't put more than one blank line between functions, resist starting functions with a blank line, don't end functions with a blank line, and be discriminating with your use of blank lines inside functions.
The basic principle is: The more code that fits on one screen, the easier it is to follow and understand the control flow of the program. Of course, readability can suffer from code being too dense as well as too spread out, so use your judgement. But in general, minimize use of vertical whitespace.
In certain cases it is appropriate to include non-ASCII characters in your code. For example, if your code parses data files from foreign sources, it may be appropriate to hard-code the non-ASCII string(s) used in those data files as delimiters. In such cases, you should use UTF-8, since this encoding is understood by most tools able to handle more than just ASCII. Hex encoding is also OK, and is encouraged where it enhances readability. For example, "\xEF\xBB\xBF" is the UTF-8 encoding of the Unicode zero-width no-break space character (U+FEFF), which would be invisible if included in the source as straight UTF-8.
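The hypothetical helper below applies the hex-encoding advice. It strips a leading UTF-8 byte order mark from a line read from a data file. It uses std::string rather than SNAP's TStr only so that the example is self-contained.

```cpp
#include <cassert>
#include <string>

// UTF-8 encoding of U+FEFF (zero-width no-break space, also used as a
// byte order mark), written with hex escapes so it is visible in the source.
static const char* const Utf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: removes a leading UTF-8 BOM from Line, if present.
std::string StripUtf8Bom(const std::string& Line) {
  const std::string Bom(Utf8Bom);
  if (Line.compare(0, Bom.size(), Bom) == 0) {
    return Line.substr(Bom.size());
  }
  return Line;
}
```

When a hex escape is followed by text, close the literal and start a new one, as in "\xEF\xBB\xBF" "abc". A hex escape consumes every hex digit that follows it, so "\xEF\xBB\xBFabc" does not mean the same thing.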
SNAP code uses a range of conventions to name entities. It is important to follow these conventions in your code to keep the code compact and consistent.
Type and variable names should typically be nouns, e.g., ErrCnt.
Function names should typically be "command" verbs, e.g., OpenFile().
SNAP code uses an extensive list of abbreviations, which make the code easy to understand once you get familiar with them:
T...: a type (TInt).
P...: a smart pointer (PUNGraph).
...V: a vector (variable InNIdV).
...VV: a matrix (variable FltVV, type TFltVV with floating point elements).
...H: a hash table (variable NodeH, type TIntStrH with Int keys, Str values).
...HH: a hash of hashes (variable NodeHH, type TIntIntHH with Int key 1 and Int key 2).
...I: an iterator (NodeI).
...Pt: an address pointer, used rarely (NetPt).
Get...: an access method (GetDeg()).
Set...: a set method (SetXYLabel()).
...Int: an integer operation (GetValInt()).
...Flt: a floating point operation (GetValFlt()).
...Str: a string operation (DateStr()).
Id: an identifier (GetUId()).
NId: a node identifier (GetNIdV()).
EId: an edge identifier (GetEIdV()).
Nbr: a neighbour (GetNbrNId()).
Deg: a node degree (GetOutDeg()).
Src: a source node (GetSrcNId()).
Dst: a destination node (GetDstNId()).
Err: an error (AvgAbsErr).
Cnt: a counter (LinksCnt).
Mx: a maximum (GetMxWcc()).
Mn: a minimum (MnWrdLen).
NonZ: a non-zero (NonZNodes).
File names should be all lowercase and can include underscores (_) or dashes (-).
C++ files should end in .cpp and header files should end in .h.
Examples of acceptable file names:
graph.cpp
bignet.h
Type names start with the letter "T" and have a capital letter for each new word, with no underscores: TUNGraph.
Variable names start with a capital letter and have a capital letter for each new word, with no underscores: NIdV. An exception is the use of short lowercase index names for loop iterations, such as i, j, k.
Variable names should typically be nouns, e.g., ErrCnt.
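Function names start with a capital letter and have a capital letter for each new word, with no underscores: GetInNId().
Function names should typically be "command" verbs, e.g., OpenFile().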
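Enumerator names start with a short lowercase prefix derived from the enum type name (here sr for TStopReason), followed by a capitalized word:
typedef enum { srUndef, srOk, srFlood, srTimeLimit } TStopReason;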
See Namespaces for a discussion about the SNAP namespaces.
Macro names are written in all capitals, with underscores between words:
#define ROUND(x) ...
#define PI_ROUNDED 3.14
Comments are absolutely vital to keeping the code readable. But remember: while comments are very important, the best code is self-documenting. Giving sensible names to types and variables is much better than using obscure names that you must then explain through comments.
Comments in the source code are also used to generate reference documentation for SNAP automatically. A few simple guidelines below show how you can write comments that result in high quality reference documentation.
When writing your comments, write for your audience: the next contributor who will need to understand your code. Be generous — the next one may be you in a few months!
A brief description consists of ///, followed by one line of text:
/// Returns ID of the current node.
A detailed description consists of a brief description, followed by ##<tag_name>:
/// Returns ID of NodeN-th neighboring node. ##TNodeI::GetNbrNId
Text for <tag_name> from file <source_file> is placed in file doc/<source_file>.txt. Tag format is:
/// <tag_name>
...<detailed description>
///
For example, a detailed description for ##TNodeI::GetNbrNId from file snap-core/graph.h is in file snap-core/doc/graph.h.txt (see these files for more examples):
/// TNodeI::GetNbrNId
Range of NodeN: 0 <= NodeN < GetNbrDeg(). Since the graph is undirected GetInNId(), GetOutNId() and GetNbrNId() all give the same output.
///
Additional Documentation Commands
SNAP documentation also uses the following Doxygen commands:
///<: for comments associated with variables.
@param: for comments associated with function parameters.
\c: specifies a typewriter font for the next word.
<tt>: specifies a typewriter font for the enclosed text.
More details on how to use these commands are provided in specific sections below.
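Each class definition is preceded by a separator comment line and a brief description of the class, optionally followed by a tag for a detailed description:
//#///////////////////////////////////////////////
/// Undirected graph. ##Undirected_graph
class TUNGraph {
  ...
};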
Function declarations in a *.h file include a one-line, one-sentence description. The description should state what the function is used for:
/// Deletes node of ID \c NId from the graph. ##TUNGraph::DelNode
void DelNode(const int& NId);
Use \c to specify typewriter font when you refer to variables or functions. If the description requires more than one sentence, which should happen often, create a tag ##<class>::<function> at the end of the line and put the remainder of the description in the doc/*.h.txt file.
Function Declarations
Every function declaration should have a description immediately preceding it about what the function does and how to use it. In general, the description does not say how the function performs its task; that should be left to comments in the function definition.
Function Definitions
Each function definition should have a comment describing what the function does if there's anything tricky about how it does its job. For example, in the definition comment you might describe any coding tricks you use, give an overview of the steps you go through, or explain why you chose to implement the function in the way you did rather than using a viable alternative. If you implemented an algorithm from literature, this is a good place to provide a reference.
Note that you should not just repeat the comments given with the function declaration, in the .h file or wherever. It's okay to recapitulate briefly what the function does, but the focus of the comments should be on how it does it.
Function Parameters
It is important that you comment the meaning of input parameters. Use the @param construct in the comments to do that. For the example of function void DelNode(const int& NId); above, its parameter is documented in file doc/*.h.txt as follows:
/// TUNGraph::DelNode
@param NId Node Id to be deleted.
///
To associate a comment with a variable, start the comment with ///<:
TInt NFeatures; ///< Number of features per node.
Class Data Members
Each class data member (also called an instance variable or member variable) should have a comment describing what it is used for.
Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect. TODOs should include the string TODO in all caps, followed by the name, e-mail address, or other identifier of the person who can best provide context about the problem referenced by the TODO. The main purpose is to have a consistent TODO format that can be searched to find the person who can provide more details upon request.
Use the // syntax to document the code, wherever possible:
// This line illustrates a code comment.
Smart pointers are objects that act like pointers, but automate management of the underlying memory. They are extremely useful for preventing memory leaks, and are essential for writing exception-safe code.
By convention, class names in SNAP start with the letter "T", and their corresponding smart pointer types have "T" replaced with "P".
In the following example, variable Graph is defined as an undirected graph. TUNGraph is the base type and PUNGraph is its corresponding smart pointer type:
PUNGraph Graph = TUNGraph::New();
The following example shows how an undirected graph is loaded from a file:
{
  TFIn FIn("input.graph");
  PUNGraph Graph2 = TUNGraph::Load(FIn);
}
To implement smart pointers for a new class, only a few lines need to be added to the class definition.
The original class definition:
class TUNGraph {
  ...
};
The original class definition after smart pointers are added:
class TUNGraph;
typedef TPt<TUNGraph> PUNGraph;

class TUNGraph {
  ...
private:
  TCRef CRef;
  ...
public:
  ...
  static PUNGraph New();            // New() method
  static PUNGraph Load(TSIn& SIn);  // Load() method
  ...
  friend class TPt<TUNGraph>;
};
The new code declares PUNGraph, a smart pointer type for the original class TUNGraph. A few new definitions have been added to TUNGraph: CRef, a reference counter for garbage collection; New(), a method to create an instance; a static Load() method; and a friend declaration for TPt<TUNGraph>.
A regular class's Load() method returns no result. By contrast, the Load() method of a smart pointer class returns a pointer to a new object instance.
An example of definitions of the New() and Load() methods for TUNGraph is shown here:
static PUNGraph New() { return new TUNGraph(); }
static PUNGraph Load(TSIn& SIn) { return PUNGraph(new TUNGraph(SIn)); }
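Here is a minimal, self-contained sketch of what these few lines provide: intrusive reference counting. TRefPt, TToy, PToy and the Alive counter are hypothetical names. TRefPt is a simplified stand-in for TPt, and a plain int plays the role of TCRef. The sketch illustrates the mechanism only; it does not reproduce SNAP's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified reference-counting pointer in the spirit of
// SNAP's TPt<T>; NOT the SNAP implementation. The pointee must have an
// int member CRef (SNAP uses a TCRef member for this role).
template <class TRec> class TRefPt {
private:
  TRec* Addr;
  void MkRef() { if (Addr != NULL) { Addr->CRef++; } }
  void UnRef() { if (Addr != NULL && --Addr->CRef == 0) { delete Addr; } }
public:
  TRefPt() : Addr(NULL) { }
  TRefPt(TRec* Ptr) : Addr(Ptr) { MkRef(); }  // implicit: lets New() return "new T()"
  TRefPt(const TRefPt& Pt) : Addr(Pt.Addr) { MkRef(); }
  ~TRefPt() { UnRef(); }
  TRefPt& operator = (const TRefPt& Pt) {
    if (Pt.Addr != NULL) { Pt.Addr->CRef++; }  // take the new reference first,
    UnRef();                                    // so self-assignment is safe
    Addr = Pt.Addr;
    return *this;
  }
  TRec* operator -> () const { return Addr; }
  int GetRefs() const { return Addr == NULL ? 0 : Addr->CRef; }
};

class TToy;
typedef TRefPt<TToy> PToy;

// Hypothetical class following the pattern shown above.
class TToy {
private:
  int CRef;  // reference counter, managed only by TRefPt<TToy>
  int Val;
  TToy() : CRef(0), Val(0) { Alive++; }
  ~TToy() { Alive--; }
public:
  static int Alive;  // number of live TToy objects, for the usage example
  static PToy New() { return new TToy(); }
  int GetVal() const { return Val; }
  void SetVal(const int N) { Val = N; }
  friend class TRefPt<TToy>;
};
int TToy::Alive = 0;
```

Copying a PToy increments the counter, and destroying or reassigning one decrements it. The object is deleted when the last reference goes away. This is why the class needs both the counter member and the friend declaration.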
Do not use the C++ standard streams, such as cin, cout and cerr; use the SNAP defined streams instead. For console output, use printf().
SNAP defined streams are:
TSIn is an input stream.
TSOut is an output stream.
TSInOut is an input/output stream.
TStdIn is the standard input stream.
TStdOut is the standard output stream.
TFIn is a file input stream.
TFOut is a file output stream.
TFInOut is a file input/output stream.
TZipIn is a compressed file input stream.
TZipOut is a compressed file output stream.
Assertion names in SNAP use the following convention for the first letter:
Assert: compiled only in debug mode, aborts when the assertion is false.
IAssert: always compiled, aborts when the assertion is false.
EAssert: always compiled, does not abort, but throws an exception.
Some common SNAP assertions are:
Assert verifies the condition. This is the basic assertion.
AssertR verifies the condition and provides a reason when the condition fails.
SNAP also implements assertions that always fail. These are used when the program identifies a critical error, such as being out of memory. Fail assertions are:
EFailR throws an exception with a reason.
FailR prints the reason and terminates the program.
Fail terminates the program.
Examples of assertion usage:
AssertR(IsNode(NId), TStr::Fmt("NodeId %d does not exist", NId));
EFailR(TStr::Fmt("JSON Error: Unknown escape sequence: '%s'", Beg).CStr());
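The difference between the three flavors can be sketched in standard C++. The helper names below (AlwaysCheck, CheckOrThrow, CheckNId) are hypothetical. They only approximate the behavior of Assert, IAssert and EAssert described above; they are not the SNAP macros, which are defined in GLIB.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the three assertion flavors:
//   assert(...)       ~ Assert:  checked only in debug builds (no NDEBUG)
//   AlwaysCheck(...)  ~ IAssert: always checked, aborts on failure
//   CheckOrThrow(...) ~ EAssert with a reason: always checked, throws

void AlwaysCheck(const bool Cond, const char* Reason) {
  if (!Cond) {
    std::fprintf(stderr, "Assertion failed: %s\n", Reason);
    std::abort();
  }
}

void CheckOrThrow(const bool Cond, const std::string& Reason) {
  if (!Cond) { throw std::runtime_error(Reason); }
}

// Hypothetical use: validating a node ID against a node count.
int CheckNId(const int NId, const int Nodes) {
  AlwaysCheck(Nodes >= 0, "negative node count");
  CheckOrThrow(NId >= 0 && NId < Nodes,
               "NodeId " + std::to_string(NId) + " does not exist");
  assert(NId < Nodes);  // debug-only; redundant here, shown for contrast
  return NId;
}
```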
Of the built-in C++ integer types, use only int in your code. If a program needs a variable of a different size, use one of these precise-width integer types:
int8, uint8: signed, unsigned 8-bit integers.
int16, uint16: signed, unsigned 16-bit integers.
int32, uint32: signed, unsigned 32-bit integers.
int64, uint64: signed, unsigned 64-bit integers.
Use int for integers that are not going to be too large, e.g., loop counters. You can assume that an int is at least 32 bits, but don't assume that it has more than 32 bits.
Functions should not return values of type TSize (size_t). Instead, use fixed-size integer types for function return values, such as int32 or int64.
Do not use the unsigned integer types, unless the quantity you are representing is really a bit pattern rather than a number, or unless you need the well-defined wraparound behavior of unsigned arithmetic on overflow. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this purpose.
To print 64-bit integers, use TUInt64::GetStr() for conversion to string and the %s print formatting conversion:
int64 Val = 123456789012345;
TStr Note = TStr::Fmt("64-bit integer value is %s", TUInt64::GetStr(Val).CStr());
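Converting to a string first sidesteps a portability problem. The correct printf conversion for a 64-bit integer (%ld or %lld) depends on the platform. The sketch below applies the same convert-then-%s pattern using only standard C++. FmtInt64Note is a hypothetical name, and std::to_string stands in for TUInt64::GetStr().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit value by converting it to a
// string first and then printing it with %s.
std::string FmtInt64Note(const int64_t Val) {
  const std::string ValStr = std::to_string(Val);
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "64-bit integer value is %s", ValStr.c_str());
  return std::string(Buf);
}
```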
SNAP exceptions are implemented with TExcept::Throw and PExcept. TExcept::Throw throws an exception:
TExcept::Throw("Empty blog url");
PExcept, the smart pointer type for TExcept, is used to catch an exception:
try {
  ...
} catch (PExcept Except) {
  SaveToErrLog(Except->GetStr().CStr());
}
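The sketch below mirrors this throw-and-catch pattern in standard C++. TMyExcept, PMyExcept and CheckBlogUrl are hypothetical names, and std::shared_ptr stands in for SNAP's TPt. The thrown object is a smart pointer, so catching it by value copies only the pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical exception record; its smart pointer type is what gets thrown.
class TMyExcept {
private:
  std::string MsgStr;
public:
  explicit TMyExcept(const std::string& Msg) : MsgStr(Msg) { assert(!Msg.empty()); }
  const std::string& GetStr() const { return MsgStr; }
  static void Throw(const std::string& Msg);
};
typedef std::shared_ptr<TMyExcept> PMyExcept;

void TMyExcept::Throw(const std::string& Msg) {
  throw PMyExcept(new TMyExcept(Msg));
}

// Hypothetical caller: reports the error text instead of propagating it.
std::string CheckBlogUrl(const std::string& Url) {
  try {
    if (Url.empty()) { TMyExcept::Throw("Empty blog url"); }
    return "ok";
  } catch (PMyExcept Except) {
    return Except->GetStr();
  }
}
```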
Use const whenever it makes sense to do so.
Declared variables and parameters can be preceded by the keyword const to indicate that the variables are not changed (e.g., const int Foo). Class functions can have the const qualifier to indicate that the function does not change the state of the class member variables (e.g., class Foo { int Bar(char Ch) const; };).
const variables, data members, methods and arguments add a level of compile-time type checking, and it is better to detect errors as soon as possible. Passing large objects by const reference rather than by value also avoids copying them, which can significantly reduce execution time.
Therefore we strongly recommend that you use const whenever it makes sense to do so:
If a function does not modify an argument passed by reference or by pointer, that argument should be const.
Declare methods to be const whenever possible. Accessors should almost always be const. Other methods should be const if they do not modify any data members, do not call any non-const methods, and do not return a non-const pointer or non-const reference to a data member.
Consider making data members const whenever they do not need to be modified after construction.
Put const at the beginning of a definition, as in const int* Foo, not in the middle, as in int const *Foo.
Note that const is viral: if you pass a const variable to a function by reference or by pointer, that function must have const in its prototype (or the variable will need a const_cast).
Prefer inline functions, enums, and const variables to macros.
Macros are not nearly as necessary in C++ as they are in C. Instead of using a macro to inline performance-critical code, use an inline function. Instead of using a macro to store a constant, use a const variable. Instead of using a macro to "abbreviate" a long variable name, use a reference. Instead of using a macro to conditionally compile code ... well, don't do that at all (except, of course, for the #define guards to prevent double inclusion of header files). It makes testing much more difficult.
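Each alternative above fits in a few lines. The names below (PiRounded, SqrInt, TConfig, DoubleMxIter) are hypothetical. A const variable has a type. An inline function evaluates its argument exactly once, whereas a macro such as SQR(i++) would expand to ((i++) * (i++)), which is undefined behavior. A reference gives a short alias without any textual substitution.

```cpp
#include <cassert>

// Instead of "#define PI_ROUNDED 3.14", use a typed constant.
const double PiRounded = 3.14;

// Instead of a function-like macro such as "#define SQR(x) ((x) * (x))",
// use an inline function: its argument is evaluated exactly once.
inline int SqrInt(const int X) { return X * X; }

// Hypothetical configuration record with a long member name.
struct TConfig {
  int MaxIterationCountForConvergence;
};

// Instead of a macro that abbreviates a long name, bind a reference.
int DoubleMxIter(TConfig& Config) {
  int& MxIter = Config.MaxIterationCountForConvergence;
  MxIter *= 2;
  return MxIter;
}
```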
Use 0 for integers, 0.0 for reals, NULL for pointers, and '\0' for chars.
Use sizeof(varname) instead of sizeof(type) whenever possible. sizeof(varname) updates automatically if the type of the variable changes. sizeof(type) may make sense in some cases, but should generally be avoided because it can fall out of sync with the variable's type:
Struct Data;
memset(&Data, 0, sizeof(Data));    // Good: follows the type of Data.
memset(&Data, 0, sizeof(Struct));  // Avoid: must be kept in sync by hand.
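Below is a self-contained version of the preferred form. TNodeRec and ClearRec are hypothetical. The example also covers the pointer case, where the expression to measure is *Rec. A common slip is sizeof(Rec), which gives the size of the pointer rather than the record.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record type, used only for this example.
struct TNodeRec {
  int NId;
  double Wgt;
};

// sizeof(*Rec) follows the type Rec points to, so this stays correct if
// the parameter type changes; sizeof(Rec) would be the pointer size.
void ClearRec(TNodeRec* Rec) {
  assert(Rec != NULL);
  std::memset(Rec, 0, sizeof(*Rec));
}
```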
Use C++ casts such as static_cast<>():
int Cnt = static_cast<int>(ValType);
Do not use other cast formats, such as int y = (int)x; or int y = int(x);
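 
Here is a short sketch of the preferred style. GetEdgeDensity is a hypothetical helper that computes the density of an undirected simple graph as Edges / (Nodes * (Nodes - 1) / 2). Each static_cast makes a conversion visible, and converting before the division avoids integer truncation.

```cpp
#include <cassert>

// Hypothetical helper: edge density of an undirected simple graph.
double GetEdgeDensity(const int Nodes, const int Edges) {
  if (Nodes < 2) { return 0.0; }
  // Convert before dividing; integer division would truncate the result.
  const double Pairs = static_cast<double>(Nodes) * (Nodes - 1) / 2.0;
  return static_cast<double>(Edges) / Pairs;
}
```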
Use namespace TSnap to encapsulate global functions. Define all SNAP global functions within that namespace. Use namespace TSnapDetail to encapsulate local functions.
Do not define any new namespaces; use TSnap for global functions and TSnapDetail for local functions.
Do not use a using-directive to make all names from a namespace available:
// Forbidden -- This pollutes the namespace.
using namespace Foo;
Use namespace TSnap for global functions and namespace TSnapDetail for local functions. See file alg.h for an example.
If you must define a nonmember function and it is only needed locally in its .cpp file, use static linkage (static int Foo() {...}) or namespace TSnapDetail to limit its scope:
namespace TSnapDetail {  // This is in a .cpp file.

// The content of a namespace is not indented.
enum { kUnused, kEOF, kError };        // Commonly used tokens.
int pos_ = kUnused;                    // Current parser state.
bool AtEof() { return pos_ == kEOF; }  // Uses our namespace's EOF.

}  // namespace TSnapDetail
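The sketch below shows one way the two namespaces might work together. CountAbove and GetFracAboveDeg are hypothetical functions, not part of SNAP. The helper lives in TSnapDetail, and the public entry point lives in TSnap.

```cpp
#include <cassert>
#include <vector>

namespace TSnapDetail {
// Local helper (hypothetical): number of degrees greater than MnDeg.
int CountAbove(const std::vector<int>& DegV, const int MnDeg) {
  int Cnt = 0;
  for (int i = 0; i < static_cast<int>(DegV.size()); i++) {
    if (DegV[i] > MnDeg) { Cnt++; }
  }
  return Cnt;
}
}  // namespace TSnapDetail

namespace TSnap {
// Public function (hypothetical): fraction of nodes with degree above MnDeg.
double GetFracAboveDeg(const std::vector<int>& DegV, const int MnDeg) {
  if (DegV.empty()) { return 0.0; }
  return static_cast<double>(TSnapDetail::CountAbove(DegV, MnDeg)) / DegV.size();
}
}  // namespace TSnap
```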
Group function declarations in *.h files so that functions relating to common functionality are grouped together. Function definitions in the corresponding *.cpp file should be in the same order as the function declarations.
double GetDegreeCentr(const PUNGraph& Graph, const int& NId);
When defining a function, parameter order is: inputs, then outputs.
Input parameters are usually values or const references, while output and input/output parameters are non-const references.
In the following example, Graph is an input parameter, and InDegV and OutDegV are output parameters:
void GetDegSeqV(const PGraph& Graph, TIntV& InDegV, TIntV& OutDegV);
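Here is a self-contained sketch of this parameter order. GetEdgeListDegV and TEdgeV are hypothetical. The function takes an edge list instead of a SNAP graph and assumes that node IDs run from 0 to Nodes - 1.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical directed edge list: (source node ID, destination node ID).
typedef std::vector<std::pair<int, int> > TEdgeV;

// Input parameters first (const references or values), output parameters
// last (non-const references). Node IDs are assumed to be 0..Nodes-1.
void GetEdgeListDegV(const TEdgeV& EdgeV, const int Nodes,
                     std::vector<int>& InDegV, std::vector<int>& OutDegV) {
  InDegV.assign(Nodes, 0);
  OutDegV.assign(Nodes, 0);
  for (int e = 0; e < static_cast<int>(EdgeV.size()); e++) {
    assert(EdgeV[e].first < Nodes && EdgeV[e].second < Nodes);
    OutDegV[EdgeV[e].first]++;
    InDegV[EdgeV[e].second]++;
  }
}
```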
When ordering function parameters, put all input-only parameters before any output parameters. In particular, do not add new parameters to the end of the function just because they are new; place new input-only parameters before the output parameters.
This is not a hard-and-fast rule. Parameters that are both input and output (often classes/structs) muddy the waters, and, as always, consistency with related functions may require you to bend the rule.
We recognize that long functions are sometimes appropriate, so no hard limit is placed on function length. If a function exceeds about 40 lines, think about whether it can be broken up without harming the structure of the program.
Even if your long function works perfectly now, someone modifying it in a few months may add new behavior. This could result in bugs that are hard to find. Keeping your functions short and simple makes it easier for other people to read and modify your code.
Each public SNAP class must define a default constructor, a copy constructor, a TSIn constructor, an assignment operator =, and Save() and Load() methods.
Classes that support "smart" pointers must also define a New() method. See Class Format for additional details on class formatting.
static const
data members)static const
data members)
Group the methods so that methods relating to common functionality are grouped together. Method definitions in the corresponding .cpp file should follow the declaration order as much as possible.
Constructors should only set member variables to their initial values; move complex initialization into an Init() method.
If your object requires non-trivial initialization, consider having an explicit Init() method. In particular, constructors should not call virtual functions, attempt to raise errors, access potentially uninitialized global variables, etc.
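The sketch below illustrates this split with a hypothetical class, TDegTable. A constructor cannot return a status code, so the part of the set-up that can fail moves into Init(), and the caller checks its result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class: the constructor only sets trivial defaults, and the
// set-up that can fail lives in Init(), which reports success to the caller.
class TDegTable {
private:
  std::vector<int> DegV;
  bool Ok;
public:
  TDegTable() : Ok(false) { }
  bool Init(const int Nodes) {
    Ok = false;
    if (Nodes < 0) { return Ok; }  // invalid input: report it, do not abort
    DegV.assign(Nodes, 0);
    Ok = true;
    return Ok;
  }
  bool IsOk() const { return Ok; }
  int GetNodes() const { return static_cast<int>(DegV.size()); }
};
```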
Make data members private, and provide access to them through accessor functions as needed. Typically a variable would be called Foo and the accessor function GetFoo(). You may also want a mutator function SetFoo().
Use a struct only for passive objects that carry data; everything else is a class. The struct and class keywords behave almost identically in C++. We add our own semantic meanings to each keyword, so you should use the appropriate keyword for the data type you're defining. If in doubt, make it a class.
Interface
suffix.
In general, every .cpp file should have an associated .h file. There are some common exceptions, such as unittests and small .cpp files containing just a main() function.
Correct use of header files can make a huge difference to the readability, size and performance of your code.
All header files should have #define guards to prevent multiple inclusion. The format of the symbol name should be snap_<file>_h:
#ifndef snap_agm_h
#define snap_agm_h
...
#endif  // snap_agm_h
.h
files..h
files.
All of a project's header files should be listed as descendants of the project's source directory without use of UNIX directory shortcuts . (the current directory) or .. (the parent directory). For example, snap-awesome-algorithm/src/base/logging.h should be included as:
#include "base/logging.h"
Within each section it is nice to order the includes alphabetically.
The coding conventions described above are mandatory. However, like all good rules, these sometimes have exceptions, which we discuss here.
If you need to change existing code that does not conform to this guide, the best option is to rewrite it so that it conforms. If that is not possible, because rewriting would require more time and effort than you have available, stay consistent with the local conventions in that code.
SNAP does not contain any Windows specific code. All such code is encapsulated within the GLIB library, which SNAP uses. If you must implement some Windows specific functionality, contact SNAP maintainers.
Use common sense and BE CONSISTENT.
If you are writing new code, follow this style guide.
To see an example of SNAP programming style, see file graph.h.
If you are editing existing code, take a few minutes to look at it and determine its style. If your code looks drastically different from the existing code around it, the discontinuity makes it harder for others to understand it. Try to avoid this.
OK, enough writing about writing code; the code itself is much more interesting. Have fun!