Which token is a literal
Gosling, B. Joy, G. Steele, G. Bracha: The Java Language Specification. In a Java program, all characters are grouped into symbols called tokens. Larger language features are built from the first five categories of tokens; the sixth kind of token is recognized, but is then discarded by the Java compiler from further processing. We must learn how to identify all six kinds of tokens that can appear in Java programs.
In EBNF we write one simple rule that captures this structure:
token <= identifier | keyword | separator | operator | literal | comment
We will examine each of these kinds of tokens in more detail below, again using EBNF. For now, we briefly describe in English each token type.
Identifiers: names the programmer chooses
Keywords: names already in the programming language
Separators (also known as punctuators): punctuation characters and paired delimiters
Operators: symbols that operate on arguments and produce results
Literals (specified by their type)
  Numeric: int and double
  Logical: boolean
  Textual: char and String
  Reference: null
Comments
  Line
  Block
Finally, we will also examine the concept of white space, which is crucial to understanding how the Java compiler separates the characters in a program into a list of tokens; it sometimes helps decide where one token ends and where the next token starts.
We can describe the structure of this character set quite simply in EBNF, using only alternatives in the right-hand sides. White space consists of spaces (from the space bar), horizontal and vertical tabs, line terminators (newlines), and formfeeds: all are non-printing characters, so we must describe them in English.
White space and tokens are closely related: we can use white space to force the end of one token and the start of another token i. For example, XY is considered to be a single token, while X Y is considered to be two tokens. Adding extra white space e. Just as a good comedian knows where to pause when telling a joke, a good programmer knows where to put white space when writing code.
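The XY versus X Y distinction can be observed with any tokenizer. The sketch below uses Python's standard `tokenize` module rather than a Java compiler, on the assumption that the two languages split identifiers at white space in the same way; the helper name `name_tokens` is made up for this illustration.

```python
import io
import tokenize

def name_tokens(source):
    """Return the identifier-like tokens the Python tokenizer finds in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.NAME]

print(name_tokens("XY"))       # one token
print(name_tokens("X Y"))      # two tokens
print(name_tokens("X     Y"))  # extra white space does not change the tokens
```

The third call suggests why extra white space is a purely stylistic choice: the token list is the same as for the second call.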
The first category of token is an Identifier. Identifiers are used by programmers to name things in Java: things such as variables, methods, fields, classes, interfaces, exceptions, packages, etc. When you read programs that I have written, and write your own programs, think carefully about the choices made to create identifiers. Choose descriptive identifiers mostly starting with lower-case letters. Separate different words in an identifier with a case change: e. Apply the "Goldilocks Principle": not too short, not too long, just right.
During our later discussions of programming style, we will examine the standard naming conventions that are recommended for use in Java code. The second category of token is a Keyword, sometimes called a reserved word. Keywords are identifiers that Java reserves for its own use. These identifiers have built-in meanings that cannot change. Thus, programmers cannot use these identifiers for anything other than their built-in meanings.
Technically, Java classifies identifiers and keywords as separate categories of tokens. The following is a list of all 49 Java keywords (we will learn the meaning of many, but not all, of them in this course). It would be an excellent idea to print this table, and then check off the meaning of each keyword when we learn it; some keywords have multiple meanings, determined by the context in which they are used.
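The split between identifiers and keywords can be illustrated in Python, which makes the same distinction and exposes it through the standard `keyword` module. This is an analogy only, since the keyword list differs from Java's; the helper name `can_name_a_variable` is made up for this sketch.

```python
import keyword

def can_name_a_variable(word):
    """True if word has identifier syntax and is not reserved as a keyword."""
    return word.isidentifier() and not keyword.iskeyword(word)

for word in ["while", "whileLoop", "2fast"]:
    print(word, can_name_a_variable(word))
```

Here `while` has the spelling of an identifier but is rejected because the language reserves it, while `2fast` is rejected because it is not a well-formed identifier at all.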
The third category of token is a Separator (also known as a punctuator). There are exactly nine single-character separators in Java, shown in the following simple EBNF rule. The last six separators (3 pairs of 2 each) are also known as delimiters: wherever a left delimiter appears in a correct Java program, its matching right delimiter appears soon afterwards (they always come in matched pairs). Together, each pair delimits some other entity. For example the Java code Math.
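The matched-pair property of delimiters can be checked with a stack. The following is a minimal sketch, assuming the three delimiter pairs are ( ), { } and [ ], and ignoring string literals and comments, which a real compiler would have to skip. The function name and the sample inputs are illustrative, not taken from the source.

```python
# Map each right delimiter to the left delimiter it must close.
PAIRS = {")": "(", "}": "{", "]": "["}

def delimiters_balanced(code):
    """Return True if every left delimiter is closed by its matching right one."""
    stack = []
    for ch in code:
        if ch in PAIRS.values():
            stack.append(ch)              # remember the open delimiter
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False              # wrong or missing opener
    return not stack                      # nothing may remain open

print(delimiters_balanced("Math.max(a[0], b)"))
print(delimiters_balanced("f(a[0)]"))
```

The second call fails because the pairs overlap instead of nesting, which is exactly what the matched-pair rule forbids.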
The fourth category of token is an Operator. Java includes 37 operators that are listed in the table below; each of these operators consists of 1, 2, or at most 3 special characters. The fifth, and most complicated, category of tokens is the Literal. All values that we write in a program are literals: each belongs to one of Java's four primitive types (int, double, boolean, char) or belongs to the special reference type String.
It is stored in the builtins module, alongside built-in functions like print. These names are defined by the interpreter and its implementation including the standard library. Current system names are discussed in the Special method names section and elsewhere.
More will likely be defined in future versions of Python. Class-private names. See section Identifiers (Names). One syntactic restriction not indicated by these productions is that whitespace is not allowed between the stringprefix or bytesprefix and the rest of the literal.
The source character set is defined by the encoding declaration; it is UTF-8 if no encoding declaration is given in the source file; see section Encoding declarations. In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type.
They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes. Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. Given that Python 2. New in version 3. See PEP for more information.
A string literal with 'f' or 'F' in its prefix is a formatted string literal; see Formatted string literals. The 'f' may be combined with 'r', but not with 'b' or 'u'; therefore raw formatted strings are possible, but formatted bytes literals are not. In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the literal.
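The prefixes and quoting styles described above can be tried out directly. The following sketch shows one literal of each kind; the variable names and sample values are illustrative only.

```python
# Single and double quotes produce the same str value.
single = 'token'
double = "token"

# Bytes literal: non-ASCII bytes (value 128 or greater) must be written as escapes.
data = b"caf\xc3\xa9"

# Raw string: backslashes are kept as literal characters.
raw_path = r"C:\temp\new"

# Triple-quoted string: unescaped newlines and quotes are retained.
triple = """He said "hi"
and left"""

# Raw formatted string: 'f' combined with 'r'.
name = "world"
raw_formatted = rf"{name}\n"

print(single == double)
print(data.decode("utf-8"))
print(len(raw_path))
print(triple.splitlines())
print(raw_formatted)
```

In the last value the expression is substituted but the `\n` stays as two characters, because the raw prefix suppresses escape processing while the `f` prefix still enables replacement fields.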
Unless an 'r' or 'R' prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:
In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value.
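A few escape sequences can be checked directly. This is a small sample chosen for illustration, not the full list of recognized escapes; the variable names are made up.

```python
# In a string literal, hex and octal escapes denote a Unicode character.
hex_escape = "\x41"        # hexadecimal 41
octal_escape = "\101"      # octal 101
unicode_escape = "\u0041"  # 16-bit Unicode escape, string literals only
named_escape = "\N{LATIN CAPITAL LETTER A}"

# In a bytes literal, the same hex escape denotes a single byte.
byte_hex = b"\x41"

# With an 'r' prefix the backslash is not interpreted.
raw = r"\n"

print(hex_escape, octal_escape, unicode_escape, named_escape)
print(byte_hex, len(raw))
```

All four string escapes spell the same character, while the raw literal keeps the backslash and the letter n as two separate characters.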
Changed in version 3. Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i. This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.
It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals. Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, not as a line continuation. Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation.
Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings. Note that this feature is defined at the syntactical level, but implemented at compile time. Also note that literal concatenation can use different quoting styles for each component (even mixing raw strings and triple-quoted strings), and formatted string literals may be concatenated with plain string literals.
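The uses of adjacent-literal concatenation mentioned above can be sketched as follows. The regular expression is a common illustration of commenting parts of a string; the variable names are chosen for this example.

```python
import re

# Adjacent literals with different quote styles are joined into one string.
greeting = "hello" 'world'

# Splitting a pattern across lines lets each part carry its own comment.
identifier_pattern = re.compile("[A-Za-z_]"       # letter or underscore
                                "[A-Za-z0-9_]*"   # letter, digit or underscore
                                )

# Raw, triple-quoted and plain literals can be mixed.
mixed = r"\d+" """-""" "end"

print(greeting)
print(identifier_pattern.pattern)
print(mixed)
```

Because the joining happens when the source is compiled, `identifier_pattern.pattern` is a single string with no trace of the split or the comments.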
A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time. Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:
A conversion field, introduced by an exclamation point '!', may follow.
A format specifier may also be appended, introduced by a colon ':'. Expressions in formatted string literals are treated like regular Python expressions surrounded by parentheses, with a few exceptions. Replacement expressions can contain line breaks e.
Each expression is evaluated in the context where the formatted string literal appears, in order from left to right. When a format is specified it defaults to the str of the expression unless a conversion '! If a conversion is specified, the result of evaluating the expression is converted before formatting.
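The conversion and format-specifier behaviour described above can be sketched with a few f-strings. The variable names and values are illustrative.

```python
name = "Ada"
value = 3.14159
width = 8

# A conversion is applied to the result before formatting: !r calls repr().
converted = f"{name!r}"

# A format specifier follows a colon.
rounded = f"{value:.2f}"

# Replacement fields may be nested inside the format specifier.
padded = f"{name:>{width}}"

# Expressions are evaluated from left to right, at run time.
calls = []
def note(label):
    calls.append(label)
    return label

ordered = f"{note('first')}-{note('second')}"

print(converted, rounded, repr(padded), ordered, calls)
```

The `calls` list records the order in which the two expressions ran, which matches their left-to-right position in the literal.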
The reserved function names are:
Macro names are reserved in all contexts. Do not use any of the following reserved macro names:
Identifiers that start with E followed by a digit or an uppercase letter. A keyword is an identifier that is reserved in all contexts for special use by the language. The following is a list of all the reserved keywords. Note that some compilers do not implement all of the reserved keywords; these compilers allow you to use certain keywords as identifiers.
See Section 1. A literal is an integer, floating-point, Boolean, character, or string constant. An integer literal can be a decimal, octal, or hexadecimal constant. A prefix specifies the base or radix: 0x or 0X for hexadecimal, 0 for octal, and nothing for decimal. An integer literal can also have a suffix that is a combination of U and L, for unsigned and long, respectively. The suffix can be uppercase or lowercase and can be in any order.
The suffix and prefix are interpreted as follows: If the suffix is UL (or ul, LU, etc.), the literal's type is unsigned long. If the suffix is L, the literal's type is long or unsigned long, whichever fits first. (That is, if the value fits in a long, the type is long; otherwise, the type is unsigned long. An error results if the value does not fit in an unsigned long.) If the suffix is U, the type is unsigned or unsigned long, whichever fits first.
Without a suffix, a decimal integer has type int or long, whichever fits first. An octal or hexadecimal literal has type int, unsigned, long, or unsigned long, whichever fits first. Some compilers offer other suffixes as extensions to the standard. See Appendix A for examples. A floating-point literal has an integer part, a decimal point, a fractional part, and an exponent part. You must include the decimal point, the exponent, or both.
You must include the integer part, the fractional part, or both.
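The "whichever fits first" rules for integer literals described earlier can be modelled in a few lines. This is a sketch under stated assumptions: the widths (32-bit int, 64-bit long) are platform choices and not part of the rules, the function name `literal_type` is hypothetical, and compiler-specific suffix extensions are not handled.

```python
def literal_type(text, int_bits=32, long_bits=64):
    """Return the type name chosen for an integer literal under the rules above."""
    body = text.rstrip("uUlL")                    # digits, with any base prefix
    suffix = "".join(sorted(text[len(body):].lower()))
    if body.lower().startswith("0x"):
        value, decimal = int(body, 16), False     # hexadecimal
    elif body.startswith("0") and len(body) > 1:
        value, decimal = int(body, 8), False      # octal
    else:
        value, decimal = int(body, 10), True      # decimal
    limits = {
        "int": 2 ** (int_bits - 1) - 1,
        "unsigned": 2 ** int_bits - 1,
        "long": 2 ** (long_bits - 1) - 1,
        "unsigned long": 2 ** long_bits - 1,
    }
    if suffix == "lu":
        candidates = ["unsigned long"]
    elif suffix == "l":
        candidates = ["long", "unsigned long"]
    elif suffix == "u":
        candidates = ["unsigned", "unsigned long"]
    elif suffix == "":
        candidates = (["int", "long"] if decimal
                      else ["int", "unsigned", "long", "unsigned long"])
    else:
        raise ValueError(f"unsupported suffix in {text!r}")
    for name in candidates:                       # first type that fits wins
        if value <= limits[name]:
            return name
    raise ValueError(f"{text!r} does not fit in any candidate type")

print(literal_type("42"), literal_type("0xFFFFFFFF"), literal_type("4294967295"))
```

Under the assumed widths, the same value 4294967295 is typed differently depending on how it is written: the hexadecimal form may become unsigned, while the decimal form skips the unsigned types and becomes long.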