Revisit Python from statements and PEG

Event:

PyCon APAC 2022

Presented:

2022/07/20 (pre-recorded) nikkie

你好 (Hello)❗️ PyCon APAC 2022

Many thanks to all the staff who worked so hard❤️

First question: Can you write in Python❔

  • if

  • for

  • function

About "Revisit Python from statements and PEG"

audience:

Intermediate level (equivalent to those who answered "yes")

subject:

appearance of Python

Audience take away

  • Components of Python statements: expressions and keywords

  • Explanation of the statement by clauses, headers, and suites

  • How to read PEG

Motivation: Python itself is interesting!

  • Language Reference is difficult? But exciting!

  • Difficulty may come from confusion between new and old grammatical expressions, so let's resolve it (as a first step)

About nikkie myself

  • loves Python (& Anime, Japanese cartoons)

  • Twitter @ftnext / GitHub @ftnext

  • PyCon JP: 2019〜2020 staff & 2021 chair

About nikkie myself

  • Data scientist at Uzabase, Inc. (NLP, Write Python)

  • We're hiring!! (Engineers, Data scientists, Researchers)


Also talks 讓我聽見愛的歌聲 x 🐍

Statements in programming

  1. High level / Low level

  2. Short introduction to compilation

1 High level / Low level

High-level programming language / Machine language

High level

Low level

High-level programming language (e.g. Python)

Machine language

Humans 👩‍💻👨‍💻 read and write

Machines 🤖 read and write

E.g. Program written in a high-level language 👩‍💻👨‍💻 (mario.py)

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

Execute mario.py 🤖

$ python mario.py
Input your name: nikkie
It's not Mario.

$ python mario.py
Input your name: mario
It's me, Mario!

For humans to write in high-level languages

  • Humans 👩‍💻👨‍💻 read and write high-level language

  • Machines 🤖 read and write machine language

  • Machines 🤖 convert programs in high-level languages into machine language

For humans to write in high-level languages

  • Machines 🤖 read high-level languages (not only machine language)

  • Machines 🤖 recognize the structure of programs written in high-level languages

Statements allow machines 🤖 to understand the structure of the program

2 Short introduction to compilation

ref: "The Elements of Computing Systems"

Example of compilation

  • Source code (written in Python)

  • Bytecode (ref: Glossary)

    • internal representation in the interpreter

    • cached in .pyc files

Compile the if statement program

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

2 steps of compilation

  1. Lexical analysis

  2. Abstract Syntax Trees

2-1. Lexical analysis

  • Source code is a string.

  • Split it into tokens, the smallest units of meaning.

e.g. Lexical analysis (python -m tokenize -e mario.py)

2,0-2,2:            NAME           'if'           
2,3-2,7:            NAME           'name'         
2,7-2,8:            DOT            '.'            
2,8-2,13:           NAME           'lower'        
2,13-2,14:          LPAR           '('            
2,14-2,15:          RPAR           ')'            
2,16-2,18:          EQEQUAL        '=='           
2,19-2,26:          STRING         '"mario"'      
2,26-2,27:          COLON          ':'            
2,27-2,28:          NEWLINE        '\n'           
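
The same kind of token listing can be produced from Python code. A minimal sketch using the standard library tokenize module; the source string is a shortened stand-in for mario.py (an assumption for illustration, with pass in place of the print call).

```python
import io
import tokenize

# Shortened stand-in for mario.py (assumption: only the if header matters here).
source = 'if name.lower() == "mario":\n    pass\n'

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for token in tokens:
    # exact_type distinguishes operators such as DOT, LPAR and EQEQUAL.
    print(tokenize.tok_name[token.exact_type], repr(token.string))
```

The listing also contains INDENT and DEDENT tokens around pass, which become important when reading the grammar of blocks.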

2-2. Abstract Syntax Trees

  • AST

  • Machines handle the structure of a program as a tree.

  • Output an abstract syntax tree from a sequence of tokens.

e.g. AST (python -m ast -m exec mario.py)

      If(
         test=Compare(
            left=Call(
               func=Attribute(
                  value=Name(id='name', ctx=Load()),
                  attr='lower',
                  ctx=Load()),
               args=[],
               keywords=[]),
            ops=[
               Eq()],
            comparators=[
               Constant(value='mario')]),
         body=[
            Expr(
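
The tree can also be inspected from Python with the standard library ast module. A small sketch; the source is mario.py with input() replaced by a constant (an assumption, so that nothing needs to be typed).

```python
import ast

# mario.py with input() replaced by a constant (assumption for illustration).
source = '''\
name = "nikkie"
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")
'''

tree = ast.parse(source)
if_node = tree.body[1]  # body[0] is the assignment, body[1] is the if statement

print(type(if_node).__name__)       # If
print(type(if_node.test).__name__)  # Compare
# One statement in the if suite, one statement in the else suite.
print(len(if_node.body), len(if_node.orelse))
```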

Two kinds of syntax

Abstract syntax

Concrete syntax

interpreted by the interpreter

appearance of programming language (e.g. how to write compound statements)

Subject is concrete syntax

  • Deep dive into the current appearance of compound statements (Python 3.10.5)

  • Machine (parser 🤖) checks whether the tokens match the concrete syntax or not (Key🗝 is recursion)

Revisit Python from statements and PEG

  • Share the meaning of colons and indents

  • Read and taste PEG together

Revisit Python from statements and PEG

  • Meaning of colons and indents: they tell the parser the components of a statement

  • Read and taste PEG together

Revisit Python from statements and PEG

  • Meaning of colons and indents: they tell the parser the components of a statement

  • Read and taste PEG together: expressed concisely and without omissions

Menu: Revisit Python from statements and PEG

  1. Statements in Python

  2. Define statements with PEG

  3. Read PEG together

Part I. Statements in Python

Glossary "statement"

A statement is part of a suite (a “block” of code).

https://docs.python.org/3/glossary.html#term-statement

Focus on "compound statements"

Compound statements contain (groups of) other statements;

The Python Language Reference 8. Compound statements

Example of compound statements: if statement

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

Compound statements are like control flows

they[*compound statements] affect or control the execution of those other statements in some way (8. Compound statements)

The if statement is a branch (execute / don't execute).

Elements of compound statement

  • clause

  • header

  • suite

Clause

A compound statement consists of one or more ‘clauses.’ (8. Compound statements)

e.g. Clauses in if statement 1/2

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

e.g. Clauses in if statement 2/2

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

Elements of clause

A clause consists of a header and a ‘suite.’ (8. Compound statements)

Header

Each clause header begins with a uniquely identifying keyword and ends with a colon. (8. Compound statements)

Keyword

  • Specific tokens

  • e.g.

    • if

    • else

    • for

    • def

Header begins with a keyword and ends with a colon

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

Suite

A suite can be [...], or it can be one or more indented statements on subsequent lines. (8. Compound statements)

Suite is one or more indented statements

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")

e.g. Suite as two indented statements

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me.")
    print("Mario!")
else:
    print("It's not Mario.")

Suite

Only the latter form[*prior quoted] of a suite can contain nested compound statements; (8. Compound statements)

Compound statements are defined recursively!

e.g. Suite nests other compound statement

name = input("Input your name: ")
if name.lower() == "mario":
    for _ in range(3):
        print("It's me, Mario!")
else:
    print("It's not Mario.")
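
The two forms of a suite can be checked with the built-in compile(), which parses source without running it. A minimal sketch; is_valid is a hypothetical helper written for this example.

```python
def is_valid(source: str) -> bool:
    """Return True if `source` compiles without a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Suite on the header line: simple statements separated by semicolons.
same_line = 'if True: print("a"); print("b")\n'
# A compound statement cannot be nested in the same-line form.
nested_same_line = 'if True: for _ in range(3): print("a")\n'
# The indented form can nest another compound statement.
nested_indented = 'if True:\n    for _ in range(3):\n        print("a")\n'

print(is_valid(same_line))         # True
print(is_valid(nested_same_line))  # False
print(is_valid(nested_indented))   # True
```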

Short summary🥟: Statements in Python

  • Compound statements consist of clauses (headers and suites)

  • Colons indicate headers!

  • Indents indicate suites!

Supplement💊 about statements: Glossary continues

A statement is either an expression or one of several constructs with a keyword, such as if, while or for.

https://docs.python.org/3/glossary.html#term-statement

A statement is either

  1. an expression

  2. one of several constructs with a keyword

Supplement💊: Keywords are reserved (2)

>>> if = 1231  # Cannot use as variable
  File "<stdin>", line 1
    if = 1231
       ^
SyntaxError: invalid syntax
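
Whether a word is reserved can also be asked programmatically. A small sketch with the standard library keyword module; is_valid is a hypothetical helper that only compiles the source and does not run it.

```python
import keyword


def is_valid(source: str) -> bool:
    """Return True if `source` compiles without a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(keyword.iskeyword("if"))    # True
print(keyword.iskeyword("name"))  # False

# A keyword cannot be an assignment target, an ordinary name can.
print(is_valid("if = 1231\n"))    # False
print(is_valid("name = 1231\n"))  # True
```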

Another element of statement: Expression (1)

A piece of syntax which can be evaluated to some value.

Glossary 'expression'

Glossary continues

In other words, an expression is an accumulation of expression elements [...] which all return a value.

Glossary 'expression'

Expressions are defined recursively too!

Elements of expression

  • Literals (6. Expressions 6.2.2)

  • e.g. 108 is literal (int) 👉 108 is expression

Elements of expression, next

  • Operators (6. Expressions 6.7)

  • e.g. 33 - 4 is expression (literal, operator, literal)

  • Really accumulation of expression elements (recursive)
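
The recursive structure of expressions is visible in the AST. A minimal sketch with the ast module; the second expression, (33 - 4) * 2, is an extra example that does not appear on the slides.

```python
import ast

# mode="eval" parses a single expression instead of a sequence of statements.
simple = ast.parse("33 - 4", mode="eval").body
print(type(simple).__name__)        # BinOp
print(type(simple.left).__name__)   # Constant
print(type(simple.right).__name__)  # Constant

# An operand can itself be an expression built from an operator (recursion).
nested = ast.parse("(33 - 4) * 2", mode="eval").body
print(type(nested.left).__name__)   # BinOp

print(eval("33 - 4"))               # 29
```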

Elements of expression, just one more

An expression by itself is a statement

The function call in the following example is also a statement (it forms the suite)

name = input("Input your name: ")
if name.lower() == "mario":
    print("It's me, Mario!")
else:
    print("It's not Mario.")
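
In the AST, such a statement is an Expr node that wraps the expression. A small sketch using the ast module:

```python
import ast

# Parse a single line consisting only of a function call.
statement = ast.parse('print("It\'s me, Mario!")').body[0]

print(type(statement).__name__)        # Expr (a statement node)
print(type(statement.value).__name__)  # Call (the expression inside it)
```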

Menu: Revisit Python from statements and PEG

  1. ✅ Statements in Python

  2. Define statements with PEG

  3. Read PEG together

Part II. Define statements with PEG

Parsing Expression Grammar

Grammar from rules

  • Grammar consists of a sequence of rules of the form (PEP 617)

  • Definitions of the rules in PEG are:

rule_name: expression

How to read PEG

You can find it in PEP 617, 'Grammar Expressions'.

Symbols used in PEG 1/2

  • literals

  • whitespace

  • |

  • ()

  • [] OR ?

literals

  • Written in single quotes

  • e.g. 'else' (keyword)

whitespace

  • e.g. e1 e2

  • Match e1, then match e2.

  • (If it doesn't match e1 first, it won't match e1 e2)

literals & whitespace

else_block: 'else' ':' block
  • First, match literal 'else'

  • Next, match literal ':'

  • Then, match the rule block

|

  • e.g. e1 | e2

  • Match e1 or e2

  • Note: ordered choice (left comes first. characteristic of PEG)

How to read |

  • rule_name: first_rule | second_rule is equivalent to:

rule_name:
    | first_rule
    | second_rule

  • The | before first_rule is for formatting purposes.

()

  • Group (combined with repetition, shown later)

  • e.g. 1 ( e ): Match e

  • e.g. 2 ( e1 e2 ): Match e1 e2

[] OR ?

  • Match optionally

  • e.g. [e] (equivalent to e?)

    • May or may not match e (optionally)
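
The symbols so far can be mimicked with a few small functions. This is only a sketch for building intuition, not how CPython's parser is implemented: the names lit, seq, choice and opt are hypothetical, tokens are plain strings, and block is simplified to a single 'pass' token. Each parser takes a token list and a position and returns the new position, or None on failure. Grouping with () corresponds to nesting the function calls.

```python
from typing import Callable, Optional, Sequence

# A parser maps (tokens, position) to the position after the match, or None.
Parser = Callable[[Sequence[str], int], Optional[int]]


def lit(text: str) -> Parser:
    """'text': match one token equal to `text`."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == text:
            return pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """e1 e2: match every parser, one after another."""
    def parse(tokens, pos):
        for parser in parsers:
            pos = parser(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """e1 | e2: ordered choice, the first alternative that matches wins."""
    def parse(tokens, pos):
        for parser in parsers:
            end = parser(tokens, pos)
            if end is not None:
                return end
        return None
    return parse


def opt(parser: Parser) -> Parser:
    """[e] or e?: match e if possible, otherwise succeed without consuming."""
    def parse(tokens, pos):
        end = parser(tokens, pos)
        return pos if end is None else end
    return parse


# else_block: 'else' ':' block   (block simplified to the token 'pass')
block = lit("pass")
else_block = seq(lit("else"), lit(":"), block)
print(else_block(["else", ":", "pass"], 0))  # 3

# Ordered choice: 'a' is tried first and wins, so 'b' is left unconsumed.
ordered = choice(lit("a"), seq(lit("a"), lit("b")))
print(ordered(["a", "b"], 0))                # 1

# Optional: no 'else' here, but the match still succeeds at position 0.
print(opt(else_block)(["x"], 0))             # 0
```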

Symbols used in PEG 2/2

  • *

  • +

    • join (s.e+)

  • lookahead

    • &

    • !

  • ~

*

  • e.g. e*: Match zero or more occurrences of e (i.e. zero or more repetitions)

  • e.g. (e1 e2)*: Match zero or more repetitions of group (e1 e2)

+

  • e.g. e+: Match one or more occurrences of e (i.e. one or more repetitions)

  • e.g. (e1 e2)+: Match one or more repetitions of group (e1 e2)

s.e+

  • Match one or more occurrences of e, separated by s (equivalent to (e (s e)*))

  • e.g. ','.e+

  • one or more occurrences of e, separated by comma: e / e,e / e,e,e / and so on
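
Repetition and join can be sketched in the same spirit. Again this is a toy model with hypothetical helper names (lit, seq, star, plus, join) over string tokens, not the real parser; join is written exactly as the equivalence (e (s e)*) suggests.

```python
def lit(text):
    """'text': match one token equal to `text`."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == text:
            return pos + 1
        return None
    return parse


def seq(*parsers):
    """e1 e2: match every parser, one after another."""
    def parse(tokens, pos):
        for parser in parsers:
            pos = parser(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def star(parser):
    """e*: match zero or more repetitions of e (never fails)."""
    def parse(tokens, pos):
        while True:
            end = parser(tokens, pos)
            # Stop on failure, or on an empty match to avoid looping forever.
            if end is None or end == pos:
                return pos
            pos = end
    return parse


def plus(parser):
    """e+: one e, followed by zero or more e."""
    return seq(parser, star(parser))


def join(separator, parser):
    """s.e+: equivalent to (e (s e)*)."""
    return seq(parser, star(seq(separator, parser)))


e = lit("e")
comma_separated = join(lit(","), e)

print(star(e)([], 0))                               # 0
print(plus(e)([], 0))                               # None
print(plus(e)(["e", "e", "e"], 0))                  # 3
print(comma_separated(["e", ",", "e", ",", "e"], 0))  # 5
# A trailing comma is not part of the match: only the first 'e' is consumed.
print(comma_separated(["e", ","], 0))               # 1
```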

lookahead

  • rule: 'a' 'b' matches tokens 'a' 'b' ... and consumes both

  • lookahead: check the following tokens without consuming any of them

  • 'a' &'b' ('b' is a lookahead) matches 'a' only when the next token is 'b'; 'b' itself is not consumed

&

  • Positive lookahead: Succeed if matched

  • e.g. &e: e is required to match, but is not consumed by the match

  • (The prior example is one of positive lookahead)

!

  • Negative lookahead: Fail if matched

  • e.g. primary: atom !'.' !'(' !'['

    • Given that a is an atom, the rule matches a only when it is not followed by . or ( or [ (i.e. the expression is not a. or a( or a[).
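
Lookahead can be modeled as a parser that checks the upcoming tokens but returns the unchanged position. A toy sketch with hypothetical helper names (lit, seq, ahead, not_ahead) over string tokens; the atom is simplified to the single token 'a'.

```python
def lit(text):
    """'text': match one token equal to `text`."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == text:
            return pos + 1
        return None
    return parse


def seq(*parsers):
    """e1 e2: match every parser, one after another."""
    def parse(tokens, pos):
        for parser in parsers:
            pos = parser(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def ahead(parser):
    """&e: succeed if e matches here, without consuming any tokens."""
    def parse(tokens, pos):
        return pos if parser(tokens, pos) is not None else None
    return parse


def not_ahead(parser):
    """!e: succeed if e does NOT match here, without consuming any tokens."""
    def parse(tokens, pos):
        return pos if parser(tokens, pos) is None else None
    return parse


# 'a' &'b': only 'a' is consumed, and only when 'b' follows.
a_before_b = seq(lit("a"), ahead(lit("b")))
print(a_before_b(["a", "b"], 0))  # 1
print(a_before_b(["a", "c"], 0))  # None

# primary: atom !'.' !'(' !'['   (atom simplified to the token 'a')
primary = seq(lit("a"), not_ahead(lit(".")), not_ahead(lit("(")), not_ahead(lit("[")))
print(primary(["a"], 0))          # 1
print(primary(["a", "+"], 0))     # 1
print(primary(["a", "."], 0))     # None
```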

~

  • Commit

  • e.g. rule_name: '(' ~ some_rule ')' | some_alt

    • If '(' matches but the rest of '(' some_rule ')' does not, some_alt is not considered

    • After a commit, the remaining alternatives of the ordered choice are not tried.

Understand how to read symbols🙌

  • literals

  • whitespace

  • |

  • ()

  • [] OR ?

Understand how to read symbols🙌

  • *

  • +

    • join (s.e+)

  • lookahead

    • &

    • !

  • ~

Broader world of PEG!

Short summary🥟: Define statements with PEG

  • Introduced how to read each symbol in PEG (i.e. its meaning).

  • Ready to read the definitions of compound statements of Python.

Menu: Revisit Python from statements and PEG

  1. ✅ Statements in Python

  2. ✅ Define statements with PEG

  3. Read PEG together

Part III. Read PEG together

Read syntax definitions

Our reading list!

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

The Language Reference includes other compound statements as well.

Assumptions of reading PEG

  • Program has been tokenized (lexical analysis)! We have a sequence of tokens.

  • Parser checks that the sequence of tokens matches the rules.

  • i.e. you won't see syntax errors if you write according to the definitions.

Our reading list!

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

if statement

The if statement is used for conditional execution:

8.1. The if statement

How would you explain the syntax of an if statement (without PEG)?

Think about it for a minute.

nikkie's choice: Enumerate cases

  • else presence / absence

    • if ...

    • if ... else ...

Hard to check for omissions in enumeration😫

  • Number of elif (0 / 1 / multiple)

    • if ... elif ...

    • if ... elif ... elif ...

    • if ... elif ... else ...

    • if ... elif ... elif ... else ...

Syntax of if statement by PEG

if_stmt:
    | 'if' named_expression ':' block elif_stmt
    | 'if' named_expression ':' block [else_block]
elif_stmt:
    | 'elif' named_expression ':' block elif_stmt
    | 'elif' named_expression ':' block [else_block]
else_block:
    | 'else' ':' block

What is block?

block:
    | NEWLINE INDENT statements DEDENT
    | simple_stmts

Elements of block

statements: statement+
statement: compound_stmt | simple_stmts
# In brief, simple_stmts are multiple simple_stmt

block (equivalent of suite)

# indented multiple (simple or compound) statements with newline and dedent,
# OR multiple simple statements without newline
block:
    | NEWLINE INDENT statements DEDENT
    | simple_stmts
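
NEWLINE, INDENT and DEDENT in the first alternative are real tokens produced by lexical analysis. A small sketch with the tokenize module that compares the two forms of a block; token_names is a hypothetical helper.

```python
import io
import tokenize


def token_names(source: str) -> list:
    """Return the token type names produced for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[token.type] for token in tokenize.generate_tokens(readline)]


# First alternative: NEWLINE INDENT statements DEDENT
indented = "if ok:\n    x = 1\n    y = 2\n"
# Second alternative: simple_stmts on the header line
same_line = "if ok: x = 1; y = 2\n"

print("INDENT" in token_names(indented))   # True
print("DEDENT" in token_names(indented))   # True
print("INDENT" in token_names(same_line))  # False
```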

Taste if statement 1/2

if_stmt:
    | 'if' named_expression ':' block elif_stmt
    | 'if' named_expression ':' block [else_block]
  • Required: Clause with a header starting with the keyword if and following block

  • View point: presence or absence of elif after the if

Taste if statement 2/2

elif_stmt:
    | 'elif' named_expression ':' block elif_stmt
    | 'elif' named_expression ':' block [else_block]
  • View point: presence or absence of elif after the elif
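
The recursion in elif_stmt has a counterpart in the abstract syntax: the ast module stores each elif as an If node nested in the orelse of the previous one. A sketch that walks such a chain; the variable names a, b, c and x are placeholders for illustration.

```python
import ast

source = '''\
if a:
    x = 1
elif b:
    x = 2
elif c:
    x = 3
else:
    x = 4
'''

node = ast.parse(source).body[0]
conditions = []
while isinstance(node, ast.If):
    conditions.append(node.test.id)
    # An elif is stored as a single nested If inside orelse.
    if len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If):
        node = node.orelse[0]
    else:
        break

print(conditions)                     # ['a', 'b', 'c']
# The innermost orelse holds the suite of the else clause.
print(type(node.orelse[0]).__name__)  # Assign
```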

if statement expressed in PEG

  • if clause is required.

  • View point of presence or absence of elif!

  • else clause is optional ([]).

  • No omissions!
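
The enumerated cases can be checked against the parser with the built-in compile(). A minimal sketch; is_valid is a hypothetical helper, and the last entry is a deliberately wrong ordering that the grammar does not allow.

```python
def is_valid(source: str) -> bool:
    """Return True if `source` compiles without a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


forms = {
    "if": "if a:\n    pass\n",
    "if else": "if a:\n    pass\nelse:\n    pass\n",
    "if elif": "if a:\n    pass\nelif b:\n    pass\n",
    "if elif elif else": (
        "if a:\n    pass\nelif b:\n    pass\nelif c:\n    pass\nelse:\n    pass\n"
    ),
    # else_block can only come last, so elif after else is rejected.
    "if else elif": "if a:\n    pass\nelse:\n    pass\nelif b:\n    pass\n",
}

results = {name: is_valid(source) for name, source in forms.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```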

Detour: named_expression

named_expression appears in the header of if and elif clauses.

named_expression

# Assignment expression OR (non-assignment) expression
named_expression:
    | assignment_expression
    | expression !':='

assignment_expression:
    | NAME ':=' ~ expression

Read definition of named_expression

  • The assignment expression := was a big change; it affected the syntax of control flow.

  • Note: appears in while statement (not only in if statement)
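
Because the headers take a named_expression, := can be written there directly. A small sketch (Python 3.8 or later); data, message, stream and lines are names made up for the example.

```python
import io

data = [3, 1, 2]

# if header: 'if' named_expression ':' block
if n := len(data):
    message = f"{n} items"
else:
    message = "empty"
print(message)  # 3 items

# while header: 'while' named_expression ':' block
stream = io.StringIO("first\nsecond\n")
lines = []
while line := stream.readline():  # readline() returns "" at the end
    lines.append(line.rstrip("\n"))
print(lines)    # ['first', 'second']
```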

Our reading list!

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

while statement

The while statement is used for repeated execution as long as an expression is true:

8.2. The while statement

Syntax of while statement

while_stmt:
    | 'while' named_expression ':' block [else_block]

Taste while statement

while_stmt:
    | 'while' named_expression ':' block [else_block]
  • Required: Clause with a header starting with the keyword while and following block

  • Optional else_block

while else

if the expression is false (which may be the first time it is tested) the suite of the else clause, if present, is executed and the loop terminates.

A break statement executed in the first suite terminates the loop without executing the else clause’s suite.

🚨 The syntax supports it, but I recommend not using else_block after loops

Item 9: Avoid else Blocks After for and while Loops

Effective Python Second Edition

Because they are easily misunderstood
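
To see the behavior quoted from the reference (and why it is easy to misread), here is a small sketch. The function search and its events list are hypothetical, written only to record which path was taken.

```python
def search(limit: int, target: int) -> list:
    """Count up to `limit`, breaking when `target` is reached."""
    events = []
    n = 0
    while n < limit:
        if n == target:
            events.append("break")
            break
        n += 1
    else:
        # Runs when the condition becomes false, NOT when break was executed.
        events.append("else")
    return events


print(search(3, 1))   # ['break']
print(search(3, 10))  # ['else']
# The condition is false the first time it is tested: else still runs.
print(search(0, 0))   # ['else']
```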

Our reading list!

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

for statement

The for statement is used to iterate over the elements of a sequence (such as a string, tuple or list) or other iterable object:

8.3. The for statement

Syntax of for statement

for_stmt:
    | 'for' star_targets 'in' ~ star_expressions ':' [TYPE_COMMENT] block [else_block]
    | ASYNC 'for' star_targets 'in' ~ star_expressions ':' [TYPE_COMMENT] block [else_block]

Taste for statement (out of scope: async for)

for_stmt:
    | 'for' star_targets 'in' ~ star_expressions ':' [TYPE_COMMENT] block [else_block]
  • Required: Clause with a header starting with the keyword for and containing in, and following block

  • Optional: TYPE_COMMENT and else_block

type comment of for statement

for x, y in points:  # type: float, float
    # Here x and y are floats
    ...

https://peps.python.org/pep-0484/#type-comments
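
The parser only keeps type comments when asked to. A sketch using ast.parse with type_comments=True (available since Python 3.8); the loop body is replaced by pass for brevity.

```python
import ast

source = "for x, y in points:  # type: float, float\n    pass\n"

# With type_comments=True, the TYPE_COMMENT token is stored on the For node.
with_comments = ast.parse(source, type_comments=True).body[0]
print(with_comments.type_comment)     # float, float

# By default the comment is ignored like any other comment.
without_comments = ast.parse(source).body[0]
print(without_comments.type_comment)  # None
```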

for else (not recommended)

When the items are exhausted ([...]), the suite in the else clause, if present, is executed, and the loop terminates.

A break statement executed in the first suite terminates the loop without executing the else clause’s suite.
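
A sketch of for else next to one possible alternative without else (a helper function that returns early). Both function names are hypothetical, and the early-return style is just one way to avoid the else clause.

```python
def first_even_with_else(numbers):
    for number in numbers:
        if number % 2 == 0:
            result = f"found {number}"
            break
    else:
        # Runs only when the loop finished without break (items exhausted).
        result = "no even number"
    return result


def first_even_without_else(numbers):
    for number in numbers:
        if number % 2 == 0:
            return f"found {number}"  # early return instead of break
    return "no even number"


for sample in ([1, 3, 4], [1, 3], []):
    print(first_even_with_else(sample), "/", first_even_without_else(sample))
```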

Our reading list!

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

Function definitions

A function definition defines a user-defined function object

8.7. Function definitions

Syntax of function definitions (def)

function_def:
    | decorators function_def_raw
    | function_def_raw
function_def_raw:
    | 'def' NAME '(' [params] ')' ['->' expression ] ':' [func_type_comment] block
    | ASYNC 'def' NAME '(' [params] ')' ['->' expression ] ':' [func_type_comment] block
func_type_comment:
    | NEWLINE TYPE_COMMENT &(NEWLINE INDENT)
    | TYPE_COMMENT

Taste function definitions 1/2

function_def:
    | decorators function_def_raw
    | function_def_raw
  • With one or more decorators OR without any

Taste function definitions 2/2 (out of scope: async def)

function_def_raw:
    | 'def' NAME '(' [params] ')' ['->' expression ] ':' [func_type_comment] block
  • Required: Clause with a header starting with the keyword def and containing function NAME and (), and following block

  • params in () and the return type hint '->' expression are optional
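
The optional parts show up as empty or None fields in the AST. A small sketch with the ast module; the functions greet and no_extras are made up for illustration and are only parsed, never executed.

```python
import ast

source = '''\
@staticmethod
def greet(name: str = "mario") -> str:
    return f"It's me, {name}!"

def no_extras():
    pass
'''

greet, no_extras = ast.parse(source).body

# decorators function_def_raw, with params and '->' expression
print(len(greet.decorator_list))  # 1
print(len(greet.args.args))       # 1
print(greet.returns.id)           # str

# function_def_raw only: no decorators, no params, no '->' expression
print(no_extras.decorator_list)   # []
print(no_extras.args.args)        # []
print(no_extras.returns)          # None
```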

func_type_comment is also optional

def embezzle(self, account, funds=1000000, *fake_receipts):
    # type: (str, int, *str) -> None
    """Embezzle funds from account using fake receipts."""
    <code goes here>

Suggested syntax for Python 2.7 and straddling code (PEP 484)

Our reading list!

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

match statement

The match statement is used for pattern matching.

8.6. The match statement

Since Python 3.10 (2021/10 release)

match subject:
    case <pattern_1>:
        <action_1>
    case <pattern_2>:
        <action_2>
    case <pattern_3>:
        <action_3>
    case _:
        <action_wildcard>

ref: What's New In Python 3.10

Example of match statement

def fizzbuzz(number):
    match number % 3, number % 5:
        case 0, 0: return "FizzBuzz"
        case 0, _: return "Fizz"
        case _, 0: return "Buzz"
        case _, _: return str(number)

Syntax of match statement

match_stmt:
    | "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
    | star_named_expression ',' star_named_expressions?
    | named_expression
case_block:
    | "case" patterns guard? ':' block
guard: 'if' named_expression

Taste match statement 1/4

match_stmt:
    | "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
  • Required: a header starting with the soft keyword match

  • One or more case_block

Taste match statement 2/4

match_stmt:
    | "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
  • Required: NEWLINE and INDENT

  • A case block cannot be written on the match line (in contrast, e.g. an if header and its block may share a line)

Taste match statement 3/4 (case_block)

case_block:
    | "case" patterns guard? ':' block
  • Required: a header starting with the soft keyword case

  • block follows (it can continue on the same line as simple statements separated by semicolons, or be indented on the following lines)

Taste match statement 4/4 (guard)

case_block:
    | "case" patterns guard? ':' block
guard: 'if' named_expression
  • Forms part of the header starting with case.

  • But optional (?).

Example of guard

>>> flag = False
>>> match (100, 200):
...    case (100, 200) if flag:  # Match succeeds, but guard fails
...        print("Case 2")
...    case (100, y):  # Match! (200 is assigned to y)
...        print(f"Case 3, y: {y}")
...
Case 3, y: 200

ref: 8.6. The match statement

Finished! 🙌

compound_stmt:
    | function_def
    | if_stmt
    | for_stmt
    | while_stmt
    | match_stmt

Short summary🥟: Read PEG together

  • Tasted definitions of compound statements written in PEG

  • simple view point

  • AND expressed concisely without omissions

Menu: Revisit Python from statements and PEG

  1. ✅ Statements in Python

  2. ✅ Define statements with PEG

  3. ✅ Read PEG together

Summary🌯: Revisit Python from statements and PEG

  • Focus on appearance of compound statements of Python (concrete syntax).

  • Learned how to read PEG and tasted the syntax of compound statements.

Summary🌯: Revisit Python from statements and PEG

  • Compound statements consist of headers and suites

  • Colons and indents tell the parser the elements of statements

Summary🌯: Revisit Python from statements and PEG

  • Taste PEG's concise expression without omissions

  • Language Reference is difficult but exciting!

Thank you very much for your attention.

Enjoy development with Python!

EOF