Jedi Development

Note

This documentation is for Jedi developers who want to improve Jedi itself, but have no idea how Jedi works. If you want to use Jedi in your IDE, look at the plugin API instead. It is also important to note that parts of this documentation are quite old and might no longer apply.

Introduction

This page tries to address the fundamental demand for documentation of the Jedi internals. Understanding a dynamic language is a complex task, especially because type inference in Python can be highly recursive. Therefore Jedi couldn't get rid of complexity. I know that simple is better than complex, but unfortunately understanding complex systems sometimes requires complex solutions.

In six chapters I’m trying to describe the internals of Jedi:

Note

Testing is not documented here; it is covered in its own section.

The Jedi Core

The core of Jedi consists of three parts:

Most people are probably interested in type inference, because that’s where all the magic happens. I need to introduce the parser first, because jedi.inference uses it extensively.

Parser

Jedi used to have its own internal parser; this is now a separate project called parso.

The parser creates a syntax tree that Jedi analyses and tries to understand. The grammar that this parser uses is very similar to the official Python grammar files.

Type inference of python code (inference/__init__.py)

Type inference of Python code in Jedi is based on three assumptions:

  • The code uses as few side effects as possible. Jedi understands certain list/tuple/set modifications, but there's no guarantee that Jedi detects everything (for example, list.append calls in different modules).
  • No magic is being used:
    • metaclasses
    • setattr() / __import__()
    • writing to globals(), locals(), object.__dict__
  • The programmer is not writing deliberately pathological code :-)
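
The "magic" patterns above all work fine at runtime but defeat static analysis, because the names involved only come into existence dynamically. A small illustration (plain Python, nothing Jedi-specific):

```python
# Each construct below runs fine, but a static analyzer cannot follow it,
# because the attribute/name/module only exists at runtime.

class Config:
    pass

cfg = Config()
setattr(cfg, "port", 8080)       # attribute created via setattr()
print(cfg.port)                  # "port" never appears as a plain assignment

globals()["answer"] = 42         # writing to globals()
print(answer)                    # "answer" has no statically visible definition

json_mod = __import__("js" + "on")  # __import__ with a computed module name
print(json_mod.loads("1"))          # the module name is invisible statically
```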

The actual algorithm is based on a principle I call lazy type inference. That said, the typical entry point for static analysis is calling infer_expr_stmt. There's separate logic for autocompletion in the API; the inference_state is all about inferring an expression.

TODO: the following paragraph is not exactly what Jedi does anymore; it's similar, but not the same.

Now you need to understand what happens after infer_expr_stmt. Let's look at an example:

import datetime
datetime.date.toda  # <-- cursor here

First of all, this module doesn’t care about completion. It really just cares about datetime.date. At the end of the procedure infer_expr_stmt will return the date class.

To visualize this (simplified):

  • InferenceState.infer_expr_stmt doesn't do much, because there's no assignment.
  • Context.infer_node takes care of resolving the dotted path.
  • InferenceState.find_types searches for global definitions of datetime, which it finds in the definition of an import, by scanning the syntax tree.
  • Using the import logic, the datetime module is found.
  • Now find_types is called again by infer_node to find date inside the datetime module.

Now what would happen if we wanted datetime.date.foo.bar? Two more calls to find_types. However, the second call would find nothing, because the first one returns nothing (there's no foo attribute in date).
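
The short-circuiting described above can be sketched as a toy resolver: each step maps a set of values to the set of attributes with a given name, so an empty set simply propagates through every later step. This is illustrative code, not Jedi's actual classes:

```python
import datetime

def find_types(values, name):
    """Toy find_types: collect the attribute `name` from every value."""
    result = set()
    for value in values:
        if hasattr(value, name):
            result.add(getattr(value, name))
    return result

values = {datetime}
values = find_types(values, "date")  # {datetime.date}
values = find_types(values, "foo")   # empty: date has no attribute foo
values = find_types(values, "bar")   # still empty: nothing left to search
print(values)  # set()
```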

What if the import contained another ExprStmt like this:

from foo import bar
Date = bar.baz

Well… You get it. Just another infer_expr_stmt recursion. It's really easy. Python can obviously get way more complicated than this. To understand tuple assignments, list comprehensions and everything else, a lot more code had to be written.

Jedi has been tested very well, so you can just start modifying code. It's best to first write your own test for your "new" feature. Don't be scared of breaking stuff. As long as the tests pass, you're most likely fine.

I need to mention now that lazy type inference is really good because it only infers what needs to be inferred. All statements and modules that are not used are simply ignored.

Inference Values (inference/base_value.py)

Values are the “values” that Python would return at runtime. At the same time, Values also represent the “value” (for example a module or class) that the user's cursor is currently in.

A ValueSet is typically used to specify the return of a function or any other static analysis operation. In Jedi, inference always yields a set of possible values, never just a single one.
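
A minimal sketch of the idea (toy code; the real ValueSet in jedi.inference.base_value has a much richer API):

```python
class ValueSet:
    """Toy ValueSet: every inference operation yields a set of values."""

    def __init__(self, values=()):
        self._set = frozenset(values)

    def __or__(self, other):
        return ValueSet(self._set | other._set)

    def __iter__(self):
        return iter(self._set)

    def __len__(self):
        return len(self._set)

    def py__getattribute__(self, name):
        # Attribute access on an ambiguous set resolves the name on every
        # member and unions the results; failures just contribute nothing.
        result = ValueSet()
        for value in self:
            result = result | value.py__getattribute__(name)
        return result


class Value:
    """Toy inference value with a static attribute table."""

    def __init__(self, name, attributes=None):
        self.name = name
        self._attributes = attributes or {}

    def py__getattribute__(self, name):
        if name in self._attributes:
            return ValueSet([self._attributes[name]])
        return ValueSet()  # unknown attribute: empty set, not an error


upper = Value("str.upper")
bit_length = Value("int.bit_length")
ambiguous = ValueSet([Value("str", {"upper": upper}),
                      Value("int", {"bit_length": bit_length})])
# Only the str value contributes to the result:
assert set(ambiguous.py__getattribute__("upper")) == {upper}
```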

Inheritance diagram of jedi.inference.value.instance.TreeInstance, jedi.inference.value.klass.ClassValue, jedi.inference.value.function.FunctionValue, jedi.inference.value.function.FunctionExecutionContext

Name resolution (inference/finder.py)

Searching for names within a given scope. This is central to both Jedi and Python. Name resolution is quite complicated, involving descriptors, __getattribute__, __getattr__, global, etc.

If you want to understand name resolution, please read the first few chapters in http://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/.

Flow checks

Flow checks are not really mature. There's only a check for isinstance: it checks whether a flow has the form if isinstance(a, type_or_tuple). Unfortunately everything else is ignored (e.g. a == '' would be easy to check for: it implies a is a string). There's big potential in these checks.
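
The isinstance check can be sketched with the stdlib ast module: recognize whether an if statement's test has the shape isinstance(name, type_or_tuple). This is a simplified stand-in; Jedi's real check operates on parso trees:

```python
import ast

def isinstance_flow_check(if_node):
    """Return (name, checked_type) if the test is `isinstance(name, X)`."""
    test = if_node.test
    if (isinstance(test, ast.Call)
            and isinstance(test.func, ast.Name)
            and test.func.id == "isinstance"
            and len(test.args) == 2
            and isinstance(test.args[0], ast.Name)):
        return test.args[0].id, ast.unparse(test.args[1])
    return None

narrowed = ast.parse("if isinstance(a, str):\n    a.upper()")
print(isinstance_flow_check(narrowed.body[0]))  # ('a', 'str')

ignored = ast.parse("if a == '':\n    pass")
print(isinstance_flow_check(ignored.body[0]))  # None: not understood
```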

API (api/__init__.py and api/classes.py)

The API has been designed to be as easy to use as possible. The API documentation can be found here. The API itself contains little code that needs to be mentioned here. Generally I’m trying to be conservative with the API. I’d rather not add new API features if they are not necessary, because it’s much harder to deprecate stuff than to add it later.

Core Extensions

Core Extensions is a summary of the following topics:

These topics are very important for understanding what Jedi additionally does; they could be removed from Jedi and Jedi would still work, just slower and without some features.

Iterables & Dynamic Arrays (inference/value/iterable.py)

To understand Python on a deeper level, Jedi needs to understand some of the dynamic features of Python like lists that are filled after creation:

Contains all classes and functions to deal with lists, dicts, generators and iterators in general.

Parameter completion (inference/dynamic_params.py)

One of the really important features of Jedi is to have an option to understand code like this:

def foo(bar):
    bar. # completion here
foo(1)

Here there's no doubt that bar is an int, but what if there's also a call like foo('str')? Well, we'll just show both, because that's what a human would expect.

It works as follows:

  • Jedi sees a parameter without type information,
  • searches for function calls named foo,
  • executes these calls and checks the passed-in values.
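
The three steps above can be sketched with the stdlib ast module: find all calls to a function by name and collect the types of the literal arguments at a given position. This is a toy version; Jedi does this lazily, across modules, and on parso trees:

```python
import ast

def param_types(source, func_name, param_index=0):
    """Collect type names of literal arguments at `param_index`
    in every call to `func_name`."""
    types = set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name
                and len(node.args) > param_index
                and isinstance(node.args[param_index], ast.Constant)):
            types.add(type(node.args[param_index].value).__name__)
    return types

source = """
def foo(bar):
    bar
foo(1)
foo('str')
"""
print(param_types(source, "foo"))  # {'int', 'str'} (in some order)
```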

Docstrings (inference/docstrings.py)

Docstrings are another source of information for functions and classes. jedi.inference.dynamic_params has to find all executions of a function, while docstring parsing is much easier. There are three different types of docstrings that Jedi understands:

  • Sphinx
  • Epydoc
  • Numpydoc

For example, the Sphinx annotation :type foo: str clearly states that the type of foo is str.
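
Extracting that annotation can be sketched with a regular expression (simplified; this only covers the Sphinx style):

```python
import re

def find_sphinx_param_type(docstring, param_name):
    """Return the declared type from a `:type name: X` line, if any."""
    pattern = r":type\s+%s:\s*([^\n]+)" % re.escape(param_name)
    match = re.search(pattern, docstring)
    return match.group(1).strip() if match else None

doc = """Frobnicate a value.

:param foo: the value to frobnicate
:type foo: str
"""
print(find_sphinx_param_type(doc, "foo"))  # str
print(find_sphinx_param_type(doc, "bar"))  # None
```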

As an addition to parameter searching, this module also provides return annotations.

Refactoring (api/refactoring.py)

Imports & Modules

Compiled Modules (inference/compiled.py)

Imports (inference/imports.py)

jedi.inference.imports is here to resolve import statements and return the modules/classes/functions/whatever they stand for. However, no actual importing is done. This module is about finding modules in the filesystem, which can be quite tricky sometimes, because Python imports are not always that simple.

This module uses imp for Python up to 3.2 and importlib for Python 3.3 onwards; the correct implementation is delegated to _compatibility.

This module also supports import autocompletion, which means completing statements like from datetim (with the cursor at the end, this would return datetime).
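
The prefix matching behind this can be sketched with the stdlib pkgutil and sys modules (a simplification; the real logic also handles packages, relative imports and namespace packages):

```python
import pkgutil
import sys

def complete_module_names(prefix):
    """Return top-level importable module names starting with `prefix`."""
    names = set(sys.builtin_module_names)
    names.update(info.name for info in pkgutil.iter_modules())
    return sorted(name for name in names if name.startswith(prefix))

print(complete_module_names("datetim"))  # ['datetime'] on a standard install
```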

Stubs & Annotations (inference/gradual)

It is unfortunately not well documented how stubs and annotations work in Jedi. If somebody needs an introduction, please let me know.

Caching & Recursions

Caching (cache.py)

This caching is very important for speed and memory optimizations. There’s nothing really spectacular, just some decorators. The following cache types are available:

  • time_cache can be used to cache something for just a limited time span, which can be useful if there’s user interaction and the user cannot react faster than a certain time.
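
A minimal sketch of what such a time-based cache decorator might look like (toy code, not Jedi's actual implementation):

```python
import functools
import time

def time_cache(seconds):
    """Cache a function's result per argument tuple for `seconds`."""
    def decorator(func):
        cache = {}  # shared mutable state like this is why it isn't thread-safe

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                stored_at, result = cache[args]
                if now - stored_at < seconds:
                    return result
            result = func(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

calls = []

@time_cache(seconds=10)
def expensive(x):
    calls.append(x)
    return x * 2

expensive(3)
expensive(3)
print(calls)  # [3]: the second call within 10 seconds hit the cache
```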

This module is one of the reasons why Jedi is not thread-safe. As you can see, there are global variables holding the cache information. Some of these variables are cleared after every API usage.

Recursions (recursion.py)

Recursion is Jedi's recipe for conquering Python code. However, something must stop recursion from going mad. A few settings are here to make Jedi stop at the right time. You can read more about them here.

Besides the internal jedi.inference.cache, this module is another reason why Jedi is not thread-safe, because execution_recursion_decorator uses class variables to count function calls.

Settings

Recursion settings are important if you don't want extremely recursive Python code to go absolutely crazy.

The default values are based on experiments while completing the Jedi library itself (inception!). But I don’t think there’s any other Python library that uses recursion in a similarly extreme way. Completion should also be fast and therefore the quality might not always be maximal.

jedi.inference.recursion.recursion_limit = 15

Like sys.getrecursionlimit(), just for Jedi.

jedi.inference.recursion.total_function_execution_limit = 200

This is a hard limit of how many non-builtin functions can be executed.

jedi.inference.recursion.per_function_execution_limit = 6

The maximal amount of times a specific function may be executed.

jedi.inference.recursion.per_function_recursion_limit = 2

A function may not be executed more than this number of times recursively.
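
The effect of such a limit can be sketched with a counting decorator (toy code; the real execution_recursion_decorator in jedi.inference.recursion tracks considerably more state):

```python
import functools

def execution_limit(limit):
    """Give up on a function after it has been entered `limit` times."""
    def decorator(func):
        counts = {"n": 0}  # shared counters like this also break thread-safety

        @functools.wraps(func)
        def wrapper(*args):
            counts["n"] += 1
            if counts["n"] > limit:
                return set()  # give up: pretend nothing could be inferred
            return func(*args)
        return wrapper
    return decorator

@execution_limit(6)
def infer(x):
    return {x}

results = [infer(i) for i in range(10)]
print(results)  # the first six succeed, the rest are empty sets
```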

Helper Modules

Most other modules are not really central to how Jedi works. They all contain relevant code, but if you understand the modules above, you pretty much understand Jedi.