
How to query StructureData nodes

In the following, we first present how StructureData nodes are stored in the AiiDA database and then how to use the QueryBuilder to find them.

How properties are stored in the AiiDA database

In the AiiDA database, we store only the properties whose values differ from their defaults. For example, if all the charges are set to zero, the charges property will not be stored in the database. This means that the only structures in the database that contain a charges entry are those with at least one charge different from zero (NB: the system can still be neutral overall).

For a given structure, you can print the list of all the properties stored in the database (and therefore queryable) by calling the get_defined_properties method with exclude_computed=False (i.e. also returning the properties which are computed/derived from the user-defined ones):

from aiida import load_profile
from aiida_atomistic import StructureData

load_profile()

structure = StructureData(**{
    'pbc': [True, True, True],
    'cell': [[2.75, 2.75, 0.0], [0.0, 2.75, 2.75], [2.75, 0.0, 2.75]],
    'symbols': ['Si', 'Si'],
    'charges': [0.0, 0.0],
    'kinds': ['Si', 'Si'],
    'positions': [
        [0.0, 0.0, 0.0],
        [3.84, 1.3576450198781713, 1.9200]
    ],
})

print("Properties stored in the database for this StructureData are:")
print(structure.get_defined_properties(exclude_computed=False))
Properties stored in the database for this StructureData are:
{'cell', 'has_vacancies', 'formula', 'symbols', 'positions', 'cell_volume', 'is_alloy', 'kinds', 'dimensionality', 'sites', 'masses'}
/opt/conda/lib/python3.10/site-packages/pydantic/main.py:390: UserWarning: Pydantic serializer warnings:
  Expected `dict[any, any]` but got `bool` with value `False` - serialized value may not be as expected
  return self.__pydantic_serializer__.to_python(

The only exception is the sites property, which appears in this list but is not stored in the database: it is computed on the fly and does not contain any additional information that needs to be stored.

How to query StructureData using properties

In the following we present how to perform advanced queries on your database to retrieve StructureData with given properties. The results of the queries will be different if you try them on your local AiiDA installation, because every database has different content.

Exhaustive AiiDA documentation on how to find and query data can be found in the official docs page. For a full list of filters that can be applied to queries, we refer to this table.

Thanks to the additional computed properties in our StructureData (formula, symbols, kinds, masses, charges, magmoms, positions, cell_volume, dimensionality and so on), we can easily query for a structure which satisfies a specific set of requirements. The full list of queryable properties can be printed as:

StructureData().get_supported_properties()
{'cell', 'cell_charge', 'cell_magmom', 'charges', 'custom', 'hubbard', 'kinds', 'magmoms', 'masses', 'pbc', 'positions', 'symbols', 'weights'}

Simple queries

Let’s start with simple queries. The first and simplest one is to query all the StructureData which are contained in our database:

from aiida import load_profile
from aiida.orm import QueryBuilder
from aiida_atomistic import StructureData

load_profile()

# Querying all structures in the DB
qb = QueryBuilder()
qb.append(StructureData)
print(f"We have {len(qb.all())} StructureData in our AiiDA database!")
We have 245 StructureData in our AiiDA database!

StructureData having only selected properties

This is one of the most important queries that we may want to perform. Suppose you want only the StructureData in which some atom is charged, or for which some magnetic moment is provided. Efficient queries can be done as follows:

# 1. Does a structure have a given property?
#
prop = 'charges'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'has_key': prop}
    },
)
print(f' Number of structures having the property {prop}: {len(qb.all())}')

prop = 'magmoms'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'has_key': prop}
    },
)
print(f' Number of structures having the property {prop}: {len(qb.all())}')

# 2. Does a structure have all the properties of a given set?
#
props = ['charges', 'magmoms']
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'and': [{'has_key': prop} for prop in props]}
    },
)
print(f' Number of structures having the properties {props}: {len(qb.all())}')

# 3. Does a structure have one property but not another one?
#    The count equals the first count (charges) minus the third one (charges and magmoms).
#
props = ['charges', 'magmoms']
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'and': [{'has_key': props[0]}, {'!has_key': props[1]}]}
    },
)
print(f' Number of structures having the property {props[0]} and not the property {props[1]}: {len(qb.all())}')
 Number of structures having the property charges: 19
 Number of structures having the property magmoms: 103
 Number of structures having the properties ['charges', 'magmoms']: 6
 Number of structures having the property charges and not the property magmoms: 13

We can also ask for StructureData that do not have a specific property. To do this, we use the ! negation in front of has_key:

# Structures that do not have a given property.
# Check: this count plus the count of structures having magmoms (above)
# should equal the total number of StructureData in your database.
#
prop = 'magmoms'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'!has_key': prop}
    },
)
print(f' Number of structures not having the property {prop}: {len(qb.all())}')
 Number of structures not having the property magmoms: 142

Is the sum of the StructureData found by this query and by the one searching for magmoms equal to the total number of StructureData in the database? Yes: 142 + 103 = 245, which is the number returned by the very first query of this section.
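The has_key and !has_key conditions used above can be assembled programmatically. The following is a minimal sketch in plain Python: the helper name property_filters is hypothetical, and it only builds the filter dictionary (it does not run any query), so it can be tried without a database.

```python
def property_filters(required=(), forbidden=()):
    """Build a QueryBuilder-style filter on attribute keys (hypothetical helper).

    required:  property names that must be stored in the node attributes
    forbidden: property names that must NOT be stored
    """
    conditions = [{'has_key': name} for name in required]
    conditions += [{'!has_key': name} for name in forbidden]
    if not conditions:
        raise ValueError('give at least one required or forbidden property')
    return {'attributes': {'and': conditions}}


# Same condition as the "charges but not magmoms" query above.
filters = property_filters(required=['charges'], forbidden=['magmoms'])
print(filters)

# Intended use, with a loaded AiiDA profile:
#   qb = QueryBuilder()
#   qb.append(StructureData, filters=filters)
```

The dictionary returned here has the same shape as the one written by hand in the third query of this section.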

A useful feature of the QueryBuilder is the possibility to project only some of the attributes of the queried StructureData:

# Projecting only given properties
#
prop = 'charges'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'has_key': prop}
    },
    project=['attributes.formula', 'attributes.' + prop, 'id'],
)
# Run the query once and unpack the last row (same order as in `project`)
formula, charges, pk = qb.all()[-1]
print(f'Formula: {formula}\nCharges : {charges}\npk: {pk}')
Formula: Si2
Charges : [1.0, 0.0]
pk: 548
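Each projected row is a plain list whose entries follow the order of the project argument. As a small sketch in plain Python (the helper name rows_to_records is hypothetical, and the sample row simply copies the values printed above), the rows can be paired with the projection names so that values are read by name instead of by index:

```python
def rows_to_records(projections, rows):
    """Pair every projected row with the projection names (hypothetical helper)."""
    return [dict(zip(projections, row)) for row in rows]


# Same projections as in the query above; the single row stands in for
# what qb.all() returned there (values copied from the printed output).
projections = ['attributes.formula', 'attributes.charges', 'id']
rows = [['Si2', [1.0, 0.0], 548]]

records = rows_to_records(projections, rows)
last = records[-1]
print(f"Formula: {last['attributes.formula']}, pk: {last['id']}")
```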

Structures with only given number of atoms

We may want to select only small systems, say with fewer than six atoms, or only larger ones. Let’s do it:

# Filter on the number of atoms (or kinds) through the length of the stored list;
# see the 'shorter' and 'longer' operators in the query filter table.
#
# less than 6 atoms:
nr_atoms = 6
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'shorter': nr_atoms}
    },
)
print(f' Number of structures in the DB containing less than {nr_atoms} atoms: {len(qb.all())}')

# more than 5 atoms:
nr_atoms = 5
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'longer': nr_atoms}
    },
)
print(f' Number of structures in the DB containing more than {nr_atoms} atoms: {len(qb.all())}')

# exactly 2 atoms
nr_atoms = 2
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'and': [        # same can be done with 'attributes.kinds'
            {'shorter': nr_atoms+1},
            {'longer': nr_atoms-1},
        ]},
    },
)
print(f' Number of structures in the DB containing exactly {nr_atoms} atoms: {len(qb.all())}')
 Number of structures in the DB containing less than 6 atoms: 203
 Number of structures in the DB containing more than 5 atoms: 5
 Number of structures in the DB containing exactly 2 atoms: 119
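The length conditions above can be generated by a small helper. This is a sketch under assumptions: the helper name atom_count_filter is hypothetical, and the exact-length case relies on the of_length operator that the filter table of the AiiDA documentation lists next to shorter and longer, instead of combining the two bounds. The helper only builds the dictionary and does not query anything.

```python
def atom_count_filter(exactly=None, fewer_than=None, more_than=None):
    """Return a filter on the length of 'attributes.symbols' (hypothetical helper)."""
    conditions = []
    if exactly is not None:
        conditions.append({'of_length': exactly})
    if fewer_than is not None:
        conditions.append({'shorter': fewer_than})
    if more_than is not None:
        conditions.append({'longer': more_than})
    if not conditions:
        raise ValueError('give at least one constraint')
    # A single condition does not need the 'and' wrapper.
    constraint = conditions[0] if len(conditions) == 1 else {'and': conditions}
    return {'attributes.symbols': constraint}


exactly_two = atom_count_filter(exactly=2)
two_to_five = atom_count_filter(more_than=1, fewer_than=6)
print(exactly_two)
print(two_to_five)

# Intended use, with a loaded AiiDA profile:
#   qb = QueryBuilder()
#   qb.append(StructureData, filters=exactly_two)
```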

Alloys and vacancies

We can find alloys and/or structures with vacancies by filtering for attributes.is_alloy and/or attributes.has_vacancies equal to True.
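The filters for these two flags can be sketched as follows. This is only an illustration: the helper name flag_filters is hypothetical, and it assumes that is_alloy and has_vacancies are stored as booleans in the node attributes, as the output of get_defined_properties above suggests. Keys placed in the same filters dictionary are combined with a logical AND.

```python
def flag_filters(is_alloy=None, has_vacancies=None):
    """Build filters on the two boolean flags (hypothetical helper).

    A flag left as None is not constrained at all.
    """
    filters = {}
    if is_alloy is not None:
        filters['attributes.is_alloy'] = is_alloy
    if has_vacancies is not None:
        filters['attributes.has_vacancies'] = has_vacancies
    return filters


alloys = flag_filters(is_alloy=True)
alloys_with_vacancies = flag_filters(is_alloy=True, has_vacancies=True)
print(alloys)
print(alloys_with_vacancies)

# Intended use, with a loaded AiiDA profile:
#   qb = QueryBuilder()
#   qb.append(StructureData, filters=alloys_with_vacancies)
```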

Advanced queries: stoichiometry, binaries, ternaries

We may want to search for a specific system, i.e. with a given formula, or binaries, ternaries, and so on.

Using the chemical formula

# specific formula
#
formula = 'Fe2'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
       'attributes.formula': formula, # same as: 'attributes.formula': {'==': formula}
    },
)
print(f' Number of structures in the DB with formula = {formula}: {len(qb.all())}')
 Number of structures in the DB with formula = Fe2: 12

Looking for only a certain number of atoms of a given element

We can still look at the formula. For structures with more than one atom of the same element, we can proceed as follows:

# certain number of atoms of the same element - still from the formula
#
element = 'Mn'
nr_atoms = 2
if nr_atoms == 1:
    print("You are looking for all the structures having at least one atom of the desired element.")
    nr_atoms = ""
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': {
            'like':f'%{element}{nr_atoms}%',
        },
    },
    project= ['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing {nr_atoms} atoms of element {element}: {len(qb.all())}')
print(f' These are: {[struct[0] for struct in qb.all()]}')
 Number of structures in the DB containing 2 atoms of element Mn: 6
 These are: ['Mn2', 'Mn2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn2O2']

However, this query can also retrieve formulas in which Mn2 is followed by another digit (e.g. Mn20). For a more precise and robust query, we have to post-process the results programmatically, as shown in what follows.

Moreover, in the specific case of exactly one atom of a given element, we cannot build the string f'%{element}{nr_atoms}%', because we need to make sure that no number follows the element symbol in the formula. This can be done by post-processing the results of the query with regular expressions:

import re

# exactly one atom of the given element
# The 'like' filter only pre-selects the formulas containing the element symbol;
# the regex then keeps those where the symbol is not followed by a digit (a count
# larger than one) or by a lowercase letter (a different element, e.g. 'C' in 'Co').
# The negative lookahead also matches when the symbol is at the end of the formula.
#
element = 'Mn'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': {
            'like': f'%{element}%',
        },
    },
    project=['attributes.formula', 'id'],
)

print(f' Number of structures in the DB containing element {element}: {len(qb.all())}')

res = []
for struct in qb.iterall():
    if re.search(f'{element}(?![a-z0-9])', struct[0]):
        print(f' structures in the DB containing exactly one atom of element {element}: {struct[0]}, pk:{struct[1]}')
        res.append(struct)
print(f' Number of structures in the DB containing exactly one atom of element {element}: {len(res)}')
 Number of structures in the DB containing element Mn: 11
 Number of structures in the DB containing exactly one atom of element Mn: 0

Using the same approach, we can repeat the query done above for two Mn atoms:

# exactly two atoms of the same element
#
element = 'Mn'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': {
            'like': f'%{element}2%',
        },
    },
    project=['attributes.formula', 'id'],
)

print(f' Number of structures in the DB containing element {element}: {len(qb.all())}')

res = []
for struct in qb.iterall():
    # keep 'Mn2' only when it is not followed by another digit (e.g. 'Mn20');
    # the negative lookahead also accepts a formula that ends with 'Mn2'
    if re.search(f'{element}2(?![0-9])', struct[0]):
        res.append(struct)
print(f' Number of structures in the DB containing exactly two atoms of element {element}: {len(res)}')
print(f' These are: {[struct[0] for struct in res]}')
 Number of structures in the DB containing element Mn: 6
 Number of structures in the DB containing exactly two atoms of element Mn: 6
 These are: ['Mn2', 'Mn2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn2O2']
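As an alternative to matching the formula string directly, the post-processing step can parse each formula into per-element counts and compare numbers. The sketch below is plain Python and does not need a database. The helper name element_counts is hypothetical, and it assumes formulas of the form seen in this page (a capital letter, optional lowercase letters, an optional integer count, with a missing count meaning one).

```python
import re


def element_counts(formula):
    """Split a formula such as 'Mn6Sn2' into {'Mn': 6, 'Sn': 2} (hypothetical helper)."""
    counts = {}
    for symbol, number in re.findall(r'([A-Z][a-z]*)(\d*)', formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


# Formulas taken from the outputs on this page; in practice they would be
# the first projected column of the query results.
formulas = ['Mn2', 'Mn2O2', 'Mn6Sn2', 'CoLiO2']
with_two_mn = [f for f in formulas if element_counts(f).get('Mn') == 2]
print(with_two_mn)  # ['Mn2', 'Mn2O2']
```

Comparing integer counts avoids the ambiguity between Mn2 and Mn20, and the same helper covers the one-atom case with a count of 1.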

Binaries, ternaries

In these cases too, we use a combination of the QueryBuilder and regex searches:

# binaries
#
number_of_elements = 2 # 2 is binary, 3 is ternary and so on
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'longer': number_of_elements-1} # at least number_of_elements needed in the symbols list
    },
    project= ['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing more than one atom: {len(qb.all())}')

res = []
# the formula must consist of exactly two element tokens, each one being a capital
# letter followed, if needed, by lowercase letters and a number
pattern = '^' + '[A-Z][a-z]*[0-9]*' * number_of_elements + '$'
for struct in qb.iterall():
    if re.search(pattern, struct[0]):
        res.append(struct)
print(f' Number of binaries structures in the DB: {len(res)}')
print(f' These are: {[struct[0] for struct in res]}')

# ternaries
#
number_of_elements = 3 # 2 is binary, 3 is ternary and so on
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'longer': number_of_elements-1} # at least number_of_elements needed in the symbols list
    },
    project= ['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing more than two atoms: {len(qb.all())}')

res = []
# the formula must consist of exactly three element tokens, each one being a capital
# letter followed, if needed, by lowercase letters and a number
pattern = '^' + '[A-Z][a-z]*[0-9]*' * number_of_elements + '$'
for struct in qb.iterall():
    if re.search(pattern, struct[0]):
        res.append(struct)
print(f' Number of ternaries structures in the DB: {len(res)}')
print(f' These are: {[struct[0] for struct in res]}')
 Number of structures in the DB containing more than one atom: 130
 Number of binaries structures in the DB: 9
 These are: ['Mn6Sn2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn6Sn2', 'Mn6Sn2', 'Mn6Sn2', 'Mn6Sn2']
 Number of structures in the DB containing more than two atoms: 11
 Number of ternaries structures in the DB: 2
 These are: ['CoLiO2', 'CoLiO2']
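Another way to classify the results, sketched here in plain Python, is to project attributes.symbols instead of the formula and count the distinct symbols. The helper name number_of_elements_in and the pk values in the sample rows are made up for illustration, and the sketch assumes that every entry of the symbols list is a single chemical symbol.

```python
def number_of_elements_in(symbols):
    """Count the distinct chemical symbols in a symbols list (hypothetical helper)."""
    return len(set(symbols))


# Stand-in for rows projected as ['attributes.symbols', 'id'];
# the pk values are invented for this example.
rows = [
    (['Mn', 'Mn', 'O', 'O'], 1),
    (['Co', 'Li', 'O', 'O'], 2),
    (['Si', 'Si'], 3),
]

binaries = [pk for symbols, pk in rows if number_of_elements_in(symbols) == 2]
ternaries = [pk for symbols, pk in rows if number_of_elements_in(symbols) == 3]
print(f'binaries: {binaries}, ternaries: {ternaries}')
```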