In the following, we first present how StructureData nodes are stored in the AiiDA database and then how to use the QueryBuilder to find them.
How properties are stored in the AiiDA database
In the AiiDA database, we store only the properties whose values differ from the default. For example, if all the charges are set to zero, the charges property is not stored in the database. This means that the only structures in the database that also contain a charges entry are systems with at least one non-zero charge (NB: such systems can still be overall neutral).
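To make the storage rule concrete, here is a minimal, hypothetical sketch in plain Python. It is not the aiida-atomistic implementation. The function name `stored_properties` is illustrative, and the only default it assumes is the zero charge mentioned above. It shows that all-zero charges are dropped, while non-zero charges summing to zero are kept.

```python
# Hypothetical illustration of the "store only non-default values" rule;
# this is NOT the aiida-atomistic implementation.
DEFAULTS = {"charges": 0.0}  # per-site default taken from the text: zero charge


def stored_properties(properties: dict) -> dict:
    """Keep a per-site property only if at least one site differs from the default."""
    kept = {}
    for name, values in properties.items():
        default = DEFAULTS.get(name)
        # Properties without a known default (e.g. symbols) are always kept.
        if default is None or any(value != default for value in values):
            kept[name] = values
    return kept


# All charges are zero: 'charges' is dropped.
print(stored_properties({"symbols": ["Si", "Si"], "charges": [0.0, 0.0]}))
# {'symbols': ['Si', 'Si']}

# Non-zero charges that sum to zero (overall neutral): 'charges' is kept.
print(stored_properties({"symbols": ["Na", "Cl"], "charges": [1.0, -1.0]}))
# {'symbols': ['Na', 'Cl'], 'charges': [1.0, -1.0]}
```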
For a given structure, you can print the list of all the properties stored in the database (and therefore queryable) by calling the get_defined_properties method with exclude_computed=False as input. This also returns the properties that are computed or derived from the user-defined ones:
from aiida import load_profile
from aiida_atomistic import StructureData
load_profile()
structure = StructureData(**{
    'pbc': [True, True, True],
    'cell': [[2.75, 2.75, 0.0], [0.0, 2.75, 2.75], [2.75, 0.0, 2.75]],
    'symbols': ['Si', 'Si'],
    'charges': [0.0, 0.0],
    'kinds': ['Si', 'Si'],
    'positions': [
        [0.0, 0.0, 0.0],
        [3.84, 1.3576450198781713, 1.9200]
    ],
})
print("Properties stored in the database for this StructureData are:")
print(structure.get_defined_properties(exclude_computed=False))
Properties stored in the database for this StructureData are:
{'cell', 'has_vacancies', 'formula', 'symbols', 'positions', 'cell_volume', 'is_alloy', 'kinds', 'dimensionality', 'sites', 'masses'}
The sites property is an exception. It appears in this list but is not stored in the database, because it is computed on the fly and does not contain any additional information that needs to be stored.
Note
To see explicitly how the data are stored and represented in the database, you can inspect the structure.base.attributes.all dictionary:
print(structure.base.attributes.all.keys())
returns:
dict_keys(['cell', 'symbols', 'positions', 'kinds', 'masses', 'cell_volume', 'dimensionality', 'formula'])
As you can see, no sites entry is present.
How to query StructureData using properties
In the following, we show how to perform advanced queries on your database to retrieve StructureData nodes with given properties. If you run these queries on your local AiiDA installation, the results will differ from the ones shown here, because the databases are different.
Exhaustive documentation on how to find and query data in AiiDA is available on the official docs page. For the full list of filters that can be applied to queries, we refer to this table.
Our StructureData stores both user-defined and computed properties (formula, symbols, kinds, masses, charges, magmoms, positions, cell_volume, dimensionality and so on). Thanks to this, we can easily query for structures that satisfy a specific set of requirements. The full list of supported properties can be printed as follows. Computed properties such as formula and cell_volume are not part of this list, although they can be queried as well:
StructureData().get_supported_properties()
{'cell',
'cell_charge',
'cell_magmom',
'charges',
'custom',
'hubbard',
'kinds',
'magmoms',
'masses',
'pbc',
'positions',
'symbols',
'weights'}
Simple queries
Let’s start with simple queries. The first and simplest one retrieves all the StructureData nodes contained in our database:
from aiida import load_profile
from aiida.orm import QueryBuilder
from aiida_atomistic import StructureData
load_profile()
# Querying all structures in the DB
qb = QueryBuilder()
qb.append(StructureData)
print(f"We have {len(qb.all())} StructureData in our AiiDA database!")
We have 245 StructureData in our AiiDA database!
StructureData having only selected properties
This is one of the most important queries we may want to perform. Suppose you want only the StructureData nodes where some atom is charged, or where magnetic moments are provided.
Such queries can be done efficiently as follows:
# 1. Do the structures have a given property?
prop = 'charges'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'has_key': prop}
    },
)
print(f' Number of structures having the property {prop}: {len(qb.all())}')
prop = 'magmoms'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'has_key': prop}
    },
)
print(f' Number of structures having the property {prop}: {len(qb.all())}')
# 2. Do the structures have a given set of properties?
props = ['charges', 'magmoms']
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'and': [{'has_key': prop} for prop in props]}
    },
)
print(f' Number of structures having the properties {props}: {len(qb.all())}')
# 3. Do the structures have a given property but not another one?
props = ['charges', 'magmoms']
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'and': [{'has_key': props[0]}, {'!has_key': props[1]}]}
    },
)
# Expected: (structures with charges) - (structures with both charges and magmoms)
print(f' Number of structures having the property {props[0]} and not the property {props[1]}: {len(qb.all())}')
Number of structures having the property charges: 19
Number of structures having the property magmoms: 103
Number of structures having the properties ['charges', 'magmoms']: 6
Number of structures having the property charges and not the property magmoms: 13
We can also ask for StructureData nodes that do not have a specific property. To do this, we put the ! negation in front of has_key:
# Structures that do NOT have a given property
# (added to the 'magmoms' query above, this should give the total number of StructureData in the DB)
prop = 'magmoms'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'!has_key': prop}
    },
)
print(f' Number of structures not having the property {prop}: {len(qb.all())}')
Number of structures not having the property magmoms: 142
Is the sum of the StructureData returned by this query and by the one searching for magmoms equal to the total number of StructureData in the database? Yes: 103 + 142 = 245, which is exactly the total found by the first query of this chapter.
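The two identities used above follow directly from how has_key and !has_key split the nodes into disjoint groups:

- "with magmoms" + "without magmoms" = total
- "charges but not magmoms" = "charges" minus "charges and magmoms"

The sketch below mimics these filters on plain Python dictionaries standing in for node attributes. The mock data and the helper `count` are illustrative assumptions, not AiiDA API.

```python
# Mock attribute dictionaries standing in for StructureData nodes (illustrative only).
nodes = [
    {"symbols": ["Si", "Si"]},
    {"symbols": ["Si", "Si"], "charges": [1.0, 0.0]},
    {"symbols": ["Fe", "Fe"], "magmoms": [[0, 0, 2], [0, 0, 2]]},
    {"symbols": ["Mn", "O"], "charges": [1.0, -1.0], "magmoms": [[0, 0, 4], [0, 0, 0]]},
]


def count(predicate) -> int:
    """Count the mock nodes whose attributes satisfy the predicate."""
    return sum(1 for attrs in nodes if predicate(attrs))


total = count(lambda a: True)
with_magmoms = count(lambda a: "magmoms" in a)  # like {'has_key': 'magmoms'}
without_magmoms = count(lambda a: "magmoms" not in a)  # like {'!has_key': 'magmoms'}
with_charges = count(lambda a: "charges" in a)
with_both = count(lambda a: "charges" in a and "magmoms" in a)
charges_not_magmoms = count(lambda a: "charges" in a and "magmoms" not in a)

# has_key and !has_key split the nodes into two disjoint groups.
assert with_magmoms + without_magmoms == total
# 'charges and not magmoms' is 'charges' minus 'charges and magmoms'.
assert charges_not_magmoms == with_charges - with_both
# Same checks with the numbers reported above.
assert 103 + 142 == 245 and 19 - 6 == 13
```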
A useful feature of the QueryBuilder is the possibility to project only some of the attributes of the queried StructureData nodes:
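# Projecting only given properties
prop = 'charges'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes': {'has_key': prop}
    },
    project=['attributes.formula', 'attributes.' + prop, 'id'],
)
# Run the query once and unpack the last [formula, charges, pk] row
# (without qb.order_by(...), the order of the results is not guaranteed)
formula, charges, pk = qb.all()[-1]
print(f'Formula: {formula}\nCharges : {charges}\npk: {pk}')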
Formula: Si2
Charges : [1.0, 0.0]
pk: 548
Structures with a given number of atoms
We may want to select only small systems, say with fewer than six atoms, or only larger ones. This can be done with the shorter and longer filters on the list of symbols:
# Filtering on the length of the symbols list
# (see the 'shorter' and 'longer' operators in the filters table)
# fewer than 6 atoms:
nr_atoms = 6
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'shorter': nr_atoms}
    },
)
print(f' Number of structures in the DB containing less than {nr_atoms} atoms: {len(qb.all())}')
# more than 5 atoms:
nr_atoms = 5
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'longer': nr_atoms}
    },
)
print(f' Number of structures in the DB containing more than {nr_atoms} atoms: {len(qb.all())}')
# exactly 2 atoms:
nr_atoms = 2
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'and': [  # the same can be done with 'attributes.kinds'
            {'shorter': nr_atoms + 1},
            {'longer': nr_atoms - 1},
        ]},
    },
)
print(f' Number of structures in the DB containing exactly {nr_atoms} atoms: {len(qb.all())}')
Number of structures in the DB containing less than 6 atoms: 203
Number of structures in the DB containing more than 5 atoms: 5
Number of structures in the DB containing exactly 2 atoms: 119
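Because shorter and longer are strict comparisons, "exactly n atoms" is the intersection of length < n + 1 and length > n - 1. The hypothetical helper `exactly_n_atoms_filter` below builds that filter dictionary. The hypothetical `satisfies` checks its bounds on plain Python list lengths, so no database is needed. The AiiDA filters table also lists an of_length operator for lists, which should express the same condition directly; please verify it against your AiiDA version.

```python
def exactly_n_atoms_filter(n: int) -> dict:
    """Hypothetical helper: filter for exactly n entries in attributes.symbols."""
    return {
        "attributes.symbols": {
            "and": [
                {"shorter": n + 1},  # list length < n + 1
                {"longer": n - 1},  # list length > n - 1
            ]
        }
    }


def satisfies(length: int, flt: dict) -> bool:
    """Pure-Python check of the shorter/longer conditions of `flt` for a given list length."""
    results = []
    for condition in flt["attributes.symbols"]["and"]:
        if "shorter" in condition:
            results.append(length < condition["shorter"])
        if "longer" in condition:
            results.append(length > condition["longer"])
    return all(results)


flt = exactly_n_atoms_filter(2)
print([length for length in range(6) if satisfies(length, flt)])  # [2]

# Usage with AiiDA (needs a configured profile and the imports used above):
# qb = QueryBuilder()
# qb.append(StructureData, filters=exactly_n_atoms_filter(2))
```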
Alloys and vacancies
We can find alloys and/or structures with vacancies by filtering for attributes.is_alloy and/or attributes.has_vacancies being True.
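The filters for this case are plain dictionaries, sketched below under the following assumptions. Several keys in one filter dictionary must all match. Alternatives are combined with an 'or' list, as described in the AiiDA querying documentation. The variable names are illustrative, and the QueryBuilder call is left as a comment because it needs a configured profile.

```python
# Alloys and/or structures with vacancies (plain dictionaries; building them needs no AiiDA).
alloys = {"attributes.is_alloy": True}
with_vacancies = {"attributes.has_vacancies": True}

# Either condition: combine the two filters with 'or'.
alloys_or_vacancies = {"or": [alloys, with_vacancies]}

# Both conditions: the keys differ, so one dictionary with both keys is enough.
alloys_and_vacancies = {**alloys, **with_vacancies}

# Usage with AiiDA (needs a configured profile and the imports used above):
# qb = QueryBuilder()
# qb.append(StructureData, filters=alloys_or_vacancies)
# print(qb.count())
```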
Advanced queries: stoichiometry, binaries, ternaries
We may want to search for a specific system, e.g. one with a given formula, or for binaries, ternaries and so on.
Using the chemical formula
# specific formula
formula = 'Fe2'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': formula,  # same as: 'attributes.formula': {'==': formula}
    },
)
print(f' Number of structures in the DB with formula = {formula}: {len(qb.all())}')
Number of structures in the DB with formula = Fe2: 12
Looking for a certain number of atoms of a given element
We can still use the formula. For structures with more than one atom of the same element, we can proceed as follows:
# certain number of atoms of the same element - still from the formula
element = 'Mn'
nr_atoms = 2
if nr_atoms == 1:
    print("You are looking for all the structures having at least one atom of the desired element.")
    nr_atoms = ""
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': {
            'like': f'%{element}{nr_atoms}%',
        },
    },
    project=['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing {nr_atoms} atoms of element {element}: {len(qb.all())}')
print(f' These are: {[struct[0] for struct in qb.all()]}')
Number of structures in the DB containing 2 atoms of element Mn: 6
These are: ['Mn2', 'Mn2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn2O2']
However, this query can also retrieve formulas that contain Mn2 followed by another digit (e.g. Mn20). To make the query more precise and robust, we have to post-process the results programmatically, as shown in what follows.
Moreover, in the specific case of exactly one atom of a given element, we cannot build the string f'%{element}{nr_atoms}%'. Instead, we need to make sure that no number follows the element symbol in the formula. This can be done by post-processing the results of the query with regular expressions (regex):
import re
# exactly one atom of the same element
# we need a regex because we cannot search for {element}{number} in the formula, as we should look for {element}[^2-9]...
element = 'Mn'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': {
            'like': f'%{element}%',
        },
    },
    project=['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing element {element}: {len(qb.all())}')
res = []
for struct in qb.iterall():
    if re.search(f'{element}[^2-9]', struct[0]):
        print(f' structures in the DB containing exactly one atom of element {element}: {struct[0]}, pk:{struct[1]}')
        res.append(struct)
print(f' Number of structures in the DB containing exactly one atom of element {element}: {len(res)}')
Number of structures in the DB containing element Mn: 11
Number of structures in the DB containing exactly one atom of element Mn: 0
Using the same approach, we can repeat the query done above for exactly two Mn atoms:
# exactly two atoms of the same element
element = 'Mn'
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.formula': {
            'like': f'%{element}2%',
        },
    },
    project=['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing element {element}: {len(qb.all())}')
res = []
for struct in qb.iterall():
    if re.search(f'{element}2[^2-9]', struct[0]):
        # print(f' SData in the DB containing exactly two atoms of element {element}: {struct[0]}, pk:{struct[1]}')
        res.append(struct)
print(f' Number of structures in the DB containing exactly two atoms of element {element}: {len(res)}')
print(f' These are: {[struct[0] for struct in res]}')
Number of structures in the DB containing element Mn: 6
Number of structures in the DB containing exactly two atoms of element Mn: 4
These are: ['Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn2O2']
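The [^2-9] patterns above have a few blind spots:

- They require one extra character after the match. A formula that ends right after the element or its count is therefore missed, such as 'Mn2', which appears in the earlier 'like' result but not in this list.
- They still accept the digits 0 and 1, so 'Mn21' would pass the two-atom check.
- For one-letter symbols, C[^2-9] also matches the 'Co' in 'CoLiO2'.

A negative lookahead avoids these cases. The helper `has_exactly` below is a hypothetical sketch. It assumes formulas written as element symbols followed by optional integer counts, without parentheses.

```python
import re


def has_exactly(formula: str, element: str, n: int) -> bool:
    """Return True if `formula` contains exactly `n` atoms of `element`.

    Sketch assuming formulas like 'Mn2O2' or 'CoLiO2' (symbol + optional count, no parentheses).
    """
    count = "" if n == 1 else str(n)  # a count of 1 is not written in the formula
    # (?![a-z0-9]): the match must not continue into a longer symbol ('C' vs 'Co')
    # or a longer number ('Mn2' vs 'Mn20'); it also succeeds at the end of the string.
    pattern = rf"{re.escape(element)}{count}(?![a-z0-9])"
    return re.search(pattern, formula) is not None


formulas = ["Mn2", "Mn2O2", "Mn20", "Mn21", "Mn6Sn2", "FeMn", "CoLiO2"]
print([f for f in formulas if has_exactly(f, "Mn", 2)])  # ['Mn2', 'Mn2O2']
print([f for f in formulas if has_exactly(f, "Mn", 1)])  # ['FeMn']
print(has_exactly("CoLiO2", "C", 1))  # False

# With AiiDA, it could replace the re.search calls above, e.g.:
# res = [struct for struct in qb.iterall() if has_exactly(struct[0], element, 2)]
```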
Binaries, ternaries
Also in these cases, we use a combination of the QueryBuilder and regex searches:
# binaries
number_of_elements = 2  # 2 is binary, 3 is ternary and so on
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'longer': number_of_elements - 1}  # at least number_of_elements atoms needed in the symbols list
    },
    project=['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing more than one atom: {len(qb.all())}')
res = []
for struct in qb.iterall():
    # exactly two elements, i.e. two capital letters, each followed (if needed) by lower-case letters and digits
    if re.search('^' + '[A-Z][a-z]*[0-9]*' * number_of_elements + '$', struct[0]):
        res.append(struct)
print(f' Number of binaries structures in the DB: {len(res)}')
print(f' These are: {[struct[0] for struct in res]}')
# ternaries
number_of_elements = 3  # 2 is binary, 3 is ternary and so on
qb = QueryBuilder()
qb.append(
    StructureData,
    filters={
        'attributes.symbols': {'longer': number_of_elements - 1}  # at least number_of_elements atoms needed in the symbols list
    },
    project=['attributes.formula', 'id'],
)
print(f' Number of structures in the DB containing more than two atoms: {len(qb.all())}')
res = []
for struct in qb.iterall():
    # exactly three elements, i.e. three capital letters, each followed (if needed) by lower-case letters and digits
    if re.search('^' + '[A-Z][a-z]*[0-9]*' * number_of_elements + '$', struct[0]):
        res.append(struct)
print(f' Number of ternaries structures in the DB: {len(res)}')
print(f' These are: {[struct[0] for struct in res]}')
Number of structures in the DB containing more than one atom: 130
Number of binaries structures in the DB: 9
These are: ['Mn6Sn2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn2O2', 'Mn6Sn2', 'Mn6Sn2', 'Mn6Sn2', 'Mn6Sn2']
Number of structures in the DB containing more than two atoms: 11
Number of ternaries structures in the DB: 2
These are: ['CoLiO2', 'CoLiO2']
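Instead of building a longer regex for each number of elements, the projected formulas can also be parsed once into element counts. The number of distinct elements then follows directly. `parse_formula` and `count_elements` below are hypothetical helpers. They assume formulas with no parentheses or fractional counts, as in the results above.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Split a formula such as 'CoLiO2' into element counts (no parentheses supported)."""
    counts = Counter()
    # Each match is (symbol, optional digits), e.g. ('Co', ''), ('Li', ''), ('O', '2')
    for symbol, number in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def count_elements(formula: str) -> int:
    """Number of distinct elements: 2 for binaries, 3 for ternaries, and so on."""
    return len(parse_formula(formula))


formulas = ["Mn6Sn2", "Mn2O2", "Mn2", "CoLiO2", "Si2"]
print({f: count_elements(f) for f in formulas})
# {'Mn6Sn2': 2, 'Mn2O2': 2, 'Mn2': 1, 'CoLiO2': 3, 'Si2': 1}
print(dict(parse_formula("CoLiO2")))  # {'Co': 1, 'Li': 1, 'O': 2}

# With AiiDA, e.g. for ternaries (projecting ['attributes.formula', 'id'] as above):
# ternaries = [struct for struct in qb.iterall() if count_elements(struct[0]) == 3]
```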
Exercise
If you have quaternaries in your database, try to query them following the examples above.