Using ExtSym with your own powder decomposition program

Summary: This page describes the information needed to adapt your own powder decomposition software to work with ExtSym. To do this you first need to understand what input the ExtSym program requires. Most importantly, ExtSym does not work directly with raw powder data but requires the raw powder data in a processed format. This page details this processed format and explains how to generate it. For additional information see also Ref. [5].

This web page is structured as follows:

The processed powder data formats [subsections: "The .hkl format" and "The .hkl_weight format"]
How to calculate the weight matrix elements at the end of a Le Bail or Pawley refinement
Fine tuning the data input to ExtSym
List of space groups with no systematic absences
History of this document
References

The processed powder data formats

ExtSym only allows powder data to be read into the program in two formats that are closely related:

.hkl format: The powder data that was originally used for testing the ExtSym algorithm came in this format. Therefore this format is the 'native' format for the ExtSym program.
.hkl_weight format: In connection with writing the ExtSym web pages, the ExtSym code was modified to allow for this additional format. In general, this format should be easier to generate than the .hkl format. ExtSym performs equally well with either format.

The .hkl format

This format originates from the TF12LS program [2]. The .hkl format aims to summarize the output from a powder decomposition refinement and it contains:

The Lorentz-polarisation corrected, extracted intensities from the refinement
Estimated standard deviations and (optionally) normalised off-diagonal covariance matrix elements

To explain the .hkl format in detail let us first introduce the notation:

h and k each refer to one or more Bragg reflections. Most frequently h (or k) will represent just one Bragg reflection. However, when performing a Le Bail or Pawley fit it is not uncommon to refine the sum of two or more Bragg intensities for reflections that are judged to be sufficiently close to each other in the powder pattern. For such cases h (or k) represents the set of Bragg reflections making up such a refinable sum. For the purpose of this web page think of h (or k) as an index from 1 to P, where P is the number of Le Bail or Pawley refined intensity values. Thus h=1 may represent the [0 0 1] reflection, h=2 the [0 0 2] reflection, h=3 the [1 0 0] and [0 1 0] reflections and so on.

Further, let C denote a covariance matrix with dimensions P x P and Chk an element of this matrix. A normalised covariance element is calculated as Bh,k = 100 * Chk / (Chh*Ckk)^(1/2) (see Eq. (9) in Ref. [1] for more on this, although notice there is a typographical error in that equation: -1/2 should be replaced by 1/2).

The .hkl format stores an upper band of the normalised matrix elements; that is, the elements (Cii)^(1/2), Bi,i+1, Bi,i+2, ..., Bi,i+n-1, Bi,i+n for all i = 1, 2, ..., P and n<=N. This corresponds to ignoring the Bragg peak overlap between any two intensities that are separated by more than n intensities.
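The banding described above can be sketched in a few lines of Python. This is only an illustration, not ExtSym or DASH code: the function name normalised_band and the list-of-lists matrix layout are assumptions, and rows near the end of the matrix are simply given fewer elements (a real file may need them padded differently).

```python
import math

def normalised_band(cov, n):
    """Return, for each row i, [sqrt(C_ii), B_{i,i+1}, ..., B_{i,i+n}].

    cov is a full P x P covariance matrix as a list of lists.
    Elements that would fall beyond the last column are omitted.
    """
    p = len(cov)
    rows = []
    for i in range(p):
        row = [math.sqrt(cov[i][i])]
        for k in range(i + 1, min(i + n, p - 1) + 1):
            # Normalised covariance element: 100 * C_ik / sqrt(C_ii * C_kk)
            row.append(100.0 * cov[i][k] / math.sqrt(cov[i][i] * cov[k][k]))
        rows.append(row)
    return rows

# Made-up 3 x 3 covariance matrix, for illustration only.
cov = [
    [4.0, 1.0, 0.0],
    [1.0, 9.0, -3.0],
    [0.0, -3.0, 16.0],
]
band = normalised_band(cov, n=1)
```

With n=1 only the correlation between neighbouring intensities is kept, so the element C13 of the example matrix is never written out.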

More specifically the .hkl format consists of lines having the form:

hkl Intensity(h) Chh^1/2 h Bh,h+1 Bh,h+2 .... Bh,h+n-1 Bh,h+n

where

hkl: are the Miller indices of a Bragg reflection.
Intensity(h): is a refined integrated intensity, preferably including the multiplicity factor.
Chh^1/2: is the square root of a diagonal covariance matrix element.
h: plays the role of a counter starting with h = 1 for the first line. See the description of h (and k) above. When h represents more than one Bragg reflection, two or more consecutive lines are output that are identical except for their [h k l] values.
Bh,h+1 Bh,h+2 .... Bh,h+n-1 Bh,h+n: are normalised covariance elements measuring the correlation between the intensity values.
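As a rough sketch of how a translation script might emit lines of this form, the Python below writes one line per Bragg reflection and repeats the line for every reflection that shares a refined intensity. The function name format_hkl_lines, the whitespace-separated layout and the number of decimals are all assumptions made for illustration; check the exact column layout against output from DASH before relying on it. The numerical values are invented.

```python
def format_hkl_lines(groups, intensities, sigmas, band):
    """Build .hkl-style text lines.

    groups[h-1]      : list of (h, k, l) tuples sharing refined intensity h
    intensities[h-1] : refined intensity for that group
    sigmas[h-1]      : square root of the diagonal covariance element
    band[h-1]        : normalised elements B_{h,h+1} ... B_{h,h+n}
    """
    lines = []
    for counter, (refls, inten, sig, bs) in enumerate(
            zip(groups, intensities, sigmas, band), start=1):
        tail = " ".join(f"{b:.1f}" for b in bs)
        for (mh, mk, ml) in refls:
            # Lines belonging to the same group differ only in Miller indices.
            line = f"{mh} {mk} {ml} {inten:.3f} {sig:.3f} {counter} {tail}"
            lines.append(line.rstrip())
    return lines

# Second entry: two reflections refined as a single summed intensity.
groups = [[(0, 0, 1)], [(1, 0, 0), (-1, 0, 1)], [(0, 0, 2)]]
lines = format_hkl_lines(
    groups,
    intensities=[120.5, 43.2, 310.0],
    sigmas=[2.0, 3.0, 4.0],
    band=[[16.7], [-25.0], []],
)
```

Here the second and third lines carry the same intensity, standard deviation and counter value, which is how a summed intensity is signalled in the format described above.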

Shown below is an example of what this format might look like for a real dataset:

Figure 1: Output from the version of the Pawley program in Ref. [2] that forms part of the DASH software.

The two lines highlighted in blue in Fig. 1 show an example where the DASH (TF12LS) program has decided not to treat the intensities for [1 0 0] and [-1 0 1] as two separate refinable variables but instead to refine only the sum of these two intensities.

The output above was obtained from the powder decomposition of the Hydrochlorothiazide dataset described in Ref. [3]. A part of the powder pattern for this dataset is shown below for comparison with the Pawley output in Fig. 1:

Figure 2: Part of a dataset. The green tick marks at the top show the Bragg positions as they are listed in Fig. 1. Ignore the numbers above some of the peaks and the corresponding shaded areas.

The .hkl_weight format

For ExtSym to recognise that an input powder data file conforms to the .hkl format described above, the file must have the extension .hkl. Similarly, for ExtSym to recognise the format described in this subsection, the input data file must have the extension .hkl_weight (this is not fantastic programming but was quick to implement). A .hkl_weight file presents the processed powder data in exactly the same way as a .hkl file except that the covariance matrix elements are stored in their inverse form, i.e. as weight matrix elements. To be precise, a .hkl_weight file is required to conform to the .hkl format except for the following differences. A .hkl_weight file consists of lines:

hkl Intensity(h) Whh^1/2 h Xh,h+1 Xh,h+2 .... Xh,h+n-1 Xh,h+n

where

Whh^1/2: is the square root of a diagonal weight matrix element.
Xh,h+1 Xh,h+2 .... Xh,h+n-1 Xh,h+n: are normalised weight elements calculated using X hk =100 * W hk / (Whh*Wkk)^1/2.
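Because the weight matrix is the inverse of the covariance matrix, the numbers in a .hkl_weight file differ from those in the corresponding .hkl file, so the two formats cannot be swapped by renaming the file. The small Python sketch below illustrates this for two overlapping intensities; the helper names and the 2 x 2 matrix are invented for the example.

```python
import math

def invert_2x2(m):
    """Invert a 2 x 2 matrix given as [[a, c], [d, b]]."""
    (a, c), (d, b) = m
    det = a * b - c * d
    return [[b / det, -c / det], [-d / det, a / det]]

def normalised(m, h, k):
    """100 * M_hk / sqrt(M_hh * M_kk), used for both B_hk and X_hk."""
    return 100.0 * m[h][k] / math.sqrt(m[h][h] * m[k][k])

cov = [[4.0, 3.0], [3.0, 9.0]]      # covariance matrix C
weight = invert_2x2(cov)            # weight matrix W = C^-1

b_12 = normalised(cov, 0, 1)        # would go into a .hkl file
x_12 = normalised(weight, 0, 1)     # would go into a .hkl_weight file
sqrt_c11 = math.sqrt(cov[0][0])
sqrt_w11 = math.sqrt(weight[0][0])
```

In this two-intensity case the normalised off-diagonal element simply changes sign (B12 = 50, X12 = -50), while the diagonal column changes from 2 to about 0.58. For more than two correlated intensities the relation between the B and X values is less simple.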

How to calculate the weight matrix elements at the end of a Le Bail or Pawley refinement

First we introduce some notation in addition to that introduced at the beginning of the subsection 'The .hkl format' above.

In a powder decomposition refinement the ith observed point in the diffraction pattern, yiobs, is compared against

yi = bi + sum_h fih * Ih

where bi is the background value for the ith point in the diffraction pattern and Ih is the refinable intensity variable for Bragg reflection(s) h. Ih may contain the multiplicity, ph, or other factors depending on how fih is defined relative to Ih in a given powder decomposition program. fih should contain any contributions needed for the yi calculated using the equation above to be successfully fitted to the yiobs. You might imagine that fih takes a form not too dissimilar from:

Equation 1 : The definition of fih determines the values of the refined integrated intensities.

where Lh is the Lorentz-polarisation factor, Aih is the asymmetry function and Gih is the normalised profile shape function value (the notation used here is taken from Ref. [4]).

Values for the integrated intensities, Ih, may be obtained by using either the Le Bail or Pawley technique. Either way, at the end of such a refinement weight matrix elements can be calculated using the expression

Whk = sum_i fih * fik / sigma_i^2

Equation 2 : Formula for a weight matrix element. For those familiar with least squares: the weight matrix is simply equal to the Hessian matrix of the weighted linear least squares FOM, sum_i (yiobs-yi)^2/(2*sigma_i^2), where the Ih are the parameters to be determined.

The sum is over points in the diffraction pattern and sigma_i is the estimated standard deviation for the ith observed value, yiobs, in the pattern.
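The calculation can be sketched in Python as follows. The sketch assumes that the weight matrix element is the Hessian of the weighted least squares figure of merit with respect to the intensities, i.e. Whk = sum_i fih * fik / sigma_i^2, and that the fih values are available as a table with one row per pattern point; the function name weight_matrix and the toy numbers are invented for the example.

```python
def weight_matrix(f, sigma):
    """W[h][k] = sum_i f[i][h] * f[i][k] / sigma[i]**2

    f[i][h]  : contribution of unit intensity h to pattern point i
    sigma[i] : estimated standard deviation of the i-th observed point
    """
    n_points = len(f)
    p = len(f[0])
    w = [[0.0] * p for _ in range(p)]
    for i in range(n_points):
        inv_var = 1.0 / sigma[i] ** 2
        for h in range(p):
            if f[i][h] == 0.0:
                continue  # peak h does not reach this point
            for k in range(h, p):
                w[h][k] += f[i][h] * f[i][k] * inv_var
    # The matrix is symmetric: copy the upper triangle to the lower one.
    for h in range(p):
        for k in range(h):
            w[h][k] = w[k][h]
    return w

# Two partially overlapping peaks sampled at four pattern points.
f = [
    [1.0, 0.0],
    [2.0, 1.0],
    [1.0, 2.0],
    [0.0, 1.0],
]
sigma = [1.0, 2.0, 2.0, 1.0]
w = weight_matrix(f, sigma)
```

Off-diagonal elements are non-zero only where two peaks contribute to the same pattern point, which is why a band of the matrix is normally sufficient.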

The ExtSym algorithm makes the assumption that, for the part of the pattern used in the Le Bail or Pawley fitting, the expected average intensity at any point in the fitted part of the pattern is the same. For this reason it is better not to have the multiplicity included as part of the fih in Eq. [1] since the multiplicity times the statistical weight for each reflection in a pattern is a constant (see Bricogne 1991 and references therein). Further, the inclusion of any other terms in the expression for fih in Eq. [1] which would balance the mean intensity to be closer to constant should make ExtSym work better; for example, correcting for the intensity falloff due to the temperature effect. However, in practice, it is found that ExtSym is fairly robust to some changes in the definition of fih in Eq. [1]. For example, the .hkl files produced by DASH do not include the multiplicity in the intensity values, but this output when slightly modified, as described in the following section, works very well with ExtSym as is demonstrated in Table 1.

Fine tuning the data input to ExtSym

Once a translation script has been written for generating .hkl_weight (or .hkl) files from your favourite powder decomposition program, these files may with advantage be adjusted before being used as input to ExtSym.

The extinction symbol probabilities calculated by ExtSym depend strongly on the ratios of the intensity values to the covariance matrix element values, rather than on the absolute values of either.

This is easy to understand by considering an isolated peak in a diffraction pattern that has one possibly present Bragg reflection associated with it, as illustrated in the figure below.

Figure 3: A peak in a powder diffraction pattern which has one Bragg reflection associated with it.

Say an extracted intensity of I=1000 has been refined for this peak. This information by itself is not useful to ExtSym because the main purpose of the program is to make judgments about the probability of certain peaks being present or absent, and knowing only the Bragg intensity is not enough for this. From Fig. 3 we should agree that the reflection associated with the tick mark under the peak is extremely likely to be present! However, imagine you had never seen the data in Fig. 3 before and were only given the information that the intensity is I=1000; this information alone does not tell you whether this Bragg reflection is present or perhaps absent. But if you were told that I=1000±1, you would know that the reflection is extremely likely to be present, whereas I=1000±10000 tells you that the reflection may or may not be present. ExtSym is no different. It does not 'see' the raw powder data, only the extracted intensities and covariance matrix elements, and the ratios of these numbers must be a fair representation of the raw powder data.

As an example: performing a Pawley refinement using DASH (TF12LS) will in general return a .hkl file containing accurate normalised off-diagonal covariance matrix elements, but the square roots of the diagonal elements in column 5 of the .hkl file may not be on an accurate absolute overall scale. These two programs Pawley refine against the square of the goodness-of-fit (GOF):

GOF^2 = chi2profile = 1/(N-P) * sum_i (yiobs-yi)^2 / sigma_i^2

Equation 3 : chi2profile figure-of-merit formula.

where yi and yiobs are the ith calculated and observed data points in the diffraction pattern, N is the number of such points and P is the number of refined parameters. For a "perfect" fit GOF should be close to one. However, perhaps due to awkward lineshapes, background, impurities, limitations of software etc., it is not uncommon that the best GOF ends up being, say, GOF=5. When the covariance matrix is calculated by inverting the weight matrix in Eq. 2, the resulting matrix elements carry no information about the quality of the fit. Comparing Eq. 2 with Eq. 3 shows that both equations contain the same factor, 1/[sigma_i]^2. Thus, to make the covariance matrix elements partially aware of data that were not perfectly fitted by the decomposition refinement, it is suggested here that the weight matrix is divided by (A*GOF)^2. This equates to multiplying the covariance matrix by the same number, or to multiplying the sigma_i's in Eq. 2 by A*GOF, where A is a parameter that you unfortunately have to determine yourself. It depends on the specific powder decomposition implementation you are using. See the ExtSym test web page for examples of how a good value for A can be determined. The value A*GOF is set in the ExtSym parameter input file advanced.asc. Note that for a powder diffraction package that consistently performs good Pawley or Le Bail fits, and which follows the recommendations of this page, you should expect the best performing value of A to be close to one.

Keep in mind that the above is only a suggestion for how to include in the covariance matrix a reminder about a possible refinement misfit; it is expected to be more accurate when the misfit is distributed throughout the pattern rather than being a large discrepancy in a narrow region of the pattern.
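The arithmetic of this suggestion can be sketched in Python as shown below. The sketch assumes the standard definition GOF^2 = sum_i ((yiobs-yi)/sigma_i)^2 / (N-P); the function names and all numbers are invented for illustration. Since the value A*GOF is passed to ExtSym through advanced.asc, the explicit rescaling here is meant only to show what the factor does, not as a step you necessarily have to carry out yourself.

```python
import math

def goodness_of_fit(y_obs, y_calc, sigma, n_params):
    """GOF = sqrt( sum_i ((y_obs_i - y_calc_i) / sigma_i)^2 / (N - P) )."""
    n = len(y_obs)
    chi2 = sum(((o - c) / s) ** 2 for o, c, s in zip(y_obs, y_calc, sigma))
    return math.sqrt(chi2 / (n - n_params))

def scale_weight_matrix(w, a, gof):
    """Divide every weight matrix element by (A*GOF)^2."""
    factor = (a * gof) ** 2
    return [[elem / factor for elem in row] for row in w]

# Toy pattern of five points fitted with one parameter.
y_obs = [10.0, 14.0, 9.0, 12.0, 11.0]
y_calc = [8.0, 12.0, 11.0, 10.0, 11.0]
sigma = [1.0, 1.0, 1.0, 1.0, 1.0]

gof = goodness_of_fit(y_obs, y_calc, sigma, n_params=1)
scaled = scale_weight_matrix([[2.25, 1.0], [1.0, 2.25]], a=1.0, gof=gof)
```

Note that the scaling changes only the overall scale of the matrix: the square roots of the diagonal weight elements are divided by A*GOF, while the normalised off-diagonal elements are unchanged because the factor cancels in the ratio.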

List of space groups with no systematic absences

A .hkl (or .hkl_weight) file should be prepared from a refinement of the diffraction pattern in a space group that has no systematic absences. Examples of such space groups are listed below for the various crystal systems:

Monoclinic with unique axis b: P 1 2 1, P 1 m 1, P 1 2/m 1.
Orthorhombic: P 2 2 2, P m m 2, P m 2 m, P 2 m m, P m m m.
Tetragonal: P 4, P bar4, P 4/m, P 4 2 2, P 4 m m, P bar4 2 m, P bar4 m 2, P 4/m m m.
Trigonal: P 3, P bar3, P 3 2 1, P 3 m 1, P bar3 m 1, P 3 1 2, P 3 1 m, P bar3 1 m.
Hexagonal: P 6, P bar6, P 6/m, P 6 2 2, P 6 m m, P bar6 2 m, P bar6 m 2, P 6/m m m.
Cubic: P 2 3, P m bar3, P 4 3 2, P bar4 3 m, P m bar3 m.

History of this document

The first version of this document was completed 13th Nov. 2006.
Moved the section 'Content of downloadable zip file' into a new web page and added new section 'How to calculate the weight matrix elements', completed 5th June 2007.
Added suggestions for corrections by Tom Griffin, 8th June 2007.
Corrected a typo in an equation and other typos in section headings etc. 14th February 2008.
Made section "How to calculate the weight matrix elements..." read better. 7th August 2008.
Added ref to follow up ExtSym paper. 24th November 2008.

References

[1] A. J. Markvardsen, W. I. F. David, J. Johnston and K. Shankland, Acta Cryst. A57, 47 (2001).
[2] W. I. F. David, R. M. Ibberson and J. C. Matthewman, Rutherford Appleton Laboratory Report RAL-92-032 (1992).
[3] W. I. F. David, K. Shankland, J. Cole, S. Maginn, W. D. S. Motherwell and R. Taylor, DASH user manual, Cambridge Crystallographic Data Centre, Cambridge, UK (2001).
[4] C. Giacovazzo, Direct Phasing in Crystallography, IUCr Monographs on Crystallography 8, Oxford University Press (1998).
[5] A. J. Markvardsen, K. Shankland, W. I. F. David, J. Johnston, R. M. Ibberson, M. Tucker, H. Nowell and T. Griffin, J. Appl. Cryst. 41, 1177 (2008).