BIAD is a relational database whose primary objective is to organize data with respect to human cultural activity. Crucially, BIAD relies on the data team making informed decisions about how to aggregate raw data at a meaningful level; it is therefore not a repository for datasets with a variety of structures. Sites sit at the top of the relational hierarchy. Below these, phases represent discrete temporal ‘layers’ of human activity, each associated with a specific culture group or, in rare cases, only with a broad period. The concept of the phase is central to BIAD, despite the difficulty of codifying a precise definition. Most sites have only a single phase. Many have several, each assigned a different culture group. A few sites have multiple phases assigned the same culture group, because the differences between the data in these phases were too important to justify aggregating them. Therefore all data in BIAD sits in (or below) a phase, with the single exception of some radiocarbon dates, which can only be associated with a site.
The bulk of the data in BIAD is spread across the standard tables.
These are look-up tables, or options tables, which constrain the values that can be entered into the main standard tables (the 'zoptions' prefix is simply a convenience so that they sort together at the end when tables are listed alphabetically). Many columns in the standard tables are enumerated (for example 'yes' or 'no'), but for longer lists of options it is often more convenient to provide a dedicated zoptions table instead. Typically these tables have very few columns (often only one).
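The two constraint mechanisms can be sketched with SQLite from the Python standard library. This is only an illustration: the table and column names (zoptions_Material, Samples, Burnt) are hypothetical, and BIAD's own database engine and schema may enforce these constraints differently. A longer options list lives in a one-column lookup table referenced by a foreign key, while a short enumerated column uses a CHECK constraint.

```python
import sqlite3

# Hypothetical tables for illustration; not BIAD's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Options table: a single column listing the permitted values.
conn.execute("CREATE TABLE zoptions_Material (Material TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO zoptions_Material (Material) VALUES (?)",
    [("bone",), ("charcoal",), ("seed",)],
)

# Standard table: Material must exist in the options table, while the short
# yes/no list is enumerated directly with a CHECK constraint.
conn.execute(
    """
    CREATE TABLE Samples (
        SampleID TEXT PRIMARY KEY,
        Material TEXT REFERENCES zoptions_Material (Material),
        Burnt    TEXT CHECK (Burnt IN ('yes', 'no'))
    )
    """
)

def try_insert(row):
    """Insert a Samples row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Samples VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(("s01", "bone", "no")))     # True
print(try_insert(("s02", "antler", "no")))   # False: not in the options table
print(try_insert(("s03", "seed", "maybe")))  # False: fails the CHECK constraint
```

Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.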
To prepare a .csv data table for batch import into a BIAD table, it is crucial to ensure that it has the correct columns. This is easily achieved either by downloading an empty version of the relevant table directly from BIAD, or by downloading the corresponding file from the GitHub repository, which includes a single sample row to assist. Note that this sample row is a deliberate combination of real and toy data.
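Before uploading, a quick header check can catch misnamed or missing columns. The sketch below assumes the template's column names are already available as a list (for example, read from the empty table download); the column names shown are hypothetical, and column order is deliberately not checked.

```python
import csv
import io

def check_columns(template_header, upload_text):
    """Compare the header of a CSV upload against a template's column list.

    Returns (missing, unexpected) as sorted lists; both empty means the
    column names match (order is not checked).
    """
    reader = csv.reader(io.StringIO(upload_text))
    header = next(reader, [])
    expected, found = set(template_header), set(header)
    return sorted(expected - found), sorted(found - expected)

# Hypothetical template columns and upload contents.
template = ["SiteID", "SiteName", "Latitude", "Longitude"]
upload = "SiteID,Site_Name,Latitude,Longitude,Notes\nS0001,Example,51.5,-0.1,x\n"

missing, unexpected = check_columns(template, upload)
print(missing)     # ['SiteName']
print(unexpected)  # ['Notes', 'Site_Name']
```

In practice the upload text would come from reading the prepared .csv file rather than from a string literal.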
The complete representation of all relationships between all tables is complex:
Instead the structure may be more easily understood by separating the overall database into small groupings or clades:
Fundamental to BIAD is the tricky concept of a phase. Phases are discrete 'space-time' blobs of data. The 'space' component is more easily defined, as the latitude and longitude coordinates of the archaeological site (although even deciding whether a series of pits constitutes a single site or several sites is itself a moot point). Phases, therefore, are discrete temporal aggregations at a site, loosely related to stratigraphic layers. Most sites will only have a single phase, and it is rare indeed to have more than a handful. As a minimum, phases should have a period assigned, and usually a cultural assignment and reported (published) date ranges.
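As a rough illustration of the minimum content of a phase, the hypothetical Python record below requires a period and leaves the culture and the reported date range optional. The field names, and the assumption that dates are stored as years BP, are illustrative only and do not reproduce BIAD's Phases table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Phase:
    """Hypothetical record of one phase: a temporal aggregation at one site."""
    phase_id: str
    site_id: str                    # the 'space' component comes from the site
    period: str                     # the minimum requirement
    culture: Optional[str] = None   # usually, but not always, assigned
    start_bp: Optional[int] = None  # reported (published) date range, assumed
    end_bp: Optional[int] = None    # here to be in years BP (older = larger)

    def __post_init__(self):
        if not self.period:
            raise ValueError(f"phase {self.phase_id}: a period must be assigned")
        if self.start_bp is not None and self.end_bp is not None:
            if self.start_bp < self.end_bp:
                raise ValueError(f"phase {self.phase_id}: start must be older than end")

# Hypothetical values for illustration only.
p = Phase("P001", "S001", period="Neolithic", culture="Culture X",
          start_bp=7500, end_bp=7000)
print(p.period, p.culture)  # Neolithic Culture X
```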
The assignment of a phase is usually the most time consuming component of datamining source publications into BIAD, and requires careful expert interpretation of the primary source. Furthermore, these assignments are subject to change as new data is added to BIAD.
As a simple example, consider inputting data from a site report that describes material from an earlier phase dated to the Early Neolithic (EN), and further material from a clearly later (stratigraphically distinct) phase dated to the Early Bronze Age (EBA). We can add the phase-specific data to the Phases table, and trivially record this known order in the PhaseOrder table:
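A minimal sketch of the two tables in this example, using SQLite for illustration. The simplified columns (PhaseID, SiteID, Period, Earlier, Later) are assumptions rather than BIAD's actual schema; the single PhaseOrder row simply records that the Early Neolithic phase precedes the Early Bronze Age phase at the same site.

```python
import sqlite3

# Simplified, hypothetical versions of the Phases and PhaseOrder tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Phases (PhaseID TEXT PRIMARY KEY, SiteID TEXT, Period TEXT);
    CREATE TABLE PhaseOrder (
        Earlier TEXT REFERENCES Phases (PhaseID),
        Later   TEXT REFERENCES Phases (PhaseID),
        PRIMARY KEY (Earlier, Later)
    );
    """
)

# Two stratigraphically distinct phases at the same (hypothetical) site S1.
conn.executemany(
    "INSERT INTO Phases VALUES (?, ?, ?)",
    [("S1_EN", "S1", "Early Neolithic"), ("S1_EBA", "S1", "Early Bronze Age")],
)
# One row records the known order: EN precedes EBA.
conn.execute("INSERT INTO PhaseOrder VALUES ('S1_EN', 'S1_EBA')")

rows = conn.execute(
    """
    SELECT e.Period, l.Period
    FROM PhaseOrder AS o
    JOIN Phases AS e ON e.PhaseID = o.Earlier
    JOIN Phases AS l ON l.PhaseID = o.Later
    """
).fetchall()
print(rows)  # [('Early Neolithic', 'Early Bronze Age')]
```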
A more challenging case arises when different site reports provide data at different temporal scales. Consider, for example, the plot below, which illustrates a site report providing faunal data (blue) for three phases, whilst a different report provides botanical data (red) for two phases:
Crucially, the phasing of the two reports is incongruent, and to store the data in a way that maximises their usefulness we must decide how best to adapt them. Since the sample size of the faunal data during the MN is so small (n=2), its information content is poor, and the data are more useful if we amalgamate the EN and MN into a single UN phase:
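The amalgamation step can be sketched as summing the faunal counts of the merged phases. The taxon counts below are toy values (only the MN total of n=2 mirrors the text), and the MIN_N threshold used to flag small phases is a hypothetical aid: whether to amalgamate remains an expert decision.

```python
from collections import Counter

def amalgamate(phase_counts, phases, new_phase):
    """Merge the taxon counts of several phases into one new phase.

    The merged phases are removed and replaced by new_phase, whose
    counts are the sum of theirs.
    """
    merged = Counter()
    for p in phases:
        merged.update(phase_counts[p])  # update() adds counts together
    out = {p: c for p, c in phase_counts.items() if p not in phases}
    out[new_phase] = merged
    return out

# Toy faunal counts per phase; only the MN total (n=2) mirrors the text.
faunal = {
    "EN": Counter(cattle=40, sheep_goat=25),
    "MN": Counter(cattle=1, pig=1),
    "EBA": Counter(cattle=30, sheep_goat=12, pig=8),
}
MIN_N = 30  # hypothetical threshold for flagging a phase as too small

small = [p for p, c in faunal.items() if sum(c.values()) < MIN_N]
print(small)  # ['MN']

faunal = amalgamate(faunal, ["EN", "MN"], "UN")
print(faunal["UN"])  # Counter({'cattle': 41, 'sheep_goat': 25, 'pig': 1})
```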
In contrast, consider the plot below, which illustrates a similar incongruence between two reports, but where the sample sizes are such that we cannot justify such an amalgamation. Furthermore, although we know that the EN, MN and EBA phases are ordered, and that the UN and EBA phases are ordered, we have no idea of the relative chronology between EN and UN, or between MN and UN:
In such a case, we store the ordering as follows:
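One way to picture such an ordering is as a set of (earlier, later) pairs, that is, a partial order in which unknown relative chronologies are simply not recorded. The sketch below assumes this pair representation (not necessarily BIAD's PhaseOrder layout) and infers orderings only by following chains of known pairs, so EN precedes EBA via MN, while EN and UN remain unordered.

```python
from collections import defaultdict

# Hypothetical (earlier, later) pairs: only known relationships are stored,
# so nothing links EN or MN to UN.
phase_order = [("EN", "MN"), ("MN", "EBA"), ("UN", "EBA")]

later_than = defaultdict(set)
for earlier, later in phase_order:
    later_than[earlier].add(later)

def precedes(a, b):
    """True if phase a is known to precede phase b via a chain of pairs."""
    stack, seen = [a], set()
    while stack:
        for nxt in later_than[stack.pop()]:
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def relation(a, b):
    """Return 'before', 'after' or 'unknown' for phase a relative to phase b."""
    if precedes(a, b):
        return "before"
    if precedes(b, a):
        return "after"
    return "unknown"

print(relation("EN", "EBA"))  # before (EN -> MN -> EBA)
print(relation("EN", "UN"))   # unknown
```

Storing only the known pairs, rather than forcing a single sequence, avoids inventing a relative chronology that the sources do not support.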
A crucial part of adding any new data to BIAD is first to establish whether they are already stored. For some data types it is impossible to accidentally import duplicates; for example, 14C lab codes are stored in a column that must be unique. However, checking whether a site or phase already exists is a non-trivial task and may require substantial human effort. For example, two completely different sites may have been independently published with the same site name. Alternatively, the same site may have been excavated (and therefore published) twice, but with slightly different latitude and longitude coordinates (perhaps due to inaccuracies).
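Part of this check can be automated by flagging candidate duplicates for human review. The sketch below uses hypothetical site records and thresholds (5 km, name similarity 0.8). It flags existing sites that are either close to a new site (great-circle distance via the haversine formula) or similarly named (using `difflib.SequenceMatcher` from the standard library). It only produces candidates; deciding whether two records describe the same site still requires judgement.

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

def candidate_duplicates(new, existing, max_km=5.0, min_ratio=0.8):
    """Return IDs of existing sites that are nearby OR similarly named."""
    flagged = []
    for site in existing:
        dist = haversine_km(new["lat"], new["lon"], site["lat"], site["lon"])
        ratio = SequenceMatcher(None, new["name"].lower(), site["name"].lower()).ratio()
        if dist <= max_km or ratio >= min_ratio:
            flagged.append(site["id"])
    return flagged

# Hypothetical records: S2 shares a name with the new site but is far away,
# so a human must decide whether it is the same site or a namesake.
existing = [
    {"id": "S1", "name": "Hillfort A", "lat": 51.500, "lon": -1.200},
    {"id": "S2", "name": "Hillfort A", "lat": 48.100, "lon": 11.600},
    {"id": "S3", "name": "River Camp", "lat": 40.000, "lon": 20.000},
]
new_site = {"name": "Hillfort A", "lat": 51.503, "lon": -1.197}
print(candidate_duplicates(new_site, existing))  # ['S1', 'S2']
```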
In cases where there are slightly different data (such as coordinates) for what is clearly the same site or phase, the analyst must harmonise these data, which involves deciding how best to combine them. An easy example might be the case where one dataset derived its coordinates from the nearest village on a map, rounded to 1 decimal place (dp), whilst the other provided 3 dp precision from a satellite measurement on site; here the more precise on-site coordinates are the obvious choice.
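When the only difference is precision, the comparison can be made explicit by counting decimal places as recorded in the source. Coordinates are kept as strings here because converting to float would lose trailing zeros (51.50 becomes 51.5). As a rough guide, 0.1° of latitude is about 11 km and 0.001° about 110 m. The helper below is hypothetical, and more decimal places indicate precision, not necessarily accuracy.

```python
from decimal import Decimal

def decimal_places(value: str) -> int:
    """Decimal places of a coordinate as written, e.g. '51.50' -> 2."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)

def more_precise(coords_a, coords_b):
    """Return whichever (lat, lon) string pair is recorded to more decimal places.

    The precision of a pair is taken as that of its least precise
    component; ties go to coords_a.
    """
    prec_a = min(decimal_places(c) for c in coords_a)
    prec_b = min(decimal_places(c) for c in coords_b)
    return coords_a if prec_a >= prec_b else coords_b

village = ("51.5", "-1.2")      # hypothetical: nearest village, 1 dp
on_site = ("51.503", "-1.197")  # hypothetical: satellite measurement, 3 dp
print(more_precise(village, on_site))  # ('51.503', '-1.197')
```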
Even more frustratingly, in rare cases a single site spanning several hundred metres may be published with one pair of coordinates, while a subsequent excavation of the same site treats several pits as different sites, each with slightly different coordinates. This is not a huge problem per se, as downstream macro analysis will typically be looking at patterns across a broader area, so we can retain both data sources without needing to harmonise them.
However, a single data point (such as a 14C date) obviously cannot belong to two different sites, so as a rule of thumb we should favour the more detailed or more recent publication.
Other relationships exist in BIAD that are difficult to intuit from a hierarchical relationship diagram. For example, data in two or more tables may have come from the same 'item'. This item might be a human individual, an animal tooth, or an entire grave.
ItemIDs are intended to build bridges between different data types, in order to show links and relationships between the data that have been gathered. In this way, an ItemID acts rather like a lowest common denominator between two data types: it identifies the most specific entity that both share.
The following examples illustrate the principle.
Consider the (hypothetical) case of the archaeological remains of a single human individual (ID = ind01), from which the following data have been collected:
The hierarchical structure of BIAD ensures that the strontium data must already be associated with the correct individual; however, C14 dates are not hierarchically associated with an individual (since they are not necessarily from a human sample):
Therefore we must assign ItemIDs to ensure:
It is worth considering the problem as a Venn diagram, which illustrates that c01 and sr01 are data from the same tooth (upper right M1), whilst c01, sr01 and c02 are data from the same individual (ind01):
The workflow is always to start from the lowest aggregation, in this case the tooth, and then the individual. Firstly, we generate an ItemID (ID = it01) in the Items table, and assign it to both c01 and sr01. Since the strontium value is already associated with the correct individual, this ensures c01 also inherits the association with ind01. Secondly, we generate an ItemID (ID = it02) in the Items table, and assign it to both c02 and ind01.
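The workflow can be sketched with a small hypothetical in-memory model in which each record has an optional hierarchical individual and a set of ItemIDs. The structure and function names are illustrative, not BIAD's schema, and the resolution follows only a single ItemID hop, which is enough for this example.

```python
# Hypothetical in-memory model of the worked example; not BIAD's real schema.
records = {
    "ind01": {"type": "individual", "individual": None, "items": set()},
    "sr01": {"type": "strontium", "individual": "ind01", "items": set()},
    "c01": {"type": "radiocarbon", "individual": None, "items": set()},
    "c02": {"type": "radiocarbon", "individual": None, "items": set()},
}

def assign_item(item_id, *record_ids):
    """Attach one ItemID to every listed record."""
    for rid in record_ids:
        records[rid]["items"].add(item_id)

# Step 1: lowest aggregation first (the upper right M1 tooth).
assign_item("it01", "c01", "sr01")
# Step 2: the individual.
assign_item("it02", "c02", "ind01")

def individual_of(rid):
    """Resolve a record's individual directly, or via one shared ItemID."""
    rec = records[rid]
    if rec["type"] == "individual":
        return rid
    if rec["individual"]:
        return rec["individual"]
    for other_id, other in records.items():
        if other_id != rid and rec["items"] & other["items"]:
            if other["type"] == "individual":
                return other_id
            if other["individual"]:
                return other["individual"]
    return None

print(individual_of("c01"))  # ind01 (via it01 and sr01)
print(individual_of("c02"))  # ind01 (via it02)
```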
Consider the (hypothetical) case of the archaeological remains from a single grave (gr01) which contains two individuals (ind01 and ind02) and a dog, from which the following data have been collected:
It is helpful to first consider the Venn diagram:
Starting from the lowest aggregation (the M1 tooth) we generate an ItemID (ID = it01) in the Items table and assign it to both the radiocarbon date c01 and the nitrogen δ15N value n01, telling us that c01 and n01 came from the same item (the M1 tooth).
The next aggregation is the individual ind01. We generate an ItemID (ID = it02) in the Items table and assign it to the radiocarbon date c02 and ind01. Note that the M1 tooth data (c01 and n01) are already associated with ind01, and hence with the grave gr01: n01 hierarchically, and c01 through the shared ItemID it01.
Furthermore, the strontium 87Sr/86Sr value sr01 must already be associated with the correct individual ind02, and therefore hierarchically with grave gr01.
The final aggregation is the grave gr01, to ensure the radiocarbon date c03 is associated with it. Therefore we generate an ItemID (ID = it03) in the Items table and assign it to both c03 and gr01.
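The grave example can be sketched in the same spirit. Here a hypothetical `parent` mapping holds the hierarchical links (isotope value to individual, individual to grave), an `items` mapping holds the three ItemIDs, and `associations()` collects everything a record is linked to. The dog is not modelled, the names are illustrative rather than BIAD's schema, and links are followed through one ItemID only.

```python
# Hypothetical model of the grave example (the dog is not modelled).
# 'parent' holds hierarchical links: isotope value -> individual -> grave.
parent = {
    "ind01": "gr01",
    "ind02": "gr01",
    "n01": "ind01",   # δ15N value, assumed hierarchically under ind01
    "sr01": "ind02",  # 87Sr/86Sr value, hierarchically under ind02
}
# ItemID -> the records that share it.
items = {
    "it01": {"c01", "n01"},    # same M1 tooth
    "it02": {"c02", "ind01"},  # same individual
    "it03": {"c03", "gr01"},   # same grave
}

def chain(rid):
    """Hierarchical ancestors of a record, nearest first."""
    out = []
    while rid in parent:
        rid = parent[rid]
        out.append(rid)
    return out

def associations(rid):
    """Everything a record is linked to, hierarchically or via one ItemID."""
    linked = set(chain(rid))
    for members in items.values():
        if rid in members:
            for other in members - {rid}:
                linked.add(other)
                linked.update(chain(other))
    return linked

print(sorted(associations("c01")))  # ['gr01', 'ind01', 'n01']
print(sorted(associations("c03")))  # ['gr01']
```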