====================================
Copying artifacts between workspaces
====================================

Motivation
==========

Copying artifacts between workspaces is useful in several situations.  For
example, once we have repository hosting, we'll want a way
to copy packages between repositories as part of managing transitions, and
copying at least source packages between workspaces (and perhaps even
between scopes) would be a natural part of maintaining derivative
distributions.

To support security workflows, we need to be able to prepare artifacts in a
private ("embargoed") workspace and then copy them to somewhere public once
the embargo has expired.  Doing this should require an explicit opt-in
flag: we don't want to make it too easy to break embargoes by accident.

Permission considerations
=========================

Copying artifacts requires both the ability to read from the source and the
ability to write to the destination (either directly or via a workflow).

After artifacts have been made public, it's helpful to be able to see the
work request that created them, without having to somehow also copy the work
request around.  To achieve this, the permission predicate that checks
whether a user can see a work request may check whether any of the artifacts
produced by the work request are visible to that user, and return True in
that case even if the work request itself would not ordinarily be visible.
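
The rule can be sketched in plain Python as follows.  This is only an
illustration under simplifying assumptions: the ``Artifact`` and
``WorkRequest`` classes, the ``visible_to`` sets and the
``can_display_work_request`` function are hypothetical stand-ins, not
debusine's actual models or permission predicates.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class Artifact:
        """Hypothetical stand-in: only records who may see the artifact."""

        visible_to: set[str] = field(default_factory=set)


    @dataclass
    class WorkRequest:
        """Hypothetical stand-in for a work request and its output artifacts."""

        visible_to: set[str] = field(default_factory=set)
        produced: list[Artifact] = field(default_factory=list)


    def can_display_work_request(user: str, work_request: WorkRequest) -> bool:
        """Return True if the work request or any artifact it produced is visible."""
        if user in work_request.visible_to:
            return True
        # Fallback: "any", not "all", of the produced artifacts.
        return any(user in artifact.visible_to for artifact in work_request.produced)


    # One output artifact was made public, the other (e.g. a debug log) was not.
    unembargoed = WorkRequest(
        visible_to={"security-team"},
        produced=[Artifact({"security-team", "public"}), Artifact({"security-team"})],
    )
    still_private = WorkRequest(
        visible_to={"security-team"},
        produced=[Artifact({"security-team"})],
    )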

.. note::

   It may be surprising that this rule is "any of the artifacts produced by
   the work request" rather than "all of the artifacts produced by the work
   request"; but there isn't usually anywhere useful to copy
   :ref:`debusine:work-request-debug-logs
   <artifact-work-request-debug-logs>` artifacts to, and making only some of
   the artifacts produced by a work request public seems unlikely to be a
   realistic unembargoing use case.

While build logs may expose additional information not in the output
artifacts (such as build-dependencies where security updates are also being
prepared), similar information might easily be exposed by the output
artifacts themselves anyway, so the onus is on people who make artifacts
public to check that it is safe to do so.

Resource accounting considerations
==================================

We want to be able to track the resource usage of workspaces and scopes.  If
artifacts are copied between workspaces (and hence perhaps between scopes),
then the same files may exist in multiple workspaces, complicating this kind
of analysis.  The question is likely to be something along the lines of "how
much data does debusine need to store on behalf of this workspace or scope
that it would not otherwise need to store?".

A reasonable first cut would be to track the origin of copies, and to
account an artifact's files to a workspace (and its containing scope) if the
artifact is in that workspace and is no longer in its origin workspace.  We
therefore add a nullable ``Artifact.original_artifact`` foreign key, with
``on_delete=SET_NULL``, so that deleting the original clears the reference
on its copies rather than deleting them.
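
A minimal sketch of this first-cut rule, assuming that a null
``original_artifact`` is read as "not a copy, or the original is gone".
The ``Artifact`` class, the ``delete`` helper and ``accounted_size`` are
hypothetical, and sizes are tracked per artifact rather than per file to
keep the example short.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional


    @dataclass(eq=False)  # compare by identity, like database rows
    class Artifact:
        workspace: str
        size: int
        original_artifact: Optional["Artifact"] = None


    def delete(artifact: Artifact, artifacts: list[Artifact]) -> None:
        """Remove an artifact, mimicking on_delete=SET_NULL on its copies."""
        artifacts.remove(artifact)
        for other in artifacts:
            if other.original_artifact is artifact:
                other.original_artifact = None


    def accounted_size(workspace: str, artifacts: list[Artifact]) -> int:
        """Sum the artifacts in a workspace whose original no longer exists."""
        return sum(
            artifact.size
            for artifact in artifacts
            if artifact.workspace == workspace and artifact.original_artifact is None
        )


    original = Artifact("embargoed", 100)
    copy = Artifact("public", 100, original_artifact=original)
    artifacts = [original, copy]

    before = accounted_size("public", artifacts)  # copy is charged to its origin
    delete(original, artifacts)
    after = accounted_size("public", artifacts)  # copy is now charged to "public"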

Some other variations are possible, and are not made more difficult by this
design.  For example, we may wish to account for each workspace's usage
without considering whether files have been copied from or to other
workspaces (in which case the total file store size may be less than the sum
of the sizes of all workspaces); or to calculate the "unique" size of a
workspace as the total size of all files that appear only in that workspace.
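
The two variations can be compared with a small worked example.  The file
hashes, sizes and workspace names below are made up for illustration, and
the function names are hypothetical.

.. code-block:: python

    # Size of each stored file, keyed by an illustrative content hash.
    file_sizes = {"sha256:aaa": 100, "sha256:bbb": 40, "sha256:ccc": 10}

    # Files referenced by each workspace; "sha256:aaa" was copied between them.
    workspace_files = {
        "embargoed": {"sha256:aaa", "sha256:bbb"},
        "public": {"sha256:aaa", "sha256:ccc"},
    }


    def naive_size(workspace: str) -> int:
        """Size of all files in a workspace, ignoring copies elsewhere."""
        return sum(file_sizes[name] for name in workspace_files[workspace])


    def unique_size(workspace: str) -> int:
        """Size of the files that appear only in this workspace."""
        elsewhere = set().union(
            *(files for name, files in workspace_files.items() if name != workspace)
        )
        return sum(file_sizes[name] for name in workspace_files[workspace] - elsewhere)


    def store_size() -> int:
        """Size of the file store: each distinct file is counted once."""
        return sum(file_sizes[name] for name in set().union(*workspace_files.values()))


    naive_total = sum(naive_size(name) for name in workspace_files)

Here the naive per-workspace sizes add up to more than the store actually
holds, because the shared file is counted once per workspace.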

.. _task-copy-collection-items:

CopyCollectionItems task
========================

This server task copies items into given target collections, which may or
may not be in the same workspace as the original items.  It returns an error
if:

* the user/workflow that created the task does not have permission to read
  the items or to write to the target collection
* any of the items is a collection
* ``unembargo`` is False, any of the items is in a private workspace, and
  the target collection is in a public workspace
* the collection manager fails to add the items (e.g. because they are
  incompatible with the collection)
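
The second and third checks in the list above could look roughly like the
sketch below.  ``CopyError``, ``Item`` and ``check_copy`` are hypothetical
names, and the permission and collection manager checks are left out
because they depend on parts of the system not described here.

.. code-block:: python

    from dataclasses import dataclass


    class CopyError(Exception):
        """Hypothetical error type for a rejected copy."""


    @dataclass
    class Item:
        """Hypothetical stand-in for a source item."""

        workspace_public: bool
        is_collection: bool = False


    def check_copy(items: list[Item], target_public: bool, unembargo: bool = False) -> None:
        """Raise CopyError if the copy must be refused; return None otherwise."""
        for item in items:
            if item.is_collection:
                raise CopyError("cannot copy a collection")
            if not unembargo and not item.workspace_public and target_public:
                raise CopyError(
                    "copying from a private to a public workspace requires unembargo"
                )


    def is_allowed(items: list[Item], target_public: bool, unembargo: bool = False) -> bool:
        """Convenience wrapper that reports the outcome as a boolean."""
        try:
            check_copy(items, target_public, unembargo)
        except CopyError:
            return False
        return True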

The ``task_data`` for this task may contain the following keys:

* ``copies``: a list of dictionaries as follows:

  * ``source_items`` (:ref:`lookup-multiple`, required): a list of items to
    copy (as usual for lookups, these may be collection items or they may be
    artifacts looked up directly by ID)
  * ``target_collection`` (:ref:`lookup-single`, required): the collection
    to copy items into
  * ``unembargo`` (boolean, defaults to False): if True, allow copying from
    private to public workspaces
  * ``replace`` (boolean, defaults to False): if True, replace existing
    similar items
  * ``name_template`` (string, optional): template used to generate the name
    for the target collection item, using the ``str.format`` templating
    syntax (with variables inside curly braces)
  * ``variables`` (dictionary, optional): pass these variables when adding
    items to the target collection; if a given source item came from a
    collection, then this is merged into the per-item data from the
    corresponding source collection item, with the values given here taking
    priority in cases of conflict
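
As an illustration of the shape of ``task_data`` and its defaults, the
sketch below parses one ``copies`` entry into a dataclass.  ``CopySpec`` is
a hypothetical name, and the integer IDs are placeholders standing in for
real lookups.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Any, Optional


    @dataclass
    class CopySpec:
        """Hypothetical mirror of one entry in ``copies``."""

        source_items: Any  # a multiple lookup (required)
        target_collection: Any  # a single lookup (required)
        unembargo: bool = False
        replace: bool = False
        name_template: Optional[str] = None
        variables: Optional[dict[str, Any]] = None


    task_data = {
        "copies": [
            {
                # Placeholder IDs: two artifacts looked up directly by ID.
                "source_items": [101, 102],
                # Placeholder ID of the target collection.
                "target_collection": 7,
                "unembargo": True,
            }
        ]
    }

    copies = [CopySpec(**entry) for entry in task_data["copies"]]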

For each of the entries in ``copies``, the task copies the source items to
the target collection's workspace; when copying artifacts, if the contained
files are already in one of that workspace's file stores, then it copies
references to them, and otherwise it copies the file contents.  For each
source item, it then adds a collection item to the target collection, using
``name_template`` and ``variables`` in the same way as in
:ref:`action-update-collection-with-artifacts`.
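
The merging and naming steps can be sketched as below.  The helper names
and the variable names (``package``, ``version``, ``component``,
``section``) are illustrative assumptions, and the sketch ignores any extra
processing that :ref:`action-update-collection-with-artifacts` applies to
variables.

.. code-block:: python

    from typing import Any, Optional


    def merged_variables(
        source_item_data: Optional[dict[str, Any]],
        variables: Optional[dict[str, Any]],
    ) -> dict[str, Any]:
        """Merge per-item data with task variables; task variables win."""
        return {**(source_item_data or {}), **(variables or {})}


    def item_name(name_template: str, variables: dict[str, Any]) -> str:
        """Expand a ``str.format``-style template with the given variables."""
        return name_template.format(**variables)


    # Data carried by the source collection item, and overrides from task_data.
    per_item = {"component": "main", "section": "devel"}
    overrides = {"component": "contrib"}

    merged = merged_variables(per_item, overrides)
    name = item_name("{package}_{version}", {"package": "hello", "version": "2.10-3"})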

All the requested copies happen in a single database transaction; if one of
them fails then they are all rolled back.
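
The all-or-nothing behaviour can be demonstrated with the standard
library's ``sqlite3`` module, used here purely as a stand-in for the real
database layer.  The table layout and ``run_copies`` are hypothetical; in a
Django project the same effect would more naturally come from
``transaction.atomic()``.

.. code-block:: python

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collection_item ("
        "collection TEXT, name TEXT, UNIQUE (collection, name))"
    )


    def run_copies(conn: sqlite3.Connection, copies: list[tuple[str, str]]) -> None:
        """Apply every copy in one transaction."""
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            for collection, name in copies:
                conn.execute(
                    "INSERT INTO collection_item VALUES (?, ?)", (collection, name)
                )


    try:
        # The second copy violates the UNIQUE constraint, so the first one
        # must be rolled back as well.
        run_copies(conn, [("suite", "hello"), ("suite", "hello")])
    except sqlite3.IntegrityError:
        pass

    count = conn.execute("SELECT COUNT(*) FROM collection_item").fetchone()[0]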

.. _workflow-publish:

Workflow ``package_publish``
============================

This workflow publishes source and/or binary packages to a given target
suite.  It is normally expected to be used as a sub-workflow.

* ``task_data``:

  * ``source_artifact`` (:ref:`lookup-single`, optional): a
    ``debian:source-package`` or ``debian:upload`` artifact representing the
    source package (the former is used when the workflow is started based on
    a ``.dsc`` rather than a ``.changes``)
  * ``binary_artifacts`` (:ref:`lookup-multiple`, optional): a list of
    ``debian:upload`` artifacts representing the binary packages
  * ``target_suite`` (:ref:`lookup-single`, required): the ``debian:suite``
    collection to publish packages to
  * ``unembargo`` (boolean, defaults to False): if True, allow publishing
    artifacts from private workspaces to public suites
  * ``replace`` (boolean, defaults to False): if True, replace existing
    similar items
  * ``suite_variables`` (dictionary, optional): pass these variables when
    adding items to the target suite collection; if a given source or binary
    artifact came from a collection, then this is merged into the per-item
    data from the corresponding collection item, with the values given here
    taking priority in cases of conflict; see :ref:`debian:suite
    <collection-suite>` for the available variable names

At least one of ``source_artifact`` and ``binary_artifacts`` must be set.

The workflow creates a :ref:`task-copy-collection-items`.  The ``copies``
field in its task data is as follows:

* ``source_items``: the union of whichever of ``{source_artifact}`` and
  ``{binary_artifacts}`` are set
* ``target_collection``: ``{target_suite}``
* ``unembargo``: ``{unembargo}``
* ``replace``: ``{replace}``
* ``variables``: ``{suite_variables}``
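
The mapping above could be written roughly as follows.  This is a sketch
rather than the workflow's actual code: ``build_publish_copy`` is a
hypothetical name, ``binary_artifacts`` is assumed to be a plain list of
lookups, and ``target_suite`` is assumed to be present.

.. code-block:: python

    from typing import Any


    def build_publish_copy(task_data: dict[str, Any]) -> dict[str, Any]:
        """Build the main ``copies`` entry from the workflow's task data."""
        source_artifact = task_data.get("source_artifact")
        binary_artifacts = task_data.get("binary_artifacts") or []
        if source_artifact is None and not binary_artifacts:
            raise ValueError(
                "at least one of source_artifact and binary_artifacts must be set"
            )

        # Union of whichever of the two inputs are set.
        source_items = []
        if source_artifact is not None:
            source_items.append(source_artifact)
        source_items.extend(binary_artifacts)

        entry = {
            "source_items": source_items,
            "target_collection": task_data["target_suite"],
            "unembargo": task_data.get("unembargo", False),
            "replace": task_data.get("replace", False),
        }
        if task_data.get("suite_variables") is not None:
            entry["variables"] = task_data["suite_variables"]
        return entry


    # Placeholder IDs stand in for real lookups.
    entry = build_publish_copy(
        {"source_artifact": 1, "binary_artifacts": [2, 3], "target_suite": 10}
    )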

Any of the lookups in ``source_items`` may result in :ref:`promises
<bare-data-promise>`, and in that case the workflow adds corresponding
dependencies.

If the source and target workspaces have different instances of the
:ref:`debian:package-build-logs <collection-package-build-logs>` collection,
then the workflow also adds an entry to ``copies`` as follows:

* ``source_items``:

  .. code-block:: yaml

      collection: {source build logs collection}
      lookup__same_work_request: {binary_artifacts}

* ``target_collection``: target build logs collection
* ``unembargo``: ``{unembargo}``
* ``replace``: ``{replace}``

If the source and target workspaces have different instances of the
:ref:`debusine:task-history <collection-task-history>` collection, then the
workflow also adds an entry to ``copies`` as follows:

* ``source_items``:

  .. code-block:: yaml

      collection: {source task history collection}
      lookup__same_workflow: {binary_artifacts}

* ``target_collection``: target task history collection
* ``unembargo``: ``{unembargo}``
* ``replace``: ``{replace}``
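
Both additional entries follow the same pattern, which the sketch below
captures in one loop.  ``extra_copies`` is a hypothetical helper, the
collections are represented by placeholder IDs keyed by category, and
skipping a category when either workspace lacks the collection is an
assumption not stated above.

.. code-block:: python

    from typing import Any

    # Lookup filter used for each collection category, as described above.
    FILTERS = {
        "debian:package-build-logs": "lookup__same_work_request",
        "debusine:task-history": "lookup__same_workflow",
    }


    def extra_copies(
        binary_artifacts: Any,
        unembargo: bool,
        replace: bool,
        source_collections: dict[str, int],
        target_collections: dict[str, int],
    ) -> list[dict[str, Any]]:
        """Build ``copies`` entries for build logs and task history."""
        entries = []
        for category, lookup_filter in FILTERS.items():
            source = source_collections.get(category)
            target = target_collections.get(category)
            # Assumption: nothing to copy if a collection is missing, or if
            # both workspaces share the same collection instance.
            if source is None or target is None or source == target:
                continue
            entries.append(
                {
                    "source_items": {
                        "collection": source,
                        lookup_filter: binary_artifacts,
                    },
                    "target_collection": target,
                    "unembargo": unembargo,
                    "replace": replace,
                }
            )
        return entries


    entries = extra_copies(
        binary_artifacts=[2, 3],
        unembargo=True,
        replace=False,
        source_collections={"debian:package-build-logs": 20, "debusine:task-history": 30},
        target_collections={"debian:package-build-logs": 21, "debusine:task-history": 30},
    )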
