Open Source Heartbeat

vsoch commented on issue containers/containerimage-py#12.

It’s up to you! It never hurts to start a discussion….

View Comment

vsoch opened a pull request to compspec/fractale

View Pull Request

vsoch pushed to singularityhub/shpc-registry

Merge pull request #318 from singularityhub/update/containers-2025-04-21

[bot] update/containers-2025-04-21

View Commit

vsoch merged a pull request to converged-computing/google-performance-study

View Pull Request

vsoch pushed to rseng/software

Merge pull request #419 from rseng/update/software-2025-04-20

Update from update/software-2025-04-20

View Commit

vsoch commented on issue kubeflow/trainer#2459.

More granularity is definitely useful - did you write down the use cases for this from the call or elsewhere?…

View Comment

vsoch closed issue flux-framework/flux-docs#299.

The double copyright

View Comment

vsoch pushed to converged-computing/google-performance-study

analysis; update plots to include 128 nodes

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-tutorials

Add zenodo doi

View Commit

vsoch pushed to compspec/fractale

feat: graph solver backend

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to compspec/fractale

View Pull Request

vsoch commented on issue skypilot-org/skypilot#3751.

I don’t think there was interest here - I closed the PR and I’ll close the issue too. I still think the design to support deployment of a cluster to Kubernetes has feet, but the code base would need some work for that. …

View Comment

vsoch pushed to conda-forge/deid-feedstock

Merge pull request #49 from regro-cf-autotick-bot/0.4.2_had277d

deid v0.4.2

View Commit

vsoch merged a pull request to flux-framework/flux-framework.github.io

View Pull Request

vsoch merged a pull request to converged-computing/google-performance-study

View Pull Request

vsoch pushed to converged-computing/flux-apps-helm

Merge pull request #21 from converged-computing/update-recipes

fix uid for pairs runs

View Commit

vsoch commented on issue spack/spack#49893.

woot! Thanks all. …

View Comment

vsoch pushed to singularityhub/shpc-registry

Merge pull request #317 from singularityhub/update/containers-2025-04-17

[bot] update/containers-2025-04-17

View Commit

vsoch commented on issue pydicom/deid#277.

Thank you for the reminder! …

View Comment

vsoch commented on kubernetes-sigs/lws

View Comment

vsoch closed a pull request to flux-framework/spack

View Pull Request

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #147 from flux-framework/add-coral2

add: flux-coral2

View Commit

vsoch commented on issue flux-framework/flux-docs#298.

Thanks @jameshcorbett - will do. …

View Comment

vsoch commented on issue flux-framework/flux-core#6771.

What happened to me with spack is I tried it after they changed compilers, I think it went from being “a different thing” to more akin to a package, and the fix (for me) was to nuke my ~/.spack and start fresh….

View Comment

vsoch commented on issue flux-framework/flux-core#6738.

I think we can simply just do:…

View Comment

vsoch commented on issue spack/spack#49893.

I added a conflicts statement to flux-sched to reflect our discussion (thanks for that @tgamblin @trws) and (I think there is delay receiving the notification?) but all the jobs passed about 5 minutes ago. …

View Comment

vsoch pushed to singularityhub/shpc-registry

Merge pull request #316 from singularityhub/update/containers-2025-04-16

[bot] update/containers-2025-04-16

View Commit

vsoch commented on kubernetes-sigs/lws

View Comment

vsoch commented on issue kubernetes/enhancements#4671.

I’d like to give feedback on that - we developed a similar solution (fluence) and at least have awareness to some of the issues. Whomever takes lead could you cc me on the relevant issues?…

View Comment

vsoch pushed to flux-framework/spack

flux-sched: build older flux-core

flux sched 0.38 was the first that required gcc version 12 or higher, and flux-core continued to build for some time, but eventually added features that we are now seeing break with sched 0.37 and the latest flux. This conflicts should ensure that older flux-sched, which is being built by having an older compiler, only builds with flux-core up to 0.68.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit
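The version rule described in the flux-sched commit above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not Spack's `conflicts` directive or the actual package recipe: the function names are hypothetical, and the thresholds (flux-sched up to 0.37, flux-core up to 0.68) are read from the commit message.

```python
def parse_version(text):
    """Turn '0.37.0' into (0, 37, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Thresholds taken from the commit message; the rule itself is an illustration.
OLD_SCHED_MAX = parse_version("0.37")
OLD_CORE_MAX = parse_version("0.68")


def is_allowed(sched, core):
    """Return True if this flux-sched / flux-core pairing is permitted.

    Assumption: only the major.minor part matters for the rule.
    """
    sched_v = parse_version(sched)[:2]
    core_v = parse_version(core)[:2]
    if sched_v <= OLD_SCHED_MAX:
        # Older flux-sched (built with an older compiler) needs older flux-core.
        return core_v <= OLD_CORE_MAX
    # Newer flux-sched is not restricted by this rule.
    return True
```

For example, `is_allowed("0.37.0", "0.70.0")` is rejected under this model, while `is_allowed("0.38.0", "0.70.0")` is accepted.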

vsoch opened a pull request to flux-framework/flux-framework.github.io

View Pull Request

vsoch opened a pull request to flux-framework/flux-docs

View Pull Request

vsoch opened a pull request to converged-computing/flux-apps-helm

View Pull Request

vsoch pushed to compspec/compspec-modules

generate software type metadata

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to pydicom/deid

Bump version and add changelog entry

View Commit

vsoch reviewed a pydicom/deid pull request

View Review

vsoch reviewed a pydicom/deid pull request

This is ready for merge! The last final tweaks: …

View Review

vsoch commented on issue converged-computing/performance-study#86.

Thanks @wihobbs!

vsoch released 0.0.0.

Pre-release (or skeleton release) to coincide with MuMMI Experiment work, using Flux and Kubernetes.

What’s Changed

  • wip: add kubernetes operator by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/1
  • testing mummi by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/2
  • adding testing setup by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/3
  • refactor: trackers are generic by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/4
  • feat: add flux tracker by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/5
  • add timing wrappers by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/7
  • wip: updates for experiment by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/8
  • add automated build for manager by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/9
  • ensure we capture all times by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/10
  • failed jobs need to be wrapped in job by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/11
  • feat: support for custom node selector by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/12
  • remove nprocs by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/13
  • bug: failed jobs should not be considered active by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/14
  • allow for fail, meaning the job always succeeds by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/15
  • job base class by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/16
  • feat: node selector for manager by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/17
  • cleanup state machines by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/18
  • add support for oras arch for arm, etc. by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/19
  • feat: node timings by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/20
  • Add node timings by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/21
  • feat: add more resource specs to flux tracker job submit by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/22
  • bug: flux misses events by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/23
  • feat: save kubernetes logs. by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/24
  • feat: allow multiple node jobs by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/25
  • feat: analysis and plotting functions by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/26
  • wip: add support for workflow events by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/27
  • feat: allow variadic tasks/nodes by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/28
  • feat: add support for jobset by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/29

New Contributors

  • @vsoch made their first contribution in https://github.com/converged-computing/state-machine-operator/pull/1

Full Changelog: https://github.com/converged-computing/state-machine-operator/commits/0.0.0

View Comment

vsoch released 0.0.2.

MuMMI Operator Release to coincide with Zenodo record for MuMMI Experiments work.

What’s Changed

  • wip: mummi-operator python by @vsoch in https://github.com/converged-computing/mummi-operator/pull/7
  • so much fail by @vsoch in https://github.com/converged-computing/mummi-operator/pull/9
  • feat: mlrunner timing and simplified validator by @vsoch in https://github.com/converged-computing/mummi-operator/pull/10
  • cpu runs and remove mummi-operator python module by @vsoch in https://github.com/converged-computing/mummi-operator/pull/11
  • bug with running/queued calculation by @vsoch in https://github.com/converged-computing/mummi-operator/pull/12
  • queued pods are not failed pods by @vsoch in https://github.com/converged-computing/mummi-operator/pull/13
  • ensure we use nrpoc for cganalysis/createsims by @vsoch in https://github.com/converged-computing/mummi-operator/pull/14
  • feat: node selectors for mlserver registry, wfmanager by @vsoch in https://github.com/converged-computing/mummi-operator/pull/15

Full Changelog: https://github.com/converged-computing/mummi-operator/compare/0.0.1...0.0.2

View Comment

vsoch pushed to rseng/software

Merge pull request #418 from rseng/update/software-2025-04-13

Update from update/software-2025-04-13

View Commit

vsoch pushed to rseng/rseng.github.io

Merge pull request #2 from 2xB/add-derse

Adding german RSE community

View Commit

vsoch released 0.1.32.

What’s Changed

  • Revert “Introducing version_naming property that makes modulefile as version

vsoch commented on issue spack/spack#49893.

I can comment that having built flux-sched and flux-core in a ton of environments, I either have needed to pin flux-sched to (at most) 0.37.0 and then flux-core to ~ 0.68.0, or use newer gcc and build the latest for both. Flux core seems pretty flexible to build with different versions of flux-sched - the breaks always happen with gcc/clang versions and sched. …

View Comment

vsoch commented on issue spack/spack#49893.

Ping @alecbcs what would you like to try next? The CI is outside of my spack expertise….

View Comment

vsoch commented on issue singularityhub/singularity-hpc#690.

No worries! Can you bump this one more to 0.1.32? Turns out 0.1.31 was already on pypi so your previous change hasn’t been released yet. The silver lining here is we can do that easily now….

View Comment

vsoch commented on pydicom/deid

View Comment

vsoch commented on issue pydicom/deid#277.

@ReeceStevens for black we have it pinned to black-23.3.0 - if you can pip install that version in an environment and run on the code, it should fix the failed test….

View Comment

vsoch pushed to flux-framework/spack

Automated deployment to update package flux-sched 2025-04-09 (#321)

Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch reviewed a singularityhub/singularity-hpc pull request

ok, this looks good then! Please bump the version and add a note to the CHANGELOG.md and we should be good to merge….

View Review

vsoch pushed to vsoch/vsoch.github.io

add container pulling study to cv

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue spack/spack#49893.

@alecbcs let us know what you’d like to try. I’m not familiar with the spack CI hairball and why flux-sched is building at such an old version. https://gitlab.spack.io/spack/spack/-/jobs/16100298…

View Comment

vsoch pushed to flux-framework/spack

cuDDN: Add versions 9.5.1, 9.6.0, 9.7.1 and 9.8.0 (#49789)

View Commit

vsoch opened a pull request to sciworks/spack-updater

View Pull Request

vsoch pushed to rseng/software

Merge pull request #417 from rseng/update/software-2025-04-06

Update from update/software-2025-04-06

View Commit

vsoch pushed to flux-framework/spack

Do not pin py-packaging

View Commit

vsoch pushed to flux-framework/spack

kokkos: allow using new gfx942_apu arch (#48609)

Add an apu variant that promotes GPU architectures to their APU equivalent. Right now this is just gfx942 -> gfx942_apu.

View Commit

vsoch commented on issue flux-framework/flux-core#6738.

https://github.com/spack/spack/pull/49893…

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: add support for jobset (#29)

  • feat: add support for jobset

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to spack/spack

View Pull Request

vsoch merged a pull request to flux-framework/spack

View Pull Request

vsoch pushed to flux-framework/spack

Add py-tf-keras package, upgrade TFP (#43688)

  • enh: add tf-keras package, upgrade TFP

  • chore: remove legacy deps

  • chore: fix style

  • chore: fix style

  • fix: url

  • fix: use jax, tensorflow instead of py-jax, py-tensorflow

  • fix: remove typo

  • Update var/spack/repos/builtin/packages/py-tensorflow-probability/package.py

Co-authored-by: Adam J. Stewart ajstewart426@gmail.com

  • fix: typos

  • fix: swap version

  • fix: typos

  • fix: typos

  • fix: typos

  • chore: use f strings

  • enh: move tf-keras to pypi

  • [@spackbot] updating style on behalf of jonas-eschle

  • fix: t

  • enh: add tf-keras package, upgrade TFP

  • chore: remove legacy deps

  • chore: fix style

  • chore: fix style

  • fix: url

  • fix: use jax, tensorflow instead of py-jax, py-tensorflow

  • fix: remove typo

  • Update var/spack/repos/builtin/packages/py-tensorflow-probability/package.py

Co-authored-by: Adam J. Stewart ajstewart426@gmail.com

  • fix: typos

  • fix: swap version

  • fix: typos

  • fix: typos

  • fix: typos

  • chore: use f strings

  • enh: move tf-keras to pypi

  • [@spackbot] updating style on behalf of jonas-eschle

  • enh: move tf-keras to pypi

  • enh: move back to releases to make it work, actually

  • enh: move back to releases to make it work, actually

  • fix:change back to tar…

  • Fix concretisation: py-tf-keras only has 2.17, not 2.16, fix checksum

  • enh: add TFP 0.25

  • enh: add tf-keras 2.18

  • chore: fix style

  • fix: remove patch

  • maybe fix license

  • Update var/spack/repos/builtin/packages/py-tf-keras/package.py

Co-authored-by: Adam J. Stewart ajstewart426@gmail.com

  • fix: pipargs global?

  • Update var/spack/repos/builtin/packages/py-tf-keras/package.py

Co-authored-by: Wouter Deconinck wdconinc@gmail.com

  • chore: fix formatting

  • chore: fix formatting again

  • fix: pathes in spack

  • fix: typo

  • fix: typo

  • use github package

  • use pip install

  • fix typo

  • fix typo

  • comment 2.19 out

  • fix typo

  • fix typo

  • fix typo

  • chore: remove unused patch file

  • chore: cleanup

  • chore: add comment about TF version

  • chore: remove unused Bazel, cleanup imports

  • [@spackbot] updating style on behalf of jonas-eschle

  • chore: add star import, degrading readability


Co-authored-by: Adam J. Stewart ajstewart426@gmail.com
Co-authored-by: jonas-eschle jonas-eschle@users.noreply.github.com
Co-authored-by: Bernhard Kaindl contact@bernhard.kaindl.dev
Co-authored-by: Bernhard Kaindl bernhardkaindl7@gmail.com
Co-authored-by: Wouter Deconinck wdconinc@gmail.com

View Commit

vsoch pushed to converged-computing/state-machine-operator

feat: resnet model running and completing

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#187.

I would be happy to review a PR with the fix. Thanks for catching this!…

View Comment

vsoch commented on spack/spack

View Comment

vsoch pushed to flux-framework/spack

Automated deployment to update flux-sched versions 2025-04-01 (#311)

Signed-off-by: github-actions github-actions@users.noreply.github.com
Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #145 from flux-framework/release-docs-2025-04-01

Update from release-docs-2025-04-01

View Commit

vsoch created a new tag, 0.2.28 at oras-project/oras-py

View Repository

vsoch opened a pull request to converged-computing/state-machine-operator

View Pull Request

vsoch pushed to conda-forge/oras-py-feedstock

updated v0.2.28 (#33)

View Commit

vsoch pushed to sciworks/spack-updater

compilers are now packages

View Commit

vsoch pushed to rseng/software

Merge pull request #416 from rseng/update/software-2025-03-30

Update from update/software-2025-03-30

View Commit

vsoch pushed to converged-computing/performance-study

Merge pull request #85 from converged-computing/azure-osu-reruns

osu re-runs - not a success

View Commit

vsoch pushed to converged-computing/state-machine-operator

wip: add support for workflow events (#27)

  • wip: add support for workflow events

This will add support for ending the workflow early due to a count of successes, failures, or job duration metric. We need to next add ability to grow or shrink (need to think about how to do that, since we want a cloud agnostic solution) and then how to handle application specific metrics

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • feat: add support for minicluster

If we really want to test scale (shrink and grow) of a job and have it work with the cluster autoscaler, plus collecting metrics from an HPC app, we can most easily do that with the flux operator. This feature adds support for specifying a minicluster property to convert the previous indexed job into a MiniCluster. The flux operator needs to be installed.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • feat: shrink with flux minicluster example working.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • save state

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • feat: support for custom metrics

In this example, the user is allowed to provide a custom script that will be used against the log, and it needs to return a dictionary of values (the custom metrics). These are passed back to the manager from the state machine step and can influence workflow behavior (e.g., stop early, grow, or shrink.

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit
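The custom metrics idea in the commit above (a user script runs against the job log and returns a dictionary that the manager can act on) might look roughly like the following. This is a minimal sketch under stated assumptions, not the state-machine-operator API: the function names, the `key=value` log format, the `loss` metric, and the `stop_below` threshold are all hypothetical.

```python
import re

# Hypothetical log convention: metric lines look like "name=1.23".
METRIC_LINE = re.compile(r"^(\w+)=([-+]?\d+(?:\.\d+)?)$")


def parse_metrics(log_text):
    """Hypothetical custom script: scan a job log and return metrics as a dict."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def decide(metrics, stop_below=0.05):
    """Hypothetical manager rule: stop early once 'loss' is small enough."""
    loss = metrics.get("loss")
    if loss is not None and loss < stop_below:
        return "stop"
    return "continue"
```

A grow or shrink decision could be returned the same way; only the early-stop case is shown here to keep the sketch short.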

vsoch pushed to singularityhub/shpc-registry

Merge pull request #305 from singularityhub/update/containers-2025-03-27

[bot] update/containers-2025-03-27

View Commit

vsoch pushed to converged-computing/state-machine-operator

feat: shrink with flux minicluster example working.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#185.

I’d be happy to review a PR that adds this functionality then….

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: analysis and plotting functions (#26)

  • feat: analysis and plotting functions
  • ensure x axis is same scale
  • add analysis libfuncs

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue vsoch/oci-python#23.

Thanks! I remember this bit me for other projects, I appreciate the catch here….

View Comment

vsoch commented on spack/spack

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: allow multiple node jobs

There is a bug in the kubernetes tracker that we treat the failed/succeeded as boolean (0/1) when it is actually a count of indices. We have not done experiments with >1 nodes so this has not been an issue (or caught). This change will fix it.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit
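The bug described in the commit above comes down to reading the Kubernetes Job status fields `succeeded` and `failed` as flags when they are pod counts. A hedged sketch of the count-based check follows; the function name and the policy that any failed pod fails the whole job are assumptions for illustration, not the operator's actual logic.

```python
def job_outcome(succeeded, failed, completions):
    """Classify a job from pod counts rather than from truthiness.

    succeeded / failed: integer counts as reported in the Job status;
    they may be None when the field is unset.
    completions: how many pods must succeed for the job to be done.
    """
    succeeded = succeeded or 0
    failed = failed or 0
    if failed > 0:
        # Assumed policy: a single failed pod marks the job as failed.
        return "failed"
    if succeeded >= completions:
        return "succeeded"
    # Some pods are done, but not all of them yet.
    return "running"
```

With a boolean reading, a four-node job would look finished as soon as one pod succeeded; with counts, `job_outcome(1, None, 4)` is still treated as running.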

vsoch pushed to converged-computing/performance-study

Merge pull request #84 from converged-computing/redo-osu

osu: fix runs for gpu 128 GKE and CE

View Commit

vsoch commented on issue oras-project/oras-py#185.

Is this supported for the oras client in Go?…

View Comment

vsoch opened a pull request to spack/spack

View Pull Request

vsoch commented on issue skypilot-org/skypilot#3777.

Closing for no interest….

View Comment

vsoch pushed to rseng/software

Merge pull request #415 from rseng/update/software-2025-03-23

Update from update/software-2025-03-23

View Commit

vsoch pushed to flux-framework/spack

bug: cffi needs to be present for link (configure) (#308)

Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#164.

Are we good to close here?…

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: save kubernetes logs.

We have been saving artifacts for everything, relying on the application to take the burden of saving its own logging retrieved from the registry. For experiments with gpu selection we just need one little value, and I think it would be easier to save all the logs instead of using oras. This feature supports that, where the user adds a properties -> save-path, and under that path “logs” is created that is named by the job, step, and pod index.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit
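The log layout described in the commit above (a "logs" directory under the configured save path, with files named by job, step, and pod index) could be composed as below. The exact file naming used by the operator is not shown in the commit message, so the `job-step-index.log` pattern and the function name are assumptions.

```python
from pathlib import Path


def log_path(save_path, job, step, pod_index):
    """Build a log file path under <save_path>/logs.

    Assumed naming pattern: <job>-<step>-<pod_index>.log
    """
    return Path(save_path) / "logs" / f"{job}-{step}-{pod_index}.log"


def save_log(save_path, job, step, pod_index, text):
    """Write the log text, creating the logs directory if needed."""
    path = log_path(save_path, job, step, pod_index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path
```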

vsoch pushed to conda-forge/oras-py-feedstock

oras-py v0.2.27 (#32)

  • updated v0.2.27

  • MNT: Re-rendered with conda-build 25.1.2, conda-smithy 3.47.0, and conda-forge-pinning 2025.03.21.21.56.39

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #304 from singularityhub/update/containers-2025-03-20

[bot] update/containers-2025-03-20

View Commit

vsoch pushed to flux-framework/spack

re-enable flux checks

View Commit

vsoch pushed to converged-computing/state-machine-operator

bug: flux failed jobs do not have status COMPLETED, they are FAILED

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue pydicom/deid#275.

Closed with #276 …

View Comment

vsoch commented on issue flux-framework/flux-core#6713.

I couldn’t say now - I wound up sending a kill signal to the job, and didn’t save the data because I considered the run erroneous!…

View Comment

vsoch pushed to conda-forge/deid-feedstock

Merge pull request #48 from regro-cf-autotick-bot/0.4.1_h47750b

deid v0.4.1

View Commit

vsoch commented on issue pydicom/deid#276.

I can see the output above and the logic in the code, so no need. I think this is good to go - if you could please bump the version in version.py and add a corresponding note in the CHANGELOG.md we should be good….

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: add more resource specs to flux tracker job submit

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to spack/spack

View Pull Request

vsoch merged a pull request to singularityhub/shpc-registry

View Pull Request