vsoch commented on issue containers/containerimage-py#12.
It’s up to you! It never hurts to start a discussion….
vsoch opened a pull request to compspec/fractale
vsoch pushed to singularityhub/shpc-registry
Merge pull request #318 from singularityhub/update/containers-2025-04-21
[bot] update/containers-2025-04-21
vsoch merged a pull request to converged-computing/google-performance-study
vsoch pushed to rseng/software
Merge pull request #419 from rseng/update/software-2025-04-20
Update from update/software-2025-04-20
vsoch commented on issue kubeflow/trainer#2459.
More granularity is definitely useful - did you write down the use cases for this from the call or elsewhere?…
vsoch closed issue flux-framework/flux-docs#299.
The double copyright
vsoch pushed to converged-computing/google-performance-study
analysis; update plots to include 128 nodes
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-tutorials
Add zenodo doi
vsoch pushed to compspec/fractale
feat: graph solver backend
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to compspec/fractale
vsoch commented on issue skypilot-org/skypilot#3751.
I don’t think there was interest here - I closed the PR and I’ll close the issue too. I still think the design to support deployment of a cluster to Kubernetes has feet, but the code base would need some work for that. …
vsoch pushed to conda-forge/deid-feedstock
Merge pull request #49 from regro-cf-autotick-bot/0.4.2_had277d
deid v0.4.2
vsoch merged a pull request to flux-framework/flux-framework.github.io
vsoch merged a pull request to converged-computing/google-performance-study
vsoch pushed to converged-computing/flux-apps-helm
Merge pull request #21 from converged-computing/update-recipes
fix uid for pairs runs
vsoch commented on issue spack/spack#49893.
woot! Thanks all. …
vsoch pushed to singularityhub/shpc-registry
Merge pull request #317 from singularityhub/update/containers-2025-04-17
[bot] update/containers-2025-04-17
vsoch commented on issue pydicom/deid#277.
Thank you for the reminder! …
vsoch closed a pull request to flux-framework/spack
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #147 from flux-framework/add-coral2
add: flux-coral2
vsoch commented on issue flux-framework/flux-docs#298.
Thanks @jameshcorbett - will do. …
vsoch commented on issue flux-framework/flux-core#6771.
What happened to me with spack is I tried it after they changed compilers, I think it went from being “a different thing” to more akin to a package, and the fix (for me) was to nuke my ~/.spack and start fresh….
vsoch commented on issue flux-framework/flux-core#6738.
I think we can simply just do:…
vsoch commented on issue spack/spack#49893.
I added a conflicts statement to flux-sched to reflect our discussion (thanks for that @tgamblin @trws). I think there is a delay in receiving the notification, but all the jobs passed about 5 minutes ago. …
vsoch pushed to singularityhub/shpc-registry
Merge pull request #316 from singularityhub/update/containers-2025-04-16
[bot] update/containers-2025-04-16
vsoch commented on issue kubernetes/enhancements#4671.
I’d like to give feedback on that - we developed a similar solution (fluence) and at least have awareness of some of the issues. Whoever takes the lead, could you cc me on the relevant issues?…
vsoch pushed to flux-framework/spack
flux-sched: build older flux-core
flux-sched 0.38 was the first release that required gcc version 12 or higher. flux-core continued to build against older flux-sched for some time, but eventually added features that we now see break with sched 0.37 and the latest flux-core. This conflicts statement should ensure that older flux-sched, which is built with an older compiler, only builds with flux-core up to 0.68.
Signed-off-by: vsoch vsoch@users.noreply.github.com
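As a rough illustration of the rule in the commit message above, the constraint can be modeled in plain Python. This is not the actual Spack package code, which would express the rule with the `conflicts` directive in the package recipe. The function names and the choice to compare only major.minor are assumptions made for this sketch.

```python
# Sketch only: models the version rule from the commit message,
# not the real flux-sched package recipe.

def major_minor(version: str) -> tuple[int, int]:
    """Reduce a version such as '0.68.0' to (0, 68) for comparison."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])

# Thresholds taken from the commit message; comparing on major.minor
# only is an assumption of this sketch.
FIRST_SCHED_NEEDING_GCC12 = (0, 38)
LAST_CORE_FOR_OLD_SCHED = (0, 68)

def pair_conflicts(sched: str, core: str) -> bool:
    """Return True when an older flux-sched is paired with a too-new flux-core."""
    old_sched = major_minor(sched) < FIRST_SCHED_NEEDING_GCC12
    new_core = major_minor(core) > LAST_CORE_FOR_OLD_SCHED
    return old_sched and new_core

for sched, core in [("0.37.0", "0.68.0"), ("0.37.0", "0.70.0"), ("0.38.0", "0.70.0")]:
    label = "conflict" if pair_conflicts(sched, core) else "ok"
    print(f"flux-sched {sched} + flux-core {core}: {label}")
```

Only the middle pair is rejected: an older flux-sched with a flux-core newer than 0.68.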
vsoch opened a pull request to flux-framework/flux-framework.github.io
vsoch opened a pull request to flux-framework/flux-docs
vsoch opened a pull request to converged-computing/flux-apps-helm
vsoch pushed to compspec/compspec-modules
generate software type metadata
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to pydicom/deid
Bump version and add changelog entry
vsoch reviewed a pydicom/deid pull request
This is ready for merge! The last final tweaks: …
vsoch commented on issue converged-computing/performance-study#86.
Thanks @wihobbs !
Pre-release (or skeleton release) to coincide with MuMMI Experiment work, using Flux and Kubernetes.
What’s Changed
- wip: add kubernetes operator by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/1
- testing mummi by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/2
- adding testing setup by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/3
- refactor: trackers are generic by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/4
- feat: add flux tracker by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/5
- add timing wrappers by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/7
- wip: updates for experiment by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/8
- add automated build for manager by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/9
- ensure we capture all times by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/10
- failed jobs need to be wrapped in job by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/11
- feat: support for custom node selector by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/12
- remove nprocs by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/13
- bug: failed jobs should not be considered active by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/14
- allow for fail, meaning the job always succeeds by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/15
- job base class by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/16
- feat: node selector for manager by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/17
- cleanup state machines by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/18
- add support for oras arch for arm, etc. by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/19
- feat: node timings by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/20
- Add node timings by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/21
- feat: add more resource specs to flux tracker job submit by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/22
- bug: flux misses events by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/23
- feat: save kubernetes logs. by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/24
- feat: allow multiple node jobs by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/25
- feat: analysis and plotting functions by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/26
- wip: add support for workflow events by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/27
- feat: allow variadic tasks/nodes by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/28
- feat: add support for jobset by @vsoch in https://github.com/converged-computing/state-machine-operator/pull/29
New Contributors
- @vsoch made their first contribution in https://github.com/converged-computing/state-machine-operator/pull/1
Full Changelog: https://github.com/converged-computing/state-machine-operator/commits/0.0.0
MuMMI Operator Release to coincide with Zenodo record for MuMMI Experiments work.
What’s Changed
- wip: mummi-operator python by @vsoch in https://github.com/converged-computing/mummi-operator/pull/7
- so much fail by @vsoch in https://github.com/converged-computing/mummi-operator/pull/9
- feat: mlrunner timing and simplified validator by @vsoch in https://github.com/converged-computing/mummi-operator/pull/10
- cpu runs and remove mummi-operator python module by @vsoch in https://github.com/converged-computing/mummi-operator/pull/11
- bug with running/queued calculation by @vsoch in https://github.com/converged-computing/mummi-operator/pull/12
- queued pods are not failed pods by @vsoch in https://github.com/converged-computing/mummi-operator/pull/13
- ensure we use nrpoc for cganalysis/createsims by @vsoch in https://github.com/converged-computing/mummi-operator/pull/14
- feat: node selectors for mlserver registry, wfmanager by @vsoch in https://github.com/converged-computing/mummi-operator/pull/15
Full Changelog: https://github.com/converged-computing/mummi-operator/compare/0.0.1...0.0.2
vsoch pushed to rseng/software
Merge pull request #418 from rseng/update/software-2025-04-13
Update from update/software-2025-04-13
vsoch pushed to rseng/rseng.github.io
Merge pull request #2 from 2xB/add-derse
Adding german RSE community
vsoch commented on issue spack/spack#49893.
I can comment that having built flux-sched and flux-core in a ton of environments, I either have needed to pin flux-sched to (at most) 0.37.0 and then flux-core to ~ 0.68.0, or use newer gcc and build the latest for both. Flux core seems pretty flexible to build with different versions of flux-sched - the breaks always happen with gcc/clang versions and sched. …
vsoch commented on issue spack/spack#49893.
Ping @alecbcs what would you like to try next? The CI is outside of my spack expertise….
vsoch commented on issue singularityhub/singularity-hpc#690.
No worries! Can you bump this one more to 0.1.32? Turns out 0.1.31 was already on pypi so your previous change hasn’t been released yet. The silver lining here is we can do that easily now….
vsoch commented on issue pydicom/deid#277.
@ReeceStevens for black we have it pinned to black-23.3.0 - if you can pip install that version in an environment and run it on the code, it should fix the failed test….
vsoch pushed to flux-framework/spack
Automated deployment to update package flux-sched 2025-04-09 (#321)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch reviewed a singularityhub/singularity-hpc pull request
ok, this looks good then! Please bump the version and add a note to the CHANGELOG.md and we should be good to merge….
vsoch pushed to vsoch/vsoch.github.io
add container pulling study to cv
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue spack/spack#49893.
@alecbcs let us know what you’d like to try. I’m not familiar with the spack CI hairball and why flux-sched is building at such an old version. https://gitlab.spack.io/spack/spack/-/jobs/16100298…
vsoch pushed to flux-framework/spack
cuDDN: Add versions 9.5.1, 9.6.0, 9.7.1 and 9.8.0 (#49789)
vsoch opened a pull request to sciworks/spack-updater
vsoch pushed to rseng/software
Merge pull request #417 from rseng/update/software-2025-04-06
Update from update/software-2025-04-06
vsoch pushed to flux-framework/spack
Do not pin py-packaging
vsoch pushed to flux-framework/spack
kokkos: allow using new gfx942_apu arch (#48609)
Add an apu variant that promotes GPU architectures to their APU equivalent. Right now this is just gfx942 -> gfx942_apu.
vsoch commented on issue flux-framework/flux-core#6738.
https://github.com/spack/spack/pull/49893…
vsoch pushed to converged-computing/state-machine-operator
feat: add support for jobset (#29)
- feat: add support for jobset
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to spack/spack
vsoch merged a pull request to flux-framework/spack
vsoch pushed to flux-framework/spack
Add py-tf-keras package, upgrade TFP (#43688)
- enh: add tf-keras package, upgrade TFP
- chore: remove legacy deps
- chore: fix style
- chore: fix style
- fix: url
- fix: use jax, tensorflow instead of py-jax, py-tensorflow
- fix: remove typo
- Update var/spack/repos/builtin/packages/py-tensorflow-probability/package.py
Co-authored-by: Adam J. Stewart ajstewart426@gmail.com
- fix: typos
- fix: swap version
- fix: typos
- fix: typos
- fix: typos
- chore: use f strings
- enh: move tf-keras to pypi
- [@spackbot] updating style on behalf of jonas-eschle
- fix: t
- enh: add tf-keras package, upgrade TFP
- chore: remove legacy deps
- chore: fix style
- chore: fix style
- fix: url
- fix: use jax, tensorflow instead of py-jax, py-tensorflow
- fix: remove typo
- Update var/spack/repos/builtin/packages/py-tensorflow-probability/package.py
Co-authored-by: Adam J. Stewart ajstewart426@gmail.com
- fix: typos
- fix: swap version
- fix: typos
- fix: typos
- fix: typos
- chore: use f strings
- enh: move tf-keras to pypi
- [@spackbot] updating style on behalf of jonas-eschle
- enh: move tf-keras to pypi
- enh: move back to releases to make it work, actually
- enh: move back to releases to make it work, actually
- fix:change back to tar…
- Fix concretisation: py-tf-keras only has 2.17, not 2.16, fix checksum
- enh: add TFP 0.25
- enh: add tf-keras 2.18
- chore: fix style
- fix: remove patch
- maybe fix license
- Update var/spack/repos/builtin/packages/py-tf-keras/package.py
Co-authored-by: Adam J. Stewart ajstewart426@gmail.com
- fix: pipargs global?
- Update var/spack/repos/builtin/packages/py-tf-keras/package.py
Co-authored-by: Wouter Deconinck wdconinc@gmail.com
- chore: fix formatting
- chore: fix formatting again
- fix: pathes in spack
- fix: typo
- fix: typo
- use github package
- use pip install
- fix typo
- fix typo
- comment 2.19 out
- fix typo
- fix typo
- fix typo
- chore: remove unused patch file
- chore: cleanup
- chore: add comment about TF version
- chore: remove unused Bazel, cleanup imports
- [@spackbot] updating style on behalf of jonas-eschle
- chore: add star import, degrading readability
Co-authored-by: Adam J. Stewart ajstewart426@gmail.com Co-authored-by: jonas-eschle jonas-eschle@users.noreply.github.com Co-authored-by: Bernhard Kaindl contact@bernhard.kaindl.dev Co-authored-by: Bernhard Kaindl bernhardkaindl7@gmail.com Co-authored-by: Wouter Deconinck wdconinc@gmail.com
vsoch pushed to converged-computing/state-machine-operator
feat: resnet model running and completing
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue oras-project/oras-py#187.
I would be happy to review a PR with the fix. Thanks for catching this!…
vsoch pushed to flux-framework/spack
Automated deployment to update flux-sched versions 2025-04-01 (#311)
Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #145 from flux-framework/release-docs-2025-04-01
Update from release-docs-2025-04-01
vsoch created a new tag, 0.2.28 at oras-project/oras-py
vsoch opened a pull request to converged-computing/state-machine-operator
vsoch pushed to conda-forge/oras-py-feedstock
updated v0.2.28 (#33)
vsoch pushed to sciworks/spack-updater
compilers are now packages
vsoch pushed to rseng/software
Merge pull request #416 from rseng/update/software-2025-03-30
Update from update/software-2025-03-30
vsoch pushed to converged-computing/performance-study
Merge pull request #85 from converged-computing/azure-osu-reruns
osu re-runs - not a success
vsoch pushed to converged-computing/state-machine-operator
wip: add support for workflow events (#27)
- wip: add support for workflow events
This will add support for ending the workflow early based on a count of successes, a count of failures, or a job duration metric. Next we need to add the ability to grow or shrink (need to think about how to do that, since we want a cloud-agnostic solution), and then decide how to handle application-specific metrics.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- feat: add support for minicluster
If we really want to test scale (shrink and grow) of a job and have it work with the cluster autoscaler, plus collecting metrics from an HPC app, we can most easily do that with the flux operator. This feature adds support for specifying a minicluster property to convert the previous indexed job into a MiniCluster. The flux operator needs to be installed.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- feat: shrink with flux minicluster example working.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- save state
Signed-off-by: vsoch vsoch@users.noreply.github.com
- feat: support for custom metrics
In this example, the user is allowed to provide a custom script that will be run against the log, and it needs to return a dictionary of values (the custom metrics). These are passed back to the manager from the state machine step and can influence workflow behavior (e.g., stop early, grow, or shrink).
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
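The workflow events and custom metrics described in this commit could look roughly like the sketch below. It is illustrative only and is not the operator's code. The `name=value` log format, the `loss` metric, the thresholds, and the function names `parse_metrics` and `next_action` are all hypothetical.

```python
import re

def parse_metrics(log: str) -> dict:
    """Hypothetical user script: extract 'name=value' pairs from a job log."""
    pairs = re.findall(r"(\w+)=(\d+(?:\.\d+)?)", log)
    return {name: float(value) for name, value in pairs}

def next_action(metrics: dict, successes: int, failures: int,
                max_successes: int = 10, max_failures: int = 3) -> str:
    """Hypothetical manager rule: decide whether the workflow keeps going."""
    # Event-style rules: stop once enough jobs have succeeded or failed.
    if successes >= max_successes or failures >= max_failures:
        return "stop"
    # Custom-metric rule: stop early when the (assumed) loss is low enough.
    if metrics.get("loss", 1.0) < 0.05:
        return "stop"
    return "continue"

metrics = parse_metrics("epoch=3 loss=0.02\n")
print(metrics, next_action(metrics, successes=1, failures=0))
```

A grow or shrink decision could be returned the same way, as another action string that the manager interprets.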
vsoch pushed to singularityhub/shpc-registry
Merge pull request #305 from singularityhub/update/containers-2025-03-27
[bot] update/containers-2025-03-27
vsoch pushed to converged-computing/state-machine-operator
feat: shrink with flux minicluster example working.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue oras-project/oras-py#185.
I’d be happy to review a PR that adds this functionality then….
vsoch pushed to converged-computing/state-machine-operator
feat: analysis and plotting functions (#26)
- feat: analysis and plotting functions
- ensure x axis is same scale
- add analysis libfuncs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue vsoch/oci-python#23.
Thanks! I remember this bit me for other projects, I appreciate the catch here….
vsoch pushed to converged-computing/state-machine-operator
feat: allow multiple node jobs
There is a bug in the kubernetes tracker: we treat failed/succeeded as a boolean (0/1) when it is actually a count of indices. We have not done experiments with >1 nodes, so this has not been an issue (or been caught). This change will fix it.
Signed-off-by: vsoch vsoch@users.noreply.github.com
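To make the boolean-versus-count distinction concrete, here is a minimal sketch that is not the tracker's actual code. In the Kubernetes Job status, `succeeded` and `failed` are integer counts of pods, so a check like `succeeded == 1` only works for single-node jobs. The function name `job_state` and the rule that any failed pod fails the job are assumptions.

```python
def job_state(status: dict, completions: int) -> str:
    """Classify a job from its status counts (sketch, not the operator's code)."""
    # Either field may be absent or None before any pod has finished.
    succeeded = status.get("succeeded") or 0
    failed = status.get("failed") or 0
    # Assumption: any failed pod marks the whole job as failed.
    if failed > 0:
        return "failed"
    # Compare against the expected count instead of treating it as 0/1.
    if succeeded >= completions:
        return "succeeded"
    return "active"

# A 4-node job: a boolean-style check (succeeded == 1) would misread both cases.
print(job_state({"succeeded": 1}, completions=4))
print(job_state({"succeeded": 4}, completions=4))
```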
vsoch pushed to converged-computing/performance-study
Merge pull request #84 from converged-computing/redo-osu
osu: fix runs for gpu 128 GKE and CE
vsoch commented on issue oras-project/oras-py#185.
Is this supported for the oras client in Go?…
vsoch opened a pull request to spack/spack
vsoch commented on issue skypilot-org/skypilot#3777.
Closing for no interest….
vsoch pushed to rseng/software
Merge pull request #415 from rseng/update/software-2025-03-23
Update from update/software-2025-03-23
vsoch pushed to flux-framework/spack
bug: cffi needs to be present for link (configure) (#308)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue oras-project/oras-py#164.
Are we good to close here?…
vsoch pushed to converged-computing/state-machine-operator
feat: save kubernetes logs.
We have been saving artifacts for everything, relying on the application to take the burden of saving its own logging retrieved from the registry. For experiments with gpu selection we just need one little value, and I think it would be easier to save all the logs instead of using oras. This feature supports that: the user adds a properties -> save-path, and under that path a “logs” directory is created, with each log named by the job, step, and pod index.
Signed-off-by: vsoch vsoch@users.noreply.github.com
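A small sketch of how such a log layout might be produced is shown below. The exact file naming is an assumption: the commit message only says that logs are named by job, step, and pod index. The `save_log` helper and the `.log` suffix are hypothetical.

```python
import tempfile
from pathlib import Path

def save_log(save_path: str, job: str, step: str, pod_index: int, text: str) -> Path:
    """Write one pod's log under <save_path>/logs (naming scheme is assumed)."""
    log_dir = Path(save_path) / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / f"{job}-{step}-{pod_index}.log"
    target.write_text(text)
    return target

# Usage: write a log for pod 0 of a hypothetical job and step.
with tempfile.TemporaryDirectory() as tmp:
    path = save_log(tmp, "job_001", "step_a", 0, "gpu=3\n")
    print(path.name, path.read_text().strip())
```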
vsoch pushed to conda-forge/oras-py-feedstock
oras-py v0.2.27 (#32)
- updated v0.2.27
- MNT: Re-rendered with conda-build 25.1.2, conda-smithy 3.47.0, and conda-forge-pinning 2025.03.21.21.56.39
vsoch pushed to singularityhub/shpc-registry
Merge pull request #304 from singularityhub/update/containers-2025-03-20
[bot] update/containers-2025-03-20
vsoch pushed to flux-framework/spack
re-enable flux checks
vsoch pushed to converged-computing/state-machine-operator
bug: flux failed jobs do not have status COMPLETED, they are FAILED
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue pydicom/deid#275.
Closed with #276 …
vsoch commented on issue flux-framework/flux-core#6713.
I couldn’t say now - I wound up sending a kill signal to the job, and didn’t save the data because I considered the run erroneous!…
vsoch pushed to conda-forge/deid-feedstock
Merge pull request #48 from regro-cf-autotick-bot/0.4.1_h47750b
deid v0.4.1
vsoch commented on issue pydicom/deid#276.
I can see the output above and the logic in the code, so no need. I think this is good to go - if you could please bump the version in version.py and add a corresponding note in the CHANGELOG.md we should be good….
vsoch pushed to converged-computing/state-machine-operator
feat: add more resource specs to flux tracker job submit
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to spack/spack
vsoch merged a pull request to singularityhub/shpc-registry