vsoch pushed to singularityhub/singularity-docker
Fix YAML syntax for push event in workflow
vsoch reviewed a pydicom/deid pull request
Works for me. Thanks @bghill !…
vsoch opened a pull request to flux-framework/flux-operator
vsoch pushed to converged-computing/gke-compute-engine-performance-study
analysis: add kripke
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/aws-performance-study
Merge pull request #10 from converged-computing/add-mape
Add model comparison script / images
vsoch commented on issue singularityhub/singularity-docker#17.
They are not automatically generated - they are branches in the repository….
vsoch commented on issue pydicom/deid#291.
@bghill we can also update the project version of jekyll. …
vsoch pushed to singularityhub/shpc-registry
Merge pull request #388 from singularityhub/update/containers-2025-10-06
[bot] update/containers-2025-10-06
vsoch commented on issue pydicom/deid#289.
You are both right! We don’t need to explicitly bump the version (apologies I forgot about that) but we should have a CHANGELOG note under the current….
vsoch pushed to converged-computing/gke-compute-engine-performance-study
add back build script compute engine
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/gke-compute-engine-performance-study
Merge pull request #2 from converged-computing/add-compute-engine-setup
Add compute engine setup
vsoch pushed to rseng/software
Merge pull request #443 from rseng/update/software-2025-10-05
Update from update/software-2025-10-05
vsoch pushed to vsoch/flux-sched
feat: binding prediction
Problem: we need to be able to predict a binding, meaning a cpuset, physical and logical cores (the PUs) based on a shape. Solution: we can do this by parsing an existing xml or the live system with hwloc, and then having an understanding of the shape. A flux submit can then use hwloc to get the exact cpuset mask, logical and physical cores, and of course numa nodes, and we can compare what we actually get to the shape we expect. Note that this DOES require that the shape provided is what we actually get. This is probably something that both flux developers and users should (and will) get a better understanding of. E.g., that asking for exclusive allows exposure to all resources under a NUMA node, or that setting cpu affinity per task will limit to a subset of tasks (vs. not and having different flux ranks with full access to all cores).
Signed-off-by: vsoch vsoch@users.noreply.github.com
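As a rough illustration of the comparison described in the commit message above, the sketch below expands a cpuset list string and checks it against an expected size. It assumes the Linux-style list notation (for example `0-3,8`); the function names `parse_cpuset` and `matches_shape` are hypothetical and are not taken from flux-sched.

```python
def parse_cpuset(text):
    """Expand a cpuset list string such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def matches_shape(actual_cpuset, expected_count):
    """Return True when the binding exposes exactly the expected number of PUs.

    Hypothetical check: a real comparison would likely also look at
    physical cores and NUMA nodes, as the commit message describes.
    """
    return len(parse_cpuset(actual_cpuset)) == expected_count
```

A fuller check would compare physical cores and NUMA nodes too; this only covers the count of processing units.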
vsoch pushed to vsoch/flux-sched
test: exclusivity check
Problem: we need to be able to check that a change to fluxion does not change cpu mapping decisions. This is a WIP set of changes that will discover the local resources with hwloc, and then submit simple jobs and compare the discovered mapping with expected, where expected is a consistent pattern we can derive. The identifiers of the cpusets should hypothetically not matter as long as the patterns are consistent. Finally, I tested using faux or emulated resources, and everything works up until when we need to get the cpusets via the pid. I was not able to get a pid that makes sense. That said, it will be worth one more try before giving up!
Signed-off-by: vsoch vsoch@users.noreply.github.com
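One way to compare mappings by pattern rather than by identifier, as the commit message above suggests, is to relabel CPU ids in order of first appearance. This is a minimal sketch under that assumption; `normalize` and `same_pattern` are hypothetical names, and the actual test in flux-sched may derive its expected pattern differently.

```python
def normalize(mapping):
    """Relabel CPU ids by order of first appearance so only the pattern remains.

    `mapping` is a list of per-task CPU id lists, e.g. [[6], [7]].
    """
    labels = {}
    pattern = []
    for cpus in mapping:
        relabeled = []
        for cpu in cpus:
            if cpu not in labels:
                labels[cpu] = len(labels)
            relabeled.append(labels[cpu])
        pattern.append(tuple(relabeled))
    return pattern


def same_pattern(discovered, expected):
    """True when two mappings share a structure, ignoring the concrete ids."""
    return normalize(discovered) == normalize(expected)
```

With this, two tasks bound to cores 6 and 7 match an expectation written as cores 0 and 1, while two tasks sharing the same cores do not match two tasks on disjoint cores.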
vsoch opened a pull request to milroy/flux-sched
vsoch commented on issue milroy/flux-sched#10.
@milroy this is ready for you - but very importantly, you must maximize fun over the weekend and not look at this unless you are absolutely desperately bored!! :nerd_face: …
vsoch commented on issue pydicom/deid#289.
Great news! We just need a bump to the version (maybe a larger one this time) and the corresponding entry to the changelog….
vsoch pushed to converged-computing/flux-pewpew
init: hello world
This was pretty fun. I do not know why the devcontainer seems to default to a high match policy. It seems to select from the highest set of cores, but then hands them out to the ranks from lowest to highest. I suspect that fluxion returns the high set and then flux still hands them out lowest first. So when asking for 2 cores in 0-7, we first select 6-7, but then assign 0:6 and 1:7.
Signed-off-by: vsoch vsoch@users.noreply.github.com
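The behavior suspected in the commit message above can be mimicked with a toy model: pick the highest-numbered cores, then hand them to ranks in ascending order. This only illustrates the observation and does not show how fluxion or flux implement it; `select_high` and `assign_to_ranks` are hypothetical names.

```python
def select_high(available, count):
    """Pick the `count` highest-numbered cores from the available set."""
    if count <= 0:
        return []
    return sorted(available)[-count:]


def assign_to_ranks(selected):
    """Hand out the selected cores to ranks in ascending order (lowest first)."""
    return {rank: core for rank, core in enumerate(sorted(selected))}


# Asking for 2 cores out of 0-7, as in the commit message.
selected = select_high(range(8), 2)
assignment = assign_to_ranks(selected)
```

Here `selected` holds cores 6 and 7, and `assignment` maps rank 0 to core 6 and rank 1 to core 7, matching the `0:6` and `1:7` observation.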
vsoch pushed to compspec/fractale-agent-experiments
Merge pull request #4 from compspec/fractale-pipeline
Fractale pipeline
vsoch pushed to compspec/fractale
add select
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #167 from flux-framework/release-docs-2025-10-01
Update from release-docs-2025-10-01
vsoch pushed to converged-computing/performance-study
paper: changes to kripke/mt-gemm plots
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/cxi-k8s-device-plugin
feat: restructure into manager with config
I reorganized the code so there is a Manager for devices, and the manager owns a configuration for naming and paths that is exposed to the outside world. We can further optimize but this is a good start.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
hpc7g
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
add select
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #386 from singularityhub/update/containers-2025-09-29
[bot] update/containers-2025-09-29
vsoch reviewed a kubernetes-sigs/node-feature-discovery pull request
LGTM. I don’t see where it shows up in the preview (I searched for “export” and no result), so if someone wants to throw up a link it would be helpful….
vsoch pushed to compspec/fractale-agent-experiments
8 nodes
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
scale: use m7g
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
add select
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/software
Merge pull request #442 from rseng/update/software-2025-09-28
Update from update/software-2025-09-28
vsoch commented on issue pydicom/deid#289.
That would work for me @ReeceStevens - if you want to test in production and report back, if it goes smoothly we can merge here. Does that work?…
vsoch reviewed a oras-project/oras-py pull request
Very thorough, and is reasonable and overall LGTM. Could you please update the CHANGELOG and bump the version?…
vsoch pushed to compspec/fractale-agent-experiments
scaling study: test lammps
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
add select
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue kubeflow/trainer#2841.
What Flux is going to help with:…
vsoch pushed to converged-computing/aws-performance-study
models: add script to do holdout of instance types
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #385 from singularityhub/update/containers-2025-09-25
[bot] update/containers-2025-09-25
vsoch pushed to pydicom/deid
Remove pixel check since cleaning changes pixels
vsoch commented on issue kubeflow/trainer#2841.
> What do you mean by MPI-enabled application? mpirun just needs to be installed in the container image ?…
vsoch pushed to compspec/fractale
feat: cost estimation agent
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue singularityhub/singularity-docker#17.
See my previous comment. A missing symbol means a missing library (or just a base that needs updating). I’m happy to review a PR….
vsoch commented on issue flux-framework/flux-coral2#411.
@mcfadden8 depending on when the demo is, I’m not sure we have a reasonable amount of time to consider filesystems for the demo - a run of LAMMPS that is triggered by the submission of a job is what we had scoped it to. Let us know more details of what you had in mind so we can discuss….
vsoch pushed to compspec/fractale-agent-experiments
give llm starting point
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
lammps with cpu affinity
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
kripke test on one node
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to compspec/fractale
vsoch opened a pull request to converged-computing/flux-apps-helm
vsoch pushed to compspec/flux-delegate-py
feat: add (expose) flux detect
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/compspec-containment
need fluxion
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue kubernetes-sigs/wg-ai-conformance#13.
@andreyvelich done! https://github.com/kubeflow/trainer/issues/2841. Really looking forward to working on this!
vsoch opened issue kubeflow/trainer#2841.
feat: Flux Framework as a plugin for MPI bootstrap and workload orchestration
### What you would like to be added?…
vsoch commented on issue flux-framework/flux-coral2#411.
For our notes, here is the command that worked (for an interactive run) on hetchy. The reason we needed to ask for all 12 nodes was to get around fluxion scheduling and compute node to rabbit assignment. …
vsoch pushed to converged-computing/sc25-flux-eks
add lammps state machine
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
instruct lammps to use base image
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue kubernetes-sigs/wg-ai-conformance#13.
Ping @milroy let’s discuss use cases, and if the idea is sound let’s follow up with an implementation. We have two opportunities here:…
vsoch pushed to compspec/fractale-agent-experiments
ensure we select only efa types
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #382 from singularityhub/update/containers-2025-09-18
[bot] update/containers-2025-09-18
vsoch commented on issue kubernetes-sigs/wg-ai-conformance#13.
@andreyvelich what would be useful to do, I think, is to make a backend for Kubeflow Trainer that is Flux oriented. It wouldn’t be the Flux Operator, but rather a separate entity that also uses Flux….
vsoch pushed to converged-computing/performance-study
paper: update figures for final submission
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
ensure provide amg efa instruction
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue kubernetes-sigs/wg-ai-conformance#13.
A Job is the underlying unit of JobSet. Using Job includes JobSet, however using JobSet excludes a lot of the ecosystem….
vsoch pushed to compspec/fractale-agent-experiments
results: add parsed data for foms
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to vsoch/flux-coral2
feat: support for flux operator miniclusters
This feature includes the addition to create a Flux Operator MiniCluster running across some subset of rabbit nodes given that the rabbit.mpi directive is defined. By default, setting that to true (or anything) can use a default container base and interactive mode, and everything from that can be customized. In addition, we have a “flux hop” command that is able to take the same metadata, populate the RabbitMPI Job object, and create the Flux MiniCluster using the same classes/logic but without requiring the HPE stuff and Workflow. This could be used, but likely will be for testing or for fun.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #381 from singularityhub/update/containers-2025-09-16
[bot] update/containers-2025-09-16
vsoch pushed to grondo/flux-python
Expand info_commands in setup.py
Expanding our list of info commands to include those needed for the ticket. The only command provided that won’t work is a blank python setup.py. If they use a source distribution that hasn’t had the version changed from develop, there will also be a warning. The rest of the info output looks good.
vsoch pushed to converged-computing/flux-usernetes
nit: remove pycache
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue bissettp/bissettp.github.io#1.
Yeah! Thriving! Work is really great at the lab, and this weekend I was Batman. I mean, it doesn’t really get better than that.
vsoch pushed to vsoch/flux-coral2
feat: support for flux operator miniclusters
This feature includes the addition to create a Flux Operator MiniCluster running across some subset of rabbit nodes given that the rabbit.mpi directive is defined. By default, setting that to true (or anything) can use a default container base and interactive mode, and everything from that can be customized. In addition, we have a “flux hop” command that is able to take the same metadata, populate the RabbitMPI Job object, and create the Flux MiniCluster using the same classes/logic but without requiring the HPE stuff and Workflow. This could be used, but likely will be for testing or for fun.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to flux-framework/flux-coral2
vsoch pushed to compspec/compspec-modules
Merge pull request #3 from compspec/changes-for-fractale
feat: support for detect
vsoch pushed to compspec/compspec-containment
Merge pull request #1 from compspec/add-detect
feat: support for detect
vsoch commented on issue opencontainers/specs.opencontainers.org#12.
Gotcha - thank you for that example. I’m looking at the preview: …
vsoch pushed to flux-framework/flux-python
deprecate: install-mamba.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
results: adding laghos runs and updated ui
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
laghos: add decision function
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
Merge pull request #16 from compspec/add-flux-generator
Add flux generator
vsoch commented on issue flux-framework/flux-python#18.
@trws @grondo @garlick looking for your advice! In testing building for Mac, I found that flux core and flux security need something called pipe2. …
vsoch pushed to compspec/fractale-agent-experiments
add lammps decision function
This was providing a guidance function to lammps for a pinned problem size. We need to next try and not limit to a problem size.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue opencontainers/specs.opencontainers.org#11.
It is done dynamically (see here) so we don’t have to re-render anything. The content on the site always reflects the latest README files and associated spec assets….
vsoch pushed to compspec/fractale-agent-experiments
amg: add note about minicluster crd
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
nit: clean up results parser logic
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #379 from singularityhub/update/containers-2025-09-11
[bot] update/containers-2025-09-11
vsoch commented on issue pydicom/deid#287.
I’d be happy to review a PR that makes these changes….
vsoch pushed to compspec/fractale
add optimize function
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch created a new branch, test-toss-4 at researchapps/datacrumbs
vsoch reviewed a oras-project/oras-py pull request
Good catch - thank you! Before we merge - does a non 202 response also correctly format to json? …
vsoch pushed to compspec/fractale-agent-experiments
add lammps test
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
add optimize function
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale-agent-experiments
kripke: first run with no params
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #378 from singularityhub/update/containers-2025-09-08
[bot] update/containers-2025-09-08
vsoch commented on issue oras-project/oras-py#213.
All set! …
vsoch pushed to compspec/fractale-agent-experiments
lammps: more specific to change variables
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
feat: common prompt class
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/fractale
feat: common prompt class
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/flux-delegate-py
feat: add skeleton of flux remote submit
This adds the basic structure for flux remote submit, which basically wraps the flux submit command (and is designed to support other subcommands in the future). This can be moved into the flux libexec cmd directory or just used as is. I next need to parse the submit request and then map attributes, etc. to the user defined compatibility data (cluster and resource needs) and I think then we are good.
Signed-off-by: vsoch vsoch@users.noreply.github.com
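The wrapper structure described in the commit message above might look roughly like the sketch below: a small dispatcher that accepts a subcommand, currently only `submit`, and builds the underlying `flux submit` invocation without running it. The names `SUPPORTED_SUBCOMMANDS` and `build_command` are hypothetical, and the real flux-delegate-py code may be organized quite differently.

```python
# Subcommands the wrapper knows how to forward (hypothetical; only "submit" for now).
SUPPORTED_SUBCOMMANDS = {"submit"}


def build_command(argv):
    """Translate wrapper arguments into the underlying flux command line.

    `argv` is the list of arguments after the wrapper name, for example
    ["submit", "-N", "2", "hostname"]. The command is only built here,
    not executed, so compatibility checks could be added before running it.
    """
    if not argv or argv[0] not in SUPPORTED_SUBCOMMANDS:
        raise ValueError("unsupported subcommand: %r" % (argv[:1],))
    return ["flux", argv[0], *argv[1:]]
```

Keeping the build step separate from execution leaves room for the next step mentioned in the commit message, parsing the submit request and mapping attributes to user-defined compatibility data, before anything is handed to `flux`.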