vsoch pushed to converged-computing/lammps-time
Add kind experiment (#4)
- add wip kind experiment
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compat-lib
tweak proot to use pwd and kill on exit
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
visit: add v3.4.0, v3.4.1 (#47161)
- Visit: Add new versions 3.4.0 and 3.4.1
- Adios2: Restrict python, 3.11 doesn’t work for older Adios2
- VisIt: Set the VTK_VERSION for @3.4:
Older versions of VTK used the VTK_{MAJOR, MINOR}_VERSION variables for VTK detection. VisIt >= 3.4 uses the full string VTK_VERSION.
- CI: Don’t build llvm-amdgpu for non-HIP stack
- VisIt: v3.4.1 handles newer Adios2 correctly
- Visit: Add missing links in HDF5, set correct VTK version configuration parameter
- VisIt: Add py-pip requirement and patch visit with configuration changes
- HDF5 symlinks move when inside of callback
- VisIt ninja install fails with python module. Using make does not
- VisIt 3.4 has a high minimum cmake requirement
- HDF5: Early return when not mpi for mpi symlinks
- HDF5: Use platform agnostic method for creating legacy compatible MPI symlinks
- Fix VISIT_VTK_VERSION handling for 8.2.1a hack</small>
vsoch pushed to flux-framework/spack
Merge pull request #260 from flux-framework/update-package/flux-core-2024-11-20
Update from update-package/flux-core-2024-11-20</small>
vsoch pushed to converged-computing/jobspec-database
docs: add documentation in readme for reading databases
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/lammps-time
experiment: add kind cluster running results
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compat-lib
Merge pull request #7 from compspec/add-close
feat: add close and perfetto support</small>
vsoch pushed to vsoch/vsoch.github.io
Update 2024-11-17-across-boundaries.md
vsoch pushed to singularityhub/shpc-registry
Merge pull request #280 from singularityhub/update/containers-2024-11-18
[bot] update/containers-2024-11-18</small>
vsoch pushed to compspec/compat-lib
Merge pull request #3 from compspec/add-python-module
feat: add supporting python module</small>
vsoch pushed to rseng/software
Merge pull request #399 from rseng/update/software-2024-11-17
Update from update/software-2024-11-17</small>
vsoch pushed to compspec/compat-lib
add simple release workflow
This will release an x86 binary, which should be suitable for basic testing.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
py-wandb: add v0.16.6 (#43891)
- py-wandb: add version v0.16.6
- fix: typo
- py-wandb: py-click when @0.15.5:, py-pathtools when @:0.15
Co-authored-by: Wouter Deconinck wdconinc@gmail.com</small>
vsoch pushed to singularityhub/shpc-registry
remove tag 1.1--py34 from biocontainers/reago
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to converged-computing/lammps-time
analysis: typo
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/oras-py-feedstock
oras-py v0.2.25 (#30)
- updated v0.2.25
- MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.44.3, and conda-forge-pinning 2024.11.14.06.00.25</small>
vsoch pushed to rseng/devstories-episodes-2
episode 102: dan reed “hpc dan”
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories
Add thank you to our HPC Dan!
vsoch pushed to rseng/devstories
episode 102: dan reed “hpc dan”
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/lammps-time
model: add markov model for predicting next path
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
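The commit does not say how the model is built, so the following is only a minimal sketch of one way a Markov model can predict the next path: a first-order chain that counts which path followed which in a recorded sequence. The function names and the example paths are hypothetical, not taken from the repository.

```python
from collections import Counter, defaultdict


def fit_transitions(paths):
    """Count how often each path is directly followed by each other path."""
    transitions = defaultdict(Counter)
    for current, following in zip(paths, paths[1:]):
        transitions[current][following] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequently observed successor of current, or None."""
    # Membership test first, so an unseen path does not create an empty entry.
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]
```

Given a recorded access sequence, `predict_next(fit_transitions(sequence), path)` returns the path most often seen right after `path`, and `None` for a path that never had a successor.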
vsoch pushed to singularityhub/shpc-registry
Merge pull request #277 from singularityhub/update/containers-2024-11-11
[bot] update/containers-2024-11-11</small>
vsoch pushed to oras-project/oras-py
Retry on 500 (#168)
- workaround: retry manifest upload on quay
- decorator: get rid of inheritance
- decorator: retry on 500
Signed-off-by: Isabella do Amaral idoamara@redhat.com</small>
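As a rough illustration of the retry-on-500 idea, here is a sketch of a plain function decorator (no class inheritance). The names `retry_on_500` and `FakeResponse`, and the `attempts` and `delay` parameters, are assumptions for illustration and are not the oras-py API.

```python
import functools
import time


def retry_on_500(attempts=3, delay=0.0):
    """Retry a callable while its response reports HTTP 500 (hypothetical helper)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            response = None
            for attempt in range(attempts):
                response = func(*args, **kwargs)
                # Only a 500 triggers another attempt; anything else is returned as-is.
                if response.status_code != 500:
                    return response
                if attempt < attempts - 1:
                    time.sleep(delay)
            # Out of attempts: hand back the last response so the caller can decide.
            return response

        return wrapper

    return decorator


class FakeResponse:
    """Stand-in for an HTTP response object; only status_code is used."""

    def __init__(self, status_code):
        self.status_code = status_code
```

Wrapping an upload function with `retry_on_500(attempts=3)` would re-issue the call up to three times when the registry answers with a 500, and return any other response immediately.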
vsoch pushed to nicholas-sly/spack
Update var/spack/repos/builtin/packages/flux-sched/package.py
Co-authored-by: Greg Becker becker33@llnl.gov</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #278 from singularityhub/update/containers-2024-11-12
[bot] update/containers-2024-11-12</small>
vsoch pushed to converged-computing/container-chonks
container times: look into specific events (#3)
- container times: look into specific events
- container pulling times: put run1 in the readme
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
update work
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
lua: always generate pcfile without patch and remove +pcfile variant (#47353)
- lua: add +pcfile support for @5.4: versions, without using a version-dependent patch
- lua: always generate pcfile, remove +pcfile variant from all packages
- lua: minor fixes
- rpm: minor fix</small>
vsoch pushed to converged-computing/lammps-time
add pattern ideas to fuse analysis
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
Merge pull request #2 from converged-computing/add-aws-pulling-study
Add aws pulling study</small>
vsoch pushed to converged-computing/container-chonks
google: add back re-run
I did these re-runs because the Kubernetes event export settings were dropping some events, and I do not think that is appropriate data to use for a publication. I am still going to run one more final study on GKE that only tests the regular pulls using their registry.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #398 from rseng/update/software-2024-11-10
Update from update/software-2024-11-10</small>
vsoch pushed to converged-computing/lammps-time
add lammps output parsing
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/supermarket-fish-problem
add current/max speeds for gpu
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #71 from converged-computing/scaling-governor
analysis: look at scaling governor</small>
vsoch pushed to converged-computing/lammps-time
Merge pull request #2 from converged-computing/add-fuse-install
add copyright, notice, license</small>
vsoch pushed to converged-computing/flux-distribute
add topology testing
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compat-lib
feat: basic recorder functionality (#1)
- feat: basic recorder functionality
We are going to want to build a bunch of hpc apps and then record what they are doing, meaning paths touched and when! This is a start.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/pypi-classifiers
try ubuntu 24.04
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compat-lib
output updates
The output log now has unix nanoseconds, and also the program exits and cleans up after the command finishes running.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #276 from singularityhub/update/containers-2024-11-07
[bot] update/containers-2024-11-07</small>
vsoch pushed to rseng/gpu-search
remove partial data
I was originally saving data organized by date, but I do not anticipate doing this again, so I am removing that layout in favor of the top level.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Merge pull request #257 from flux-framework/release/flux-sched-v0.40.0
Update from release/flux-sched-v0.40.0</small>
vsoch pushed to flux-framework/spack
Merge pull request #249 from flux-framework/release/flux-security-v0.12.0
Update from release/flux-security-v0.12.0</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #130 from flux-framework/release-docs-2024-11-05
Update from release-docs-2024-11-05</small>
vsoch pushed to converged-computing/supermarket-fish-problem
plots: restore line width
We cannot see the distribution of values unless the line width is non-zero, so I cannot remove it.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #275 from singularityhub/update/containers-2024-11-04
[bot] update/containers-2024-11-04</small>
vsoch pushed to converged-computing/performance-study
analysis: stream has incorrect title (Minife)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #397 from rseng/update/software-2024-11-03
Update from update/software-2024-11-03</small>
vsoch pushed to converged-computing/slurm-operator
container-bases: update to rockylinux9 (#7)
- container-bases: update to rockylinux9
- powertools -> enable crb
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
Add google to readme
vsoch pushed to researchapps/flux-sched
debug: adding verbosity to grow function
We need to figure out why the function is returning -1. I am adding additional error parsing to check.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/fluxion-go
feat: shrink support for fluxion
This changeset exposes the remove_subgraph function, which we can call a shrink. I do not think it accounts for handling jobs properly, but it should be a reasonable start for testing or debugging.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-jobset
app: add stream example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-jobset
app: add kripke example on one node
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #274 from singularityhub/update/containers-2024-10-31
[bot] update/containers-2024-10-31</small>
vsoch pushed to flux-framework/spack
add the USE_F90_ALLOCATABLE option to Spack (#47190)
Signed-off-by: Jeff Hammond jehammond@nvidia.com</small>
vsoch pushed to converged-computing/fluxgen
bug: move miniconda to system bin (#2)
- bug: move miniconda to system bin
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
analysis: topology for aws (#67)
- analysis: topology for aws
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/flux-core-feedstock
feat: add support for systemd
This should add libsystemd0 as a dependency on the host so that we can hopefully have support for it - I’m guessing that flux will detect it on build.</small>
vsoch pushed to converged-computing/flux-distribute
docs: update title to flux-distribute
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/caliper
ci: add pre commit for linting (#41)
- ci: add pre commit for linting
- spelling
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #396 from rseng/update/software-2024-10-27
Update from update/software-2024-10-27</small>
vsoch pushed to vsoch/vsoch.github.io
post: add little monster
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #270 from singularityhub/update/containers-2024-10-24
[bot] update/containers-2024-10-24</small>
vsoch pushed to rseng/devstories-episodes-2
episode 10101: michela taufer
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories
episode 101 michela taufer
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
draco: add v7.19.0 (#47032)
Co-authored-by: Cleveland cleveland@lanl.gov Co-authored-by: Kelly (KT) Thompson KineticTheory@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-python
docs: update todo in readme
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-python
update server to receive grow/shrink request
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-operator
docs: design
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #66 from converged-computing/add-lammps-fom
analysis: lammps matom steps per second</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #269 from singularityhub/update/containers-2024-10-21
[bot] update/containers-2024-10-21</small>
vsoch pushed to oras-project/oras-py
core: align config_path type annotation (#166)
- core: align config_path type annotation
the oras-py CI setup uses basic auth for auth tests
Signed-off-by: tarilabs matteo.mortari@gmail.com</small>
vsoch pushed to converged-computing/ensemble-python
feat: grow/shrink requests are being hit
I need to put this into the ensemble operator next to have the request actually do something, like request the minicluster to scale up or down. I will also need to have a way to communicate the member name and namespace. This could either be done via discovery (requiring the kubernetes API within the ensemble python and the rbac to use it), or more simply done, just put the member name that is expected in the same namespace. More ideally there can be a registration step at the onset that generates a random name and sends it over to the grpc service to associate.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-python
feat: support for inequality in rule->when
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compat-lib
comment: fix comment about not working (it is)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
--break-system-packages
vsoch pushed to converged-computing/ensemble-python
example: heartbeat example
This updates the heartbeat so it is entirely derived from the config. This can happen explicitly if the user sets logging->heartbeat to a non-zero value, but it will also happen if a grow or shrink action is used. If the user defines a grow/shrink and sets the heartbeat to 0, it will still be set to the default, 60, because grow/shrink will not work as expected without it.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
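The heartbeat rule described above can be sketched as a small function. This is only an illustration: the config layout (a `logging` mapping with `heartbeat`, and a `rules` list whose entries carry an `action` with a `name`) and the function name are assumptions, not the project's actual schema.

```python
DEFAULT_HEARTBEAT = 60  # default value named in the commit message


def resolve_heartbeat(config):
    """Return the heartbeat value to use, or 0 when it stays disabled.

    The config shape used here is an assumed layout for illustration only.
    """
    heartbeat = config.get("logging", {}).get("heartbeat", 0)
    actions = {
        rule.get("action", {}).get("name") for rule in config.get("rules", [])
    }
    needs_heartbeat = bool(actions & {"grow", "shrink"})
    # Grow/shrink will not work without a heartbeat, so a 0 is overridden.
    if needs_heartbeat and heartbeat == 0:
        return DEFAULT_HEARTBEAT
    return heartbeat
```

An explicit non-zero value always wins; the default only applies when a grow or shrink action is present and the heartbeat was left at 0.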
vsoch pushed to singularityhub/shpc-guts
remove break system packages
vsoch pushed to singularityhub/guts
--break-system-packages
vsoch pushed to flux-framework/spack
openldap: add v2.6.8; conflict gcc@14: for older (#47024)
vsoch pushed to converged-computing/ensemble-python
ci: add docker build for service (#3)
- ci: add docker build for service
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-python
Merge pull request #1 from converged-computing/add-support-logging-backoff
feat: support for repetitions and logging</small>
vsoch pushed to converged-computing/cloud-select
try insecure to registry init
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
--break-system-packages
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to singularityhub/shpc-guts
--break-system-packages
vsoch pushed to rseng/software
--break-system-packages
vsoch pushed to flux-framework/spack
py-cython: add v3.0.11 (#46772)
- py-cython: add v3.0.11
Add url for cython because they are using lower case for 3.0.11
Co-authored-by: Tamara Dahlgren <35777542+tldahlgren@users.noreply.github.com>
- Don’t use f-string
- Remove old version directive for 3.0.11
Co-authored-by: jmcarcell jmcarcell@users.noreply.github.com Co-authored-by: Tamara Dahlgren <35777542+tldahlgren@users.noreply.github.com></small>
vsoch pushed to converged-computing/performance-study
Update azure lammps (#64)
- azure was missing lammps size 128 and 256
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #267 from singularityhub/update/containers-2024-10-14
[bot] update/containers-2024-10-14</small>
vsoch pushed to compspec/compat-lib
compatibility: add use case for library server
We want to be able to check software compatibility, amongst other things. In Kubernetes, Node Feature Discovery (NFD) has a design where a daemon runs on the nodes, parses what they provide, and then reports that to a central service. For our case, we can do something similar: a service (that can run on the node and work with either a local client or a scheduler) that knows how to read a compatibility artifact either directly (JSON payload) or from a registry. The artifact describes an application or container, and based on the libraries needed, the service can quickly determine if the node can satisfy the needs. I am calling this a compatibility server and check because it extends to other things beyond libraries / software.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
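A minimal sketch of the library check described above, assuming the compatibility artifact is a JSON payload with a `libraries` list. That key, the function name, and the result shape are hypothetical; the real artifact format may differ.

```python
import json


def check_compatibility(artifact_json, node_libraries):
    """Report whether a node provides every library the artifact asks for."""
    artifact = json.loads(artifact_json)
    # "libraries" is an assumed key, used here only for illustration.
    needed = set(artifact.get("libraries", []))
    missing = sorted(needed - set(node_libraries))
    return {"compatible": not missing, "missing": missing}
```

The set difference keeps the check cheap: a node is compatible exactly when nothing the application needs is missing from what the node provides.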
vsoch pushed to singularityhub/singularity-hpc
autamus no longer maintained
vsoch pushed to singularityhub/shpc-registry
Add library to gh-pages
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to sciworks/spack-updater
try installing setuptools
vsoch pushed to converged-computing/flux-usernetes
plots: add plot with just bare metal and usernetes
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
analysis: add basic container pulling cost estimate
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
pulling: add cost estimates
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #266 from singularityhub/update/containers-2024-10-10
[bot] update/containers-2024-10-10</small>
vsoch pushed to flux-framework/flux-k8s
Merge pull request #85 from flux-framework/control-builds
Bug: JGF Name was removed, and build with distroless destroyed logging</small>
vsoch pushed to converged-computing/ensemble-python
feat: add support for queue metrics and actions!
With this addition, we have our first mini ensemble that is able to submit jobs at start, wait until a count is reached, and then (based on that count metric) submit another group of jobs!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
python: rework how we compute the “command” property (#46850)
Some Windows Python installations may store the Python exe in Scripts/ rather than the base directory. Update .command to search in both locations on Windows. On all systems, the search is now done recursively from the search root: on Windows, that is the base install directory, and on other systems it is bin/.</small>
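The search described in this change can be illustrated with a short sketch. This is not Spack's implementation: the function name, the `is_windows` switch, and the candidate executable names are assumptions made for the example.

```python
import os
import sys


def find_python(prefix, is_windows=(sys.platform == "win32")):
    """Return the first Python executable found under the search root, or None."""
    # On Windows the search root is the install prefix; elsewhere it is bin/.
    root = prefix if is_windows else os.path.join(prefix, "bin")
    names = {"python.exe"} if is_windows else {"python", "python3"}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if name in names:
                return os.path.join(dirpath, name)
    return None
```

Because the walk is recursive, an executable stored in Scripts/ on Windows is found without special-casing that directory.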
vsoch pushed to converged-computing/ensemble-python
river streaming ml metrics
Add queue metrics to keep track of job groups. This includes MAD, IQR, mean, min and max so far.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
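For reference, the listed metrics can be computed on a batch of values with the standard library. This is a non-streaming stand-in, not the river-based code from the commit, and it assumes MAD means median absolute deviation; the function name and the idea of summarizing job durations are illustrative.

```python
import statistics


def summarize(durations):
    """Batch summary of job durations (a non-streaming stand-in)."""
    median = statistics.median(durations)
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(durations, n=4)
    return {
        "mean": statistics.fmean(durations),
        "min": min(durations),
        "max": max(durations),
        "iqr": q3 - q1,
        "mad": statistics.median(abs(x - median) for x in durations),
    }
```

A streaming version would update each statistic as a job finishes instead of holding all values in memory, which is the point of using an online library for queue metrics.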
vsoch pushed to singularityhub/shpc-registry
Merge pull request #265 from singularityhub/update/containers-2024-10-07
[bot] update/containers-2024-10-07</small>
vsoch pushed to researchapps/eksctl
efa-installer: remove archive in 2023 files
Problem: the node consistently runs out of disk space when adding efa, resulting in an unusable cluster with scattered nodes where the installer failed. Solution: the installer archive itself is huge, and we can simply remove it and avoid this error.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
php: add v7.4.33, v8.3.12 (fix CVEs) (#46829)
- php: add v7.4.33, v8.3.12
- php: mv sbang.patch sbang-7.patch
- php: add sbang-8.patch
- [@spackbot] updating style on behalf of wdconinc
- Replace --with-libiconv= (not recognized) with --with-iconv=
Co-authored-by: wdconinc wdconinc@users.noreply.github.com Co-authored-by: Bernhard Kaindl contact@bernhard.kaindl.dev</small>
vsoch pushed to vsoch/vsoch.github.io
spelling: programmatically
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #395 from rseng/update/software-2024-10-06
Update from update/software-2024-10-06</small>
vsoch pushed to flux-framework/spack
py-rpds-py: add v0.18.1 (#46786)
vsoch pushed to converged-computing/container-chonks
experiment pulling: add streaming results
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #264 from singularityhub/update/containers-2024-10-03
[bot] update/containers-2024-10-03</small>
vsoch pushed to regro-cf-autotick-bot/deid-feedstock
Update meta.yaml
vsoch pushed to flux-framework/spack
py-rucio-clients: new package (and dependencies) (#46585)
Co-authored-by: Bernhard Kaindl contact@bernhard.kaindl.dev</small>
vsoch pushed to converged-computing/ensemble-containers
doi: add zenodo doi
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
experiment, pulling: add run 4 with streaming images
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #128 from flux-framework/release-docs-2024-10-02
Update from release-docs-2024-10-02</small>
vsoch pushed to converged-computing/container-chonks
update readme for run2?
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to oras-project/oras-py
Merge pull request #160 from oras-project/release-0.2.21
release: 0.2.21</small>
vsoch pushed to converged-computing/container-crafter
docs: add doi to README
vsoch pushed to converged-computing/container-chonks
experiment: pulling with more sizes
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #263 from singularityhub/update/containers-2024-09-30
[bot] update/containers-2024-09-30</small>
vsoch pushed to converged-computing/jobspec-database
mistral: add sample data
This is not checked by a human, but it was processed by Gemini, which is pretty good. It should be an ok start to testing out training (fine tuning) with Mistral.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-crafter
generator: mostly complete
This build tool can now generate an entire set of images, where each layer is unique based on the size and random filename, so a pulling study can be done and no cache can be used.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
layer digests: only count each uri once
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
digest similarity: add image size calculations
Also adding the 100th percentile since we want to calculate the larger range - ML images are big :)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #394 from rseng/update/software-2024-09-29
Update from update/software-2024-09-29</small>
vsoch pushed to flux-framework/spack
cc: ensure that RPATHs passed to linker are unique
macOS Sequoia’s linker will complain if RPATHs on the CLI are specified more than once. To avoid errors due to this, make cc only append unique RPATHs to the final args list.
This required a few improvements to the logic in cc:
- List functions in cc didn’t have any way to append unique elements to a list. Add a contains() shell function that works like our other list functions. Use it to implement an optional "unique" argument to append() and an extend_unique(). Use that to add RPATHs to the args_list.
- In the pure ld case, we weren’t actually parsing RPATH arguments separately as we do for ccld. Fix this by adding another nested case statement for raw RPATH parsing. There are now 3 places where we deal with -rpath and friends, but I don’t see a great way to unify them, as “-Wl,”, “-Xlinker”, and raw “-rpath” arguments are all ever so slightly different.
- Fix ordering of assertions to make pytest diffs more intelligible. The meaning of + and - in diffs changed in pytest 6.0 and the “preferred” order for assertions became assert actual == expected instead of the other way around.
Signed-off-by: Todd Gamblin tgamblin@llnl.gov</small>
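The actual change lives in a shell compiler wrapper; the following Python sketch only illustrates the underlying idea of appending RPATHs to an argument list at most once while keeping first-seen order. The function names and the example paths are hypothetical.

```python
def append_unique(items, value):
    """Append value only if it is not already present; keep first-seen order."""
    if value not in items:
        items.append(value)
    return items


def unique_rpath_args(rpaths):
    """Build linker arguments with each RPATH listed exactly once."""
    seen = []
    for path in rpaths:
        append_unique(seen, path)
    # -Wl,-rpath,<dir> is the compiler-driver form of passing -rpath to the linker.
    return [f"-Wl,-rpath,{path}" for path in seen]
```

A linear membership test is fine here because the number of RPATHs on a single link line is small; order is preserved because the linker searches RPATHs in the order given.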
vsoch pushed to converged-computing/container-chonks
dockerfile experiment - add summary metrics for digest similarity
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #262 from singularityhub/update/containers-2024-09-26
[bot] update/containers-2024-09-26</small>
vsoch pushed to hpc-social/hpc-social.github.io
Merge pull request #80 from hpc-social/contributors/update-2024-09-25
[tributors] contributors/update-2024-09-25</small>
vsoch pushed to converged-computing/performance-study
container similarity: remove title to improve clarity of clustermaps
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-containers
containers: add entire google set
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to oras-project/oras-py
core: improve anon/auth token logic (#148)
- core: TokenAuth request_token fix missing auth
The method is intended to request an authenticated token, per pydocs, but was passing headers that were always missing Authorization.
- core: use token in auth in subsequent requests
If a token was saved in auth, it shall be used in subsequent requests.
This avoids a situation where a blob upload is first attempted anonymously and then retried with a token, and the following manifest upload is again attempted anonymously even though a token was present from the previous flow.
- core: if 401 on 2nd attempt, avoid anon tokens
in the first flow using auth backend for token:
- try do_request with no auths at all
- the attempt to gain an anon token succeeds, but then the request fails with 401
- at this point, in the third attempt, give the flow a chance to request a token but avoid any anon tokens.
Please note: this effectively happens only on the first run of the flow. Subsequent do_request flow invocations should now just succeed on the 1st request by re-using the token (the simplified behaviour introduced with this proposal).
- guard as headers is Optional
- implement review request
- Revert “implement review request”
This reverts commit 102381c5c4ae0fdf45c8a4dd26ae1765eae9b029. This reverts commit 1e891d2bfebe4b6520a1fe6902159198c8799d62. This reverts commit 6e226672c60184cd43b6532f5a910acbf9d064ea.
this was taken care in https://github.com/oras-project/oras-py/pull/153
This reverts commit 10e010b365e56488963ca14b6e9e08b1ea7e4a7a.
- implement review comment about anon/req token
from: https://github.com/oras-project/oras-py/pull/148#discussion_r1677018164
And if the basic auth is there, skip over asking for an anon token
As it stands, if basic auth credentials are present, they are exchanged for the request token.
Signed-off-by: tarilabs matteo.mortari@gmail.com</small>
vsoch pushed to converged-computing/metrics-operator
custom: add support for custom container (#84)
- custom: add support for custom container
We should be able to support custom containers, and configuration of addons to them. I am not liking the design to have addons defined in parallel, and want to refactor so they are part of the metric. I am also wondering if the metrics themselves are more akin to apps. I have not looked at this project in a bit and need to think about it.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
Merge pull request #37 from converged-computing/add-mpitrace-ubuntu
container: mpitrace with ubuntu jammy base</small>
vsoch pushed to converged-computing/ensemble-containers
containers: add ssh for metrics operator
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #261 from singularityhub/update/containers-2024-09-23
[bot] update/containers-2024-09-23</small>
vsoch pushed to converged-computing/ensemble-containers
container: add laghos
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Docs/Windows: Clarify supported shells and caveats (#46381)
While the existing getting started guide does in fact reference the powershell support, it’s a footnote and easily missed. This PR adds explicit, upfront mentions of the powershell support. Additionally this PR adds notes about some of the issues with certain components of the spec syntax when using CMD.</small>
vsoch pushed to converged-computing/ensemble-containers
container: kripke
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #260 from singularityhub/update/containers-2024-09-20
[bot] update/containers-2024-09-20</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #60 from converged-computing/update-mixbench-cpu
analysis: mixbench cpu</small>
vsoch pushed to flux-framework/spack
Merge pull request #241 from flux-framework/update-package/flux-sched-2024-09-18
Update from update-package/flux-sched-2024-09-18</small>
vsoch pushed to compspec/jobspec
readme: add zenodo doi
vsoch pushed to flux-framework/spack
imports: automate missing imports (#46410)
vsoch pushed to converged-computing/performance-study
Merge pull request #51 from converged-computing/catalog-mixbench
- analysis: catalog for mixbench
Every environment was run slightly differently, with very little overlap. I am not sure how we can use this data.
- debug: mixbench
Here I am adding the actual run statements across the mixbench configurations so they can be visually compared.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- analysis: mixbench, parsed data
This changeset includes parsed data for GPU runs. I want to look at these more closely to decide what to further parse and plot. The CSVs are compiled from each experiment environment, and that includes interleaving. I think we might still combine regardless to make a line plot based on the index/iteration.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- gpu analysis: add summary for similar/different
Here we see that there are differences in memory size, notably between Google (only 16GB) and the other clouds. On the other clouds (32GB) the actual values are still a little different. The weirdest finding is that error correction seems to be set to a mixture of yes/no for Azure GPUs.
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #258 from singularityhub/update/containers-2024-09-16
[bot] update/containers-2024-09-16</small>
vsoch pushed to singularityhub/singularity-cli
Merge pull request #226 from singularityhub/contributors/update-2024-09-15
[tributors] contributors/update-2024-09-15</small>
vsoch pushed to rseng/software
Merge pull request #392 from rseng/update/software-2024-09-15
Update from update/software-2024-09-15</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #39 from converged-computing/reorganize-results
Reorganize results</small>
vsoch pushed to converged-computing/supermarket-fish-problem
Merge pull request #1 from converged-computing/add-start-of-summary
Add start of summary</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #257 from singularityhub/update/containers-2024-09-12
[bot] update/containers-2024-09-12</small>
vsoch pushed to flux-framework/flux-k8s
Merge pull request #84 from flux-framework/add-subsystem-field
jgf: update edge metadata to include subsystem</small>
vsoch pushed to converged-computing/supermarket-fish-problem
add fish
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
update hpckm link in resume
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
flux-sched: add conflict for gcc and clang above 0.37.0
vsoch pushed to converged-computing/performance-study
Merge pull request #35 from converged-computing/add-link-supermarket-fish
single-node: add preprocessing</small>
vsoch pushed to converged-computing/performance-study
container-sizes: analysis to look at size of layers
This includes work to parse the events files and determine pull times, along with getting manifests and configs for all unique containers in the study. We filter this down to those that were used as experiment apps, and then look at overall layer sizes (histogram) and similarity based on digests. We also make plots that look at pull times for the entire study and within applications. This can be expanded if needed.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
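A minimal sketch of what "similarity based on digests" could look like, assuming each container is reduced to the list of layer digests from its manifest and similarity is measured as Jaccard overlap. The metric, the function names, and the sample digests are assumptions for illustration; the commit does not say which measure the analysis uses.

```python
from itertools import combinations


def layer_similarity(layers_a, layers_b):
    """Jaccard similarity between two collections of layer digests."""
    a, b = set(layers_a), set(layers_b)
    if not a and not b:
        # Two empty images share everything they have.
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(manifests):
    """Score every unordered pair of containers.

    `manifests` maps a container name to its list of layer digests.
    """
    return {
        (x, y): layer_similarity(manifests[x], manifests[y])
        for x, y in combinations(sorted(manifests), 2)
    }


# Hypothetical digests, shortened for readability.
example = {
    "app-a": ["sha256:base", "sha256:mpi", "sha256:a"],
    "app-b": ["sha256:base", "sha256:mpi", "sha256:b"],
}
scores = pairwise_similarity(example)
```

Containers that share a base image score high because the shared layers dominate the union, which is also why shared layers matter for pull times.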
vsoch pushed to conda-forge/juliart-feedstock
Rebuild for python 3.13 (#12)
- Rebuild for python 3.13
- MNT: Re-rendered with conda-build 24.7.1, conda-smithy 3.39.1, and conda-forge-pinning 2024.09.11.15.30.13
vsoch pushed to converged-computing/performance-study
docs: add badge to readme
vsoch pushed to singularityhub/shpc-registry
Merge pull request #256 from singularityhub/update/containers-2024-09-09
[bot] update/containers-2024-09-09</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #31 from converged-computing/add-mixbench-4-and-8
mixbench: updating results for compute-engine gpu sizes 4,8</small>
vsoch pushed to flux-framework/spack
Sz3 fix (#46263)
- Updated version of sz3. Supersedes #46128
- Add Robertu94 to maintainers for SZ3
Co-authored-by: Robert Underwood runderwood@anl.gov</small>
vsoch pushed to conda-forge/helpme-feedstock
Merge pull request #11 from regro-cf-autotick-bot/rebuild-python313-0-1_h99eafa
Rebuild for python 3.13</small>
vsoch pushed to conda-forge/deid-feedstock
Merge pull request #44 from regro-cf-autotick-bot/0.3.24_h61ad89
deid v0.3.24</small>
vsoch pushed to vsoch/vsoch.github.io
add fun post on ubuntu_containerd and v100 gpus
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to pydicom/deid
Handle Datasets made from BytesIO (#265)
- Handle Datasets made from BytesIO
- fix import order
- Update version.py
- Update CHANGELOG.md</small>
vsoch pushed to flux-framework/spack
flux-sched: add back check for run environment
vsoch pushed to flux-framework/spack
flux-sched: keep check for package external
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #126 from flux-framework/release-docs-2024-09-04
Update from release-docs-2024-09-04</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #26 from milroy/marathe-data
Reorganize marathe1 data to conform with structure</small>
vsoch pushed to flux-framework/flux-k8s
Merge pull request #82 from flux-framework/update-go-version
version: update builds and CI to go 1.22</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #254 from singularityhub/update/containers-2024-09-02
[bot] update/containers-2024-09-02</small>
vsoch pushed to flux-framework/spack
package cln 1.3.7 feelpp/spack#2 (#46162)
- package cln 1.3.7 feelpp/spack#2
- add myself as maintainer
- fix style issue, rm blankline
vsoch pushed to flux-framework/flux-k8s
bug: manifest path for scheduler deployment
There was a change upstream that switched the kube-scheduler back to being in bin (in the Dockerfile) but the corresponding manifest was not updated.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
Merge pull request #15 from converged-computing/add/google-cloud
adding google cloud usernetes setup</small>
vsoch pushed to converged-computing/aks-infiniband-install
update notes in readme
vsoch pushed to flux-framework/spack
Run spack updater on ubuntu latest
vsoch pushed to converged-computing/fluxnetes
allow more time for pod to run
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
Add azure (final tweaks to setup) (#14)
- build: flux on azure with usernetes, notes in readme and final tweaks
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #18 from milroy/cyclecloud-32-64-redo
Cyclecloud size 32 and 64: add missing test results</small>
vsoch pushed to converged-computing/performance-study
compute-engine: reorganize cpu preparing for gpu build
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to milroy/performance-study
remove eks-config.yaml extra build
Problem: there is a copy paste error of the pcluster image build at the bottom of eks-config.yaml Solution: remove it</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #253 from singularityhub/update/containers-2024-08-29
[bot] update/containers-2024-08-29</small>
vsoch pushed to flux-framework/fluxion-go
Merge pull request #11 from flux-framework/fix-ci-errors
build: remove boost_system from dependencies</small>
vsoch pushed to flux-framework/Tutorials
Use tagged version of DLIO Benchmark (#45)
- Use tagged version of DLIO Benchmark
Use 2.0.0 tagged version of DLIO Benchmark</small>
vsoch pushed to flux-framework/Tutorials
HPCIC 2024: Updates and Fixes DYAD Component of Tutorial (#43)
- dyad: fixes content and DYAD data dyad data loader
This commit corrects logic in the PyTorch data loader for DYAD. It also makes various corrections to the text in the DYAD notebook.
- docker: adds workaround regarding Ubuntu Jammy
The flux-sched image for Ubuntu Jammy has a system install of UCX 1.12.0. However, we are wanting to use UCX 1.13.1 with DYAD. This commit updates LD_LIBRARY_PATH to point to UCX 1.13.1 to prevent runtime issues with DYAD.
- dyad: updates the env file for DYAD notebook
In light of the name change of DLIO Profiler to DFTracer, this commit updates the env file created in the DYAD notebook to use the new names for environment variables.
- dyad: fixes bug in DYAD data loader
This commit fixes a bug in the DYAD PyTorch data loader that causes ‘brokers_per_node’ to not be set before reference.
- dyad: update multiprocessing approach for DLIO
This commit tweaks the DLIO config file to use forking for multiprocessing instead of spawning
- dyad: changes cpu-affinity for DLIO
This commit changes cpu-affinity to off when running DLIO for training for consistency
Co-authored-by: Hariharan mani.hariharan@gmail.com</small>
vsoch pushed to converged-computing/performance-study
aks sizes 32 and 64 re-run with placement group, configs and results
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
compute engine: update size64 stream and laghos runs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #251 from singularityhub/update/containers-2024-08-26
[bot] update/containers-2024-08-26</small>
vsoch pushed to converged-computing/performance-study
compute engine size 32 results google
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #8 from converged-computing/update-lammps
re-run of lammps on eks,aks, and gke</small>
vsoch pushed to converged-computing/performance-study
aks size 32 configs and results
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
adding gke cpu size 256 results and configs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
aws eks size 128 for cpu is complete
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #250 from singularityhub/update/containers-2024-08-22
[bot] update/containers-2024-08-22</small>
vsoch pushed to researchapps/flux-core
doc: add dependency example
Problem: there is not a concrete example of using --dependency in our current documentation. Solution: add example to man1/common/job-dependencies.rst
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-core
doc: add dependency example
Problem: there is not a concrete example of using --dependency in our current documentation. Solution: add example to man1/common/job-dependencies.rst
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/eksctl
efa-installer: remove archive in 2023 files
Problem: the node consistently runs out of disk space when adding efa, resulting in an unusable cluster with scattered nodes where the installer failed. Solution: the installer archive itself is huge, and we can simply remove it and avoid this error.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
azure aks gpu docker builds
This adds the Azure GPU docker builds, specifically for AKS. We still need to build amg2023 with spack - it completely just hangs / does nothing when I spack install on my machine.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #249 from singularityhub/update/containers-2024-08-19
[bot] update/containers-2024-08-19</small>
vsoch pushed to flux-framework/spack
py-keras: add v3.5 (#45711)
vsoch pushed to converged-computing/performance-study
google cloud compute engine
This is a fully working setup for using singularity on compute engine.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #388 from rseng/update/software-2024-08-18
Update from update/software-2024-08-18</small>
vsoch pushed to converged-computing/performance-study
remove terraform - we are deploying from the webui
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
aws ec2: first run for cpu with singularity!
This is a full setup (image VM builds) for Singularity with flux on ec2. I am doing this because we needed to test singularity, and Parallel Cluster has not been working great. I want to have this working setup in case we need it (and actually I think I prefer it, and believe it to be more comparable to our other setups that use flux!)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/performance-study
azure aks for cpu: first test runs are done
Most things are working except for mixbench (segfaults) and mt-gemm, which needs an update to the script (and container rebuilds) because the metric output is meaningless. AMG also needs a decision on the final params since it is different from the rest: it uses amg from 2013 instead of from 2023 (amg2023).
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/aks-infiniband-install
ubuntu20.04 driver installer for gpu
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Update package.py
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #125 from flux-framework/release-docs-2024-08-16
Update from release-docs-2024-08-16</small>
vsoch pushed to converged-computing/performance-study
azure and eks: configs and builds
This changeset includes the first working run of anything in AKS (OSU benchmarks in AKS CPU) and docker builds to support that. I have the rest of the containers built and need to test them.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/aks-infiniband-install
test: add osu example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #248 from singularityhub/update/containers-2024-08-15
[bot] update/containers-2024-08-15</small>
vsoch pushed to flux-framework/spack
ensure 0.37.0 is kept
vsoch pushed to compspec/compspec
Merge pull request #25 from fgeorgatos/patch-1
Update README.md - typo fixes</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #124 from flux-framework/release-docs-2024-08-14
Update from release-docs-2024-08-14</small>
vsoch pushed to flux-framework/spack
autodiff: add v1.0.2 -> v1.1.2 (#43527)
vsoch pushed to flux-framework/flux-k8s
Merge pull request #81 from flux-framework/fix-upstream-changes
fix: upstream changes for build and charts</small>
vsoch pushed to flux-framework/Tutorials
Merge pull request #42 from flux-framework/add-hpcic-2024
rename: radiuss 2024 to hpcic 2024</small>
vsoch pushed to converged-computing/performance-study
Merge pull request #1 from converged-computing/updating-dockerfile-zen4
docker: update builds for zen4</small>
vsoch pushed to converged-computing/flux-usernetes
test: adding testing pod
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/Tutorials
Merge pull request #31 from flux-framework/2024-radiuss-aws
add: flux radiuss tutorial 2024</small>
vsoch pushed to converged-computing/performance-study
events: add resources used
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Install test root update: old to new API (#45491)
- convert install_test_root from old to new API</small>
vsoch pushed to flux-framework/Tutorials
flux-workflow-examples: update content
conduit still does not compile, and a note was added about that.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluxnetes
group: first fully working group through cleanup (#16)
- group: first fully working group through cleanup
This changeset adds the first completely working submit through cleanup, where all tables are properly cleaned up! We can actually see the group of pods run and go away. Next I want to add back the kubectl command so we can get an idea of job state in the queue, etc.
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #246 from singularityhub/update/containers-2024-08-08
[bot] update/containers-2024-08-08</small>
vsoch pushed to hpc-social/community-blog
update ruby to 3.0
vsoch pushed to hpc-social/commercial-blog
update ruby to 3.0
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #123 from flux-framework/release-docs-2024-08-08
Update from release-docs-2024-08-08</small>
vsoch pushed to converged-computing/fluxnetes
fix
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to hpc-social/personal-blog
update ruby to 3.0 (#5)
- update ruby to 3.0
Testing updating ruby - the CI is currently failing.</small>
vsoch pushed to hpc-social/community-blog
update ruby to 3.0
vsoch pushed to hpc-social/commercial-blog
update ruby to 3.0
vsoch pushed to vsoch/oci-python
Merge pull request #21 from BeryJu/patch-1
Use raw python string for regex</small>
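A short illustration of why a raw string matters for regular expressions in Python. In a normal string literal, an unrecognized escape such as `\.` triggers a warning on recent Python versions, while a raw string passes the backslash to the regex engine untouched. The pattern below is a hypothetical example, not the one changed in the pull request.

```python
import re

# Hypothetical pattern: lowercase alphanumeric components separated by dots.
# The r"" prefix keeps "\." as a literal backslash followed by a dot, so the
# regex engine (not the Python parser) interprets the escape.
COMPONENT = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)*$")


def is_valid_name(value):
    """Return True if the value matches the dotted-name pattern."""
    return COMPONENT.match(value) is not None
```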
vsoch pushed to singularityhub/shpc-registry
Merge pull request #245 from singularityhub/update/containers-2024-08-05
[bot] update/containers-2024-08-05</small>
vsoch pushed to rseng/software
Merge pull request #386 from rseng/update/software-2024-08-04
Update from update/software-2024-08-04</small>
vsoch pushed to converged-computing/performance-study
gke: add example for user metadata and oras namespaces
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
build(deps): bump black in /.github/workflows/requirements/style (#45561)
Bumps black from 24.4.2 to 24.8.0.
updated-dependencies:
- dependency-name: black dependency-type: direct:production update-type: version-update:semver-minor …
Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com></small>
vsoch pushed to converged-computing/fluxnetes
Merge pull request #14 from converged-computing/add-owner-cleanup
add owner cleanup</small>
vsoch pushed to converged-computing/performance-study
google gpu: adding draft of gke experiments
This adds early experiment design for GKE with GPU. There are still some configs missing, and we need to finalize the number of GPUs (and thus the corresponding CPUs).
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluxnetes
cleanup: basic functionality added (#13)
- cleanup: basic functionality added
This changeset adds support for a duration that drives cleanup, meaning a duration in seconds can be provided as a label, and then the label will be populated into the duration (seconds) to kickoff a cleanup job after allocation. This currently is not doing a cleanup, as we will need to walk up to a parent level abstraction (often deleting the pod is not sufficient) and issue cancel to fluxion, but that will come soon/next. I am also converting the fluxion service container build to be multi-stage to hopefully make it smaller
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluxnetes
Merge pull request #11 from converged-computing/add-build-images
ci: add back build images workflow</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #244 from singularityhub/update/containers-2024-08-01
[bot] update/containers-2024-08-01</small>
vsoch pushed to rseng/devstories-episodes-2
add episode 100 - andrew jones!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories
add images for 100th episode post
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-sched
chore: flux ion-resource jobspec argument redundancy
Problem: the flux-ion-resource.py match has several subcommands that require a jobspec positional argument. Each subparser is calling the same logic to add it, which is redundant. Solution: iterate through a list to add the same argument to all of them, eliminating the redundancy and making it easier for the developer to read.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
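The refactor described above can be sketched with argparse as follows. This is an illustration under assumptions: the subcommand names and the help text are hypothetical, and the real flux-ion-resource.py may structure its parser differently.

```python
import argparse


def build_parser():
    """Build a parser whose subcommands all take a jobspec positional."""
    parser = argparse.ArgumentParser(prog="flux-ion-resource")
    subparsers = parser.add_subparsers(dest="command")

    # Hypothetical subcommand names, used only for illustration.
    needs_jobspec = [
        subparsers.add_parser("allocate"),
        subparsers.add_parser("reserve"),
        subparsers.add_parser("feasibility"),
    ]

    # Add the shared positional once, in a loop, instead of repeating the
    # same add_argument call under every subparser definition.
    for subparser in needs_jobspec:
        subparser.add_argument("jobspec", help="path to a jobspec file")
    return parser


args = build_parser().parse_args(["allocate", "job.yaml"])
```

Keeping the subparsers in one list means a new subcommand that needs a jobspec only has to be appended to that list.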
vsoch pushed to flux-framework/flux-k8s
Merge pull request #79 from flux-framework/fix-build-args-needed
ci: fix build ci to populate build-images.sh script</small>
vsoch pushed to converged-computing/fluxnetes
Merge pull request #9 from converged-computing/add-reservations
feat: queue is working for multiple pods!</small>
vsoch pushed to flux-framework/spack
perl-bio-ensembl-funcgen: new package (#44508)
- Adding the perl-bio-ensembl-funcgen package
- Update package.py
- Update package.py
vsoch pushed to converged-computing/fluxnetes
Merge pull request #8 from converged-computing/provisional-queue
chore: reorganize packages, add provisional queue, and correct sorting</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #243 from singularityhub/update/containers-2024-07-29
[bot] update/containers-2024-07-29</small>
vsoch pushed to rseng/software
Merge pull request #385 from rseng/update/software-2024-07-28
Update from update/software-2024-07-28</small>
vsoch pushed to flux-framework/flux-k8s
ci: fix build ci to populate build-images.sh script
Problem: The upstream “hack/build-images.sh” has had most variables for the environment removed in favor of an upstream Makefile, which we do not currently use in our custom build. Solution: define these variables in our Makefile equivalently.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluxnetes
worker: retrieval of podspec and AskFlux
This changeset creates separate worker and podgroup fluxnetes package files, and they handle worker definition and pod group parsing functions, respectively. Up to this point we can now:
- retrieve a new pod and see if it is in a group.
- if no (size 1), add to the worker queue immediately; if yes (size N), add to the pods table to be inspected later
- retrieve the podspec in the work function
- parse back into podspec and ask flux for the allocation.
I next need to do two things. First, figure out how to pass the node assignment back to the scheduler - I am hoping the job object “JobRow” can be modified to add metadata. Then we need to write the function to run at the end of a schedule cycle that moves groups from the provisional table to the worker queue.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
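The routing described in this commit can be sketched as below. This is a Python illustration only (fluxnetes itself is written in Go), and every name here is hypothetical. The rule that a group moves to the worker queue once all of its pods have arrived is an assumption; the commit leaves that end-of-cycle function as future work.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    group: str
    group_size: int = 1


class Router:
    """Route single pods straight to the worker queue, hold groups back."""

    def __init__(self):
        self.worker_queue = deque()
        # Stand-in for the provisional pods table: group name -> pods seen.
        self.provisional = {}

    def add(self, pod):
        if pod.group_size <= 1:
            # Size 1: nothing to wait for, enqueue immediately.
            self.worker_queue.append([pod])
        else:
            # Size N: record the pod so the group can be inspected later.
            self.provisional.setdefault(pod.group, []).append(pod)

    def end_of_cycle(self):
        """Move complete groups (an assumed rule) to the worker queue."""
        ready = [
            group
            for group, pods in self.provisional.items()
            if len(pods) >= pods[0].group_size
        ]
        for group in ready:
            self.worker_queue.append(self.provisional.pop(group))
        return ready
```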
vsoch pushed to flux-framework/cheat-sheet
Merge pull request #8 from flux-framework/carbon-copy-missing
typo: flux submit advanced missing --cc
vsoch pushed to converged-computing/fluxnetes
feat: new queue to handle groups
This changeset adds a new queue to the fluxnetes in-tree plugin, which currently knows how to accept a pod for work, and then just sleep (basically reschedule for 5 seconds into the future). This is not currently hooked into Kubernetes scheduling because I want to develop the functionality I need first, in parallel, before splicing it in. I should still be able to schedule to Fluxion and trigger cleanup when the actual job is done. I think we might do better to remove the group CRD too - it would hugely simplify things (the in-tree plugin would barely need anything aside from the fluxion interactions and queue) and instead we can keep track of group names and counts (that are still growing) in a separate table, since we already have postgres. Two things I am not sure about include 1. the extent to which in-tree plugins support scheduling. I can either keep them (and then would need to integrate) or have their functionality move into what fluxion can offer. I suspect they add supplementary features since we were able to disable most of them. The second thing I am not sure about (I will figure out) is, given that we customize the plugin framework, where the right place to put sort is. If we are adding pods to a table we will need to store the same metadata (priority, timestamp, etc) to allow for this equivalent sort.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
sz: new test API (#45363)
- sz: new test API
- fix typo; check installed executable; conform to subpart naming convention
- skip tests early if not installed; remove unnecessary “_sz” from test part names
Co-authored-by: Tamara Dahlgren dahlgren1@llnl.gov</small>
vsoch pushed to researchapps/skypilot
chore: linting
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
Merge pull request #231 from flux-framework/add-additional-skypilot-features
- podspec: add additional features for podspec
runtimeClassName can be used to designate nvidia for skypilot
- feat: allow non root user
Problem: skypilot (and likely others) do not run with a root user Solution: allow a non-root user that has sudo Signed-off-by: vsoch vsoch@users.noreply.github.com
- restart policy should default to always
Signed-off-by: vsoch vsoch@users.noreply.github.com
- default should be on failure
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-metrics-api
Add zenodo DOI to readme
vsoch pushed to singularityhub/shpc-registry
Merge pull request #242 from singularityhub/update/containers-2024-07-22
[bot] update/containers-2024-07-22</small>
vsoch pushed to flux-framework/flux-operator
default should be on failure
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #384 from rseng/update/software-2024-07-21
Update from update/software-2024-07-21</small>
vsoch pushed to flux-framework/Tutorials
review radiuss 2024: from flux team on july 19th
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/skypilot
feat: adding flux as a cloud
This changeset adds flux as a new cloud, which largely wraps the Kubernetes cloud class, but provides the separation to make it clear we are deploying Flux, and to allow for other small tweaks to the customization. This currently works to deploy Kubernetes (via the Flux cloud) and when I uncomment the deploy_vars “module” variable it will be able to use a work in progress provisioner module, which is not added to this changeset. Note that my strategy is to make as minimal changes as possible, so I am using the same Kubernetes classes and templates and editing only when necessary.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
sqlite: add some newer releases (#45297)
Included: 3.46.0 (most current), 3.45.3, 3.45.1 (for possible compat with Ubuntu 24.04 LTS), 3.44.2.</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #241 from singularityhub/update/containers-2024-07-18
[bot] update/containers-2024-07-18</small>
vsoch pushed to flux-framework/spack
qt-*: add v6.7.1, v6.7.2 (#45288)
vsoch pushed to converged-computing/flux-views
Merge pull request #9 from converged-computing/add/ubuntu-noble
ubuntu: support for noble</small>
vsoch pushed to flux-framework/spack
DCMTK: fix build with libtiff (#45213)
vsoch pushed to singularityhub/shpc-registry
Merge pull request #240 from singularityhub/update/containers-2024-07-15
[bot] update/containers-2024-07-15</small>
vsoch pushed to rseng/software
Merge pull request #383 from rseng/update/software-2024-07-14
Update from update/software-2024-07-14</small>
vsoch pushed to flux-framework/spack
py-tensorflow: change py-tensorflow@2.16-rocm-enhanced to use tarball instead of branch (#45218)
- change py-tensorflow@2.16-rocm-enhanced to use tarball instead of branch
- remove revert_fd6b0a4.patch and use github commit patch url
vsoch pushed to flux-framework/flux-go
Merge pull request #1 from flux-framework/add-zmq-dependency
dependency: libczmq needed</small>
vsoch pushed to converged-computing/metrics-operator-experiments
docker: add single-node cpu profile
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-sched
chore: flux ion-resource jobspec argument redundancy
Problem: the flux-ion-resource.py match has several subcommands that require a jobspec positional argument. Each subparser is calling the same logic to add it, which is redundant. Solution: iterate through a list to add the same argument to all of them, eliminating the redundancy and making it easier for the developer to read.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Buildcache: remove deprecated --allow-root and preview subcommand (#45204)
vsoch pushed to singularityhub/shpc-registry
Merge pull request #239 from singularityhub/update/containers-2024-07-11
[bot] update/containers-2024-07-11</small>
vsoch pushed to flux-framework/spack
build(deps): bump docker/login-action from 3.1.0 to 3.2.0 (#44424)
Bumps docker/login-action from 3.1.0 to 3.2.0.
updated-dependencies:
- dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor …
Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com></small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #120 from flux-framework/release-docs-2024-07-11
Update from release-docs-2024-07-11</small>
vsoch pushed to converged-computing/fluxnetes
docs: update readme with working example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to regro-cf-autotick-bot/sregistry-feedstock
Test removing globus
vsoch pushed to flux-framework/spack
Update from update-package/flux-sched-2024-07-10 (#200)
- Automated deployment to update package flux-sched 2024-07-10
- Add back in 0.36.0
Co-authored-by: github-actions github-actions@users.noreply.github.com Co-authored-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to conda-forge/sregistry-feedstock
Rebuild for python312 (#39)
- Rebuild for python312
- MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.36.2, and conda-forge-pinning 2024.07.09.17.01.06
- Test removing globus
Co-authored-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to flux-framework/spack
pinentry: add v1.3.1 (#45073)
vsoch pushed to singularityhub/shpc-registry
Merge pull request #238 from singularityhub/update/containers-2024-07-08
[bot] update/containers-2024-07-08</small>
vsoch pushed to converged-computing/flux-netmark
Merge pull request #2 from converged-computing/add-placement
add placement group and topology api</small>
vsoch pushed to converged-computing/bare-vm-container-study
docker: add arm builds
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #382 from rseng/update/software-2024-07-07
Update from update/software-2024-07-07</small>
vsoch pushed to flux-framework/spack
spack -C
Precedence:
- Named environment
- Anonymous environment
- Generic directory</small>
vsoch pushed to flux-framework/spack
spack gc: remove debug print statement (#45067)
Signed-off-by: Todd Gamblin tgamblin@llnl.gov</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #119 from flux-framework/release-docs-2024-07-05
Update from release-docs-2024-07-05</small>
vsoch pushed to flux-framework/Tutorials
org: moving dyad to be separate module, and organizing into supplementary section
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-netmark
Merge pull request #1 from converged-computing/add-security-ssh-group
security: add user ip address to ssh port 22 ingress</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #237 from singularityhub/update/containers-2024-07-04
[bot] update/containers-2024-07-04</small>
vsoch pushed to researchapps/eksctl
Merge pull request #7828 from eksctl-io/update-release-notes
Add release notes for v0.184.0</small>
vsoch pushed to flux-framework/flux-operator
adding jobs list to fluxion controller
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
updates to ring buffer timing
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Automated deployment to update flux-core versions 2024-07-03 (#197)
Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #118 from flux-framework/release-docs-2024-07-03
Update from release-docs-2024-07-03</small>
vsoch pushed to singularityhub/sregistry
Automated deployment to update contributors 2024-07-02 (#448)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
cleanup of tests and docs to run example with maximum automation
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #236 from singularityhub/update/containers-2024-07-01
[bot] update/containers-2024-07-01</small>
vsoch pushed to rseng/software
Merge pull request #381 from rseng/update/software-2024-06-30
Update from update/software-2024-06-30</small>
vsoch pushed to flux-framework/flux-operator
typo: emptyDirSizeLimit
vsoch pushed to converged-computing/jobspec-database
add missing images for manual resource parsing
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
Merge pull request #12 from converged-computing/limit-ssh-deployer-ip
ssh: test config to limit to deployer ip address</small>
vsoch pushed to converged-computing/bare-vm-container-study
add testing program with ld preload
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Strumpack: Changed old test method to new test method (#44874)
- added try except
- Resolve style issues
Co-authored-by: Tamara Dahlgren <35777542+tldahlgren@users.noreply.github.com></small>
vsoch pushed to flux-framework/flux-operator
try pushing again
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
wip deletion (#43)
- wip deletion endpoints for cluster and subsystems
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
build(deps): bump docker/build-push-action from 5.3.0 to 6.2.0 (#44910)
Bumps docker/build-push-action from 5.3.0 to 6.2.0.
updated-dependencies:
- dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major …
Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com></small>
vsoch pushed to converged-computing/jobspec-database
add pydantic model to gemini - gemma is kind of terrible
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
testing updating to save by pid (and with tgid still)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #235 from singularityhub/update/containers-2024-06-27
[bot] update/containers-2024-06-27</small>
vsoch pushed to flux-framework/flux-operator
feat: add prototype/experiment of testing multiple applications per pod
combined with the fluxion scheduler as a service, this could be a pretty cool idea. I am not sold on this being a good idea for production, but I think it will afford interesting experiments and workflow designs.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jobspec-database
gemma: preparing to test with template
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
LAMMPS runs for 32 iterations on singularity vs bare metal (#1)
- first round of results!
- add lists of kernel functions
- add ebpf dockerfile
- ebpf: add automation to determine functions of interest
- add gromacs mpi
This includes a lima vm, along with a simple set of steps to generate all the functions and run the script against it (and time the whole thing). This is not perfect but it is simple enough to understand and use, and we can use it for our further experiments!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to oras-project/oras-py
fix(core): provider do_request to maintain verify in all request, basic headers (#145)
- fix(core): provider do_request to maintain verify in all request
- basic headers maintenance
- add test case
- amending test for GHA setup
- add CHANGELOG entry
- use tls_verify also for login for consistency
Signed-off-by: tarilabs matteo.mortari@gmail.com</small>
vsoch pushed to flux-framework/spack
check release: don’t fail fast
vsoch pushed to converged-computing/bare-vm-container-study
add ebpf dockerfile
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #234 from singularityhub/update/containers-2024-06-24
[bot] update/containers-2024-06-24</small>
vsoch pushed to converged-computing/jobspec-database
gemini: remove outliers
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #380 from rseng/update/software-2024-06-23
Update from update/software-2024-06-23</small>
vsoch pushed to converged-computing/bare-vm-container-study
preparing to run first experiment
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/eksctl
fix unit test failures for pkg/actions/nodegroup/update_test.go
vsoch pushed to flux-framework/spack
cmake: remove version deprecated in 0.22 (#44628)
vsoch pushed to flux-framework/flux-k8s
graph: index should be scoped to parent
Problem: the current strategy to derive an index is scoped to a resource globally across the graph. Solution: instead, provide a direct index counter for each new resource to ensure it is scoped to the parent
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
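The commit above only states the idea, so the following is a minimal sketch of what a parent-scoped index counter could look like, not the flux-k8s implementation. The `Vertex` class, its fields, and `add_child` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A hypothetical resource vertex; all names here are illustrative."""

    type: str
    index: int = 0
    children: list = field(default_factory=list)
    # One counter per child type, owned by this vertex (the parent)
    _counters: dict = field(default_factory=lambda: defaultdict(int))

    def add_child(self, type: str) -> "Vertex":
        # The index comes from the parent's own counter instead of a
        # counter shared across the whole graph
        child = Vertex(type=type, index=self._counters[type])
        self._counters[type] += 1
        self.children.append(child)
        return child


cluster = Vertex("cluster")
node0 = cluster.add_child("node")
node1 = cluster.add_child("node")
node0_cores = [node0.add_child("core").index for _ in range(2)]
node1_cores = [node1.add_child("core").index for _ in range(2)]
```

With a global counter the cores under the second node would be numbered 2 and 3; with a per-parent counter each node numbers its own cores from 0.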
vsoch pushed to converged-computing/rainbow
Add ssl (#42)
- save: wip to add ssl, not working yet
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
dockerfile: update analysis
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/eksctl
Update generated files
vsoch pushed to flux-framework/spack
upcxx package: Add resilience to broken libfabric (#44618)
Some systems have a libfabric install that doesn’t work, so don’t drop dead if a call to fi_info fails (e.g. due to missing shared libraries)</small>
vsoch pushed to converged-computing/container-chonks
update layer and image similarity
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
chore: filter missing from script
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to pydicom/deid
Merge pull request #262 from nbelakovski-mssm/update_docker_docs
Update docker.md</small>
vsoch pushed to nbelakovski-mssm/deid
Update docker.md
The second line calling deid locally after running the docker container doesn’t make any sense. Also I think that adding --help helps to show how this docker container could be used.</small>
vsoch pushed to converged-computing/container-chonks
add base image analysis
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
ebpf: add early work on program to time calls
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-guts
ci: remove centos:9
vsoch pushed to singularityhub/guts
ci: update deploy action
vsoch pushed to converged-computing/ebpf-hpc
add bcc base
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
dockerfile: script to calculate base images
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
add terms
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
add topic clouds for dockerfiles
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
build: finalize build script
This script installs all the applications onto the VM. The one thing to be careful about is the spack view for AMG2023, which has all the duplicated installs (flux, mpi, etc). To be safe, it is important to target direct paths.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
openmpi: disable remark 10441 for intel classic 2021.7.0 or newer (#44614)
- Compilation of openmpi fails when intel classic compiler 2021.7.0 or newer is used.</small>
vsoch pushed to flux-framework/flux-operator
docs: update deploy action to v4
vsoch pushed to converged-computing/jobspec-database
add license
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/bare-vm-container-study
add more analyses
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-guts
busybox 1.23 and 1.24 are deprecated
vsoch pushed to rseng/devstories-episodes-2
Add Jakob episode
vsoch pushed to rseng/devstories
update jakob post
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
fiber -> fabric
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jobspec-database
add simple top2vec similar words
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/jobspec
feat: support for environment (#18)
- feat: support for environment
- attributes are under task
- add support for parsing script
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/jobspec
Add depends on (#17)
- feat: support for depends_on
This adds working support for depends on, which relies on adding a frobnicator plugin to support a dependency creation based on the job name. I am adding a full example directory and jobspec that works with the VSCode developer environment where I have “installed” the plugin.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Update quay.io/pawsey/hpc-python
Tags no longer exist.
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to converged-computing/flux-usernetes
Merge pull request #11 from converged-computing/update-plots-add-csv
update plots and csv</small>
vsoch pushed to rseng/software
Merge pull request #378 from rseng/update/software-2024-06-09
Update from update/software-2024-06-09</small>
vsoch pushed to flux-framework/spack
git: remove deprecated versions (#44631)
vsoch pushed to converged-computing/metrics-operator-experiments
performance: add missing amg2023 vtune container
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
adding vtune containers
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jobspec-database
add run_top2vec script
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flex-container
chore: update title
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #117 from flux-framework/release-docs-2024-06-07
Update from release-docs-2024-06-07</small>
vsoch pushed to converged-computing/flux-usernetes
bug: launch template name should be scoped to local name
Problem: if we don’t scope the launch template name, we can get conflicts between different deployments. Solution: add local.name to it.</small>
vsoch pushed to compspec/jobspec
chore: move queries into separate module for clarity
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to oras-project/oras-py
Merge pull request #141 from tarilabs/tarilabs-20240606-manifest_config
chore: fix typing for manifest_config param of push fn</small>
vsoch pushed to converged-computing/jobspec-database
update analysis: now dataset has 31932 results
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
fix bandwidth plot (missing other setups)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
add dockerfile analysis
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/jobspec
test update: requires on the level of a task is a reference
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/container-executable-discovery
Update README.md
vsoch pushed to flux-framework/spack
Automated deployment to update flux-core versions 2024-06-05 (#187)
Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: add parallel cluster runs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compspec-spack
Update CHANGELOG.md
vsoch pushed to compspec/compspec
add utils function to read file (#24)
- add utils function to read file
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add notes about cleanup
Google Cloud did not clean up the network because a new VM was spawned at apparently the wrong time. The fix is to manually delete “dangling” VMs and then issue the delete command again.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jobspec-database
Updated matrices to double size of dataset (#1)
- update search to include job managers
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #377 from rseng/update/software-2024-06-02
Update from update/software-2024-06-02</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: add aws ec2 (tf) testing with singularity
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: completed google cpu testing 32 node cluster
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
add plots for bandwidth and latency
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
dockerfile: add complete database
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #233 from singularityhub/update/containers-2024-05-30
[bot] update/containers-2024-05-30</small>
vsoch pushed to flux-framework/spack
nim: add v2.0.4 (#44375)
- Add nim 2.0.4
- Use install instead of copy
Co-authored-by: dialvarezs dialvarezs@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
Merge pull request #40 from converged-computing/add-neo4j-backend
graph: add neo4j backend</small>
vsoch pushed to vsoch/opensource-art
Merge pull request #45 from PratheepB/master
Pratheep Balachandar Contribution</small>
vsoch pushed to flux-framework/spack
Update var/spack/repos/builtin/packages/flux-sched/package.py
Co-authored-by: Massimiliano Culpo massimiliano.culpo@gmail.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: add linpack docker builds
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
try conflicts(“%gcc@:9.3”)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
update experiment for may 28
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #232 from singularityhub/contributors/update-2024-05-27
[tributors] contributors/update-2024-05-27</small>
vsoch pushed to rseng/software
Merge pull request #376 from rseng/update/software-2024-05-26
Update from update/software-2024-05-26</small>
vsoch pushed to converged-computing/flux-usernetes
fix: setting the cpu limit sets a cgroup bandwidth
Problem: setting the pod limits, although it still remains burstable, actually does set a limit on the cgroup! We do not want to do that. When we remove that, the CPU utilization goes up!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-usernetes
Merge pull request #6 from converged-computing/update-experiment-may-25
adding osu plots text file and new diagrams</small>
vsoch pushed to converged-computing/container-chonks
save dockerfile state in case computer crumps
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
build_environment.py: deal with rpathing identical packages (#44219)
When multiple gcc-runtime packages exist in the same link sub-dag, only rpath the latest.</small>
vsoch pushed to flux-framework/spack
flux-sched: pin gcc to 9.4:
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
aws cpu cannot run multi gpu models
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: update cpu google cloud runs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jobspec-database
word2vec: add basic analysis
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to LLNL/radiuss
LLNL/psuade is removed (404)
vsoch pushed to flux-framework/spack
rsync: add v3.3.0 (#44311)
vsoch pushed to converged-computing/metrics-operator-experiments
performance: cpu on GKE, mt-gemm
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/oras-py-feedstock
oras-py v0.1.30 (#27)
- updated v0.1.30
- MNT: Re-rendered with conda-build 24.5.0, conda-smithy 3.36.0, and conda-forge-pinning 2024.05.22.18.43.37</small>
vsoch pushed to converged-computing/metrics-operator-experiments
nit: typos in experiment test readme
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #230 from singularityhub/update/containers-2024-05-21
[bot] update/containers-2024-05-21</small>
vsoch pushed to flux-framework/spack
rayleigh: new package (#38338)
- rayleigh: new package
- Update var/spack/repos/builtin/packages/rayleigh/package.py
Co-authored-by: Wouter Deconinck wdconinc@gmail.com
- split edit into three methods
- add comments to clarify use of configure
- rayleigh: copyright year
Co-authored-by: Wouter Deconinck wdconinc@gmail.com</small>
vsoch pushed to converged-computing/rainbow-experiments
add vtune experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add aws builds
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-netmark
hpc7a is working too :)
stick a fork in me, this potato is done!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
arm: build needs --load to be provided to docker daemon
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #375 from rseng/update/software-2024-05-19
Update from update/software-2024-05-19</small>
vsoch pushed to rse-ops/lammps-matrix
update vtune build to have singularity
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add cyclecloud
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
try v4 Github deploy
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to rse-ops/lammps-matrix
try vtune build (#4)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
blis: add v1.0 (#44199)
vsoch pushed to converged-computing/rainbow-experiments
add thinking
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
disable arm build again
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/scif-feedstock
Rebuild for PyPy3.9 (#10)
- Rebuild for PyPy3.9
- MNT: Re-rendered with conda-build 24.5.0, conda-smithy 3.35.1, and conda-forge-pinning 2024.05.18.00.22.24</small>
vsoch pushed to researchapps/flux-core
plugin: frob for system attribute dependency name
Problem: we need a quick solution for assigning a jobid dependency to run after success (afterok) given a job name. Solution: write a frobnicator plugin that handles this transformation to the jobspec.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add mnist data for magma
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
dockerfile experiment: add current dataset
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
feat: add cypher matcher (#36)
- feat: adding support for match algorithm with cypher
This is a WIP to write the query to do a full match. I am still needing to write the last bit of logic that returns the number of slots.
- feat: finishing up match algorithms
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
miniFE: add customization of parameters
This updates the miniFE build to add two files that will allow us to customize params. I am also updating the amg2023 build for spack slim to include the missing envars for CUDA.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/container-chonks
add experimental design for dockerfile parsing
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #121 from rseng/update/analysis-2024-05-15
Update from update/analysis-2024-05-15</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add stream!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
mixbench: update cu file for container
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
update cv with hpcknow and isc talks
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #374 from rseng/update/software-2024-05-12
Update from update/software-2024-05-12</small>
vsoch pushed to converged-computing/rainbow
tweak config (#34)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/singularity-compose-feedstock
singularity-compose v0.1.19 (#20)
- updated v0.1.19
- MNT: Re-rendered with conda-build 24.3.0, conda-smithy 3.35.1, and conda-forge-pinning 2024.05.11.20.29.32</small>
vsoch pushed to compspec/jobspec
bug: use yaml safe loader
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
Refactor memory graph (#32)
- feat: basic io match is working
There is still more work to be done with jobspec-go and parsing from raw values, and also checking the other match types, but this is a start.
- refactor: memory graph database
Problem: we currently do not have a good model to support traversal of more than one scheduled slot (a group of resources) and checking of requires within and outside of the slot. Solution: Jobspec nextgen provides a function to expose schedulable slots. A slot does not necessarily start at the top - it can have some set of resources at the top level (with requirements) and then the slot is below it. This means that the graph database's recursive algorithm needs to first traverse into a vertex to find the slot, but along the way check the subsystem requirements for types. For example, even if we want N nodes, we should not continue the search if a node does not have an attribute we are interested in. Once we find a slot, we create what is akin to a traverser, and the traverser carries with it a resource counter. The resource counter holds the count of needed slots vs. found slots, and is then able to return as soon as we have found as many as we need. It also holds the current state (status) of a current search, meaning we decrement either a resource or subsystem count when we find it somewhere in the subgraph of the slot.
This is just the early prototype, and so far it is only working for the simple case of submitting a job with some need for cores and nodes. I am next going to go back through the more specific IO cases and ensure that they still work, with the goal to get back to the spack case. I am going back to sleep for a bit first, kind of tired.
- io example is working
This example is needing to search both compatibility requirements and look for resources within a slot.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
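The traversal described in the refactor above can be sketched roughly as follows. This is an illustration under stated assumptions, not the rainbow code (which is written in Go): the dictionary-based graph, `ResourceCounter`, `slot_matches`, and `find_slots` are all hypothetical names, and subsystem requirements are reduced to a set of required attributes per vertex type.

```python
from dataclasses import dataclass


@dataclass
class ResourceCounter:
    """Tracks needed vs. found slots (hypothetical structure)."""

    slots_needed: int
    slots_found: int = 0

    def satisfied(self) -> bool:
        return self.slots_found >= self.slots_needed


def slot_matches(slot, needs):
    """Decrement per-slot resource needs while walking the slot's subgraph."""
    remaining = dict(needs)
    stack = list(slot.get("children", []))
    while stack:
        vertex = stack.pop()
        if remaining.get(vertex["type"], 0) > 0:
            remaining[vertex["type"]] -= 1
        stack.extend(vertex.get("children", []))
    return all(count == 0 for count in remaining.values())


def find_slots(vertex, slot_type, needs, required_attrs, counter):
    """Recurse toward the slot level, pruning vertices missing attributes."""
    if counter.satisfied():
        return
    required = required_attrs.get(vertex["type"], set())
    if not required <= set(vertex.get("attributes", [])):
        # Do not continue the search below a vertex that lacks a requirement
        return
    if vertex["type"] == slot_type:
        if slot_matches(vertex, needs):
            counter.slots_found += 1
        return
    for child in vertex.get("children", []):
        find_slots(child, slot_type, needs, required_attrs, counter)
        if counter.satisfied():
            # Return as soon as we have found as many slots as we need
            return


def make_node(attributes):
    cores = [{"type": "core"}, {"type": "core"}]
    return {"type": "node", "attributes": attributes, "children": cores}


cluster = {
    "type": "cluster",
    "children": [make_node(["io"]), make_node([]), make_node(["io"])],
}

counter = ResourceCounter(slots_needed=2)
find_slots(cluster, "node", {"core": 2}, {"node": {"io"}}, counter)

too_many = ResourceCounter(slots_needed=3)
find_slots(cluster, "node", {"core": 2}, {"node": {"io"}}, too_many)
```

Here the middle node is skipped because it lacks the `io` attribute, so a request for two slots is satisfied while a request for three is not.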
vsoch pushed to converged-computing/metrics-operator-experiments
mixbench: add scripts for wrapper
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
amg2023 is working, mixbench is too
For mixbench we need to decide if running on each GPU separately, and very quickly, is something we want to do to get basic metrics.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/jobspec-go
schema: allow parameters to be interface
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/oci-python
Merge pull request #20 from brentonmallen1/master
Escape url slash regex</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #228 from singularityhub/update/containers-2024-05-09
[bot] update/containers-2024-05-09</small>
vsoch pushed to rseng/devstories-episodes-2
add jay episode 97
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories
add jay episode 97
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #114 from flux-framework/release-docs-2024-05-09
Update from release-docs-2024-05-09</small>
vsoch pushed to compspec/jobspec-go
ensure generated jobspec has slot at rack
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #120 from rseng/update/analysis-2024-05-08
Update from update/analysis-2024-05-08</small>
vsoch pushed to compspec/jobspec-go
update generation to use replicas
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/fluxion-go
Merge pull request #10 from flux-framework/remove-libcqmq
dep: remove libczmq</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: add more builds
We now have four apps/benchmarks fully working, and pending 4 more that should work, we will have a total of 8. We will pick up on these last 4 on Friday, and then move into testing at larger scales.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/jobspec
do not require enum of just node for resources
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
performance: add reset example
This is working on 2 nodes, each with 4x v100 GPUs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
e4s-alc: add v1.0.2 (#44001)
vsoch pushed to converged-computing/converged-computing.github.io
index: remove institutional references
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluxion
fix: Tom is angry at me
vsoch pushed to researchapps/flux-sched
Merge pull request #1190 from trws/stop-always-rebuilding-docs
docs/cmake:stop constantly rebuilding manpages</small>
vsoch pushed to flux-framework/spack
Nalu: adding support for Trilinos 14.2.0 for Nalu 1.6.0 (#43857)
vsoch pushed to converged-computing/metrics-operator-experiments
new testing container for hpcg
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add work for today
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
boost: Add 1.85.0 (#43788)
- boost: Add 1.85.0
- Add conflict for Boost 1.85.0 stacktrace change</small>
vsoch pushed to converged-computing/metrics-containers
remove pennant
moving into metrics-operator-experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #223 from singularityhub/update/containers-2024-04-29
[bot] update/containers-2024-04-29</small>
vsoch pushed to compspec/jobspec
feat: finishing up group in group example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #372 from rseng/update/software-2024-04-28
Update from update/software-2024-04-28</small>
vsoch pushed to flux-framework/spack
containers: Add Fedora 40, 39 (#43847)
vsoch pushed to converged-computing/metrics-operator-experiments
experiment: pennant on gke
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
add pennant with gpu/flux
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/jobspec
wip: saving state
My computer is acting funky, do not want to lose this work.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #222 from singularityhub/update/containers-2024-04-25
[bot] update/containers-2024-04-25</small>
vsoch pushed to converged-computing/ensemble-experiments
add batch experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/ensemble-operator
fix: bug with requeue (#16)
Problem: if there is a timeout error, we need to requeue. Solution: do that.
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/scheduler-sniffer
Merge pull request #4 from converged-computing/add-plot-prototype
feat: prototype for scheduler sniffer vis</small>
vsoch pushed to researchapps/flux-sched
reapi c++: add satisfy endpoint
Problem: the c++ bindings do not have satisfy support. Solution: add satisfy to them.
In practice I was adding this for exposure to the Go bindings, but I do not think it is necessary, because the Go bindings use the C bindings, which already have reapi_cli_match_satisfy. I saw that match_allocate seems to have support to provide the SATISFIABILITY match_op, which is provided to the traverser, so I tried to call that same function. I am opening this PR in case it is interesting or useful. If not, please close and disregard.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-sched
Merge pull request #1178 from researchapps/refactor-qp-base
queue_base_manager: refactor to remove impl</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #220 from singularityhub/update/containers-2024-04-18
[bot] update/containers-2024-04-18</small>
vsoch pushed to researchapps/kueue
Merge pull request #791 from kubernetes-sigs/dependabot/go_modules/github.com/kubeflow/common-0.4.7
Bump github.com/kubeflow/common from 0.4.6 to 0.4.7</small>
vsoch pushed to converged-computing/operator-experiments
experiment 10: complete run of all schedulers on same cluster
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add subject to artifact
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/kueue
review: aldo
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
kueue is working!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
Update 2024-04-15-seeing-yourself.md
vsoch pushed to researchapps/flux-sched
qmanager: Preserve reservations across sched-loop iterations
Problem: Currently we remove all of the temporary reservations created in sched-fluxion-resource to backfill jobs. This has a side effect in that we can’t use that information as start-time estimates for pending jobs.
Preserve those temporary reservations across two consecutive schedule loop iterations.
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-04-15 (#219)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to researchapps/wg-image-compatibility
add use cases so brandon is happy
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/Tutorials
Merge pull request #36 from flux-framework/fix-flux-tree
hotfix: flux-tree stat -> stats
vsoch pushed to converged-computing/ensemble-operator
test: adding testing setup (#13)
- test: adding testing setup
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/software
Merge pull request #370 from rseng/update/software-2024-04-14
Update from update/software-2024-04-14
vsoch pushed to flux-framework/Tutorials
Removes old notebook (#35)
vsoch pushed to converged-computing/kubescaler
add support for node scale request
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-operator
wip: testing different orderings (#12)
- wip: testing different orderings
Problem: randomize is limited for ordering. Solution: change randomize “boolean” into an order variable that can take several forms. We will want to test this to see how order of jobs can impact an ensemble with autoscaling.
Signed-off-by: vsoch vsoch@users.noreply.github.com
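The change from a boolean to an order variable described above could look roughly like the following Python sketch. It is not taken from the ensemble-operator code: the `Order` enum, its three values, and the `order_jobs` helper are hypothetical names chosen for illustration, and the commit does not say which forms the variable actually takes.

```python
import random
from enum import Enum


class Order(Enum):
    # Hypothetical forms; the commit only says the variable "can take several forms".
    RANDOM = "random"
    ASCENDING = "ascending"
    DESCENDING = "descending"


def order_jobs(jobs, order, seed=None):
    """Return a new list of (name, size) jobs arranged by the requested order."""
    if order is Order.RANDOM:
        shuffled = list(jobs)
        # A seeded generator keeps a random ordering reproducible across runs.
        random.Random(seed).shuffle(shuffled)
        return shuffled
    # Sort by job size; DESCENDING reverses the sort.
    return sorted(jobs, key=lambda job: job[1], reverse=(order is Order.DESCENDING))


# Made-up job names and sizes, used only to exercise the helper.
jobs = [("lammps", 4), ("hello", 1), ("amg", 8)]
ascending = order_jobs(jobs, Order.ASCENDING)
descending = order_jobs(jobs, Order.DESCENDING)
shuffled = order_jobs(jobs, Order.RANDOM, seed=0)
```

An enum in place of a boolean leaves room for further orderings without changing the signature, which is one plausible reason for the refactor.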
vsoch pushed to converged-computing/ensemble-operator
preparing to run experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-experiments
Fix title of run3 plot
vsoch pushed to researchapps/flux-sched
test: init flywheel in tests and try fwinit
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/spack
Automated deployment to update flux-core versions 2024-04-13 (#176)
Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #113 from flux-framework/release-docs-2024-04-13
Update from release-docs-2024-04-13
vsoch pushed to flux-framework/Tutorials
Cleanup and improve text for RIKEN tutorial (#34)
- Fixes some URLs
- Updates notebook “goals”
- Updates description of job throughput fig and adds proper figure numbering/captions
- Adds proper figure numbers and captions to module 3
- Adds fig numbers, captions, and footnotes to every figure in tutorial
- Updates refs to figures in text to correspond to figure numbering
- Updates verb tenses, fixes a DYAD figure, and adds a link to survey
vsoch pushed to flux-framework/Tutorials
Merge pull request #32 from TauferLab/riken-dyad-hotfix
Hotfix for a bug in DyadTorchDataLoader
vsoch pushed to researchapps/flux-sched
fix: convert strings to boost:flyweight
Problem: string comparison is immensely inefficient, taking 6-10% of resources (reported by Tom) for a trace. Solution: start with the root of the issue in the scoring module and work backwards to convert string types to boost::flyweight. I tried to have a minimal footprint here but it spread really quickly.
Signed-off-by: vsoch vsoch@users.noreply.github.com
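The idea behind the flyweight conversion above is that each distinct string is stored once and later comparisons work on a small shared handle instead of the characters. The commit does this in C++ with boost::flyweight; the Python sketch below only illustrates the interning idea. The `InternPool` class and the example resource names are hypothetical and not part of flux-sched.

```python
class InternPool:
    """Store each distinct string once and hand out a small integer handle."""

    def __init__(self):
        self._ids = {}      # string -> handle
        self._values = []   # handle -> string

    def intern(self, value):
        """Return the handle for value, registering it on first sight."""
        handle = self._ids.get(value)
        if handle is None:
            handle = len(self._values)
            self._ids[value] = handle
            self._values.append(value)
        return handle

    def lookup(self, handle):
        """Recover the original string from its handle."""
        return self._values[handle]


pool = InternPool()
core_a = pool.intern("core")
core_b = pool.intern("core")
gpu = pool.intern("gpu")

# Equal strings share a handle, so equality is a single integer comparison.
same_type = core_a == core_b
different_type = core_a != gpu
```

The hashing cost is paid once when a string is interned; after that, equality checks in a hot path such as scoring no longer touch the string contents.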
vsoch pushed to converged-computing/rainbow
Merge pull request #30 from converged-computing/add/devcontainer
dev: add vscode development container and docs for usage
vsoch pushed to converged-computing/kubescaler
formatting
Signed-off-by: vsoch vsoch@users.noreply.github.com